Investigate heterogeneous data management techniques and polystore systems, and apply AI and machine learning techniques to foundational data management problems such as data quality, data profiling, data integration, and schema matching.
Develop tools and frameworks to enable scalable development of machine learning models and data science algorithms across heterogeneous data domains, including tabular, time series, text, audio, and video.
Create distributed data processing architectures that leverage “compute at the edge” to enable novel use cases and applications such as predictive routing, data-driven experiences, and adaptive environments.
Work with ATG researchers to understand data-related opportunities in their domains, such as applied AI and machine learning for audio and video.
Requirements
PhD in Computer Science or a similar field
Background in relational databases, big data systems, and data analytics systems
Expertise in data management, data cleaning, distributed systems, database storage engines, query processing and query optimization, and applied data science
Knowledge of statistics and machine learning
Proficiency in data structures and algorithms
Familiarity with Git and project management tools such as Jira
Relevant publications in data mining / database conferences (e.g., SIGMOD, VLDB, ICDCS, IEEE BigData, ICDE, KDD, WSDM, ICDM, WEB)
Tech Stack
Distributed Systems
Benefits
Flex Work approach that is truly flexible to support where, when, and how you do your best work