Drive high-performance data acquisition, preparation and synthesis pipelines to generate data for the next generation of speech and language AI foundation models
Develop advanced characterizations of complex conversational audio using a diverse toolkit of signal processing techniques and deep learning models
Collaborate with DataOps and Engineering to create automated systems that scale the ability of human annotators to label high-value data and provide critical feedback on model outputs
Build advanced benchmarking methodologies and curated datasets for evaluating conversational voice systems
Document and present results of data experiments and analysis for internal and external audiences
Requirements
Experience building data processing pipelines from a blank page and owning the entire data stack including data acquisition, characterization, cleaning, serving and transformation
Experience and expertise applying statistical methods and deep learning models to understand complex data
Strong communication skills and the ability to translate complex concepts into simple terms suited to the target audience
Strong software engineering skills with particular emphasis on developing clean, modular code in Python and working with PyTorch
Background in Physics, Mechanical Engineering or Language Processing (nice to have)