Speechify is dedicated to making reading accessible for everyone, and more than 50 million users benefit from its text-to-speech products. The company is seeking a Software Engineer for its Data team to manage data collection for model training and improve the ingestion pipeline. The ideal candidate will collaborate with scientists and leadership to improve data quality and scalability for next-generation AI models.
Responsibilities:
- Be scrappy in finding new sources of audio data and bringing them into our ingestion pipeline
- Operate and extend the cloud infrastructure for our ingestion pipeline, currently running on GCP and managed with Terraform
- Collaborate closely with our Scientists to shift the cost/throughput/quality frontier, delivering richer data at bigger scale and lower cost to power our next-generation models
- Collaborate with others on the AI Team and Speechify Leadership to craft the AI Team’s dataset roadmap to power Speechify’s next-generation consumer and enterprise products
Requirements:
- BS/MS/PhD in Computer Science or a related field
- 5+ years of industry experience in software development
- Proficiency with Bash/Python scripting in Linux environments
- Proficiency in Docker and Infrastructure-as-Code concepts and professional experience with at least one major Cloud Provider (we use GCP)
- Ability to handle multiple tasks and adapt to changing priorities
- Strong communication skills, both written and verbal
- Experience with web crawlers or large-scale data processing workflows is a plus