TetraScience is a leading Scientific Data and AI company that is revolutionizing the Scientific AI landscape by developing AI-native scientific data sets and lab data management solutions. They are seeking a Senior Software Platform Engineer to design, build, and scale AI and data infrastructure, focusing on cloud-based MLOps pipelines and collaborating with various engineering teams.
Responsibilities:
- Design, implement, and maintain cloud-native platforms to support AI and data workloads, with a focus on AI and data platforms such as Databricks and Amazon Bedrock
- Build and manage scalable data pipelines to ingest, transform, and serve data for ML and analytics
- Develop infrastructure-as-code using tools like AWS CloudFormation and the AWS CDK to ensure repeatable and secure deployments
- Collaborate with AI engineers, data engineers, and platform teams to improve the performance, reliability, and cost-efficiency of AI models in production
- Drive best practices for observability, including monitoring, alerting, and logging for AI platforms
- Contribute to the design and evolution of our AI platform to support new ML frameworks, workflows, and data types
- Stay current with new tools and technologies to recommend improvements to architecture and operations
- Integrate AI models and large language models (LLMs) into production systems, enabling use cases built on architectures such as retrieval-augmented generation (RAG)
Requirements:
- 7+ years of professional experience in software engineering and infrastructure engineering
- Extensive experience building and maintaining AI/ML infrastructure in production, including model deployment and lifecycle management
- Strong knowledge of AWS and infrastructure-as-code frameworks, ideally with CDK
- Expert-level coding skills in TypeScript and Python for building robust APIs and backend services
- Production-level experience with Databricks MLflow, including model registration, versioning, asset bundles, and model serving workflows
- Expert-level understanding of containerization (Docker) and hands-on experience with CI/CD pipelines; experience with orchestration tools (e.g., ECS) is a plus
- Proven ability to design reliable, secure, and scalable infrastructure for both real-time and batch ML workloads
- Ability to articulate ideas clearly, present findings persuasively, and build rapport with clients and team members
- Strong collaboration skills and the ability to partner effectively with cross-functional teams
- Familiarity with emerging LLM frameworks such as DSPy for advanced prompt orchestration and programmatic LLM pipelines
- Understanding of LLM cost monitoring, latency optimization, and usage analytics in production environments
- Knowledge of vector databases / embeddings stores (e.g., OpenSearch) to support semantic search and RAG