TetraScience is a leading Scientific Data and AI company that is revolutionizing the Scientific AI landscape by developing AI-native scientific data sets and lab data management solutions. They are seeking a Senior Software Platform Engineer to design, build, and scale AI and data infrastructure, focusing on cloud-based MLOps pipelines and collaborating with various engineering teams.
Responsibilities:
- Design, implement, and maintain cloud-native platforms to support AI and data workloads, with a focus on AI and data platforms such as Databricks and Amazon Bedrock
- Build and manage scalable data pipelines to ingest, transform, and serve data for ML and analytics
- Develop infrastructure-as-code using tools like AWS CloudFormation and the AWS CDK to ensure repeatable and secure deployments
- Collaborate with AI engineers, data engineers, and platform teams to improve the performance, reliability, and cost-efficiency of AI models in production
- Drive best practices for observability, including monitoring, alerting, and logging for AI platforms
- Contribute to the design and evolution of our AI platform to support new ML frameworks, workflows, and data types
- Stay current with new tools and technologies to recommend improvements to architecture and operations
- Integrate AI models and large language models (LLMs) into production systems, enabling use cases built on architectures such as retrieval-augmented generation (RAG)
Requirements:
- 7+ years of professional experience in software engineering and infrastructure engineering
- Extensive experience building and maintaining AI/ML infrastructure in production, including model deployment and lifecycle management
- Strong knowledge of AWS and infrastructure-as-code frameworks, ideally with CDK
- Expert-level coding skills in TypeScript and Python for building robust APIs and backend services
- Production-level experience with Databricks MLflow, including model registration, versioning, asset bundles, and model serving workflows
- Expert-level understanding of containerization (Docker) and hands-on experience with CI/CD pipelines; experience with orchestration tools (e.g., ECS) is a plus
- Proven ability to design reliable, secure, and scalable infrastructure for both real-time and batch ML workloads
- Ability to articulate ideas clearly, present findings persuasively, and build rapport with clients and team members
- Strong collaboration skills and the ability to partner effectively with cross-functional teams
- Familiarity with emerging LLM frameworks such as DSPy for advanced prompt orchestration and programmatic LLM pipelines
- Understanding of LLM cost monitoring, latency optimization, and usage analytics in production environments
- Knowledge of vector databases / embeddings stores (e.g., OpenSearch) to support semantic search and RAG