ThoughtFocus is seeking a Senior ML Systems Engineer contractor to bridge the gap between model development and production. The role focuses on building high-throughput pipelines for processing historical documents and transforming ML prototypes into scalable production systems on AWS infrastructure.
Responsibilities:
- Build and maintain scalable data pipelines that execute agentic workflows for entity extraction
- Deploy and manage infrastructure for both self-hosted open-source LLMs (on EC2) and third-party API integrations, optimizing for throughput and reliability
- Continuously optimize pipelines for unit cost and processing speed without compromising data integrity
- Act as a primary engineering contact for the Data Science team, operationalizing their trained models and prompt logic
- Implement advanced monitoring, alerting, and automated error-handling/retry logic for complex ML workflows
- Manage the AWS ecosystem (EC2, Lambda, S3, IAM) supporting the ML lifecycle
Requirements:
- Advanced, systems-level proficiency in Python, including building robust multi-threaded or asynchronous applications
- Proficiency with Terraform or similar tools (e.g., Pulumi, CloudFormation) for provisioning and managing modular, version-controlled AWS infrastructure
- Deep expertise in EC2 Auto Scaling Groups, Lambda orchestration, and S3 data lakes
- Experience building autonomous AI workflows using Amazon Bedrock AgentCore or similar frameworks
- Proven track record in monitoring (CloudWatch), logging, and troubleshooting distributed systems in production
- Strong grasp of AWS IAM for secure resource management
- Able to work during US business hours for collaboration and production support
- Experience with CI/CD for ML, containerization (Docker), and automated testing of data pipelines
- Experience with OCR or document digitization at scale
- Experience managing rate limits and costs across multiple LLM providers