Sedgwick is a company dedicated to supporting individuals facing unexpected challenges, and it is seeking a Senior Engineer for LLMOps & MLOps. This role owns the production lifecycle of AI initiatives, building automated infrastructure that connects legacy data systems with modern cloud AI services.
Responsibilities:
- Multi-Cloud Pipeline Execution: Build and maintain automated CI/CD and CT (Continuous Training) pipelines across AWS (SageMaker/Bedrock) and Azure (AI Studio)
- LLMOps Framework Implementation: Design and execute the infrastructure for Retrieval-Augmented Generation (RAG), including vector database management (OpenSearch, Pinecone, or Azure AI Search) and semantic index optimization
- Legacy Data Connectivity: Build the engineering "pipes" to securely ingest and move data from legacy systems (Mainframes, SQL Server, on-prem DBs) into cloud-native MLOps workflows
- Automated Model Evaluation: Implement systematic frameworks for LLM evaluation (LLM-as-a-judge, ROUGE, METEOR) and traditional ML validation to ensure performance before deployment
- Observability & Monitoring: Deploy real-time monitoring for model drift, hallucination detection, latency, and token consumption to manage both quality and cost
- Infrastructure as Code (IaC): Manage all AI resources using Terraform or CloudFormation, ensuring the cloud posture is reproducible, secure, and follows a "Privacy by Design" mandate
- Advanced Analytics Integration: Partner with teams using platforms like Palantir, Databricks, or Snowflake to ensure a high-fidelity data flow between analytical ontologies and production models
- IT & Security Diplomacy: Work directly with central IT and Security to navigate IAM roles, VPC peering, and firewall configurations, clearing the path for rapid transformation
- Scalable Inference Engineering: Optimize model serving endpoints for high throughput and low latency, utilizing containerization (Docker/Kubernetes) and serverless architectures where appropriate
- Prompt & Model Versioning: Establish rigorous version control for prompts (PromptOps), model weights, and data snapshots to ensure full auditability and rollback capability
- Data Science Engineering: Support the data science lifecycle by automating feature stores, feature engineering pipelines, and the transition of experimental notebooks into hardened production microservices
- Security & Compliance Hardening: Implement automated scanning and guardrails (e.g., Bedrock Guardrails or Azure Content Safety) to prevent prompt injection and data leakage
Requirements:
- Bachelor's degree in Computer Science or a related field required
- Master's degree in a quantitative discipline highly desirable
- 6+ years of engineering experience, with a minimum of 3 years strictly focused on MLOps or LLMOps in a production environment
- Deep, hands-on proficiency in both AWS and Azure ecosystems
- Ability to configure Bedrock and Azure OpenAI services, including private networking and endpoint security, on day one
- Expert-level proficiency in Python, SQL, and PySpark
- Extensive experience with containerization (Docker, Kubernetes) and orchestration tools (Airflow, Kubeflow, or Step Functions)
- Professional experience with evaluation and observability frameworks like LangSmith, Arize Phoenix, or WhyLabs
- Strong understanding of statistical validation and model evaluation metrics
- Ability to partner with Data Scientists to optimize model performance
- Ability to move at the speed of a startup while maintaining collaborative relationships within a large-scale enterprise IT landscape