GEICO is seeking a Staff Machine Learning Engineer to help shape how Generative AI enhances customer and associate experiences across the enterprise. This is a hands-on technical role leading the strategy, architecture, and delivery of ML systems for the Claims organization.
Responsibilities:
- Design and evolve the ML platform architecture: data/feature pipelines, experiment tracking, model registries, serving layers, offline/online evaluation, and observability
- Define standards for reliability, performance, cost efficiency, security, governance, and model risk management across ML services
- Lead design and implementation of models across classical ML and deep learning (e.g., gradient boosted trees, sequence models, Transformers for tabular/time-series/NLP where relevant)
- Translate business goals into measurable ML objectives and experiment plans; ensure robust offline metrics and real-world impact
- Build scalable training and inference pipelines; establish CI/CD for ML, automated evaluations, canary releases, and rollback strategies
- Implement monitoring for data quality, drift, fairness, latency, reliability, and cost; lead incident response and postmortems
- Partner with Claims, Product, Data Science, Platform/SRE, Security, and Legal/Compliance to gather requirements, define scope, and prioritize backlogs
- Maintain pragmatic technical roadmaps balancing business outcomes, release timelines, and engineering excellence
- Own build-vs-buy decisions and tooling/service selection (speed to market, extensibility, TCO); guide platform evolution with clear architectural principles
- Lead experienced engineers through complex platform implementations; drive system-wide architectural improvements and reliability practices
- Mentor engineers and junior tech leads; codify best practices; contribute to internal documentation and promote enterprise-wide ML standards
- Where appropriate, collaborate on retrieval-augmented workflows, prompt/context management, and LLM evaluation and safety guardrails to complement ML systems
Requirements:
- Bachelor's degree or above in Computer Science, Engineering, Statistics, or a related field
- 5+ years of professional software development experience using at least two general-purpose languages (e.g., Java, C++, Python, C#)
- 5+ years architecting, designing, and building multi-component ML platforms leveraging open-source/cloud-agnostic components:
  - Search/vector: ElasticSearch, Qdrant (as applicable to ML features and retrieval)
  - Data warehouse/lakehouse: Snowflake; familiarity with Parquet/Delta/Iceberg
  - Streaming: Kafka; plus Flink/Spark Streaming experience
  - Datastores: PostgreSQL; NoSQL (MongoDB, Cassandra)
  - Distributed compute: Spark, Ray
  - Workflow orchestration: Airflow, Temporal
- 5+ years managing end-to-end SDLC for ML systems: version control, CI/CD, Kubernetes, testing (unit/integration/data/ML eval), monitoring/alerting, production support
- 5+ years working with cloud providers (Azure and/or AWS) in production ML contexts
- Experience leveraging or fine-tuning LLMs (e.g., GPT, Llama, Mistral, Claude) to augment ML workflows, retrieval, or claims-facing tooling
- Hands-on experience with MLOps tooling: MLflow/Kubeflow, model registries, feature stores (e.g., Feast), experiment tracking, A/B testing and online evaluation frameworks
- Experience with observability tooling (Prometheus/Grafana, OpenTelemetry), SLO-driven operations, and incident management
- Knowledge of model safety, fairness, explainability (e.g., SHAP/LIME), and regulatory compliance; familiarity with model risk management practices
- Insurance/financial services domain experience: claims automation, fraud detection, risk modeling, subrogation, severity/triage, and regulatory stewardship
- Experience with high-throughput, low-latency inference and real-time feature pipelines