Company Confidential is seeking a proactive AI Systems Engineer to design, implement, and maintain end-to-end AI systems that scale. The role bridges data science, software engineering, and operations to deliver robust AI-enabled solutions that meet business objectives.
Responsibilities:
- Design, deploy, and operate production-grade AI systems and pipelines (data ingestion, preprocessing, model training, validation, deployment, monitoring, and retraining)
- Collaborate with data scientists to translate research models into scalable, maintainable, and observable services
- Implement MLOps practices: versioning for data, models, and code; CI/CD for ML pipelines; automated testing and canaries; model governance and drift monitoring
- Build and maintain scalable data architectures (ETL/ELT, streaming, data lakes/warehouses) with emphasis on data quality, lineage, and observability
- Develop APIs and services for model inference, including high-throughput, low-latency endpoints; ensure security, authentication, and access controls
- Design and implement monitoring, alerting, and incident response for AI systems (model performance, data quality, system health, latency, cost)
- Optimize infrastructure for cost, performance, and reliability (cloud platforms, containers, orchestration, GPUs/accelerators, edge devices where applicable)
- Ensure compliance with privacy, security, and regulatory requirements; implement audit trails and reproducibility
- Collaborate with product managers and stakeholders to define requirements, success metrics, and acceptance criteria
- Mentor junior engineers; contribute to documentation, engineering standards, and best practices
Requirements:
- Bachelor's or Master's degree in Computer Science, Software Engineering, Electrical Engineering, Analytics, or related field (or equivalent practical experience)
- 3+ years of experience in systems engineering, ML/AI deployment, or MLOps
- Strong software engineering skills: proficiency in one or more general-purpose languages (e.g., Python, Java, Go, C++) and familiarity with software engineering best practices (version control, testing, code reviews)
- Experience architecting and deploying end-to-end AI pipelines (data ingestion, feature engineering, model training, deployment, and monitoring)
- Hands-on experience with ML frameworks (TensorFlow, PyTorch, scikit-learn) and with model serving and MLOps tooling (TensorFlow Serving, TorchServe, MLflow, Kedro, Seldon, or similar)
- Proficiency with cloud platforms (AWS, Azure, GCP), containerization (Docker), orchestration (Kubernetes), and CI/CD tooling
- Strong understanding of data engineering concepts (ETL/ELT, data governance, data quality, lineage)
- Experience with model monitoring and drift detection, A/B testing, and experimentation pipelines
- Familiarity with security and compliance practices (IAM, secrets management, encryption, audit logging)
- Excellent problem-solving, communication, and collaboration skills; able to work cross-functionally
Preferred Qualifications:
- Master's or PhD in a relevant field, ideally with specialization in ML systems, MLOps, or data engineering
- Experience with real-time inference, streaming data (Kafka, Kinesis), and feature stores
- Knowledge of DevOps fundamentals, SRE practices, and reliability engineering for AI systems
- Experience with edge AI deployments or on-device inference
- Publications or contributions to open-source ML systems projects