Advocate is a mission-driven technology company revolutionizing the way Americans access critical federal benefits. The company is seeking a seasoned Senior AI/ML Ops Engineer to enhance its AI Platform. The role involves developing, deploying, and managing AI/ML operations and collaborating with AI researchers and developers to create scalable tools for AI model training and deployment.
Responsibilities:
- Develop, deploy, and manage AI/ML operations
- Create a robust infrastructure that supports innovative AI-driven applications and services
- Address complex workflows and enable rapid prototyping and deployment of AI models
- Collaborate closely with AI researchers, developers, and data scientists
- Develop and manage a suite of AI tools and environments designed to support the research and development of cutting-edge AI models
- Create scalable, efficient tools for model training, evaluation, and experimentation
- Maintain and optimize reliable AI pipelines and services
- Ensure seamless integration, deployment, and operation of AI models within the platform
- Automate model training, deployment, monitoring, and management processes
- Lead the development and operation of a web-based application for AI prototyping that provides an intuitive interface for experimenting with AI models, accessing datasets, and deploying prototypes for evaluation
Requirements:
- Advanced degree (M.S., Ph.D., or equivalent) in Computer Science, Engineering, or a related field
- Extensive experience (5+ years) in AI/ML operations, including the development and management of AI/ML tooling, pipelines, and services
- Deep understanding of DevOps practices, including continuous integration and continuous delivery (CI/CD)
- Proficiency in using DevOps methodologies to streamline AI model development and deployment
- Strong experience with containerization technologies (e.g., Docker) and container orchestration platforms (e.g., Kubernetes)
- Familiarity with cloud computing platforms (e.g., AWS, GCP, Azure) and their AI/ML services
- Expertise in programming languages such as Python, Java, or C++
- Knowledge of machine learning frameworks (e.g., TensorFlow, PyTorch, scikit-learn) and MLOps tools (e.g., MLflow, Kubeflow)
- Experience with version control systems (e.g., Git) and collaborative development tools (e.g., Jira, Confluence)
- Skilled in analyzing complex challenges, developing innovative solutions, and working collaboratively with cross-functional teams
- Strong problem-solving and debugging skills, with the ability to troubleshoot and resolve issues in AI/ML pipelines and infrastructure
- Excellent communication and interpersonal skills, with the ability to effectively convey technical concepts to both technical and non-technical audiences
- Proactive and self-motivated, with a strong desire to learn and stay up to date with the latest trends and advancements in AI/ML, DevOps, and web development
- Experience with agile development methodologies and working in fast-paced, dynamic environments