Monitoring, maintaining, and improving model performance.
Collaborating with Data Engineers and client stakeholders.
Establishing governance, documentation, and best practices.
Requirements
Strong expertise in Python and AI frameworks such as PyTorch, TensorFlow, Keras, or SciPy.
Experience with Python web frameworks such as FastAPI, Flask, or Django.
Knowledge of PEP 8 coding standards for Python.
Extensive experience solving AI/ML challenges and working with large language models (LLMs).
Familiarity with OpenAI APIs, including embeddings, completions, and semantic search.
Solid experience integrating external APIs from providers such as OpenAI, Anthropic, or similar AI services.
Hands-on experience with containerization and orchestration tools – especially Docker for packaging ML models, and Kubernetes (or similar) for deploying and scaling them in distributed environments.
Proficiency in DevOps and automation practices: designing CI/CD pipelines (using tools like Jenkins, GitLab CI/CD, or GitHub Actions) to automate model testing and deployment, and using Infrastructure-as-Code (CloudFormation, Terraform) to manage cloud resources.
Working knowledge of cloud computing services (AWS, Azure, GCP) for ML workloads.
Familiarity with databases and experience using SQLAlchemy, Alembic, and database management for AI applications.
Strong skills in managing datasets with tools such as pandas, NumPy, and SciPy for data pre- and post-processing.
Experience with monitoring and logging frameworks such as Prometheus/Grafana or cloud-native monitoring services.
Strong analytical and problem-solving skills to diagnose issues from logs/metrics and tune system performance.
Excellent communication skills and a collaborative mindset.
Ability to work in an agile environment, manage priorities, and coordinate with remote or cross-functional team members.