Role Overview

Build end‑to‑end AI/ML pipelines (training → evaluation → deployment) using MLflow/Kubeflow/Databricks/Weights & Biases with experiment tracking and model registries
Develop models in Python using PyTorch, TensorFlow, JAX, scikit‑learn, and Hugging Face Transformers, and package them as reproducible services
Implement LLM/RAG systems with LangChain, LlamaIndex, Semantic Kernel and vector DBs (Pinecone, Weaviate, Milvus, FAISS, Chroma) for semantic retrieval and grounding
Fine‑tune and optimize models using PEFT/LoRA/QLoRA, DeepSpeed/Accelerate, distillation, and quantization; export/optimize via ONNX Runtime/TorchScript/TensorRT
Engineer scalable model serving with KServe, Seldon Core, BentoML, Ray Serve, and NVIDIA Triton, supporting A/B, canary, and shadow deployments
Build evaluation harnesses (offline/online) with Ragas, TruLens, Promptfoo, golden datasets, and regression gates integrated into CI/CD
Construct feature stores (e.g., Feast) and data contracts (Protobuf/Avro/Pydantic); enforce data quality with Great Expectations/Deequ
Orchestrate event‑driven pipelines with Airflow/Prefect/Dagster; handle streaming/messaging via Kafka/RabbitMQ/NATS with schema registries
Design Python microservices using FastAPI/gRPC; integrate with downstream systems via REST/GraphQL; write robust automation in Python/Bash/PowerShell and SQL for data ops
Use notebooks (Jupyter) and packaging (Poetry/pip/conda) with virtualenvs, environment locking, and artifacts suitable for promotion across stages
Apply testing & quality: pytest, unit/integration/e2e tests, property‑based testing (Hypothesis), linters/formatters (ruff/flake8, black), type checks (mypy/pyright), pre‑commit
Deliver IaC with Terraform/Pulumi; manage config via Helm/Kustomize; implement GitOps with Argo CD/Flux on managed/self‑hosted Kubernetes
Build secure CI/CD (GitHub Actions/GitLab CI/Jenkins/Azure DevOps) for app/data/ML artifacts, artifact promotion, provenance, and automated rollbacks
Embed DevSecOps: SAST/DAST/IAST (Snyk/Checkmarx/SonarQube), container & IaC scanning (Trivy), dependency hygiene (Dependabot/Renovate), SBOM (Syft/CycloneDX)
Enforce policy‑as‑code (OPA/Gatekeeper, Kyverno), image signing/verification (Sigstore/cosign), supply‑chain standards (SLSA, in‑toto)
Manage secrets/KMS with Vault and native managers; adopt short‑lived workload identities, mTLS, and least‑privilege RBAC/ABAC in clusters and pipelines
Implement AI safety & governance: prompt‑injection defenses, output filtering, PII redaction, guardrails (Guardrails.ai/NeMo Guardrails/Presidio), policy checks
Monitor model/data drift, bias, and performance with Evidently/WhyLabs/Arize/Fiddler; unify telemetry via OpenTelemetry, Prometheus, Grafana, ELK/Loki, Jaeger
Optimize compute/GPU: CUDA/cuDNN/NCCL, HPA/VPA/KEDA, efficient batching, caching, concurrency control; track cost and latency SLOs
Implement progressive delivery for services/models (blue/green, canary, shadow) using Argo Rollouts/Flagger with instant rollback and health checks
Operate API gateways and service mesh (Kong/NGINX/Envoy, Istio/Linkerd) for rate limiting, mTLS, authN/Z, and zero‑trust patterns
Ensure privacy/compliance (GDPR/CCPA/DPDP/ISO 27001): data minimization, masking/tokenization, DLP, lineage (OpenLineage/Marquez), model cards/data sheets
Collaborate with security, data, and platform teams to publish golden paths, templates, and reference implementations for repeatable AI delivery
Contribute to code/design reviews and SRE practices (SLIs/SLOs/error budgets), on‑call readiness, incident response, and blameless post‑mortems

Requirements

4–7 years of hands‑on experience with cloud platforms, automation, and AI/ML engineering workflows
Strong expertise in Terraform, Kubernetes, Helm, Docker, and modern CI/CD pipelines using GitHub Actions, GitLab CI, Jenkins, or Azure DevOps
Proficient in Python with experience in FastAPI, ML libraries (PyTorch/TensorFlow), and scripting using Bash or PowerShell for automation
Solid experience in DevSecOps practices including SAST/DAST, container/IaC scanning, secrets scanning, SBOM, and policy-as-code frameworks
Hands‑on exposure to MLOps and AI integration using tools like MLflow, Kubeflow, Weights & Biases, KServe, Seldon Core, or BentoML
Experience building or integrating RAG/LLM pipelines using LangChain, LlamaIndex, or vector databases (Pinecone/FAISS/Weaviate)
Strong cloud fundamentals across AWS/Azure/GCP with ability to architect secure, automated infrastructure via IaC and GitOps (Argo CD/Flux)
Familiarity with monitoring and observability stacks (Prometheus, Grafana, OpenTelemetry, ELK/Loki) for application and model performance
Strong troubleshooting, problem‑solving, and system debugging skills with a collaborative, engineering‑first mindset
Excellent communication skills with ability to work cross‑functionally with Data, AI/ML, DevOps, Security, and Platform Engineering teams

Tech Stack

Airflow
AWS
Azure
Cloud
Docker
Flux
Google Cloud Platform
Grafana
GraphQL
gRPC
Jenkins
Kafka
Kubernetes
Microservices
NGINX
Prometheus
Python
PyTorch
RabbitMQ
Ray
SQL
TensorFlow
Terraform
Vault

Benefits

Competitive salary
Flexible working hours
Professional development opportunities

Senior AI Integration Engineer