Role Overview
- Productization: Convert experimental notebooks into robust, scalable, and auditable production pipelines;
- ML Architecture: Design and implement the infrastructure required for the full model lifecycle (training, deployment, monitoring, and retraining);
- Agent Architecture: Design and implement the infrastructure required for the full lifecycle of LLM-based agents (fine-tuning, deployment, monitoring, and retraining);
- Automation: Implement CI/CD/CT (continuous integration, continuous delivery, and continuous training) practices for machine learning models and generative AI solutions;
- Observability: Ensure real-time visibility into model performance (drift detection, latency, cost monitoring);
- Governance: Standardize versioning of data, models, and code to ensure reproducibility;
- Innovation in LLMOps: Support the team in operationalizing agents and RAG systems, focusing on evaluation tools and tracing;
- Build and maintain orchestration pipelines for data and models using tools such as Airflow (Composer) and Kubeflow;
- Develop machine learning models and AI agents, selecting the best architectures to solve business problems;
- Manage model deployment (REST APIs, batch, streaming) in Kubernetes or serverless environments;
- Configure experiment tracking tools;
- Work on cloud cost optimization related to AI workloads;
- Collaborate with Data Scientists to refactor code for performance and adherence to software engineering best practices;
- Implement monitoring for model and data quality in production.
Requirements
- Bachelor's degree in Data Science, Software Engineering, Computer Engineering, or related fields;
- Programming and engineering: Advanced proficiency in Python and software engineering best practices (SOLID, unit testing, Clean Code);
- Containerization: Full mastery of Docker and orchestration with Kubernetes (GKE);
- Cloud (GCP focus): Hands-on experience with Vertex AI (Pipelines, Endpoints), Cloud Build, Artifact Registry, GCS, and IAM;
- Pipelines and orchestration: Solid experience with Apache Airflow (Cloud Composer) or Kubeflow Pipelines;
- CI/CD: Building automation pipelines using GitHub Actions, GitLab CI, or Cloud Build;
- MLOps core: Experience with model tracking and versioning tools (MLflow, DVC, or Vertex AI Model Registry);
- Infrastructure as Code (IaC): Knowledge of Terraform.
⭐ Nice to Have:
- Experience with LLMOps: Running LangChain or LangGraph pipelines using Vertex AI Pipelines or Cloud Run infrastructure;
- RAG at Google scale: Implementing RAG (Retrieval-Augmented Generation) architectures integrating BigQuery Vector Search or Vertex AI Search;
- Feature Store: Experience implementing or using Feature Stores;
- Databases: Knowledge of BigQuery and vector databases (Vector DBs) for semantic search applications;
- Monitoring: Experience with tools such as Grafana or other dedicated monitoring stacks;
- Experience in mission-critical environments: Previous experience in sectors such as healthcare, finance, or insurance, handling governance and data privacy.
Tech Stack
- Airflow
- Apache
- BigQuery
- Cloud
- Docker
- Google Cloud Platform
- Grafana
- Kubernetes
- Python
- Terraform
Benefits
- Meal support: Meal Voucher / Food Allowance or on-site cafeteria (depending on location);
- Health support: Health insurance and life insurance;
- Professional development: Universidade Dasa, Development and Career Cycle, Technology Academies/PMAX, and the "Programa Crescer" within Dasa;
- Other: Transportation voucher and performance-based bonus (PPR).
💰 Unique to Dasa – Health Program based on five pillars:
- Spiritual: yoga;
- Physical: TotalPass, primary care clinic, discounts on tests and vaccinations;
- Intellectual: Universidade Dasa;
- Relational: UAU perks club and SESC benefits;
- Emotional: Telepsychology.
*Benefits may vary depending on the job location and brand.