Codvo.ai is a global, empathy-led technology services company focused on software and people transformations. The Fullstack Data Engineer will design, build, and maintain data pipelines while operationalizing machine learning models and ensuring data quality and reliability.
Responsibilities:
- Design, build, and maintain Databricks data pipelines (ETL/ELT) for ingestion, transformation, and orchestration using Spark/Delta Lake/Databricks Workflows
- Operationalize machine learning models by building inference pipelines that invoke models authored by data scientists (batch or real-time), ensuring consistency between training and inference environments
- Ensure data reliability, quality, and observability through robust validation, monitoring, alerting, and automated recovery mechanisms
- Collaborate closely with data scientists to productionize models, manage model deployment lifecycles, and optimize inference performance and cost
- Implement best-practice DevOps/MLOps processes such as CI/CD for pipelines, model versioning, environment promotion, and infrastructure-as-code
- Optimize performance and cost across compute clusters, jobs, and storage layers
- Implement and manage the enterprise data catalog, including schema design, table ownership, lineage, governance, and documentation using Unity Catalog
- Provision and manage Databricks infrastructure (workspaces, clusters, jobs)
- Build BI dashboards and visualizations for business stakeholders
- Apply coding agents and associated best practices (e.g., spec-driven development) in day-to-day development
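To illustrate the data reliability and validation work described above, here is a minimal pure-Python sketch of a row-level batch quality check. The column names (`order_id`, `amount`) and rules are hypothetical; in a real pipeline these checks would typically be expressed as Spark/Delta expectations rather than plain Python.

```python
from dataclasses import dataclass, field

@dataclass
class QualityReport:
    """Summary of a validation pass over one ingested batch."""
    total: int = 0
    failures: dict = field(default_factory=dict)

    @property
    def passed(self) -> bool:
        return not self.failures

def validate_batch(rows: list[dict]) -> QualityReport:
    """Apply simple not-null and range checks, counting violations per rule."""
    report = QualityReport(total=len(rows))
    for row in rows:
        # Rule 1: order_id must be present
        if row.get("order_id") is None:
            report.failures["order_id_null"] = report.failures.get("order_id_null", 0) + 1
        # Rule 2: amount, when present, must be non-negative
        amount = row.get("amount")
        if amount is not None and amount < 0:
            report.failures["amount_negative"] = report.failures.get("amount_negative", 0) + 1
    return report

# Example batch: one clean row, one row violating both rules
batch = [{"order_id": 1, "amount": 9.5}, {"order_id": None, "amount": -2.0}]
report = validate_batch(batch)
```

A report like this would feed the monitoring and alerting layer: a batch whose `passed` flag is false can trigger an alert or an automated recovery/retry path instead of propagating bad data downstream.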
Requirements:
- Databricks platform experience
- Python development for data processing and ETL pipelines
- Unity Catalog knowledge
- AWS data services (S3, IAM, VPC; Glue/Lambda a plus)
- Data lake/lakehouse architecture patterns
- Dashboard building experience
- RESTful API design and development (Flask, FastAPI, or similar)
- Authentication/authorization patterns (OAuth, API keys, IAM roles)
- Query optimization and performance tuning
- PySpark optimization experience
- ML/AI pipeline experience
- Databricks AI/BI
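As a small illustration of the authentication/authorization patterns listed in the requirements, here is a standard-library sketch of API-key checking. The header-style key store and client IDs are hypothetical; a production service would load keys from a secrets manager (e.g., AWS Secrets Manager) and enforce this in a FastAPI or Flask dependency.

```python
import hmac

# Hypothetical key store for illustration only; real keys belong in a
# secrets manager, never in source code.
API_KEYS = {"svc-reporting": "s3cr3t-key"}

def authorize(client_id: str, presented_key: str) -> bool:
    """Return True if the presented key matches the stored key for client_id."""
    expected = API_KEYS.get(client_id)
    if expected is None:
        return False
    # hmac.compare_digest performs a constant-time comparison,
    # avoiding timing side channels when rejecting bad keys.
    return hmac.compare_digest(expected, presented_key)
```

The same shape extends to OAuth or IAM-role checks: resolve the caller's identity, look up what it is entitled to, and fail closed when the lookup misses.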