Airflow, Cloud, ETL, PySpark, Python, SQL, Machine Learning, Large Language Models, RAG, Data Engineering, Data Warehousing, Databricks, dbt, CI/CD, Communication
About this role
Role Overview
Build, scale, and maintain robust data solutions.
Implement and optimize high-performance data pipelines – extraction, loading, transformation, and orchestration – designed for scalability, reliability, maintainability, and speed.
Champion modern software engineering practices such as CI/CD, infrastructure-as-code, containerization, and cloud-native deployments.
Collaborate closely with business stakeholders to transform use cases into production-ready services and solutions, owning the system from concept to production.
Implement rigorous testing and monitoring practices to maintain superior data quality and integrity.
Requirements
A bachelor's degree or higher in a STEM field, required
Concentration in Computer Science, Math, Physics, or another engineering-related field, preferred
5+ years of experience in data engineering or a related discipline, with a proven track record of success.
Expertise in Python and SQL, with a strong foundation in data manipulation and analysis.
Proficient with Databricks/PySpark and dbt for data warehousing and data transformation tasks.
Experience with workflow orchestration tools, e.g., Airflow or Dagster.
Experience working with large language models (LLMs) – especially prompt engineering, retrieval-augmented generation (RAG), and/or vector databases – is a plus.
Knowledge of fundamental principles of machine learning, feature engineering, and knowledge graphs is a plus.
Demonstrated experience in designing and implementing complex data systems from the ground up.
Proficient in handling large-scale data projects, including data cleaning, ETL, and information retrieval.
Excellent verbal and written communication skills, required.