Brainlabs is a media agency focused on driving profit through data-driven insights. The Data Engineering Manager will be responsible for designing, building, and managing scalable data solutions, focusing on data pipeline development and AI/GenAI process development.
Responsibilities:
- Design, develop, and maintain ETL/ELT pipelines using GCP tools such as Cloud Functions, Cloud Run, Dataflow, Dataproc, or Cloud Data Fusion
- Ensure data pipelines are scalable, efficient, and optimised for performance
- Build and manage data pipelines that support LLM and GenAI applications, including Retrieval-Augmented Generation (RAG) architectures, vector data stores, and prompt context assembly workflows
- Curate and prepare datasets for AI/ML model training, including feature engineering, oversight of labelling pipelines, and data versioning with tools such as Vertex AI Feature Store or DVC
- Integrate data from various sources into GCP services such as BigQuery, Cloud Storage, and Cloud SQL
- Design and implement data warehouse/mart solutions using BigQuery for analytics and reporting
- Build transformation logic using SQL, Python, or Spark for preparing clean and structured data
- Optimise query performance and storage costs in BigQuery and other GCP storage systems
- Develop processes to ensure data quality, integrity, and consistency across the pipeline
- Implement monitoring and logging using Cloud Monitoring and Cloud Logging (formerly Stackdriver), with reporting dashboards in tools such as Looker
- Understand and interpret business and technical requirements to support data development tasks
- Assist in building, testing, and maintaining data pipelines while ensuring alignment with project objectives and stakeholder needs
- Work closely with cross-functional teams, including data analysts, data scientists, and business stakeholders, to understand requirements
- Provide technical guidance on GCP best practices and tools
- Maintain clear documentation of processes, workflows, and data architecture
- Ensure regular maintenance and version control of pipelines and scripts
Requirements:
- 2 to 5 years of experience in designing, building, and managing scalable data solutions on Google Cloud Platform (GCP)
- Strong background in data engineering and cloud-based architectures
- Proficiency in implementing data pipelines to transform raw data into actionable insights
- Hands-on experience with GCP services such as Cloud Functions, Cloud Run, Cloud Scheduler, BigQuery, Dataflow, Pub/Sub, and Cloud Storage
- Strong programming skills in Python and SQL
- Knowledge of data modelling, schema design, and query optimisation techniques
- Experience in building batch and streaming data pipelines
- Excellent communication and collaboration skills
- Ability to work in a fast-paced and dynamic environment
- Must be legally entitled to work in the United States
- Familiarity with orchestration tools like Apache Airflow, Cloud Composer, or similar
- Experience building ETL on other cloud platforms (AWS or Azure) is a plus
- Experience with GCP's AI/ML platform (Vertex AI, BigQuery ML, or AutoML) for building, evaluating, or serving models is a strong advantage
- Hands-on experience building or supporting LLM/GenAI pipelines using frameworks such as LangChain, LlamaIndex, or Vertex AI Agent Builder
- Familiarity with AI/ML data preparation practices, including feature engineering, dataset curation, and data versioning for model training workflows
- Knowledge of CI/CD and infrastructure-as-code practices for pipeline deployments, using tools such as Git, Jenkins, or Terraform
- Understanding of data security, governance, and compliance practices on GCP
- Google Cloud Professional Data Engineer or Associate Cloud Engineer certification (preferred but not mandatory)