Guidehouse is seeking a Data Engineer to join its Technology AI & Data practice, supporting public sector and health sector clients. This role involves designing, building, and maintaining scalable data pipelines, ensuring data quality, and collaborating with various teams to deliver data products for analytics.
Responsibilities:
- Design, build, test, and maintain scalable data pipelines (batch and/or streaming as applicable) with increasing independence
- Integrate data from multiple sources, resolve inconsistencies, and deliver curated datasets for analytics and operational use
- Own data quality for assigned domains by implementing validation checks, reconciliation, and monitoring/alerting patterns
- Build, maintain, and deploy data products for analytics and data science teams on cloud platforms (e.g., AWS, Azure, GCP)
- Optimize performance of pipelines and queries (tuning, partitioning patterns, efficient compute usage)
- Collaborate cross-functionally with analysts, data scientists, and stakeholders to translate requirements into technical designs and delivery plans
- Produce and maintain technical documentation for data flows, data models, and operational procedures
- Contribute to governance and compliance practices (access controls, lineage awareness, controlled data handling) within your scope
Requirements:
- Bachelor's degree from an accredited college/university
- Based on our contractual obligations, candidates must be located within the United States and must be US Citizens
- Must be able to obtain and maintain a Federal or DoD Public Trust clearance
- Advanced SQL and Python skills and experience with relational databases and database design
- Experience with data ingestion tools such as AWS Lambda, AWS Database Migration Service (DMS), and SFTP
- Experience building dashboards with data visualization tools (e.g., Tableau, Power BI)
- Experience integrating data from disparate systems and technologies (e.g., IBM mainframe; structured, semi-structured, and unstructured sources)
- Proficiency designing and deploying data solutions on one or more cloud platforms (e.g., AWS, Azure, GCP)
- Hands-on experience with cloud services and REST API integrations
- Proficiency with modern data tools (e.g., Spark/Databricks, Airflow, dbt, Kafka) is a plus
- Experience with distributed data processing tools such as PySpark and AWS Glue
- Databricks and/or Snowflake Data Engineer Associate or Professional certification
- Proficiency with workflow management systems (Nextflow, Snakemake, Airflow)
- Experience with regulated environments (GxP, 21 CFR Part 11) and data governance