GitHub, the world’s leading platform for agentic software development, is seeking a Senior Data Engineer to join its Revenue Data & Analytics team. In this role, you will build and maintain the governed data infrastructure that powers revenue understanding, focusing on data engineering, data modeling, and analytics engineering.
Responsibilities:
- Design, build, and maintain dbt models across medallion layers (bronze/silver/gold) in Microsoft Fabric Lakehouse and Warehouse, following Kimball dimensional modeling patterns — including SCD2 dimensions, incremental CDC pipelines, and metadata-driven approaches to minimize code duplication
- Author and enforce data quality checks and dbt tests across pipeline stages to catch anomalies before they reach downstream consumers; contribute to data cataloging and lineage to ensure governed datasets are discoverable and traceable
- Develop and maintain Airflow DAGs for orchestration — scheduling, dependency management, error handling, and alerting
- Containerize data workloads with Docker and deploy via GitHub Actions CI/CD pipelines, including automated testing, linting, and environment promotion (dev → staging → prod)
- Manage and optimize ADLS Gen2 and Delta Lake storage — partitioning, compaction, retention policies, and cost management
- Collaborate with analytics engineers, BI developers, and analysts to ensure gold-layer datasets serve Power BI, Trino, and downstream reporting needs
- Participate in architecture reviews and contribute to ADRs; support migration from legacy patterns toward a governed, metadata-driven platform with pragmatism about transition paths
- Own operational excellence across data pipelines — monitoring, alerting, incident response, and proactive detection of data drift, schema changes, and quality regressions
Requirements:
- 6+ years of experience in Software Engineering, Computer Science, or a related technical discipline, with proven experience maintaining and delivering production software in languages including, but not limited to, C, C++, C#, Java, JavaScript, Go, Ruby, Rust, or Python; OR equivalent experience
- A degree in Computer Science, Electrical Engineering, Electronics Engineering, Math, Physics, Computer Engineering, or a related field reduces the experience requirement: Associate's, 5+ years; Bachelor's, 4+ years; Master's, 2+ years; Doctorate, no additional experience required
- 5+ years of SQL experience with real fluency — window functions, CTEs, merge statements, query optimization; you think in sets, not loops
- Hands-on dbt experience (Core or Cloud) — models, tests, macros, Jinja, incremental materializations; dimensional modeling (Kimball star schemas, SCD2, conformed dimensions) a strong plus
- Orchestration experience (Airflow, Prefect, Dagster, or similar) for scheduling, dependencies, and error handling
- Cloud data platform experience — Azure preferred (Fabric, ADLS, Synapse); AWS or GCP experience transfers well; familiarity with Delta Lake, Apache Iceberg, or Spark is a bonus
- Docker, Git-based workflows, and CI/CD for data pipelines; Python or equivalent for engineering tasks
- Data quality tooling (Soda, dbt Elementary) and catalog/lineage tools (Purview, Atlan, DataHub, or similar)
- Familiarity with advanced patterns — medallion architecture, Data Vault 2.0, metadata-driven frameworks, or federated query engines (Trino/Presto)
- Experience with revenue, finance, or billing data — ARR, consumption models, hierarchy attribution, and account ownership complexity
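As an illustration of the set-based SQL fluency described above, the following sketch uses a CTE and the `ROW_NUMBER()` window function to keep only the latest record per account — the same pattern used to deduplicate CDC feeds. The table and column names (`account_events`, `arr`, `loaded_at`) are hypothetical; SQLite stands in for the warehouse so the example is self-contained:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account_events (account_id TEXT, arr REAL, loaded_at TEXT);
    INSERT INTO account_events VALUES
        ('acct-1', 100.0, '2024-01-01'),
        ('acct-1', 150.0, '2024-06-01'),
        ('acct-2',  80.0, '2024-03-01');
""")

# CTE + window function: one pass over the set, no row-by-row loop
latest = conn.execute("""
    WITH ranked AS (
        SELECT account_id, arr,
               ROW_NUMBER() OVER (
                   PARTITION BY account_id
                   ORDER BY loaded_at DESC
               ) AS rn
        FROM account_events
    )
    SELECT account_id, arr FROM ranked WHERE rn = 1
    ORDER BY account_id
""").fetchall()

print(latest)  # → [('acct-1', 150.0), ('acct-2', 80.0)]
```

Partitioning by the business key and ordering by load timestamp is the declarative, set-based answer to "latest version per key" — the shape of problem this role works in daily.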