The New York Times is a world-renowned journalism organization committed to seeking the truth and helping people understand the world. The Times is seeking a Data Engineer to design and implement complex data pipelines, manage data storage across cloud platforms, and ensure high-quality data for analytics.
Responsibilities:
- Design, model, and implement complex ELT/ETL pipelines for the cleansed and curated data layers in the medallion architecture, taking full ownership of the data product's structure, partitioning, documentation, and performance characteristics
- Develop advanced data transformations using dbt (data build tool) for relational data modeling and PySpark for large-scale data processing within the Lakehouse, ensuring outputs meet strict Service Level Agreements and quality standards
- Collaborate across teams to define requirements and translate them into robust and scalable data models suitable for analytic consumption
- Manage the physical data storage across both GCP and AWS, selecting optimal file formats and designing efficient partitioning and clustering strategies
- Administer and tune Spark compute resources (e.g., Dataproc, EMR, or managed services) to optimize job execution time and cost
- Own core components of our centralized analytics environment, with a focus on Hex, its integrations, and the methods of data exposure and access control; support data activation strategies to ensure seamless data consumption by analytics tools
- Optimize user queries and access patterns to maintain platform performance and cost efficiency
- Implement centralized data quality checks and observability mechanisms within the data pipeline to proactively identify and resolve data issues
- Contribute to the implementation of metadata management, data lineage, and role-based access control (RBAC) initiatives across the Lakehouse environment
- Demonstrate support and understanding of our value of journalistic independence and a strong commitment to our mission to seek the truth and help people understand the world
Requirements:
- 2+ years of hands-on experience in a Data Engineering, Data Warehousing, Analytics Engineering, or equivalent role
- Proficiency in SQL and experience with complex, production-level data modeling (Kimball dimensional modeling, One Big Table (OBT), or Data Vault)
- Demonstrated experience designing, developing, and deploying end-to-end data products through the full Software Development Lifecycle
- Experience with a cloud data warehouse such as BigQuery
- Proficiency in Python for scripting and data manipulation, including knowledge of PySpark or other Spark APIs
- Familiarity with cloud services and data storage components in at least one major cloud provider (GCP or AWS)
- Experience with workflow orchestration tools (e.g., Airflow, Cloud Composer, or Prefect) and version control systems (Git)
- Experience operating in a dual-cloud environment (GCP/AWS)
- Experience with Infrastructure-as-Code (IaC) tools like Terraform
- Experience with open Lakehouse table formats such as Apache Iceberg or Delta Lake
- Familiarity with experimentation or A/B testing platforms and the data required to support them
- Experience upholding data product quality standards by integrating advanced testing, quality checks, and monitoring into the CI/CD pipeline