Dropbox is a technology company focused on building innovative solutions for data management and collaboration. They are seeking a Data Engineer to build large, scalable analytics pipelines using modern data technologies, focusing on creating new data architectures and integrations to support business needs.
Responsibilities:
- Help define company data assets (data models) and the Spark and SparkSQL jobs that populate them
- Help define and design data integrations and data quality frameworks, and design and evaluate open source/vendor tools for data lineage
- Work closely with Dropbox business units and engineering teams to develop a strategy for a long-term Data Platform architecture that is efficient, reliable, and scalable
- Conceptualize and own the data architecture for multiple large-scale projects, while evaluating design and operational cost-benefit tradeoffs within systems
- Collaborate with engineers, product managers, and data scientists to understand data needs, representing key data insights in a meaningful way
- Design, build, and launch collections of sophisticated data models and visualizations that support multiple use cases across different products or domains
- Optimize pipelines, dashboards, frameworks, and systems to facilitate easier development of data artifacts
Requirements:
- 5+ years of Spark, Python, Java, C++, or Scala development experience
- 5+ years of SQL experience
- 5+ years of experience with schema design, dimensional data modeling, and medallion architectures
- Experience with the Databricks platform and data lake architectures for large-scale data processing and analytics
- Excellent product and strategic thinking and communication skills, used to influence product and cross-functional teams by identifying data opportunities that drive impact
- BS degree in Computer Science or related technical field involving coding (e.g., physics or mathematics), or equivalent technical experience
- Experience designing, building and maintaining data processing systems
- 7+ years of SQL experience
- 7+ years of experience with schema design, dimensional data modeling, and medallion architectures
- Experience with Airflow or other similar orchestration frameworks
- Experience building data quality monitoring using MonteCarlo or similar tools