Dropbox is a global community shaping the future of work. The Data Science Engineer Intern role offers hands-on experience in data engineering: interns design, build, and maintain data pipelines while collaborating with experienced professionals.
Responsibilities:
- Design, build, and maintain data pipelines that ingest and process structured and unstructured data sources such as surveys, support tickets, call transcripts, and product usage data
- Develop and experiment with scalable data processing workflows that support downstream analytics, machine learning, and large language model (LLM) use cases
- Transform, validate, and model large, multi-dimensional customer behavior and usage datasets to ensure they are reliable, well-structured, and analytics-ready
- Partner with data scientists, analysts, and business stakeholders to enable clear understanding and effective use of data through well-defined datasets, documentation, and data quality standards
- Document data pipelines, schemas, and engineering best practices, and share learnings within the team to help promote a strong, data-driven culture at Dropbox
- Collaborate proactively with stakeholders across Customer Experience and Success to understand business needs, translate requirements into technical data solutions, and support accurate and timely data delivery
Requirements:
- Currently enrolled as an undergraduate (sophomore or above) or graduate student, with an expected graduation date of 2027 or later, majoring in Computer Science, Engineering, Information Systems, Data Engineering, or a related technical field
- Strong written and verbal communication skills, with the ability to explain technical concepts clearly and collaborate effectively with both technical and non-technical partners
- Familiarity with core data engineering concepts, including data ingestion, transformation, and storage workflows
- Good programming skills in Python, with experience using libraries common in data processing and pipeline development (e.g., pandas or PySpark)
- Experience working with SQL and querying large datasets in relational or cloud-based data warehouses
- Basic familiarity with data modeling concepts (e.g., dimensional models, schemas) and data quality or validation practices
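To illustrate the kind of day-to-day work the role describes, here is a minimal, hypothetical sketch of a transform-and-validate step on support-ticket data using pandas (one of the libraries named above). The column names and data are invented for illustration; a real pipeline would ingest from a warehouse or API rather than define records inline.

```python
import pandas as pd

# Hypothetical support-ticket records (inline for illustration only).
tickets = pd.DataFrame({
    "ticket_id": [1, 2, 2, 3],
    "opened_at": ["2024-01-05", "2024-01-06", "2024-01-06", None],
    "priority": ["high", "low", "low", "medium"],
})

# Transform: parse timestamps and drop duplicate ticket rows.
tickets["opened_at"] = pd.to_datetime(tickets["opened_at"])
tickets = tickets.drop_duplicates(subset="ticket_id")

# Validate: separate rows that fail a basic completeness check
# so downstream analytics only see well-formed records.
invalid = tickets[tickets["opened_at"].isna()]
clean = tickets.dropna(subset=["opened_at"])

print(len(clean), len(invalid))  # 2 valid rows, 1 flagged row
```

In practice a check like this would be one stage of a larger pipeline, with invalid rows routed to a quarantine table and surfaced in data-quality reporting.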