Spokeo is a people search engine that helps over 18 million monthly visitors reconnect with friends, reunite with families, and protect against fraud. As a Senior Data Engineer, you will develop, optimize, and improve data systems and collaborate with stakeholders to build data products and improve data pipelines.
Responsibilities:
- Build infrastructure and data automation pipelines for the ingestion, processing, and loading of data from various sources. Automate and integrate new components into the data pipeline
- Collaborate with stakeholders and data science teams to develop data products, including entity resolution and best selection, to efficiently execute product vision and strategy in alignment with organizational goals and priorities
- Collaborate with Data Scientists and ML Engineers to design, build, and maintain scalable data pipelines and infrastructure supporting end-to-end machine learning workflows (development, training, deployment, and monitoring)
- Create unit and stress tests for pipeline components to monitor technical performance and ensure identified issues are resolved
- Develop data analysis tools to provide data insights and capture key metrics
- Research solutions and maintain technical documentation
- Follow best practices for data governance, quality, cleansing, and other ETL-related activities
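The entity-resolution and best-selection work named above can be illustrated with a minimal sketch. The record fields, blocking key, and completeness-based scoring below are hypothetical, chosen only to show the shape of the problem, not Spokeo's actual matching logic:

```python
from itertools import groupby

def match_key(record):
    """Normalize the fields used as a blocking/match key (hypothetical choice)."""
    return (record["last_name"].strip().lower(), record["zip"].strip())

def best_record(records):
    """Best selection: among matched records, keep the most complete one,
    scored here simply by the number of non-empty fields."""
    return max(records, key=lambda r: sum(1 for v in r.values() if v))

def resolve(records):
    """Group records sharing a normalized key, then pick one survivor per group."""
    records = sorted(records, key=match_key)
    return [best_record(list(group)) for _, group in groupby(records, key=match_key)]

people = [
    {"last_name": "Smith", "zip": "90210", "phone": ""},
    {"last_name": "smith ", "zip": "90210", "phone": "555-0100"},
    {"last_name": "Jones", "zip": "10001", "phone": "555-0199"},
]
resolved = resolve(people)  # two entities survive; the Smith record with a phone wins
```

In production this logic would typically run as a distributed Spark job with far more sophisticated blocking and scoring, but the group-then-select structure is the same.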
Requirements:
- 7+ years of development experience in data engineering within a production environment (internships and academic settings excluded)
- Proven experience working with large datasets (100M+ records or multiple terabytes)
- 2+ years of development experience in highly scalable, distributed systems and cluster architectures using AWS
- 5+ years of hands-on programming experience with Python
- 5+ years of professional experience working in big data ecosystems; Spark is required, PySpark preferred
- 3+ years of experience with SQL, schema design, and dimensional data modeling
- 2+ years of professional experience working with dataflow orchestration tools, such as Airflow
- 2+ years of experience with non-relational databases (e.g., DynamoDB, Elasticsearch, etc.)
- A bachelor's degree in Computer Science, Information Systems, Mathematics, or a related field is required
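Dataflow orchestration tools such as Airflow model a pipeline as a DAG of dependent tasks. As a rough, library-free illustration of that idea (the task names and dependencies here are invented), a pipeline run reduces to executing tasks in topological order:

```python
from graphlib import TopologicalSorter  # stdlib in Python 3.9+

# Hypothetical pipeline: one extract feeds two transforms, which both feed the load.
# Each key maps a task to the upstream tasks it depends on.
dag = {
    "extract": [],
    "clean": ["extract"],
    "dedupe": ["extract"],
    "load": ["clean", "dedupe"],
}

def run(dag):
    """Execute each task only after all of its upstream dependencies, Airflow-style."""
    order = list(TopologicalSorter(dag).static_order())
    for task in order:
        pass  # a real orchestrator would invoke the task's operator here
    return order

order = run(dag)  # "extract" runs first, "load" runs last
```

A real Airflow DAG adds scheduling, retries, and backfills on top of this ordering, but dependency-ordered execution is the core contract.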