Piper Companies is seeking a Data Engineer to support a leading organization in the data and analytics industry. The Data Engineer role is ideal for an experienced professional with strong expertise in distributed computing, ETL processes, and modern data engineering practices.
Responsibilities:
- Develop data pipelines and distributed data processing workflows using Python, Apache Spark, and related technologies
- Build and optimize ETL processes to support analytics, machine learning, and reporting requirements
- Create complex SQL queries for testing, validation, and analysis across large datasets
- Support data modeling efforts and ensure alignment with architectural standards
- Partner with engineering and analytics teams to identify, source, and prepare the data needed for analytics and ML use cases
- Implement best practices for code quality, data governance, and performance optimization
- Use MLflow, GitHub, Docker, Databricks, and cloud-based data tools
- Troubleshoot data pipeline issues and support continuous improvement
- Contribute to Agile development processes and cross-functional collaboration
Requirements:
- Bachelor's degree in Computer Science, Computer Engineering, or a related field
- 5+ years of experience in data engineering or a closely related field
- Solid understanding of R
- Strong understanding of data modeling, ETL processes, and distributed computing
- Strong experience with Python and Apache Spark
- Solid understanding of software design patterns, data structures, and algorithms
- Experience developing data or ML pipelines, including MLOps practices
- Ability to create complex SQL queries for analysis and validation
- Experience with model tuning and governance frameworks (if supporting ML workloads)
- Proficiency with MLflow, GitHub, Docker, Databricks, and modern engineering toolsets
- Experience with Agile development methodologies
- Ability to work independently as well as collaboratively in a team environment
- Strong problem-solving, analytical, verbal, and written communication skills
Preferred Qualifications:
- AWS experience (S3, EC2, Glue, Lambda, etc.)
- Advanced R experience
- Professional Databricks or Apache Spark certifications
- SAS experience