Photon has powered digital experiences for Fortune 500 companies for the past 20 years. The company is seeking a Lead Data Engineer to develop and maintain data pipelines, design custom connectors, and implement DataOps principles to ensure efficient data delivery and operations.
Responsibilities:
- Develop and maintain data pipelines, ELT processes, and workflow orchestration using Apache Airflow, Python, and PySpark to ensure the efficient and reliable delivery of data
- Design and implement custom connectors to facilitate the ingestion of diverse data sources into our platform, including structured and unstructured data from various document formats
- Collaborate closely with cross-functional teams to gather requirements, understand data needs, and translate them into technical solutions
- Implement DataOps principles and best practices to ensure robust data operations and efficient data delivery
- Design and implement data CI/CD pipelines to enable automated and efficient data integration, transformation, and deployment processes
- Monitor and troubleshoot data pipelines, proactively identifying and resolving issues related to data ingestion, transformation, and loading
- Conduct data validation and testing to ensure the accuracy, consistency, and compliance of data
- Stay up to date with emerging technologies and best practices in data engineering
- Document data workflows, processes, and technical specifications to facilitate knowledge sharing and ensure data governance
Requirements:
- Bachelor's degree in Computer Science, Engineering, or a related field
- Experience in data engineering, ELT development, and data modeling
- Proficiency in using Apache Airflow and Spark for data transformation, data integration, and data management
- Experience implementing workflow orchestration using tools like Apache Airflow, SSIS, or similar platforms
- Demonstrated experience in developing custom connectors for data ingestion from various sources
- Strong understanding of SQL and database concepts, with the ability to write efficient queries and optimize performance
- Experience implementing DataOps principles and practices, including data CI/CD pipelines
- Excellent problem-solving and troubleshooting skills, with strong attention to detail
- Effective communication and collaboration abilities, with a proven track record of working in cross-functional teams
- Familiarity with data visualization tools (e.g., Apache Superset) and dashboard development
- Understanding of distributed systems and working with large-scale datasets
- Familiarity with data governance frameworks and practices
- Knowledge of data streaming and real-time data processing technologies (e.g., Apache Kafka)
- Strong understanding of software development principles and practices, including version control (e.g., Git) and code review processes
- Experience with Agile development methodologies and working in cross-functional Agile teams
- Ability to adapt quickly to changing priorities and work effectively in a fast-paced environment
- Excellent analytical and problem-solving skills, with keen attention to detail
- Strong written and verbal communication skills, with the ability to explain complex technical concepts to both technical and non-technical stakeholders