Egen is a fast-growing, entrepreneurial company focused on data-driven solutions. It is seeking a seasoned Data Engineer to design and develop large-scale data processing pipelines, ensuring data accessibility, security, and accuracy while collaborating with stakeholders and mentoring junior engineers.
Responsibilities:
- Design, develop, and deploy large-scale data processing pipelines, both batch and streaming, using technologies such as Dataflow, Apache Beam, Spark, Akka, and Pub/Sub
- Apply expertise with multiple data storage technologies such as Bigtable/HBase, BigQuery, Spanner, and CloudSQL/Postgres
- Work with stakeholders to understand business problems, develop use-cases, and translate them into pragmatic and effective technical solutions
- Design and develop appropriate data schemas based on an understanding of the domain problem
- Manage data lineage and ensure data security with appropriate tools and methodologies
- Collaborate with data scientists, architects, and other stakeholders to ensure alignment between technical and business strategy
- Continuously monitor, refine, and report on the performance of data management systems
- Mentor junior data engineers, reviewing their work and guiding their professional development
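The batch-pipeline work described above follows a common read/map/combine-per-key shape, regardless of whether the runner is Dataflow, Beam, or Spark. As a rough, standard-library-only sketch of that shape (the `word_count` function and the sample records are illustrative, not Egen code or any specific framework API):

```python
from collections import defaultdict

def word_count(records):
    """Toy batch pipeline: map each record to (word, 1) pairs,
    then combine counts per key -- the same logical shape a
    Beam or Spark word-count job would take at scale."""
    counts = defaultdict(int)
    for record in records:           # "read" stage
        for word in record.split():  # map stage
            counts[word] += 1        # combine-per-key stage
    return dict(counts)

# Tiny in-memory stand-in for a real source such as Pub/Sub or GCS
events = ["click click view", "view purchase"]
print(word_count(events))  # {'click': 2, 'view': 2, 'purchase': 1}
```

In a real pipeline the stages would be distributed transforms over unbounded or partitioned data; the sketch only shows the dataflow pattern the role works with daily.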
Requirements:
- 2–4 years of experience in data engineering, particularly in designing and developing data pipelines
- Proven expertise with technologies such as Dataflow, Apache Beam, Spark, Akka, and Pub/Sub
- Experience with various data storage technologies, including Bigtable/HBase, BigQuery, Spanner, and CloudSQL/Postgres
- Ability to design data schemas based on an understanding of the domain problem
- Experience with data security and data lineage methodologies and tools is preferred
- Familiarity with agile development methodologies
- Exceptional communication skills, able to explain complex technical concepts in clear, plain English
- BSc in Computer Science, Engineering, or a related field, or equivalent work experience
- Experience with data migration projects
- Knowledge of dbt, Airflow, or similar orchestration tools
- Experience in multi-cloud environments
- Familiarity with data modeling and analytics use cases
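The orchestration tools listed above (dbt, Airflow, and similar) exist largely to run dependent pipeline tasks in the correct order. A minimal sketch of that core guarantee, using only the standard library (the task names and the `run_in_order` helper are invented for illustration, not an Airflow API):

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline DAG, mapping each task to the tasks it
# depends on: extract -> transform -> (load, quality_check).
dag = {
    "transform": {"extract"},
    "load": {"transform"},
    "quality_check": {"transform"},
}

def run_in_order(dag):
    """Return tasks in an order that respects every dependency --
    the scheduling guarantee an orchestrator like Airflow provides
    (plus retries, backfills, and monitoring on top)."""
    return list(TopologicalSorter(dag).static_order())

print(run_in_order(dag))
```

Here `extract` always precedes `transform`, which precedes both `load` and `quality_check`; a real orchestrator adds scheduling, retries, and observability around the same dependency model.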