Design, implement, and maintain scalable, high-performance ETL/ELT pipelines for structured and unstructured data
Understand the technical landscape and bank-wide architecture to effectively design & deliver data solutions (architecture, pipelines, etc.)
Create & maintain CI/CD pipelines (authoring & supporting pipelines in GitHub Actions and deploying to production)
Automate data applications using orchestration tools
Debug and improve existing source code
Support the continuous optimisation, improvement & automation of data pipelines
Coach & mentor other data engineers
Conduct peer reviews, testing & problem solving within the team
Identify technical risks and mitigate them (pre-, during & post-deployment)
Design & update all application documentation in line with the organisation's technical standards and risk/governance frameworks
Create business cases & solution specifications for various governance processes (e.g. CTO approvals)
Participate in incident management & disaster recovery (DR) activities – applying critical thinking, problem solving & technical expertise to resolve major incidents
Deliver on time & on budget (always)
Requirements
BSc Honours, BCom Honours, BEng, BBusSc in Computer Science, Information Systems or any Information Technology qualification that is at NQF level 8 or higher
3 or more years of experience as a Data Engineer
Understanding of, and experience with, Big Data technologies (Hadoop) is essential
Experience designing and developing Scala/Apache Spark data applications
Understanding of Linux and Bash scripting
Understanding of Git and GitHub Actions
Experience with CA Wade or any other orchestration tool
Strong SQL skills
Ability to work within either an Agile or a traditional project methodology to deliver tasks
Data warehouse experience is beneficial but not a must
Cloud skills (AWS preferred) and Databricks experience are beneficial but not a must