Responsibilities
Design, build, and maintain robust data pipelines and ETL processes to ingest, transform, and load data into the data warehouse.
Develop and optimize SQL scripts and Python-based data processing jobs (PySpark, Pandas) for large-scale workflows.
Implement automated data quality checks and validation processes to ensure data integrity and accuracy.
Monitor system performance, troubleshoot issues, and optimize pipelines for reliability and scalability.
Collaborate with product managers, analysts, and stakeholders to gather requirements and deliver data solutions that meet business needs.
Create and maintain design documents and technical documentation for data pipelines and systems.
Participate in design and code reviews to maintain high engineering standards.
Apply data modeling and schema design techniques to support efficient, scalable storage and querying.
Contribute to CI/CD pipeline usage and integration to deploy and manage data processing jobs.
Follow SDLC practices and work as an active member of a development team, modifying ETL tools and workflows as needed.
Requirements
Bachelor’s degree in Computer Science, Engineering, or a related field.
3+ years of proven experience as a Data Engineer or in a similar role, with strong ETL and data pipeline expertise.
Proficiency in SQL and scripting with Python for data processing tasks.
Proficiency with PySpark and/or Pandas for large-scale data processing.
Familiarity with data warehousing concepts and tools (e.g., Amazon Redshift, Google BigQuery, Snowflake) and experience optimizing performance for large datasets.
Strong experience in database development, data modeling, schema design, and optimization techniques for scalability.
Experience writing and maintaining automation test cases for data pipelines.
Experience with Unix/Linux operating systems and shell scripting.
Practical knowledge of CI/CD pipelines and their use in deploying data processing jobs.
Solid understanding of SDLC and experience working within development teams.
Deep domain knowledge of credit and fintech, and of how data supports related products and processes.
Self-motivated, proactive, and able to communicate and collaborate effectively across teams.