Skills: Azure, BigQuery, Cloud ETL, Google Cloud Platform (GCP), Informatica, NumPy, Pandas, Python, scikit-learn, SQL, Machine Learning, Data Engineering, Data Warehousing, Cloud Storage, Cloud Composer, GitHub, Version Control
About this role
Role Overview
Assisting in the migration of data pipelines from Azure-based infrastructure to GCP
Designing, building, and testing ETL workflows in cloud-native environments
Refactoring and optimizing existing data transformation processes for scalability and performance
Supporting data validation, reconciliation, and quality assurance efforts
Contributing to technical documentation and architectural diagrams
Collaborating with product engineering teams to ensure seamless integration of data services
Participating in code reviews and version control workflows
Requirements
Strong SQL skills
Python for data engineering and transformation
Experience with data manipulation using Pandas and NumPy
Understanding of basic machine learning concepts, including regression, classification, clustering, model evaluation, and overfitting
Ability to build, train, and evaluate ML models using tools like Scikit-learn
Familiarity with data visualization tools such as Matplotlib or Seaborn
Understanding of basic statistics and probability
Experience with data cleaning, preprocessing, and exploratory data analysis (EDA)
Knowledge of ETL processes and experience with ETL tools such as Informatica; familiarity with GCP tools (e.g., BigQuery, Dataflow, Cloud Composer, Cloud Storage) is an added advantage
Familiarity with cloud platforms (Azure and/or GCP preferred)
Understanding of distributed data processing concepts
Experience with version control using Git (e.g., GitHub)
Knowledge of data warehousing concepts and data modeling
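As a rough illustration of the Pandas and scikit-learn skills listed above (data preparation, then building, training, and evaluating a model), a minimal sketch might look like the following. The column names and the synthetic dataset are hypothetical, not taken from any real pipeline in this role:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data standing in for a real dataset; "feature_a",
# "feature_b", and "label" are hypothetical column names.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature_a": rng.normal(size=200),
    "feature_b": rng.normal(size=200),
})
df["label"] = (df["feature_a"] + df["feature_b"] > 0).astype(int)

# Hold out a test split for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    df[["feature_a", "feature_b"]], df["label"],
    test_size=0.25, random_state=0,
)

# Train a simple classifier and report held-out accuracy.
model = LogisticRegression()
model.fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {acc:.2f}")
```

The same build/train/evaluate loop carries over to the regression and clustering estimators mentioned in the requirements, since scikit-learn exposes them through the same fit/predict interface.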