Design and develop data pipelines: create robust, scalable data pipelines using Databricks, Apache Spark, and SQL to transform and process large datasets efficiently.
Performance optimization: monitor and optimize the performance of existing data pipelines and workflows to ensure high throughput and low latency.
Collaboration with stakeholders: work closely with data scientists, analysts, and business stakeholders to understand data requirements and translate them into technical specifications.
Data quality and governance: implement data quality checks and governance practices to ensure data consistency, accuracy, and compliance.
Documentation and best practices: maintain comprehensive documentation for data pipelines and processes, and contribute to the establishment of best practices within the team.
Requirements
Solid experience in data engineering or a related field, with a focus on data pipelines.
Soft skills: excellent communication in English and Portuguese, with the ability to work effectively in a team-oriented environment and engage with clients.
Technical skills: proficiency in Databricks, Apache Spark, SQL, and Python (or Scala).
Cloud technologies: experience with Microsoft Azure.
Problem-solving: excellent analytical and problem-solving skills, with the ability to troubleshoot complex data issues.
Education: bachelor’s degree in Computer Science, Information Technology, or a related field.
Tech Stack
Apache
Azure
Cloud
Python
Scala
Spark
SQL
Benefits
Health and dental insurance
Meal and food allowance
Childcare assistance
Extended paternity leave
Partnerships with gyms and health and wellness professionals via Wellhub (Gympass) and TotalPass
Profit and results sharing (PLR)
Life insurance
Continuous learning platform (CI&T University)
Discount club
Free online platform dedicated to physical, mental, and overall well-being