Design, build and maintain robust, scalable data pipelines for ingestion, processing and transformation of large volumes of data;
Develop and optimize data models for analytical systems;
Implement and manage storage solutions, including data lakes and data warehouses, in cloud environments;
Ensure data integrity, quality and governance, applying best practices for management, security and FinOps;
Collaborate with multidisciplinary teams to understand business requirements and deliver data-driven solutions;
Monitor and optimize the performance of pipelines, databases and integration solutions;
Drive data architecture modernization initiatives using cutting-edge technologies and modern frameworks.
Requirements
Bachelor's degree in Computer Science, Software Engineering, Information Systems or related fields, or a postgraduate degree in related areas;
Proven experience with data visualization platforms (Tableau, Power BI and/or Looker);
Experience with Google Cloud Platform (GCP) and its services: BigQuery, Cloud Composer (managed Apache Airflow), Dataflow, Pub/Sub, Cloud Run, Cloud Functions, KMS, Secret Manager;
Databases: Knowledge of data modeling, relational databases (e.g., Oracle, SQL Server, PostgreSQL) and non-relational databases (e.g., MongoDB, Cassandra);
ETL/ELT: Experience building ETL/ELT pipelines for data ingestion and transformation.
Desired Knowledge
Cloud computing;
Experience with Azure and its services, such as Azure Data Factory, Azure Synapse Analytics and Azure Data Lake Storage;
Programming languages: Proficiency in Python and PySpark for scripting and automation, and in SQL for data manipulation and querying;
Data Lakes: Knowledge of Data Lake architectures, storage and data optimization (e.g., Delta Lake);
DevOps: Experience with CI/CD pipelines and DataOps practices for automating deployments and monitoring data flows;
Security and Governance: Understanding of cloud security practices, data encryption and implementation of governance policies;
Data Orchestration: Use of tools such as Azure Data Factory, Azure Synapse and/or Airflow/Cloud Composer for data integration and movement;
Performance and Optimization: Experience with techniques to improve the performance of data pipelines and queries in both distributed and non-distributed environments;
APIs: Knowledge of extracting data via RESTful APIs.
Tech Stack
Airflow
Azure
BigQuery
Cassandra
ETL
Google Cloud Platform
MongoDB
Oracle
PostgreSQL
PySpark
Python
SQL
Tableau
Benefits
Health and Dental insurance – Bradesco – extendable to dependents
PAE (Employee Assistance Program)
Financial assistance available for dependents (children and/or stepchildren) with intellectual disabilities
Pharmacy discount program – discounts of up to 85%
Supplementary Pension – FlexPrev plan – contributions ranging from 1% to 11%, depending on salary
Life Insurance – coverage for all employees from the date of hire, with no employee contribution