Load and Pipeline Analysis and Planning:
Assess the architecture and requirements of the existing Data Warehouse.
Map data, transformations and processes to GCP services (Cloud Storage, BigQuery, Dataproc).
Define the data migration strategy (full load, incremental load or change data capture/CDC).
Develop a data architecture plan for GCP.
Data Design and Modeling on GCP:
Design BigQuery table schemas with performance, cost and scalability in mind.
Define partitioning and clustering strategies for BigQuery tables.
Model the Bronze, Silver and Gold data zones in Cloud Storage.
ELT/ETL Pipeline Development:
Create data transformation routines using Dataproc (Spark) or Dataflow to load data into BigQuery.
Translate existing business logic and transformations to GCP.
Implement data validation and data quality mechanisms.
Infrastructure Provisioning and Management:
Use IaC tools (Terraform) to provision and manage GCP resources (BigQuery datasets/tables, Cloud Storage buckets, Dataproc clusters).
Configure and optimize Dataproc clusters for different workloads.
Manage networking, security (IAM) and access in GCP.
Performance and Cost Optimization:
Optimize BigQuery queries to reduce costs and improve performance.
Tune and optimize Spark jobs on Dataproc.
Monitor and optimize GCP resource usage to control costs.
Data Security and Governance:
Implement and enforce security for data in transit and at rest.
Define and enforce IAM policies to control access to data and resources.
Ensure compliance with data governance policies.
Monitoring and Support: Troubleshoot performance and functional issues in data pipelines and GCP resources.
Documentation: Document the architecture, data pipelines, data models and operational procedures.
Communication: Communicate effectively with team members, stakeholders and other areas of the company.
Jira / Agile Methodologies: Familiarity with agile methodologies and their ceremonies, and proficiency with Jira.
Requirements
Google Cloud Platform (GCP):
BigQuery: Deep knowledge of data modeling, query optimization, partitioning, clustering, data loading (streaming and batch), security and data governance.
Cloud Storage: Experience managing buckets, storage classes, lifecycle policies, access control (IAM) and data security.
Dataproc: Ability to provision, configure and manage Spark/Hadoop clusters, optimize jobs, and integrate with other GCP services.
Dataflow/Composer/DBT: Knowledge of orchestration and data processing tools for ELT/ETL pipelines.
At least 3 years of proven experience with GCP.
At least 3 years of proven experience with DBT (preferred).
At least 3 years of proven experience with PySpark.
Proven experience with GitFlow.
Cloud IAM (Identity and Access Management): Implementation of security policies and granular access control.
VPC, Networking and Security: Understanding of networks, subnets, firewall rules and cloud security best practices.
Programming Languages:
Python and PySpark: Essential for automation scripts, development of data pipelines and integration with GCP APIs.
SQL (advanced): For BigQuery, DBT and data transformations.
Shell Scripting: For task automation.
Version Control: Git/GitHub/Bitbucket.
Tech Stack
BigQuery
Cloud
ETL
Google Cloud Platform
Hadoop
PySpark
Python
Shell Scripting
Spark
SQL
Terraform
Benefits
Porto Seguro Health Plan: Comprehensive coverage for you and your family, with the option to include your spouse and children.
Porto Seguro Dental Plan: Dental coverage for you and your dependents.
Profit Sharing (PLR): Recognition for your work and contribution to the company's success.
Childcare Assistance: Financial support to help parents care for their young children.
Alelo Food and Meal Vouchers: Convenient meals as part of your daily routine.
Home Office Allowance: Support to help you set up a comfortable workspace at home.
Partnerships with Educational Institutions: Access to education with discounts and incentives for courses and degrees.
Support for Certifications, including Cloud: Advance your career with certifications in major technologies such as GCP, Azure and AWS.
Livelo Points: Earn points and redeem them however you prefer.
TotalPass: Health incentive with discounted gym plans for employees and family members.
Mindself: Support to improve quality of life through meditation and mindfulness.