Design, develop and maintain data pipelines (batch and streaming) for structured and unstructured data;
Develop and optimize ETL processes using programming languages such as Python and/or SQL;
Work with open-source technologies for data processing, integration and storage in on-premises environments;
Manage and optimize databases, ensuring performance, integrity and availability;
Ensure data quality, integrity and reliability throughout the data lifecycle (validation, monitoring and testing);
Design and improve data architectures, ensuring performance, scalability and security;
Act as a liaison between Data Scientists, Data Analysts and AI teams, translating their needs into technical and data-handling requirements;
Improve the City of Porto's Urban Platform by creating new ways to increase the value and quality of the data collected and used for decision-making that impacts the community's quality of life.
Requirements
Degree in Computer Engineering, Data Analysis, Computer Science, Electrical and Computer Engineering or similar;
Minimum of 2 years' experience in similar roles;
Knowledge of Python and SQL for data manipulation and transformation;
Experience developing, testing and maintaining ETL pipelines;
Experience with UNIX/Linux systems;
Experience with Docker and containerization concepts;
Knowledge of orchestration tools (e.g., Airflow, NiFi) preferred;
Knowledge of search and indexing engines (e.g., Elasticsearch) preferred;
Knowledge of relational and non-relational databases (e.g., PostgreSQL, MySQL, MongoDB) preferred;
Knowledge of data modeling, data lakes and data warehouses preferred;
Experience with Pandas and NumPy preferred;
Familiarity with API integration and microservices architecture.