Design and maintain optimal data pipeline architecture to ensure efficient, scalable, and reliable data flow across the organization.
Assemble and integrate large, complex datasets that meet both functional and non-functional business requirements, ensuring data quality and consistency.
Build and optimize infrastructure for extraction, transformation, and loading (ETL/ELT) of data from diverse sources using SQL, Python, distributed data processing frameworks, cloud data platforms, and cloud services (such as AWS, Azure, or GCP).
Collaborate with cross-functional stakeholders, including Executive, Product, Data, and Design teams, to address data-related technical challenges and support their data infrastructure needs.
Develop data tools and frameworks for analytics and data science teams, empowering them to build and optimize products that position the company as an innovative industry leader.
Partner with data and analytics experts to continuously enhance functionality, performance, and scalability of data systems.
Lead project and stakeholder management activities, taking full ownership of the quality, timeliness, and successful delivery of data products and solutions.
Requirements
Minimum of 5 years' experience in a data engineering role, with hands-on experience in distributed data processing / big data frameworks (Databricks), Python, cloud data platforms, and AWS or other cloud environments (e.g., Azure, GCP).
Experience with workflow orchestration / scheduling platforms.
Experience working with version control systems such as GitHub, GitLab, Bitbucket, or Azure Repos.
Experience maintaining and developing software in production environments, using CI/CD platforms such as Jenkins, GitLab CI, GitHub Actions, or Azure DevOps.
Experience with modern Python API frameworks, plus a strong understanding of RESTful services and good API design principles.
Knowledge of system design concepts, including designing scalable, reliable, and fault-tolerant data architectures.
Understanding of object-oriented programming principles and design patterns.
Strong understanding of big data and distributed systems technologies, including streaming and messaging platforms; search, indexing, and log analytics engines; structured and unstructured data; and container orchestration platforms.
Demonstrated ability to design and implement end-to-end scalable and performant data pipelines.
Proficiency in building data transformation processes and data structures, and in managing workloads.
Strong ability to manipulate and extract insights from large datasets using SQL, Python, and distributed processing frameworks.
Tech Stack
AWS
Azure
Cloud
Distributed Systems
ETL
Google Cloud Platform
Jenkins
Python
SQL
Benefits
Health & Wellness: Health care coverage designed for the mind and body.
Flexible Downtime: Generous time off helps keep you energized for your time on.
Continuous Learning: Access a wealth of resources to grow your career and learn valuable new skills.
Invest in Your Future: Secure your financial future through competitive pay, retirement planning, a continuing education program with a company-matched student loan contribution, and financial wellness programs.
Family Friendly Perks: It’s not just about you. S&P Global has perks for your partners and little ones, too, with some best-in-class benefits for families.
Beyond the Basics: From retail discounts to referral incentive awards—small perks can make a big difference.