Design, build, maintain, and operationalize data pipelines for high-volume, complex data using appropriate tools and practices in development, test, and production environments
Collaborate within an agile, multi-disciplinary team to deliver optimal data integration and transformation solutions
Analyze data requirements (functional and non-functional) to develop and design robust, scalable, automated, fault-tolerant data pipeline solutions for business and technology initiatives
Profile data to assess the accuracy and completeness of data sources and provide feedback in data gathering sessions
Develop and design data mappings, programs, routines, and SQL to acquire data from legacy, web, cloud, and purchased package environments into the analytics environment
Understand and apply the appropriate use of ELT, ETL, data virtualization, and other methods to balance minimal data movement against performance, and mentor others on their use
Drive automation of data pipeline preparation and integration tasks to minimize manual and error-prone processes and improve productivity using modern data preparation, integration, and AI-enabled metadata management tools and techniques
Leverage auditing facilities that enable monitoring of data quality to detect emerging issues
Deploy transformation rules to cleanse data against defined rules and standards
Participate in architecture, governance, and design reviews, identifying opportunities and making recommendations
Participate in health check assessments of the existing environment and evaluations of emerging technologies
Collaborate with architects to design and model application data structures, storage, and integration in accordance with enterprise-wide architecture standards across legacy, web, cloud, and purchased package environments
Requirements
Bachelor’s degree in computer science, data science, statistics, economics, or related functional area; or equivalent experience
6+ years’ experience working on a development team providing analytical capabilities
6+ years of hands-on experience in the data space spanning data preparation, SQL, integration tools, and ETL/ELT/data pipeline design
SQL coding experience
Experience working in an agile development environment (Scrum, Kanban) with a focus on Continuous Integration and Delivery
Knowledge of various data architectures, patterns, and capabilities such as event-driven architecture, real-time data flows, non-relational repositories, data virtualization, cloud storage, etc.
Knowledge of and experience with multiple data integration platforms (IBM InfoSphere DataStage, Oracle Data Integrator, Informatica PowerCenter, MS SSIS, AWS Glue, Denodo), and data warehouse MPP platforms such as Snowflake, Netezza, Teradata, Redshift, etc.
Familiarity with DataOps practices, their application within analytics environments, and their role in extending data and analytics capabilities to other operational systems and consumers
Familiarity with event stores and stream processing (Apache Kafka and platforms such as Confluent) and with API development and management platforms (MuleSoft, Axway) is beneficial
Capable of focusing on a specific set of tasks while also ensuring alignment to a broader strategic design