Develop, deploy, and support automated, scalable real-time and batch data pipelines from a variety of sources into the lakehouse
Develop and implement data auditing strategies and processes to ensure data quality; identify and resolve problems in large-scale data processing workflows; implement technical solutions to maintain data pipelines and troubleshoot failures
Collaborate with technology teams and partners to specify data requirements and provide access to data
Tune application and query performance using profiling tools and SQL or other relevant query languages
Translate business and analytics data requirements into comprehensive data models and pipelines
Foster data expertise and own data quality for assigned areas
Work with data infrastructure teams to triage issues and drive them to resolution
Requirements
Bachelor’s Degree in Data Science, Data Analytics, Information Management, Computer Science, Information Technology, related field, or equivalent professional experience
7+ years of overall professional experience
3+ years of experience working with SQL and Python
3+ years of experience in implementing data pipelines using modern data architectures
2+ years of experience working with data warehouses such as Redshift, BigQuery, Snowflake, or similar
Experience with open-source data architectures such as Spark, Hive, Trino/Presto, or similar
Excellent software engineering and scripting skills
Strong communication skills (both presenting and comprehending) and an aptitude for collaboration across data management and analytics domains
Expertise with data systems that handle massive data volumes from a variety of sources