Oversee ETL processes to ensure they deliver accurate and timely data across various products in support of the company's revenue generation
Maintain data pipelines running on a proprietary big data processing platform and provide support to ensure data delivery SLAs are met
Maintain data engineering processes using a variety of tools including T-SQL, Spark and Scala, and shell scripting
Focus on data ingestion for healthcare data management, data validation, statistical report generation, and program validation
Develop, support, and improve scalable, efficient data ingestion processes that optimize query performance for our proprietary data applications and systems
Implement and perform data validation and quality checks to maintain high data integrity
Troubleshoot, analyze, mine, and investigate data, and identify the root cause of issues using several cutting-edge data analysis tools in a fast-paced environment
Collaborate with stakeholders to understand data requirements and deliver high-quality data solutions
Work with Technical Operations to troubleshoot complex database issues related to the entire environment including OS, storage, and servers
Provide off-hours support to resolve production issues when necessary
Develop data transformation specifications for converting source data and loading it into target data warehouse tables using SQL and other data integration/ETL tools
Support the implementation of data governance policies and security measures to protect sensitive information
Create and maintain dashboards as needed
Participate in meetings with clients and/or stakeholders
Complete individual productivity tracking
Complete task assignments using the department ticketing system within assigned deadlines
Achieve organizational and individual goals as identified in performance reviews and goal-setting exercises
Complete all special projects and other duties as assigned
Requirements
Bachelor’s degree in Computer Science, Information Technology or equivalent work experience
2+ years of working knowledge of RDBMS (Oracle, MS SQL, Vertica, etc.) and experience using SQL, PL/SQL or other data integration/ETL tools
2+ years of experience in data engineering, data analysis, or a related field with a strong track record of building and managing data pipelines
2-4 years’ experience with data aggregation, standardization, linking, quality check mechanisms, and reporting
2-4 years’ experience with big data technologies like Hadoop and Spark
Proficient and experienced in analyzing, designing, and developing solutions and strategies involving relational databases (e.g., Oracle, Vertica, SQL Server), ETL tools (e.g., SSIS, ODI, Informatica), and data warehousing concepts
Solid understanding of Linux environments; strong knowledge of shell scripting and file systems
Knowledge of US healthcare data, preferably in a data operations role, is a plus
Experience applying coding principles and following best practices for developing and deploying code for data manipulation and automation
Experience in at least one programming language such as Python, Java, Scala, or PowerShell
Knowledge of data governance principles, data quality, and data lifecycle management is a plus
Proficient in Microsoft Office Suite applications (PowerPoint, Word, Excel, and Outlook)
Flexible work schedule
Experience with project management tools like Jira
Strong analytical skills
Quick learner, energetic, and flexible
Excellent verbal, listening, and written communication skills
Ability to multitask and prioritize projects to meet scheduled deadlines and tight turnaround times
Tech Stack
ETL
Hadoop
Informatica
Java
Linux
Oracle
Python
RDBMS
Scala
Shell Scripting
Spark
SQL
SSIS
Benefits
Medical, dental, vision, disability, and life insurance coverage
401(k) savings plans
Paid family leave
9 paid holidays per year
17-27 days of Paid Time Off (PTO) per year, depending on specific level and length of service