Understands, articulates, and applies principles of the defined strategy to routine business problems that involve a single function.
Supports the understanding of the priority order of requirements and service level agreements.
Helps identify the most suitable source for data that is fit for purpose.
Performs initial data quality checks on extracted data.
Extracts data from identified databases.
Creates data pipelines and transforms data into a structure relevant to the problem by selecting appropriate techniques.
Develops knowledge of current data science and analytics trends.
Translates and co-owns business problems within one's discipline into data-related or mathematical solutions.
Identifies appropriate methods/tools to be leveraged to provide a solution for the problem.
Shares use cases and gives examples to demonstrate how the method would solve the business problem.
Provides recommendations to business stakeholders to solve complex business issues.
Develops business cases for projects with a projected return on investment or cost savings.
Translates business requirements into projects, activities, and tasks; aligns them to the overall business strategy; and develops domain-specific artifacts.
Serves as an interpreter and conduit to connect business needs with tangible solutions and results.
Identifies and recommends relevant business insights pertaining to their area of work.
Analyzes complex data elements, systems, data flows, dependencies, and relationships to contribute to conceptual, physical, and logical data models.
Develops logical and physical data models, including data warehouse and data mart designs.
Defines relational tables, primary and foreign keys, and stored procedures to create a data model structure.
Evaluates existing data models and physical databases for variances and discrepancies.
Develops efficient data flows.
Analyzes data-related system integration challenges and proposes appropriate solutions.
Creates training documentation and trains end-users on data modeling.
Writes code to develop the required solution and application features by determining the appropriate programming language and leveraging business, technical, and data requirements.
Creates test cases to review and validate the proposed solution design.
Creates proofs of concept.
Tests the code using the appropriate testing approach.
Deploys software to production servers.
Contributes code documentation, maintains playbooks, and provides timely progress updates.
Establishes, modifies, and documents data governance projects and recommendations.
Implements data governance practices in partnership with business stakeholders and peers.
Interprets company and regulatory policies on data.
Educates others on data governance processes, practices, policies, and guidelines.
Provides recommendations on needed updates or inputs into data governance policies, practices, or guidelines.
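To give candidates a concrete sense of the pipeline and data quality work described above, here is a minimal plain-Python sketch of an initial quality check on extracted rows (the field names "id" and "amount" and the sample data are hypothetical illustrations, not a prescribed implementation):

```python
# Minimal sketch of an initial data quality check on extracted rows.
# Field names ("id", "amount") and sample data are hypothetical.

def quality_check(rows, required_fields=("id", "amount")):
    """Summarize missing required fields and duplicate ids in extracted rows."""
    missing = {field: 0 for field in required_fields}
    seen, duplicate_ids = set(), 0
    for row in rows:
        for field in required_fields:
            if row.get(field) in (None, ""):
                missing[field] += 1
        key = row.get("id")
        if key in seen:
            duplicate_ids += 1
        seen.add(key)
    return {"rows": len(rows), "missing": missing, "duplicate_ids": duplicate_ids}

rows = [
    {"id": 1, "amount": 10.0},
    {"id": 1, "amount": None},  # duplicate id, missing amount
    {"id": 2},                  # missing amount
]
print(quality_check(rows))
```

In production this kind of check would typically run as a validation step inside a pipeline (e.g., Spark or Airflow task) before data is transformed downstream.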
Requirements
Master's degree or equivalent in Computer Science, Engineering, or a related field and 1 year of experience in software engineering, data engineering, database engineering, business intelligence, business analytics, or a related field; OR Bachelor's degree or equivalent in Computer Science, Engineering, or a related field and 3 years of experience in software engineering, data engineering, database engineering, business intelligence, business analytics, or a related field.
Experience designing and implementing scalable data pipelines and ETL jobs using Python and Spark.
Experience orchestrating data workflows for scheduling and automation using tools including Apache Airflow, Oozie, and Apache NiFi.
Experience working with distributed file systems and distributed NoSQL data stores.
Experience deploying and managing infrastructure using Infrastructure as a Service on different cloud providers including Google Cloud Platform, Amazon Web Services, and Microsoft Azure.
Experience building recommendation systems for ad-serving platforms by analyzing metrics including click-through rate, bidding strategies, and user engagement behavior.
Experience integrating, evaluating, and publishing machine learning models (e.g., XGBoost and Random Forest, serialized with MLeap) in data pipelines.
Experience using monitoring tools including Grafana and Prometheus for observability and health checks.
Experience writing modular, maintainable, and reusable big data code in Python using design patterns.
Experience working with CI/CD practices and Git-based version control workflows.
Experience publishing model outputs and scores to storage systems hosted in different cloud platforms as distributed systems for online consumption.
Experience performing data transformations and aggregations across multiple structured and semi-structured sources.
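As an illustration of the transformation and aggregation work listed in the requirements above, here is a minimal plain-Python sketch that joins a structured table with semi-structured JSON records (all field names and sample data are hypothetical; in practice this would be done with Spark at scale):

```python
import json
from collections import defaultdict

# Structured source: rows from a relational extract (hypothetical schema).
orders = [
    {"order_id": "A1", "user_id": "u1", "amount": 20.0},
    {"order_id": "A2", "user_id": "u2", "amount": 35.0},
    {"order_id": "A3", "user_id": "u1", "amount": 15.0},
]

# Semi-structured source: raw JSON event strings (hypothetical schema).
events = [
    '{"user_id": "u1", "event": "click"}',
    '{"user_id": "u1", "event": "click"}',
    '{"user_id": "u2", "event": "view"}',
]

def aggregate(orders, events):
    """Join per-user spend (structured) with per-user event counts (semi-structured)."""
    spend = defaultdict(float)
    for order in orders:
        spend[order["user_id"]] += order["amount"]
    counts = defaultdict(int)
    for raw in events:
        event = json.loads(raw)  # parse the semi-structured record
        counts[event["user_id"]] += 1
    return {u: {"spend": spend[u], "events": counts.get(u, 0)} for u in spend}

print(aggregate(orders, events))
```

The same join-then-aggregate pattern maps directly onto Spark DataFrame operations (`groupBy`, `join`, `agg`) when the sources are too large for a single machine.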
Tech Stack
Airflow
Apache
AWS
Azure
Cloud
Distributed Systems
ETL
Google Cloud Platform
Grafana
NoSQL
Prometheus
Python
Spark
Benefits
Health benefits include medical, vision and dental coverage.
Financial benefits include 401(k), stock purchase and company-paid life insurance.
Paid time off benefits include PTO (including sick leave), parental leave, family care leave, bereavement, jury duty and voting.
Other benefits include short-term and long-term disability coverage, education assistance with 100% company-paid college degrees, company discounts, military service pay, adoption expense reimbursement, and more.