About this role
Role Overview
Understand Cogstate data sources and develop data pipelines using Databricks to bring all data into the data lake.
Design, develop, implement, and tune large-scale distributed systems and pipelines that process large volumes of data, focusing on scalability, low latency, and fault tolerance in every system.
Develop scalable and reusable frameworks for ingesting data into Azure Databricks, incorporating standards and best practices into engineering solutions.
Databricks engineering: query tuning, performance tuning, troubleshooting, and debugging pipelines.
Deep understanding of ETL/ELT design methodologies, architecture, strategy, and tactics for complex ETL solutions, including CI/CD skills.
Develop high-performance PySpark scripts to meet enterprise data, BI, data visualization, and analytics needs.
Process and transform data using technologies such as Apache Spark, SQL, Python/Scala, and Azure cloud services.
Manage code versions in source control and coordinate changes across teams using GitHub.
Participate in architecture design and discussions, provide logical and physical data design, and database modelling.
Be part of the Agile team to ensure availability of data to internal and external users.
Organize and manage data shares.
Solve complex data issues around data integration, data quality, and other data processing incidents.
Work with business system owners to resolve source data issues and refine transformation rules.
Requirements
BS/BA in Computer Science, Data Science, or a related field, or relevant experience.
2+ years implementing data engineering solutions with PySpark on Databricks.
Knowledge of relational databases and Apache Spark.
Strong knowledge of Databricks configuration, troubleshooting and performance tuning.
Testing, automation, and orchestration, including GitHub and Azure Functions.
Experience with development tools for CI/CD.
Deep expertise in programming languages for data processes (PySpark, Python, Scala).
Experience writing complex SQL transformations in relational databases such as SQL Server.
Tech Stack
Apache
Azure
Cloud
Distributed Systems
ETL
PySpark
Python
Scala
Spark
SQL
Benefits
Remote Work Practices: Cogstate is a virtual-first company. Cogstate employees can work from anywhere Cogstate is registered to do business within the United States, Australia, or the United Kingdom!
Generous Paid Time Off: Cogstate employees receive 20 days of vacation leave, 10 days of personal leave, and 10 paid public holidays.
401(k) Matching: As you invest in yourself and your future, Cogstate invests in you too: we match up to 3% of your yearly salary in Cogstate’s 401(k) program.
Competitive Salary: We offer competitive base salaries plus additional earning opportunities based on the position.
Health, Dental & Vision Coverage: We've invested in comprehensive health & dental insurance options with competitive company contributions to help when you need it most. We also offer free vision insurance for all full-time employees.
Short-Term & Long-Term Disability and Life Insurance: 100% employer sponsored
Pre-Tax Benefits: Healthcare and Dependent Care Flexible Spending Accounts
Learning & Development Opportunities: Cogstate offers a robust learning program, from mentorships to assistance with programs that improve knowledge or lead to certifications in applicable areas of interest.