Connect Tech+Talent is seeking a Data Scientist (Big Data Engineer) 2 for a 5-month contract. The role involves implementing ETL/ELT workflows, automating deployments, and collaborating with cross-functional teams to design and maintain data models and storage solutions.
Responsibilities:
- Implement ETL/ELT workflows for both structured and unstructured data
- Automate deployments using CI/CD tools
- Collaborate with cross-functional teams including data scientists, analysts, and stakeholders
- Design and maintain data models, schemas, and database structures to support analytical and operational use cases
- Evaluate and implement appropriate data storage solutions, including data lakes (Azure Data Lake Storage) and data warehouses
- Implement data validation and quality checks to ensure accuracy and consistency
- Contribute to data governance initiatives, including metadata management, data lineage, and data cataloging
- Implement data security measures, including encryption, access controls, and auditing; ensure compliance with regulations and best practices
- Proficiency in Python and R programming languages
- Strong SQL querying and data manipulation skills
- Experience with Azure cloud platform
- Experience with DevOps, CI/CD pipelines, and version control systems
- Experience working in agile, multicultural environments
- Strong troubleshooting and debugging capabilities
- Design and develop scalable data pipelines using Apache Spark on Databricks
- Optimize Spark jobs for performance and cost-efficiency
- Integrate Databricks solutions with cloud services (Azure Data Factory)
- Ensure data quality, governance, and security using Unity Catalog or Delta Lake
- Deep understanding of Apache Spark architecture, RDDs, DataFrames, and Spark SQL
- Hands-on experience with Databricks notebooks, clusters, jobs, and Delta Lake
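To make the ETL/ELT and data-validation responsibilities concrete, here is a minimal sketch in plain Python, with the stdlib `sqlite3` module standing in for the target warehouse. The schema, field names, and quality rule are illustrative assumptions, not taken from the posting; a production pipeline would use Spark on Databricks as described above.

```python
import csv
import io
import sqlite3

# Illustrative raw input; in practice this would come from a data lake.
RAW_CSV = """order_id,amount,region
1,19.99,EMEA
2,-5.00,APAC
3,42.50,AMER
"""

def extract(raw: str) -> list[dict]:
    """Extract: parse raw CSV text into records."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows: list[dict]) -> list[dict]:
    """Transform + validate: cast types and drop rows failing quality checks."""
    clean = []
    for row in rows:
        amount = float(row["amount"])
        if amount < 0:  # hypothetical quality rule: no negative amounts
            continue
        clean.append({"order_id": int(row["order_id"]),
                      "amount": amount,
                      "region": row["region"]})
    return clean

def load(rows: list[dict], conn: sqlite3.Connection) -> None:
    """Load: write validated records into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders "
                 "(order_id INTEGER, amount REAL, region TEXT)")
    conn.executemany("INSERT INTO orders VALUES (:order_id, :amount, :region)", rows)

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone())
```

The same extract/transform/load shape carries over to Spark: `extract` becomes a DataFrame read, the quality rule a `filter`, and `load` a Delta table write.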
Requirements:
- 4 Years - Implement ETL/ELT workflows for both structured and unstructured data
- 4 Years - Automate deployments using CI/CD tools
- 4 Years - Collaborate with cross-functional teams including data scientists, analysts, and stakeholders
- 4 Years - Design and maintain data models, schemas, and database structures to support analytical and operational use cases
- 4 Years - Evaluate and implement appropriate data storage solutions, including data lakes (Azure Data Lake Storage) and data warehouses
- 4 Years - Implement data validation and quality checks to ensure accuracy and consistency
- 4 Years - Contribute to data governance initiatives, including metadata management, data lineage, and data cataloging
- 4 Years - Implement data security measures, including encryption, access controls, and auditing; ensure compliance with regulations and best practices
- 4 Years - Proficiency in Python and R programming languages
- 4 Years - Strong SQL querying and data manipulation skills
- 4 Years - Experience with Azure cloud platform
- 4 Years - Experience with DevOps, CI/CD pipelines, and version control systems
- 4 Years - Experience working in agile, multicultural environments
- 4 Years - Strong troubleshooting and debugging capabilities
- 3 Years - Design and develop scalable data pipelines using Apache Spark on Databricks
- 3 Years - Optimize Spark jobs for performance and cost-efficiency
- 3 Years - Integrate Databricks solutions with cloud services (Azure Data Factory)
- 3 Years - Ensure data quality, governance, and security using Unity Catalog or Delta Lake
- 3 Years - Deep understanding of Apache Spark architecture, RDDs, DataFrames, and Spark SQL
- 3 Years - Hands-on experience with Databricks notebooks, clusters, jobs, and Delta Lake
- 1 Year - Knowledge of ML libraries and tools (MLflow, Scikit-learn, TensorFlow)
- 1 Year - Databricks Certified Associate Developer for Apache Spark
- 1 Year - Azure Data Engineer Associate
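The data-validation and quality-check requirements above can be illustrated with a small rule-based checker that counts failures per rule across a batch of records. All rule names and field names here are hypothetical; in a Databricks environment the same checks would typically be expressed as Delta Lake constraints or expectations.

```python
from typing import Callable

# Each rule is a (name, predicate) pair evaluated per record.
Rule = tuple[str, Callable[[dict], bool]]

RULES: list[Rule] = [
    ("order_id present", lambda r: r.get("order_id") is not None),
    ("amount non-negative",
     lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0),
    ("region known", lambda r: r.get("region") in {"EMEA", "APAC", "AMER"}),
]

def quality_report(records: list[dict]) -> dict[str, int]:
    """Count failures per rule across a batch of records."""
    failures = {name: 0 for name, _ in RULES}
    for record in records:
        for name, check in RULES:
            if not check(record):
                failures[name] += 1
    return failures

batch = [
    {"order_id": 1, "amount": 10.0, "region": "EMEA"},
    {"order_id": None, "amount": -2.0, "region": "LATAM"},
]
print(quality_report(batch))  # each rule fails once, on the second record
```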