Evaluate and improve T-SQL, MDX, DAX, and HiveQL code, including queries, stored procedures, functions, temporary tables, parameterization, and complex joins and groupings
Develop and optimize ETL/ELT pipelines to load data from on-premises and online systems
Ensure the stability and performance of data solutions
Design, develop, and support data warehouse models
Prepare, cleanse, and validate datasets for data science purposes
Assist with data troubleshooting, feature engineering, and data discovery
Develop tools to automate development and monitoring processes
Develop Python algorithms for data processing
Support the data science environment and assist with data science projects
Manage time effectively to ensure that projects are delivered on schedule
Provide ongoing maintenance and support of existing and new data solutions
Support solution automation and CI/CD
Requirements
Bachelor’s degree in Computer Science, Information Technology, or a related field AND five (5) years’ experience as a Data Engineer, Big Data Engineer, Data Architect, SQL Developer, Database Developer, or in a related role
5 years’ experience with:
- Developing business intelligence solutions, including data integration, data schema development, data pipelines, modeling, and reporting/analytics
- Database design principles, data modeling, partitioning, and data warehousing
- Python and shell scripting
- SQL writing, query tuning, and query performance optimization
- Data analysis, data modeling, data migration, computer programming, and problem-solving
4 years’ experience with data validation, cleansing, and feature engineering using Pandas, Spark DataFrames, and data quality (DQ) solutions
3 years’ experience with CI/CD and change data capture (CDC)
2 years’ experience with:
- Big data pipeline development, monitoring, and support: ETL, SSIS, Hadoop, HDFS, Spark, Hive, RDDs, and UDFs
- Cloud data ecosystems: Spark APIs, Spark SQL, PySpark, Scala, Python, and data streaming
Demonstrated experience with data science tools: Python ML libraries, Scala, and Databricks