
Job Description
Extensive experience with AWS and with the design, implementation, and maintenance of data pipelines using Java-based Spark and PySpark.
Proficient in SQL, able to write and execute complex queries to curate data and build the single- and multi-dimensional views required by end users.
Proven experience in performance tuning to ensure jobs run at optimal levels with no performance bottlenecks.
Advanced proficiency in cloud data warehouses such as Snowflake and AWS Redshift.
Demonstrated knowledge of software applications and technical processes within a technical discipline (e.g., cloud, artificial intelligence, machine learning, mobile).
Solid understanding of agile methodologies and practices such as CI/CD, application resiliency, and security.
Proficiency in data structures; data serialization formats such as JSON, Avro, or Protobuf; big-data storage formats such as Parquet or Iceberg; and data processing methodologies such as batch, micro-batch, and streaming.
Experience with one or more data modelling techniques such as Dimensional, Data Vault, Kimball, or Inmon, and with Agile methodology.

Role Responsibilities
Supports review of controls to ensure sufficient protection of enterprise data.
Advises on and makes custom configuration changes in one or two tools to generate a product at the business's or customer's request.
Updates logical or physical data models based on new use cases.
Frequently uses SQL and understands NoSQL databases and their niche in the marketplace.
Adds to team culture of diversity, opportunity, inclusion, and respect.
Develop enterprise data models.
Design, develop, and maintain large-scale data processing pipelines and their infrastructure (a brief illustrative sketch appears at the end of this description).
Lead code reviews and provide mentoring through the process.
Drive data quality.
Ensure data accessibility for analysts and data scientists.
Ensure compliance with data governance requirements.
Ensure business alignment, so that data engineering practices align with business goals.
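For context, the lines below are a minimal, hypothetical PySpark sketch of the kind of batch curation pipeline and SQL-built views described above. The S3 paths, column names, and the daily_sales aggregate are illustrative assumptions only, not part of the role's actual systems.

    # Hypothetical PySpark batch-curation sketch; paths and columns are illustrative.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("curation-sketch").getOrCreate()

    # Read raw Parquet data landed by an upstream AWS job (illustrative path).
    raw = spark.read.parquet("s3a://example-bucket/raw/sales/")

    # Light curation: drop duplicates and derive a date column from a timestamp.
    curated = (
        raw.dropDuplicates(["order_id"])
           .withColumn("order_date", F.to_date("order_ts"))
    )

    # Expose the curated data to SQL and build a simple aggregate view for end users.
    curated.createOrReplaceTempView("sales_curated")
    daily_sales = spark.sql("""
        SELECT order_date, region, SUM(amount) AS total_amount
        FROM sales_curated
        GROUP BY order_date, region
    """)

    # Persist the result as Parquet for downstream consumers (e.g., Snowflake or Redshift loads).
    daily_sales.write.mode("overwrite").parquet("s3a://example-bucket/curated/daily_sales/")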