Responsibilities
Serve as an integral member of our Data Engineering team, responsible for the design and development of Big Data solutions
Partner with domain experts, product managers, analysts, and data scientists to develop robust Big Data pipelines in Hadoop or Snowflake environments
Deliver a data-as-a-service framework
Lead the migration of all legacy workloads to cloud platforms
Engage with key stakeholders to elicit and document requirements, including detailed data flow specifications
Assess appropriate solutions and collaborate with relevant teams to drive optimal implementations
Work with data scientists to build client pipelines using heterogeneous sources and provide essential engineering services for data science applications
Research and evaluate open-source technologies and components, recommending and integrating them into design and implementation efforts
Act as a technical expert, mentoring other team members on Big Data and Cloud technology stacks
Define comprehensive requirements for maintainability, testability, performance, security, quality, and usability across the data platform
Drive the implementation of consistent patterns, reusable components, and coding standards for all data engineering processes
Convert SAS-based pipelines to modern frameworks and languages such as PySpark and Scala for execution on Hadoop and non-Hadoop ecosystems
Optimize Big Data applications on both Hadoop and non-Hadoop platforms for peak performance
Evaluate new IT developments and evolving business requirements, recommending appropriate system alternatives and/or enhancements to current systems through analysis of business processes, systems, and industry standards
Appropriately assess risk when making business decisions, demonstrating consideration for the firm's reputation and safeguarding Citigroup, its clients, and assets.
Requirements
5+ years of experience with Hadoop and Big Data technologies
Demonstrated proficiency in Python, PySpark, and Scala, including practical experience with fundamental machine learning libraries
Experience in developing robust data solutions leveraging Google Cloud or AWS platforms; relevant certifications are preferred
Experience with SAS
Experience with containerization and related technologies (e.g., Docker, Kubernetes)
Comprehensive understanding of software engineering and data analytics
In-depth knowledge and hands-on experience with the Hadoop ecosystem and Big Data technologies (e.g., HDFS, MapReduce, Hive, Pig, Impala, Kafka, Kudu, Solr)
Knowledge of Agile (Scrum) development methodologies
Strong development and automation skills
System-level understanding of data structures, algorithms, distributed storage, and compute
A proactive approach to solving complex business problems, complemented by strong interpersonal and teamwork skills.
Tech Stack
AWS
Cloud
Docker
Hadoop
HDFS
Kafka
Kubernetes
MapReduce
PySpark
Python
Scala
Benefits
Medical, dental & vision coverage
401(k)
Life, accident, and disability insurance
Wellness programs
Paid time off packages, including planned time off (vacation), unplanned time off (sick leave), and paid holidays