Ryzlink Corporation is seeking a Big Data Engineer to design, develop, and maintain scalable big data pipelines and data processing systems. The role involves working with large-scale data using Hadoop ecosystem tools and building distributed data processing applications in Apache Spark and Scala.
Responsibilities:
- Design, develop, and maintain scalable big data pipelines and data processing systems
- Work with large-scale data using Hadoop ecosystem tools such as Hive, Pig, and Oozie
- Develop distributed data processing applications using Apache Spark and Scala
- Build and optimize ETL pipelines for structured and unstructured data
- Use Spark RDD APIs for large-scale data transformations and analytics
- Implement data streaming solutions using Kafka for real-time data processing
- Collaborate with cross-functional teams including data analysts, architects, and business stakeholders
- Deploy and manage big data solutions on AWS or GCP cloud platforms
- Ensure the performance, scalability, and reliability of data systems
- Maintain documentation, follow best practices, and uphold high code-quality standards
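As a rough illustration of the RDD-style, functional transformations this role centers on, here is a minimal sketch in plain Python that mirrors Spark's `map`/`filter`/`reduceByKey` pattern without requiring a Spark cluster (the log lines, field layout, and variable names are hypothetical, chosen only for the example):

```python
from collections import defaultdict

# Hypothetical raw log lines in the form "user_id,action,bytes"
lines = [
    "u1,click,120",
    "u2,view,300",
    "u1,click,80",
    "u3,view,0",
]

# map: parse each line into a (key, value) pair
pairs = [(line.split(",")[0], int(line.split(",")[2])) for line in lines]

# filter: drop records with an empty payload
pairs = [(user, nbytes) for user, nbytes in pairs if nbytes > 0]

# reduceByKey: aggregate bytes per user, as Spark would do across partitions
totals = defaultdict(int)
for user, nbytes in pairs:
    totals[user] += nbytes

print(dict(totals))  # {'u1': 200, 'u2': 300}
```

In Spark itself the same pipeline would be expressed as chained calls on an RDD (`sc.textFile(...).map(...).filter(...).reduceByKey(...)`), with the aggregation distributed over the cluster rather than done in a local dictionary.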
Requirements:
- 10+ years of experience in Big Data and Data Engineering
- Strong hands-on experience with Hadoop ecosystem (Hive, Pig, Oozie)
- Expertise in Apache Spark and Scala programming
- Hands-on programming experience in Python and Core Java
- Strong understanding of Spark RDD APIs and distributed computing
- Experience with Kafka or other streaming frameworks
- Solid understanding of Data Warehousing concepts and Data Modeling techniques
- Strong proficiency in SQL and working with large datasets
- Experience working in Linux/Unix environments
- Experience with AWS or GCP cloud platforms
- Knowledge of ETL design and development
- Experience with data visualization and analytics tools such as Tableau or R
- Experience with real-time data processing architectures
- Exposure to CI/CD pipelines and DevOps practices
- Experience with performance tuning and optimization in Spark/Hadoop environments