KPG99 INC is seeking a Data Pipeline Reliability Engineer to ensure the stability and reliability of mission-critical workflows. The role involves diagnosing and resolving issues, collaborating with stakeholders to improve workflows, and creating documentation to share best practices across the organization.
Responsibilities:
- Develop a deep understanding of products and operational processes
- Go on call, responding quickly and effectively to mission-critical incidents
- Diagnose, resolve, and proactively prevent issues encountered in the field
- Collaborate with internal stakeholders to increase the scalability and reliability of Foundry workflows for our customers
- Identify recurring pain points and inefficiencies, and take initiative to automate or streamline workflows
- Advocate for and implement product enhancements based on insights gleaned from the field
- Create clear, actionable documentation and share best practices to elevate team and company-wide reliability
Requirements:
- Proficiency in Python, Java, and SQL
- Familiarity with parallel data processing and Spark job optimization
- Strong organizational skills and attention to detail, with the ability to prioritize effectively
- Resourcefulness and creativity in fast-paced, dynamic environments
- Ability to work independently and collaboratively to solve ambiguous technical and operational challenges
- Experience with root cause analysis and documenting solutions for broader impact
- Enthusiasm for hands-on problem solving, continuous improvement, and knowledge sharing
- Background in Computer Science, Engineering, Information Systems, or another technical field