Veritis Group Inc is seeking a Site Reliability Engineer (SRE) Developer. The role involves ensuring system reliability and stability through monitoring and supporting batch jobs, collaborating with teams, and participating in incident management processes.
Responsibilities:
- Monitor batch flow to ensure system reliability and stability
- Handle batch production incidents and escalations promptly
- Create and support batch plans for both planned and unplanned outages
- Improve alert quality and reduce noise in monitoring systems
- Provide support for batch jobs in a 24x7 shift model
- Collaborate with onshore and offshore teams to ensure effective communication and coordination
- Participate in incident, problem, and change management processes
- Conduct root cause analysis (RCA) and post-incident reviews
- Support production release and change validation efforts
Requirements:
- 8-10 years of proven experience with a strong focus on availability, reliability, and performance
- Minimum of 5 years of experience with Unix commands and shell scripting
- At least 5 years of experience working with Informatica, including the ability to create mappings
- Minimum of 3 years of experience and proficiency with BigQuery, Cloud Spanner, Airflow, and monitoring and logging tools
- Experience writing queries in MS SQL
- Knowledge of stored procedures and batch job support with PL/SQL
- Proficiency in writing queries in Snowflake
- Experience with at least one scheduling tool (e.g., Control-M, Tidal)
- Familiarity with incident, problem, and change management processes
- Ability to use automation and generative AI to minimize manual operational effort
- Ability to develop scripts and dashboards for monitoring, alert analysis, and reporting