
Role – SRE
Location – Montreal, Canada (hybrid, 3 days onsite)
Duration – Long Term
Primary Responsibilities:
• Handle production management services, including end-user support, systems monitoring, incident management, problem management, plant management, and event management.
• Build the extensive business and application knowledge required to support client-facing applications.
• Diagnose and resolve application issues to ensure optimal performance and usability.
• Identify and implement automation to reduce toil, improve efficiency and eliminate customer impact.
• Provide root cause analysis with recommendations for improvements.
• Configure application monitors using industry-standard monitoring tools, and develop customized monitoring solutions.
• Interface with clients and other technology teams to provide governance and control around the production environment.
• Manage and drive outage calls and significant incidents; coordinate communications with financial controllers and IT groups.
• Act as a primary escalation and communication point between application development teams and business units.
Required Skills:
• Strong coding/scripting skills in at least two of: Python, Perl, Shell.
• Strong knowledge of scheduling tools such as AutoSys.
• Deep understanding of database concepts, SQL queries, and database performance.
• Strong debugging and problem solving skills.
• Good understanding of infrastructure (Windows/Unix) and of networking concepts and protocols.
• Exposure to highly distributed, high-availability, fault-tolerant systems.
Nice to Have:
• Git / Jenkins / Ansible / Grafana
• Splunk / Prometheus / Loki
• Agile methodologies (Scrum, Kanban)
• Familiarity with Java and web/client-server applications
• Familiarity with the principles of System/Site Reliability Engineering (SRE).
• Familiarity with enterprise tools such as AppDynamics, Grafana, Splunk, and OpenTelemetry (OTEL).
Diksha Yadav
InfiCare Staffing | 22375 Broderick Drive #225 Dulles, VA 20166