Kforce Inc is supporting a client seeking a Network Site Reliability Engineering contractor to enhance automation and observability within its global Network Engineering organization. The role focuses on applying SRE principles to networking to improve reliability and scalability while reducing operational toil.
Responsibilities:
- Design and implement network automation to reduce manual operations and improve reliability
- Enhance observability across monitoring, alerting, logging, and telemetry
- Build and maintain automated operations platforms and self-healing workflows
- Leverage AI tools and agentic workflows to optimize network operations
- Design and maintain Model Context Protocol (MCP) servers to expose network APIs, inventory, and state data to LLMs
- Assist with M&A integration activities
- Collaborate with cross-functional engineering and operations teams
Requirements:
- Strong proficiency in Python
- Experience with AI tools, MCP, and RAG architectures
- CI/CD pipelines
- Infrastructure as Code (IaC)
- SQL
- Experience building scalable, automated systems
- Elasticsearch, Kafka
- Grafana, Prometheus
- Network automation and observability platforms (e.g., SolarWinds)
- ServiceNow integrations
- Agentic AI and event-driven/self-healing system design
- Enterprise networking background (switches, firewalls, Wi-Fi)