Netflix is a leading entertainment company that pushes the boundaries of storytelling and technology. They are seeking a Software Engineer L5 to lead the automation and validation of their edge platform, ensuring the reliability and performance of their content delivery systems.

Responsibilities:

Own and build scalable testing infrastructure and end-to-end automated validation for edge appliances, covering functional, resiliency, performance, and upgrade and rollback testing. This work calls for high reliability, strong observability, and clear release gates, including tests that validate the platform can meet scaling and performance requirements under production-like workloads
Improve failure triage with AI-assisted tooling that reduces time-to-detection and time-to-resolution
Lead and mentor engineers building and maintaining test automation and release qualification
Partner with OS, security, hardware, and application teams to ensure validation keeps pace with rapid product development
Debug complex regressions across hardware/firmware/OS boundaries and collaborate cross-functionally to drive fixes to resolution
Build dashboards and alerting for regression detection, performance drift, and release readiness

Requirements:

10+ years of software engineering experience (or equivalent depth), including ownership of CI/CD systems and architecting large-scale test automation
Strong coding ability in Python, Rust, and/or Go, with comfort writing shell scripts
Deep hands-on experience with Linux and/or FreeBSD in systems contexts (boot, networking, and storage)
Strong ability to design, build, and operate cloud services that support CI/CD and test automation, including maintaining service reliability, scalability, observability, and cost efficiency
Experience designing automated test frameworks for reliability, performance, hardware-in-the-loop, and integration testing
Proven ability to provide technical leadership across teams through setting standards, mentoring, and owning roadmaps
Experience with modern CI systems and build and release pipelines, such as GitHub Actions, Jenkins, or similar tools
Strong debugging skills across distributed systems and low-level systems boundaries using logs, metrics, tracing, and performance tooling
Proficiency working on highly distributed systems
Experience using AI tools for operational triage (log clustering, anomaly detection), with a pragmatic approach (guardrails, fallback paths, auditability)
Experience with performance tooling: perf, flame graphs, bpftrace/eBPF, DTrace (FreeBSD), fio, network benchmarking
Experience contributing to and collaborating with relevant open-source communities
Experience with hardware lab automation and fleet provisioning workflows such as PXE boot, imaging, remote power control, serial console access, and rack automation
Experience with incident response practices including postmortems, root cause analysis, and driving preventative engineering actions
Experience validating BIOS and firmware behavior, managing firmware rollouts, and working with hardware vendors on platform issues

Software Engineer L5 - Edge Platform Automation & Validation Technical Lead
