Nash is building the logistics infrastructure for the internet and is seeking a Staff Infrastructure and Performance Engineer to improve the performance, reliability, and scalability of its core infrastructure. The role involves designing low-latency systems and leading performance engineering for major retailers' business-critical workflows.
Responsibilities:
- Own infrastructure performance and reliability across Nash’s production systems, with a focus on low latency, high throughput, and predictable behavior under load
- Design, build, and optimize AWS-based infrastructure, leveraging managed services with a strong emphasis on ECS/Fargate
- Lead Postgres performance engineering, including query optimization, indexing strategies, connection management, replication, cluster design, and failover
- Architect and operate multi-region, highly available systems with strong resiliency, disaster recovery, and failover guarantees
- Design and evolve enterprise-grade CI/CD pipelines that support safe, repeatable, and fast deployments across environments and regions
- Drive observability standards (metrics, logs, tracing, SLOs) and use data to proactively identify and eliminate performance bottlenecks
- Partner with application engineers to influence system design decisions that impact scalability, latency, and reliability
- Lead incident response and postmortems, focusing on root cause analysis, systemic fixes, and long-term resilience
- Set infrastructure and performance best practices and mentor engineers across the organization
Requirements:
- 6+ years of experience building and operating high-scale production infrastructure for business-critical systems
- Deep expertise in AWS, including networking, compute, storage, and managed services
- Hands-on experience running production workloads on ECS/Fargate at scale
- Strong background in Postgres, including performance tuning, replication, high availability, and operational excellence
- Proven experience designing and operating multi-region architectures with strict uptime and reliability requirements
- Strong understanding of CI/CD for enterprise deployments, including rollout strategies, environment isolation, and rollback safety
- Experience building low-latency systems where milliseconds matter
- Excellent debugging and systems-level problem-solving skills
- Ability to operate autonomously and lead technical initiatives in a fast-paced startup environment