Nash is building the logistics infrastructure for the internet and is seeking a Staff Infrastructure and Performance Engineer to improve the performance, reliability, and scalability of its core infrastructure. The role involves designing low-latency systems and leading performance engineering for business-critical workflows for major retailers.

Responsibilities:

Own infrastructure performance and reliability across Nash’s production systems, with a focus on low latency, high throughput, and predictable behavior under load
Design, build, and optimize AWS-based infrastructure, leveraging managed services with a strong emphasis on ECS/Fargate
Lead Postgres performance engineering, including query optimization, indexing strategies, connection management, replication, cluster design, and failover
Architect and operate multi-region, high-availability systems with strong resiliency, disaster recovery, and failover guarantees
Design and evolve enterprise-grade CI/CD pipelines that support safe, repeatable, and fast deployments across environments and regions
Drive observability standards (metrics, logs, tracing, SLOs) and use data to proactively identify and eliminate performance bottlenecks
Partner with application engineers to influence system design decisions that impact scalability, latency, and reliability
Lead incident response and postmortems, focusing on root cause analysis, systemic fixes, and long-term resilience
Set infrastructure and performance best practices and mentor engineers across the organization

Requirements:

6+ years of experience building and operating high-scale, production infrastructure for business-critical systems
Deep expertise in AWS, including networking, compute, storage, and managed services
Hands-on experience running production workloads on ECS/Fargate at scale
Strong background in Postgres, including performance tuning, replication, high availability, and operational excellence
Proven experience designing and operating multi-region architectures with strict uptime and reliability requirements
Strong understanding of CI/CD for enterprise deployments, including rollout strategies, environment isolation, and rollback safety
Experience building low-latency systems where milliseconds matter
Excellent debugging and systems-level problem-solving skills
Ability to operate autonomously and lead technical initiatives in a fast-paced startup environment
