PeopleFinders.com is a premier online service for locating and verifying people and businesses. They are seeking a highly skilled Site Reliability Engineer (SRE) to join their infrastructure team, focusing on maintaining and optimizing cloud infrastructure and ensuring platform performance and security.

Responsibilities:

Architect, maintain, and troubleshoot AWS cloud infrastructure, including VPCs, subnets, routing, security groups, and other cloud‑native networking components
Manage and optimize Kubernetes environments running Dockerized applications, ensuring reliability, performance, and scalability
Build, maintain, and improve GitLab CI/CD pipelines for automated testing, deployments, and infrastructure workflows
Configure and manage CDNs—particularly Cloudflare—and implement advanced bot mitigation and anti‑scraping controls
Implement cloud-focused monitoring, alerting, and observability using modern tooling (e.g., Prometheus, Grafana, CloudWatch)
Collaborate with engineering teams to support deployments, diagnose issues, and enhance production readiness
Improve infrastructure reliability and operability through automation, GitOps practices, and infrastructure-as-code
Investigate production incidents with urgency and deliver corrective actions under tight timelines
Work across systems, cloud networking, CI/CD, and security in a fast-moving, small-team environment
Own complex infrastructure projects end-to-end with minimal oversight

Requirements:

4+ years of experience as an SRE, DevOps Engineer, or Systems Engineer
Strong hands-on experience with Linux (Ubuntu/CentOS) and Windows Server administration
Proficiency with Docker and containerized application architectures
Deep experience building and maintaining GitLab CI/CD pipelines
Solid understanding of CDNs (Cloudflare, Fastly, Akamai, etc.) and techniques for blocking or mitigating scraping/bot traffic
Strong networking fundamentals (DNS, TLS, routing, firewalls, load balancing)
Familiarity with infrastructure security and best practices for securing public-facing systems
Experience with monitoring and logging tools (Better Stack, Datadog, Prometheus, Grafana, ELK, etc.)
Ability to work in a small team, adapt quickly, and manage multiple priorities
Strong sense of ownership and the ability to deliver results under tight timelines
Experience with Kubernetes or orchestration platforms
Knowledge of IaC tools such as Terraform, Ansible, or Pulumi
Scripting/programming in Bash, Python, or Go
Experience with WAFs, bot mitigation systems, and rate-limiting strategies
Familiarity with Zero Trust concepts or modern access-management systems
Hands-on experience with Cloudflare, including firewall rules, bot management, Workers, and advanced caching strategies
Experience securing and protecting high-value web applications from scraping, automated attacks, and data-extraction threats
Demonstrated success working in a fully remote or distributed engineering team
Background supporting organizations with fully cloud-managed infrastructure
Excellent communication and cross-team collaboration skills
Ability to work independently and make sound technical decisions
Strong problem-solving abilities and eagerness to dive into complex issues
Flexibility to pivot technologies or priorities as business needs evolve
