PeopleFinders.com is a premier online service for locating and verifying people and businesses. They are seeking a highly skilled Site Reliability Engineer (SRE) to join their infrastructure team, focusing on maintaining and optimizing cloud infrastructure and ensuring platform performance and security.
Responsibilities:
- Architect, maintain, and troubleshoot AWS cloud infrastructure, including VPCs, subnets, routing, security groups, and other cloud-native networking components
- Manage and optimize Kubernetes environments running Dockerized applications, ensuring reliability, performance, and scalability
- Build, maintain, and improve GitLab CI/CD pipelines for automated testing, deployments, and infrastructure workflows
- Configure and manage CDNs, particularly Cloudflare, and implement advanced bot mitigation and anti-scraping controls
- Implement cloud-focused monitoring, alerting, and observability using modern tooling (e.g., Prometheus, Grafana, CloudWatch)
- Collaborate with engineering teams to support deployments, diagnose issues, and enhance production readiness
- Improve infrastructure reliability and operability through automation, GitOps practices, and infrastructure-as-code
- Investigate production incidents with urgency and deliver corrective actions under tight timelines
- Work across systems, cloud networking, CI/CD, and security in a fast-moving, small-team environment
- Own complex infrastructure projects end-to-end with minimal oversight
Requirements:
- 4+ years of experience as an SRE, DevOps Engineer, or Systems Engineer
- Strong hands-on experience with Linux (Ubuntu/CentOS) and Windows Server administration
- Proficiency with Docker and containerized application architectures
- Deep experience building and maintaining GitLab CI/CD pipelines
- Solid understanding of CDNs (Cloudflare, Fastly, Akamai, etc.) and techniques for blocking or mitigating scraping/bot traffic
- Strong networking fundamentals (DNS, TLS, routing, firewalls, load balancing)
- Familiarity with infrastructure security and best practices for securing public-facing systems
- Experience with monitoring and logging tools (BetterStack, Datadog, Prometheus, Grafana, ELK, etc.)
- Ability to work in a small team, adapt quickly, and manage multiple priorities
- Strong sense of ownership and the ability to deliver results under tight timelines
- Experience with Kubernetes or other container orchestration platforms
- Knowledge of IaC tools such as Terraform, Ansible, or Pulumi
- Scripting or programming experience in Bash, Python, or Go
- Experience with WAFs, bot mitigation systems, and rate-limiting strategies
- Familiarity with Zero Trust concepts or modern access-management systems
- Hands-on experience with Cloudflare, including firewall rules, bot management, Workers, and advanced caching strategies
- Experience securing and protecting high-value web applications from scraping, automated attacks, and data-extraction threats
- Demonstrated success working in a fully remote or distributed engineering team
- Background supporting organizations with fully cloud-managed infrastructure
- Excellent communication and cross-team collaboration skills
- Ability to work independently and make sound technical decisions
- Strong problem-solving abilities and eagerness to dive into complex issues
- Flexibility to pivot technologies or priorities as business needs evolve