Role: Platform Engineer
Location: Berkeley Heights, NJ
BNPL (Buy Now, Pay Later) domain experience is a must.
Candidates should have hands-on experience with AWS, Kubernetes, Terraform, and core cloud networking, as well as familiarity with modern AI-enabled engineering tools. These capabilities are critical to building and efficiently managing the platform.
The main responsibilities for this role include:
- Designing, building, and operating scalable platform infrastructure on AWS.
- Managing and supporting Amazon EKS clusters across production and non-production environments.
- Administering and optimizing AWS services such as Amazon DynamoDB, Amazon RDS, Amazon CloudWatch, Amazon Route 53, AWS Secrets Manager, and AWS Certificate Manager.
- Configuring Kubernetes components, including ingress controllers, namespaces, networking, and workload deployments.
- Implementing and maintaining ingress patterns using Kubernetes ingress controllers and AWS Application Load Balancers (ALB).
- Managing DNS and hosted zones, and routing traffic with Amazon Route 53.
- Handling secrets, credentials, and secure application configuration using AWS Secrets Manager.
- Provisioning, renewing, and managing TLS/SSL certificates via AWS Certificate Manager for secure application and ingress communication.
- Developing, maintaining, and improving Infrastructure as Code with Terraform and Terraform Enterprise.
- Supporting AWS networking components, including VPCs, subnets, route tables, internet gateways, NAT gateways, and security groups.
- Configuring and troubleshooting network connectivity between applications, clusters, databases, and external endpoints using tools such as ping, curl, telnet, traceroute, nslookup, and dig, along with port-level validation.
- Reviewing and maintaining network access controls, security group rules, and routing paths to ensure secure and reliable communication.
- Troubleshooting platform, infrastructure, DNS, certificate, secret management, load balancing, and network-related issues across environments.
- Monitoring platform health, creating dashboards, defining alerts, and improving observability using Amazon CloudWatch.
- Partnering with development and security teams to enhance CI/CD, platform reliability, and compliance.
- Supporting incident response, root cause analysis, and operational readiness activities.
- Driving automation for platform operations, environment creation, upgrades, patching, and recovery procedures.
- Leveraging AI-assisted engineering tools for scripting, automation, troubleshooting, documentation, and operational efficiency.
- Evaluating opportunities for AI-powered tooling to improve platform support workflows, knowledge management, and incident response.
- Maintaining technical documentation, architecture diagrams, runbooks, and operational standards.