New York Life is a Fortune 100 mutual company with a legacy of purpose and integrity. They are seeking a Senior Associate - Infrastructure Platform & Security Engineer to own the platform operating system standards and govern the image artifacts for hybrid environments, ensuring compliant and repeatable builds at scale.
Responsibilities:
- Research and download all patches for the Compute environment
- Test each patch to confirm that it resolves its intended vulnerability or issue
- Bundle the vendor patches and release them to the team for non-prod deployment; be available to resolve issues before, during, and after production release
- If a vendor releases a critical patch during or between patch cycles, immediately research the vulnerability, test the patch, and prepare it for an out-of-band patch cycle if necessary
- Define and maintain cross‑platform OS standards for Linux and Windows (configuration baselines, hardening, packages, services, logging, time sync, and required agents)
- Engineer hardened/certified image artifacts: install/base images, on‑prem VM templates, AWS AMIs for EC2, node images, and container base images
- Coordinate certification and security sign‑off for image releases (CIS-aligned hardening, approved crypto settings, certificates, and required controls)
- Maintain image versioning, release notes, and lifecycle (deprecation, end-of-support posture, and upgrade paths) with clear consumer guidance
- Ensure that engineering, design, server build, configuration, and other related documentation is present, up to date, and easily retrievable
- Own and evolve Terraform modules that implement the standard “golden path” for provisioning compliant OS platforms across environments
- Design modules to be reusable, opinionated, and safe-by-default (networking hooks, identity integrations, logging/monitoring, secrets handling, tagging/metadata)
- Enable Git-based workflows and CI/CD for module promotion and consumption at scale (testing, validation, approvals, and rollback patterns)
- Implement and operate guardrails/enforcement to prevent drift from OS standards (policy-as-code, validations, and automated compliance checks)
- Define and run the exception workflow: intake, risk assessment, approvals, time-bound waivers, tracking, and remediation plans
- Partner with Security, IAM, and Risk teams to ensure governance, auditability, and evidence collection for standards adoption
- Plan and execute rollout sequencing for new standards and image releases (pilot → early adopters → broad rollout), minimizing operational risk
- Operate production support for golden path platforms, including incident response, root cause analysis, and continuous improvements to reduce repeat issues
- Establish runbooks, operational procedures, and communications for consumers and platform operators
- Define and implement monitoring and dashboards for image/standard adoption, compliance status, and drift detection across Linux, Windows, EC2/AMI, and container bases
- Integrate telemetry with enterprise monitoring to provide proactive alerting and visibility for stakeholders and operations
- Partner with the technology team to execute the standard golden path at scale, aligning on implementation patterns, operational handoffs, and support models
- Collaborate with application teams, cloud platform teams, and infrastructure engineering to onboard workloads to the golden path
- Provide technical leadership and mentorship, driving adoption through clear documentation, training, and stakeholder engagement
Requirements:
- 7+ years of experience engineering and operating enterprise OS platforms across Linux and Windows in mission-critical, hybrid environments
- Proven expertise building and maintaining hardened/certified images (VM templates, EC2 AMIs, node images, container base images) and operating image build pipelines (e.g., Packer or equivalent)
- Strong Terraform skills (module design, versioning, testing, promotion) with the ability to deliver opinionated “golden path” modules for broad adoption; familiarity with Ansible and automation at scale
- Working knowledge of AWS compute patterns (EC2/AMI), IAM, logging/monitoring integrations, and tagging/metadata standards; exposure to Azure/Oracle Cloud and hybrid operations
- Experience implementing policy-as-code guardrails (validation, drift detection, compliance scanning) and running structured exception/waiver workflows
- Strong grounding in networking (TCP/IP, DNS, HTTP/S), storage (SAN/NAS/local/filesystems), HA/resiliency, and virtualization (VMware/UCS)
- Excellent incident/change discipline, clear communication with technical and non-technical stakeholders, and the ability to partner with ETS and cross-functional teams to execute standards at scale