Dimensional Fund Advisors is a global investment firm that applies advanced financial science to investment strategies. They are seeking a strategic, hands-on Manager of Site Reliability Engineering to lead a global team of SREs, ensuring the stability and performance of their data platforms while collaborating closely with engineering and other departments.
Responsibilities:
- Manage a global team of SREs, driving professional growth and operational excellence through coaching and mentorship
- Own our monitoring strategy and keep the team apprised of our services’ health and performance indicators through dashboards and alerts
- Lead capacity planning and headroom management for the team's infrastructure to ensure we scale effectively
- Collaborate with product and engineering teams to negotiate and manage error budgets, SLOs and SLIs
- Drive the standardization of approaches, logging practices, and observability across the organization
- Develop a strategy for intuitively navigable documentation and oversee its implementation, ensuring all our existing and future products are sufficiently covered
- Act as the primary liaison between SRE, TPMs, DevOps, development teams and business stakeholders
- Negotiate investments in our solutions to enhance their supportability
- Relentlessly pursue opportunities to eradicate toil through automation
- Build confidence in deployments through enhanced data quality assurance processes, consistently coordinated deployments and automated testing
- Lead the debugging, troubleshooting, diagnosis, and resolution of incidents, ensuring rapid response and effective post-mortems
Requirements:
- Deep expertise in ELK, Prometheus, and Grafana
- Proficiency in Python-based service development, Linux administration, and CI/CD
- Experience with data flows using Airflow, dbt and Snowflake
- Capability to write and run automated tests
- Experience running software projects from ideation through design, implementation, deployment and operations
- Demonstrated ability to be self-organized and self-driven with strong communication skills to influence cross-functional partners at all levels