WebstaurantStore is a leading online source for restaurant equipment and supplies. They are seeking a Senior Database Reliability Engineer to ensure the resilience and performance of enterprise database systems, focusing on data availability, automation, and security.
Responsibilities:
- Protect, preserve, and ensure the continuous availability of data
- Measure, analyze, and fine-tune database performance, ensuring optimal configuration and resource utilization
- Document and automate routine database tasks to reduce manual effort and improve operational efficiency
- Build, test, and maintain internal tools that empower application and database developers to manage their own data needs
- Gain proficiency in both in-house developed solutions and third-party tools used for database monitoring, management, and automation
- Assume responsibility for implementing and maintaining database security, backup strategies, and routine maintenance procedures
- Join the on-call rotation to respond to incidents, contribute to root cause analysis, and implement long-term fixes to prevent recurrence
- Implement new SQL Server features to improve performance, reliability, and scalability
- Respond to and resolve database performance incidents
- Identify project opportunities and lead them through implementation
- Provide mentorship and technical guidance to less experienced team members
Requirements:
- 5+ years of experience supporting Microsoft SQL Server in production environments
- Proven experience designing, configuring, and supporting Always On Availability Groups in production, including patching, upgrades, and failover testing
- Strong command of relational database theory and SQL Server internals (transactions/isolation, indexing & statistics, query plans, etc.)
- Experience creating runbooks and documentation to make work repeatable
- Ability to lead complex incident triage and drive post-incident improvements for database-related issues
- Strong attention to detail
- Access to a reliable and secure high-speed internet connection. Cable or fiber internet connections (at least 75 Mbps download/10 Mbps upload) are preferred, as satellite connections often cannot support the technologies used to perform day-to-day tasks
- Access to a home router and modem
- A dedicated home office space that is noise- and distraction-free. The space should have strong wireless connection or a wired Ethernet connection (wired connection is preferred, if possible)
- A valid physical address (including apartment or suite number, where applicable). PO Boxes are not accepted, as a physical address is required for you to receive your computer equipment
- The desire and ability to work and communicate with other team members via chat, webcam, etc.
- Legal residence in one of the following states: AK, AL, AR, AZ, CT, DE, FL, GA, IA, ID, IN, KS, KY, LA, MD, ME, MI, MN, MO, MS, NC, ND, NH, NM, NV, OH, OK, PA, SC, SD, TN, TX, UT, VA, VT, WI, WV, or WY. H-1B visa sponsorship is not available; W2 only
- Proficient with PowerShell for managing infrastructure, automating routine tasks, and developing custom scripts to support reliability engineering
- Hands-on experience with PostgreSQL and/or MariaDB for common admin tasks (users/roles, backups/restores, minor version upgrades) and query analysis (EXPLAIN/ANALYZE basics, index tuning)
- Skilled in Git workflows, including pull requests, conflict resolution, and CI/CD integration
- Proficient in operating scheduling/orchestration tools for database and system tasks, including monitoring, alerting, and failure handling, to streamline operations and reduce human intervention
- Strong experience with Distributed Availability Groups and additional HA/DR technologies (failover clustering, log shipping, replication), including conducting regular DR exercises
- Advanced T-SQL skills including designing schemas, writing and optimizing complex queries, interpreting execution plans, and resolving issues like parameter sniffing, blocking, and deadlocking
- Experience defining and advancing security and compliance strategies for databases, including least privilege, auditing, encryption in transit/at rest, and enterprise-grade secrets management to meet organizational and regulatory requirements
- Working knowledge of at least one general-purpose language (e.g., Python, C#, Go) for tooling and automation
- Experience implementing and refining database observability and alerting strategies to support reliability improvements and ensure adherence to service-level objectives
- Ability to perform capacity planning and cost/performance optimization across on-prem and/or managed services