Rithum™ is the world’s most trusted commerce network, accelerating how brands, suppliers, and retailers work together to deliver seamless e-commerce experiences. As a Senior Database Reliability Engineer, you will manage database systems, enhance observability, and lead projects while fostering a culture of collaboration and continuous learning within the team.
Responsibilities:
- Ensure maximum availability and reliability of mission-critical database systems across hybrid infrastructure
- Design, implement, and maintain SQL Server Always On Availability Groups, clustering, and replication topologies
- Lead major database upgrade initiatives and modernization efforts
- Support other engineers and teams in their use of database systems
- Continuously improve observability across all database systems using telemetry, performance analysis, and proactive monitoring
- Continuously improve processes by automating operational workflows with PowerShell, Python, and CI/CD tooling
- Ensure all data is protected and secure
- Participate in our on-call rotation
- Troubleshoot and tune high-load production systems, including complex performance and replication issues
- Lead technical response during high-severity incidents and conduct root cause analysis
- Ensure database security, backup integrity, and disaster recovery readiness
- Contribute to the development of best practices for database engineering and reliability
- Collaborate cross-functionally to design scalable, resilient data architectures
- Mentor team members
Requirements:
- 3+ years of hands-on experience managing database systems
- Basic understanding of multi-tenant, database-driven applications
- Familiarity with common data storage technologies and use cases
- Proficiency in a common high-level language such as PowerShell or Python
- Experience designing, implementing, and supporting SQL Server Always On Availability Groups, clustering, and transactional replication
- Experience leading and executing major upgrade initiatives (e.g., SQL Server version upgrades, compatibility level transitions)
- Experience owning disaster recovery validation and failover strategy execution
- Experience investigating and resolving complex replication and distribution database failures
- 5+ years of hands-on experience administering production database systems at scale
- Bachelor's degree in a related field
- Experience with database high-availability technologies
- Experience with MongoDB (including replica sets and Atlas), DynamoDB, or other distributed NoSQL platforms
- Familiarity with cloud-native database architectures (AWS preferred)
- Exposure to data platform modernization initiatives (e.g., migration to newer engine versions, consolidation, cloud adoption)
- Knowledge of relational database design concepts in an OLTP environment
- Experience managing database systems in a cloud environment
- Strong problem-solving and analytical skills
- Excellent communication and teamwork abilities
- Commitment to continuous learning and professional development