Infinite Computer Solutions is seeking a Senior Distributed Database & Systems Engineer to design, architect, and implement distributed database solutions. The role involves leading the administration and optimization of PostgreSQL clusters and managing multi-region cloud database deployments while ensuring high availability and performance.
Responsibilities:
- Design, architect, and implement distributed database solutions using YugabyteDB (PostgreSQL- and Cassandra-compatible) to support high availability, scalability, and geo-distributed workloads
- Lead administration and optimization of PostgreSQL clusters, including replication, failover, backup/recovery, and performance tuning for large-scale production environments
- Evaluate and implement data distribution strategies such as sharding, partitioning, and replication to ensure consistency, fault tolerance, and low-latency access
- Architect and manage multi-region and hybrid cloud database deployments across platforms such as AWS, Azure, or GCP
- Analyze and resolve complex system and database performance issues by leveraging expertise across Linux/Unix systems and database internals
- Define and enforce database reliability standards, including Service Level Objectives (SLOs), monitoring, alerting, and incident response frameworks
- Lead database modernization initiatives, migrating legacy systems to cloud-native distributed platforms
- Collaborate with application and platform teams to design scalable data models and integration strategies aligned with microservices architecture
- Evaluate trade-offs between SQL (PostgreSQL) and NoSQL (Cassandra-style) approaches based on workload characteristics and business requirements
- Implement and maintain automated deployment, configuration management, and CI/CD pipelines for database infrastructure
- Drive capacity planning, performance benchmarking, and cost optimization for distributed database environments
- Ensure data security, compliance, and governance through encryption, access control, and auditing mechanisms
- Provide technical leadership and mentorship to engineering teams on distributed systems and database best practices
- Lead root cause analysis (RCA) and post-incident reviews to improve system resilience and prevent recurrence
- Support 24/7 production operations, ensuring adherence to SLAs and rapid resolution of critical issues