Role Overview
Key Responsibilities
- Production Reliability & Guardrails: Partner with the Platform Engineering team to implement reliability guardrails, ensuring applications running on AWS meet strict uptime and SLA requirements.
- CI/CD & Repository Management: Own the deployment pipelines and code management practices, working primarily in GitHub.
- Incident Management: Lead rapid-response troubleshooting during production incidents; conduct thorough blameless post-mortems to continuously harden our systems.
- Observability & Performance: Implement advanced monitoring, logging, and alerting systems to proactively detect and mitigate system anomalies.
- Cross-Border Collaboration: Act as a key technical bridge between our US operations and international engineering hubs, leveraging bilingual communication to streamline complex technical alignment.
Requirements
1. Technical Focus
- Ecosystem Expertise (Must-Haves): Deep, practical experience managing application deployments and runtime environments on AWS, along with expert-level knowledge of advanced Git workflows and GitHub Actions.
- Core Toolkit: Strong proficiency in monitoring tools, log management, and scripting for quick triaging and troubleshooting.
2. Soft Skills & Characteristics
- Ownership & Transparency: You are radically open, highly responsive, and communicative. You don't just clear tickets; you own the production environment's health end-to-end.
- Resilience Under Pressure: You stay positive during smooth operations and bring a healthy sense of urgency and sharp focus to high-stakes incidents.
- Bilingual Capability: Full fluency in Mandarin and English (spoken and written) is required for effective technical alignment across our global teams.
Tech Stack
Benefits
- Competitive base salary + equity packages aligned with California market standards.