Monitor system performance data and alerts through monitoring tools, identify abnormal metrics, assess system risks, and notify the relevant colleagues so that issues are resolved.
Perform emergency system recovery, execute related contingency plans, and minimize business losses.
Address and resolve system issues and queries from local stakeholders.
Collaborate with team members in other regions to ensure the stability of JD's European e-commerce platform system.
Summarize system issues and solutions to continuously improve system stability.
Requirements
Bachelor's degree in Computer Science, Software Engineering, or a related field.
3+ years of experience in DevOps, site reliability engineering (SRE), system stability assurance, operations and maintenance development, or related fields.
Experience with common public cloud platform products (e.g., cloud hosting, cloud storage, object storage, and CDN) and proficiency in containerization technologies such as Docker and Kubernetes.
Familiarity with Linux operating systems and common commands, and with a scripting or programming language such as Shell, Python, or Go.
Knowledge of mainstream monitoring tools such as Prometheus, Grafana, and Zabbix.
Experience in Java microservices architecture development or operations, with expertise in Java memory tuning and performance optimization.
Experience with common middleware, including but not limited to MySQL, Kafka, Elasticsearch, and Redis.