Participate in core production management functions, including incident, change, and problem management, working closely with cross‑functional teams on metrics-based tracking, planning, risk mitigation, and certification support for top-tier clients.
Support Kafka clusters, topics, consumer groups, and message flows to ensure high availability and consistent data processing across production systems.
Oversee and troubleshoot Chronos job scheduling workflows, ensuring SLA adherence, timely batch execution, and automated recovery handling for failed tasks.
Implement plan reviews and submit changes that improve platform resiliency, change hygiene, and operational success, particularly for Kafka and Chronos‑driven workloads.
Collaborate with production assurance teams to maintain production runbooks, change registers, automations, and improvements, and to conduct detailed impact and root-cause analysis for issues involving data streaming or scheduling components.
Provide production support for innovative, high‑quality, large-scale financial solutions, working alongside industry-leading engineering and operations talent.
Perform proactive monitoring across your environment using tools like Splunk, Dynatrace, Grafana, and Moogsoft, including alerts and dashboards that track Kafka lag, consumer health, broker performance, and Chronos job status.
Requirements
5+ years of experience working in an enterprise IT environment.
4+ years of experience with Unix, Windows, and scheduling tools.
4+ years of experience with monitoring tools like Splunk, Dynatrace, Grafana, and Moogsoft.
4+ years of experience with incident, change, and problem management and with ticketing tools like ServiceNow, Remedy, and CA Service Desk.
3+ years of experience with Kafka and Chronos.
3+ years of experience in production management, application/platform monitoring, and on-call schedules.
Bachelor's degree in a related field or an equivalent combination of education, military, and work experience.
Knowledge of ITIL frameworks and best practices.
Experience with automation and scripting languages.
Good analytical and troubleshooting skills.
Experience with ARO, PCF, AWS, Azure, and Google Cloud.
Experience using version control tools like MS Visual SourceSafe (VSS), Git, and Bitbucket.
Tech Stack
AWS
Azure
Cloud
Grafana
Kafka
Splunk
Unix
Benefits
Fuel Your Life program to support your physical, financial, social, and emotional well-being.
Paid holidays and generous time away policies.
No-cost mental health support through Employee Assistance Programs.
Living Proof program to recognize your peers’ extra effort with points redeemable for rewards.
Eight Employee Resource Groups to foster a collaborative culture and expand your network.
Unparalleled professional growth with training, development, and internal mobility opportunities.
Medical, dental, vision, life, and disability insurance options available from day one.
Retirement planning and discounted shares with the Employee Stock Purchase Plan.