Deep Systems LLC develops technology that supports major financial institutions in trading and money movement across global markets. The Senior DevOps Engineer will focus on building and maintaining infrastructure to ensure high performance and reliability of systems that handle billions of messages daily.
Responsibilities:
- Comfortable working from and improving runbooks
- Expanding the knowledge base and documenting tribal knowledge
- Balance speed and caution—know when to apply quick fixes versus when issues require deeper investigation before action
- Willingness to learn the ever-expanding breadth of services provided by Deep Systems
- Promptly respond to alerts and escalate based on rules and earned experience
- Curious mindset; uncomfortable relying on untested assumptions
- Transaction-level thinking: able to find specific customer activity, not just system aggregates
- Ability to quickly determine failure domain (ours, customer's, or external) with evidence, and communicate this clearly to affected customers
- Communicate confidently during uncertainty—declare what's known and unknown without waiting for full diagnosis, avoiding unnecessary or incomplete information that could confuse the situation
- Ability to interact with emotionally charged customers to identify key concerns and respond calmly while working to resolve their issue
- Clear, unemotional communication and behavior when under pressure
- Comfortable in a highly asynchronous environment
- Represent the status of work clearly and ask for help when stuck
- Clearly state when things are opinion or preference and why
- Run Level 1 (routine) Incidents/Requests
- Run incidents, bring in the right people, and manage the incident unless it is prudent to hand off
- Escalate as needed and apply judgment beyond "based on rules"
- Help evolve the incident process
- Proactive monitoring
- Postmortem participation
- Handling customer support calls and emails
Requirements:
- English spoken and written
- Linux Environment
  - Working safely on a production system
  - Shell
    - Navigating the filesystem
    - Shell scripting as needed
    - SSH
  - Strong in searching and organizing information from log files
    - grep, awk, sort, uniq, etc.
    - Working with compressed files
    - Working with logs in place, without needing to pull them down locally
    - Querying structured logs to answer customer-specific questions fast
  - Creating utility scripts to streamline one's work as needed
  - Understanding of how the underlying hardware is represented in the OS
  - systemctl
  - Ansible
  - Crontab
  - sudo
- Software Development
  - Python
    - Version 3
    - Familiarity with Python's C API (for debugging or extending)
  - C
    - Comfortable reading C code for troubleshooting purposes
  - General Development
    - Understanding of multithreading in long-running applications
  - Postgres
    - Connectivity testing
    - Dumping and loading data
    - Schema design
    - Query optimization
  - Experience with message buses or middleware communication
  - Git & CI/CD
    - Proficient with Git as source control
    - Experience with CI/CD pipelines
- Windows Desktop
  - Supporting and debugging our Windows-based custom application installations
  - Basic Windows troubleshooting skills
- Networking
  - Able to test connectivity
  - Able to identify DNS functionality
  - VPN
- Observability Tools
  - Alert management systems
  - Logging systems
  - Metrics and trends analysis
  - An interest in building a bespoke observability platform
- Capital markets experience preferred
- Experience at the intersection of internal systems with external systems in critical business paths
- Familiarity with B2B customer support in finance
- FIX protocol knowledge a plus but not required
- GitLab experience preferred but not required
- ELK preferred but not required