InvoiceCloud is a fast-growing fintech leader recognized for its innovative solutions and commitment to customer service. The Associate Product Reliability Engineer will support production operations for InvoiceCloud’s Payment Service Network while building foundational technical skills and contributing to system reliability through automation and monitoring.
Responsibilities:
- Supports issue triage and debugging across production systems, using logs, metrics, and traces to identify symptoms and narrow hypotheses
- Writes clean, functional, and well-tested code (primarily .NET/C#) to deliver small reliability improvements, automation, and fixes within a defined scope and with guidance
- Assists in building and maintaining monitoring dashboards and alerting to improve visibility into platform health
- Participates in incident response activities and post-incident reviews, demonstrating attention to detail and follow-through
- Owns assigned incident tickets or operational work items through resolution, communicating progress, impact, and blockers clearly
- Documents recurring issues, troubleshooting steps, and runbooks so others can respond consistently and efficiently
- Partners with senior engineers and product support teams to reproduce issues, validate fixes, and confirm service restoration
- Follows InvoiceCloud’s development, security, and change-management standards, taking accountability for safe and reliable production outcomes
- Uses Git and standard branching/review practices to streamline collaboration and ensure operational changes are traceable
- Creates or improves automation scripts (PowerShell and/or Python) to reduce repetitive operational work and speed up diagnostics
- Learns to prioritize reliability work based on impact and urgency (e.g., incident severity, customer impact, and SLO risk) while meeting sprint goals
- Helps improve incident response processes by identifying gaps (monitoring, runbooks, alerts) and proposing actionable remediations
- Explores reliability and observability tools and techniques that improve detection, diagnosis, and recovery (e.g., better logging, actionable alerts, and dashboards)
- Leverages AI-assisted development tools (e.g., GitHub Copilot, Cursor, Windsurf) for debugging, code generation, and documentation, while learning to validate AI output critically
- Contributes ideas for improving incident response, post-mortem practices, and production readiness during feature delivery
- Demonstrates curiosity, experimentation, and a learning mindset, seeking feedback to develop reliability engineering best practices
Requirements:
- Bachelor's degree in Computer Science, Engineering, or a related technical discipline
- 0–2 years of experience in software engineering, DevOps, production support, or technical support (internship, co-op, or professional)
- Understanding of object-oriented programming, basic data structures, and algorithms
- Familiarity with .NET/.NET Framework, C#, SQL, and version control systems (Git)
- Exposure to cloud environments (Azure preferred) and basic concepts like deployments, configuration, and networking fundamentals
- Exposure to scripting for automation and troubleshooting (Python and/or PowerShell)
- Familiarity with monitoring/observability tools or concepts (dashboards, alerts, log queries); experience with New Relic or similar is a plus
- Strong debugging/troubleshooting, problem-solving, collaboration, and written communication skills