Design and implement robust backend services and APIs that handle AI model inference, resource orchestration, and workload distribution across distributed GPU infrastructure
Build responsive and intuitive web interfaces for training job management, model deployment workflows, and real-time monitoring dashboards using modern JavaScript frameworks
Contribute to the design and implementation of distributed systems using peer-to-peer technologies (Holepunch stack)
Develop and maintain APIs that support both synchronous and asynchronous inference patterns, ensuring compatibility with industry standards
Implement monitoring, logging, and telemetry solutions to ensure high availability and performance of the platform services
Work closely with DevOps, AI/ML engineers, and product teams to deliver integrated solutions that meet technical and business requirements
Maintain high standards for code quality through peer reviews, testing, and documentation while championing security best practices
Requirements
5+ years of experience in full-stack development with a strong emphasis on backend systems
Expert-level proficiency in Node.js/JavaScript for backend development and in the React frontend framework
Proven experience building and scaling distributed systems or event-driven architectures
Strong understanding of API design and implementation, including authentication, rate limiting, and versioning
Experience with containerization technologies (Docker) and orchestration platforms (Kubernetes)
Proficiency with databases and a deep understanding of data modeling and optimization
Solid understanding of networking, security principles, and best practices for production systems
Experience with real-time data streaming and RPC implementations
Ability to work independently in a remote environment and communicate effectively across time zones
Preferred
Experience with peer-to-peer technologies (Hyperswarm, libp2p, WebRTC) or similar distributed communication protocols
Familiarity with AI/ML inference APIs and OpenAI-compatible endpoints
Previous experience building AI SaaS or PaaS platforms
Knowledge of GPU resource management and ML framework infrastructure
Experience with message queuing systems (Redis, RabbitMQ, Kafka)
Familiarity with observability tools (Prometheus, Grafana, ELK stack)
Understanding of WebAssembly or edge-computing paradigms
Contributions to open-source projects in relevant domains