Crossing Hurdles is seeking a Senior Software Engineer specializing in Python for LLM Evaluation and Repository Validation. The role involves designing verifiable software engineering tasks, analyzing GitHub issues, and collaborating with research teams to enhance dataset diversity for LLM training.
Responsibilities:
- Design and develop verifiable software engineering tasks using public repository data
- Analyze and triage GitHub issues across widely-used open-source repositories
- Set up, configure, and manage development environments, including containerizing them with Docker
- Evaluate unit test coverage, code quality, and repository robustness
- Run, modify, and test real-world codebases to assess LLM performance
- Identify grounding issues, incorrect outputs, and weak reasoning in model evaluations
- Collaborate with research teams to identify challenging datasets for LLM training
- Contribute to expanding dataset diversity across languages and difficulty levels
- Lead or support junior engineers in repository evaluation and task creation
Requirements:
- Strong proficiency in Python is mandatory
- Hands-on experience with Git and Docker
- Minimum of 3 years of software engineering experience
- Ability to understand and navigate complex, large-scale codebases
- Experience working with high-quality public repositories (5,000+ GitHub stars preferred)
- Strong analytical thinking and problem-solving skills
- Familiarity with software testing, debugging, and pipeline setup
- Ability to work independently in a remote environment
- Reliable computer setup with a stable internet connection