About this role

Crossing Hurdles is seeking a Senior Software Engineer specializing in Python for LLM Evaluation and Repository Validation. The role involves designing verifiable software engineering tasks, analyzing GitHub issues, and collaborating with research teams to enhance dataset diversity for LLM training.

Responsibilities:

Design and develop verifiable software engineering tasks using public repository data
Analyze and triage GitHub issues across widely-used open-source repositories
Set up, configure, and manage development environments, including containerizing them with Docker
Evaluate unit test coverage, code quality, and repository robustness
Run, modify, and test real-world codebases to assess LLM performance
Identify grounding issues, incorrect outputs, and weak reasoning in model evaluations
Collaborate with research teams to identify challenging datasets for LLM training
Contribute to expanding dataset diversity across languages and difficulty levels
Lead or support junior engineers in repository evaluation and task creation

Requirements:

Strong proficiency in Python is mandatory
Hands-on experience with Git and Docker
Minimum 3 years of software engineering experience
Ability to understand and navigate complex, large-scale codebases
Experience working with high-quality public repositories (5000+ stars preferred)
Strong analytical thinking and problem-solving skills
Familiarity with software testing, debugging, and pipeline setup
Ability to work independently in a remote environment
Reliable system setup with a stable internet connection

Senior Software Engineer – Python (LLM Evaluation & Repository Validation) | $19/hr Remote
