Python Infrastructure Engineer - Model Evaluation (AI Training)
About The Role
What if your Python expertise could directly shape how the world's most advanced AI models are built, tested, and improved? We're looking for a senior Python engineer to design and build the data pipelines, evaluation harnesses, and annotation tooling that sit at the heart of cutting-edge AI development.
This is a fully remote, flexible contract role working alongside leading AI research labs on real production systems. If you're a strong Python engineer who wants to do meaningful, high-impact work at the frontier of AI, this is the role for you.
- Organization: Alignerr
- Type: Hourly Contract
- Location: Remote
- Commitment: 20-40 hours/week
What You'll Do
- Design, build, and optimize high-performance Python systems supporting AI data pipelines and model evaluation workflows
- Develop full-stack tooling and backend services for large-scale data annotation, validation, and quality control
- Build and maintain evaluation harnesses that integrate with ML inference frameworks
- Improve reliability, performance, and safety across existing Python codebases
- Instrument systems with observability and metrics collection to monitor reliability and model performance
- Identify bottlenecks and edge cases in data and system behavior, and implement scalable fixes
- Collaborate with data, research, and engineering teams to support model training and evaluation workflows
- Participate in synchronous design reviews to iterate on architecture and implementation decisions
Who You Are
- Native or fluent English speaker with clear written and verbal communication skills
- Full-stack developer with a strong systems programming background
- 3-5+ years of professional experience writing production-grade Python
- Experienced in building evaluation harnesses for ML models and integrating them with inference frameworks
- Solid background in observability, metrics collection, and monitoring for production systems
- Self-motivated and reliable, able to commit 20-40 hours per week
Nice to Have
- Prior experience with data annotation, data quality, or evaluation systems
- Familiarity with AI/ML workflows, model training, or benchmarking pipelines
- Experience with distributed systems or developer tooling
- Background in MLOps or AI infrastructure
Why Join Us
- Work directly on cutting-edge AI projects alongside leading research labs
- Fully remote and flexible: structure your work week around your life
- Freelance autonomy with the depth and consistency of meaningful, long-term technical work
- Make a tangible impact on how next-generation AI models are evaluated and improved
- Potential for ongoing work and contract extension as new projects launch