Video: Why Evaluation is the Missing Link in GenAI Product Success

Can you fine-tune a language model without a dataset? Here’s how Outerbounds is doing it.

What if you could fine-tune an LLM using just Python code—no curated datasets, no manual labeling? Ville Tuulos explains how this is already possible, using open tools and real-world examples that cost less than $600 to run.

Talk title: Reward Function As A Service: A (Relatively) Easy Recipe for Training Your Own Reasoning Model

Context

In this presentation, recorded as part of Stack Session 02, Ville Tuulos, CEO and Co-founder of Outerbounds, breaks down a pragmatic approach to fine-tuning large language models (LLMs) without traditional datasets. Drawing on his work on Metaflow and insights from projects like DeepSeek, Ville offers a blueprint for teams stuck between API-only solutions and fully custom models. The talk is aimed at teams that want to differentiate their AI products without hundreds of GPUs or massive datasets.

What You’ll Learn

This talk is for ML engineers and builders who want more control over GenAI systems but don’t have the compute to train from scratch. It covers:

  • The 5-stage evolution of AI product maturity, and why most teams are stuck at stage 2 or 3
  • How to define a Python-based reward function instead of labeling data
  • Real-world examples of quirky and practical reward setups (e.g., palindromes, math problems, literary classification)
  • How Outerbounds used Metaflow to orchestrate fine-tuning and evaluation
  • Cost breakdown and performance insights from training a reasoning model on 8 H100s for 32 hours
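To make the idea of a Python-based reward function concrete, here is a minimal sketch built around the palindrome example mentioned above. It is not code from the talk: the function names `normalize` and `palindrome_reward`, the partial-credit scoring, and the 0-to-1 score range are all assumptions chosen for illustration. The point is only that a plain function can score a model's output, so no labeled dataset is needed.

```python
import re


def normalize(text: str) -> str:
    """Lowercase the text and keep only ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def palindrome_reward(completion: str) -> float:
    """Hypothetical reward: score a completion between 0.0 and 1.0.

    A perfect palindrome scores 1.0. Other outputs get partial credit
    equal to the fraction of mirrored character pairs that match.
    """
    chars = normalize(completion)
    if len(chars) < 2:
        return 0.0  # empty or single-character outputs earn nothing
    half = len(chars) // 2
    matches = sum(chars[i] == chars[-1 - i] for i in range(half))
    return matches / half


if __name__ == "__main__":
    for text in ["A man, a plan, a canal: Panama", "abca", "hello"]:
        print(f"{text!r}: {palindrome_reward(text):.2f}")
```

In a reinforcement-style fine-tuning loop, a function like this would be called on each sampled completion and the score fed back to the trainer. The partial credit is a design choice in this sketch: a graded score can give the model a signal even before it produces a perfect palindrome, whereas a strict pass/fail reward would return zero for most early attempts.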

MLOps World | GenAI Summit 2025

Our next flagship event is taking place October 7–9 at the Austin Renaissance Hotel. Join us for real-world case studies, hands-on workshops, career-defining meetups, and more. Tickets are on sale now.

About the Speaker

Ville Tuulos is the CEO and Co-founder of Outerbounds, where he leads development on Metaflow, an open-source framework for building and managing real-world ML workflows. Previously, Ville led infrastructure teams at Netflix. His current work focuses on helping teams build production-grade AI systems using practical, reproducible methods.

Stay Connected

Want more content like this, straight to your inbox? Sign up for the TMLS monthly newsletter to learn from expert voices, attend free virtual events, and get open resources including exclusive offers on leading AI/ML stack tools.

Visit: mlopsworld.com

About TMLS

TMLS is a global community of AI Engineers, Data Scientists, ML Engineers, Full-Stack Developers, infrastructure teams, and entrepreneurs. We share lessons, support each other’s growth, and stay grounded in what actually works. Through peer-curated events like TMLS Summit (June, Toronto) and MLOps World (October, Austin), free online learning sessions, and exclusive resources, we help teams deliver and scale AI projects safely and responsibly. In a space moving as fast as artificial intelligence, context is everything. Our mission is to help top practitioners find it.
