Free Virtual Summit | October 6-7, 2025
Ticketed In-Person Summit | October 8-9, 2025 | Austin Renaissance Hotel

6th Annual MLOps World | GenAI Summit 2025

The event that takes AI/ML & agentic systems from concept to large-scale production

2 Days • 16 Tracks • 75 Sessions • Vibrant Expo

Why attend: Optimize & Accelerate

Build optimal strategies

Learn emerging techniques and approaches shared by leading teams who are actively scaling ML, GenAI, and agents in production.

Increase project efficiency

Minimize project risks, delays, and missteps by learning from case studies that set new standards for impact, quality, and innovation. Explore tools for agent-driven apps, multi-agent systems, and AI-assisted development.

Make better decisions

Make better, faster decisions with lessons and pro tips from the teams shaping ML, GenAI, and agentic AI systems in production.

Why attend

Technical Workshops

  • Workshop: Context Engineering – Practical Techniques
  • Workshop: Building AI Agents from Scratch
  • Workshop: Managing RAG
  • Workshop: Vibe-Coding Your First LLM App
  • Workshop: Conversational Agents with Thread-Level Metrics

Industry Case Studies & Roundtable Discussions

Case studies + discussion groups on:
  • SLMs + Fine-Tuning with Arcee
  • Sustainable GenAI Systems at Target
  • Explainable AI at Fujitsu
  • Agentic Workforces at Marriott International
  • Agent-Powered Code Migration at Realtor.com

and more!

Community Event App

Meet 800+ attendees and build your network onsite. See all attendees and connect with speakers. Share your story, build your community.

2 Days of Workshops, Case Studies, Discussions & Socials

Learn from leading minds, sharpen your skills, and connect with innovators driving safe and effective AI in the real world.

Free Online Stage
  • Skills Workshops (Oct 6)
  • Expert Sessions (Oct 7)
Day 1
  • Summit:
    • Talks, Panels, & Workshops
  • Expo:
    • Lightning Talks
    • Brain Dates
    • Community Square
    • Startup Zone
    • Vendor Booths
  • Opening Party
Day 2
  • Keynote
  • Summit:
    • Talks, Panels, & Workshops
  • Expo:
    • Lightning Talks
    • Brain Dates
    • Community Square
    • Startup Zone

Why attend: Connect & Grow

Grow industry influence

Join Brain Dates, Speaker’s Corner, Community Square, or deliver a talk to share your expertise and amplify your industry impact.

Equip your team to win

Stay ahead of fast-moving competitors by giving your team the insights, skills, and contacts they need to exceed expectations.

Build career momentum

Make every hour count by using our event app to hyper-focus on the right topics and people who will help shape your future in AI.

2025 Summit: Full-Spectrum AI

All themes, talks, and workshops curated by top AI practitioners to deliver real-world value. Explore sessions

2025 THEME: AI Agents & Agentic Workforces

AI Agents for Developer Productivity

This track highlights practical uses of agents to streamline dev workflows—from debugging and code generation to test automation and CI/CD integration.

Agents can now assist in model testing, monitoring, and rollback decisions. The track focuses on how teams are using autonomous systems to harden their ML deployment workflows.

This track explores how teams are combining human oversight with semi-autonomous agents to scale support, operations, and decision-making across the business.

This track explores the design patterns shaping modern agents, from prompt engineering and tool integration to memory and planning strategies, focusing on real-world systems, not just frameworks. It also covers the infrastructure, safety checks, and governance required to deploy agents reliably and securely in production environments, with expert presenters sharing their insights on the challenges of running agents at scale.

This track covers the key architectural choices and infrastructure strategies behind scaling AI and LLM systems in production, from bare metal to Kubernetes, GPU scheduling to inference optimization. It also addresses the complexities of managing model, data, and pipeline versions in a reproducible, team-friendly way, alongside the unique challenges of deploying ML in regulated, resource-constrained, or air-gapped environments. Expert speakers will share insights on building and operating reliable GenAI and agent platforms at scale while navigating the tradeoffs when cloud-based solutions aren’t an option.

2025 THEME: MLOps & Organizational Scale

Governance, Auditability & Model Risk Management

This track covers how teams manage AI risk in production—through model governance, audit trails, compliance workflows, and strategies for monitoring model behavior over time.

Not every team has a platform squad or unlimited infra budget. This track shares practical approaches to shipping ML with lean teams—covering lightweight tooling, automation shortcuts, and lessons from teams doing more with less.

Security doesn’t end at deployment. This track covers threat models, model hardening, data protection, and supply chain risks across the entire ML lifecycle.

Training isn’t just about epochs and GPUs. Talks focus on reproducibility, retraining triggers, pipeline automation, and how teams manage iterative experimentation at scale.

This track focuses on scoping and delivering complex AI projects, exploring how teams are adapting their scoping processes to account for LLMs, agents, and evolving project boundaries in fast-moving environments. It also dives into the strategies behind AI product development, from aligning business goals to driving successful delivery and scaling. Expert presenters will share practical insights on navigating the complexities of AI product strategy and execution.

2025 THEME: LLM Infrastructure & Operations

LLMs on Kubernetes

This track covers the key architectural choices and infra strategies behind scaling AI and LLM systems in production—from bare metal to Kubernetes, GPU scheduling to inference optimization. Learn what it really takes to build and operate reliable GenAI and agent platforms at scale.

This 2025 track covers real-world patterns and pitfalls of running LLMs on Kubernetes. Topics include GPU scheduling, autoscaling, memory isolation, and managing cost and complexity at scale.

This 2025 track explores the realities of deploying ML in regulated, resource-constrained, or air-gapped environments. Talks focus on infrastructure design, data access, and managing tradeoffs when the cloud isn’t an option.

What does it mean to observe an LLM in production? This 2025 track unpacks logging, tracing, token-level inspection, and metrics that actually help teams debug and improve deployed models.

This track addresses the performance, cost, and reliability challenges of running inference at scale, exploring techniques from token streaming and caching strategies to hardware-aware scheduling. It also delves into low-level optimizations, model compilation, and inference kernels, covering everything from Triton and ONNX to custom CUDA solutions. Expert presenters will share insights into the systems that power fast, efficient, and production-ready AI inference across modern hardware.

From Triton to ONNX to custom CUDA, this track explores how inference gets faster. Talks focus on low-level optimization, compilation, and maximizing performance on modern hardware.

Our Expo is where innovation, ideas, and connections come to life

Transform from attendee to active participant by leveling-up your professional contacts, exchanging ideas, and even grabbing the mic to share a passion project.

Make New Connections​

Connect with AI Practitioners


40+ Technical Workshops and Industry Case Studies

Speakers

Meet the experts bringing techniques, best practices, and strategies to this year’s stage.

Bryan McCann

CTO, You.com

Building Open Infrastructure for the Agentic Era

Claire Longo

Lead AI Researcher, Comet

How Math-Driven Thinking Builds Smarter Agentic Systems

Rajiv Shah

Chief Evangelist, Contextual AI

From Vectors to Agents: Managing RAG in an Agentic World

Irena Grabovitch-Zuyev

Staff Applied Scientist, PagerDuty

Testing AI Agents: A Practical Framework for Reliability and Performance

Eric Riddoch

Director of ML Platform, Pattern AI

Insights and Epic Fails from 5 Years of Building ML Platforms

Linus Lee

EIR & Advisor, AI, Thrive Capital

Agents as Ordinary Software: Principled Engineering for Scale

Niels Bantilan

Chief ML Engineer, Union.ai

A Practical Field Guide to Optimizing the Cost, Speed, and Accuracy of LLMs for Domain-Specific Agents

Aishwarya Naresh Reganti

Founder, LevelUp Labs, Ex-AWS

Why CI/CD Fails for AI, and How CC/CD Fixes It

Tony Kipkemboi

Head of Developer Relations, CrewAI

Building Conversational AI Agents with Thread-Level Eval Metrics

Denise Kutnick

Co-Founder & CEO, Variata

Opening Pandora’s Box: Building Effective Multimodal Feedback Loops

James Le

Head of Developer Experience, TwelveLabs

Video Intelligence Is Going Agentic

Zachary Carrico

Senior Machine Learning Engineer, Apella

A Practical Guide to Fine-Tuning and Deploying Vision Models

Paul Yang

Member of Technical Staff, Runhouse

Why is ML on Kubernetes Hard? Defining How ML and Software Diverge

Romil Bhardwaj

Co-Creator, SkyPilot

Building Multi-Cloud GenAI Platforms without The Pains

Freddy Boulton

Open Source Software Engineer, Hugging Face

Gradio: The Web Framework for Humans and Machines

Aleksandr Shirokov

Team Lead MLOps Engineer, Wildberries

LLM Inference: A Comparative Guide to Modern Open-Source Runtimes

Vaibhav Misra

Director - Distinguished Engineer, CapitalOne

RAG architecture at CapitalOne

Srishti Bhargava

Software Engineer, Amazon Web Services

The Rise of Self-Aware Data Lakehouses

Federico Bianchi

Senior ML Scientist, TogetherAI

From Zero to One: Building AI Agents From The Ground Up

Kshetrajna Raghavan

Principal Machine Learning Engineer, Shopify

Where Experts Can't Scale: Orchestrating AI Agents to Structure the World's Product Knowledge

Calvin Smith

Senior Researcher Agent R&D, OpenHands

Code-Guided Agents for Legacy System Modernization

Partners

Gold Sponsors

Silver Sponsors

Bronze Sponsors

Official Open Source Sponsor

Community Partners

Media Partners

Latest News

Why attend

Event Parties & Networking

Join our rooftop party and the many socials taking place during and after the event.

Explore Frontier Tools & Startups

Give your team an edge with insights, skills, and connections from the industry’s top innovators — click here to see the exhibiting sponsors.

Grow industry influence

Join Brain Dates, Speaker’s Corner, Community Square, or deliver a talk to share your expertise and amplify your industry impact.

Curated by AI Practitioners

All sessions and workshops have been hand-picked by a Steering Committee of fellow AI practitioners who obsess about delivering real-world value for attendees.

Denys Linkov

Event Co-Chair & Head of ML at WiseDocs

“We built this year’s summit around practical takeaways. Not theory but actual workflows, strategies, and the next three steps for your team. We didn’t want another ‘Intro to RAG’ talk. We wanted the things people are debugging, scaling, and fixing right now.”

Volunteering

Apply for the opportunity to get exclusive behind the scenes access to the MLOps World experience while growing your network and skills in real-world artificial intelligence.

Austin

Renaissance Austin Hotel

Once again our venue is the beautiful Renaissance Austin Hotel which delivers an exceptional 360 experience for attendees, complete with restaurants, rooftop bar, swimming pool, spa, exercise facilities, and nearby nature walks. Rooms fill up fast, so use our code (MLOPS25) for discounted rates.

Choose Your Email Adventure

Join our Monthly Newsletter to be first to get expert videos from our flagship events and community offers including the latest Stack Drops.

Join Summit Updates to learn about event-specific news like ticket promos and agenda updates, as well as invites to join our free online Stack Sessions.

Choose what works best for you and update your email preferences at any time.

Hear From Past Attendees

Free Virtual October 6-7 | In-person October 8-9

What Your Ticket Includes

Your pass gives you complete access to the full summit experience, both in Austin and online:

  • Full access to Summit sessions – Day 1 (Oct 8) & Day 2 (Oct 9) in Austin
  • Bonus virtual program – live talks and workshops on Oct 6 & 7
  • Hands-on learning – in-person talks, virtual workshops, and skill-building sessions
  • Food & networking – connect with peers over meals, socials, and receptions
  • AI-powered event app – desktop & mobile access for networking and schedules
  • Networking events – structured meetups and community mixers
  • On-demand replays – access to all post-summit videos
  • 30 days of O’Reilly online learning – unlimited access to books, courses, and videos from O’Reilly and 200+ publishers

Agenda

This agenda is still subject to change.

Join free virtual sessions October 6–7, then meet us in Austin for in-person case studies, workshops, and the expo October 8–9.

FAQ

When and where is the event?

The in-person portion of MLOps World | GenAI Summit takes place October 8-9, 2025 at the Renaissance Austin Hotel.

Address: 9721 Arboretum Blvd, Austin, TX 78759, United States. See booking details.

What does my conference pass include?

Access to all sessions, workshops, networking events, the expo hall, and post-event recordings. Meals and coffee breaks are provided for in-person attendees. Attendees also have access to the official conference app, where you can message speakers, set up brain dates, attend parties and social functions, post and search for jobs, and see a list of all the other attendees joining in Austin. Through our official media partnership with O’Reilly, attendees get digital access to over 60k titles from O’Reilly and 200+ other publishers. Each conference pass includes a 30-day free trial giving you on-demand access to:
  • Live training courses
  • In-depth learning paths
  • Interactive coding environments
  • Certification prep materials
  • Most major AI publications

Are there virtual sessions before the in-person event?

Yes. In the lead-up to the main event, we host 2 bonus virtual days featuring skills training and insights from top AI experts, Oct 6-7. Learn more

What kind of sessions can I expect?

Technical deep dives, case studies, live demos, hands-on workshops, expert panels, and roundtables across 16+ tracks, curated by a volunteer Steering Committee composed of 75+ leading AI practitioners. You’ll also have the opportunity to schedule 1-1 brain dates with speakers and other attendees via the app.

Is there an expo?

Yes. The expo is where you’ll shift from focused learning to active participation and networking, with Brain Dates, Speakers’ Corner, Community Stage, and Startup Zone. You’ll also find exhibits from companies driving the next wave of GenAI and an opening-night reception to connect with peers.

How do I register?

You can register from any of the links on our website, including the button in the header.

Are tickets refundable?

Yes. Tickets are refundable and transferable up to 30 days prior to the event. See our ticket policy for details.

Are group discounts available?

Yes. Purchases of multiple tickets receive an additional discount, which may vary depending on timing of purchase.

Who should attend?

AI Engineers, Agentic Developers, Solution Architects, Full-Stack Developers, enterprise AI teams, startup teams, and AI founders. View our About Page to learn more about the event, including the organizing team, Steering Committee, volunteers, and sponsors.

Will sessions be recorded?

Yes. The majority of presenters grant permission for their sessions to be recorded and shared. These recordings are made available after the event. The best way to be notified when new learning resources are released is by subscribing to our newsletter.

What if I have dietary restrictions or accessibility needs?

We’ve got you covered. Let us know during registration and we’ll make arrangements.

How do I apply to speak?

Submit your proposal via the Call for Speakers link in our site header (available ahead of each event) or subscribe to our newsletter for MLOps and other speaking alerts. Learn more

What types of talks are you looking for?

Our Steering Committee reviews technical deep dives, case studies, roadmaps, and skills workshops covering MLOps, GenAI, LLMOps, AI infrastructure, and agentic systems from across the AI spectrum.

Are speakers compensated?

Speaker slots are unpaid, but all accepted speakers receive a free conference pass and access to networking events.

When are slides due?

Final slide decks are due 3 weeks before the event. Early drafts may be requested for feedback.

Do you cover speaker travel or lodging?

Speakers receive a free in-person or virtual pass. We don’t cover travel or lodging, but limited support may be available for nonprofit or academic speakers.

Can I present virtually?

Both options are available. Some tracks are fully virtual, and remote presentations can be pre-recorded or live-streamed.

Will my talk be recorded and shared?

Yes. Most sessions are recorded and distributed publicly through our email newsletter, YouTube channel, blog, and social media pages.

What A/V support is provided?

In-person speakers get full A/V support: mic, projector, and a session moderator. Virtual speakers will receive tech-check guidance and support in advance.

What are the sponsorship packages and benefits?

We offer tiered sponsorship packages that include booth space for lead generation, speaking slots, branding on signage, and digital promotion. Unique package extensions are also available.

Visit our sponsor page to get more details and download our Sponsorship Guide, or contact Faraz Thambi at [email protected] to discuss availability and options.

Who attends the event?

Attendees include ML/Data Engineers, Developers, Solution Architects / Principal Engineers, ML/AI Infra Leads, Technical Leaders, and Senior Leadership (Director, VP, C-suite, Founder) decision-makers from startups, scaleups, and enterprises across North America and around the globe.

Are virtual or track-specific sponsorships available?

Yes. Virtual-only and track-specific sponsorships are available. We also offer branding around keynotes, networking lounges, and workshop zones.

Do sponsors receive lead capture tools?

With the exception of the Startup Package, all sponsors get lead scanning tools. Virtual sponsors receive opt-in attendee data based on session engagement and resource downloads.

Are booths included with sponsorship?

Yes. Booth packages vary in size depending on the tier; they range from a 20’x20’ island booth (Platinum) to a 6’ x 10’ draped booth (Bronze). Please see the guide for full specifications.

Can sponsors host their own sessions or events?

Yes. We offer limited opportunities for sponsor-hosted workshops, roundtables, and after-hours events, pending approval and availability.

Can my company offer discounts or free trials to attendees?

Yes, leading companies can apply to contribute discounts and free trials to our audience of AI/ML practitioners as part of our Stack Drop and Community Code programs. Learn more from our blog or email [email protected]

Talk: Building Open Infrastructure for the Agentic Era

Presenter:
Bryan McCann, CTO, You.com

About the Speaker:
Bryan McCann is the co-founder and CTO of You.com. Previously, he was a Lead Research Scientist at Salesforce Research working on Deep Learning and its applications to Natural Language Processing (NLP). He presented his work directly to customers on the keynote stage at Dreamforce, twice at the PyTorch Developers Conference on behalf of a collaboration between Salesforce, Google, and Meta, and at broad, business-facing venues like VentureBeat Transform.
He authored the first paper and holds the patent on contextualized word vectors, which eventually led to the transfer learning revolution in NLP with BERT and other transformer-based architectures for contextualized word vectors. Other notable work includes early unified models for multi-tasking in NLP, training the largest public, open-source language model in the world in 2019, and applying language models to biology, where his team generated proteins that were synthesized in a lab and shown to be as effective as, or more effective than, those found in nature. His work has been cited thousands of times, and he has spoken about the cutting edge of NLP and AI at research labs around the world.
Bryan’s work comes out of a deep philosophical interest in meaning and the desire to use AI to complement human creativity, inspire new thoughts, and ultimately develop tools for more fulfilling lives. He was the recipient of the first-ever eVe award at SXSW 2021 for his collaboration with award-winning author (and Netflix show writer) Daniel Kehlmann. He is a regular speaker on topics of literature and AI, poetry and AI, and other crossovers between AI and the arts.

Talk Abstract:
We’re entering an era where AI agents will interact with the web more than humans ever have, but the infrastructure of the internet was built for humans – much of it for consumers – rather than agents working on behalf of humans. This has spurred a race to grab land in this new era of agentic-driven economics and a round of defensive measures – closing APIs and walling off gardens to push users and enterprises towards consolidation within a single ecosystem. Consequently, there’s a widening gap between foundation models and working applications.

All of the infrastructure we’ve built for the consumer web needs to be rebuilt for agents – even when they’re doing our consumer and professional activities on our behalf. It is crucial that this new infrastructure is open to all rather than more closed off than what currently exists. Otherwise, every business risks devolving into a mere data provider to the dominant platforms, a trend we’re already seeing.

This talk explores what agentic infrastructure actually looks like, why it’s essential for innovation, and how the MLOps World community can help build the foundation layer that enables sophisticated AI applications without platform lock-in.

Talk: How Math-Driven Thinking Builds Smarter Agentic Systems

Presenter:
Claire Longo, Lead AI Researcher, Comet

About the Presenter:
Claire Longo is an AI leader and Mathematician with over a decade of experience in Data Science and AI. She has led cross-functional AI teams at Twilio, Opendoor, and Arize AI and is currently a Lead AI Researcher at Comet. She holds a Bachelor’s in Applied Mathematics and a Master’s in Statistics from The University of New Mexico. Beyond her technical work, Claire is a Speaker, Advisor, YouTuber, and Poker Player. She is dedicated to mentoring Engineers and Data Scientists while championing diversity and inclusion in AI. Her mission is to empower the next generation of AI practitioners.

Talk Track: Evolution of Agents

Technical Level: 3

Talk Abstract:
Everyone’s buzzing about LLMs, but too few are talking about the math that should guide how we apply them to real-world problems. Mathematics is the language of AI, and a foundational understanding of the math behind AI model architectures should drive decisions when we’re building AI systems.

In this talk, I will do a technical deep dive to demystify how different mathematical architectures in AI models can guide us on how and when to use each model type, and how this knowledge can help us design agent architectures and anticipate potential weaknesses in production so we can safeguard against them. I’ll break down what LLMs can do (and where they fall apart), clarify the elusive concept of “reasoning,” and introduce a benchmarking mindset rooted in math and modularity.

To put it all into context, I’ll share a real-world example of an Agentic use case from my own recent project: a poker coaching app that blends an LLM reasoning model as the interface with statistical models analyzing a player’s performance using historical data. This is a strong example of the future of hybrid agents, where LLMs and other mathematical algorithms work together, each solving the part of the problem it’s best suited for. It demonstrates the proper application of reasoning models grounded in their mathematical properties and shows how modular agent design allows each model to focus on the piece of the system it was built to handle.

I’ll also introduce a scientifically rigorous approach to benchmarking and comparing models, based on statistical hypothesis testing, so we can quantify and measure the impact of different models on our use cases as we evaluate and evolve agentic design patterns.

Whether you’re building RAG agents, real-time LLM apps, or reasoning pipelines, you’ll leave with a new lens for designing agents. You’ll no longer have to rely on trial and error or feel like you’re flying blind with a black-box algorithm. Foundational mathematical understanding will give you the intuition to anticipate how a model is likely to behave, reduce time to production, and increase system transparency.
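
To make the benchmarking mindset concrete, here is a minimal sketch (ours, not from the talk) of the kind of statistical hypothesis test described above: a paired t-test over per-example eval scores from two candidate models.

    # A minimal sketch of statistically rigorous model comparison:
    # a paired t-test over per-example eval scores. The scores below
    # are hypothetical placeholders.
    from scipy import stats

    # Per-example quality scores (e.g., 0-1 judge scores) for the same
    # eval set run through two candidate agent configurations.
    scores_model_a = [0.82, 0.75, 0.91, 0.68, 0.88, 0.79, 0.85, 0.73]
    scores_model_b = [0.82, 0.80, 0.93, 0.74, 0.90, 0.77, 0.88, 0.79]

    # Paired test: each example acts as its own control, which removes
    # variance that comes from example difficulty.
    t_stat, p_value = stats.ttest_rel(scores_model_b, scores_model_a)

    if p_value < 0.05:
        print(f"Difference is significant (t={t_stat:.2f}, p={p_value:.3f})")
    else:
        print(f"No significant difference (p={p_value:.3f}); keep the cheaper model")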

What You’ll Learn:
It’s easier than you think to understand foundational mathematical concepts in AI and use that knowledge to guide you in building better AI systems.

Talk: From Vectors to Agents: Managing RAG in an Agentic World

Presenter:
Rajiv Shah, Chief Evangelist, Contextual AI

About the Presenter:
Rajiv Shah is the Chief Evangelist at Contextual AI with a passion and expertise in Practical AI. He focuses on enabling enterprise teams to succeed with AI. Rajiv has worked on GTM teams at leading AI companies, including Hugging Face in open-source AI, Snorkel in data-centric AI, Snowflake in cloud computing, and DataRobot in AutoML. He started his career in data science at State Farm and Caterpillar.

Rajiv is a widely recognized speaker on AI, published over 20 research papers, been cited over 1000 times, and received over 20 patents. His recent work in AI covers topics such as sports analytics, deep learning, and interpretability.

Rajiv holds a PhD in Communications and a Juris Doctor from the University of Illinois at Urbana Champaign. While earning his degrees, he received a fellowship in Digital Government from the John F. Kennedy School of Government at Harvard University. He is well known on social media with his short videos, @rajistics, that have received over ten million views.

Talk Track: Agents in Production

Technical Level: 2

Talk Abstract:
The RAG landscape has evolved so quickly. We’ve gone from simple keyword search to semantic embeddings to multi-step agentic reasoning. With all these approaches, we see the rise of context engineering in mastering the best RAG for the problem. This talk helps you understand the right search architecture for your use case.
We’ll examine three distinct architectural patterns, including Speedy Retrieval (<500 ms), Accuracy Optimized RAG (<10 seconds), and Exhaustive Agentic Search (10s to several minutes). You’ll see how context engineering evolves across these patterns: from basic prompt augmentation in Speed-First RAG, to dynamic context selection and compression in hybrid systems, to full context orchestration with memory, tools, and state management in agentic approaches.
The talk will include a framework for selecting RAG architectures, architectural patterns with code examples, and guidance on practical issues around RAG infrastructure.

What You’ll Learn:
RAG has matured enough that we can stop chasing the bleeding edge and start making boring, practical decisions about what actually ships.

Points:
– Attendees should leave knowing exactly when to use speedy retrieval vs. agentic search; most use cases don’t need agents (and shouldn’t pay for them).
– As retrieval improves, managing the context window becomes the real challenge; success isn’t about retrieving more – it’s about orchestrating what you retrieve.
– Agentic search can cost 100x more than vector search; sometimes “good enough” at 500 ms beats “perfect” at 2 minutes.
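
As an illustration of the selection framework the talk describes, here is a hypothetical routing sketch; the pattern names, thresholds, and function are our own stand-ins, not the talk’s code.

    # A hypothetical sketch of routing a query to one of the three
    # architectural patterns named above, based on latency budget and
    # required thoroughness. Thresholds and names are illustrative.
    from enum import Enum

    class RagPattern(Enum):
        SPEEDY_RETRIEVAL = "vector search + prompt augmentation (<500 ms)"
        ACCURACY_OPTIMIZED = "hybrid retrieval + reranking (<10 s)"
        AGENTIC_SEARCH = "multi-step agentic reasoning (minutes)"

    def select_rag_pattern(latency_budget_ms: float, needs_exhaustive: bool) -> RagPattern:
        if needs_exhaustive and latency_budget_ms >= 60_000:
            return RagPattern.AGENTIC_SEARCH      # can cost ~100x vector search
        if latency_budget_ms >= 10_000:
            return RagPattern.ACCURACY_OPTIMIZED  # reranking, context compression
        return RagPattern.SPEEDY_RETRIEVAL        # "good enough" wins at 500 ms

    print(select_rag_pattern(latency_budget_ms=300, needs_exhaustive=False))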

Talk: Testing AI Agents: A Practical Framework for Reliability and Performance

Presenter:
Irena Grabovitch-Zuyev, Staff Applied Scientist, PagerDuty

About the Presenter:
Irena Grabovitch-Zuyev is a Staff Applied Scientist at PagerDuty and a driving force behind PagerDuty Advance, the company’s generative AI capabilities. She leads the development of AI agents that are transforming how customers interact with PagerDuty, pushing the boundaries of incident response and automation.

With over 15 years of experience in machine learning, Irena specializes in generative AI, data mining, machine learning, and information retrieval. At PagerDuty, she partners with stakeholders and customers to identify business challenges and deliver innovative, data-driven solutions.

Irena earned her graduate degree in Information Retrieval in Social Networks from the Technion – Israel Institute of Technology. Before joining PagerDuty, she spent five years at Yahoo Research as part of the Mail Mining team, where her machine learning solutions for automatic extraction and classification were deployed at scale, powering Yahoo Mail’s backend and processing hundreds of millions of messages daily.

She is the author of several academic articles published at top conferences and the inventor of multiple patents. Irena is also a passionate advocate for increasing representation in tech, believing that diversity and inclusion are essential to innovation.

Talk Track: Agents in Production

Technical Level: 3

Talk Abstract:
As AI agents powered by large language models (LLMs) become integral to production systems, ensuring their reliability and safety is both critical and uniquely challenging. Unlike traditional software, agentic systems are dynamic, probabilistic, and highly sensitive to subtle changes—making conventional testing approaches insufficient.

This talk presents a practical framework for testing AI agents, grounded in real-world experience developing and deploying production-grade agents at PagerDuty. The main focus will be on iterative regression testing: how to design, execute, and refine regression tests that catch failures and performance drifts as agents evolve. We’ll walk through a real use case, highlighting the challenges and solutions encountered along the way.

Beyond regression testing, we’ll cover the additional layers of testing essential for agentic systems, including unit tests for individual tools, adversarial testing to probe robustness, and ethical testing to evaluate outputs for bias, fairness, and compliance. Finally, I’ll share how we’re building automated pipelines to streamline test execution, scoring, and benchmarking—enabling rapid iteration and continuous improvement.

Attendees will leave with a practical, end-to-end framework for testing AI agents, actionable strategies for regression and beyond, and a deeper understanding of how to ensure their own AI systems are reliable, robust, and ready for real-world deployment.
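
A minimal sketch of what such an iterative regression suite can look like in practice (our illustration; run_agent, the cases, and the pass-rate bar are hypothetical placeholders):

    # A stub standing in for the deployed agent; replace with a real call.
    def run_agent(prompt: str) -> str:
        return f"[stub] responding to: {prompt}"

    REGRESSION_SUITE = [
        {"prompt": "Summarize incident INC-123", "must_contain": ["INC-123"]},
        {"prompt": "Who is on call for payments?", "must_contain": ["on call"]},
    ]

    def test_regression_suite(min_pass_rate: float = 0.95) -> None:
        passed = 0
        for case in REGRESSION_SUITE:
            output = run_agent(case["prompt"]).lower()
            if all(s.lower() in output for s in case["must_contain"]):
                passed += 1
        pass_rate = passed / len(REGRESSION_SUITE)
        # Fail the pipeline when quality drifts below the agreed bar,
        # so regressions surface as the agent (or its prompts) evolve.
        assert pass_rate >= min_pass_rate, f"pass rate {pass_rate:.0%} below bar"

    test_regression_suite()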

What You’ll Learn:
Attendees will learn a practical, end-to-end framework for testing AI agents—covering correctness, robustness, and ethics—so they can confidently deploy reliable, high-performing LLM-based systems in production.

Talk: Insights and Epic Fails from 5 Years of Building ML Platforms

Presenter:
Eric Riddoch, Director of ML Platform, Pattern AI

About the Presenter:
Eric leads the ML Platform team at Pattern, the largest seller on Amazon.com besides Amazon themselves.

Talk Track: ML Collaboration in Large Organizations

Technical Level: 2

Talk Abstract:
Building an internal ML Platform is a good idea as your number of data scientists, projects, or volume of data increases. But the MLOps toolscape is overwhelming. How do you pick tools and set your strategy? How important is drift detection? Should I serve all my models as endpoints? How “engineering-oriented” should my data scientists be?

Join Eric on a tour of 3 ML Platforms he has worked on, serving 14 million YouTubers and the largest 3P seller on Amazon. Eric will share specific architectures, honest takes from epic failures, things that turned out not to be important, and principles for building a platform with great adoption.

What You’ll Learn:
– Principles > tools. Ultimately all MLOps tools cover ~9 “jobs to be done”.
– “Drift monitoring” is overstated. Data quality issues account for most model failures.
– Offline inference exists and is great! Resist the temptation to use endpoints.
– Data lineage is underrated. Helps catch “target leakage” and upstream/downstream errors.
– Cloud GPUs from non-hyperscalers are getting cheaper. You may not need on-prem.
– Data scientists can get away with “medium-sized” data tools for a long time.

Talk: Agents as Ordinary Software: Principled Engineering for Scale

Presenter:
Linus Lee, EIR & Advisor, AI, Thrive Capital

About the Presenter:
Linus Lee is an EIR and advisor at Thrive Capital, where he focuses on AI as part of the product and engineering team and supports portfolio companies on adopting and deploying frontier AI capabilities. He previously pursued independent HCI and machine learning research before joining Notion as an early member of the AI team.

Talk Track: Agents in Production

Technical Level: 2

Talk Abstract:
Thrive Capital’s in-house research engine Puck executes thousands of research and automation tasks weekly, surfacing current events, drafting memos, and triggering workflows unassisted. This allows Puck to power the wide ecosystem of software tools and automations supporting the Thrive team. A single Puck run may traverse millions of tokens across hundreds of documents and LLM calls, and run for 30 minutes before returning multi-page reports or taking actions. With fewer than 10 engineers, we sustain this scale and complexity by embracing four values — composability, observability, statelessness, and changeability — in our orchestration library Polymer. We’ll share patterns that let us quickly add data sources or tools without regressions, enjoy deep observability to root cause every issue in minutes, and evolve the system smoothly as new model capabilities come online. We’ll end by discussing a few future capabilities we hope to unlock next, like RL, durable execution across hours or days, and scaling via parallel search.

What You’ll Learn:
Concretely, attendees will (1) learn design patterns like composition, adapters, and stateless effects that let us write more robust LLM systems faster and more confidently, and (2) see concrete code examples that illustrate these principles in action in a production system. Our goal is not to sell the audience on the library itself, but rather to advocate for the design patterns behind it.

More broadly, in such a rapidly evolving landscape it can feel tempting to trade off classic engineering principles like composability in favor of following frontier capabilities, subscribing to frameworks that obscure implementation detail or lock you into shortsighted abstractions. This talk will explore how we can have both rigor and frontier velocity with the right foundation.
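
As a flavor of those patterns, here is a small sketch of tools as stateless, composable adapters behind a single protocol; the names are illustrative only and are not the Polymer API.

    # A sketch, under our own naming, of composition + adapters +
    # stateless effects: tools sit behind one protocol, so sources can
    # be added or swapped without touching orchestration code.
    from typing import Protocol

    class Tool(Protocol):
        name: str
        def run(self, query: str) -> str: ...

    class SearchAdapter:
        name = "web_search"
        def run(self, query: str) -> str:
            return f"[stub] search results for {query!r}"

    class DocsAdapter:
        name = "internal_docs"
        def run(self, query: str) -> str:
            return f"[stub] documents matching {query!r}"

    def run_research(query: str, tools: list[Tool]) -> dict[str, str]:
        # Stateless: all inputs are explicit and the output is a pure
        # value, so every run is observable and reproducible.
        return {tool.name: tool.run(query) for tool in tools}

    print(run_research("current events in AI infra", [SearchAdapter(), DocsAdapter()]))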

Talk: A Practical Field Guide to Optimizing the Cost, Speed, and Accuracy of LLMs for Domain-Specific Agents

Presenter:
Niels Bantilan, Chief ML Engineer, Union.ai

About the Presenter:
Niels is the Chief Machine Learning Engineer at Union, a core maintainer of Flyte, an open source workflow orchestration tool, and creator of Pandera, a data validation and testing tool for dataframes. His mission is to help data science and machine learning practitioners be more productive. He has a Masters in Public Health Informatics, and prior to that a background in developmental biology and immunology. His research interests include reinforcement learning, NLP, ML in creative applications, and fairness, accountability, and transparency in automated systems.

Talk Track: Agents in Production

Technical Level: 3

Talk Abstract:
As the dust settles from the initial boom of applications using hosted large language model (LLM) APIs, engineering teams are discovering that while LLMs get you to a working demo quickly, they often struggle in production with latency spikes, context limitations, and explosive compute costs. This session provides a practical roadmap for navigating not only the experiment-to-production gap using small language models (SLMs), but also the AI-native orchestration strategies that will get you the most bang for your buck.
We’ll explore how SLMs (models that range from hundreds of millions to a few billion parameters) offer a compelling alternative for domain-specific applications by trading off the generalization power of LLMs for significant gains in speed, cost-efficiency, and task-specific accuracy. Using the example of an agent that translates natural language into SQL database queries, this session will demonstrate when and how to deploy SLMs in production systems, how to progressively swap out LLMs for SLMs while maintaining quality, and which orchestration strategies help you customize and maintain SLMs in a cost-effective way.

Key topics include:
– Identifying key leverage points: Which LLM calls should you swap out for SLMs first? We’ll cover how to identify speed, cost, and accuracy leverage points in your AI system so that you can speed up inference, reduce cost, and maintain accuracy.
– Speed Optimization: It’s not just about the speed of inference, which SLMs already excel at, it’s also about accelerating experimentation when you fine-tune and retrain SLMs on a specific domain/task. We’ll cover parallelized optimization runs, intelligent caching strategies, and task fanout techniques for both prompt and hyperparameter optimization.
– Cost Management: Avoiding common pitfalls that negate SLMs’ cost advantages, including resource mismatching (GPU vs CPU workloads), infrastructure provisioning inefficiencies, and idle compute waste. Attendees will learn resource-aware orchestration patterns that scale to zero and recover gracefully from failures.
– Accuracy Enhancement: Maximizing domain-specific performance by implementing the equivalent of “AI unit tests” and incorporating it into your experimentation and deployment pipelines. We’ll cover how this can be done with synthetic datasets, LLM judges, and deterministic evaluation functions that help you catch regressions early and often.
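
To ground the “AI unit tests” idea, here is a minimal sketch of a deterministic evaluation function for the text-to-SQL example, comparing result sets rather than SQL strings (the schema and cases are hypothetical):

    # A deterministic eval: execute generated SQL against a fixture
    # database and compare result sets, so equivalent-but-differently
    # phrased queries still pass.
    import sqlite3

    def results_match(generated_sql: str, expected_sql: str) -> bool:
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, total REAL)")
        conn.executemany(
            "INSERT INTO orders VALUES (?, ?, ?)",
            [(1, "US", 120.0), (2, "EU", 80.0), (3, "US", 45.5)],
        )
        got = conn.execute(generated_sql).fetchall()
        want = conn.execute(expected_sql).fetchall()
        return sorted(got) == sorted(want)

    assert results_match(
        "SELECT SUM(total) FROM orders WHERE region = 'US'",
        "SELECT SUM(total) FROM orders WHERE region IN ('US')",
    )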

What You’ll Learn:
Attendees will leave with actionable strategies for cost-effective AI deployment, a decision framework for SLM adoption, and orchestration patterns that compound the value of smaller models in domain-specific applications.

Talk: Why CI/CD Fails for AI, and How CC/CD Fixes It

Presenter:
Kiriti Badam, Member of Technical Staff, OpenAI

About the Presenter:
Kiriti Badam is a member of the technical staff at OpenAI, with over a decade of experience designing high-impact enterprise AI systems. He specializes in AI-centric infrastructure, with deep expertise in large-scale compute, data engineering, and storage systems. Prior to OpenAI, Kiriti was a founding engineer at Kumo.ai, a Forbes AI 50 startup, where he led the development of infrastructure that enabled training hundreds of models daily—driving significant ARR growth for enterprise clients. Kiriti brings a rare blend of startup agility and enterprise-scale depth, having worked at companies like Google, Samsung, Databricks, and Kumo.ai.

Talk Track: Agents in Production

Technical Level: 2

Talk Abstract:
AI products break the assumptions traditional software is built on. They’re non-deterministic, hard to debug, and come with a tradeoff no one tells you about: every time you give an AI system more autonomy, you lose a bit of control.

This talk introduces the Continuous Calibration / Continuous Development (CC/CD) framework, designed for building AI systems that behave unpredictably and operate with increasing levels of agency. Based on 50+ real-world deployments, CC/CD helps teams start with low-agency, high-control setups, then scale safely as the system earns trust.

What You’ll Learn:
You’ll learn how to scope capabilities, design meaningful evals, monitor behavior, and increase autonomy intentionally, so your AI product doesn’t collapse under real-world complexity.
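
One way to picture “increasing autonomy intentionally” is as eval-gated tiers. The sketch below is our own illustration of that idea, not the CC/CD framework’s actual implementation.

    # Hypothetical "earned autonomy" gating: what the agent may do is
    # tied to how well it has scored in calibrated evals. Thresholds
    # and action tiers are illustrative.
    AUTONOMY_TIERS = [
        # (min eval pass rate over a trailing window, allowed behavior)
        (0.99, "execute actions without review"),
        (0.95, "execute low-risk actions; queue the rest for approval"),
        (0.00, "suggest only; a human executes"),
    ]

    def allowed_behavior(trailing_pass_rate: float) -> str:
        for threshold, behavior in AUTONOMY_TIERS:
            if trailing_pass_rate >= threshold:
                return behavior
        return AUTONOMY_TIERS[-1][1]

    print(allowed_behavior(0.97))  # -> execute low-risk actions; ...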

Talk: Evaluating LLM-Judge Evaluations: Best Practices

Presenter:
Aishwarya Naresh Reganti, Applied Scientist, Amazon

About the Speaker:
Aishwarya is an Applied Scientist in the Amazon Search Science and AI Org. She works on developing large-scale graph-based ML techniques that improve Amazon Search Quality, Trust, and Recommendations. She obtained her Master’s degree in Computer Science (MCDS) from Carnegie Mellon’s Language Technology Institute, Pittsburgh. Aishwarya has over 6 years of hands-on Machine Learning experience and 20+ publications in top-tier conferences like AAAI, ACL, CVPR, NeurIPS, and EACL. She has worked on a wide spectrum of problems involving Large-Scale Graph Neural Networks, Machine Translation, Multimodal Summarization, Social Media and Social Networks, Human-Centric ML, Artificial Social Intelligence, Code-Mixing, etc. She has also mentored several Masters and PhD students in these areas. Aishwarya serves as a reviewer for various NLP and Graph ML conferences like ACL, EMNLP, AAAI, and LoG. She has had the opportunity to work with some of the best minds in both academia and industry through collaborations and internships at Microsoft Research, University of Michigan, NTU Singapore, IIIT-Delhi, NTNU-Norway, University of South Carolina, etc.

Talk Track: In-Person Workshop

Technical Level: 5/7

Talk Abstract:
The use of LLM-based judges has become common for evaluating scenarios where labeled data is not available or where a straightforward test set evaluation isn’t feasible. However, this approach brings the challenge of ensuring that your LLM judge is properly calibrated and aligns with your evaluation goals. In this talk, I will discuss some best practices to prevent what I call the “AI Collusion Problem,” where multiple AI entities collaborate to produce seemingly good metrics but end up reinforcing each other’s biases or errors. This creates a ripple effect.

What You’ll Learn
– Gain insight into what LLM judges are and the components that make them effective tools for evaluating complex use cases.
– Understand the AI Collusion problem in the context of evaluation and how it can create a ripple effect of errors.
– Explore additional components and calibration techniques that help maintain the integrity and accuracy of evaluations.
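
As one example of a calibration technique in this spirit, a judge can be checked against human labels with a chance-corrected agreement score before being trusted at scale (a minimal sketch; the labels are hypothetical):

    # Measure judge-human agreement on a labeled sample. Raw accuracy
    # can look strong even when the judge simply echoes the majority
    # class; Cohen's kappa corrects for chance agreement.
    from sklearn.metrics import cohen_kappa_score

    human_labels = ["good", "bad", "good", "good", "bad", "good", "bad", "good"]
    judge_labels = ["good", "bad", "good", "bad", "bad", "good", "bad", "good"]

    kappa = cohen_kappa_score(human_labels, judge_labels)
    print(f"judge-human kappa: {kappa:.2f}")
    if kappa < 0.6:
        print("Judge is poorly calibrated; revise the rubric/prompt before scaling")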

Talk: Building Conversational AI Agents with Thread-Level Eval Metrics

Presenters:
Claire Longo, Lead AI Researcher, Comet | Tony Kipkemboi, Head of Developer Relations, CrewAI

About the Presenters:
Tony Kipkemboi leads Developer Advocacy at CrewAI, where he helps organizations adopt AI agents to drive efficiency and strategic decision-making. With a background spanning developer relations, technical storytelling, and ecosystem growth, Tony specializes in making complex AI concepts accessible to both technical and business audiences.

He is an active voice in the AI agent community, hosting workshops, podcasts, and tutorials that explore how multi-agent orchestration can reshape the way teams build, evaluate, and deploy AI systems. Tony’s work bridges product experimentation with real-world application; empowering developers, startups, and enterprises to harness AI agents for measurable impact.

At MLops World, Tony brings his experience building and scaling with CrewAI to demonstrate how agent orchestration, when paired with rigorous evaluation, accelerates the path from prototype to production.

Claire Longo is an AI leader and Mathematician with over a decade of experience in Data Science and AI. She has led cross-functional AI teams at Twilio, Opendoor, and Arize AI and is currently a Lead AI Researcher at Comet. She holds a Bachelor’s in Applied Mathematics and a Master’s in Statistics from The University of New Mexico. Beyond her technical work, Claire is a Speaker, Advisor, YouTuber, and Poker Player. She is dedicated to mentoring Engineers and Data Scientists while championing diversity and inclusion in AI. Her mission is to empower the next generation of AI practitioners.

Talk Track: Agents in Production

Technical Level: 4

Talk Abstract:
Building modern conversational AI Agents means dealing with dynamic, multi-step LLM reasoning processes and tool calling that cannot always be predicted or debugged at the trace level alone. During the conversation, we need to understand if the AI accomplishes the user’s goal while staying aligned with intent and delivering a smooth interaction. To truly measure quality, we need to trace and evaluate entire conversation sessions.

In this talk, we introduce a practical workflow for designing, orchestrating, and evaluating conversational AI Agents by combining CrewAI as the Agent development framework with Comet Opik for custom eval metrics.

On the CrewAI side, we’ll showcase how developers can define multi-agent workflows, specialized roles, and task orchestration that mirror real-world business processes. We’ll demonstrate how CrewAI simplifies experimentation with different agent designs and tool integrations, making it easier to move from prototypes to production-ready agents.

On the Opik side, we’ll go over how to capture expert human-in-the-loop feedback and build thread-level evaluation metrics. We’ll show how to log traces, annotate sessions with expert insights, and design LLM-as-a-Judge metrics that mimic human reasoning; turning domain expertise into a repeatable feedback loop.

Together, this workflow combines agentic orchestration + rigorous evaluation, giving developers deep observability, actionable insights, and a clear path to systematically improving conversational AI in real-world applications.
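
A minimal sketch of what this pairing can look like: the CrewAI calls below follow its public API, while goal_completion_judge is our own illustrative stand-in for a thread-level LLM-as-a-Judge metric you would register in Opik.

    from crewai import Agent, Task, Crew

    support = Agent(
        role="Billing Support Specialist",
        goal="Resolve the user's billing question in one conversation",
        backstory="Handles billing workflows end to end.",
    )
    answer = Task(
        description="Reply to: 'Why was I charged twice this month?'",
        expected_output="A clear explanation plus concrete next steps",
        agent=support,
    )
    crew = Crew(agents=[support], tasks=[answer])
    result = crew.kickoff()  # requires an LLM provider configured via env vars

    def goal_completion_judge(thread: list[dict]) -> float:
        """Score the whole session, not one trace: did the assistant
        accomplish the user's goal? In practice the transcript would be
        sent to a judge LLM with a rubric mimicking expert review."""
        transcript = "\n".join(f"{m['role']}: {m['content']}" for m in thread)
        return 1.0 if "next steps" in transcript.lower() else 0.0

    thread = [
        {"role": "user", "content": "Why was I charged twice this month?"},
        {"role": "assistant", "content": str(result)},
    ]
    print(goal_completion_judge(thread))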

What You’ll Learn:
You can’t reliably build conversational AI agents without treating orchestration and evaluation as two halves of the same workflow; CrewAI structures the agent, Comet Opik ensures you can measure and improve it.

Talk: Opening Pandora’s Box: Building Effective Multimodal Feedback Loops

Presenter:
Denise Kutnick, Co-Founder & CEO, Variata

About the Presenter:
Denise Kutnick is a technologist with over a decade of experience building multimodal systems and evaluation pipelines used by millions, with roles spanning large companies like Intel and high-growth startups like OctoAI (acquired by Nvidia). She is the Co-Founder and CEO of Variata, a company building AI that sees, thinks, and interacts like a user to run visual regression tests at scale and keep digital experiences reliable. Denise is passionate about tackling problems at the intersection of AI and UX.

Talk Track: Multimodal Systems in Production

Technical Level: 3

Talk Abstract:
AI market maps are overflowing with multimodal SDKs promising to blend vision, language, audio, and more into a seamless package. But when they fail in production, you may find yourself locked in without the visibility or tools to fix it.

In this talk, we’ll open the box and explore how to build and interpret multimodal feedback loops that keep complex AI systems healthy in production.

We’ll cover:
– Closed-box vs Open-box Workflows: How exposing intermediate signals in your agentic pipeline grants finer-grained control, faster debugging, and better calibration towards user needs.
– Defining the Right Evals: Why human-understandable checkpoints are essential for model introspection and human-in-the-loop review.
– Data Pipeline Building Blocks: Leveraging tooling such as declarative pipelines, computed columns, and batch execution to catch issues and surface improvements without slowing deployment.
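
As a small illustration of the computed-columns idea (ours, with hypothetical data and checks): derive intermediate signals as columns over a batch of multimodal runs so failures surface mid-pipeline rather than only at the final output.

    import pandas as pd

    runs = pd.DataFrame({
        "image_caption": ["a red dress on a model", "blurry product photo"],
        "ocr_text": ["SUMMER SALE 30% OFF", ""],
        "final_answer": ["Dress is on sale", "Unable to read label"],
    })

    # Computed columns expose intermediate signals for human review and evals.
    runs["ocr_empty"] = runs["ocr_text"].str.len() == 0
    runs["answer_grounded"] = runs.apply(
        lambda r: "sale" in r.final_answer.lower() if "SALE" in r.ocr_text else True,
        axis=1,
    )
    print(runs[["ocr_empty", "answer_grounded"]])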

What You’ll Learn:
Regardless of the model or SDKs you choose to build on top of, building the right scaffolding around it will open the box and give you control, visibility, and interpretability of your multimodal AI workflows.

Talk: Video Intelligence Is Going Agentic

Presenter:
James Le, Head of Developer Experience, TwelveLabs

About the Presenter:
James Le is currently leading Developer Experience at Twelve Labs – a startup building multimodal foundation models for video understanding. Previously, he has worked at the nexus of enterprise ML/AI and data infrastructure. He also hosted a podcast that features raw conversations with founders, investors, and operators in the space.

Talk Track: Multimodal Systems in Production

Technical Level: 4

Talk Abstract:
While 90% of the world’s data exists in video format, most AI systems treat video like static images or text—missing crucial temporal relationships and multimodal context. This talk explores the paradigm shift toward agentic video intelligence, where AI agents don’t just analyze video but actively reason about content, plan complex workflows, and execute sophisticated video operations.

Drawing from real-world implementations including MLSE’s 98% efficiency improvement in highlight creation (reducing 16-hour workflows to 9 minutes), this session demonstrates how video agents combine multimodal foundation models with agent architectures to solve previously intractable problems. We’ll explore the unique challenges of video agents—from handling high-dimensional temporal data to maintaining context across multi-step workflows—and showcase practical applications in media, entertainment, and enterprise video processing.

Attendees will learn how to architect video agent systems using planner-worker-reflector patterns, implement transparent agent reasoning, and design multimodal interfaces that bridge natural language interaction with visual media manipulation.
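
Structurally, a planner-worker-reflector loop can be sketched like this (the function names and stubbed stages are illustrative, not the TwelveLabs API):

    def planner(goal: str) -> list[str]:
        # Decompose a video task into ordered steps.
        return [f"index footage for: {goal}", "rank candidate clips", "assemble highlight reel"]

    def worker(step: str) -> str:
        # Execute one step (search, clip, edit) via video models/tools.
        return f"done: {step}"

    def reflector(goal: str, results: list[str]) -> bool:
        # Check outputs against the goal; trigger replanning if unmet.
        return len(results) >= 3

    def run_video_agent(goal: str, max_rounds: int = 3) -> list[str]:
        results: list[str] = []
        for _ in range(max_rounds):
            results = [worker(step) for step in planner(goal)]
            if reflector(goal, results):
                break  # goal satisfied; otherwise replan with feedback
        return results

    print(run_video_agent("game-winning plays from last night"))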

What You’ll Learn:
1. Why traditional approaches fail: Understanding the fundamental limitations of applying text/image AI techniques to video, and why agentic approaches are necessary for complex video understanding.

2. Video agent architecture patterns: How to design and implement planner-worker-reflector architectures that can maintain context across complex multi-step video workflows.

3. Practical implementation strategies: Real-world approaches to building transparent agent reasoning, handling multimodal interfaces, and orchestrating video foundation models.

4. Business impact and ROI: Concrete examples of dramatic efficiency improvements and how to identify high-impact use cases in their own organizations

Talk: A Practical Guide to Fine-Tuning and Deploying Vision Models

Presenter:
Zachary Carrico, Senior Machine Learning Engineer, Apella

About the Presenter:
Zac is a Senior Machine Learning Engineer at Apella, specializing in machine learning products for improving surgical operations. He has a deep interest in healthcare applications of machine learning, and has worked on cancer and Alzheimer’s disease diagnostics. He has end-to-end experience developing ML systems: from early research to serving thousands of daily customers. Zac is an active member of the Data and ML community, having presented at conferences such as Ray Summit, TWIML AI, Data Day, and MLOps & GenAI World. He has also published eight journal articles. His passion lies in advancing ML and streamlining the deployment and monitoring of models, reducing complexity and time. Outside of work, Zac enjoys spending time with his family in Austin and traveling the world in search of the best surfing spots.

Talk Track: ML Training Lifecycle

Technical Level: 3

Talk Abstract:
As video foundation models become integral to applications in healthcare, security, retail, robotics, and consumer applications, MLOps teams face a new class of challenges: how to efficiently fine-tune these large models for domain-specific tasks without overcomplicating infrastructure, overloading compute resources, or degrading real-time performance.

This session presents tips for selecting and intelligently fine-tuning video foundation models at scale. Using a state-of-the-art vision foundation model, we’ll cover techniques for efficient data sampling, temporal-aware augmentation, adapter-based tuning, and scalable optimization strategies. Special focus will be given to handling long and sparse videos, deploying chunk-based inference, and integrating temporal fusion modules with minimal latency overhead. Attendees of this talk will come away with strategies for quickly deploying optimally fine-tuned foundation models.
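
For a taste of adapter-based tuning, here is a minimal LoRA sketch using Hugging Face PEFT on a generic vision transformer; the model choice and hyperparameters are illustrative, not the talk’s recipe.

    from transformers import AutoModelForImageClassification
    from peft import LoraConfig, get_peft_model

    base = AutoModelForImageClassification.from_pretrained(
        "google/vit-base-patch16-224-in21k", num_labels=2
    )
    config = LoraConfig(
        r=8, lora_alpha=16, lora_dropout=0.1,
        target_modules=["query", "value"],  # attention projections in ViT
    )
    model = get_peft_model(base, config)
    model.print_trainable_parameters()
    # Only the small adapter matrices train; the frozen backbone keeps
    # compute and storage low, and adapters can be swapped per task.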

What You’ll Learn:
Attendees will learn practical strategies for efficiently fine-tuning and deploying video foundation models at scale. They’ll take away techniques for data sampling, temporal-aware augmentation, adapter-based tuning, and scalable optimization—plus methods to handle long/sparse videos and deploy low-latency, chunk-based inference with temporal fusion.

Talk: Why is ML on Kubernetes Hard? Defining How ML and Software Diverge

Presenters:
Donny Greenberg, Co-Founder / CEO, Runhouse | Paul Yang, Member of Technical Staff, Runhouse

About the Presenters:
Donny is the co-founder and CEO of 🏃‍♀️Runhouse🏠. He was previously the product lead for PyTorch at Meta, supporting the AI community across research, production, OSS, and enterprise. Notable projects include TorchRec, the open-sourcing of Meta’s large-scale recommendations infra, TorchArrow & TorchData, PyTorch’s next generation of data APIs.

At Runhouse, Paul is helping to build, test, and deploy Kubetorch at leading AI labs and enterprises for RL, training, and inference use cases. Previously, he worked across a range of ML/DS and infra domain areas, from language model tuning and evaluations for contextually aware code generation to productizing causal ML / pseudo-causal inference.

Talk Track: ML Training Lifecycle

Technical Level: 2

Talk Abstract:
Mature organizations run ML workloads on Kubernetes, but implementations vary widely, and ML engineers rarely enjoy the streamlined development and deployment experiences that platform engineering teams provide for software engineers. Making small changes takes an hour to test, and moving from research to production frequently takes multiple weeks – these unergonomic and inefficient processes are unthinkable for software, but standard in ML. To explain this, we first trace the history of ML platforms and how early attempts like Facebook’s FBLearner as “notebooks plus DAGs” led to incorrect reference implementations. Then we define the critical ways that ML diverges from software, such as the inability to do local testing due to data size and acceleration needs (GPU), heterogeneity in distributed frameworks and their requirements (Ray, Spark, PyTorch, TensorFlow, Dask, etc.), and non-trivial observability and logging. Finally, we propose a solution, Kubetorch, which bridges between an iterable and debuggable Pythonic API for ML Engineers and Kubernetes-first scalable execution.

What You’ll Learn:
ML, especially at sophisticated organizations, is done on Kubernetes. However, there are no definitive reference implementations and well-used projects to date for ML-on-Kubernetes like Kubeflow have had mixed reactions from the community. Kubetorch is an introduction of a novel compute platform that is Kubernetes-native that offers a great, iterable, and debuggable interface into powerful compute for developers, without introducing new pitfalls of brittle infrastructure or long deployment times. In short, Kubetorch is a recognition that ML teams are demanding better platform engineering (rather than “ML Ops” / DevOps) and the right abstraction over Kubernetes is necessary to achieve this.

Talk: Building Multi-Cloud GenAI Platforms without The Pains

Presenter:
Romil Bhardwaj, Co-creator, SkyPilot

About the Presenter:
Romil Bhardwaj is the co-creator of SkyPilot, a widely adopted open-source project that enables running AI workloads seamlessly across multiple cloud platforms. He completed his Ph.D. in Computer Science at UC Berkeley’s RISE Lab, advised by Ion Stoica, focusing on large-scale systems and resource management for machine learning. Romil’s work, recognized with multiple patents, 1,100+ citations in top conferences, and awards such as the USENIX ATC 2024 Distinguished Artifact Award and ACM BuildSys 2017 Best Paper, builds on a strong foundation in both academia and industry. He was previously a contributor to the Ray project, and a Research Fellow at Microsoft Research, where he developed systems for machine learning and wireless networks, including award-winning projects and granted patents. He remains an active reviewer and speaker at leading systems and AI venues.

Talk Track: LLMs on Kubernetes

Technical Level: 2

Talk Abstract:
GenAI workloads are redefining how AI platforms are built. Teams can no longer rely on a single cloud to satisfy their GPU needs, infra costs are growing and productivity of ML engineers is paramount. Going multi-cloud secures GPU capacity, reduces costs and eliminates vendor lock-in, but introduces operational complexity that can slow down ML teams.

This talk is a hands-on guide to building a multi-cloud AI platform that unifies cloud VMs and Kubernetes clusters across Hyperscalers (AWS, GCP, and Azure), Neoclouds (Coreweave, Nebius, Lambda), and on-premise clusters into a single compute abstraction. We’ll walk through practical implementation details including workload scheduling strategies based on resource availability and cost, automated cloud selection for cost optimization, and handling cross-cloud data movement and dependency management. This approach lets ML engineers use the same interface for both interactive development sessions and large-scale distributed training jobs, enabling them to focus on building great AI products rather than wrestling with cloud complexity.
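
As a sketch of the single compute abstraction idea, SkyPilot’s Python API lets a job declare what it needs and leaves provider choice to the scheduler (the fields here are illustrative):

    import sky

    task = sky.Task(
        setup="pip install -r requirements.txt",
        run="python train.py --epochs 10",
    )
    # Ask for the accelerator, not a provider; SkyPilot searches the
    # enabled clouds and Kubernetes clusters for available, low-cost
    # capacity that satisfies the request.
    task.set_resources(sky.Resources(accelerators="A100:8"))

    sky.launch(task, cluster_name="train-a100")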

What You’ll Learn:
Multi-cloud solves GenAI’s capacity and cost challenges; the right abstraction layer makes it easy for infra teams and researchers alike.

Talk: Gradio: The Web Framework for Humans and Machines

Presenter:
Freddy Boulton, Open Source Software Engineer, Hugging Face

About the Presenter:
Freddy Boulton, an Open Source Engineer at Hugging Face, brings six years of experience in developing tools that simplify AI sharing and usage. He’s a core maintainer of Gradio, an open-source Python package for building production-ready AI web applications. His latest work focuses on making Gradio applications MCP-compliant, enabling Python developers to create seamless, beautifully designed web interfaces for their AI models that integrate with any MCP client without additional configuration.

Talk Track: Augmenting Workforces with Agents

Technical Level: 2

Talk Abstract:
The Model Context Protocol (MCP) has ushered in a new paradigm, enabling applications to be accessible to AI agents. But shouldn’t these same applications be just as accessible and intuitive for humans? What if building a user-friendly interface for people could automatically create a powerful interface for machines too? This presentation introduces Gradio as The Web Framework for Humans and Machines. We’ll explore how Gradio allows developers to build performant and delightful web UIs for human users, while simultaneously, thanks to its automatic Model Context Protocol (MCP) integration, generating a fully compliant and feature-rich interface for AI agents.

Discover how Gradio simplifies the complexities of MCP, offering “batteries-included” functionality like robust file handling, real-time progress updates, and authentication, all with minimal additional effort. We’ll also highlight the Hugging Face Hub’s role as the world’s largest open-source MCP “App Store,” showcasing how Gradio-powered Spaces provide a vast ecosystem of readily available AI tools for LLMs. Join us to learn how Gradio uniquely positions you to develop unified AI applications that serve both human users and intelligent agents.
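As a concrete illustration of that dual interface, here is a minimal sketch built on Gradio’s documented `mcp_server` launch flag (the example function is ours, not from the talk):

```python
import gradio as gr

def letter_counter(word: str, letter: str) -> int:
    """Count how many times `letter` appears in `word`."""
    return word.lower().count(letter.lower())

demo = gr.Interface(
    fn=letter_counter,
    inputs=["text", "text"],
    outputs="number",
    title="Letter Counter",
)

# One flag, two audiences: a web UI for humans, and the same function
# exposed as an MCP tool that any MCP-compliant agent can call.
demo.launch(mcp_server=True)
```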

What You’ll Learn:
Developers can build performant, feature-rich UIs for AI models entirely in Python with Gradio. These apps can be easily shared with human users as well as plugged into any MCP-compliant AI agent. Write once, deploy for truly every possible user.

Talk: LLM Inference: A Comparative Guide to Modern Open-Source Runtimes

Presenter:
Aleksandr Shirokov, Team Lead MLOps Engineer, Wildberries

About the Presenter:
My name is Aleksandr Shirokov, and I am a T3 full-stack AI software engineer with 5+ years of experience and team-leadership competence. I currently lead the MLOps team in the RecSys department at Wildberries, a world-famous marketplace, launching AI products and building ML infrastructure and tools for 300+ ML engineers. My team and I support the full ML lifecycle, from research to production, and work closely with real user-facing products, directly impacting business metrics. See https://aptmess.io for more info.

Talk Track: LLMs on Kubernetes

Technical Level: 3

Talk Abstract:
In this session, we’ll share how our team built and battle-tested a production-grade LLM serving platform using vLLM, Triton TensorRT-LLM, Text Generation Inference (TGI), and SGLang. We’ll walk through our custom benchmark setup, the trade-offs across frameworks, and when each one makes sense depending on model size, latency, and workload type.

We’ll cover how we implemented HPA for vLLM, reduced cold-start times with Tensorizer, co-located multiple vLLM models in a single pod to save GPU memory, and added lightweight SAQ-based queue wrappers for fair and efficient request handling. To manage usage and visibility, we wrapped all endpoints with Kong, enabling per-user rate limits, token quotas, and usage observability.

Finally, we’ll share which LLM and VLM models are running in production today (we are serving DeepSeek R1‑0528 in production), and how we maintain flexibility while keeping costs and complexity in check. If you’re exploring LLM deployment, struggling with infra choices, or planning to scale up usage, this talk will help you avoid common pitfalls, choose the right stack, and design a setup that truly fits your use case.
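For readers new to these runtimes, the simplest entry point looks something like the vLLM offline-inference sketch below (the model choice is illustrative; the production setups discussed in the talk add autoscaling, queuing, and gateway layers on top):

```python
# Minimal vLLM offline inference; serving stacks build on the same engine.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # small model keeps the sketch cheap
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["Explain KV-cache paging in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```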

What You’ll Learn:
There’s no one-size-fits-all LLM serving stack – we’ve benchmarked, deployed, and optimized multiple runtimes in production, and we’ll share what works, when, and why, so you can build the right setup for your use case.

Prerequisite Knowledge:
Basic knowledge of NLP transformers, Python, and Docker

Talk: RAG Architecture at CapitalOne

Presenter:
Vaibhav Misra, Director – Distinguished Engineer, CapitalOne

About the Speaker:
A hands-on technologist and engineering leader with ~20 years of experience and a proven track record of delivering successful large-scale, cloud-based, scalable, robust, secure, and fault-tolerant enterprise distributed systems that meet evolving business requirements and process hundreds of TB of data daily across thousands of customers.

Experience building high-performance engineering teams, with 8+ years of technical leadership: providing architecture design, influencing product roadmaps, and setting technical direction.

Driven by a passion for excellence, he continuously upskills on the latest in the technology world and provides technical guidance, coaching, and mentorship to grow other technical leaders, with a proven ability to lead by influence.

Experience collaborating with cross-functional stakeholders and product and engineering leadership to prioritize architectural and product roadmap items and to build technology strategies across multiple teams, ensuring alignment with business objectives.

Experience handling data at scale in the cloud, employing various storage technologies to provide secure and reliable cloud solutions that encrypt data in transit as well as at rest.

Experience building data-intensive applications on top of AI/ML/LLMs.

Talk Track: Data Engineering in an LLM era

Talk Technical Level: 2/7

Talk Abstract:
This session covers:
- Shortcomings of LLMs and how RAG addresses them
- RAG use cases
- Building a RAG data pipeline with vector search (see the sketch below)
- Combining RAG with prompt engineering and fine-tuning
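As a minimal, generic illustration of the vector-search retrieval step (our sketch, not CapitalOne’s architecture; it assumes the sentence-transformers package and invented example documents):

```python
# Generic RAG retrieval sketch: embed documents, rank by cosine similarity,
# and prepend the top hits to the LLM prompt to ground its answer.
import numpy as np
from sentence_transformers import SentenceTransformer

documents = [  # invented examples
    "Wire transfers over $10,000 require additional verification.",
    "Credit card disputes must be filed within 60 days of the statement.",
    "Savings accounts are limited to six withdrawals per month.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = model.encode(documents, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Embed the query and return the k most similar documents."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q  # cosine similarity, since vectors are normalized
    return [documents[i] for i in np.argsort(-scores)[:k]]

context = "\n".join(retrieve("How long do I have to dispute a charge?"))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: ..."
print(prompt)
```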

What You’ll Learn:
The shortcomings of GenAI and how to overcome them.

Talk: The Rise of Self-Aware Data Lakehouses

Presenter:
Srishti Bhargava, Software Engineer, Amazon Web Services

About the Speaker:
I’m Srishti! I’m a software engineer at AWS where I work on data platforms, focusing on systems like Apache Iceberg and SageMaker Lakehouse. I help teams build analytics and machine learning solutions that actually work at scale – turning messy data into something useful.
I really care about making data engineering more approachable. A lot of modern data tools feel unnecessarily complex, so I write about the practical stuff: how to keep tables performing well, handle schema changes gracefully, and build systems that don’t break in production.
Outside of work, I love hiking and catching sunrises when I can. I also spend a lot of time cooking – it’s how I relax and unwind. There’s something satisfying about taking simple ingredients and making something good with them. Some of my best ideas actually come to me while I’m in the kitchen, just taking things slow and enjoying the process.

Talk Track: Data Engineering in an LLM era

Talk Technical Level: 2/7

Talk Abstract:
If you’re managing more than 50 tables and a handful of data models, you’ve probably felt the pain. Schema changes break production. Impact analysis takes hours. New engineers spend weeks figuring out what data exists and how it connects.
In this session, we’ll show you how to build an AI assistant that understands your data platform. Not just another chatbot, but a system that can analyze your schemas, parse dependencies, and predict exactly which models will break when you change a column.
We’ll demonstrate a working implementation that extracts metadata from Apache Iceberg tables, analyzes SQL dependencies, and creates an AI assistant that answers questions like:
- Which tables are burning through our storage budget?
- What’s the blast radius if this critical system goes down?
- Where is all our customer PII hiding across 500 tables?
- Which data pipelines haven’t been touched in months and might be zombie processes?
- Which tables in the data lakehouse can benefit from Iceberg compaction?
This is analysis that would otherwise take days of manual detective work and complex queries. The result is a powerful, natural-language interface for data discovery.
Attendees will see live examples of querying table schemas and identifying datasets using simple English prompts, leaving with a practical blueprint for leveraging LLMs to unlock the full potential of their data infrastructure in production settings.
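As a hedged sketch of the first step, schema extraction with pyiceberg (the catalog configuration is assumed; a real implementation would add SQL dependency parsing and embeddings on top):

```python
# Pull table/column metadata out of an Iceberg catalog with pyiceberg.
from pyiceberg.catalog import load_catalog

catalog = load_catalog("default")  # assumes a configured catalog
metadata = []
for namespace in catalog.list_namespaces():
    for table_id in catalog.list_tables(namespace):
        table = catalog.load_table(table_id)
        metadata.append({
            "table": ".".join(table_id),
            "columns": [field.name for field in table.schema().fields],
        })

# `metadata` can now be embedded and put behind an LLM so questions like
# "where does customer PII live?" become natural-language queries.
print(metadata)
```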

What You’ll Learn:
1. The metadata problem is becoming worse, not better: as organizations store large amounts of data across complex systems, it’s getting harder to derive non-trivial insights from that data.
2. LLMs can actually understand your data architecture.
3. Small and simple changes in how you structure your tables can be extremely beneficial for your organization.
4. This approach scales – manual approaches don’t. At 10 tables, spreadsheets or manual queries work fine, but at the scale organizations operate today, only an LLM-powered approach can keep up with the complexity.
5. This approach can be integrated into existing systems today. We’ll show you how to extract metadata from real Apache Iceberg tables, analyze dependencies, create embeddings, and build systems that work with your current data stack.
6. Metadata contains way more business value than we realize. The schemas, dependencies and usage patterns tell stories about performance bottlenecks, governance gaps, and business impact that most of us are completely missing.

Talk: From Zero to One: Building AI Agents From The Ground Up

Presenter:
Federico Bianchi, Senior ML Scientist, TogetherAI

About the Presenter:
Federico Bianchi is a Senior ML Scientist at TogetherAI, working on self-improving agents. He was a post-doc at Stanford University. His work has been published in major journals such as Nature and Nature Medicine and conferences such as ICLR, ICML and ACL.

Talk Track: Augmenting Workforces with Agents

Technical Level: 4

Talk Abstract:
What does it take to build a truly autonomous AI agent, from scratch and in the open? In this talk, I’ll share how we’ve developed agents capable of executing full analytical workflows, from raw data to insights. I’ll walk through key principles for designing robust, transparent agents that reason, reflect, and act in complex scientific domains. We’ll explore how architectural choices, tool use, and learning approaches—including reinforcement learning—can be combined to build agents that improve over time and generalize to new tasks.

What You’ll Learn:
Building agents is easy, but it requires careful thinking about the context in which the agents will be embedded.

Talk: Where Experts Can't Scale: Orchestrating AI Agents to Structure the World's Product Knowledge

Presenters:
Kshetrajna Raghavan, Principal Machine Learning Engineer, Shopify | Ricardo Tejedor Sanz, Senior Taxonomist, Shopify

About the Presenters:
Kshetrajna is a Principal Machine Learning Engineer at Shopify with 15 years of experience delivering AI solutions across technology, healthcare, and retail. He has led initiatives in large-scale product search, computer vision, natural language processing, and predictive modeling—translating cutting-edge research into systems used by millions. Known for his pragmatic approach, he focuses on building scalable, high-impact machine learning products that drive measurable business results.

Ricardo Tejedor Sanz is a Senior Taxonomist at Shopify with a distinctive background spanning legal experience, linguistics, and machine learning. With diverse analytical experience across international contexts and master’s degrees in English Literature and Audiovisual Translation, plus fluency in four languages, Ricardo brings exceptional rigor and customer-focused problem-solving to taxonomy challenges. He evolved from traditional manual taxonomy methods built on deep market research, competitive analysis, and semantic understanding, to pioneering AI-driven classification systems benefiting millions of merchants globally.

Talk Track: Augmenting Workforces with Agents

Technical Level: 2

Talk Abstract:
How do you maintain a product taxonomy spanning millions of items across every industry—from guitar picks to industrial sensors—when no human team could possibly possess expertise in all these domains? At Shopify, we faced this exact challenge and built an AI agentic system that transforms an impossible human task into a scalable, automated workflow.

In this talk, we reveal how we orchestrate multiple specialized AI agents to analyze, improve, and validate taxonomy changes at unprecedented scale.

You’ll discover:
– How parallel AI agents can augment human expertise across domains where deep knowledge is impossible to maintain
– The architecture patterns that enable agents to work together while maintaining quality and consistency
– Why LLM-as-judge systems are game-changers for scaling quality control
– Critical lessons learned from production deployment, including surprising failures and how we fixed them

We share real metrics showing how this approach transformed a years-long manual process into days of AI-augmented work, and provide actionable insights you can apply to your own “impossible” classification and curation challenges.
Whether you’re dealing with content moderation, data classification, or any task requiring expertise across vast domains, you’ll leave with concrete strategies for building AI agent systems that scale human judgment beyond traditional limitations.
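To illustrate the LLM-as-judge pre-screening pattern the abstract mentions, here is a generic sketch (ours, not Shopify’s system; the OpenAI client is used only as an example backend, and the prompt and verdict schema are invented):

```python
# Generic LLM-as-judge: approve, auto-fix minor issues, or escalate to a
# human, so reviewers only see the hard cases.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

JUDGE_PROMPT = """You are reviewing a proposed product-taxonomy change.
Return JSON: {{"verdict": "approve" | "fix" | "escalate",
"fixed_category": string or null, "reasoning": string}}.

Product: {product}
Proposed category: {category}"""

def judge(product: str, category: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(product=product, category=category),
        }],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)

verdict = judge("Nylon guitar picks, 12-pack", "Musical Instrument Accessories")
if verdict["verdict"] == "escalate":
    print("Route to human reviewer:", verdict["reasoning"])
else:
    print("Auto-handled:", verdict)
```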

What You’ll Learn:
1. Decompose “Impossible” Into Specialized Agents
Don’t build one AI to know everything. Build many agents that each know something, then orchestrate them.

2. LLM-as-Judge Unlocks Scale
Shifting from “humans review 100%” to “AI pre-screens, humans see 10%” is the game-changer. Key: Let AI fix minor issues, not just reject.

3. Production Lessons Are Brutal
– Prompt overload breaks reasoning
– Always build fallbacks for when services fail

4. Trust Through Transparency
Every AI decision needs reasoning, audit trails, and escalation paths. No black boxes.

5. The Meta-Lesson
Scale isn’t about replacing humans—it’s about amplifying the expertise you have across domains you couldn’t possibly cover.

Talk: Code-Guided Agents for Legacy System Modernization

Presenter:
Calvin Smith, Senior Researcher, Agent R&D, OpenHands

About the Speaker:
Calvin Smith is a software engineer and researcher who spent years developing formal methods for generating and understanding code at scale. He joined OpenHands to apply these techniques to real-world software engineering challenges. His current focus: building AI agents that leverage formal methods to modernize legacy codebases and pushing the boundaries of what autonomous agents can accomplish in software engineering.

Talk Track: AI Agents for Developer Productivity

Talk Technical Level: 2/7

Talk Abstract:
Legacy code modernization often fails because we try to boil the ocean. After early attempts at using autonomous agents for whole-codebase transformations resulted in chaos, we developed a novel approach: combine static dependency analysis with intelligent agents to break modernization into reviewable, incremental chunks. This talk explores how we use static-analysis tools to understand codebases, identify optimal modernization boundaries, and orchestrate multiple agents that collaboratively transform the code, turning an impossible problem into a series of manageable PRs.
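As a hedged illustration of the static-analysis half of that loop (our sketch, not OpenHands’ tooling), the snippet below builds a Python import graph and orders modernization units bottom-up so each unit can become one agent task and one reviewable PR:

```python
# Build a static import graph, collapse cycles, and order units bottom-up.
import ast
from pathlib import Path

import networkx as nx

def import_graph(src_root: str) -> nx.DiGraph:
    """Edge A -> B means module A imports module B."""
    g = nx.DiGraph()
    for path in Path(src_root).rglob("*.py"):
        mod = path.stem
        g.add_node(mod)
        for node in ast.walk(ast.parse(path.read_text())):
            if isinstance(node, ast.Import):
                for alias in node.names:
                    g.add_edge(mod, alias.name.split(".")[0])
            elif isinstance(node, ast.ImportFrom) and node.module:
                g.add_edge(mod, node.module.split(".")[0])
    return g  # external imports appear as leaf nodes; filter them in practice

g = import_graph("legacy_app")  # hypothetical source tree
units = nx.condensation(g)  # each import cycle collapses into one unit
# Reverse topological order = dependencies before dependents, so every
# chunk is modernized before the code that relies on it.
for scc_id in reversed(list(nx.topological_sort(units))):
    print(sorted(units.nodes[scc_id]["members"]))
```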

What You’ll Learn:
The solution space for AI-automated software engineering extends beyond “AI for code” or “code for AI”. It’s about creating feedback loops where static analysis, AI agents, and human expertise continuously inform and enhance each other.