Free Virtual October 6-7 | In-person October 8-9

Free Virtual Summit | October 6-7, 2025
Ticketed In-Person Summit | October 8-9, 2025 | Austin Renaissance Hotel

6th Annual MLOps World | GenAI Summit 2025

The event that takes AI/ML & agentic systems from concept to large-scale production

2 Days • 16 Tracks • 75 Sessions • Vibrant Expo

Why attend: Optimize & Accelerate

Build optimal strategies

Learn emerging techniques and approaches shared by leading teams who are actively scaling ML, GenAI, and agents in production.

Increase project efficiency

Minimize project risks, delays, and missteps by learning from case studies that set new standards for impact, quality, and innovation. Explore tools for agent-driven apps, multi-agent systems, and AI-assisted development.

Make better decisions

Make better, faster decisions with lessons and pro tips from the teams shaping ML, GenAI, and agentic AI systems in production.

2 Days of Context, Insights, and Connections

Learn from leading minds, sharpen your skills, and connect with innovators driving safe and effective AI in the real world.

Pre-Event
  • Virtual Summit
  • Super Early Bird Event
Day 1
  • Summit:
    • Talks, Panels, & Workshops
  • Expo:
    • Lightning Talks
    • Brain Dates
    • Community Square
    • Startup Zone
    • Vendor Booths
  • Opening Party
Day 2
  • Keynote
  • Summit Day 2
  • Expo Day 2

Why attend: Connect & Grow

Grow industry influence

Join Brain Dates, Speaker’s Corner, Community Square, or deliver a talk to share your expertise and amplify your industry impact.

Equip your team to win

Stay ahead of fast-moving competitors by giving your team the insights, skills, and contacts they need to exceed expectations.

Build career momentum

Make every hour count by using our event app to hyper-focus on the right topics and people who will help shape your future in AI.

2025 Summit: Full-Spectrum AI

All themes, talks, and workshops curated by top AI practitioners to deliver real-world value. Explore sessions

2025 THEME: AI Agents & Agentic Workforces

AI Agents for Developer Productivity

This track highlights practical uses of agents to streamline dev workflows—from debugging and code generation to test automation and CI/CD integration.

Agents can now assist in model testing, monitoring, and rollback decisions. The track focuses on how teams are using autonomous systems to harden their ML deployment workflows.

This track explores how teams are combining human oversight with semi-autonomous agents to scale support, operations, and decision-making across the business.

This track explores the design patterns shaping modern agents, from prompt engineering and tool integration to memory and planning strategies, focusing on real-world systems, not just frameworks. It also covers the infrastructure, safety checks, and governance required to deploy agents reliably and securely in production environments, with expert presenters sharing their insights on the challenges of running agents at scale.

2025 THEME: MLOps & Organizational Scale

Governance, Auditability & Model Risk Management

This track covers how teams manage AI risk in production—through model governance, audit trails, compliance workflows, and strategies for monitoring model behavior over time.

Not every team has a platform squad or unlimited infra budget. This track shares practical approaches to shipping ML with lean teams—covering lightweight tooling, automation shortcuts, and lessons from teams doing more with less.

Security doesn’t end at deployment. This track covers threat models, model hardening, data protection, and supply chain risks across the entire ML lifecycle.

Training isn’t just about epochs and GPUs. Talks focus on reproducibility, retraining triggers, pipeline automation, and how teams manage iterative experimentation at scale.

This track focuses on scoping and delivering complex AI projects, exploring how teams are adapting their scoping processes to account for LLMs, agents, and evolving project boundaries in fast-moving environments. It also dives into the strategies behind AI product development, from aligning business goals to driving successful delivery and scaling. Expert presenters will share practical insights on navigating the complexities of AI product strategy and execution.

2025 THEME: LLM Infrastructure & Operations

LLMs on Kubernetes

This track covers the key architectural choices and infra strategies behind scaling AI and LLM systems in production—from bare metal to Kubernetes, GPU scheduling to inference optimization. Learn what it really takes to build and operate reliable GenAI and agent platforms at scale.

This 2025 track covers real-world patterns and pitfalls of running LLMs on Kubernetes. Topics include GPU scheduling, autoscaling, memory isolation, and managing cost and complexity at scale.

This 2025 track explores the realities of deploying ML in regulated, resource-constrained, or air-gapped environments. Talks focus on infrastructure design, data access, and managing tradeoffs when the cloud isn’t an option.

What does it mean to observe an LLM in production? This 2025 track unpacks logging, tracing, token-level inspection, and metrics that actually help teams debug and improve deployed models.

This track addresses the performance, cost, and reliability challenges of running inference at scale, exploring techniques from token streaming and caching strategies to hardware-aware scheduling. It also delves into low-level optimizations, model compilation, and inference kernels, covering everything from Triton and ONNX to custom CUDA solutions. Expert presenters will share insights into the systems that power fast, efficient, and production-ready AI inference across modern hardware.

From Triton to ONNX to custom CUDA, this track explores how inference gets faster. Talks focus on low-level optimization, compilation, and maximizing performance on modern hardware.

Our Expo is where innovation, ideas, and connections come to life

Transform from attendee to active participant by leveling up your professional contacts, exchanging ideas, and even grabbing the mic to share a passion project.

Make New Connections

Connect with AI Practitioners


40+ Technical Workshops and Industry Case Studies

Speakers

Meet the experts bringing techniques, best practices, and strategies to this year’s stage.

Claire Longo

Lead AI Researcher, Comet

How Math-Driven Thinking Builds Smarter Agentic Systems

Rajiv Shah

Chief Evangelist, Contextual AI

From Vectors to Agents: Managing RAG in an Agentic World

Irena Grabovitch-Zuyev

Staff Applied Scientist, PagerDuty

Testing AI Agents: A Practical Framework for Reliability and Performance

Eric Riddoch

Director of ML Platform, Pattern AI

Insights and Epic Fails from 5 Years of Building ML Platforms

Linus Lee

EIR & Advisor, AI, Thrive Capital

Agents as Ordinary Software: Principled Engineering for Scale

Niels Bantilan

Chief ML Engineer, Union.ai

A Practical Field Guide to Optimizing the Cost, Speed, and Accuracy of LLMs for Domain-Specific Agents

Aishwarya Naresh Reganti

Founder, LevelUp Labs, Ex-AWS

Why CI/CD Fails for AI, and How CC/CD Fixes It

Tony Kipkemboi

Head of Developer Relations, CrewAI

Building Conversational AI Agents with Thread-Level Eval Metrics

Latest News

Curated by AI Practitioners

All sessions and workshops have been hand-picked by a Steering Committee of fellow AI practitioners who obsess about delivering real-world value for attendees.

Denys Linkov

Event Co-Chair & Head of ML at WiseDocs

“We built this year’s summit around practical takeaways. Not theory but actual workflows, strategies, and the next three steps for your team. We didn’t want another ‘Intro to RAG’ talk. We wanted the things people are debugging, scaling, and fixing right now.”

Volunteering

Apply for the opportunity to get exclusive behind-the-scenes access to the MLOps World experience while growing your network and skills in real-world artificial intelligence.

Austin

Renaissance Austin Hotel

Once again our venue is the beautiful Renaissance Austin Hotel which delivers an exceptional 360 experience for attendees, complete with restaurants, rooftop bar, swimming pool, spa, exercise facilities, and nearby nature walks. Rooms fill up fast, so use our code (MLOPS25) for discounted rates.

Choose Your Email Adventure

Join our Monthly Newsletter to be the first to get expert videos from our flagship events and community offers, including the latest Stack Drops.

Join Summit Updates to learn about event-specific news like ticket promos and agenda updates, as well as invites to join our free online Stack Sessions.

Choose what works best for you and update your email preferences at any time.

Hear From Past Attendees

Partners

Gold Sponsors

Silver Sponsors

Bronze Sponsors

Community Partners

Media Partners


What Your Ticket Includes

Your pass gives you complete access to the full summit experience, both in Austin and online:

  • Full access to Summit sessions – Day 1 (Oct 8) & Day 2 (Oct 9) in Austin
  • Bonus virtual program – live talks and workshops on Oct 6 & 7
  • Hands-on learning – in-person talks, virtual workshops, and skill-building sessions
  • Food & networking – connect with peers over meals, socials, and receptions
  • AI-powered event app – desktop & mobile access for networking and schedules
  • Networking events – structured meetups and community mixers
  • On-demand replays – access to all post-summit videos
  • 30 days of O’Reilly online learning – unlimited access to books, courses, and videos from O’Reilly and 200+ publishers

FAQ

When and where is the event?

The in-person portion of MLOps World | GenAI Summit takes place October 8-9, 2025 at the Renaissance Austin Hotel.

Address: 9721 Arboretum Blvd, Austin, TX 78759, United States. See booking details.

What’s included with my ticket?

Access to all sessions, workshops, networking events, the expo hall, and post-event recordings. Meals and coffee breaks are provided for in-person attendees. Attendees also have access to the official conference app, where you can message speakers, set up Brain Dates, attend parties and social functions, post and search for jobs, and see a list of all the other attendees joining in Austin. Through our official media partnership with O’Reilly, attendees will have digital access to over 60k titles from O’Reilly and 200+ other publishers. Each conference pass includes a 30-day free trial giving you on-demand access to:
  • Live training courses
  • In-depth learning paths
  • Interactive coding environments
  • Certification prep materials
  • Most major AI publications
Is there a virtual component?

Yes. In the lead-up to the main event, we host 2 bonus virtual days featuring skills training and insights from top AI experts, Oct 6-7. Learn more

What kinds of sessions can I expect?

Technical deep dives, case studies, live demos, hands-on workshops, expert panels, and roundtables across 16+ tracks, curated by a volunteer Steering Committee of 75+ leading AI practitioners. You’ll also have the opportunity to schedule 1:1 Brain Dates with speakers and other attendees via the app.

Is there an expo?

Yes. The expo is where you’ll shift from focused learning to active participation and networking, with Brain Dates, Speakers’ Corner, Community Stage, and Startup Zone. You’ll also find exhibits from companies driving the next wave of GenAI and an opening-night reception to connect with peers.

How do I register?

You can register from any of the links on our website, including the button in the header.

Are tickets refundable?

Yes. Tickets are refundable and transferable until 30 days prior to the event. See our ticket policy for details.

Is there a group discount?

Yes. Purchases of multiple tickets receive an additional discount, which may vary depending on timing of purchase.

Who attends the event?

AI Engineers, Agentic Developers, Solution Architects, Full-Stack Developers, enterprise AI teams, startup teams, and AI founders. View our About Page to learn more about the event, including the organizing team, Steering Committee, volunteers, and sponsors.

Will sessions be recorded?

Yes. The majority of presenters grant permission for their sessions to be recorded and shared. These recordings are made available after the event. The best way to be notified when new learning resources are released is by subscribing to our newsletter.

What if I have dietary restrictions?

We’ve got you covered. Let us know during registration and we’ll make arrangements.

How do I apply to speak?

Submit your proposal via the Call for Speakers link in our site header (available ahead of each event) or subscribe to our newsletter for MLOps and other speaking alerts. Learn more

What types of talks are considered?

Our Steering Committee reviews technical deep dives, case studies, roadmaps, and skills workshops covering MLOps, GenAI, LLMOps, AI infrastructure, and agentic systems from across the AI spectrum.

Are speakers compensated?

Speaker slots are unpaid, but all accepted speakers receive a free conference pass and access to networking events.

When are slides due?

Final slide decks are due 3 weeks before the event. Early drafts may be requested for feedback.

Do speakers receive travel support?

Speakers receive a free in-person or virtual pass. We don’t cover travel or lodging, but limited support may be available for nonprofit or academic speakers.

Can I present in person or remotely?

Both options are available. Some tracks are fully virtual, and remote presentations can be pre-recorded or live-streamed.

Will my talk be shared afterward?

Yes. Most sessions are recorded and distributed publicly through our email newsletter, YouTube channel, blog, and social media pages.

What A/V support is provided?

In-person speakers get full A/V support: mic, projector, and a session moderator. Virtual speakers will receive tech-check guidance and support in advance.
What are the sponsorship packages and benefits?
We offer tiered sponsorship packages that include booth space for lead generation, speaking slots, branding on signage, and digital promotion. Unique package extensions are also available.

Visit our sponsor page to get more details and download our Sponsorship Guide, or contact Faraz Thambi at [email protected] to discuss availability and options.

Who will I reach as a sponsor?

Attendees include ML/Data Engineers, Developers, Solution Architects / Principal Engineers, ML/AI Infra Leads, Technical Leaders, and Senior Leadership (Director, VP, C-suite, Founder) decision-makers from startups, scaleups, and enterprises across North America and around the globe.

Are virtual or track-specific sponsorships available?

Yes. Virtual-only and track-specific sponsorships are available. We also offer branding around keynotes, networking lounges, and workshop zones.

Do sponsors get lead capture tools?

With the exception of the Startup Package, all sponsors get lead scanning tools. Virtual sponsors receive opt-in attendee data based on session engagement and resource downloads.

Is booth space included?

Yes. Booth packages vary in size depending on the tier; they range from a 20' x 20' island booth (Platinum) to a 6' x 10' draped booth (Bronze). Please see the guide for full specifications.

Can sponsors host their own sessions or events?

Yes. We offer limited opportunities for sponsor-hosted workshops, roundtables, and after-hours events, pending approval and availability.

Can my company offer discounts or free trials to attendees?

Yes, leading companies can apply to contribute discounts and free trials to our audience of AI/ML practitioners as part of our Stack Drop and Community Code programs. Learn more from our blog or email [email protected]

Talk: How Math-Driven Thinking Builds Smarter Agentic Systems

Presenter:
Claire Longo, Lead AI Researcher, Comet

About the Presenter:
Claire Longo is an AI leader and Mathematician with over a decade of experience in Data Science and AI. She has led cross-functional AI teams at Twilio, Opendoor, and Arize AI and is currently a Lead AI Researcher at Comet. She holds a Bachelor’s in Applied Mathematics and a Master’s in Statistics from The University of New Mexico. Beyond her technical work, Claire is a Speaker, Advisor, YouTuber, and Poker Player. She is dedicated to mentoring Engineers and Data Scientists while championing diversity and inclusion in AI. Her mission is to empower the next generation of AI practitioners.

Talk Track: Evolution of Agents

Technical Level: 3

Talk Abstract:
Everyone’s buzzing about LLMs, but too few are talking about the math that should guide how we apply them to real-world problems. Mathematics is the language of AI, and a foundational understanding of the math behind AI model architectures should drive decisions when we’re building AI systems.

In this talk, I will do a technical deep dive to demystify how different mathematical architectures in AI models can guide us on how and when to use each model type, and how this knowledge can help us design agent architectures and anticipate potential weaknesses in production so we can safeguard against them. I’ll break down what LLMs can do (and where they fall apart), clarify the elusive concept of “reasoning,” and introduce a benchmarking mindset rooted in math and modularity.

To put it all into context, I’ll share a real-world example of an Agentic use case from my own recent project: a poker coaching app that blends an LLM reasoning model as the interface with statistical models analyzing a player’s performance using historical data. This is a strong example of the future of hybrid agents, where LLMs and other mathematical algorithms work together, each solving the part of the problem it’s best suited for. It demonstrates the proper application of reasoning models grounded in their mathematical properties and shows how modular agent design allows each model to focus on the piece of the system it was built to handle.

I’ll also introduce a scientifically rigorous approach to benchmarking and comparing models, based on statistical hypothesis testing, so we can quantify and measure the impact of different models on our use cases as we evaluate and evolve agentic design patterns.
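As a hedged illustration of that benchmarking mindset (not code from the talk), here is a minimal sketch of a paired significance test over per-example eval scores for two candidate models; the model names and scores are hypothetical:

```python
# Hypothetical sketch: comparing two candidate models with a paired
# significance test over per-example eval scores.
import numpy as np
from scipy import stats

# Assume each array holds one quality score per eval example,
# scored on the same examples for both models (paired design).
scores_model_a = np.array([0.82, 0.75, 0.91, 0.68, 0.88, 0.79, 0.85, 0.73])
scores_model_b = np.array([0.78, 0.70, 0.90, 0.61, 0.84, 0.77, 0.80, 0.69])

# Paired t-test: is the mean per-example difference nonzero?
t_stat, p_value = stats.ttest_rel(scores_model_a, scores_model_b)
mean_diff = float(np.mean(scores_model_a - scores_model_b))

print(f"mean improvement: {mean_diff:.3f}, t={t_stat:.2f}, p={p_value:.3f}")
if p_value < 0.05:
    print("Difference is statistically significant at the 5% level.")
else:
    print("No significant difference; gather more eval data before switching.")
```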

Whether you’re building RAG agents, real-time LLM apps, or reasoning pipelines, you’ll leave with a new lens for designing agents. You’ll no longer have to rely on trial and error or feel like you’re flying blind with a black-box algorithm. Foundational mathematical understanding will give you the intuition to anticipate how a model is likely to behave, reduce time to production, and increase system transparency.

What You’ll Learn:
It’s easier than you think to understand the foundational mathematical concepts behind AI, and to use that knowledge to guide you in building better AI systems.

Talk: From Vectors to Agents: Managing RAG in an Agentic World

Presenter:
Rajiv Shah, Chief Evangelist, Contextual AI

About the Presenter:
Rajiv Shah is the Chief Evangelist at Contextual AI with a passion and expertise in Practical AI. He focuses on enabling enterprise teams to succeed with AI. Rajiv has worked on GTM teams at leading AI companies, including Hugging Face in open-source AI, Snorkel in data-centric AI, Snowflake in cloud computing, and DataRobot in AutoML. He started his career in data science at State Farm and Caterpillar.

Rajiv is a widely recognized speaker on AI who has published over 20 research papers, been cited more than 1,000 times, and received over 20 patents. His recent work in AI covers topics such as sports analytics, deep learning, and interpretability.

Rajiv holds a PhD in Communications and a Juris Doctor from the University of Illinois at Urbana-Champaign. While earning his degrees, he received a fellowship in Digital Government from the John F. Kennedy School of Government at Harvard University. He is well known on social media for his short videos, @rajistics, which have received over ten million views.

Talk Track: Agents in Production

Technical Level: 2

Talk Abstract:
The RAG landscape has evolved rapidly: we’ve gone from simple keyword search to semantic embeddings to multi-step agentic reasoning. Across all these approaches, context engineering has emerged as the key to getting the most out of whichever RAG setup fits the problem. This talk helps you understand the right search architecture for your use case.

We’ll examine three distinct architectural patterns: Speedy Retrieval (<500 ms), Accuracy-Optimized RAG (<10 seconds), and Exhaustive Agentic Search (tens of seconds to several minutes). You’ll see how context engineering evolves across these patterns: from basic prompt augmentation in Speed-First RAG, to dynamic context selection and compression in hybrid systems, to full context orchestration with memory, tools, and state management in agentic approaches.

The talk will include a framework for selecting RAG architectures, architectural patterns with code examples, and guidance on practical issues around RAG infrastructure.
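As a rough sketch of what such a selection framework might look like (all functions here are hypothetical stand-ins, not the talk’s actual code), a router can pick one of the three patterns from the query’s latency budget:

```python
# Hypothetical sketch: route a query to one of three architectural
# patterns based on its latency budget. Retrievers are stand-ins.
from dataclasses import dataclass

@dataclass
class Query:
    text: str
    latency_budget_s: float  # how long the caller can wait

def speedy_retrieval(q: Query) -> str:
    # Single vector lookup + prompt augmentation; targets <500 ms.
    return f"[fast answer for: {q.text}]"

def accuracy_optimized_rag(q: Query) -> str:
    # Hybrid search, reranking, context compression; targets <10 s.
    return f"[reranked answer for: {q.text}]"

def exhaustive_agentic_search(q: Query) -> str:
    # Multi-step agent with tools and memory; tens of seconds to minutes.
    return f"[deep-research answer for: {q.text}]"

def route(q: Query) -> str:
    if q.latency_budget_s < 0.5:
        return speedy_retrieval(q)
    if q.latency_budget_s < 10:
        return accuracy_optimized_rag(q)
    return exhaustive_agentic_search(q)

print(route(Query("refund policy for enterprise plans", latency_budget_s=0.3)))
```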

What You’ll Learn:
RAG has matured enough that we can stop chasing the bleeding edge and start making boring, practical decisions about what actually ships.

Points:
– Attendees should leave knowing exactly when to use speedy retrieval vs. agentic search
– Most use cases don’t need agents (and shouldn’t pay for them)
– As retrieval improves, managing the context window becomes the real challenge
– Success isn’t about retrieving more; it’s about orchestrating what you retrieve
– Agentic search can cost 100x more than vector search
– Sometimes “good enough” at 500 ms beats “perfect” at 2 minutes

Talk: Testing AI Agents: A Practical Framework for Reliability and Performance

Presenter:
Irena Grabovitch-Zuyev, Staff Applied Scientist, PagerDuty

About the Presenter:
Irena Grabovitch-Zuyev is a Staff Applied Scientist at PagerDuty and a driving force behind PagerDuty Advance, the company’s generative AI capabilities. She leads the development of AI agents that are transforming how customers interact with PagerDuty, pushing the boundaries of incident response and automation.

With over 15 years of experience in machine learning, Irena specializes in generative AI, data mining, machine learning, and information retrieval. At PagerDuty, she partners with stakeholders and customers to identify business challenges and deliver innovative, data-driven solutions.

Irena earned her graduate degree in Information Retrieval in Social Networks from the Technion – Israel Institute of Technology. Before joining PagerDuty, she spent five years at Yahoo Research as part of the Mail Mining team, where her machine learning solutions for automatic extraction and classification were deployed at scale, powering Yahoo Mail’s backend and processing hundreds of millions of messages daily.

She is the author of several academic articles published at top conferences and the inventor of multiple patents. Irena is also a passionate advocate for increasing representation in tech, believing that diversity and inclusion are essential to innovation.

Talk Track: Agents in Production

Technical Level: 3

Talk Abstract:
As AI agents powered by large language models (LLMs) become integral to production systems, ensuring their reliability and safety is both critical and uniquely challenging. Unlike traditional software, agentic systems are dynamic, probabilistic, and highly sensitive to subtle changes—making conventional testing approaches insufficient.

This talk presents a practical framework for testing AI agents, grounded in real-world experience developing and deploying production-grade agents at PagerDuty. The main focus will be on iterative regression testing: how to design, execute, and refine regression tests that catch failures and performance drifts as agents evolve. We’ll walk through a real use case, highlighting the challenges and solutions encountered along the way.

Beyond regression testing, we’ll cover the additional layers of testing essential for agentic systems, including unit tests for individual tools, adversarial testing to probe robustness, and ethical testing to evaluate outputs for bias, fairness, and compliance. Finally, I’ll share how we’re building automated pipelines to streamline test execution, scoring, and benchmarking—enabling rapid iteration and continuous improvement.

Attendees will leave with a practical, end-to-end framework for testing AI agents, actionable strategies for regression and beyond, and a deeper understanding of how to ensure their own AI systems are reliable, robust, and ready for real-world deployment.
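To make the regression-testing idea concrete, here is a minimal hypothetical harness; `run_agent` and `score` are stand-ins for a real agent call and a real scorer, and this is an illustration of the pattern rather than PagerDuty’s actual framework:

```python
# Hypothetical sketch of iterative regression testing for an agent:
# re-run a fixed suite of prompts after every change and fail if
# quality drops below the recorded baseline.
REGRESSION_SUITE = [
    {"prompt": "Summarize incident INC-1234", "baseline": 0.90},
    {"prompt": "Which service owns checkout errors?", "baseline": 0.85},
]

def run_agent(prompt: str) -> str:
    return f"agent answer for: {prompt}"  # stand-in for the real agent call

def score(prompt: str, answer: str) -> float:
    return 0.9  # stand-in for an LLM-judge or deterministic scorer

def regression_test(tolerance: float = 0.05) -> bool:
    ok = True
    for case in REGRESSION_SUITE:
        s = score(case["prompt"], run_agent(case["prompt"]))
        drifted = s < case["baseline"] - tolerance
        ok = ok and not drifted
        print(f"{case['prompt'][:40]:40s} score={s:.2f} "
              f"baseline={case['baseline']:.2f} "
              f"{'REGRESSION' if drifted else 'OK'}")
    return ok

assert regression_test(), "agent regressed against the baseline suite"
```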

What You’ll Learn:
Attendees will learn a practical, end-to-end framework for testing AI agents—covering correctness, robustness, and ethics—so they can confidently deploy reliable, high-performing LLM-based systems in production.

Talk: Insights and Epic Fails from 5 Years of Building ML Platforms

Presenter:
Eric Riddoch, Director of ML Platform, Pattern AI

About the Presenter:
Eric leads the ML Platform team at Pattern, the largest seller on Amazon.com besides Amazon itself.

Talk Track: ML Collaboration in Large Organizations

Technical Level: 2

Talk Abstract:
Building an internal ML platform becomes a good idea as your number of data scientists, projects, or data volume increases. But the MLOps toolscape is overwhelming. How do you pick tools and set your strategy? How important is drift detection? Should you serve all your models as endpoints? How “engineering-oriented” should your data scientists be?

Join Eric on a tour of 3 ML platforms he has worked on, which have served 14 million YouTubers and the largest 3P seller on Amazon. Eric will share specific architectures, honest takes from epic failures, things that turned out not to be important, and principles for building a platform with great adoption.

What You’ll Learn:
– Principles > tools. Ultimately all MLOps tools cover ~9 “jobs to be done”.
– “Drift monitoring” is overstated. Data quality issues account for most model failures.
– Offline inference exists and is great! Resist the temptation to use endpoints.
– Data lineage is underrated. Helps catch “target leakage” and upstream/downstream errors.
– Cloud GPUs from non-hyperscalers are getting cheaper. You may not need on-prem.
– DS can get away with “medium-sized” data tools for a long time.

Talk: Agents as Ordinary Software: Principled Engineering for Scale

Presenter:
Linus Lee, EIR & Advisor, AI, Thrive Capital

About the Presenter:
Linus Lee is an EIR and advisor at Thrive Capital, where he focuses on AI as part of the product and engineering team and supports portfolio companies on adopting and deploying frontier AI capabilities. He previously pursued independent HCI and machine learning research before joining Notion as an early member of the AI team.

Talk Track: Agents in Production

Technical Level: 2

Talk Abstract:
Thrive Capital’s in-house research engine Puck executes thousands of research and automation tasks weekly, surfacing current events, drafting memos, and triggering workflows unassisted. This allows Puck to power the wide ecosystem of software tools and automations supporting the Thrive team. A single Puck run may traverse millions of tokens across hundreds of documents and LLM calls, and run for 30 minutes before returning multi-page reports or taking actions. With fewer than 10 engineers, we sustain this scale and complexity by embracing four values — composability, observability, statelessness, and changeability — in our orchestration library Polymer. We’ll share patterns that let us quickly add data sources or tools without regressions, enjoy deep observability to root cause every issue in minutes, and evolve the system smoothly as new model capabilities come online. We’ll end by discussing a few future capabilities we hope to unlock next, like RL, durable execution across hours or days, and scaling via parallel search.

What You’ll Learn:
Concretely, attendees will (1) learn design patterns like composition, adapters, and stateless effects that let us write more robust LLM systems faster and more confidently, and (2) see concrete code examples that illustrate these principles in action in a production system. Our goal is not to sell the audience on the library itself, but rather to advocate for the design patterns behind it.

More broadly, in such a rapidly evolving landscape it can feel tempting to trade off classic engineering principles like composability in favor of following frontier capabilities, subscribing to frameworks that obscure implementation detail or lock you into shortsighted abstractions. This talk will explore how we can have both rigor and frontier velocity with the right foundation.
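As a hedged illustration of those principles (the names below are illustrative, not Polymer’s API), composing stateless steps might look like this: each step is a pure function from state to state, so steps compose cleanly and every transition is easy to log, test, and swap out.

```python
# Hypothetical sketch of "agents as ordinary software": stateless,
# composable steps over an explicit state dictionary.
from typing import Callable

State = dict  # e.g. {"query": ..., "docs": [...], "draft": ...}
Step = Callable[[State], State]

def retrieve(state: State) -> State:
    return {**state, "docs": [f"doc about {state['query']}"]}

def draft(state: State) -> State:
    return {**state, "draft": f"memo based on {len(state['docs'])} docs"}

def compose(*steps: Step) -> Step:
    def pipeline(state: State) -> State:
        for step in steps:
            state = step(state)  # statelessness: no hidden mutation
            print(f"after {step.__name__}: {sorted(state)}")  # cheap observability
        return state
    return pipeline

research = compose(retrieve, draft)
print(research({"query": "GPU market"})["draft"])
```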

Talk: A Practical Field Guide to Optimizing the Cost, Speed, and Accuracy of LLMs for Domain-Specific Agents

Presenter:
Niels Bantilan, Chief ML Engineer, Union.ai

About the Presenter:
Niels is the Chief Machine Learning Engineer at Union, a core maintainer of Flyte, an open-source workflow orchestration tool, and creator of Pandera, a data validation and testing tool for dataframes. His mission is to help data science and machine learning practitioners be more productive. He has a Master’s in Public Health Informatics and, prior to that, a background in developmental biology and immunology. His research interests include reinforcement learning, NLP, ML in creative applications, and fairness, accountability, and transparency in automated systems.

Talk Track: Agents in Production

Technical Level: 3

Talk Abstract:
As the dust settles from the initial boom of applications using hosted large language model (LLM) APIs, engineering teams are discovering that while LLMs get you to a working demo quickly, they often struggle in production with latency spikes, context limitations, and explosive compute costs. This session provides a practical roadmap for navigating not only the experiment-to-production gap using small language models (SLMs), but also the AI-native orchestration strategies that will get you the most bang for your buck.
We’ll explore how SLMs (models that range from hundreds of millions to a few billion parameters) offer a compelling alternative for domain-specific applications by trading off the generalization power of LLMs for significant gains in speed, cost-efficiency, and task-specific accuracy. Using the example of an agent that translates natural language into SQL database queries, this session will demonstrate when and how to deploy SLMs in production systems, how to progressively swap out LLMs for SLMs while maintaining quality, and which orchestration strategies help you customize and maintain SLMs in a cost-effective way.

Key topics include:
– Identifying key leverage points: Which LLM calls should you swap out for SLMs first? We’ll cover how to identify speed, cost, and accuracy leverage points in your AI system so that you can speed up inference, reduce cost, and maintain accuracy.
– Speed Optimization: It’s not just about the speed of inference, which SLMs already excel at, it’s also about accelerating experimentation when you fine-tune and retrain SLMs on a specific domain/task. We’ll cover parallelized optimization runs, intelligent caching strategies, and task fanout techniques for both prompt and hyperparameter optimization.
– Cost Management: Avoiding common pitfalls that negate SLMs’ cost advantages, including resource mismatching (GPU vs CPU workloads), infrastructure provisioning inefficiencies, and idle compute waste. Attendees will learn resource-aware orchestration patterns that scale to zero and recover gracefully from failures.
– Accuracy Enhancement: Maximizing domain-specific performance by implementing the equivalent of “AI unit tests” and incorporating them into your experimentation and deployment pipelines. We’ll cover how this can be done with synthetic datasets, LLM judges, and deterministic evaluation functions that help you catch regressions early and often.
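As one hypothetical example of such an “AI unit test” for the text-to-SQL agent described above, a deterministic check can execute generated SQL against a small fixture database and compare result sets, making the check exact and cheap (all names and data here are illustrative):

```python
# Hypothetical sketch of a deterministic "AI unit test" for text-to-SQL:
# run generated SQL against a fixture database and compare results.
import sqlite3

def make_fixture_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(1, 10.0), (2, 25.5), (3, 7.25)])
    return conn

def generate_sql(question: str) -> str:
    # Stand-in for the SLM/LLM call being evaluated.
    return "SELECT SUM(amount) FROM orders"

def test_total_order_amount():
    conn = make_fixture_db()
    got = conn.execute(generate_sql("What is the total order amount?")).fetchall()
    expected = conn.execute("SELECT SUM(amount) FROM orders").fetchall()
    assert got == expected, f"expected {expected}, got {got}"

test_total_order_amount()
print("text-to-SQL unit test passed")
```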

What You’ll Learn:
Attendees will leave with actionable strategies for cost-effective AI deployment, a decision framework for SLM adoption, and orchestration patterns that compound the value of smaller models in domain-specific applications.

Talk: Why CI/CD Fails for AI, and How CC/CD Fixes It

Presenters:
Aishwarya Naresh Reganti, Founder, LevelUp Labs, Ex-AWS | Kiriti Badam, Member of Technical Staff, OpenAI

About the Presenters:
Aishwarya Naresh Reganti is the founder of LevelUp Labs, an AI services and consulting firm that helps organizations design, build, and scale AI systems that actually work in the real world. She has led engagements with multiple Fortune 500 companies and fast-growing startups, helping them move beyond demos to production-grade AI.

Before founding LevelUp Labs, she served as a tech lead at the AWS Generative AI Innovation Center, where she led and implemented AI solutions for a wide range of AWS clients. Her work spanned industries such as ISVs, banking, healthcare, e-commerce, and legal tech, with publicly referenced engagements including Bayer, NFL, Zillow, Kayak, and Imply (creators of Apache Druid).

Aishwarya holds a Master’s in Computer Science from Carnegie Mellon University (MCDS) and has authored 35+ papers in top-tier conferences including NeurIPS, ACL, CVPR, AAAI, and EACL. Her research background includes work on graph neural networks, multilingual NLP, multimodal summarization, and human-centric AI. She has mentored graduate students, served as a reviewer for major AI conferences, and collaborated with research teams at Microsoft Research, NTU Singapore, University of Michigan, and more.

Today, Aishwarya teaches top-rated applied AI courses, advises executive teams on AI strategy, and speaks at global conferences including TEDx, ReWork, and MLOps World. Her insights reach over 100,000 professionals on LinkedIn.

Kiriti Badam is a member of the technical staff at OpenAI, with over a decade of experience designing high-impact enterprise AI systems. He specializes in AI-centric infrastructure, with deep expertise in large-scale compute, data engineering, and storage systems. Prior to OpenAI, Kiriti was a founding engineer at Kumo.ai, a Forbes AI 50 startup, where he led the development of infrastructure that enabled training hundreds of models daily—driving significant ARR growth for enterprise clients. Kiriti brings a rare blend of startup agility and enterprise-scale depth, having worked at companies like Google, Samsung, Databricks, and Kumo.ai.

Talk Track: Agents in Production

Technical Level: 2

Talk Abstract:
AI products break the assumptions traditional software is built on. They’re non-deterministic, hard to debug, and come with a tradeoff no one tells you about: every time you give an AI system more autonomy, you lose a bit of control.

This talk introduces the Continuous Calibration / Continuous Development (CC/CD) framework, designed for building AI systems that behave unpredictably and operate with increasing levels of agency. Based on 50+ real-world deployments, CC/CD helps teams start with low-agency, high-control setups, then scale safely as the system earns trust.
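The CC/CD framework itself isn’t public, but as a loose sketch of the “low-agency, high-control” starting point described above, one might gate autonomous execution on an earned trust score; all names, numbers, and thresholds below are entirely hypothetical:

```python
# Hypothetical sketch: the agent only proposes actions, and a gate
# decides whether to auto-execute based on an earned trust score.
AUTONOMY_THRESHOLD = 0.95  # illustrative; calibrate from eval history

def eval_pass_rate(action_type: str) -> float:
    # Stand-in for a store of per-action eval results over time.
    history = {"send_summary_email": 0.97, "modify_billing": 0.70}
    return history.get(action_type, 0.0)

def handle(proposed_action: dict) -> str:
    if eval_pass_rate(proposed_action["type"]) >= AUTONOMY_THRESHOLD:
        return f"auto-executed: {proposed_action['type']}"
    return f"queued for human approval: {proposed_action['type']}"

print(handle({"type": "send_summary_email"}))  # trusted: auto-executed
print(handle({"type": "modify_billing"}))      # untrusted: human approval
```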

What You’ll Learn:
You’ll learn how to scope capabilities, design meaningful evals, monitor behavior, and increase autonomy intentionally, so your AI product doesn’t collapse under real-world complexity.

Talk: Building Conversational AI Agents with Thread-Level Eval Metrics

Presenters:
Claire Longo, Lead AI Researcher, Comet | Tony Kipkemboi, Head of Developer Relations, CrewAI

About the Presenters:
Tony Kipkemboi leads Developer Advocacy at CrewAI, where he helps organizations adopt AI agents to drive efficiency and strategic decision-making. With a background spanning developer relations, technical storytelling, and ecosystem growth, Tony specializes in making complex AI concepts accessible to both technical and business audiences.

He is an active voice in the AI agent community, hosting workshops, podcasts, and tutorials that explore how multi-agent orchestration can reshape the way teams build, evaluate, and deploy AI systems. Tony’s work bridges product experimentation with real-world application, empowering developers, startups, and enterprises to harness AI agents for measurable impact.

At MLOps World, Tony brings his experience building and scaling with CrewAI to demonstrate how agent orchestration, when paired with rigorous evaluation, accelerates the path from prototype to production.

Claire Longo is an AI leader and Mathematician with over a decade of experience in Data Science and AI. She has led cross-functional AI teams at Twilio, Opendoor, and Arize AI and is currently a Lead AI Researcher at Comet. She holds a Bachelor’s in Applied Mathematics and a Master’s in Statistics from The University of New Mexico. Beyond her technical work, Claire is a Speaker, Advisor, YouTuber, and Poker Player. She is dedicated to mentoring Engineers and Data Scientists while championing diversity and inclusion in AI. Her mission is to empower the next generation of AI practitioners.

Talk Track: Agents in Production

Technical Level: 4

Talk Abstract:
Building modern conversational AI Agents means dealing with dynamic, multi-step LLM reasoning processes and tool calling that cannot always be predicted or debugged at the trace level alone. During the conversation, we need to understand if the AI accomplishes the user’s goal while staying aligned with intent and delivering a smooth interaction. To truly measure quality, we need to trace and evaluate entire conversation sessions.

In this talk, we introduce a practical workflow for designing, orchestrating, and evaluating conversational AI Agents by combining CrewAI as the Agent development framework with Comet Opik for custom eval metrics.

On the CrewAI side, we’ll showcase how developers can define multi-agent workflows, specialized roles, and task orchestration that mirror real-world business processes. We’ll demonstrate how CrewAI simplifies experimentation with different agent designs and tool integrations, making it easier to move from prototypes to production-ready agents.

On the Opik side, we’ll go over how to capture expert human-in-the-loop feedback and build thread-level evaluation metrics. We’ll show how to log traces, annotate sessions with expert insights, and design LLM-as-a-Judge metrics that mimic human reasoning, turning domain expertise into a repeatable feedback loop.

Together, this workflow combines agentic orchestration + rigorous evaluation, giving developers deep observability, actionable insights, and a clear path to systematically improving conversational AI in real-world applications.
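As a minimal hedged sketch of the CrewAI side of that workflow, the snippet below defines a single-agent crew and a stand-in thread-level scorer; the scorer is only a placeholder for the kind of LLM-as-a-Judge metric the talk builds with Comet Opik, not Opik’s actual API:

```python
# Minimal sketch: a single-agent CrewAI workflow plus a stand-in
# thread-level scorer (requires the crewai package and an LLM API key).
from crewai import Agent, Task, Crew

support = Agent(
    role="Support Agent",
    goal="Resolve the user's billing question in one conversation",
    backstory="Handles customer conversations end to end.",
)
task = Task(
    description="The user asks: 'Why was I charged twice this month?'",
    expected_output="A clear explanation and concrete next steps.",
    agent=support,
)
crew = Crew(agents=[support], tasks=[task])
result = crew.kickoff()

def thread_level_score(conversation: str) -> dict:
    # Placeholder judge: in practice this would be an LLM-as-a-Judge
    # metric evaluated over the whole session, not a single trace.
    return {
        "goal_achieved": "next steps" in conversation.lower(),
        "length_chars": len(conversation),
    }

print(thread_level_score(str(result)))
```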

What You’ll Learn:
You can’t reliably build conversational AI agents without treating orchestration and evaluation as two halves of the same workflow; CrewAI structures the agent, Comet Opik ensures you can measure and improve it.