Free Virtual October 6-7 | In-person October 8-9

All sessions and workshops curated by leading AI/ML practitioners

Agents in Production

Ville Tuulos

Co-Founder, CEO, Outerbounds

Metaflow: The Baseplate for Agentic Systems

Paco Nathan

Principal DevRel Engineer, Senzing

Doxing Dark Money: Entity Resolution to Empower Downstream AI Applications in Anti-Fraud

Kyle Corbitt

Co-Founder & CEO, OpenPipe

How to Train Your Agent: Building Reliable Agents with RL

Aishwarya Naresh Reganti

Founder, LevelUp Labs, Ex-AWS

Why CI/CD Fails for AI, and How CC/CD Fixes It

Kiriti Badam

Member of Technical Staff, OpenAI

Why CI/CD Fails for AI, and How CC/CD Fixes It

Pablo Salvador Lopez

Principal AI Application Development Architect – AI Solution Engineering Global Black Belt Team, Microsoft

From Static IVRs to Agentic Voice AI: Building Real-Time Intelligent Conversations

Hannes Hapke

Principal Machine Learning Engineer, Digits

The Hard Truth About AI Agents: Lessons Learned from Running Agents in Production

Linus Lee

EIR & Advisor, AI, Thrive Capital

Agents as Ordinary Software: Principled Engineering for Scale

Tony Kipkemboi

Head of Developer Relations, CrewAI

Building Conversational AI Agents with Thread-Level Eval Metrics

Claire Longo

Lead AI Researcher, Comet

Building Conversational AI Agents with Thread-Level Eval Metrics

Dr. Hemant Joshi

CTO, FloTorch

How to Build and Evaluate Agentic AI Workflows with FloTorch

Philipp Krenn

Head of Developer Relations, Elastic

Hope is Not a Strategy: Retrieval Patterns for MCP

Rajiv Shah

Chief Evangelist, Contextual AI

From Vectors to Agents: Managing RAG in an Agentic World

Akshay Mittal

Staff Software Engineer, PayPal

Agent Name Service (ANS) in Action – A DNS-like Trust Layer for Secure, Scalable AI-Agent Deployments on Kubernetes

Anish Shah

AI Engineer, Weights & Biases

Building and Evaluating Agents

Irena Grabovitch-Zuyev

Staff Applied Scientist, PagerDuty

Testing AI Agents: A Practical Framework for Reliability and Performance

Kumaran Ponnambalam

Principal AI Engineer, Cisco

Agent Drift: Understanding and Managing AI Agent Performance Degradation in Production

Augmenting Workforces with Agents

Vaibhav Page

Principal Engineer, BlackRock

Context is King: Scaling Beyond Prompt Engineering at BlackRock

Infant Vasanth

Senior Director of Engineering, BlackRock

Context is King: Scaling Beyond Prompt Engineering at BlackRock

Kshetrajna Raghavan

Principal Machine Learning Engineer, Shopify

Where Experts Can't Scale: Orchestrating AI Agents to Structure the World's Product Knowledge

Ricardo Tejedor Sanz

Senior Taxonomist, Shopify

Where Experts Can't Scale: Orchestrating AI Agents to Structure the World's Product Knowledge

Federico Bianchi

Senior ML Scientist, TogetherAI

From Zero to One: Building AI Agents From The Ground Up

Evolution of Agents

Claire Longo

Lead AI Researcher, Comet

How Math-Driven Thinking Builds Smarter Agentic Systems

ML Collaboration in Large Organizations

Eric Riddoch

Director of ML Platform, Pattern AI

Insights and Epic Fails from 5 Years of Building ML Platforms

AI Agents for Developer Productivity

Purshotam Shah

Senior Principal Software Developer Engineer, Yahoo

From Schema Discovery to Kubernetes: Building an Autonomous Agent for Real-Time Apache Flink Apps with LangGraph

Yegor Denisov-Blanch

Researcher, Stanford University

Impact of AI on Developer Productivity

Calvin Smith

Senior Researcher Agent R&D, OpenHands

Code-Guided Agents for Legacy System Modernization

AI Agents for Model Validation and Deployments

Eric Reese

Senior Manager, Site Reliability Engineering, BestBuy

Don't Page the Planet: Trust-Weighted Ops Decisions

Ankur Goyal

Founder & CEO, Braintrust

Five Hard-Earned Lessons About Evals

Data Engineering in an LLM era

Bhavana Sajja

Senior Machine Learning Engineer, Expedia Inc

Fake Data, Real Power: Crafting Synthetic Transactions for Bulletproof AI

ML Training Lifecycle

Zachary Carrico

Senior Machine Learning Engineer, Apella

Smart Fine-Tuning of Video Foundation Models for Fast Deployments

Paul Yang

Member of Technical Staff, Runhouse

Why is ML on Kubernetes Hard? Defining How ML and Software Diverge

Latest MLOps Trends

Hudson Buzby

Solutions Architect, JFrog

Securing Models

LLMs on Kubernetes

Romil Bhardwaj

Co-Creator, SkyPilot

Building Multi-Cloud GenAI Platforms without The Pains

Multimodal Systems in Production

Denise Kutnick

Co-Founder & CEO, Variata

Opening Pandora’s Box: Building Effective Multimodal Feedback Loops

James Le

Head of Developer Experience, TwelveLabs

Video Intelligence Is Going Agentic

Scoping and Delivering Complex AI Projects

Denys Linkov

Head of ML, Wisedocs

Future of AI in Healthcare

David Baum

UX Researcher & Design Strategist, Amazon

Humans in the Loop: Designing Trustworthy AI Through Embedded Research

Virtual Day

Aleksandr Shirokov

Team Lead MLOps Engineer, Wildberries

LLM Inference: A Comparative Guide to Modern Open-Source Runtimes

Anish Shah

AI Engineer, Weights & Biases

Architecting and Orchestrating AI Agents

Suhas Pai

CTO & Co-Founder, Hudson Labs

Architecting a Deep Research System

Freddy Boulton

Open Source Software Engineer, Hugging Face

Gradio: The Web Framework for Humans and Machines

Srishti Bhargava

Software Engineer, Amazon Web Services

The Rise of Self-Aware Data Lakehouses

Shelby Heinecke

Senior AI Research Manager, Salesforce

What’s Next in the Agent Stack

Sushant Mehta

Senior Research Engineer, Google DeepMind

Building Effective Agents

Remy Muhire

CEO, Pindo.ai

From Hello to Repayment: Voice AI in African Finance

Sanket Badhe

Senior Machine Learning Engineer, TikTok

Adversarial Threats Across the ML Lifecycle: A Red Team Perspective

Lin Liu

Director, Data Science, Wealthsimple

Story is All You Need

Madhu Ramanathan

Principal Group Engineering Manager, Trust, Safety and Intelligence, Microsoft

The Efficiency Equation: Leveraging AI Agents to Augment Human Labelers in Building Trust and Safety Systems at Scale

Niels Bantilan

Chief ML Engineer, Union.ai

A Practical Field Guide to Optimizing the Cost, Speed, and Accuracy of LLMs for Domain-Specific Agents

Kishan Rao

Engineering Manager, Delivery and Automation Platform, Okta

Your Infrastructure Just Got Smarter: AI Agents in the DevOps Loop

Alessandro Pireno

Founder, Stealth Company

I Tried Everything: A Pragmatist's Guide to Building Knowledge Graphs from Unstructured Data

Kelvin Ma

Staff Software Engineer, Google Photos

Productizing Generative AI at Google Scale: Lessons on Scoping and Delivering AI-Powered Editors

Lightning Talks

Nicholas Luzio

AI Solutions Lead, Arize AI

Shipping AI That Works

Robert Shelton

Applied AI Engineer, Redis

Beyond the Vibe: Eval Driven Development

Mariam Jabara

Senior Field Engineer, Arcee AI

SLMs + Fine-Tuning: Building the Infrastructure for Multi-Agent Systems

Speakers Corner

Alexej Penner

Founding Engineer, ZenML

The Real Problem Building Agentic Applications (And How MLOps Solves It)

Ville Tuulos

CEO, Co-Founder, Outerbounds

Agentic Metaflow in Action

Claire Longo

Lead AI Researcher, Comet

A Simple Recipe for LLM Observability

Chris Matteson

Head of Sales Engineering, Union.ai

What gets AI Agents to Production

Danny Chiao

Engineering Lead, Databricks

Techniques to build high quality agents faster with MLflow

Hudson Buzby

Solutions Architect, JFrog

AI Catalog by JFrog - Control Access to Open-Source LLMs

Nicholas Luzio

AI Solutions Lead, Arize AI

Building Feedback-Driven Agentic Workflows

Nikunj Bajaj

CEO, TrueFoundry

Unified Control Plane for Enterprise GenAI: Powered by Agentic Deployment Platform with Central AI Gateway & MCP Integration

The Next Wave of AI

Hamza Tahir

Co-Founder, ZenML

MLOps for Agents: Bringing the Outer Loop to Autonomous AI

Robert Shelton

Applied AI Engineer, Redis

Memory and Memory Accessories: Building an Agent from Scratch

Aish Agarwal

CEO, Connecty AI

Live Demo - World’s First Data Agentic AI With Business Logic Intelligence

more coming soon

Agenda

This agenda is still subject to change.

Join free virtual sessions October 6–7, then meet us in Austin for in-person case studies, workshops, and expo October 8–9

Talk: Metaflow: The Baseplate for Agentic Systems

Presenter:
Ville Tuulos, Co-Founder, CEO, Outerbounds

About the Speaker:
Ville Tuulos is the co-founder and CEO of Outerbounds, a platform that empowers enterprises to build production-ready, standout AI systems. He has been building infrastructure for machine learning and AI for over two decades. Ville began his career as an AI researcher in academia, authored Effective Data Science Infrastructure, and has held leadership roles at several companies—including Netflix, where he led the team that created Metaflow, a widely adopted open-source framework for end-to-end ML and AI systems.

Talk Track: Agents in Production

Talk Technical Level: 2/7

Talk Abstract:
Agent frameworks like LangChain or OpenAI’s Agent SDK make it easy to prototype agents, but they must be deployed in a production-grade environment that provides resilience, memory, and a runtime, with robust access to services and tools via MCP. The newly released open-source Metaflow 2.18 delivers such a baseplate for agentic systems, building on the battle-tested and versatile infrastructure Metaflow has refined over the years. Paired with your favorite agent framework, Metaflow offers a complete stack for agents – and the tools they depend on – ready for serious production use cases.

This talk introduces Metaflow’s new agentic features and demonstrates a practical example you can easily adapt to your own use cases.

What You’ll Learn:
Understanding the full stack required by production-grade agents, and how one can leverage open-source Metaflow to deliver it
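
To make the “baseplate” idea concrete, here is a minimal sketch of wrapping an agent invocation in a plain Metaflow step so it inherits retries, resource requests, and run tracking. It uses only long-standing Metaflow APIs (FlowSpec, @step, @retry, @resources); the agent-specific features new in Metaflow 2.18 that the talk covers are not shown, and the flow and variable names are illustrative.

```python
from metaflow import FlowSpec, step, retry, resources

class AgentRunFlow(FlowSpec):
    """Wrap an agent invocation in a Metaflow step so it inherits retries,
    resource scheduling, and artifact/run tracking. (Illustrative names;
    the agent-specific APIs added in Metaflow 2.18 are not used here.)"""

    @retry(times=2)            # re-run the step if the agent call flakes
    @resources(memory=4096)    # ask the scheduler for memory for this step
    @step
    def start(self):
        # Placeholder: call your agent framework of choice here.
        self.report = "agent output stored as a Metaflow artifact"
        self.next(self.end)

    @step
    def end(self):
        print(self.report)

if __name__ == "__main__":
    AgentRunFlow()
```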

Talk: Doxing Dark Money: Entity Resolution to Empower Downstream AI Applications in Anti-Fraud

Presenter:
Paco Nathan, Principal DevRel Engineer, Senzing

About the Presenter:
Paco Nathan leads DevRel for the Entity Resolved Knowledge Graph practice area at Senzing.com and is a computer scientist with 40+ years of tech industry experience and core expertise in data science, natural language, graph technologies, and cloud computing. He’s the author of numerous books, videos, and tutorials about these topics. He also hosts the monthly “Graph Power Hour!” webinar.

Paco advises Kurve.ai, EmergentMethods.ai, and is lead committer for the `pytextrank` and `kglab` open source projects. Formerly: Director of Learning Group at O’Reilly Media; and Director of Community Evangelism at Databricks.

Talk Track: Agents in Production

Technical Level: 2

Talk Abstract:
The Dark Web: an estimated $3T USD flows annually through shell companies leveraging tax havens worldwide — serving as the _perpetua mobilia_ for oligarchs, funding illegal weapons transfers, cyber attacks at global scale, human trafficking, anti-democracy campaigns, even illegal fishing fleets. The tendrils of kleptocracy extend throughout our political and economic system.

“People who hunt bad guys” — investigative journalists, OSINT, regulators, gov agencies, law enforcement, FinCrime investigation units, etc. — leverage both graph analytics and downstream AI apps to contend with the overwhelming data volumes. Our team provides core technology — entity resolution — used in this work, and in other public-sector work such as the majority of voter registration in the US. Most of our use cases run in air-gapped environments, based on large-scale distributed infrastructure, streaming data from multiple sources. In these production use cases, even with several billion graph elements, decisions to “merge” or “disambiguate” known entities can be propagated within milliseconds of a new record arriving.

Among those who perform this kind of confidential work, few are permitted to speak at tech conferences. However, we can use open source, open models, and open data to illustrate these kinds of applications. We’ll show how technology gets used to track the moves of the world’s worst organized crime rings, and how to fight against oligarchs who use complex networks to hide their grift. On the flip side, similar approaches can be leveraged to find your best customers within a graph.

This talk explores known cases, the fraud tradecraft employed, open data sources, and how technology gets leveraged. There are multiple areas where multimodal agentic workflows (e.g., based on BAML) play important roles, both for handling unstructured data sources and for actions taken based on inference. Moreover, we’ll look at where data professionals are very much needed, and where you can get involved.

What You’ll Learn:
How a combination of graph technologies and downstream AI applications gets leveraged for fighting FinCrime and transnational corruption in general.

Talk: How to Train Your Agent: Building Reliable Agents with RL

Presenter:
Kyle Corbitt, Co-Founder & CEO, OpenPipe

About the Presenter:
Kyle Corbitt is the co-founder and CEO of OpenPipe, the RL post-training company. OpenPipe has trained thousands of customer models for both enterprises and tech-forward startups.

Before founding OpenPipe, Kyle led the Startup School team at Y Combinator, which was responsible for the product and content that YC produces for early-stage companies. Prior to that he worked as an engineer at Google and studied ML at school.

Talk Track: Augmenting Workforces with Agents

Technical Level: 4

Talk Abstract:
Have you ever launched an awesome agentic demo, only to realize no amount of prompting will make it reliable enough to deploy in production? Agent reliability is a famously difficult problem to solve!

In this talk we’ll learn how to use GRPO to help your agent learn from its successes and failures and improve over time. We’ve seen dramatic results with this technique, such as an email assistant agent whose success rate jumped from 74% to 94% after replacing o4-mini with an open source model optimized using GRPO.

We’ll share case studies as well as practical lessons learned around the types of problems this works well for and the unexpected pitfalls to avoid.

What You’ll Learn:
I’ve frankly been shocked by how well RL works on real-world agentic use cases, and I’m very excited to share lessons learned with the audience. We’re working with DoorDash as well as several smaller customers on deploying these agents to prod and seeing universally strong results. This won’t be an OpenPipe pitch session; I’ll cover all the open-source tooling we use to make these models work.
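
As a rough illustration of the technique named in the abstract, the sketch below shows the group-relative advantage computation that GRPO builds on: each sampled completion for a prompt is scored, and advantages are normalized against the group’s mean and standard deviation. This is a generic sketch, not OpenPipe’s training code; the reward values and function names are invented.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages for one prompt's group of sampled completions:
    each reward is normalized against the group mean and standard deviation,
    so the policy is nudged toward completions that beat their siblings."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical: four rollouts of an email-assistant task scored by an evaluator.
print(group_relative_advantages([0.2, 0.9, 0.4, 0.9]))
```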

Talk: Why CI/CD Fails for AI, and How CC/CD Fixes It

Presenters:
Aishwarya Naresh Reganti, Founder, LevelUp Labs, Ex-AWS | Kiriti Badam, Member of Technical Staff, OpenAI

About the Presenters:
Aishwarya Naresh Reganti is the founder of LevelUp Labs, an AI services and consulting firm that helps organizations design, build, and scale AI systems that actually work in the real world. She has led engagements with multiple Fortune 500 companies and fast-growing startups, helping them move beyond demos to production-grade AI.

Before founding LevelUp Labs, she served as a tech lead at the AWS Generative AI Innovation Center, where she led and implemented AI solutions for a wide range of AWS clients. Her work spanned industries such as ISVs, banking, healthcare, e-commerce, and legal tech, with publicly referenced engagements including Bayer, NFL, Zillow, Kayak, and Imply (creators of Apache Druid).

Aishwarya holds a Master’s in Computer Science from Carnegie Mellon University (MCDS) and has authored 35+ papers in top-tier conferences including NeurIPS, ACL, CVPR, AAAI, and EACL. Her research background includes work on graph neural networks, multilingual NLP, multimodal summarization, and human-centric AI. She has mentored graduate students, served as a reviewer for major AI conferences, and collaborated with research teams at Microsoft Research, NTU Singapore, University of Michigan, and more.

Today, Aishwarya teaches top-rated applied AI courses, advises executive teams on AI strategy, and speaks at global conferences including TEDx, ReWork, and MLOps World. Her insights reach over 100,000 professionals on LinkedIn.

Kiriti Badam is a member of the technical staff at OpenAI, with over a decade of experience designing high-impact enterprise AI systems. He specializes in AI-centric infrastructure, with deep expertise in large-scale compute, data engineering, and storage systems. Prior to OpenAI, Kiriti was a founding engineer at Kumo.ai, a Forbes AI 50 startup, where he led the development of infrastructure that enabled training hundreds of models daily—driving significant ARR growth for enterprise clients. Kiriti brings a rare blend of startup agility and enterprise-scale depth, having worked at companies like Google, Samsung, Databricks, and Kumo.ai.

Talk Track: Agents in Production

Technical Level: 2

Talk Abstract:
AI products break the assumptions traditional software is built on. They’re non-deterministic, hard to debug, and come with a tradeoff no one tells you about: every time you give an AI system more autonomy, you lose a bit of control.

This talk introduces the Continuous Calibration / Continuous Development (CC/CD) framework, designed for building AI systems that behave unpredictably and operate with increasing levels of agency. Based on 50+ real-world deployments, CC/CD helps teams start with low-agency, high-control setups, then scale safely as the system earns trust.

What You’ll Learn:
You’ll learn how to scope capabilities, design meaningful evals, monitor behavior, and increase autonomy intentionally, so your AI product doesn’t collapse under real-world complexity.
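
One hedged way to picture the “earn autonomy as trust grows” idea is an explicit autonomy ladder gated by recent eval pass rates. The sketch below illustrates that general pattern only; it is not the CC/CD framework itself, and the levels and thresholds are invented for the example.

```python
AUTONOMY_LADDER = [
    # (level, description, minimum recent eval pass rate required to unlock)
    (0, "suggest only, human executes", 0.00),
    (1, "execute with human approval", 0.85),
    (2, "execute autonomously, humans audit samples", 0.95),
]

def allowed_autonomy(recent_pass_rate: float) -> int:
    """Return the highest autonomy level the system has earned so far."""
    level = 0
    for lvl, _description, min_rate in AUTONOMY_LADDER:
        if recent_pass_rate >= min_rate:
            level = lvl
    return level

print(allowed_autonomy(0.90))  # -> 1: the agent still needs approval to act
```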

Talk: From Static IVRs to Agentic Voice AI: Building Real-Time Intelligent Conversations

Presenter:
Pablo Salvador Lopez, Principal AI Application Development Architect – AI Solution Engineering Global Black Belt Team, Microsoft

About the Presenter:
As an AI Solution Engineering Leader in Microsoft’s Global Black Belt team—the elite force driving AI and cloud application innovation within the Azure ecosystem—I design and deliver transformative Generative AI solutions for some of the world’s most complex and highly regulated industries. My work bridges deep technical skills with mission-critical execution—specializing in Retrieval-Augmented Generation (RAG), agentic AI systems, and scalable multi-agent orchestration using Azure AI, OpenAI, and frameworks like Semantic Kernel or any custom stacks.

I design intelligence from the ground up—combining LLMs and custom orchestration frameworks to create real-time, memory-aware agents that reason, act, and collaborate.

My foundation spans full-stack data science, ML engineering, and software architecture. I’ve led real-time and batch AI deployments for Fortune 500 enterprises, with expertise across MLOps/LLMOps and high-throughput inference—anchored in cloud platforms like Azure, GCP, and AWS.

Where others see ambiguity, I see momentum. I’m known for turning raw ideas into production-grade systems—either by building from first principles or rethinking the “rules” when innovation demands it. My mission is to build systems that matter—empowering teams to do their best work, and leaving every product, platform, pattern and person stronger than I found them.

Beyond industry, I’m committed to education and community. As an Adjunct Instructor in Northwestern University’s MSAI program, I teach hands-on courses in Cloud AI, GenAI, RAG, and multi-agent systems. I mentor startups, serve on advisory boards, and contribute to open-source AI—sharing ideas that move the field forward.

Talk Track: Agents in Production

Technical Level: 3

Talk Abstract:
Developers today face the challenge of transforming outdated IVRs and traditional voice systems into intelligent, responsive interactions. This session dives into the concept of agentic voice AI—systems capable of real-time reasoning, decision-making, and dynamic action execution. We’ll explore how to architect modular voice applications using Azure, orchestrate multiple autonomous agents for specialized tasks, and leverage real-time AI inference to produce fluid, human-like conversations. Attendees will learn practical strategies to design agentic voice interactions, enabling their systems to autonomously plan, act, and dynamically adapt to user contexts and needs.

What You’ll Learn:
Attendees will leave equipped with a clear understanding of agentic architecture in real-time voice applications, including practical techniques for orchestrating multiple specialized agents, integrating dynamic reasoning, leveraging memory, and optimizing speech latency. They will be empowered to move beyond static IVRs towards fully autonomous, intelligent voice experiences.

Talk: The Hard Truth About AI Agents: Lessons Learned from Running Agents in Production

Presenter:
Hannes Hapke, Principal Machine Learning Engineer, Digits

About the Speaker:
Hannes Hapke is a principal machine learning engineer at Digits, where he has spent years building production AI systems that accountants and business owners actually use daily.

Before Digits, he solved ML infrastructure problems across healthcare, retail, and renewable energy – industries where failure isn’t an option. At SAP Concur, he learned that impressive prototypes and production systems are entirely different beasts.

Hannes co-authored numerous machine learning books, including “Building Machine Learning Pipelines” and “Machine Learning Production Systems” (O’Reilly), and his upcoming “GenAI Design Patterns” book addresses the gap between AI hype and reality. As a Google Developer Expert for Machine Learning, he’s committed to sharing the hard truths about production ML.

Talk Track: Agents in Production

Talk Technical Level: 2/7

Talk Abstract:
Every conference showcases impressive agent demos. What they don’t show you are the 3 AM pages when agents go rogue, the customer support tickets when AI makes expensive mistakes, or the months of debugging why your “95% accurate” prototype becomes 60% reliable in production.

This talk cuts through the agent hype with unfiltered lessons from Digits’ journey deploying customer-facing agents that handle real financial data. Hannes will share the architectural decisions that actually matter (hint: it’s not the framework you choose), the monitoring approaches that catch problems before customers do, and the failure modes that no one warns you about.

You’ll learn why agent evaluation in development predicts almost nothing about production performance, how to build guardrails that don’t cripple functionality, and why the hardest problems aren’t technical – they’re about managing expectations and building trust.

This presentation is a field guide to the messy reality of production agents, complete with practical design patterns for Hannes’ newest O’Reilly publication “Generative AI Design Patterns” (together with Dr. Valliappa Lakshmanan), and the kind of lessons learned you only get from keeping systems running when money is involved.

What You’ll Learn:
– Production Reality Check: Why impressive demos fail spectacularly in production and how to bridge that gap

– Architecture for Reliability: The infrastructure patterns that actually matter for agent systems at scale

– Architecture for Observability: The specific ways to monitor agents in production

Talk: Agents as Ordinary Software: Principled Engineering for Scale

Presenter:
Linus Lee, EIR & Advisor, AI, Thrive Capital

About the Presenter:
Linus Lee is an EIR and advisor at Thrive Capital, where he focuses on AI as part of the product and engineering team and supports portfolio companies on adopting and deploying frontier AI capabilities. He previously pursued independent HCI and machine learning research before joining Notion as an early member of the AI team.

Talk Track: Agents in Production

Technical Level: 2

Talk Abstract:
Thrive Capital’s in-house research engine Puck executes thousands of research and automation tasks weekly, surfacing current events, drafting memos, and triggering workflows unassisted. This allows Puck to power the wide ecosystem of software tools and automations supporting the Thrive team. A single Puck run may traverse millions of tokens across hundreds of documents and LLM calls, and run for 30 minutes before returning multi-page reports or taking actions. With fewer than 10 engineers, we sustain this scale and complexity by embracing four values — composability, observability, statelessness, and changeability — in our orchestration library Polymer. We’ll share patterns that let us quickly add data sources or tools without regressions, enjoy deep observability to root cause every issue in minutes, and evolve the system smoothly as new model capabilities come online. We’ll end by discussing a few future capabilities we hope to unlock next, like RL, durable execution across hours or days, and scaling via parallel search.

What You’ll Learn:
Concretely, attendees will (1) learn design patterns like composition, adapters, and stateless effects that let us write more robust LLM systems faster and more confidently, and (2) see concrete code examples that illustrate these principles in action in a production system. Our goal is not to sell the audience on the library itself, but rather to advocate for the design patterns behind it.

More broadly, in such a rapidly evolving landscape it can feel tempting to trade off classic engineering principles like composability in favor of following frontier capabilities, subscribing to frameworks that obscure implementation detail or lock you into shortsighted abstractions. This talk will explore how we can have both rigor and frontier velocity with the right foundation.
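
As a rough sketch of the composability and statelessness values described above (and not Polymer’s actual API), the example below models each unit of work as a stateless step over an input dictionary and composes steps into a pipeline, which is what makes per-step observability and safe recombination straightforward.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class Step:
    """A stateless unit of work: a dict of inputs in, a dict of outputs out."""
    name: str
    run: Callable[[Dict], Dict]

def compose(steps: List[Step]) -> Step:
    """Chain steps into one pipeline; each step only adds keys, which keeps
    intermediate state inspectable for observability."""
    def _run(ctx: Dict) -> Dict:
        for s in steps:
            ctx = {**ctx, **s.run(ctx)}
        return ctx
    return Step(name="+".join(s.name for s in steps), run=_run)

# Hypothetical two-step pipeline: fetch documents, then summarize them.
fetch = Step("fetch", lambda ctx: {"docs": [f"doc about {ctx['query']}"]})
summarize = Step("summarize", lambda ctx: {"summary": "; ".join(ctx["docs"])})
pipeline = compose([fetch, summarize])
print(pipeline.run({"query": "current events"}))
```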

Talk: Building Conversational AI Agents with Thread-Level Eval Metrics

Presenters:
Claire Longo, Lead AI Researcher, Comet | Tony Kipkemboi, Head of Developer Relations, CrewAI

About the Presenters:
Tony Kipkemboi leads Developer Advocacy at CrewAI, where he helps organizations adopt AI agents to drive efficiency and strategic decision-making. With a background spanning developer relations, technical storytelling, and ecosystem growth, Tony specializes in making complex AI concepts accessible to both technical and business audiences.

He is an active voice in the AI agent community, hosting workshops, podcasts, and tutorials that explore how multi-agent orchestration can reshape the way teams build, evaluate, and deploy AI systems. Tony’s work bridges product experimentation with real-world application, empowering developers, startups, and enterprises to harness AI agents for measurable impact.

At MLops World, Tony brings his experience building and scaling with CrewAI to demonstrate how agent orchestration, when paired with rigorous evaluation, accelerates the path from prototype to production.

Claire Longo is an AI leader and Mathematician with over a decade of experience in Data Science and AI. She has led cross-functional AI teams at Twilio, Opendoor, and Arize AI and is currently a Lead AI Researcher at Comet. She holds a Bachelor’s in Applied Mathematics and a Master’s in Statistics from The University of New Mexico. Beyond her technical work, Claire is a Speaker, Advisor, YouTuber, and Poker Player. She is dedicated to mentoring Engineers and Data Scientists while championing diversity and inclusion in AI. Her mission is to empower the next generation of AI practitioners.

Talk Track: Agents in Production

Technical Level: 4

Talk Abstract:
Building modern conversational AI Agents means dealing with dynamic, multi-step LLM reasoning processes and tool calling that cannot always be predicted or debugged at the trace level alone. During the conversation, we need to understand if the AI accomplishes the user’s goal while staying aligned with intent and delivering a smooth interaction. To truly measure quality, we need to trace and evaluate entire conversation sessions.

In this talk, we introduce a practical workflow for designing, orchestrating, and evaluating conversational AI Agents by combining CrewAI as the Agent development framework with Comet Opik for custom eval metrics.

On the CrewAI side, we’ll showcase how developers can define multi-agent workflows, specialized roles, and task orchestration that mirror real-world business processes. We’ll demonstrate how CrewAI simplifies experimentation with different agent designs and tool integrations, making it easier to move from prototypes to production-ready agents.

On the Opik side, we’ll go over how to capture expert human-in-the-loop feedback and build thread-level evaluation metrics. We’ll show how to log traces, annotate sessions with expert insights, and design LLM-as-a-Judge metrics that mimic human reasoning, turning domain expertise into a repeatable feedback loop.

Together, this workflow combines agentic orchestration + rigorous evaluation, giving developers deep observability, actionable insights, and a clear path to systematically improving conversational AI in real-world applications.

What You’ll Learn:
You can’t reliably build conversational AI agents without treating orchestration and evaluation as two halves of the same workflow; CrewAI structures the agent, Comet Opik ensures you can measure and improve it.
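
For readers who want a concrete picture of thread-level evaluation, here is a minimal, framework-agnostic sketch of an LLM-as-a-Judge prompt built over a whole conversation session rather than a single trace. It does not use the CrewAI or Opik APIs; the dataclass and scoring rubric are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Turn:
    role: str       # "user" or "assistant"
    content: str

def thread_level_judge_prompt(thread: List[Turn], user_goal: str) -> str:
    """Build a prompt asking a judge LLM to score the whole session against
    the user's goal, instead of scoring individual traces in isolation."""
    transcript = "\n".join(f"{t.role}: {t.content}" for t in thread)
    return (
        "You are evaluating a full conversation between a user and an AI agent.\n"
        f"User goal: {user_goal}\n"
        f"Transcript:\n{transcript}\n"
        "Score 1-5 for goal completion, intent alignment, and smoothness. "
        'Return JSON: {"goal_completion": int, "alignment": int, "smoothness": int}.'
    )

thread = [Turn("user", "Help me dispute a duplicate charge."),
          Turn("assistant", "I found the duplicate and filed a dispute. Anything else?")]
print(thread_level_judge_prompt(thread, "Resolve a duplicate charge"))
```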

Workshop: How to Build and Evaluate Agentic AI Workflows with FloTorch

Presenter:
Dr. Hemant Joshi, CTO, FloTorch

About the Presenter:
Dr. Hemant Joshi has over 20 years of industry experience building products and services with AI/ML technologies.

As CTO of FloTorch, Hemant is engaged with customers to implement State of the Art GenAI solutions and agentic workflows for enterprises.

Prior to FloTorch, Hemant worked at companies including Tumblr, L’Oreal, and Claim Genius. Hemant holds a Bachelor of Engineering from Mumbai University and a Ph.D. in Applied Computing from the University of Arkansas at Little Rock.

Talk Track: Agents in Production

Technical Level: 2

Talk Abstract:
The workshop will guide you through the critical challenges and solutions for deploying GenAI agents in a business environment.

You will understand how to build and scale agentic workflows reliably and securely.

– Overview of Agentic Workflows from planning to enterprise-grade implementation
– Understand the pain points that can derail an enterprise’s AI adoption, like governance and monitoring
– Set up an agentic workflow with the FloTorch AI Gateway, connecting to any LLM via a single endpoint with smart routing
– Understand why a platform for agentic governance and observability is essential for accelerating your organization’s AI journey, ensuring trust, and maximizing business value.

What You’ll Learn:
The key takeaway is the time saved moving agentic projects from concept to production by using the AI Gateway; scaling and modifying them in the future with different LLMs also becomes easy.

Also, having an evaluation platform to understand the costs, latency and accuracy of LLMs before deploying to production helps a business make the necessary trade-offs for their use case.

Talk: Hope is Not a Strategy: Retrieval Patterns for MCP

Presenter:
Philipp Krenn, Head of Developer Relations, Elastic

About the Speaker:
Philipp leads Developer Relations at Elastic — the company behind Elasticsearch, Kibana, Beats, and Logstash. Based in San Francisco, he lives to demo interesting technology and solve challenging problems — all with a smile and a terminal window.

Talk Track: Agents in Production

Talk Technical Level: 3/7

Talk Abstract:
MCP is a solid integration layer — but how does it hold up when it comes to output quality? Often, not as well as you’d like. Here are some practical retrieval patterns, from basic to advanced, that worked well in my experiments:
– Naive: Just plug in plain MCP and hope the LLM gets it right. Sometimes it does. Sometimes you’ll need a miracle.
– Semantic: Add more descriptive field names and extra metadata. It helps — but usually just a bit.
– Templated: Use a structured template and have the LLM fill it out step by step. More effort, but by far the most reliable results.

What You’ll Learn:
While MCP is a simple protocol, there are (emerging) patterns you can use to make it more powerful.
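
To illustrate the “templated” pattern, the sketch below has the LLM fill a fixed query template, which is then translated into a concrete search request. The template fields and the query-body shape are illustrative assumptions, not a specific MCP server’s or Elastic’s implementation.

```python
import json

# A structured template the LLM is asked to fill field by field, instead of
# free-forming a query. The filled template is then mapped to a search body.
QUERY_TEMPLATE = {
    "keywords": "<terms the user is asking about>",
    "time_range": "<e.g. 'last 7 days' or null>",
    "filters": {"service": "<service name or null>"},
}

def template_to_search_body(filled: dict) -> dict:
    """Translate the filled template into a query body (shape is illustrative)."""
    must = [{"match": {"message": filled["keywords"]}}]
    if filled["filters"].get("service"):
        must.append({"term": {"service": filled["filters"]["service"]}})
    return {"query": {"bool": {"must": must}}}

# Pretend the LLM returned this JSON after being shown QUERY_TEMPLATE.
llm_output = ('{"keywords": "checkout latency spike", "time_range": "last 7 days", '
              '"filters": {"service": "payments"}}')
print(json.dumps(template_to_search_body(json.loads(llm_output)), indent=2))
```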

Talk: From Vectors to Agents: Managing RAG in an Agentic World

Presenter:
Rajiv Shah, Chief Evangelist, Contextual AI

About the Presenter:
Rajiv Shah is the Chief Evangelist at Contextual AI with a passion and expertise in Practical AI. He focuses on enabling enterprise teams to succeed with AI. Rajiv has worked on GTM teams at leading AI companies, including Hugging Face in open-source AI, Snorkel in data-centric AI, Snowflake in cloud computing, and DataRobot in AutoML. He started his career in data science at State Farm and Caterpillar.

Rajiv is a widely recognized speaker on AI, has published over 20 research papers, been cited over 1,000 times, and received over 20 patents. His recent work in AI covers topics such as sports analytics, deep learning, and interpretability.

Rajiv holds a PhD in Communications and a Juris Doctor from the University of Illinois at Urbana Champaign. While earning his degrees, he received a fellowship in Digital Government from the John F. Kennedy School of Government at Harvard University. He is well known on social media with his short videos, @rajistics, that have received over ten million views.

Talk Track: Agents in Production

Technical Level: 2

Talk Abstract:
The RAG landscape has evolved so quickly. We’ve gone from simple keyword search to semantic embeddings to multi-step agentic reasoning. With all these approaches, we see the rise of context engineering in mastering the best RAG for the problem. This talk helps you understand the right search architecture for your use case.
We’ll examine three distinct architectural patterns: Speedy Retrieval (<500 ms), Accuracy-Optimized RAG (<10 seconds), and Exhaustive Agentic Search (10 seconds to several minutes). You’ll see how context engineering evolves across these patterns: from basic prompt augmentation in Speed-First RAG, to dynamic context selection and compression in hybrid systems, to full context orchestration with memory, tools, and state management in agentic approaches.
The talk will include a framework for selecting RAG architectures, architectural patterns with code examples, and guidance on practical issues around RAG infrastructure.

What You’ll Learn:
RAG has matured enough that we can stop chasing the bleeding edge and start making boring, practical decisions about what actually ships.

Points:
– Attendees should leave knowing exactly when to use speedy retrieval vs. agentic search; most use cases don’t need agents (and shouldn’t pay for them)
– As retrieval improves, managing the context window becomes the real challenge; success isn’t about retrieving more – it’s about orchestrating what you retrieve
– Agentic search can cost 100x more than vector search; sometimes “good enough” at 500 ms beats “perfect” at 2 minutes
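
A minimal sketch of the selection framework, assuming a simple rule keyed on latency budget and whether the question needs multi-hop reasoning; the thresholds mirror the three patterns named in the abstract and are illustrative, not prescriptive.

```python
from enum import Enum

class RagMode(Enum):
    SPEEDY = "speedy_retrieval"       # < 500 ms: single keyword/vector lookup
    ACCURATE = "accuracy_optimized"   # < 10 s: rerank + context compression
    AGENTIC = "exhaustive_agentic"    # minutes: multi-step, tool-using search

def pick_rag_mode(latency_budget_s: float, needs_multi_hop: bool) -> RagMode:
    """Illustrative routing rule over the three patterns from the abstract."""
    if latency_budget_s < 0.5:
        return RagMode.SPEEDY
    if needs_multi_hop and latency_budget_s >= 10:
        return RagMode.AGENTIC
    return RagMode.ACCURATE

print(pick_rag_mode(0.3, needs_multi_hop=False))   # autocomplete-style lookup
print(pick_rag_mode(120.0, needs_multi_hop=True))  # deep research question
```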

Talk: Agent Name Service (ANS) in Action – A DNS-like Trust Layer for Secure, Scalable AI-Agent Deployments on Kubernetes

Presenter:
Akshay Mittal, Staff Software Engineer, PayPal

About the Presenter:
Akshay Mittal is a Staff Software Engineer at PayPal and an IEEE Senior Member with over a decade of experience in full-stack development and cloud-native systems. He is currently pursuing a PhD at the University of the Cumberlands, focusing on AI/ML-driven security for cloud architectures. Akshay actively contributes to the Austin tech community through speaking engagements, mentoring, and IEEE and ACM initiatives, with a professional mission of advancing technical excellence and fostering innovation.

Talk Track: Agents in Production

Technical Level: 3

Talk Abstract:
Enterprise MLOps is rapidly shifting from model-centric pipelines to agent-centric ecosystems, where autonomous AI agents continuously retrain models, validate data, and remediate incidents without human intervention. Yet most production platforms still lack a uniform mechanism to discover, authenticate, and govern these agents. This session introduces the Agent Name Service (ANS) – an open, DNS-inspired protocol that assigns unique identities, publishes verifiable metadata, and issues capability attestations for AI agents running on Kubernetes. Drawing on lessons learned from securing PayPal’s global API platform, I will demonstrate how ANS enables end-to-end trust across the ML lifecycle: model-validation agents that flag concept drift, deployment agents that patch mis-configured Helm charts, and guard-agent ensembles that enforce policy-as-code in real time. A live demo will show ANS integrated with GitOps, Open Policy Agent, Sigstore, and an open-source agent-orchestration framework, highlighting zero-trust handshakes, key rotation, and automated RBAC provisioning. Attendees will leave with practical templates and a GitHub reference implementation ready for pilot adoption.

What You’ll Learn:
1. Why identity and capability verification are the missing guardrails for agentic MLOps

2. Reference architecture for deploying ANS on a Kubernetes stack with GitOps, OPA, and Sigstore

3. Patterns for chaining validation, remediation, and notification agents while preserving least-privilege access

4. Performance and security benchmarks from a production pilot handling 1,000+ daily agent interactions
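
To give a feel for what a DNS-like trust layer for agents might look like, the sketch below registers and resolves hypothetical agent records with capability checks. The field names and in-memory registry are invented for illustration and are not the actual ANS schema or reference implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class AgentRecord:
    """Hypothetical ANS-style entry: a resolvable name plus verifiable metadata."""
    name: str               # e.g. "drift-detector.mlops"
    public_key: str         # used to verify the agent's signed attestations
    capabilities: Tuple[str, ...]
    metadata: Dict[str, str] = field(default_factory=dict)

REGISTRY: Dict[str, AgentRecord] = {}

def register(record: AgentRecord) -> None:
    REGISTRY[record.name] = record

def resolve(name: str, required_capability: str) -> AgentRecord:
    """DNS-like lookup that also enforces the capability the caller needs."""
    record = REGISTRY[name]
    if required_capability not in record.capabilities:
        raise PermissionError(f"{name} is not attested for {required_capability}")
    return record

register(AgentRecord("drift-detector.mlops", "MIIB...stub", ("model-validation",)))
print(resolve("drift-detector.mlops", "model-validation").name)
```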

Talk: Building and Evaluating Agents

Presenter:
Anish Shah, AI Engineer, Weights & Biases

About the Speaker:
Anish loves turning ML ideas into ML products. Anish started his career working with multiple Data Science teams within SAP, working with traditional ML, deep learning, and recommendation systems before landing at Weights & Biases. With the art of programming and a little bit of magic, Anish crafts ML projects to help better serve our customers, turning “oh nos” to “a-ha”s!

Talk Track: Agents in Production

Talk Technical Level: 3/7

Talk Abstract:
This session explores how large language models evolve from single-prompt tools into agentic systems capable of solving real-world business problems. We’ll cover the design principles behind agents — reflection, tool use, planning, and collaboration — and show how these map to modern architectures. The talk then focuses on the challenge of evaluation, highlighting methods like automated judges, process-level metrics, and continuous monitoring to ensure reliability, efficiency, and user trust. Attendees will leave with a clear understanding of how to structure AI agents and how to systematically measure and improve their performance.

What You’ll Learn:
Attendees will learn the current state of agents, with an emphasis on the problems faced in development, along with advice and tools for dealing with those problems.

Talk: Testing AI Agents: A Practical Framework for Reliability and Performance

Presenter:
Irena Grabovitch-Zuyev, Staff Applied Scientist, PagerDuty

About the Presenter:
Irena Grabovitch-Zuyev is a Staff Applied Scientist at PagerDuty and a driving force behind PagerDuty Advance, the company’s generative AI capabilities. She leads the development of AI agents that are transforming how customers interact with PagerDuty, pushing the boundaries of incident response and automation.

With over 15 years of experience in machine learning, Irena specializes in generative AI, data mining, machine learning, and information retrieval. At PagerDuty, she partners with stakeholders and customers to identify business challenges and deliver innovative, data-driven solutions.

Irena earned her graduate degree in Information Retrieval in Social Networks from the Technion – Israel Institute of Technology. Before joining PagerDuty, she spent five years at Yahoo Research as part of the Mail Mining team, where her machine learning solutions for automatic extraction and classification were deployed at scale, powering Yahoo Mail’s backend and processing hundreds of millions of messages daily.

She is the author of several academic articles published at top conferences and the inventor of multiple patents. Irena is also a passionate advocate for increasing representation in tech, believing that diversity and inclusion are essential to innovation.

Talk Track: Agents in Production

Technical Level: 3

Talk Abstract:
As AI agents powered by large language models (LLMs) become integral to production systems, ensuring their reliability and safety is both critical and uniquely challenging. Unlike traditional software, agentic systems are dynamic, probabilistic, and highly sensitive to subtle changes—making conventional testing approaches insufficient.

This talk presents a practical framework for testing AI agents, grounded in real-world experience developing and deploying production-grade agents at PagerDuty. The main focus will be on iterative regression testing: how to design, execute, and refine regression tests that catch failures and performance drifts as agents evolve. We’ll walk through a real use case, highlighting the challenges and solutions encountered along the way.

Beyond regression testing, we’ll cover the additional layers of testing essential for agentic systems, including unit tests for individual tools, adversarial testing to probe robustness, and ethical testing to evaluate outputs for bias, fairness, and compliance. Finally, I’ll share how we’re building automated pipelines to streamline test execution, scoring, and benchmarking—enabling rapid iteration and continuous improvement.

Attendees will leave with a practical, end-to-end framework for testing AI agents, actionable strategies for regression and beyond, and a deeper understanding of how to ensure their own AI systems are reliable, robust, and ready for real-world deployment.

What You’ll Learn:
Attendees will learn a practical, end-to-end framework for testing AI agents—covering correctness, robustness, and ethics—so they can confidently deploy reliable, high-performing LLM-based systems in production.
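
As a generic illustration of iterative regression testing for agents (not PagerDuty’s framework), the sketch below re-runs a fixed case suite against the agent after each change, scores outputs with a judge function, and flags any case that falls below its stored baseline. The stub agent, judge, and case fields are hypothetical.

```python
import json

def run_regression_suite(agent_fn, cases, judge_fn, threshold=0.8):
    """Re-run a fixed case suite against the agent after every change, score
    outputs with a judge, and flag cases that fall below their baseline."""
    failures = []
    for case in cases:
        output = agent_fn(case["input"])
        score = judge_fn(case["input"], output, case["expected_behavior"])
        if score < case.get("baseline_score", threshold):
            failures.append({"id": case["id"], "score": score, "output": output})
    return failures

# Hypothetical stand-ins: a stub agent and a keyword-overlap "judge".
agent = lambda q: f"Acknowledged: {q}. Escalating to the on-call engineer."
judge = lambda q, out, expected: 1.0 if expected.lower() in out.lower() else 0.0
cases = [{"id": "incident-escalation", "input": "Sev1 on the payments API",
          "expected_behavior": "escalating", "baseline_score": 0.9}]
print(json.dumps(run_regression_suite(agent, cases, judge), indent=2))
```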

Talk: Agent Drift: Understanding and Managing AI Agent Performance Degradation in Production

Presenter:
Kumaran Ponnambalam, Principal AI Engineer, Cisco

About the Presenter:
Kumaran Ponnambalam is a technology leader with 20+ years of experience in Generative AI, Machine Learning, Data and Analytics. His focus is on creating robust, scalable Gen AI models and services to drive effective business solutions. He is currently leading Generative AI initiatives at Cisco, building next-generation AI innovations and products to help enterprises. In his previous roles, he has built conversational bots, ML platforms, data pipelines and cloud services. A frequent speaker at technology conferences, he has also authored several courses on the LinkedIn Learning Platform in Generative AI and Machine Learning.

Talk Track: Agents in Production

Technical Level: 4

Talk Abstract:
As AI agents continue to integrate into production systems, maintaining consistent performance over time remains a critical challenge. This talk explores the concept of “Agent Drift,” a phenomenon where AI agents experience performance degradation due to shifts in data distribution, evolving user behavior, tool behavior, or model changes. Attendees will gain insights into how agent drift impacts the reliability and effectiveness of AI systems, and why early detection is essential for mitigating risks in production environments. The session will introduce practical strategies for measuring agent drift, enabling teams to identify performance gaps and adapt their agents proactively. By leveraging these techniques, organizations can ensure their AI agents remain robust and aligned with real-world requirements. Whether you are a data scientist, engineer, or AI practitioner, this talk will provide actionable takeaways for managing and optimizing AI agents in dynamic settings.

What You’ll Learn:
How to measure performance of AI Agents in production, identify drift and take remedial actions.
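
One hedged way to operationalize drift detection, shown for illustration only, is to compare the distribution of recent evaluation scores against a trusted baseline window with a two-sample statistical test; the scores and threshold below are invented, and this is not presented as Cisco’s method.

```python
from scipy.stats import ks_2samp

def detect_score_drift(baseline_scores, recent_scores, alpha=0.05):
    """Flag drift when recent eval scores (e.g. from an LLM-as-judge over
    sampled production traces) no longer match a trusted baseline window."""
    stat, p_value = ks_2samp(baseline_scores, recent_scores)
    return {"ks_stat": stat, "p_value": p_value, "drift": p_value < alpha}

baseline = [0.92, 0.88, 0.95, 0.90, 0.91, 0.93, 0.89, 0.94]
recent   = [0.81, 0.76, 0.85, 0.79, 0.74, 0.83, 0.80, 0.78]
print(detect_score_drift(baseline, recent))
```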

Talk: Context is King: Scaling Beyond Prompt Engineering at BlackRock

Presenters:
Vaibhav Page, Principal Engineer, Blackrock | Infant Vasanth, Senior Director of Engineering, Blackrock

About the Presenters:
Vaibhav is a Principal Engineer at BlackRock, where he leads the development of the Data Science and AI platform powering investment research and automation across the firm. Vaibhav is also the author of Argo-Events, a CNCF-graduated project widely used for event-driven automation in cloud-native environments.

Infant Vasanth leads the engineering team responsible for the Studio Compute Platform, BlackRock’s analytics and automation platform that enables our users to conduct research & analysis, run automations and distribute research at scale.
In addition, Infant leads the Data & AI Acceleration team, focusing on efforts to enhance Aladdin Studio’s AI capabilities alongside the Operational AI capabilities (prospectus analyzer, operational agents, etc.).

Talk Track: Augmenting Workforces with Agents

Technical Level: 2

Talk Abstract:
As AI use cases grow in complexity, prompt engineering alone is insufficient. In this talk, we will discuss BlackRock’s evolution of engineering relevant contexts for a broad range of AI use cases, from generating investment signals to optimizing operational processes. Furthermore, building the right context in real time has its own set of challenges, ranging from context-window limitations to finding the relevant information and running evaluations on the generated context. We’ll demonstrate how thoughtful context design leads to more robust and adaptable AI agents. We will go over the art and science of building relevant contexts for complex financial use cases and their associated challenges.

What You’ll Learn:
This session offers a practical guide and framework for users looking to build or engineer relevant contexts at scale for their AI applications and use cases. By showcasing how the framework accelerates the creation of these AI contexts, the session will provide actionable insights for teams aiming to develop and deploy custom AI solutions. We’ll walk through real-world examples, including some of the challenges we faced while building contexts for different teams at BlackRock. The design principles, architectural patterns, and context engineering strategies shared can be applied across industries to reduce hallucinations and give relevant answers. Attendees will also learn what this looks like in a highly regulated environment where adhering to industry-standard security practices is of utmost importance.

Talk: Where Experts Can't Scale: Orchestrating AI Agents to Structure the World's Product Knowledge

Presenters:
Kshetrajna Raghavan, Principal Machine Learning Engineer, Shopify | Ricardo Tejedor Sanz, Senior Taxonomist, Shopify

About the Presenters:
Kshetrajna is a Principal Machine Learning Engineer at Shopify with 15 years of experience delivering AI solutions across technology, healthcare, and retail. He has led initiatives in large-scale product search, computer vision, natural language processing, and predictive modeling—translating cutting-edge research into systems used by millions. Known for his pragmatic approach, he focuses on building scalable, high-impact machine learning products that drive measurable business results.

Ricardo Tejedor Sanz is a Senior Taxonomist at Shopify with a distinctive background spanning legal experience, linguistics, and machine learning. With diverse analytical experience across international contexts and master’s degrees in English Literature and Audiovisual Translation, plus fluency in four languages, Ricardo brings exceptional rigor and customer-focused problem-solving to taxonomy challenges. He evolved from traditional manual taxonomy methods built on deep market research, competitive analysis, and semantic understanding, to pioneering AI-driven classification systems benefiting millions of merchants globally.

Talk Track: Augmenting Workforces with Agents

Technical Level: 2

Talk Abstract:
How do you maintain a product taxonomy spanning millions of items across every industry—from guitar picks to industrial sensors—when no human team could possibly possess expertise in all these domains? At Shopify, we faced this exact challenge and built an AI agentic system that transforms an impossible human task into a scalable, automated workflow.

In this talk, we reveal how we orchestrate multiple specialized AI agents to analyze, improve, and validate taxonomy changes at unprecedented scale.

You’ll discover:
– How parallel AI agents can augment human expertise across domains where deep knowledge is impossible to maintain
– The architecture patterns that enable agents to work together while maintaining quality and consistency
– Why LLM-as-judge systems are game-changers for scaling quality control
– Critical lessons learned from production deployment, including surprising failures and how we fixed them

We share real metrics showing how this approach transformed a years-long manual process into days of AI-augmented work, and provide actionable insights you can apply to your own “impossible” classification and curation challenges.
Whether you’re dealing with content moderation, data classification, or any task requiring expertise across vast domains, you’ll leave with concrete strategies for building AI agent systems that scale human judgment beyond traditional limitations.

What You’ll Learn:
1. Decompose “Impossible” Into Specialized Agents
Don’t build one AI to know everything. Build many agents that each know something, then orchestrate them.

2. LLM-as-Judge Unlocks Scale
Shifting from “humans review 100%” to “AI pre-screens, humans see 10%” is the game-changer. Key: Let AI fix minor issues, not just reject.

3. Production Lessons Are Brutal
– Prompt overload breaks reasoning
– Always build fallbacks for when services fail

4. Trust Through Transparency
Every AI decision needs reasoning, audit trails, and escalation paths. No black boxes.

5. The Meta-Lesson
Scale isn’t about replacing humans—it’s about amplifying the expertise you have across domains you couldn’t possibly cover.

Talk: From Zero to One: Building AI Agents From The Ground Up

Presenter:
Federico Bianchi, Senior ML Scientist, TogetherAI

About the Presenter:
Federico Bianchi is a Senior ML Scientist at TogetherAI, working on self-improving agents. He was a post-doc at Stanford University. His work has been published in major journals such as Nature and Nature Medicine and conferences such as ICLR, ICML and ACL.

Talk Track: Augmenting Workforces with Agents

Technical Level: 4

Talk Abstract:
What does it take to build a truly autonomous AI agent, from scratch and in the open? In this talk, I’ll share how we’ve developed agents capable of executing full analytical workflows, from raw data to insights. I’ll walk through key principles for designing robust, transparent agents that reason, reflect, and act in complex scientific domains. We’ll explore how architectural choices, tool use, and learning approaches—including reinforcement learning—can be combined to build agents that improve over time and generalize to new tasks.

What You’ll Learn:
Building agents is easy but requires some thinking about the context in which the agents are going to be embedded.

Talk: How Math-Driven Thinking Builds Smarter Agentic Systems

Presenter:
Claire Longo, Lead AI Researcher, Comet

About the Presenter:
Claire Longo is an AI leader and Mathematician with over a decade of experience in Data Science and AI. She has led cross-functional AI teams at Twilio, Opendoor, and Arize AI and is currently a Lead AI Researcher at Comet. She holds a Bachelor’s in Applied Mathematics and a Master’s in Statistics from The University of New Mexico. Beyond her technical work, Claire is a Speaker, Advisor, YouTuber, and Poker Player. She is dedicated to mentoring Engineers and Data Scientists while championing diversity and inclusion in AI. Her mission is to empower the next generation of AI practitioners.

Talk Track: Evolution of Agents

Technical Level: 3

Talk Abstract:
Everyone’s buzzing about LLMs, but too few are talking about the math that should guide how we apply them to real-world problems. Mathematics is the language of AI, and a foundational understanding of the math behind AI model architectures should drive decisions when we’re building AI systems.

In this talk, I will do a technical deep dive to demystify how different mathematical architectures in AI models can guide us on how and when to use each model type, and how this knowledge can help us design agent architectures and anticipate potential weaknesses in production so we can safeguard against them. I’ll break down what LLMs can do (and where they fall apart), clarify the elusive concept of “reasoning,” and introduce a benchmarking mindset rooted in math and modularity.

To put it all into context, I’ll share a real-world example of an Agentic use case from my own recent project: a poker coaching app that blends an LLM reasoning model as the interface with statistical models analyzing a player’s performance using historical data. This is a strong example of the future of hybrid agents, where LLMs and other mathematical algorithms work together, each solving the part of the problem it’s best suited for. It demonstrates the proper application of reasoning models grounded in their mathematical properties and shows how modular agent design allows each model to focus on the piece of the system it was built to handle.

I’ll also introduce a scientifically rigorous approach to benchmarking and comparing models, based on statistical hypothesis testing, so we can quantify and measure the impact of different models on our use cases as we evaluate and evolve agentic design patterns.

Whether you’re building RAG agents, real-time LLM apps, or reasoning pipelines, you’ll leave with a new lens for designing agents. You’ll no longer have to rely on trial and error or feel like you’re flying blind with a black-box algorithm. Foundational mathematical understanding will give you the intuition to anticipate how a model is likely to behave, reduce time to production, and increase system transparency.

What You’ll Learn:
It’s easier than you think to understand the foundational mathematical concepts in AI and use that knowledge to guide you in building better AI systems.
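
As a small illustration of the hypothesis-testing mindset described above (not the speaker’s exact methodology), the sketch below runs a paired test on per-example eval scores from two agent variants; the scores are invented.

```python
from scipy.stats import ttest_rel

def compare_variants(scores_a, scores_b, alpha=0.05):
    """Paired hypothesis test on per-example eval scores from two agent
    variants; a small p-value suggests the gap is real, not noise."""
    stat, p_value = ttest_rel(scores_a, scores_b)
    return {"t_stat": stat, "p_value": p_value, "significant": p_value < alpha}

# Invented per-question scores from two candidate configurations.
variant_a = [0.70, 0.80, 0.60, 0.90, 0.75, 0.85, 0.80, 0.70]
variant_b = [0.80, 0.85, 0.70, 0.95, 0.80, 0.90, 0.85, 0.80]
print(compare_variants(variant_a, variant_b))
```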

Talk: Insights and Epic Fails from 5 Years of Building ML Platforms

Presenter:
Eric Riddoch, Director of ML Platform, Pattern AI

About the Presenter:
Eric leads the ML Platform team at Pattern, the largest seller on Amazon.com besides Amazon themselves.

Talk Track: ML Collaboration in Large Organizations

Technical Level: 2

Talk Abstract:
Building an internal ML Platform is a good idea as your number of data scientists, projects, or volume of data increases. But the MLOps toolscape is overwhelming. How do you pick tools and set your strategy? How important is drift detection? Should I serve all my models as endpoints? How “engineering-oriented” should my data scientists be?

Join Eric on a tour of three ML platforms he has worked on, serving 14 million YouTubers and the largest 3P seller on Amazon. Eric will share specific architectures, honest takes from epic failures, things that turned out not to be important, and principles for building a platform with great adoption.

What You’ll Learn:
– Principles > tools. Ultimately all MLOps tools cover ~9 “jobs to be done”.
– “Drift monitoring” is overstated. Data quality issues account for most model failures.
– Offline inference exists and is great! Resist the temptation to use endpoints.
– Data lineage is underrated. Helps catch “target leakage” and upstream/downstream errors.
– Cloud GPUs from non-hyperscalers are getting cheaper. You may not need on-prem.
– DS can get away with “medium-sized” data tools for a long time.

Talk: From Schema Discovery to Kubernetes: Building an Autonomous Agent for Real-Time Apache Flink Apps with LangGraph

Presenter:
Purshotam Shah, Senior Principal Software Developer Engineer, Yahoo

About the Presenter:
Software Engineer on Yahoo’s Low Latency team, overseeing Apache Storm, ZooKeeper, and Flink deployments.

Talk Track: AI Agents for Model Validation and Deployments

Technical Level: 3

Talk Abstract:
In the era of generative AI, the focus of MLOps is shifting from managing models to managing autonomous agents that can write, test, and deploy their own code. This session presents a real-world case study on building a sophisticated AI agent that automates the entire lifecycle of a real-time Apache Flink application. The process is initiated from a single prompt where the user only needs to specify the location of their data schema in a registry like data.all. From there, the agent takes over, creating the code, discovering the schema, generating a corresponding serializer, building and pushing a Docker image via a Screwdriver pipeline, and finally deploying that image to create a production-ready Flink cluster on Kubernetes.

We will demonstrate how we used LangGraph to orchestrate a stateful workflow that intelligently scaffolds a Maven project, generates type-safe Java code based on centralized schemas, and writes its own unit tests. The core of our talk focuses on the self-healing CI/CD pipeline: when the initial build or tests fail, the agent analyzes the Maven error logs, identifies the root cause (be it a dependency issue in the pom.xml or a compilation error in the Java code), and autonomously performs repairs by prompting an LLM with the precise context needed for a fix.

Finally, we’ll cover the final mile of the pipeline, where the agent generates the necessary Dockerfile and screwdriver.yaml configurations to build a container, push it to a registry, and prepare it for deployment to a Kubernetes cluster. We’ll also touch on how we use specialized tracing tools to trace the agent’s complex decision-making process, providing critical observability into our autonomous development loop.
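
For readers who want a concrete feel for the self-healing loop described above, here is a minimal sketch of a build–repair cycle in LangGraph. It is our own illustration under stated assumptions, not the speaker’s implementation: the build and repair steps are simulated stand-ins for the real Maven invocation and LLM-driven patching.

```python
# A minimal sketch (our illustration, not the speakers' code) of a self-healing
# build loop in LangGraph: build -> repair -> build ... until success or give-up.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class BuildState(TypedDict):
    error_log: str
    attempts: int
    succeeded: bool

def run_build(state: BuildState) -> BuildState:
    # Real pipeline: shell out to `mvn package` and capture logs plus exit code.
    # Here we simulate a failure on the first attempt only.
    attempts = state["attempts"] + 1
    ok = attempts > 1
    return {"error_log": "" if ok else "[ERROR] compilation failure in Job.java",
            "attempts": attempts, "succeeded": ok}

def repair(state: BuildState) -> BuildState:
    # Real pipeline: prompt an LLM with the Maven error log and apply its patch.
    print("repairing based on:", state["error_log"])
    return state

def route(state: BuildState) -> str:
    # Stop on success or after three attempts; otherwise go repair and rebuild.
    return END if state["succeeded"] or state["attempts"] >= 3 else "repair"

graph = StateGraph(BuildState)
graph.add_node("build", run_build)
graph.add_node("repair", repair)
graph.set_entry_point("build")
graph.add_conditional_edges("build", route, {END: END, "repair": "repair"})
graph.add_edge("repair", "build")
app = graph.compile()

print(app.invoke({"error_log": "", "attempts": 0, "succeeded": False}))
```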

What You’ll Learn:
Architect Stateful AI Agents: Learn to design and build multi-step, autonomous agents using LangGraph to manage complex software engineering workflows, moving beyond simple, stateless API calls.

Automate Build & Test Cycles: Implement a CI/CD pipeline where an AI agent can autonomously diagnose and fix its own build failures, including dependency conflicts in pom.xml, Java compilation errors, and failing unit tests.

Enforce Enterprise-Grade Reliability: Discover techniques to ground LLM-generated code against sources of truth, such as a data.all schema registry and version-locked dependency rules, to ensure correctness and consistency.

Achieve Prompt-to-Prod Automation: Walk away with a complete methodology for creating a prompt-driven development lifecycle, from schema discovery and code generation to Dockerization and Kubernetes deployment.

Talk: Impact of AI on Developer Productivity

Presenter:
Yegor Denisov-Blanch, Researcher, Stanford University

About the Speaker:
I run the software engineering productivity research group at Stanford. For the past 3+ years, we’ve been working with hundreds of companies to analyze their private git repos to measure the productivity of their engineers. We have 120,000+ engineers in the dataset. Before Stanford, I looked after digital transformation projects at an F100 company with 6,000+ engineers. I found it paradoxical that software engineers are very data-driven, yet we had no good data-driven way to make decisions about the things that impacted software engineering productivity.

Talk Track: AI Agents for Developer Productivity

Talk Technical Level: 1/7

Talk Abstract:
This talk will deep dive on the impact of AI on developer productivity, comparing numbers across languages, seniority levels, company types, types of work, and reasoning vs. non-reasoning LLMs. It will also showcase best practices for adoption at enterprise scale, as well as situations where initiatives didn’t yield the desired results.

What You’ll Learn:
AI increases developer productivity, but not always and not in every setting – learn how & when to use AI agents for software engineering at scale

Talk: Code-Guided Agents for Legacy System Modernization

Presenter:
Calvin Smith, Senior Researcher Agent R&D, OpenHands

About the Speaker:
Calvin Smith is a software engineer and researcher who spent years developing formal methods for generating and understanding code at scale. He joined OpenHands to apply these techniques to real-world software engineering challenges. His current focus: building AI agents that leverage formal methods to modernize legacy codebases and pushing the boundaries of what autonomous agents can accomplish in software engineering.

Talk Track: AI Agents for Developer Productivity

Talk Technical Level: 2/7

Talk Abstract:
Legacy code modernization often fails because we try to boil the ocean. After early attempts at using autonomous agents for whole-codebase transformations resulted in chaos, we developed a novel approach: combine static dependency analysis with intelligent agents to break modernization into reviewable, incremental chunks. This talk explores how we use static-analysis tools to understand codebases, identify optimal modernization boundaries, and orchestrate multiple agents to collaboratively transform codebases to turn an impossible problem into a series of manageable PRs.

What You’ll Learn:
The solution space for AI-automated software engineering extends beyond “AI for code” or “code for AI”. It’s about creating feedback loops where static analysis, AI agents, and human expertise continuously inform and enhance each other.

Talk: Don't Page the Planet: Trust-Weighted Ops Decisions

Presenter:
Eric Reese, Senior Manager, Site Reliability Engineering, BestBuy

About the Presenter:
Eric Reese, Senior Manager of SRE at Best Buy, leads ML initiatives for incident operations. He specializes in trust-weighted decisions, spike detection algorithms, and the operational guardrails that make AI reliable in production. His focus: bridging the gap between ML predictions and safe automated actions.

Talk Track: AI Agents for Model Validation and Deployments

Technical Level: 3

Talk Abstract:
Enterprises don’t need more dashboards—they need deciders that act safely. This talk shows how we built an agentic validation layer that sits between ML predictions and operational responses. The system classifies incidents, applies exponentially-decaying trust scores with configurable half-life, adapts contextual thresholds (time-of-day baselines, scope, team diversity), routes gray-zone cases through smart policies, and posts idempotent state changes to chat systems. We’ll cover the trust accumulation algorithm, how derivative signals catch rising threats, multi-agent validation with cross-checking, and the observability needed to audit every decision. Attendees leave with a platform-agnostic pattern—Predict → Categorize → Weight → Accumulate → Decide → Notify—that turns noisy ML outputs into governed actions with built-in safety rails.
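
To make the trust-accumulation idea concrete, here is a minimal sketch of an exponentially decaying score with a configurable half-life. The numbers and the 15-minute default are assumptions for illustration only, not the values used in the speaker’s system.

```python
from dataclasses import dataclass

@dataclass
class TrustAccumulator:
    """Accumulates per-signal trust that decays exponentially with a configurable half-life."""
    half_life_s: float = 900.0   # assumed default: evidence loses half its weight every 15 minutes
    score: float = 0.0
    last_ts: float = 0.0

    def add(self, ts: float, weight: float) -> float:
        # Decay the existing score for the elapsed time, then add the new evidence.
        elapsed = max(0.0, ts - self.last_ts)
        decay = 0.5 ** (elapsed / self.half_life_s)
        self.score = self.score * decay + weight
        self.last_ts = ts
        return self.score

acc = TrustAccumulator()
for ts, weight in [(0, 1.0), (300, 1.0), (2400, 1.0)]:
    print(ts, round(acc.add(ts, weight), 3))
# A decision layer would compare `score` against a contextual threshold before acting.
```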

What You’ll Learn:
Core Message: ML predictions need a safety wrapper to become operational decisions—this talk provides that complete pattern.

Supporting learnings:
– A reusable last-mile pattern: turning any ML prediction into a safe operational action
– How to tune weighted-decay rates to catch real incidents without alert fatigue (with the actual math and knobs)
– Handling uncertainty: when the model isn’t sure, policy-based “gray-zone” routing takes over
– What observability means for AI ops: audit trails, structured rationales, rollback hooks
– Making it production-ready: idempotence, guaranteed delivery, and live config updates without downtime

Talk: Five Hard-Earned Lessons About Evals

Presenter:
Ankur Goyal, Founder & CEO, Braintrust

About the Presenter:
Ankur is the Founder and CEO of Braintrust, where he is building innovative solutions in the AI space. Previously, he served as Head of ML Platform at Figma after leading Impira as Founder and CEO through its successful acquisition by Figma.

Earlier in his career, Ankur was Vice President of Engineering at SingleStore, overseeing product architecture, engineering operations, and strategy. A graduate of Carnegie Mellon University with a degree in Computer Science, Ankur specializes in systems engineering, machine learning platforms, and product development.

Talk Track: AI Agents for Model Validation and Deployments

Technical Level: 3

Talk Abstract:
Are you catching critical issues in your LLM applications before they reach users? In this talk, I’ll share five hard‑earned lessons drawn from powering thousands of daily evals on Braintrust. You’ll discover how to engineer data pipelines and custom scorers that surface real failures, why optimizing your full eval loop outperforms prompt tweaks alone, and how Loop, our AI copilot, automates continuous improvement. I’ll share real examples like Notion’s 24‑hour model swaps and demonstrate practical steps to tighten your eval workflows. Join me to learn actionable strategies that ensure your LLM features ship reliably and confidently.

What You’ll Learn:
How do you know your AI feature works? Are bad responses reaching users? Can your team improve quality without guesswork? You need a scalable evals and observability platform to build and ship reliable AI agents.

Talk: Fake Data, Real Power: Crafting Synthetic Transactions for Bulletproof AI

Presenter:
Bhavana Sajja, Senior Machine Learning Engineer, Expedia Inc.

About the Speaker:
A Senior Machine Learning Engineer at Expedia Inc., I lead the end-to-end development and operationalization of AI/ML solutions across high-impact use cases such as fraud detection, supplier screening, and dynamic fraud listing. With a strong foundation in building, deploying, and monitoring production-grade models, I ensure that data pipelines, model performance, and governance frameworks align seamlessly with both business objectives and compliance requirements.

Known for a solutions-oriented mindset, I thrive on adopting emerging technologies to address real-world challenges. Currently, I am exploring agentic AI paradigms—such as agent-to-agent (A2A) protocols and model-context protocol (MCP) architectures—to enhance the reliability, adaptability, and explainability of fraud prevention systems. Our work focuses on crafting autonomous pipelines that can detect novel attack vectors in near real-time, prioritize high-risk cases, and continuously refine detection strategies through feedback loops.

Beyond day-to-day engineering, I actively contribute to cross-functional initiatives: mentoring junior engineers, sharing best practices at internal knowledge-shares, and evaluating new MLOps tools to accelerate model iteration cycles. With a passion for continuous learning, I participate in developer forums—bridging the gap between cutting-edge research and enterprise-scale deployments.

Talk Track: Data Engineering in an LLM era

Talk Technical Level: 2/7

Talk Abstract:
In today’s AI-driven world, organizations want to use their rich transaction records for insights and model building but worry about exposing sensitive customer details. This talk offers a clear, practical guide to creating high-quality synthetic transaction data—data that looks and behaves like real records but contains no actual customer information. We’ll focus on why good data quality is essential for any AI model: without realistic patterns and relationships, models trained on synthetic data simply won’t perform well.

We’ll first highlight the main hurdles in transaction tables: mixed data types (numbers, categories, dates), rare events (like fraud), and complex links between features. Then, we’ll introduce four proven generative approaches—GANs (Generative Adversarial Networks), TVAEs (Tabular Variational Autoencoders), TabularARGN (Tabular Autoregressive Generative Networks), and GPT-based methods—that address these challenges in different ways:

GANs learn to “fool” a critic network to produce realistic samples, which helps match complex data patterns.

TVAEs focus on understanding each column’s data type (text, number, category) to recreate accurate row-level details.

TabularARGN builds records step-by-step, preserving sequential and hierarchical relationships in the data.

GPT-based methods leverage transformer models (like those behind large language models) to capture broad patterns and generate new rows based on learned “templates.”

Through a simple case study on a public credit-card transactions dataset, we’ll walk through:

Preparing data (filling in missing values, encoding categories, handling outliers)

Choosing and training a model (why you might pick a GAN versus a TabularARGN or a TVAE)

Evaluating results with easy-to-understand checks—how closely synthetic data matches real data distributions and how well a fraud-detection model trained on synthetic data performs.

We’ll also discuss balancing privacy (keeping customer details safe) with usefulness (keeping important patterns, like rare fraud events). Finally, we’ll point to simple next steps: using synthetic data in healthcare records or IoT sensor logs, monitoring data quality automatically, and ensuring any privacy concerns are met. By the end of this session, even attendees new to generative AI will understand how to pick a method, build a high-quality synthetic dataset, and trust that their AI models can learn and perform effectively—boosting innovation without risking real customer data.
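
As a concrete starting point for the model-training step, here is a minimal sketch using the open-source SDV library, which provides GAN-based (CTGAN) and TVAE-based tabular synthesizers. The library choice, toy columns, and settings are our own assumptions for illustration; the talk itself is library-agnostic.

```python
import numpy as np
import pandas as pd
from sdv.metadata import SingleTableMetadata
from sdv.single_table import CTGANSynthesizer  # TVAESynthesizer is a drop-in alternative

# Toy stand-in for a transactions table; real data would have many more columns.
rng = np.random.default_rng(0)
real = pd.DataFrame({
    "amount": rng.lognormal(mean=3.0, sigma=1.0, size=1000).round(2),
    "merchant_category": rng.choice(["grocery", "travel", "coffee", "electronics"], size=1000),
    "is_fraud": rng.choice([0, 1], size=1000, p=[0.98, 0.02]),
})

# Infer column types, then fit a GAN-based synthesizer and sample new rows.
metadata = SingleTableMetadata()
metadata.detect_from_dataframe(real)

synthesizer = CTGANSynthesizer(metadata, epochs=50)
synthesizer.fit(real)
synthetic = synthesizer.sample(num_rows=1000)

# First sanity check: do marginal distributions (e.g., the rare fraud rate) roughly match?
print(real["is_fraud"].mean(), synthetic["is_fraud"].mean())
```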

What You’ll Learn:
Why synthetic transactional data matters and how to create it with advanced generative techniques
Real-world trade-offs: utility vs. privacy

Talk: Smart Fine-Tuning of Video Foundation Models for Fast Deployments

Presenter:
Zachary Carrico, Senior Machine Learning Engineer, Apella

About the Presenter:
Zac is a Senior Machine Learning Engineer at Apella, specializing in machine learning products for improving surgical operations. He has a deep interest in healthcare applications of machine learning, and has worked on cancer and Alzheimer’s disease diagnostics. He has end-to-end experience developing ML systems: from early research to serving thousands of daily customers. Zac is an active member of the Data and ML community, having presented at conferences such as Ray Summit, TWIML AI, Data Day, and MLOps & GenAI World. He has also published eight journal articles. His passion lies in advancing ML and streamlining the deployment and monitoring of models, reducing complexity and time. Outside of work, Zac enjoys spending time with his family in Austin and traveling the world in search of the best surfing spots.

Talk Track: ML Training Lifecycle

Technical Level: 3

Talk Abstract:
As video foundation models become integral to use cases in healthcare, security, retail, robotics, and consumer applications, MLOps teams face a new class of challenges: how to efficiently fine-tune these large models for domain-specific tasks without overcomplicating infrastructure, overloading compute resources, or degrading real-time performance.

This session presents tips for selecting and intelligently fine-tuning video foundation models at scale. Using a state-of-the-art vision foundation model, we’ll cover techniques for efficient data sampling, temporal-aware augmentation, adapter-based tuning, and scalable optimization strategies. Special focus will be given to handling long and sparse videos, deploying chunk-based inference, and integrating temporal fusion modules with minimal latency overhead. Attendees of this talk will come away with strategies for quickly deploying optimally fine-tuned foundation models.
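
To illustrate the adapter-based tuning idea, here is a minimal PyTorch sketch that freezes a (stand-in) per-frame backbone and trains only a lightweight temporal fusion head. The module shapes and class names are assumptions for illustration, not the speaker’s architecture.

```python
import torch
import torch.nn as nn

class FrozenBackbone(nn.Module):
    """Stand-in for a pretrained per-frame encoder; in practice a real video/image backbone."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.proj = nn.Linear(3 * 64 * 64, dim)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:  # (B, T, C, H, W)
        return self.proj(frames.flatten(2))                   # (B, T, dim)

class TemporalAdapterHead(nn.Module):
    """Lightweight temporal fusion over frame embeddings; the only trainable part."""
    def __init__(self, dim: int = 256, num_classes: int = 5):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, frame_embs: torch.Tensor) -> torch.Tensor:
        fused, _ = self.attn(frame_embs, frame_embs, frame_embs)
        return self.classifier(fused.mean(dim=1))

backbone, head = FrozenBackbone(), TemporalAdapterHead()
for p in backbone.parameters():
    p.requires_grad = False  # fine-tune only the adapter head

frames = torch.randn(2, 8, 3, 64, 64)  # batch of 2 clips, 8 frames each
with torch.no_grad():
    embs = backbone(frames)
logits = head(embs)
print(logits.shape)  # torch.Size([2, 5])
```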

What You’ll Learn:
Attendees will learn practical strategies for efficiently fine-tuning and deploying video foundation models at scale. They’ll take away techniques for data sampling, temporal-aware augmentation, adapter-based tuning, and scalable optimization—plus methods to handle long/sparse videos and deploy low-latency, chunk-based inference with temporal fusion.

Talk: Why is ML on Kubernetes Hard? Defining How ML and Software Diverge

Presenter:
Paul Yang, Member of Technical Staff, Runhouse

About the Presenter:
At Runhouse, Paul is helping to build, test, and deploy Kubetorch at leading AI labs and enterprises for RL, training, and inference use cases. Previously, he worked across a range of ML/DS and infra domain areas, from language model tuning and evaluations for contextually aware code generation to productizing causal ML / pseudo-causal inference.

Talk Track: ML Training Lifecycle

Technical Level: 2

Talk Abstract:
Mature organizations run ML workloads on Kubernetes, but implementations vary widely, and ML engineers rarely enjoy the streamlined development and deployment experiences that platform engineering teams provide for software engineers. Making small changes takes an hour to test and moving from research to production frequently takes multiple weeks – these unergonomic and inefficient processes are unthinkable for software, but standard in ML. To explain this, we first trace the history of ML platforms and how early attempts like Facebook’s FBLearner as “notebooks plus DAGs” led to incorrect reference implementations. Then we define the critical ways that ML diverges from software, such as inability to do local testing due to data size and acceleration needs (GPU), heterogeneity in distributed frameworks and their requirements (Ray, Spark, PyTorch, Tensorflow, Dask, etc.), non-trivial observability and logging. Finally, we propose a solution, Kubetorch, which bridges between an iterable and debuggable Pythonic API for ML Engineers and Kubernetes-first scalable execution.

What You’ll Learn:
ML, especially at sophisticated organizations, is done on Kubernetes. However, there are no definitive reference implementations and well-used projects to date for ML-on-Kubernetes like Kubeflow have had mixed reactions from the community. Kubetorch is an introduction of a novel compute platform that is Kubernetes-native that offers a great, iterable, and debuggable interface into powerful compute for developers, without introducing new pitfalls of brittle infrastructure or long deployment times. In short, Kubetorch is a recognition that ML teams are demanding better platform engineering (rather than “ML Ops” / DevOps) and the right abstraction over Kubernetes is necessary to achieve this.

Talk: Securing Models

Presenter:
Hudson Buzby, Solutions Architect, JFrog

About the Speaker:
Hudson Buzby is a solution engineer with a strong focus on MLOps and LLMOps, leveraging his expertise to help organizations optimize their machine learning operations and large language model deployments. His role involves providing technical solutions and guidance to enhance the efficiency and effectiveness of AI-driven projects.

Talk Track: Latest MLOps Trends

Talk Technical Level: 3/7

Talk Abstract:
Generative AI and machine learning models are reshaping industries but also introducing new security risks. Model marketplaces like Hugging Face or Ollama have become inundated with models that do not have trusted sources/authors and often contain vulnerabilities. Many organizations are struggling to formulate a strategy that safely allows their team to build and deploy open-source LLMs. This session explores the unique security challenges of ML systems in the GenAI era and provides actionable strategies to safeguard them. Learn why traditional approaches fall short and how to fortify your ML lifecycle to stay ahead in an evolving threat landscape.

What You’ll Learn:
It is essential for organizations to place guardrails around open source LLM development in a safe, scalable manner.

Talk: Building Multi-Cloud GenAI Platforms without The Pains

Presenter:
Romil Bhardwaj, Co-creator, SkyPilot

About the Presenter:
Romil Bhardwaj is the co-creator of SkyPilot, a widely adopted open-source project that enables running AI workloads seamlessly across multiple cloud platforms. He completed his Ph.D. in Computer Science at UC Berkeley’s RISE Lab, advised by Ion Stoica, focusing on large-scale systems and resource management for machine learning. Romil’s work, recognized with multiple patents, 1,100+ citations in top conferences, and awards such as the USENIX ATC 2024 Distinguished Artifact Award and ACM BuildSys 2017 Best Paper, builds on a strong foundation in both academia and industry. He was previously a contributor to the Ray project, and a Research Fellow at Microsoft Research, where he developed systems for machine learning and wireless networks, including award-winning projects and granted patents. He remains an active reviewer and speaker at leading systems and AI venues.

Talk Track: LLMs on Kubernetes

Technical Level: 2

Talk Abstract:
GenAI workloads are redefining how AI platforms are built. Teams can no longer rely on a single cloud to satisfy their GPU needs, infra costs are growing and productivity of ML engineers is paramount. Going multi-cloud secures GPU capacity, reduces costs and eliminates vendor lock-in, but introduces operational complexity that can slow down ML teams.

This talk is a hands-on guide to building a multi-cloud AI platform that unifies cloud VMs and Kubernetes clusters across Hyperscalers (AWS, GCP, and Azure), Neoclouds (Coreweave, Nebius, Lambda), and on-premise clusters into a single compute abstraction. We’ll walk through practical implementation details including workload scheduling strategies based on resource availability and cost, automated cloud selection for cost optimization, and handling cross-cloud data movement and dependency management. This approach lets ML engineers use the same interface for both interactive development sessions and large-scale distributed training jobs, enabling them to focus on building great AI products rather than wrestling with cloud complexity.

What You’ll Learn:
Multi-cloud solves GenAI’s capacity and cost challenges; the right abstraction layer makes it easy for infra teams and researchers alike.

Talk: Opening Pandora’s Box: Building Effective Multimodal Feedback Loops

Presenter:
Denise Kutnick, Co-Founder & CEO, Variata

About the Presenter:
Denise Kutnick is a technologist with over a decade of experience building multimodal systems and evaluation pipelines used by millions, with roles spanning large companies like Intel and high-growth startups like OctoAI (acquired by Nvidia). She is the Co-Founder and CEO of Variata, a company building AI that sees, thinks, and interacts like a user to run visual regression tests at scale and keep digital experiences reliable. Denise is passionate about tackling problems at the intersection of AI and UX.

Talk Track: Multimodal Systems in Production

Technical Level: 3

Talk Abstract:
AI market maps are overflowing with multimodal SDKs promising to blend vision, language, audio, and more into a seamless package. But when they fail in production, you may find yourself locked in without the visibility or tools to fix it.

In this talk, we’ll open the box and explore how to build and interpret multimodal feedback loops that keep complex AI systems healthy in production.

We’ll cover:
– Closed-box vs Open-box Workflows: How exposing intermediate signals in your agentic pipeline grants finer-grained control, faster debugging, and better calibration towards user needs.
– Defining the Right Evals: Why human-understandable checkpoints are essential for model introspection and human-in-the-loop review.
– Data Pipeline Building Blocks: Leveraging tooling such as declarative pipelines, computed columns, and batch execution to catch issues and surface improvements without slowing deployment.

What You’ll Learn:
Regardless of the model or SDKs you choose to build on top of, building the right scaffolding around it will open the box and give you control, visibility, and interpretability of your multimodal AI workflows.

Talk: Video Intelligence Is Going Agentic

Presenter:
James Le, Head of Developer Experience, TwelveLabs

About the Presenter:
James Le is currently leading Developer Experience at Twelve Labs – a startup building multimodal foundation models for video understanding. Previously, he has worked at the nexus of enterprise ML/AI and data infrastructure. He also hosted a podcast that features raw conversations with founders, investors, and operators in the space.

Talk Track: Multimodal Systems in Production

Technical Level: 4

Talk Abstract:
While 90% of the world’s data exists in video format, most AI systems treat video like static images or text—missing crucial temporal relationships and multimodal context. This talk explores the paradigm shift toward agentic video intelligence, where AI agents don’t just analyze video but actively reason about content, plan complex workflows, and execute sophisticated video operations.

Drawing from real-world implementations including MLSE’s 98% efficiency improvement in highlight creation (reducing 16-hour workflows to 9 minutes), this session demonstrates how video agents combine multimodal foundation models with agent architectures to solve previously intractable problems. We’ll explore the unique challenges of video agents—from handling high-dimensional temporal data to maintaining context across multi-step workflows—and showcase practical applications in media, entertainment, and enterprise video processing.

Attendees will learn how to architect video agent systems using planner-worker-reflector patterns, implement transparent agent reasoning, and design multimodal interfaces that bridge natural language interaction with visual media manipulation.

What You’ll Learn:
1. Why traditional approaches fail: Understanding the fundamental limitations of applying text/image AI techniques to video, and why agentic approaches are necessary for complex video understanding.

2. Video agent architecture patterns: How to design and implement planner-worker-reflector architectures that can maintain context across complex multi-step video workflows.

3. Practical implementation strategies: Real-world approaches to building transparent agent reasoning, handling multimodal interfaces, and orchestrating video foundation models.

4. Business impact and ROI: Concrete examples of dramatic efficiency improvements and how to identify high-impact use cases in their own organizations

Talk: Future of AI in Healthcare

Presenter:
Denys Linkov, Head of ML, Wisedocs

About the Presenter:
Denys Linkov is currently Head of ML at Wisedocs and a ML Startup Advisor and LinkedIn Learning Course Instructor. He’s worked with 50+ enterprises in their conversational AI journey, and his Gen AI courses have helped 150,000+ learners build key skills. He’s worked across the AI product stack, being hands-on building key ML systems, managing product delivery teams, and working directly with customers on best practices.

Talk Track: Scoping and Delivering Complex AI Projects

Technical Level: 1

Talk Abstract:
What AI products are working in production for health companies? In this panel we’ll cover the past, present, and future of AI in the healthcare industry. We’ll cover successes, trends, and customer outcomes. This panel will focus on thought leadership in the domain, covering:

1. Build vs Buy
2. AI Governance
3. Improving patient outcomes
4. Success stories
5. Gaps in tooling
6. Requests for solutions

What You’ll Learn:
How different industry leaders (executives) are thinking about POC vs Production use cases in insurance

Talk: Humans in the Loop: Designing Trustworthy AI Through Embedded Research

Presenter:
David Baum, UX Researcher & Design Strategist, Amazon

About the Presenter:
David Baum is a design strategist and UX researcher with over a decade of experience shaping AI-powered products at the intersection of human behavior, ethical design, and emerging technology. Currently leading UX research for Amazon Ads’ Generative AI portfolio, David works across disciplines to translate ambiguity into actionable insight, ensuring that cutting-edge models serve real human needs.

His past work spans healthcare, behavioral science, and enterprise innovation guiding product teams at organizations like Johnson & Johnson, Memorial Sloan Kettering, Cigna, and the U.S. Department of Veterans Affairs. David is especially focused on how AI reshapes cognition, decision-making, and user trust, and frequently explores the implications of AI on systems-level design, human-AI collaboration, and collective wellbeing.

He’s a frequent panelist and contributor on topics ranging from ethical AI to strategic foresight, and is known for his ability to bridge deeply technical domains with accessible, human-centered narratives.

Talk Track: Scoping and Delivering Complex AI Projects

Technical Level: 2

Talk Abstract:
As generative AI rapidly moves from lab to product, many teams are rushing to ship capabilities without understanding the lived experiences, risks, and edge cases that define real-world usage. This talk explores how embedding user research earlier–and more meaningfully–into AI development pipelines can do more than just mitigate harm. It can enhance product adoption, build user trust, and surface invisible needs that AI alone won’t catch.

Drawing on experience leading UX research for Amazon Ads’ generative AI portfolio and past work in healthcare, behavioral science, and public systems, I’ll show how user insights can serve as functional guardrails – shaping model boundaries, UI design, and feedback loops. We’ll also interrogate the frictionless design ethos that dominates AI tooling today, and ask: what does it mean to design for thoughtfulness rather than speed?

Whether you’re building AI-native products or adapting legacy systems, this talk will offer frameworks and provocations for making AI more accountable, more human, and more useful.

What You’ll Learn:
UX research is not just a validation tool, it’s a critical input to AI product strategy and model governance.

Friction isn’t failure: thoughtful UX friction can support better outcomes, greater user agency, and higher trust in AI systems.

Embedding research into AI workflows helps detect misalignment early, before launch, reducing risk and surfacing ethical blind spots.

Cross-functional collaboration (PMs, designers, engineers, scientists) must center the human, not just the model.

Designing for trust means understanding how users think, not just how models predict.

Talk: LLM Inference: A Comparative Guide to Modern Open-Source Runtimes

Presenter:
Aleksandr Shirokov, Team Lead MLOps Engineer, Wildberries

About the Presenter:
My name is Aleksandr Shirokov, and I am a T3 Fullstack AI Software Engineer with 5+ years of experience and team lead management competence. Currently, I lead the MLOps team in the RecSys department at Wildberries, a world-famous marketplace, launching AI products and building ML infrastructure and tools for 300+ ML engineers. My team and I support the full ML lifecycle, from research to production, and work closely with real user-facing products, directly impacting business metrics. See https://aptmess.io for more info.

Talk Track: LLMs on Kubernetes

Technical Level: 3

Talk Abstract:
In this session, we’ll share how our team built and battle-tested a production-grade LLM serving platform using vLLM, Triton TensorRT-LLM, Text Generation Inference (TGI), and SGLang. We’ll walk through our custom benchmark setup, the trade-offs across frameworks, and when each one makes sense depending on model size, latency, and workload type. We’ll cover how we implemented HPA for vLLM, reduced cold start times with Tensorize, co-located multiple vLLM models in a single pod to save GPU memory, and added lightweight SAQ-based queue wrappers for fair and efficient request handling. To manage usage and visibility, we wrapped all endpoints with Kong, enabling per-user rate limits, token quotas, and usage observability. Finally, we’ll share which LLM and VLM models are running in production today (we are serving DeepSeek R1‑0528 in production), and how we maintain flexibility while keeping costs and complexity in check. If you’re exploring LLM deployment, struggling with infra choices, or planning to scale up usage, this talk will help you avoid common pitfalls, choose the right stack, and design a setup that truly fits your use case.
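
As a minimal illustration of one piece of such a stack, the sketch below runs offline generation with vLLM and caps GPU memory per engine, which is the knob that makes co-locating several models in one pod possible. The model name, memory fraction, and prompt are placeholders, not the production configuration described in the talk.

```python
from vllm import LLM, SamplingParams

# gpu_memory_utilization bounds this engine's share of the GPU so another
# vLLM engine can be co-located in the same pod (values here are illustrative).
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct", gpu_memory_utilization=0.45)
params = SamplingParams(temperature=0.2, max_tokens=128)

outputs = llm.generate(
    ["Summarize the return policy for electronics in two sentences."],
    params,
)
for out in outputs:
    print(out.outputs[0].text)
```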

What You’ll Learn:
There’s no one-size-fits-all LLM serving stack – we’ve benchmarked, deployed, and optimized multiple runtimes in production, and we’ll share what works, when, and why, so you can build the right setup for your use case.

Prerequisite Knowledge:
Basic knowledge of NLP transformers, Python, and Docker

Talk: Architecting and Orchestrating AI Agents

Presenter:
Anish Shah, AI Engineer, Weights & Biases

About the Speaker:
Anish loves turning ML ideas into ML products. Anish started his career working with multiple Data Science teams within SAP, working with traditional ML, deep learning, and recommendation systems before landing at Weights & Biases. With the art of programming and a little bit of magic, Anish crafts ML projects to help better serve our customers, turning “oh nos” to “a-ha”s!

Talk Track: Agents in Production

Talk Technical Level: 3/7

Talk Abstract:
Attendees will learn about the current state of agents, with an emphasis on the problems faced in development, along with advice and tools to deal with those problems.

What You’ll Learn:
This is for beginners to advanced participants

Talk: Architecting a Deep Research System

Presenter:
Suhas Pai, CTO & Co-Founder, Hudson Labs

About the Speaker:
Suhas Pai is an NLP researcher and co-founder/CTO of Hudson Labs, a Toronto-based startup. At Hudson Labs, he works on text ranking, representation learning, and productionizing LLMs. He is also currently writing a book on Designing Large Language Model Applications with O’Reilly Media. Suhas has been active in the ML community, being the Chair of the TMLS (Toronto Machine Learning Summit) conference since 2021 and also NLP lead at Aggregate Intellect (AISC). He was also co-lead of the Privacy working group at Big Science, as part of the BLOOM open-source LLM project.

Talk Track: Agents in Production

Talk Technical Level: 3/7

Talk Abstract:
In the past year, several pioneering AI labs have launched powerful ‘Deep Research’ features that search extensively across a large number of data sources and produce comprehensive reports in response to user queries. In this talk, we will discuss the anatomy of such a system, focusing on the tradeoffs involved in building one, and examine promising architectural paradigms. We will also discuss the engineering and infrastructural considerations involved in building such systems.

What You’ll Learn:
1. Understand the potential of deep research systems and their components
2. Navigate through tradeoffs involved in building such systems
3. Learn architectural paradigms and best practices in building such systems

Talk: Gradio: The Web Framework for Humans and Machines

Presenter:
Freddy Boulton, Open Source Software Engineer, Hugging Face

About the Presenter:
Freddy Boulton, an Open Source Engineer at Hugging Face, brings six years of experience in developing tools that simplify AI sharing and usage. He’s a core maintainer of Gradio, an open-source Python package for building production-ready AI web applications. His latest work focuses on making Gradio applications MCP-compliant, enabling Python developers to create seamless, beautifully designed web interfaces for their AI models that integrate with any MCP client without additional configuration.

Talk Track: Augmenting Workforces with Agents

Technical Level: 2

Talk Abstract:
The Model Context Protocol (MCP) has ushered in a new paradigm, enabling applications to be accessible to AI agents. But shouldn’t these same applications be just as accessible and intuitive for humans? What if building a user-friendly interface for people could automatically create a powerful interface for machines too? This presentation introduces Gradio as The Web Framework for Humans and Machines. We’ll explore how Gradio allows developers to build performant and delightful web UIs for human users, while simultaneously, thanks to its automatic Model Context Protocol (MCP) integration, generating a fully compliant and feature-rich interface for AI agents.

Discover how Gradio simplifies the complexities of MCP, offering “batteries-included” functionality like robust file handling, real-time progress updates, and authentication, all with minimal additional effort. We’ll also highlight the Hugging Face Hub’s role as the world’s largest open-source MCP “App Store,” showcasing how Gradio-powered Spaces provide a vast ecosystem of readily available AI tools for LLMs. Join us to learn how Gradio uniquely positions you to develop unified AI applications that serve both human users and intelligent agents.
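
As a minimal illustration, the sketch below builds a small Gradio app and enables its MCP server mode so the same function is available to both humans and agents. It assumes a recent Gradio version with the MCP extra installed; the example function is ours, not from the talk.

```python
import gradio as gr

def letter_count(text: str, letter: str) -> int:
    """Count occurrences of a letter in a text (toy example function)."""
    return text.lower().count(letter.lower())

# The same Interface serves a human-facing web UI and, with mcp_server=True,
# exposes letter_count as an MCP tool to any MCP-compliant agent.
demo = gr.Interface(fn=letter_count, inputs=["text", "text"], outputs="number")
demo.launch(mcp_server=True)
```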

What You’ll Learn:
Developers can build performant, feature-rich UIs for AI models entirely in Python with Gradio. These apps can be easily shared with human users as well as plugged into any MCP-compliant AI agent. Write once, deploy for truly every possible user.

Talk: The Rise of Self-Aware Data Lakehouses

Presenter:
Srishti Bhargava, Software Engineer, Amazon Web Services

About the Speaker:
I’m Srishti! I’m a software engineer at AWS where I work on data platforms, focusing on systems like Apache Iceberg and SageMaker Lakehouse. I help teams build analytics and machine learning solutions that actually work at scale – turning messy data into something useful.
I really care about making data engineering more approachable. A lot of modern data tools feel unnecessarily complex, so I write about the practical stuff, how to keep tables performing well, handle schema changes gracefully, and build systems that don’t break in production.
Outside of work, I love hiking and catching sunrises when I can. I also spend a lot of time cooking – it’s how I relax and unwind. There’s something satisfying about taking simple ingredients and making something good with them. Some of my best ideas actually come to me while I’m in the kitchen, just taking things slow and enjoying the process.

Talk Track: Data Engineering in an LLM era

Talk Technical Level: 2/7

Talk Abstract:
If you’re managing more than 50 tables and a handful of data models, you’ve probably felt the pain. Schema changes break production. Impact analysis takes hours. New engineers spend weeks figuring out what data exists and how it connects.
In this session, we’ll show you how to build an AI assistant that understands your data platform. Not just another chatbot, but a system that can analyze your schemas, parse dependencies, and predict exactly which models will break when you change a column.
We’ll demonstrate a working implementation that extracts metadata from Apache Iceberg tables, analyzes SQL dependencies, and creates an AI assistant that answers questions like: Which tables are burning through our storage budget? What’s the blast radius if this critical system goes down? Where is all our customer PII hiding across 500 tables? Which data pipelines haven’t been touched in months and might be zombie processes? Which tables in the data lakehouse can benefit from Iceberg compaction? This is analysis that would otherwise take days of manual detective work and complex queries. The result is a powerful, natural language interface for data discovery.
Attendees will see live examples of querying table schemas and identifying datasets using simple English prompts, leaving with a practical blueprint for leveraging LLMs to unlock the full potential of their data infrastructure in production settings.
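
As a minimal illustration of the metadata-extraction step, the sketch below reads schema and snapshot information from an Iceberg table with pyiceberg. The catalog settings and table name are placeholders, and the talk’s actual implementation may differ.

```python
from pyiceberg.catalog import load_catalog

# Catalog name, type, URI, and table identifier are placeholders for your setup.
catalog = load_catalog("default", **{"type": "rest", "uri": "http://localhost:8181"})
table = catalog.load_table("analytics.orders")

# Schema and snapshot metadata become the raw material for an LLM-friendly summary.
fields = [(f.name, str(f.field_type)) for f in table.schema().fields]
snapshot = table.current_snapshot()
summary = {
    "table": "analytics.orders",
    "columns": fields,
    "last_updated_ms": snapshot.timestamp_ms if snapshot else None,
}
print(summary)  # feed summaries like this into embeddings / the assistant's context
```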

What You’ll Learn:
1. The metadata problem is getting worse, not better: as organizations store large amounts of data across complex systems, it becomes harder to derive non-trivial insights from that data.
2. LLMs can actually understand your data architecture.
3. Small and simple changes in how you structure your tables can be extremely beneficial for your organization.
4. This approach scales – manual approaches don’t. At 10 tables, spreadsheets or manual queries work fine, but at the scale organizations operate today, only an LLM-powered approach can keep up with the complexity.
5. This approach can be integrated into existing systems today. We’ll show you how to extract metadata from real Apache Iceberg tables, analyze dependencies, create embeddings, and build systems that work with your current data stack.
6. Metadata contains way more business value than we realize. The schemas, dependencies and usage patterns tell stories about performance bottlenecks, governance gaps, and business impact that most of us are completely missing.

Talk: What’s Next in the Agent Stack

Presenter:
Shelby Heinecke, Senior AI Research Manager, Salesforce

About the Presenter:
Dr. Shelby Heinecke is a pioneering leader in AI, renowned for her transformative research, engineering excellence, and dynamic thought leadership. With over 35 influential AI research publications, she has made significant contributions to the field, driving innovation and shaping the future of AI.

Shelby is currently a Senior AI Research Manager at Salesforce, leading a team innovating in AI Agents (including multi-agent systems and large action models), On-Device AI, and Small Language Models, all aimed at revolutionizing Salesforce products. Her passion for fostering technical talent and cultivating collaborative environments empowers her team to achieve breakthrough advancements. Her team’s released contributions in agentic AI span large action models (xLAM models), AgentLite, multi-modal action model, TACO, and many research papers spanning agentic data generation and agent training.
Shelby holds a Ph.D. in Mathematics from the University of Illinois at Chicago, with a specialization in machine learning theory. She also earned an M.S. in Mathematics from Northwestern University and a B.S. in Mathematics from the Massachusetts Institute of Technology (MIT). To learn more about Shelby’s work and vision, visit www.shelbyh.ai.

Talk Track: Agents in Production

Technical Level: 2

Talk Abstract:
What does it take to go from promising prototype to production-ready AI agent?

In this talk, I’ll break down the emerging agent stack, including robust prompt generation with Promptomatix, protocol-level evaluation with MCPEval, multimodal reasoning with TACO, fast function-calling with xLAM, and more. Each layer targets a critical bottleneck in reliability, reasoning, or scale.

You’ll get a behind-the-scenes look at the research shaping these tools, and a blueprint for the next generation of enterprise-ready agents.

What You’ll Learn:
– Evals, latency, and prompt optimization are crucial to high performing agents
– Sharing links to open source repos/models to get started in these directions

Talk: Building Effective Agents

Presenter:
Sushant Mehta, Senior Research Engineer, Google DeepMind

About the Presenter:
Sushant is a senior research engineer at Google DeepMind, working on post-training to improve Coding capabilities in frontier Large Language Models.

Talk Track: Evolution of Agents

Technical Level: 3

Talk Abstract:
Large language models can now power capable software agents, yet real‑world success comes from disciplined engineering rather than flashy frameworks. Reliable agents are built from simple, composable patterns instead of heavy abstractions.

The talk will introduce several patterns that add complexity / autonomy only when it pays off:

1. Augmented LLM (retrieval, tools, memory) as the atomic building block
2. Workflow motifs: prompt chaining, routing, parallelization, etc., with concrete criteria and implementation tips (a minimal chaining sketch follows this list)
3. Autonomous agents that loop through plan‑act‑observe‑reflect cycles to tackle open‑ended tasks
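
As a minimal illustration of the first workflow motif, the sketch below chains prompts so each step’s output feeds the next; `call_llm` is a hypothetical stand-in for whatever model client you use.

```python
from typing import Callable

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for your model client (OpenAI, Anthropic, local, ...).
    return f"[model output for: {prompt[:40]}...]"

def chain(steps: list[str], user_input: str, llm: Callable[[str], str] = call_llm) -> str:
    """Prompt chaining: each step's output becomes the next step's input."""
    result = user_input
    for template in steps:
        result = llm(template.format(input=result))
    return result

steps = [
    "Extract the key requirements from this request:\n{input}",
    "Draft an implementation plan for these requirements:\n{input}",
    "Review the plan and list risks or missing guardrails:\n{input}",
]
print(chain(steps, "Build an agent that files expense reports from receipts."))
```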

What You’ll Learn:
Attendees will leave with a practical decision framework for escalating from a single prompt to multi‑step agents, keeping in mind robust guardrails for shipping trustworthy, cost‑effective agents at scale.

Talk: From Hello to Repayment: Voice AI in African Finance

Presenter:
Remy Muhire, CEO, Pindo.ai

About the Speaker:
Remy Muhire is the Co-Founder and CEO of Pindo, a Voice AI startup helping banks and fintechs deliver services in local African languages. Previously, he led voice technology initiatives at Mozilla and co-founded the fintech startup Exuus. Passionate about digital inclusion, Remy is dedicated to breaking barriers of literacy and language so that underserved communities can access essential financial services.

Talk Track: Agents in Production

Talk Technical Level: 1/7

Talk Abstract:
In Africa, literacy and language barriers still limit access to financial services. This session will explore how Voice AI in local languages can transform loan applications and debt recovery—making credit more accessible while improving repayment rates. Drawing from early development work and upcoming pilots in East Africa, we’ll share insights on how banks, fintechs, and SACCOs can leverage conversational AI to engage customers more effectively, from the first “hello” to the final repayment.

What You’ll Learn:
Break barriers: How Voice AI bridges literacy and language gaps in African finance.

Reimagine credit journeys: From loan applications to debt recovery through conversational voice flows.

Unlock inclusion at scale: Early pilots in East Africa show the path to higher engagement and repayment.

Talk: Adversarial Threats Across the ML Lifecycle: A Red Team Perspective

Presenter:
Sanket Badhe, Senior Machine Learning Engineer, TikTok

About the Presenter:
Sanket Badhe is a seasoned Machine Learning Engineer with over 8 years of experience specializing in fraud and spam detection, offensive AI, large-scale ML systems, and LLM applications. He currently leads key ML initiatives at TikTok, driving the development of robust spam detection systems across the platform. Sanket holds a Master’s in Data Science from Rutgers University and a B.Tech from IIT Roorkee, with prior experience at Oracle, Red Hat, and Fuzzy Logix.

Talk Track: ML Lifecycle Security

Technical Level: 2

Talk Abstract:
As machine learning systems become deeply embedded in critical applications, ranging from finance and healthcare to content moderation and national security, their attack surface expands across the entire ML lifecycle. This talk presents a red team perspective on adversarial threats targeting each phase of the ML pipeline: from data poisoning during collection and labeling, to model theft and evasion in deployment, and manipulation of feedback loops post-launch. We explore real-world case studies and cutting-edge research, demonstrating how adversaries exploit blind spots in ML development and MLOps workflows. Attendees will gain a structured threat model, understand key attack vectors, and learn practical red teaming and hardening strategies to proactively secure ML systems.

What You’ll Learn:
1. ML systems are vulnerable at every stage of the lifecycle (data, training, deployment, feedback).
2. Adversarial threats vary by stage: data poisoning, model evasion, model extraction, prompt injection, and feedback manipulation.
3. Red teaming ML requires specialized tools and methods distinct from traditional security testing.
4. Data and feedback loops are high-risk, often-overlooked entry points for attackers.
5. Security must be proactive and continuous, not an afterthought post-deployment.
6. Monitoring, validation, and isolation mechanisms are essential across the pipeline.
7. Cross-functional collaboration between ML, security, and DevOps teams is critical.

Talk: Story is All You Need

Presenter:
Lin Liu, Director, Data Science, Wealthsimple

About the Presenter:
As Director of Data Science at Wealthsimple, Lin Liu architects AI/ML solutions that power the future of finance. His experience includes leading AI/ML consulting engagements for AWS clients at Amazon and creating flagship fraud and credit models for Capital One Canada. A patented inventor in credit scoring, Lin specializes in building scalable AI/ML solutions that bridge the gap between data science and tangible business value.

Talk Track: Scoping ML Projects in an AI Era

Technical Level: 2

Talk Abstract:
In the age of Generative AI, what if the most complex feature engineering could be replaced by simple storytelling? This talk introduces a novel paradigm for predictive analytics that challenges traditional modeling workflows. We demonstrate a powerful technique: translating raw, structured data—like transaction logs or application usage data—into coherent, text-based narratives, or “stories.”

We then feed these stories directly into Large Language Models (LLMs) and prompt them for a predictive score. This approach leverages the deep contextual understanding of LLMs to perform tasks that typically require bespoke models and intricate feature engineering.

We will explore real-world case studies, demonstrating how “stories” crafted from credit card transactions can accurately predict major life events. Similarly, we’ll show how narratives of a user’s app behavior can enable an LLM to detect subtle anomalies indicative of fraud, outperforming brittle, rule-based systems.

Join us to discover how transforming your data into stories can unlock a new frontier of predictive power and operational efficiency.
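
As a minimal illustration of the data-as-story idea, the sketch below renders a few toy transaction rows as a narrative and wraps them in a scoring prompt; the columns, wording, and prediction target are our own assumptions, not the case studies from the talk.

```python
import pandas as pd

transactions = pd.DataFrame({
    "date": ["2024-03-01", "2024-03-02", "2024-03-03"],
    "merchant": ["Home Depot", "IKEA", "Crate & Barrel"],
    "amount": [412.18, 889.40, 265.99],
})

def to_story(df: pd.DataFrame) -> str:
    """Render structured rows as a short narrative the LLM can reason over."""
    lines = [
        f"On {r.date}, the customer spent ${r.amount:.2f} at {r.merchant}."
        for r in df.itertuples()
    ]
    return " ".join(lines)

prompt = (
    "Here is a customer's recent activity: "
    + to_story(transactions)
    + " On a scale of 0 to 1, how likely is this customer to have recently moved home? "
      "Answer with a single number."
)
print(prompt)  # send `prompt` to your LLM of choice and parse the numeric score
```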

What You’ll Learn:
Attendees will leave with a practical framework for applying this “data-as-story” technique, understanding how it can radically simplify the MLOps pipeline and unlock the power of LLMs on classic predictive analytics problems.

Talk: The Efficiency Equation: Leveraging AI Agents to Augment Human Labelers in Building Trust and Safety Systems at Scale

Presenter:
Madhu Ramanathan, Principal Group Engineering Manager, Trust, Safety and Intelligence, Microsoft

About the Presenter:
Madhu Ramanathan is a seasoned engineering and applied science leader with over 13 years of experience building AI-powered systems at Microsoft, Meta and Amazon. She has led globally distributed teams in trust, safety, content intelligence, and search, delivering responsible AI solutions that impact millions of users worldwide. Passionate about trust, safety, and ethical innovation, she brings a practitioner’s lens to productionizing cutting-edge, trustworthy AI solutions to solve real-world problems at scale.

Talk Track: Augmenting Workforces with Agents

Technical Level: 2

Talk Abstract:
In today’s digital ecosystem, Trust & Safety systems face mounting challenges—from content proliferation, real-time enforcement demands, and cost-savings pressure—complicated further by evolving threats like deepfakes, AI-generated hallucination, misinformation, and adversarial behavior. Because this is a defensive space with ever-evolving threats, human labeling has been crucial in this domain for measurement, data collection to train models, real-time enforcement, reactive takedowns, and appeals – but it comes with costs in the millions for scaled applications. This keynote explores how LLMs and AI agents are reshaping this landscape, augmenting human labelers and offering scalable, cost-efficient, and high-quality solutions that optimize across defect rates, precision, and operational cost at unprecedented speeds.

In particular, the talk will cover the following –
– A brief introduction to Trust and Safety systems, metrics and emerging threats such as deepfakes, hallucination, misinformation in the evolving GenAI landscape
– The role of human labelers in the traditional Trust and Safety lifecycle across measurement, data collection, proactive enforcements and reactive takedowns, the challenges in having large scale human labeler dependencies and the costs involved
– Case study of how LLMs/Agents are used in each stage such as measurement, enforcement and reactive takedowns with real world examples and the impact on Defect rate, Precision, Cost at each stage.
– Deep dive on continuous evaluation and calibration techniques of the LLM/Agentic judges used in the measurement flow and enforcement flow using humans-in-the-loop and Auto tuners for prompt tuning.
– Challenges faced and solutions, such as: a) pitfalls from using the same agent for measurement and enforcement, and the solution of using agentic + HI flows for measurement; b) handling constant model migrations in the product; c) cost and GPU constraints in deploying LLMs at scale, and the evolution into distilled SLM models using LLM-based teacher models.
– Finally, a summary of learnings on how the AI evolution of the last couple of years has brought new challenges to this space but has also provided the ability to solve those problems by smartly combining HI and AI.

What You’ll Learn:
Trust & Safety is entering a new era – where the rapid AI evolution has brought in mounting challenges such as deepfakes, hallucination, misinformation along with budget cuts and hyper agility demands. Fortunately, the AI evolution has also enabled powerful solutions to those problems —one where human judgment and AI intelligence must co-evolve. This talk will give the attendees a deep dive on real world hybrid systems built at scale that are not only scalable and cost-effective but also resilient, ethical, and continuously improving to defend against the evolving challenges.

Talk: A Practical Field Guide to Optimizing the Cost, Speed, and Accuracy of LLMs for Domain-Specific Agents

Presenter:
Niels Bantilan, Chief ML Engineer, Union.ai

About the Presenter:
Niels is the Chief Machine Learning Engineer at Union, a core maintainer of Flyte, an open source workflow orchestration tool, and creator of Pandera, a data validation and testing tool for dataframes. His mission is to help data science and machine learning practitioners be more productive. He has a Masters in Public Health Informatics, and prior to that a background in developmental biology and immunology. His research interests include reinforcement learning, NLP, ML in creative applications, and fairness, accountability, and transparency in automated systems.

Talk Track: Agents in Production

Technical Level: 3

Talk Abstract:
As the dust settles from the initial boom of applications using hosted large language model (LLM) APIs, engineering teams are discovering that while LLMs get you to a working demo quickly, they often struggle in production with latency spikes, context limitations, and explosive compute costs. This session provides a practical roadmap for navigating not only the experiment-to-production gap using small language models (SLMs), but also the AI-native orchestration strategies that will get you the most bang for your buck.
We’ll explore how SLMs (models that range from hundreds of millions to a few billion parameters) offer a compelling alternative for domain-specific applications by trading off the generalization power of LLMs for significant gains in speed, cost-efficiency, and task-specific accuracy. Using the example of an agent that translates natural language into SQL database queries, this session will demonstrate when and how to deploy SLMs in production systems, how to progressively swap out LLMs for SLMs while maintaining quality, and which orchestration strategies help you customize and maintain SLMs in a cost-effective way.

Key topics include:
– Identifying key leverage points: Which LLM calls should you swap out for SLMs first? We’ll cover how to identify speed, cost, and accuracy leverage points in your AI system so that you can speed up inference, reduce cost, and maintain accuracy.
– Speed Optimization: It’s not just about the speed of inference, which SLMs already excel at, it’s also about accelerating experimentation when you fine-tune and retrain SLMs on a specific domain/task. We’ll cover parallelized optimization runs, intelligent caching strategies, and task fanout techniques for both prompt and hyperparameter optimization.
– Cost Management: Avoiding common pitfalls that negate SLMs’ cost advantages, including resource mismatching (GPU vs CPU workloads), infrastructure provisioning inefficiencies, and idle compute waste. Attendees will learn resource-aware orchestration patterns that scale to zero and recover gracefully from failures.
– Accuracy Enhancement: Maximizing domain-specific performance by implementing the equivalent of “AI unit tests” and incorporating them into your experimentation and deployment pipelines. We’ll cover how this can be done with synthetic datasets, LLM judges, and deterministic evaluation functions that help you catch regressions early and often (a minimal example of such a deterministic check follows this list).
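
As a minimal illustration of a deterministic evaluation function for the NL-to-SQL agent, the sketch below treats a generated query as passing when it returns the same rows as a hand-written reference query on a toy SQLite database; the schema and queries are assumptions for illustration.

```python
import sqlite3

def exec_rows(db: sqlite3.Connection, sql: str):
    """Execute a query and return its result set as a sorted list of tuples."""
    return sorted(db.execute(sql).fetchall())

def sql_matches(db: sqlite3.Connection, candidate_sql: str, reference_sql: str) -> bool:
    """Deterministic 'AI unit test': the generated query passes if it returns
    the same rows as the hand-written reference query."""
    try:
        return exec_rows(db, candidate_sql) == exec_rows(db, reference_sql)
    except sqlite3.Error:
        return False

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER, region TEXT, total REAL)")
db.executemany("INSERT INTO orders VALUES (?, ?, ?)",
               [(1, "EU", 10.0), (2, "US", 25.0), (3, "EU", 7.5)])

print(sql_matches(db,
                  "SELECT region, SUM(total) FROM orders GROUP BY region",
                  "SELECT region, SUM(total) AS t FROM orders GROUP BY region"))  # True
```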

What You’ll Learn:
Attendees will leave with actionable strategies for cost-effective AI deployment, a decision framework for SLM adoption, and orchestration patterns that compound the value of smaller models in domain-specific applications.

Talk: Your Infrastructure Just Got Smarter: AI Agents in the DevOps Loop

Presenter:
Kishan Rao, Engineering Manager, Delivery and Automation Platform, Okta

About the Speaker:
I’m an Engineering Manager with a background in backend systems, platform engineering, and infrastructure automation, currently focused on how AI agents can reshape developer workflows. With over 8 years of experience building CI/CD pipelines, internal platforms, and scalable infrastructure at cloud-first companies, I’ve seen firsthand how operational complexity can slow down engineering teams.

My recent work explores the intersection of DevOps and AI agents—designing tools that intelligently interpret infrastructure-as-code, reduce toil, and guide developers through their environments with contextual awareness. I’m passionate about building agentic systems that augment developer cognition, shorten feedback loops, and turn codebases into living documentation.

I care deeply about developer velocity, system reliability, and creating engineering environments where teams can move quickly without sacrificing quality. Based in San Francisco, I’m excited to contribute to the conversation on how autonomous AI tooling is changing the way we build, ship, and maintain software.

Talk Track: AI Agents for Developer Productivity

Talk Technical Level: 2/7

Talk Abstract:
Modern infrastructure is rich, dynamic, and deeply complex. Yet most engineering teams still rely on manual processes and tribal knowledge to navigate it. In this talk, I explore how AI agents are transforming DevOps by becoming part of the loop. They’re not just automating tasks but interpreting infrastructure-as-code, understanding system context, and guiding developers through their environments.

We’ll examine how AI-native workflows are emerging across the DevOps lifecycle, from documentation generation and config reasoning to incident triage and deployment planning. I’ll share implementation patterns from my experience in platform engineering and backend systems, including how to design agents that interact with code, tools, and people with minimal friction.

You’ll leave with a practical understanding of how to embed AI agents into your stack, the trade-offs of using local versus cloud LLMs, and how this shift can change the speed, clarity, and confidence with which your teams ship code.

What You’ll Learn:
AI at scale in production does not need to be hard, but you do need to think deeply about the outcome you are trying to achieve with AI in the loop.

Talk: I Tried Everything: A Pragmatist's Guide to Building Knowledge Graphs from Unstructured Data

Presenter:
Alessandro Pireno, Founder, Stealth Company

About the Speaker:
Alessandro Pireno is an AI and Data Product leader with a 15-year track record of scaling innovative data infrastructure companies. His career is distinguished by a unique 360-degree perspective gained from leading Engineering, Product, and Sales Engineering teams at hyper-growth startups like Snowflake and SurrealDB. He played a pivotal role in building the technical GTM engine that established Snowflake’s early enterprise dominance and more recently architected the product and GTM strategy for SurrealDB’s AI and vector capabilities. His open-source work includes proofs-of-concept for Retrieval-Augmented Generation with Knowledge Graphs (surrealdb-rag) and techniques for graph extraction (graph-examples). Currently, he is building a new stealth project to automate knowledge graph generation using an agentic framework that leverages diverse techniques from NLP to in-database search.

Talk Track: Data Engineering in an LLM era

Talk Technical Level: 3/7

Talk Abstract:
Traditional ETL pipelines are breaking under the demands of LLMs. They excel at structured data, but fail when confronted with the unstructured documents and implicit relationships that give AI its context. To solve this, we must evolve from ETL to “KG-ETL”—pipelines that build knowledge graphs as a first-class output. This session is a pragmatic guide to three competing pipeline architectures for building KGs from raw data. We’ll explore using LLM prompts as a new ‘T’ in your pipeline, contrast it with traditional NLP pipelines, and deep-dive into a novel hybrid retrieval workflow that uses vector stores for something beyond semantic search: high-precision entity resolution. You’ll leave with a framework for choosing the right pipeline for your data, moving beyond simple RAG to build truly context-rich AI systems.
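
As an illustration of the hybrid approach (vector embeddings used for entity resolution rather than semantic search), the sketch below trains a small fastText model on cleansed names and compares name variants by cosine similarity. The cleaning rules, sample names, and threshold are hypothetical and not taken from the talk.

```python
# Sketch: fastText embeddings of cleansed entity names for high-precision entity resolution.
# Assumes the `fasttext` package; cleaning rules, sample names, and threshold are illustrative.
import re
import tempfile
import fasttext
import numpy as np

def cleanse(name: str) -> str:
    # Lowercase, strip punctuation, and drop common legal suffixes (toy rules).
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return re.sub(r"\b(inc|llc|ltd|corp)\b", " ", name).strip()

raw_names = ["Acme, Inc.", "ACME Incorporated", "Globex Corp", "Globex Corporation"]
cleaned = [cleanse(n) for n in raw_names]

# Train a small character-n-gram skipgram model on the cleansed names.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("\n".join(cleaned))
    corpus_path = f.name
model = fasttext.train_unsupervised(corpus_path, model="skipgram",
                                    dim=50, minn=2, maxn=4, minCount=1)

def embed(name: str) -> np.ndarray:
    vec = model.get_sentence_vector(cleanse(name))
    return vec / (np.linalg.norm(vec) + 1e-9)

# In the full workflow these vectors go into a vector store and candidates are retrieved
# by nearest-neighbour search; here we simply compare one pair directly.
similarity = float(np.dot(embed("Acme Inc"), embed("ACME Incorporated")))
print(f"similarity: {similarity:.2f}")  # pairs above a tuned threshold resolve to one entity
```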

What You’ll Learn:
Design and contrast three distinct data pipeline architectures for knowledge graph construction: LLM-prompt-based, traditional NLP-based, and a hybrid vector search-based model.

Evaluate the cost, latency, scalability, and observability trade-offs of each pipeline pattern, helping you select the right approach for your MLOps environment.

Learn a novel, operational technique for using vector stores beyond semantic search—by training a custom fasttext model on cleansed names to create embeddings for high-precision, scalable entity resolution.

Receive a decision framework for selecting the right KG-ETL pipeline based on your source data’s structure (unstructured, semi-structured, or structured) and your project’s specific requirements.

Talk: Productizing Generative AI at Google Scale: Lessons on Scoping and Delivering AI-Powered Editors

Presenter:
Kelvin Ma, Staff Software Engineer, Google Photos

About the Presenter:
Kelvin Ma is a Staff Software Engineer and a Technical Lead for the Creative Expressions team at Google Photos. As a founding engineer of the team responsible for all machine learning and editing features, he helps build and scale the tools that allow hundreds of millions of users to relive their most important memories. He is passionate about designing and building foundational infrastructure for on-device machine learning, solving complex technical challenges to create simple and intuitive products that operate at the intersection of technology and human connection.

Talk Track: Scoping and Delivering Complex AI Projects

Technical Level: 2

Talk Abstract:
Go behind the scenes of Google Photos’ Magic Editor, a premier example of productizing cutting-edge generative AI into a billion-user application. This talk will demystify the scoping and delivery of complex AI-powered editors, detailing the engineering feats required to integrate multimodal AI models that blend on-device and server-side processing for global scale. Attendees will gain actionable insights and hard-won lessons on navigating the practical challenges of AI product development, from initial concept to successful deployment and scaling across hundreds of millions of users and diverse device types.

What You’ll Learn:
Shipping AI-powered software products requires a deeper understanding of engineering, product, UX, and other concerns across multiple disciplines and roles. There is no longer one correct answer, but rather a series of trade-offs and balances to rein in the capabilities of LLMs and provide value to users.

Talk: Shipping AI That Works

Presenter:
Nicholas Luzio, AI Solutions Lead, Arize AI

About the Speaker:
Nick Luzio is an AI Solutions Lead at Arize AI.

Talk Track: LLM Observability

Talk Abstract:
Observability and evaluation are critical to knowing whether an agent is working—and why. In this talk, we’ll examine the challenges of building agents that operate reliably in practice. We’ll explore approaches for evaluating and refining agents during development, as well as monitoring and debugging them once deployed—sharing practical lessons and tools that help teams accelerate their work while maintaining trust in their systems.

What You’ll Learn:
How to ensure agents work reliably at scale with observability and evaluation, and how Arize can help.

Talk: Beyond the Vibe: Eval Driven Development

Presenter:
Robert Shelton, Applied AI Engineer, Redis

About the Speaker:
Robert is a builder with a background in data science and full stack engineering. As an Applied AI Engineer at Redis, he focuses on bridging the gap between AI research and real-world applications. In open source, he helps maintain the Redis Vector Library and contributes to integrations with LangChain, LlamaIndex, and LangGraph. He has delivered workshops and consulting engagements for multiple Fortune 50 companies and has spoken at conferences including PyData and CodeMash.

Talk Track: Scoping and Delivering Complex AI Projects

Talk Abstract:
AI systems are probabilistic, which makes “what’s better?” a deceptively hard question. Teams often chase silver bullets—new models, chunking tricks, retrieval hacks—without knowing what’s really moving the needle. The result: endless guessing, little confidence. Enter eval-driven development: a way to ground experimentation in metrics, define success up front, and turn every guess into a measurable signal. This talk shows how shifting from vibes to evals transforms the way we build with AI.
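
A minimal sketch of what “eval-driven” can mean in practice: a fixed eval set and scoring function defined up front, so every prompt, model, or retrieval change is compared on the same number. The cases and the keyword-based metric below are illustrative placeholders, not the speaker’s framework.

```python
# Sketch of eval-driven development: the metric and eval set are fixed up front, so every
# prompt, model, or retrieval change is scored against the same cases. All names are illustrative.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class EvalCase:
    question: str
    expected_keywords: List[str]  # a deliberately simple definition of "success"

EVAL_SET = [
    EvalCase("What is our refund window?", ["30 days"]),
    EvalCase("Which plan includes SSO?", ["enterprise"]),
]

def keyword_score(answer: str, case: EvalCase) -> float:
    hits = sum(kw.lower() in answer.lower() for kw in case.expected_keywords)
    return hits / len(case.expected_keywords)

def run_eval(generate: Callable[[str], str]) -> float:
    """Mean score over the eval set; compare this number across experiments."""
    return sum(keyword_score(generate(c.question), c) for c in EVAL_SET) / len(EVAL_SET)

if __name__ == "__main__":
    baseline = lambda q: "Refunds are accepted within 30 days."
    print(f"baseline score: {run_eval(baseline):.2f}")  # a candidate change must beat this number
```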

What You’ll Learn:
How to think about probabilistic system design and evaluation.

Talk: SLMs + Fine-Tuning: Building the Infrastructure for Multi-Agent Systems

Presenter:
Mariam Jabara, Senior Field Engineer, Arcee AI

About the Presenter:
Mariam has been in the AI space for the last 5 years, in both academic and professional capacities. Currently, she works as a Senior Field Engineer for Arcee AI, the pioneers of Small Language Models (SLMs) who are now offering SLM-powered agentic AI solutions. She has previous experience in AI engineering, sales, and research at companies such as Google Research and Deloitte. Her values are rooted in building community, advancing diversity and inclusion, and using AI responsibly to solve problems and contribute to the betterment of society.

Talk Track: Agents in Production

Talk Abstract:
Enterprises are discovering the limits of massive general-purpose LLMs: high costs, heavy infrastructure, and security risks when sensitive data leaves controlled environments. Small Language Models (SLMs) offer a practical alternative.

In this talk, I’ll share lessons from building and fine-tuning SLMs, including our release of a new small foundation model. I’ll show how SLMs enable domain-specific performance, stronger security through local deployment, and why they often outperform larger models in multi-agent workflows with lower latency and higher reliability.

Attendees will leave with a clear view of why SLMs are the best candidates to power multi-agent systems, balancing performance, cost, and trustworthiness for real-world MLOps.

What You’ll Learn:
Small Language Models are the best way to power multi-agent systems because they deliver domain-specific performance, stronger security, and greater efficiency than large general-purpose LLMs.

Talk: The Real Problem Building Agentic Applications (And How MLOps Solves It)

Presenter:
Alexej Penner, Founding Engineer, ZenML

About the Speaker:
As a founding engineer at ZenML, Alexej is at the forefront of solving today’s MLOps challenges. His journey began in the trenches of ML, building everything from object detection models for edge devices to complex forecasting systems. After leading AI product development at the data labeling company Datagym, he saw the critical need for better MLOps tooling and joined ZenML. There, he now drives core product development, guides its direction, and works hands-on with users to bring their ML projects to life.

Talk Abstract:
For years, we’ve honed the MLOps playbook to turn fragile ML models into reliable production systems. We learned that success depends on principles like modularity, reproducibility, and lineage. Now, with the rise of LLMs, we’re facing a new wave of brilliant but chaotic prototypes. The core question is: do we throw away our playbook, or do we evolve it?

This session argues for evolution. We’ll demonstrate how the hard-won principles of MLOps provide the perfect foundation for the emerging world of LLM Ops. We’ll take a simple LLM prototype and, in a live demo, transform it into a structured ZenML pipeline. Then, we’ll showcase the next step in the MLOps journey: serving the entire pipeline as a live, interactive API endpoint. We will explore this endpoint directly from the ZenML dashboard, showing how to inspect it, run sample invocations, and get the full traceability of a classic MLOps pipeline for every single interactive call. This is the roadmap for what your AI platform could be.
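
For a sense of what that transformation can look like, here is a minimal sketch using ZenML’s step and pipeline decorators; the step bodies are placeholders, not the demo’s actual code.

```python
# Minimal sketch of wrapping an LLM prototype as a ZenML pipeline using the @step/@pipeline
# decorators; the step bodies are placeholders, not the live demo's code.
from zenml import pipeline, step

@step
def load_documents() -> list:
    return ["doc one", "doc two"]  # stand-in for the real data source

@step
def build_prompt(docs: list, question: str) -> str:
    context = "\n".join(docs)
    return f"Answer using the context below.\n{context}\n\nQuestion: {question}"

@step
def call_llm(prompt: str) -> str:
    # Placeholder for the actual model call (hosted API or local model).
    return f"(model answer for a prompt of {len(prompt)} characters)"

@pipeline
def rag_prototype(question: str = "What is in the docs?"):
    docs = load_documents()
    prompt = build_prompt(docs, question)
    call_llm(prompt)

if __name__ == "__main__":
    # Each run is versioned and visible in the ZenML dashboard, which is also where a
    # served pipeline endpoint can be inspected and invoked.
    rag_prototype()
```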

Talk: Agentic Metaflow in Action

Presenter:
Ville Tuulos, Co-Founder, CEO, Outerbounds

About the Speaker:
Ville Tuulos is the co-founder and CEO of Outerbounds, a platform that empowers enterprises to build production-ready, standout AI systems. He has been building infrastructure for machine learning and AI for over two decades. Ville began his career as an AI researcher in academia, authored Effective Data Science Infrastructure, and has held leadership roles at several companies—including Netflix, where he led the team that created Metaflow, a widely adopted open-source framework for end-to-end ML and AI systems.

Talk Abstract:
We will show a live demo of the new agentic features of Metaflow!

Talk: A Simple Recipe for LLM Observability

Presenter:
Claire Longo, Lead AI Researcher, Comet

About the Presenter:
Claire Longo is an AI leader and Mathematician with over a decade of experience in Data Science and AI. She has led cross-functional AI teams at Twilio, Opendoor, and Arize AI and is currently a Lead AI Researcher at Comet. She holds a Bachelor’s in Applied Mathematics and a Master’s in Statistics from The University of New Mexico. Beyond her technical work, Claire is a Speaker, Advisor, YouTuber, and Poker Player. She is dedicated to mentoring Engineers and Data Scientists while championing diversity and inclusion in AI. Her mission is to empower the next generation of AI practitioners.

Talk Abstract:
Developing LLM-based applications for production requires a new approach to monitoring. Unlike traditional software, these probabilistic systems can hallucinate, drift, or degrade in unpredictable ways. The best way to learn AI concepts is by tinkering hands-on, so I built an LLM-powered recipe generator to help with my home cooking and set up an end-to-end monitoring strategy to keep it on budget and behaving as expected. In this talk, I’ll walk through how I configured traces in this project using Comet’s open-source tool Opik to track cost and quality. I’ll also show how I built custom business metrics with LLM-as-a-Judge to capture issues specific to the recipe generator, and how the same approach can be adapted to your own use case when out-of-the-box metrics fall short. With a few adaptable code snippets and a simple framework, you’ll leave knowing how to add robust observability to your own LLM apps, making it easier to detect, debug, and improve systems at scale.
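
As a taste of the kind of snippet the talk promises, here is a minimal sketch of tracing a recipe-generation flow with Opik’s @track decorator; the generator and the LLM-as-a-Judge scorer are placeholder stubs, not the project’s real implementation.

```python
# Minimal sketch of tracing with Opik's @track decorator; the recipe generator and the
# LLM-as-a-Judge scorer are placeholder stubs.
from opik import track

@track
def generate_recipe(ingredients: list) -> str:
    # Placeholder for the LLM call that drafts a recipe from the ingredients.
    return f"A simple skillet dish using {', '.join(ingredients)}."

@track
def judge_recipe(recipe: str, ingredients: list) -> float:
    # Placeholder LLM-as-a-Judge: in practice this prompts a model with a rubric
    # (e.g. "does the recipe only use the listed ingredients?") and parses a score.
    return 1.0 if all(item in recipe for item in ingredients) else 0.0

@track
def recipe_pipeline(ingredients: list) -> dict:
    recipe = generate_recipe(ingredients)
    score = judge_recipe(recipe, ingredients)
    # Nested @track calls appear as one trace with child spans in the Opik UI,
    # which is where cost and quality can be reviewed over time.
    return {"recipe": recipe, "judge_score": score}

if __name__ == "__main__":
    recipe_pipeline(["eggs", "spinach", "feta"])
```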

Talk: What gets AI Agents to Production

Presenter:
Chris Matteson, Head of Sales Engineering, Union.ai

About the Speaker:
Chris Matteson is Head of Sales Engineering at Union AI, where he helps customers tackle their toughest machine-learning infrastructure challenges by bringing together a passion for AI and DevOps with deep startup experience.
A seasoned startup leader and technical problem-solver, Chris has spent more than a decade at companies including Puppet, HashiCorp, Prisma, and Fermyon. He’s worn hats from Founder/CEO to Solutions Engineering, Sales, and Consulting, writing early feature code that evolved into core enterprise offerings and architecting scalable processes for open-source–to-enterprise transitions.

Talk Abstract:
A widely cited MIT study recently found that 95% of AI projects fail to move the needle on the P&L. The promise of AI is clear, but how do we bridge the gap between prototypes and production-ready AI Agents in 2025?
This talk introduces a practical framework for aligning business needs with the right mix of technologies and architectures to make agentic projects succeed. We’ll blitz through trade-offs across quality, speed, and cost, and highlight a process for mapping the true limits of possibility when combining today’s most powerful AI tools.

Talk: Techniques to build high quality agents faster with MLflow

Presenter:
Danny Chiao, Engineering Lead, Databricks

About the Speaker:
Danny Chiao is an engineering lead at Databricks, leading efforts around data observability (data quality, data classification) and agent quality. Previously, Danny led efforts at Tecton (+ Feast, an open source feature store) and Google to build ML infrastructure and high scale ML powered features. Danny holds a Bachelor’s Degree in Computer Science from MIT.

Talk Abstract:
One of the top challenges in building an agent is ensuring high quality outputs. Today, this involves labeling and analyzing traces by hand and iterating on the agent code. In this talk, you’ll learn how to use MLflow to accelerate this process and quickly build a high quality agent, leveraging techniques used by leading companies to deploy agents in production.
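
For context, here is a minimal sketch of what capturing agent traces with MLflow Tracing can look like; the agent and its single tool are placeholders, and the talk covers far more of the evaluation workflow than this.

```python
# Sketch of capturing agent traces with MLflow Tracing so runs can be reviewed in the MLflow UI
# rather than labeled entirely by hand; the agent and its tool are placeholders.
import mlflow

@mlflow.trace
def lookup_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in for a real tool call

@mlflow.trace
def run_agent(question: str) -> str:
    # A real agent would decide which tools to call; here one step is hard-coded.
    observation = lookup_weather("Toronto")
    return f"Tool said '{observation}', so the answer to '{question}' is: sunny."

if __name__ == "__main__":
    run_agent("What's the weather like in Toronto?")
    # Each call produces a trace (agent span plus tool span) that MLflow stores and displays,
    # which becomes the raw material for systematic labeling and evaluation.
```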

Talk: AI Catalog by JFrog - Control Access to Open Source LLMs

Presenter:
Hudson Buzby, Solutions Architect, JFrog

About the Speaker:
Hudson Buzby is a solution engineer with a strong focus on MLOps and LLMOps, leveraging his expertise to help organizations optimize their machine learning operations and large language model deployments. His role involves providing technical solutions and guidance to enhance the efficiency and effectiveness of AI-driven projects.

Talk Abstract:
AI Catalog is a new product by the JFrogML team at JFrog that allows you to create and enforce dynamic rules and policies around the open source models that your developers and data scientists are permitted to access and deploy. AI Catalog provides a platform to discover, govern, and deploy open source models safely at scale while staying compliant with your organization’s legal and governance policies.

Talk: Building Feedback-Driven Agentic Workflows

Presenter:
Nicholas Luzio, AI Solutions Lead, Arize AI

About the Speaker:
Nick Luzio is an AI Solutions Lead at Arize AI.

Talk Abstract:
A live demo of how to trace agent decisions, implement evaluations, and create closed-loop workflows for continuously improving agent performance using real-world data.

Talk: Unified Control Plane for Enterprise GenAI: Powered by Agentic Deployment Platform with Central AI Gateway & MCP Integration

Presenter:
Nikunj Bajaj, CEO, TrueFoundry

About the Speaker:
Nikunj is the co-founder and CEO of TrueFoundry, a platform helping enterprises build, deploy, and ship LLM applications in a fast, scalable, cost-efficient way with the right governance controls within their own cloud. Prior to this role, he served as a Tech Lead for Conversational AI at Meta, where he spearheaded the development of proactive virtual assistants. His team also put Meta’s first deep learning model on-device. Nikunj also led the Machine Learning team at Reflektion, where he built an AI platform to enhance search and recommendations for over 600 million users across numerous eCommerce websites. He holds a bachelor’s degree in Electrical Engineering from IIT Kharagpur and a master’s in Computer Science from UC Berkeley.

As a visionary leader in the enterprise AI space, Nikunj is a sought-after speaker at premier technology conferences and summits, sharing his expertise on production AI deployment and enterprise ML strategies. His speaking portfolio includes keynotes and expert panels at MLOps Community events, where 50 speakers discussed LLMs in production alongside industry leaders from Stripe, Meta, Canva, Databricks, Anthropic, and Cohere; GenAI Summit San Francisco 2024, which attracted over 30,000 attendees and 200+ industry leaders at the historic Palace of Fine Arts; LLM Avalanche technical meetups (part of Data+AI Summit by Databricks), featuring 20 world experts and attracting 1,000 attendees in San Francisco; the Global Big Data Conference, where he was a featured speaker at the Global Artificial Intelligence Conference; and MLOps World, which connects over 15,000 members exploring best practices for ML/AI in production environments.

Talk Abstract:
As generative AI evolves from experimental tools to mission-critical enterprise applications, organizations face unprecedented operational complexity. Modern AI systems now orchestrate multiple models, invoke diverse tools, and span hybrid infrastructures, creating challenges around inconsistent APIs, model outages, unpredictable latency, complex rate limiting, and mounting governance requirements. Without centralized control, enterprises struggle with vendor lock-in, compliance gaps, runaway costs, and fragmented observability across their distributed AI ecosystems.

This session introduces the AI Gateway pattern—a critical architectural component that serves as the central control plane for enterprise AI systems. We’ll explore practical solutions including unified API abstraction, intelligent failover mechanisms, semantic caching, centralized guardrails, and granular cost controls. You’ll learn technical architecture patterns for building high-availability gateways that handle thousands of concurrent requests with sub-millisecond decision-making, plus emerging integration patterns like Model Context Protocol (MCP) for managing entire tool ecosystems.

Whether you’re an architect, platform engineer, or technical leader, you’ll gain actionable insights, architectural blueprints, and a practical framework for implementing scalable AI infrastructure that grows with your organization’s AI maturity.
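
To illustrate the failover piece of the gateway pattern in miniature, here is a hedged sketch: a single internal completion API that walks an ordered provider list with retries and backoff. The provider names and clients are placeholders, not any specific vendor SDK or TrueFoundry’s implementation.

```python
# Illustrative sketch of the gateway failover pattern: one internal completion API, an ordered
# provider list, retries with backoff, and automatic fallback. Providers are placeholders.
import time
from typing import Callable, List, Tuple

Provider = Callable[[str], str]

def primary_provider(prompt: str) -> str:
    raise TimeoutError("simulated outage")  # pretend the preferred model is down

def secondary_provider(prompt: str) -> str:
    return f"fallback completion for: {prompt[:40]}"

PROVIDERS: List[Tuple[str, Provider]] = [
    ("primary", primary_provider),
    ("secondary", secondary_provider),
]

def gateway_complete(prompt: str, retries_per_provider: int = 1) -> str:
    """Single entry point that applications call; routing policy stays centralized."""
    last_error = None
    for name, provider in PROVIDERS:
        for attempt in range(retries_per_provider + 1):
            try:
                return provider(prompt)
            except Exception as err:  # in practice: timeouts, rate limits, 5xx responses
                last_error = err
                time.sleep(0.1 * (attempt + 1))  # simple backoff before retry or failover
    raise RuntimeError(f"all providers failed: {last_error}")

if __name__ == "__main__":
    print(gateway_complete("Summarize this quarter's incident reports."))
```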

Talk: MLOps for Agents: Bringing the Outer Loop to Autonomous AI

Presenter:
Hamza Tahir, Co-Founder, ZenML

About the Speaker:
Hamza Tahir is a software developer turned ML engineer, with a passion for turning ideas into real, data-driven products. An indie hacker at heart, he has built projects like PicHance, Scrilys, BudgetML, and you-tldr. After deploying ML in production for predictive maintenance use-cases in his previous startup, he co-created ZenML, an open-source MLOps framework. Today, ZenML is evolving into the foundation for agentic AI systems—helping teams build, orchestrate, and scale autonomous ML pipelines and AI agents on any infrastructure stack.

Talk Abstract:
Most of today’s excitement around AI agents focuses on prompts, tools, and clever behaviors—the inner loop of development. But just like with machine learning models, real-world adoption demands more than prototypes. Without reproducibility, monitoring, evaluation, and continuous improvement, agents remain demos, not production systems.

In this talk, I’ll argue that we need to bring MLOps principles into agent development. By applying the outer loop—data collection, training, benchmarking, deployment, and feedback—we can move from one-off agents to robust, scalable, and trustworthy AI systems. Drawing on my experience co-creating ZenML, I’ll show how the lessons learned from operationalizing ML pipelines apply directly to this new era of agentic AI, and what infrastructure patterns teams can adopt today to close the gap between experimentation and production.

Talk: Memory and Memory Accessories: Building an Agent from Scratch

Presenter:
Robert Shelton, Applied AI Engineer, Redis

About the Speaker:
Robert is a builder with a background in data science and full stack engineering. As an Applied AI Engineer at Redis, he focuses on bridging the gap between AI research and real-world applications. In open source, he helps maintain the Redis Vector Library and contributes to integrations with LangChain, LlamaIndex, and LangGraph. He has delivered workshops and consulting engagements for multiple Fortune 50 companies and has spoken at conferences including PyData and CodeMash.

Talk Abstract:
AI agents don’t have to be black boxes. In this live demo, we’ll show how to create a production-ready agent fully deployed on AWS from scratch, without bulky frameworks — just FastAPI, OpenAI, Redis, and Docket for async task orchestration. By the end, we’ll have an agent capable of multi-turn conversations that draw from both short- and long-term memory, showing how memory powers real-time reasoning, context retention, retrieval, and custom tool calls such as web search with Tavily.
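
As a rough sketch of the short-term-memory piece described above, the snippet below wires FastAPI, Redis, and the OpenAI client together so each session replays its recent turns to the model. Long-term memory, Docket task orchestration, and tool calls such as Tavily search are omitted, and the model name is illustrative.

```python
# Rough sketch of the short-term-memory piece: a FastAPI endpoint that keeps each session's
# recent turns in a Redis list and replays them to the model on every request.
import json

import redis
from fastapi import FastAPI
from openai import OpenAI
from pydantic import BaseModel

app = FastAPI()
store = redis.Redis(host="localhost", port=6379, decode_responses=True)
llm = OpenAI()  # reads OPENAI_API_KEY from the environment

class ChatRequest(BaseModel):
    session_id: str
    message: str

@app.post("/chat")
def chat(req: ChatRequest) -> dict:
    key = f"chat:{req.session_id}"
    # Short-term memory: the last 10 turns for this session, oldest first.
    history = [json.loads(turn) for turn in store.lrange(key, -10, -1)]
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    messages += history + [{"role": "user", "content": req.message}]

    completion = llm.chat.completions.create(model="gpt-4o-mini", messages=messages)
    answer = completion.choices[0].message.content

    # Persist both turns so the next request in this session sees them.
    store.rpush(key, json.dumps({"role": "user", "content": req.message}))
    store.rpush(key, json.dumps({"role": "assistant", "content": answer}))
    return {"answer": answer}
```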

Talk: Live Demo - World's First Data Agentic AI With Business Logic Intelligence

Presenter:
Aish Agarwal, CEO, Connecty AI

About the Speaker:
Aish Agarwal is the CEO and co-founder of Connecty AI, the world’s first data agentic AI platform with built-in business logic intelligence. He brings 15+ years of executive experience in customer data science, having led two $600M+ SaaS exits and held leadership roles at FL Studio, MAGIX, Rakuten, and Rocket Internet.

Talk Abstract:
Explore how Connecty AI, the world’s first data agentic AI with business logic intelligence, delivers chat-based data analytics powered by deep reasoning and an autonomous semantic graph. See how data and business teams can finally get consistent, reliable answers to their most critical questions in seconds.