All sessions and workshops curated by leading AI/ML practitioners

Agents in Production


Ville Tuulos

Co-Founder, CEO, Outerbounds

Metaflow: The Baseplate for Agentic Systems

Paco Nathan

Principal DevRel Engineer, Senzing

Doxing Dark Money: Entity Resolution to Empower Downstream AI Applications in Anti-Fraud

Kyle Corbitt

Co-Founder & CEO, OpenPipe

How to Train Your Agent: Building Reliable Agents with RL

Aishwarya Naresh Reganti

Founder, LevelUp Labs, Ex-AWS

Why CI/CD Fails for AI, and How CC/CD Fixes It

Kiriti Badam

Member of Technical Staff, OpenAI

Why CI/CD Fails for AI, and How CC/CD Fixes It

Pablo Salvador Lopez

Principal AI Application Development Architect – AI Solution Engineering Global Black Belt Team, Microsoft

From Static IVRs to Agentic Voice AI: Building Real-Time Intelligent Conversations

Hannes Hapke

Principal Machine Learning Engineer, Digits

The Hard Truth About AI Agents: Lessons Learned from Running Agents in Production

Linus Lee

EIR & Advisor, AI, Thrive Capital

Agents as Ordinary Software: Principled Engineering for Scale

Tony Kipkemboi

Head of Developer Relations, CrewAI

Building Conversational AI Agents with Thread-Level Eval Metrics

Claire Longo

Lead AI Researcher, Comet

Building Conversational AI Agents with Thread-Level Eval Metrics

Ravi Chandu Ummadisetti

Generative AI Architect, Toyota

Agentic AI in Manufacturing

Stephen Ellis

Generative AI Technical Product Owner, Toyota

Agentic AI in Manufacturing

Vaibhav Gupta

CEO, Boundary

Context Engineering: Practical Techniques for Improving Agent Quality Today

Dr. Hemant Joshi

CTO, FloTorch

How to Build and Evaluate Agentic AI Workflows with FloTorch

Philipp Krenn

Head of Developer Relations, Elastic

Hope is Not a Strategy: Retrieval Patterns for MCP

Rajiv Shah

Chief Evangelist, Contextual AI

From Vectors to Agents: Managing RAG in an Agentic World

Akshay Mittal

Staff Software Engineer, PayPal

Agent Name Service (ANS) in Action – A DNS-like Trust Layer for Secure, Scalable AI-Agent Deployments on Kubernetes

Anish Shah

AI Engineer, Weights & Biases

Building and Evaluating Agents

Irena Grabovitch-Zuyev

Staff Applied Scientist, PagerDuty

Testing AI Agents: A Practical Framework for Reliability and Performance

Himani Rallapalli

Senior Applied Scientist, Microsoft

Judging the Agents: Building Reliable LLM Evaluators with Scalable Metrics and Prompt Optimization

Kumaran Ponnambalam

Principal AI Engineer, Cisco

Agent Drift: Understanding and Managing AI Agent Performance Degradation in Production

Anish Shah

AI Engineer, Weights & Biases

Architecting and Orchestrating AI Agents

Suhas Pai

CTO & Co-Founder, Hudson Labs

Architecting a Deep Research System

Shelby Heinecke

Senior AI Research Manager, Salesforce

What’s Next in the Agent Stack

Jamieson Leibovitch

Senior Software Engineer, Uber

Uber's Multi-Agent SDK

Sushant Mehta

Senior Research Engineer, Google DeepMind

Building Effective Agents

Remy Muhire

CEO, Pindo.ai

From Hello to Repayment: Voice AI in African Finance

Pratik Verma

Founder & CEO, Okahu AI

Build Reliable AI Apps with Observability, Validations and Evaluations

Niels Bantilan

Chief ML Engineer, Union.ai

A Practical Field Guide to Optimizing the Cost, Speed, and Accuracy of LLMs for Domain-Specific Agents

AI Agents for Developer Productivity

Yegor Denisov-Blanch

Researcher, Stanford University

Impact of AI on Developer Productivity

Calvin Smith

Senior Researcher Agent R&D, OpenHands

Code-Guided Agents for Legacy System Modernization

Dr. Greg Loughnane

Co-Founder & CEO, AI Makerspace

Vibe-Coding Your First LLM End-to-End Application

Chris "The Wiz" Alexiuk

Co-Founder & CTO, AI Makerspace

Vibe-Coding Your First LLM End-to-End Application

Kishan Rao

Engineering Manager, Delivery and Automation Platform, Okta

Your Infrastructure Just Got Smarter: AI Agents in the DevOps Loop

AI Agents for Model Validation and Deployments

Eric Reese

Senior Manager, Site Reliability Engineering, BestBuy

Don't Page the Planet: Trust-Weighted Ops Decisions

Augmenting Workforces with Agents

Vaibhav Page

Principal Engineer, BlackRock

Context is King: Scaling Beyond Prompt Engineering at BlackRock

Infant Vasanth

Senior Director of Engineering, BlackRock

Context is King: Scaling Beyond Prompt Engineering at BlackRock

Kshetrajna Raghavan

Principal Machine Learning Engineer, Shopify

Where Experts Can't Scale: Orchestrating AI Agents to Structure the World's Product Knowledge

Ricardo Tejedor Sanz

Senior Taxonomist, Shopify

Where Experts Can't Scale: Orchestrating AI Agents to Structure the World's Product Knowledge

Federico Bianchi

Senior ML Scientist, TogetherAI

From Zero to One: Building AI Agents From The Ground Up

Freddy Boulton

Open Source Software Engineer, Hugging Face

Gradio: The Web Framework for Humans and Machines

Madhu Ramanathan

Senior Engineering Leader, Trust and Safety, Meta

The Efficiency Equation: Leveraging AI Agents to Augment Human Labelers in Building Trust and Safety Systems at Scale

Data Engineering in an LLM era

Bhavana Sajja

Senior Machine Learning Engineer, Expedia Inc

Fake Data, Real Power: Crafting Synthetic Transactions for Bulletproof AI

Vaibhav Misra

Director - Distinguished Engineer, CapitalOne

RAG architecture at CapitalOne

Srishti Bhargava

Software Engineer, Amazon Web Services

The Rise of Self-Aware Data Lakehouses

Alessandro Pireno

Founder, Stealth Company

I Tried Everything: A Pragmatist's Guide to Building Knowledge Graphs from Unstructured Data

Evolution of Agents

Claire Longo

Lead AI Researcher, Comet

How Math-Driven Thinking Builds Smarter Agentic Systems

Governance, Auditability & Model Risk Management

Lanre Ogunkunle

Senior AI Engineer, PLEYVERSE AI

MCML: A Universal Schema for AI Traceability and Lifecycle Governance

Alex Olaniyan

Project Manager, PLEYVERSE AI

MCML: A Universal Schema for AI Traceability and Lifecycle Governance

Latest MLOps Trends

Hudson Buzby

Solutions Architect, JFrog

Securing Models

LLMs on Kubernetes

Romil Bhardwaj

Co-Creator, SkyPilot

Building Multi-Cloud GenAI Platforms Without the Pains

Aleksandr Shirokov

Team Lead MLOps Engineer, Wildberries

LLM Inference: A Comparative Guide to Modern Open-Source Runtimes

LLM Observability

Vaibhavi Gangwar

CEO, Co-Founder, Maxim AI

Observability Panel

ML Collaboration in Large Organizations

Eric Riddoch

Director of ML Platform, Pattern AI

Insights and Epic Fails from 5 Years of Building ML Platforms

ML Lifecycle Security

Sanket Badhe

Senior Machine Learning Engineer, TikTok

Adversarial Threats Across the ML Lifecycle: A Red Team Perspective

ML Training Lifecycle

Zachary Carrico

Senior Machine Learning Engineer, Apella

Smart Fine-Tuning of Video Foundation Models for Fast Deployments

Donny Greenberg

Co-Founder / CEO, Runhouse

Why is ML on Kubernetes Hard? Defining How ML and Software Diverge

Paul Yang

Member of Technical Staff, Runhouse

Why is ML on Kubernetes Hard? Defining How ML and Software Diverge

Micaela Kaplan

ML Evangelist, HumanSignal

From Benchmarks to Reality: Embedding HITL in Your MLOps Stack

Claudia Penaloza

Data Scientist, Continental Tires

Multilingual ML in Action: Building & Deploying Continental R&D’s First Predictive ML Model

Multimodal Systems in Production

Denise Kutnick

Co-Founder & CEO, Variata

Opening Pandora’s Box: Building Effective Multimodal Feedback Loops

James Le

Head of Developer Experience, TwelveLabs

Video Intelligence Is Going Agentic

Dmitry Petrov

Co-Founder & CEO, DataChain

Query Inside the File: AI Engineering for Audio, Video, and Sensor Data

Scoping and Delivering Complex AI Projects

David Baum

UX Researcher & Design Strategist, Amazon

Humans in the Loop: Designing Trustworthy AI Through Embedded Research

Kelvin Ma

Staff Software Engineer, Google Photos

Productizing Generative AI at Google Scale: Lessons on Scoping and Delivering AI-Powered Editors

Scoping ML Projects in an AI Era

Lin Liu

Director, Data Science, Wealthsimple

Story is All You Need

Lightning Talks

Nicholas Luzio

AI Solutions Lead, Arize AI

Shipping AI That Works

Robert Shelton

Applied AI Engineer, Redis

Beyond the Vibe: Eval Driven Development

Josh Goldstien

Solutions Architect, Weaviate

Purpose-Built Data Agents - This is the Way

Micaela Kaplan

ML Evangelist, HumanSignal

Why GenAI Still Needs Humans in the Loop

Mariam Jabara

Senior Field Engineer, Arcee AI

SLMs + Fine-Tuning: Building the Infrastructure for Multi-Agent Systems

Yoni Michael

Co-Founder, typedef

DataFrames for the LLM Era: Turning Inference into a First-Class Transform with Fenic

Devdas Gupta

Senior Manager Software Development and Engineering Lead, Charles Schwab

AI-Powered Development Productivity in Finance

Dippu Kumar Singh

Leader For Emerging Data & Analytics, Fujitsu North America Inc.

Explainable AI at Fujitsu North America Inc.

Naveen Reddy Kasturi

Staff Machine Learning Engineer, Realtor.com

Agent-Powered Code Migration at Realtor.com

Nitin Kumar

Director Data Science, Marriott International

A Modular Framework for Building Agentic Workforces at Marriott International

Ravi Shankar

Manager, Data Science, Dick's Sporting Goods

Streamlining ML collaboration at Dick's Sporting Goods

Prasanth Nandanuru

Managing Director, Wells Fargo

ROI of Gen AI Frontier Models vs Traditional Models in Finance Industry

Balaji Varadarajan

Lead AI Engineer - Digital Personalization, Target Enterprise

Building Sustainable GenAI Systems at Target

Speakers Corner

Alexej Penner

Founding Engineer, ZenML

The Real Problem Building Agentic Applications (And How MLOps Solves It)

Ville Tuulos

CEO, Co-Founder, Outerbounds

Agentic Metaflow in Action

Vincent Koc

Lead AI Researcher & Developer Relations, Comet

A Simple Recipe for LLM Observability

Claire Longo

Lead AI Researcher, Comet

A Simple Recipe for LLM Observability

Chris Matteson

Head of Sales Engineering, Union.ai

What gets AI Agents to Production

Danny Chiao

Engineering Lead, Databricks

Techniques to build high quality agents faster with MLflow

Hudson Buzby

Solutions Architect, JFrog

AI Catalog by JFrog - Control Access to Open Source LLMs

Nicholas Luzio

AI Solutions Lead, Arize AI

Building Feedback-Driven Agentic Workflows

Nikunj Bajaj

CEO, TrueFoundry

Unified Control Plane for Enterprise GenAI: Powered by Agentic Deployment Platform with Central AI Gateway & MCP Integration

The Next Wave of AI

Hamza Tahir

Co-Founder, ZenML

MLOps for Agents: Bringing the Outer Loop to Autonomous AI

Robert Shelton

Applied AI Engineer, Redis

Memory and Memory Accessories: Building an Agent from Scratch

Qingyun Wu

CEO, AG2

Build the Next Generation of Agent Workforce with AG2

Vishakha Gupta-Cledat

Co-Founder / CEO, ApertureData

What Does A Foundational Data Layer For The AI Era Look Like?

Aish Agarwal

CEO, Connecty AI

Live Demo - World's First Data Agentic AI With Business Logic Intelligence

more coming soon

Agenda

This agenda is still subject to change.

Join free virtual sessions October 6–7, then meet us in Austin for in-person case studies, workshops, and expo October 8–9

Talk: Metaflow: The Baseplate for Agentic Systems

Presenter:
Ville Tuulos, Co-Founder, CEO, Outerbounds

About the Speaker:
Ville Tuulos is the co-founder and CEO of Outerbounds, a platform that empowers enterprises to build production-ready, standout AI systems. He has been building infrastructure for machine learning and AI for over two decades. Ville began his career as an AI researcher in academia, authored Effective Data Science Infrastructure, and has held leadership roles at several companies—including Netflix, where he led the team that created Metaflow, a widely adopted open-source framework for end-to-end ML and AI systems.

Talk Track: Agents in Production

Talk Technical Level: 2/7

Talk Abstract:
Agent frameworks like LangChain or OpenAI’s Agent SDK make it easy to prototype agents, but they must be deployed on a production-grade foundation that provides resilience, memory, and a runtime environment, with robust access to services and tools via MCP. The newly released open-source Metaflow 2.18 delivers such a baseplate for agentic systems, building on the battle-tested and versatile infrastructure Metaflow has refined over years. Paired with your favorite agent framework, Metaflow offers a complete stack for agents – and the tools they depend on – ready for serious production use cases.

This talk introduces Metaflow’s new agentic features and demonstrates a practical example you can easily adapt to your own use cases.
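
To make the “baseplate” idea concrete, here is a minimal sketch of wrapping an agent step in a plain Metaflow flow. It uses long-standing Metaflow primitives (FlowSpec, @step, @retry), not the new 2.18 agentic API this talk covers, and the call_agent helper is a hypothetical stand-in for your agent framework of choice.

```python
# Hedged sketch: an agent step inside a classic Metaflow flow. The new
# 2.18 agentic features are NOT shown here; call_agent() is hypothetical.
from metaflow import FlowSpec, step, retry

def call_agent(task: str) -> str:
    # Stand-in for LangChain, the OpenAI Agent SDK, etc.
    return f"[agent output for: {task}]"

class AgentBaseplateFlow(FlowSpec):

    @step
    def start(self):
        self.task = "Summarize yesterday's support tickets"
        self.next(self.run_agent)

    @retry(times=3)  # resilience: rerun the agent step on transient failures
    @step
    def run_agent(self):
        self.result = call_agent(self.task)  # artifacts persist as flow state
        self.next(self.end)

    @step
    def end(self):
        print(self.result)

if __name__ == "__main__":
    AgentBaseplateFlow()
```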

What You’ll Learn:
Understanding the full stack required by production-grade agents, and how one can leverage open-source Metaflow to deliver it

Talk: Doxing Dark Money: Entity Resolution to Empower Downstream AI Applications in Anti-Fraud

Presenter:
Paco Nathan, Principal DevRel Engineer, Senzing

About the Presenter:
Paco Nathan leads DevRel for the Entity Resolved Knowledge Graph practice area at Senzing.com and is a computer scientist with 40+ years of tech industry experience and core expertise in data science, natural language, graph technologies, and cloud computing. He’s the author of numerous books, videos, and tutorials about these topics. He also hosts the monthly “Graph Power Hour!” webinar.

Paco advises Kurve.ai, EmergentMethods.ai, and is lead committer for the `pytextrank` and `kglab` open source projects. Formerly: Director of Learning Group at O’Reilly Media; and Director of Community Evangelism at Databricks.

Talk Track: Agents in Production

Technical Level: 2

Talk Abstract:
The Dark Web: an estimated $3T USD flows annually through shell companies leveraging tax havens worldwide — serving as the perpetua mobilia for oligarchs, funding illegal weapons transfers, cyber attacks at global scale, human trafficking, anti-democracy campaigns, even illegal fishing fleets. The tendrils of kleptocracy extend throughout our political and economic system.

“People who hunt bad guys” — investigative journalists, OSINT analysts, regulators, government agencies, law enforcement, FinCrime investigation units, etc. — leverage both graph analytics and downstream AI apps to contend with the overwhelming data volumes. Our team provides core technology — entity resolution — used in this work, and in other public-sector applications such as the majority of voter registration in the US. Most of our use cases run in air-gapped environments, based on large-scale distributed infrastructure, streaming data from multiple sources. In these production use cases, even with several billion graph elements, decisions to “merge” or “disambiguate” known entities can be propagated within milliseconds of a new record arriving.

Among those who perform this kind of confidential work, few are permitted to speak at tech conferences. However, we can use open source, open models, and open data to illustrate these kinds of applications. We’ll show how technology gets used to track the moves of the world’s worst organized crime rings, and how to fight against oligarchs who use complex networks to hide their grift. On the flip side, similar approaches can be leveraged to find your best customers within a graph.

This talk explores known cases, the fraud tradecraft employed, open data sources, and how technology gets leveraged. There are multiple areas where multimodal agentic workflows (e.g., based on BAML) play important roles, both for handling unstructured data sources and for actions taken based on inference. Moreover, we’ll look at where data professionals are very much needed, and where you can get involved.

What You’ll Learn:
How a combination of graph technologies and downstream AI applications gets leveraged for fighting FinCrime and transnational corruption in general.

Talk: How to Train Your Agent: Building Reliable Agents with RL

Presenter:
Kyle Corbitt, Co-Founder & CEO, OpenPipe

About the Presenter:
Kyle Corbitt is the co-founder and CEO of OpenPipe, the RL post-training company. OpenPipe has trained thousands of customer models for both enterprises and tech-forward startups.

Before founding OpenPipe, Kyle led the Startup School team at Y Combinator, which was responsible for the product and content that YC produces for early-stage companies. Prior to that he worked as an engineer at Google and studied ML at school.

Talk Track: Augmenting Workforces with Agents

Technical Level: 4

Talk Abstract:
Have you ever launched an awesome agentic demo, only to realize no amount of prompting will make it reliable enough to deploy in production? Agent reliability is a famously difficult problem to solve!

In this talk we’ll learn how to use GRPO to help your agent learn from its successes and failures and improve over time. We’ve seen dramatic results with this technique, such as an email assistant agent whose success rate jumped from 74% to 94% after replacing o4-mini with an open source model optimized using GRPO.

We’ll share case studies as well as practical lessons learned around the types of problems this works well for and the unexpected pitfalls to avoid.
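
As a rough illustration of the technique (not OpenPipe’s actual stack), here is a minimal GRPO post-training sketch using the open-source Hugging Face TRL library; the dataset, model choice, and reward function are illustrative assumptions — a real agent reward would execute the full rollout and grade task completion.

```python
# Hedged sketch of GRPO post-training with Hugging Face TRL.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

def reward_task_success(completions, **kwargs):
    # Hypothetical reward: 1.0 if the completion contains a success marker.
    # A real agent reward would run the rollout and score the outcome.
    return [1.0 if "DONE" in completion else 0.0 for completion in completions]

train_dataset = Dataset.from_list(
    [{"prompt": "Draft a reply to this email and end with DONE: ..."}] * 64
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # illustrative small open model
    reward_funcs=reward_task_success,
    args=GRPOConfig(output_dir="grpo-agent", num_generations=4),
    train_dataset=train_dataset,
)
trainer.train()
```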

What You’ll Learn:
I’ve frankly been shocked by how well RL works on real-world agentic use cases, and I’m very excited to share lessons learned with the audience. We’re working with DoorDash as well as several smaller customers on deploying these agents to prod and seeing universally strong results. This won’t be an OpenPipe pitch session; I’ll cover all the open-source tooling we use to make these models work.

Talk: Why CI/CD Fails for AI, and How CC/CD Fixes It

Presenters:
Aishwarya Naresh Reganti, Founder, LevelUp Labs, Ex-AWS | Kiriti Badam, Member of Technical Staff, OpenAI

About the Presenters:
Aishwarya Naresh Reganti is the founder of LevelUp Labs, an AI services and consulting firm that helps organizations design, build, and scale AI systems that actually work in the real world. She has led engagements with multiple Fortune 500 companies and fast-growing startups, helping them move beyond demos to production-grade AI.

Before founding LevelUp Labs, she served as a tech lead at the AWS Generative AI Innovation Center, where she led and implemented AI solutions for a wide range of AWS clients. Her work spanned industries such as ISVs, banking, healthcare, e-commerce, and legal tech, with publicly referenced engagements including Bayer, NFL, Zillow, Kayak, and Imply (creators of Apache Druid).

Aishwarya holds a Master’s in Computer Science from Carnegie Mellon University (MCDS) and has authored 35+ papers in top-tier conferences including NeurIPS, ACL, CVPR, AAAI, and EACL. Her research background includes work on graph neural networks, multilingual NLP, multimodal summarization, and human-centric AI. She has mentored graduate students, served as a reviewer for major AI conferences, and collaborated with research teams at Microsoft Research, NTU Singapore, University of Michigan, and more.

Today, Aishwarya teaches top-rated applied AI courses, advises executive teams on AI strategy, and speaks at global conferences including TEDx, ReWork, and MLOps World. Her insights reach over 100,000 professionals on LinkedIn.

Kiriti Badam is a member of the technical staff at OpenAI, with over a decade of experience designing high-impact enterprise AI systems. He specializes in AI-centric infrastructure, with deep expertise in large-scale compute, data engineering, and storage systems. Prior to OpenAI, Kiriti was a founding engineer at Kumo.ai, a Forbes AI 50 startup, where he led the development of infrastructure that enabled training hundreds of models daily—driving significant ARR growth for enterprise clients. Kiriti brings a rare blend of startup agility and enterprise-scale depth, having worked at companies like Google, Samsung, Databricks, and Kumo.ai.

Talk Track: Agents in Production

Technical Level: 2

Talk Abstract:
AI products break the assumptions traditional software is built on. They’re non-deterministic, hard to debug, and come with a tradeoff no one tells you about: every time you give an AI system more autonomy, you lose a bit of control.

This talk introduces the Continuous Calibration / Continuous Development (CC/CD) framework, designed for building AI systems that behave unpredictably and operate with increasing levels of agency. Based on 50+ real-world deployments, CC/CD helps teams start with low-agency, high-control setups, then scale safely as the system earns trust.

What You’ll Learn:
You’ll learn how to scope capabilities, design meaningful evals, monitor behavior, and increase autonomy intentionally, so your AI product doesn’t collapse under real-world complexity.

Talk: From Static IVRs to Agentic Voice AI: Building Real-Time Intelligent Conversations

Presenter:
Pablo Salvador Lopez, Principal AI Application Development Architect – AI Solution Engineering Global Black Belt Team, Microsoft

About the Presenter:
As an AI Solution Engineering Leader in Microsoft’s Global Black Belt team—the elite force driving AI and cloud application innovation within the Azure ecosystem—I design and deliver transformative Generative AI solutions for some of the world’s most complex and highly regulated industries. My work bridges deep technical skills with mission-critical execution—specializing in Retrieval-Augmented Generation (RAG), agentic AI systems, and scalable multi-agent orchestration using Azure AI, OpenAI, and frameworks like Semantic Kernel or any custom stacks.

I design intelligence from the ground up—combining LLMs and custom orchestration frameworks to create real-time, memory-aware agents that reason, act, and collaborate.

My foundation spans full-stack data science, ML engineering, and software architecture. I’ve led real-time and batch AI deployments for Fortune 500 enterprises, with expertise across MLOps/LLMOps and high-throughput inference—anchored in cloud platforms like Azure, GCP, and AWS.

Where others see ambiguity, I see momentum. I’m known for turning raw ideas into production-grade systems—either by building from first principles or rethinking the “rules” when innovation demands it. My mission is to build systems that matter—empowering teams to do their best work, and leaving every product, platform, pattern and person stronger than I found them.

Beyond industry, I’m committed to education and community. As an Adjunct Instructor in Northwestern University’s MSAI program, I teach hands-on courses in Cloud AI, GenAI, RAG, and multi-agent systems. I mentor startups, serve on advisory boards, and contribute to open-source AI—sharing ideas that move the field forward.


Talk Track: Agents in Production

Technical Level: 3

Talk Abstract:
Developers today face the challenge of transforming outdated IVRs and traditional voice systems into intelligent, responsive interactions. This session dives into the concept of agentic voice AI—systems capable of real-time reasoning, decision-making, and dynamic action execution. We’ll explore how to architect modular voice applications using Azure, orchestrate multiple autonomous agents for specialized tasks, and leverage real-time AI inference to produce fluid, human-like conversations. Attendees will learn practical strategies to design agentic voice interactions, enabling their systems to autonomously plan, act, and dynamically adapt to user contexts and needs.

What You’ll Learn:
Attendees will leave equipped with a clear understanding of agentic architecture in real-time voice applications, including practical techniques for orchestrating multiple specialized agents, integrating dynamic reasoning, leveraging memory, and optimizing speech latency. They will be empowered to move beyond static IVRs towards fully autonomous, intelligent voice experiences.

Talk: The Hard Truth About AI Agents: Lessons Learned from Running Agents in Production

Presenter:
Hannes Hapke, Principal Machine Learning Engineer, Digits

About the Speaker:
Hannes Hapke is a principal machine learning engineer at Digits, where he has spent years building production AI systems that accountants and business owners actually use daily.

Before Digits, he solved ML infrastructure problems across healthcare, retail, and renewable energy – industries where failure isn’t an option. At SAP Concur, he learned that impressive prototypes and production systems are entirely different beasts.

Hannes co-authored numerous machine learning books, including “Building Machine Learning Pipelines” and “Machine Learning Production Systems” (O’Reilly), and his upcoming “GenAI Design Patterns” book addresses the gap between AI hype and reality. As a Google Developer Expert for Machine Learning, he’s committed to sharing the hard truths about production ML.

Talk Track: Agents in Production

Talk Technical Level: 2/7

Talk Abstract:
Every conference showcases impressive agent demos. What they don’t show you are the 3 AM pages when agents go rogue, the customer support tickets when AI makes expensive mistakes, or the months of debugging why your “95% accurate” prototype becomes 60% reliable in production.

This talk cuts through the agent hype with unfiltered lessons from Digits’ journey deploying customer-facing agents that handle real financial data. Hannes will share the architectural decisions that actually matter (hint: it’s not the framework you choose), the monitoring approaches that catch problems before customers do, and the failure modes that no one warns you about.

You’ll learn why agent evaluation in development predicts almost nothing about production performance, how to build guardrails that don’t cripple functionality, and why the hardest problems aren’t technical – they’re about managing expectations and building trust.

This presentation is a field guide to the messy reality of production agents, complete with practical design patterns for Hannes’ newest O’Reilly publication “Generative AI Design Patterns” (together with Dr. Valliappa Lakshmanan), and the kind of lessons learned you only get from keeping systems running when money is involved.

What You’ll Learn:
– Production Reality Check: Why impressive demos fail spectacularly in production and how to bridge that gap

– Architecture for Reliability: The infrastructure patterns that actually matter for agent systems at scale

– Architecture for Observability: The specific ways to monitor agents in production

Talk: Agents as Ordinary Software: Principled Engineering for Scale

Presenter:
Linus Lee, EIR & Advisor, AI, Thrive Capital

About the Presenter:
Linus Lee is an EIR and advisor at Thrive Capital, where he focuses on AI as part of the product and engineering team and supports portfolio companies on adopting and deploying frontier AI capabilities. He previously pursued independent HCI and machine learning research before joining Notion as an early member of the AI team.

Talk Track: Agents in Production

Technical Level: 2

Talk Abstract:
Thrive Capital’s in-house research engine Puck executes thousands of research and automation tasks weekly, surfacing current events, drafting memos, and triggering workflows unassisted. This allows Puck to power the wide ecosystem of software tools and automations supporting the Thrive team. A single Puck run may traverse millions of tokens across hundreds of documents and LLM calls, and run for 30 minutes before returning multi-page reports or taking actions. With fewer than 10 engineers, we sustain this scale and complexity by embracing four values — composability, observability, statelessness, and changeability — in our orchestration library Polymer. We’ll share patterns that let us quickly add data sources or tools without regressions, enjoy deep observability to root cause every issue in minutes, and evolve the system smoothly as new model capabilities come online. We’ll end by discussing a few future capabilities we hope to unlock next, like RL, durable execution across hours or days, and scaling via parallel search.

What You’ll Learn:
Concretely, attendees will (1) learn design patterns like composition, adapters, and stateless effects that let us write more robust LLM systems faster and more confidently, and (2) see concrete code examples that illustrate these principles in action in a production system. Our goal is not to sell the audience on the library itself, but rather to advocate for the design patterns behind it.

More broadly, in such a rapidly evolving landscape it can feel tempting to trade off classic engineering principles like composability in favor of following frontier capabilities, subscribing to frameworks that obscure implementation detail or lock you into shortsighted abstractions. This talk will explore how we can have both rigor and frontier velocity with the right foundation.

Talk: Building Conversational AI Agents with Thread-Level Eval Metrics

Presenters:
Claire Longo, Lead AI Researcher, Comet | Tony Kipkemboi, Head of Developer Relations, CrewAI

About the Presenters:
Tony Kipkemboi leads Developer Advocacy at CrewAI, where he helps organizations adopt AI agents to drive efficiency and strategic decision-making. With a background spanning developer relations, technical storytelling, and ecosystem growth, Tony specializes in making complex AI concepts accessible to both technical and business audiences.

He is an active voice in the AI agent community, hosting workshops, podcasts, and tutorials that explore how multi-agent orchestration can reshape the way teams build, evaluate, and deploy AI systems. Tony’s work bridges product experimentation with real-world application, empowering developers, startups, and enterprises to harness AI agents for measurable impact.

At MLOps World, Tony brings his experience building and scaling with CrewAI to demonstrate how agent orchestration, when paired with rigorous evaluation, accelerates the path from prototype to production.

Claire Longo is an AI leader and Mathematician with over a decade of experience in Data Science and AI. She has led cross-functional AI teams at Twilio, Opendoor, and Arize AI and is currently a Lead AI Researcher at Comet. She holds a Bachelor’s in Applied Mathematics and a Master’s in Statistics from The University of New Mexico. Beyond her technical work, Claire is a Speaker, Advisor, YouTuber, and Poker Player. She is dedicated to mentoring Engineers and Data Scientists while championing diversity and inclusion in AI. Her mission is to empower the next generation of AI practitioners.

Talk Track: Agents in Production

Technical Level: 4

Talk Abstract:
Building modern conversational AI Agents means dealing with dynamic, multi-step LLM reasoning processes and tool calling that cannot always be predicted or debugged at the trace level alone. During the conversation, we need to understand if the AI accomplishes the user’s goal while staying aligned with intent and delivering a smooth interaction. To truly measure quality, we need to trace and evaluate entire conversation sessions.

In this talk, we introduce a practical workflow for designing, orchestrating, and evaluating conversational AI Agents by combining CrewAI as the Agent development framework with Comet Opik for custom eval metrics.

On the CrewAI side, we’ll showcase how developers can define multi-agent workflows, specialized roles, and task orchestration that mirror real-world business processes. We’ll demonstrate how CrewAI simplifies experimentation with different agent designs and tool integrations, making it easier to move from prototypes to production-ready agents.

On the Opik side, we’ll go over how to capture expert human-in-the-loop feedback and build thread-level evaluation metrics. We’ll show how to log traces, annotate sessions with expert insights, and design LLM-as-a-Judge metrics that mimic human reasoning; turning domain expertise into a repeatable feedback loop.

Together, this workflow combines agentic orchestration + rigorous evaluation, giving developers deep observability, actionable insights, and a clear path to systematically improving conversational AI in real-world applications.
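
As a framework-agnostic sketch of the thread-level idea (in the talk’s stack this role is played by CrewAI traces and a Comet Opik LLM-as-a-Judge metric), the snippet below scores a whole conversation session rather than a single trace; judge_llm is a hypothetical completion callable.

```python
# Hedged sketch: grade an entire conversation thread with an LLM judge.
import json

JUDGE_PROMPT = (
    "You are grading a full conversation between a user and an agent.\n"
    'Return JSON only: {"goal_achieved": true|false, "smoothness": 1-5, "reason": "..."}\n'
    "Conversation:\n"
)

def score_thread(judge_llm, thread: list[dict]) -> dict:
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in thread)
    return json.loads(judge_llm(JUDGE_PROMPT + transcript))

# Example session-level usage:
# score = score_thread(my_llm, [{"role": "user", "content": "Cancel my order"}, ...])
```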

What You’ll Learn:
You can’t reliably build conversational AI agents without treating orchestration and evaluation as two halves of the same workflow; CrewAI structures the agent, Comet Opik ensures you can measure and improve it.

Talk: Agentic AI in Manufacturing

Presenters:
Ravi Chandu Ummadisetti, Generative AI Architect, Toyota | Stephen Ellis, Generative AI Technical Product Owner, Toyota

About the Presenters:
Ravi Chandu Ummadisetti is a distinguished Generative AI Architect with over a decade of experience, known for his pivotal role in advancing AI initiatives at Toyota Motor North America. His expertise in AI/ML methodologies has driven significant improvements across Toyota’s operations, including a 75% reduction in production downtime and the development of secure, AI-powered applications. Ravi’s work at Toyota, spanning manufacturing optimization, legal automation, and corporate AI solutions, showcases his ability to deliver impactful, data-driven strategies that enhance efficiency and drive innovation. His technical proficiency and leadership have earned him recognition as a key contributor to Toyota’s AI success.

Stephen Ellis is a Technical Generative AI Product Manager with 10 years of experience in research strategy and the application of emerging technologies, for companies ranging from startups to Fortune 50 enterprises. He was formerly Director of the North Texas Blockchain Alliance, where he led the cultivation of blockchain and cryptocurrency competencies among software developers, C-level executives, and private investment advisors, and formerly CTO of Plymouth Artificial Intelligence, which researched and developed future applications of AI. In that capacity he advised companies on building platforms that leverage emerging technologies for new business cases. He is currently a Technical Product Manager at Toyota Motor North America, focused on enabling generative AI solutions for groups across the enterprise to drive transformation in new mobility solutions and enterprise operations.

Talk Track: Agents in Production

Technical Level: 2

Talk Abstract:
This talk explores the transformative impact of Generative AI in an agentic manner across manufacturing, battery production, and supply chain management. By leveraging the capabilities of Generative AI, organizations can automate routine processes, enhance decision-making through real-time data analysis, and foster innovation while promoting sustainability. The agentic approach empowers human workers to collaborate effectively with AI systems, focusing on strategic initiatives that drive operational efficiency and competitive advantage. Attendees will gain insights into how adopting Generative AI can future-proof their operations and position them at the forefront of industry advancements.

What You’ll Learn:
The core message for attendees is that Agentic AI has the potential to revolutionize manufacturing, battery production, and supply chain management by enhancing operational efficiency, enabling data-driven decision-making, and promoting sustainability. By automating routine tasks and integrating seamlessly with existing technologies, AI empowers human workers to focus on strategic initiatives, fostering innovation and collaboration. Embracing these advancements is essential for future-proofing operations and maintaining competitiveness in a rapidly evolving industry landscape.

Talk: Context Engineering: Practical Techniques for Improving Agent Quality Today

Presenter:
Vaibhav Gupta, CEO, Boundary

About the Speaker:
Vaibhav Gupta is the founder and CEO of Boundary, a Y Combinator startup developing a new programming language (BAML) that makes LLMs both easier and more efficient for developers. Across nearly a decade in software engineering, Vaibhav has built predictive pipelines at D. E. Shaw, Google, and Microsoft HoloLens. In his free time, Vaibhav dabbles in competitive table tennis and board games, and various aspects of compilers.

Talk Track: Agents in Production

Talk Technical Level: 4/7

Talk Abstract:
Everyone is sharing tons of buzzwords like few-shot prompting, reasoning, test-time compute, structured outputs, and tool calling, but in reality, all of these are just different ways to try to get the model to do what you want. We call this context engineering. This talk is a hands-on guide to thinking about what exactly context is and how you can apply these general techniques in specific ways to improve the reliability of your agent.

What You’ll Learn:
Models today are really quite impressive; what is often missing is the tooling around them.

Workshop: How to Build and Evaluate Agentic AI Workflows with FloTorch

Presenter:
Dr. Hemant Joshi, CTO, FloTorch

About the Presenter:
Dr. Hemant Joshi has over 20 years of industry experience building products and services with AI/ML technologies.

As CTO of FloTorch, Hemant is engaged with customers to implement State of the Art GenAI solutions and agentic workflows for enterprises.

Prior to FloTorch, Hemant worked at companies like Tumblr, L’Oreal, and Claim Genius. He holds a Bachelor of Engineering from Mumbai University and a Ph.D. in Applied Computing from the University of Arkansas at Little Rock.

Talk Track: Agents in Production

Technical Level: 2

Talk Abstract:
The workshop will guide you through the critical challenges and solutions for deploying GenAI agents in a business environment.

You will understand how to build and scale agentic workflows reliably and securely.

– Overview of Agentic Workflows from planning to enterprise-grade implementation
– Understand the pain points that can derail an enterprise’s AI adoption, like governance and monitoring
– Set up an agentic workflow with the FloTorch AI Gateway, connecting to any LLM via a single endpoint with smart routing (see the sketch after this list)
– Understand why a platform for agentic governance and observability is essential for accelerating your organization’s AI journey, ensuring trust, and maximizing business value.
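
Many AI gateways expose a single OpenAI-compatible endpoint so application code stays unchanged while the gateway handles routing. As a hedged sketch under that assumption — the URL, key, and model alias below are purely illustrative, not FloTorch’s documented interface:

```python
# Hedged sketch: one endpoint, gateway-side routing to any LLM.
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.example.com/v1",  # hypothetical gateway endpoint
    api_key="GATEWAY_API_KEY",
)

response = client.chat.completions.create(
    model="router/default",  # hypothetical alias; the gateway picks the LLM
    messages=[{"role": "user", "content": "Classify this support ticket: ..."}],
)
print(response.choices[0].message.content)
```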

What You’ll Learn:
The key takeaway should be the time savings in pushing agentic projects from concept to production by using the AI Gateway. Scaling them and swapping in different LLMs later becomes easy.

Also, having an evaluation platform to understand the costs, latency and accuracy of LLMs before deploying to production helps a business make the necessary trade-offs for their use case.

Talk: Hope is Not a Strategy: Retrieval Patterns for MCP

Presenter:
Philipp Krenn, Head of Developer Relations, Elastic

About the Speaker:
Philipp leads Developer Relations at Elastic — the company behind Elasticsearch, Kibana, Beats, and Logstash. Based in San Francisco, he lives to demo interesting technology and solve challenging problems — all with a smile and a terminal window.

Talk Track: Agents in Production

Talk Technical Level: 3/7

Talk Abstract:
MCP is a solid integration layer — but how does it hold up when it comes to output quality? Often, not as well as you’d like. Here are some practical retrieval patterns, from basic to advanced, that worked well in my experiments:
– Naive: Just plug in plain MCP and hope the LLM gets it right. Sometimes it does. Sometimes you’ll need a miracle.
– Semantic: Add more descriptive field names and extra metadata. It helps — but usually just a bit.
– Templated: Use a structured template and have the LLM fill it out step by step. More effort, but by far the most reliable results (a sketch follows below).
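
To illustrate the templated pattern, here is a minimal sketch in which the LLM fills a fixed JSON search template that maps directly onto an Elasticsearch-style query. The field names and the llm_complete callable are illustrative assumptions, not a documented API.

```python
# Hedged sketch of the "templated" retrieval pattern.
import json

QUERY_TEMPLATE = """Fill out this JSON search template for the user question.
Only use the listed fields; leave a field null if it does not apply.
{"keywords": [], "service": null}
Question: <QUESTION>"""

def build_query(llm_complete, question: str) -> dict:
    # .replace avoids clashing with the literal JSON braces in the template
    raw = llm_complete(QUERY_TEMPLATE.replace("<QUESTION>", question))
    spec = json.loads(raw)  # validate that the template was filled correctly
    must = [{"match": {"text": kw}} for kw in spec.get("keywords", [])]
    if spec.get("service"):
        must.append({"term": {"service": spec["service"]}})
    return {"query": {"bool": {"must": must}}}
```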

What You’ll Learn:
While MCP is a simple protocol, there are (emerging) patterns you can use to make it more powerful.

Talk: From Vectors to Agents: Managing RAG in an Agentic World

Presenter:
Rajiv Shah, Chief Evangelist, Contextual AI

About the Presenter:
Rajiv Shah is the Chief Evangelist at Contextual AI with a passion and expertise in Practical AI. He focuses on enabling enterprise teams to succeed with AI. Rajiv has worked on GTM teams at leading AI companies, including Hugging Face in open-source AI, Snorkel in data-centric AI, Snowflake in cloud computing, and DataRobot in AutoML. He started his career in data science at State Farm and Caterpillar.

Rajiv is a widely recognized speaker on AI who has published over 20 research papers, been cited over 1,000 times, and received over 20 patents. His recent work in AI covers topics such as sports analytics, deep learning, and interpretability.

Rajiv holds a PhD in Communications and a Juris Doctor from the University of Illinois at Urbana Champaign. While earning his degrees, he received a fellowship in Digital Government from the John F. Kennedy School of Government at Harvard University. He is well known on social media with his short videos, @rajistics, that have received over ten million views.

Talk Track: Agents in Production

Technical Level: 2

Talk Abstract:
The RAG landscape has evolved so quickly. We’ve gone from simple keyword search to semantic embeddings to multi-step agentic reasoning. With all these approaches, we see the rise of context engineering in mastering the best RAG for the problem. This talk helps you understand the right search architecture for your use case.
We’ll examine three distinct architectural patterns, including Speedy Retrieval (<500 ms), Accuracy Optimized RAG (<10 seconds), and Exhaustive Agentic Search (10s to several minutes). You’ll see how context engineering evolves across these patterns: from basic prompt augmentation in Speed-First RAG, to dynamic context selection and compression in hybrid systems, to full context orchestration with memory, tools, and state management in agentic approaches.
The talk will include a framework for selecting RAG architectures, architectural patterns with code examples, and guidance on practical issues around RAG infrastructure.

What You’ll Learn:
RAG has matured enough that we can stop chasing the bleeding edge and start making boring, practical decisions about what actually ships.

Points:
– Attendees should leave knowing exactly when to use speedy retrieval vs. agentic search; most use cases don’t need agents (and shouldn’t pay for them)
– As retrieval improves, managing the context window becomes the real challenge; success isn’t about retrieving more – it’s about orchestrating what you retrieve
– Agentic search can cost 100x more than vector search; sometimes “good enough” at 500 ms beats “perfect” at 2 minutes (a routing sketch follows below)
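
A toy version of that selection framework might route each request by latency budget and reasoning depth. The tiers below mirror the talk’s three patterns; the thresholds are hypothetical.

```python
# Hedged sketch: route a query to one of the three RAG tiers above.
from enum import Enum

class RagTier(Enum):
    SPEEDY = "speedy_retrieval"      # <500 ms: vector/keyword lookup
    ACCURATE = "accuracy_optimized"  # <10 s: reranking + context compression
    AGENTIC = "exhaustive_agentic"   # 10 s to minutes: tool-using search loop

def pick_tier(latency_budget_s: float, needs_multi_step: bool) -> RagTier:
    if latency_budget_s < 0.5:
        return RagTier.SPEEDY
    if needs_multi_step and latency_budget_s > 10:
        return RagTier.AGENTIC  # pay the ~100x cost only when it is warranted
    return RagTier.ACCURATE
```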

Talk: Agent Name Service (ANS) in Action – A DNS-like Trust Layer for Secure, Scalable AI-Agent Deployments on Kubernetes

Presenter:
Akshay Mittal, Staff Software Engineer, PayPal

About the Presenter:
Akshay Mittal is a Staff Software Engineer at PayPal and an IEEE Senior Member with over a decade of experience in full-stack development and cloud-native systems. He is currently pursuing a PhD at the University of the Cumberlands, focusing on AI/ML-driven security for cloud architectures. Akshay actively contributes to the Austin tech community through speaking engagements, mentoring, and IEEE and ACM initiatives, with a professional mission of advancing technical excellence and fostering innovation.

Talk Track: Agents in Production

Technical Level: 3

Talk Abstract:
Enterprise MLOps is rapidly shifting from model-centric pipelines to agent-centric ecosystems, where autonomous AI agents continuously retrain models, validate data, and remediate incidents without human intervention. Yet most production platforms still lack a uniform mechanism to discover, authenticate, and govern these agents. This session introduces the Agent Name Service (ANS) – an open, DNS-inspired protocol that assigns unique identities, publishes verifiable metadata, and issues capability attestations for AI agents running on Kubernetes. Drawing on lessons learned from securing PayPal’s global API platform, I will demonstrate how ANS enables end-to-end trust across the ML lifecycle: model-validation agents that flag concept drift, deployment agents that patch mis-configured Helm charts, and guard-agent ensembles that enforce policy-as-code in real time. A live demo will show ANS integrated with GitOps, Open Policy Agent, Sigstore, and an open-source agent-orchestration framework, highlighting zero-trust handshakes, key rotation, and automated RBAC provisioning. Attendees will leave with practical templates and a GitHub reference implementation ready for pilot adoption.

What You’ll Learn:
1. Why identity and capability verification are the missing guardrails for agentic MLOps

2. Reference architecture for deploying ANS on a Kubernetes stack with GitOps, OPA, and Sigstore

3. Patterns for chaining validation, remediation, and notification agents while preserving least-privilege access

4. Performance and security benchmarks from a production pilot handling 1,000+ daily agent interactions

Talk: Building and Evaluating Agents

Presenter:
Anish Shah, AI Engineer, Weights & Biases

About the Speaker:
Anish loves turning ML ideas into ML products. Anish started his career working with multiple Data Science teams within SAP, working with traditional ML, deep learning, and recommendation systems before landing at Weights & Biases. With the art of programming and a little bit of magic, Anish crafts ML projects to help better serve our customers, turning “oh nos” to “a-ha”s!

Talk Track: Agents in Production

Talk Technical Level: 3/7

Talk Abstract:
This session explores how large language models evolve from single-prompt tools into agentic systems capable of solving real-world business problems. We’ll cover the design principles behind agents — reflection, tool use, planning, and collaboration — and show how these map to modern architectures. The talk then focuses on the challenge of evaluation, highlighting methods like automated judges, process-level metrics, and continuous monitoring to ensure reliability, efficiency, and user trust. Attendees will leave with a clear understanding of how to structure AI agents and how to systematically measure and improve their performance.

What You’ll Learn:
Attendees will learn about the current state of agents, with an emphasis on the problems faced in development, along with advice and tools for dealing with these problems.

Talk: Testing AI Agents: A Practical Framework for Reliability and Performance

Presenter:
Irena Grabovitch-Zuyev, Staff Applied Scientist, PagerDuty

About the Presenter:
Irena Grabovitch-Zuyev is a Staff Applied Scientist at PagerDuty and a driving force behind PagerDuty Advance, the company’s generative AI capabilities. She leads the development of AI agents that are transforming how customers interact with PagerDuty, pushing the boundaries of incident response and automation.

With over 15 years of experience in machine learning, Irena specializes in generative AI, data mining, machine learning, and information retrieval. At PagerDuty, she partners with stakeholders and customers to identify business challenges and deliver innovative, data-driven solutions.

Irena earned her graduate degree in Information Retrieval in Social Networks from the Technion – Israel Institute of Technology. Before joining PagerDuty, she spent five years at Yahoo Research as part of the Mail Mining team, where her machine learning solutions for automatic extraction and classification were deployed at scale, powering Yahoo Mail’s backend and processing hundreds of millions of messages daily.

She is the author of several academic articles published at top conferences and the inventor of multiple patents. Irena is also a passionate advocate for increasing representation in tech, believing that diversity and inclusion are essential to innovation.

Talk Track: Agents in Production

Technical Level: 3

Talk Abstract:
As AI agents powered by large language models (LLMs) become integral to production systems, ensuring their reliability and safety is both critical and uniquely challenging. Unlike traditional software, agentic systems are dynamic, probabilistic, and highly sensitive to subtle changes—making conventional testing approaches insufficient.

This talk presents a practical framework for testing AI agents, grounded in real-world experience developing and deploying production-grade agents at PagerDuty. The main focus will be on iterative regression testing: how to design, execute, and refine regression tests that catch failures and performance drifts as agents evolve. We’ll walk through a real use case, highlighting the challenges and solutions encountered along the way.

Beyond regression testing, we’ll cover the additional layers of testing essential for agentic systems, including unit tests for individual tools, adversarial testing to probe robustness, and ethical testing to evaluate outputs for bias, fairness, and compliance. Finally, I’ll share how we’re building automated pipelines to streamline test execution, scoring, and benchmarking—enabling rapid iteration and continuous improvement.

Attendees will leave with a practical, end-to-end framework for testing AI agents, actionable strategies for regression and beyond, and a deeper understanding of how to ensure their own AI systems are reliable, robust, and ready for real-world deployment.
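
One hedged way to picture the iterative regression layer is a pinned prompt set replayed in pytest on every agent change. The run_agent stub and the must_include checks are hypothetical stand-ins for your agent and your scoring method (exact match, rubric, or LLM judge).

```python
# Hedged sketch: agent regression tests over a pinned prompt set.
import pytest

REGRESSION_SET = [
    {"prompt": "Summarize incident #123", "must_include": "root cause"},
    {"prompt": "Escalate the outage for service X", "must_include": "escalation"},
]

def run_agent(prompt: str) -> str:
    # Stand-in: wire this to your real agent; the cases fail until you do.
    return f"stub answer for: {prompt}"

@pytest.mark.parametrize("case", REGRESSION_SET)
def test_agent_regression(case):
    output = run_agent(case["prompt"])
    assert case["must_include"] in output.lower()
```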

What You’ll Learn:
Attendees will learn a practical, end-to-end framework for testing AI agents—covering correctness, robustness, and ethics—so they can confidently deploy reliable, high-performing LLM-based systems in production.

Talk: Judging the Agents: Building Reliable LLM Evaluators with Scalable Metrics and Prompt Optimization

Presenter:
Himani Rallapalli, Senior Applied Scientist, Microsoft

About the Speaker:
Himani is a Senior Applied Scientist at Microsoft, where she focuses on fine-tuning large and small language models (LLMs and SLMs) for domain-specific applications. Her recent work includes building LLM judges, online evaluators to assess agent performance, and improving retrieval-augmented generation (RAG) systems using agentic workflows. Prior to joining Microsoft, she worked at SAP on text analysis, search optimization, and recommendation systems. Himani is deeply passionate about research and experimentation, and is driven by the challenge of designing innovative, AI-powered solutions to complex business problems, especially those involving AI, natural language processing, and information retrieval.

Talk Track: Agents in Production

Talk Technical Level: 2/7

Talk Abstract:
As LLM-powered agents take on an increasingly important role in production systems, ensuring their reliability and consistency is critical. These agents influence decisions, recommendations, and interactions that can directly affect users and business outcomes.
This talk introduces LLM as a Judge — using language models to evaluate other agents in real time, delivering continuous, context-aware assessments without human intervention.
We’ll explore:
• Why online evaluators matter: LLM judges provide real-time, contextual assessment beyond static benchmarks.
• How LLM Judges work: Embedding evaluators into production pipelines to assess agent responses.
• Prompt optimization: Using open-source frameworks to design minimal, high-precision prompts that reduce ambiguity
• Code and workflow optimization: Explore how various productivity tools can help refine prompt structures and optimize evaluation code
• LLM Evaluator Assessment: We assess groundedness, completeness, consistency, and inter-rater reliability metrics.

We will further examine how to address prompt sensitivity and self-preference bias in LLM judges.

Attendees will leave with the skills to design and deploy LLM-based judges for real-time agent evaluation, leveraging prompt engineering and reproducible metrics to deliver consistent, reliable assessment at scale.
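
One concrete reliability check mentioned above is inter-rater agreement between the LLM judge and human experts. A minimal sketch using scikit-learn’s Cohen’s kappa follows; the labels and the 0.7 threshold are illustrative assumptions.

```python
# Hedged sketch: validate an LLM judge against human labels.
from sklearn.metrics import cohen_kappa_score

human_labels = ["pass", "fail", "pass", "pass", "fail", "pass"]
judge_labels = ["pass", "fail", "pass", "fail", "fail", "pass"]

kappa = cohen_kappa_score(human_labels, judge_labels)
print(f"Judge vs. human agreement (Cohen's kappa): {kappa:.2f}")
# Rule of thumb (assumption): gate deployments on the judge only once kappa
# is comfortably high (e.g., > 0.7) on a held-out, human-labeled set.
```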

What You’ll Learn:
Attendees will gain a practical understanding of using LLM as a Judge to evaluate agents in production—measuring groundedness, completeness, and consistency with high inter-rater reliability, while designing clear, reproducible prompts and mitigating pitfalls like prompt sensitivity and self-preference bias.

Talk: Agent Drift: Understanding and Managing AI Agent Performance Degradation in Production

Presenter:
Kumaran Ponnambalam, Principal AI Engineer, Cisco

About the Presenter:
Kumaran Ponnambalam is a technology leader with 20+ years of experience in Generative AI, Machine Learning, Data and Analytics. His focus is on creating robust, scalable Gen AI models and services to drive effective business solutions. He is currently leading Generative AI initiatives at Cisco, building next-generation AI innovations and products to help enterprises. In his previous roles, he has built conversational bots, ML platforms, data pipelines and cloud services. A frequent speaker at technology conferences, he has also authored several courses on the LinkedIn Learning Platform in Generative AI and Machine Learning.

Talk Track: Agents in Production

Technical Level: 4

Talk Abstract:
As AI Agents continue to integrate into production systems, maintaining consistent performance over time remains a critical challenge. This talk explores the concept of “Agent Drift,” a phenomenon where AI agents experience performance degradation due to shifts in data distribution, evolving user behavior, tool behavior, or model changes. Attendees will gain insights into how agent drift impacts the reliability and effectiveness of AI systems, and why early detection is essential for mitigating risks in production environments. The session will introduce practical strategies for measuring agent drift, enabling teams to identify performance gaps and adapt their Agents proactively. By leveraging these techniques, organizations can ensure their AI agents remain robust and aligned with real-world requirements. Whether you are a data scientist, engineer, or AI practitioner, this talk will provide actionable takeaways for managing and optimizing AI agents in dynamic settings.
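
As a hedged illustration of measuring drift, the sketch below compares an agent’s recent task-success rate to a baseline window and flags degradation beyond a tolerance; a production system would use proper statistical tests per metric.

```python
# Hedged sketch: windowed success-rate drift check for an agent.
from statistics import mean

def drift_alert(baseline: list[int], recent: list[int], tolerance: float = 0.05) -> bool:
    """Scores are 1 (task succeeded) or 0 (task failed)."""
    return mean(baseline) - mean(recent) > tolerance

baseline_scores = [1, 1, 1, 0, 1, 1, 1, 1]  # e.g., first weeks in production
recent_scores = [1, 0, 1, 0, 1, 0, 1, 1]    # e.g., the last 24 hours
if drift_alert(baseline_scores, recent_scores):
    print("Agent drift suspected: success rate degraded beyond tolerance")
```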

What You’ll Learn:
How to measure performance of AI Agents in production, identify drift and take remedial actions.

Talk: Architecting and Orchestrating AI Agents

Presenter:
Anish Shah, AI Engineer, Weights & Biases

About the Speaker:
Anish loves turning ML ideas into ML products. Anish started his career working with multiple Data Science teams within SAP, working with traditional ML, deep learning, and recommendation systems before landing at Weights & Biases. With the art of programming and a little bit of magic, Anish crafts ML projects to help better serve our customers, turning “oh nos” to “a-ha”s!

Talk Track: Agents in Production

Talk Technical Level: 3/7

Talk Abstract:
Attendees will learn about the current state of agents, with an emphasis on the problems faced in development, along with advice and tools for dealing with these problems.

What You’ll Learn:
This session is suitable for participants from beginner to advanced levels.

Talk: Architecting a Deep Research System

Presenter:
Suhas Pai, CTO & Co-Founder, Hudson Labs

About the Speaker:
Suhas Pai is an NLP researcher and co-founder/CTO at Hudson Labs, a Toronto-based startup. At Hudson Labs, he works on text ranking, representation learning, and productionizing LLMs. He is also currently writing a book on Designing Large Language Model Applications with O’Reilly Media. Suhas has been active in the ML community, serving as Chair of the TMLS (Toronto Machine Learning Summit) conference since 2021 and as NLP lead at Aggregate Intellect (AISC). He was also co-lead of the Privacy working group at BigScience, as part of the BLOOM open-source LLM project.

Talk Track: Agents in Production

Talk Technical Level: 3/7

Talk Abstract:
In the past year, several pioneering AI labs have launched powerful ‘Deep Research’ features that search extensively across a large number of data sources and produce comprehensive reports in response to user queries. In this talk, we will discuss the anatomy of such a system, focusing on the tradeoffs involved in building one, and discuss promising architectural paradigms. We will also discuss the engineering and infrastructural considerations involved in building such systems.

What You’ll Learn:
1. Understand the potential of deep research systems and their components
2. Navigate through tradeoffs involved in building such systems
3. Learn architectural paradigms and best practices in building such systems

Talk: What’s Next in the Agent Stack

Presenter:
Shelby Heinecke, Senior AI Research Manager, Salesforce

About the Presenter:
Dr. Shelby Heinecke is a pioneering leader in AI, renowned for her transformative research, engineering excellence, and dynamic thought leadership. With over 35 influential AI research publications, she has made significant contributions to the field, driving innovation and shaping the future of AI.

Shelby is currently a Senior AI Research Manager at Salesforce, leading a team innovating in AI Agents (including multi-agent systems and large action models), On-Device AI, and Small Language Models, all aimed at revolutionizing Salesforce products. Her passion for fostering technical talent and cultivating collaborative environments empowers her team to achieve breakthrough advancements. Her team’s released contributions in agentic AI span the xLAM large action models, AgentLite, the multi-modal action model TACO, and many research papers spanning agentic data generation and agent training.
Shelby holds a Ph.D. in Mathematics from the University of Illinois at Chicago, with a specialization in machine learning theory. She also earned an M.S. in Mathematics from Northwestern University and a B.S. in Mathematics from the Massachusetts Institute of Technology (MIT). To learn more about Shelby’s work and vision, visit www.shelbyh.ai.

Talk Track: Agents in Production

Technical Level: 2

Talk Abstract:
What does it take to go from promising prototype to production-ready AI agent?

In this talk, I’ll break down the emerging agent stack, including robust prompt generation with Promptomatix, protocol-level evaluation with MCPEval, multimodal reasoning with TACO, fast function-calling with xLAM, and more. Each layer targets a critical bottleneck in reliability, reasoning, or scale.

You’ll get a behind-the-scenes look at the research shaping these tools, and a blueprint for the next generation of enterprise-ready agents.

What You’ll Learn:
– Why evals, latency, and prompt optimization are crucial to high-performing agents
– Links to open-source repos/models to get started in these directions

Talk: Uber's Multi-Agent SDK

Presenter:
Jamieson Leibovitch, Senior Software Engineer, Uber

About the Presenter:
Originally from Toronto, Canada, Jamieson has been at Uber for 4 years. He worked in Uber’s Data platforms for 3 years before moving to Uber’s AI Platform, Michelangelo.
He now helps lead part of the agent platform as a full-stack engineer, working on Uber’s Agent SDK and Uber’s no-code solutions.

Talk Track: Agents in Production

Technical Level: 2

Talk Abstract:
In this presentation, we will go over Uber’s Agent Platform and its in-house Agent SDK, “AgentFx”.
We’ll cover how Uber thinks about agents and some of the technologies built to enable enterprise-ready agentic use cases at Uber.

What You’ll Learn:
Attendees will leave with an understanding of how Uber thinks about the way agents power its business, and the technology behind its success.

Talk: Building Effective Agents

Presenter:
Sushant Mehta, Senior Research Engineer, Google DeepMind

About the Presenter:
Sushant is a senior research engineer at Google DeepMind, working on post-training to improve coding capabilities in frontier large language models.

Talk Track: Agents in Production

Technical Level: 3

Talk Abstract:
Large language models can now power capable software agents, yet real‑world success comes from disciplined engineering rather than flashy frameworks. Reliable agents are built from simple, composable patterns instead of heavy abstractions.

The talk will introduce several patterns that add complexity / autonomy only when it pays off:

1. Augmented LLM (retrieval, tools, memory) as the atomic building block
2. Workflow motifs: prompt chaining, routing, parallelization, etc., with concrete criteria and implementation tips
3. Autonomous agents that loop through plan‑act‑observe‑reflect cycles to tackle open‑ended tasks (sketched below)
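
A minimal sketch of pattern 3, the plan‑act‑observe‑reflect loop. This is an illustration rather than code from the talk; call_llm and run_tool are hypothetical placeholders for whatever completion API and tool runner you use:

```python
# Minimal plan-act-observe-reflect loop (pattern 3, illustrative only).
# call_llm() and run_tool() are hypothetical stand-ins, not a real API.

def call_llm(prompt: str) -> str:
    """Stand-in for any chat-completion call."""
    raise NotImplementedError

def run_tool(action: str) -> str:
    """Stand-in for tool execution (search, code run, etc.)."""
    raise NotImplementedError

def agent_loop(task: str, max_steps: int = 10) -> str:
    history = []
    for _ in range(max_steps):
        plan = call_llm(f"Task: {task}\nHistory: {history}\nNext action or FINISH:")
        if plan.startswith("FINISH"):                     # plan
            return call_llm(f"Task: {task}\nHistory: {history}\nFinal answer:")
        observation = run_tool(plan)                      # act + observe
        reflection = call_llm(f"Did '{plan}' help? Observation: {observation}")
        history.append((plan, observation, reflection))   # reflect
    return "Step budget exhausted"
```

Note how the guardrail here is structural: a hard step budget bounds cost even when the model never decides to finish.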

What You’ll Learn:
Attendees will leave with a practical decision framework for escalating from a single prompt to multi‑step agents, keeping in mind robust guardrails for shipping trustworthy, cost‑effective agents at scale.

Talk: From Hello to Repayment: Voice AI in African Finance

Presenter:
Remy Muhire, CEO, Pindo.ai

About the Speaker:
Remy Muhire is the Co-Founder and CEO of Pindo, a Voice AI startup helping banks and fintechs deliver services in local African languages. Previously, he led voice technology initiatives at Mozilla and co-founded the fintech startup Exuus. Passionate about digital inclusion, Remy is dedicated to breaking barriers of literacy and language so that underserved communities can access essential financial services.

Talk Track: Agents in Production

Talk Technical Level: 1/7

Talk Abstract:
In Africa, literacy and language barriers still limit access to financial services. This session will explore how Voice AI in local languages can transform loan applications and debt recovery—making credit more accessible while improving repayment rates. Drawing from early development work and upcoming pilots in East Africa, we’ll share insights on how banks, fintechs, and SACCOs can leverage conversational AI to engage customers more effectively, from the first “hello” to the final repayment.

What You’ll Learn:
Break barriers: How Voice AI bridges literacy and language gaps in African finance.

Reimagine credit journeys: From loan applications to debt recovery through conversational voice flows.

Unlock inclusion at scale: Early pilots in East Africa show the path to higher engagement and repayment.

Talk: Build Reliable AI Apps with Observability, Validations and Evaluations

Presenter:
Pratik Verma, Founder & CEO, Okahu AI

About the Speaker:
Serial founder in data, AI, and cloud tech, with a PhD from Stanford and experience building products at Microsoft for Global 2000 companies running data + AI workloads on Azure. Currently leading the venture-funded startup Okahu to help developers build reliable AI apps and agents.

Talk Track: Agents in Production

Talk Technical Level: 2/7

Talk Abstract:
Only 5% of AI projects reach production, but the ones that do yield massive value for their organizations and increased productivity for individual contributors. Core barriers are brittle workflows, lack of contextual learning, and inability to improve over time with feedback and usage. To solve these problems, developers of AI apps, especially agentic ones, must rely on observability, validation, and evaluations to fix issues and improve based on user feedback. Pratik Verma from Okahu shows how easy it is to use open-source dev and observability tools to build reliable AI apps on Azure, AWS, and GCP.

What You’ll Learn:
How to use test-driven, iterative development with observability and evaluations to take AI apps into prod

Talk: A Practical Field Guide to Optimizing the Cost, Speed, and Accuracy of LLMs for Domain-Specific Agents

Presenter:
Niels Bantilan, Chief ML Engineer, Union.ai

About the Presenter:
Niels is the Chief Machine Learning Engineer at Union, a core maintainer of Flyte, an open source workflow orchestration tool, and creator of Pandera, a data validation and testing tool for dataframes. His mission is to help data science and machine learning practitioners be more productive. He has a Masters in Public Health Informatics, and prior to that a background in developmental biology and immunology. His research interests include reinforcement learning, NLP, ML in creative applications, and fairness, accountability, and transparency in automated systems.

Talk Track: Agents in Production

Technical Level: 3

Talk Abstract:
As the dust settles from the initial boom of applications using hosted large language model (LLM) APIs, engineering teams are discovering that while LLMs get you to a working demo quickly, they often struggle in production with latency spikes, context limitations, and explosive compute costs. This session provides a practical roadmap for navigating not only the experiment-to-production gap using small language models (SLMs), but also the AI-native orchestration strategies that will get you the most bang for your buck.
We’ll explore how SLMs (models that range from hundreds of millions to a few billion parameters) offer a compelling alternative for domain-specific applications by trading off the generalization power of LLMs for significant gains in speed, cost-efficiency, and task-specific accuracy. Using the example of an agent that translates natural language into SQL database queries, this session will demonstrate when and how to deploy SLMs in production systems, how to progressively swap out LLMs for SLMs while maintaining quality, and which orchestration strategies help you customize and maintain SLMs in a cost-effective way.

Key topics include:
– Identifying key leverage points: Which LLM calls should you swap out for SLMs first? We’ll cover how to identify speed, cost, and accuracy leverage points in your AI system so that you can speed up inference, reduce cost, and maintain accuracy.
– Speed Optimization: It’s not just about the speed of inference, which SLMs already excel at, it’s also about accelerating experimentation when you fine-tune and retrain SLMs on a specific domain/task. We’ll cover parallelized optimization runs, intelligent caching strategies, and task fanout techniques for both prompt and hyperparameter optimization.
– Cost Management: Avoiding common pitfalls that negate SLMs’ cost advantages, including resource mismatching (GPU vs CPU workloads), infrastructure provisioning inefficiencies, and idle compute waste. Attendees will learn resource-aware orchestration patterns that scale to zero and recover gracefully from failures.
– Accuracy Enhancement: Maximizing domain-specific performance by implementing the equivalent of “AI unit tests” and incorporating them into your experimentation and deployment pipelines. We’ll cover how this can be done with synthetic datasets, LLM judges, and deterministic evaluation functions that help you catch regressions early and often (a minimal example follows this list).
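
As a rough illustration of the “AI unit test” idea for the talk’s NL-to-SQL example (the test case, fixture database, and generate_sql agent are all hypothetical, not the speaker’s code): run the generated SQL and a hand-written reference query against a small fixture database and compare result sets deterministically.

```python
# Hypothetical deterministic "AI unit test" for an NL-to-SQL agent:
# compare result sets of generated vs. reference SQL on a fixture DB.
import sqlite3

def rows(db_path: str, sql: str) -> set:
    with sqlite3.connect(db_path) as conn:
        return set(map(tuple, conn.execute(sql).fetchall()))

def test_sql_agent(generate_sql, db_path: str) -> None:
    # generate_sql is the (hypothetical) SLM/LLM agent under test.
    cases = [
        ("How many orders were placed in 2024?",
         "SELECT COUNT(*) FROM orders WHERE strftime('%Y', created_at) = '2024'"),
    ]
    for question, reference in cases:
        assert rows(db_path, generate_sql(question)) == rows(db_path, reference), question
```

Because the check is deterministic, it can gate deployments and catch regressions on every retrain, unlike an LLM judge alone.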

What You’ll Learn:
Attendees will leave with actionable strategies for cost-effective AI deployment, a decision framework for SLM adoption, and orchestration patterns that compound the value of smaller models in domain-specific applications.

Talk: Impact of AI on Developer Productivity

Presenter:
Yegor Denisov-Blanch, Researcher, Stanford University

About the Speaker:
I run the software engineering productivity research group at Stanford. For the past 3+ years, we’ve been working with hundreds of companies to analyze their private git repos to measure the productivity of their engineers. We have 120,000+ engineers in the dataset. Before Stanford, I looked after digital transformation projects at an F100 company with 6,000+ engineers. I found it paradoxical that software engineers are very data-driven, yet we had no good data-driven way to make decisions about the things that impacted software engineering productivity.

Talk Track: AI Agents for Developer Productivity

Talk Technical Level: 1/7

Talk Abstract:
This talk will deep-dive into the impact of AI on developer productivity, comparing numbers across languages, seniority levels, company types, types of work, and reasoning vs. non-reasoning LLMs. It will showcase best practices for adoption at enterprise scale, as well as situations where initiatives didn’t yield the desired results.

What You’ll Learn:
AI increases developer productivity, but not always and not in every setting – learn how & when to use AI agents for software engineering at scale

Talk: Code-Guided Agents for Legacy System Modernization

Presenter:
Calvin Smith, Senior Researcher Agent R&D, OpenHands

About the Speaker:
Calvin Smith is a software engineer and researcher who spent years developing formal methods for generating and understanding code at scale. He joined OpenHands to apply these techniques to real-world software engineering challenges. His current focus: building AI agents that leverage formal methods to modernize legacy codebases and pushing the boundaries of what autonomous agents can accomplish in software engineering.

Talk Track: AI Agents for Developer Productivity

Talk Technical Level: 2/7

Talk Abstract:
Legacy code modernization often fails because we try to boil the ocean. After early attempts at using autonomous agents for whole-codebase transformations resulted in chaos, we developed a novel approach: combine static dependency analysis with intelligent agents to break modernization into reviewable, incremental chunks. This talk explores how we use static-analysis tools to understand codebases, identify optimal modernization boundaries, and orchestrate multiple agents to collaboratively transform codebases to turn an impossible problem into a series of manageable PRs.
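
A hedged sketch of the general idea, not OpenHands’ actual tooling: use Python’s ast module to build an import-dependency graph over a repo, then schedule modules with the smallest in-repo blast radius first, so each agent-produced change stays a small, reviewable PR.

```python
# Illustrative only: build a module import graph with the stdlib ast
# module, then order modernization work from fewest in-repo dependencies
# to most. Module names are simplified to file stems for brevity.
import ast
from collections import defaultdict
from pathlib import Path

def import_graph(repo: str) -> dict:
    graph = defaultdict(set)
    for path in Path(repo).rglob("*.py"):
        tree = ast.parse(path.read_text(), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                graph[path.stem].update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                graph[path.stem].add(node.module)
    return graph

def modernization_order(graph: dict) -> list:
    # Fewest in-repo dependencies first: smallest blast radius per PR.
    local = set(graph)
    return sorted(graph, key=lambda m: len(graph[m] & local))
```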

What You’ll Learn:
The solution space for AI-automated software engineering extends beyond “AI for code” or “code for AI”. It’s about creating feedback loops where static analysis, AI agents, and human expertise continuously inform and enhance each other.

Talk: Vibe-Coding Your First LLM End-to-End Application

Presenters:
Dr. Greg Loughnane, Co-Founder & CEO, AI Makerspace | Chris “The Wiz” Alexiuk, Co-Founder & CTO, AI Makerspace

About the Speakers:
“Dr. Greg” Loughnane is the Co-Founder & CEO of AI Makerspace, where he is an instructor for their AI Engineering Bootcamp. Since 2021, he has built and led industry-leading Machine Learning education programs. Previously, he worked as an AI product manager, a university professor teaching AI, an AI consultant and startup advisor, and an ML researcher. He loves trail running and is based in Columbus, Ohio.

Chris “The Wiz” Alexiuk is the Co-Founder & CTO at AI Makerspace, where he is an instructor for their AI Engineering Bootcamp. During the day, he is also a Developer Advocate at NVIDIA. Previously, he was a Founding Machine Learning Engineer, Data Scientist, and ML curriculum developer and instructor. He’s a YouTube content creator whose motto is “Build, build, build!” He loves Dungeons & Dragons and is based in Toronto, Canada.

Talk Track: AI Agents for Developer Productivity

Talk Technical Level: 1/7

Talk Abstract:
Want to become an AI Engineer or leader? 

Welcome to the future, where anyone with the tenacity to get serious about fixing errors when they run into them has AI-assistance at their fingertips.

Maybe you used to code, but haven’t in a while.

Maybe you never have but have always wanted to.

In 2025, people in the know are going beyond VS Code. They’re picking AI code editors like Cursor. These tools make it easier than ever to build, ship, and share your first LLM application.

This workshop is the perfect way to get started programming with industry best-practice tools like Cursor, powered by coding assistants like Claude Code.

At AI Makerspace, we’ve been tracking the best dev environment for aspiring AI Engineers for years, from the MLOps Dev Environment Setup to the LLM Ops Dev Environment Setup to the best way to go Beyond ChatGPT to deploy your first LLM application.

Now, it’s time we all step up to AI Engineering, to Vibe-Coding, and ultimately, to AI-Assisted Development with best-practice tools.

Today, the best-practice AI Code Editor is Cursor. But we’ll review the entire stack we need to wrap an LLM and build our first end-to-end application, including:

LLM: OpenAI models
Package Management: uv
Code Editor: Cursor
CLI Coding Agent: Claude Code
User Interface: Vibe Coded w/ React
Deployment: Vercel

We’ll not only break down the stack, but discuss the best practices for AI-assisted development that combine classic software engineering workflows with a vibe-coding future led by the reality that while many of us might not be front end developers, Claude certainly is.

Come for the hot takes, fun demos, and discussion; leave with a real artifact to share with your family, friends, and colleagues, even your boss!

What You’ll Learn:
Who should attend the event:

– Any aspiring AI Engineer or leader who wants to build, ship, and share their first LLM application
– Anyone who is curious to learn why everyone is talking about AI-assisted development, and how it differs from pure vibes.
– What to watch out for when you vibe code yourself too far down the rabbit hole 🕳️…

Talk: Your Infrastructure Just Got Smarter: AI Agents in the DevOps Loop

Presenter:
Kishan Rao, Engineering Manager, Delivery and Automation Platform, Okta

About the Speaker:
I’m an Engineering Manager with a background in backend systems, platform engineering, and infrastructure automation, currently focused on how AI agents can reshape developer workflows. With over 8 years of experience building CI/CD pipelines, internal platforms, and scalable infrastructure at cloud-first companies, I’ve seen firsthand how operational complexity can slow down engineering teams.

My recent work explores the intersection of DevOps and AI agents—designing tools that intelligently interpret infrastructure-as-code, reduce toil, and guide developers through their environments with contextual awareness. I’m passionate about building agentic systems that augment developer cognition, shorten feedback loops, and turn codebases into living documentation.

I care deeply about developer velocity, system reliability, and creating engineering environments where teams can move quickly without sacrificing quality. Based in San Francisco, I’m excited to contribute to the conversation on how autonomous AI tooling is changing the way we build, ship, and maintain software.

Talk Track: AI Agents for Developer Productivity

Talk Technical Level: 2/7

Talk Abstract:
Modern infrastructure is rich, dynamic, and deeply complex. Yet most engineering teams still rely on manual processes and tribal knowledge to navigate it. In this talk, I explore how AI agents are transforming DevOps by becoming part of the loop. They’re not just automating tasks but interpreting infrastructure-as-code, understanding system context, and guiding developers through their environments.

We’ll examine how AI-native workflows are emerging across the DevOps lifecycle, from documentation generation and config reasoning to incident triage and deployment planning. I’ll share implementation patterns from my experience in platform engineering and backend systems, including how to design agents that interact with code, tools, and people with minimal friction.

You’ll leave with a practical understanding of how to embed AI agents into your stack, the trade-offs of using local versus cloud LLMs, and how this shift can change the speed, clarity, and confidence with which your teams ship code.

What You’ll Learn:
AI at scale in production does not need to be hard. But you do need to spend time deeply thinking about the outcome you are trying to achieve with AI in the loop.

Talk: Don't Page the Planet: Trust-Weighted Ops Decisions

Presenter:
Eric Reese, Senior Manager, Site Reliability Engineering, Best Buy

About the Presenter:
Eric Reese, Senior Manager of SRE at Best Buy, leads ML initiatives for incident operations. He specializes in trust-weighted decisions, spike detection algorithms, and the operational guardrails that make AI reliable in production. His focus: bridging the gap between ML predictions and safe automated actions.

Talk Track: AI Agents for Model Validation and Deployments

Technical Level: 3

Talk Abstract:
Enterprises don’t need more dashboards—they need deciders that act safely. This talk shows how we built an agentic validation layer that sits between ML predictions and operational responses. The system classifies incidents, applies exponentially-decaying trust scores with configurable half-life, adapts contextual thresholds (time-of-day baselines, scope, team diversity), routes gray-zone cases through smart policies, and posts idempotent state changes to chat systems. We’ll cover the trust accumulation algorithm, how derivative signals catch rising threats, multi-agent validation with cross-checking, and the observability needed to audit every decision. Attendees leave with a platform-agnostic pattern—Predict → Categorize → Weight → Accumulate → Decide → Notify—that turns noisy ML outputs into governed actions with built-in safety rails.
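
A minimal sketch of the accumulate-and-decide steps described above, assuming an exponentially decaying trust score with a configurable half-life. All names and thresholds are illustrative assumptions, not the speaker’s production values.

```python
# Each corroborating signal adds weight; accumulated trust decays
# exponentially with a configurable half-life, so stale evidence fades.
import math
import time

class TrustScore:
    def __init__(self, half_life_s: float = 300.0):
        self.decay = math.log(2) / half_life_s   # per-second decay rate
        self.score, self.t = 0.0, time.monotonic()

    def add(self, weight: float) -> float:
        now = time.monotonic()
        self.score *= math.exp(-self.decay * (now - self.t))  # decay old evidence
        self.score += weight
        self.t = now
        return self.score

def decide(score: float, act_at: float = 5.0, gray_at: float = 2.0) -> str:
    if score >= act_at:
        return "act"        # auto-remediate / page
    if score >= gray_at:
        return "gray-zone"  # route through policy for human review
    return "observe"
```

The half-life is the main tuning knob: short half-lives demand a rapid burst of corroborating signals before acting, which is how decay rates trade alert sensitivity against fatigue.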

What You’ll Learn:
Core Message: ML predictions need a safety wrapper to become operational decisions—this talk provides that complete pattern.

Supporting learnings:
– A reusable last-mile pattern: turning any ML prediction into a safe operational action
– How to tune weighted-decay rates to catch real incidents without alert fatigue (with the actual math and knobs)
– Handling uncertainty: when the model isn’t sure, policy-based “gray-zone” routing takes over
– What observability means for AI ops: audit trails, structured rationales, rollback hooks
– Making it production-ready: idempotence, guaranteed delivery, and live config updates without downtime

Talk: Context is King: Scaling Beyond Prompt Engineering at BlackRock

Presenters:
Vaibhav Page, Principal Engineer, BlackRock | Infant Vasanth, Senior Director of Engineering, BlackRock

About the Presenters:
Vaibhav is a Principal Engineer at BlackRock, where he leads the development of the Data Science and AI platform powering investment research and automation across the firm. Vaibhav is also the author of Argo-Events, a CNCF-graduated project widely used for event-driven automation in cloud-native environments.

Infant Vasanth leads the engineering team responsible for the Studio Compute Platform, BlackRock’s analytics and automation platform that enables our users to conduct research & analysis, run automations and distribute research at scale.
In addition, Infant is also leading the Data & AI Acceleration team, focusing on efforts to enhance Aladdin Studio’s AI capabilities alongside the Operational AI capabilities (prospectus analyzer, operational agents, etc.).

Talk Track: Augmenting Workforces with Agents

Technical Level: 2

Talk Abstract:
As AI use cases grow in complexity, prompt engineering alone is insufficient. In this talk, we will discuss BlackRock’s evolution of engineering relevant contexts for a broad range of AI use cases, from generating investment signals to optimizing operational processes. Building the right context in real time has its own set of challenges, ranging from context window limitations to finding the relevant information and running evaluations on the generated context. We’ll demonstrate how thoughtful context design leads to more robust and adaptable AI agents, and go over the art and science of building relevant contexts for complex financial use cases and the associated challenges.

What You’ll Learn:
This session offers a practical guide and framework for users looking to build or engineer relevant contexts at scale for their AI applications and use cases. By showcasing how the framework accelerates the creation of these AI contexts, the session will provide actionable insights for teams aiming to develop and deploy custom AI solutions. We’ll walk through real-world examples, including some of the challenges we faced while building contexts for different teams at BlackRock. The design principles, architectural patterns, and context engineering strategies shared can be applied across industries to reduce hallucinations and deliver relevant answers. Attendees will also learn what this looks like in a highly regulated environment where adhering to industry-standard security practices is of utmost importance.

Talk: Where Experts Can't Scale: Orchestrating AI Agents to Structure the World's Product Knowledge

Presenters:
Kshetrajna Raghavan, Principal Machine Learning Engineer, Shopify | Ricardo Tejedor Sanz, Senior Taxonomist, Shopify

About the Presenters:
Kshetrajna is a Principal Machine Learning Engineer at Shopify with 15 years of experience delivering AI solutions across technology, healthcare, and retail. He has led initiatives in large-scale product search, computer vision, natural language processing, and predictive modeling—translating cutting-edge research into systems used by millions. Known for his pragmatic approach, he focuses on building scalable, high-impact machine learning products that drive measurable business results.

Ricardo Tejedor Sanz is a Senior Taxonomist at Shopify with a distinctive background spanning legal experience, linguistics, and machine learning. With diverse analytical experience across international contexts and master’s degrees in English Literature and Audiovisual Translation, plus fluency in four languages, Ricardo brings exceptional rigor and customer-focused problem-solving to taxonomy challenges. He evolved from traditional manual taxonomy methods built on deep market research, competitive analysis, and semantic understanding, to pioneering AI-driven classification systems benefiting millions of merchants globally.

Talk Track: Augmenting Workforces with Agents

Technical Level: 2

Talk Abstract:
How do you maintain a product taxonomy spanning millions of items across every industry—from guitar picks to industrial sensors—when no human team could possibly possess expertise in all these domains? At Shopify, we faced this exact challenge and built an AI agentic system that transforms an impossible human task into a scalable, automated workflow.

In this talk, we reveal how we orchestrate multiple specialized AI agents to analyze, improve, and validate taxonomy changes at unprecedented scale.

You’ll discover:
– How parallel AI agents can augment human expertise across domains where deep knowledge is impossible to maintain
– The architecture patterns that enable agents to work together while maintaining quality and consistency
– Why LLM-as-judge systems are game-changers for scaling quality control
– Critical lessons learned from production deployment, including surprising failures and how we fixed them

We share real metrics showing how this approach transformed a years-long manual process into days of AI-augmented work, and provide actionable insights you can apply to your own “impossible” classification and curation challenges.
Whether you’re dealing with content moderation, data classification, or any task requiring expertise across vast domains, you’ll leave with concrete strategies for building AI agent systems that scale human judgment beyond traditional limitations.

What You’ll Learn:
1. Decompose “Impossible” Into Specialized Agents
Don’t build one AI to know everything. Build many agents that each know something, then orchestrate them.

2. LLM-as-Judge Unlocks Scale
Shifting from “humans review 100%” to “AI pre-screens, humans see 10%” is the game-changer. Key: Let AI fix minor issues, not just reject.

3. Production Lessons Are Brutal
– Prompt overload breaks reasoning
– Always build fallbacks for when services fail

4. Trust Through Transparency
Every AI decision needs reasoning, audit trails, and escalation paths. No black boxes.

5. The Meta-Lesson
Scale isn’t about replacing humans—it’s about amplifying the expertise you have across domains you couldn’t possibly cover.

Talk: From Zero to One: Building AI Agents From The Ground Up

Presenter:
Federico Bianchi, Senior ML Scientist, TogetherAI

About the Presenter:
Federico Bianchi is a Senior ML Scientist at TogetherAI, working on self-improving agents. He was a post-doc at Stanford University. His work has been published in major journals such as Nature and Nature Medicine and conferences such as ICLR, ICML and ACL.

Talk Track: Augmenting Workforces with Agents

Technical Level: 4

Talk Abstract:
What does it take to build a truly autonomous AI agent, from scratch and in the open? In this talk, I’ll share how we’ve developed agents capable of executing full analytical workflows, from raw data to insights. I’ll walk through key principles for designing robust, transparent agents that reason, reflect, and act in complex scientific domains. We’ll explore how architectural choices, tool use, and learning approaches—including reinforcement learning—can be combined to build agents that improve over time and generalize to new tasks.

What You’ll Learn:
Building agents is easy but requires some thinking about the context in which the agents are going to be embedded.

Talk: Gradio: The Web Framework for Humans and Machines

Presenter:
Freddy Boulton, Open Source Software Engineer, Hugging Face

About the Presenter:
Freddy Boulton, an Open Source Engineer at Hugging Face, brings six years of experience in developing tools that simplify AI sharing and usage. He’s a core maintainer of Gradio, an open-source Python package for building production-ready AI web applications. His latest work focuses on making Gradio applications MCP-compliant, enabling Python developers to create seamless, beautifully designed web interfaces for their AI models that integrate with any MCP client without additional configuration.

Talk Track: Augmenting Workforces with Agents

Technical Level: 2

Talk Abstract:
The Model Context Protocol (MCP) has ushered in a new paradigm, enabling applications to be accessible to AI agents. But shouldn’t these same applications be just as accessible and intuitive for humans? What if building a user-friendly interface for people could automatically create a powerful interface for machines too? This presentation introduces Gradio as The Web Framework for Humans and Machines. We’ll explore how Gradio allows developers to build performant and delightful web UIs for human users, while simultaneously, thanks to its automatic Model Context Protocol (MCP) integration, generating a fully compliant and feature-rich interface for AI agents.

Discover how Gradio simplifies the complexities of MCP, offering “batteries-included” functionality like robust file handling, real-time progress updates, and authentication, all with minimal additional effort. We’ll also highlight the Hugging Face Hub’s role as the world’s largest open-source MCP “App Store,” showcasing how Gradio-powered Spaces provide a vast ecosystem of readily available AI tools for LLMs. Join us to learn how Gradio uniquely positions you to develop unified AI applications that serve both human users and intelligent agents.

What You’ll Learn:
Developers can build performant, feature-rich UIs for AI models entirely in Python with Gradio. These apps can be easily shared with human users as well as plugged into any MCP-compliant AI agent. Write once, deploy for truly every possible user.

Talk: The Efficiency Equation: Leveraging AI Agents to Augment Human Labelers in Building Trust and Safety Systems at Scale

Presenter:
Madhu Ramanathan, Senior Engineering Leader, Trust and Safety, Meta

About the Presenter:
Madhu Ramanathan is a seasoned engineering and applied science leader with over 13 years of experience building AI-powered systems at Microsoft, Meta and Amazon. She has led globally distributed teams in trust, safety, content intelligence, and search, delivering responsible AI solutions that impact millions of users worldwide. Passionate about trust, safety, and ethical innovation, she brings a practitioner’s lens to productionizing cutting-edge, trustworthy AI solutions to solve real-world problems at scale.

Talk Track: Augmenting Workforces with Agents

Technical Level: 2

Talk Abstract:
In today’s digital ecosystem, Trust & Safety systems face mounting challenges—from content proliferation, real-time enforcement demands, and cost-saving pressures—complicated further by evolving threats like deepfakes, AI-generated hallucination, misinformation, and adversarial behavior. Because this is a defensive space with ever-evolving threats, human labeling has been crucial in this domain for measurement, data collection to train models, real-time enforcement, reactive takedowns, and appeals – but it comes with costs in the millions for scaled applications. This keynote explores how LLMs and AI agents are reshaping this landscape, augmenting human labelers and offering scalable, cost-efficient, and high-quality solutions that optimize across defect rates, precision, and operating cost at unprecedented speed.

In particular, the talk will cover the following –
– A brief introduction to Trust and Safety systems, metrics and emerging threats such as deepfakes, hallucination, misinformation in the evolving GenAI landscape
– The role of human labelers in the traditional Trust and Safety lifecycle across measurement, data collection, proactive enforcements and reactive takedowns, the challenges in having large scale human labeler dependencies and the costs involved
– Case study of how LLMs/Agents are used in each stage such as measurement, enforcement and reactive takedowns with real world examples and the impact on Defect rate, Precision, Cost at each stage.
– Deep dive on continuous evaluation and calibration techniques of the LLM/Agentic judges used in the measurement flow and enforcement flow using humans-in-the-loop and Auto tuners for prompt tuning.
– Challenges faced and solutions, such as a) pitfalls from using the same agent for measurement and enforcement, solved by using agentic + HI flows for measurement, b) handling constant model migrations in product, and c) cost and GPU constraints in deploying LLMs at scale, and the evolution into distilled SLM models using LLM-based teacher models.
– Finally, summarizing learnings on how the recent AI evolution in the last couple of years has brought in new challenges to this space but also provided ability to solve those problems by smartly combining HI and AI.

What You’ll Learn:
Trust & Safety is entering a new era – where the rapid AI evolution has brought in mounting challenges such as deepfakes, hallucination, misinformation along with budget cuts and hyper agility demands. Fortunately, the AI evolution has also enabled powerful solutions to those problems —one where human judgment and AI intelligence must co-evolve. This talk will give the attendees a deep dive on real world hybrid systems built at scale that are not only scalable and cost-effective but also resilient, ethical, and continuously improving to defend against the evolving challenges.

Talk: Fake Data, Real Power: Crafting Synthetic Transactions for Bulletproof AI

Presenter:
Bhavana Sajja, Senior Machine Learning Engineer, Expedia Inc.

About the Speaker:
A Senior Machine Learning Engineer at Expedia Inc., I lead the end-to-end development and operationalization of AI/ML solutions across high-impact use cases such as fraud detection, supplier screening, and dynamic fraud listing. With a strong foundation in building, deploying, and monitoring production-grade models, I ensure that data pipelines, model performance, and governance frameworks align seamlessly with both business objectives and compliance requirements.

Known for a solutions-oriented mindset, I thrive on adopting emerging technologies to address real-world challenges. Currently, I am exploring agentic AI paradigms—such as agent-to-agent (A2A) protocols and model-context protocol (MCP) architectures—to enhance the reliability, adaptability, and explainability of fraud prevention systems. Our work focuses on crafting autonomous pipelines that can detect novel attack vectors in near real-time, prioritize high-risk cases, and continuously refine detection strategies through feedback loops.

Beyond day-to-day engineering, I actively contribute to cross-functional initiatives: mentoring junior engineers, sharing best practices at internal knowledge-shares, and evaluating new MLOps tools to accelerate model iteration cycles. With a passion for continuous learning, I participate in developer forums—bridging the gap between cutting-edge research and enterprise-scale deployments.

Talk Track: Data Engineering in an LLM era

Talk Technical Level: 2/7

Talk Abstract:
In today’s AI-driven world, organizations want to use their rich transaction records for insights and model building but worry about exposing sensitive customer details. This talk offers a clear, practical guide to creating high-quality synthetic transaction data—data that looks and behaves like real records but contains no actual customer information. We’ll focus on why good data quality is essential for any AI model: without realistic patterns and relationships, models trained on synthetic data simply won’t perform well.

We’ll first highlight the main hurdles in transaction tables: mixed data types (numbers, categories, dates), rare events (like fraud), and complex links between features. Then, we’ll introduce four proven generative approaches—GANs (Generative Adversarial Networks), TVAEs (Tabular Variational Autoencoders), TabularARGN (Tabular Autoregressive Generative Networks), and GPT-based methods—that address these challenges in different ways:

GANs learn to “fool” a critic network to produce realistic samples, which helps match complex data patterns.

TVAEs focus on understanding each column’s data type (text, number, category) to recreate accurate row-level details.

TabularARGN builds records step-by-step, preserving sequential and hierarchical relationships in the data.

GPT-based methods leverage transformer models (like those behind large language models) to capture broad patterns and generate new rows based on learned “templates.”

Through a simple case study on a public credit-card transactions dataset, we’ll walk through:

Preparing data (filling in missing values, encoding categories, handling outliers)

Choosing and training a model (why you might pick a GAN versus a TabularARGN or a TVAE)

Evaluating results with easy-to-understand checks—how closely synthetic data matches real data distributions and how well a fraud-detection model trained on synthetic data performs.

We’ll also discuss balancing privacy (keeping customer details safe) with usefulness (keeping important patterns, like rare fraud events). Finally, we’ll point to simple next steps: using synthetic data in healthcare records or IoT sensor logs, monitoring data quality automatically, and ensuring any privacy concerns are met. By the end of this session, even attendees new to generative AI will understand how to pick a method, build a high-quality synthetic dataset, and trust that their AI models can learn and perform effectively—boosting innovation without risking real customer data.
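
As a hedged sketch of one path through the case study: fit a GAN-based synthesizer on real transactions with the open-source SDV library, then sanity-check column distributions with a Kolmogorov–Smirnov test. The file name and column are illustrative, and SDV’s API may differ slightly across versions.

```python
# Illustrative synthetic-data loop: train CTGAN (the GAN option from the
# talk) via SDV, sample synthetic rows, then compare distributions.
import pandas as pd
from scipy.stats import ks_2samp
from sdv.metadata import SingleTableMetadata
from sdv.single_table import CTGANSynthesizer

real = pd.read_csv("transactions.csv")   # hypothetical prepared dataset

metadata = SingleTableMetadata()
metadata.detect_from_dataframe(real)     # infer column types

synth = CTGANSynthesizer(metadata)
synth.fit(real)
fake = synth.sample(num_rows=len(real))

# Low KS statistic => the synthetic column tracks the real distribution.
stat, p = ks_2samp(real["amount"], fake["amount"])
print(f"amount: KS={stat:.3f}")
```

Distribution checks like this are only the first gate; as the abstract notes, the stronger test is whether a fraud model trained on the synthetic data performs comparably on real holdout data.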

What You’ll Learn:
Why synthetic transactional data matters and the advanced techniques used to generate it
Real-World Trade-Offs: Utility vs. Privacy

Talk: RAG architecture at CapitalOne

Presenter:
Vaibhav Misra, Director – Distinguished Engineer, CapitalOne

About the Speaker:
An experienced hands-on technologist and engineering leader (~20 years) with a proven track record of delivering successful large-scale, cloud-based, scalable, robust, secure, and fault-tolerant enterprise-level distributed systems that meet evolving business requirements and are capable of processing hundreds of TB of data daily across thousands of customers.

Experience building high performance engineering teams with 8+ years of technical leadership experience providing architecture design, influencing product roadmap, setting technical direction.

Driven by passion for excellence, continuously look to up skill myself on the latest and greatest in technology world and also provide technical guidance, coaching and mentorship to grow other technical leaders with proven ability to lead by influence.

Experience collaborating with cross functional stakeholders, product and engineering leadership to define prioritization of architectural & product roadmap items, build technology strategies across multiple teams, ensuring alignment with business objectives.

Experience handling data at scale in cloud, employing various storage technologies providing secure and reliable cloud solutions which uses encryption for data in transit as well data at rest.

Experience building data-intensive applications on top of AI/ML/LLMs.

Talk Track: Data Engineering in an LLM era

Talk Technical Level: 2/7

Talk Abstract:
Shortcomings of LLMs With RAG
RAG Use Cases
Building RAG Data Pipeline with Vector Search (see the sketch after this outline)
Using RAG with Prompt Engineering and Fine Tuning
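
Since the outline stays high-level, here is a generic sketch of the “RAG data pipeline with vector search” step. This is not Capital One’s implementation; the model name and chunks are illustrative: embed document chunks, retrieve nearest neighbors, and ground the prompt.

```python
# Generic RAG retrieval sketch: embed chunks once, then answer queries
# from the nearest chunks. Uses sentence-transformers + numpy cosine sim.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
chunks = ["Refunds are processed within 5 business days.",
          "Disputes must be filed within 60 days of the statement date."]
index = model.encode(chunks, normalize_embeddings=True)   # (n_chunks, dim)

def retrieve(query: str, k: int = 2) -> list[str]:
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = index @ q                    # cosine similarity (vectors normalized)
    return [chunks[i] for i in np.argsort(-scores)[:k]]

context = "\n".join(retrieve("How long do refunds take?"))
prompt = f"Answer using only this context:\n{context}\nQuestion: How long do refunds take?"
```

In production the in-memory index would be replaced by a vector database, but the pipeline shape (chunk, embed, retrieve, ground) stays the same.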

What You’ll Learn:
Shortcomings of GenAI and how to overcome them

Talk: The Rise of Self-Aware Data Lakehouses

Presenter:
Srishti Bhargava, Software Engineer, Amazon Web Services

About the Speaker:
I’m Srishti! I’m a software engineer at AWS where I work on data platforms, focusing on systems like Apache Iceberg and SageMaker Lakehouse. I help teams build analytics and machine learning solutions that actually work at scale – turning messy data into something useful.
I really care about making data engineering more approachable. A lot of modern data tools feel unnecessarily complex, so I write about the practical stuff, how to keep tables performing well, handle schema changes gracefully, and build systems that don’t break in production.
Outside of work, I love hiking and catching sunrises when I can. I also spend a lot of time cooking – it’s how I relax and unwind. There’s something satisfying about taking simple ingredients and making something good with them. Some of my best ideas actually come to me while I’m in the kitchen, just taking things slow and enjoying the process.

Talk Track: Data Engineering in an LLM era

Talk Technical Level: 2/7

Talk Abstract:
If you’re managing more than 50 tables and a handful of data models, you’ve probably felt the pain. Schema changes break production. Impact analysis takes hours. New engineers spend weeks figuring out what data exists and how it connects.
In this session, we’ll show you how to build an AI assistant that understands your data platform. Not just another chatbot, but a system that can analyze your schemas, parse dependencies, and predict exactly which models will break when you change a column.
We’ll demonstrate a working implementation that extracts metadata from Apache Iceberg tables, analyzes SQL dependencies, and creates an AI assistant that answers questions like “Which tables are burning through our storage budget?”, “What’s the blast radius if this critical system goes down?”, “Where is all our customer PII hiding across 500 tables?”, “Which data pipelines haven’t been touched in months and might be zombie processes?”, and “Which tables in the data lakehouse can benefit from Iceberg compaction?” – analysis that would otherwise take days of detective work and complex queries. The result is a powerful, natural-language interface for data discovery.
Attendees will see live examples of querying table schemas and identifying datasets using simple English prompts, leaving with a practical blueprint for leveraging LLMs to unlock the full potential of their data infrastructure in production settings.
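
A hedged sketch of the dependency-analysis piece using the open-source sqlglot parser (the model definitions here are illustrative, not from the talk): parse each downstream model’s SQL, record its upstream tables, and answer blast-radius questions from the resulting map.

```python
# Illustrative blast-radius analysis: which models break if a table changes?
import sqlglot
from sqlglot import exp

models = {  # hypothetical model name -> defining SQL
    "daily_revenue": "SELECT order_id, SUM(amount) FROM orders GROUP BY order_id",
    "churn_features": ("SELECT c.id, o.amount FROM customers c "
                       "JOIN orders o ON c.id = o.customer_id"),
}

def upstream_tables(sql: str) -> set[str]:
    # Every table reference in the parsed SQL expression tree.
    return {t.name for t in sqlglot.parse_one(sql).find_all(exp.Table)}

def blast_radius(table: str) -> list[str]:
    return [name for name, sql in models.items() if table in upstream_tables(sql)]

print(blast_radius("orders"))  # -> ['daily_revenue', 'churn_features']
```

Feeding maps like this, plus table schemas and usage stats, into an LLM’s context is what turns the assistant from a chatbot into something that can reason about your actual platform.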

What You’ll Learn:
1. The metadata problem is getting worse, not better: as organizations store large amounts of data across complex systems, it’s getting harder to derive insights from your data in a non-trivial way.
2. LLMs can actually understand your data architecture.
3. Small and simple changes in how you structure your tables can be extremely beneficial for your organization.
4. This approach scales where manual approaches don’t. At 10 tables, spreadsheets or manual queries work fine, but at the scale organizations operate today, only an LLM-powered approach can keep up with the complexity.
5. This approach can be integrated into existing systems today. We’ll show you how to extract metadata from real Apache Iceberg tables, analyse dependencies, create embeddings and build systems that work with your current data stack
6. Metadata contains way more business value than we realize. The schemas, dependencies and usage patterns tell stories about performance bottlenecks, governance gaps, and business impact that most of us are completely missing.

Talk: I Tried Everything: A Pragmatist's Guide to Building Knowledge Graphs from Unstructured Data

Presenter:
Alessandro Pireno, Founder, Stealth Company

About the Speaker:
Alessandro Pireno is an AI and Data Product leader with a 15-year track record of scaling innovative data infrastructure companies. His career is distinguished by a unique 360-degree perspective gained from leading Engineering, Product, and Sales Engineering teams at hyper-growth startups like Snowflake and SurrealDB. He played a pivotal role in building the technical GTM engine that established Snowflake’s early enterprise dominance and more recently architected the product and GTM strategy for SurrealDB’s AI and vector capabilities. His open-source work includes proofs-of-concept for Retrieval-Augmented Generation with Knowledge Graphs (surrealdb-rag) and techniques for graph extraction (graph-examples). Currently, he is building a new stealth project to automate knowledge graph generation using an agentic framework that leverages diverse techniques from NLP to in-database search.

Talk Track: Data Engineering in an LLM era

Talk Technical Level: 3/7

Talk Abstract:
Traditional ETL pipelines are breaking under the demands of LLMs. They excel at structured data, but fail when confronted with the unstructured documents and implicit relationships that give AI its context. To solve this, we must evolve from ETL to “KG-ETL”—pipelines that build knowledge graphs as a first-class output. This session is a pragmatic guide to three competing pipeline architectures for building KGs from raw data. We’ll explore using LLM prompts as a new ‘T’ in your pipeline, contrast it with traditional NLP pipelines, and deep-dive into a novel hybrid retrieval workflow that uses vector stores for something beyond semantic search: high-precision entity resolution. You’ll leave with a framework for choosing the right pipeline for your data, moving beyond simple RAG to build truly context-rich AI systems.

What You’ll Learn:
Design and contrast three distinct data pipeline architectures for knowledge graph construction: LLM-prompt-based, traditional NLP-based, and a hybrid vector search-based model.

Evaluate the cost, latency, scalability, and observability trade-offs of each pipeline pattern, helping you select the right approach for your MLOps environment.

Learn a novel, operational technique for using vector stores beyond semantic search—by training a custom fasttext model on cleansed names to create embeddings for high-precision, scalable entity resolution (sketched after this list).

Receive a decision framework for selecting the right KG-ETL pipeline based on your source data’s structure (unstructured, semi-structured, or structured) and your project’s specific requirements.
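
A minimal sketch of that entity-resolution idea, under stated assumptions: train a character-n-gram fasttext model on cleansed names, then match entities by embedding similarity rather than exact strings. The file name and threshold are illustrative, and fasttext’s API is assumed from its common usage.

```python
# Illustrative fasttext-based entity resolution: character n-grams make
# the embeddings robust to abbreviations and typos in entity names.
import fasttext
import numpy as np

# names.txt: one cleansed entity name per line (lowercased, punctuation stripped)
model = fasttext.train_unsupervised("names.txt", model="skipgram", minn=2, maxn=4)

def vec(name: str) -> np.ndarray:
    v = model.get_sentence_vector(name)
    return v / (np.linalg.norm(v) + 1e-9)   # unit-normalize for cosine sim

def same_entity(a: str, b: str, threshold: float = 0.85) -> bool:
    return float(vec(a) @ vec(b)) >= threshold

print(same_entity("acme corporation", "acme corp"))
```

In a KG-ETL pipeline this sits between extraction and graph insertion: candidate entity pairs above the threshold are merged into a single node instead of duplicated.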

Talk: How Math-Driven Thinking Builds Smarter Agentic Systems

Presenter:
Claire Longo, Lead AI Researcher, Comet

About the Presenter:
Claire Longo is an AI leader and Mathematician with over a decade of experience in Data Science and AI. She has led cross-functional AI teams at Twilio, Opendoor, and Arize AI and is currently a Lead AI Researcher at Comet. She holds a Bachelor’s in Applied Mathematics and a Master’s in Statistics from The University of New Mexico. Beyond her technical work, Claire is a Speaker, Advisor, YouTuber, and Poker Player. She is dedicated to mentoring Engineers and Data Scientists while championing diversity and inclusion in AI. Her mission is to empower the next generation of AI practitioners.

Talk Track: Evolution of Agents

Technical Level: 3

Talk Abstract:
Everyone’s buzzing about LLMs, but too few are talking about the math that should guide how we apply them to real-world problems. Mathematics is the language of AI, and a foundational understanding of the math behind AI model architectures should drive decisions when we’re building AI systems.

In this talk, I will do a technical deep dive to demystify how different mathematical architectures in AI models can guide us on how and when to use each model type, and how this knowledge can help us design agent architectures and anticipate potential weaknesses in production so we can safeguard against them. I’ll break down what LLMs can do (and where they fall apart), clarify the elusive concept of “reasoning,” and introduce a benchmarking mindset rooted in math and modularity.

To put it all into context, I’ll share a real-world example of an Agentic use case from my own recent project: a poker coaching app that blends an LLM reasoning model as the interface with statistical models analyzing a player’s performance using historical data. This is a strong example of the future of hybrid agents, where LLMs and other mathematical algorithms work together, each solving the part of the problem it’s best suited for. It demonstrates the proper application of reasoning models grounded in their mathematical properties and shows how modular agent design allows each model to focus on the piece of the system it was built to handle.

I’ll also introduce a scientifically rigorous approach to benchmarking and comparing models, based on statistical hypothesis testing, so we can quantify and measure the impact of different models on our use cases as we evaluate and evolve agentic design patterns.
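
To make that concrete, here is a small sketch of the benchmarking mindset (the scores are illustrative per-example quality scores, not real results): compare two models on the same eval set with a paired test instead of eyeballing averages.

```python
# Paired comparison of two models scored on the same prompts:
# a paired t-test controls for per-example difficulty.
import numpy as np
from scipy import stats

scores_a = np.array([0.82, 0.75, 0.91, 0.68, 0.88, 0.79, 0.85, 0.72])
scores_b = np.array([0.80, 0.71, 0.89, 0.70, 0.84, 0.74, 0.83, 0.69])

t, p = stats.ttest_rel(scores_a, scores_b)   # paired: same prompts, two models
print(f"mean diff={np.mean(scores_a - scores_b):.3f}, p={p:.3f}")
# Promote model A over model B only if the difference is both practically
# meaningful and statistically significant.
```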

Whether you’re building RAG agents, real-time LLM apps, or reasoning pipelines, you’ll leave with a new lens for designing agents. You’ll no longer have to rely on trial and error or feel like you’re flying blind with a black-box algorithm. Foundational mathematical understanding will give you the intuition to anticipate how a model is likely to behave, reduce time to production, and increase system transparency.

What You’ll Learn:
It’s easier than you think to understand foundational mathematical concepts in AI and use that knowledge to guide you in building better AI systems.

Talk: MCML: A Universal Schema for AI Traceability and Lifecycle Governance

Presenters:
Lanre Ogunkunle, Senior AI Engineer, PleyVerse AI | Alex Olaniyan, Project Manager, PleyVerse AI

About the Presenters:
Lanre Ogunkunle is the creator of MCML (Model Connect Markup Language), a schema-based framework designed to bring lifecycle traceability, auditability, and regulatory compliance to AI systems. With deep experience in AI architecture, MLOps, and responsible AI deployment, Lanre has led governance-focused AI implementations in healthcare, finance, and autonomous systems. They are currently building infrastructure to align AI development with standards like the EU AI Act, NIST RMF, and FDA SaMD. Their work focuses on operationalizing ethics, transparency, and safety across the entire AI lifecycle.

Alex is a Project Manager at PLEYVERSE, where he supports high-impact initiatives at the intersection of healthcare innovation, MLOps, and GenAI. With a background in Agile coaching and enterprise transformation, Alex ensures seamless cross-functional collaboration between engineering, research, and compliance teams. At the 2025 GenAI Summit, he plays a key support role in coordinating speaker engagement and representing PLEYVERSE’s commitment to safe, scalable, and responsible AI in healthcare.

Talk Track: Governance, Auditability & Model Risk Management

Technical Level: 4

Talk Abstract:
The deployment of artificial intelligence systems in critical domains such as healthcare, finance, and autonomous systems has intensified regulatory scrutiny and demands for transparent, auditable AI practices. Current documentation approaches, while valuable, suffer from fragmentation, limited interoperability, and insufficient lifecycle coverage. This paper introduces the Model Connect Markup Language (MCML), a unified schema-based governance framework that integrates model, dataset, interface, and agent documentation into a comprehensive traceability system. Our contribution demonstrates how MCML enables end-to-end AI traceability from development through inference, supporting regulatory compliance (EU AI Act, NIST RMF, FDA SaMD) and facilitating cross-organizational interoperability. We validate MCML’s effectiveness through alignment analysis with ML Bill of Materials initiatives and present empirical results from real-world implementations across three industry verticals, showing 40% reduction in audit preparation time and improved incident response capabilities.

What You’ll Learn:
– How to implement lifecycle traceability using MCML
– How to map AI artifacts to compliance frameworks (NIST, EU AI Act, FDA)
– How to integrate MCML into CI/CD pipelines and existing MLOps stacks
– Best practices from MCML adoption in healthcare, finance, and autonomous systems

Talk: Securing Models

Presenter:
Hudson Buzby, Solutions Architect, JFrog

About the Speaker:
Hudson Buzby is a solution engineer with a strong focus on MLOps and LLMOps, leveraging his expertise to help organizations optimize their machine learning operations and large language model deployments. His role involves providing technical solutions and guidance to enhance the efficiency and effectiveness of AI-driven projects.

Talk Track: Latest MLOps Trends

Talk Technical Level: 3/7

Talk Abstract:
Generative AI and machine learning models are reshaping industries but also introducing new security risks. Model marketplaces like HuggingFace or Ollama have become inundated with models that lack trusted sources or authors and often contain vulnerabilities. Many organizations are struggling to formulate a strategy that safely allows their teams to build and deploy open-source LLMs. This session explores the unique security challenges of ML systems in the GenAI era and provides actionable strategies to safeguard them. Learn why traditional approaches fall short and how to fortify your ML lifecycle to stay ahead in an evolving threat landscape.
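
As a hedged sketch of one guardrail in this spirit (the allowlist is an example policy, not the speaker’s): before pulling an open model, check that it comes from an allowlisted org and ships safetensors weights rather than pickle-based .bin files, a common vector for malicious payloads.

```python
# Illustrative model-vetting check using the huggingface_hub client.
from huggingface_hub import model_info

ALLOWED_ORGS = {"meta-llama", "mistralai", "Qwen"}   # example policy

def vet_model(repo_id: str) -> bool:
    org = repo_id.split("/")[0]
    if org not in ALLOWED_ORGS:
        return False                                  # untrusted publisher
    files = [s.rfilename for s in model_info(repo_id).siblings]
    has_safetensors = any(f.endswith(".safetensors") for f in files)
    has_pickle = any(f.endswith(".bin") for f in files)
    return has_safetensors and not has_pickle         # prefer non-pickle weights
```

A real deployment would layer this behind an artifact proxy so the policy is enforced centrally rather than per developer.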

What You’ll Learn:
It is essential for organizations to place guardrails around open source LLM development in a safe, scalable manner.

Talk: Building Multi-Cloud GenAI Platforms without The Pains

Presenter:
Romil Bhardwaj, Co-creator, SkyPilot

About the Presenter:
Romil Bhardwaj is the co-creator of SkyPilot, a widely adopted open-source project that enables running AI workloads seamlessly across multiple cloud platforms. He completed his Ph.D. in Computer Science at UC Berkeley’s RISE Lab, advised by Ion Stoica, focusing on large-scale systems and resource management for machine learning. Romil’s work, recognized with multiple patents, 1,100+ citations in top conferences, and awards such as the USENIX ATC 2024 Distinguished Artifact Award and ACM BuildSys 2017 Best Paper, builds on a strong foundation in both academia and industry. He was previously a contributor to the Ray project, and a Research Fellow at Microsoft Research, where he developed systems for machine learning and wireless networks, including award-winning projects and granted patents. He remains an active reviewer and speaker at leading systems and AI venues.

Talk Track: LLMs on Kubernetes

Technical Level: 2

Talk Abstract:
GenAI workloads are redefining how AI platforms are built. Teams can no longer rely on a single cloud to satisfy their GPU needs, infra costs are growing and productivity of ML engineers is paramount. Going multi-cloud secures GPU capacity, reduces costs and eliminates vendor lock-in, but introduces operational complexity that can slow down ML teams.

This talk is a hands-on guide to building a multi-cloud AI platform that unifies cloud VMs and Kubernetes clusters across Hyperscalers (AWS, GCP, and Azure), Neoclouds (Coreweave, Nebius, Lambda), and on-premise clusters into a single compute abstraction. We’ll walk through practical implementation details including workload scheduling strategies based on resource availability and cost, automated cloud selection for cost optimization, and handling cross-cloud data movement and dependency management. This approach lets ML engineers use the same interface for both interactive development sessions and large-scale distributed training jobs, enabling them to focus on building great AI products rather than wrestling with cloud complexity.
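
As a flavor of the single-interface idea, here is a minimal sketch using SkyPilot’s Python API (sky.Task, sky.Resources, sky.launch); exact signatures may vary across versions, so treat this as illustrative rather than definitive.

```python
# Illustrative SkyPilot launch: declare what you need, let the scheduler
# pick the cheapest cloud/region with capacity among your configured ones.
import sky

task = sky.Task(
    setup="pip install -r requirements.txt",
    run="python train.py --epochs 10",
)
# "Any provider with an available A100, preferring cheap spot capacity."
task.set_resources(sky.Resources(accelerators="A100:1", use_spot=True))

# The same call works whether the backing compute is a hyperscaler VM,
# a neocloud, or an on-prem Kubernetes cluster.
sky.launch(task, cluster_name="train-a100")
```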

What You’ll Learn:
Multi-cloud solves GenAI’s capacity and cost challenges; the right abstraction layer makes it easy for infra teams and researchers alike.

Talk: LLM Inference: A Comparative Guide to Modern Open-Source Runtimes

Presenter:
Aleksandr Shirokov, Team Lead MLOps Engineer, Wildberries

About the Presenter:
My name is Aleksandr Shirokov, and I am a T3 Fullstack AI Software Engineer with 5+ years of experience and team-lead management competence. Currently, I lead the MLOps team in the RecSys department at Wildberries, a world-famous marketplace, launching AI products and building ML infrastructure and tools for 300+ ML engineers. My team and I support the full ML lifecycle, from research to production, and work closely with real user-facing products, directly impacting business metrics. See https://aptmess.io for more info.

Talk Track: LLMs on Kubernetes

Technical Level: 3

Talk Abstract:
In this session, we’ll share how our team built and battle-tested a production-grade LLM serving platform using vLLM, Triton TensorRT-LLM, Text Generation Inference (TGI), and SGLang. We’ll walk through our custom benchmark setup, the trade-offs across frameworks, and when each one makes sense depending on model size, latency, and workload type. We’ll cover how we implemented HPA for vLLM, reduced cold start times with Tensorize, co-located multiple vLLM models in a single pod to save GPU memory, and added lightweight SAQ-based queue wrappers for fair and efficient request handling. To manage usage and visibility, we wrapped all endpoints with Kong, enabling per-user rate limits, token quotas, and usage observability. Finally, we’ll share which LLM and VLM models are running in production today (we are serving DeepSeek R1‑0528 in production), and how we maintain flexibility while keeping costs and complexity in check. If you’re exploring LLM deployment, struggling with infra choices, or planning to scale up usage, this talk will help you avoid common pitfalls, choose the right stack, and design a setup that truly fits your use case.
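
As a conceptual stand-in for the “lightweight queue wrappers for fair and efficient request handling” mentioned above (theirs are SAQ-based; this plain-asyncio version only illustrates the idea): cap concurrency per user so one heavy tenant cannot starve the shared vLLM backend.

```python
# Illustrative per-user fairness wrapper in plain asyncio: each user gets
# a bounded number of in-flight requests against the shared backend.
import asyncio
from collections import defaultdict

MAX_CONCURRENT_PER_USER = 2
_semaphores: dict[str, asyncio.Semaphore] = defaultdict(
    lambda: asyncio.Semaphore(MAX_CONCURRENT_PER_USER)
)

async def fair_generate(user_id: str, prompt: str, backend) -> str:
    # backend: any async callable that hits an OpenAI-compatible vLLM endpoint.
    async with _semaphores[user_id]:
        return await backend(prompt)
```

In the setup described in the talk, per-user rate limits and token quotas are additionally enforced at the gateway layer (Kong), with the queue wrapper handling fairness inside the serving tier.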

What You’ll Learn:
There’s no one-size-fits-all LLM serving stack – we’ve benchmarked, deployed, and optimized multiple runtimes in production, and we’ll share what works, when, and why, so you can build the right setup for your use case.

Prerequisite Knowledge:
Basic knowledge of NLP transformers, Python, and Docker
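
To ground the comparison, here is a hedged sketch of the kind of offline throughput check one might run with vLLM; the model id, prompts, and sampling settings are illustrative assumptions, not the team’s exact benchmark:

    # Quick vLLM throughput sanity check (assumes `pip install vllm` and a GPU).
    # Model id and sampling settings are illustrative assumptions.
    import time
    from vllm import LLM, SamplingParams

    llm = LLM(model="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B")  # illustrative model
    params = SamplingParams(temperature=0.7, max_tokens=256)

    prompts = ["Summarize this product review: great shoes, fast delivery."] * 32
    start = time.time()
    outputs = llm.generate(prompts, params)
    elapsed = time.time() - start

    tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
    print(f"{len(outputs)} requests, {tokens / elapsed:.0f} tok/s")

A comparable loop against TGI, SGLang, or TensorRT-LLM endpoints yields the apples-to-apples numbers that the talk’s custom benchmark setup formalizes.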

Talk: Observability Panel

Presenter:
Vaibhavi Gangwar, CEO, Co-Founder, Maxim AI

About the Presenter:
Vaibhavi Gangwar (VG) is the co-founder and CEO of Maxim AI (getmaxim.ai), an enterprise-grade evaluation and observability platform for generative AI applications. Drawing from her experience building AI developer tools and NLP products at Google, VG founded Maxim in 2023 alongside her co-founder, Akshay Deo, who previously led product engineering at Postman. Maxim has raised $3M in seed funding from prominent investors and is developing the GenAI developer stack, which includes tools for benchmarking and experimentation, pre-release and post-release evaluation and quality control, as well as data curation, to help modern AI teams ship products with quality, speed, and reliability.

Talk Track: LLM Observability

Technical Level: 2

Talk Abstract:
The talk will cover how AI teams can build reliable, high-quality products by placing evaluation at the core of their product development process. It will explore how top teams experiment & test before launch, monitor AI behavior & quality in the wild, and close the loop between data and decisions – driving faster iteration and greater confidence in what they ship. The session will also feature learnings from start-ups to enterprises in terms of what’s actually working (and what isn’t) for their evaluation workflows.

What You’ll Learn:
1. Setting up structured and scalable eval processes for AI agent development
2. Measuring and improving quality of AI agents
3. Monitoring AI agents in production and optimizing quality
4. Running effective prompt engineering
5. Automating AI agent testing

Talk: Insights and Epic Fails from 5 Years of Building ML Platforms

Presenter:
Eric Riddoch, Director of ML Platform, Pattern AI

About the Presenter:
Eric leads the ML Platform team at Pattern, the largest seller on Amazon.com besides Amazon themselves.

Talk Track: ML Collaboration in Large Organizations

Technical Level: 2

Talk Abstract:
Building an internal ML Platform is a good idea as your number of data scientists, projects, or volume of data increases. But the MLOps toolscape is overwhelming. How do you pick tools and set your strategy? How important is drift detection? Should you serve all your models as endpoints? How “engineering-oriented” should your data scientists be?

Join Eric on a tour of 3 ML Platforms he has worked on, which serve 14 million YouTubers and the largest 3P seller on Amazon. Eric will share specific architectures, honest takes from epic failures, things that turned out not to be important, and principles for building a platform with great adoption.

What You’ll Learn:
– Principles > tools. Ultimately all MLOps tools cover ~9 “jobs to be done”.
– “Drift monitoring” is overstated. Data quality issues account for most model failures.
– Offline inference exists and is great! Resist the temptation to use endpoints.
– Data lineage is underrated. Helps catch “target leakage” and upstream/downstream errors.
– Cloud GPUs from non-hyperscalers are getting cheaper. You may not need on-prem.
– DS can get away with “medium-sized” data tools for a long time.

Talk: Adversarial Threats Across the ML Lifecycle: A Red Team Perspective

Presenter:
Sanket Badhe, Senior Machine Learning Engineer, TikTok

About the Presenter:
Sanket Badhe is a seasoned Machine Learning Engineer with over 8 years of experience specializing in fraud and spam detection, offensive AI, large-scale ML systems, and LLM applications. He currently leads key ML initiatives at TikTok, driving the development of robust spam detection systems across the platform. Sanket holds a Master’s in Data Science from Rutgers University and a B.Tech from IIT Roorkee, with prior experience at Oracle, Red Hat, and Fuzzy Logix.

Talk Track: ML Lifecycle Security

Technical Level: 2

Talk Abstract:
As machine learning systems become deeply embedded in critical applications, ranging from finance and healthcare to content moderation and national security, their attack surface expands across the entire ML lifecycle. This talk presents a red team perspective on adversarial threats targeting each phase of the ML pipeline: from data poisoning during collection and labeling, to model theft and evasion in deployment, and manipulation of feedback loops post-launch. We explore real-world case studies and cutting-edge research, demonstrating how adversaries exploit blind spots in ML development and MLOps workflows. Attendees will gain a structured threat model, understand key attack vectors, and learn practical red teaming and hardening strategies to proactively secure ML systems.

What You’ll Learn:
1. ML systems are vulnerable at every stage of the lifecycle (data, training, deployment, feedback).
2. Adversarial threats vary by stage: data poisoning, model evasion, model extraction, prompt injection, and feedback manipulation.
3. Red teaming ML requires specialized tools and methods distinct from traditional security testing.
4. Data and feedback loops are high-risk, often-overlooked entry points for attackers.
5. Security must be proactive and continuous, not an afterthought post-deployment.
6. Monitoring, validation, and isolation mechanisms are essential across the pipeline.
7. Cross-functional collaboration between ML, security, and DevOps teams is critical.

Talk: Smart Fine-Tuning of Video Foundation Models for Fast Deployments

Presenter:
Zachary Carrico, Senior Machine Learning Engineer, Apella

About the Presenter:
Zac is a Senior Machine Learning Engineer at Apella, specializing in machine learning products for improving surgical operations. He has a deep interest in healthcare applications of machine learning, and has worked on cancer and Alzheimer’s disease diagnostics. He has end-to-end experience developing ML systems: from early research to serving thousands of daily customers. Zac is an active member of the Data and ML community, having presented at conferences such as Ray Summit, TWIML AI, Data Day, and MLOps & GenAI World. He has also published eight journal articles. His passion lies in advancing ML and streamlining the deployment and monitoring of models, reducing complexity and time. Outside of work, Zac enjoys spending time with his family in Austin and traveling the world in search of the best surfing spots.

Talk Track: ML Training Lifecycle

Technical Level: 3

Talk Abstract:
As video foundation models become integral to applications in healthcare, security, retail, robotics, and consumer products, MLOps teams face a new class of challenges: how to efficiently fine-tune these large models for domain-specific tasks without overcomplicating infrastructure, overloading compute resources, or degrading real-time performance.

This session presents tips for selecting and intelligently fine-tuning video foundation models at scale. Using a state-of-the-art vision foundation model, we’ll cover techniques for efficient data sampling, temporal-aware augmentation, adapter-based tuning, and scalable optimization strategies. Special focus will be given to handling long and sparse videos, deploying chunk-based inference, and integrating temporal fusion modules with minimal latency overhead. Attendees of this talk will come away with strategies for quickly deploying optimally fine-tuned foundation models.

What You’ll Learn:
Attendees will learn practical strategies for efficiently fine-tuning and deploying video foundation models at scale. They’ll take away techniques for data sampling, temporal-aware augmentation, adapter-based tuning, and scalable optimization—plus methods to handle long/sparse videos and deploy low-latency, chunk-based inference with temporal fusion.
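
As one hedged illustration of the adapter-based tuning mentioned above, the sketch below attaches LoRA adapters to a video backbone with Hugging Face PEFT; the base checkpoint and hyperparameters are assumptions for illustration, not Apella’s actual stack:

    # LoRA adapter tuning sketch (assumes `pip install transformers peft`).
    # Base model and hyperparameters are illustrative assumptions.
    from transformers import AutoModel
    from peft import LoraConfig, get_peft_model

    base = AutoModel.from_pretrained("MCG-NJU/videomae-base")  # example backbone
    config = LoraConfig(
        r=8,
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=["query", "value"],  # tune attention projections only
    )
    model = get_peft_model(base, config)
    model.print_trainable_parameters()  # typically well under 1% of all weights

Because only the small adapter weights train, fine-tuning fits on modest GPUs, and deployments can swap adapters per task without duplicating the backbone.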

Talk: Why is ML on Kubernetes Hard? Defining How ML and Software Diverge

Presenters:
Donny Greenberg, Co-Founder / CEO, Runhouse | Paul Yang, Member of Technical Staff, Runhouse

About the Presenters:
Donny is the co-founder and CEO of 🏃‍♀️Runhouse🏠. He was previously the product lead for PyTorch at Meta, supporting the AI community across research, production, OSS, and enterprise. Notable projects include TorchRec (the open-sourcing of Meta’s large-scale recommendations infra) and TorchArrow & TorchData (PyTorch’s next generation of data APIs).

At Runhouse, Paul is helping to build, test, and deploy Kubetorch at leading AI labs and enterprises for RL, training, and inference use cases. Previously, he worked across a range of ML/DS and infra domain areas, from language model tuning and evaluations for contextually aware code generation to productizing causal ML / pseudo-causal inference.

Talk Track: ML Training Lifecycle

Technical Level: 2

Talk Abstract:
Mature organizations run ML workloads on Kubernetes, but implementations vary widely, and ML engineers rarely enjoy the streamlined development and deployment experiences that platform engineering teams provide for software engineers. Making small changes takes an hour to test, and moving from research to production frequently takes multiple weeks – these unergonomic and inefficient processes are unthinkable for software, but standard in ML. To explain this, we first trace the history of ML platforms and how early attempts like Facebook’s FBLearner as “notebooks plus DAGs” led to incorrect reference implementations. Then we define the critical ways that ML diverges from software, such as the inability to do local testing due to data size and acceleration needs (GPU), heterogeneity in distributed frameworks and their requirements (Ray, Spark, PyTorch, Tensorflow, Dask, etc.), and non-trivial observability and logging. Finally, we propose a solution, Kubetorch, which bridges between an iterable and debuggable Pythonic API for ML Engineers and Kubernetes-first scalable execution.

What You’ll Learn:
ML, especially at sophisticated organizations, is done on Kubernetes. However, there are no definitive reference implementations, and well-used ML-on-Kubernetes projects to date, like Kubeflow, have had mixed reactions from the community. Kubetorch is an introduction of a novel compute platform that is Kubernetes-native and offers a great, iterable, and debuggable interface into powerful compute for developers, without introducing new pitfalls of brittle infrastructure or long deployment times. In short, Kubetorch is a recognition that ML teams are demanding better platform engineering (rather than “ML Ops” / DevOps) and that the right abstraction over Kubernetes is necessary to achieve this.

Talk: From Benchmarks to Reality: Embedding HITL in Your MLOps Stack

Presenter:
Micaela Kaplan, ML Evangelist, HumanSignal

About the Speaker:
Micaela Kaplan is the Machine Learning Evangelist at HumanSignal. With her background in applied Data Science and a master’s in Computational Linguistics, she loves helping others understand AI tools and practices.

Talk Track: ML Training Lifecycle

Technical Level: 2

Talk Abstract:
Generative AI is reshaping how we build and deploy machine learning systems—but one thing hasn’t changed: human-in-the-loop (HITL) remains essential for quality and trust. Automated metrics can only take you so far; the real nuance often comes from human judgment about what’s right, wrong, and acceptable in context.

In this talk, we’ll explore why HITL is more important than ever in the genAI era and how to design it into your pipelines without slowing teams down.

You’ll learn:
Practical ways to integrate HITL for monitoring, evaluation, and feedback at scale.

How to engage subject-matter experts efficiently while reducing operational drag.

Real-world strategies to act on the data that matters most.

Whether you’re a data scientist, ML engineer, or MLOps practitioner, this session will give you a clear, hands-on roadmap to balancing automation with the human oversight that makes generative AI truly work in production.

What You’ll Learn:
How to incorporate human-in-the-loop into your existing pipelines for trustworthy AI

Talk: Multilingual ML in Action: Building & Deploying Continental R&D’s First Predictive ML Model

Presenter:
Claudia Penaloza, Data Scientist, Continental Tires

About the Speaker:
Claudia Penaloza is a Data Scientist at Continental, leveraging her scientific background in Quantitative Ecology and expertise in machine learning to develop and industrialize predictive models for R&D, Manufacturing, Finance & Controlling, and Business Intelligence. At Continental, Claudia focuses on delivering practical, data-driven solutions that enhance efficiency and support informed decision-making across the company.

Talk Track: ML Training Lifecycle

Technical Level: 3

Talk Abstract:
What began as a proof of concept became Continental’s first predictive machine learning model for R&D, designed to forecast tire performance and accelerate development. To scale and industrialize, we used a language-agnostic MLOps platform, necessary for our multilingual project, which enabled team collaboration, high-performance distributed runs, and a smooth path to production. The result is a fully integrated model connected to PostgreSQL databases and Tableau dashboards, now part of the tire developer’s daily workflow—and a framework that is being applied to new use cases across the company.

What You’ll Learn:
Scaling ML success means more than building a model — it’s about creating a collaborative, language-agnostic MLOps foundation that turns proofs of concept into production-ready, repeatable business value.

Talk: Opening Pandora’s Box: Building Effective Multimodal Feedback Loops

Presenter:
Denise Kutnick, Co-Founder & CEO, Variata

About the Presenter:
Denise Kutnick is a technologist with over a decade of experience building multimodal systems and evaluation pipelines used by millions, with roles spanning large companies like Intel and high-growth startups like OctoAI (acquired by Nvidia). She is the Co-Founder and CEO of Variata, a company building AI that sees, thinks, and interacts like a user to run visual regression tests at scale and keep digital experiences reliable. Denise is passionate about tackling problems at the intersection of AI and UX.

Talk Track: Multimodal Systems in Production

Technical Level: 3

Talk Abstract:
AI market maps are overflowing with multimodal SDKs promising to blend vision, language, audio, and more into a seamless package. But when they fail in production, you may find yourself locked in without the visibility or tools to fix it.

In this talk, we’ll open the box and explore how to build and interpret multimodal feedback loops that keep complex AI systems healthy in production.

We’ll cover:
– Closed-box vs Open-box Workflows: How exposing intermediate signals in your agentic pipeline grants finer-grained control, faster debugging, and better calibration towards user needs.
– Defining the Right Evals: Why human-understandable checkpoints are essential for model introspection and human-in-the-loop review.
– Data Pipeline Building Blocks: Leveraging tooling such as declarative pipelines, computed columns, and batch execution to catch issues and surface improvements without slowing deployment.

What You’ll Learn:
Regardless of the model or SDKs you choose to build on top of, building the right scaffolding around it will open the box and give you control, visibility, and interpretability of your multimodal AI workflows.

Talk: Video Intelligence Is Going Agentic

Presenter:
James Le, Head of Developer Experience, TwelveLabs

About the Presenter:
James Le is currently leading Developer Experience at Twelve Labs – a startup building multimodal foundation models for video understanding. Previously, he has worked at the nexus of enterprise ML/AI and data infrastructure. He also hosted a podcast that features raw conversations with founders, investors, and operators in the space.

Talk Track: Multimodal Systems in Production

Technical Level: 4

Talk Abstract:
While 90% of the world’s data exists in video format, most AI systems treat video like static images or text—missing crucial temporal relationships and multimodal context. This talk explores the paradigm shift toward agentic video intelligence, where AI agents don’t just analyze video but actively reason about content, plan complex workflows, and execute sophisticated video operations.

Drawing from real-world implementations including MLSE’s 98% efficiency improvement in highlight creation (reducing 16-hour workflows to 9 minutes), this session demonstrates how video agents combine multimodal foundation models with agent architectures to solve previously intractable problems. We’ll explore the unique challenges of video agents—from handling high-dimensional temporal data to maintaining context across multi-step workflows—and showcase practical applications in media, entertainment, and enterprise video processing.

Attendees will learn how to architect video agent systems using planner-worker-reflector patterns, implement transparent agent reasoning, and design multimodal interfaces that bridge natural language interaction with visual media manipulation.
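
To make the pattern concrete, here is a minimal, framework-agnostic sketch of a planner-worker-reflector loop; the `call_llm` callable and prompt wording are hypothetical placeholders, not TwelveLabs’ API:

    # Minimal planner-worker-reflector loop. `call_llm` is any chat-completion
    # client passed in by the caller; prompts are illustrative.
    from typing import Callable

    def run_video_agent(goal: str, video_id: str,
                        call_llm: Callable[[str], str], max_iters: int = 3) -> str:
        plan = call_llm(f"Break this video task into steps: {goal}")
        result = ""
        for _ in range(max_iters):
            # Worker: execute the plan against video tools (search, clip, caption).
            result = call_llm(f"Execute this plan on video {video_id}:\n{plan}")
            # Reflector: critique the result and decide whether to revise the plan.
            critique = call_llm(f"Goal: {goal}\nResult: {result}\n"
                                "Reply DONE if satisfied, or a revised plan.")
            if critique.strip() == "DONE":
                break
            plan = critique  # loop again with the revised plan
        return result

The reflector step is what maintains context across multi-step workflows, which the abstract identifies as the hard part of video agents.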

What You’ll Learn:
1. Why traditional approaches fail: Understanding the fundamental limitations of applying text/image AI techniques to video, and why agentic approaches are necessary for complex video understanding.

2. Video agent architecture patterns: How to design and implement planner-worker-reflector architectures that can maintain context across complex multi-step video workflows.

3. Practical implementation strategies: Real-world approaches to building transparent agent reasoning, handling multimodal interfaces, and orchestrating video foundation models.

4. Business impact and ROI: Concrete examples of dramatic efficiency improvements and how to identify high-impact use cases in their own organizations.

Talk: Query Inside the File: AI Engineering for Audio, Video, and Sensor Data

Presenter:
Dmitry Petrov, Co-Founder & CEO, DataChain

About the Presenter:
Dmitry Petrov is the co-founder of DataChain.ai, where he focuses on enabling AI-native workflows over unstructured data and building open-source infrastructure for AI. He holds a PhD in Computer Science, is a former Data Scientist at Microsoft, and is the creator of the popular open-source tool DVC – Data Version Control.

Talk Track: Multimodal Systems in Production

Technical Level: 3

Talk Abstract:
AI systems often need just a slice of a large file – a clip from a video, a segment of audio, or a time-windowed spectrogram – not the entire thing. This talk explores real-world use-cases for querying inside video, audio, and sensor data stored in cloud storage – enabling tasks like segmentation, object detection, speaker filtering, and targeted, context-aware LLM inputs.

We will show how DataChain powers these workflows by transforming raw media into structured, queryable assets – directly from S3. We will show how complex data types like video clips, bounding boxes, and frame-level metadata become easy to represent and manipulate using Pydantic data models. These techniques enable more targeted LLM prompts, reduce input size, and improve both inference cost and output accuracy.

Instead of processing a full recording, we’ll show how to ask: “What’s happening in this 12-second clip where two people enter the car?” and how to build the pipeline that makes that possible.

What You’ll Learn:
The shift is to stop treating audio, video, or sensor files as big, unmanageable blobs. Instead, people work inside these files – slicing them into meaningful, searchable pieces like clips, segments, or events. This way, ML models and LLMs only process what really matters, making them faster, cheaper, and more accurate, and enabling practical large-scale applications.
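
A hedged illustration of the kind of Pydantic schema the abstract describes is below; the field names are assumptions for clarity, not DataChain’s exact model types:

    # Illustrative Pydantic models for "querying inside the file".
    # Field names are assumptions, not DataChain's exact schema.
    from pydantic import BaseModel

    class BBox(BaseModel):
        frame: int
        x: float
        y: float
        width: float
        height: float
        label: str

    class VideoClip(BaseModel):
        source: str        # e.g. an s3:// URI of the full recording
        start_sec: float   # clip boundaries inside the file
        end_sec: float
        boxes: list[BBox] = []

    clip = VideoClip(source="s3://bucket/dashcam.mp4", start_sec=42.0, end_sec=54.0)
    print(clip.model_dump_json())  # compact, targeted context for an LLM prompt

Passing only the 12-second clip’s structured slice, rather than the whole recording, is what cuts input size and inference cost.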

Talk: Humans in the Loop: Designing Trustworthy AI Through Embedded Research

Presenter:
David Baum, UX Researcher & Design Strategist, Amazon

About the Presenter:
David Baum is a design strategist and UX researcher with over a decade of experience shaping AI-powered products at the intersection of human behavior, ethical design, and emerging technology. Currently leading UX research for Amazon Ads’ Generative AI portfolio, David works across disciplines to translate ambiguity into actionable insight, ensuring that cutting-edge models serve real human needs.

His past work spans healthcare, behavioral science, and enterprise innovation guiding product teams at organizations like Johnson & Johnson, Memorial Sloan Kettering, Cigna, and the U.S. Department of Veterans Affairs. David is especially focused on how AI reshapes cognition, decision-making, and user trust, and frequently explores the implications of AI on systems-level design, human-AI collaboration, and collective wellbeing.

He’s a frequent panelist and contributor on topics ranging from ethical AI to strategic foresight, and is known for his ability to bridge deeply technical domains with accessible, human-centered narratives.

Talk Track: Scoping and Delivering Complex AI Projects

Technical Level: 2

Talk Abstract:
As generative AI rapidly moves from lab to product, many teams are rushing to ship capabilities without understanding the lived experiences, risks, and edge cases that define real-world usage. This talk explores how embedding user research earlier–and more meaningfully–into AI development pipelines can do more than just mitigate harm. It can enhance product adoption, build user trust, and surface invisible needs that AI alone won’t catch.

Drawing on experience leading UX research for Amazon Ads’ generative AI portfolio and past work in healthcare, behavioral science, and public systems, I’ll show how user insights can serve as functional guardrails – shaping model boundaries, UI design, and feedback loops. We’ll also interrogate the frictionless design ethos that dominates AI tooling today, and ask: what does it mean to design for thoughtfulness rather than speed?

Whether you’re building AI-native products or adapting legacy systems, this talk will offer frameworks and provocations for making AI more accountable, more human, and more useful.

What You’ll Learn:
UX research is not just a validation tool, it’s a critical input to AI product strategy and model governance.

Friction isn’t failure: thoughtful UX friction can support better outcomes, greater user agency, and higher trust in AI systems.

Embedding research into AI workflows helps detect misalignment early, before launch, reducing risk and surfacing ethical blind spots.

Cross-functional collaboration (PMs, designers, engineers, scientists) must center the human, not just the model.

Designing for trust means understanding how users think, not just how models predict.

Talk: Productizing Generative AI at Google Scale: Lessons on Scoping and Delivering AI-Powered Editors

Presenter:
Kelvin Ma, Staff Software Engineer, Google Photos

About the Presenter:
Kelvin Ma is a Staff Software Engineer and a Technical Lead for the Creative Expressions team at Google Photos. As a founding engineer of the team responsible for all machine learning and editing features, he helps build and scale the tools that allow hundreds of millions of users to relive their most important memories. He is passionate about designing and building foundational infrastructure for on-device machine learning, solving complex technical challenges to create simple and intuitive products that operate at the intersection of technology and human connection.

Talk Track: Scoping and Delivering Complex AI Projects

Technical Level: 2

Talk Abstract:
Go behind the scenes of Google Photos’ Magic Editor, a premier example of productizing cutting-edge generative AI into a billion-user application. This talk will demystify the scoping and delivery of complex AI-powered editors, detailing the engineering feats required to integrate multimodal AI models that blend on-device and server-side processing for global scale. Attendees will gain actionable insights and hard-won lessons on navigating the practical challenges of AI product development, from initial concept to successful deployment and scaling across hundreds of millions of users and diverse device types.

What You’ll Learn:
Shipping AI-powered software products requires a greater understanding of engineering, product, UX, and other concerns across multiple disciplines and roles. There is no longer one correct answer, but rather a series of trade-offs and balances to rein in the capabilities of LLMs and provide value to users.

Talk: Story is All You Need

Presenter:
Lin Liu, Director, Data Science, Wealthsimple

About the Presenter:
As Director of Data Science at Wealthsimple, Lin Liu architects AI/ML solutions that power the future of finance. His experience includes leading AI/ML consulting engagements for AWS clients at Amazon and creating flagship fraud and credit models for Capital One Canada. A patented inventor in credit scoring, Lin specializes in building scalable AI/ML solutions that bridge the gap between data science and tangible business value.

Talk Track: Scoping ML Projects in an AI Era

Technical Level: 2

Talk Abstract:
In the age of Generative AI, what if the most complex feature engineering could be replaced by simple storytelling? This talk introduces a novel paradigm for predictive analytics that challenges traditional modeling workflows. We demonstrate a powerful technique: translating raw, structured data—like transaction logs or application usage data—into coherent, text-based narratives, or “stories.”

We then feed these stories directly into Large Language Models (LLMs) and prompt them for a predictive score. This approach leverages the deep contextual understanding of LLMs to perform tasks that typically require bespoke models and intricate feature engineering.

We will explore real-world case studies, demonstrating how “stories” crafted from credit card transactions can accurately predict major life events. Similarly, we’ll show how narratives of a user’s app behavior can enable an LLM to detect subtle anomalies indicative of fraud, outperforming brittle, rule-based systems.

Join us to discover how transforming your data into stories can unlock a new frontier of predictive power and operational efficiency.

What You’ll Learn:
Attendees will leave with a practical framework for applying this “data-as-story” technique, understanding how it can radically simplify the MLOps pipeline and unlock the power of LLMs on classic predictive analytics problems.
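
As a hedged sketch of the “data-as-story” technique, the code below turns transaction rows into a narrative and prompts an LLM for a score; the `complete` helper is a hypothetical stand-in for any chat-completion client, and the prompt wording is illustrative:

    # Illustrative "data-as-story" scoring. `complete` is a hypothetical
    # stand-in for a chat-completion client; wire in your own.
    def complete(prompt: str) -> str:
        raise NotImplementedError

    transactions = [
        {"date": "2025-03-01", "merchant": "Home Depot", "amount": 412.50},
        {"date": "2025-03-03", "merchant": "IKEA", "amount": 989.00},
        {"date": "2025-03-07", "merchant": "Movers R Us", "amount": 1500.00},
    ]

    story = "The customer " + "; then ".join(
        f"spent ${t['amount']:.2f} at {t['merchant']} on {t['date']}"
        for t in transactions
    ) + "."

    prompt = (f"{story}\n\nOn a scale of 0-100, how likely is this customer "
              "moving to a new home? Answer with a single number.")
    score = int(complete(prompt))  # no bespoke feature pipeline required

The narrative replaces the engineered feature vector, which is exactly the simplification of the MLOps pipeline the takeaway describes.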

Talk: Shipping AI That Works

Presenter:
Nicholas Luzio, AI Solutions Lead, Arize AI

About the Speaker:
Nick Luzio is an AI Solutions Lead at Arize AI.

Talk Track: LLM Observability

Talk Abstract:
Observability and evaluation are critical to knowing whether an agent is working—and why. In this talk, we’ll examine the challenges of building agents that operate reliably in practice. We’ll explore approaches for evaluating and refining agents during development, as well as monitoring and debugging them once deployed—sharing practical lessons and tools that help teams accelerate their work while maintaining trust in their systems.

What You’ll Learn:
How to ensure agents work reliably at scale with observability and evaluation, and how Arize can help.

Talk: Beyond the Vibe: Eval Driven Development

Presenter:
Robert Shelton, Applied AI Engineer, Redis

About the Speaker:
Robert is a builder with a background in data science and full stack engineering. As an Applied AI Engineer at Redis, he focuses on bridging the gap between AI research and real-world applications. In open source, he helps maintain the Redis Vector Library and contributes to integrations with LangChain, LlamaIndex, and LangGraph. He has delivered workshops and consulting engagements for multiple Fortune 50 companies and has spoken at conferences including PyData and CodeMash.

Talk Track: Scoping and Delivering Complex AI Projects

Talk Abstract:
AI systems are probabilistic, which makes “what’s better?” a deceptively hard question. Teams often chase silver bullets—new models, chunking tricks, retrieval hacks—without knowing what’s really moving the needle. The result: endless guessing, little confidence. Enter eval-driven development: a way to ground experimentation in metrics, define success up front, and turn every guess into a measurable signal. This talk shows how shifting from vibes to evals transforms the way we build with AI.

What You’ll Learn:
How to think about probabilistic system design and evaluation.

Talk: Purpose Built Data Agents - This is the Way

Presenter:
Josh Goldstein, Solutions Architect, Weaviate

About the Speaker:
Josh Goldstein is a seasoned search engineer who specializes in building intelligent retrieval systems that bridge the gap between machine learning and meaning. With over a decade of experience across enterprise search, MLOps, and production-grade infrastructure, Josh has architected large-scale solutions that help people find what matters. When he’s not coding, you can find him playing racket sports, running a marathon, or regretting mechanical bulls.

Talk Track: ML Agents in Production

Talk Abstract:
As enterprises transition from AI experimentation to production deployment, a critical challenge has emerged: the complexity of context engineering, the art and science of filling LLM context windows with precisely the right information. Unlike simple prompt engineering, context engineering involves carefully selecting and organizing the right information, examples, relevant data, tools, and conversation history to give the robots exactly what they need to work efficiently and accurately. Provide too little context and the AI will underperform; include too much information and costs will rise while quality drops. The balancing act requires more than prompt engineering; it necessitates conscientious, directed data management.

This talk advocates for a pragmatic alternative: purpose-built data agents. Drawing from Karpathy’s insight that modern AI applications require the evolution of prompt engineering into sophisticated context engineering, we’ll explore how specialized agents can efficiently manage the two foundational pillars of context: memory and knowledge sources.

Through real-world case studies from teams in MVP and production phases, attendees will discover:

Context engineering in practice: Strategies for dynamically accessing, filtering, and expanding knowledge sources based on user queries

Measurable business impact: How purpose-built agents accelerate time-to-value for data-driven organizations

Rather than manually managing context assembly, discover how purpose-built data agents automatically perform the critical steps of context engineering: intelligent knowledge-base understanding, querying, filtering data, expanding search when appropriate, and dynamically assembling the right context for individual user requests. Integrating with the agentic framework provides a clear path to production-ready AI applications that solve specific challenges efficiently without losing response accuracy.

What You’ll Learn:
Strategies and techniques for context engineering with memory and knowledge management

An understanding of agentic AI

Talk: Why GenAI Still Needs Humans in the Loop

Presenter:
Micaela Kaplan, ML Evangelist, HumanSignal

About the Speaker:
Micaela Kaplan is the Machine Learning Evangelist at HumanSignal. With her background in applied Data Science and a master’s in Computational Linguistics, she loves helping others understand AI tools and practices.

Talk Track: ML Training Lifecycle

Talk Abstract:
Generative AI is powerful—but it doesn’t know right from wrong. In just five minutes, you’ll see why human-in-the-loop isn’t a nice-to-have, it’s the difference between trust and failure.

Benchmarks won’t catch the nuance. Metrics won’t capture the edge cases. Only people can. I’ll show you how to bring HITL into your pipelines so quality isn’t just measured—it’s protected.

What You’ll Learn:
Metrics are important in GenAI, but humans are the key to quality

Talk: SLMs + Fine-Tuning: Building the Infrastructure for Multi-Agent Systems

Presenter:
Mariam Jabara, Senior Field Engineer, Arcee AI

About the Presenter:
Mariam has been in the AI space for the last 5 years, in both academic and professional capacities. Currently, she works as a Senior Field Engineer for Arcee AI, the pioneers of Small Language Models (SLMs) who are now offering SLM-powered agentic AI solutions. She has previous experience in AI engineering, sales, and research at companies such as Google Research and Deloitte. Her values are rooted in building community, diversity and inclusion, and using AI responsibly to solve problems and contribute to the betterment of society.

Talk Track: Agents in Production

Talk Abstract:
Enterprises are discovering the limits of massive general-purpose LLMs: high costs, heavy infrastructure, and security risks when sensitive data leaves controlled environments. Small Language Models (SLMs) offer a practical alternative.

In this talk, I’ll share lessons from building and fine-tuning SLMs, including our release of a new small foundation model. I’ll show how SLMs enable domain-specific performance, stronger security through local deployment, and why they often outperform larger models in multi-agent workflows with lower latency and higher reliability.

Attendees will leave with a clear view of why SLMs are the best candidates to power multi-agent systems, balancing performance, cost, and trustworthiness for real-world MLOps.

What You’ll Learn:
Small Language Models are the best way to power multi-agent systems because they deliver domain-specific performance, stronger security, and greater efficiency than large general-purpose LLMs.

Talk: DataFrames for the LLM Era: Turning Inference into a First-Class Transform with Fenic

Presenter:
Yoni Michael, Co-Founder, typedef

About the Speaker:
Yoni Michael is the co-founder of typedef, a serverless data platform purpose-built to help teams process AI workloads with LLM-powered workflows at scale. With a deep background in infrastructure, Yoni has spent over a decade building systems at the intersection of data and AI, including leading infrastructure engineering teams at Tecton and Salesforce. He previously had a startup called Coolan in the data center analytics space that was acquired by Salesforce.

Talk Track: Data Engineering in an LLM era

Talk Abstract:
DataFrames for the LLM Era: Turning Inference into a First-Class Transform with Fenic

What You’ll Learn:
– A clear mental model for treating inference like any other transform in a pipeline.
– Patterns for mixing OLAP-style operators with semantic operators for scale and cost control.
– Practical examples of building production-grade, multi-step LLM workflows with Fenic.

Whether you’re processing millions of chat transcripts, labeling content, or enriching raw data with AI, you’ll see how the right abstractions let you go from “it works on my notebook” to “it works in production” without reinventing your data platform.
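
Since the abstract is terse, here is a generic sketch of the mental model, treating LLM inference as one more column transform alongside OLAP-style operators; it uses plain pandas and a hypothetical `classify` helper, and is explicitly not Fenic’s actual API:

    # Generic "inference as a transform" sketch with pandas. `classify` is a
    # hypothetical batched LLM call; this is NOT Fenic's API, just the idea.
    import pandas as pd

    def classify(texts: list[str]) -> list[str]:
        raise NotImplementedError  # e.g. a batched chat-completion call

    df = pd.DataFrame({"transcript": ["hi, my order is late", "love the app"]})

    # OLAP-style operators and the semantic operator compose in one pipeline:
    df = df.assign(length=df["transcript"].str.len()).query("length > 0")
    df["intent"] = classify(df["transcript"].tolist())  # semantic transform
    print(df.groupby("intent").size())                  # back to OLAP-style ops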

Talk: AI-Powered Development Productivity in Finance

Presenter:
Devdas Gupta, Senior Manager Software Development and Engineering Lead, Charles Schwab

About the Speaker:
Tech Lead at Charles Schwab, Austin TX | IEEE Senior Member | BCS Fellow | Tech Speaker & Community Contributor
Technology leader with 20+ years of experience in intelligent system design, cloud technologies, microservices, AI/ML, and agentic AI. Passionate about building resilient, scalable systems and actively contributing as a speaker, judge, and mentor in global tech communities.

Talk Track: AI Agents for Developer Productivity

Talk Abstract:
AI agents are transforming the developer experience. From smart copilots that assist with coding to autonomous assistants that handle testing, debugging, and deployments, these tools are freeing developers to focus on creativity, innovation, and problem-solving.

In this talk, I’ll share how teams are already using AI agents to boost productivity, reduce repetitive work, and accelerate delivery in real-world cloud-native environments. We’ll explore practical strategies, real examples, and lessons learned from integrating these agents into the development workflow.

Looking ahead, AI agents won’t just support developers—they’ll collaborate, learn, and evolve alongside them. This session offers a glimpse into that future, and how we can start building toward it today.

What You’ll Learn:
AI agents boost developer productivity
They assist, not replace, humans
Real-world adoption is achievable
Cloud-native is the ideal foundation
The future is human–AI collaboration
Start small, scale smart

Talk: Explainable AI at Fujitsu North America Inc.

Presenter:
Dippu Kumar Singh, Leader For Emerging Data & Analytics, Fujitsu North America Inc.

About the Speaker:
Dippu is a strategic Data & Analytics leader and thought leader in emerging solutions, including Computer Vision and Generative AI/LLMs. He drives significant Fujitsu North American sales pursuits and offering development, applying deep expertise in cloud platforms (Azure, AWS, GCP), big data frameworks (Spark, MLOps), and end-to-end data architecture. He possesses extensive experience in designing and implementing comprehensive Data & Analytics solutions—from ingestion and ETL to advanced analytics and presentation—across telecom, manufacturing, and public sector domains. His technical prowess is underscored by certifications like Azure Solution Architect, Databricks Engineer Expert, and Palantir Foundry Expert.

Talk Track: ML Lifecycle Security

Talk Abstract:
Value: Provides a quick reference on how organizations can use XAI to support investigations of suspicious behavior to determine legitimacy at scale.

Key Contents:
– How are the realities of risk and fraud challenging the effectiveness of organizations? (e.g., procurement fraud; environmental, social, and governance fraud; financial fraud)
– Why is using explainable artificial intelligence (XAI) essential for risk and fraud management? (e.g., performance and bias, accountability and regulations, transparency and trust)
– What is needed to adopt explainable AI (XAI) successfully? (e.g., engineering to be explainable, anticipating evolving regulations, balancing performance)
– Key recommendations

What You’ll Learn:
How organizations can use XAI to support investigations of suspicious behavior to determine legitimacy at scale.

Talk: Agent-Powered Code Migration at Realtor.com

Presenter:
Naveen Reddy Kasturi, Staff Machine Learning Engineer, Realtor.com

About the Speaker:
Naveen Reddy Kasturi is a seasoned AI and ML leader with over 13 years of experience building intelligent systems at scale. He currently serves as a Staff Machine Learning Engineer at Realtor.com, where he leads the development of Generative AI–powered search experiences and the modernization of the company’s ML platform infrastructure to deliver scalable, production-ready models.

Prior to Realtor.com, Naveen was a founding member of the ML team at Typeform, where he spearheaded the company’s earliest Generative AI initiatives. He built LLM-powered features such as AskAI, a natural language interface for analyzing response data, as well as intelligent assistants for form creation, smart insights, and lead scoring. His work integrated cutting-edge technologies like AWS Bedrock, LangChain, LlamaIndex, and Pinecone into production systems, while also advancing LLM evaluation frameworks, RAG pipelines, and fine-tuning strategies.

Earlier in his career, Naveen applied ML across diverse industries—designing predictive maintenance for locomotives at GE, anomaly detection in IT systems at Société Générale, and developing safety-critical software at Bosch. His journey reflects a unique blend of research-driven innovation and hands-on engineering to bring advanced AI capabilities into real-world applications.

A frequent cross-functional collaborator, Naveen is passionate about bridging the gap between ML research and practical deployment, with deep expertise in MLOps, LLM integration, infrastructure scaling, and AI-powered product innovation. He continues to mentor teams, publish thought leadership, and drive forward-looking applications of AI that enhance business outcomes and user experiences.

Talk Track: AI Agents for Developer Productivity

Talk Abstract:
Migrating large-scale machine learning and data workflows is often tedious, error-prone, and resistant to automation. At Realtor.com, we faced the challenge of moving nearly 100 ML pipelines from a legacy, self-managed Metaflow setup on AWS Batch to a fully managed, RDC-compliant Outerbounds Metaflow environment on AWS EKS—without breaking functionality or slowing down innovation.

In this talk, I will share how we combined static code analysis, AI-powered pattern recognition, and LLM-assisted code rewriting to accelerate migration at scale. Using Claude via AWS Bedrock, we built an AI-assisted workflow that scanned entire repositories for deprecated constructs, generated migration blueprints, and automatically rewrote code with human-in-the-loop validation. This approach reduced weeks of manual effort, migrated ~30,000 lines of code, and enabled a reliable framework for future platform shifts (e.g., upgrading from Python 3.9 to 3.12).

Beyond the technical details, this session highlights how AI can act as a force multiplier in MLOps, transforming code migration from a painful, manual process into a repeatable, inspectable, and scalable pipeline. Attendees will walk away with insights into building AI-augmented developer workflows, balancing automation with governance, and applying generative AI to real-world engineering challenges.
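
A hedged sketch of the core rewrite loop is below; the repository path, prompt wording, and Bedrock model id are illustrative assumptions, and the real pipeline wraps this with static analysis up front and human review after:

    # Sketch of LLM-assisted code rewriting via Claude on AWS Bedrock
    # (assumes `pip install boto3` and Bedrock access). Model id, prompt,
    # and paths are illustrative assumptions.
    import pathlib
    import boto3

    bedrock = boto3.client("bedrock-runtime")

    def rewrite(source: str) -> str:
        prompt = ("Rewrite this Metaflow flow for the new platform. "
                  "Replace deprecated constructs; change nothing else.\n\n" + source)
        resp = bedrock.converse(
            modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
            messages=[{"role": "user", "content": [{"text": prompt}]}],
        )
        return resp["output"]["message"]["content"][0]["text"]

    for path in pathlib.Path("flows").rglob("*.py"):
        migrated = rewrite(path.read_text())
        path.with_suffix(".migrated.py").write_text(migrated)  # humans diff/approve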

What You’ll Learn:
AI is not magic—context is everything:
Effective AI-assisted code migration depends on strong foundations like static analysis, pattern recognition, and injecting the right domain-specific context into prompts.

AI can turn painful migrations into scalable workflows:
By combining LLMs with programmatic analysis, organizations can automate large portions of legacy code migration, saving weeks of engineering effort while maintaining reliability.

Human-in-the-loop is non-negotiable:
AI-generated code must go through validation and governance loops to ensure correctness, compliance, and maintainability. AI accelerates, but humans direct and safeguard.

Pattern recognition unlocks platform-level thinking:
Once you codify patterns from static repo scans, the same approach can be reapplied for future shifts—such as framework upgrades, Python version migrations, or platform compliance needs.

AI-augmented developer workflows are the future of MLOps:
Embedding generative AI into engineering pipelines opens a new frontier where infrastructure upgrades, code rewrites, and large-scale ML operations become faster, repeatable, and less error-prone

Talk: A Modular Framework for Building Agentic Workforces at Marriott International

Presenter:
Nitin Kumar, Director Data Science, Marriott International

About the Speaker:
Nitin Kumar is the Director of Data Science at Marriott International, where he leads the design and deployment of enterprise-scale AI and generative solutions across 30 global brands. With over 20 years of experience in technology—including CRM modernization, process optimization, and applied machine learning—he specializes in translating cutting-edge AI into operational impact at scale.

Nitin holds a Master’s in Data Science from the University of Illinois Urbana-Champaign and actively contributes to the research community through peer-reviewed publications on topics like ethical AI, LLM-based translation benchmarking, and multi-view clustering. He also serves as a judge for global AI and innovation awards.

He was recently selected as a panelist at the Ai4 2025 Conference in Las Vegas, where he will speak on building and deploying autonomous AI agents.

Talk Track: Augmenting workforces with Agents

Talk Abstract:
As organizations increasingly leverage large language models (LLMs) to automate knowledge work—from content creation to summarization and multilingual communication—new challenges arise around trust, governance, and quality control. This session introduces a modular framework for building agentic workforces: semi-autonomous AI systems that work in tandem with human oversight to deliver scalable, high-quality outcomes. The approach blends LLM-driven generation, automated self-evaluation using LLM-as-a-judge techniques, and human-in-the-loop (HitL) checkpoints for verifying accuracy, tone, and contextual appropriateness.

We will share practical implementation patterns such as: (1) triaging outputs by confidence scores to route uncertain cases to humans, (2) defining escalation triggers using prompt-based evaluations, (3) layering automated evaluation rubrics for translation and content quality, and (4) integrating feedback loops to fine-tune both model performance and human review criteria over time. These design principles help teams deploy generative AI responsibly—balancing speed with control—and are applicable across domains like marketing, customer support, documentation, and multilingual communication.

What You’ll Learn:
Attendees will learn how to design scalable, semi-autonomous AI workflows that combine LLM-driven content generation, automated evaluation (LLM-as-a-judge), and human-in-the-loop oversight to ensure quality, trust, and accountability. The talk emphasizes practical patterns for triaging outputs, setting escalation thresholds, and integrating feedback loops, while also extending these principles to multilingual translation and evaluation. The key takeaway is that agentic workforces—blending AI efficiency with human judgment—offer a responsible and scalable path for deploying generative AI in real-world, high-stakes environments.
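
Pattern (1) from the abstract, confidence-based triage, can be as small as the sketch below; the threshold and record fields are illustrative assumptions, not Marriott’s production values:

    # Illustrative confidence-based triage: route uncertain outputs to humans.
    # The 0.8 threshold and record fields are assumptions.
    def triage(outputs: list[dict], threshold: float = 0.8):
        auto_approved, needs_review = [], []
        for out in outputs:
            # `confidence` might come from an LLM-as-a-judge score in [0, 1].
            bucket = auto_approved if out["confidence"] >= threshold else needs_review
            bucket.append(out)
        return auto_approved, needs_review

    auto, review = triage([
        {"text": "Bienvenue à l'hôtel.", "confidence": 0.93},
        {"text": "Wir freuen uns auf Ihren Besuch.", "confidence": 0.61},
    ])
    assert len(review) == 1  # the low-confidence translation goes to a human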

Talk: Streamlining ML collaboration at Dick's Sporting Goods

Presenter:
Ravi Shankar, Manager, Data Science, Dick’s Sporting Goods

About the Speaker:
Ravi Shankar is working as the Manager of Data Science at Dick’s Sporting Goods where he focuses on applying artificial intelligence to solve real-world problems in digital commerce. Over the past decade, he has contributed to multiple projects involving data-driven consumer engagement and machine learning applications in retail. His work often involves analyzing the evolution of recommendation algorithms and their real-world implications for operational scalability, privacy compliance, and user experience. Ravi has collaborated with technology firms, academic institutions, and startups to help shape AI implementation strategies that align with both business objectives and ethical standards.

Talk Track: Scoping and Delivering Complex AI Projects

Talk Abstract:
This talk explores how ML teams in enterprise overcome complexity and scale by creating shared standards, collaborative workflows, and common/standard infrastructure. It covers strategies for alignment, reproducibility, and effective communication, illustrated with a real-world case study from a sports retail company, highlighting measurable improvements in deployment speed, error reduction, and ROI. Attendees will leave with actionable steps to streamline ML collaboration in their own organizations.

What You’ll Learn:
Learn how large ML teams streamline collaboration, share infrastructure, and deliver models faster without losing alignment.

Talk: ROI of Gen AI Frontier Models vs Traditional Models in Finance Industry

Presenter:
Prasanth Nandanuru, Managing Director, Wells Fargo

About the Speaker:
Technology evangelist with more than 25 years of experience ranging across multiple digital transformation efforts, blending technology strategies and execution with business outcomes and customer savviness. My areas of expertise cut across technology strategies in AI and edge/cloud computing, bringing radical innovation to an enterprise by solving existing business gaps and setting the context for scalable and extensible business models.

Talk Track: Data Engineering in an LLM era

Talk Abstract:
Designing data-driven applications for real-time insights using frontier models.

What You’ll Learn:
As enterprises accelerate their digital transformation journeys, the adoption of Generative AI (GenAI) — particularly frontier foundation models — is reshaping value creation across industries. This talk elaborates on the return on investment (ROI) of GenAI frontier models compared to traditional machine learning and rule-based AI approaches, focusing on multi-dimensional impact metrics such as time-to-market, total cost of ownership, model adaptability, talent leverage, and downstream operational efficiency.

Talk: Building Sustainable GenAI Systems at Target

Presenter:
Balaji Varadarajan, Lead AI Engineer – Digital Personalization, Target Enterprise

About the Speaker:
Currently a Lead AI/ML Engineer at Target Corporation, I am at the forefront of integrating GenAI and MLOps into enterprise workflows—driving innovations in retail personalization, scalable model deployment, and real-time analytics.

Talk Track: MLOps for Smaller Teams

Talk Abstract:
In the age of GenAI, it’s tempting to chase rapid prototypes and flashy demos—but real business value lies in building sustainable, scalable systems that work beyond the lab. In this talk, we’ll cut through the hype and dive deep into the MLOps infrastructure required to support production-ready GenAI applications at scale.

What You’ll Learn:
The power of MLOps in getting a product to production in a quicker and more sustainable way.

Talk: The Real Problem Building Agentic Applications (And How MLOps Solves It)

Presenter:
Alexej Penner, Founding Engineer, ZenML

About the Speaker:
As a founding engineer at ZenML, Alexej is at the forefront of solving today’s MLOps challenges. His journey began in the trenches of ML, building everything from object detection models for edge devices to complex forecasting systems. After leading AI product development at the data labeling company Datagym, he saw the critical need for better MLOps tooling and joined ZenML. There, he now drives core product development, guides its direction, and works hands-on with users to bring their ML projects to life.

Talk Abstract:
For years, we’ve honed the MLOps playbook to turn fragile ML models into reliable, production systems. We learned that success depends on principles like modularity, reproducibility, and lineage. Now, with the rise of LLMs, we’re facing a new wave of brilliant but chaotic prototypes. The core question is: do we throw away our playbook, or do we evolve it?

This session argues for evolution. We’ll demonstrate how the hard-won principles of MLOps provide the perfect foundation for the emerging world of LLM Ops. We’ll take a simple LLM prototype and, in a live demo, transform it into a structured ZenML pipeline. Then, we’ll showcase the next step in the MLOps journey: serving the entire pipeline as a live, interactive API endpoint. We will explore this endpoint directly from the ZenML dashboard, showing how to inspect it, run sample invocations, and get the full traceability of a classic MLOps pipeline for every single interactive call. This is the roadmap for what your AI platform could be.

Talk: Agentic Metaflow in Action

Presenter:
Ville Tuulos, Co-Founder, CEO, Outerbounds

About the Speaker:
Ville Tuulos is the co-founder and CEO of Outerbounds, a platform that empowers enterprises to build production-ready, standout AI systems. He has been building infrastructure for machine learning and AI for over two decades. Ville began his career as an AI researcher in academia, authored Effective Data Science Infrastructure, and has held leadership roles at several companies—including Netflix, where he led the team that created Metaflow, a widely adopted open-source framework for end-to-end ML and AI systems.

Talk Abstract:
We will show a live demo of the new agentic features of Metaflow!

Talk: A Simple Recipe for LLM Observability

Presenter:
Vincent Koc, Lead AI Researcher & Developer Relations, Comet

About the Speaker:
Vincent Koc is a globally recognized lecturer, futurist, and keynote speaker known for his work as an artificial intelligence engineer and technical leader. As AI Research Engineer and Developer Relations Advocate at Comet, his work centers on solving cutting-edge AI development challenges in repeatable and scalable ways and sharing newly discovered methodologies with fellow AI practitioners. With over two decades of experience across industries such as finance, telecommunications, travel, and FMCG, he has led data-driven projects for major organizations including Qantas, McDonald’s, Cisco, and the Australian Federal Government. A fellow at the Institute of Managers and Leaders Australia, Vincent is dedicated to advancing the field of data, mentoring future professionals, and informing the public about rapid technological change.

Talk Abstract:
Developing LLM-based applications for production requires a new approach to monitoring. Unlike traditional software, these probabilistic systems can hallucinate, drift, or degrade in unpredictable ways. The best way to learn AI concepts is by tinkering hands-on. So I built an LLM-powered recipe generator to help with my home cooking, and set up an end-to-end monitoring strategy to keep it on budget and behaving as expected.

In this talk, I’ll walk through how I configured traces in this project using Comet’s open-source tool Opik to track cost and quality. I’ll also show how I built custom business metrics with LLM-as-a-Judge to capture issues specific to the recipe generator, and I’ll show the same approach can be adapted to your own use case when out-of-the-box metrics fall short. With a few adaptable code snippets and a simple framework, you’ll leave knowing how to add robust observability to your own LLM apps, making it easier to detect, debug, and improve systems at scale.
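
For readers who want a head start, here is a minimal sketch of what the tracing setup can look like with Opik; the project name and function body are illustrative, and the custom LLM-as-a-Judge metrics described in the talk layer on top of traces like these:

    # Minimal Opik tracing sketch (assumes `pip install opik`). Project name
    # and function body are illustrative, not the talk's exact code.
    import opik
    from opik import track

    opik.configure()  # point the client at Comet or a self-hosted Opik server

    @track(project_name="recipe-generator")
    def generate_recipe(ingredients: list[str]) -> str:
        # Any LLM call made in here is captured on the trace, with latency
        # and token usage available for cost tracking in the Opik UI.
        return f"A simple dish using {', '.join(ingredients)}"

    generate_recipe(["eggs", "spinach"])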

Talk: A Simple Recipe for LLM Observability

Presenter:
Claire Longo, Lead AI Researcher, Comet

About the Presenter:
Claire Longo is an AI leader and Mathematician with over a decade of experience in Data Science and AI. She has led cross-functional AI teams at Twilio, Opendoor, and Arize AI and is currently a Lead AI Researcher at Comet. She holds a Bachelor’s in Applied Mathematics and a Master’s in Statistics from The University of New Mexico. Beyond her technical work, Claire is a Speaker, Advisor, YouTuber, and Poker Player. She is dedicated to mentoring Engineers and Data Scientists while championing diversity and inclusion in AI. Her mission is to empower the next generation of AI practitioners.

Talk Abstract:
Developing LLM-based applications for production requires a new approach to monitoring. Unlike traditional software, these probabilistic systems can hallucinate, drift, or degrade in unpredictable ways. The best way to learn AI concepts is by tinkering hands-on. So I built an LLM-powered recipe generator to help with my home cooking, and set up an end-to-end monitoring strategy to keep it on budget and behaving as expected.

In this talk, I’ll walk through how I configured traces in this project using Comet’s open-source tool Opik to track cost and quality. I’ll also show how I built custom business metrics with LLM-as-a-Judge to capture issues specific to the recipe generator, and I’ll show the same approach can be adapted to your own use case when out-of-the-box metrics fall short. With a few adaptable code snippets and a simple framework, you’ll leave knowing how to add robust observability to your own LLM apps, making it easier to detect, debug, and improve systems at scale.

Talk: What gets AI Agents to Production

Presenter:
Chris Matteson, Head of Sales Engineering, Union.ai

About the Speaker:
Chris Matteson is Head of Sales Engineering at Union AI, where he helps customers tackle their toughest machine-learning infrastructure challenges by bringing together a passion for AI and DevOps with deep startup experience.
A seasoned startup leader and technical problem-solver, Chris has spent more than a decade at companies including Puppet, HashiCorp, Prisma, and Fermyon. He’s worn hats from Founder/CEO to Solutions Engineering, Sales, and Consulting, writing early feature code that evolved into core enterprise offerings and architecting scalable processes for open-source–to-enterprise transitions.

Talk Abstract:
A widely cited MIT study recently found that 95% of AI projects fail to move the needle on the P&L. The promise of AI is clear, but how do we bridge the gap between prototypes and production-ready AI Agents in 2025?
This talk introduces a practical framework for aligning business needs with the right mix of technologies and architectures to make agentic projects succeed. We’ll blitz through trade-offs across quality, speed, and cost, and highlight a process for mapping the true limits of possibility when combining today’s most powerful AI tools.

Talk: Techniques to build high quality agents faster with MLflow

Presenter:
Danny Chiao, Engineering Lead, Databricks

About the Speaker:
Danny Chiao is an engineering lead at Databricks, leading efforts around data observability (data quality, data classification) and agent quality. Previously, Danny led efforts at Tecton (and Feast, an open-source feature store) and at Google to build ML infrastructure and high-scale, ML-powered features. Danny holds a Bachelor’s Degree in Computer Science from MIT.

Talk Abstract:
One of the top challenges in building an agent is ensuring high-quality outputs. Today, this involves labeling and analyzing traces by hand and iterating on the agent code. In this talk, you’ll learn how to use MLflow to accelerate this process and quickly build a high-quality agent, leveraging techniques used by leading companies to deploy agents in production.
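As a hedged illustration of the trace-capture step the abstract refers to, the sketch below logs one agent call as an MLflow trace so its inputs and outputs can be reviewed and scored later. The experiment and function names are hypothetical, and the body is a stub where the real agent call would go.

```python
# Hedged sketch: logging one agent step as an MLflow trace for later review.
import mlflow

mlflow.set_experiment("agent-quality")  # hypothetical experiment name


@mlflow.trace  # captures this call's inputs and outputs as a trace span
def answer(question: str) -> str:
    # Stand-in for the real agent/LLM call being evaluated.
    return f"Draft answer to: {question}"


answer("How do I rotate my API keys?")
```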

Talk: AI Catalog by JFrog - Control access to Open Source LLMs

Presenter:
Hudson Buzby, Solutions Architect, JFrog

About the Speaker:
Hudson Buzby is a solutions engineer with a strong focus on MLOps and LLMOps, leveraging his expertise to help organizations optimize their machine learning operations and large language model deployments. His role involves providing technical solutions and guidance to enhance the efficiency and effectiveness of AI-driven projects.

Talk Abstract:
AI Catalog is a new product from the JFrogML team at JFrog that allows you to create and enforce dynamic rules and policies around the open-source models your developers and data scientists are permitted to access and deploy. AI Catalog provides a platform to discover, govern, and deploy open-source models safely at scale while staying compliant with your organization’s legal and governance policies.

Talk: Unified Control Plane for Enterprise GenAI: Powered by Agentic Deployment Platform with Central AI Gateway & MCP Integration

Presenter:
Nikunj Bajaj, CEO, TrueFoundry

About the Speaker:
Nikunj is the co-founder and CEO of TrueFoundry, a platform helping enterprises build, deploy, and ship LLM applications in a fast, scalable, cost-efficient way, with the right governance controls, within their own cloud. Prior to this role, he served as a Tech Lead for Conversational AI at Meta, where he spearheaded the development of proactive virtual assistants. His team also put Meta’s first deep learning model on-device. Nikunj also led the Machine Learning team at Reflektion, where he built an AI platform to enhance search and recommendations for over 600 million users across numerous eCommerce websites. He holds a bachelor’s in Electrical Engineering from IIT Kharagpur and a master’s in Computer Science from UC Berkeley.

As a visionary leader in the enterprise AI space, Nikunj is a sought-after speaker at premier technology conferences and summits, sharing his expertise on production AI deployment and enterprise ML strategies. His speaking portfolio includes MLOps Community events, where 50 speakers discussed LLMs in production alongside industry leaders from Stripe, Meta, Canva, Databricks, Anthropic, and Cohere; GenAI Summit San Francisco 2024, which attracted over 30,000 attendees and 200+ industry leaders at the historic Palace of Fine Arts; LLM Avalanche (part of Data+AI Summit by Databricks), a technical meetup featuring 20 world experts and 1,000 attendees in San Francisco; the Global Artificial Intelligence Conference hosted by Global Big Data Conference, where he was a featured speaker; and MLOps World, a community connecting over 15,000 members exploring best practices for ML/AI in production environments.

Talk Abstract:
As generative AI evolves from experimental tools to mission-critical enterprise applications, organizations face unprecedented operational complexity. Modern AI systems now orchestrate multiple models, invoke diverse tools, and span hybrid infrastructures, creating challenges around inconsistent APIs, model outages, unpredictable latency, complex rate limiting, and mounting governance requirements. Without centralized control, enterprises struggle with vendor lock-in, compliance gaps, runaway costs, and fragmented observability across their distributed AI ecosystems.

This session introduces the AI Gateway pattern—a critical architectural component that serves as the central control plane for enterprise AI systems. We’ll explore practical solutions including unified API abstraction, intelligent failover mechanisms, semantic caching, centralized guardrails, and granular cost controls. You’ll learn technical architecture patterns for building high-availability gateways that handle thousands of concurrent requests with sub-millisecond decision-making, plus emerging integration patterns like Model Context Protocol (MCP) for managing entire tool ecosystems.

Whether you’re an architect, platform engineer, or technical leader, you’ll gain actionable insights, architectural blueprints, and a practical framework for implementing scalable AI infrastructure that grows with your organization’s AI maturity.
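To make the failover piece of the pattern concrete, here is a toy, provider-agnostic sketch: one client-facing entry point that tries upstream models in priority order. The provider names and call shape are hypothetical and are not TrueFoundry’s API; a production gateway would add the caching, guardrails, rate limiting, and cost controls described above.

```python
# Toy illustration of AI Gateway failover: try providers in priority order.
# Provider names and call shapes are hypothetical, not a vendor API.
import random

PROVIDERS = ["primary-hosted-llm", "secondary-hosted-llm", "on-prem-fallback"]


def call_provider(name: str, prompt: str) -> str:
    # Stand-in for a real provider SDK call; fails randomly to exercise failover.
    if random.random() < 0.3:
        raise RuntimeError(f"{name} unavailable")
    return f"[{name}] response to: {prompt}"


def gateway_completion(prompt: str) -> str:
    """Single entry point: one unified API in front of many providers, with failover."""
    for name in PROVIDERS:
        try:
            return call_provider(name, prompt)
        except RuntimeError:
            continue  # a real gateway would also log, alert, and track provider health
    raise RuntimeError("all providers failed")


print(gateway_completion("Summarize this contract."))
```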

Talk: MLOps for Agents: Bringing the Outer Loop to Autonomous AI

Presenter:
Hamza Tahir, Co-Founder, ZenML

About the Speaker:
Hamza Tahir is a software developer turned ML engineer, with a passion for turning ideas into real, data-driven products. An indie hacker at heart, he has built projects like PicHance, Scrilys, BudgetML, and you-tldr. After deploying ML in production for predictive maintenance use-cases in his previous startup, he co-created ZenML, an open-source MLOps framework. Today, ZenML is evolving into the foundation for agentic AI systems—helping teams build, orchestrate, and scale autonomous ML pipelines and AI agents on any infrastructure stack.

Talk Abstract:
Most of today’s excitement around AI agents focuses on prompts, tools, and clever behaviors—the inner loop of development. But just like with machine learning models, real-world adoption demands more than prototypes. Without reproducibility, monitoring, evaluation, and continuous improvement, agents remain demos, not production systems.

In this talk, I’ll argue that we need to bring MLOps principles into agent development. By applying the outer loop—data collection, training, benchmarking, deployment, and feedback—we can move from one-off agents to robust, scalable, and trustworthy AI systems. Drawing on my experience co-creating ZenML, I’ll show how the lessons learned from operationalizing ML pipelines apply directly to this new era of agentic AI, and what infrastructure patterns teams can adopt today to close the gap between experimentation and production.
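As a hedged sketch of what that outer loop can look like in code, the example below expresses collect, evaluate, and gate-deploy as a ZenML pipeline so each run is tracked and reproducible. The step bodies are placeholders, not the speaker’s actual tooling, and running it assumes an initialized ZenML environment.

```python
# Hedged sketch: the agent "outer loop" as a ZenML pipeline (placeholder steps).
from zenml import pipeline, step


@step
def collect_traces() -> list:
    # Stand-in for pulling logged agent conversations from production.
    return ["trace-1", "trace-2"]


@step
def evaluate(traces: list) -> float:
    # Stand-in for benchmarking agent quality on the collected traces.
    return 0.87


@step
def gate_deploy(score: float) -> None:
    # Gate redeployment on the evaluation result.
    print("deploying new agent version" if score > 0.8 else "holding back")


@pipeline
def agent_outer_loop():
    gate_deploy(evaluate(collect_traces()))


agent_outer_loop()  # assumes a ZenML environment has been initialized
```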

Talk: Memory and Memory Accessories: Building an Agent from Scratch

Presenter:
Robert Shelton, Applied AI Engineer, Redis

About the Speaker:
Robert is a builder with a background in data science and full stack engineering. As an Applied AI Engineer at Redis, he focuses on bridging the gap between AI research and real-world applications. In open source, he helps maintain the Redis Vector Library and contributes to integrations with LangChain, LlamaIndex, and LangGraph. He has delivered workshops and consulting engagements for multiple Fortune 50 companies and has spoken at conferences including PyData and CodeMash.

Talk Abstract:
AI agents don’t have to be black boxes. In this live demo, we’ll show how to create a production-ready agent fully deployed on AWS from scratch, without bulky frameworks — just FastAPI, OpenAI, Redis, and Docket for async task orchestration. By the end, we’ll have an agent capable of multi-turn conversations that draw from both short- and long-term memory, showing how memory powers real-time reasoning, context retention, retrieval, and custom tool calls such as web search with Tavily.
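As a hedged sketch of the short-term memory piece, the snippet below keeps recent conversation turns in a Redis list keyed by session, the kind of structure an agent can reload on every turn. The key names and window size are illustrative, not the talk’s actual implementation; long-term memory, Docket orchestration, and tool calls are out of scope here.

```python
# Hedged sketch: short-term agent memory as a capped Redis list per session.
import json

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)


def remember(session_id: str, role: str, content: str, max_turns: int = 20) -> None:
    """Append one conversation turn and keep only the most recent max_turns."""
    key = f"chat:{session_id}:history"  # illustrative key naming
    r.rpush(key, json.dumps({"role": role, "content": content}))
    r.ltrim(key, -max_turns, -1)  # drop the oldest turns beyond the window


def recall(session_id: str) -> list:
    """Load recent turns to prepend to the next LLM prompt."""
    return [json.loads(item) for item in r.lrange(f"chat:{session_id}:history", 0, -1)]


remember("demo", "user", "What can I cook with lentils?")
print(recall("demo"))
```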

Talk: Build the Next Generation of Agent Workforce with AG2

Presenter:
Qingyun Wu, CEO, AG2

About the Speaker:
Dr. Qingyun Wu is the Founder & CEO of AG2AI, Inc. (AG2), a venture-backed company pioneering the next generation of agentic AI platforms. She is also an Assistant Professor (on leave) at Penn State University, with a research focus on machine learning, reinforcement learning, and AI agents.

Qingyun is the co-creator of AutoGen (now AG2), one of the most widely adopted open-source multi-agent frameworks, and FLAML, a leading AutoML library. Her research has been recognized with multiple best paper awards and spotlights at top AI conferences such as NeurIPS and ICML.

At AG2, Qingyun is building the foundation for the agent workforce of the future, enabling enterprises to orchestrate parallel, collaborative AI agents that deliver real-world impact—from legal automation and software testing to enterprise workflow orchestration.

Talk Abstract:
The future of work will be powered by AI agents working as a digital workforce—collaborating, specializing, and running in parallel to solve problems once thought unsolvable. In this session, Dr. Qingyun Wu, Founder & CEO of AG2, will share how organizations are already deploying this workforce with AG2. Real-world use cases include automating legal case preparation to cut turnaround by nearly a week, accelerating software test generation in compliance-heavy industries, and orchestrating enterprise workflows across tools like Slack, Google Docs, and APIs. A live demo of AG2’s Agent Playground will show how natural-language instructions instantly become coordinated, multi-agent workflows—bringing the agent workforce vision to life.
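For orientation, here is a minimal, hedged sketch of the two-agent pattern AG2 (formerly AutoGen) is built around. The agent roles and task are illustrative, not from the talk, and llm_config must point at a real model and API key before this will run.

```python
# Hedged sketch: a minimal two-agent AG2 (formerly AutoGen) exchange.
from autogen import ConversableAgent

# Illustrative config; supply a real model name and API key to run.
llm_config = {"config_list": [{"model": "gpt-4o-mini", "api_key": "YOUR_KEY"}]}

planner = ConversableAgent(
    name="planner",
    system_message="Break the task into concrete, verifiable steps.",
    llm_config=llm_config,
    human_input_mode="NEVER",
)
worker = ConversableAgent(
    name="worker",
    system_message="Carry out the steps you are given and report results.",
    llm_config=llm_config,
    human_input_mode="NEVER",
)

# One coordinated exchange: the planner drives, the worker responds.
planner.initiate_chat(worker, message="Draft a test plan for the login flow.", max_turns=2)
```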

Talk: What Does A Foundational Data Layer For The AI Era Look Like?

Presenter:
Vishakha Gupta-Cledat, Co-Founder / CEO, ApertureData

About the Speaker:
Vishakha Gupta-Cledat is Cofounder and CEO of ApertureData, a startup that offers a unique vector-graph database for multimodal AI. She holds a Ph.D. in Computer Science from Georgia Tech and an M.S. in Information Networking from CMU, and has worked on heterogeneous multi-core environments, graph databases, and multimodal data management challenges for AI.

Talk Abstract:
Machine learning began with recognition tasks. Over the last two years, generative AI became woven into our daily lives, bringing interaction and creativity, yet it remained largely limited to a single modality: text. Now, with the rise of AI agents, we see initiative, action, and orchestration, but these are again grounded primarily in text and rely mainly on semantic search. This is a rapidly growing space, but it lacks the essential qualities of human memory: contextual awareness, multimodal understanding, and efficiency.

In this demo, we will show how easy it is to ingest and manage multimodal data in ApertureDB, and use it as the foundational data layer for building your Smart AI Agents or GenAI applications. With examples showing the integrated AI workflows and plug-ins, we will talk about how ApertureDB delivers the right memory to power intelligent systems at enterprise scale.

Talk: Live Demo - World's First Data Agentic AI With Business Logic Intelligence

Presenter:
Aish Agarwal, CEO, Connecty AI

About the Speaker:
Aish Agarwal is the CEO and co-founder of Connecty AI, the world’s first data agentic AI platform with built-in business logic intelligence. He brings 15+ years of executive experience in customer data science, having led two $600M+ SaaS exits and held leadership roles at FL Studio, MAGIX, Rakuten, and Rocket Internet.

Talk Abstract:
Explore how Connecty AI, the world’s first data agentic AI with business logic intelligence, delivers chat-based data analytics powered by deep reasoning and an autonomous semantic graph. See how data and business teams can finally get consistent, reliable answers to their most critical questions in seconds.