Free Virtual October 6-7 | In-person October 8-9

All sessions and workshops curated by leading AI/ML practitioners

Agents in Production

Ville Tuulos

Co-Founder, CEO, Outerbounds

Metaflow: The Baseplate for Agentic Systems

Paco Nathan

Principal DevRel Engineer, Senzing

Doxing Dark Money: Entity Resolution to Empower Downstream AI Applications in Anti-Fraud

Kyle Corbitt

Co-Founder & CEO, OpenPipe

How to Train Your Agent: Building Reliable Agents with RL

Aishwarya Naresh Reganti

Founder, LevelUp Labs, Ex-AWS

Why CI/CD Fails for AI, and How CC/CD Fixes It

Kiriti Badam

Member of Technical Staff, OpenAI

Why CI/CD Fails for AI, and How CC/CD Fixes It

Pablo Salvador Lopez

Principal AI Application Development Architect – AI Solution Engineering Global Black Belt Team, Microsoft

From Static IVRs to Agentic Voice AI: Building Real-Time Intelligent Conversations

Hannes Hapke

Principal Machine Learning Engineer, Digits

The Hard Truth About AI Agents: Lessons Learned from Running Agents in Production

Linus Lee

EIR & Advisor, AI, Thrive Capital

Agents as Ordinary Software: Principled Engineering for Scale

Tony Kipkemboi

Head of Developer Relations, CrewAI

Building Conversational AI Agents with Thread-Level Eval Metrics

Claire Longo

Lead AI Researcher, Comet

Building Conversational AI Agents with Thread-Level Eval Metrics

Dr. Hemant Joshi

CTO, FloTorch

How to Build and Evaluate Agentic AI Workflows with FloTorch

Philipp Krenn

Head of Developer Relations, Elastic

Hope is Not a Strategy: Retrieval Patterns for MCP

Rajiv Shah

Chief Evangelist, Contextual AI

From Vectors to Agents: Managing RAG in an Agentic World

Akshay Mittal

Staff Software Engineer, PayPal

Agent Name Service (ANS) in Action – A DNS-like Trust Layer for Secure, Scalable AI-Agent Deployments on Kubernetes

Anish Shah

AI Engineer, Weights & Biases

Building and Evaluating Agents

Irena Grabovitch-Zuyev

Staff Applied Scientist, PagerDuty

Testing AI Agents: A Practical Framework for Reliability and Performance

Kumaran Ponnambalam

Principal AI Engineer, Cisco

Agent Drift: Understanding and Managing AI Agent Performance Degradation in Production

Augmenting Workforces with Agents

Vaibhav Page

Principal Engineer, BlackRock

Context is King: Scaling Beyond Prompt Engineering at BlackRock

Infant Vasanth

Senior Director of Engineering, BlackRock

Context is King: Scaling Beyond Prompt Engineering at BlackRock

Kshetrajna Raghavan

Principal Machine Learning Engineer, Shopify

Where Experts Can't Scale: Orchestrating AI Agents to Structure the World's Product Knowledge

Ricardo Tejedor Sanz

Senior Taxonomist, Shopify

Where Experts Can't Scale: Orchestrating AI Agents to Structure the World's Product Knowledge

Federico Bianchi

Senior ML Scientist, TogetherAI

From Zero to One: Building AI Agents From The Ground Up

Evolution of Agents

Claire Longo

Lead AI Researcher, Comet

How Math-Driven Thinking Builds Smarter Agentic Systems

ML Collaboration in Large Organizations

Eric Riddoch

Director of ML Platform, Pattern AI

Insights and Epic Fails from 5 Years of Building ML Platforms

AI Agents for Developer Productivity

Purshotam Shah

Senior Principal Software Developer Engineer, Yahoo

From Schema Discovery to Kubernetes: Building an Autonomous Agent for Real-Time Apache Flink Apps with LangGraph

Yegor Denisov-Blanch

Researcher, Stanford University

Impact of AI on Developer Productivity

Calvin Smith

Senior Researcher Agent R&D, OpenHands

Code-Guided Agents for Legacy System Modernization

AI Agents for Model Validation and Deployments

Eric Reese

Senior Manager, Site Reliability Engineering, BestBuy

Don't Page the Planet: Trust-Weighted Ops Decisions

Ankur Goyal

Founder & CEO, Braintrust

Five Hard-Earned Lessons About Evals

Data Engineering in an LLM era

Bhavana Sajja

Senior Machine Learning Engineer, Expedia Inc

Fake Data, Real Power: Crafting Synthetic Transactions for Bulletproof AI

ML Training Lifecycle

Zachary Carrico

Senior Machine Learning Engineer, Apella

Smart Fine-Tuning of Video Foundation Models for Fast Deployments

Paul Yang

Member of Technical Staff, Runhouse

Why is ML on Kubernetes Hard? Defining How ML and Software Diverge

Latest MLOps Trends

Hudson Buzby

Solutions Architect, JFrog

Securing Models

LLMs on Kubernetes

Romil Bhardwaj

Co-Creator, SkyPilot

Building Multi-Cloud GenAI Platforms without The Pains

Multimodal Systems in Production

Denise Kutnick

Co-Founder & CEO, Variata

Opening Pandora’s Box: Building Effective Multimodal Feedback Loops

James Le

Head of Developer Experience, TwelveLabs

Video Intelligence Is Going Agentic

Scoping and Delivering Complex AI Projects

Denys Linkov

Head of ML, Wisedocs

Future of AI in Healthcare

David Baum

UX Researcher & Design Strategist, Amazon

Humans in the Loop: Designing Trustworthy AI Through Embedded Research

Virtual Day

Aleksandr Shirokov

Team Lead MLOps Engineer, Wildberries

LLM Inference: A Comparative Guide to Modern Open-Source Runtimes

Anish Shah

AI Engineer, Weights & Biases

Architecting and Orchestrating AI Agents

Suhas Pai

CTO & Co-Founder, Hudson Labs

Architecting a Deep Research System

Freddy Boulton

Open Source Software Engineer, Hugging Face

Gradio: The Web Framework for Humans and Machines

Srishti Bhargava

Software Engineer, Amazon Web Services

The Rise of Self-Aware Data Lakehouses

Shelby Heinecke

Senior AI Research Manager, Salesforce

What’s Next in the Agent Stack

Sushant Mehta

Senior Research Engineer, Google DeepMind

Building Effective Agents

Remy Muhire

CEO, Pindo.ai

From Hello to Repayment: Voice AI in African Finance

Sanket Badhe

Senior Machine Learning Engineer, TikTok

Adversarial Threats Across the ML Lifecycle: A Red Team Perspective

Lin Liu

Director, Data Science, Wealthsimple

Story is All You Need

Madhu Ramanathan

Principal Group Engineering Manager, Trust, Safety and Intelligence, Microsoft

The Efficiency Equation: Leveraging AI Agents to Augment Human Labelers in Building Trust and Safety Systems at Scale

Niels Bantilan

Chief ML Engineer, Union.ai

A Practical Field Guide to Optimizing the Cost, Speed, and Accuracy of LLMs for Domain-Specific Agents

Kishan Rao

Engineering Manager, Delivery and Automation Platform, Okta

Your Infrastructure Just Got Smarter: AI Agents in the DevOps Loop

Alessandro Pireno

Founder, Stealth Company

I Tried Everything: A Pragmatist's Guide to Building Knowledge Graphs from Unstructured Data

Kelvin Ma

Staff Software Engineer, Google Photos

Productizing Generative AI at Google Scale: Lessons on Scoping and Delivering AI-Powered Editors

Lightning Talks

Nicholas Luzio

AI Solutions Lead, Arize AI

Shipping AI That Works

Robert Shelton

Applied AI Engineer, Redis

Beyond the Vibe: Eval Driven Development

Mariam Jabara

Senior Field Engineer, Arcee AI

SLMs + Fine-Tuning: Building the Infrastructure for Multi-Agent Systems

Speakers Corner

Alexej Penner

Founding Engineer, ZenML

The Real Problem Building Agentic Applications (And How MLOps Solves It)

Ville Tuulos

CEO, Co-Founder, Outerbounds

Agentic Metaflow in Action

Claire Longo

Lead AI Researcher, Comet

A Simple Recipe for LLM Observability

Chris Matteson

Head of Sales Engineering, Union.ai

What gets AI Agents to Production

Danny Chiao

Engineering Lead, Databricks

Techniques to build high quality agents faster with MLflow

Hudson Buzby

Solutions Architect, JFrog

AI Catalog by JFrog - Control Access to Open-Source LLMs

Nicholas Luzio

AI Solutions Lead, Arize AI

Building Feedback-Driven Agentic Workflows

Nikunj Bajaj

CEO, TrueFoundry

Unified Control Plane for Enterprise GenAI: Powered by Agentic Deployment Platform with Central AI Gateway & MCP Integration

The Next Wave of AI

Hamza Tahir

Co-Founder, ZenML

MLOps for Agents: Bringing the Outer Loop to Autonomous AI

Robert Shelton

Applied AI Engineer, Redis

Memory and Memory Accessories: Building an Agent from Scratch

Aish Agarwal

CEO, Connecty AI

Live Demo - World’s First Data Agentic AI With Business Logic Intelligence

more coming soon

Agenda

This agenda is still subject to change.

Join free virtual sessions October 6–7, then meet us in Austin for in-person case studies, workshops, and expo October 8–9

Talk: Metaflow: The Baseplate for Agentic Systems

Presenter:
Ville Tuulos, Co-Founder, CEO, Outerbounds

About the Speaker:
Ville Tuulos is the co-founder and CEO of Outerbounds, a platform that empowers enterprises to build production-ready, standout AI systems. He has been building infrastructure for machine learning and AI for over two decades. Ville began his career as an AI researcher in academia, authored Effective Data Science Infrastructure, and has held leadership roles at several companies—including Netflix, where he led the team that created Metaflow, a widely adopted open-source framework for end-to-end ML and AI systems.

Talk Track: Agents in Production

Talk Technical Level: 2/7

Talk Abstract:
Agent frameworks like LangChain or OpenAI’s Agent SDK make it easy to prototype agents, but they must be deployed in a production-grade environment that provides resilience, memory, and a runtime, with robust access to services and tools via MCP. The newly released open-source Metaflow 2.18 delivers such a baseplate for agentic systems, building on the battle-tested and versatile infrastructure Metaflow has refined over the years. Paired with your favorite agent framework, Metaflow offers a complete stack for agents – and the tools they depend on – ready for serious production use cases.

This talk introduces Metaflow’s new agentic features and demonstrates a practical example you can easily adapt to your own use cases.

What You’ll Learn:
Understanding the full stack required by production-grade agents, and how one can leverage open-source Metaflow to deliver it
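
To make the “baseplate” idea concrete, here is a minimal sketch of wrapping an agent invocation in a plain Metaflow step so it inherits retries, resource requests, and run tracking. It uses only long-standing Metaflow APIs (FlowSpec, @step, @retry, @resources); the agent-specific features new in Metaflow 2.18 that the talk covers are not shown, and the flow and variable names are illustrative.

```python
from metaflow import FlowSpec, step, retry, resources

class AgentRunFlow(FlowSpec):
    """Wrap an agent invocation in a Metaflow step so it inherits retries,
    resource scheduling, and artifact/run tracking. (Illustrative names;
    the agent-specific APIs added in Metaflow 2.18 are not used here.)"""

    @retry(times=2)            # re-run the step if the agent call flakes
    @resources(memory=4096)    # ask the scheduler for memory for this step
    @step
    def start(self):
        # Placeholder: call your agent framework of choice here.
        self.report = "agent output stored as a Metaflow artifact"
        self.next(self.end)

    @step
    def end(self):
        print(self.report)

if __name__ == "__main__":
    AgentRunFlow()
```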

Talk: Doxing Dark Money: Entity Resolution to Empower Downstream AI Applications in Anti-Fraud

Presenter:
Paco Nathan, Principal DevRel Engineer, Senzing

About the Presenter:
Paco Nathan leads DevRel for the Entity Resolved Knowledge Graph practice area at Senzing.com and is a computer scientist with 40+ years of tech industry experience and core expertise in data science, natural language, graph technologies, and cloud computing. He’s the author of numerous books, videos, and tutorials about these topics. He also hosts the monthly “Graph Power Hour!” webinar.

Paco advises Kurve.ai, EmergentMethods.ai, and is lead committer for the `pytextrank` and `kglab` open source projects. Formerly: Director of Learning Group at O’Reilly Media; and Director of Community Evangelism at Databricks.

Talk Track: Agents in Production

Technical Level: 2

Talk Abstract:
The Dark Web: an estimated $3T USD flows annually through shell companies leveraging tax havens worldwide — serving as the _perpetua mobilia_ for oligarchs, funding illegal weapons transfers, cyber attacks at global scale, human trafficking, anti-democracy campaigns, even illegal fishing fleets. The tendrils of kleptocracy extend throughout our political and economic system.

“People who hunt bad guys” — investigative journalists, OSINT, regulators, gov agencies, law enforcement, FinCrime investigation units, etc. — leverage both graph analytics and downstream AI apps to contend with the overwhelming data volumes. Our team provides core technology — entity resolution — used in this work, and in other public-sector work such as the majority of voter registration in the US. Most of our use cases run in air-gapped environments, based on large-scale distributed infrastructure, streaming data from multiple sources. In these production use cases, even with several billion graph elements, decisions to “merge” or “disambiguate” known entities can be propagated within milliseconds of a new record arriving.

Among those who perform this kind of confidential work, few are permitted to speak at tech conferences. However, we can use open source, open models, and open data to illustrate these kinds of applications. We’ll show how technology gets used to track the moves of the world’s worst organized crime rings, and how to fight against oligarchs who use complex networks to hide their grift. On the flip side, similar approaches can be leveraged to find your best customers within a graph.

This talk explores known cases, the fraud tradecraft employed, open data sources, and how technology gets leveraged. There are multiple areas where multimodal agentic workflows (e.g., based on BAML) play important roles, both for handling unstructured data sources and for actions taken based on inference. Moreover, we’ll look at where data professionals are very much needed, and where you can get involved.

What You’ll Learn:
How a combination of graph technologies and downstream AI applications gets leveraged for fighting FinCrime and transnational corruption in general.

Talk: How to Train Your Agent: Building Reliable Agents with RL

Presenter:
Kyle Corbitt, Co-Founder & CEO, OpenPipe

About the Presenter:
Kyle Corbitt is the co-founder and CEO of OpenPipe, the RL post-training company. OpenPipe has trained thousands of customer models for both enterprises and tech-forward startups.

Before founding OpenPipe, Kyle led the Startup School team at Y Combinator, which was responsible for the product and content that YC produces for early-stage companies. Prior to that he worked as an engineer at Google and studied ML at school.

Talk Track: Augmenting Workforces with Agents

Technical Level: 4

Talk Abstract:
Have you ever launched an awesome agentic demo, only to realize no amount of prompting will make it reliable enough to deploy in production? Agent reliability is a famously difficult problem to solve!

In this talk we’ll learn how to use GRPO to help your agent learn from its successes and failures and improve over time. We’ve seen dramatic results with this technique, such as an email assistant agent whose success rate jumped from 74% to 94% after replacing o4-mini with an open source model optimized using GRPO.

We’ll share case studies as well as practical lessons learned around the types of problems this works well for and the unexpected pitfalls to avoid.

What You’ll Learn:
I’ve frankly been shocked by how well RL works on real-world agentic use cases, and I’m very excited to share lessons learned with the audience. We’re working with DoorDash as well as several smaller customers on deploying these agents to prod and seeing universally strong results. This won’t be an OpenPipe pitch session; I’ll cover all the open-source tooling we use to make these models work.
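
As a rough illustration of the technique named in the abstract, the sketch below shows the group-relative advantage computation that GRPO builds on: each sampled completion for a prompt is scored, and advantages are normalized against the group’s mean and standard deviation. This is a generic sketch, not OpenPipe’s training code; the reward values and function names are invented.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages for one prompt's group of sampled completions:
    each reward is normalized against the group mean and standard deviation,
    so the policy is nudged toward completions that beat their siblings."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical: four rollouts of an email-assistant task scored by an evaluator.
print(group_relative_advantages([0.2, 0.9, 0.4, 0.9]))
```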

Talk: Why CI/CD Fails for AI, and How CC/CD Fixes It

Presenters:
Aishwarya Naresh Reganti, Founder, LevelUp Labs, Ex-AWS | Kiriti Badam, Member of Technical Staff, OpenAI

About the Presenters:
Aishwarya Naresh Reganti is the founder of LevelUp Labs, an AI services and consulting firm that helps organizations design, build, and scale AI systems that actually work in the real world. She has led engagements with multiple Fortune 500 companies and fast-growing startups, helping them move beyond demos to production-grade AI.

Before founding LevelUp Labs, she served as a tech lead at the AWS Generative AI Innovation Center, where she led and implemented AI solutions for a wide range of AWS clients. Her work spanned industries such as ISVs, banking, healthcare, e-commerce, and legal tech, with publicly referenced engagements including Bayer, NFL, Zillow, Kayak, and Imply (creators of Apache Druid).

Aishwarya holds a Master’s in Computer Science from Carnegie Mellon University (MCDS) and has authored 35+ papers in top-tier conferences including NeurIPS, ACL, CVPR, AAAI, and EACL. Her research background includes work on graph neural networks, multilingual NLP, multimodal summarization, and human-centric AI. She has mentored graduate students, served as a reviewer for major AI conferences, and collaborated with research teams at Microsoft Research, NTU Singapore, University of Michigan, and more.

Today, Aishwarya teaches top-rated applied AI courses, advises executive teams on AI strategy, and speaks at global conferences including TEDx, ReWork, and MLOps World. Her insights reach over 100,000 professionals on LinkedIn.

Kiriti Badam is a member of the technical staff at OpenAI, with over a decade of experience designing high-impact enterprise AI systems. He specializes in AI-centric infrastructure, with deep expertise in large-scale compute, data engineering, and storage systems. Prior to OpenAI, Kiriti was a founding engineer at Kumo.ai, a Forbes AI 50 startup, where he led the development of infrastructure that enabled training hundreds of models daily—driving significant ARR growth for enterprise clients. Kiriti brings a rare blend of startup agility and enterprise-scale depth, having worked at companies like Google, Samsung, Databricks, and Kumo.ai.

Talk Track: Agents in Production

Technical Level: 2

Talk Abstract:
AI products break the assumptions traditional software is built on. They’re non-deterministic, hard to debug, and come with a tradeoff no one tells you about: every time you give an AI system more autonomy, you lose a bit of control.

This talk introduces the Continuous Calibration / Continuous Development (CC/CD) framework, designed for building AI systems that behave unpredictably and operate with increasing levels of agency. Based on 50+ real-world deployments, CC/CD helps teams start with low-agency, high-control setups, then scale safely as the system earns trust.

What You’ll Learn:
You’ll learn how to scope capabilities, design meaningful evals, monitor behavior, and increase autonomy intentionally, so your AI product doesn’t collapse under real-world complexity.
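
One hedged way to picture the “earn autonomy as trust grows” idea is an explicit autonomy ladder gated by recent eval pass rates. The sketch below illustrates that general pattern only; it is not the CC/CD framework itself, and the levels and thresholds are invented for the example.

```python
AUTONOMY_LADDER = [
    # (level, description, minimum recent eval pass rate required to unlock)
    (0, "suggest only, human executes", 0.00),
    (1, "execute with human approval", 0.85),
    (2, "execute autonomously, humans audit samples", 0.95),
]

def allowed_autonomy(recent_pass_rate: float) -> int:
    """Return the highest autonomy level the system has earned so far."""
    level = 0
    for lvl, _description, min_rate in AUTONOMY_LADDER:
        if recent_pass_rate >= min_rate:
            level = lvl
    return level

print(allowed_autonomy(0.90))  # -> 1: the agent still needs approval to act
```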

Talk: From Static IVRs to Agentic Voice AI: Building Real-Time Intelligent Conversations

Presenter:
Pablo Salvador Lopez, Principal AI Application Development Architect – AI Solution Engineering Global Black Belt Team, Microsoft

About the Presenter:
As an AI Solution Engineering Leader in Microsoft’s Global Black Belt team—the elite force driving AI and cloud application innovation within the Azure ecosystem—I design and deliver transformative Generative AI solutions for some of the world’s most complex and highly regulated industries. My work bridges deep technical skills with mission-critical execution—specializing in Retrieval-Augmented Generation (RAG), agentic AI systems, and scalable multi-agent orchestration using Azure AI, OpenAI, and frameworks like Semantic Kernel or any custom stacks.

I design intelligence from the ground up—combining LLMs and custom orchestration frameworks to create real-time, memory-aware agents that reason, act, and collaborate.

My foundation spans full-stack data science, ML engineering, and software architecture. I’ve led real-time and batch AI deployments for Fortune 500 enterprises, with expertise across MLOps/LLMOps and high-throughput inference—anchored in cloud platforms like Azure, GCP, and AWS.

Where others see ambiguity, I see momentum. I’m known for turning raw ideas into production-grade systems—either by building from first principles or rethinking the “rules” when innovation demands it. My mission is to build systems that matter—empowering teams to do their best work, and leaving every product, platform, pattern and person stronger than I found them.

Beyond industry, I’m committed to education and community. As an Adjunct Instructor in Northwestern University’s MSAI program, I teach hands-on courses in Cloud AI, GenAI, RAG, and multi-agent systems. I mentor startups, serve on advisory boards, and contribute to open-source AI—sharing ideas that move the field forward.

Talk Track: Agents in Production

Technical Level: 3

Talk Abstract:
Developers today face the challenge of transforming outdated IVRs and traditional voice systems into intelligent, responsive interactions. This session dives into the concept of agentic voice AI—systems capable of real-time reasoning, decision-making, and dynamic action execution. We’ll explore how to architect modular voice applications using Azure, orchestrate multiple autonomous agents for specialized tasks, and leverage real-time AI inference to produce fluid, human-like conversations. Attendees will learn practical strategies to design agentic voice interactions, enabling their systems to autonomously plan, act, and dynamically adapt to user contexts and needs.

What You’ll Learn:
Attendees will leave equipped with a clear understanding of agentic architecture in real-time voice applications, including practical techniques for orchestrating multiple specialized agents, integrating dynamic reasoning, leveraging memory, and optimizing speech latency. They will be empowered to move beyond static IVRs towards fully autonomous, intelligent voice experiences.

Talk: The Hard Truth About AI Agents: Lessons Learned from Running Agents in Production

Presenter:
Hannes Hapke, Principal Machine Learning Engineer, Digits

About the Speaker:
Hannes Hapke is a principal machine learning engineer at Digits, where he has spent years building production AI systems that accountants and business owners actually use daily.

Before Digits, he solved ML infrastructure problems across healthcare, retail, and renewable energy – industries where failure isn’t an option. At SAP Concur, he learned that impressive prototypes and production systems are entirely different beasts.

Hannes co-authored numerous machine learning books, including “Building Machine Learning Pipelines” and “Machine Learning Production Systems” (O’Reilly), and his upcoming “GenAI Design Patterns” book addresses the gap between AI hype and reality. As a Google Developer Expert for Machine Learning, he’s committed to sharing the hard truths about production ML.

Talk Track: Agents in Production

Talk Technical Level: 2/7

Talk Abstract:
Every conference showcases impressive agent demos. What they don’t show you are the 3 AM pages when agents go rogue, the customer support tickets when AI makes expensive mistakes, or the months of debugging why your “95% accurate” prototype becomes 60% reliable in production.

This talk cuts through the agent hype with unfiltered lessons from Digits’ journey deploying customer-facing agents that handle real financial data. Hannes will share the architectural decisions that actually matter (hint: it’s not the framework you choose), the monitoring approaches that catch problems before customers do, and the failure modes that no one warns you about.

You’ll learn why agent evaluation in development predicts almost nothing about production performance, how to build guardrails that don’t cripple functionality, and why the hardest problems aren’t technical – they’re about managing expectations and building trust.

This presentation is a field guide to the messy reality of production agents, complete with practical design patterns for Hannes’ newest O’Reilly publication “Generative AI Design Patterns” (together with Dr. Valliappa Lakshmanan), and the kind of lessons learned you only get from keeping systems running when money is involved.

What You’ll Learn:
– Production Reality Check: Why impressive demos fail spectacularly in production and how to bridge that gap

– Architecture for Reliability: The infrastructure patterns that actually matter for agent systems at scale

– Architecture for Observability: The specific ways to monitor agents in production

Talk: Agents as Ordinary Software: Principled Engineering for Scale

Presenter:
Linus Lee, EIR & Advisor, AI, Thrive Capital

About the Presenter:
Linus Lee is an EIR and advisor at Thrive Capital, where he focuses on AI as part of the product and engineering team and supports portfolio companies on adopting and deploying frontier AI capabilities. He previously pursued independent HCI and machine learning research before joining Notion as an early member of the AI team.

Talk Track: Agents in Production

Technical Level: 2

Talk Abstract:
Thrive Capital’s in-house research engine Puck executes thousands of research and automation tasks weekly, surfacing current events, drafting memos, and triggering workflows unassisted. This allows Puck to power the wide ecosystem of software tools and automations supporting the Thrive team. A single Puck run may traverse millions of tokens across hundreds of documents and LLM calls, and run for 30 minutes before returning multi-page reports or taking actions. With fewer than 10 engineers, we sustain this scale and complexity by embracing four values — composability, observability, statelessness, and changeability — in our orchestration library Polymer. We’ll share patterns that let us quickly add data sources or tools without regressions, enjoy deep observability to root cause every issue in minutes, and evolve the system smoothly as new model capabilities come online. We’ll end by discussing a few future capabilities we hope to unlock next, like RL, durable execution across hours or days, and scaling via parallel search.

What You’ll Learn:
Concretely, attendees will (1) learn design patterns like composition, adapters, and stateless effects that let us write more robust LLM systems faster and more confidently, and (2) see concrete code examples that illustrate these principles in action in a production system. Our goal is not to sell the audience on the library itself, but rather to advocate for the design patterns behind it.

More broadly, in such a rapidly evolving landscape it can feel tempting to trade off classic engineering principles like composability in favor of following frontier capabilities, subscribing to frameworks that obscure implementation detail or lock you into shortsighted abstractions. This talk will explore how we can have both rigor and frontier velocity with the right foundation.
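
As a rough sketch of the composability and statelessness values described above (and not Polymer’s actual API), the example below models each unit of work as a stateless step over an input dictionary and composes steps into a pipeline, which is what makes per-step observability and safe recombination straightforward.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class Step:
    """A stateless unit of work: a dict of inputs in, a dict of outputs out."""
    name: str
    run: Callable[[Dict], Dict]

def compose(steps: List[Step]) -> Step:
    """Chain steps into one pipeline; each step only adds keys, which keeps
    intermediate state inspectable for observability."""
    def _run(ctx: Dict) -> Dict:
        for s in steps:
            ctx = {**ctx, **s.run(ctx)}
        return ctx
    return Step(name="+".join(s.name for s in steps), run=_run)

# Hypothetical two-step pipeline: fetch documents, then summarize them.
fetch = Step("fetch", lambda ctx: {"docs": [f"doc about {ctx['query']}"]})
summarize = Step("summarize", lambda ctx: {"summary": "; ".join(ctx["docs"])})
pipeline = compose([fetch, summarize])
print(pipeline.run({"query": "current events"}))
```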

Talk: Building Conversational AI Agents with Thread-Level Eval Metrics

Presenters:
Claire Longo, Lead AI Researcher, Comet | Tony Kipkemboi, Head of Developer Relations, CrewAI

About the Presenters:
Tony Kipkemboi leads Developer Advocacy at CrewAI, where he helps organizations adopt AI agents to drive efficiency and strategic decision-making. With a background spanning developer relations, technical storytelling, and ecosystem growth, Tony specializes in making complex AI concepts accessible to both technical and business audiences.

He is an active voice in the AI agent community, hosting workshops, podcasts, and tutorials that explore how multi-agent orchestration can reshape the way teams build, evaluate, and deploy AI systems. Tony’s work bridges product experimentation with real-world application, empowering developers, startups, and enterprises to harness AI agents for measurable impact.

At MLops World, Tony brings his experience building and scaling with CrewAI to demonstrate how agent orchestration, when paired with rigorous evaluation, accelerates the path from prototype to production.

Claire Longo is an AI leader and Mathematician with over a decade of experience in Data Science and AI. She has led cross-functional AI teams at Twilio, Opendoor, and Arize AI and is currently a Lead AI Researcher at Comet. She holds a Bachelor’s in Applied Mathematics and a Master’s in Statistics from The University of New Mexico. Beyond her technical work, Claire is a Speaker, Advisor, YouTuber, and Poker Player. She is dedicated to mentoring Engineers and Data Scientists while championing diversity and inclusion in AI. Her mission is to empower the next generation of AI practitioners.

Talk Track: Agents in Production

Technical Level: 4

Talk Abstract:
Building modern conversational AI Agents means dealing with dynamic, multi-step LLM reasoning processes and tool calling that cannot always be predicted or debugged at the trace level alone. During the conversation, we need to understand if the AI accomplishes the user’s goal while staying aligned with intent and delivering a smooth interaction. To truly measure quality, we need to trace and evaluate entire conversation sessions.

In this talk, we introduce a practical workflow for designing, orchestrating, and evaluating conversational AI Agents by combining CrewAI as the Agent development framework with Comet Opik for custom eval metrics.

On the CrewAI side, we’ll showcase how developers can define multi-agent workflows, specialized roles, and task orchestration that mirror real-world business processes. We’ll demonstrate how CrewAI simplifies experimentation with different agent designs and tool integrations, making it easier to move from prototypes to production-ready agents.

On the Opik side, we’ll go over how to capture expert human-in-the-loop feedback and build thread-level evaluation metrics. We’ll show how to log traces, annotate sessions with expert insights, and design LLM-as-a-Judge metrics that mimic human reasoning, turning domain expertise into a repeatable feedback loop.

Together, this workflow combines agentic orchestration + rigorous evaluation, giving developers deep observability, actionable insights, and a clear path to systematically improving conversational AI in real-world applications.

What You’ll Learn:
You can’t reliably build conversational AI agents without treating orchestration and evaluation as two halves of the same workflow; CrewAI structures the agent, Comet Opik ensures you can measure and improve it.
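
For readers who want a concrete picture of thread-level evaluation, here is a minimal, framework-agnostic sketch of an LLM-as-a-Judge prompt built over a whole conversation session rather than a single trace. It does not use the CrewAI or Opik APIs; the dataclass and scoring rubric are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Turn:
    role: str       # "user" or "assistant"
    content: str

def thread_level_judge_prompt(thread: List[Turn], user_goal: str) -> str:
    """Build a prompt asking a judge LLM to score the whole session against
    the user's goal, instead of scoring individual traces in isolation."""
    transcript = "\n".join(f"{t.role}: {t.content}" for t in thread)
    return (
        "You are evaluating a full conversation between a user and an AI agent.\n"
        f"User goal: {user_goal}\n"
        f"Transcript:\n{transcript}\n"
        "Score 1-5 for goal completion, intent alignment, and smoothness. "
        'Return JSON: {"goal_completion": int, "alignment": int, "smoothness": int}.'
    )

thread = [Turn("user", "Help me dispute a duplicate charge."),
          Turn("assistant", "I found the duplicate and filed a dispute. Anything else?")]
print(thread_level_judge_prompt(thread, "Resolve a duplicate charge"))
```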

Workshop: How to Build and Evaluate Agentic AI Workflows with FloTorch

Presenter:
Dr. Hemant Joshi, CTO, FloTorch

About the Presenter:
Dr. Hemant Joshi has over 20 years of industry experience building products and services with AI/ML technologies.

As CTO of FloTorch, Hemant is engaged with customers to implement State of the Art GenAI solutions and agentic workflows for enterprises.

Prior to FloTorch, Hemant worked at companies including Tumblr, L’Oreal, and Claim Genius. Hemant holds a Bachelor of Engineering from Mumbai University and a Ph.D. in Applied Computing from the University of Arkansas at Little Rock.

Talk Track: Agents in Production

Technical Level: 2

Talk Abstract:
The workshop will guide you through the critical challenges and solutions for deploying GenAI agents in a business environment.

You will understand how to build and scale agentic workflows reliably and securely.

– Overview of Agentic Workflows from planning to enterprise-grade implementation
– Understand the pain points that can derail an enterprise’s AI adoption, like governance and monitoring
– Set up an agentic workflow with the FloTorch AI Gateway, connecting to any LLM via a single endpoint with smart routing
– Understand why a platform for agentic governance and observability is essential for accelerating your organization’s AI journey, ensuring trust, and maximizing business value.

What You’ll Learn:
The key takeaway is the time saved moving agentic projects from concept to production by using the AI Gateway; scaling and modifying them in the future with different LLMs also becomes easy.

Also, having an evaluation platform to understand the costs, latency and accuracy of LLMs before deploying to production helps a business make the necessary trade-offs for their use case.

Talk: Hope is Not a Strategy: Retrieval Patterns for MCP

Presenter:
Philipp Krenn, Head of Developer Relations, Elastic

About the Speaker:
Philipp leads Developer Relations at Elastic — the company behind Elasticsearch, Kibana, Beats, and Logstash. Based in San Francisco, he lives to demo interesting technology and solve challenging problems — all with a smile and a terminal window.

Talk Track: Agents in Production

Talk Technical Level: 3/7

Talk Abstract:
MCP is a solid integration layer — but how does it hold up when it comes to output quality? Often, not as well as you’d like. Here are some practical retrieval patterns, from basic to advanced, that worked well in my experiments:
– Naive: Just plug in plain MCP and hope the LLM gets it right. Sometimes it does. Sometimes you’ll need a miracle.
– Semantic: Add more descriptive field names and extra metadata. It helps — but usually just a bit.
– Templated: Use a structured template and have the LLM fill it out step by step. More effort, but by far the most reliable results.

What You’ll Learn:
While MCP is a simple protocol, there are (emerging) patterns you can use to make it more powerful.
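
To illustrate the “templated” pattern, the sketch below has the LLM fill a fixed query template, which is then translated into a concrete search request. The template fields and the query-body shape are illustrative assumptions, not a specific MCP server’s or Elastic’s implementation.

```python
import json

# A structured template the LLM is asked to fill field by field, instead of
# free-forming a query. The filled template is then mapped to a search body.
QUERY_TEMPLATE = {
    "keywords": "<terms the user is asking about>",
    "time_range": "<e.g. 'last 7 days' or null>",
    "filters": {"service": "<service name or null>"},
}

def template_to_search_body(filled: dict) -> dict:
    """Translate the filled template into a query body (shape is illustrative)."""
    must = [{"match": {"message": filled["keywords"]}}]
    if filled["filters"].get("service"):
        must.append({"term": {"service": filled["filters"]["service"]}})
    return {"query": {"bool": {"must": must}}}

# Pretend the LLM returned this JSON after being shown QUERY_TEMPLATE.
llm_output = ('{"keywords": "checkout latency spike", "time_range": "last 7 days", '
              '"filters": {"service": "payments"}}')
print(json.dumps(template_to_search_body(json.loads(llm_output)), indent=2))
```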

Talk: From Vectors to Agents: Managing RAG in an Agentic World

Presenter:
Rajiv Shah, Chief Evangelist, Contextual AI

About the Presenter:
Rajiv Shah is the Chief Evangelist at Contextual AI with a passion and expertise in Practical AI. He focuses on enabling enterprise teams to succeed with AI. Rajiv has worked on GTM teams at leading AI companies, including Hugging Face in open-source AI, Snorkel in data-centric AI, Snowflake in cloud computing, and DataRobot in AutoML. He started his career in data science at State Farm and Caterpillar.

Rajiv is a widely recognized speaker on AI, has published over 20 research papers, been cited over 1,000 times, and received over 20 patents. His recent work in AI covers topics such as sports analytics, deep learning, and interpretability.

Rajiv holds a PhD in Communications and a Juris Doctor from the University of Illinois at Urbana Champaign. While earning his degrees, he received a fellowship in Digital Government from the John F. Kennedy School of Government at Harvard University. He is well known on social media with his short videos, @rajistics, that have received over ten million views.

Talk Track: Agents in Production

Technical Level: 2

Talk Abstract:
The RAG landscape has evolved so quickly. We’ve gone from simple keyword search to semantic embeddings to multi-step agentic reasoning. With all these approaches, we see the rise of context engineering in mastering the best RAG for the problem. This talk helps you understand the right search architecture for your use case.
We’ll examine three distinct architectural patterns: Speedy Retrieval (<500 ms), Accuracy-Optimized RAG (<10 seconds), and Exhaustive Agentic Search (10 seconds to several minutes). You’ll see how context engineering evolves across these patterns: from basic prompt augmentation in Speed-First RAG, to dynamic context selection and compression in hybrid systems, to full context orchestration with memory, tools, and state management in agentic approaches.
The talk will include a framework for selecting RAG architectures, architectural patterns with code examples, and guidance on practical issues around RAG infrastructure.

What You’ll Learn:
RAG has matured enough that we can stop chasing the bleeding edge and start making boring, practical decisions about what actually ships.

Points:
– Attendees should leave knowing exactly when to use speedy retrieval vs. agentic search; most use cases don’t need agents (and shouldn’t pay for them)
– As retrieval improves, managing the context window becomes the real challenge; success isn’t about retrieving more – it’s about orchestrating what you retrieve
– Agentic search can cost 100x more than vector search; sometimes “good enough” at 500 ms beats “perfect” at 2 minutes
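
A minimal sketch of the selection framework, assuming a simple rule keyed on latency budget and whether the question needs multi-hop reasoning; the thresholds mirror the three patterns named in the abstract and are illustrative, not prescriptive.

```python
from enum import Enum

class RagMode(Enum):
    SPEEDY = "speedy_retrieval"       # < 500 ms: single keyword/vector lookup
    ACCURATE = "accuracy_optimized"   # < 10 s: rerank + context compression
    AGENTIC = "exhaustive_agentic"    # minutes: multi-step, tool-using search

def pick_rag_mode(latency_budget_s: float, needs_multi_hop: bool) -> RagMode:
    """Illustrative routing rule over the three patterns from the abstract."""
    if latency_budget_s < 0.5:
        return RagMode.SPEEDY
    if needs_multi_hop and latency_budget_s >= 10:
        return RagMode.AGENTIC
    return RagMode.ACCURATE

print(pick_rag_mode(0.3, needs_multi_hop=False))   # autocomplete-style lookup
print(pick_rag_mode(120.0, needs_multi_hop=True))  # deep research question
```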

Talk: Agent Name Service (ANS) in Action – A DNS-like Trust Layer for Secure, Scalable AI-Agent Deployments on Kubernetes

Presenter:
Akshay Mittal, Staff Software Engineer, PayPal

About the Presenter:
Akshay Mittal is a Staff Software Engineer at PayPal and an IEEE Senior Member with over a decade of experience in full-stack development and cloud-native systems. He is currently pursuing a PhD at the University of the Cumberlands, focusing on AI/ML-driven security for cloud architectures. Akshay actively contributes to the Austin tech community through speaking engagements, mentoring, and IEEE and ACM initiatives, with a professional mission of advancing technical excellence and fostering innovation.

Talk Track: Agents in Production

Technical Level: 3

Talk Abstract:
Enterprise MLOps is rapidly shifting from model-centric pipelines to agent-centric ecosystems, where autonomous AI agents continuously retrain models, validate data, and remediate incidents without human intervention. Yet most production platforms still lack a uniform mechanism to discover, authenticate, and govern these agents. This session introduces the Agent Name Service (ANS) – an open, DNS-inspired protocol that assigns unique identities, publishes verifiable metadata, and issues capability attestations for AI agents running on Kubernetes. Drawing on lessons learned from securing PayPal’s global API platform, I will demonstrate how ANS enables end-to-end trust across the ML lifecycle: model-validation agents that flag concept drift, deployment agents that patch mis-configured Helm charts, and guard-agent ensembles that enforce policy-as-code in real time. A live demo will show ANS integrated with GitOps, Open Policy Agent, Sigstore, and an open-source agent-orchestration framework, highlighting zero-trust handshakes, key rotation, and automated RBAC provisioning. Attendees will leave with practical templates and a GitHub reference implementation ready for pilot adoption.

What You’ll Learn:
1. Why identity and capability verification are the missing guardrails for agentic MLOps

2. Reference architecture for deploying ANS on a Kubernetes stack with GitOps, OPA, and Sigstore

3. Patterns for chaining validation, remediation, and notification agents while preserving least-privilege access

4. Performance and security benchmarks from a production pilot handling 1,000+ daily agent interactions
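
To give a feel for what a DNS-like trust layer for agents might look like, the sketch below registers and resolves hypothetical agent records with capability checks. The field names and in-memory registry are invented for illustration and are not the actual ANS schema or reference implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class AgentRecord:
    """Hypothetical ANS-style entry: a resolvable name plus verifiable metadata."""
    name: str               # e.g. "drift-detector.mlops"
    public_key: str         # used to verify the agent's signed attestations
    capabilities: Tuple[str, ...]
    metadata: Dict[str, str] = field(default_factory=dict)

REGISTRY: Dict[str, AgentRecord] = {}

def register(record: AgentRecord) -> None:
    REGISTRY[record.name] = record

def resolve(name: str, required_capability: str) -> AgentRecord:
    """DNS-like lookup that also enforces the capability the caller needs."""
    record = REGISTRY[name]
    if required_capability not in record.capabilities:
        raise PermissionError(f"{name} is not attested for {required_capability}")
    return record

register(AgentRecord("drift-detector.mlops", "MIIB...stub", ("model-validation",)))
print(resolve("drift-detector.mlops", "model-validation").name)
```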

Talk: Building and Evaluating Agents

Presenter:
Anish Shah, AI Engineer, Weights & Biases

About the Speaker:
Anish loves turning ML ideas into ML products. Anish started his career working with multiple Data Science teams within SAP, working with traditional ML, deep learning, and recommendation systems before landing at Weights & Biases. With the art of programming and a little bit of magic, Anish crafts ML projects to help better serve our customers, turning “oh nos” to “a-ha”s!

Talk Track: Agents in Production

Talk Technical Level: 3/7

Talk Abstract:
This session explores how large language models evolve from single-prompt tools into agentic systems capable of solving real-world business problems. We’ll cover the design principles behind agents — reflection, tool use, planning, and collaboration — and show how these map to modern architectures. The talk then focuses on the challenge of evaluation, highlighting methods like automated judges, process-level metrics, and continuous monitoring to ensure reliability, efficiency, and user trust. Attendees will leave with a clear understanding of how to structure AI agents and how to systematically measure and improve their performance.

What You’ll Learn:
Attendees will learn the current state of agents, with an emphasis on the problems faced in development, along with advice and tools for dealing with those problems.

Talk: Testing AI Agents: A Practical Framework for Reliability and Performance

Presenter:
Irena Grabovitch-Zuyev, Staff Applied Scientist, PagerDuty

About the Presenter:
Irena Grabovitch-Zuyev is a Staff Applied Scientist at PagerDuty and a driving force behind PagerDuty Advance, the company’s generative AI capabilities. She leads the development of AI agents that are transforming how customers interact with PagerDuty, pushing the boundaries of incident response and automation.

With over 15 years of experience in machine learning, Irena specializes in generative AI, data mining, machine learning, and information retrieval. At PagerDuty, she partners with stakeholders and customers to identify business challenges and deliver innovative, data-driven solutions.

Irena earned her graduate degree in Information Retrieval in Social Networks from the Technion – Israel Institute of Technology. Before joining PagerDuty, she spent five years at Yahoo Research as part of the Mail Mining team, where her machine learning solutions for automatic extraction and classification were deployed at scale, powering Yahoo Mail’s backend and processing hundreds of millions of messages daily.

She is the author of several academic articles published at top conferences and the inventor of multiple patents. Irena is also a passionate advocate for increasing representation in tech, believing that diversity and inclusion are essential to innovation.

Talk Track: Agents in Production

Technical Level: 3

Talk Abstract:
As AI agents powered by large language models (LLMs) become integral to production systems, ensuring their reliability and safety is both critical and uniquely challenging. Unlike traditional software, agentic systems are dynamic, probabilistic, and highly sensitive to subtle changes—making conventional testing approaches insufficient.

This talk presents a practical framework for testing AI agents, grounded in real-world experience developing and deploying production-grade agents at PagerDuty. The main focus will be on iterative regression testing: how to design, execute, and refine regression tests that catch failures and performance drifts as agents evolve. We’ll walk through a real use case, highlighting the challenges and solutions encountered along the way.

Beyond regression testing, we’ll cover the additional layers of testing essential for agentic systems, including unit tests for individual tools, adversarial testing to probe robustness, and ethical testing to evaluate outputs for bias, fairness, and compliance. Finally, I’ll share how we’re building automated pipelines to streamline test execution, scoring, and benchmarking—enabling rapid iteration and continuous improvement.

Attendees will leave with a practical, end-to-end framework for testing AI agents, actionable strategies for regression and beyond, and a deeper understanding of how to ensure their own AI systems are reliable, robust, and ready for real-world deployment.

What You’ll Learn:
Attendees will learn a practical, end-to-end framework for testing AI agents—covering correctness, robustness, and ethics—so they can confidently deploy reliable, high-performing LLM-based systems in production.
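
As a generic illustration of iterative regression testing for agents (not PagerDuty’s framework), the sketch below re-runs a fixed case suite against the agent after each change, scores outputs with a judge function, and flags any case that falls below its stored baseline. The stub agent, judge, and case fields are hypothetical.

```python
import json

def run_regression_suite(agent_fn, cases, judge_fn, threshold=0.8):
    """Re-run a fixed case suite against the agent after every change, score
    outputs with a judge, and flag cases that fall below their baseline."""
    failures = []
    for case in cases:
        output = agent_fn(case["input"])
        score = judge_fn(case["input"], output, case["expected_behavior"])
        if score < case.get("baseline_score", threshold):
            failures.append({"id": case["id"], "score": score, "output": output})
    return failures

# Hypothetical stand-ins: a stub agent and a keyword-overlap "judge".
agent = lambda q: f"Acknowledged: {q}. Escalating to the on-call engineer."
judge = lambda q, out, expected: 1.0 if expected.lower() in out.lower() else 0.0
cases = [{"id": "incident-escalation", "input": "Sev1 on the payments API",
          "expected_behavior": "escalating", "baseline_score": 0.9}]
print(json.dumps(run_regression_suite(agent, cases, judge), indent=2))
```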

Talk: Agent Drift: Understanding and Managing AI Agent Performance Degradation in Production

Presenter:
Kumaran Ponnambalam, Principal AI Engineer, Cisco

About the Presenter:
Kumaran Ponnambalam is a technology leader with 20+ years of experience in Generative AI, Machine Learning, Data and Analytics. His focus is on creating robust, scalable Gen AI models and services to drive effective business solutions. He is currently leading Generative AI initiatives at Cisco, building next-generation AI innovations and products to help enterprises. In his previous roles, he has built conversational bots, ML platforms, data pipelines and cloud services. A frequent speaker at technology conferences, he has also authored several courses on the LinkedIn Learning Platform in Generative AI and Machine Learning.

Talk Track: Agents in Production

Technical Level: 4

Talk Abstract:
As AI agents continue to integrate into production systems, maintaining consistent performance over time remains a critical challenge. This talk explores the concept of “Agent Drift,” a phenomenon where AI agents experience performance degradation due to shifts in data distribution, evolving user behavior, tool behavior, or model changes. Attendees will gain insights into how agent drift impacts the reliability and effectiveness of AI systems, and why early detection is essential for mitigating risks in production environments. The session will introduce practical strategies for measuring agent drift, enabling teams to identify performance gaps and adapt their agents proactively. By leveraging these techniques, organizations can ensure their AI agents remain robust and aligned with real-world requirements. Whether you are a data scientist, engineer, or AI practitioner, this talk will provide actionable takeaways for managing and optimizing AI agents in dynamic settings.

What You’ll Learn:
How to measure performance of AI Agents in production, identify drift and take remedial actions.
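
One hedged way to operationalize drift detection, shown for illustration only, is to compare the distribution of recent evaluation scores against a trusted baseline window with a two-sample statistical test; the scores and threshold below are invented, and this is not presented as Cisco’s method.

```python
from scipy.stats import ks_2samp

def detect_score_drift(baseline_scores, recent_scores, alpha=0.05):
    """Flag drift when recent eval scores (e.g. from an LLM-as-judge over
    sampled production traces) no longer match a trusted baseline window."""
    stat, p_value = ks_2samp(baseline_scores, recent_scores)
    return {"ks_stat": stat, "p_value": p_value, "drift": p_value < alpha}

baseline = [0.92, 0.88, 0.95, 0.90, 0.91, 0.93, 0.89, 0.94]
recent   = [0.81, 0.76, 0.85, 0.79, 0.74, 0.83, 0.80, 0.78]
print(detect_score_drift(baseline, recent))
```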

Talk: Context is King: Scaling Beyond Prompt Engineering at BlackRock

Presenters:
Vaibhav Page, Principal Engineer, Blackrock | Infant Vasanth, Senior Director of Engineering, Blackrock

About the Presenters:
Vaibhav is a Principal Engineer at BlackRock, where he leads the development of the Data Science and AI platform powering investment research and automation across the firm. Vaibhav is also the author of Argo-Events, a CNCF-graduated project widely used for event-driven automation in cloud-native environments.

Infant Vasanth leads the engineering team responsible for the Studio Compute Platform, BlackRock’s analytics and automation platform that enables our users to conduct research & analysis, run automations and distribute research at scale.
In addition, Infant leads the Data & AI Acceleration team, focusing on efforts to enhance Aladdin Studio’s AI capabilities alongside the Operational AI capabilities (prospectus analyzer, operational agents, etc.).

Talk Track: Augmenting Workforces with Agents

Technical Level: 2

Talk Abstract:
As AI use cases grow in complexity, prompt engineering alone is insufficient. In this talk, we will discuss BlackRock’s evolution of engineering relevant contexts for a broad range of AI use cases, from generating investment signals to optimizing operational processes. Furthermore, building the right context in real time has its own set of challenges, ranging from context-window limitations to finding the relevant information and running evaluations on the generated context. We’ll demonstrate how thoughtful context design leads to more robust and adaptable AI agents. We will go over the art and science of building relevant contexts for complex financial use cases and their associated challenges.

What You’ll Learn:
This session offers a practical guide and framework for users looking to build or engineer relevant contexts at scale for their AI applications and use cases. By showcasing how the framework accelerates the creation of these AI contexts, the session will provide actionable insights for teams aiming to develop and deploy custom AI solutions. We’ll walk through real-world examples, including some of the challenges we faced while building contexts for different teams at BlackRock. The design principles, architectural patterns, and context engineering strategies shared can be applied across industries to reduce hallucinations and give relevant answers. Attendees will also learn what this looks like in a highly regulated environment where adhering to industry-standard security practices is of utmost importance.

Talk: Where Experts Can't Scale: Orchestrating AI Agents to Structure the World's Product Knowledge

Presenters:
Kshetrajna Raghavan, Principal Machine Learning Engineer, Shopify | Ricardo Tejedor Sanz, Senior Taxonomist, Shopify

About the Presenters:
Kshetrajna is a Principal Machine Learning Engineer at Shopify with 15 years of experience delivering AI solutions across technology, healthcare, and retail. He has led initiatives in large-scale product search, computer vision, natural language processing, and predictive modeling—translating cutting-edge research into systems used by millions. Known for his pragmatic approach, he focuses on building scalable, high-impact machine learning products that drive measurable business results.

Ricardo Tejedor Sanz is a Senior Taxonomist at Shopify with a distinctive background spanning legal experience, linguistics, and machine learning. With diverse analytical experience across international contexts and master’s degrees in English Literature and Audiovisual Translation, plus fluency in four languages, Ricardo brings exceptional rigor and customer-focused problem-solving to taxonomy challenges. He evolved from traditional manual taxonomy methods built on deep market research, competitive analysis, and semantic understanding, to pioneering AI-driven classification systems benefiting millions of merchants globally.

Talk Track: Augmenting Workforces with Agents

Technical Level: 2

Talk Abstract:
How do you maintain a product taxonomy spanning millions of items across every industry—from guitar picks to industrial sensors—when no human team could possibly possess expertise in all these domains? At Shopify, we faced this exact challenge and built an AI agentic system that transforms an impossible human task into a scalable, automated workflow.

In this talk, we reveal how we orchestrate multiple specialized AI agents to analyze, improve, and validate taxonomy changes at unprecedented scale.

You’ll discover:
– How parallel AI agents can augment human expertise across domains where deep knowledge is impossible to maintain
– The architecture patterns that enable agents to work together while maintaining quality and consistency
– Why LLM-as-judge systems are game-changers for scaling quality control
– Critical lessons learned from production deployment, including surprising failures and how we fixed them

We share real metrics showing how this approach transformed a years-long manual process into days of AI-augmented work, and provide actionable insights you can apply to your own “impossible” classification and curation challenges.
Whether you’re dealing with content moderation, data classification, or any task requiring expertise across vast domains, you’ll leave with concrete strategies for building AI agent systems that scale human judgment beyond traditional limitations.

What You’ll Learn:
1. Decompose “Impossible” Into Specialized Agents
Don’t build one AI to know everything. Build many agents that each know something, then orchestrate them.

2. LLM-as-Judge Unlocks Scale
Shifting from “humans review 100%” to “AI pre-screens, humans see 10%” is the game-changer. Key: Let AI fix minor issues, not just reject.

3. Production Lessons Are Brutal
– Prompt overload breaks reasoning
– Always build fallbacks for when services fail

4. Trust Through Transparency
Every AI decision needs reasoning, audit trails, and escalation paths. No black boxes.

5. The Meta-Lesson
Scale isn’t about replacing humans—it’s about amplifying the expertise you have across domains you couldn’t possibly cover.

Talk: From Zero to One: Building AI Agents From The Ground Up

Presenter:
Federico Bianchi, Senior ML Scientist, TogetherAI

About the Presenter:
Federico Bianchi is a Senior ML Scientist at TogetherAI, working on self-improving agents. He was a post-doc at Stanford University. His work has been published in major journals such as Nature and Nature Medicine and conferences such as ICLR, ICML and ACL.

Talk Track: Augmenting Workforces with Agents

Technical Level: 4

Talk Abstract:
What does it take to build a truly autonomous AI agent, from scratch and in the open? In this talk, I’ll share how we’ve developed agents capable of executing full analytical workflows, from raw data to insights. I’ll walk through key principles for designing robust, transparent agents that reason, reflect, and act in complex scientific domains. We’ll explore how architectural choices, tool use, and learning approaches—including reinforcement learning—can be combined to build agents that improve over time and generalize to new tasks.

What You’ll Learn:
Building agents is easy but requires some thinking about the context in which the agents are going to be embedded.

Talk: How Math-Driven Thinking Builds Smarter Agentic Systems

Presenter:
Claire Longo, Lead AI Researcher, Comet

About the Presenter:
Claire Longo is an AI leader and Mathematician with over a decade of experience in Data Science and AI. She has led cross-functional AI teams at Twilio, Opendoor, and Arize AI and is currently a Lead AI Researcher at Comet. She holds a Bachelor’s in Applied Mathematics and a Master’s in Statistics from The University of New Mexico. Beyond her technical work, Claire is a Speaker, Advisor, YouTuber, and Poker Player. She is dedicated to mentoring Engineers and Data Scientists while championing diversity and inclusion in AI. Her mission is to empower the next generation of AI practitioners.

Talk Track: Evolution of Agents

Technical Level: 3

Talk Abstract:
Everyone’s buzzing about LLMs, but too few are talking about the math that should guide how we apply them to real-world problems. Mathematics is the language of AI, and a foundational understanding of the math behind AI model architectures should drive decisions when we’re building AI systems.

In this talk, I will do a technical deep dive to demystify how different mathematical architectures in AI models can guide us on how and when to use each model type, and how this knowledge can help us design agent architectures and anticipate potential weaknesses in production so we can safeguard against them. I’ll break down what LLMs can do (and where they fall apart), clarify the elusive concept of “reasoning,” and introduce a benchmarking mindset rooted in math and modularity.

To put it all into context, I’ll share a real-world example of an Agentic use case from my own recent project: a poker coaching app that blends an LLM reasoning model as the interface with statistical models analyzing a player’s performance using historical data. This is a strong example of the future of hybrid agents, where LLMs and other mathematical algorithms work together, each solving the part of the problem it’s best suited for. It demonstrates the proper application of reasoning models grounded in their mathematical properties and shows how modular agent design allows each model to focus on the piece of the system it was built to handle.

I’ll also introduce a scientifically rigorous approach to benchmarking and comparing models, based on statistical hypothesis testing, so we can quantify and measure the impact of different models on our use cases as we evaluate and evolve agentic design patterns.

Whether you’re building RAG agents, real-time LLM apps, or reasoning pipelines, you’ll leave with a new lens for designing agents. You’ll no longer have to rely on trial and error or feel like you’re flying blind with a black-box algorithm. Foundational mathematical understanding will give you the intuition to anticipate how a model is likely to behave, reduce time to production, and increase system transparency.

What You’ll Learn:
It’s easier than you think to understand the foundational mathematical concepts in AI and use that knowledge to guide you in building better AI systems.
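
As a small illustration of the hypothesis-testing mindset described above (not the speaker’s exact methodology), the sketch below runs a paired test on per-example eval scores from two agent variants; the scores are invented.

```python
from scipy.stats import ttest_rel

def compare_variants(scores_a, scores_b, alpha=0.05):
    """Paired hypothesis test on per-example eval scores from two agent
    variants; a small p-value suggests the gap is real, not noise."""
    stat, p_value = ttest_rel(scores_a, scores_b)
    return {"t_stat": stat, "p_value": p_value, "significant": p_value < alpha}

# Invented per-question scores from two candidate configurations.
variant_a = [0.70, 0.80, 0.60, 0.90, 0.75, 0.85, 0.80, 0.70]
variant_b = [0.80, 0.85, 0.70, 0.95, 0.80, 0.90, 0.85, 0.80]
print(compare_variants(variant_a, variant_b))
```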

Talk: Insights and Epic Fails from 5 Years of Building ML Platforms

Presenter:
Eric Riddoch, Director of ML Platform, Pattern AI

About the Presenter:
Eric leads the ML Platform team at Pattern, the largest seller on Amazon.com besides Amazon themselves.

Talk Track: ML Collaboration in Large Organizations

Technical Level: 2

Talk Abstract:
Building an internal ML Platform is a good idea as your number of data scientists, projects, or volume of data increases. But the MLOps toolscape is overwhelming. How do you pick tools and set your strategy? How important is drift detection? Should I serve all my models as endpoints? How “engineering-oriented” should my data scientists be?

Join Eric on a tour of three ML platforms he has worked on, serving 14 million YouTubers and the largest 3P seller on Amazon. Eric will share specific architectures, honest takes from epic failures, things that turned out not to be important, and principles for building a platform with great adoption.

What You’ll Learn:
– Principles > tools. Ultimately all MLOps tools cover ~9 “jobs to be done”.
– “Drift monitoring” is overstated. Data quality issues account for most model failures.
– Offline inference exists and is great! Resist the temptation to use endpoints.
– Data lineage is underrated. Helps catch “target leakage” and upstream/downstream errors.
– Cloud GPUs from non-hyperscalers are getting cheaper. You may not need on-prem.
– DS can get away with “medium-sized” data tools for a long time.

Talk: From Schema Discovery to Kubernetes: Building an Autonomous Agent for Real-Time Apache Flink Apps with LangGraph

Presenter:
Purshotam Shah, Senior Principal Software Developer Engineer, Yahoo

About the Presenter:
Software Engineer on Yahoo’s Low Latency team, overseeing Apache Storm, ZooKeeper, and Flink deployments.

Talk Track: AI Agents for Model Validation and Deployments

Technical Level: 3

Talk Abstract:
In the era of generative AI, the focus of MLOps is shifting from managing models to managing autonomous agents that can write, test, and deploy their own code. This session presents a real-world case study on building a sophisticated AI agent that automates the entire lifecycle of a real-time Apache Flink application. The process is initiated from a single prompt where the user only needs to specify the location of their data schema in a registry like data.all. From there, the agent takes over, creating the code, discovering the schema, generating a corresponding serializer, building and pushing a Docker image via a Screwdriver pipeline, and finally deploying that image to create a production-ready Flink cluster on Kubernetes.

We will demonstrate how we used LangGraph to orchestrate a stateful workflow that intelligently scaffolds a Maven project, generates type-safe Java code based on centralized schemas, and writes its own unit tests. The core of our talk focuses on the self-healing CI/CD pipeline: when the initial build or tests fail, the agent analyzes the Maven error logs, identifies the root cause (be it a dependency issue in the pom.xml or a compilation error in the Java code), and autonomously performs repairs by prompting an LLM with the precise context needed for a fix.

Finally, we’ll cover the final mile of the pipeline, where the agent generates the necessary Dockerfile and screwdriver.yaml configurations to build a container, push it to a registry, and prepare it for deployment to a Kubernetes cluster. We’ll also touch on how we use specialized tracing tools to trace the agent’s complex decision-making process, providing critical observability into our autonomous development loop.
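
For readers who want a concrete feel for the self-healing loop described above, here is a minimal sketch of a build–repair cycle in LangGraph. It is our own illustration under stated assumptions, not the speaker’s implementation: the build and repair steps are simulated stand-ins for the real Maven invocation and LLM-driven patching.

```python
# A minimal sketch (our illustration, not the speakers' code) of a self-healing
# build loop in LangGraph: build -> repair -> build ... until success or give-up.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class BuildState(TypedDict):
    error_log: str
    attempts: int
    succeeded: bool

def run_build(state: BuildState) -> BuildState:
    # Real pipeline: shell out to `mvn package` and capture logs plus exit code.
    # Here we simulate a failure on the first attempt only.
    attempts = state["attempts"] + 1
    ok = attempts > 1
    return {"error_log": "" if ok else "[ERROR] compilation failure in Job.java",
            "attempts": attempts, "succeeded": ok}

def repair(state: BuildState) -> BuildState:
    # Real pipeline: prompt an LLM with the Maven error log and apply its patch.
    print("repairing based on:", state["error_log"])
    return state

def route(state: BuildState) -> str:
    # Stop on success or after three attempts; otherwise go repair and rebuild.
    return END if state["succeeded"] or state["attempts"] >= 3 else "repair"

graph = StateGraph(BuildState)
graph.add_node("build", run_build)
graph.add_node("repair", repair)
graph.set_entry_point("build")
graph.add_conditional_edges("build", route, {END: END, "repair": "repair"})
graph.add_edge("repair", "build")
app = graph.compile()

print(app.invoke({"error_log": "", "attempts": 0, "succeeded": False}))
```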

What You’ll Learn:
Architect Stateful AI Agents: Learn to design and build multi-step, autonomous agents using LangGraph to manage complex software engineering workflows, moving beyond simple, stateless API calls.

Automate Build & Test Cycles: Implement a CI/CD pipeline where an AI agent can autonomously diagnose and fix its own build failures, including dependency conflicts in pom.xml, Java compilation errors, and failing unit tests.

Enforce Enterprise-Grade Reliability: Discover techniques to ground LLM-generated code against sources of truth, such as a data.all schema registry and version-locked dependency rules, to ensure correctness and consistency.

Achieve Prompt-to-Prod Automation: Walk away with a complete methodology for creating a prompt-driven development lifecycle, from schema discovery and code generation to Dockerization and Kubernetes deployment.

Talk: Impact of AI on Developer Productivity

Presenter:
Yegor Denisov-Blanch, Researcher, Stanford University

About the Speaker:
I run the software engineering productivity research group at Stanford. For the past 3+ years, we’ve been working with hundreds of companies to analyze their private git repos to measure the productivity of their engineers. We have 120,000+ engineers in the dataset. Before Stanford, I looked after digital transformation projects at an F100 company with 6,000+ engineers. I found it paradoxical that software engineers are very data-driven, yet we had no good data-driven way to make decisions about the things that impacted software engineering productivity.

Talk Track: AI Agents for Developer Productivity

Talk Technical Level: 1/7

Talk Abstract:
This talk will deep dive on the impact of AI on developer productivity, comparing numbers across languages, seniority levels, company types, types of work, and reasoning vs. non-reasoning LLMs. It will also showcase best practices for adoption at enterprise scale, as well as situations where initiatives didn’t yield the desired results.

What You’ll Learn:
AI increases developer productivity, but not always and not in every setting – learn how & when to use AI agents for software engineering at scale

Talk: Code-Guided Agents for Legacy System Modernization

Presenter:
Calvin Smith, Senior Researcher Agent R&D, OpenHands

About the Speaker:
Calvin Smith is a software engineer and researcher who spent years developing formal methods for generating and understanding code at scale. He joined OpenHands to apply these techniques to real-world software engineering challenges. His current focus: building AI agents that leverage formal methods to modernize legacy codebases and pushing the boundaries of what autonomous agents can accomplish in software engineering.

Talk Track: AI Agents for Developer Productivity

Talk Technical Level: 2/7

Talk Abstract:
Legacy code modernization often fails because we try to boil the ocean. After early attempts at using autonomous agents for whole-codebase transformations resulted in chaos, we developed a novel approach: combine static dependency analysis with intelligent agents to break modernization into reviewable, incremental chunks. This talk explores how we use static-analysis tools to understand codebases, identify optimal modernization boundaries, and orchestrate multiple agents to collaboratively transform codebases to turn an impossible problem into a series of manageable PRs.

What You’ll Learn:
The solution space for AI-automated software engineering extends beyond “AI for code” or “code for AI”. It’s about creating feedback loops where static analysis, AI agents, and human expertise continuously inform and enhance each other.

Talk: Don't Page the Planet: Trust-Weighted Ops Decisions

Presenter:
Eric Reese, Senior Manager, Site Reliability Engineering, BestBuy

About the Presenter:
Eric Reese, Senior Manager of SRE at Best Buy, leads ML initiatives for incident operations. He specializes in trust-weighted decisions, spike detection algorithms, and the operational guardrails that make AI reliable in production. His focus: bridging the gap between ML predictions and safe automated actions.

Talk Track: AI Agents for Model Validation and Deployments

Technical Level: 3

Talk Abstract:
Enterprises don’t need more dashboards—they need deciders that act safely. This talk shows how we built an agentic validation layer that sits between ML predictions and operational responses. The system classifies incidents, applies exponentially-decaying trust scores with configurable half-life, adapts contextual thresholds (time-of-day baselines, scope, team diversity), routes gray-zone cases through smart policies, and posts idempotent state changes to chat systems. We’ll cover the trust accumulation algorithm, how derivative signals catch rising threats, multi-agent validation with cross-checking, and the observability needed to audit every decision. Attendees leave with a platform-agnostic pattern—Predict → Categorize → Weight → Accumulate → Decide → Notify—that turns noisy ML outputs into governed actions with built-in safety rails.
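
To make the trust-accumulation idea concrete, here is a minimal sketch of an exponentially decaying score with a configurable half-life. The numbers and the 15-minute default are assumptions for illustration only, not the values used in the speaker’s system.

```python
from dataclasses import dataclass

@dataclass
class TrustAccumulator:
    """Accumulates per-signal trust that decays exponentially with a configurable half-life."""
    half_life_s: float = 900.0   # assumed default: evidence loses half its weight every 15 minutes
    score: float = 0.0
    last_ts: float = 0.0

    def add(self, ts: float, weight: float) -> float:
        # Decay the existing score for the elapsed time, then add the new evidence.
        elapsed = max(0.0, ts - self.last_ts)
        decay = 0.5 ** (elapsed / self.half_life_s)
        self.score = self.score * decay + weight
        self.last_ts = ts
        return self.score

acc = TrustAccumulator()
for ts, weight in [(0, 1.0), (300, 1.0), (2400, 1.0)]:
    print(ts, round(acc.add(ts, weight), 3))
# A decision layer would compare `score` against a contextual threshold before acting.
```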

What You’ll Learn:
Core Message: ML predictions need a safety wrapper to become operational decisions—this talk provides that complete pattern.

Supporting learnings:
– A reusable last-mile pattern: turning any ML prediction into a safe operational action
– How to tune weighted-decay rates to catch real incidents without alert fatigue (with the actual math and knobs)
– Handling uncertainty: when the model isn’t sure, policy-based “gray-zone” routing takes over
– What observability means for AI ops: audit trails, structured rationales, rollback hooks
– Making it production-ready: idempotence, guaranteed delivery, and live config updates without downtime

Talk: Five Hard-Earned Lessons About Evals

Presenter:
Ankur Goyal, Founder & CEO, Braintrust

About the Presenter:
Ankur is the Founder and CEO of Braintrust, where he is building innovative solutions in the AI space. Previously, he served as Head of ML Platform at Figma after leading Impira as Founder and CEO through its successful acquisition by Figma.

Earlier in his career, Ankur was Vice President of Engineering at SingleStore, overseeing product architecture, engineering operations, and strategy. A graduate of Carnegie Mellon University with a degree in Computer Science, Ankur specializes in systems engineering, machine learning platforms, and product development.

Talk Track: AI Agents for Model Validation and Deployments

Technical Level: 3

Talk Abstract:
Are you catching critical issues in your LLM applications before they reach users? In this talk, I’ll share five hard‑earned lessons drawn from powering thousands of daily evals on Braintrust. You’ll discover how to engineer data pipelines and custom scorers that surface real failures, why optimizing your full eval loop outperforms prompt tweaks alone, and how Loop, our AI copilot, automates continuous improvement. I’ll share real examples like Notion’s 24‑hour model swaps and demonstrate practical steps to tighten your eval workflows. Join me to learn actionable strategies that ensure your LLM features ship reliably and confidently.

What You’ll Learn:
How do you know your AI feature works? Are bad responses reaching users? Can your team improve quality without guesswork? You need a scalable evals and observability platform to build and ship reliable AI agents.

Talk: Fake Data, Real Power: Crafting Synthetic Transactions for Bulletproof AI

Presenter:
Bhavana Sajja, Senior Machine Learning Engineer, Expedia Inc.

About the Speaker:
A Senior Machine Learning Engineer at Expedia Inc., I lead the end-to-end development and operationalization of AI/ML solutions across high-impact use cases such as fraud detection, supplier screening, and dynamic fraud listing. With a strong foundation in building, deploying, and monitoring production-grade models, I ensure that data pipelines, model performance, and governance frameworks align seamlessly with both business objectives and compliance requirements.

Known for a solutions-oriented mindset, I thrive on adopting emerging technologies to address real-world challenges. Currently, I am exploring agentic AI paradigms—such as agent-to-agent (A2A) protocols and model-context protocol (MCP) architectures—to enhance the reliability, adaptability, and explainability of fraud prevention systems. Our work focuses on crafting autonomous pipelines that can detect novel attack vectors in near real-time, prioritize high-risk cases, and continuously refine detection strategies through feedback loops.

Beyond day-to-day engineering, I actively contribute to cross-functional initiatives: mentoring junior engineers, sharing best practices at internal knowledge-shares, and evaluating new MLOps tools to accelerate model iteration cycles. With a passion for continuous learning, I participate in developer forums—bridging the gap between cutting-edge research and enterprise-scale deployments.

Talk Track: Data Engineering in an LLM era

Talk Technical Level: 2/7

Talk Abstract:
In today’s AI-driven world, organizations want to use their rich transaction records for insights and model building but worry about exposing sensitive customer details. This talk offers a clear, practical guide to creating high-quality synthetic transaction data—data that looks and behaves like real records but contains no actual customer information. We’ll focus on why good data quality is essential for any AI model: without realistic patterns and relationships, models trained on synthetic data simply won’t perform well.

We’ll first highlight the main hurdles in transaction tables: mixed data types (numbers, categories, dates), rare events (like fraud), and complex links between features. Then, we’ll introduce four proven generative approaches—GANs (Generative Adversarial Networks), TVAEs (Tabular Variational Autoencoders), TabularARGN (Tabular Autoregressive Generative Networks), and GPT-based methods—that address these challenges in different ways:

GANs learn to “fool” a critic network to produce realistic samples, which helps match complex data patterns.

TVAEs focus on understanding each column’s data type (text, number, category) to recreate accurate row-level details.

TabularARGN builds records step-by-step, preserving sequential and hierarchical relationships in the data.

GPT-based methods leverage transformer models (like those behind large language models) to capture broad patterns and generate new rows based on learned “templates.”

Through a simple case study on a public credit-card transactions dataset, we’ll walk through:

Preparing data (filling in missing values, encoding categories, handling outliers)

Choosing and training a model (why you might pick a GAN versus a TabularARGN or a TVAE)

Evaluating results with easy-to-understand checks—how closely synthetic data matches real data distributions and how well a fraud-detection model trained on synthetic data performs.

We’ll also discuss balancing privacy (keeping customer details safe) with usefulness (keeping important patterns, like rare fraud events). Finally, we’ll point to simple next steps: using synthetic data in healthcare records or IoT sensor logs, monitoring data quality automatically, and ensuring any privacy concerns are met. By the end of this session, even attendees new to generative AI will understand how to pick a method, build a high-quality synthetic dataset, and trust that their AI models can learn and perform effectively—boosting innovation without risking real customer data.
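
As a concrete starting point for the model-training step, here is a minimal sketch using the open-source SDV library, which provides GAN-based (CTGAN) and TVAE-based tabular synthesizers. The library choice, toy columns, and settings are our own assumptions for illustration; the talk itself is library-agnostic.

```python
import numpy as np
import pandas as pd
from sdv.metadata import SingleTableMetadata
from sdv.single_table import CTGANSynthesizer  # TVAESynthesizer is a drop-in alternative

# Toy stand-in for a transactions table; real data would have many more columns.
rng = np.random.default_rng(0)
real = pd.DataFrame({
    "amount": rng.lognormal(mean=3.0, sigma=1.0, size=1000).round(2),
    "merchant_category": rng.choice(["grocery", "travel", "coffee", "electronics"], size=1000),
    "is_fraud": rng.choice([0, 1], size=1000, p=[0.98, 0.02]),
})

# Infer column types, then fit a GAN-based synthesizer and sample new rows.
metadata = SingleTableMetadata()
metadata.detect_from_dataframe(real)

synthesizer = CTGANSynthesizer(metadata, epochs=50)
synthesizer.fit(real)
synthetic = synthesizer.sample(num_rows=1000)

# First sanity check: do marginal distributions (e.g., the rare fraud rate) roughly match?
print(real["is_fraud"].mean(), synthetic["is_fraud"].mean())
```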

What You’ll Learn:
Why synthetic transactional data matters and how to create it with advanced generative techniques
Real-world trade-offs: utility vs. privacy

Talk: Smart Fine-Tuning of Video Foundation Models for Fast Deployments

Presenter:
Zachary Carrico, Senior Machine Learning Engineer, Apella

About the Presenter:
Zac is a Senior Machine Learning Engineer at Apella, specializing in machine learning products for improving surgical operations. He has a deep interest in healthcare applications of machine learning, and has worked on cancer and Alzheimer’s disease diagnostics. He has end-to-end experience developing ML systems: from early research to serving thousands of daily customers. Zac is an active member of the Data and ML community, having presented at conferences such as Ray Summit, TWIML AI, Data Day, and MLOps & GenAI World. He has also published eight journal articles. His passion lies in advancing ML and streamlining the deployment and monitoring of models, reducing complexity and time. Outside of work, Zac enjoys spending time with his family in Austin and traveling the world in search of the best surfing spots.

Talk Track: ML Training Lifecycle

Technical Level: 3

Talk Abstract:
As video foundation models become integral to use cases in healthcare, security, retail, robotics, and consumer applications, MLOps teams face a new class of challenges: how to efficiently fine-tune these large models for domain-specific tasks without overcomplicating infrastructure, overloading compute resources, or degrading real-time performance.

This session presents tips for selecting and intelligently fine-tuning video foundation models at scale. Using a state-of-the-art vision foundation model, we’ll cover techniques for efficient data sampling, temporal-aware augmentation, adapter-based tuning, and scalable optimization strategies. Special focus will be given to handling long and sparse videos, deploying chunk-based inference, and integrating temporal fusion modules with minimal latency overhead. Attendees of this talk will come away with strategies for quickly deploying optimally fine-tuned foundation models.
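
To illustrate the adapter-based tuning idea, here is a minimal PyTorch sketch that freezes a (stand-in) per-frame backbone and trains only a lightweight temporal fusion head. The module shapes and class names are assumptions for illustration, not the speaker’s architecture.

```python
import torch
import torch.nn as nn

class FrozenBackbone(nn.Module):
    """Stand-in for a pretrained per-frame encoder; in practice a real video/image backbone."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.proj = nn.Linear(3 * 64 * 64, dim)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:  # (B, T, C, H, W)
        return self.proj(frames.flatten(2))                   # (B, T, dim)

class TemporalAdapterHead(nn.Module):
    """Lightweight temporal fusion over frame embeddings; the only trainable part."""
    def __init__(self, dim: int = 256, num_classes: int = 5):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, frame_embs: torch.Tensor) -> torch.Tensor:
        fused, _ = self.attn(frame_embs, frame_embs, frame_embs)
        return self.classifier(fused.mean(dim=1))

backbone, head = FrozenBackbone(), TemporalAdapterHead()
for p in backbone.parameters():
    p.requires_grad = False  # fine-tune only the adapter head

frames = torch.randn(2, 8, 3, 64, 64)  # batch of 2 clips, 8 frames each
with torch.no_grad():
    embs = backbone(frames)
logits = head(embs)
print(logits.shape)  # torch.Size([2, 5])
```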

What You’ll Learn:
Attendees will learn practical strategies for efficiently fine-tuning and deploying video foundation models at scale. They’ll take away techniques for data sampling, temporal-aware augmentation, adapter-based tuning, and scalable optimization—plus methods to handle long/sparse videos and deploy low-latency, chunk-based inference with temporal fusion.

Talk: Why is ML on Kubernetes Hard? Defining How ML and Software Diverge

Presenter:
Paul Yang, Member of Technical Staff, Runhouse

About the Presenter:
At Runhouse, Paul is helping to build, test, and deploy Kubetorch at leading AI labs and enterprises for RL, training, and inference use cases. Previously, he worked across a range of ML/DS and infra domain areas, from language model tuning and evaluations for contextually aware code generation to productizing causal ML / pseudo-causal inference.

Talk Track: ML Training Lifecycle

Technical Level: 2

Talk Abstract:
Mature organizations run ML workloads on Kubernetes, but implementations vary widely, and ML engineers rarely enjoy the streamlined development and deployment experiences that platform engineering teams provide for software engineers. Making small changes takes an hour to test and moving from research to production frequently takes multiple weeks – these unergonomic and inefficient processes are unthinkable for software, but standard in ML. To explain this, we first trace the history of ML platforms and how early attempts like Facebook’s FBLearner as “notebooks plus DAGs” led to incorrect reference implementations. Then we define the critical ways that ML diverges from software, such as inability to do local testing due to data size and acceleration needs (GPU), heterogeneity in distributed frameworks and their requirements (Ray, Spark, PyTorch, Tensorflow, Dask, etc.), non-trivial observability and logging. Finally, we propose a solution, Kubetorch, which bridges between an iterable and debuggable Pythonic API for ML Engineers and Kubernetes-first scalable execution.

What You’ll Learn:
ML, especially at sophisticated organizations, is done on Kubernetes. However, there are no definitive reference implementations and well-used projects to date for ML-on-Kubernetes like Kubeflow have had mixed reactions from the community. Kubetorch is an introduction of a novel compute platform that is Kubernetes-native that offers a great, iterable, and debuggable interface into powerful compute for developers, without introducing new pitfalls of brittle infrastructure or long deployment times. In short, Kubetorch is a recognition that ML teams are demanding better platform engineering (rather than “ML Ops” / DevOps) and the right abstraction over Kubernetes is necessary to achieve this.

Talk: Securing Models

Presenter:
Hudson Buzby, Solutions Architect, JFrog

About the Speaker:
Hudson Buzby is a solution engineer with a strong focus on MLOps and LLMOps, leveraging his expertise to help organizations optimize their machine learning operations and large language model deployments. His role involves providing technical solutions and guidance to enhance the efficiency and effectiveness of AI-driven projects.

Talk Track: Latest MLOps Trends

Talk Technical Level: 3/7

Talk Abstract:
Generative AI and machine learning models are reshaping industries but also introducing new security risks. Model marketplaces like Hugging Face or Ollama have become inundated with models that do not have trusted sources/authors and often contain vulnerabilities. Many organizations are struggling to formulate a strategy that safely allows their team to build and deploy open-source LLMs. This session explores the unique security challenges of ML systems in the GenAI era and provides actionable strategies to safeguard them. Learn why traditional approaches fall short and how to fortify your ML lifecycle to stay ahead in an evolving threat landscape.

What You’ll Learn:
It is essential for organizations to place guardrails around open source LLM development in a safe, scalable manner.

Talk: Building Multi-Cloud GenAI Platforms without The Pains

Presenter:
Romil Bhardwaj, Co-creator, SkyPilot

About the Presenter:
Romil Bhardwaj is the co-creator of SkyPilot, a widely adopted open-source project that enables running AI workloads seamlessly across multiple cloud platforms. He completed his Ph.D. in Computer Science at UC Berkeley’s RISE Lab, advised by Ion Stoica, focusing on large-scale systems and resource management for machine learning. Romil’s work, recognized with multiple patents, 1,100+ citations in top conferences, and awards such as the USENIX ATC 2024 Distinguished Artifact Award and ACM BuildSys 2017 Best Paper, builds on a strong foundation in both academia and industry. He was previously a contributor to the Ray project, and a Research Fellow at Microsoft Research, where he developed systems for machine learning and wireless networks, including award-winning projects and granted patents. He remains an active reviewer and speaker at leading systems and AI venues.

Talk Track: LLMs on Kubernetes

Technical Level: 2

Talk Abstract:
GenAI workloads are redefining how AI platforms are built. Teams can no longer rely on a single cloud to satisfy their GPU needs, infra costs are growing and productivity of ML engineers is paramount. Going multi-cloud secures GPU capacity, reduces costs and eliminates vendor lock-in, but introduces operational complexity that can slow down ML teams.

This talk is a hands-on guide to building a multi-cloud AI platform that unifies cloud VMs and Kubernetes clusters across Hyperscalers (AWS, GCP, and Azure), Neoclouds (Coreweave, Nebius, Lambda), and on-premise clusters into a single compute abstraction. We’ll walk through practical implementation details including workload scheduling strategies based on resource availability and cost, automated cloud selection for cost optimization, and handling cross-cloud data movement and dependency management. This approach lets ML engineers use the same interface for both interactive development sessions and large-scale distributed training jobs, enabling them to focus on building great AI products rather than wrestling with cloud complexity.

What You’ll Learn:
Multi-cloud solves GenAI’s capacity and cost challenges; the right abstraction layer makes it easy for infra teams and researchers alike.

Talk: Opening Pandora’s Box: Building Effective Multimodal Feedback Loops

Presenter:
Denise Kutnick, Co-Founder & CEO, Variata

About the Presenter:
Denise Kutnick is a technologist with over a decade of experience building multimodal systems and evaluation pipelines used by millions, with roles spanning large companies like Intel and high-growth startups like OctoAI (acquired by Nvidia). She is the Co-Founder and CEO of Variata, a company building AI that sees, thinks, and interacts like a user to run visual regression tests at scale and keep digital experiences reliable. Denise is passionate about tackling problems at the intersection of AI and UX.

Talk Track: Multimodal Systems in Production

Technical Level: 3

Talk Abstract:
AI market maps are overflowing with multimodal SDKs promising to blend vision, language, audio, and more into a seamless package. But when they fail in production, you may find yourself locked in without the visibility or tools to fix it.

In this talk, we’ll open the box and explore how to build and interpret multimodal feedback loops that keep complex AI systems healthy in production.

We’ll cover:
– Closed-box vs Open-box Workflows: How exposing intermediate signals in your agentic pipeline grants finer-grained control, faster debugging, and better calibration towards user needs.
– Defining the Right Evals: Why human-understandable checkpoints are essential for model introspection and human-in-the-loop review.
– Data Pipeline Building Blocks: Leveraging tooling such as declarative pipelines, computed columns, and batch execution to catch issues and surface improvements without slowing deployment.

What You’ll Learn:
Regardless of the model or SDKs you choose to build on top of, building the right scaffolding around it will open the box and give you control, visibility, and interpretability of your multimodal AI workflows.

Talk: Video Intelligence Is Going Agentic

Presenter:
James Le, Head of Developer Experience, TwelveLabs

About the Presenter:
James Le is currently leading Developer Experience at Twelve Labs – a startup building multimodal foundation models for video understanding. Previously, he has worked at the nexus of enterprise ML/AI and data infrastructure. He also hosted a podcast that features raw conversations with founders, investors, and operators in the space.

Talk Track: Multimodal Systems in Production

Technical Level: 4

Talk Abstract:
While 90% of the world’s data exists in video format, most AI systems treat video like static images or text—missing crucial temporal relationships and multimodal context. This talk explores the paradigm shift toward agentic video intelligence, where AI agents don’t just analyze video but actively reason about content, plan complex workflows, and execute sophisticated video operations.

Drawing from real-world implementations including MLSE’s 98% efficiency improvement in highlight creation (reducing 16-hour workflows to 9 minutes), this session demonstrates how video agents combine multimodal foundation models with agent architectures to solve previously intractable problems. We’ll explore the unique challenges of video agents—from handling high-dimensional temporal data to maintaining context across multi-step workflows—and showcase practical applications in media, entertainment, and enterprise video processing.

Attendees will learn how to architect video agent systems using planner-worker-reflector patterns, implement transparent agent reasoning, and design multimodal interfaces that bridge natural language interaction with visual media manipulation.

What You’ll Learn:
1. Why traditional approaches fail: Understanding the fundamental limitations of applying text/image AI techniques to video, and why agentic approaches are necessary for complex video understanding.

2. Video agent architecture patterns: How to design and implement planner-worker-reflector architectures that can maintain context across complex multi-step video workflows.

3. Practical implementation strategies: Real-world approaches to building transparent agent reasoning, handling multimodal interfaces, and orchestrating video foundation models.

4. Business impact and ROI: Concrete examples of dramatic efficiency improvements and how to identify high-impact use cases in their own organizations

Talk: Future of AI in Healthcare

Presenter:
Denys Linkov, Head of ML, Wisedocs

About the Presenter:
Denys Linkov is currently Head of ML at Wisedocs and a ML Startup Advisor and LinkedIn Learning Course Instructor. He’s worked with 50+ enterprises in their conversational AI journey, and his Gen AI courses have helped 150,000+ learners build key skills. He’s worked across the AI product stack, being hands-on building key ML systems, managing product delivery teams, and working directly with customers on best practices.

Talk Track: Scoping and Delivering Complex AI Projects

Technical Level: 1

Talk Abstract:
What AI products are working in production for health companies? In this panel we’ll cover the past, present, and future of AI in the healthcare industry. We’ll cover successes, trends, and customer outcomes. This panel will focus on thought leadership in the domain, covering:

1. Build vs Buy
2. AI Governance
3. Improving patient outcomes
4. Success stories
5. Gaps in tooling
6. Requests for solutions

What You’ll Learn:
How different industry leaders (executives) are thinking about POC vs Production use cases in insurance

Talk: Humans in the Loop: Designing Trustworthy AI Through Embedded Research

Presenter:
David Baum, UX Researcher & Design Strategist, Amazon

About the Presenter:
David Baum is a design strategist and UX researcher with over a decade of experience shaping AI-powered products at the intersection of human behavior, ethical design, and emerging technology. Currently leading UX research for Amazon Ads’ Generative AI portfolio, David works across disciplines to translate ambiguity into actionable insight, ensuring that cutting-edge models serve real human needs.

His past work spans healthcare, behavioral science, and enterprise innovation guiding product teams at organizations like Johnson & Johnson, Memorial Sloan Kettering, Cigna, and the U.S. Department of Veterans Affairs. David is especially focused on how AI reshapes cognition, decision-making, and user trust, and frequently explores the implications of AI on systems-level design, human-AI collaboration, and collective wellbeing.

He’s a frequent panelist and contributor on topics ranging from ethical AI to strategic foresight, and is known for his ability to bridge deeply technical domains with accessible, human-centered narratives.

Talk Track: Scoping and Delivering Complex AI Projects

Technical Level: 2

Talk Abstract:
As generative AI rapidly moves from lab to product, many teams are rushing to ship capabilities without understanding the lived experiences, risks, and edge cases that define real-world usage. This talk explores how embedding user research earlier–and more meaningfully–into AI development pipelines can do more than just mitigate harm. It can enhance product adoption, build user trust, and surface invisible needs that AI alone won’t catch.

Drawing on experience leading UX research for Amazon Ads’ generative AI portfolio and past work in healthcare, behavioral science, and public systems, I’ll show how user insights can serve as functional guardrails – shaping model boundaries, UI design, and feedback loops. We’ll also interrogate the frictionless design ethos that dominates AI tooling today, and ask: what does it mean to design for thoughtfulness rather than speed?

Whether you’re building AI-native products or adapting legacy systems, this talk will offer frameworks and provocations for making AI more accountable, more human, and more useful.

What You’ll Learn:
UX research is not just a validation tool, it’s a critical input to AI product strategy and model governance.

Friction isn’t failure: thoughtful UX friction can support better outcomes, greater user agency, and higher trust in AI systems.

Embedding research into AI workflows helps detect misalignment early, before launch, reducing risk and surfacing ethical blind spots.

Cross-functional collaboration (PMs, designers, engineers, scientists) must center the human, not just the model.

Designing for trust means understanding how users think, not just how models predict.

Talk: LLM Inference: A Comparative Guide to Modern Open-Source Runtimes

Presenter:
Aleksandr Shirokov, Team Lead MLOps Engineer, Wildberries

About the Presenter:
My name is Aleksandr Shirokov, and I am a T3 Fullstack AI Software Engineer with 5+ years of experience and team lead management competence. Currently, I lead the MLOps team in the RecSys department at Wildberries, a world-famous marketplace, launching AI products and building ML infrastructure and tools for 300+ ML engineers. My team and I support the full ML lifecycle, from research to production, and work closely with real user-facing products, directly impacting business metrics. See https://aptmess.io for more info.

Talk Track: LLMs on Kubernetes

Technical Level: 3

Talk Abstract:
In this session, we’ll share how our team built and battle-tested a production-grade LLM serving platform using vLLM, Triton TensorRT-LLM, Text Generation Inference (TGI), and SGLang. We’ll walk through our custom benchmark setup, the trade-offs across frameworks, and when each one makes sense depending on model size, latency, and workload type. We’ll cover how we implemented HPA for vLLM, reduced cold start times with Tensorize, co-located multiple vLLM models in a single pod to save GPU memory, and added lightweight SAQ-based queue wrappers for fair and efficient request handling. To manage usage and visibility, we wrapped all endpoints with Kong, enabling per-user rate limits, token quotas, and usage observability. Finally, we’ll share which LLM and VLM models are running in production today (we are serving DeepSeek R1‑0528 in production), and how we maintain flexibility while keeping costs and complexity in check. If you’re exploring LLM deployment, struggling with infra choices, or planning to scale up usage, this talk will help you avoid common pitfalls, choose the right stack, and design a setup that truly fits your use case.
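
As a minimal illustration of one piece of such a stack, the sketch below runs offline generation with vLLM and caps GPU memory per engine, which is the knob that makes co-locating several models in one pod possible. The model name, memory fraction, and prompt are placeholders, not the production configuration described in the talk.

```python
from vllm import LLM, SamplingParams

# gpu_memory_utilization bounds this engine's share of the GPU so another
# vLLM engine can be co-located in the same pod (values here are illustrative).
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct", gpu_memory_utilization=0.45)
params = SamplingParams(temperature=0.2, max_tokens=128)

outputs = llm.generate(
    ["Summarize the return policy for electronics in two sentences."],
    params,
)
for out in outputs:
    print(out.outputs[0].text)
```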

What You’ll Learn:
There’s no one-size-fits-all LLM serving stack – we’ve benchmarked, deployed, and optimized multiple runtimes in production, and we’ll share what works, when, and why, so you can build the right setup for your use case.

Prerequisite Knowledge:
Basic knowledge of NLP transformers, Python, and Docker

Talk: Architecting and Orchestrating AI Agents

Presenter:
Anish Shah, AI Engineer, Weights & Biases

About the Speaker:
Anish loves turning ML ideas into ML products. Anish started his career working with multiple Data Science teams within SAP, working with traditional ML, deep learning, and recommendation systems before landing at Weights & Biases. With the art of programming and a little bit of magic, Anish crafts ML projects to help better serve our customers, turning “oh nos” to “a-ha”s!

Talk Track: Agents in Production

Talk Technical Level: 3/7

Talk Abstract:
Attendees will learn about the current state of agents, with an emphasis on the problems faced in development, along with advice and tools to deal with those problems.

What You’ll Learn:
This is for beginners to advanced participants

Talk: Architecting a Deep Research System

Presenter:
Suhas Pai, CTO & Co-Founder, Hudson Labs

About the Speaker:
Suhas Pai is an NLP researcher and co-founder/CTO of Hudson Labs, a Toronto-based startup. At Hudson Labs, he works on text ranking, representation learning, and productionizing LLMs. He is also currently writing a book on Designing Large Language Model Applications with O’Reilly Media. Suhas has been active in the ML community, being the Chair of the TMLS (Toronto Machine Learning Summit) conference since 2021 and also NLP lead at Aggregate Intellect (AISC). He was also co-lead of the Privacy working group at Big Science, as part of the BLOOM open-source LLM project.

Talk Track: Agents in Production

Talk Technical Level: 3/7

Talk Abstract:
In the past year, several pioneering AI labs have launched powerful ‘Deep Research’ features that search extensively across a large number of data sources and produce comprehensive reports in response to user queries. In this talk, we will discuss the anatomy of such a system, focusing on the tradeoffs involved in building one, and examine promising architectural paradigms. We will also discuss the engineering and infrastructural considerations involved in building such systems.

What You’ll Learn:
1. Understand the potential of deep research systems and their components
2. Navigate through tradeoffs involved in building such systems
3. Learn architectural paradigms and best practices in building such systems

Talk: Gradio: The Web Framework for Humans and Machines

Presenter:
Freddy Boulton, Open Source Software Engineer, Hugging Face

About the Presenter:
Freddy Boulton, an Open Source Engineer at Hugging Face, brings six years of experience in developing tools that simplify AI sharing and usage. He’s a core maintainer of Gradio, an open-source Python package for building production-ready AI web applications. His latest work focuses on making Gradio applications MCP-compliant, enabling Python developers to create seamless, beautifully designed web interfaces for their AI models that integrate with any MCP client without additional configuration.

Talk Track: Augmenting Workforces with Agents

Technical Level: 2

Talk Abstract:
The Model Context Protocol (MCP) has ushered in a new paradigm, enabling applications to be accessible to AI agents. But shouldn’t these same applications be just as accessible and intuitive for humans? What if building a user-friendly interface for people could automatically create a powerful interface for machines too? This presentation introduces Gradio as The Web Framework for Humans and Machines. We’ll explore how Gradio allows developers to build performant and delightful web UIs for human users, while simultaneously, thanks to its automatic Model Context Protocol (MCP) integration, generating a fully compliant and feature-rich interface for AI agents.

Discover how Gradio simplifies the complexities of MCP, offering “batteries-included” functionality like robust file handling, real-time progress updates, and authentication, all with minimal additional effort. We’ll also highlight the Hugging Face Hub’s role as the world’s largest open-source MCP “App Store,” showcasing how Gradio-powered Spaces provide a vast ecosystem of readily available AI tools for LLMs. Join us to learn how Gradio uniquely positions you to develop unified AI applications that serve both human users and intelligent agents.
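
As a minimal illustration, the sketch below builds a small Gradio app and enables its MCP server mode so the same function is available to both humans and agents. It assumes a recent Gradio version with the MCP extra installed; the example function is ours, not from the talk.

```python
import gradio as gr

def letter_count(text: str, letter: str) -> int:
    """Count occurrences of a letter in a text (toy example function)."""
    return text.lower().count(letter.lower())

# The same Interface serves a human-facing web UI and, with mcp_server=True,
# exposes letter_count as an MCP tool to any MCP-compliant agent.
demo = gr.Interface(fn=letter_count, inputs=["text", "text"], outputs="number")
demo.launch(mcp_server=True)
```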

What You’ll Learn:
Developers can build performant, feature-rich UIs for AI models entirely in Python with Gradio. These apps can be easily shared with human users as well as plugged into any MCP-compliant AI agent. Write once, deploy for truly every possible user.

Talk: The Rise of Self-Aware Data Lakehouses

Presenter:
Srishti Bhargava, Software Engineer, Amazon Web Services

About the Speaker:
I’m Srishti! I’m a software engineer at AWS where I work on data platforms, focusing on systems like Apache Iceberg and SageMaker Lakehouse. I help teams build analytics and machine learning solutions that actually work at scale – turning messy data into something useful.
I really care about making data engineering more approachable. A lot of modern data tools feel unnecessarily complex, so I write about the practical stuff, how to keep tables performing well, handle schema changes gracefully, and build systems that don’t break in production.
Outside of work, I love hiking and catching sunrises when I can. I also spend a lot of time cooking – it’s how I relax and unwind. There’s something satisfying about taking simple ingredients and making something good with them. Some of my best ideas actually come to me while I’m in the kitchen, just taking things slow and enjoying the process.

Talk Track: Data Engineering in an LLM era

Talk Technical Level: 2/7

Talk Abstract:
If you’re managing more than 50 tables and a handful of data models, you’ve probably felt the pain. Schema changes break production. Impact analysis takes hours. New engineers spend weeks figuring out what data exists and how it connects.
In this session, we’ll show you how to build an AI assistant that understands your data platform. Not just another chatbot, but a system that can analyze your schemas, parse dependencies, and predict exactly which models will break when you change a column.
We’ll demonstrate a working implementation that extracts metadata from Apache Iceberg tables, analyzes SQL dependencies, and creates an AI assistant that answers questions like: Which tables are burning through our storage budget? What’s the blast radius if this critical system goes down? Where is all our customer PII hiding across 500 tables? Which data pipelines haven’t been touched in months and might be zombie processes? Which tables in the data lakehouse can benefit from Iceberg compaction? This is analysis that would otherwise take days of manual detective work and complex queries. The result is a powerful, natural language interface for data discovery.
Attendees will see live examples of querying table schemas and identifying datasets using simple English prompts, leaving with a practical blueprint for leveraging LLMs to unlock the full potential of their data infrastructure in production settings.
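
As a minimal illustration of the metadata-extraction step, the sketch below reads schema and snapshot information from an Iceberg table with pyiceberg. The catalog settings and table name are placeholders, and the talk’s actual implementation may differ.

```python
from pyiceberg.catalog import load_catalog

# Catalog name, type, URI, and table identifier are placeholders for your setup.
catalog = load_catalog("default", **{"type": "rest", "uri": "http://localhost:8181"})
table = catalog.load_table("analytics.orders")

# Schema and snapshot metadata become the raw material for an LLM-friendly summary.
fields = [(f.name, str(f.field_type)) for f in table.schema().fields]
snapshot = table.current_snapshot()
summary = {
    "table": "analytics.orders",
    "columns": fields,
    "last_updated_ms": snapshot.timestamp_ms if snapshot else None,
}
print(summary)  # feed summaries like this into embeddings / the assistant's context
```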

What You’ll Learn:
1. The metadata problem is getting worse, not better: as organizations store large amounts of data across complex systems, it becomes harder to derive non-trivial insights from that data.
2. LLMs can actually understand your data architecture.
3. Small and simple changes in how you structure your tables can be extremely beneficial for your organization.
4. This approach scales – manual approaches don’t. At 10 tables, spreadsheets or manual queries work fine, but at the scale organizations operate today, only an LLM-powered approach can keep up with the complexity.
5. This approach can be integrated into existing systems today. We’ll show you how to extract metadata from real Apache Iceberg tables, analyze dependencies, create embeddings, and build systems that work with your current data stack.
6. Metadata contains way more business value than we realize. The schemas, dependencies and usage patterns tell stories about performance bottlenecks, governance gaps, and business impact that most of us are completely missing.

Talk: What’s Next in the Agent Stack

Presenter:
Shelby Heinecke, Senior AI Research Manager, Salesforce

About the Presenter:
Dr. Shelby Heinecke is a pioneering leader in AI, renowned for her transformative research, engineering excellence, and dynamic thought leadership. With over 35 influential AI research publications, she has made significant contributions to the field, driving innovation and shaping the future of AI.

Shelby is currently a Senior AI Research Manager at Salesforce, leading a team innovating in AI Agents (including multi-agent systems and large action models), On-Device AI, and Small Language Models, all aimed at revolutionizing Salesforce products. Her passion for fostering technical talent and cultivating collaborative environments empowers her team to achieve breakthrough advancements. Her team’s released contributions in agentic AI span large action models (xLAM models), AgentLite, multi-modal action model, TACO, and many research papers spanning agentic data generation and agent training.
Shelby holds a Ph.D. in Mathematics from the University of Illinois at Chicago, with a specialization in machine learning theory. She also earned an M.S. in Mathematics from Northwestern University and a B.S. in Mathematics from the Massachusetts Institute of Technology (MIT). To learn more about Shelby’s work and vision, visit www.shelbyh.ai.

Talk Track: Agents in Production

Technical Level: 2

Talk Abstract:
What does it take to go from promising prototype to production-ready AI agent?

In this talk, I’ll break down the emerging agent stack, including robust prompt generation with Promptomatix, protocol-level evaluation with MCPEval, multimodal reasoning with TACO, fast function-calling with xLAM, and more. Each layer targets a critical bottleneck in reliability, reasoning, or scale.

You’ll get a behind-the-scenes look at the research shaping these tools, and a blueprint for the next generation of enterprise-ready agents.

What You’ll Learn:
– Evals, latency, and prompt optimization are crucial to high performing agents
– Sharing links to open source repos/models to get started in these directions

Talk: Building Effective Agents

Presenter:
Sushant Mehta, Senior Research Engineer, Google DeepMind

About the Presenter:
Sushant is a senior research engineer at Google DeepMind, working on post-training to improve Coding capabilities in frontier Large Language Models.

Talk Track: Evolution of Agents

Technical Level: 3

Talk Abstract:
Large language models can now power capable software agents, yet real‑world success comes from disciplined engineering rather than flashy frameworks. Reliable agents are built from simple, composable patterns instead of heavy abstractions.

The talk will introduce several patterns that add complexity / autonomy only when it pays off:

1. Augmented LLM (retrieval, tools, memory) as the atomic building block
2. Workflow motifs: prompt chaining, routing, parallelization, etc., with concrete criteria and implementation tips (a minimal chaining sketch follows this list)
3. Autonomous agents that loop through plan‑act‑observe‑reflect cycles to tackle open‑ended tasks
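
As a minimal illustration of the first workflow motif, the sketch below chains prompts so each step’s output feeds the next; `call_llm` is a hypothetical stand-in for whatever model client you use.

```python
from typing import Callable

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for your model client (OpenAI, Anthropic, local, ...).
    return f"[model output for: {prompt[:40]}...]"

def chain(steps: list[str], user_input: str, llm: Callable[[str], str] = call_llm) -> str:
    """Prompt chaining: each step's output becomes the next step's input."""
    result = user_input
    for template in steps:
        result = llm(template.format(input=result))
    return result

steps = [
    "Extract the key requirements from this request:\n{input}",
    "Draft an implementation plan for these requirements:\n{input}",
    "Review the plan and list risks or missing guardrails:\n{input}",
]
print(chain(steps, "Build an agent that files expense reports from receipts."))
```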

What You’ll Learn:
Attendees will leave with a practical decision framework for escalating from a single prompt to multi‑step agents, keeping in mind robust guardrails for shipping trustworthy, cost‑effective agents at scale.

Talk: From Hello to Repayment: Voice AI in African Finance

Presenter:
Remy Muhire, CEO, Pindo.ai

About the Speaker:
Remy Muhire is the Co-Founder and CEO of Pindo, a Voice AI startup helping banks and fintechs deliver services in local African languages. Previously, he led voice technology initiatives at Mozilla and co-founded the fintech startup Exuus. Passionate about digital inclusion, Remy is dedicated to breaking barriers of literacy and language so that underserved communities can access essential financial services.

Talk Track: Agents in Production

Talk Technical Level: 1/7

Talk Abstract:
In Africa, literacy and language barriers still limit access to financial services. This session will explore how Voice AI in local languages can transform loan applications and debt recovery—making credit more accessible while improving repayment rates. Drawing from early development work and upcoming pilots in East Africa, we’ll share insights on how banks, fintechs, and SACCOs can leverage conversational AI to engage customers more effectively, from the first “hello” to the final repayment.

What You’ll Learn:
Break barriers: How Voice AI bridges literacy and language gaps in African finance.

Reimagine credit journeys: From loan applications to debt recovery through conversational voice flows.

Unlock inclusion at scale: Early pilots in East Africa show the path to higher engagement and repayment.

Talk: Adversarial Threats Across the ML Lifecycle: A Red Team Perspective

Presenter:
Sanket Badhe, Senior Machine Learning Engineer, TikTok

About the Presenter:
Sanket Badhe is a seasoned Machine Learning Engineer with over 8 years of experience specializing in fraud and spam detection, offensive AI, large-scale ML systems, and LLM applications. He currently leads key ML initiatives at TikTok, driving the development of robust spam detection systems across the platform. Sanket holds a Master’s in Data Science from Rutgers University and a B.Tech from IIT Roorkee, with prior experience at Oracle, Red Hat, and Fuzzy Logix.

Talk Track: ML Lifecycle Security

Technical Level: 2

Talk Abstract:
As machine learning systems become deeply embedded in critical applications, ranging from finance and healthcare to content moderation and national security, their attack surface expands across the entire ML lifecycle. This talk presents a red team perspective on adversarial threats targeting each phase of the ML pipeline: from data poisoning during collection and labeling, to model theft and evasion in deployment, and manipulation of feedback loops post-launch. We explore real-world case studies and cutting-edge research, demonstrating how adversaries exploit blind spots in ML development and MLOps workflows. Attendees will gain a structured threat model, understand key attack vectors, and learn practical red teaming and hardening strategies to proactively secure ML systems.

What You’ll Learn:
1. ML systems are vulnerable at every stage of the lifecycle (data, training, deployment, feedback).
2. Adversarial threats vary by stage: data poisoning, model evasion, model extraction, prompt injection, and feedback manipulation.
3. Red teaming ML requires specialized tools and methods distinct from traditional security testing.
4. Data and feedback loops are high-risk, often-overlooked entry points for attackers.
5. Security must be proactive and continuous, not an afterthought post-deployment.
6. Monitoring, validation, and isolation mechanisms are essential across the pipeline.
7. Cross-functional collaboration between ML, security, and DevOps teams is critical.

Talk: Story is All You Need

Presenter:
Lin Liu, Director, Data Science, Wealthsimple

About the Presenter:
As Director of Data Science at Wealthsimple, Lin Liu architects AI/ML solutions that power the future of finance. His experience includes leading AI/ML consulting engagements for AWS clients at Amazon and creating flagship fraud and credit models for Capital One Canada. A patented inventor in credit scoring, Lin specializes in building scalable AI/ML solutions that bridge the gap between data science and tangible business value.

Talk Track: Scoping ML Projects in an AI Era

Technical Level: 2

Talk Abstract:
In the age of Generative AI, what if the most complex feature engineering could be replaced by simple storytelling? This talk introduces a novel paradigm for predictive analytics that challenges traditional modeling workflows. We demonstrate a powerful technique: translating raw, structured data—like transaction logs or application usage data—into coherent, text-based narratives, or “stories.”

We then feed these stories directly into Large Language Models (LLMs) and prompt them for a predictive score. This approach leverages the deep contextual understanding of LLMs to perform tasks that typically require bespoke models and intricate feature engineering.

We will explore real-world case studies, demonstrating how “stories” crafted from credit card transactions can accurately predict major life events. Similarly, we’ll show how narratives of a user’s app behavior can enable an LLM to detect subtle anomalies indicative of fraud, outperforming brittle, rule-based systems.

Join us to discover how transforming your data into stories can unlock a new frontier of predictive power and operational efficiency.
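
As a minimal illustration of the data-as-story idea, the sketch below renders a few toy transaction rows as a narrative and wraps them in a scoring prompt; the columns, wording, and prediction target are our own assumptions, not the case studies from the talk.

```python
import pandas as pd

transactions = pd.DataFrame({
    "date": ["2024-03-01", "2024-03-02", "2024-03-03"],
    "merchant": ["Home Depot", "IKEA", "Crate & Barrel"],
    "amount": [412.18, 889.40, 265.99],
})

def to_story(df: pd.DataFrame) -> str:
    """Render structured rows as a short narrative the LLM can reason over."""
    lines = [
        f"On {r.date}, the customer spent ${r.amount:.2f} at {r.merchant}."
        for r in df.itertuples()
    ]
    return " ".join(lines)

prompt = (
    "Here is a customer's recent activity: "
    + to_story(transactions)
    + " On a scale of 0 to 1, how likely is this customer to have recently moved home? "
      "Answer with a single number."
)
print(prompt)  # send `prompt` to your LLM of choice and parse the numeric score
```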

What You’ll Learn:
Attendees will leave with a practical framework for applying this “data-as-story” technique, understanding how it can radically simplify the MLOps pipeline and unlock the power of LLMs on classic predictive analytics problems.

Talk: The Efficiency Equation: Leveraging AI Agents to Augment Human Labelers in Building Trust and Safety Systems at Scale

Presenter:
Madhu Ramanathan, Principal Group Engineering Manager, Trust, Safety and Intelligence, Microsoft

About the Presenter:
Madhu Ramanathan is a seasoned engineering and applied science leader with over 13 years of experience building AI-powered systems at Microsoft, Meta and Amazon. She has led globally distributed teams in trust, safety, content intelligence, and search, delivering responsible AI solutions that impact millions of users worldwide. Passionate about trust, safety, and ethical innovation, she brings a practitioner’s lens to productionizing cutting-edge, trustworthy AI solutions to solve real-world problems at scale.

Talk Track: Augmenting Workforces with Agents

Technical Level: 2

Talk Abstract:
In today’s digital ecosystem, Trust & Safety systems face mounting challenges—from content proliferation, real-time enforcement demands, and cost-savings pressure—complicated further by evolving threats like deepfakes, AI-generated hallucination, misinformation, and adversarial behavior. Because this is a defensive space with ever-evolving threats, human labeling has been crucial in this domain for measurement, data collection to train models, real-time enforcement, reactive takedowns, and appeals – but it comes with costs in the millions for scaled applications. This keynote explores how LLMs and AI agents are reshaping this landscape, augmenting human labelers and offering scalable, cost-efficient, and high-quality solutions that optimize across defect rates, precision, and operational cost at unprecedented speeds.

In particular, the talk will cover the following –
– A brief introduction to Trust and Safety systems, metrics and emerging threats such as deepfakes, hallucination, misinformation in the evolving GenAI landscape
– The role of human labelers in the traditional Trust and Safety lifecycle across measurement, data collection, proactive enforcements and reactive takedowns, the challenges in having large scale human labeler dependencies and the costs involved
– Case study of how LLMs/Agents are used in each stage such as measurement, enforcement and reactive takedowns with real world examples and the impact on Defect rate, Precision, Cost at each stage.
– Deep dive on continuous evaluation and calibration techniques of the LLM/Agentic judges used in the measurement flow and enforcement flow using humans-in-the-loop and Auto tuners for prompt tuning.
– Challenges faced and solutions, such as: a) pitfalls from using the same agent for measurement and enforcement, and the solution of using agentic + HI flows for measurement; b) handling constant model migrations in the product; c) cost and GPU constraints in deploying LLMs at scale, and the evolution into distilled SLM models using LLM-based teacher models.
– Finally, a summary of learnings on how the AI evolution of the last couple of years has brought new challenges to this space but has also provided the ability to solve those problems by smartly combining HI and AI.

What You’ll Learn:
Trust & Safety is entering a new era – where the rapid AI evolution has brought in mounting challenges such as deepfakes, hallucination, misinformation along with budget cuts and hyper agility demands. Fortunately, the AI evolution has also enabled powerful solutions to those problems —one where human judgment and AI intelligence must co-evolve. This talk will give the attendees a deep dive on real world hybrid systems built at scale that are not only scalable and cost-effective but also resilient, ethical, and continuously improving to defend against the evolving challenges.

Talk: A Practical Field Guide to Optimizing the Cost, Speed, and Accuracy of LLMs for Domain-Specific Agents

Presenter:
Niels Bantilan, Chief ML Engineer, Union.ai

About the Presenter:
Niels is the Chief Machine Learning Engineer at Union, a core maintainer of Flyte, an open source workflow orchestration tool, and creator of Pandera, a data validation and testing tool for dataframes. His mission is to help data science and machine learning practitioners be more productive. He has a Masters in Public Health Informatics, and prior to that a background in developmental biology and immunology. His research interests include reinforcement learning, NLP, ML in creative applications, and fairness, accountability, and transparency in automated systems.

Talk Track: Agents in Production

Technical Level: 3

Talk Abstract:
As the dust settles from the initial boom of applications using hosted large language model (LLM) APIs, engineering teams are discovering that while LLMs get you to a working demo quickly, they often struggle in production with latency spikes, context limitations, and explosive compute costs. This session provides a practical roadmap for navigating not only the experiment-to-production gap using small language models (SLMs), but also the AI-native orchestration strategies that will get you the most bang for your buck.
We’ll explore how SLMs (models that range from hundreds of millions to a few billion parameters) offer a compelling alternative for domain-specific applications by trading off the generalization power of LLMs for significant gains in speed, cost-efficiency, and task-specific accuracy. Using the example of an agent that translates natural language into SQL database queries, this session will demonstrate when and how to deploy SLMs in production systems, how to progressively swap out LLMs for SLMs while maintaining quality, and which orchestration strategies help you customize and maintain SLMs in a cost-effective way.

Key topics include:
– Identifying key leverage points: Which LLM calls should you swap out for SLMs first? We’ll cover how to identify speed, cost, and accuracy leverage points in your AI system so that you can speed up inference, reduce cost, and maintain accuracy.
– Speed Optimization: It’s not just about the speed of inference, which SLMs already excel at, it’s also about accelerating experimentation when you fine-tune and retrain SLMs on a specific domain/task. We’ll cover parallelized optimization runs, intelligent caching strategies, and task fanout techniques for both prompt and hyperparameter optimization.
– Cost Management: Avoiding common pitfalls that negate SLMs’ cost advantages, including resource mismatching (GPU vs CPU workloads), infrastructure provisioning inefficiencies, and idle compute waste. Attendees will learn resource-aware orchestration patterns that scale to zero and recover gracefully from failures.
– Accuracy Enhancement: Maximizing domain-specific performance by implementing the equivalent of “AI unit tests” and incorporating them into your experimentation and deployment pipelines. We’ll cover how this can be done with synthetic datasets, LLM judges, and deterministic evaluation functions that help you catch regressions early and often (a minimal example of such a deterministic check follows this list).
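
As a minimal illustration of a deterministic evaluation function for the NL-to-SQL agent, the sketch below treats a generated query as passing when it returns the same rows as a hand-written reference query on a toy SQLite database; the schema and queries are assumptions for illustration.

```python
import sqlite3

def exec_rows(db: sqlite3.Connection, sql: str):
    """Execute a query and return its result set as a sorted list of tuples."""
    return sorted(db.execute(sql).fetchall())

def sql_matches(db: sqlite3.Connection, candidate_sql: str, reference_sql: str) -> bool:
    """Deterministic 'AI unit test': the generated query passes if it returns
    the same rows as the hand-written reference query."""
    try:
        return exec_rows(db, candidate_sql) == exec_rows(db, reference_sql)
    except sqlite3.Error:
        return False

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER, region TEXT, total REAL)")
db.executemany("INSERT INTO orders VALUES (?, ?, ?)",
               [(1, "EU", 10.0), (2, "US", 25.0), (3, "EU", 7.5)])

print(sql_matches(db,
                  "SELECT region, SUM(total) FROM orders GROUP BY region",
                  "SELECT region, SUM(total) AS t FROM orders GROUP BY region"))  # True
```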

What You’ll Learn:
Attendees will leave with actionable strategies for cost-effective AI deployment, a decision framework for SLM adoption, and orchestration patterns that compound the value of smaller models in domain-specific applications.

Talk: Your Infrastructure Just Got Smarter: AI Agents in the DevOps Loop

Presenter:
Kishan Rao, Engineering Manager, Delivery and Automation Platform, Okta

About the Speaker:
I’m an Engineering Manager with a background in backend systems, platform engineering, and infrastructure automation, currently focused on how AI agents can reshape developer workflows. With over 8 years of experience building CI/CD pipelines, internal platforms, and scalable infrastructure at cloud-first companies, I’ve seen firsthand how operational complexity can slow down engineering teams.

My recent work explores the intersection of DevOps and AI agents—designing tools that intelligently interpret infrastructure-as-code, reduce toil, and guide developers through their environments with contextual awareness. I’m passionate about building agentic systems that augment developer cognition, shorten feedback loops, and turn codebases into living documentation.

I care deeply about developer velocity, system reliability, and creating engineering environments where teams can move quickly without sacrificing quality. Based in San Francisco, I’m excited to contribute to the conversation on how autonomous AI tooling is changing the way we build, ship, and maintain software.

Talk Track: AI Agents for Developer Productivity

Talk Technical Level: 2/7

Talk Abstract:
Modern infrastructure is rich, dynamic, and deeply complex. Yet most engineering teams still rely on manual processes and tribal knowledge to navigate it. In this talk, I explore how AI agents are transforming DevOps by becoming part of the loop. They’re not just automating tasks but interpreting infrastructure-as-code, understanding system context, and guiding developers through their environments.

We’ll examine how AI-native workflows are emerging across the DevOps lifecycle, from documentation generation and config reasoning to incident triage and deployment planning. I’ll share implementation patterns from my experience in platform engineering and backend systems, including how to design agents that interact with code, tools, and people with minimal friction.

You’ll leave with a practical understanding of how to embed AI agents into your stack, the trade-offs of using local versus cloud LLMs, and how this shift can change the speed, clarity, and confidence with which your teams ship code.

What You’ll Learn:
AI at scale in production does not need to be hard, but you do need to think deeply about the outcome you are trying to achieve with AI in the loop.

Talk: I Tried Everything: A Pragmatist's Guide to Building Knowledge Graphs from Unstructured Data

Presenter:
Alessandro Pireno, Founder, Stealth Company

About the Speaker:
Alessandro Pireno is an AI and Data Product leader with a 15-year track record of scaling innovative data infrastructure companies. His career is distinguished by a unique 360-degree perspective gained from leading Engineering, Product, and Sales Engineering teams at hyper-growth startups like Snowflake and SurrealDB. He played a pivotal role in building the technical GTM engine that established Snowflake’s early enterprise dominance and more recently architected the product and GTM strategy for SurrealDB’s AI and vector capabilities. His open-source work includes proofs-of-concept for Retrieval-Augmented Generation with Knowledge Graphs (surrealdb-rag) and techniques for graph extraction (graph-examples). Currently, he is building a new stealth project to automate knowledge graph generation using an agentic framework that leverages diverse techniques from NLP to in-database search.

Talk Track: Data Engineering in an LLM era

Talk Technical Level: 3/7

Talk Abstract:
Traditional ETL pipelines are breaking under the demands of LLMs. They excel at structured data, but fail when confronted with the unstructured documents and implicit relationships that give AI its context. To solve this, we must evolve from ETL to “KG-ETL”—pipelines that build knowledge graphs as a first-class output. This session is a pragmatic guide to three competing pipeline architectures for building KGs from raw data. We’ll explore using LLM prompts as a new ‘T’ in your pipeline, contrast it with traditional NLP pipelines, and deep-dive into a novel hybrid retrieval workflow that uses vector stores for something beyond semantic search: high-precision entity resolution. You’ll leave with a framework for choosing the right pipeline for your data, moving beyond simple RAG to build truly context-rich AI systems.
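
As an illustration of the hybrid approach (vector embeddings used for entity resolution rather than semantic search), the sketch below trains a small fastText model on cleansed names and compares name variants by cosine similarity. The cleaning rules, sample names, and threshold are hypothetical and not taken from the talk.

```python
# Sketch: fastText embeddings of cleansed entity names for high-precision entity resolution.
# Assumes the `fasttext` package; cleaning rules, sample names, and threshold are illustrative.
import re
import tempfile
import fasttext
import numpy as np

def cleanse(name: str) -> str:
    # Lowercase, strip punctuation, and drop common legal suffixes (toy rules).
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return re.sub(r"\b(inc|llc|ltd|corp)\b", " ", name).strip()

raw_names = ["Acme, Inc.", "ACME Incorporated", "Globex Corp", "Globex Corporation"]
cleaned = [cleanse(n) for n in raw_names]

# Train a small character-n-gram skipgram model on the cleansed names.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("\n".join(cleaned))
    corpus_path = f.name
model = fasttext.train_unsupervised(corpus_path, model="skipgram",
                                    dim=50, minn=2, maxn=4, minCount=1)

def embed(name: str) -> np.ndarray:
    vec = model.get_sentence_vector(cleanse(name))
    return vec / (np.linalg.norm(vec) + 1e-9)

# In the full workflow these vectors go into a vector store and candidates are retrieved
# by nearest-neighbour search; here we simply compare one pair directly.
similarity = float(np.dot(embed("Acme Inc"), embed("ACME Incorporated")))
print(f"similarity: {similarity:.2f}")  # pairs above a tuned threshold resolve to one entity
```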

What You’ll Learn:
Design and contrast three distinct data pipeline architectures for knowledge graph construction: LLM-prompt-based, traditional NLP-based, and a hybrid vector search-based model.

Evaluate the cost, latency, scalability, and observability trade-offs of each pipeline pattern, helping you select the right approach for your MLOps environment.

Learn a novel, operational technique for using vector stores beyond semantic search—by training a custom fasttext model on cleansed names to create embeddings for high-precision, scalable entity resolution.

Receive a decision framework for selecting the right KG-ETL pipeline based on your source data’s structure (unstructured, semi-structured, or structured) and your project’s specific requirements.

Talk: Productizing Generative AI at Google Scale: Lessons on Scoping and Delivering AI-Powered Editors

Presenter:
Kelvin Ma, Staff Software Engineer, Google Photos

About the Presenter:
Kelvin Ma is a Staff Software Engineer and a Technical Lead for the Creative Expressions team at Google Photos. As a founding engineer of the team responsible for all machine learning and editing features, he helps build and scale the tools that allow hundreds of millions of users to relive their most important memories. He is passionate about designing and building foundational infrastructure for on-device machine learning, solving complex technical challenges to create simple and intuitive products that operate at the intersection of technology and human connection.

Talk Track: Scoping and Delivering Complex AI Projects

Technical Level: 2

Talk Abstract:
Go behind the scenes of Google Photos’ Magic Editor, a premier example of productizing cutting-edge generative AI into a billion-user application. This talk will demystify the scoping and delivery of complex AI-powered editors, detailing the engineering feats required to integrate multimodal AI models that blend on-device and server-side processing for global scale. Attendees will gain actionable insights and hard-won lessons on navigating the practical challenges of AI product development, from initial concept to successful deployment and scaling across hundreds of millions of users and diverse device types.

What You’ll Learn:
Shipping AI-powered software products requires a deeper understanding of engineering, product, UX, and other concerns across multiple disciplines and roles. There is no longer one correct answer, but rather a series of trade-offs and balances to rein in the capabilities of LLMs and provide value to users.

Talk: Shipping AI That Works

Presenter:
Nicholas Luzio, AI Solutions Lead, Arize AI

About the Speaker:
Nick Luzio is an AI Solutions Lead at Arize AI.

Talk Track: LLM Observability

Talk Abstract:
Observability and evaluation are critical to knowing whether an agent is working—and why. In this talk, we’ll examine the challenges of building agents that operate reliably in practice. We’ll explore approaches for evaluating and refining agents during development, as well as monitoring and debugging them once deployed—sharing practical lessons and tools that help teams accelerate their work while maintaining trust in their systems.

What You’ll Learn:
How to ensure agents work reliably at scale with observability and evaluation, and how Arize can help.

Talk: Beyond the Vibe: Eval Driven Development

Presenter:
Robert Shelton, Applied AI Engineer, Redis

About the Speaker:
Robert is a builder with a background in data science and full stack engineering. As an Applied AI Engineer at Redis, he focuses on bridging the gap between AI research and real-world applications. In open source, he helps maintain the Redis Vector Library and contributes to integrations with LangChain, LlamaIndex, and LangGraph. He has delivered workshops and consulting engagements for multiple Fortune 50 companies and has spoken at conferences including PyData and CodeMash.

Talk Track: Scoping and Delivering Complex AI Projects

Talk Abstract:
AI systems are probabilistic, which makes “what’s better?” a deceptively hard question. Teams often chase silver bullets—new models, chunking tricks, retrieval hacks—without knowing what’s really moving the needle. The result: endless guessing, little confidence. Enter eval-driven development: a way to ground experimentation in metrics, define success up front, and turn every guess into a measurable signal. This talk shows how shifting from vibes to evals transforms the way we build with AI.
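
A minimal sketch of what “eval-driven” can mean in practice: a fixed eval set and scoring function defined up front, so every prompt, model, or retrieval change is compared on the same number. The cases and the keyword-based metric below are illustrative placeholders, not the speaker’s framework.

```python
# Sketch of eval-driven development: the metric and eval set are fixed up front, so every
# prompt, model, or retrieval change is scored against the same cases. All names are illustrative.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class EvalCase:
    question: str
    expected_keywords: List[str]  # a deliberately simple definition of "success"

EVAL_SET = [
    EvalCase("What is our refund window?", ["30 days"]),
    EvalCase("Which plan includes SSO?", ["enterprise"]),
]

def keyword_score(answer: str, case: EvalCase) -> float:
    hits = sum(kw.lower() in answer.lower() for kw in case.expected_keywords)
    return hits / len(case.expected_keywords)

def run_eval(generate: Callable[[str], str]) -> float:
    """Mean score over the eval set; compare this number across experiments."""
    return sum(keyword_score(generate(c.question), c) for c in EVAL_SET) / len(EVAL_SET)

if __name__ == "__main__":
    baseline = lambda q: "Refunds are accepted within 30 days."
    print(f"baseline score: {run_eval(baseline):.2f}")  # a candidate change must beat this number
```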

What You’ll Learn:
How to think about probabilistic system design and evaluation.

Talk: SLMs + Fine-Tuning: Building the Infrastructure for Multi-Agent Systems

Presenter:
Mariam Jabara, Senior Field Engineer, Arcee AI

About the Presenter:
Mariam has been in the AI space for the last 5 years, in both academic and professional capacities. Currently, she works as a Senior Field Engineer for Arcee AI, the pioneers of Small Language Models (SLMs) who are now offering SLM-powered agentic AI solutions. She has previous experience in AI engineering, sales, and research at companies such as Google Research and Deloitte. Her values are rooted in building community, advancing diversity and inclusion, and using AI responsibly to solve problems and contribute to the betterment of society.

Talk Track: Agents in Production

Talk Abstract:
Enterprises are discovering the limits of massive general-purpose LLMs: high costs, heavy infrastructure, and security risks when sensitive data leaves controlled environments. Small Language Models (SLMs) offer a practical alternative.

In this talk, I’ll share lessons from building and fine-tuning SLMs, including our release of a new small foundation model. I’ll show how SLMs enable domain-specific performance, stronger security through local deployment, and why they often outperform larger models in multi-agent workflows with lower latency and higher reliability.

Attendees will leave with a clear view of why SLMs are the best candidates to power multi-agent systems, balancing performance, cost, and trustworthiness for real-world MLOps.

What You’ll Learn:
Small Language Models are the best way to power multi-agent systems because they deliver domain-specific performance, stronger security, and greater efficiency than large general-purpose LLMs.

Talk: The Real Problem Building Agentic Applications (And How MLOps Solves It)

Presenter:
Alexej Penner, Founding Engineer, ZenML

About the Speaker:
As a founding engineer at ZenML, Alexej is at the forefront of solving today’s MLOps challenges. His journey began in the trenches of ML, building everything from object detection models for edge devices to complex forecasting systems. After leading AI product development at the data labeling company Datagym, he saw the critical need for better MLOps tooling and joined ZenML. There, he now drives core product development, guides its direction, and works hands-on with users to bring their ML projects to life.

Talk Abstract:
For years, we’ve honed the MLOps playbook to turn fragile ML models into reliable production systems. We learned that success depends on principles like modularity, reproducibility, and lineage. Now, with the rise of LLMs, we’re facing a new wave of brilliant but chaotic prototypes. The core question is: do we throw away our playbook, or do we evolve it?

This session argues for evolution. We’ll demonstrate how the hard-won principles of MLOps provide the perfect foundation for the emerging world of LLM Ops. We’ll take a simple LLM prototype and, in a live demo, transform it into a structured ZenML pipeline. Then, we’ll showcase the next step in the MLOps journey: serving the entire pipeline as a live, interactive API endpoint. We will explore this endpoint directly from the ZenML dashboard, showing how to inspect it, run sample invocations, and get the full traceability of a classic MLOps pipeline for every single interactive call. This is the roadmap for what your AI platform could be.
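
For a sense of what that transformation can look like, here is a minimal sketch using ZenML’s step and pipeline decorators; the step bodies are placeholders, not the demo’s actual code.

```python
# Minimal sketch of wrapping an LLM prototype as a ZenML pipeline using the @step/@pipeline
# decorators; the step bodies are placeholders, not the live demo's code.
from zenml import pipeline, step

@step
def load_documents() -> list:
    return ["doc one", "doc two"]  # stand-in for the real data source

@step
def build_prompt(docs: list, question: str) -> str:
    context = "\n".join(docs)
    return f"Answer using the context below.\n{context}\n\nQuestion: {question}"

@step
def call_llm(prompt: str) -> str:
    # Placeholder for the actual model call (hosted API or local model).
    return f"(model answer for a prompt of {len(prompt)} characters)"

@pipeline
def rag_prototype(question: str = "What is in the docs?"):
    docs = load_documents()
    prompt = build_prompt(docs, question)
    call_llm(prompt)

if __name__ == "__main__":
    # Each run is versioned and visible in the ZenML dashboard, which is also where a
    # served pipeline endpoint can be inspected and invoked.
    rag_prototype()
```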

Talk: Agentic Metaflow in Action

Presenter:
Ville Tuulos, Co-Founder, CEO, Outerbounds

About the Speaker:
Ville Tuulos is the co-founder and CEO of Outerbounds, a platform that empowers enterprises to build production-ready, standout AI systems. He has been building infrastructure for machine learning and AI for over two decades. Ville began his career as an AI researcher in academia, authored Effective Data Science Infrastructure, and has held leadership roles at several companies—including Netflix, where he led the team that created Metaflow, a widely adopted open-source framework for end-to-end ML and AI systems.

Talk Abstract:
We will show a live demo of the new agentic features of Metaflow!

Talk: A Simple Recipe for LLM Observability

Presenter:
Claire Longo, Lead AI Researcher, Comet

About the Presenter:
Claire Longo is an AI leader and Mathematician with over a decade of experience in Data Science and AI. She has led cross-functional AI teams at Twilio, Opendoor, and Arize AI and is currently a Lead AI Researcher at Comet. She holds a Bachelor’s in Applied Mathematics and a Master’s in Statistics from The University of New Mexico. Beyond her technical work, Claire is a Speaker, Advisor, YouTuber, and Poker Player. She is dedicated to mentoring Engineers and Data Scientists while championing diversity and inclusion in AI. Her mission is to empower the next generation of AI practitioners.

Talk Abstract:
Developing LLM-based applications for production requires a new approach to monitoring. Unlike traditional software, these probabilistic systems can hallucinate, drift, or degrade in unpredictable ways. The best way to learn AI concepts is by tinkering hands-on, so I built an LLM-powered recipe generator to help with my home cooking and set up an end-to-end monitoring strategy to keep it on budget and behaving as expected. In this talk, I’ll walk through how I configured traces in this project using Comet’s open-source tool Opik to track cost and quality. I’ll also show how I built custom business metrics with LLM-as-a-Judge to capture issues specific to the recipe generator, and how the same approach can be adapted to your own use case when out-of-the-box metrics fall short. With a few adaptable code snippets and a simple framework, you’ll leave knowing how to add robust observability to your own LLM apps, making it easier to detect, debug, and improve systems at scale.
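
As a taste of the kind of snippet the talk promises, here is a minimal sketch of tracing a recipe-generation flow with Opik’s @track decorator; the generator and the LLM-as-a-Judge scorer are placeholder stubs, not the project’s real implementation.

```python
# Minimal sketch of tracing with Opik's @track decorator; the recipe generator and the
# LLM-as-a-Judge scorer are placeholder stubs.
from opik import track

@track
def generate_recipe(ingredients: list) -> str:
    # Placeholder for the LLM call that drafts a recipe from the ingredients.
    return f"A simple skillet dish using {', '.join(ingredients)}."

@track
def judge_recipe(recipe: str, ingredients: list) -> float:
    # Placeholder LLM-as-a-Judge: in practice this prompts a model with a rubric
    # (e.g. "does the recipe only use the listed ingredients?") and parses a score.
    return 1.0 if all(item in recipe for item in ingredients) else 0.0

@track
def recipe_pipeline(ingredients: list) -> dict:
    recipe = generate_recipe(ingredients)
    score = judge_recipe(recipe, ingredients)
    # Nested @track calls appear as one trace with child spans in the Opik UI,
    # which is where cost and quality can be reviewed over time.
    return {"recipe": recipe, "judge_score": score}

if __name__ == "__main__":
    recipe_pipeline(["eggs", "spinach", "feta"])
```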

Talk: What gets AI Agents to Production

Presenter:
Chris Matteson, Head of Sales Engineering, Union.ai

About the Speaker:
Chris Matteson is Head of Sales Engineering at Union AI, where he helps customers tackle their toughest machine-learning infrastructure challenges by bringing together a passion for AI and DevOps with deep startup experience.
A seasoned startup leader and technical problem-solver, Chris has spent more than a decade at companies including Puppet, HashiCorp, Prisma, and Fermyon. He’s worn hats from Founder/CEO to Solutions Engineering, Sales, and Consulting, writing early feature code that evolved into core enterprise offerings and architecting scalable processes for open-source–to-enterprise transitions.

Talk Abstract:
A widely cited MIT study recently found that 95% of AI projects fail to move the needle on the P&L. The promise of AI is clear, but how do we bridge the gap between prototypes and production-ready AI Agents in 2025?
This talk introduces a practical framework for aligning business needs with the right mix of technologies and architectures to make agentic projects succeed. We’ll blitz through trade-offs across quality, speed, and cost, and highlight a process for mapping the true limits of possibility when combining today’s most powerful AI tools.

Talk: Techniques to build high quality agents faster with MLflow

Presenter:
Danny Chiao, Engineering Lead, Databricks

About the Speaker:
Danny Chiao is an engineering lead at Databricks, leading efforts around data observability (data quality, data classification) and agent quality. Previously, Danny led efforts at Tecton (+ Feast, an open source feature store) and Google to build ML infrastructure and high scale ML powered features. Danny holds a Bachelor’s Degree in Computer Science from MIT.

Talk Abstract:
One of the top challenges in building an agent is ensuring high quality outputs. Today, this involves labeling and analyzing traces by hand and iterating on the agent code. In this talk, you’ll learn how to use MLflow to accelerate this process and quickly build a high quality agent, leveraging techniques used by leading companies to deploy agents in production.
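
For context, here is a minimal sketch of what capturing agent traces with MLflow Tracing can look like; the agent and its single tool are placeholders, and the talk covers far more of the evaluation workflow than this.

```python
# Sketch of capturing agent traces with MLflow Tracing so runs can be reviewed in the MLflow UI
# rather than labeled entirely by hand; the agent and its tool are placeholders.
import mlflow

@mlflow.trace
def lookup_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in for a real tool call

@mlflow.trace
def run_agent(question: str) -> str:
    # A real agent would decide which tools to call; here one step is hard-coded.
    observation = lookup_weather("Toronto")
    return f"Tool said '{observation}', so the answer to '{question}' is: sunny."

if __name__ == "__main__":
    run_agent("What's the weather like in Toronto?")
    # Each call produces a trace (agent span plus tool span) that MLflow stores and displays,
    # which becomes the raw material for systematic labeling and evaluation.
```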

Talk: AI Catalog by JFrog - Control Access to Open Source LLMs

Presenter:
Hudson Buzby, Solutions Architect, JFrog

About the Speaker:
Hudson Buzby is a solution engineer with a strong focus on MLOps and LLMOps, leveraging his expertise to help organizations optimize their machine learning operations and large language model deployments. His role involves providing technical solutions and guidance to enhance the efficiency and effectiveness of AI-driven projects.

Talk Abstract:
AI Catalog is a new product by the JFrogML team at JFrog that allows you to create and enforce dynamic rules and policies around the open source models that your developers and data scientists are permitted to access and deploy. AI Catalog provides a platform to discover, govern, and deploy open source models safely at scale while staying compliant with your organization’s legal and governance policies.

Talk: Building Feedback-Driven Agentic Workflows

Presenter:
Nicholas Luzio, AI Solutions Lead, Arize AI

About the Speaker:
Nick Luzio is an AI Solutions Lead at Arize AI.

Talk Abstract:
A live demo of how to trace agent decisions, implement evaluations, and create closed-loop workflows for continuously improving agent performance using real-world data.

Talk: Unified Control Plane for Enterprise GenAI: Powered by Agentic Deployment Platform with Central AI Gateway & MCP Integration

Presenter:
Nikunj Bajaj, CEO, TrueFoundry

About the Speaker:
Nikunj is the co-founder and CEO of TrueFoundry, a platform helping enterprises build, deploy, and ship LLM applications in a fast, scalable, cost-efficient way with the right governance controls within their own cloud. Prior to this role, he served as a Tech Lead for Conversational AI at Meta, where he spearheaded the development of proactive virtual assistants. His team also put Meta’s first deep learning model on-device. Nikunj also led the Machine Learning team at Reflektion, where he built an AI platform to enhance search and recommendations for over 600 million users across numerous eCommerce websites. He holds a bachelor’s degree in Electrical Engineering from IIT Kharagpur and a master’s in Computer Science from UC Berkeley.

As a visionary leader in the enterprise AI space, Nikunj is a sought-after speaker at premier technology conferences and summits, sharing his expertise on production AI deployment and enterprise ML strategies. His speaking portfolio includes keynotes and expert panels at MLOps Community events, where 50 speakers discussed LLMs in production alongside industry leaders from Stripe, Meta, Canva, Databricks, Anthropic, and Cohere; GenAI Summit San Francisco 2024, which attracted over 30,000 attendees and 200+ industry leaders at the historic Palace of Fine Arts; LLM Avalanche technical meetups (part of Data+AI Summit by Databricks), featuring 20 world experts and attracting 1,000 attendees in San Francisco; the Global Big Data Conference, where he was a featured speaker at the Global Artificial Intelligence Conference; and MLOps World, which connects over 15,000 members exploring best practices for ML/AI in production environments.

Talk Abstract:
As generative AI evolves from experimental tools to mission-critical enterprise applications, organizations face unprecedented operational complexity. Modern AI systems now orchestrate multiple models, invoke diverse tools, and span hybrid infrastructures, creating challenges around inconsistent APIs, model outages, unpredictable latency, complex rate limiting, and mounting governance requirements. Without centralized control, enterprises struggle with vendor lock-in, compliance gaps, runaway costs, and fragmented observability across their distributed AI ecosystems.

This session introduces the AI Gateway pattern—a critical architectural component that serves as the central control plane for enterprise AI systems. We’ll explore practical solutions including unified API abstraction, intelligent failover mechanisms, semantic caching, centralized guardrails, and granular cost controls. You’ll learn technical architecture patterns for building high-availability gateways that handle thousands of concurrent requests with sub-millisecond decision-making, plus emerging integration patterns like Model Context Protocol (MCP) for managing entire tool ecosystems.

Whether you’re an architect, platform engineer, or technical leader, you’ll gain actionable insights, architectural blueprints, and a practical framework for implementing scalable AI infrastructure that grows with your organization’s AI maturity.
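
To illustrate the failover piece of the gateway pattern in miniature, here is a hedged sketch: a single internal completion API that walks an ordered provider list with retries and backoff. The provider names and clients are placeholders, not any specific vendor SDK or TrueFoundry’s implementation.

```python
# Illustrative sketch of the gateway failover pattern: one internal completion API, an ordered
# provider list, retries with backoff, and automatic fallback. Providers are placeholders.
import time
from typing import Callable, List, Tuple

Provider = Callable[[str], str]

def primary_provider(prompt: str) -> str:
    raise TimeoutError("simulated outage")  # pretend the preferred model is down

def secondary_provider(prompt: str) -> str:
    return f"fallback completion for: {prompt[:40]}"

PROVIDERS: List[Tuple[str, Provider]] = [
    ("primary", primary_provider),
    ("secondary", secondary_provider),
]

def gateway_complete(prompt: str, retries_per_provider: int = 1) -> str:
    """Single entry point that applications call; routing policy stays centralized."""
    last_error = None
    for name, provider in PROVIDERS:
        for attempt in range(retries_per_provider + 1):
            try:
                return provider(prompt)
            except Exception as err:  # in practice: timeouts, rate limits, 5xx responses
                last_error = err
                time.sleep(0.1 * (attempt + 1))  # simple backoff before retry or failover
    raise RuntimeError(f"all providers failed: {last_error}")

if __name__ == "__main__":
    print(gateway_complete("Summarize this quarter's incident reports."))
```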

Talk: MLOps for Agents: Bringing the Outer Loop to Autonomous AI

Presenter:
Hamza Tahir, Co-Founder, ZenML

About the Speaker:
Hamza Tahir is a software developer turned ML engineer, with a passion for turning ideas into real, data-driven products. An indie hacker at heart, he has built projects like PicHance, Scrilys, BudgetML, and you-tldr. After deploying ML in production for predictive maintenance use-cases in his previous startup, he co-created ZenML, an open-source MLOps framework. Today, ZenML is evolving into the foundation for agentic AI systems—helping teams build, orchestrate, and scale autonomous ML pipelines and AI agents on any infrastructure stack.

Talk Abstract:
Most of today’s excitement around AI agents focuses on prompts, tools, and clever behaviors—the inner loop of development. But just like with machine learning models, real-world adoption demands more than prototypes. Without reproducibility, monitoring, evaluation, and continuous improvement, agents remain demos, not production systems.

In this talk, I’ll argue that we need to bring MLOps principles into agent development. By applying the outer loop—data collection, training, benchmarking, deployment, and feedback—we can move from one-off agents to robust, scalable, and trustworthy AI systems. Drawing on my experience co-creating ZenML, I’ll show how the lessons learned from operationalizing ML pipelines apply directly to this new era of agentic AI, and what infrastructure patterns teams can adopt today to close the gap between experimentation and production.

Talk: Memory and Memory Accessories: Building an Agent from Scratch

Presenter:
Robert Shelton, Applied AI Engineer, Redis

About the Speaker:
Robert is a builder with a background in data science and full stack engineering. As an Applied AI Engineer at Redis, he focuses on bridging the gap between AI research and real-world applications. In open source, he helps maintain the Redis Vector Library and contributes to integrations with LangChain, LlamaIndex, and LangGraph. He has delivered workshops and consulting engagements for multiple Fortune 50 companies and has spoken at conferences including PyData and CodeMash.

Talk Abstract:
AI agents don’t have to be black boxes. In this live demo, we’ll show how to create a production-ready agent fully deployed on AWS from scratch, without bulky frameworks — just FastAPI, OpenAI, Redis, and Docket for async task orchestration. By the end, we’ll have an agent capable of multi-turn conversations that draw from both short- and long-term memory, showing how memory powers real-time reasoning, context retention, retrieval, and custom tool calls such as web search with Tavily.
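
As a rough sketch of the short-term-memory piece described above, the snippet below wires FastAPI, Redis, and the OpenAI client together so each session replays its recent turns to the model. Long-term memory, Docket task orchestration, and tool calls such as Tavily search are omitted, and the model name is illustrative.

```python
# Rough sketch of the short-term-memory piece: a FastAPI endpoint that keeps each session's
# recent turns in a Redis list and replays them to the model on every request.
import json

import redis
from fastapi import FastAPI
from openai import OpenAI
from pydantic import BaseModel

app = FastAPI()
store = redis.Redis(host="localhost", port=6379, decode_responses=True)
llm = OpenAI()  # reads OPENAI_API_KEY from the environment

class ChatRequest(BaseModel):
    session_id: str
    message: str

@app.post("/chat")
def chat(req: ChatRequest) -> dict:
    key = f"chat:{req.session_id}"
    # Short-term memory: the last 10 turns for this session, oldest first.
    history = [json.loads(turn) for turn in store.lrange(key, -10, -1)]
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    messages += history + [{"role": "user", "content": req.message}]

    completion = llm.chat.completions.create(model="gpt-4o-mini", messages=messages)
    answer = completion.choices[0].message.content

    # Persist both turns so the next request in this session sees them.
    store.rpush(key, json.dumps({"role": "user", "content": req.message}))
    store.rpush(key, json.dumps({"role": "assistant", "content": answer}))
    return {"answer": answer}
```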

Talk: Live Demo - World's First Data Agentic AI With Business Logic Intelligence

Presenter:
Aish Agarwal, CEO, Connecty AI

About the Speaker:
Aish Agarwal is the CEO and co-founder of Connecty AI, the world’s first data agentic AI platform with built-in business logic intelligence. He brings 15+ years of executive experience in customer data science, having led two $600M+ SaaS exits and held leadership roles at FL Studio, MAGIX, Rakuten, and Rocket Internet.

Talk Abstract:
Explore how Connecty AI, the world’s first data agentic AI with business logic intelligence, delivers chat-based data analytics powered by deep reasoning and an autonomous semantic graph. See how data and business teams can finally get consistent, reliable answers to their most critical questions in seconds.