We’ll see you in Austin!!

Renaissance Austin Hotel
9721 Arboretum Blvd
Austin, TX 78759, United States

October 7th: Virtual talks
October 8th & 9th: In-person talks, workshops & networking
August 1st: Deadline to submit

2025 Conference Tracks

This track highlights practical uses of agents to streamline dev workflows—from debugging and code generation to test automation and CI/CD integration.

This track explores how teams are combining human oversight with semi-autonomous agents to scale support, operations, and decision-making across the business.

This track explores the design patterns shaping modern agents, from prompt engineering and tool integration to memory and planning strategies, focusing on real-world systems rather than just frameworks. It also covers the infrastructure, safety checks, and governance required to deploy agents reliably and securely in production. Expert presenters will share their insights on the challenges of running agents at scale.

This track covers the key architectural choices and infrastructure strategies behind scaling AI and LLM systems in production, from bare metal to Kubernetes and from GPU scheduling to inference optimization. It also addresses the complexities of managing model, data, and pipeline versions in a reproducible, team-friendly way, alongside the unique challenges of deploying ML in regulated, resource-constrained, or air-gapped environments. Expert speakers will share insights on building and operating reliable GenAI and agent platforms at scale, and on navigating the tradeoffs when cloud-based solutions aren’t an option.

This track covers how teams manage AI risk in production: model governance, audit trails, compliance workflows, and strategies for monitoring model behavior over time.

Not every team has a platform squad or unlimited infra budget. This track shares practical approaches to shipping ML with lean teams, covering lightweight tooling, automation shortcuts, and lessons from teams doing more with less.

Security doesn’t end at deployment. This track covers threat models, model hardening, data protection, and supply chain risks across the entire ML lifecycle.

Training isn’t just about epochs and GPUs. Talks focus on reproducibility, retraining triggers, pipeline automation, and how teams manage iterative experimentation at scale.

This track focuses on scoping and delivering complex AI projects, exploring how teams are adapting their scoping processes to account for LLMs, agents, and evolving project boundaries in fast-moving environments. It also dives into the strategies behind AI product development, from aligning business goals to driving successful delivery and scaling. Expert presenters will share practical insights on navigating the complexities of AI product strategy and execution.

This track covers real-world patterns and pitfalls of running LLMs on Kubernetes. Topics include GPU scheduling, autoscaling, memory isolation, and managing cost and complexity at scale.

This track explores the realities of deploying ML in regulated, resource-constrained, or air-gapped environments. Talks focus on infrastructure design, data access, and managing tradeoffs when the cloud isn’t an option.

What does it mean to observe an LLM in production? This track unpacks logging, tracing, token-level inspection, and metrics that actually help teams debug and improve deployed models.

LLMs are changing how we think about data pipelines. This track examines the shifting roles of ETL, vector stores, and retrieval workflows in context-rich, model-driven systems.

This track addresses the performance, cost, and reliability challenges of running inference at scale, exploring techniques from token streaming and caching strategies to hardware-aware scheduling. It also delves into low-level optimizations, model compilation, and inference kernels, covering everything from Triton and ONNX to custom CUDA solutions. Expert presenters will share insights into the systems that power fast, efficient, and production-ready AI inference across modern hardware.

This track focuses on building and scaling multimodal systems (models that handle text, image, audio, or video) in production. Learn how teams are designing serving stacks, data flows, and evaluation methods for real-world use.