**MLOps World | GenAI Summit**
2026 Call for Speakers
mlopsworld.com · November 2026 · Austin, Texas
*What this community is blocked on — and what it's ready to go deeper on.
Directly from the people in the room.*
─────────────────────────────────────────────────────────────────────────────
**The Short Version**
MLOps World | GenAI Summit exists for one reason: to give AI/ML engineers,
platform leads, and technical decision-makers a credible place to discuss
what actually happens when AI systems are built, deployed, and operated
at scale.
The people in this room are not beginners. They are already running production
systems — at financial institutions, enterprise software companies, cloud
platforms, and high-scale consumer products. They operate under real
constraints: regulated data environments, legacy infrastructure, multi-team
ownership, and hard cost ceilings.
They are looking for two things:
+────────────────────────────────────+────────────────────────────────────+
| Honest accounts of what they're | Serious engagement with what's |
| stuck on | actually possible |
| | |
| Talks grounded in real operational | Talks that honestly map the gap |
| experience that give them something | between AI's current capabilities |
| durable to take back. | and our collective ability to use |
| | them. |
+────────────────────────────────────+────────────────────────────────────+
+──────────────────────────────────────────────────────────────────────────+
| A strong talk answers at least one of these: |
| |
| - What did you build — and what actually happened when it ran in |
| production? |
| - What took longer to resolve than you expected — and why? |
| - What would you do differently if you started today? |
| - What does your team now know that it didn't know twelve months ago? |
| - What can production AI do right now that most practitioners haven't |
| fully internalized yet? |
+──────────────────────────────────────────────────────────────────────────+
─────────────────────────────────────────────────────────────────────────────
**Who Is in the Room**
MLOps World | GenAI Summit draws a working practitioner audience. The majority
of attendees are actively operating AI systems in production or have deployed
within the past twelve months. They include:
• ML Platform Engineers, MLOps leads, and infrastructure architects
• AI/ML engineering managers and technical directors
• Principal and staff data scientists with production responsibilities
• GenAI engineers and LLM platform builders
• Applied AI researchers bridging lab and production
• Technical product managers and AI program leads
This is not a general AI audience. They have dealt with evaluation failures,
compliance blockers, cost surprises, and organizational friction. They come
specifically for operational honesty — not inspiration.
─────────────────────────────────────────────────────────────────────────────
**The Three Tracks**
+──────────────────────+──────────────────────+──────────────────────+
| 🔬 Research & | ⚙️ Technical & | 📊 Business & |
| Cutting Edge | Engineering | Leadership |
| | | |
| New methods, | Systems design, | Org design, ROI, |
| emerging archi- | infrastructure, | governance, |
| tectures, frontier | implementation | stakeholder |
| capabilities | patterns | alignment |
+──────────────────────+──────────────────────+──────────────────────+
Strong submissions may span more than one track.
═════════════════════════════════════════════════════════════════════════════
PART ONE: WHAT THIS COMMUNITY IS BLOCKED ON
═════════════════════════════════════════════════════════════════════════════
From the 2026 Steering Committee survey — practitioners at Cisco, RBC,
Netflix, PIMCO, Mastercard, Snowflake, Meta, Google, TELUS, Chubb, and more.
Your talk does not need to fit neatly into one category. But it should
connect to a real constraint that people in this room are living with.
┌──────────────────────────────────┐ ┌───────────────────────────────┐
│ Evaluation, Testing & │ │ Agent Infrastructure & │
│ Observability │ │ Harness │
└──────────────────────────────────┘ └───────────────────────────────┘
┌──────────────────────────────────┐ ┌───────────────────────────────┐
│ Governance, Security & │ │ MLOps & Infrastructure │
│ Responsible AI │ │ Scaling │
└──────────────────────────────────┘ └───────────────────────────────┘
┌──────────────────────────────────┐ ┌───────────────────────────────┐
│ Data Quality, Readiness & │ │ Organizational Readiness & │
│ Governance │ │ Business Value │
└──────────────────────────────────┘ └───────────────────────────────┘
─────────────────────────────────────────────────────────────────────────────
1 Evaluation, Testing & Observability
How do you actually know your system is working — and keep knowing it?
─────────────────────────────────────────────────────────────────────────────
The most frequently named blocker in our committee survey — cited by
practitioners at Netflix, Pinterest, PIMCO, Yelp, Scotiabank, Cisco, and
dozens more.
+──────────────────────────────────────+─────────────────────────────────────+
| We are looking for | Not looking for |
+──────────────────────────────────────+─────────────────────────────────────+
| ✓ Eval frameworks for non- | ✗ General intros to evaluation |
| deterministic systems (LLMs, | as a concept |
| agents) | |
| ✓ Defining 'good enough' across      | ✗ Vendor tooling walkthroughs       |
|   engineering, product, legal, and   |   without honest trade-offs         |
|   business                           |                                     |
| ✓ Evals tied to business outcomes, | ✗ Benchmark results not grounded |
| not just model metrics | in real operating conditions |
| ✓ LLM-as-judge: how you calibrated | ✗ Talks that describe the problem |
| it and where it failed | without a path through it |
| ✓ Adversarial testing + drift | |
| detection in live production | |
| ✓ Cross-functional 'eval council' | |
| structures that actually worked | |
+──────────────────────────────────────+─────────────────────────────────────+
╔══════════════════════════════════════════════════════════════════════╗
║ Ask yourself: Can you describe specifically how your team determines ║
║ whether the system is performing well enough to stay ║
║ in production — and what happens when it isn't? ║
╚══════════════════════════════════════════════════════════════════════╝
─────────────────────────────────────────────────────────────────────────────
2 Agent Infrastructure & Harness
What does it take to build and operate agents that work reliably at scale?
─────────────────────────────────────────────────────────────────────────────
Our committee drew a sharp line between deploying an agent and engineering
one. The talks worth attending are about the latter.
+──────────────────────────────────────+─────────────────────────────────────+
| We are looking for | Not looking for |
+──────────────────────────────────────+─────────────────────────────────────+
| ✓ Orchestration, memory, tool design, | ✗ 'Look what I built with LangChain' |
| and fallback logic for prod agents | demo-level talks |
| ✓ Moving from generic to domain- | ✗ Architecture diagrams without |
| specialized agents | operational history |
| ✓ Multi-agent observability: auditing | ✗ Capability showcases without |
| and failure detection at scale | honest failure mode discussion |
| ✓ Non-human identity (NHI) and | |
|   credential management for          |                                     |
|   autonomous agents                  |                                     |
| ✓ Platformization: multiple agent | |
| stacks under shared infrastructure | |
+──────────────────────────────────────+─────────────────────────────────────+
╔══════════════════════════════════════════════════════════════════════╗
║ Ask yourself: Did your team realize it was building agents quickly   ║
║               but not well — and what changed when you made          ║
║               deliberate architectural choices?                      ║
╚══════════════════════════════════════════════════════════════════════╝
─────────────────────────────────────────────────────────────────────────────
3 Governance, Security & Responsible AI
How do you deploy responsibly in regulated environments without stalling?
─────────────────────────────────────────────────────────────────────────────
Security and compliance appeared in nearly every committee response — not as
an abstract concern, but as the specific thing that stopped a working system
from reaching production.
+──────────────────────────────────────+─────────────────────────────────────+
| We are looking for | Not looking for |
+──────────────────────────────────────+─────────────────────────────────────+
| ✓ LLM deployment patterns in | ✗ High-level regulatory overviews |
| finance, healthcare, insurance, | without operational takeaways |
| telecom | ✗ Responsible AI frameworks that |
| ✓ Navigating internal legal/security | exist only in slide decks |
| review — what slowed you down | ✗ Talks that describe the challenge |
| ✓ PII handling, data residency, | but not how the team resolved it |
| prompt logging in real systems | |
| ✓ Red-teaming in enterprise          |                                     |
|   environments                       |                                     |
| ✓ EU AI Act, GDPR, cross-border | |
| compliance patterns | |
+──────────────────────────────────────+─────────────────────────────────────+
╔══════════════════════════════════════════════════════════════════════╗
║ Ask yourself: Can you describe a specific compliance constraint that ║
║ blocked your deployment — and what it took to get past ║
║ it? ║
╚══════════════════════════════════════════════════════════════════════╝
─────────────────────────────────────────────────────────────────────────────
4 MLOps & Infrastructure Scaling
What does it actually cost to run this in production — and how did you
manage it?
─────────────────────────────────────────────────────────────────────────────
As agentic and LLM-based systems move from pilots to real traffic, the
economics and engineering patterns of production ML are being stress-tested.
+──────────────────────────────────────+─────────────────────────────────────+
| We are looking for | Not looking for |
+──────────────────────────────────────+─────────────────────────────────────+
| ✓ Inference cost and unit economics | ✗ Cloud vendor capability overviews |
| at real traffic — specific numbers | without production grounding |
| ✓ GPU access, model serving, latency | ✗ Benchmarks not grounded in actual |
| under production load | production workloads |
| ✓ Self-hosting vs. API: the honest   |                                     |
|   economics                          |                                     |
| ✓ Platformization: multiple stacks, | |
| divergent team practices | |
| ✓ LLM inference optimization: | |
| quantization, latency, throughput | |
| ✓ CI/CD pipelines for GenAI systems | |
+──────────────────────────────────────+─────────────────────────────────────+
╔══════════════════════════════════════════════════════════════════════╗
║ Ask yourself: Can you walk through a real infrastructure or cost ║
║ decision — what you chose, what it cost, and whether ║
║ you'd choose the same thing again? ║
╚══════════════════════════════════════════════════════════════════════╝
─────────────────────────────────────────────────────────────────────────────
5 Data Quality, Readiness & Governance
What did your data actually look like when you tried to use it?
─────────────────────────────────────────────────────────────────────────────
+──────────────────────────────────────+─────────────────────────────────────+
| We are looking for | Not looking for |
+──────────────────────────────────────+─────────────────────────────────────+
| ✓ Getting data to production-ready | ✗ Data platform demos without |
| state, including org work | honest discussion of failure modes |
| ✓ RAG reliability: silent failures, | ✗ Clean pipeline examples that don't |
| knowledge base drift, attribution | reflect enterprise complexity |
| ✓ Feedback loops and continuous | |
| labeling in live systems | |
| ✓ Regulated or sensitive data in | |
| training and inference pipelines | |
+──────────────────────────────────────+─────────────────────────────────────+
╔══════════════════════════════════════════════════════════════════════╗
║ Ask yourself: Can you describe the gap between what you expected your ║
║ data to look like — and what it actually looked like ║
║ when you started building in earnest? ║
╚══════════════════════════════════════════════════════════════════════╝
─────────────────────────────────────────────────────────────────────────────
6 Organizational Readiness & Business Value
How do you move AI from a working demo to something the org actually uses?
─────────────────────────────────────────────────────────────────────────────
Multiple committee respondents identified organizational friction as the real
blocker. One put it directly: "evaluation is a political problem disguised
as a technical one."
+──────────────────────────────────────+─────────────────────────────────────+
| We are looking for | Not looking for |
+──────────────────────────────────────+─────────────────────────────────────+
| ✓ POC → internal governance → | ✗ Motivational content about AI's |
| production: the real path | potential |
| ✓ Measuring business value beyond | ✗ Case studies showing only the |
| model performance | outcome without the friction |
| ✓ Translating model metrics into a   |                                     |
|   risk language legal and leadership |                                     |
|   can act on                         |                                     |
| ✓ Lean teams that shipped faster — | |
| what you stopped doing | |
+──────────────────────────────────────+─────────────────────────────────────+
╔══════════════════════════════════════════════════════════════════════╗
║ Ask yourself: Can you describe a specific moment where the technical ║
║ work was done but deployment stalled — and what it took ║
║ to move it forward? ║
╚══════════════════════════════════════════════════════════════════════╝
═════════════════════════════════════════════════════════════════════════════
PART TWO: WHAT THIS COMMUNITY IS CURIOUS ABOUT
═════════════════════════════════════════════════════════════════════════════
NOTE: Topics below are inferred from open-ended responses in our 2026
Steering Committee survey. They are our best read as of early 2026 and
will evolve as the committee meets monthly throughout the year.
Submissions are always welcome for the main November event, monthly
sessions, and webinar programming.
─────────────────────────────────────────────────────────────────────────────
A Agent Engineering vs. Vibe Coding
What deliberate agent architecture looks like — and what it costs not to.
─────────────────────────────────────────────────────────────────────────────
Why this matters:
• Teams are discovering that fast-and-loose agent development
  accumulates architectural debt at unusual speed
• The question of what makes a domain-specialized agent genuinely
capable is largely unanswered in public forums
• Strong format for a debate or panel: engineering rigor vs. ship-fast
What a talk could look like:
• The specific decisions — orchestration, memory, tool design, fallback
logic — that separated production-ready from not
• What vibe coding got wrong at scale, and what recovery looked like
• How you extended an agent for a specific domain
╔══════════════════════════════════════════════════════════════════════╗
║ Ask yourself: Can you describe the moment your team realized it was ║
║ building agents quickly but not well? ║
╚══════════════════════════════════════════════════════════════════════╝
─────────────────────────────────────────────────────────────────────────────
B AI-Generated Code in Production
Maintenance, authorship, and what we're still figuring out.
─────────────────────────────────────────────────────────────────────────────
The hidden costs are in maintenance, regression, and accountability — not
in the initial speed gains everyone references. The field hasn't agreed on
authorship, regression testing, or accountability models for AI-generated code.
• Real experience maintaining a codebase where significant portions
were AI-generated — what broke, what surprised you
• How your team handles authorship, review standards, accountability
• The honest trade-off: productivity at generation time vs. maintenance
cost over time
─────────────────────────────────────────────────────────────────────────────
C Non-Human Identity & Agent Security
The attack surface nobody is talking about.
─────────────────────────────────────────────────────────────────────────────
As agents take on autonomous tasks — calling APIs, writing files, triggering
workflows — they require identities: credentials, tokens, permissions. Most
existing IAM/PAM frameworks were not designed for non-human principals.
• How you designed and governed NHI credentials for production agents
• Failure modes: agents with excess permissions, leaked credentials
• Compliance and audit trail for agents acting autonomously
─────────────────────────────────────────────────────────────────────────────
D RL and Fine-Tuning in the Enterprise
Getting better models with less data — in environments where data is
always the constraint.
─────────────────────────────────────────────────────────────────────────────
• How you applied GRPO, RLHF, or RLAIF in a domain-specific setting
• Domain adaptation with limited labeled data — what actually worked
• Task-specific supplementary models to patch foundation model gaps
• Continual learning pipelines post-deployment
─────────────────────────────────────────────────────────────────────────────
E Novel Approaches to Evaluation
Not the problem — the creative ways practitioners have found through it.
─────────────────────────────────────────────────────────────────────────────
• Combining business-outcome data with LLM traces, rather than logging
  each in isolation
• Approaches that tied model behavior directly to a business metric
stakeholders could act on
• Evaluation for agentic systems operating across multiple steps
═════════════════════════════════════════════════════════════════════════════
WHAT WE ARE NOT LOOKING FOR
═════════════════════════════════════════════════════════════════════════════
✗ Product announcements or capability overviews framed as practitioner talks
✗ Trend surveys or predictions without operational grounding
✗ Talks built around what a system could do rather than what it did
✗ Success stories that skip the constraints, failure modes, and friction
✗ Vendor case studies not from the customer/end-user perspective
✗ Academic work not connected to production deployment or real constraints
═════════════════════════════════════════════════════════════════════════════
THE STANDARD
═════════════════════════════════════════════════════════════════════════════
╔══════════════════════════════════════════════════════════════════════╗
║ ║
║ Could someone in that room learn something durable from your ║
║ experience that they could not have gotten from a blog post, a ║
║ vendor demo, or a press release? ║
║ ║
║ If yes, submit. ║
║ ║
╚══════════════════════════════════════════════════════════════════════╝
─────────────────────────────────────────────────────────────────────────────
Year-round submissions welcome. Sessions not placed in the November main
program will be considered for monthly community sessions, webinars, and
pre-conference workshops. Part Two topics will be updated as the steering
committee meets throughout 2026.
─────────────────────────────────────────────────────────────────────────────
mlopsworld.com · MLOps World | GenAI Summit 2026