What If You Could Trust an Agent to Fix 53% of GitHub Issues?
Software co-pilots are good. Autonomous coding agents might be even better. In this expert talk from MLOps World | GenAI Summit 2024, Graham Neubig surveys the frontiers of agent-based software development, showing how his team at All Hands AI has built agents that author 20% of their pull requests and are on track to do more.
Neubig walks through the OpenDevin-inspired agent platform he maintains, explains real-world challenges like file localization and context management, and breaks down the latest benchmark results showing where autonomous agents truly shine (and where they still fall short).
This talk was recorded during MLOps World | GenAI Summit 2024, which took place at the Austin Renaissance Hotel.
Presentation Highlights
This talk is for researchers, tool builders, and engineering teams curious about what it takes to build and deploy autonomous coding agents:
- Why only 15% of software dev time is actual coding and what agents can do about the rest
- The architecture behind OpenHands: agents that use bash, Jupyter, and browser tools in concert
- Strategies for file localization, planning, error recovery, and sandboxing
- Benchmarks like SWE-bench that test agent performance on real GitHub issues
- The current leaderboards (spoiler: Neubig’s agent is #1 with a 53% resolution rate)
- Common failure cases and safety risks: from over-eager test deletions to surprise main-branch pushes
About The Speaker
Graham Neubig is a professor at Carnegie Mellon University and Chief Scientist at All Hands AI, where he leads research and open-source development on autonomous software agents. He is a co-creator and maintainer of the OpenHands framework and a noted expert in machine learning, natural language processing, and code intelligence.
3 Days of Context, Insights, & Connections
The 6th annual MLOps World | GenAI Summit is taking place October 7–9, 2025 at the Austin Renaissance Hotel.
Don’t miss this chance to accelerate and de-risk your AI/ML, agentic, and infrastructure outcomes through cutting-edge strategies, real-world case studies, technical deep dives, and hands-on workshops. Every presentation is hand-picked by a committee of top AI practitioners whose only goal is to help their industry colleagues understand where the line of AI in production excellence is, right now.
The experience also includes a vibrant expo, where attendees shift from focused learning to active participation by engaging in Brain Dates, Community Stage, Startup Zone, and interactive demos with leading vendors like Weights & Biases, Outerbounds, and Databricks.
MLOps World | GenAI Summit is a high-impact way to learn, connect, and elevate your team, projects, and career.
Early Bird tickets are on sale now and offer 15% savings when you register in advance.