Welcome to the 5th MLOps World 2024

Virtual Workshops & Talks — Nov 6th

Talk: Open-Ended and AI-Generating Algorithms in the Era of Foundation Models

Presenter:
Jeff Clune, Professor, Computer Science, University of British Columbia; CIFAR AI Chair, Vector; Senior Research Advisor, DeepMind

About the Speaker:
Jeff Clune is a Professor of computer science at the University of British Columbia, a Canada CIFAR AI Chair at the Vector Institute, and a Senior Research Advisor at DeepMind. Jeff focuses on deep learning, including deep reinforcement learning. Previously he was a research manager at OpenAI, a Senior Research Manager and founding member of Uber AI Labs (formed after Uber acquired a startup he helped lead), the Harris Associate Professor in Computer Science at the University of Wyoming, and a Research Scientist at Cornell University. He received degrees from Michigan State University (PhD, master’s) and the University of Michigan (bachelor’s). More on Jeff’s research can be found at JeffClune.com or on Twitter (@jeffclune). Since 2015, he has won the Presidential Early Career Award for Scientists and Engineers from the White House, published two papers in Nature and one in PNAS, won an NSF CAREER award, received Outstanding Paper of the Decade and Distinguished Young Investigator awards, received two test-of-time awards, and had best-paper awards, oral presentations, and invited talks at the top machine learning conferences (NeurIPS, CVPR, ICLR, and ICML). His research is regularly covered in the press, including the New York Times, NPR, the New Yorker, CNN, NBC, Wired, the BBC, the Economist, Science, Nature, National Geographic, the Atlantic, and New Scientist.

Talk Track: Virtual Talk

Talk Technical Level: 3/7

Talk Abstract:
Open-Ended and AI-Generating Algorithms in the Era of Foundation Models

Foundation models (e.g. large language models) create exciting new opportunities in our longstanding quests to produce open-ended and AI-generating algorithms, wherein agents can truly keep innovating and learning forever. In this talk I will share some of our recent work harnessing the power of foundation models to make progress in these areas. I will cover our recent work on OMNI (Open-endedness via Models of human Notions of Interestingness), Video Pre-Training (VPT), Thought Cloning, Automatically Designing Agentic Systems, and The AI Scientist.

What You’ll Learn
TBA

Talk: Beyond the Kaggle Paradigm: Future of End-to-End ML Platforms

Presenter:
Norm Zhou, Engineering Manager, Meta

About the Speaker:
An innovative, adaptable, and highly technical leader, always looking holistically to make the highest impact, starting from first principles. My career has led me through many interesting challenges and environments: from hardware chip architecture in ASIC and FPGA startups, to Ads and then AI Platforms at large internet companies. Currently I am leading multiple teams working on AutoML to Democratize AI at Meta.

I am interested in maximizing my impact on this world by working both on cutting edge research and translational work that improves people’s lives in the real world.

Talk Track: Business Strategy

Talk Technical Level: 4/7

Talk Abstract:
ML platforms enable intelligent data-driven applications and maintain them with limited engineering effort. However, the current approach to building ML systems is limited by the “Kaggle Paradigm,” which focuses on the data-to-model transformation and on operationalizing the deployment of models into applications. This model-centric view limits further increases in engineering productivity for future ML systems. We propose a policy-centric view as an alternative, involving two major additions to the model-centric view. The first is a fully managed, unified data collection system extending upstream to establish a “full chain of data custody.” The second is a downstream extension to A/B testing systems that bridges the online/offline mismatch many ML practitioners experience. Together, these approaches enable fully end-to-end automation, allowing a future ML platform to directly improve business metrics and more fluently address changing business needs.

What You’ll Learn:
How to best practice data-centric AI in real-world ML; connecting ML to business impact; shortcomings of a model-first approach and a proposed alternative.

Talk: LLMidas' Touch; Safely Adopting GenAI for Production Use-Cases

Presenter:
Gon Rappaport, Solution Architect, Aporia

About the Speaker:
I’m a solution architect at Aporia, which I joined just over two years ago. I’ve spent over eight years in the tech industry, starting in low-level programming and cybersecurity and transitioning to AI & ML.

Talk Track: Virtual Workshop

Talk Technical Level: 3/7

Talk Abstract:
During the session, we’ll explore the challenges of adopting GenAI in production use-cases. Focusing on the goal of using language models to solve more dynamic problems, we’ll address the dangers of “No-man’s-prod” and provide insights into safe and successful adoption. This presentation is designed for engineers, product managers, and stakeholders, and aims to provide a roadmap for releasing your first GenAI applications safely and successfully to production.

What You’ll Learn:

  • Become familiar with the potential issues of using generative AI in production applications
  • Learn how to mitigate the dangers of AI applications
  • Learn how to measure the performance of different AI application types

Talk: Hemm: Holistic Evaluation of Multi-modal Generative Models

Presenter:
Anish Shah, ML Engineer, Weights & Biases

About the Speaker:
Anish Shah is a Machine Learning Engineer at Weights & Biases.

Talk Track: Virtual Workshop

Talk Technical Level: 3/7

Talk Abstract:
Join Anish Shah for an in-depth session on fine-tuning and evaluating multimodal generative models. This talk will delve into advanced methodologies for optimizing text-to-image diffusion models, with a focus on enhancing image quality and improving prompt comprehension.
Learn how to leverage Weights & Biases for efficient experiment tracking, enabling seamless monitoring and analysis of your model’s performance.

Additionally, discover how to utilize Weave, a lightweight toolkit for tracking and evaluating LLM applications, to conduct practical and holistic evaluations of multimodal models.

The session will also introduce Hemm, a comprehensive library for benchmarking text-to-image diffusion models on image quality and prompt comprehension, integrated with Weights & Biases and Weave. By the end of this talk, you’ll be equipped with cutting-edge tools and techniques to elevate your multimodal generative models to the next level.
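As a rough illustration of what Weave’s tracking looks like, here is a minimal sketch based on its public init/op API; Hemm’s own interface is introduced in the session, and the project name and scoring step below are hypothetical stand-ins:

    import weave

    weave.init("hemm-eval-demo")  # hypothetical project name

    @weave.op()
    def generate_and_score(prompt: str) -> dict:
        # Stand-in for a real text-to-image generation + quality-scoring step;
        # in Hemm this would be a diffusion pipeline plus an image-quality metric.
        image = f"<generated image for: {prompt}>"
        return {"prompt": prompt, "image": image, "score": 0.0}

    # Each call is traced and can be inspected in the Weights & Biases UI.
    generate_and_score("a red bicycle leaning against a brick wall")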

What You’ll Learn:
• Advanced Fine-Tuning Techniques: Explore methods for fine-tuning text-to-image diffusion models to enhance image quality and prompt comprehension.
• Optimizing Image Quality: Understand the metrics and practices for assessing and improving the visual fidelity of generated images.
• Enhancing Prompt Comprehension: Learn how to ensure your models accurately interpret and respond to complex textual prompts.
• Utilizing Weights & Biases: Gain hands-on experience with Weights & Biases for tracking experiments, visualizing results, and collaborating effectively.
• Leveraging Weave: Discover how Weave can be used for lightweight tracking and evaluation of LLM applications, providing practical insights into model performance.
• Introduction to Hemm: Get acquainted with Hemm and learn how it facilitates comprehensive benchmarking of text-to-image diffusion models.
• Holistic Model Evaluation: Learn best practices for conducting thorough evaluations of multimodal models, ensuring they meet desired performance standards across various metrics.

Workshop: Finetuning a Large Language Model on a Custom Dataset

Presenter:
Aniket Maurya, Developer Advocate, Lightning AI

About the Speaker:
Aniket is a Developer Advocate at Lightning AI. He is an open-source enthusiast and a contributor to popular repos like Lit-GPT and Gradsflow.

Talk Track: Workshop

Talk Technical Level: 5/7

Talk Abstract:
This is a hands-on workshop on finetuning large language models using a custom dataset. By the end of this workshop, you will learn about parameter-efficient finetuning, optimised inference, and tricks to finetune models at scale.
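As a minimal sketch of parameter-efficient finetuning, here is LoRA via the Hugging Face peft library; the workshop itself uses Lightning AI tooling, so this is illustrative rather than the workshop’s code, and the base model choice is arbitrary:

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Wrap the base model with low-rank adapters; only the adapter weights train.
    config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                        target_modules=["c_attn"], task_type="CAUSAL_LM")
    model = get_peft_model(model, config)
    model.print_trainable_parameters()  # a small fraction of the full model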

What You’ll Learn:
Parameter efficient finetuning and LLM optimisations for very large models.

Prerequisite Knowledge:
Python, PyTorch basics

Talk: From Black Box to Mission Critical: Implementing Advanced AI Explainability and Alignment in FSIs

Presenter:
Vinay Kumar Sankarapu, Founder & CEO, Arya.ai

About the Speaker:
Vinay Kumar Sankarapu is the Founder and CEO of Arya.ai. He did his Bachelor’s and Master’s in Mechanical Engineering at IIT Bombay with research in Deep Learning and published his thesis on CNNs in manufacturing. He started Arya.ai in 2013, one of the first deep learning startups, along with Deekshith, while finishing his Master’s at IIT Bombay.

He co-authored a patent for designing a new explainability technique for deep learning and implementing it in underwriting in FSIs. He also authored a paper on AI technical debt in FSIs, and wrote multiple guest articles on ‘Responsible AI’ and ‘AI usage risks in FSIs’. He has delivered technical and industry presentations globally, including Nvidia GTC (SF & Mumbai), ReWork (SF & London), Cypher (Bangalore), Nasscom (Bangalore), and TEDx (Mumbai). He was the youngest member of the ‘AI task force’ set up by the Indian Ministry of Commerce and Industry in 2017 to provide inputs on policy and to support AI adoption as part of Industry 4.0. He was listed in Forbes Asia 30 Under 30 in the technology section.

Talk Track: Virtual Workshop

Talk Technical Level: 4/7

Talk Abstract:
In highly regulated industries like FSIs, there are stringent policies regarding the use of ‘ML models’ in production. To gain acceptance from all stakeholders, multiple criteria beyond model performance must be met.

This workshop will discuss the challenges of deploying ML and the stakeholders’ requirements in FSIs. We will review the sample setup in use cases like claim fraud monitoring and health claim processing, along with the case study details of model performance and MLOps architecture iterations.

The workshop will also discuss the AryaXAI MLObservability competition specifications and launch details.

What You’ll Learn:
In this workshop, you will gain a comprehensive understanding of the expectations of FSIs while deploying machine learning models. We’ll explore the additional criteria beyond model performance essential for gaining acceptance from various stakeholders, including compliance officers, risk managers, and business leaders. We’ll delve into how AI explainability outputs must be iterated for multiple stakeholders and how alignment is implemented through real-world case studies in claim fraud monitoring and health claim processing. You’ll also gain insights into why the iterative process of developing MLOps architectures is needed to meet performance and compliance requirements.

Talk: Building AI Applications as a Developer

Presenters:
Roy Derks, Technical Product Manager, IBM watsonx.ai | Alex Seymour, Technical Product Manager, IBM watsonx.ai

About the Speaker:
Roy Derks is a lifelong software developer, author, and public speaker from the Netherlands. His mission is to make the world a better place through technology by inspiring developers all over the world. Before jumping into Developer Advocacy and joining IBM, he founded and worked at multiple startups.

Talk Track: Virtual Workshop

Talk Technical Level: 5/7

Talk Abstract:
In today’s world, developers are essential for creating exciting AI applications. They build powerful applications and APIs that use Large Language Models (LLMs), relying on open-source frameworks or tools from LLM providers. In this session, you’ll learn how to build your own AI applications using the watsonx and watsonx.ai ecosystem, including use cases such as Retrieval-Augmented Generation (RAG) and Agents. Through live, hands-on demos, we’ll explore the watsonx.ai developer toolkit and the watsonx.ai Flows Engine. Join us to gain practical skills and unlock new possibilities in AI development!
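As a vendor-agnostic sketch of the core RAG loop the session builds with watsonx.ai: retrieval here is naive word overlap and the LLM is a stub, both stand-ins for real embeddings and a hosted model:

    def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
        # Toy retrieval: rank documents by word overlap with the question.
        q = set(question.lower().split())
        return sorted(documents, key=lambda d: -len(q & set(d.lower().split())))[:k]

    def answer_with_rag(question: str, documents: list[str], llm) -> str:
        context = "\n".join(retrieve(question, documents))
        prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
        return llm(prompt)  # the hosted model call goes here

    docs = ["Granite is a family of IBM foundation models.",
            "RAG grounds model answers in retrieved documents."]
    print(answer_with_rag("What is Granite?", docs,
                          llm=lambda p: "[model answer grounded in]\n" + p))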

What You’ll Learn:
By attending this session, you’ll acquire essential skills for effectively leveraging Large Language Models (LLMs) in your projects. You’ll learn to use LLMs via APIs and SDKs, integrate them with your own data, and understand Retrieval-Augmented Generation (RAG) concepts while building RAG systems using watsonx.ai. Additionally, this session will cover Agentic workflows, guiding you through their creation with watsonx.ai. Finally, you’ll explore how to work with various LLMs, including Granite, LLama, and Mistral, equipping you with the versatility needed to optimize AI applications in your development work.

Talk: RAG Hyperparameter Optimization: Translating a Traditional ML Design Pattern to RAG Applications

Presenter:
Niels Bantilan, Chief ML Engineer, Union.ai

About the Speaker:
Niels is the Chief Machine Learning Engineer at Union.ai, a core maintainer of Flyte, an open-source workflow orchestration tool, the author of UnionML, an MLOps framework for machine learning microservices, and the creator of Pandera, a statistical typing and data testing tool for scientific data containers. His mission is to help data science and machine learning practitioners be more productive.

Talk Track: Research or Advanced Technical

Talk Technical Level: 4/7

Talk Abstract:
In the era of Foundation LLMs, a lot of energy has moved from the model training stage to the inference stage of the ML lifecycle, as we can see in the explosion of different RAG architectures. But has a lot changed in terms of the techniques to systematically improve the performance of models at inference time? In this talk, we’ll recast hyperparameter optimization in terms of improving RAG pipelines against a “golden evaluation dataset” and see that not much has changed at a fundamental level: grid search, random search, and Bayesian optimization still apply, and we can use these tried-and-true techniques for any type of inference architecture. All you need is a high-quality dataset.
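As a minimal sketch of the idea: grid search over RAG pipeline settings scored against a golden evaluation dataset, where build_pipeline and evaluate are hypothetical stand-ins for your own stack:

    from itertools import product

    param_grid = {
        "chunk_size": [256, 512, 1024],   # how documents are split
        "top_k": [3, 5, 10],              # retrieved passages per query
        "temperature": [0.0, 0.7],        # generation randomness
    }

    def grid_search(golden_dataset, build_pipeline, evaluate):
        best_score, best_params = float("-inf"), None
        for values in product(*param_grid.values()):
            params = dict(zip(param_grid, values))
            pipeline = build_pipeline(**params)          # assemble the RAG pipeline
            score = evaluate(pipeline, golden_dataset)   # e.g. answer accuracy
            if score > best_score:
                best_score, best_params = score, params
        return best_params, best_score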

What You’ll Learn:
You’ll learn about hyperparameter optimization (HPO) techniques that are typically used in model training and apply them to the context of RAG applications. This session will highlight the conceptual and practical differences when implementing HPO in the AI inference setting and show how some of the traditional concepts in ML still apply, such as the bias-variance tradeoff.

Talk: Multi-Graph Multi-Agent systems - Determinism through Structured Representations

Presenter:
Tom Smoker, Technical Founder, WhyHow.AI

About the Speaker:
Co-Founder @ WhyHow.AI

Talk Track: Applied Case Studies

Talk Technical Level: 4/7

Talk Abstract:
As multi-agent systems are increasingly adopted, the range of unstructured information that agents need to process in structured ways, both to return to a user and to pass back to an agent system, will grow. We explore the emerging trend of multi-graph multi-agent systems and what deterministic information representation and retrieval look like in practice.

What You’ll Learn:
Why structured knowledge representations are important, and how structured knowledge representation requirements have changed, and will continue to change, in an increasingly agent-driven world with complex multi-agent systems.

Talk: Fast Data Loading for Deep Learning Workloads with lakeFS Mount

Presenter:
Amit Kesarwani, Director, Solution Engineering, lakeFS

About the Speaker:
Amit heads the solution architecture group at Treeverse, the company behind lakeFS, an open-source platform that delivers a Git-like experience to object-storage-based data lakes.
Amit has 30+ years of experience as a technologist working with Fortune 100 companies as well as start-ups, designing and implementing technical solutions for complicated business problems.
As an entrepreneur, he launched a cloud offering to provide Data Warehouse as a Service. Amit holds a Master’s certificate in Project Management from George Washington University and a bachelor’s degree in Computer Science and Technology from the Indian Institute of Technology (IIT), India. He is the inventor of the patent ‘System and Method for Managing and Controlling Data’.

Talk Track: Virtual Talk

Talk Technical Level: 6/7

Talk Abstract:
Working with large datasets locally gives you much more control over your executions and workflows, especially for AI and deep learning workloads.

However, this can present a number of tradeoffs that lakeFS Mount helps solve:

• Git integration – Mounting a path in a Git repo automatically tracks the data version, linking it with your code. When checking out older code versions, you get the corresponding data version, preventing local-only successes.

• Speed – Data consistency and performance are guaranteed. lakeFS prefetches commit metadata into a local cache in sub-milliseconds, allowing you to work immediately without having to wait for large dataset downloads.

• Intelligent – lakeFS Mount uses its cache efficiently, accurately predicting which objects will be accessed. This enables granular pre-fetching of metadata and data files before processing starts.

• Consistency – Working locally risks using outdated or incorrect data versions. With Mount, you work with consistent, immutable versions, ensuring you know exactly what data version you’re using.

What You’ll Learn
With lakeFS Mount, you can transparently mount an object store reference as a local directory (yes, even at petabyte-scale), while avoiding the common pitfalls typically associated with trying to access an object store as a filesystem.

In this talk, you will learn about lakeFS Mount and see a demonstration of the following (sketched briefly after the list):
• Training a TensorFlow predictive model on data mounted using lakeFS Mount
• Integration with Git to version code and data together
• Reproducibility of code as well as data
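As a rough sketch of the pattern the demo shows: once lakeFS Mount exposes a repository path as a local directory, standard TensorFlow data loading just works. The mount point and toy model below are hypothetical:

    import tensorflow as tf

    DATA_DIR = "/mnt/lakefs/my-repo/main/datasets/images"  # hypothetical mount point

    # Read image data straight from the mounted, versioned path.
    train_ds = tf.keras.utils.image_dataset_from_directory(
        DATA_DIR, image_size=(224, 224), batch_size=32)

    model = tf.keras.Sequential([
        tf.keras.layers.Rescaling(1.0 / 255),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    model.fit(train_ds, epochs=1)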

Talk: HybridRAG: Merging Knowledge Graphs with Vector Retrieval for Efficient Information Extraction

Presenter:
Bhaskarjit Sarmah, Vice President, BlackRock

About the Speaker:
As a Vice President and Data Scientist at BlackRock, I apply my machine learning skills and domain knowledge to build innovative solutions for the world’s largest asset manager. I have over 10 years of experience in data science, spanning multiple industries and domains such as retail, airlines, media, entertainment, and BFSI.

At BlackRock, I am responsible for developing and deploying machine learning algorithms to enhance the liquidity risk analytics framework, identify price-making opportunities in the securities lending market, and create an early warning system using network science to detect regime change in markets. I also leverage my expertise in natural language processing and computer vision to extract insights from unstructured data sources and generate actionable reports. My mission is to use data and technology to empower investors and drive better financial outcomes.

Talk Track: Virtual Talk

Talk Technical Level: 7/7

Talk Abstract:
In this session we will introduce HybridRAG, a novel approach that combines Knowledge Graphs (KGs) and Vector Retrieval Augmented Generation (VectorRAG) to improve information extraction from financial documents. HybridRAG addresses challenges in analyzing financial documents, such as domain-specific language and complex data formats, which traditional RAG methods often struggle with. By integrating Knowledge Graphs, HybridRAG provides a structured representation of financial data, thereby enhancing the accuracy and relevance of the generated answers. Experimental results demonstrate that HybridRAG outperforms both VectorRAG and GraphRAG individually in terms of retrieval accuracy and answer generation.
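As a minimal sketch of the fusion step, where vector_search, kg_lookup, and llm are hypothetical stand-ins and the session presents the full method:

    def hybrid_rag_answer(question, vector_search, kg_lookup, llm, k=5):
        passages = vector_search(question, top_k=k)   # VectorRAG leg
        triples = kg_lookup(question)                 # GraphRAG leg: (subject, relation, object)
        context = "\n".join(passages) + "\nStructured facts:\n" + \
                  "\n".join(f"{s} -[{r}]-> {o}" for s, r, o in triples)
        prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
        return llm(prompt)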

What You’ll Learn
Key learnings from this session will include an understanding of the integration of Knowledge Graphs (KGs) and Vector Retrieval Augmented Generation (VectorRAG) to enhance information extraction from financial documents. The paper addresses challenges posed by domain-specific language and complex data formats in financial documents, which are often not well-handled by general-purpose language models. The HybridRAG approach demonstrates improved retrieval accuracy and answer generation compared to using VectorRAG or GraphRAG alone, highlighting its effectiveness in generating contextually relevant answers. Although the focus is on financial documents, the techniques discussed have broader applications, offering insights into the wider utility of HybridRAG beyond the financial domain.

Talk: Robustness with Sidecars: Weak-To-Strong Supervision For Making Generative AI Robust For Enterprise

Presenter:
Dan Adamson, Interim Chief Executive Officer & Co-Founder, AutoAlign AI

About the Speaker:
Dan Adamson is a co-founder of AutoAlign, a company focused on AI safety and performance. He has also co-founded PointChain (developing a neo-banking platform using AI for high-risk and underserved industries) and Armilla AI (a company helping enterprises manage AI risk with risk transfer solutions). He previously founded OutsideIQ, deploying AI-based AML and anti-fraud solutions to over 100 global financial institutions. He also previously served as the Chief Architect at Medstory, a vertical search start-up acquired by Microsoft. Adamson holds several search algorithm and AI patents in addition to numerous academic awards and holding an M.Sc. from U.C. Berkeley and B.Sc. from McGill. He also serves on the McGill Faculty of Science Advisory Board.

Talk Track: Business Strategy or Ethics

Talk Technical Level: 2/7

Talk Abstract:
Many enterprise pilots with GenAI are stalling because of a lack of consistent performance as well as compliance, safety and security concerns. Comprehensive GenAI safety must continually evolve to mitigate critical issues such as hallucinations, jailbreaks, data leakage, biased content, and more.

Learn how AutoAlign CEO and co-founder Dan Adamson leveraged over two decades of experience building regulated AI solutions to launch Sidecar, ensuring models are powerful AND safe. Learn how weak-to-strong controls work to put decisions directly in users’ hands, improving model power while ensuring Generative AI is safe to use.

What You’ll Learn:
During this session, participants will have the opportunity to learn about common approaches to protect GenAI against jailbreaks, bias, data leakage, hallucinations, and other harms. We’ll discuss the unique requirements of bringing LLMs to production in real-world applications, the critical importance of ensuring a high level of robustness and safety, and tools for solving these problems.

We’ll then discuss a new approach: weak supervision with a sidecar that can not only increase safety but can also make models more powerful. Finally, we’ll show some of our latest benchmarks around accuracy and discuss these state-of-the-art results.

Talk: Revolutionizing the Skies: An MLOps Case Study of LATAM Airlines

Presenters:
Michael Haacke Concha, MLOps Lead, LATAM Airlines | Diego Castillo Warnken, Staff Machine Learning Engineer, LATAM Airlines

About the Speaker:
Michael Haacke Concha is the Lead Machine Learning Engineer of the centralized MLOps team at LATAM Airlines. He holds both a Bachelor’s and a Master’s degree in Theoretical Physics from Pontificia Universidad Católica de Chile (PUC). Over his three years at LATAM Airlines, he developed an archival and retrieval system for aircraft black-box data to support analytics. He then played a key role in building the framework for integrating the Iguazio MLOps platform within the company. In the past year, he has been leading the development of a new platform using Vertex GCP.

Prior to joining LATAM Airlines, Michael worked as a data scientist on the ATLAS experiment at the Large Hadron Collider (LHC), where he contributed to various studies, including the search for a long-lived Dark Photon and a Heavy Higgs.

Diego Castillo is a Consultant Machine Learning Engineer at Neuralworks, currently on assignment as a Staff Machine Learning Engineer at LATAM Airlines, where he plays a pivotal role within the decentralized Data & AI Operations team. A graduate of the University of Chile with a degree in Electrical Engineering, Diego has excelled in cross-functional roles, driving the seamless integration of machine learning models into large-scale production environments. As a Staff Machine Learning Engineer at LATAM, he not only leads and mentors other MLEs but also shapes the technical direction across key business areas.

Throughout his career at LATAM Airlines, Diego has significantly impacted diverse domains, including Cargo, Customer Care, and the App and Landing Page teams. More recently, he has been supporting the migration of the internal MLOps framework from Iguazio to Vertex GCP.

With a comprehensive expertise spanning the entire machine learning lifecycle, Diego brings a wealth of experience from previous roles, including Data Scientist, Backend Developer, and Data Engineer, making him a versatile leader in the AI space.

Talk Track: Applied Case Studies

Talk Technical Level: 2/7

Talk Abstract:
This talk explores how LATAM Airlines leveraged MLOps to revolutionize their operations and achieve financial gains in the hundreds of millions of dollars. By integrating machine learning models into their daily workflows and automating the deployment and management processes, LATAM Airlines was able to optimize tariffs, enhance customer experiences, and streamline maintenance operations. The talk will highlight key MLOps strategies employed, such as continuous integration and delivery of ML models and real-time data processing. Attendees will gain insights into the tangible benefits of MLOps, including cost savings, operational efficiencies, and revenue growth, showcasing how strategic ML operations can create substantial value in the airline industry.

What You’ll Learn
You will acquire insight into how a scalable and decentralized tech team grows inside LATAM Airlines, thanks to technology and organizational structure. You will also learn about some of the successful use cases of our MLOps ecosystem.

Talk: LeRobot: Democratizing Robotics

Presenter:
Remi Cadene, ML for Robotics, Hugging Face

About the Speaker:
I build next-gen robots at Hugging Face. Before, I was a research scientist at Tesla on Autopilot and Optimus. Academically, I did some postdoctoral studies at Brown University and my PhD at Sorbonne.

My scientific interest lies in understanding the underlying mechanisms of intelligence. My research is focused on learning human behaviors with neural networks. I am working on novel architectures, learning approaches, theoretical frameworks, and explainability methods. I like to contribute to open-source projects and to read about neuroscience!

Talk Track: Virtual Talk

Talk Technical Level: 3/7

Talk Abstract:
Learn about how LeRobot aims to lower the barrier of entry to robotics, and how you can get started!

What You’ll Learn
1. What LeRobot’s mission is.
2. Ways in which LeRobot aims to lower the barrier of entry to robotics.
3. How you can get started with your own robot.
4. How you can get involved in LeRobot’s development.

Talk: From ML Repository to ML Production Pipeline

Presenters:
Jakub Witkowski, IT Expert, Roche Informatics | Dariusz Adamczyk, IT Expert, Roche Informatics

About the Speaker:
Jakub Witkowski, PhD is a data scientist and MLOps engineer with experience spanning various industries, including consulting, media, and pharmaceuticals. At Roche, he focuses on understanding the needs of data scientists to help them make their work and models production-ready. He achieves this by providing comprehensive frameworks and upskilling opportunities.

Dariusz is a DevOps and MLOps engineer. He has experience in various industries such as public cloud computing, telecommunications, and pharmaceuticals. At Roche, he focuses on infrastructure and the process of deploying machine learning models into production.

Talk Track: Virtual Talk

Talk Technical Level: 4/7

Talk Abstract:
In the pRED MLOps team, we collaborate closely with research scientists to transition their machine learning models into a production environment seamlessly. Through our efforts, we have developed a robust framework that standardises and scales this process effectively. In this talk, we will provide an in-depth look at our framework, the tools we leverage, and the challenges we overcome in this journey.

What You’ll Learn
– How to create a framework for moving ML code to production
– What can be automated in this process (the role of containerisation, CI/CD, and building reusable components for repeating tasks)
– What tools are important for the dev team
– What the most important challenges are to tackle in this process

Talk: Striking the Balance: Leveraging Human Intelligence with LLMs for Cost-Effective Annotations

Presenter:
Geoff LaPorte, Applied AI Solutions Architect, Appen

About the Speaker:
Geoff is a seasoned tech innovator with over 13 years of experience, transitioning from management consulting to software development. He specializes in bridging the gap between technology and business strategy, consistently delivering user-focused, high-impact solutions. Geoff is known for pushing boundaries and tackling complex technology challenges with a passion.

Talk Track: Applied Case Studies

Talk Technical Level: 7/7

Talk Abstract:
Data annotation involves assigning relevant information to raw data to enhance machine learning (ML) model performance. While this process is crucial, it can be time-consuming and expensive. The emergence of Large Language Models (LLMs) offers a unique opportunity to automate data annotation. However, the complexity of data annotation, stemming from unclear task instructions and subjective human judgment on equivocal data points, presents challenges that are not immediately apparent.

In this session, Chris Stephens, Field CTO and Head of AI Solutions at Appen, will provide an overview of an experiment the company recently conducted to test the tradeoff between the quality and cost of training ML models via LLMs vs. human input. Their goal was to differentiate between utterances that could be confidently annotated by LLMs and those that required human intervention. This differentiation was crucial to ensure a diverse range of opinions and to prevent incorrect responses from overly general models. Chris will walk audience members through the dataset and methodology used for the experiment, as well as the company’s research findings.
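As a minimal sketch of the routing idea described above: keep annotations the LLM is confident about and send ambiguous items to human annotators. The threshold and llm_annotate below are hypothetical stand-ins; the session covers the actual methodology:

    CONFIDENCE_THRESHOLD = 0.9  # tuned against a human-labeled holdout set

    def route_annotations(utterances, llm_annotate):
        auto_labeled, needs_human = [], []
        for utterance in utterances:
            label, confidence = llm_annotate(utterance)  # returns (label, score in 0..1)
            if confidence >= CONFIDENCE_THRESHOLD:
                auto_labeled.append((utterance, label))   # trust the LLM
            else:
                needs_human.append(utterance)             # queue for human annotators
        return auto_labeled, needs_human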

What You’ll Learn
Geoff will walk audience members through an experiment that highlights a key issue with using a vanilla LLM: it might struggle with complex real-world tasks. Researchers recommend exercising caution when relying solely on LLMs for annotation. Instead, a balanced approach combining human input with LLM capabilities is recommended, considering their complementary strengths in terms of annotation quality and cost-efficiency.

Talk: ML Deployment at Faire: Predicting the Future, Serving the Present

Presenter:
Harshit Agarwal, Senior Machine Learning Engineer, Faire Wholesale Inc

About the Speaker:
Harshit Agarwal is a Senior Machine Learning Engineer at Faire.

Talk Track: Research or Advanced Technical

Talk Technical Level: 5/7

Talk Abstract:
How Faire transitioned a traditional infrastructure into a modern, flexible model deployment and serving stack that supports a range of model types, while ensuring operational excellence and scalability in a dynamic e-commerce environment.

Over the past few years at Faire, we have overhauled our ML serving infrastructure, moving from hosting XGBoost models in a monolithic service to a flexible and powerful ML deployment and serving stack that powers all types of models, small and big.

In this talk, we’ll cover how we set up a system that makes it easy to migrate, deploy, scale, and manage different types of models. Key points will include how we set up infrastructure as code and CI/CD pipelines for smooth deployment, automated testing, and created user-friendly tools for managing model releases. We’ll also touch on how we built in observability and monitoring to keep an eye on model performance and reliability.

Come and learn how Faire’s ML serving stack helps our team quickly bring new ideas to life, while also maintaining the operational stability needed for a growing marketplace.

What You’ll Learn
1. How to best structure an ML serving and deployment infrastructure
2. How to build testing and observability into your deployment and serving infra
3. How to build production-grade tools that your data scientists and MLEs will love
4. How we serve users at scale and the design choices that we made

Talk: Memory Optimizations for Machine Learning

Presenter:
Tejas Chopra, Senior Software Engineer, Netflix

About the Speaker:
Tejas Chopra is a Senior Software Engineer, working in the Data Storage Platform team at Netflix, where he is responsible for architecting storage solutions to support Netflix Studios and the Netflix Streaming Platform. Prior to Netflix, Tejas was working on designing and implementing the storage infrastructure at Box, Inc. to support a cloud content management platform that scales to petabytes of storage and millions of users. Tejas has worked on distributed file systems and backend architectures, both in on-premise and cloud environments, as part of several startups in his career. Tejas is an international keynote speaker and periodically conducts seminars on microservices, NFTs, software development, and cloud computing. He has a Master’s degree in Electrical & Computer Engineering from Carnegie Mellon University, with a specialization in Computer Systems.

Talk Track: Research or Advanced Technical

Talk Technical Level: 5/7

Talk Abstract:
As Machine Learning continues to forge its way into diverse industries and applications, optimizing computational resources, particularly memory, has become a critical aspect of effective model deployment. This session, “Memory Optimizations for Machine Learning,” aims to offer an exhaustive look into the specific memory requirements in Machine Learning tasks, including Large Language Models (LLMs), and the cutting-edge strategies to minimize memory consumption efficiently.

We’ll begin by demystifying the memory footprint of typical Machine Learning data structures and algorithms, elucidating the nuances of memory allocation and deallocation during model training phases. The talk will then focus on memory-saving techniques such as data quantization, model pruning, and efficient mini-batch selection. These techniques offer the advantage of conserving memory resources without significant degradation in model performance.
A special emphasis will be placed on the memory footprint of LLMs during inferencing. LLMs, known for their immense size and complexity, pose unique challenges in terms of memory consumption during deployment. We will explore the factors contributing to the memory footprint of LLMs, such as model architecture, input sequence length, and vocabulary size. Additionally, we will discuss practical strategies to optimize memory usage during LLM inferencing, including techniques like model distillation, dynamic memory allocation, and efficient caching mechanisms.
By the end of this session, attendees will have a comprehensive understanding of memory optimization techniques for Machine Learning, with a particular focus on the challenges and solutions related to LLM inferencing.
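As a minimal sketch of two of the memory-saving techniques mentioned above, using PyTorch’s built-in pruning and dynamic quantization utilities (the toy model is illustrative):

    import torch
    import torch.nn as nn
    import torch.nn.utils.prune as prune

    model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

    # Pruning: zero out the 30% smallest-magnitude weights in each Linear layer.
    for module in model:
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=0.3)
            prune.remove(module, "weight")  # make the pruning permanent

    # Dynamic quantization: store weights as int8, quantize activations at runtime.
    quantized = torch.ao.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8)
    print(quantized)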

What You’ll Learn
By the end of this session, attendees will have a comprehensive understanding of memory optimization techniques for Machine Learning, including pruning, quantization, and distillation, and where to apply them. They will also learn how to implement these techniques using PyTorch.

Talk: From Black Box to Glass Box: Interpreting your Model

Presenter:
Zachary Carrico, Senior Machine Learning Engineer, Apella

About the Speaker:
Zac is a Senior Machine Learning Engineer at Apella, specializing in machine learning products for improving surgical operations. He has a deep interest in healthcare applications of machine learning, and has worked on cancer and Alzheimer’s disease diagnostics. He has end-to-end experience developing ML systems: from early research to serving thousands of daily customers. Zac is an active member of the ML community, having presented at conferences such as Ray Summit, TWIMLCon, and Data Day. He has also published eight journal articles. He is passionate about advancing model interpretability and reducing model bias. In addition, he has extensive experience in improving MLOps to streamline the deployment and monitoring of models, reducing complexity and time. Outside of work, Zac enjoys spending time with his family in Austin and traveling the world in search of the best surfing spots.

Talk Track: Research or Advanced Technical

Talk Technical Level: 5/7

Talk Abstract:
Interpretability is crucial for improving model performance, reducing biases, and ensuring compliance with AI safety and fairness regulations. In this session, complex neural networks will be transformed from opaque “black boxes” into interpretable “glass boxes” by exploring a wide range of neural network-specific interpretability techniques. Attendees will learn about methods such as saliency maps, integrated gradients, Grad-CAM, SHAP, and activation maximization. The session will combine theoretical explanations with practical demonstrations, helping attendees effectively improve transparency and trust in neural network predictions.
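As a minimal sketch of one such technique, integrated gradients, here via the Captum library (the session’s exact tooling isn’t specified, and the random input is a stand-in for a real image):

    import torch
    import torchvision.models as models
    from captum.attr import IntegratedGradients

    model = models.resnet18(weights=None).eval()
    ig = IntegratedGradients(model)

    image = torch.randn(1, 3, 224, 224, requires_grad=True)  # stand-in input
    target = model(image).argmax(dim=1).item()  # class to explain

    # Attribute the class score back to input pixels along a baseline-to-input path.
    attributions = ig.attribute(image, target=target, n_steps=50)
    print(attributions.shape)  # same shape as the input: (1, 3, 224, 224)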

What You’ll Learn
Attendees will learn how to apply various neural network interpretability techniques to understand model behavior better. They will gain insights into methods such as saliency maps, Grad-CAM for visualizing important regions in images, and integrated gradients for attributing feature importance. The session will also cover feature visualization methods to understand neuron activations and how to use layer-wise relevance propagation to track the impact of inputs through network layers. By the end of the session, participants will know how to use these tools to make neural networks more understandable and how to communicate the insights to diverse stakeholders.

Talk: LLMs in Vision Models

Presenter:
Arpita Vats, Senior AI Engineer, LinkedIn

About the Speaker:
I am a Senior AI Engineer at LinkedIn with expertise in AI, Deep Learning, NLP, and Computer Vision. I have experience from Meta and Amazon, where I focused on LLM and Generative AI. I have published papers and led projects enhancing recommendation algorithms and multimedia models for various industry applications.

Talk Track: Virtual Talk

Talk Technical Level: 5/7

Talk Abstract:
The integration of Large Language Models (LLMs) in vision-based AI systems has sparked a new frontier in multimedia understanding. Traditional vision models, while powerful, often lack the ability to comprehend contextual information beyond visual features. By incorporating LLMs, vision models can process both visual and textual information, creating a more holistic and interpretable understanding of multimedia content. This presentation will explore the convergence of LLMs with vision models, highlighting their application in image captioning, object recognition, and multimodal recommendation systems.
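As an illustrative example of this pairing, here is image captioning with the open BLIP model from Hugging Face, one of many vision-language models; the blank image is a stand-in for a real photo:

    from PIL import Image
    from transformers import BlipProcessor, BlipForConditionalGeneration

    processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
    model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

    image = Image.new("RGB", (384, 384), "white")  # stand-in for a real photo
    inputs = processor(images=image, return_tensors="pt")

    # The language model decodes a caption conditioned on visual features.
    caption_ids = model.generate(**inputs, max_new_tokens=30)
    print(processor.decode(caption_ids[0], skip_special_tokens=True))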

What You’ll Learn:
By attending this presentation, the audience will learn how Large Language Models (LLMs) can enhance the capabilities of vision-based AI systems, creating more context-aware and interpretable multimedia models. Attendees will gain insights into the architecture and integration techniques used to combine vision and language models, practical industry applications, and the challenges and solutions associated with building these advanced systems. They will leave with a deeper understanding of how LLMs in vision models are transforming multimedia analysis, enabling more accurate, scalable, and personalized AI-driven solutions.