Who Should Attend

2021 Event Demographics

  • Real-World Case Studies
  • Actively Looking for Solutions
  • Currently Job-seeking
  • Currently Hiring

2021 Technical Background

  • Expert/Researcher: 21.8%
  • Advanced: 23.6%
  • Intermediate: 39.7%
  • Beginner/Business: 14.9%

Top 10 Most Common Attendees

1. Data Scientist
Data Scientist | Senior Data Scientist | Data Science Team Lead | Lead Data Scientist | Director, Data Science

2. ML Engineer
ML Developer | AI Engineer | MLOps Engineer | Senior Data Engineer | Lead ML Engineer | Senior ML Engineer

3. Software Engineer
Software Engineer | Software Developer | Software Architect | Senior Software Developer

4. Data Engineer
Data Engineer | Director of Engineering | Engineer | Software Architect | Software Developer

5. Student
PhD | Masters | Post-Graduate | Undergrad

6. CEO/Founder
CEO | Founder | Co Founder

7. Data Analyst
Data Analyst | Senior Data Analyst | Analyst

8. Consultant/CTO
Consultant | Senior Consultant | CTO

9. Researchers
Researcher | Research Director | Research Scientist | Research Engineer | Research Data Scientist

10. Product Manager
Product Manager

Companies that have presented include

Algorithmia
Alibaba Group
Altair Engineering
AltaML
Amazon Alexa
Anodot
Anyscale
Aporia
ArangoDB
Avast
H2O.AI
Hugging Face
IBM
Iguazio
Imperial College London
Infinstor Inc
Intuit
Iterative.ai
John Snow Labs
Leverness
Plotly
Private AI
Publicis Sapient
PyCaret
Quadrical.ai
Quicken Loans
RBC
Replicate.ai
Rocket Science
SAP

 


Top Industries

Technology & Service

Computer Software

Banking & Financial Services

Insurance

Hospital & Health Care

Automotive

Telecommunications

Environmental Services

Food & Beverages

Marketing & Advertising

Harish Doddi

Founder & CEO, Datatron

Harish Doddi has 10+ years of experience building AI infrastructure systems, early on at companies such as Lyft, Twitter, and Snap. His notable work includes building Lyft's AI pricing engine from scratch, which contributes significant revenue today. He completed his graduate degree at Stanford University and his undergraduate degree at the International Institute of Information Technology, Hyderabad, India. These experiences inspired him to start Datatron, an enterprise-grade MLOps and model governance platform that helps large enterprises accelerate the time to production for models. The platform has already been proven at several Fortune 100 companies.


Victor Thu

VP of Customer Success & Ops, Datatron

Victor Thu is the VP of Customer Success and Operations at Datatron where he is overseeing multiple functions including marketing, business development, operations, and customer success.
In the last five years, Victor has worked with machine learning startups and has experienced first-hand the challenges companies have with AI. Whether they are startups or enterprises, a common challenge everyone shares is how to get AI models running in production.
Victor has worked closely with some of the top enterprises including those in banking and financial services, top ranked international airports, retailers, manufacturers, and more.



Abstract:

In this talk, I would like to focus on some of the lessons learned deploying MLOps solutions in the enterprise. MLOps is a relatively new area, and teams are often unsure which direction to take, what the common pitfalls are, how to avoid them, and which best practices can be adopted early in the model life cycle. Those are the questions this talk focuses on.

What You'll Learn:

Best MLOps practices for IT and Machine learning engineers

Talk: Best Practices Deploying MLOps Solutions in Enterprise


Gilad Shaham

Director of Product Management, Iguazio

Gilad has over 15 years of experience in product management and a solid R&D background. He combines analytical skills and technical innovation with Data Science market experience. Gilad’s passion is to define a product vision and turn it into reality. As Director of Product Management at Iguazio, Gilad manages both the Enterprise MLOps Platform product as well as MLRun, Iguazio’s open-source MLOps orchestration framework.
Prior to joining Iguazio, Gilad managed several different products at NICE-Actimize, a leading vendor of financial crime prevention solutions, including coverage of machine-learning-based solutions, formation of a marketplace, and addressing customer needs across different domains.
Gilad holds a B.A. in Computer Science, an M.Sc. in Biomedical Engineering, and an MBA from Tel-Aviv University.

Abstract:

There are many challenges to operationalizing machine learning, but perhaps one of the most difficult is online feature engineering. Generating a new feature based on batch processing takes an enormous amount of work for ML teams, and those features must be used for the training stage as well as the inference layer. Feature engineering for real-time use cases is even more complex. Real-time pipelines require an extremely fast, low-latency event-processing mechanism that can run complex algorithms to calculate features in real time. With the growing business demand for real-time use cases such as fraud prediction, predictive maintenance and real-time recommendations, ML teams are feeling immense pressure to solve the operational challenges of real-time feature engineering for machine learning in a simple and reproducible way. This is where online feature stores come in. An online feature store accelerates the development and deployment of real-time AI applications by automating feature engineering and providing a single pane of glass to build, share and manage features across the organization. This improves model accuracy, even when complex calculations and data transformations are involved, saving your team valuable time and providing seamless integration with training, serving and monitoring frameworks.

What You'll Learn:

In this talk, we'll cover:
- The challenges associated with online feature engineering across training and serving environments
- How feature stores enable teams to collaborate on building, sharing and managing features across the organization
- Solutions that let you build a real-time operational ML pipeline that can handle events arriving at ultra-high velocity and volume, and calculate and trigger an action in seconds
- How to build your ML pipeline in a way that enables ingestion and analysis of real-time data on the fly
- How to monitor your real-time AI applications in production to detect and mitigate drift, making your method repeatable and resilient to changes in market conditions

Talk: Building Real-Time ML Pipelines with A Feature Store


Amit Paka

CPO & Co-founder, Fiddler AI

Amit is the co-founder and CPO of Fiddler, a Machine Learning Monitoring company that empowers companies to efficiently monitor and troubleshoot ML models with Explainable AI. Prior to founding Fiddler, Amit led the shopping apps product team at Samsung and founded Parable, the Creative Photo Network, now part of the Samsung family. He also led PayPal's consumer in-store mobile payments launching innovations like hardware beacon payments and has developed successful startup products, particularly in online advertising - paid search, contextual, ad exchange, and display advertising. Amit has passions for actualizing new concepts, building great teams, and pushing the envelope. He aims to leverage these skills to help define how AI can be fair, ethical, and responsible.

Abstract:

In this session, we will talk about why monitoring is critical to ML success. ML models can fail silently and lose their predictive power. This talk will discuss the key reasons models fail and hurt business performance: model drift, data integrity, outliers and bias. Once identified, operational issues are time consuming to fix. This talk will focus on how cutting-edge Explainable AI and model analytics can help find the root cause of an operational issue quickly. MLOps is iterative. This talk will also outline how model and cohort comparisons can help reduce time to market for new models.
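
One of the failure modes listed above, model drift, is often measured without ground truth by comparing a feature's production distribution against its training-time baseline. Below is a minimal, generic sketch of one common statistic for this, the Population Stability Index (PSI); it is an illustration of the idea with synthetic data, not Fiddler's implementation.

    import numpy as np

    def psi(expected, actual, bins=10):
        """Population Stability Index between a baseline sample and a production sample."""
        edges = np.histogram_bin_edges(expected, bins=bins)   # bin edges from the baseline
        e_counts, _ = np.histogram(expected, bins=edges)
        a_counts, _ = np.histogram(actual, bins=edges)
        e_frac = np.clip(e_counts / max(e_counts.sum(), 1), 1e-6, None)
        a_frac = np.clip(a_counts / max(a_counts.sum(), 1), 1e-6, None)
        return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

    baseline = np.random.normal(0.0, 1.0, 10_000)      # feature values seen at training time
    production = np.random.normal(0.3, 1.0, 10_000)    # the same feature in production
    print(f"PSI = {psi(baseline, production):.3f}")     # values above ~0.2 are often treated as drift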

What You'll Learn:

- How to measure model drift and how it can help identify model degradation, even without ground truth
- How to monitor for key model metrics
- How to root cause issues with Explainable AI and model analytics
- How to build high performance ML iteratively with model and cohort comparisons
- How bias and outlier detection can help in model monitoring success

Talk: Build High Performance MLOps With ML Monitoring and AI Explainability


Jimmy Whitaker

Data Science Evangelist, Pachyderm

Jimmy Whitaker is the Data Science Evangelist at Pachyderm. He focuses on creating a great data science experience and sharing best practices for how to use Pachyderm. When he isn't at work, he's either playing music or trying to learn something new, because "You suddenly understand something you've understood all your life, but in a new way."

Abstract:

As teams ramp up their Machine Learning capabilities, the need for a robust MLOps pipeline has emerged. How can your team train and deploy models quickly to production? Join us for this workshop as we use Open Source tools to develop a robust MLOps stack.

What You'll Learn:

In this workshop, attendees will learn how to:
- Build a pipeline that can process new data automatically
- Combine new human-labeled data seamlessly with your models in production
- Continuously train a new model from new data sources

Workshop: Intro to MLOps: Build, Deploy, and Retrain a Model in Production


Sai Xiao

Machine Learning Engineer, Pinterest

Sai Xiao is a machine learning engineer at Pinterest. She currently works on building shopping recommendation systems and improving the mid-funnel shopping experience on shopping surfaces at Pinterest. Before joining Pinterest in 2018, she worked in data science at companies such as Liberty Mutual, eBay, and Apple. She received her PhD in statistics from the University of California, Santa Cruz.

Abstract:

Millions of people across the world come to Pinterest to find new ideas every day. Shopping is at the core of Pinterest's mission to help people create a life they love. This talk will introduce how the Pinterest shopping team builds its related-product recommendation systems, including engagement-based and embedding-based candidate generation, the indexing and serving methods that support multiple types of recommenders, and its deep neural network ranking models.
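
To make the embedding-based candidate generation mentioned above concrete, here is a rough, generic sketch of retrieving candidates by nearest-neighbour search over item embeddings (an illustration only, not Pinterest's system; the embeddings are random placeholders).

    import numpy as np

    rng = np.random.default_rng(0)
    item_embeddings = rng.normal(size=(1000, 64))   # hypothetical catalogue of 1,000 product embeddings
    query = rng.normal(size=64)                     # embedding of the product the user is viewing

    # Cosine similarity between the query and every candidate item.
    items_norm = item_embeddings / np.linalg.norm(item_embeddings, axis=1, keepdims=True)
    scores = items_norm @ (query / np.linalg.norm(query))

    top_k = np.argsort(-scores)[:20]                # top-20 nearest neighbours become candidates
    print(top_k)                                    # candidates would then be re-ranked by a DNN ranking model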

What You'll Learn:

1. The details of how Pinterest built shopping recommendations.
2. Engagement- and embedding-based candidate generators.
3. Indexing and serving methods to support filter-based retrieval.
4. Multi-head deep neural network ranking in shopping.

Talk: Shopping Recommendations at Pinterest



Deepak Pai

Senior Manager, Machine Learning, Adobe

Deepak is a Machine Learning Engineer with 16 years of experience. He has published papers in top peer-reviewed conferences and holds multiple US patents. He currently manages a team of Machine Learning Engineers developing multiple products and services at Adobe that are part of the Digital Experience business. Deepak holds master's and bachelor's degrees in Computer Science from leading universities in India.


Vijay Srivastava

Manager, Machine Learning, Adobe

I have 15 years of industry experience across ML and e-learning products, with extensive experience developing scalable machine learning and e-learning cloud services from inception. During my technical journey at Adobe, I have held multiple key positions, including Senior Computer Scientist and Staff Data Scientist. In my current role, I manage a team of machine learning engineers developing core ML services that feed ML insights into Experience Cloud Intelligent Services. I hold a bachelor's degree from the Indian Institute of Information Technology, Allahabad, India.

Abstract:

You spend lots of time cleansing data, visualizing it to gain insights, feature engineering, and modeling, while ensuring that you picked the best algorithm and architecture, the best hyperparameters and so on. Finally you deploy the model, claim victory and move on. Well, that is not the end of the task for an ML engineer. Model monitoring is even more critical than model building, yet it is an often neglected area of MLOps. Despite their critical roles, ML models in production are not actively monitored. Ideally one should monitor production systems proactively, but unfortunately being reactive is the norm. When a problem first arises, it may go unnoticed for some time. Once it is noticed, investigating its underlying cause is a time-consuming, manual process, not to mention the damage that is already done in production. Even if you are manually monitoring the models in production, the approach does not scale when you get to tens of models, if not hundreds.
Like the saying goes, "A stitch in time saves nine": wouldn't it be great if the model's output were automatically monitored? If it could be visualized and sliced by different dimensions? If the system could automatically detect performance degradation and trigger alerts? If problems in the model output could be attributed to the characteristics of the input data? In this presentation, we describe our experience building such core machine-learning services: Model Evaluation and Data Quality. Our service provides automated, continuous evaluation of the performance of a deployed model over commonly used metrics like the area under the curve (AUC) and root mean square error (RMSE), among others. In addition, summary statistics about the model's output and its distributions are also computed. The service also provides a dashboard to visualize the performance metrics, summary statistics and distributions of a model over time, along with REST APIs to retrieve these metrics programmatically. The service can correlate model performance issues to likely causes in the input data and data quality, which data scientists and engineers can leverage to debug problems. This significantly reduces the turnaround time for identifying and fixing issues in production.
Further, these metrics can be sliced by input features to provide insights into model performance over different segments, and potentially improve the model. The talk will describe the various components required to build such a service and the metrics of interest. Our system has a backend component built with Spark on Azure Databricks. The backend can scale to analyze TBs of data to generate model evaluation metrics. The REST endpoints are powered by a Python Flask middleware application hosted on an Azure web app, and the UI is built with React.
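
As a rough illustration of the kind of per-model evaluation job described above (the production system uses Spark on Azure Databricks; this is only a small pandas/scikit-learn sketch with made-up numbers):

    import pandas as pd
    from sklearn.metrics import mean_squared_error, roc_auc_score

    # Hypothetical table of model scores joined with (possibly delayed) ground truth.
    df = pd.DataFrame({
        "score": [0.9, 0.2, 0.7, 0.4, 0.85],
        "label": [1, 0, 1, 0, 1],
    })

    metrics = {
        "auc": roc_auc_score(df["label"], df["score"]),
        "rmse": mean_squared_error(df["label"], df["score"]) ** 0.5,
        "score_mean": float(df["score"].mean()),
        "score_p95": float(df["score"].quantile(0.95)),
    }
    print(metrics)   # in the real service, these would feed the dashboard and alerting, not stdout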

What You'll Learn:

In this talk, the audience will learn the significance of monitoring ML models in production and the value of having automated systems do this instead of doing it manually. The audience will also benefit from our learnings from building a scalable system that does the above for hundreds of models, and from hearing how we achieved efficiencies of scale.

Talk: Automated Monitoring in Production for Continuous Model Improvements


Reah Miyara

Head of Product, Arize AI

Reah Miyara is Head of Product at Arize AI, a startup focused on ML Observability.
He joins Arize from Google AI, where he led product development for research, tools, and infrastructure related to graph-based machine learning, data-driven large-scale optimization, and market economics. Reah's experience as a team and product leader is extensive, building and growing products across a broad cross-section of the AI landscape. He's played pivotal roles in ML and AI initiatives at IBM Watson, Intuit, and NASA Jet Propulsion Laboratory. Reah also co-led Google Research's Responsible AI initiative, confronting the risks of AI being misused and taking steps to minimize AI's negative influence on the world. He holds a bachelor's degree from UC Berkeley's Electrical Engineering and Computer Science program and was the founder and president of the Cal UAV team in 2014.

Abstract:

In this talk, Reah will highlight common model failure modes including model drift, data quality issues, performance degradation, etc. The talk will also surface how ML Observability can address these challenges by monitoring for failures, providing tools to troubleshoot and identify the root cause, as well as playing an important part in the feedback loop to improving models. The talk will highlight best practices and share examples from across the industry.
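
One simple form of the statistical distance checks covered in the talk is a two-sample Kolmogorov-Smirnov test between a training-time feature sample and a recent production window. The sketch below is a generic illustration with synthetic data, not Arize's implementation.

    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(42)
    train_feature = rng.normal(0.0, 1.0, 5_000)   # reference distribution captured at training time
    prod_feature = rng.normal(0.4, 1.2, 5_000)    # the same feature from a recent production window

    stat, p_value = ks_2samp(train_feature, prod_feature)
    if p_value < 0.01:
        print(f"Drift alert: KS statistic={stat:.3f}, p={p_value:.2e}")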

What You'll Learn:

1. Use statistical distance checks to monitor features and model inputs in production
2. Analyze performance regressions such as drift and how it impacts business metrics
3. Use troubleshooting techniques to determine if issues are model or data related

Talk: ML Observability: A Critical Piece in the ML Stack


Daniel Jeffries

Chief Technology Evangelist, Pachyderm

Dan Jeffries is Chief Technology Evangelist at Pachyderm. He’s also an author, engineer, futurist, pro blogger and he’s given talks all over the world on AI and cryptographic platforms. He’s spent more than two decades in IT as a consultant and at open source pioneer Red Hat.
With more than 50K followers on Medium, his articles have held the number one writer's spot on Medium for Artificial Intelligence, Bitcoin, Cryptocurrency and Economics more than 25 times. His breakout AI tutorial series "Learning AI If You Suck at Math", along with his explosive pieces on cryptocurrency, "Why Everyone Missed the Most Important Invention of the Last 500 Years" and "Why Everyone Missed the Most Mind-Blowing Feature of Cryptocurrency," are shared hundreds of times daily all over social media and have been read by more than 5 million people worldwide.

Abstract:

Just a few years ago, every cutting-edge tech company, like Google, Lyft, Microsoft, and Amazon rolled their own AI/ML tech stack from scratch. Fast forward to today and we've got a Cambrian explosion of new companies building a massive array of software to democratize AI for the rest of us. But how do we make sense of it all? We created the AI Infrastructure Alliance to bring all those companies together. That’s because for AI apps to become as ubiquitous as the apps on your phone, you need a canonical stack for machine learning that makes it easier for non-tech companies to level up fast. We need a LAMP stack for AI/ML to truly unleash the power of machine learning for companies big and small.

What You'll Learn:

Attendees will learn how the MLOps space is evolving and how they can adopt best of breed technologies to create their canonical stack.

Talk: The Rapid Evolution of the Canonical Stack for Machine Learning


Stacey Svetlichnaya

Deep Learning Engineer, Weights & Biases

Stacey Svetlichnaya is a deep learning engineer at Weights & Biases, building developer tools for visualization, explainability, reproducibility, and collaboration in AI. She’s not sure if the climate crisis or AI safety is the bigger existential threat, so she strives to maximize impact on both. She enjoys the intersection of machine learning research, application, and UX, mostly for vision & language models (image aesthetic quality and style classification, object recognition, caption generation, and emoji semantics). Previously, she worked on image search, productionizing ML systems, and discovery & recommendation on Flickr, following the acquisition of LookFlow, a visual similarity search engine. Stacey holds a Stanford BS ‘11 and MS ’12 in Symbolic Systems, focusing on neuroscience.

Abstract:

Reliably reproducing deep learning research—whether from others' code or your own past attempts—is a notorious challenge in machine learning. What are some useful patterns for keeping your experiments organized and repeatable? Standardized logging, granular and annotated versions of datasets and model checkpoints, and templated workflows for analysis and visualization can help capture the full scope of an ML project. We will cover general practices for building reproducible training and inference pipelines, finding and recreating individual steps or checkpoints, and presenting your analysis for more effective collaboration. The easier it becomes to revisit earlier states of your whole team's development cycle, the more confidently you can train models, minimizing the need to duplicate effort or redo important steps.
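
Below is a minimal sketch of the standardized logging and artifact versioning described above, using the Weights & Biases Python client; the project name, parameters, and checkpoint file are hypothetical, and the training loop is a placeholder.

    import wandb

    run = wandb.init(project="reproducible-ml-demo", config={"lr": 1e-3, "epochs": 5})

    for epoch in range(run.config.epochs):
        val_loss = 1.0 / (epoch + 1)              # placeholder for a real training step
        wandb.log({"epoch": epoch, "val_loss": val_loss})

    with open("model.pt", "wb") as f:             # placeholder checkpoint file
        f.write(b"checkpoint bytes")

    artifact = wandb.Artifact("demo-model", type="model")
    artifact.add_file("model.pt")                 # versioned alongside the run that produced it
    run.log_artifact(artifact)
    run.finish()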

What You'll Learn:

You will learn design patterns and useful practices to help track your machine learning experiments, version datasets, organize different variants of your models, and generally streamline your ML development workflows. We will walk through concrete examples to demonstrate how you can reproduce past experiments, substitute new training or test data, restore specific models, compare results across many exploratory branches, and effectively share exact model recipes or higher-level analysis with your team.

Talk: Reproducible Machine Learning: How (Not) to Repeat the Past


Ville Tuulos

CEO, Co-Founder, Outerbounds

Ville has been developing infrastructure for machine learning for more than two decades. He has worked as an ML researcher in academia and as an infrastructure leader at a number of companies, including Netflix where he led the ML infrastructure team that created Metaflow, a popular open-source framework for data science infrastructure. He is a co-founder and CEO of Outerbounds, a company that continues the Metaflow journey. He is also the author of a new book, Effective Data Science Infrastructure, which will be published by Manning in 2021.

Abstract:

The concept of MLOps evokes the idea of complex systems: Operating Machines that Learn. It doesn’t sound like a job for the faint of heart. But does it have to be that way? Looking back at the past five decades of the history of computing, revolutions have happened not when impossible things have become possible but when possible things have become easy. Easy means that more people are able to build more ML systems, faster, which leads to innovations that we can’t even imagine today. Hence, to radically advance the field of MLOps, we must focus on the people who build and operate these systems, not the systems themselves. However, in itself, human-centricity is one of the most challenging technical problems.
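
As a small illustration of the "possible things becoming easy" theme, here is a minimal Metaflow flow (a sketch with placeholder steps, not the talk's own example); it would be run with "python train_flow.py run".

    from metaflow import FlowSpec, step

    class TrainFlow(FlowSpec):
        """Smallest possible load-train-report workflow."""

        @step
        def start(self):
            self.data = list(range(10))                   # placeholder for real data loading
            self.next(self.train)

        @step
        def train(self):
            self.score = sum(self.data) / len(self.data)  # placeholder for real training
            self.next(self.end)

        @step
        def end(self):
            print("score:", self.score)

    if __name__ == "__main__":
        TrainFlow()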

What You'll Learn:

- What are some key elements of high-quality developer tools
- What to consider when designing an effective ML platform
- How Netflix and other companies use Metaflow to boost productivity of ML practitioners

Talk: Human-Centric MLOps: Why Productivity Matters the Most


Alessya Visnjic

CEO, WhyLabs

Alessya Visnjic is the CEO and co-founder of WhyLabs, the AI Observability company on a mission to build the interface between AI and human operators. Prior to WhyLabs, Alessya was a CTO-in-residence at the Allen Institute for AI (AI2), where she evaluated commercial potential for the latest advancements in AI research. Earlier in her career, Alessya spent 9 years at Amazon leading Machine Learning adoption and tooling efforts. She was a founding member of Amazon’s first ML research center in Berlin, Germany. Alessya is also the founder of Rsqrd AI, a global community of 1,000+ AI practitioners who are committed to making AI technology Robust & Responsible.

Abstract:

The day the ML application is deployed to production and begins facing the real world is the best and the worst day in the life of the model builder. The joy of seeing accurate predictions is quickly overshadowed by a myriad of operational challenges. Debugging, troubleshooting and monitoring take over the majority of their day, leaving little time for model building. In DevOps, software operations have been taken to the level of an art. Sophisticated tools enable engineers to quickly identify and resolve issues, continuously improving software stability and robustness. In the ML world, operations are still largely a manual process that involves Jupyter notebooks and shell scripts. One of the cornerstones of the DevOps toolchain is logging. Traces and metrics are built on top of logs, enabling monitoring and feedback loops. What does logging look like in an ML system?
In this talk we will show you how to enable data logging for an AI application. We will discuss how something so simple enables testing, monitoring and debugging of the AI application and the upstream data pipeline. We will dive deeper into key properties of a logging library that can handle TBs of data, run with a constrained memory footprint and produce statistically accurate log profiles of structured and unstructured data. Attendees will leave the talk equipped with best practices to supercharge MLOps in their team.
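
To give a feel for what a data log can capture, here is a generic pandas sketch of a per-batch statistical profile (an illustration of the idea only, not the logging library discussed in the talk).

    import pandas as pd

    def profile_batch(df: pd.DataFrame) -> dict:
        """Compact statistical profile of one batch of model inputs."""
        profile = {"row_count": len(df)}
        for col in df.columns:
            s = df[col]
            stats = {"null_fraction": float(s.isna().mean()), "n_unique": int(s.nunique())}
            if pd.api.types.is_numeric_dtype(s):
                stats.update({"mean": float(s.mean()), "p50": float(s.quantile(0.5)), "p99": float(s.quantile(0.99))})
            profile[col] = stats
        return profile

    batch = pd.DataFrame({"age": [34, 51, None, 28], "country": ["CA", "US", "US", "DE"]})
    print(profile_batch(batch))   # ship compact profiles, not raw rows, to the monitoring backend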

What You'll Learn:

Monitoring models is essential to ensure that the model is delivering the impact it was designed for.
How to enable model & data monitoring in any ML pipeline

Talk: The Critical Missing Component in the Production ML Stack


Ashwini Badgujar

Machine Learning Engineer, Impulse Logic

A machine learning engineer at Impulse Logic, I have been in the machine learning domain for the last 3 years, working on various research and commercial projects. I have been part of research teams at the University of San Francisco, with a focus on machine learning problems in NLP and computer vision.

Abstract:

The talk will open with a short introduction to MLOps: what it is, why it matters, and how it differs from other "Ops". We will then look at the areas of computer vision (in this case, mainly neural networks applied to images or stacks of images) where MLOps is used, and walk through a few end-to-end computer vision pipelines, from data gathering, processing and analysis to model training, validation and monitoring, showing how MLOps is implemented in these pipelines, whether through manually written Python scripts, ML pipelines, or CI/CD pipelines. Tools covered in the pipeline include Amazon SageMaker, Kubernetes, Kubeflow, and MLflow.
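
Of the tools listed, MLflow's tracking API gives a feel for the experiment-tracking step of such a pipeline; the sketch below is a generic illustration (the parameter names and metric values are made up), not the speaker's pipeline.

    import mlflow

    with mlflow.start_run(run_name="cv-baseline"):
        mlflow.log_param("backbone", "resnet50")                   # hypothetical model choice
        mlflow.log_param("learning_rate", 1e-3)
        for epoch, val_iou in enumerate([0.52, 0.61, 0.66]):       # placeholder validation scores
            mlflow.log_metric("val_iou", val_iou, step=epoch)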

What You'll Learn:

Tools covered in the pipeline: Amazon SageMaker, Kubernetes, Kubeflow, and MLflow.

Talk: End-to-end MLOps for Computer Vision


Bin Zhao

Lead Data Scientist, Datatron

Bin holds a PhD in Machine Learning from Carnegie Mellon University and previously managed an ML team of 30+ at Petuum.

Abstract:

Leverage an enterprise-grade, state-of-the-art MLOps tool to deploy, monitor, and govern an AI model.

Workshop: MLOps Step-by-Step Tutorial: Deploy, Monitor & Govern an AI Model in Real-Time


Adam Gugliciello

Senior Solutions Engineer, Dataiku

Adam is a 20-year veteran of data and software systems integration. He's spent the last decade helping the world's largest firms integrate their data with Hadoop and scale their business capabilities with application networks. Adam now leverages all of these skills to help those same industry giants industrialize AI and analytics with Dataiku.

Abstract:

1. Key Concepts in MLOps as Dataiku sees it
2. What constitutes readiness for deployment
3. Considerations for model monitoring, deployment and management strategies
4. How Dataiku can streamline your MLOps processes

Workshop: MLOPs with Dataiku: Considerations For Model Deployment & Monitoring


Eric Duffy

Director, Tenstorrent

Eric is a business development director at Tenstorrent, a company designing microprocessors tailored for Machine Learning training and inference workloads from the edge to the data center. Eric's experience in the AI domain spans 15 years: he has developed computer vision applications for life sciences, consulted on AI with the ITU, the United Nations technology division, and worked with a large FinTech on next-generation AI-enabled transaction banking services.

Abstract:

Machine Learning is driving up compute demand, but so much of that activity is wasted cycles. Join Tenstorrent as we explore what is driving up the volume of computation, why throwing money at GPUs isn't the solution, and how we need to be smarter about what we process.

Talk: Next-Gen AI Processors, and Why We Need to Think Differently About SW 2.0


Matt Zeiler

Founder & CEO, Clarifai

Matthew Zeiler, founder and CEO of Clarifai, is a machine learning Ph.D. and thought leader pioneering the field of applied artificial intelligence. Matt has worked alongside renowned machine learning pioneers Geoff Hinton, Rob Fergus, Yann LeCun and Jeff Dean. This experience cultivated his passion for convolutional and deconvolutional neural networks and visual recognition. Matt's unprecedented knowledge and know-how in deep learning earned him the top five places in image classification at the ImageNet 2013 competition. Matt received his undergraduate degree from the University of Toronto and a Ph.D. in machine learning and image recognition from New York University.

Abstract:

The quality of training, validation and test data has a huge impact on model performance. Companies with unlimited budgets can theoretically train models on millions of perfectly labeled inputs and process these images with unlimited compute power. But the reality for most companies is that there are serious time and resource constraints when trying to get AI into production. If you're like most organizations, you're doing the labeling work internally and looking for ways to reclaim your teams' time so they can focus on higher-value initiatives.
Learn how to accelerate the process of building AI-powered applications and speed time-to-value 100x. See how using one dedicated AI platform that includes data labeling and pre-built AI models will save your organization time and improve the quality of your training data. Models can be pushed to production quickly and then you can iterate on them to perfect your designs over time. This session explores two separate approaches to training detection models: Region Classification Workflows and Deep-Trained Object Detectors. See how to power data labeling pipelines that accelerate the process of training specialized models.

What You'll Learn:

How to jump start your project with “zero-shot” learning
How to quickly iterate on your designs using only a few data samples and “quick training”
How to develop a baseline for model performance
How to use AI-automation to speed model development by 100X

Talk: Deploying Efficient Data-Centric AI 100x Faster


Eric Schles

Data Scientist, Johns Hopkins University

Eric Schles is a principal data scientist at Johns Hopkins University. He is also a PhD student at CUNY Graduate Center, studying machine learning and category theory.

Abstract:

Eric Schles talks about Drifter-ML, a testing framework that helps ensure your machine learning models satisfy the assumptions you made when training them. He'll walk through the motivations for model testing and a few examples of how to use the framework.

What You'll Learn:

The idea behind drifter-ml and how to use it.

Talk: Introduction to Drifter-ML


Jennifer Prendki

Founder and CEO, Alectio

Dr. Jennifer Prendki is the founder and CEO of Alectio, the first startup focused on DataPrepOps, a term that she coined. She and her team are on a fundamental mission to help ML teams build models with less data. Prior to Alectio, Jennifer was the VP of Machine Learning at Figure Eight; she also built an entire ML function from scratch at Atlassian, and led multiple Data Science projects on the Search team at Walmart Labs. She is recognized as one of the top industry experts on Active Learning and ML lifecycle management, and is an accomplished speaker who enjoys addressing both technical and non-technical audiences.

Abstract:

What an exciting decade for Machine Learning: think about those supernaturally fast and accurate search engines, those cars that drive themselves, those voice assistants who can pass for humans on the phone and those cheetah-shaped robots capable of doing the dishes for us. No doubt about it, the future is here.
But things are far from easy, and if there is one challenge remaining at this stage, it’s how hard deploying and maintaining all those sophisticated models in production has proven to be. Our one hope: the maturing of the ML Ops space. And boy, has the ML Ops community delivered over the past few years!
As predicted by several Tech analysts, 2020 has indeed turned out to be the year of ML Ops, and experts have now built a deeper understanding of what an efficient, seamless ML lifecycle should look like. And with countless new tools (both platforms and open-source ones) to tune, maintain and deploy models, it seems that nothing stands in the way of AI finally achieving its full potential.
Nothing, maybe, but data prep: you may have the best ML production pipeline in the world, but if you are refreshing your model with new raw data, you will have to hit pause to send your new data out to be labeled, audited and validated before you can resume. In other words, the full end-to-end ML pipeline can't exist as long as data preparation remains this painfully manual and sketchy (there is a reason why data scientists complain that preparing data takes up about 70% of their time!).
The solution: a Data Prep Ops framework that selects data, manages and sources labeling providers, and audits and versions your labels for you, automatically and in real time. In this talk, we will discuss what such a framework would look like, and how integrating it with the rest of your data pipeline might finally get us past the biggest remaining bottleneck in the ML lifecycle.

What You'll Learn:

- Why Data Prep is the biggest bottleneck in ML Ops
- What Data Prep Ops is
- Why we need agility when labeling data
- How to automate Data Prep

Talk: Data Prep Ops: The Last Frontier of ML Ops


Christopher A. Choquette Choo

Machine Learning Researcher, Google Brain

Christopher is an AI Resident (2020) at Google researching in the areas of model interpretability, federated learning, and differential privacy. Previously, Christopher worked on adversarial machine learning research in the CleverHans Lab at the Vector Institute. He explored privacy attacks and defenses for machine learning, confidential and private collaborative learning protocols, and protecting machine learning from IP theft.
Christopher also has experience in production ML settings. He created DualDNN while at Intel Corp., designed an open source AutoML while at Georgian Partners LP, and contributed to the open source Differential Privacy package Tensorflow/Privacy.

Abstract:

Production machine learning models and application programming interfaces can be accessed by anyone, including malicious users or "adversaries". Adversaries can exploit weaknesses in machine learning models to breach the privacy of the user training data or of the trained machine learning model itself, which is considered valuable intellectual property. These concerns become exacerbated when owners of data or models wish to collaborate so as to collectively improve their machine learning algorithms. In this talk, we'll go over state-of-the-art research at the intersection of privacy and machine learning; in particular, we'll explore how to estimate privacy leakage, defend against it, and collaborate without compromising privacy. We'll also go over a new research area enabling Machine Unlearning, so as to comply with new legislation like GDPR that gives users more power to withdraw and remove their data, potentially even from already trained machine learning models.
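
To make the membership inference idea concrete, here is a hedged sketch of the simplest loss-threshold attack: examples whose loss under the model is below a threshold are guessed to be training members. It is an illustration of the concept with synthetic model confidences, not the state-of-the-art attacks covered in the talk.

    import numpy as np

    def cross_entropy(probs, labels, eps=1e-12):
        return -np.log(np.clip(probs[np.arange(len(labels)), labels], eps, None))

    rng = np.random.default_rng(0)
    # Hypothetical softmax outputs: the model is more confident on examples it was trained on.
    member_probs = rng.dirichlet([8, 1, 1], size=500)
    nonmember_probs = rng.dirichlet([3, 1, 1], size=500)
    labels = np.zeros(500, dtype=int)                 # assume true class 0 for all examples

    member_loss = cross_entropy(member_probs, labels)
    nonmember_loss = cross_entropy(nonmember_probs, labels)
    threshold = np.median(np.concatenate([member_loss, nonmember_loss]))

    attack_acc = ((member_loss < threshold).mean() + (nonmember_loss >= threshold).mean()) / 2
    print(f"membership inference accuracy: {attack_acc:.2f}  (0.5 would mean no leakage)")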

What You'll Learn:

The SOTA in privacy attacks and defenses, in particular of the canonical Membership Inference attack.
How to verify ownership of a machine learning model and defend against model extraction (stealing) attacks.
How to collaborate without compromising privacy.
Efficient techniques to enable guaranteed unlearning of user data.

Talk: The Privacy Considerations of Production Machine Learning


Milecia Mcgregor

Developer Advocate, Iterative.ai

Milecia is a senior software engineer, international tech speaker, and mad scientist that works with hardware and software. She will try to make anything with JavaScript first. In her free time, she enjoys learning random things, like how to ride a unicycle, and playing with her dog.

Abstract:

It's easy to lose track of which changes gave you the best result when you start exploring multiple model architectures. Tracking the changes in your hyperparameter values, along with code and data changes, will help you build a more efficient model by giving you an exact reproduction of the conditions that made the model better.
In this talk, you will learn how you can use the open-source tool, DVC, to increase reproducibility for two methods of tuning hyperparameters: grid search and random search. We'll go through a live demo of setting up and running grid search and random search experiments. By the end of the talk, you'll know how to add reproducibility to your existing projects.
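
As a minimal scikit-learn sketch of the two search strategies the demo covers (the talk wires these into DVC experiments for reproducibility; this snippet only shows the searches themselves on a toy dataset):

    from scipy.stats import loguniform
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

    X, y = load_breast_cancer(return_X_y=True)
    model = LogisticRegression(solver="liblinear")

    # Grid search: every combination from a fixed grid.
    grid = GridSearchCV(model, {"C": [0.01, 0.1, 1, 10]}, cv=5).fit(X, y)

    # Random search: a fixed budget of samples drawn from a distribution.
    rand = RandomizedSearchCV(model, {"C": loguniform(1e-3, 1e2)},
                              n_iter=10, cv=5, random_state=0).fit(X, y)

    print("grid search best:  ", grid.best_params_, round(grid.best_score_, 4))
    print("random search best:", rand.best_params_, round(rand.best_score_, 4))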

What You'll Learn:

How to perform grid search and random search for hyperparameters
How to compare metrics across different experiments to get the best model

Talk: Using Reproducible Experiments To Create Better Machine Learning Models


Mike Purewal

Front Office Data Scientist, Bank of America

Mike Purewal is a front office data scientist and product leader with expertise driving business value using technology, programming and machine learning. He has spent more than a decade in financial services and is currently a Director in Global Equities at Bank of America. Mike is also an Adjunct Professor at NYU, where he teaches a Machine Learning in Finance course with an emphasis on practical implementation. He received his PhD in Applied Physics from Columbia University, where he pioneered cutting-edge research on electron transport in low-dimensional materials, receiving more than 2,000 citations.

Abstract:

Trust is a factor in gaining adoption for ML projects. A strong model interpretability practice reinforces trust, leads to productive dialogue and closes the mutual subject-matter gap between the end user and the data scientist, all of which drives better adoption rates. Model interpretability should reflect that ML models are often, conceptually, a codified, scaled version of heuristics already employed by the business. Once established, these practices form a backbone for all stakeholders, including the monitoring and maintenance tools used by MLOps. This talk addresses techniques to achieve this quantitatively, technically and organizationally.
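
One widely used, model-agnostic way to ground such interpretability conversations is permutation importance; the sketch below is a generic scikit-learn illustration on a public dataset, not the specific techniques the talk covers.

    from sklearn.datasets import load_diabetes
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    X, y = load_diabetes(return_X_y=True, as_frame=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
    result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

    # Features whose shuffling hurts the score most are the ones the model leans on.
    ranked = sorted(zip(X.columns, result.importances_mean), key=lambda t: -t[1])
    for name, score in ranked[:5]:
        print(f"{name:10s} {score:.4f}")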

What You'll Learn:

Techniques to achieve model adoption: quantitatively, technically and organizationally.

Talk: 5 Techniques to Increase Adoption Rates via Interpretability


Samantha Zeitlin

Security, Principal Machine Learning Engineer, Elastic

Samantha Zeitlin is a former cancer researcher with a PhD in biochemistry, where she specialized in DNA damage, cell division, and high-throughput image analysis (aka "big data"). She has been working in software for about ten years, including stints at multiple startups, Yahoo, and as an independent consultant.

Abstract:

The scenario: you've just started at a new job, and the crown-jewel machine learning model becomes stale at a consistent pace, so it has to be re-trained and re-released sooner rather than later. The real problem is that training and deploying the model takes 2 weeks, and it's done almost entirely by hand. And the last person who was doing this just left. How do you go about starting to automate this? Oh, and actually you're supporting two products: the new one, and the legacy one from the startup that was acquired by the bigger company you just joined. Which means you now have to learn and automate two completely different processes for shipping the same model, but the documentation is sparse and contains a lot of ambiguous terminology. The good news is, this is a highly collaborative environment, and everyone on the team has been working remotely since before the pandemic started. In this talk, I'll share the story of how we rewrote the code to get 7 different artifacts deployed, and some of the tradeoffs we had to make to get it done in less than a month. I'll talk about some things we learned the hard way, some things I wish I had done differently, and how we're planning to make this all easier.

What You'll Learn:

1) What not to do with model version tracking
2) What kinds of artifact tests and validation can be automated, and how
3) Some challenges of migrating from legacy scripts to productionized code

Talk: Deploying Model Artifacts in Hard Mode


Danny Luo

Machine Learning Engineer, Nextdoor

Danny is a Machine Learning Engineer on the Nextdoor CoreML team in San Francisco. He builds ML infrastructure for a variety of production ML use cases at Nextdoor, such as feed, notifications and ads. Danny previously worked in the ML space in Toronto, and has given talks at the Toronto Machine Learning Micro-Summit Series, Toronto Deep Learning Series, and Apache Spark meetup. He holds a B.Sc in Mathematics & Physics from the University of Toronto. Learn more at dluo.me.

Abstract:

Running an ML inference layer in a shared hosting environment (ECS, K8s, etc.) comes with a number of non-obvious pitfalls that have a significant impact on latency and throughput. In this talk, we describe how Nextdoor's ML team experienced these issues, discovered their sources and fixed them, in the end achieving a 4x drop in latency, a 3x increase in throughput and improved resource utilization (CPU 10% -> 50%) while maintaining performance. The main points of concern are request queue management and OpenMP parameter tuning.
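
The OpenMP tuning mentioned above usually boils down to capping per-process thread pools so that several model replicas packed onto one host do not oversubscribe the CPU. A generic sketch (not Nextdoor's exact configuration) is shown below.

    import os

    # Must be set before the numerical libraries create their thread pools.
    os.environ.setdefault("OMP_NUM_THREADS", "1")
    os.environ.setdefault("MKL_NUM_THREADS", "1")

    import torch

    # Cap intra-op parallelism so N replicas on a node use ~N cores, not N * num_cores.
    torch.set_num_threads(1)
    torch.set_num_interop_threads(1)

    print("intra-op threads:", torch.get_num_threads())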

What You'll Learn:

1. Why your load balancing algorithm matters
2. The importance of request queue timeouts for service recovery
3. What resources are actually being shared in a shared hosting environment

Talk: Running ML Inference Services in Shared Hosting Environments


Shivam Bharuka

Senior AI Infra Engineer, Facebook

Shivam Bharuka is an engineer on the AI Infrastructure team at Facebook. He received his master's and bachelor's degrees in Computer Engineering from the University of Illinois at Urbana-Champaign. During his time at Facebook, he has helped scale the machine learning infrastructure at Facebook to support large-scale ranking and recommendation models, serving more than a billion users. He is responsible for driving performance, reliability, and efficiency oriented designs across the components of the AI Infrastructure stack at Facebook.

Abstract:

Machine learning models are growing rapidly in scale to support ranking and recommendation at Facebook scale. In order to support this growth, we have re-imagined the entire AI Infrastructure stack, from building specialized hardware with powerful GPUs and network devices to designing optimized distributed training algorithms using PyTorch. In this talk, I will discuss the challenges we encountered and the approach we took to re-design and scale the stack.
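
As a rough illustration of the distributed-training building block referenced above, here is a minimal PyTorch DistributedDataParallel sketch meant to be launched with torchrun (e.g. "torchrun --nproc_per_node=2 train.py"); it is a toy example, not Facebook's internal stack.

    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def main():
        dist.init_process_group(backend="gloo")     # torchrun provides rank/world-size env vars
        model = DDP(torch.nn.Linear(16, 1))
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

        for _ in range(10):
            x = torch.randn(32, 16)
            loss = model(x).pow(2).mean()
            optimizer.zero_grad()
            loss.backward()                         # gradients are all-reduced across workers here
            optimizer.step()

        if dist.get_rank() == 0:
            print("final loss:", loss.item())
        dist.destroy_process_group()

    if __name__ == "__main__":
        main()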

What You'll Learn:

Understand how Facebook supports the growth in machine learning by pushing the limits of the different layers in the stack.

Talk: Machine Learning Infrastructure at Facebook Scale


Mike Berg

Vice President, Technology in Admin, AltaML

Mike is a technology leader with a track record of delivering business impact and driving technical innovation. He is currently VP of Technology at AltaML, one of Canada's largest applied machine learning and AI companies, focused on advancing its technology vision at scale.

Abstract:

A lot of people are talking about the ML components in MLOps, but there has not been much attention devoted to the Ops components. In this talk we will explore techniques to create ML environments in a way that is cost predictive, security compliant and follows proper data governance.

What You'll Learn:

In this talk, you will learn lessons from the trenches on how to create and maintain multiple ML environments in a scalable and maintainable manner, and to do so in a way that is cost predictive, security compliant and follows proper data governance.

Talk: The Ops In MLOps


Betty Zhang

Data Scientist, Amazon Web Services (AWS)

Betty is a data scientist at Amazon Web Services, where she shares her machine learning expertise internally and externally. Betty has experience building and productionizing computer vision algorithms, recommendation systems, and demand forecasting systems for clients in different industries. Prior to Amazon, she worked at a startup to help develop and productionize machine learning algorithms in the retail space.

Abstract:

Machine learning models provide the most value when they are put into production, and the sooner productionalization takes place, the sooner they create impact. In this talk, I will go through some of the common challenges in deploying machine learning models and different frameworks and approaches to speed up the process.

What You'll Learn:

Some of the common challenges in deploying machine learning models and different frameworks and approaches to speed up the process.

Talk: How to Iterate and Productionize Machine Learning Models with Speed


Mengdi Huang

Deep Learning Engineer, NVIDIA

Mengdi Huang is a deep learning engineer at NVIDIA with six years of experience working in various DL-based AI research and application areas, including MLOps, recommender systems, and multimodal language, vision, and speech processing.


Shashank Verma

Technical Marketing Engineer, NVIDIA

Shashank Verma is a Deep Learning Technical Marketing Engineer at NVIDIA. He is responsible for developing and presenting developer-focused content on various Deep Learning frameworks and libraries. He obtained his master's in Electrical Engineering from the University of Wisconsin-Madison, where he focused on Computer Vision, security aspects in Data Science, and High Performance Computing.



Abstract:

Recommendation systems must constantly evolve through the digestion of new data and algorithmic improvements of the model for their recommendations to stay relevant and effective in production. In this talk, we focus on leveraging MLOps tools and practices to operationalize a recommendation system at scale and continuously deliver improvements in production. To demonstrate this, we use NVIDIA Merlin, an open-source framework for building GPU-accelerated recommender systems, along with Kubeflow as an orchestrator on Google Kubernetes Engine.

What You'll Learn:

- Continuous training, serving, monitoring, and autoscaling of AI systems
- How to build large scale GPU accelerated recommender systems

Talk: Continuously Improving Large Scale Recommenders with MLOps Tools and Practices


Jaya Kawale

Director of Machine Learning, Tubi

Jaya Kawale is the Director of Machine Learning at Tubi, where her team works on solving various ML problems for the product, ranging from recommendations to content understanding and acquisition to ads ML. Prior to that, she worked at Netflix as a lead research scientist. She has also worked at Adobe Research and Yahoo Research in the past. She received her PhD in Computer Science from the University of Minnesota, working on graph-based approaches to understanding climate data. She has published several papers at conferences such as NeurIPS, KDD, IJCAI, and SDM.

Abstract:

Machine learning is widely used to solve a myriad of problems for video-on-demand streaming services, ranging from homepage personalization to understanding content for acquisition and even ads targeting. In this talk, I will focus on some of the challenges faced in the real-world environment, such as balancing various business metrics, feedback loops in recommendations, model interpretability and scalability, and algorithmic fairness and biases. I will also talk about some of our solutions to these problems, along with some future work.

What You'll Learn:

Industrial application of machine learning to solve challenging problems.

Talk: Machine Learning for Entertainment: Challenges and Opportunities


Auro Tripathy

Technical Marketing Manager, Tenstorrent

Machine Learning Modeler

Abstract:

As model sizes grow, proportionally increasing training costs, interest in sparsification techniques is high. While there's no shortage of literature on the topic of sparsification, we'll highlight a few popular methods and work through a concrete example to cement our understanding of the benefits of sparsification.
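
One popular sparsification method is magnitude pruning; the sketch below uses PyTorch's pruning utilities to zero out the smallest weights of a single layer (an illustration of the idea, not necessarily the example worked through in the talk).

    import torch
    import torch.nn.utils.prune as prune

    layer = torch.nn.Linear(512, 512)

    # Zero out the 90% of weights with the smallest magnitude.
    prune.l1_unstructured(layer, name="weight", amount=0.9)

    sparsity = (layer.weight == 0).float().mean().item()
    print(f"weight sparsity: {sparsity:.1%}")   # ~90% of multiply-accumulates could now be skipped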

What You'll Learn:

What is sparsity and how it reduces training costs

Talk: A Gentle Introduction to Sparsity with A Concrete Example


Mohamed Sabri

Consultant in MLOps, Rocket Science

Mohamed Sabri is a results-driven data science and MLOps specialist with 8 years of experience in machine learning and deep learning, systems design concepts, data architecture concepts, data modeling, project design and project implementation. This trilingual consultant has a very wide range of project experience across various industries. Mohamed has also been a director of data science and AI at a startup in Montréal; with his experience in the field and as a manager, he is able to support organizations in their AI project implementations. Mohamed is also the author of the book "Data Scientist Pocket Guide", available on Amazon.

Abstract:

Learn how to create a retraining ML pipeline.
Get more familiar with Kubernetes.
Learn how to use Kubeflow at 100% capabilities.

Hands-on Workshop: Create Machine Learning Retraining Pipelines


Ken Conroy

VP of Data Science, Finn AI

Dr. Ken leads the development of the Finn AI proprietary Natural Language Processing system. He is an expert data scientist and a sought-after speaker at technical events locally and abroad. Dr. Ken earned his PhD in Computer Science at Dublin City University. His specialties include deep machine learning and artificial intelligence.

Abstract:

Constructing and maintaining an ever-evolving supervised learning approach to conversational assistants is a complex task. This talk will discuss the approach required to anticipate and adapt to changing needs. How much of your data model must be pre-planned? What tooling is required to maintain and alter your labeled data and annotation requirements? How do you visualize and identify where gaps remain in your model's capabilities, and how can you ensure quality does not suffer from increased coverage? This talk will describe how this approach was applied over the last few years, leading to the development of our conversational assistant at Finn AI.

What You'll Learn:

Focus on your data, improve your data management capabilities to truly understand and solve your customer needs.

Talk: How A Data-Centric Approach to ML led Finn to World-Leading Conversational AI for Banking


Vered Shwartz

Assistant Professor, UBC Computer Science

Vered Shwartz is an Assistant Professor of Computer Science at the University of British Columbia. Her research on natural language processing focuses on commonsense reasoning, computational semantics and pragmatics, and multiword expressions. Vered was a postdoctoral researcher at the Allen Institute for AI (AI2) and the University of Washington, and received her PhD in Computer Science from Bar-Ilan University. Vered's work has been recognized with several awards, including The Eric and Wendy Schmidt Postdoctoral Award for Women in Mathematical and Computing Sciences, the Clore Foundation Scholarship, and an ACL 2016 outstanding paper award.

Abstract:

Deep Learning has changed the face of Natural Language Processing (NLP). Across NLP tasks, generic neural architectures surpass the performance of systems designed based on domain knowledge, and in some cases even perform on par with humans. The recent advancement of pre-trained language models (such as Google's BERT), trained on a massive amount of texts, further boosted performance and reduced development time. Their ease of use has made NLP accessible to non-experts. But looking beyond popular media reports and performance metrics, is NLP anywhere near being solved? In this talk I will present some of the remaining challenges in NLP. I will discuss current blind spots that limit the real-world applicability of models, such as their limited ability to generalize outside the training domain, the vast data requirements, and the lack of common sense knowledge. We will also review broader consequences related to environmental and ethical issues.

What You'll Learn:

The talk is built as a checklist of what works and what doesn't yet work.

Talk: Recent Breakthroughs and Uphill Battles in Modern Natural Language


Toren Huntley

Senior Data Scientist, Atreides

Toren is the lead data scientist of Atreides, a startup providing leaders from defense, intelligence, and humanitarian organizations with the online backbone for making high stakes, centralized and strategic decisions. Atreides brings global enriched datasets to its users at lightning speed, spanning land, sea, air, mobile and cyber domains. Toren leads data science with a customer-focused mindset to turn the science of complexity into the art of simplicity. She uses a first-principles approach to ensure data solutions are valid, explainable, and lasting.

Abstract:

The most successful ML solutions start with desired outcomes and the right data. We’ve all heard the statistic that 80% of the effort of an ML project is spent on data cleaning. Not only is this true, but it only increases when moving a model into production. Having good data practices will pay off in saved time and effort, especially when the full team is on board. There are three main questions to ask when creating an end-to-end ML solution.

What You'll Learn:

Understanding the entire data process is critical for accurate and long-lasting ML models.

Talk: A Data First Approach

Amir Abdi

Senior Research Engineer / Research Team Lead, Borealis AI

Amir received his PhD in Computer Engineering from the University of British Columbia with research topics in computer vision and reinforcement learning, and focusing on biomedical solutions. He is a recipient of the NSERC Vanier Scholarship and Gilles Brassard Award for his interdisciplinary research. Amir is the CTO and co-founder of Offerland, a proptech AI-based startup in Vancouver. He is currently a Senior Research Engineer in Borealis AI and leads the NOMI Forecasts team.

Abstract:

AI systems are biased because their models and data are human creations and reflect the inherent racial, economic, and gender inequalities in society. Bias can enter the collection, generation, and labeling of data as well as the design and evaluation of algorithms. This often unintentional phenomenon adversely impacts predictions and poses a danger to the business and its clients. The concept of bias in AI is, to some extent, tied to concepts of ethics and fairness, as it can result in discriminatory outcomes for certain minorities depending on the application.
In this talk, the main roots of bias in AI systems and their impact on businesses and society are discussed. Some of the essential technical solutions for model explainability and model debiasing, including data-centric and algorithmic approaches, as well as model monitoring solutions, are explored. Finally, the executive actions organizations can take to raise awareness and mitigate bias risk are laid out.

What You'll Learn:

Importance of fairness in ML systems and how to monitor it.

Talk: Monitoring AI Fairness

Razi Bayati

Vancouver Chapter Chair, MLOps World: Machine Learning in Production

Razi Bayati is a UBC alumna and currently works at Rogers Communications as a machine learning engineer. Her area of research is the intersection of machine learning and 5G.
Before joining Rogers, she worked at enterprise-level companies (Huawei and Nokia), successful startups (Theory + Practice and Zennea Technologies), and the telecommunications lab at the University of British Columbia.
She has always tried to bring the community together, previously as an elected Vice President of Academic and University Affairs at the Graduate Student Society of UBC, advocating for all graduate students, and currently as co-chair of the Women in Technology group with RISE for Women.
As chair, Razi is determined to bring together the Vancouver machine learning community with her vision of promoting data-centric AI. Her goal is to connect BC-based companies and improve their visibility, showcasing the potential in BC for machine learning in production.

Zaid Haddad

Data Scientist Leader, Slalom

Zaid is a Data Scientist Leader at Slalom. He brings experience delivering high-quality data science solutions in personalized healthcare and big data while supporting commercialization in regulated environments. He has supported several organizations in the development and implementation of data science product roadmaps from conception to deployment. Zaid holds a Bachelor of Science in Computing Science and Molecular Biology & Biochemistry from Simon Fraser University.

Abstract:

Meet the MLOps challenge with Slalom to unleash the power of intelligent applications.

What You'll Learn:

Build ML models at scale and MLOps framework

Talk: Machine Learning Operations

Mohammad Ghodratigohar

Data Scientist and AI Cloud Solution Architect, Microsoft

Mohammad Ghodratigohar (MG) is a Microsoft data scientist and cloud solution architect with years of hands-on knowledge and expertise in leveraging advanced AI techniques to solve business problems. He joins Microsoft from Transport Canada, where, in collaboration with Microsoft's account team, he played a key role in developing Transport Canada's big data and MLOps platform on Microsoft Azure. He has also designed, developed, and led various machine learning projects, mainly in healthcare, with private-sector companies and hospitals in the computer vision, signal processing, and geospatial analysis fields. Mohammad has an MSc degree in biomedical engineering with a specialization and journal publications in machine learning and healthcare.

Abstract:

Throughout the development and use of machine learning models, the end-to-end MLOps process should address the specific risks of AI, with a focus on risk-adjusted value delivery and efficiency in AI scaling, in alignment with regulations.
Responsible machine learning is grounded in values and principles that, at their core, encompass understanding the explainability and fairness of a model. This is critical for data scientists, auditors, and business decision-makers alike to ensure compliance with company policies, industry standards, and government regulations in their MLOps process.
This session will cover how, where, and when responsible AI principles can be applied across the different MLOps steps, showcasing a hands-on demo that uses open-source techniques for assessing the explainability and fairness of a trained model.
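For a concrete flavor of the kind of open-source fairness assessment the demo covers, below is a minimal sketch using the Fairlearn toolkit (one of the libraries commonly surfaced through Azure Machine Learning's Responsible AI tooling). The data and the sensitive feature are synthetic placeholders, not the session's actual demo.

```python
# A minimal, illustrative fairness assessment with the open-source Fairlearn
# toolkit; synthetic data and a hypothetical sensitive feature stand in for
# the session's real demo.
import numpy as np
from fairlearn.metrics import MetricFrame, demographic_parity_difference
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1_000)
y_pred = rng.integers(0, 2, size=1_000)
gender = rng.choice(["female", "male"], size=1_000)

# Accuracy broken down by sensitive group.
frame = MetricFrame(metrics=accuracy_score, y_true=y_true, y_pred=y_pred,
                    sensitive_features=gender)
print(frame.by_group)

# A single disparity number: the difference in selection rates between groups.
print(demographic_parity_difference(y_true, y_pred, sensitive_features=gender))
```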

What You'll Learn:

What Responsible AI looks like in action
How each component of Responsible AI can be applied and considered in MLOps
How some Responsible AI toolkits fit into the MLOps process
A real example of applying Responsible AI toolkits in Azure Machine Learning

Talk: Responsible MLOps

Greg Loughnane

Lead Instructor, MLOps, FourthBrain / Product Manager, Career Coach, FactoryFix

Greg is the lead instructor of the MLOps program at FourthBrain, where he teaches a cohort-based course that aims to give ML practitioners the skills necessary to deploy, scale, and monitor ML models in production environments.
He also serves as product manager for Career Coach at FactoryFix, which aims to help skilled manufacturing professionals level up their careers through personalized and proactive guidance.
A former founder, data science consultant, professor, and researcher, Greg is currently interested in maximizing value through community building and AI-first product thinking.

Abstract:

The most valuable tech companies in the world have successfully leveraged network effects and data to create unparalleled business value and competitive advantage. Yet today, it still remains difficult to replicate their success. Why? While being able to operationalize ML in production environments is incredibly important, other barriers to value creation remain. Integrating ML into existing digital product paradigms requires a holistic approach to AI-first product management, both from technical and business perspectives. In this talk, a framework for AI-first product management is presented that considers product, infrastructure, process, and people problems in turn. These problems correspond to answerable questions about the why, what, how, and who of MLOps. Best practices for digital product portfolios and pipelines are discussed through the lens of maximizing the business value of MLOps for any organization in 2022.

What You'll Learn:

Actionable recommendations about MLOps, especially about what not to do. Maybe even some guidance or additional confidence about where to allocate limited resources to maximize value from MLOps initiatives.

Talk: Business-Value Best Practices for MLOps: Building Data Science into Digital Products

Stella Wu

Head of Machine Learning, Shakudo

Stella is a machine learning researcher experienced in developing AI models for real-life applications. Stella has built machine learning models in natural language processing, time-series prediction, self-supervised learning, recommendation systems, and image processing at BMO, Borealis AI, and several startups. Stella has a PhD in geophysical modeling from the University of Münster in Germany.

Abstract:

AutoML is a capability that can scale up AI adoption across industries. MLOps tools can enable AutoML end to end, making it easy to get started and to use in production.

What You'll Learn:

AutoML and MLOps are the next frontier in large-scale adoption of AI

Talk: AutoML and MLOps

Yaser Khalighi

Founder and CEO, SceneBox

Yaser Khalighi (Stanford Ph.D.) is a leader on the frontier of machine learning data operations for computer vision and autonomous systems. Having previous experience building production-grade AI data platforms for autonomous vehicles and other applications, Yaser has spent the last two years at the helm of Caliber Data Labs. Caliber is the creator of SceneBox, the premier data operations platform for all things computer vision. Prior to his work building AI data platforms, Dr. Khalighi was a Senior Manager of Data Analytics at Ericsson.

Abstract:

It's all in the data.

As machine learning (ML) development continues at a frenetic pace across the mobility sector, many teams are neglecting the required parallel development of intelligent data operations solutions. In order to wrangle the vast amounts of unstructured perception data they collect, teams need to have access to a purpose-built, production grade data operations platform to expedite their ML development. In this talk, our speaker argues that optimized data operations in and of itself is the key to unlocking L4 and L5 autonomous systems. Even teams with seemingly sufficient tools may find value in exploring this ever-changing landscape.

What You'll Learn:

The importance of data operations when developing autonomous systems.

Talk: Software 2.0 and Data-Centric AI

Sonya David

Strategic Analytics Solutions Manager, JPMorgan Chase & Co.

Sonya David is a Strategic Analytics Solutions Manager at JPMorgan Chase & Co. She has experience in data science and analytics. Sonya holds an Honours Bachelor’s degree in Neuroscience from the University of Toronto and a Master of Computer Science from Wilfrid Laurier University.
Sonya is deeply involved in the community, having volunteered for institutions such as Sick Kids and Sunnybrook, among others, holding positions ranging from Research Assistant to Committee Executive.
As Chair, Sonya will bring her committee experience, data science chops, and wide-ranging partnerships to the role.

Jacopo Tagliabue

Director of AI, Coveo

Educated in several acronyms across the globe (UNISR, SFI, MIT), Jacopo Tagliabue was co-founder of Tooso, an A.I. company acquired by Coveo in 2019. Jacopo is currently the Director of A.I. at Coveo, shipping models to hundreds of customers and millions of users. When not busy building products, he teaches MLSys at NYU and explores topics at the intersection of language, reasoning and learning (with research work presented at NAACL, RecSys, ACL, SIGIR). In previous lives, he managed to get a Ph.D., do sciency things for a pro basketball team, and simulate a pre-Columbian civilization.

Abstract:

As with most Machine Learning systems, recommender systems are typically evaluated through performance metrics computed over held-out data points. However, real-world behavior is undoubtedly nuanced: ad hoc error analysis and case-specific tests must be employed to ensure the desired quality in actual deployments. We introduce RecList, a behavioral testing methodology and open source package for RecSys, designed to scale up testing through sensible defaults, extensible abstractions and wrappers for popular datasets.
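To make "behavioral testing" concrete, here is a toy check written in plain Python (deliberately not the RecList API): instead of a single aggregate metric, we assert a property the recommendations should satisfy, such as a cart recommender suggesting complements rather than substitutes. All names and data are illustrative.

```python
# A toy behavioral test in the spirit of RecList, written in plain Python
# rather than the RecList API; catalog, recommender and threshold are
# illustrative.
def complement_not_substitute(recommend, cart_item, catalog, max_share=0.2):
    """An 'add to cart' recommender should suggest complements; items in the
    same category as the cart item are substitutes, so flag if too many appear."""
    recs = recommend(cart_item)
    same_category = [r for r in recs
                     if catalog[r]["category"] == catalog[cart_item]["category"]]
    return len(same_category) / len(recs) <= max_share

# Hypothetical usage with a stub catalog and recommender.
catalog = {
    "hdmi-cable": {"category": "cables"},
    "hdmi-cable-2m": {"category": "cables"},
    "tv-55in": {"category": "tvs"},
    "soundbar": {"category": "audio"},
}
recommend = lambda item: ["tv-55in", "soundbar", "hdmi-cable-2m"]
print(complement_not_substitute(recommend, "hdmi-cable", catalog))  # False: a substitute slipped in
```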

What You'll Learn:

Recommender systems are everywhere, but it's hard to make sure they do what they are supposed to when actually deployed. We motivate our use case leveraging real-world experience across hundreds of organizations, and then discuss the shortcomings of standard research metrics. Finally, we introduce RecList, an open source tool that helps scale up behavioral testing for RecSys.

Talk: Beyond NDCG: Behavioral Testing of Recommender Systems with RecList

David de la Iglesia Castro

Software Engineer, iterative.ai

David is a software engineer at iterative.ai, where he mainly works on open source tools like DVC and DVCLive. Before that, he was a Senior Computer Vision Researcher at gradiant.org.

Abstract:

Machine learning operations (MLOps) has gained attention among practitioners aiming to automate the development of Machine Learning models, attempting to mimic the impact of DevOps in software.
However, MLOps platforms are usually built in isolation from the software development process, on the argument that the well-proven tools used for DevOps can't be applied to Machine Learning projects.
In this workshop, we will use HuggingFace to train a model that predicts labels for GitHub issues.
By extending the power of Git and GitHub with DVC and CML, our workflow will be able to handle the entire lifecycle of a Machine Learning model using the same tools and platforms that have been proven to work for software development.
The workshop requires only a web browser to follow from start to finish.
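As a rough sketch of the kind of training script the workshop wraps with DVC and CML, the snippet below fine-tunes a Hugging Face model to classify GitHub issues. The CSV paths, column names and label count are placeholders, not the workshop's actual materials.

```python
# A rough sketch of a Hugging Face training script that DVC could version and
# CML could run in CI; paths, columns and label count are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL = "distilbert-base-uncased"

# Assumes CSVs with a "title" text column and an integer "label" column.
data = load_dataset("csv", data_files={"train": "data/issues_train.csv",
                                       "test": "data/issues_test.csv"})
tokenizer = AutoTokenizer.from_pretrained(MODEL)
data = data.map(lambda batch: tokenizer(batch["title"], truncation=True), batched=True)

model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=3)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="models", num_train_epochs=1),
    train_dataset=data["train"],
    eval_dataset=data["test"],
    tokenizer=tokenizer,
)
trainer.train()
trainer.save_model("models/issue-labeler")  # track models/ as a DVC-managed output
```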

What You'll Learn:

In this workshop, we will learn what it means and how to build an "MLOps workflow" by extending the power of Git and GitHub with open-source tools. At the end of the workshop, we will have a workflow that covers the entire lifecycle of a Machine Learning model.

Workshop: Making MLOps Uncool Again

Jason Katz

ML Engineer, LinkedIn

Jason Katz, a leading ML engineer on LinkedIn's job search ML team and a top-ranking freelancer on Upwork, teams up with his graduate school friend David Kebudi, the Co-Head of AI at OpenRisk Technologies with pending patents in Imitation Learning and Natural Language Processing. Jason and David got their Master's in Data Science from Brown University in 2020, after working on DARPA-funded projects and cancer research, respectively.

Abstract:

Have you ever wondered how machine learning differs across various domains? Training models on petabytes of data that are then deployed to millions of users is very different from trying to create a predictive model from only a few data points that needs to be deployed into a client's pipeline. Whether it's maintaining pipelines that generate in-house data, scouring the web for supplemental data, or working with clean datasets to test theories, the process couldn't be more varied! Join us to explore the differences in the ML experience at all levels!

What You'll Learn:

You will have an opportunity to hear about 3 main walks of ML life:
(1) Start-up ML, which often has limited data, and needs to prove itself to be useful within the strict limitations to secure value and funding;
(2) Big-Tech ML, with a vast ocean of data and resources, these projects often have effects on millions of people - which requires extensive testing before deployment and iterative problem solving strategies for improvement;
(3) Academia, which requires innovative thinking and experimenting while keeping the budget under control.
The talk aims to give exposure to the differences among ML teams across the industry, hopefully helping people make easier decisions about their career.

Talk: ML Everywhere - Start Up, Big Tech, Academia

Co-Presenter: David Kebudi

David Kebudi

ML Engineer, OpenRisk Technologies

David Kebudi, the Co-Head of AI at OpenRisk Technologies with pending patents in Imitation Learning and Natural Language Processing, teams up with his graduate school friend Jason Katz, a leading ML engineer on LinkedIn's job search ML team and a top-ranking freelancer on Upwork. David and Jason got their Master's in Data Science from Brown University in 2020, after working on DARPA-funded projects and cancer research, respectively.

Abstract:

Have you ever wondered how machine learning differs across various domains? Training models on petabytes of data that are then deployed to millions of users is very different from trying to create a predictive model from only a few data points that needs to be deployed into a client's pipeline. Whether it's maintaining pipelines that generate in-house data, scouring the web for supplemental data, or working with clean datasets to test theories, the process couldn't be more varied! Join us to explore the differences in the ML experience at all levels!

What You'll Learn:

You will have an opportunity to hear about 3 main walks of ML life:
(1) Start-up ML, which often has limited data, and needs to prove itself to be useful within the strict limitations to secure value and funding;
(2) Big-Tech ML, with a vast ocean of data and resources, these projects often have effects on millions of people - which requires extensive testing before deployment and iterative problem solving strategies for improvement;
(3) Academia, which requires innovative thinking and experimenting while keeping the budget under control.
The talk aims to give exposure to the differences among ML teams across the industry, hopefully helping people make easier decisions about their career.

Talk: ML Everywhere - Start Up, Big Tech, Academia

Co-Presenter: Jason Katz

Kenneth Chen

Former Machine Learning Engineer, ArthurAI

Kenny is a former MLE at ArthurAI, where he created enterprise software for monitoring production models for performance, data drift, bias mitigation, and explainability, researched the content of this presentation, and productionized it in Arthur's B2B SaaS platform.
He holds an undergraduate degree from Harvard University in Statistics and Computer Science, with a focus in Bayesian deep learning.

Abstract:

Automating data drift thresholding is a must for ML monitoring platforms to detect model degradation caused by shifting model inputs. There are many different data drift measures and many different distributional forms for reference data and observed data that raise various robustness concerns. To address these concerns, we present two solutions for f-divergence data drift thresholding: an empirical bootstrapping-based approach and flexible closed-form probabilistic approximations based on adjusted Bayesian conjugacy. Robustness and computational trade-offs will be addressed.
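To illustrate the bootstrapping flavor of the approach (not Arthur's exact implementation), here is a small numpy sketch that derives a drift-alert threshold for a categorical feature by measuring how much divergence sampling noise alone produces on the reference data.

```python
# An illustrative bootstrap for drift thresholding on a categorical feature:
# resample reference-sized windows and take a high quantile of the resulting
# divergences as the alert threshold. Not a production implementation.
import numpy as np

def kl_divergence(p_counts, q_counts, eps=1e-9):
    """KL(p || q) between two categorical distributions given as counts."""
    p = np.asarray(p_counts, dtype=float) + eps
    q = np.asarray(q_counts, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def bootstrap_drift_threshold(reference, window_size, n_boot=2000, quantile=0.99, seed=0):
    rng = np.random.default_rng(seed)
    categories, counts = np.unique(reference, return_counts=True)
    ref_dist = counts / counts.sum()
    divs = []
    for _ in range(n_boot):
        sample = rng.choice(categories, size=window_size, p=ref_dist)
        sample_counts = np.array([(sample == c).sum() for c in categories])
        divs.append(kl_divergence(sample_counts, counts))
    return float(np.quantile(divs, quantile))

# Flag drift when an observed production window's divergence from the
# reference exceeds the bootstrapped threshold.
reference = np.random.default_rng(1).choice(["a", "b", "c"], size=10_000, p=[0.6, 0.3, 0.1])
print(bootstrap_drift_threshold(reference, window_size=500))
```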

What You'll Learn:

(1) Specific algorithmic approaches for automating data drift thresholds based on empirical approximations of data distributions, via bootstrapping and closed-form probability derivations/series approximations.
(2) Pitfalls to watch for (skewed data, low data sizes/high number of categories leading to sampling instability and degraded data drift thresholds) and how the proposed approaches are robust in these scenarios.
(3) Performance and computational trade-offs between the proposed approaches and summary of which to use depending on backend infrastructure.

Talk: Automating Data Drift Thresholding and Addressing Common Computational and Theoretical Pitfalls in Production ML Monitoring

Mohamed Sabri

Senior Consultant in MLOps, Rocket Science

Mohamed Sabri is a results-driven machine learning engineer and MLOps specialist with 8 years of experience in the machine learning and data field. This trilingual consultant has a wide range of project experience across industries. Mohamed has also been a director of data science and AI at a startup in Montréal; with his hands-on and management experience, he is able to support any organization in its AI project implementation. Mohamed is also the author of the book “Data Scientist Pocket Guide” and an instructor and mentor at the University of Texas at Austin.

Abstract:

Description: This workshop is a hands-on session where we will discover Kubeflow pipelines. We will learn how to create an environment with Kubeflow on Kubernetes and then get familiar with the environment. After that, we will create our pipelines and upload them to Kubeflow. We will create a couple of pipelines and run schedules for retraining.
Session length: 4 to 5 hours.
Requirements: A personal AWS account with a credit card on file
Background: Knowledge of Docker/Kubernetes/virtualization
Experience with Python
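For a taste of what we will build, the sketch below defines and compiles a minimal pipeline with the kfp SDK (v2-style components). The component bodies and names are placeholders for the workshop's real steps.

```python
# A minimal, illustrative Kubeflow pipeline defined with the kfp SDK;
# component bodies are stand-ins for real preprocessing and training steps.
from kfp import compiler, dsl

@dsl.component
def preprocess(rows: int) -> int:
    # Stand-in for real data preparation.
    return rows * 2

@dsl.component
def train(rows: int) -> str:
    # Stand-in for real model training.
    return f"model trained on {rows} rows"

@dsl.pipeline(name="retraining-pipeline")
def retraining_pipeline(rows: int = 1000):
    prep_task = preprocess(rows=rows)
    train(rows=prep_task.output)

if __name__ == "__main__":
    # The compiled YAML can be uploaded to Kubeflow and run on a schedule.
    compiler.Compiler().compile(pipeline_func=retraining_pipeline,
                                package_path="retraining_pipeline.yaml")
```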

What You'll Learn:

Become more comfortable with tools like Kubeflow

Hands on Workshop: How to Build Pipelines with Kubeflow

Alessya Visnjic

CEO, WhyLabs

Alessya Visnjic is the CEO of WhyLabs, the AI Observability company building the interface between AI & human operators. Prior to WhyLabs, Alessya was a CTO-in-residence at the Allen Institute for AI, where she evaluated commercial potential for the latest AI research. Earlier, Alessya spent 9 years at Amazon leading ML adoption & tooling efforts. Alessya is also the founder of Rsqrd AI, a global community of 1,000+ AI practitioners who are making AI technology Robust & Responsible.

Abstract:

The day the ML application is deployed to production and begins facing the real world is the best and the worst day in the life of the model builder. Debugging, troubleshooting & monitoring take over the majority of their day, leaving little time for model building. In DevOps, software operations have been taken to the level of an art. Sophisticated tools enable engineers to quickly identify and resolve issues, continuously improving software robustness. In the ML world, operations are still largely a manual process that involves Jupyter notebooks and shell scripts. One of the cornerstones of the DevOps toolchain is logging; it's the foundation of testing and monitoring tools. What does logging look like in an ML system?
In this talk we will show you how to enable data logging for an AI application. We will discuss how something so simple enables testing, monitoring and debugging of the entire AI pipeline. We will dive deeper into key properties of a logging library that can handle TBs of data, run with a constrained memory footprint and produce statistically accurate log profiles of structured and unstructured data. Attendees will leave the talk equipped with best practices to supercharge MLOps in their team.
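For a minimal illustration of what such data logging can look like, here is a sketch using the open-source whylogs library; it assumes the whylogs v1 API (why.log / ResultSet.view) and a toy DataFrame.

```python
# A minimal data-logging sketch with the open-source whylogs library; assumes
# the whylogs v1 API and toy data.
import pandas as pd
import whylogs as why

batch = pd.DataFrame({
    "age": [34, 52, 23, 41],
    "income": [48_000, 81_000, 30_000, 62_000],
})

results = why.log(batch)          # build a statistical profile of this batch
profile_view = results.view()
print(profile_view.to_pandas())   # per-column summary statistics

# Profiles are tiny compared with the raw data, so they can be produced for
# every batch and compared over time for testing, monitoring and debugging.
```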

What You'll Learn:

The audience will leave the talk equipped with a set of best practices around logging, monitoring, and observability which they can implement in their organization. Best practices will focus on process, culture, and open source tools - so will be actionable for an organization of any size and stage of ML adoption.

Talk: The Critical Missing Component in the Production ML Stack

Yuval Fernbach

CTO & Co-Founder, Qwak

Yuval Fernbach is the Co-founder & CTO of Qwak, where he is focused on building next-generation ML infrastructure for ML teams of various sizes. Before Qwak, Yuval was an ML Specialist at AWS, where he helped AWS customers across EMEA with their ML challenges. Prior to that, he was the CTO of the IT department of the IDF ("Mamram").

Abstract:

As more and more companies build ML-based production systems, practitioners are finding out that maintaining and managing models in production is difficult and expensive.
Building a production-grade model involves many “moving parts”: data and code changes, package dependencies, research/production infrastructure, the production interface, etc.

What You'll Learn:

In this session, we’ll cover some of the best practices for building models that can scale, fit production needs, and are ready for the real world.

Talk: Why Build System is Probably the Most Critical Thing When Shifting ML From Buzz to the Real World

Raluca Crisan

Co-Founder, Etiq AI

Raluca has 10+ years of experience in data science working with a variety of clients (UK retailers, banks & telco companies). Raluca's experience spans from managing teams to hands-on data product development. She is Co-founder of Etiq, a start-up in the ML observability space, with a focus on the algorithmic bias problem. Prior to Etiq she was Director - Data Science for Merkle Aquila.

Abstract:

There is a good deal of research creating metrics to help industry practitioners tackle the algorithmic bias problem. However, in practice, if you manage to reduce one bias metric you're likely to increase another. This talk will focus on setting forth strategies and examples of how practitioners can still use these bias metrics to obtain meaningful results. Technical implementation details will be discussed.
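As a concrete illustration of why metrics can conflict, the numpy sketch below computes two common indicators, demographic parity and equal opportunity, on synthetic data where group base rates differ; a classifier can satisfy one while violating the other.

```python
# An illustrative comparison of two bias metrics on synthetic data: equal
# opportunity is (roughly) satisfied while demographic parity is not, because
# the groups have different base rates.
import numpy as np

def demographic_parity_gap(y_pred, group):
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def equal_opportunity_gap(y_true, y_pred, group):
    def tpr(g):
        return y_pred[(group == g) & (y_true == 1)].mean()
    return abs(tpr(0) - tpr(1))

rng = np.random.default_rng(0)
group = rng.integers(0, 2, size=5_000)
y_true = rng.binomial(1, np.where(group == 0, 0.3, 0.6))   # differing base rates
y_pred = rng.binomial(1, np.where(y_true == 1, 0.8, 0.2))  # same error rates per group

print("demographic parity gap:", demographic_parity_gap(y_pred, group))        # large
print("equal opportunity gap:", equal_opportunity_gap(y_true, y_pred, group))  # near zero
```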

What You'll Learn:

- What are the key bias indicators/metrics? And how can they each help you? What are some strategies to deal with conflicting results...
- Technical implementation will be discussed for some metrics, including pointers on what are some things you want to be mindful of when you choose one calculation over another.
- Intersectionality is not covered as much in the academic literature as other bias topics, but it is a key topic in an applied setting. What are some strategies there?
Examples will help you understand how this can relate to your day-to-day work as a data scientist or ML practitioner.

Talk: How Do You Identify Bias in a World of Conflicting Metrics?

Gift Ojeabulu

Data Scientist, CBB Analytics

Gift is a Data Scientist, Machine Learning Engineer, open source advocate, occasional speaker and writer, founder of The African Data Community Newsletter, and a community advocate passionate about democratizing data knowledge and constantly providing guidance to enthusiasts on how to navigate and transition into the data science space, with experience in community management, social media content creation, and developer education.

Abstract:

Deepchecks is a minimally intrusive MLOps solution for continuous validation of machine learning systems, meant to enable you to trust your models through the continuous changes in your data lifecycle. Deepchecks includes the must-have features for any ML monitoring system, performance monitoring, data drift detection and anomaly detection alerts, along with some unique features that are especially helpful for complex ML pipelines: monitoring various phases of the pipeline, detecting hidden data integrity issues, detecting low-confidence segments, detecting inconsistencies that are hidden within unstructured text, etc. ML outputs are difficult to validate because manual testing only works for small datasets. Deepchecks is built for machine learning engineers and data science leaders to continuously validate machine learning systems, which is a sure way to scale.
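For orientation, here is a minimal sketch of running the Deepchecks tabular full suite against a toy scikit-learn model; the dataset and model are placeholders, and exact arguments may vary by Deepchecks version.

```python
# A minimal sketch of validating data and a model with Deepchecks' tabular
# full suite; toy dataset and model, arguments may differ across versions.
from deepchecks.tabular import Dataset
from deepchecks.tabular.suites import full_suite
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

train_ds = Dataset(X_train.assign(target=y_train), label="target")
test_ds = Dataset(X_test.assign(target=y_test), label="target")

# Runs the built-in battery of data integrity, drift and performance checks.
result = full_suite().run(train_dataset=train_ds, test_dataset=test_ds, model=model)
result.save_as_html("validation_report.html")
```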

What You'll Learn:

- What is Deepchecks
- When to use Deepchecks
- Data Centric and Model Centric with Deepchecks

Talk: Validating Your Machine Learning Models and Data With Minimal Effort Using DeepChecks

Itay Ben Haim

ML Engineer, Superwise

Itay is an ML engineer with a background in software and data. 33, married, father of 1 baby girl, 1 dog, and 1 cat, who all get along with each other. In his free time (not that dads have much of that), he's into Crossfit and surfing.

Abstract:

In this workshop, we'll take a dive into MLOps CI/CD + CT pipeline automation. In Part 1, we’ll focus on how to put together a continuous ML pipeline to train, deploy, monitor, and retrain your models. Part 2 will focus on automations and production-first insights to detect and resolve issues faster.
This hands-on build will be done with the Google Vertex platform, Superwise model observability, and retraining notebooks. Ahead of the event you’ll get access to all the platforms, environments, and resources we’ll use so you can follow along during the session.

What You'll Learn:

- How to build a continuous MLOps stack - Platform and tool alternatives for each step
- Considerations for scaling up
- Production-first insights and automations

Workshop: A Guide to Putting Together a Continuous ML Stack

Co-Presenter: Or Itzary

Or Itzary

Chief Architect, Superwise

 

Abstract:

In this workshop, we'll take a dive into MLOps CI/CD + CT pipeline automation. In Part 1, we’ll focus on how to put together a continuous ML pipeline to train, deploy, monitor, and retrain your models. Part 2 will focus on automations and production-first insights to detect and resolve issues faster.
This hands-on build will be done with the Google Vertex platform, Superwise model observability, and retraining notebooks. Ahead of the event you’ll get access to all the platforms, environments, and resources we’ll use so you can follow along during the session.

What You'll Learn:

- How to build a continuous MLOps stack - Platform and tool alternatives for each step
- Considerations for scaling up
- Production-first insights and automations

Workshop: A Guide to Putting Together a Continuous ML Stack

Co-Presenter: Itay Ben Haim

Savin Goyal

Co-Founder and CTO, Outerbounds

Savin is the co-founder and CTO of Outerbounds, where he focuses on building generalizable infrastructure to accelerate the adoption and impact of ML in enterprises. Previously, he was at Netflix, where he built Metaflow, a popular OSS ML platform. In his previous life, he obtained a bachelor's degree in CS from IIT Delhi.

Abstract:

There is a pressing need for tools and workflows that meet data scientists where they are. This is also a serious business need: How to enable an organization of data scientists, who are not software engineers by training, to build and deploy end-to-end machine learning workflows and applications independently. In this talk, we discuss the problem space and the approach we took to solving it with Metaflow, the open-source framework we developed at Netflix, which now powers hundreds of business-critical ML projects at Netflix and other companies from bioinformatics and drones to real estate. We wanted to provide the best possible user experience for data scientists, allowing them to focus on parts they like (modeling using their favorite off-the-shelf libraries) while providing robust built-in solutions for the foundational infrastructure: data, compute, orchestration, and versioning.
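For readers unfamiliar with Metaflow, here is a minimal flow showing the shape of the API; the steps are placeholders rather than a real training workload.

```python
# A minimal Metaflow flow: artifacts assigned to self are versioned and passed
# between steps; the "training" here is a placeholder. Run locally with:
#   python training_flow.py run
from metaflow import FlowSpec, step

class TrainingFlow(FlowSpec):

    @step
    def start(self):
        self.data = list(range(100))   # stand-in for loading real data
        self.next(self.train)

    @step
    def train(self):
        self.model = sum(self.data) / len(self.data)   # stand-in for real training
        self.next(self.end)

    @step
    def end(self):
        print(f"trained model: {self.model}")

if __name__ == "__main__":
    TrainingFlow()
```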

What You'll Learn:

In this talk, you will learn about
- What to expect from a modern ML infrastructure stack.
- Using tools such as Metaflow to boost the productivity of your data science organization, based on lessons learned from Netflix and many other companies.
- Deployment strategies for a full stack of ML infrastructure that plays nicely with your existing systems and policies.

Talk: Human-Friendly, Production-Ready Data Science

Kevin Stumpf

Co-Founder and CTO, Tecton

Kevin Stumpf is the co-founder and CTO of Tecton, where he leads the engineering team that is building a next-generation feature store for operational machine learning. Kevin and his co-founders built deep expertise in operational ML platforms while at Uber, where they created the Michelangelo platform that enabled Uber to scale from zero to thousands of ML-driven applications in just a few years.
Kevin holds an MBA from Stanford University and a Bachelor’s Degree in Computer and Management Sciences from Germany’s University of Hagen. Outside of work, Kevin is a passionate long-distance endurance athlete.

Abstract:

Deploying ML in production is hard, and data is often the hardest part. Unlike analytics pipelines, production ML pipelines need to process both historical data for training, and fresh data for online serving, often using streaming or real-time data sources. They must ensure training/serving parity, provide point-in-time correctness, and serve data with production service levels. These challenges are difficult to tackle with traditional ETL tools, and can often add weeks or months to project timelines.
Kevin learned this lesson first-hand while trying to scale ML initiatives at Uber. To solve these data challenges, his team built the industry’s first feature store as part of the Uber Michelangelo platform. It was instrumental in scaling Uber’s ML initiatives to thousands of models in production. It’s now used to power every aspect of Uber’s business: ride ETAs, demand forecasting, pricing, and restaurant recommendations.
Since those early days, feature stores have emerged as the tool of choice to solve the data challenges of operational ML. At their core, feature stores provide a simple solution to store, serve and share features.
However, solving the storage and serving problem is not enough. Teams still need to tackle the data transformation problem - which typically means creating bespoke data pipelines to process raw data into features in real-time. To solve the end-to-end data problem for ML, organizations need more than a feature store. They need a complete feature platform, which includes automated ML data pipelines to transform data from batch and real-time sources. These platforms enable data teams to easily build batch, streaming, and real-time features, and deploy them to production quickly and reliably.
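One of the ideas above, point-in-time correctness, can be illustrated with a small pandas sketch (the general idea, not Tecton's API): each training label may only see feature values that were already available at the label's timestamp.

```python
# An illustrative point-in-time correct join with pandas: for each label row,
# pick the latest feature value at or before the label's timestamp, so no
# information from the future leaks into training. Data is made up.
import pandas as pd

labels = pd.DataFrame({
    "user_id": [1, 1, 2],
    "event_time": pd.to_datetime(["2022-01-05", "2022-01-20", "2022-01-10"]),
    "churned": [0, 1, 0],
})
features = pd.DataFrame({
    "user_id": [1, 1, 2],
    "feature_time": pd.to_datetime(["2022-01-01", "2022-01-15", "2022-01-08"]),
    "trips_last_30d": [12, 3, 7],
})

training_set = pd.merge_asof(
    labels.sort_values("event_time"),
    features.sort_values("feature_time"),
    left_on="event_time", right_on="feature_time",
    by="user_id", direction="backward",
)
print(training_set)
```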

What You'll Learn:

In this session, attendees will learn about the data challenges faced by ML teams at Uber, and how they were solved with feature stores.
They will learn the common architecture of feature stores, and how they fit in the MLOps stack to solve the challenges of building production ML features. They’ll also learn the limitations of most feature stores, and why data teams really need a complete feature platform that can also automate ML data pipelines and process data from batch, streaming, and real-time sources.

Talk: Building Production-Ready ML Features with Feature Stores

Rafael Pierre

Solutions Architect Data Science Expert, Databricks

Rafael Pierre is a Solutions Architect at Databricks, the creators of open source platforms such as Apache Spark, MLflow and Delta.
He holds a Bachelor’s Degree in Computer Science and has more than a decade of experience in software development and architecture for Fortune 500 companies in mission-critical, data-intensive fields such as the stock exchange, high-frequency trading, and IoT & telematics. He also holds a Master’s Degree from the University of Amsterdam and extensive experience in delivering cloud, data & AI solutions at scale, designing and implementing end-to-end machine learning systems.

Abstract:

In this talk, you will learn how Databricks and MLflow provide a powerful set of features for implementing an end-to-end MLOps workflow.
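For a minimal flavor of that workflow, the sketch below tracks an experiment with the open-source MLflow API; the dataset and model are placeholders, and the Databricks-specific pieces (Model Registry stage transitions, webhook-driven CI/CD) are only noted in the comments.

```python
# A minimal MLflow tracking sketch; on Databricks, log_model can additionally
# register the model in the Model Registry (e.g. via registered_model_name),
# which governance workflows and webhooks can then act on.
import mlflow
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

mlflow.set_experiment("diabetes-regression")
with mlflow.start_run():
    params = {"n_estimators": 200, "max_depth": 6}
    model = RandomForestRegressor(**params, random_state=0).fit(X_train, y_train)

    mlflow.log_params(params)
    mlflow.log_metric("mae", mean_absolute_error(y_test, model.predict(X_test)))
    mlflow.sklearn.log_model(model, artifact_path="model")
```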

What You'll Learn:

Experiment Tracking, Model Governance, Automatic CICD Pipelines with MLflow Webhooks

Talk: End-to-End MLOps with MLflow and Databricks
Co-Presenter: Anastasiia Prokaieva

Anastasiia Prokaieva

Specialist Solution Engineer – Data Science and MLOps, Databricks

Having two master's degrees in science, I've been working on various big data areas, from analysing data from the Large Hadron Collider to satellite imagery of greenhouse emissions. Prior to Databricks, I worked as an environmental research scientist and as a data scientist at a French FinTech, constructing proxies for macro-economic and extra-financial indicators based on alternative data using AI.
At Databricks I help our customers accelerate their end-to-end data science projects by providing best-practice approaches to MLOps with MLflow, Spark and Delta, and by delivering workshops and private ILT trainings created by the Databricks team.

Abstract:

In this talk, you will learn how Databricks and MLflow provide a powerful set of features for implementing an end-to-end MLOps workflow.

What You'll Learn:

Experiment Tracking, Model Governance, Automatic CICD Pipelines with MLflow Webhooks

Talk: End-to-End MLOps with MLflow and Databricks
Co-Presenter: Rafael Pierre

Subhabrata Das

ML Engineer, JPMorgan Chase & Co.

Subhabrata Das has an ME from Cornell University and PhD from Columbia University, New York in the area of Computational Physics. He has worked at Avantor and Ally Bank prior to joining JP Morgan Chase AI/ML team. He had also consulted for Bank of America, Microsoft and Consumers Energy earlier. His research interests are in Machine Learning, Deep Learning and Natural Language Processing.

Abstract:

To a large extent, power outage forecasting is a daunting task because it depends on factors such as weather, animals, trees, birds, load, etc. Those factors introduce a lot of fluctuation into outage data. As a type of RNN, the LSTM performs well on time series data as well as on some nonlinear and complex problems like stock price forecasting, fluctuations in energy consumption, demand response, and traffic management. To explore a more accurate power outage forecasting approach, this work proposes a new hybrid model based on Principal Component Analysis (PCA), Poisson regression (PR), Seq2Seq and an Adam-optimized LSTM neural network, denoted PCA-PR-Seq2Seq-Adam-LSTM. After PCA, the nonlinear power outage sequence can be framed and the processed data will have a more stable variance, and the combination of Adam, an efficient stochastic gradient-based optimizer, and the LSTM can precisely capture the relevant behavior of power failures. This study presents four cases to verify the performance of the hybrid model, and a dataset from the Michigan area is adopted to illustrate its strength. The results show that the proposed model can significantly improve prediction accuracy.
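To ground the terminology, here is a simplified Keras sketch of the pipeline's main building blocks, PCA to stabilize the covariates followed by an Adam-trained LSTM with a Poisson loss for count data; it uses synthetic data and is not the paper's full PCA-PR-Seq2Seq-Adam-LSTM model.

```python
# A simplified sketch of the building blocks (PCA + Adam-trained LSTM with a
# Poisson loss); synthetic data, not the paper's full hybrid model.
import numpy as np
from sklearn.decomposition import PCA
from tensorflow import keras

rng = np.random.default_rng(0)
raw = rng.standard_normal((500, 20))            # 500 time steps, 20 raw covariates
outages = rng.poisson(3, size=500).astype(float)

# Reduce correlated weather/load covariates to a few stable components.
components = PCA(n_components=5).fit_transform(raw)

# Build (lookback window -> next outage count) training examples.
lookback = 24
X = np.stack([components[i:i + lookback] for i in range(len(components) - lookback)])
y = outages[lookback:]

model = keras.Sequential([
    keras.layers.Input(shape=(lookback, 5)),
    keras.layers.LSTM(32),
    keras.layers.Dense(1, activation="softplus"),  # non-negative outage counts
])
model.compile(optimizer="adam", loss="poisson")
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
```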

What You'll Learn:

Applications of Machine learning and Deep learning in Power Utility Industry

Talk: A Hybrid Deep Learning Model for Power Outage Prediction

Sree Gowri Addepalli

Senior AI Engineer, Target

Sree Gowri Addepalli is currently a Senior AI Engineer with the Advanced Machine Learning group in the AI Computer Vision organization. This team is responsible for researching and developing cutting-edge computer vision products in the digital space for Target. Sree Gowri graduated from New York University’s Courant Institute of Mathematical Sciences with a Master’s in Computer Science. She has actively been part of academic research in the field of computer vision at the NYU Center for Data Science, and gained industrial research experience as an intern at Amgen and Etsy while working on search problems. Prior to her master’s, she gained software engineering experience with Diebold Nixdorf in Mumbai after graduating from Mumbai University, India with a Bachelor’s in Computer Engineering. She likes ideating and building products; she won the 2018 R&D hackathon at Diebold Nixdorf among internal teams worldwide, and served as chair of NYUHack 2019, organizing it at a global scale. She has previously received scholarships to attend conferences like ECCV, CVPR, and GHC to keep herself updated with the latest research. Her current research focus is computer vision and building machine learning systems in production. Apart from academics, she is a program manager with Indian Women in Computing, where she works with AnitaB.org and various companies to help push the reach of STEM, specifically Artificial Intelligence, to diverse sectors of society. She also founded a `Lean In Circle` in collaboration with NYU Women in Computing.

Abstract:

Machine learning models drive some of the most important business decisions at Target. Therefore, monitoring the model behavior in production is of utmost importance. Using behavioral insights from internal models, Target has built a monitoring capability as a component of model observability for models deployed in production for a Visual Discovery application. This ensures that these models remain relevant and meet the desired performance in production while giving scientists the ability to understand and improve the quality of models in production.

What You'll Learn:

PRESENTATION DELIVERY TALKING POINTS:
• THE PROBLEM:
o How do we ensure that a machine learning model performs as expected in production over time?
o How can you make ML models more observable?
o Why is it important to solve this problem?
• THE SOLUTION:
o Overview of the model monitoring framework architecture.
o Brief explanation of all components in the framework library.
o Ease of plugging the framework into any machine learning application and infrastructure.
o Accessibility and availability to various machine learning applications across Target.
o Root cause analysis using evaluation metrics to improve model observability.
• DEMO:
o Showcase the usage of the model monitoring framework plugged in as a library in a visual discovery application at Target.
o Components of the model monitoring framework interacting with Target's internal infrastructure to provide monitoring, visualization and alerting capabilities.
• OUTCOME FOR THE ENTERPRISE:
o Capability to proactively monitor a machine learning model's inputs and predictions, thereby ensuring robust machine learning systems at Target.
o The model monitoring framework was made available across Target as a Python library, and various machine learning teams were able to plug the library into their respective machine learning applications seamlessly with Target infrastructure.
o The solution was custom-built, rather than relying on third-party tools, to ensure seamless integration with the Target tech stack and to provide machine learning teams with the ability to build custom features and evaluation metrics to understand model performance in production.

Talk: Model Monitoring: Ensure Robust Machine Learning Systems in Production
Co-Presenter: Vishal Vijay Shanker

Dan Fu

PhD Student, Stanford University

Dan Fu is a PhD student in the Computer Science Department at Stanford University, where he is co-advised by Christopher Ré and Kayvon Fatahalian. His research focuses on understanding the principles behind why machine learning methods work and using that understanding to build the next generation of ML systems. He is supported by a Department of Defense NDSEG fellowship.

Abstract:

Transformers are slow and memory-hungry on long sequences, since the time and memory complexity of self-attention are quadratic in sequence length. Approximate attention methods have attempted to address this problem by trading off model quality to reduce the compute complexity, but often do not achieve wall-clock speedup. We argue that a missing principle is making attention algorithms IO-aware -- accounting for reads and writes between levels of GPU memory. We propose FlashAttention, an IO-aware exact attention algorithm that uses tiling to reduce the number of memory reads/writes between GPU high bandwidth memory (HBM) and GPU on-chip SRAM. We analyze the IO complexity of FlashAttention, showing that it requires fewer HBM accesses than standard attention, and is optimal for a range of SRAM sizes. We also extend FlashAttention to block-sparse attention, yielding an approximate attention algorithm that is faster than any existing approximate attention method. FlashAttention trains Transformers faster than existing baselines: 15% end-to-end wall-clock speedup on BERT-large (seq. length 512) compared to the MLPerf 1.1 training speed record, 3× speedup on GPT-2 (seq. length 1K), and 2.4× speedup on long-range arena (seq. length 1K-4K). FlashAttention and block-sparse FlashAttention enable longer context in Transformers, yielding higher quality models (0.7 better perplexity on GPT-2 and 6.4 points of lift on long-document classification) and entirely new capabilities: the first Transformers to achieve better-than-chance performance on the Path-X challenge (seq. length 16K, 61.4% accuracy) and Path-256 (seq. length 64K, 63.1% accuracy).
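The core trick can be sketched in a few lines of numpy: process blocks of keys and values with a running (online) softmax so the full N x N score matrix is never materialized. This only illustrates the math; the actual speedup comes from doing the same computation inside a fused GPU kernel that keeps each block in on-chip SRAM.

```python
# An illustrative numpy sketch of the online-softmax idea behind IO-aware
# attention: block over keys/values and never materialize the N x N matrix.
import numpy as np

def naive_attention(Q, K, V):
    S = Q @ K.T / np.sqrt(Q.shape[-1])            # full N x N score matrix
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    return (P / P.sum(axis=-1, keepdims=True)) @ V

def tiled_attention(Q, K, V, block=128):
    d = Q.shape[-1]
    out = np.zeros_like(Q)
    row_max = np.full(Q.shape[0], -np.inf)
    row_sum = np.zeros(Q.shape[0])
    for start in range(0, K.shape[0], block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        S = Q @ Kb.T / np.sqrt(d)                 # only an N x block tile
        new_max = np.maximum(row_max, S.max(axis=-1))
        P = np.exp(S - new_max[:, None])
        scale = np.exp(row_max - new_max)         # rescale previous partial results
        out = out * scale[:, None] + P @ Vb
        row_sum = row_sum * scale + P.sum(axis=-1)
        row_max = new_max
    return out / row_sum[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 64)) for _ in range(3))
assert np.allclose(naive_attention(Q, K, V), tiled_attention(Q, K, V), atol=1e-6)
```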

What You'll Learn:

We present FlashAttention, a new fast and memory-efficient algorithm for exact attention. Our key insight is that attention is bottlenecked by slow memory accesses. We'll go over how we reduce the compute and memory footprint -- resulting in 2-4X faster training with 10-20X less memory, and enabling Transformers to scale up to sequences of length 64K.

Talk: FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

Gideon Mendels

CEO & Founder, Comet

Gideon Mendels is CEO and Founder of Comet, an ML platform provider. He led a team that trained and deployed more than 50 NLP models in 15 languages as founder of GroupWize. He also worked on hate-speech and deception detection at Google, and he trained and put into production deep learning classifiers for 500 languages at Columbia University jointly with IBM Research.

Abstract:

While ML model development is a challenging process, the management of these models becomes even more complex once they're in production. Shifting data distributions, upstream pipeline failures, and model predictions that affect the very datasets they’re trained on can create thorny feedback loops between development and production. In this talk, Gideon Mendels, CEO and founder of Comet, will share industry examples of production ML systems.

What You'll Learn:

* Why naive ML workflows that don’t take the development-production feedback loop into account break down
* System design principles that will help manage these feedback loops more effectively
* Industry case studies where teams have applied these principles to their production ML systems

Talk: ML System Design for Continuous Experimentation

Zoya Bylinskii

Research Scientist, Adobe Inc.

Zoya Bylinskii, Ph.D., is a Research Scientist in the Creative Intelligence Lab at Adobe Research as of 2018. Previously, she was an affiliate researcher at the Massachusetts Institute of Technology and at the Harvard Institute of Applied Computational Science. Zoya received her Computer Science Ph.D. and M.Sc. from MIT in 2018 and 2015, respectively, and an Hon. B.Sc. in Computer Science and Statistics, with a focus on Artificial Intelligence, from the University of Toronto in 2012. Zoya is a 2018 EECS Rising Star, 2016 Adobe Research Fellow, 2014-2016 NSERC Postgraduate Scholar, 2013 Julie Payette Research Scholar, and 2011 Anita Borg Scholar. Zoya works at the interface of human perception & cognition, computer vision, and human-computer interaction on applications in design, photography, and readability.

Abstract:

Dr. Bylinskii has pioneered the use of cognitive and perceptual science methods to create A.I. applications for graphic design, photography, and digital text that can automatically adapt and respond to users’ needs. Operating at the intersection of cognitive science & computer science, her work on visual attention and memorability has been recognized by interdisciplinary communities for driving both our understanding of human perception and the development of perceptually-guided algorithms. In this talk, Dr. Bylinskii will give you a taste of what we know about human memory & attention and how these insights can be applied to the creation of ML datasets, design of algorithms, and evaluation of AI tools.

What You'll Learn:

- What people tend to attend to, remember, and find important in photos, designs, and data visualizations
- How human perception & cognition data can be collected (i.e., crowdsourced at scale)
- The extra edge that ML algorithms that incorporate perception & cognition insights can have

Talk: Perception & Cognition Insights for ML

Navdeep Gill

Engineering Lead, H2O.ai

Navdeep is an Engineering Lead at H2O.ai where he leads a team of researchers and engineers working on various facets of Responsible AI. He also leads science and product efforts around explainable AI, ethical AI, model governance, model debugging, interpretable machine learning, and security of machine learning. Navdeep previously focused on GPU accelerated machine learning, automated machine learning, and the core H2O-3 platform at H2O.ai.
Prior to joining H2O.ai, Navdeep worked as a Senior Data Scientist at Cisco and as a Data Scientist at FICO. Before that Navdeep was a research assistant in several neuroscience labs at the following institutions: California State University, East Bay, Smith Kettlewell Eye Research Institute, University of California, San Francisco, and University of California, Berkeley.
Navdeep graduated from California State University, East Bay with a M.S. in statistics with an emphasis on computational statistics, a B.S. in statistics, and a B.A. in psychology with a minor in mathematics.

Abstract:

Artificial intelligence and machine learning present significant opportunities to businesses. To reap the full benefits of ML, organizations need to trust the algorithms and incorporate them in their existing workflows. This presentation outlines a set of actionable best practices for people, process, and technology that not only enables incorporation of machine learning in business applications, but also ensures that it is done in a responsible manner for the business, and for the society at large. The goal is to increase adoption of machine learning in businesses by incorporating the appropriate governance, driving trust, efficiency, and business results.

What You'll Learn:

AI governance is needed for all organizations to fully harness the application of AI

Talk: Security Audits for Machine Learning Attacks
Co-Presenter: Abhishek Mathur

Yaron Singer

CEO & Co-Founder, Robust Intelligence

Yaron is the CEO of Robust Intelligence and Professor of Computer Science and Applied Math at Harvard. Yaron is known for breakthrough results in machine learning, algorithms, and optimization. Previously, Yaron worked at Google Research and obtained his PhD from UC Berkeley.

Abstract:

AI is eating the world, but corrupted data, drift, biased decisions, liabilities, and malicious actors regularly cause ML models to fail when making critical business decisions. This introduces risk to businesses and, importantly, to the people affected by these failing models. Companies try to limit this by constantly monitoring dashboards and firefighting errors. This is incredibly time-intensive, doesn’t get at the root of the problem, and can actually erode model accuracy. These reasons make it impossible to confidently apply ML at scale.
To solve this challenge and awaken the true force of AI, companies need to achieve ML Integrity. ML Integrity is the assurance of model quality, validity, and reliability. It is only realized when such considerations are applied to every stage of the ML lifecycle - from model development to production. Once companies have ML Integrity, intelligent applications can be developed with confidence and at greater velocity.

What You'll Learn:

Attendees will learn how to instill integrity into their ML models by implementing proactive measures throughout the lifecycle, from development to production. We’ll discuss what attendees should expect from a ML Integrity offering and review several use cases.

Talk: ML Integrity: The Quest to Awaken the True Force of AI

Alex Kim

ML Engineer, Iterative

Alex’s past work experience involved solving data science and machine learning problems in various domains: physics, aerospace, telemetry/log analytics, image, and video processing.
In the last couple of years, he became increasingly interested in the engineering side of ML projects: processes and tools needed to go from an idea to a production solution. Currently, he works as an MLOps Solutions Engineer at Iterative.ai, helping customers extract the most value from the Iterative ecosystem of tools.

Abstract:

In the last few years, training a well-performing Computer Vision (CV) model in Jupyter Notebooks became fairly straightforward if you use pretrained Deep Learning models and high-level libraries that abstract away much of the complexity (fastai, keras, pytorch-lightning are just a few examples).
The hard part is still incorporating this model into an application that runs in a production environment bringing value to the customers and our business.
A typical ML project lifecycle goes through 3 phases:
• Active exploration or proof-of-concept phase. Here we try many different approaches to data preprocessing, model architectures and hyper-parameter tuning with the goal to finally settle on the most promising combination
• Application development phase. Here we build all the “plumbing” around the model, that is: getting input data, massaging it into the right format, passing it to the model and, finally, serving the model’s output. The goal here is to end up with a version of an end-to-end application that is robust and well-performing enough to be considered for deployment to the production environment.
• Production deployment phase. Here we promote a well-tested version of the application from the development environment to the production environment.
In this talk, I’ll describe an approach that streamlines all three phases. As our demo project, I’ve selected a very common deployment pattern in CV projects: a CV model wrapped in a web API service. Automatic defect detection is an example problem I am addressing with this pattern.
I assume the target audience of this talk to be technical folks (e.g. Software Engineers, ML Engineers, Data Scientists) who are familiar with the general Machine Learning concepts, Python programming, CI/CD processes and Cloud infrastructure.
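As a sketch of the deployment pattern described above (a CV model wrapped in a web API), the snippet below serves a hypothetical exported PyTorch defect-detection model behind a FastAPI endpoint; the model path, preprocessing and output format are assumptions for illustration.

```python
# An illustrative FastAPI wrapper around a hypothetical exported defect-
# detection model; the model file, preprocessing and output are assumptions.
import io

import torch
from fastapi import FastAPI, File, UploadFile
from PIL import Image
from torchvision import transforms

app = FastAPI()
model = torch.jit.load("models/defect_detector.pt")   # hypothetical TorchScript export
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

@app.post("/predict")
async def predict(image: UploadFile = File(...)):
    img = Image.open(io.BytesIO(await image.read())).convert("RGB")
    batch = preprocess(img).unsqueeze(0)
    with torch.no_grad():
        score = torch.sigmoid(model(batch)).item()    # assumes a single-logit output
    return {"defect_probability": score}

# Serve locally with, e.g.: uvicorn app:app --reload
```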

What You'll Learn:

- How to quickly configure a remote development environment with TPI: write code locally while executing on a remote machine with a GPU
- How to version large datasets and models with DVC
- When it’s the right time to move from Jupyter notebooks to ML pipelines and how to do that with DVC
- Why it’s beneficial to integrate CI/CD workflows into your model development process and how to do that with CML
- How to manage experiments and collaborate on ML projects using Iterative Studio

Talk: Best MLOps Practices for Building End-to-End Deep Learning Projects

Vishal Vijay Shanker

Lead Data Scientist, Target

Vishal is currently a Lead Data Scientist at Target leading the engineering efforts for Advanced Machine Learning - Visual Discovery team, focusing on Deep learning model deployment and inference pipelines and general MLOps. Through his prior work experience at Intel and Seagate, he has worked in several areas of Computer Science such as Web application development, Data warehousing, Big Data engineering, Storage devices and Firmware engineering. His passion is to research cutting edge Deep Learning architectures and quantize complex Deep Learning models for memory and performance so they can be used to solve real time problems in the AI space.

Abstract:

Machine learning models drive some of the most important business decisions at Target. Therefore, monitoring the model behavior in production is of utmost importance. Using behavioral insights from internal models, Target has built a monitoring capability as a component of model observability for models deployed in production for a Visual Discovery application. This ensures that these models remain relevant and meet the desired performance in production while giving scientists the ability to understand and improve the quality of models in production.

What You'll Learn:

The problem:
- How do we ensure that a machine learning model performs as expected in production over time?
- How can you make ML models more observable?
- Why is it important to solve this problem?

The solution:
- Overview of the model monitoring framework architecture.
- Brief explanation of all components in the framework library.
- Ease of plugging the framework into any machine learning application and infrastructure (a purely illustrative sketch follows below).
- Accessibility and availability to various machine learning applications across Target.
- Root cause analysis using evaluation metrics to improve model observability.

Demo:
- The model monitoring framework plugged in as a library in a visual discovery application at Target.
- Components of the framework interacting with Target's internal infrastructure to provide monitoring, visualization, and alerting capabilities.

Outcome for the enterprise:
- The capability to proactively monitor a machine learning model's inputs and predictions, ensuring robust machine learning systems at Target.
- The framework was made available across Target as a Python library, and various machine learning teams were able to plug it into their applications seamlessly with Target infrastructure.
- The solution was custom built, rather than based on third-party tools, to ensure seamless integration with Target's tech stack and to give machine learning teams the ability to build custom features and evaluation metrics for understanding model performance in production.
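
Since the framework itself is internal to Target, here is a purely illustrative sketch of the "plug-in library" idea; ModelMonitor and its methods are hypothetical names, not Target's actual API:

import time
from statistics import mean


class ModelMonitor:
    """Records model inputs and predictions so drift and quality can be reviewed later."""

    def __init__(self, model_name: str):
        self.model_name = model_name
        self.records = []

    def log_prediction(self, features: dict, prediction, latency_ms: float) -> None:
        # In a real deployment this would write to a metrics store or message bus.
        self.records.append(
            {"ts": time.time(), "features": features, "prediction": prediction, "latency_ms": latency_ms}
        )

    def summary(self) -> dict:
        # Simple aggregate a dashboard or alerting rule could consume.
        return {
            "model": self.model_name,
            "n_predictions": len(self.records),
            "avg_latency_ms": mean(r["latency_ms"] for r in self.records) if self.records else None,
        }


monitor = ModelMonitor("visual-discovery")
monitor.log_prediction({"image_id": "123"}, prediction="chair", latency_ms=42.0)
print(monitor.summary())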

Talk: Model Monitoring: Ensure Robust Machine Learning Systems in Production
Co-Presenter: Sree Gowri Addepalli

Adi Hirschtein

VP Product, Iguazio

Adi Hirschtein brings 20 years of experience as an executive, product manager and entrepreneur building and driving innovation in technology companies. As the VP of Product at Iguazio, the MLOps platform built for production and real-time use cases, he leads the product roadmap and strategy.
His previous roles spanned technology companies such as Dell EMC, Zettapoint and InfraGate, in diverse positions including product management, business development, marketing, sales and execution, with a strong focus on machine learning, database and storage technology. Whether working with startups or corporates, Adi's passion lies in taking a team's ideas from their very first day, through a successful market penetration, all the way to an established business.
Adi holds a B.A. in Business Administration and Information Technology from the College of Management Academic Studies.

Abstract:

Snowflake is one of the most popular data clouds today. With so much data at your fingertips, how do you harness it to create business value?
Most data science teams start their AI journey from what they perceive to be the logical beginning: building AI models using manually extracted datasets. Operationalizing machine learning (considering all the requirements of the business: handling online and federated data sources, scale, performance, security, continuous operations, and so on) comes as an afterthought, making it hard and resource-intensive to create real business value with AI.
Now there is a better way to work in an automated, production-ready environment and cut your time to production from 12 months to 30 days (https://www.iguazio.com/customers/).
The Iguazio MLOps platform now comes with a built-in Snowflake connector, enabling enterprises to seamlessly access the Snowflake Data Cloud to build, store and share features that are ready for use in machine learning applications on Iguazio, and to automate their ML pipeline end to end.
In this session, we will describe the challenges in operationalizing machine and deep learning. We'll explain the production-first approach to MLOps pipelines, using a modular strategy where the different components provide a continuous, automated, and far simpler way to move from research and development to scalable production pipelines, without the need to refactor code, add glue logic, or spend significant effort on data and ML engineering. We will cover various real-world implementations and examples, and discuss the different stages, including automating feature creation using a feature store, building CI/CD automation for models and apps, deploying real-time application pipelines, observing the model and application results, creating a feedback loop, and re-training with fresh data.
We’ll demonstrate how to use Iguazio & Snowflake to create a simple, seamless, and automated path to production at scale!
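
As a hedged illustration of the kind of data access this enables (generic snowflake-connector-python usage, not the Iguazio connector itself; credentials and table names are placeholders):

import snowflake.connector

conn = snowflake.connector.connect(
    user="ML_USER",            # placeholder credentials
    password="***",
    account="my_account",
    warehouse="ML_WH",
    database="FEATURES_DB",
    schema="PUBLIC",
)

cur = conn.cursor()
cur.execute("SELECT customer_id, txn_count_7d, avg_basket_value FROM customer_features")
features_df = cur.fetch_pandas_all()   # requires the pandas extra of the connector
conn.close()

# features_df can now be ingested into a feature store or training pipeline.
print(features_df.head())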

What You'll Learn:

In this session, we will describe the challenges in operationalizing machine and deep learning. We'll explain the production-first approach to MLOps pipelines, using a modular strategy where the different components provide a continuous, automated, and far simpler way to move from research and development to scalable production pipelines, without the need to refactor code, add glue logic, or spend significant effort on data and ML engineering. We will cover various real-world implementations and examples, and discuss the different stages, including automating feature creation using a feature store, building CI/CD automation for models and apps, deploying real-time application pipelines, observing the model and application results, creating a feedback loop, and re-training with fresh data.

Talk: How to Build an Automated ML Pipeline with a Feature Store using Iguazio & Snowflake

Mark Moyou

Senior Data Scientist, NVIDIA

Dr. Mark Moyou is a Senior Data Scientist at NVIDIA on the Retail team, focused on enabling scalable machine learning for the nation's top retailers. He is the Conference Director of the Southern Data Science Conference in Atlanta, GA, and hosts the Caribbean Data Science Podcast. Before NVIDIA, he was a Data Science Manager in the Professional Services division at Lucidworks, an enterprise search and recommendations company. Prior to Lucidworks, he was a Data Scientist at Alstom Transportation, where he applied data science to the railroad industry. Mark is quite active in his fitness pursuits, so feel free to ask him about that topic.

Abstract:

This talk will cover getting started and best practices for doing XGBoost training and inference on GPUs. Why should you consider using GPUs with XGBoost? When training XGBoost models, finding the optimal hyperparameters can take significant time if model training times are slow on CPUs (especially at scale). By leveraging both single and multi-GPUs, we can often see 10x+ speedups in model training, enabling the discovery of optimal hyperparameters in much less time.
We will show how to train XGBoost models across single and multiple GPUs leveraging Dask (Advanced Parallelism in Python). After model training, we will show how to get the best performance at inference time on CPU and GPU by leveraging the open-source RAPIDS Forest Inference Library (FIL) and NVIDIA's open-source inference server, Triton.
Results show an increase in inference throughput up to 9.4x on CPU and 240x on GPU; reduction in compute time by 88% and cost savings up to 56% on GPU vs. CPU. Notebooks and slides will be shared.
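
For orientation, here is a minimal sketch of the multi-GPU training setup described above, assuming a machine with NVIDIA GPUs and the dask-cuda and xgboost packages installed; the data below is synthetic:

import dask.array as da
import xgboost as xgb
from dask.distributed import Client
from dask_cuda import LocalCUDACluster
from xgboost import dask as dxgb

cluster = LocalCUDACluster()          # one Dask worker per local GPU
client = Client(cluster)

# Synthetic data for illustration; in practice load from Parquet, Snowflake, etc.
X = da.random.random((1_000_000, 50), chunks=(100_000, 50))
y = (da.random.random(1_000_000, chunks=100_000) > 0.5).astype(int)

dtrain = dxgb.DaskDMatrix(client, X, y)
output = dxgb.train(
    client,
    {"tree_method": "gpu_hist", "objective": "binary:logistic"},
    dtrain,
    num_boost_round=100,
)
booster = output["booster"]           # trained model, ready for FIL / Triton export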

What You'll Learn:

How to use XGBoost on GPUs along with some feature engineering best practices

Talk: Leveraging GPUs for Faster XGBoost Training and Inference at Reasonable and Large Scale

Abhishek Mathur

Product Lead, Responsible AI, H2O.ai

Abhishek is the Product Lead for AI Governance, Responsible AI, and MLOps products at H2O.ai. He focuses on removing the friction (regulatory, compliance, transparency, fairness, audit, and accountability, amongst others) that organizations face in adopting and realizing value from machine learning. Abhishek has previously led product teams developing NLP (conversational AI), computer vision, and internet of things platforms. He enjoys teaching product management foundations to aspiring PMs and is a web3 enthusiast.

Abstract:

Artificial intelligence and machine learning present significant opportunities to businesses. To reap the full benefits of ML, organizations need to trust the algorithms and incorporate them into their existing workflows. This presentation outlines a set of actionable best practices for people, process, and technology that not only enable the incorporation of machine learning in business applications, but also ensure that it is done in a responsible manner for the business and for society at large. The goal is to increase the adoption of machine learning in businesses by incorporating the appropriate governance, driving trust, efficiency, and business results.

What You'll Learn:

AI governance is needed for all organizations to fully harness the application of AI

Talk: Security Audits for Machine Learning Attacks
Co-Presenter: Navdeep Gill

Alon Gubkin

CTO, Aporia

Alon Gubkin is the CTO of Aporia, a customizable monitoring platform that enables data science teams to build their own monitoring solution for ML models in production. Writing code from the age of seven, Alon is passionate about programming, machine learning, biology, video games, and software architecture.

Abstract:

Building your ML Platform on top of Kubernetes rocks. In this session we will discuss the core principles of Kubernetes and cloud native applications, as well as Infrastructure as Code and why MLOps benefits from it so much. We'll also demonstrate a reference architecture to help you get started.
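
As a small, hedged illustration of what infrastructure as code for model serving on Kubernetes can look like in Python (using the official kubernetes client; the image, names, and namespace below are placeholders, not a reference to Aporia's architecture):

from kubernetes import client, config

config.load_kube_config()  # assumes a local kubeconfig pointing at your cluster

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="model-server"),
    spec=client.V1DeploymentSpec(
        replicas=2,
        selector=client.V1LabelSelector(match_labels={"app": "model-server"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "model-server"}),
            spec=client.V1PodSpec(
                containers=[
                    client.V1Container(
                        name="model-server",
                        image="registry.example.com/model-server:latest",  # placeholder image
                        ports=[client.V1ContainerPort(container_port=8000)],
                    )
                ]
            ),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="ml", body=deployment)

In practice teams often express the same thing declaratively with Terraform or Helm; the Python client is used here only to keep the examples in one language.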

What You'll Learn:

How to get started with building an ML Platform on top of K8s

Talk: Why Cloud Native MLOps Rocks

David Hershey

Senior Solutions Architect, Tecton

David Hershey is a Solutions Architect at Tecton, where he helps customers implement feature stores as part of their stack for Operational ML. Prior to Tecton, David was a Solutions Engineer at Determined AI and a Product Manager for Ford’s ML platform. David holds an MS in Computer Science from Stanford University and BS in Aerospace Engineering from the University of Michigan.

Abstract:

Tecton integrates with Snowflake and enables data teams to process ML features and serve them in production quickly and reliably, without building custom data pipelines.
In this workshop, David shows how to build an end-to-end movie recommendation system using a feature platform in three stages:
- Batch, daily computed, recommendations
- Online recommendations using batch features
- Online recommendations using real-time features

What You'll Learn:

You will learn how to:
- Build new features using Tecton’s declarative framework
- Automate the transformation of batch data directly on Snowflake
- Automate the transformation of real-time data using Snowpark
- Create training datasets from data stored in Snowflake
- Serve data online using DynamoDB or Redis (see the generic Redis sketch after this list)
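
As a generic illustration of the last point (this is plain redis-py, not Tecton's API; key and feature names are placeholders):

import json

import redis

r = redis.Redis(host="localhost", port=6379, db=0)

# An offline/batch job writes the latest feature values per user.
r.hset("user_features:42", mapping={"movies_watched_7d": 12, "avg_rating": 4.1})

# The online service reads them back at request time.
features = {k.decode(): v.decode() for k, v in r.hgetall("user_features:42").items()}
print(json.dumps(features))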

Workshop: Building a Movie Recommendation System on Tecton with Snowflake

Victor Sonck

Evangelist, ClearML

Victor is an enthusiastic machine learning engineer and tinkerer. Having worked on dozens of machine learning projects in the world of consultancy, he's now focused on making inspiring, entertaining, and thought-provoking content, from YouTube videos about silly machine learning projects to more serious blog posts on how to deploy them. He loves public speaking and wants to share the magic of machine learning with everyone willing to listen.

Abstract:

Data scientists are usually not trained to go further than their analyses; however, to get to a more mature AI infrastructure that can support more models in production, additional steps have to be taken. Experiment management and data versioning are very important first steps toward the "MLOps" way of working. Done properly, they serve as a foundation on which to build more advanced systems, such as pipelines, remote workers, and advanced automation. When data scientists can incorporate this way of working into their day-to-day, they have a very powerful tool for raising the success rate of their models and analyses.
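
As a minimal sketch of what those first steps can look like with ClearML (project, dataset, and metric names below are placeholders):

from clearml import Dataset, Task

# Register the run so parameters, metrics, and artifacts are tracked automatically.
task = Task.init(project_name="demo-project", task_name="baseline-experiment")
params = task.connect({"learning_rate": 0.01, "epochs": 5})  # hyperparameters become editable in the UI

for epoch in range(params["epochs"]):
    accuracy = 0.7 + 0.05 * epoch  # placeholder metric; report your real validation score here
    task.get_logger().report_scalar(title="accuracy", series="val", value=accuracy, iteration=epoch)

# Version the raw data alongside the experiment.
ds = Dataset.create(dataset_name="raw-images", dataset_project="demo-project")
ds.add_files("data/raw")  # placeholder path
ds.upload()
ds.finalize()

task.close()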

What You'll Learn:

A set of tools, tips, tricks, and example workflows you can use in your own work to help alleviate common data science challenges.

Workshop: The Importance of Experiment Tracking and Data Traceability

Yaron Haviv

Co-Founder & CTO, Iguazio

Yaron Haviv is a serial entrepreneur who has been applying his deep technological experience in data, cloud, AI and networking to leading startups and enterprise companies since the late 1990s. As the co-founder and CTO of Iguazio, Yaron drives the strategy for the company's MLOps platform and led the shift towards the production-first approach to data science and catering to real-time AI use cases. He also initiated and built Nuclio, a leading open source serverless platform with over 4,000 GitHub stars, and MLRun, Iguazio's open source MLOps orchestration framework.
Prior to co-founding Iguazio in 2014, Yaron was the Vice President of Datacenter Solutions at Mellanox (now NVIDIA), where he led technology innovation, software development and solution integrations. He was also the CTO and Vice President of R&D at Voltaire, a high-performance computing, IO and networking company which floated on the NYSE in 2007. Yaron is an active contributor to the CNCF Working Group and was one of the foundation’s first members. He presents at major industry events and writes tech content for leading publications including TheNewStack, Hackernoon, DZone, Towards Data Science and more.

Abstract:

Most data science teams start their AI journey from what they perceive to be the logical beginning: building AI models using manually extracted datasets. Operationalizing machine learning (considering all the requirements of the business: handling online and federated data sources, scale, performance, security, continuous operations, and so on) comes as an afterthought, making it hard and resource-intensive to create real business value with AI.
Today, forward-thinking enterprises are taking a new, production-first approach to MLOps. This means designing a continuous operational pipeline, then making sure the various components and practices map into it: automating as many components as possible, constantly measuring business metrics, and making the process repeatable, so that it generates measurable ROI for the business.

What You'll Learn:

In this session, we will describe the challenges in operationalizing machine and deep learning. We'll explain the production-first approach to MLOps pipelines, using a modular strategy where the different components provide a continuous, automated, and far simpler way to move from research and development to scalable production pipelines, without the need to refactor code, add glue logic, or spend significant effort on data and ML engineering.
We will cover various real-world implementations and examples, and discuss the different stages, including automating feature creation using a feature store, building CI/CD automation for models and apps, deploying real-time application pipelines, observing the model and application results, creating a feedback loop and re-training with fresh data.

Talk: From AutoML to AutoMLOps

Simba Khadder

Founder & CEO, FeatureForm

Simba Khadder is the Founder & CEO of Featureform, the virtual feature store company. After leaving Google, Simba founded his first company, TritonML. His startup grew quickly, and Simba and his team built ML infrastructure that handled over 100M monthly active users. He distilled those learnings into Featureform's virtual feature store, which turns your existing infrastructure into a feature store. He's also an avid surfer, a mixed martial artist, a published astrophysicist, and he ran the SF marathon in basketball shoes.

Abstract:

We will perform feature engineering on a fraud detection dataset locally using Featureform. We will then deploy to a Postgres and Redis environment to serve the model in production.

What You'll Learn:

This workshop will walk you through how a feature store fits into the data science workflow, from local experimentation to production deployment.
We will explore how Featureform's virtual feature store approach allows it to work as a framework and abstraction over your existing infrastructure, rather than a heavyweight replacement.
In the workshop you will:
* Do feature engineering locally, and train a local model (a generic sketch of this step follows the list)
* Deploy your features into production and train a production model with Featureform on Postgres
* Serve features to a deployed model in production with Featureform on Redis
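
As a generic sketch of the local step (plain pandas and scikit-learn, not Featureform's API; the toy data and column names are made up for illustration):

import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical transactions table.
txns = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "amount": [20.0, 900.0, 15.0, 14.0, 5000.0],
    "is_fraud": [0, 1, 0, 0, 1],
})

# Simple per-user aggregate features.
features = txns.groupby("user_id")["amount"].agg(avg_amount="mean", max_amount="max")
labels = txns.groupby("user_id")["is_fraud"].max()

model = LogisticRegression().fit(features, labels)
print(model.predict(features))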

Workshop: From AutoML to AutoMLOps
