A series of informal conversations with thought leaders, researchers, practitioners, and writers on a wide range of topics in technology, science, and of course big data, data science, artificial intelligence, and related applications. Anchored by Ben Lorica (@BigData), the Data Exchange also features a roundup of the most important stories from the worlds of data, machine learning, and AI. Detailed show notes for each episode can be found at https://thedataexchange.media/. The Data Exchange podcast is a production of Gradient Flow (https://gradientflow.com/).
The AI Infrastructure Revolution: From Cloud Computing to Data Center Design
Bryan Cantrill is CTO and co-founder of Oxide Cloud Computer, a startup delivering integrated hardware and software to enterprises that want cloud computing systems with hyperscaler agility. Oxide specializes in vertically integrated, scale-ready cloud infrastructure tailored for mainstream business needs. Subscribe to the Gradient Flow Newsletter: https://gradientflow.substack.com/ Subscribe: Apple • Spotify • Overcast • Google • AntennaPod • Podcast Addict • Amazon • RSS. Detailed show notes can be found on The Data Exchange web site.
2/22/2024 • 42 minutes, 44 seconds
AI in Depth: Transforming Transportation, Enterprise, and Policy
Evangelos Simoudis is a seasoned venture investor, a senior advisor to global corporations, and Managing Director at Synapse Partners, a firm that invests in startups developing enterprise applications that exploit big data and AI.
2/15/2024 • 41 minutes, 12 seconds
Software Meets Hardware: Enabling AMD for Large Language Models
Sharon Zhou and Greg Diamos are co-founders of Lamini, a startup at the forefront of enabling enterprise adoption of large language models (LLMs). We discussed Lamini’s work with AMD, which focused on closing the gap between AMD hardware capabilities and software integration in LLM applications.
2/8/2024 • 38 minutes, 37 seconds
Incentives are Superpowers: Mastering Motivation in the AI Era
Uri Gneezy is Professor of Economics and Strategy at UC San Diego, and author of our 2023 Book of the Year, “Mixed Signals: How Incentives Really Work”.
2/1/2024 • 31 minutes, 9 seconds
Synthetic Futures: The Convergence of Biology and AI
Dmitriy Ryaboy is the VP of AI Enablement at Ginkgo Bioworks, a startup that uses machine learning and AI to develop a wide range of applications. The conversation focuses on the intersection of AI, machine learning, and biology, particularly synthetic biology.
1/25/2024 • 32 minutes, 22 seconds
AI Co-Pilots in Action: Transforming Function Calling in Cybersecurity
Jian Zhang is co-founder, CTO, and VP of Engineering at Nexusflow AI, a startup that uses generative AI to build tools for cybersecurity. This conversation revolves around the integration of various AI components, with a specific focus on cybersecurity and function-calling copilots.
1/18/2024 • 45 minutes
Leveling Up: Tools and Techniques to Make AI Development More Accessible
Sarmad Qadri is founder and CEO of LastMile, a startup building an AI developer platform for engineering teams. This conversation delves into key artificial intelligence and machine learning themes, focusing on bringing software engineering rigor to the development of LLM and GenAI applications.
1/11/2024 • 45 minutes, 16 seconds
LLMs on CPUs, Period
Nir Shavit, a professor at MIT’s Computer Science and Artificial Intelligence Laboratory, is also a founder of Neural Magic, a startup working to accelerate open-source large language models and simplify AI deployments.
1/4/2024 • 33 minutes, 13 seconds
Democratizing Wealth Management With AI
Chirag Yagnik is a co-founder of Arta, a company that harnesses innovations in artificial intelligence and software to develop wealth management solutions. Arta aims to democratize access to sophisticated investment tools typically available only to ultra-high-net-worth individuals through family offices.
12/28/2023 • 47 minutes, 27 seconds
Knowledge Graphs: Contextualizing Enterprise Data for More Accurate LLMs
Juan Sequeda (Principal Scientist & Head of AI Lab) and Dean Allemang (Principal Solutions Architect) are knowledge graph experts at data.world, a startup that offers a data catalog powered by a knowledge graph to help organizations better understand and gain value from their data.
12/21/2023 • 41 minutes, 36 seconds
TimeGPT: Machine Learning for Time Series, Made Accessible
Max Mergenthaler (CEO) and Azul Garza Ramirez (CTO) are co-founders of Nixtla, a startup that seeks to make cutting-edge predictive insights widely accessible. In this episode we discuss TimeGPT, Nixtla’s new frontier model for time series forecasting.
12/14/2023 • 44 minutes, 7 seconds
Best Practices for Building LLM-Backed Applications
Waleed Kadous, Chief Scientist at Anyscale, is one of my go-to experts for best practices on building applications leveraging large language models.
12/7/2023 • 53 minutes, 50 seconds
The Evolution of Crypto, Blockchain, and Web3
Kieren James-Lubin is CEO of BlockApps and Co-Chair of the Technical Steering Committee for the Enterprise Ethereum Alliance.
11/30/2023 • 49 minutes, 15 seconds
Open Source Data and AI: Past, Present, Future
Earlier this year, I had a conversation with Sam Ramji, Chief Strategy Officer at DataStax and host of the Open||Source||Data podcast, where we talked about the evolution of big data and AI technologies. I’m airing our original conversation in its entirety on this holiday weekend in the U.S.
11/23/2023 • 43 minutes, 7 seconds
Orchestration for LLM and RAG applications
Malte Pietsch is co-founder & CTO of Deepset, the company behind the popular open source project Haystack, an orchestration framework for LLMs.
11/16/2023 • 49 minutes, 58 seconds
Reflections from the First AI Conference in San Francisco
In this episode, Paco Nathan and I dive into insights from the inaugural AI Conference in San Francisco (video of talks can be found here).
11/9/2023 • 49 minutes, 28 seconds
Kùzu: A simple, extremely fast, and embeddable graph database
Semih Salihoglu is an Associate Professor at the University of Waterloo and co-creator of Kùzu, an open source, embeddable property graph database management system.
11/2/2023 • 51 minutes, 9 seconds
Navigating the Nuances of Retrieval Augmented Generation
Philipp Moritz (Co-founder and CTO) and Goku Mohandas (ML and Product Lead) of Anyscale do a deep dive into retrieval augmented generation (RAG) and large language models (LLMs).
10/26/2023 • 42 minutes, 40 seconds
The Rise of Generative AI-Powered Social Media Manipulation
Bill Marcellino is a senior behavioral scientist and Nathan Beauchamp-Mustafaga is a policy researcher at the RAND Corporation. They are the principal researchers behind the new report “The Rise of Generative AI and the Coming Era of Social Media Manipulation 3.0”.
10/19/2023 • 40 minutes, 3 seconds
Versioning and MLOps for Generative AI
Yucheng Low, Cofounder & CEO of XetHub, discusses the challenges of managing large-scale machine learning assets and the need for version control.
10/12/2023 • 38 minutes, 35 seconds
Navigating the Generative AI Landscape
Christopher Nguyen is CEO and Co-founder of Aitomatic, a startup that builds virtual advisors tailored with domain-specific expertise, primarily catering to industrial AI applications.
10/5/2023 • 40 minutes, 31 seconds
Trends in Data Management: From Source to BI and Generative AI
Sudhir Hasbe is Chief Product Officer at Neo4j and a longtime technical and product leader in the data management space.
9/28/2023 • 48 minutes, 4 seconds
AI and the Future of Speech Technologies
Yishay Carmiel is the CEO of Meaning, a startup at the forefront of building real-time speech applications for enterprises. We discuss the state of AI for speech and audio, including trends in generative AI, automatic speech recognition, diarization, restoration, voice cloning, speech synthesis, and more.
9/21/2023 • 36 minutes, 52 seconds
The Future of Cybersecurity: Generative AI and its Implications
Casey Ellis is Founder/Chair/CTO of Bugcrowd, a crowdsourced cybersecurity platform. Bugcrowd recently released “Inside the Mind of a Hacker 2023”, a report that provides insights into the motivations, challenges, and specializations of hackers, as well as the security implications of AI.
9/14/2023 • 49 minutes, 7 seconds
Ivy: The One-Stop Interface for AI Model Deployment and Development
Daniel Lenton is the CEO of Ivy, a suite of tools designed to accelerate AI model development and model deployment. Ivy serves as glue that connects various frameworks and compiler infrastructures, making them compatible with one another.
9/7/2023 • 38 minutes, 59 seconds
Navigating the Risk Landscape: A Deep Dive into Generative AI
Andrew Burt is the Managing Partner at Luminos.Law, the first law firm focused on helping teams manage the privacy, fairness, security, and transparency of their AI and data, including generative AI systems. We explore the state of risk and compliance in light of generative AI, the broader challenges and risks posed by AI, the implications of the FTC probe into OpenAI, and the NIST AI Risk Management Framework.
8/31/2023 • 42 minutes, 25 seconds
Software Development with AI and LLMs
Michele Catasta is VP of AI at Replit, an AI-powered software development platform that allows teams to build and deploy applications on any device, without any setup required.
8/24/2023 • 49 minutes, 9 seconds
A Lightweight SDK for Integrating AI Models and Plugins
Alex Chao is a Product Manager at Microsoft focused on Semantic Kernel, an open-source AI and LLM orchestrator. Semantic Kernel (SK) is a lightweight SDK that makes it easy to integrate AI models and plugins into applications.
8/17/2023 • 45 minutes, 55 seconds
Using LLMs to Build AI Co-pilots for Knowledge Workers
Steve Hsu wears many hats, but most recently he is co-founder of SuperFocus, a startup building LLM-backed knowledge co-pilots for enterprises.
8/10/2023 • 48 minutes, 21 seconds
ETL for LLMs
Brian Raymond is the founder of Unstructured, a startup building open source data pre-processing and ingestion tools specifically for Large Language Models (LLMs).
8/3/2023 • 36 minutes, 10 seconds
The Future of Graph Databases
Emil Eifrem is co-founder and CEO of Neo4j, the leading graph database and graph data science software provider. We discussed a range of topics, including the current state of graph databases; graph data science and graph neural networks; vector databases; and the interplay between LLMs, knowledge graphs, and graph databases.
7/27/2023 • 1 hour, 1 minute, 24 seconds
Delivering Safe and Effective LLM and NLP Applications
David Talby is the CTO and Founder of John Snow Labs, the company behind two popular open source projects: Spark NLP and LangTest. In this episode we focus on LangTest, an open-source Python library designed to help developers deliver safe and effective Natural Language Processing (NLP) models. [Note: After we recorded this episode, NLTest was renamed to LangTest.]
7/20/2023 • 37 minutes, 56 seconds
Using Data and AI to Democratize Entity Resolution and Master Data Management
Jeff Jonas is Founder and CEO of Senzing, a startup focused on democratizing entity resolution – making this deceptively complicated task easy for programmers to use and deploy.
7/13/2023 • 50 minutes, 39 seconds
An Open Source Data Framework for LLMs
Jerry Liu is CEO and co-founder of LlamaIndex, an open source project and startup that builds tools that enable teams to augment LLMs with their own private data.
7/6/2023 • 49 minutes, 24 seconds
Redefining AI Infrastructure: Deploying and Developing with a Next-Generation Developer Platform
Tim Davis is the Co-Founder & Chief Product Officer of Modular, a startup focused on building tools to help simplify AI infrastructure.
6/29/2023 • 50 minutes, 19 seconds
The Rise of Custom Foundation Models
Andrew Feldman is CEO and co-founder of Cerebras, a startup that has released what it describes as the fastest AI accelerator, based on the largest processor. We discussed Cerebras-GPT, a family of language models, ranging from 111 million to 13 billion parameters, that Cerebras reports set new benchmarks for accuracy and compute efficiency.
6/22/2023 • 39 minutes, 20 seconds
The Future of Vector Databases and the Rise of Instant Updates
Louis Brandy is VP of Engineering at Rockset, the real-time search and analytics database startup formed by the creators of the popular open source project RocksDB.
6/15/2023 • 48 minutes, 17 seconds
LLMs Are the Key to Unlocking the Next Generation of Search
Amin Ahmad is co-founder of Vectara, a startup that has built an API platform tailored for developers.
6/8/2023 • 44 minutes, 58 seconds
Building and Deploying Foundation Models for Enterprises
Jonas Andrulis is the Founder & CEO of Aleph Alpha, a startup that provides enterprise software solutions backed by its own large language models and multimodal models.
6/1/2023 • 34 minutes, 9 seconds
Building Robust AI Infrastructure for Critical Solutions
Alex Remedios is the founder of Treebeardtech, a London-based consulting firm that helps machine learning teams build dependable, secure, and adaptable cloud infrastructure for delivering business-critical AI solutions.
5/25/2023 • 31 minutes, 40 seconds
Machine Learning for High-Risk Applications
Patrick Hall is co-founder of BNH and a visiting faculty member of decision sciences at the George Washington University School of Business. Agus Sudjianto is EVP and Head of Corporate Model Risk at Wells Fargo. We explore several topics covered in the new book Machine Learning for High-Risk Applications, co-authored by Patrick and with a foreword by Agus.
5/18/2023 • 46 minutes, 25 seconds
Boosting Perception With Synthetic Data
Omar Maher is Director of Product Marketing at Parallel Domain, a startup that is advancing machine perception capabilities by harnessing the power of synthetic data. We delve into the growing adoption of synthetic data and the factors driving its use. We discuss major developments in synthetic data generation and its overlap with Generative AI. The conversation also covers data privacy, intellectual property, the generation of structured data like LiDAR, the current state of adoption, and key research directions to overcome existing challenges.
5/11/2023 • 35 minutes, 34 seconds
Revolutionizing B2B: Unleashing the Power of AI and Data
Simon Chan is the General Partner at Firsthand Alliance, a venture capital fund focused on the future of B2B and enterprise software. We explore the evolution of AI, cloud computing, and business collaboration tools, and how a new generation of generative AI technologies enables applications to generate content and drive innovation across industries.
5/4/2023 • 43 minutes, 8 seconds
AI Metadata
Gev Sogomonian is a co-creator of AimStack, an open-source, self-hosted AI metadata tracker that logs all your AI metadata, such as experiments and prompts, and provides a user-friendly UI for comparing and observing them. It also offers an SDK for programmatically querying tracked metadata.
4/27/2023 • 31 minutes, 30 seconds
The 2023 AI Index
Raymond Perrault is a Distinguished Computer Scientist at SRI International and Co-Director of the Steering Committee for the AI Index, an annual report that tracks, collates, distills, and visualizes data relating to AI to help decision-makers and teams take meaningful action toward responsible and ethical AI.
4/20/2023 • 43 minutes, 38 seconds
Custom Foundation Models
Hagay Lupesko is VP Engineering at MosaicML, a startup that enables teams to easily train large AI models on their data and in their own secure environment. We discuss the evolution of cloud-based machine learning (from “traditional” ML through LLMs), his experience building machine learning applications at leading technology companies, and the need for companies to build their own custom foundation models.
4/13/2023 • 38 minutes, 6 seconds
Uncovering and Highlighting AI Trends
Jakub Zavrel is the Founder and CEO of Zeta Alpha, a neural discovery platform that uses neural search technology to improve how you and your team discover, organize, and share knowledge. Our conversation focuses on the latest developments in artificial intelligence, taking inspiration from their recent viral article featuring the 100 most cited AI papers of 2022.
4/6/2023 • 49 minutes, 2 seconds
How Data and AI Happened
Chris Wiggins is a Professor at Columbia University and the Chief Data Scientist at The New York Times. He is also co-author of How Data Happened, a fascinating new historical exploration of how data has been used as a tool in shaping society, from the census to eugenics to Google search. The book traces the trajectory of data and explores the mathematical and computational techniques that serve to shape people, ideas, society, and economies.
3/30/2023 • 48 minutes, 48 seconds
Blazing fast bulk data transfers between any cloud
Paras Jain and Sarah Wooders are graduate students at UC Berkeley’s Sky Computing Lab. They are part of the team behind Skyplane, an open source project that accelerates wide-area transfers in the cloud via overlay routing and parallelism.
3/23/2023 • 31 minutes, 29 seconds
Exhaustion of High-Quality Data Could Slow Down AI Progress in Coming Decades
Pablo Villalobos is a Staff Researcher at Epoch, and lead author of the recent paper “Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning”. We discuss the key findings in this paper, as well as a related study Pablo conducted on scaling laws.
3/16/2023 • 33 minutes, 9 seconds
Generating high-fidelity and privacy-preserving synthetic data
Jinsung Yoon (Senior Research Scientist) and Sercan Arik (Staff Research Scientist and Manager) are part of the Google team behind EHR-Safe, a set of tools for generating highly realistic and privacy-preserving synthetic Electronic Health Records.
3/9/2023 • 35 minutes, 47 seconds
How technology is disrupting the venture capital industry
Brandon Jenkins is Co-founder and COO of Fundrise, the largest direct-to-individuals alternative investment platform in the U.S. Our conversation centered on the company’s recent foray into technology investing, specifically startups in the data infrastructure space.
3/2/2023 • 36 minutes, 19 seconds
Running Machine Learning Workloads On Any Cloud
Zongheng Yang is a researcher in the Sky Computing Lab at UC Berkeley, a multi-year research initiative that draws on distributed systems, programming languages, security, and machine learning to decouple the services a company requires from the choice of a specific cloud. He provides a detailed overview and update on SkyPilot, an intercloud broker that treats the cloud ecosystem as a unified, integrated entity rather than a collection of disparate, largely incompatible clouds. SkyPilot enables users to run machine learning and data science batch jobs on any cloud, realize substantial cost savings, access the best hardware across clouds, and enjoy higher resource availability.
2/23/2023 • 37 minutes, 11 seconds
2023 Trends in Data Engineering and Infrastructure
Jesse Anderson, Evan Chan, and I discuss current developments and opportunities in data engineering and data platforms, the foundation on which artificial intelligence and machine learning are built. Download a copy of the FREE report: https://gradientflow.com/2023trendsreport/
2/16/2023 • 45 minutes, 47 seconds
Preparing for the Implementation of the EU AI Act and Other AI Regulations
This week we discuss AI regulations with Gabriela Zanfir-Fortuna, VP for Global Privacy at the Future of Privacy Forum, and Andrew Burt, Managing Partner at BNH, the first law firm focused on AI and analytics. Subscribe to the Gradient Flow Newsletter: https://gradientflow.substack.com/ Subscribe: Apple • Spotify • Stitcher • Google • AntennaPod • Podcast Addict • Amazon • RSS. Detailed show notes can be found on The Data Exchange web site.
2/9/2023 • 36 minutes, 38 seconds
The Open Source Stack Unleashing a Game-Changing AI Hardware Shift
Dylan Patel is the Chief Analyst at SemiAnalysis, a boutique semiconductor research and consulting firm focused on the semiconductor supply chain, from chemical inputs to fabs to design IP and strategy. In this episode, we discuss the emerging open source software stack for PyTorch that makes it easier to implement non-Nvidia backends (see his recent post). Subscribe to the Gradient Flow Newsletter: https://gradientflow.substack.com/ Subscribe: Apple • Spotify • Stitcher • Google • AntennaPod • Podcast Addict • Amazon • RSS. Detailed show notes can be found on The Data Exchange web site.
2/2/2023 • 41 minutes, 55 seconds
Data Science and AI in Context
Peter Norvig (of Google and Stanford) and Alfred Spector (of MIT) are part of the team of authors behind the must-read book Data Science in Context: Foundations, Challenges, Opportunities. We discussed their recent book, took a deep dive into their Data Science Analysis Rubric, and also talked about trending topics in AI, including looming regulations, synthetic data, and large language and foundation models. Subscribe to the Gradient Flow Newsletter: https://gradientflow.substack.com/ Subscribe: Apple • Spotify • Stitcher • Google • AntennaPod • Podcast Addict • Amazon • RSS. Detailed show notes can be found on The Data Exchange web site.
1/26/2023 • 47 minutes, 40 seconds
Evaluating Language Models
Percy Liang is Associate Professor of Computer Science and Statistics, and Director of the new Center for Research on Foundation Models at Stanford University. We discussed a new suite of tools (HELM) designed to help users and researchers understand language models in their totality. We also discussed recent trends in AI, including the rise of Generative AI and Foundation Models. Download a copy of our FREE 2023 Trends in Data and AI Report: https://gradientflow.com/2023trendsreport/ Subscribe: Apple • Spotify • Stitcher • Google • AntennaPod • Podcast Addict • Amazon • RSS. Detailed show notes can be found on The Data Exchange web site.
1/19/2023 • 45 minutes, 36 seconds
2023 Opportunities and Trends: Data, Machine Learning, and AI
Jenn Webb, special correspondent and managing editor at Gradient Flow, recently organized a mini-panel to discuss themes and trends for 2023. The panel consisted of Mikio Braun and me. More information on these trends can be found in our Annual Trends Report, which is available for free download (see details below). Download a copy of the FREE Report: https://gradientflow.com/2023trendsreport/ Subscribe: Apple • Spotify • Stitcher • Google • AntennaPod • Podcast Addict • Amazon • RSS. Detailed show notes can be found on The Data Exchange web site.
1/12/2023 • 1 hour, 5 minutes, 31 seconds
Exploring DALL·E 2
Given the growing interest in Generative AI, we revisit a conversation with Mark Chen, Research Scientist at OpenAI and part of the team behind DALL·E 2, a new AI system that can create realistic images and art based on natural language descriptions. Subscribe to the Gradient Flow Newsletter: https://gradientflow.substack.com/ Subscribe: Apple • Spotify • Stitcher • Google • AntennaPod • Podcast Addict • Amazon • RSS. Detailed show notes can be found on The Data Exchange web site.
1/5/2023 • 37 minutes, 40 seconds
Data Science at Shopify and Stitch Fix
On this special end-of-year episode, we revisit conversations with two data science leaders in the e-commerce space: Wendy Foster, Director of Engineering & Data Science at Shopify, and Olivia Liao, Senior Director of Data Science at Stitch Fix. Subscribe to the Gradient Flow Newsletter: https://gradientflow.substack.com/ Subscribe: Apple • Spotify • Stitcher • Google • AntennaPod • Podcast Addict • Amazon • RSS. Detailed show notes can be found on The Data Exchange web site.
12/29/2022 • 37 minutes, 25 seconds
Building a data management system for unstructured data
Shayan Mohanty is the CEO of Watchful, a modern and interactive solution that places the control of data labeling back in the hands of data scientists, machine learning practitioners, and subject matter experts. This episode focuses on a data management system (written in Rust) that they built to deliver the level of automation and interactivity Watchful requires. Subscribe to the Gradient Flow Newsletter: https://gradientflow.substack.com/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
12/22/2022 • 36 minutes, 32 seconds
A Cloud Native Vector Database Management System
Frank Liu is Director of Operations & ML Architect at Zilliz, the company behind Milvus, an open source vector database. We discuss their recent VLDB paper (“A Cloud Native Vector Database Management System”) that describes recent updates to Milvus, as well as vector databases and vector search in general. Subscribe to the Gradient Flow Newsletter: https://gradientflow.substack.com/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
12/15/2022 • 48 minutes, 50 seconds
What’s Next for Machine Learning in Time Series
Ira Cohen is co-founder and Chief Data Scientist at Anodot, a startup that uses time series tools to monitor business data in real time, so organizations can proactively resolve revenue, cost, and customer experience issues before they impact business performance. We recently wrote a well-received post that provided a detailed overview of the state of technologies for collecting, storing, and unlocking time series. Subscribe to the Gradient Flow Newsletter: https://gradientflow.substack.com/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
12/8/2022 • 38 minutes, 8 seconds
Efficient Methods for Natural Language Processing
Roy Schwartz is Professor of Natural Language Processing at The Hebrew University of Jerusalem. We discussed a recent survey paper, co-written by Roy, that presents a broad overview of existing methods to improve NLP efficiency through the lens of traditional NLP pipelines. Subscribe to the Gradient Flow Newsletter: https://gradientflow.substack.com/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
12/1/2022 • 45 minutes, 40 seconds
Responsible and Trustworthy AI
On this Thanksgiving holiday weekend in the U.S., we revisit a Twitter Spaces conversation I had with Andrew Burt, Managing Partner at BNH, the first law firm focused on AI risks, and Bob Friday, Chief AI Officer at Juniper Networks. Subscribe to the Gradient Flow Newsletter: https://gradientflow.substack.com/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
11/23/2022 • 30 minutes, 1 second
Building a premier industrial AI research and product group
Hung Bui is the CEO of VinAI, a premier AI research company developing world-class products and services. Hung assembled the VinAI team just over three years ago, and it now ranks among the Top 20 Global Companies in AI Research for 2022. Subscribe to the Gradient Flow Newsletter: https://gradientflow.substack.com/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
11/17/2022 • 37 minutes, 50 seconds
An open source, production grade vector search engine
Bob van Luijt is CEO of SeMI Technologies, the company behind the popular vector search engine Weaviate. Bob describes Weaviate's key features, core components, and popular use cases, and provides an overview of its near-term roadmap. We also discuss how vector search engines compare with existing data management systems. Subscribe to the Gradient Flow Newsletter: https://gradientflow.substack.com/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
11/10/2022 • 35 minutes, 14 seconds
A comprehensive suite of open source tools for time series modeling
Federico Garza and Max Mergenthaler Canseco are co-founders of Nixtla, a startup building developer-friendly software that helps data scientists deploy predictive pipelines. Subscribe to the Gradient Flow Newsletter: https://gradientflow.substack.com/ Subscribe: Apple • Spotify • Stitcher • Google • AntennaPod • Podcast Addict • RSS. Detailed show notes can be found on The Data Exchange web site.
11/3/2022 • 35 minutes, 11 seconds
Building Safe and Reliable AI applications
Christopher Nguyen is CEO and cofounder of Aitomatic, a startup that uses a knowledge-first approach to build and deploy machine learning solutions, with a focus on industrial applications (manufacturing and other physical settings). Join us at K1st World, a fantastic symposium and networking event slated for November 16 & 17. Use the discount code GRADIENTFLOW60 to attend in person or online. Subscribe to the Gradient Flow Newsletter: https://gradientflow.substack.com/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
10/27/2022 • 30 minutes, 39 seconds
A new storage engine for vectors
Ram Sriharsha is VP of Engineering and R&D at Pinecone, a startup that offers a fully managed vector database (not just an index). We discuss Pinecone’s new proprietary storage engine, which was first described around the time we recorded this conversation. Subscribe to the Gradient Flow Newsletter: https://gradientflow.substack.com/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
Karthik Ramasamy is the Head of Streaming at Databricks. He has extensive experience in streaming, having led teams at Twitter (Apache Heron), Splunk, and Streamlio (Apache Pulsar). Subscribe to the Gradient Flow Newsletter: https://gradientflow.substack.com/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
10/13/2022 • 41 minutes, 43 seconds
The Unreasonable Effectiveness of Speech Data
Piotr Żelasko is Head of Research at Meaning, a startup building an AI platform using speech technologies. He has years of experience in speech technologies, both as a researcher and as a software engineer. We recorded this episode during the week OpenAI released Whisper, a deep learning model that approaches human-level robustness and accuracy on English speech recognition. Our conversation centered on Whisper and speech recognition, but also touched on the new speech data processing tools (Lhotse, k2, Icefall) that we described in our recent post. Download a FREE copy of our recent 2022 Trends Report (Data, Machine Learning, AI): https://gradientflow.com/2022trendsreport/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
10/6/2022 • 35 minutes
Machine Learning Integrity
Yaron Singer is the CEO of Robust Intelligence, a company building tools to help manage and mitigate risks associated with machine learning models and applications. Download a FREE copy of our recent 2022 Trends Report (Data, Machine Learning, AI): https://gradientflow.com/2022trendsreport/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
9/29/2022 • 44 minutes, 33 seconds
Synthetic data technologies can enable more capable and ethical AI
Yashar Behzadi is the CEO & Founder of Synthesis AI, a startup that uses synthetic data technologies to enable teams building AI applications, as well as gaming and metaverse applications. Download a FREE copy of our recent 2022 Trends Report (Data, Machine Learning, AI): https://gradientflow.com/2022trendsreport/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
9/22/2022 • 39 minutes, 4 seconds
Confidential Computing for Machine Learning
Sadegh Riazi is CEO and co-founder of CipherMode Labs, a startup building tools that enable data and machine learning teams to build and deploy models directly on encrypted data. CipherMode’s new open source project lets teams develop and deploy machine learning algorithms using familiar tools, which opens up the possibility of using sensitive data in different scenarios, both within an organization and in collaboration with other organizations. Download a FREE copy of our recent 2022 Trends Report (Data, Machine Learning, AI): https://gradientflow.com/2022trendsreport/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
9/15/2022 • 36 minutes, 20 seconds
Applied NLP Research at Primer
John Bohannon is a Senior Director of Data Science and Head of Research at Primer AI, an end-to-end machine intelligence solution for textual data. We discussed Primer's process for translating ML research into ML products, through the lens of several use cases. Download a FREE copy of our recent NLP Industry Survey Results: https://gradientflow.com/2021nlpsurvey/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
9/8/2022 • 41 minutes, 50 seconds
Using SQL to Retrieve Data from APIs and Web Services
Jon Udell is community lead for Steampipe, an open-source tool that populates database tables with data retrieved from APIs. Steampipe is built on Postgres, so the data is easy to explore and retrieve using SQL. Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
9/1/2022 • 31 minutes, 9 seconds
Machine Learning for Time Series Intelligence
Aadyot Bhatnagar is a Senior Research Engineer at Salesforce and co-creator of Merlion, an open source framework for applying machine learning to time series data. Merlion supports a wide range of time series learning tasks, including forecasting, anomaly detection, and change point detection. Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
8/25/2022 • 40 minutes, 12 seconds
Unleashing the power of large language models
Maarten Grootendorst is a data scientist at IKNL and, more importantly, the author of two open source libraries that I’ve come to love: BERTopic (topic modeling with transformers and c-TF-IDF) and PolyFuzz (fuzzy string matching). Both projects package the power of transformers and other leading-edge models with simple APIs, clear documentation, and visualization tools. Download a FREE copy of our recent NLP Industry Survey Results: https://gradientflow.com/2021nlpsurvey/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
8/18/2022 • 38 minutes, 51 seconds
Building production-ready machine learning pipelines
Hamza Tahir and Adam Probst are co-creators of ZenML, an extensible open source framework for building reproducible pipelines. We discuss the current state of ZenML, the many use cases that ZenML has been designed for, and its near-term roadmap. Download the FREE Report: State of Workflow Orchestration → https://gradientflow.com/2022-workflow-orchestration-survey/?utm_source=gradientflow&utm_medium=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
8/11/2022 • 49 minutes, 9 seconds
Machine Learning at Gong
Dr. Omri Allouche is Head of Research at Gong, a company that uses advances in NLP and speech models to identify and highlight risks and opportunities during customer interactions. Download a FREE copy of our recent NLP Industry Survey Results: https://gradientflow.com/2021nlpsurvey/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
8/4/2022 • 36 minutes, 43 seconds
Data Infrastructure for Computer Vision
Danny Bickson and Amir Alush are the creators of fastdup, a very impressive free tool for surfacing duplicates, anomalies, and leakage in visual data. In line with its name, it’s fast: fastdup is written in C++ and can easily handle millions of images. Download a FREE copy of our recent NLP Industry Survey Results: https://gradientflow.com/2021nlpsurvey/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
7/28/2022 • 36 minutes
How DALL·E works
Mark Chen is a Research Scientist at OpenAI and part of the team behind DALL·E 2, a new AI system that can create realistic images and art based on natural language descriptions. Download a FREE copy of our recent NLP Industry Survey Results: https://gradientflow.com/2021nlpsurvey/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
7/21/2022 • 37 minutes, 37 seconds
Scalable, end-to-end machine learning, for everyone
Jules Damji (lead developer advocate) and Richard Liaw (engineering manager) are at Anyscale, the startup founded by the creators of Ray, the open source project that makes it simple to scale any compute-intensive Python workload. To learn more about Ray and how to scale machine learning applications, attend the Ray Summit (San Francisco / Aug 23-24): https://www.anyscale.com/ray-summit-2022?utm_source=gradientflow&utm_medium=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
7/14/2022 • 46 minutes, 42 seconds
Orchestration and Pipelines for Data Scientists
Rick Lamers is co-founder and CEO of Orchest, the startup behind an open source project that enables data scientists to create, manage, and execute complex end-to-end data pipelines. Download the FREE Report: State of Workflow Orchestration → https://gradientflow.com/2022-workflow-orchestration-survey/?utm_source=gradientflow&utm_medium=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
7/7/2022 • 44 minutes, 15 seconds
Dataframes at scale
Devin Petersohn is CTO and co-founder of Ponder, and the creator of Modin, a fast, scalable, drop-in replacement for the popular Pandas library. Download the FREE Report: State of Workflow Orchestration → https://gradientflow.com/2022-workflow-orchestration-survey/?utm_source=gradientflow&utm_medium=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
6/30/2022 • 37 minutes, 15 seconds
Software-Defined Assets
Nick Schrock is the founder of Elementl, the startup behind Dagster, a popular open source data orchestration platform. We discussed recent trends in data engineering and infrastructure, and Dagster’s introduction of software-defined assets, a new approach to managing, maintaining, and orchestrating data declaratively. Download the FREE Report: State of Workflow Orchestration → https://gradientflow.com/2022-workflow-orchestration-survey/?utm_source=gradientflow&utm_medium=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
6/23/2022 • 40 minutes, 49 seconds
Adversarial Machine Learning
Edmon Begoli leads the AI Systems R&D section at Oak Ridge National Laboratory (ORNL), where he is also a distinguished member of the ORNL research staff. Our conversation centered on his upcoming presentation at the Data+AI Summit, where he will describe the four principal categories of Adversarial AI and their future implications. Download the FREE Report: Trends in Data, Machine Learning, and AI → https://gradientflow.com/2022trendsreport?utm_source=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
6/16/2022 • 46 minutes, 42 seconds
Orchestrating Machine Learning Applications
Haytham Abuelfutuh is co-founder and CTO of Union, a startup founded by the team behind Flyte, a popular open source project that originated at Lyft. Flyte is a workflow automation platform used for many different applications, but especially as an orchestrator for machine learning applications. Download the FREE Report: State of Workflow Orchestration → https://www.prefect.io/lp/gradientflow?utm_source=gradientflow&utm_medium=newsletter Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
6/9/2022 • 47 minutes, 11 seconds
Narrative AI
This week’s guest is Hilary Mason, co-founder of Hidden Door, a startup that uses AI and machine learning to help create and power role-playing games (RPGs). Download a FREE copy of our recent NLP Industry Survey Results: https://gradientflow.com/2021nlpsurvey/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
6/2/2022 • 40 minutes, 38 seconds
Machine Learning Model Observability
Oren Razon is CEO and co-founder of Superwise, a startup that builds tools to streamline observability for machine learning models. This episode provides a comprehensive overview of tools and best practices for deploying, monitoring, and managing machine learning models in production. Download the FREE Report: Trends in Data, Machine Learning, and AI → https://gradientflow.com/2022trendsreport?utm_source=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
5/26/2022 • 39 minutes, 32 seconds
Dataflow Automation
Jeremiah Lowin is co-founder and CEO of Prefect, the company behind the popular open source data workflow orchestration system of the same name. We discussed the major design changes in Prefect 2.0, its move toward treating “code as workflows”, the data engineering challenges facing data and ML teams today, and the implications of looming trends in machine learning and AI. Download the FREE Report: State of Workflow Orchestration → https://www.prefect.io/lp/gradientflow?utm_source=gradientflow&utm_medium=newsletter Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
5/19/2022 • 46 minutes, 55 seconds
Practical Machine Learning and Deep Learning
Sebastian Raschka is lead author of a new book from Packt entitled “Machine Learning with PyTorch and Scikit-Learn”. He is also an Assistant Professor of Statistics at the University of Wisconsin (Madison), and serves as the Lead AI Educator at Grid.ai. Download a FREE copy of our recent NLP Industry Survey Results: https://gradientflow.com/2021nlpsurvey/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
5/12/2022 • 48 minutes, 27 seconds
Machine Learning for Optimization
This week’s guests are Ade Fajemisin (Postdoctoral Researcher) and Donato Maragno (PhD Student) of the University of Amsterdam. They were co-authors of a recent paper (“Optimization with Constraint Learning: A Framework and Survey”) that explores how machine learning can be used to learn constraints in optimization problems. Download the FREE Report: Trends in Data, Machine Learning, and AI → https://gradientflow.com/2022trendsreport?utm_source=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
5/5/2022 • 26 minutes, 25 seconds
Efficient Scaling of Language Models
This week’s guests are Barret Zoph and Liam Fedus, research scientists at Google Brain. Our conversation centered on large language models (LLMs), specifically recent work by Barret, Liam, and their collaborators on efficient scaling of large language models. Download a FREE copy of our recent NLP Industry Survey Results: https://gradientflow.com/2021nlpsurvey/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
4/28/2022 • 27 minutes, 6 seconds
Data Science at Stitch Fix
Olivia Liao is Senior Director of Data Science at Stitch Fix, a company that uses data science and expert stylists to deliver personalization at scale. We discuss how they blend data science and domain expertise, how they tune recommendations in light of logistics and supply chain constraints, and how they incorporate new developments in large language models, multimodal models, and Responsible AI. Download a FREE copy of our recent NLP Industry Survey Results: https://gradientflow.com/2021nlpsurvey/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
4/21/2022 • 30 minutes, 57 seconds
The 2022 AI Index
Jack Clark is co-director of the AI Index Steering Committee. In this episode we discuss key findings of the fifth edition of the AI Index. The report uses multiple metrics (benchmarks, publications, patents, legislation, etc.) to track progress in AI (mainly deep learning) in key areas that include computer vision, speech recognition, and language models. Download the FREE Report: Trends in Data, Machine Learning, and AI → https://gradientflow.com/2022trendsreport?utm_source=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
4/14/2022 • 45 minutes, 13 seconds
Why You Need A Time-Series Database
This week’s guests are Ajay Kulkarni (CEO) and Mike Freedman (CTO), co-founders of Timescale, the startup behind the popular relational database for time series and analytics. Mike is also a Professor of Computer Science at Princeton University. Our conversation took place a few weeks after Timescale raised a massive funding round and achieved unicorn status. Download the FREE Report: 2022 Data Engineering Survey Report → https://gradientflow.com/2022desurvey/?utm_source=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
4/7/2022 • 45 minutes, 47 seconds
Data Science at Shopify
This week’s guest is Wendy Foster, Director of Engineering & Data Science at Shopify. We discussed applications of data science within Shopify, how they organize their data teams, the lifecycle of a data science project within the company, and how they approach emerging challenges like Responsible AI, large language models, and multimodal models. Download the FREE Report: Trends in Data, Machine Learning, and AI → https://gradientflow.com/2022trendsreport?utm_source=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
3/31/2022 • 35 minutes, 28 seconds
An AI Risk Management Framework
This week’s guests are Elham Tabassi of the National Institute of Standards and Technology (NIST) and Andrew Burt, Managing Partner of BNH.ai, the first law firm focused on AI compliance, risk mitigation, and related topics. We discuss the new NIST framework, the “AI Risk Management Framework”, intended for voluntary use to manage risks in the design, development, and use of AI products and systems. Download the FREE Report: Trends in Data, Machine Learning, and AI → https://gradientflow.com/2022trendsreport?utm_source=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
3/24/2022 • 30 minutes, 55 seconds
An open source and end-to-end library for causal inference
This week’s guests are Amit Sharma (Principal Researcher) and Emre Kiciman (Senior Principal Researcher) of Microsoft Research. We talk about practical applications of causal inference, a set of tools and techniques that enable data teams to draw causal conclusions from data. Amit and Emre are part of the team behind DoWhy, a new open source library for estimating causal effects from historical data alone, which is particularly useful when we cannot run an experiment because of time, expense, or ethical concerns. Download the FREE Report: Trends in Data, Machine Learning, and AI → https://gradientflow.com/2022trendsreport?utm_source=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
3/17/2022 • 39 minutes, 57 seconds
The Graph Intelligence Stack
Leo Meyerovich is founder and CEO of Graphistry, a startup building tools to democratize visual graph intelligence and graph machine learning. Leo and I recently wrote a well-received post (“What Is Graph Intelligence?”) making the case for why companies need to revisit graph analytics and graph intelligence. Download the FREE Report: Trends in Data, Machine Learning, and AI → https://gradientflow.com/2022trendsreport?utm_source=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
3/10/2022 • 37 minutes, 21 seconds
NLP and Language Models in Healthcare and the Life Sciences
This week’s guests are Dia Trambitas-Miron (Head of Product) and David Talby (CTO) of John Snow Labs, the startup behind the popular open source project Spark NLP. The company also has a suite of products, including an NLP platform targeted specifically at the healthcare, pharmaceutical, and biotech sectors. Download a FREE copy of our recent NLP Industry Survey Results: https://gradientflow.com/2021nlpsurvey/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
3/3/2022 • 37 minutes, 44 seconds
Delivering Continuous Intelligence at Scale
Simon Crosby is CTO of Swim.ai, a startup building tools (based on the Swim open source project) for next-generation data and AI applications. Swim is one of several projects (along with Ray and Akka) contributing to interest in the Actor Model for building large-scale machine learning and data applications and infrastructure. Download the FREE Report: Trends in Data, Machine Learning, and AI → https://gradientflow.com/2022trendsreport?utm_source=DEpodcast Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site.
2/24/2022 • 31 minutes, 23 seconds
Imperceptible NLP Attacks
Nicholas Boucher is a PhD candidate at Cambridge University, where he focuses on security topics such as homomorphic encryption, voting systems, and adversarial machine learning. He is the lead author of a fascinating new paper, “Bad Characters: Imperceptible NLP Attacks”, which provides a taxonomy of attacks against text-based NLP models that exploit Unicode and other encoding systems. Download a FREE copy of our recent NLP Industry Survey Results: https://gradientflow.com/2021nlpsurvey/ Subscribe: Apple • Android • Spotify • Stitcher • Google • AntennaPod • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
2/17/2022 • 44 minutes, 53 seconds
Evolving Data Science Training Programs
This week’s guest is Anjali Samani, Director of Data Science and Data Intelligence at Salesforce. We first met during the early days of Faculty, one of the leading data science and AI startups in Europe. Anjali helped design and lead the early Fellowship programs at Faculty, intensive bootcamps that turn STEM PhDs into industrial data scientists.
2/10/2022 • 33 minutes, 52 seconds
Building Machine Learning Infrastructure at Netflix and beyond
Savin Goyal is CTO and co-founder of Outerbounds, a startup building infrastructure to help teams streamline how they build machine learning applications. Prior to starting Outerbounds, Savin and his team worked at Netflix, where they were instrumental in the creation and release of Metaflow, an open source Python framework that addresses some of the challenges data scientists face around scalability and version control.
2/3/2022 • 35 minutes, 6 seconds
Democratizing NLP
Moshe Wasserblat is a Senior Principal Engineer at Intel, where he serves as a Research Manager focused on NLP and Deep Learning.
1/27/2022 • 43 minutes, 32 seconds
Machine Learning at Discord
Gaurav Chakravorty is a Senior Manager at Discord, where he leads the team responsible for machine learning models in the areas of search and notifications. Prior to Discord, Gaurav was a manager at Google, where he led the team responsible for personalized podcast recommendations.
1/20/2022 • 40 minutes, 14 seconds
Applications of Knowledge Graphs
This week's guest is Mike Tung, founder and CEO of Diffbot, a startup that crawls the web and offers one of the most comprehensive knowledge graphs, accessible through a variety of simple interfaces.
1/13/2022 • 39 minutes, 48 seconds
Key AI and Data Trends for 2022
In this episode of the Data Exchange, our special correspondent and managing editor Jenn Webb organized a mini-panel composed of myself and my podcast co-organizer Mikio Braun. This conversation took place as we were assembling our list of trends for 2022.
1/6/2022 • 36 minutes, 49 seconds
Large Language Models
This episode features conversations with two experts who have helped train and release models that can recognize, predict, and generate human language on the basis of very large text-based data sets. First is an excerpt of my conversation with Connor Leahy, AI Researcher at Aleph Alpha GmbH and a founding member of EleutherAI (pronounced “ee-luther”), a collective of researchers and engineers building resources and models for researchers who work on natural language models. Next up is an excerpt from a recent conversation with Yoav Shoham, co-founder of AI21 Labs, creators of the largest language model available to developers.
12/30/2021 • 41 minutes, 13 seconds
Data and Machine Learning Platforms at Shopify
Azeem Ahmed is Director of Engineering at Shopify, where he leads the team that builds the primitives and APIs used by all data scientists, machine learning engineers, and members of Shopify's engineering team. Our conversation focused on the evolution and design of data and machine learning platforms within Shopify. Azeem and I also discussed broader trends, including the rise of modern data platforms and the maturation of data lakehouses.
12/23/2021 • 43 minutes, 34 seconds
What is AI Engineering?
Christopher Nguyen is CEO and co-founder of Aitomatic, a startup building a platform for Industrial AI applications. Christopher previously held executive and leadership roles at organizations tasked with building machine learning solutions for traditional enterprises. Our conversation centered on what Christopher terms AI Engineering: a new discipline concerned with the qualitative and quantitative design, construction, and operation of systems with artificial-intelligence capabilities. Download a FREE copy of our recent Data Engineering Survey Results: https://gradientflow.com/2022desurvey
12/16/2021 • 32 minutes, 48 seconds
NLP and AI in Financial Services
This week’s guest is Anshul Pandey, CTO and co-founder at Accern, a startup helping financial services companies build and deploy AI applications via a no-code platform. Our conversation focused on the specific challenges of building AI and NLP applications within financial services.
12/9/2021 • 46 minutes, 24 seconds
Modern Experimentation Platforms
Che Sharma is the founder and CEO of Eppo, an experimentation framework that integrates with modern data platforms (cloud lakehouses and cloud data warehouses). We discuss the importance of investing in experimentation tools and the power of having a well-oiled experimentation culture within an organization. Che also explains how modern data platforms enable experimentation frameworks like Eppo.
12/2/2021 • 53 minutes, 55 seconds
Reinforcement Learning in Real-World Applications
Happy Thanksgiving to listeners who celebrate it! This episode features conversations with two experts who have been applying reinforcement learning to problems in industry. First is an excerpt of my conversation with Nicolas (Nic) Hohn, Chief Data Scientist, McKinsey/QuantumBlack Australia. Nic led a team of data scientists charged with helping the America’s Cup-winning team, Emirates Team New Zealand, test new designs for hydrofoils, important sailboat components that could be modified within the rules set forth by race organizers. I also include an excerpt of a conversation with Max Pumperla, Data Science Professor at IU International University of Applied Sciences, who, at the time of our conversation, was also Head of Product Research at Pathmind, a SaaS platform that helps businesses use reinforcement learning in real-world applications.
11/24/2021 • 37 minutes, 5 seconds
MLOps Anti-Patterns
This week’s guest is Nikhil Muralidhar, a Graduate Research Assistant at the Virginia Tech College of Engineering. He is the lead author of an excellent survey paper entitled “Using AntiPatterns to avoid MLOps Mistakes”.
11/18/2021 • 37 minutes, 10 seconds
Why You Need a Modern Metadata Platform
Pardhu Gunnam (CEO) and Mars Lan (CTO) are co-founders of Metaphor Data, creators of the first Modern Metadata Platform. As we noted in a previous post, a metadata fabric is the right foundation for data governance and data discovery solutions, data catalogs, and other enterprise data services. This insight led several technology companies to build their own metadata systems a few years ago. In fact, the team at Metaphor created one of the more popular systems, DataHub, while they were at LinkedIn. The video version has a detailed table of contents: https://www.youtube.com/watch?v=W8ZJHN77Ieg
11/11/2021 • 46 minutes, 5 seconds
Making Large Language Models Smarter
This week’s guest is Yoav Shoham, co-founder of AI21 Labs, creators of the largest language model available to developers. Yoav is also a Professor Emeritus of Computer Science at Stanford University and a serial entrepreneur who has co-founded numerous data and AI startups.
11/4/2021 • 38 minutes, 42 seconds
AI Begins With Data Quality
This week’s guest is Jeremy Stanley, co-founder and CTO of Anomalo, a startup building SaaS tools to help companies with data quality. Prior to Anomalo, Jeremy was VP of Data Science at Instacart.
10/28/2021 • 43 minutes, 23 seconds
Modernizing Data Integration
This week’s guest is Michel Tricot, co-founder and CEO of Airbyte, the startup behind the popular open source project of the same name. Although still a relatively young open source project, Airbyte has emerged as a favorite among data and platform engineers tasked with building and maintaining data integration systems within companies.
10/21/2021 • 37 minutes, 44 seconds
Deploying Machine Learning Models Safely and Systematically
This week’s guest is Hamel Husain, Staff Machine Learning Engineer at GitHub and a core developer for fastai. Prior to GitHub, Hamel worked on machine learning applications and systems at Airbnb and DataRobot.
10/14/2021 • 41 minutes, 32 seconds
Large-scale machine learning and AI on multi-modal data
This week’s guest is Bob Friday, VP and CTO at Mist Systems, a Juniper company. Bob is a serial entrepreneur and seasoned technologist, and at Mist his team uses data technologies, machine learning, and AI to “optimize user experiences and simplify operations across the wireless access, wired access, and SD-WAN domains”. Bob and his team build models from structured, semi-structured, and unstructured data. They have deployed anomaly detection models that rely on deep learning (LSTMs) and have begun exploring the use of graph neural networks for a variety of use cases. They have also built and deployed systems that use recent advances in natural language models: their virtual assistant provides insight and guidance to IT staff via a natural language conversational interface.
10/7/2021 • 32 minutes, 5 seconds
Machine Learning in Astronomy and Physics
This week’s guest is Dr. Viviana Acquaviva, Associate Professor in the Physics Department at the CUNY NYC College of Technology and at the CUNY Graduate Center. She is an astrophysicist with a strong interest in data science and machine learning.
9/30/2021 • 40 minutes, 48 seconds
The Unreasonable Effectiveness of Multiple Dispatch
This week I have my annual check-in on the state of Julia with Viral Shah, co-founder and CEO of Julia Computing. Since we spoke last year, Julia has continued to make inroads and grow its user base, and Julia Computing closed its $24M Series A round in July.
9/23/2021 • 51 minutes, 18 seconds
How To Lead In Data Science
This week our special correspondent and editor Jenn Webb and I speak with Jike Chong and Cathy Chang, executives and seasoned leaders of data science teams. Our conversation focuses on their new book “How to Lead in Data Science”.
9/16/2021 • 43 minutes, 54 seconds
Why interest in graph databases and graph analytics is growing
In this episode of the Data Exchange, our special correspondent and editor Jenn Webb organized a mini-panel composed of myself and Paco Nathan, author, teacher, and founder of Derwen.ai, a boutique consulting firm specializing in data, machine learning, and AI. Recently, Paco has been doing a lot of work with graphs, and as a result he has had to immerse himself in the world of graph data management technologies. This conversation focuses on what’s new with graph databases and why interest in them has resurged. We also discuss use cases of graph databases, graph analytics, and graph neural networks.
9/9/2021 • 53 minutes, 18 seconds
The State of Data Journalism
This week our special correspondent and editor Jenn Webb speaks with Tara Kelly, Data Editor at DataJournalism.com (DJC), an organization created by the European Journalism Centre. DJC provides journalists and media groups with free resources, materials, online video courses, and community forums. Most recently they created two free e-books: The Verification Handbook and an updated edition of the Data Journalism Handbook.
9/2/2021 • 53 minutes, 31 seconds
Auditing machine learning models for discrimination, bias, and other risks
This week’s guests are Rayid Ghani, Distinguished Career Professor in the Machine Learning Department and the Heinz College of Information Systems and Public Policy at Carnegie Mellon University, and Andrew Burt, co-founder and Managing Partner of BNH.ai, a new law firm focused on AI compliance, risk mitigation, and related topics. BNH is the first law firm run by lawyers and technologists focused on helping companies identify and mitigate risks associated with machine learning and AI.
8/26/2021 • 52 minutes, 13 seconds
An oscilloscope for deep learning
This week’s guest is Charles Martin, independent researcher and founder of Calculation Consulting, a boutique consultancy focused on data science and machine learning. Along with Michael Mahoney and Serena Peng, Charles is co-author of a recent Nature Communications paper on new methods for evaluating and tuning deep learning models (“Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data”).
8/19/2021 • 49 minutes, 57 seconds
What’s new in data engineering
This week our special correspondent and editor Jenn Webb organized a mini-panel composed of myself and Jesse Anderson, Managing Director at the Big Data Institute. Jesse is the author of a recent book entitled “Data Teams: A Unified Management Model for Successful Data-Focused Teams”. This conversation focused on key areas in data engineering.
8/12/2021 • 36 minutes, 45 seconds
The evolution of the data science role and of data science tools
This week our managing editor Jenn Webb and I speak with Sean Taylor, Data Science Manager at Lyft. Sean was previously a research scientist and manager at Facebook, where he was instrumental in the creation and release of Prophet, a very popular open source library for time-series forecasting.
8/5/2021 • 50 minutes, 15 seconds
Data Augmentation in Natural Language Processing
This week’s guests are Steven Feng, graduate student, and Ed Hovy, Research Professor, both from the Language Technologies Institute of Carnegie Mellon University. We discussed their recent survey paper on Data Augmentation Approaches in NLP, an active field of research on techniques for increasing the diversity of training examples without explicitly collecting new data. One key reason such strategies are important is that augmented data can act as a regularizer to reduce overfitting when training models.
7/29/2021 • 51 minutes, 44 seconds
Storage Technologies for a Multi-cloud World
This week’s guest is Brad King, CTO of Scality, a company that builds software-defined file and object storage systems for hybrid and multi-cloud settings. Storage and compute are the basic building blocks of (cloud) computing platforms, and this episode highlights important considerations and recent innovations in storage technologies that data engineers, architects, and machine learning professionals need to know.
7/22/2021 • 42 minutes, 43 seconds
Building a next-generation dataflow orchestration and automation system
In this episode, our managing editor Jenn Webb and I speak with Chris White, CTO of Prefect, a startup building tools to help companies build, monitor, and manage dataflows. Prefect originated from lessons Chris and his co-founder learned while they were at Capital One, where they were early users of and contributors to related projects like Apache Airflow.
7/15/2021 • 48 minutes, 36 seconds
Building a flexible, intuitive, and fast forecasting library
This week’s guests are Reza Hosseini, Staff Software Engineer, and Albert Chen, Staff Data Scientist, both at LinkedIn. Reza and Albert are part of the team behind the new open source library Greykite, a flexible and fast library for time-series forecasting.
7/8/2021 • 44 minutes, 14 seconds
Neural Models for Tabular Data
This week’s guest is Sercan Arik, Research Scientist at Google Cloud AI. Sercan and his collaborators recently published a paper on TabNet, a deep neural network architecture for tabular data. It uses sequential attention to select features and is explainable, and in tests Sercan and his team ran across many domains, TabNet outperformed or matched other models (e.g., XGBoost) on classification and regression problems.
7/1/2021 • 43 minutes, 55 seconds
Training and Sharing Large Language Models
This week’s guest is Connor Leahy, AI Researcher at Aleph Alpha GmbH and a founding member of EleutherAI (pronounced “ee-luther”), a collective of researchers and engineers building resources and models for researchers who work on natural language models. As NLP research becomes more computationally demanding and data intensive, researchers need to work together to develop tools and resources for the broader community. While relatively new, EleutherAI has already released models and data that many researchers are benefiting from.
6/24/2021 • 50 minutes, 53 seconds
Questioning the Efficacy of Neural Recommendation Systems
This week’s guests are leading researchers in recommendation systems: Paolo Cremonesi, Professor of Computer Science, and Maurizio Ferrari Dacrema, Postdoc, both at Politecnico di Milano, where they are part of the RecSys research group. Paolo is also the Reproducibility co-chair for the upcoming RecSys Conference.
6/17/2021 • 59 minutes, 24 seconds
Automation in Data Management and Data Labeling
This week’s guest is Hyun Kim, co-founder and CEO of Superb AI, a startup building tools to help companies manage data across the entire machine learning application lifecycle. This includes tools to label, store, and monitor the data assets that power computer vision applications. We also discussed emerging trends in machine learning and AI, including synthetic data, reinforcement learning, and self-supervised learning.
6/10/2021 • 43 minutes, 50 seconds
Reinforcement Learning For the Win
This week’s guest is Nicolas (Nic) Hohn, Chief Data Scientist, McKinsey/QuantumBlack Australia. Nic led a team of data scientists charged with helping the America’s Cup-winning team, Emirates Team New Zealand, test new designs for hydrofoils, important sailboat components that could be modified within the rules set forth by race organizers. More precisely, the QuantumBlack team used Ray RLlib to build an AI agent that could learn to sail the boat at optimal speed for a given design, and this AI agent proved crucial during the design process.
6/3/2021 • 48 minutes, 15 seconds
How Companies Are Investing in AI Risk and Liability Minimization
In this episode of the Data Exchange I speak with Andrew Burt, co-founder and Managing Partner of BNH.ai, a new law firm focused on AI compliance, risk mitigation, and related topics. BNH is the first law firm run by lawyers and technologists focused on helping companies identify and mitigate risks associated with machine learning and AI.
5/27/2021 • 41 minutes, 57 seconds
The Future of Machine Learning Lies in Better Abstractions
This week’s guest is Travis Addair, who previously led the team at Uber responsible for building Uber’s deep learning infrastructure. Travis is deeply involved with two popular open source projects related to deep learning: he is a maintainer of Horovod, a distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet, and a co-maintainer of Ludwig, a toolbox that allows users to train and test deep learning models without the need to write code.
5/20/2021 • 48 minutes, 53 seconds
Why You Should Optimize Your Deep Learning Inference Platform
In this episode of the Data Exchange, I speak with Yonatan Geifman, CEO and co-founder of Deci, as well as with Ran El-Yaniv, Chief Scientist and co-founder of Deci and Professor of Computer Science at Technion.
5/13/2021 • 41 minutes, 37 seconds
AI Beyond Automation
In this episode of the Data Exchange, our special correspondent and managing editor Jenn Webb organized a mini-panel composed of myself and Jerry Overton, who previously served as DXC Fellow and Head of AI at DXC Technology. We discussed Jerry’s experience helping companies across many industries adopt data science and machine learning. We spoke about Centers of Excellence for AI, automation in the workforce, human-centered and responsible AI, and cyborgs!
5/6/2021 • 43 minutes, 18 seconds
Injecting Software Engineering Practices and Rigor into Data Governance
As the amount and importance of data grow within organizations, there is growing interest in tools that enable them to strategically utilize, manage, and unlock their data resources. This week’s guest is Steven (Steve) Touw, co-founder and CTO of Immuta, a startup that builds tools to help companies address data governance, data discovery, data privacy, and security.
4/29/2021 • 42 minutes, 37 seconds
Building a data store for unstructured data and deep learning applications
In this episode of the Data Exchange, I speak with Davit Buniatyan, founder and CEO of ActiveLoop, a startup building data management tools for unstructured data types commonly associated with deep learning.
4/22/2021 • 35 minutes, 33 seconds
How Technology Companies Are Using Ray
In this episode of the Data Exchange, I speak with Zhe Zhang, Engineering Manager at Anyscale, where he leads the team that works on Ray and its ecosystem of libraries and partners. Ray is an open source, general-purpose framework for building distributed applications.
4/15/2021 • 35 minutes, 26 seconds
Data quality is key to great AI products and services
In this episode of the Data Exchange, I speak with Abe Gong, CEO and co-founder at Superconductive, a startup founded by the team behind the Great Expectations (GE) open source project. GE is one of a growing number of tools aimed at improving data quality through validation and testing. Other projects in this area include TensorFlow Data Validation (TFDV), assertr, dataframe-rules-engine, deequ, data-describe, and Apache Griffin.
4/8/2021 • 41 minutes, 1 second
Machine Learning in Healthcare
In this episode of the Data Exchange, I speak with Parisa Rashidi, Associate Professor in the Department of Biomedical Engineering at the University of Florida. Parisa is a computer scientist and machine learning researcher who specializes in applications of ML to healthcare and biomedical domains.
4/1/2021 • 43 minutes, 9 seconds
Measuring the Impact of AI and Machine Learning Research
In this episode of the Data Exchange, our special correspondent and managing editor Jenn Webb organized a mini-panel composed of myself and Simon Rodriguez, Data Research Assistant at the Center for Security and Emerging Technology (CSET) at Georgetown University. Through a series of reports and data briefs, CSET provides policymakers with data-rich material to inform and guide public policy.
3/25/2021 • 40 minutes, 43 seconds
The Mathematics of Data Integration and Data Quality
In this episode of the Data Exchange, I speak with Ryan Wisnesky, CTO and co-founder of Conexus, a startup that incorporates techniques from mathematics into novel tools for data integration, data management, and knowledge management.
3/18/2021 • 44 minutes
Pricing Data Products
In this episode of the Data Exchange, I speak with Jian Pei, Professor in the School of Computing Science at Simon Fraser University. His research spans data science, big data, data mining, and database systems, but in this podcast we talk about tools for estimating the economic value of data.
3/11/2021 • 46 minutes, 24 seconds
Challenges, Opportunities, and Trends in EdTech
In this episode of the Data Exchange, our special correspondent and managing editor Jenn Webb and I speak with Sharon Zhou, a PhD student in Computer Science at Stanford University. Sharon has been teaching popular courses on GANs (generative adversarial networks) on Coursera. In this conversation we examine the state of education technology (EdTech), learning platforms, and other tools for teaching online. A year into the global pandemic, we discuss the advantages and disadvantages of various technologies for delivering classes, as well as broader issues in education. We also took the opportunity to discuss Sharon’s work on deep learning, including her use of GANs to help the general public and policymakers better understand the implications of climate change. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
3/4/2021 • 51 minutes, 32 seconds
Towards Simple, Interpretable, and Trustworthy AI
In this episode of the Data Exchange, I speak with Sheldon Fernandez, CEO of DarwinAI, and Alex Wong, Professor at the University of Waterloo, co-founder and Chief Scientist of DarwinAI, and co-founder of Euclid Labs. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
2/25/2021 • 41 minutes, 42 seconds
The Rise of Metadata Management Systems
In this episode of the Data Exchange, our special correspondent and managing editor Jenn Webb organized a mini-panel featuring me and Assaf Araki, investment manager at Intel Capital. Assaf and I have written a series of articles together, and this interview took place shortly before the release of our most recent collaboration: The Growing Importance of Metadata Management Systems. We devote this episode to how metadata management will impact many enterprise data systems. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
2/18/2021 • 30 minutes, 19 seconds
Tools for building robust, state-of-the-art machine learning models
In this episode of the Data Exchange, I speak with Michael Mahoney, a researcher at UC Berkeley’s RISELab, ICSI, and Department of Statistics. Mike and his collaborators recently received one of the best paper awards at NeurIPS 2020, one of the leading research conferences in machine learning. Subscribe: Apple, Android, Spotify, Stitcher, Google, and RSS. Download the 2021 Trends Report: Data, Machine Learning, AI and learn about emerging technologies for data management, data engineering, machine learning, and AI. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
2/11/2021 • 42 minutes, 26 seconds
Creating Master Data at Scale with AI
In this episode of the Data Exchange, our special correspondent and managing editor Jenn Webb organized a mini-panel featuring me and Sonal Goyal, founder of Aficx, a startup that builds solutions to unify data silos for cross-selling and upselling, fraud and risk management, and compliance and regulatory reporting. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
2/4/2021 • 38 minutes, 11 seconds
Bringing AI and computing closer to data sources
In this episode of the Data Exchange, I speak with Bruno Fernandez-Ruiz, CTO and co-founder of Nexar, Inc., a startup that uses dash cams powered by vision-based applications to improve driving and logistics. Subscribe: Apple, Android, Spotify, Stitcher, Google, and RSS. Download the 2021 Trends Report: Data, Machine Learning, AI and learn about emerging technologies for data management, data engineering, machine learning, and AI. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
1/28/2021 • 48 minutes, 43 seconds
Deep Learning in the Sciences
In this episode of the Data Exchange, I speak with Bharath (“Bart”) Ramsundar, author and open source developer. While in graduate school, Bart created DeepChem, an open source project that aims to democratize deep learning for science. Subscribe: Apple, Android, Spotify, Stitcher, Google, and RSS. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
1/21/2021 • 39 minutes, 47 seconds
Taking business intelligence and analyst tools to the next level
In this episode of the Data Exchange, I speak with Ira Cohen, co-founder and Chief Data Scientist at Anodot, a startup that uses AI for business monitoring. Subscribe: Apple, Android, Spotify, Stitcher, Google, and RSS. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
1/14/2021 • 48 minutes, 16 seconds
Data exchanges and their applications in healthcare and the life sciences
In this episode of the Data Exchange, I speak with Omer Dror, CEO and co-founder of Lynx.md, a startup that enables data exchanges and markets in the health and life sciences. Data exchanges match data providers and suppliers with data buyers and users. Subscribe: Apple, Android, Spotify, Stitcher, Google, and RSS. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
1/7/2021 • 51 minutes, 51 seconds
Key AI and Data Trends for 2021
In this episode of the Data Exchange, our special correspondent and editor Jenn Webb organized a mini-panel featuring me and my podcast co-organizer Mikio Braun. We began our conversation by looking back at some of our predictions from last year, which included applications of reinforcement learning, end-to-end machine learning platforms, and more. This year we organized trends into the following categories: tools for building and managing machine learning and AI applications; foundational data technologies; (cloud) computing and hardware; and emerging trends in AI. This episode provides a sneak peek of a formal report coming out in early 2021. Sign up here and we will send you a copy of our 2021 Trends Report as soon as it comes out. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
12/31/2020 • 52 minutes, 32 seconds
A Unified Management Model for Successful Data-Focused Teams
In this episode of the Data Exchange, our special correspondent and editor Jenn Webb organized a mini-panel featuring me and Jesse Anderson, Managing Director at the Big Data Institute. Jesse is the author of a recent book entitled “Data Teams: A Unified Management Model for Successful Data-Focused Teams”. Subscribe: Apple, Android, Spotify, Stitcher, Google, and RSS. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
12/24/2020 • 47 minutes, 58 seconds
Security and privacy for the disoriented
In this episode of the Data Exchange, I speak with Dan Geer, Senior Fellow at In-Q-Tel, and Andrew Burt, co-founder and Managing Partner of BNH.ai and Chief Legal Officer at Immuta. Dan is one of the leading experts in cybersecurity and risk management, and he has written numerous influential essays on security, privacy, and risk (examples here and here). Andrew is co-founder of a new law firm focused on AI compliance and related topics; BNH is the first law firm run by lawyers and technologists focused on helping companies identify and mitigate the risks associated with AI. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
12/17/2020 • 46 minutes, 34 seconds
The State of Responsible AI
In this episode of the Data Exchange, I speak with Dr. Rumman Chowdhury, founder of Parity, a startup building products and services to help companies build and deploy ethical and responsible AI. Prior to starting Parity, Rumman was Global Lead for Responsible AI at Accenture Applied Intelligence. Subscribe: Apple, Android, Spotify, Stitcher, Google, and RSS. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
12/10/2020 • 38 minutes, 55 seconds
Improving the robustness of natural language applications
In this episode of the Data Exchange, I speak with Jack Morris, a member of Google’s AI Residency program. He is co-creator of TextAttack, an open source framework for adversarial attacks, data augmentation, and adversarial training in NLP (paper, code). Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
12/3/2020 • 37 minutes, 35 seconds
End-to-end deep learning models for speech applications
In this episode of the Data Exchange, I speak with Yishay Carmiel, an AI Leader at Avaya, a company focused on digital communications. He has long been immersed in speech technologies and conversational applications, and I have frequently turned to him to understand the latest in speech systems. We previously co-wrote an article that listed recommendations for teams building speech applications. We also had an earlier conversation on the impact of deep learning and big data on speech technologies. Subscribe: Apple, Android, Spotify, Stitcher, Google, and RSS. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
11/26/2020 • 43 minutes, 48 seconds
Securing machine learning applications
In this episode of the Data Exchange, I speak with Ram Shankar, a Berkman Klein Center affiliate, and a researcher and engineer who works at the intersection of machine learning and security. This episode focuses on the current state of tools and techniques for securing machine learning applications. Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
11/19/2020 • 45 minutes, 40 seconds
Testing Natural Language Models
In this episode of the Data Exchange, I speak with Marco Ribeiro, Senior Researcher at Microsoft Research and lead author of the award-winning paper “Beyond Accuracy: Behavioral Testing of NLP models with CheckList”. As machine learning gains importance across many application domains and industries, there is a growing need to formalize how ML models are built, deployed, and used. MLOps, an emerging set of practices focused on productionizing the machine learning lifecycle, draws ideas from CI/CD. But even before we talk about deploying a model to production, how do we inject more rigor into the model development process? Subscribe: Apple, Android, Spotify, Stitcher, Google, and RSS. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
11/12/2020 • 30 minutes, 9 seconds
Detecting Fake News
Subscribe: Apple • Android • Spotify • Stitcher • Google • RSS. In this episode of the Data Exchange, I speak with Xinyi Zhou, a graduate student in Computer and Information Science at Syracuse University. Xinyi and her advisor (Reza Zafarani) recently wrote a comprehensive survey paper entitled “A Survey of Fake News: Fundamental Theories, Detection Methods, and Opportunities”. They set out to organize the many different methods and perspectives used to detect fake news. Their paper is a great resource for anyone who wants to understand the strengths and limitations of various state-of-the-art techniques and get a feel for where the research community might be headed in the near future. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
11/5/2020 • 32 minutes, 46 seconds
The Computational Limits of Deep Learning
Subscribe: Apple, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange, I speak with Neil Thompson, Research Scientist at the Computer Science and Artificial Intelligence Lab (CSAIL) and the Initiative on the Digital Economy, both at MIT. I wanted Neil on the podcast to discuss a recent paper he co-wrote entitled “The Computational Limits of Deep Learning” (summary version here). The paper estimates the amount of computation, the economic costs, and the environmental impact that come with increasingly large and accurate deep learning models. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
10/29/2020 • 43 minutes, 4 seconds
Making deep learning accessible
Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange, I speak with Piero Molino, creator of Ludwig, a toolbox that allows users to train and test deep learning models through a declarative interface. Piero created Ludwig while serving as a Senior Research Scientist at Uber AI. He originally built Ludwig for his personal use, and it gradually gained users within Uber. When it was open sourced in early 2019, the project immediately found a receptive audience at the conferences I was chairing at the time. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
10/22/2020 • 46 minutes, 56 seconds
Building and deploying knowledge graphs
Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange, I speak with Mayank Kejriwal, a Research Assistant Professor in the Department of Industrial and Systems Engineering and a Research Lead at the USC Information Sciences Institute. The focus of our conversation is knowledge graphs: collections of linked entities (objects, events, concepts) used in many AI applications. For example, Google uses a knowledge graph to enhance its search engine with infoboxes that appear alongside some search results. Other areas where knowledge graphs are common include e-commerce, healthcare, and financial services. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
10/15/2020 • 49 minutes, 30 seconds
Financial Time Series Forecasting with Deep Learning
Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange, I speak with Murat Özbayoğlu, Chair of Artificial Intelligence Engineering at TOBB University of Economics and Technology in Ankara, Turkey. I wanted Murat on to discuss two survey papers he and his colleagues wrote on the use of deep learning in finance. I’ve long been fascinated with finance and trading. My first job after I left academia was as the lead quant at a hedge fund, and ever since, I’ve tried to stay abreast of the tools and techniques that quants and data scientists in finance are using. Forecasting in this setting usually means price prediction or price movement (trend) prediction, and the outputs of forecasting models are used to inform investment decisions. What makes finance particularly challenging is that many people are using the same underlying data (time series of prices/values); thus, as Murat notes, many firms turn to alternative data sources (such as text) as potential sources of additional signal. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
10/8/2020 • 37 minutes, 10 seconds
A programming language for scientific machine learning and differentiable programming
Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange, I speak with Viral Shah, co-founder and CEO of Julia Computing. Along with his Julia language co-creators, Viral was awarded the 2019 Wilkinson Prize for outstanding contributions in the field of numerical software. I first tweeted about Julia at the beginning of March 2012 after seeing Jeff Bezanson give a talk at Stanford. I’ve dabbled with it here and there, but have never used it for a major project. Over the past few years, Julia has continued to add packages at a steady pace, and its package manager is impressive and solid. We spent much of the podcast discussing the state of Julia, Julia 1.5, and the Julia ecosystem and community. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
10/1/2020 • 50 minutes, 17 seconds
Using machine learning to modernize medical triage and monitoring systems
Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange, I speak with Kira Radinsky, Chairwoman & Chief Technology Officer at Diagnostic Robotics, a startup using AI to build a medical-grade triage and clinical-predictions platform. She is also a visiting Professor at Technion – Israel Institute of Technology. Kira has extensive experience using data science and machine learning in a variety of settings, and she was one of the pioneers in using alternative data sources to augment forecasting models. Her earlier work includes models to predict social unrest and disease outbreaks. The global pandemic has increased the need for experts in medical data mining, a field to which Kira has made many significant contributions. Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
9/24/2020 • 32 minutes, 55 seconds
Connecting Reinforcement Learning to Simulation Software
Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange, I speak with Max Pumperla, deep learning engineer at Pathmind and a contributor to many open source projects in data science and machine learning. Max is speaking on applications of reinforcement learning to simulation problems at the upcoming Ray Summit, a free virtual conference scheduled for Sep 30th and Oct 1st. Earlier this year I had Pathmind’s CEO Chris Nicholson on this podcast, and he described how reinforcement learning might play a role in simulation problems. In this episode, Max provides an update and a technical description of how Pathmind uses reinforcement learning, RLlib, and Tune to help users of AnyLogic, widely used simulation software for business applications. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
9/17/2020 • 52 minutes, 47 seconds
Using machine learning to detect shifts in government policy
Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange, I speak with Weifeng Zhong, Senior Research Fellow at the Mercatus Center at George Mason University. He is the core maintainer of the open source Policy Change Index (PCI), a framework that uses machine learning and NLP to “process and read” large amounts of text to discern government priorities and policies. The initial PCI focuses on major policy shifts in China and applies these techniques to the People’s Daily. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
9/10/2020 • 43 minutes, 8 seconds
What is AI Assurance?
Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange, I speak with Ofer Razon, co-founder & CEO at Superwise, a startup building tools that help companies gain more visibility into, and control over, machine learning models in production. Ofer and Superwise are part of a group of companies in the early stages of building tools and best practices for scaling AI operations. The goal is to help multiple stakeholders evaluate, validate, and observe models, receive alerts and troubleshoot in a timely manner, and gather insights that improve efficiency. AI assurance will ultimately bring together different parts of an organization, including business, data science, and operational teams, legal and compliance, and privacy and security. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
9/3/2020 • 38 minutes, 5 seconds
Best practices for building conversational AI applications
Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange, I speak with Alan Nichol, co-founder and CTO of Rasa, the startup behind the popular open source framework for building conversational AI applications. I had Alan on as a guest on my old podcast, where our conversation focused on the components of Rasa and of chatbot applications. This time around we talked about the state of developer tools, as well as software engineering best practices for building conversational AI applications. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
8/27/2020 • 43 minutes, 55 seconds
Tools for scaling machine learning
Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange, our special correspondent and editor Jenn Webb organized a mini-panel featuring me and Paco Nathan, author, teacher, and founder of Derwen.ai, a boutique consulting firm specializing in data, machine learning (ML), and AI. We began by discussing tools for scaling machine learning. Paco and I have been impressed with the growth in the number of libraries being built on top of Ray, as well as the variety of use cases that Ray is being used to address. We then discussed the upcoming Ray Summit, a free virtual conference featuring over 50 talks on machine learning, Python, serverless, and cloud native technologies. We also looked back at the first eight months of this podcast (here’s an archive of previous episodes). Both Paco and Jenn were instrumental in getting this podcast started, and I wanted to mark crossing the 30-episode threshold with a short retrospective. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
8/20/2020 • 39 minutes, 6 seconds
From Python beginner to seasoned software engineer
Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange, I speak with Joel Grus, Principal Engineer at the Capital Group. He previously served as a Senior Research Engineer at the Allen Institute for AI, where he was a core engineer on AllenNLP, a PyTorch-based library for NLP research. Joel is also the author of one of the most widely read books in data science: Data Science from Scratch. Joel has a new book that I highly recommend: Ten Essays on Fizz Buzz. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
8/13/2020 • 49 minutes, 20 seconds
Assessing Models and Simulations of Epidemic Infectious Diseases
Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange, I bring back Bruno Gonçalves, a data scientist working at the intersection of data science and finance. Bruno was a guest on this podcast in April, when COVID-19 cases were spiking in his home base of NYC. Before shifting over to data science, he spent several years as a researcher focused on mathematical models in epidemiology, a field with a rich history dating as far back as the 1920s. I wanted to bring him back to get an update on the mathematical models being used to study the global pandemic. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
8/6/2020 • 43 minutes, 38 seconds
Improving the hiring pipeline for software engineers
Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange, I speak with Karthik Ramasamy (Senior Director of Engineering at Splunk) and Arun Kejariwal (experienced engineering leader). The focus of our conversation was hiring technical talent such as software engineers, developers, data scientists, and architects. The global pandemic has caused a global economic slowdown and massive layoffs across many industry sectors. But many companies are still hiring and still competing for technical talent. In our bi-weekly newsletter, links pertaining to hiring and work culture have been popular from the outset. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
7/30/2020 • 52 minutes, 30 seconds
How to build state-of-the-art chatbots
Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange, I speak with Lauren Kunze, CEO of Pandorabots, a widely used platform for building chatbots. About four years ago I attended Bot Day in San Francisco, and at the time, chatbots were very much in the news. Today, chatbots are used across many industries and use cases, and on many types of devices. Lauren Kunze and Pandorabots have been at the forefront of many important developments in the conversational applications space. They help many enterprises build and deploy bots, and they also create leading-edge chatbots like Mitsuku. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
7/23/2020 • 45 minutes, 12 seconds
Democratizing machine learning
Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange, I speak with Ameet Talwalkar, co-founder and Chief Scientist at Determined AI, and an Assistant Professor in the Machine Learning Department at Carnegie Mellon University. A few months ago, I spoke with one of Ameet’s co-founders (Evan Sparks), around the time they announced that they were open sourcing the Determined Training Platform (DTP). Ameet and I started off by discussing DTP’s first few months as an open source project, including initial feedback from users, the applications and use cases they are seeing, and much more. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
7/16/2020 • 44 minutes, 26 seconds
How graph technologies are being used to solve complex business problems
Subscribe: iTunes, Android, Spotify, Stitcher, Google, and RSS. In this episode of the Data Exchange, I speak with Denise Gosnell, Chief Data Officer at DataStax. Denise is also the co-author of the new book The Practitioner’s Guide to Graph Data, which covers the foundational tools and techniques needed to use graph technologies in production applications. This conversation is a great introduction to what has become an important class of technologies and tools. Graph technologies power a wide array of applications, including recommendation engines, fraud detection systems, identity and access management, search, and many other use cases. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
7/9/2020 • 49 minutes, 38 seconds
Machines for unlocking the deluge of COVID-19 papers, articles, and conversations
In this episode of the Data Exchange, I speak with Amy Heineike, Principal Product Architect at Primer.ai, a startup building machines that can read and write. Primer recently used its technology to build COVID-19 Primer, a web site that provides an overview of the latest research papers, media coverage, and social media conversations pertaining to COVID-19. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
7/2/2020 • 42 minutes, 57 seconds
Designing machine learning models for both consumer and industrial applications
In this episode of the Data Exchange, I speak with Christopher Nguyen, CEO of Arimo (a Panasonic company). I first met Christopher in the early days of Apache Spark; Arimo was one of the first companies to embrace Spark and make it a central component of their data platform. He was also an early proponent of exploring deep learning for enterprise applications. A serial entrepreneur, Christopher was also an Engineering Director at Google, where he was responsible for Google Apps. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
6/25/2020 • 33 minutes, 34 seconds
Building open source developer tools for language applications
In this episode of the Data Exchange, I speak with Matthew Honnibal, founder of Explosion AI, a startup focused on building developer tools for AI and natural language processing. Matthew and team are the creators of popular tools like spaCy (NLP), Thinc (lightweight deep learning library), and Prodigy (annotation and active learning). Our conversation focused on a range of topics including: spaCy; Thinc; Explosion AI and Prodigy; and distributed computing with Ray. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
6/18/2020 • 43 minutes, 55 seconds
Viewing machine learning and data science applications as sociotechnical systems
In this episode of the Data Exchange, I speak with Chris Wiggins, Associate Professor at Columbia University, Chief Data Scientist at the New York Times, and co-founder of hackNY. He began his career in theoretical physics but always had a strong interest in applying quantitative techniques to other disciplines. Early in his career he became interested in applications of machine learning to problems in biology and the health sciences. Our conversation focused on a range of topics including: how he shifted his focus from physics to machine learning and data science; applications of reinforcement learning; “data scientist” as a job title, and data science training programs; ethics in machine learning and data science, including training the next generation of data scientists; and a 2015 essay written by Michael Jordan and Tom Mitchell. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
6/11/2020 • 40 minutes, 34 seconds
Identifying and mitigating liabilities and risks associated with AI
In this episode of the Data Exchange I speak with Andrew Burt, Chief Legal Officer at Immuta and co-founder and Managing Partner of BNH.ai, a new law firm focused on AI compliance and related topics. As AI and machine learning become more widely deployed, lawyers and technologists need to collaborate more closely to identify and mitigate the liabilities and risks associated with AI. BNH is the first law firm run by lawyers and technologists focused on helping companies identify and mitigate those risks. Our conversation focused on a range of topics, including: Why a law firm is the right vehicle for helping companies manage and mitigate risks associated with AI and machine learning. The legal profession’s long history of managing risk and regulatory frameworks. Model governance. Incident response and recovery. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
6/4/2020 • 35 minutes, 7 seconds
How machine learning is being used in quantitative finance
In this episode of the Data Exchange our special correspondent and editor Jenn Webb speaks with Arun Verma, Head of Quantitative Research Solutions at Bloomberg. My first job post-academia was as lead quant at a small hedge fund. Since then, I’ve followed the industry from afar, and I’ve long been interested in the role of data and models in financial services. Arun and I discussed quantitative finance when we ran into each other at the O’Reilly AI conference in London last year. He was slated to give a talk on extracting trading signals from alternative data sets, an important subject among quants. Jenn and Arun discussed a range of topics, including: The quantitative finance landscape. The challenges in identifying and using alternative data sources. Applications of machine learning in finance, specifically deep learning and reinforcement learning. New natural language models and their applications in finance. Model explainability and model risk management. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
5/28/2020 • 40 minutes, 4 seconds
Understanding machine learning model governance
In this episode of the Data Exchange I speak with Harish Doddi, cofounder of Datatron, a startup focused on helping companies operationalize machine learning. Over the past two years, Harish has worked closely with enterprises to understand their needs in the areas of model operations and model governance. Last year Harish and I, along with David Talby, wrote two articles on these topics. In the first article, we described these emerging areas (“What are model governance and model operations?”), and in the second we listed lessons that ML engineers can draw from two highly regulated industries (“Managing machine learning in the enterprise: Lessons from banking and health care”). As machine learning becomes widely deployed, organizations will need to develop processes and tools to ensure that models behave as intended. This means having the right set of controls and validation steps in place. Our conversation focused on model governance and related topics: We discussed the three related areas of MLOps, model governance, and model observability. I asked Harish to describe how model governance is perceived and practiced in different industries. We discussed real-world examples of model governance, and the organizational and staffing considerations that come into play. We also covered CI/CD for machine learning and key enterprise features for model governance solutions. Detailed show notes, including a full transcript, can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
5/21/2020 • 35 minutes, 8 seconds
Improving performance and scalability of data science libraries
In this episode of the Data Exchange I speak with Wes McKinney, Director of Ursa Labs and an Apache Arrow PMC member. Wes is the creator of pandas, one of the most widely used Python libraries for data science. He is also the author of the best-selling book “Python for Data Analysis”, which has become essential reading for both aspiring and experienced data scientists. Our conversation focused on data science tools and other topics, including: Two open source projects Wes has long been associated with, pandas and Apache Arrow. The need for a shared infrastructure for data science. Ursa Labs: its mission and structure. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
5/14/2020 • 33 minutes, 43 seconds
Why TinyML will be huge
In this episode of the Data Exchange I speak with Pete Warden, Staff Research Engineer at Google. Pete is a prolific author and teacher, and he has made important contributions to many open source software projects. To name just a couple: he put together the Data Science Toolkit (open data sets and open source tools for data science), and he assembled tools to help developers get started with deep learning long before TensorFlow and PyTorch were available. Most recently, Pete has been focused on implementing machine learning in ultra-low-power systems (TinyML). Our conversation focused on TinyML and other topics, including: The early days of using deep learning for computer vision. TensorFlow: Pete was part of the team at Google that originated TF. What TinyML is and why it is going to be an important topic in the years ahead. Privacy and security in the context of TinyML. Pete’s new book and accompanying video series on YouTube, both designed to help developers get started building TinyML applications. Detailed show notes, including a full transcript, can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
5/7/2020 • 36 minutes, 49 seconds
An open source platform for training deep learning models
In this episode of the Data Exchange I speak with Evan Sparks, cofounder and CEO of Determined AI, a startup that recently open sourced a platform for training deep learning models. Many of the impressive results and applications of deep learning have come from a handful of companies and research groups. As more companies adopt deep learning, they are finding that infrastructure for training and transfer learning isn’t widely available. Our conversation focused on deep learning and other topics, including: Their decision to open source the Determined Training Platform (DTP). Enterprise use cases and applications of deep learning, and why Evan thinks more companies will need a platform for training DL models. The components that come with the DTP: distributed training and hyperparameter tuning, experiment tracking and tools for collaboration and governance, a scheduler specialized for DL workflows, and more. Some examples of how teams have been using DTP. Detailed show notes, including a full transcript, can be found on The Data Exchange web site. Subscribe to The Gradient Flow Newsletter.
4/30/2020 • 40 minutes, 44 seconds
Algorithms that continually invent both problems and solutions
In this episode of the Data Exchange I speak with Kenneth Stanley, a Senior Research Manager at Uber AI and a Professor at UCF. Ken recently announced that, starting in June, he will be starting a new research group at OpenAI focused on open-endedness. He is a pioneer in the field of neuroevolution – a method that uses evolutionary algorithms to evolve neural networks. Ken and his colleague Joel Lehman wrote one of my favorite books on AI aimed at a broad audience: Why Greatness Cannot Be Planned. In this episode we discuss his upcoming move to OpenAI, as well as his recent work on open-ended algorithms. Our conversation covered: Ken’s new position at OpenAI. The transition from being a longtime academic researcher to founding and helping lead an industrial research team (Uber AI Labs). Open-ended algorithms, specifically his work on POET (Paired Open-Ended Trailblazer) and Enhanced POET. Generative Teaching Networks. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Data Exchange Newsletter.
4/23/2020 • 43 minutes, 45 seconds
Computational Models and Simulations of Epidemic Infectious Diseases
In this episode of the Data Exchange I speak with Bruno Gonçalves, a data scientist working at the intersection of data science and finance. I have known Bruno for several years; we met when I recruited him to teach several extremely popular conference tutorials and talks on machine learning and deep learning. Before shifting over to data science, he spent several years as a researcher focused on mathematical models in epidemiology – a field with a rich history dating as far back as the 1920s. This episode is devoted to tools and techniques for modeling epidemics. Our conversation covered: Bruno’s background and his experience in modeling epidemics. The field of epidemic modeling: what techniques are used, the size of the research community, and how models get evaluated. His two recent posts: “Epidemic Modeling 101 – Or why your CoVID-19 exponential fits are wrong” and “Epidemic Modeling 102 – All CoVID-19 models are wrong, but some are useful”. The role that epidemic models play in decision making. Detailed show notes can be found on The Data Exchange web site. Subscribe to The Data Exchange Newsletter.
4/16/2020 • 34 minutes, 37 seconds
Human-in-the-loop machine learning
In this episode of the Data Exchange I speak with Rob Munro, CEO of Machine Learning Consulting and author of the forthcoming book “Human-in-the-loop Machine Learning”. If you want a copy of Rob’s book, use the discount code podexchange20. Our conversation covered: Rob’s experience building data and machine learning products at Powerset, Idibon, and AWS. Natural language processing: given Rob’s extensive experience as a researcher, practitioner, and entrepreneur in areas that touch on NLP, we discussed recent trends in language technologies. Human-in-the-loop machine learning. Our goal in this podcast is to build a community of people interested in data, machine learning, and AI. If you have suggestions on what we should recommend (books, conferences, links) or which guests we should book, please visit TheDataExchange.media site and fill out the “contact” form. The first five people who fill out the form get a free book from Manning (you can view Manning’s catalog here). Detailed show notes can be found on The Data Exchange web site.
4/9/2020 • 43 minutes, 35 seconds
Next-generation simulation software will incorporate deep reinforcement learning
In this episode of the Data Exchange I speak with Chris Nicholson, founder and CEO of Pathmind, a startup applying deep reinforcement learning (DRL) to simulation problems. In a recent post I highlighted two areas where companies can begin to add DRL to their suite of tools: personalization and recommendation engines, and simulation software. My interest in the interplay between DRL and simulation software began when I came across Pathmind’s work in this area. Our conversation focused on deep reinforcement learning and its applications: We began with the basics: what is reinforcement learning, and why should businesses pay attention to it? We discussed enterprise applications of DRL, with particular emphasis on areas where Chris and Pathmind have been focused of late: business process simulation and optimization. Pathmind was an early adopter of Ray and of RLlib, a popular open source library for reinforcement learning built on top of Ray, and I asked Chris why they chose to build on top of RLlib. Detailed show notes can be found on The Data Exchange web site.
4/2/2020 • 39 minutes, 55 seconds
Business at the speed of AI: Lessons from Shopify
In this episode of the Data Exchange I speak with Solmaz Shahalizadeh, VP and Head of Data Science and Data Platform Engineering at Shopify. Shopify is a powerhouse in ecommerce, and its technology powers over a million businesses worldwide. Solmaz is a frequent speaker and presenter at conferences throughout the world, and she has played a critical role in helping Shopify scale its data and machine learning infrastructure. Our conversation covered many important technical and business topics, including: Building and scaling machine learning data products. Building and scaling data teams. Data-informed product building. Detailed show notes can be found on The Data Exchange web site.
3/26/2020 • 37 minutes, 12 seconds
How deep learning is being used in search and information retrieval
In this episode of the Data Exchange I speak with Edo Liberty, founder of Hypercube, a startup building tools for deploying deep learning models in search and information retrieval over large collections. When I spoke at AI Week in Tel Aviv last November, several friends encouraged me to learn more about Hypercube; I’m glad I took their advice! Our conversation covered several topics, including: Edo’s experience applying machine learning and building tools for ML at places like Yale, Yahoo’s Research Lab in New York, and Amazon’s AI Lab. How deep learning is being used in search and information retrieval. Challenges in building search and information retrieval applications when collections are large. Deep learning based search and information retrieval, and what Edo describes as “enterprise end-to-end deep search platforms”. Detailed show notes can be found on The Data Exchange web site.
3/19/2020 • 39 minutes, 50 seconds
The responsible development, deployment and operation of machine learning systems
In this episode of the Data Exchange I speak with Alejandro Saucedo, Engineering Director at Seldon, a startup building tools for productionizing machine learning. Alejandro is also Chief Scientist at The Institute for Ethical AI & Machine Learning, a UK-based research center that conducts “research into processes and frameworks that support the responsible development, deployment and operation of machine learning systems”. Our conversation covered Alejandro’s work at both Seldon and the Institute for Ethical AI & Machine Learning: We discussed topic areas that the Institute focuses on, including explainability, MLOps, adversarial robustness, and privacy-preserving machine learning. We covered some of the recent output from the Institute, including the machine learning maturity model, their open source explainable AI library, their AI-RFX Procurement Framework, and their list of Principles for Responsible AI. We also discussed his role at Seldon, and areas that Seldon has been focused on. Detailed show notes can be found on The Data Exchange web site.
3/12/2020 • 38 minutes, 52 seconds
Hyperscaling natural language processing
In this episode of the Data Exchange I speak with Edmon Begoli, Chief Data Architect at Oak Ridge National Laboratory (ORNL). Edmon has developed and implemented large-scale data applications on systems like Open MPI, Hadoop/MapReduce, Apache Calcite, Apache Spark, and Akka. Most recently he has been building large-scale machine learning and natural language applications with Ray, a distributed execution framework that makes it easy to scale machine learning and Python applications. Our conversation covered a range of topics, including: Edmon’s role at ORNL and his experience building applications with Hadoop and Spark. What is distributed online learning? Why they started using Ray to build distributed online learning applications. Two important use cases: suicide prevention among US veterans and infectious disease surveillance. Detailed show notes can be found on The Data Exchange web site. Join Michael Jordan, Manuela Veloso, Azalia Mirhoseini, Zoubin Ghahramani, Wes McKinney, Ion Stoica, Gaël Varoquaux, and many other speakers at the first Ray Summit in San Francisco, May 27-28. Tickets start at $200.
3/5/2020 • 35 minutes, 14 seconds
What businesses need to know about model explainability
In this episode of the Data Exchange I speak with Krishna Gade, founder and CEO of Fiddler Labs, a startup focused on helping companies build trustworthy and understandable AI solutions. Prior to founding Fiddler, Krishna led engineering teams at Pinterest and Facebook. Our conversation covered a range of topics, including: Krishna’s background as an engineering manager at Facebook and Pinterest. Why Krishna decided to start a company focused on explainability. Guidelines for companies that want to start incorporating model explainability into their data products. The relationship between model explainability (transparency) and security (ML that can resist adversarial attacks). Detailed show notes can be found on The Data Exchange web site. Join Michael Jordan, Manuela Veloso, Azalia Mirhoseini, Zoubin Ghahramani, Wes McKinney, Ion Stoica, Gaël Varoquaux, and many other speakers at the first Ray Summit in San Francisco, May 27-28. Tickets start at $200.
2/27/2020 • 36 minutes, 10 seconds
Scalable Machine Learning, Scalable Python, For Everyone
In this episode of the Data Exchange I speak with Dean Wampler, Head of Developer Relations at Anyscale, the startup founded by the creators of Ray. Ray is a distributed execution framework that makes it easy to scale machine learning and Python applications. It has a very simple API, and as someone who uses both Python and machine learning, I have found Ray to be a wonderful addition to my toolbox. Dean has long been one of my favorite architects, speakers, and teachers, and we have known each other since the early days of Apache Spark. He has authored numerous books and is known for his interest in Scala and programming languages, as well as in software architecture. Our conversation spanned many topics, including: What is Ray, and why should someone consider using it? The first Ray Summit (May 27-28 in San Francisco). Dean’s first impressions of Ray, and his journey from Scala to Python. An update on Ray’s core libraries, Ray on Windows, and distributed training with Ray. Detailed show notes can be found on The Data Exchange web site. For more on Ray and scalable machine learning and Python, come hear from Dean Wampler, Michael Jordan, Ion Stoica, Manuela Veloso, Wes McKinney, and many other leading developers and researchers at the first Ray Summit in San Francisco (May 27-28).
2/20/2020 • 35 minutes, 45 seconds
Computational humanness, analogy and innovation, and soft concepts
In this episode of the Data Exchange I speak with Dafna Shahaf, Associate Professor at the School of Computer Science and Engineering at the Hebrew University of Jerusalem. She also runs the hyadata lab, a research group that consistently produces unique and interesting projects at the intersection of computer science, data, and the social sciences. Our conversation covered a range of topics, including: Computational analogy: Dafna and her students mine online sources like patent filings, research papers, and data from crowdsourcing platforms focused on innovation, and in the process they produce tools that should interest innovation officers and members of innovation labs. Soft concepts: Dafna has continued her work on computational humor, and she and her students have built new tools for automatically finding trivia facts in Wikipedia. An upcoming workshop on Innovative Ideas in Data Science (April 20 in Taipei; the deadline to submit proposals is February 21, 2020). Detailed show notes can be found on The Data Exchange web site.
2/13/2020 • 33 minutes, 38 seconds
Building domain specific natural language applications
In this episode of the Data Exchange I speak with David Talby, co-creator of Spark NLP, an open source, highly scalable, production-grade natural language processing (NLP) library. Spark NLP has become one of the more popular NLP libraries and is available on PyPI, Conda, Maven, and Spark Packages. With recent research advances in large-scale natural language models, there is strong interest in domain-specific natural language applications. Besides their work on Spark NLP, David and his collaborators are building natural language models tuned specifically for healthcare applications. Our conversation spanned many topics, including: Spark NLP: its current status and some common and surprising use cases. Recent developments in NLP research and their implications for companies. Spark NLP for Healthcare. Detailed show notes can be found on The Data Exchange web site.
2/6/2020 • 33 minutes, 9 seconds
The state of privacy-preserving machine learning
In this episode of the Data Exchange I speak with Morten Dahl, research scientist at Dropout Labs, a startup building a platform and tools for privacy-preserving machine learning. He is also behind TF Encrypted, an open source framework for encrypted machine learning in TensorFlow. The rise of privacy regulations like CCPA and GDPR, combined with the growing importance of ML, has led to strong interest among researchers and practitioners in tools and techniques for privacy-preserving machine learning. Morten brings the unique perspective of a longtime security researcher who has also worked as a data scientist in industry. Our conversation spanned many topics, including: Morten’s unique background as an experienced security researcher, developer, and data scientist. The current state of TF Encrypted. Federated learning (FL) and secure aggregation for FL. Privacy-preserving ML solutions will employ a variety of techniques, so we also discussed related topics such as differential privacy, homomorphic encryption, and RISELab’s stack for coopetitive learning (MC2). Detailed show notes can be found on The Data Exchange web site.
1/30/2020 • 42 minutes, 15 seconds
Taking messaging and data ingestion systems to the next level
Sijie Guo on how Apache Pulsar is able to handle both queuing and streaming, and both online and offline applications. In this episode of the Data Exchange I speak with Sijie Guo, founder of StreamNative, a new startup focused on making enterprise messaging technologies (specifically Apache Pulsar) easy to use on the cloud. Sijie was previously a cofounder of Streamlio (acquired by Splunk), and prior to that he led the messaging team at Twitter. He is also the main organizer behind the Pulsar Summit (April in San Francisco), a new conference whose Call for Speakers closes on January 31st. Our conversation spanned many topics, including: The role of messaging in modern data applications and platforms. The two main types of messaging applications: queuing and streaming. Apache Pulsar as a unified messaging platform, able to handle both queuing and streaming, and both online and offline applications. A status update on Apache Pulsar. Detailed show notes can be found on The Data Exchange web site.
1/23/2020 • 38 minutes
Business at the speed of AI: Lessons from Rakuten
The Data Exchange Podcast: Bahman Bahmani on attracting and retaining talent, and the importance of delivery-oriented teams. In this episode of the Data Exchange I speak with Bahman Bahmani, VP of Data Science and Engineering at Rakuten, a large Japanese ecommerce and online retail company. When I first met Bahman several years ago, he was finishing his Computer Science PhD at Stanford, and at the time he was giving technical talks on machine learning algorithms and their applications to computer security. Today he leads a large team at Rakuten, and in my opinion he has established an organizational structure, processes, and an AI practice that other companies should study. Our conversation spanned many topics, including: The impact that AI, machine learning, and data have had on Rakuten’s businesses. Attracting, nurturing, and retaining talent in an environment where data scientists, data engineers, and analysts all have many other options. The trio of strategic options: operational excellence, product leadership, and customer intimacy. Organization and culture, including key roles within an AI practice. The power of delivery-oriented teams with end-to-end responsibility. Detailed show notes can be found on The Data Exchange web site.
1/16/2020 • 41 minutes, 16 seconds
The combination of the right software and commodity hardware will prove capable of handling most machine learning tasks
In this episode of the Data Exchange I speak with Nir Shavit, Professor of EECS at MIT, and cofounder and CEO of Neural Magic, a startup that is creating software to enable deep neural networks to run on commodity CPUs (at GPU speeds or faster). Their initial products are focused on model inference, but they are also working on similar software for model training. Our conversation spanned many topics, including: Neurobiology, in particular the combination of Nir’s research areas of multicore software and connectomics – a branch of neurobiology. Why he believes the combination of the right software and CPUs will prove capable of handling many deep learning tasks. Speed is not the only factor: the “unlimited memory” of CPUs can unlock larger problems and architectures. Neural Magic’s initial offering is in inference; model training using CPUs is also on the horizon. Detailed show notes can be found on The Data Exchange web site.
1/9/2020 • 30 minutes, 23 seconds
Key AI and Data Trends for 2020
In this episode of the Data Exchange, I speak with my podcast co-organizer Mikio Braun, data scientist at GetYourGuide and a former machine learning researcher and data architect. Mikio and I go out on a limb and speculate about new trends in AI and data that we think people should pay attention to in 2020. Our conversation spanned many topics, and we listed trends in: Models: reinforcement learning, deep learning, language models, and related topics. Applications: including emerging use cases for reinforcement learning. Infrastructure and tools: end-to-end machine learning platforms, the importance of distributed computing, etc. Managing risks: privacy, security, safety, fairness, etc. Emerging technologies to watch for in 2020. Detailed show notes can be found on The Data Exchange web site.
12/26/2019 • 36 minutes, 26 seconds
The evolution of TensorFlow and of machine learning infrastructure
In this episode of the Data Exchange I speak with Rajat Monga, one of the founding members of the TensorFlow engineering team. Until recently, Rajat was the engineering manager for TensorFlow at Google. Our conversation spanned many topics, including: TFX, a production-scale machine learning platform based on TensorFlow. Distributed training. MLIR (Multi-Level Intermediate Representation), “a representation format and library of compiler utilities that sits between the model representation and low-level compilers/executors that generate hardware-specific code.” Deep learning in the enterprise. The state of machine learning infrastructure. [Full show notes can be found on the Data Exchange web site.]
12/12/2019 • 36 minutes, 24 seconds
Building large-scale, real-time computer vision applications
In this episode of the Data Exchange I speak with Reza Zadeh, founder and CEO of Matroid, a startup focused on making computer vision applications easy to build and deploy. Reza is also an adjunct professor at Stanford. This particular conversation spanned many topics pertaining to computer vision, including: Challenges in building large-scale, real-time computer vision applications. Robustness of computer vision applications (adversarial attacks, deepfakes). The impact of computer vision technologies on society: security, privacy, and surveillance. We also previewed the upcoming 2020 edition of the ScaledML conference; Reza is the main organizer behind one of my favorite conferences in the SF Bay Area. [Full show notes can be found on the Data Exchange site.]