Hello everyone and welcome to the very first season of the Modern Data Show, a podcast where you can hear stories from data practitioners and enthusiasts, the folks in the arena, on their journey of building and operating a modern data stack. I’m your host, Aayush, and we’re super happy you’re here. In this first season of our show, we’ll be talking about all things ETL: how organisations at a large scale manage their ETL processes, the tools and technologies they use, common pitfalls, and much more.
S02 E15 From Data Source to API in Minutes with Matteo Pelati and Vivek Gudapuri, founders at Dozer
Prepare to be amazed in this episode as Matteo Pelati and Vivek Gudapuri, the brilliant minds behind Dozer, reveal their experience in pushing the boundaries of data management and analysis. By simplifying the process of data serving and allowing companies to create APIs quickly and efficiently, Dozer's approach sets them apart from the modern data stack. Their open-source approach allows developers to build custom operators and extend connectors, ensuring that Dozer can cover a wide range of use cases while still offering customization at each step. They also discuss the challenges they faced during the development of Dozer and how they are positioned to adapt to upcoming trends and developments in real-time data processing.
6/6/2023 • 26 minutes, 6 seconds
S02 E14 Transforming Data Pipelines for the Future: An Interview with Sean Knapp, CEO of Ascend.io
Uncover the secret to turning data engineering into a superpower! Sean Knapp, the CEO and founder of Ascend.io, joined us to discuss the value of depth and breadth in capturing the entire data value chain, emphasizing the need for an automation layer to adapt to the evolving data landscape. Ascend's platform enables intelligent data pipeline creation and management, with a dynamic control plane that detects and responds to changes in real time across extensive pipeline networks. Sean further explored the potential of generative AI in data engineering and shared his optimism about the future of the modern data stack, foreseeing consolidation and the emergence of new parallel spaces in the data ecosystem.
5/30/2023 • 32 minutes, 52 seconds
S02 E13 Building a Data-Driven Fashion Empire: The Zalando Data Foundation Story with Dr. Alexander Borek, Director of Data and Analytics at Zalando
Step into the world of Zalando, Europe's leading online fashion retailer, where data drives innovation and enhances the customer experience. In this episode, join us as we interview Dr. Alexander Borek, the brilliant mind behind Zalando's data and analytics strategy. Discover how Dr. Borek and his team have revolutionized the company's approach to data by implementing the cutting-edge concept of data mesh. Learn how Zalando successfully strikes the perfect balance between decentralization and structure, unleashing the full potential of data while maintaining collaboration with various business units. Dr. Borek also unveils the secrets to leveraging data for innovation and value creation in the dynamic world of online fashion. Tune in now for an eye-opening exploration of data management, leadership, and the future of data-driven decision-making at Zalando.
5/23/2023 • 39 minutes, 22 seconds
S02 E12 Unveiling Twilio’s Data Transformation: A Journey into Modern Data Stack with Don Oriti, Head of Data Platform and Engineering at Twilio
Twilio has built an open source data lake using AWS technologies and Databricks, processing billions of events daily through their Kafka environment. They aim to provide a cohesive view of data across platforms and enable other businesses to use data wherever they want. Don, the Head of Data Platform and Engineering at Twilio, shares insights into Twilio's data stack in the latest episode of the Modern Data Show. The conversation covers the Twilio data stack, which begins with data ingestion through Kafka or CDC for Aurora databases, followed by storage in S3, high-level aggregation and curation using Spark, and the use of tools such as Kudu, Reverse ETL, data governance, cataloging, and BI tools.
5/17/2023 • 36 minutes, 14 seconds
S02 E11: The Reverse ETL Revolution: Overcoming Challenges in Syncing Live Data to SaaS Tools with Tejas Manohar, Co-founder and Co-CEO at Hightouch
Has your business ever struggled to sync live data to your sales, marketing, and customer success tools? This is where Hightouch comes in: a Reverse ETL platform that syncs data from a data warehouse to SaaS tools in minutes. It enables businesses to get accurate customer data quickly without requiring engineering effort or manual work. In this episode, Tejas Manohar shares his journey from developing games at a young age to becoming the Co-founder and Co-CEO of Hightouch. He provides valuable insights into Hightouch's internal connector framework, which automatically performs tasks like change data capture and batching, and provides methods to send rows that may need to be retried in future syncs. He also talks about Hightouch's two new products and the future of reverse ETL.
5/9/2023 • 35 minutes, 54 seconds
S02 E10: From On-Prem to the Cloud: Managing ClickHouse with DoubleCloud with Natalia Shuliak, COO at DoubleCloud
When working with open-source technologies, you benefit from the community's creations, but you also have to do a lot of admin and support work, as the technologies tend to break and support usually falls on you. This is where DoubleCloud's platform comes into the picture. In this latest episode of the Modern Data Show, Natalia Shuliak talks about how DoubleCloud saves you from administrative work and allows you to focus on data pipeline development and management, while providing backup, security, and support.
5/2/2023 • 29 minutes, 2 seconds
S02 E09: Building Data Pipelines at Shopify: Insights from Marc Laforet, Senior Data Engineer at Shopify
With its widespread popularity and success in the e-commerce industry, it is difficult to imagine anyone who has not at least heard of Shopify. This episode features Marc Laforet, a senior data engineer at Shopify, who shares his journey of how he transitioned from being a biochemist to a data engineer at Shopify. Marc explains the type of data Shopify works with, which is diverse in format and comes from different sources, and how the company determines which tools to build to extract the most value from the data. Marc also discusses data governance and explains two possible architectures: a gating process or a trust-but-verify approach.
4/25/2023 • 25 minutes, 18 seconds
S02 E08: Data-Driven Fitness: An Inside Look into Urban Sports Club’s Innovative Data Platform with Artur Yatsenko, Head of Data Platform at Urban Sports Club
Urban Sports Club, a company that connects fitness enthusiasts, started its data journey when it realised that treating data as a product instead of a by-product could help unlock the value of its data. In the latest episode of the Modern Data Show, we are joined by Artur Yatsenko, Head of Data Platform at Urban Sports Club, to discuss the company's platform, its evolving data stack, and the challenges faced while building it. Artur shares insights on adopting open-source software and tools for data management and implementing a data-as-a-product strategy.
4/11/2023 • 28 minutes, 15 seconds
S02 E07 Revolutionizing the Data Landscape: Inside Salesforce’s modernization journey with Murali Kallem, Head of Data Platform at Salesforce
Salesforce is moving towards a more user-friendly and modernized data platform that allows for faster migration and operation, while also enabling users to take advantage of new functionalities that were previously unavailable. In the latest episode of the Modern Data Show, Murali Kallem, Head of Office of Data at Salesforce, discusses the company's modernization efforts, including migrating to Snowflake and adopting cloud-friendly tools. Murali also covers the importance of vendor support structures for established companies and the consideration of open-source versus commercial offerings.
00:00:00 Introduction
00:03:12 Data platform at Salesforce
00:07:53 Structure of Salesforce's data team
00:12:28 Data tool buying criteria from the data leader's perspective
00:23:05 Partnership with Snowflake
00:27:24 Future of data space
4/4/2023 • 32 minutes, 44 seconds
S02 E06 Breaking Down the Buzzword: What Data Mesh Really Means for Organizations with Colleen Tartow, Director of Engineering at Starburst Data
With the introduction of the data mesh concept, a lot of people are trying to wrap their heads around the term. In the latest episode of the Modern Data Show, Colleen Tartow, Director of Engineering at Starburst Data, provides a comprehensive explanation of what data mesh actually is, the socio-technical aspect of data mesh, and the fundamental shift in the way data is produced and governed within an organization.
3/28/2023 • 31 minutes, 27 seconds
S02 E05: What’s Fundamentally Wrong with Modern Data Stack with Lauren Balik, Owner at Upright Analytics
Lauren Balik, who runs Upright Analytics and is a leading data consultant and investor, discusses why she believes the modern data stack is flawed and the three factors that affect the cost of a data platform. Balik also compares building versus buying a data platform and recommends an OLAP database in the cloud for small companies. However, she thinks centralizing data out of a line of business is a mistake for larger companies. Balik does not anticipate consolidation in the modern data stack and thinks that large language models such as GPT-3 will be crucial.
3/21/2023 • 36 minutes, 35 seconds
S02 E04: Legacy to Modern: Transforming Analytics Infrastructure with Ian Macomber, Head of Analytics Engineering & Data Science at Ramp
Ian Macomber, Head of Analytics Engineering & Data Science at Ramp, discusses the company's approach to automating finance tools and building the next generation of finance through data-driven decision-making. Macomber emphasizes the importance of cross-functional collaboration and embedding the data team into every part of the product engineering process. He also highlights the need for data compliance and privacy to be invested in every day and not treated as a one-time effort. Macomber warns against "Layerinitis," where teams prioritize quick solutions over long-term effects, and advises celebrating the hardening of code and inviting people into codebases to teach them best practices.
3/14/2023 • 30 minutes, 22 seconds
S02 E03: Innovating the Modern Data Stack: Change Data Capture and Beyond with Gunnar Morling, Senior Staff Software Engineer at Decodable
In this episode of the Modern Data Show, Gunnar Morling discusses his interest in software engineering and databases and his recent move to Decodable, a real-time stream processing platform based on Apache Flink. He talks about the importance of cohesive data pipelines, from source to sink, and how his work with Debezium led him to become interested in stream processing. Gunnar also explains how Decodable's managed service ingests real-time data streams, processes them, and delivers the data into other systems.
3/7/2023 • 34 minutes, 16 seconds
S02 E02: Building for Scale: When and How to Invest in Data Platforms with Brennon York, Head of Data Platform at Lyft
In this episode of the Modern Data Show, Brennon York, Head of the Data Platform at Lyft, gives insights into the critical aspects of the data platform ecosystem in the early stages when there is no scale. Brennon also discusses the structure of the data platform team and new emerging technologies within the modern data stack that have impressed him, such as machine learning orchestration systems like SageMaker, Union.ai, and Flyte. The episode provides valuable insights into building a data platform that can scale with the growth of a company, enabling businesses to stay competitive in the fast-paced technological landscape.
2/28/2023 • 35 minutes, 33 seconds
S02 E01: A deep dive into the world of Data Streaming with Kai Waehner, Global Field CTO at Confluent
In this episode of the Modern Data Show, host Aayush Jain is joined by Kai Waehner, the Global Field CTO at Confluent, to discuss all things about Apache Kafka, Confluent, and event streaming. Confluent is a complete event streaming platform and fully managed Kafka service used by tech giants, modern internet startups, and traditional enterprises to build mission-critical scalable systems. During the podcast, Kai discusses the benefits of using Confluent over deploying Kafka, the role of a global Field CTO, and the company's complete data streaming platform.
2/21/2023 • 37 minutes, 40 seconds
S01 E11: Unlocking behavioral data at scale with Alex Dean, CEO and Co-founder of Snowplow
'Data as oil' is an extensively used metaphor, and its impact can be gauged by how heavily every business depends on the data provided to it by third-party sources. Source data systems are finite: they hold a certain amount of data with a limited associated scope. This is where Snowplow comes in and helps businesses deliberately create that data. In the latest episode of the Modern Data Show, we have Alex Dean, CEO and Co-founder of Snowplow, discuss data creation, behavioural analytics, data contracts, tracking catalogs, and where the modern data stack is heading in 2023.
11/22/2022 • 32 minutes, 44 seconds
S01 E10: Commoditizing data integration with Airbyte, Michel Tricot, Co-founder and CEO, Airbyte
When Michel and his team founded Airbyte back in 2020, there were already a ton of data integration tools out there, and it was a pretty mature space altogether. So what led them to start this company, and what unique problem did they aim to address? To answer this, in this week's episode we have Michel Tricot, the co-founder and CEO of Airbyte.