
Warehouse-Native Data Architecture with Soumyadeb Mitra, Founder and CEO at RudderStack
10/03/23 • 39 min
This episode of The Analytics Edge, sponsored by NetSpring, features an interview with Soumyadeb Mitra, Founder and CEO at RudderStack, the leading warehouse Customer Data Platform that’s purpose-built for data teams. RudderStack is an open-source, enterprise-ready platform for collecting, storing, and routing customer event data to your data warehouse and dozens of other tools.
After founding the company in 2019, Soumyadeb led RudderStack to 100+ employees and a $56 million Series B funding round in 2022. Prior to RudderStack, he co-founded Mariana, a VC-funded B2B martech startup, which was later acquired by 8x8 in 2018. Soumyadeb earned his PhD in Computer Science from the University of Illinois Urbana-Champaign.
In this episode, Soumyadeb talks about the founding stories behind RudderStack, the evolution of Customer Data Platforms, and the significant impact that a warehouse-centric CDP approach has on business.
Key Quotes
“We want to look at product funnels and customer journeys, but then combine that with Salesforce data, right? I mean, I want to look at funnels separately for enterprise customers and customers who closed versus customers where we are competing with a specific vendor and so on. And this is a very standard thing I would imagine. I mean, we see that across all our companies, but it was surprisingly hard to do with a lot of these cloud product analytics tools, right? They're amazing tools, but then they're only designed to ingest a specific kind of data. And if you want to combine other data sources, it becomes really fragile and complicated to set up those data pipelines, right? So yeah, I think Warehouse-native enables that and kind of unlocks that set of use cases. Plus there are all these challenges around data privacy, which again, it's not so much for a company like us, but at scale, it becomes a problem, right? I mean, you're centralizing your data in a data warehouse. Why do you need to ship everything to another vendor to do specific parts of your analytics? It just does not make good sense.” - Soumyadeb Mitra
Episode Timestamps
(01:11) Founding story behind RudderStack
(02:50) The evolution of CDP
(06:50) Business challenges CDPs are trying to solve
(08:06) Packaged vs. composable debate
(10:55) Benefits of warehouse-native CDP
(17:14) Analytics on customer data
(18:47) Data activation and reverse ETL
(21:10) Real-time personalization
(26:05) Achieving customer 360 view
(28:08) Business impact with a warehouse-centric CDP approach
(30:17) The future of CDPs
(34:48) Takeaways
Links
Previous Episode

Key Trends in Databases with Nikita Shamgunov, Founder and CEO at Neon
This episode features an interview with Nikita Shamgunov, legendary founder of MemSQL (now SingleStore). His latest endeavor, Neon, offers serverless Postgres as a fully managed multi-cloud database that separates storage and compute, with auto scaling, branching, and bottomless storage.
Nikita is also a Partner at Khosla Ventures, where he is incubating Neon and has raised $104M to date. He is passionate about deep tech, data infrastructure, and system software. Prior to Neon, Nikita co-founded MemSQL (now SingleStore), a unicorn data and analytics company valued at over $1.3 billion. He served as a founding CTO, and then CEO, successfully scaling the company to over $40 million in ARR. Prior to SingleStore, he worked as a senior engineer at Facebook and at Microsoft on SQL Server. Nikita earned a Ph.D. in computer science from the National Research University in St. Petersburg, Russia.
In this episode, Nikita recounts the founding stories behind both MemSQL and Neon, and elaborates on the key trends driving database technologies today, from serverless and generative AI, to open data and the convergence of transactional and analytical workloads.
-----------
Key Quotes
“Amplitude and Mixpanel, they basically are a time series database underneath with the UI. Time series data tends to be, you know, ‘write once’, most of it. And so, you need to take advantage of those techniques that data warehouses are basically born with, right? They are in the business of storing data relatively cheaply. And every enterprise, unless it's not like an archaic enterprise, should have a data warehouse. So it makes only too much sense to put this into a data warehouse rather than either a custom database, you know, like a platform like Datadog, Mixpanel, Amplitude. Plus you have additional benefits from it because you can cross reference that data with the rest of the business data.” - Nikita Shamgunov
-----------
Episode Timestamps
(01:41) Founding stories behind MemSQL and Neon
(03:39) Addressing new challenges for databases
(09:20) Criteria for evaluating databases
(12:36) HTAP and zero ETL between transactional and analytical applications
(19:07) Evolving standards around table formats
(24:07) Thoughts on Generative AI and LLM-native in the data warehouse
(26:38) Warehouse centric approaches to data storage
(29:45) Open source for data warehouses
(33:54) Potential for new applications to be built around real time applications
(38:10) Managing large volumes of data
(40:59) Serverless Postgres is as easy as Stripe
(45:40) Takeaways
-----------
Links
Next Episode

Fueling Product-Led Growth with Data Science with Anahita Tafvizi, VP and Head of Data Science & Business Operations at Instacart
This episode features an interview with Anahita Tafvizi, VP and Head of Data Science & Business Operations at Instacart. Instacart is the leading grocery technology company in North America.
As a senior executive at Instacart, Anahita drives key operations and strategic decisions across all company product pillars and ensures data investments are aligned with the long-term business strategy. She leads a team of 150+ Data Science and Strategy individuals across all company product lines including consumers, shoppers, advertisers, and retailer products. Previously, Anahita was the Director of Finance for Google Commerce, Retail & Travel, as well as the Head of Finance for YouTube Ads and Head of Analytics & Data Science for eBay Ads. She is passionate about building high-performance data and strategy organizations with a focus on agility and impact. Anahita earned a Ph.D. in Physics from Harvard University.
In this episode, Anahita talks about structuring her data science team to reveal opportunities for new efficiencies that guide Instacart’s 4-sided marketplace, her approach to hiring the leadership team and overseeing 150+ employees, and recent data science initiatives fueling product-led growth.
Bio:
Anahita Tafvizi is currently the Vice President and Head of Data Science & Business Operations at Instacart. As a senior executive of the company, she drives key operations and strategic decisions across all Instacart product pillars and ensures data investments are aligned with the company’s long-term business strategy. She is passionate about building high-performance data and strategy organizations with a focus on agility and impact.
Key Quotes:
“How can we make the experience of buying groceries on Instacart not just more convenient but also more efficient and delightful? To inspire product strategy, we spend a lot of time trying to understand patterns of shopping so we can build personalized experiences.” - Anahita Tafvizi
Episode Timestamps
(01:17) Anahita’s path to data science
(03:04) Instacart’s 4-sided marketplace
(04:56) Structure of the data science team
(07:10) How business teams can unlock new insights
(09:51) Benefits and drawbacks of virtual teams
(12:45) Data needs of product-led growth
(14:30) Key data science techniques, tools, and skills
(16:30) Recent data science initiatives fueling PLG
(19:16) Instacart's data maturity
(20:20) Data access for business context
(22:00) Approach to hiring data science leaders
(23:30) Career growth paths in data science
(26:37) Increasing internal talent bench
(27:59) Driving efficiency in an economic downturn
(32:08) Key insights on grocery delivery services
(34:35) Takeaways
Links