Building A Data Lake For The Database Administrator At Upsolver
Data Engineering Podcast • 06/02/20 • 56 min
Summary
Data lakes offer a great deal of flexibility and the potential for reduced cost for your analytics, but they also introduce a great deal of complexity. What used to be entirely managed by the database engine is now a composition of multiple systems that need to be properly configured to work in concert. To bring the DBA into the new era of data management, the team at Upsolver added a SQL interface to their data lake platform. In this episode, Upsolver CEO Ori Rafael and CTO Yoni Iny describe how they have deliberately grown their platform to allow for layering SQL on top of a robust foundation for creating and operating a data lake, how to bring more people on board to work with the data being collected, and the unique benefits that a data lake provides. This was an interesting look at the impact that the interface to your data can have on who is empowered to work with it.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management.
- What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- You listen to this show because you love working with data and want to keep your skills up to date. Machine learning is finding its way into every aspect of the data landscape. Springboard has partnered with us to help you take the next step in your career by offering a scholarship to their Machine Learning Engineering career track program. In this online, project-based course every student is paired with a Machine Learning expert who provides unlimited 1:1 mentorship support throughout the program via video conferences. You’ll build up your portfolio of machine learning projects and gain hands-on experience in writing machine learning algorithms, deploying models into production, and managing the lifecycle of a deep learning prototype. Springboard offers a job guarantee, meaning that you don’t have to pay for the program until you get a job in the space. The Data Engineering Podcast is exclusively offering listeners 20 scholarships of $500 to eligible applicants. It only takes 10 minutes and there’s no obligation. Go to dataengineeringpodcast.com/springboard and apply today! Make sure to use the code AISPRINGBOARD when you enroll.
- Your host is Tobias Macey, and today I’m interviewing Ori Rafael and Yoni Iny about building a data lake for the DBA at Upsolver.
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you start by sharing your definition of what a data lake is and what it is composed of?
- We talked last in November of 2018. How has the landscape of data lake technologies and adoption changed in that time?
- How has Upsolver changed or evolved since we last spoke?
- How has the evolution of the underlying technologies impacted your implementation and overall product strategy?
- What are some of the common challenges that accompany a data lake implementation?
- How do those challenges influence the adoption or viability of a data lake?
- How does the introduction of a universal SQL layer change the staffing requirements for building and maintaining a data lake?
- What are the advantages of a data lake over a data warehouse if everything is being managed via SQL anyway?
- What are some of the underlying realities of the data systems that power the lake which will eventually need to be understood by the operators of the platform?
- How is the SQL layer in Upsolver implemented?
- What are the most challenging or compl...
Transcript
Hello, and welcome to the Data Engineering Podcast, the show about modern data management.
What advice do you wish you had received early in your career of data engineering?
If you hand a book to a new data engineer, what wisdom would you add to it?
I'm working with O'Reilly Media on a project to collect the 97 things that every data engineer should know, and I need your h