Towards Data Science

The TDS team

Note: The TDS podcast's current run has ended.

Researchers and business leaders at the forefront of the field unpack the most pressing questions around data science and AI.

Top 10 Towards Data Science Episodes

Goodpods has curated a list of the 10 best Towards Data Science episodes, ranked by the number of listens and likes each episode has garnered from our listeners. If you are listening to Towards Data Science for the first time, there's no better place to start than with one of these standout episodes. If you are a fan of the show, vote for your favorite Towards Data Science episode by adding your comments to the episode page.

If the name data2vec sounds familiar, that’s probably because it made quite a splash on social and even traditional media when it came out, about two months ago. It’s an important entry in what is now a growing list of strategies that are focused on creating individual machine learning architectures that handle many different data types, like text, image and speech.

Most self-supervised learning techniques involve getting a model to take some input data (say, an image or a piece of text) and mask out certain components of those inputs (say by blacking out pixels or words) in order to get the models to predict those masked out components.

That “filling in the blanks” task is hard enough to force AIs to learn facts about their data that generalize well, but it also means training models to perform tasks that are very different depending on the input data type. Filling in blacked out pixels is quite different from filling in blanks in a sentence, for example.
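As a toy illustration of that masked-prediction setup (every name here is invented for the example, not any framework's actual API):

```python
import random

def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]"):
    """Hide a random subset of tokens; the model's job is to predict the originals."""
    random.seed(0)  # fixed seed so the example is reproducible
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if random.random() < mask_rate:
            targets[i] = tok          # ground truth the model must recover
            masked.append(mask_token)
        else:
            masked.append(tok)
    return masked, targets

sentence = "the cat sat on the mat".split()
masked, targets = mask_tokens(sentence, mask_rate=0.3)
# 'masked' now has [MASK] in place of some words, and 'targets' maps each masked
# position back to the hidden word. An image version would zero out pixel patches
# instead -- the same recipe, but a very different-looking prediction task.
```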

So what if there was a way to come up with one task that we could use to train machine learning models on any kind of data? That’s where data2vec comes in.

For this episode of the podcast, I’m joined by Alexei Baevski, a researcher at Meta AI and one of the creators of data2vec. In addition to data2vec, Alexei has been involved in quite a bit of pioneering work on text and speech models, including wav2vec, Facebook’s widely publicized unsupervised speech model. Alexei joined me to talk about how data2vec works and what’s next for that research direction, as well as the future of multi-modal learning.
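At a very high level, data2vec swaps "predict the masked words/pixels" for "predict a teacher network's latent representations of the unmasked input", which works the same way for any modality. A deliberately tiny NumPy sketch of that idea (a real implementation uses Transformers, layer-averaged targets, and gradient updates; every name below is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, W):
    """Stand-in 'network': one linear layer + tanh. The real model is a Transformer."""
    return np.tanh(x @ W)

# Toy input sequence: 6 timesteps, 8 features (could be text, audio, or image patches).
x = rng.normal(size=(6, 8))
W_student = rng.normal(size=(8, 4))
W_teacher = W_student.copy()        # the teacher starts as a copy of the student

# Teacher sees the FULL input and produces latent targets.
targets = encoder(x, W_teacher)

# Student sees a MASKED input and must predict the teacher's latents.
x_masked = x.copy()
x_masked[2] = 0.0                   # mask one timestep
pred = encoder(x_masked, W_student)
loss = np.mean((pred - targets) ** 2)   # regression on latents, not raw pixels/words

# After each student update, the teacher tracks it as an exponential moving average.
tau = 0.999
W_teacher = tau * W_teacher + (1 - tau) * W_student
```

Because the loss is defined on latent representations rather than on pixels or words, the same objective applies unchanged to text, images, and speech.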

***

Intro music:

Artist: Ron Gelinas

Track Title: Daybreak Chill Blend (original mix)

Link to Track: https://youtu.be/d8Y2sKIgFWc

***

Chapters:
  • 2:00 Alexei’s background
  • 10:00 Software engineering knowledge
  • 14:10 Role of data2vec in progression
  • 30:00 Delta between student and teacher
  • 38:30 Losing interpreting ability
  • 41:45 Influence of greater abilities
  • 49:15 Wrap-up
Towards Data Science - 12. Rachael Tatman - Data science at Kaggle

11/06/19 • 48 min

One question I’ve been getting a lot lately is whether graduate degrees — especially PhDs — are necessary in order to land a job in data science. Of course, education requirements vary widely from company to company, which is why I think the most informative answers to this question tend to come not from recruiters or hiring managers, but from data scientists with those fancy degrees, who can speak to whether they were actually useful.

That’s far from the only reason I wanted to sit down with Rachael Tatman for this episode of the podcast, though. In addition to holding a PhD in computational sociolinguistics, Rachael is a data scientist at Kaggle and a popular livestreaming coder on Twitch. She has a lot of great insights about breaking into data science, how to get the most out of Kaggle, the future of NLP, and yes, the value of graduate degrees for data science roles.

Towards Data Science - 101. Ayanna Howard - AI and the trust problem

11/03/21 • 53 min

Over the last two years, the capabilities of AI systems have exploded. AlphaFold2, MuZero, CLIP, DALLE, GPT-3 and many other models have extended the reach of AI to new problem classes. There’s a lot to be excited about.

But as we’ve seen in other episodes of the podcast, there’s a lot more to getting value from an AI system than jacking up its capabilities. Increasingly, one of those missing ingredients is trust. You can make all the powerful AIs you want, but if no one trusts their output — or if people trust it when they shouldn’t — you can end up doing more harm than good.

That’s why we invited Ayanna Howard on the podcast. Ayanna is a roboticist, entrepreneur and Dean of the College of Engineering at Ohio State University, where she focuses her research on human-machine interactions and the factors that go into building human trust in AI systems. She joined me to talk about her research, its applications in medicine and education, and the future of human-machine trust.

---

Intro music:

Artist: Ron Gelinas

Track Title: Daybreak Chill Blend (original mix)

Link to Track: https://youtu.be/d8Y2sKIgFWc

---

Chapters:
  • 0:00 Intro
  • 1:30 Ayanna’s background
  • 6:10 The interpretability of neural networks
  • 12:40 Domain of machine-human interaction
  • 17:00 The issue of preference
  • 20:50 Gelman/newspaper amnesia
  • 26:35 Assessing a person’s persuadability
  • 31:40 Doctors and new technology
  • 36:00 Responsibility and accountability
  • 43:15 The social pressure aspect
  • 47:15 Is Ayanna optimistic?
  • 53:00 Wrap-up

Towards Data Science - 118. Angela Fan - Generating Wikipedia articles with AI

04/06/22 • 51 min

Generating well-referenced and accurate Wikipedia articles has always been an important problem: Wikipedia has essentially become the Internet's encyclopedia of record, and hundreds of millions of people use it to understand the world.

But over the last decade Wikipedia has also become a critical source of training data for data-hungry text generation models. As a result, any shortcomings in Wikipedia’s content are at risk of being amplified by the text generation tools of the future. If one type of topic or person is chronically under-represented in Wikipedia’s corpus, we can expect generative text models to mirror — or even amplify — that under-representation in their outputs.
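A toy simulation of that mirroring effect (the numbers are made up purely for illustration):

```python
import random
from collections import Counter

random.seed(0)

# Toy "corpus": topic A gets 9x the coverage of topic B.
corpus = ["A"] * 90 + ["B"] * 10

# A generator trained on this corpus emits topics at roughly corpus frequency.
generated = random.choices(corpus, k=10_000)
rates = Counter(generated)
share_B = rates["B"] / 10_000

# share_B stays near 0.10: the under-representation is faithfully mirrored, and
# any future corpus scraped from these outputs inherits the same gap -- or a
# wider one, once sampling noise and popularity feedback enter the loop.
```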

Through that lens, the project of Wikipedia article generation is about much more than it seems — it’s quite literally about setting the scene for the language generation systems of the future, and empowering humans to guide those systems in more robust ways.

That’s why I wanted to talk to Meta AI researcher Angela Fan, whose latest project is focused on generating reliable, accurate, and structured Wikipedia articles. She joined me to talk about her work, the implications of high-quality long-form text generation, and the future of human/AI collaboration on this episode of the TDS podcast.

---

Intro music:

Artist: Ron Gelinas

Track Title: Daybreak Chill Blend (original mix)

Link to Track: https://youtu.be/d8Y2sKIgFWc

---

Chapters:
  • 1:45 Journey into Meta AI
  • 5:45 Transition to Wikipedia
  • 11:30 How articles are generated
  • 18:00 Quality of text
  • 21:30 Accuracy metrics
  • 25:30 Risk of hallucinated facts
  • 30:45 Keeping up with changes
  • 36:15 UI/UX problems
  • 45:00 Technical cause of gender imbalance
  • 51:00 Wrap-up

There’s an idea in machine learning that most of the progress we see in AI doesn’t come from new algorithms or model architectures. Instead, some argue, progress almost entirely comes from scaling up compute power, datasets and model sizes — and besides those three ingredients, nothing else really matters.

Through that lens, the history of AI becomes the history of processing power and compute budgets. And if that turns out to be true, then we might be able to do a decent job of predicting AI progress by studying trends in compute power and their impact on AI development.
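The kind of extrapolation this makes possible can be sketched in a few lines. The compute figures below are invented for the example; real analyses like those of Sevilla's team use carefully curated estimates of training compute for landmark models:

```python
import numpy as np

# Hypothetical training-compute estimates (petaFLOP/s-days) for landmark models.
years = np.array([2012.0, 2014.0, 2016.0, 2018.0, 2020.0])
compute = np.array([0.01, 0.1, 1.5, 20.0, 300.0])

# Exponential growth means log-compute is linear in time, so fit a line.
slope, intercept = np.polyfit(years, np.log10(compute), deg=1)

doubling_time_months = 12 * np.log10(2) / slope     # months per doubling
projected_2022 = 10 ** (slope * 2022 + intercept)   # naive straight-line extrapolation
```

The interesting (and contested) questions are whether the straight line holds, and for how long; that's exactly the forecasting problem discussed in the episode.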

And that’s why I wanted to talk to Jaime Sevilla, an independent researcher and AI forecaster, and affiliate researcher at Cambridge University’s Centre for the Study of Existential Risk, where he works on technological forecasting and understanding trends in AI in particular. His work’s been cited in a lot of cool places, including Our World In Data, who used his team’s data to put together an exposé on trends in compute. Jaime joined me to talk about compute trends and AI forecasting on this episode of the TDS podcast.

***

Intro music:

Artist: Ron Gelinas

Track Title: Daybreak Chill Blend (original mix)

Link to Track: https://youtu.be/d8Y2sKIgFWc

***

Chapters:

  • 2:00 Trends in compute
  • 4:30 Transformative AI
  • 13:00 Industrial applications
  • 19:00 GPT-3 and scaling
  • 25:00 The two papers
  • 33:00 Biological anchors
  • 39:00 Timing of projects
  • 43:00 The trade-off
  • 47:45 Wrap-up
Towards Data Science - 107. Kevin Hu - Data observability and why it matters

12/15/21 • 49 min

Imagine for a minute that you’re running a profitable business, and that part of your sales strategy is to send the occasional mass email to people who’ve signed up to be on your mailing list. For a while, this approach leads to a reliable flow of new sales, but then one day, that abruptly stops. What happened?

You pore over logs, looking for an explanation, but it turns out that the problem wasn’t with your software; it was with your data. Maybe the new intern accidentally added a character to every email address in your dataset, or shuffled the names on your mailing list so that Christina got a message addressed to “John”, or vice-versa. Versions of this story happen surprisingly often, and when they happen, the cost can be significant: lost revenue, disappointed customers, or worse — an irreversible loss of trust.

Today, entire products are being built on top of datasets that aren’t monitored properly for critical failures — and an increasing number of those products are operating in high-stakes situations. That’s why data observability is so important: the ability to track the origin, transformations and characteristics of mission-critical data to detect problems before they lead to downstream harm.
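A bare-bones version of such a check might look like the sketch below (illustrative only; this is not Metaplane's actual product or API):

```python
import re

EMAIL = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

def profile_emails(rows):
    """Compute simple health metrics for a batch's email column."""
    valid = sum(1 for r in rows if EMAIL.match(r.get("email", "")))
    return {"row_count": len(rows), "valid_email_rate": valid / max(len(rows), 1)}

baseline = profile_emails([
    {"email": "christina@example.com"},
    {"email": "john@example.com"},
])

# The intern's bad deploy appends a stray character to every address:
broken = profile_emails([
    {"email": "christina@example.com,"},
    {"email": "john@example.com,"},
])

# Alert when a batch degrades sharply relative to its baseline,
# before any customer-facing email goes out.
alert = broken["valid_email_rate"] < 0.9 * baseline["valid_email_rate"]
```

Real observability tools track many such metrics (freshness, volume, schema, distributions) and learn the baselines automatically, but the core move is the same: profile the data continuously and alert on drift.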

And it’s also why we’ll be talking to Kevin Hu, the co-founder and CEO of Metaplane, one of the world’s first data observability startups. Kevin has a deep understanding of data pipelines, and the problems that can pop up if they aren’t properly monitored. He joined me to talk about data observability, why it matters, and how it might be connected to responsible AI on this episode of the TDS podcast.

Intro music:

➞ Artist: Ron Gelinas

➞ Track Title: Daybreak Chill Blend (original mix)

➞ Link to Track: https://youtu.be/d8Y2sKIgFWc

Chapters:

  • 0:00 Intro
  • 2:00 What is data observability?
  • 8:20 Difference between a dataset’s internal and external characteristics
  • 12:20 Why is data so difficult to log?
  • 17:15 Tracing back models
  • 22:00 Algorithmic analysis of data
  • 26:30 Data ops in five years
  • 33:20 Relation to cutting-edge AI work
  • 39:25 Software engineering and startup funding
  • 42:05 Problems on a smaller scale
  • 46:40 Future data ops problems to solve
  • 48:45 Wrap-up
Towards Data Science - 81. Nicolas Miailhe - AI risk is a global problem

04/28/21 • 56 min

In December 1938, a frustrated nuclear physicist named Leo Szilard wrote a letter to the British Admiralty telling them that he had given up on his greatest invention — the nuclear chain reaction.

"The idea of a nuclear chain reaction won’t work. There’s no need to keep this patent secret, and indeed there’s no need to keep this patent too. It won’t work." — Leo Szilard

What Szilard didn’t know when he licked the envelope was that, on that very same day, a research team in Berlin had just split the uranium atom for the very first time. Within a year, the Manhattan Project would begin, and by 1945, the first atomic bomb was dropped on the Japanese city of Hiroshima. It was only four years later — barely a decade after Szilard had written off the idea as impossible — that Russia successfully tested its first atomic weapon, kicking off a global nuclear arms race that continues in various forms to this day.

It’s a surprisingly short jump from cutting edge technology to global-scale risk. But although the nuclear story is a high-profile example of this kind of leap, it’s far from the only one. Today, many see artificial intelligence as a class of technology whose development will lead to global risks — and as a result, as a technology that needs to be managed globally. In much the same way that international treaties have allowed us to reduce the risk of nuclear war, we may need global coordination around AI to mitigate its potential negative impacts.

One of the world’s leading experts on AI’s global coordination problem is Nicolas Miailhe. Nicolas is the co-founder of The Future Society, a global nonprofit whose primary focus is encouraging responsible adoption of AI, and ensuring that countries around the world come to a common understanding of the risks associated with it. Nicolas is a veteran of the prestigious Harvard Kennedy School of Government, an appointed expert to the Global Partnership on AI, and advises cities, governments, and international organizations on AI policy.


There’s been a lot of talk about the future direction of data science, and for good reason. The space is finally coming into its own, and as the Wild West phase of the mid-2010s well and truly comes to an end, there’s keen interest among data professionals to stay ahead of the curve, and understand what their jobs are likely to look like 2, 5 and 10 years down the road.

And amid all the noise, one trend is clearly emerging, and has already materialized to a significant degree: as more and more of the data science lifecycle is automated or abstracted away, data professionals can afford to spend more time adding value to companies in more strategic ways. One way to do this is to invest your time deepening your subject matter expertise, and mastering the business side of the equation. Another is to double down on technical skills, and focus on owning more and more of the data stack — particularly the productionization and deployment stages.

My guest for today’s episode of the Towards Data Science podcast has been down both of these paths, first as a business-focused data scientist at Spotify, where he spent his time defining business metrics and evaluating products, and second as a data engineer at Better.com, where his focus has shifted towards productionization and engineering. During our chat, Kenny shared his insights about the relative merits of each approach, and the future of the field.


This special episode of the Towards Data Science podcast is a cross-over with our friends over at the Banana Data podcast. We’ll be zooming out and talking about some of the most important current challenges AI creates for humanity, and some of the likely future directions the technology might take.

Towards Data Science - 126. JR King - Does the brain run on deep learning?

09/14/22 • 55 min

Deep learning models — transformers in particular — are defining the cutting edge of AI today. They’re based on an architecture called an artificial neural network, as you probably already know if you’re a regular Towards Data Science reader. And if you are, then you might also already know that as their name suggests, artificial neural networks were inspired by the structure and function of biological neural networks, like those that handle information processing in our brains.

So it’s a natural question to ask: how far does that analogy go? Today, deep neural networks can master an increasingly wide range of skills that were historically unique to humans — creating images, using language, planning, playing video games, and so on. Could that mean that these systems are processing information like the human brain, too?

To explore that question, we’ll be talking to JR King, a CNRS researcher at the Ecole Normale Supérieure, affiliated with Meta AI, where he leads the Brain & AI group. There, he works on identifying the computational basis of human intelligence, with a focus on language. JR is a remarkably insightful thinker, who’s spent a lot of time studying biological intelligence, where it comes from, and how it maps onto artificial intelligence. And he joined me to explore the fascinating intersection of biological and artificial information processing on this episode of the TDS podcast.

***

Intro music:

Artist: Ron Gelinas

Track Title: Daybreak Chill Blend (original mix)

Link to Track: https://youtu.be/d8Y2sKIgFWc

***

Chapters:
  • 2:30 What is JR’s day-to-day?
  • 5:00 AI and neuroscience
  • 12:15 Quality of signals within the research
  • 21:30 Universality of structures
  • 28:45 What makes up a brain?
  • 37:00 Scaling AI systems
  • 43:30 Growth of the human brain
  • 48:45 Observing certain overlaps
  • 55:30 Wrap-up


FAQ

How many episodes does Towards Data Science have?

Towards Data Science currently has 131 episodes available.

What topics does Towards Data Science cover?

The podcast is about Podcasts and Technology.

What is the most popular episode on Towards Data Science?

The episode title '44. Jakob Foerster - Multi-agent reinforcement learning and the future of AI' is the most popular.

What is the average episode length on Towards Data Science?

The average episode length on Towards Data Science is 50 minutes.

How often are episodes of Towards Data Science released?

Episodes of Towards Data Science are typically released every 7 days.

When was the first episode of Towards Data Science?

The first episode of Towards Data Science was released on Jul 16, 2019.
