Data Science Decoded

Mike E

We discuss seminal mathematical papers (sometimes really old 😎 ) that have shaped and established the fields of machine learning and data science as we know them today. The goal of the podcast is to introduce you to the evolution of these fields from a mathematical and slightly philosophical perspective. We will discuss the contribution of these papers, not just from pure a math aspect but also how they influenced the discourse in the field, which areas were opened up as a result, and so on. Our podcast episodes are also available on our youtube: https://youtu.be/wThcXx_vXjQ?si=vnMfs

#8 in the Top 100 mathematics All time chart

1 Listener

All episodes

Best episodes

Seasons

Top 10 Data Science Decoded Episodes

Goodpods has curated a list of the 10 best Data Science Decoded episodes, ranked by the number of listens and likes each episode have garnered from our listeners. If you are listening to Data Science Decoded for the first time, there's no better place to start than with one of these standout episodes. If you are a fan of the show, vote for your favorite Data Science Decoded episode by adding your comments to the episode page.

Data Science #18 - The k-nearest neighbors algorithm (1951)

Data Science Decoded

11/25/24 • 44 min

In the 18th episode we go over the original k-nearest neighbors algorithm; Fix, Evelyn; Hodges, Joseph L. (1951). Discriminatory Analysis. Nonparametric Discrimination: Consistency Properties USAF School of Aviation Medicine, Randolph Field, Texas They introduces a nonparametric method for classifying a new observation z z as belonging to one of two distributions, F F or G G, without assuming specific parametric forms. Using k k-nearest neighbor density estimates, the paper implements a likelihood ratio test for classification and rigorously proves the method's consistency.

The work is a precursor to the modern k k-Nearest Neighbors (KNN) algorithm and established nonparametric approaches as viable alternatives to parametric methods. Its focus on consistency and data-driven learning influenced many modern machine learning techniques, including kernel density estimation and decision trees.

This paper's impact on data science is significant, introducing concepts like neighborhood-based learning and flexible discrimination.

These ideas underpin algorithms widely used today in healthcare, finance, and artificial intelligence, where robust and interpretable models are critical.

1 Listener

Data Science #17 - The Monte Carlo Algorithm (1949)

Data Science Decoded

11/18/24 • 38 min

We review the original Monte Carlo paper from 1949 by Metropolis, Nicholas, and Stanislaw Ulam. "The monte carlo method." Journal of the American statistical association 44.247 (1949): 335-341. The Monte Carlo method uses random sampling to approximate solutions for problems that are too complex for analytical methods, such as integration, optimization, and simulation. Its power lies in leveraging randomness to solve high-dimensional and nonlinear problems, making it a fundamental tool in computational science. In modern data science and AI, Monte Carlo drives key techniques like Bayesian inference (via MCMC) for probabilistic modeling, reinforcement learning for policy evaluation, and uncertainty quantification in predictions. It is essential for handling intractable computations in machine learning and AI systems. By combining scalability and flexibility, Monte Carlo methods enable breakthroughs in areas like natural language processing, computer vision, and autonomous systems. Its ability to approximate solutions underpins advancements in probabilistic reasoning, decision-making, and optimization in the era of AI and big data.

Data Science #22 - The theory of dynamic programming, Paper review 1954

Data Science Decoded

01/07/25 • 47 min

We review Richard Bellman's "The Theory of Dynamic Programming" paper from 1954 which revolutionized how we approach complex decision-making problems through two key innovations. First, his Principle of Optimality established that optimal solutions have a recursive structure - each sub-decision must be optimal given the state resulting from previous decisions. Second, he introduced the concept of focusing on immediate states rather than complete historical sequences, providing a practical way to tackle what he termed the "curse of dimensionality." These foundational ideas directly shaped modern artificial intelligence, particularly reinforcement learning. The mathematical framework Bellman developed - breaking complex problems into smaller, manageable subproblems and making decisions based on current state - underpins many contemporary AI achievements, from game-playing agents like AlphaGo to autonomous systems and robotics. His work essentially created the theoretical backbone that enables modern AI systems to handle sequential decision-making under uncertainty. The principles established in this 1954 paper continue to influence how we design AI systems today, particularly in reinforcement learning and neural network architectures dealing with sequential decision problems.

Data Science #15 - The First Decision Tree Algorithm (1963)

Data Science Decoded

10/28/24 • 36 min

the 15th episode we went over the paper "Problems in the Analysis of Survey Data, and a Proposal" by James N. Morgan and John A. Sonquist from 1963. It highlights seven key issues in analyzing complex survey data, such as high dimensionality, categorical variables, measurement errors, sample variability, intercorrelations, interaction effects, and causal chains.

These challenges complicate efforts to draw meaningful conclusions about relationships between factors like income, education, and occupation. To address these problems, the authors propose a method that sequentially splits data by identifying features that reduce unexplained variance, much like modern decision trees.

The method focuses on maximizing explained variance (SSE), capturing interaction effects, and accounting for sample variability.

It handles both categorical and continuous variables while respecting logical causal priorities. This paper has had a significant influence on modern data science and AI, laying the groundwork for decision trees, CART, random forests, and boosting algorithms.

Its method of splitting data to reduce error, handle interactions, and respect feature hierarchies is foundational in many machine learning models used today. Link to full paper at our website:

https://datasciencedecodedpodcast.com/episode-15-the-first-decision-tree-algorithm-1963

Data Science #10 - The original principal component analysis (PCA) paper by Harold Hotelling (1935)

Data Science Decoded

09/12/24 • 55 min

Hotelling, Harold. "Analysis of a complex of statistical variables into principal components." Journal of educational psychology 24.6 (1933): 417.

This seminal work by Harold Hotelling on PCA remains highly relevant to modern data science because PCA is still widely used for dimensionality reduction, feature extraction, and data visualization. The foundational concepts of eigenvalue decomposition and maximizing variance in orthogonal directions form the backbone of PCA, which is now automated through numerical methods such as Singular Value Decomposition (SVD). Modern PCA handles much larger datasets with advanced variants (e.g., Kernel PCA, Sparse PCA), but the core ideas from the paper—identifying and interpreting key components to reduce dimensionality while preserving the most important information—are still crucial in handling high-dimensional data efficiently today.

Data Science #12 - Kolmogorov complexity paper review (1965) - Part 1

Data Science Decoded

09/28/24 • 38 min

In the 12th episode we review the first part of Kolmogorov's seminal paper:

"3 approaches to the quantitative definition of information’." Problems of information transmission 1.1 (1965): 1-7. The paper introduces algorithmic complexity (or Kolmogorov complexity), which measures the amount of information in an object based on the length of the shortest program that can describe it.

This shifts focus from Shannon entropy, which measures uncertainty probabilistically, to understanding the complexity of structured objects.

Kolmogorov argues that systems like texts or biological data, governed by rules and patterns, are better analyzed by their compressibility—how efficiently they can be described—rather than by random probabilistic models. In modern data science and AI, these ideas are crucial. Machine learning models, like neural networks, aim to compress data into efficient representations to generalize and predict. Kolmogorov complexity underpins the idea of minimizing model complexity while preserving key information, which is essential for preventing overfitting and improving generalization.

In AI, tasks such as text generation and data compression directly apply Kolmogorov's concept of finding the most compact representation, making his work foundational for building efficient, powerful models. This is part 1 out of 2 episodes covering this paper

Data Science #23- The Markov Chain Monte Carl MCMC Paper review (1953)

Data Science Decoded

01/14/25 • 37 min

In the 23rd episode we review the The 1953 paper Metropolis, Nicholas, et al. "Equation of state calculations by fast computing machines."

The journal of chemical physics 21.6 (1953): 1087-1092 which introduced the Monte Carlo method for simulating molecular systems, particularly focusing on two-dimensional rigid-sphere models.

The study used random sampling to compute equilibrium properties like pressure and density, demonstrating a feasible approach for solving analytically intractable statistical mechanics problems. The work pioneered the Metropolis algorithm, a key development in what later became known as Markov Chain Monte Carlo (MCMC) methods.

By validating the Monte Carlo technique against free volume theories and virial expansions, the study showcased its accuracy and set the stage for MCMC as a powerful tool for exploring complex probability distributions. This breakthrough has had a profound impact on modern AI and ML, where MCMC methods are now central to probabilistic modeling, Bayesian inference, and optimization.

These techniques enable applications like generative models, reinforcement learning, and neural network training, supporting the development of robust, data-driven AI systems.

Youtube: https://www.youtube.com/watch?v=gWOawt7hc88&t

Data Science #7 - "The use of multiple measurements in taxonomic problems." (1936), Fisher RA

Data Science Decoded

08/12/24 • 47 min

This paper introduced linear discriminant analysis(LDA), a statistical technique that revolutionized classification in biology and beyond.

Fisher demonstrated how to use multiple measurements to distinguish between different species of iris flowers, laying the foundation for modern multivariate statistics.

His work showed that combining several characteristics could provide more accurate classification than relying on any single trait.

This paper not only solved a practical problem in botany but also opened up new avenues for statistical analysis across various fields.

Fisher's method became a cornerstone of pattern recognition and machine learning, influencing diverse areas from medical diagnostics to AI.

The iris dataset he used, now known as the "Fisher iris" or "Anderson iris" dataset, remains a popular example in data science education and research.

Data Science #4 - "A Mathematical Theory of Communication" (1948), Shannon, C. E. Part - 2

Data Science Decoded

07/21/24 • 41 min

Shannon, Claude Elwood. "A mathematical theory of communication." The Bell system technical journal 27.3 (1948): 379-423. Part 2/3. The paper fundamentally reshapes how we understand communication. The paper introduces a formal framework for analyzing communication systems, addressing the transmission of information with and without noise. Key concepts include the definition of information entropy, the logarithmic measure of information, and the capacity of communication channels. Shannon demonstrates that information can be efficiently encoded and decoded to maximize the transmission rate while minimizing errors introduced by noise. This work is pivotal today as it underpins digital communication technologies, from data compression to error correction in modern telecommunication systems. Full breakdown of the paper with math and python code is at our website: https://datasciencedecodedpodcast.com... This is the second part out of 3, as the paper is quite long!

Data Science #9 - The Unreasonable Effectiveness of Mathematics in Natural Sciences, Eugene Wigner

Data Science Decoded

09/10/24 • 84 min

In this special episode, Daniel Aronovich joins forces with the 632 nm podcast. In this timeless paper Wigner reflects on how mathematical concepts, often developed independently of any concern for the physical world, turn out to be remarkably effective in describing natural phenomena.

This effectiveness is "unreasonable" because there is no clear reason why abstract mathematical constructs should align so well with the laws governing the universe. Full paper is at our website:

https://datasciencedecodedpodcast.com/episode-9-the-unreasonable-effectiveness-of-mathematics-in-natural-sciences-eugene-wigner-1960

Show more best episodes

FAQ

How many episodes does Data Science Decoded have?

Data Science Decoded currently has 25 episodes available.

What topics does Data Science Decoded cover?

The podcast is about Mathematics, Podcasts and Science.

What is the most popular episode on Data Science Decoded?

The episode title 'Data Science #18 - The k-nearest neighbors algorithm (1951)' is the most popular.

What is the average episode length on Data Science Decoded?

The average episode length on Data Science Decoded is 50 minutes.

How often are episodes of Data Science Decoded released?

Episodes of Data Science Decoded are typically released every 8 days, 1 hour.

When was the first episode of Data Science Decoded?

The first episode of Data Science Decoded was released on Jul 7, 2024.

Show more FAQ

Data Science Decoded

Mike E

Top 10 Data Science Decoded Episodes

FAQ

How many episodes does Data Science Decoded have?

What topics does Data Science Decoded cover?

What is the most popular episode on Data Science Decoded?

What is the average episode length on Data Science Decoded?

How often are episodes of Data Science Decoded released?

When was the first episode of Data Science Decoded?

Comments