
Sepp Hochreiter - LSTM: The Comeback Story?
02/12/25 • 67 min
Sepp Hochreiter, the inventor of LSTM (Long Short-Term Memory) networks, a foundational technology in AI, discusses his journey, the origins of LSTM, and why he believes his latest work, xLSTM, could be the next big thing in AI, particularly for applications like robotics and industrial simulation. He also shares his controversial perspective on Large Language Models (LLMs) and why reasoning is a critical missing piece in current AI systems.
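For the gating discussion in TOC item 2.4, here is a minimal sketch of one sLSTM step with exponential gating, assuming the stabilised formulation from the xLSTM paper linked in the references (all variable names are illustrative, not Hochreiter's code):

import numpy as np

def slstm_step(c, n, m, z, i_pre, f_pre, o):
    # One sLSTM cell step with exponential gating and log-space stabilisation,
    # a sketch of the formulation in the xLSTM paper (arXiv:2405.04517).
    # c: cell state, n: normaliser, m: stabiliser state (log scale),
    # z: candidate input, i_pre/f_pre: gate pre-activations, o: output gate in (0, 1).
    m_new = np.maximum(f_pre + m, i_pre)   # bound the exponents
    i = np.exp(i_pre - m_new)              # stabilised exponential input gate
    f = np.exp(f_pre + m - m_new)          # stabilised exponential forget gate
    c_new = f * c + i * z                  # gated cell update
    n_new = f * n + i                      # normaliser accumulates gate mass
    h = o * (c_new / n_new)                # normalised hidden output
    return c_new, n_new, m_new, h

The stabiliser m keeps the exponential gates in a numerically safe range, which is the practical trick that makes exponential (rather than sigmoid) gating trainable.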
SPONSOR MESSAGES:
***
CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting!
https://centml.ai/pricing/
Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers, and host events in Zurich.
Go to https://tufalabs.ai/
***
TRANSCRIPT AND BACKGROUND READING:
https://www.dropbox.com/scl/fi/n1vzm79t3uuss8xyinxzo/SEPPH.pdf?rlkey=fp7gwaopjk17uyvgjxekxrh5v&dl=0
Prof. Sepp Hochreiter
https://www.nx-ai.com/
https://x.com/hochreitersepp
https://scholar.google.at/citations?user=tvUH3WMAAAAJ&hl=en
TOC:
1. LLM Evolution and Reasoning Capabilities
[00:00:00] 1.1 LLM Capabilities and Limitations Debate
[00:03:16] 1.2 Program Generation and Reasoning in AI Systems
[00:06:30] 1.3 Human vs AI Reasoning Comparison
[00:09:59] 1.4 New Research Initiatives and Hybrid Approaches
2. LSTM Technical Architecture
[00:13:18] 2.1 LSTM Development History and Technical Background
[00:20:38] 2.2 LSTM vs RNN Architecture and Computational Complexity
[00:25:10] 2.3 xLSTM Architecture and Flash Attention Comparison
[00:30:51] 2.4 Evolution of Gating Mechanisms from Sigmoid to Exponential
3. Industrial Applications and Neuro-Symbolic AI
[00:40:35] 3.1 Industrial Applications and Fixed Memory Advantages
[00:42:31] 3.2 Neuro-Symbolic Integration and Pi AI Project
[00:46:00] 3.3 Integration of Symbolic and Neural AI Approaches
[00:51:29] 3.4 Evolution of AI Paradigms and System Thinking
[00:54:55] 3.5 AI Reasoning and Human Intelligence Comparison
[00:58:12] 3.6 NXAI Company and Industrial AI Applications
REFS:
[00:00:15] Seminal LSTM paper establishing Hochreiter's expertise (Hochreiter & Schmidhuber)
https://direct.mit.edu/neco/article-abstract/9/8/1735/6109/Long-Short-Term-Memory
[00:04:20] Kolmogorov complexity and program composition limitations (Kolmogorov)
https://link.springer.com/article/10.1007/BF02478259
[00:07:10] Limitations of LLM mathematical reasoning and symbolic integration (Various Authors)
https://www.arxiv.org/pdf/2502.03671
[00:09:05] AlphaGo’s Move 37 demonstrating creative AI (Google DeepMind)
https://deepmind.google/research/breakthroughs/alphago/
[00:10:15] New AI research lab in Zurich for fundamental LLM research (Benjamin Crouzier)
https://tufalabs.ai
[00:19:40] Introduction of xLSTM with exponential gating (Beck, Hochreiter, et al.)
https://arxiv.org/abs/2405.04517
[00:22:55] FlashAttention: fast & memory-efficient attention (Tri Dao et al.)
https://arxiv.org/abs/2205.14135
[00:31:00] Historical use of sigmoid/tanh activations in the 1990s (James A. McCaffrey)
https://visualstudiomagazine.com/articles/2015/06/01/alternative-activation-functions.aspx
[00:36:10] Mamba state-space model architecture (Gu & Dao)
https://arxiv.org/abs/2312.00752
[00:46:00] Austria’s Pi AI project integrating symbolic & neural AI (Hochreiter et al.)
https://www.jku.at/en/institute-of-machine-learning/research/projects/
[00:48:10] Neuro-symbolic integration challenges in language models (Diego Calanzone et al.)
https://openreview.net/forum?id=7PGluppo4k
[00:49:30] JKU Linz’s historical and neuro-symbolic research (Sepp Hochreiter)
https://www.jku.at/en/news-events/news/detail/news/bilaterale-ki-projekt-unter-leitung-der-jku-erhaelt-fwf-cluster-of-excellence/
YT: https://www.youtube.com/watch?v=8u2pW2zZLCs
<truncated, see show notes/YT>
Previous Episode

Want to Understand Neural Networks? Think Elastic Origami! - Prof. Randall Balestriero
Professor Randall Balestriero joins us to discuss neural network geometry, spline theory, and emerging phenomena in deep learning, based on research presented at ICML. Topics include the delayed emergence of adversarial robustness in neural networks ("grokking"), geometric interpretations of neural networks via spline theory, and challenges in reconstruction learning. We also cover geometric analysis of Large Language Models (LLMs) for toxicity detection and the relationship between intrinsic dimensionality and model control in RLHF.
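As a pointer for the spline discussion, here is a minimal sketch (not Prof. Balestriero's code) of the core observation: a ReLU network is a continuous piecewise-affine spline, and the binary activation pattern at an input identifies the affine region it lies in:

import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 2)), rng.normal(size=8)   # toy 2-8-1 ReLU network
W2, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)

def region_code(x):
    # Binary activation pattern: equal codes mean the same affine region.
    return tuple((W1 @ x + b1 > 0).astype(int))

def local_affine(x):
    # Within x's region the network computes exactly A @ x + c.
    D = np.diag((W1 @ x + b1 > 0).astype(float))   # mask of active ReLUs
    A = W2 @ D @ W1
    c = W2 @ D @ b1 + b2
    return A, c

Counting distinct region codes over a grid of inputs is one way to visualise how the input-space partition evolves during training, the kind of geometry SplineCam makes exact.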
SPONSOR MESSAGES:
***
CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.
https://centml.ai/pricing/
Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events?
Go to https://tufalabs.ai/
***
Randall Balestriero
https://x.com/randall_balestr
https://randallbalestriero.github.io/
Show notes and transcript: https://www.dropbox.com/scl/fi/3lufge4upq5gy0ug75j4a/RANDALLSHOW.pdf?rlkey=nbemgpa0jhawt1e86rx7372e4&dl=0
TOC:
Introduction
00:00:00: Introduction
Neural Network Geometry and Spline Theory
00:01:41: Neural Network Geometry and Spline Theory
00:07:41: Deep Networks Always Grok
00:11:39: Grokking and Adversarial Robustness
00:16:09: Double Descent and Catastrophic Forgetting
Reconstruction Learning
00:18:49: Reconstruction Learning
00:24:15: Frequency Bias in Neural Networks
Geometric Analysis of Neural Networks
00:29:02: Geometric Analysis of Neural Networks
00:34:41: Adversarial Examples and Region Concentration
LLM Safety and Geometric Analysis
00:40:05: LLM Safety and Geometric Analysis
00:46:11: Toxicity Detection in LLMs
00:52:24: Intrinsic Dimensionality and Model Control
00:58:07: RLHF and High-Dimensional Spaces
Conclusion
01:02:13: Neural Tangent Kernel
01:08:07: Conclusion
REFS:
[00:01:35] Humayun – Deep network geometry & input space partitioning
https://arxiv.org/html/2408.04809v1
[00:03:55] Balestriero & Paris – Linking deep networks to adaptive spline operators
https://proceedings.mlr.press/v80/balestriero18b/balestriero18b.pdf
[00:13:55] Song et al. – Gradient-based white-box adversarial attacks
https://arxiv.org/abs/2012.14965
[00:16:05] Humayun, Balestriero & Baraniuk – Grokking phenomenon & emergent robustness
https://arxiv.org/abs/2402.15555
[00:18:25] Humayun – Training dynamics & double descent via linear region evolution
https://arxiv.org/abs/2310.12977
[00:20:15] Balestriero – Power diagram partitions in DNN decision boundaries
https://arxiv.org/abs/1905.08443
[00:23:00] Frankle & Carbin – Lottery Ticket Hypothesis for network pruning
https://arxiv.org/abs/1803.03635
[00:24:00] Belkin et al. – Double descent phenomenon in modern ML
https://arxiv.org/abs/1812.11118
[00:25:55] Balestriero et al. – Batch normalization’s regularization effects
https://arxiv.org/pdf/2209.14778
[00:29:35] EU – EU AI Act 2024 with compute restrictions
https://www.lw.com/admin/upload/SiteAttachments/EU-AI-Act-Navigating-a-Brave-New-World.pdf
[00:39:30] Humayun, Balestriero & Baraniuk – SplineCam: Visualizing deep network geometry
https://openaccess.thecvf.com/content/CVPR2023/papers/Humayun_SplineCam_Exact_Visualization_and_Characterization_of_Deep_Network_Geometry_and_CVPR_2023_paper.pdf
[00:40:40] Carlini – Trade-offs between adversarial robustness and accuracy
https://arxiv.org/pdf/2407.20099
[00:44:55] Balestriero & LeCun – Limitations of reconstruction-based learning methods
https://openreview.net/forum?id=ez7w0Ss4g9
(truncated, see shownotes PDF)
Next Episode

Daniel Franzen & Jan Disselhoff - ARC Prize 2024 winners
Daniel Franzen and Jan Disselhoff, the "ARChitects", are the official winners of the ARC Prize 2024. Filmed at Tufa Labs in Zurich, they revealed how they achieved a remarkable 53.5% accuracy by creatively utilising large language models (LLMs) in new ways. Discover their innovative techniques, including depth-first search for token selection, test-time training, and a novel augmentation-based validation system. Their results were extremely surprising.
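As a rough illustration of the depth-first token selection mentioned above (see also TOC item 3.1), here is a minimal sketch, not the ARChitects' actual code; step is a hypothetical stand-in for a model call returning (token, logprob) pairs:

import math

def dfs_decode(prefix, logp, step, eos, max_len=64, min_logp=math.log(0.10)):
    # Depth-first search over token continuations, pruning any branch whose
    # cumulative log-probability drops below the threshold.
    if prefix and prefix[-1] == eos:
        yield prefix, logp               # a complete candidate sequence
        return
    if len(prefix) >= max_len:
        return                           # depth guard
    for tok, lp in step(prefix):         # hypothetical model call
        if logp + lp >= min_logp:        # probability-threshold pruning
            yield from dfs_decode(prefix + [tok], logp + lp, step, eos, max_len, min_logp)

Unlike beam search with a fixed width, this enumerates every completion whose cumulative probability stays above the threshold; the resulting candidates can then be re-ranked, for example by the augmentation-based validation the episode describes.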
SPONSOR MESSAGES:
***
CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting!
https://centml.ai/pricing/
Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers, and host events in Zurich.
Go to https://tufalabs.ai/
***
Jan Disselhoff
https://www.linkedin.com/in/jan-disselhoff-1423a2240/
Daniel Franzen
https://github.com/da-fr
ARC Prize: http://arcprize.org/
TRANSCRIPT AND BACKGROUND READING:
https://www.dropbox.com/scl/fi/utkn2i1ma79fn6an4yvjw/ARCHitects.pdf?rlkey=67pe38mtss7oyhjk2ad0d2aza&dl=0
TOC
1. Solution Architecture and Strategy Overview
[00:00:00] 1.1 Initial Solution Overview and Model Architecture
[00:04:25] 1.2 LLM Capabilities and Dataset Approach
[00:10:51] 1.3 Test-Time Training and Data Augmentation Strategies
[00:14:08] 1.4 Sampling Methods and Search Implementation
[00:17:52] 1.5 ARC vs Language Model Context Comparison
2. LLM Search and Model Implementation
[00:21:53] 2.1 LLM-Guided Search Approaches and Solution Validation
[00:27:04] 2.2 Symmetry Augmentation and Model Architecture
[00:30:11] 2.3 Model Intelligence Characteristics and Performance
[00:37:23] 2.4 Tokenization and Numerical Processing Challenges
3. Advanced Training and Optimization
[00:45:15] 3.1 DFS Token Selection and Probability Thresholds
[00:49:41] 3.2 Model Size and Fine-tuning Performance Trade-offs
[00:53:07] 3.3 LoRA Implementation and Catastrophic Forgetting Prevention
[00:56:10] 3.4 Training Infrastructure and Optimization Experiments
[01:02:34] 3.5 Search Tree Analysis and Entropy Distribution Patterns
REFS
[00:01:05] Winning ARC 2024 solution using 12B param model, Franzen, Disselhoff, Hartmann
https://github.com/da-fr/arc-prize-2024/blob/main/the_architects.pdf
[00:03:40] Robustness of analogical reasoning in LLMs, Melanie Mitchell
https://arxiv.org/html/2411.14215
[00:07:50] Re-ARC dataset generator for ARC task variations, Michael Hodel
https://github.com/michaelhodel/re-arc
[00:15:00] Analysis of search methods in LLMs (greedy, beam, DFS), Chen et al.
https://arxiv.org/html/2408.00724v2
[00:16:55] Language model reachability space exploration, University of Toronto
https://www.youtube.com/watch?v=Bpgloy1dDn0
[00:22:30] GPT-4 guided code solutions for ARC tasks, Ryan Greenblatt
https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt
[00:41:20] GPT tokenization approach for numbers, OpenAI
https://platform.openai.com/docs/guides/text-generation/tokenizer-examples
[00:46:25] DFS in AI search strategies, Russell & Norvig
https://www.amazon.com/Artificial-Intelligence-Modern-Approach-4th/dp/0134610997
[00:53:10] Paper on catastrophic forgetting in neural networks, Kirkpatrick et al.
https://www.pnas.org/doi/10.1073/pnas.1611835114
[00:54:00] LoRA for efficient fine-tuning of LLMs, Hu et al.
https://arxiv.org/abs/2106.09685
[00:57:20] NVIDIA H100 Tensor Core GPU specs, NVIDIA
https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/
[01:04:55] Original MCTS in computer Go, Yifan Jin
https://stanford.edu/~rezab/classes/cme323/S15/projects/montecarlo_search_tree_report.pdf