Epiplexity, Thinking, and AGI
Part 1: What’s Hiding in Your Data
Posted January 2026
On social media, I saw the announcement of a new paper[1] in Information Theory. I believe it is a major breakthrough with significant ramifications for AI. It is perhaps foundational (pun intended), and I expect it will win prizes. At the same time, the paper is not accessible to anyone without a graduate degree in Mathematics, and it's science, not business, so there's some space for me to add a bit. But I want to be clear how much I admire the work in this paper - it's really fantastic.
The paper begins with three paradoxes where the observed behavior of AI models does not follow the predictions of standard Information Theory. One of these - generalization to out-of-distribution (OOD) queries - is… or was… a favorite of mine. Staying non-technical, Information Theory says that machine learning can only interpolate within the training data; extrapolation outside the distribution of the data is not possible. When discussing AI with people, I used to say, "I know the Math and that can't happen." That ended when I read about epiplexity.
Information Theory measures data "entropy"… a measure of randomness. Machine learning finds where the data is less random. An LLM, for instance, learns correlations that give "next word" probabilities. If the data were completely random, every word would be equally probable as the next word. But that's obviously not so… the information in the data is precisely the fact that the next-word probabilities are not equal. That is what LLM training captures, or at least that's what we're told.
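To make the entropy idea a bit more concrete, here is a minimal sketch in plain Python (not from the paper; the candidate words and their probabilities are made-up assumptions for illustration). It compares a uniform next-word distribution, where every continuation is equally likely, with a skewed one like real text produces; the lower entropy of the skewed case is the non-randomness that training can latch onto.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)) over nonzero p."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Hypothetical continuations of "The cat sat on the ..."
words = ["mat", "sofa", "roof", "moon"]

# If every continuation were equally likely (pure randomness), entropy is maximal.
uniform = [1 / len(words)] * len(words)

# In real text some continuations are far more likely; that skew is the signal.
skewed = [0.70, 0.20, 0.08, 0.02]

print(f"uniform entropy: {entropy(uniform):.2f} bits")  # 2.00 bits
print(f"skewed entropy:  {entropy(skewed):.2f} bits")   # ~1.23 bits
```

The gap between the two numbers is, loosely speaking, what a model can learn from the data itself; the paper's point is about structure that this kind of counting does not see.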
But Information Theory says that the information lives only within the data. Extrapolating outside the data cannot be done any better than random luck. The paradox is that anyone can run an experiment right now to prove that LLMs can indeed extrapolate! Just go to your favorite LLM and ask it to make up a story about X, where X is something you create from whole cloth. "Tell me a story about Glibglops. They are creatures with five arms and three heads that eat sand" will get you something passable. It probably won't be great, or art, or even that creative, but it will be decent text. How can this be?
The answer is that data often has structure underlying it. Sticking with text for training an LLM, note that at one level the text is grouped into documents of given types. A contract is different from a medical record is different from a novel, even though it's all just words. At another level, the text was all created using the language's grammar. Again, this structure is not spelled out in the words themselves, but it does affect which words are more likely to come next in a sentence.
Epistemic Complexity - or epiplexity - is the measure the authors created to capture this underlying structure. They then prove all sorts of useful things, including that it varies with the aperture used to examine the data… just as document types and language grammar sit at different levels.
Most importantly, epiplexity enables a theoretical solution to the paradoxes. In the example above, if the data has underlying structure, the machine can learn it. And… to the extent that this structure extends beyond the dataset, the machine can extrapolate. That is a fantastic result!
At the same time, epiplexity resolves the paradoxes without resorting to explanations that require thinking, abstraction, or reasoning to "emerge" from training. That magic does not happen. Rather, we can observe that machine learning performs data compression by reducing entropy, and structure generalization by exploiting epiplexity.
In Part 2, we will discuss the implications of this breakthrough for AI, AGI, and more.
[1] Marc Finzi, Shikai Qiu, Yiding Jiang, Pavel Izmailov, J. Zico Kolter, and Andrew Gordon Wilson, "From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence," https://arxiv.org/abs/2601.03220.
A colleague of mine ran the paper through ChatGPT, asking it to recast the paper for a slightly less technical audience. The result is a bit more accessible.