Are mental models the key to the next stage of AI?

Despite its spectacular results, particularly since ChatGPT’s release in 2022, AI faces a significant structural limitation: it relies on superficial statistical correlations rather than a deep understanding of how reality works. It is incapable of true causal reasoning, which leads to logical hallucinations and an inability to plan complex tasks over the long term. This lack of internal structure also makes learning extremely inefficient: vast amounts of data are needed where a human would require only a few examples to understand and predict a new situation. It is precisely this idea of “internal structure” that could enable the next big step in AI: the use of mental models.

The concept of mental models originated with the work of Kenneth Craik, a relatively unknown Scottish psychologist. His 1943 book, The Nature of Explanation, revolutionized our understanding of how the mind works. According to Craik, thinking is not a direct reaction to stimuli (contra the behaviorists) but rather the manipulation of a small-scale mental model of reality, or “model of the world.” He believed that the brain functions like a machine that translates external events into symbols, transforms them through inference, and retranslates the results into actions or predictions. This fundamental contribution makes it possible to conceive of intelligence as an internal simulation mechanism that allows an organism to anticipate the future without exposing itself to the immediate dangers of the physical world.

This perspective defines human cognition as an economy of effort and risk. With this internal representation of the world, human beings can test hypotheses without acting. Thought is not merely an accumulation of knowledge but rather a dynamic ability to structure cause-and-effect relationships. Unfortunately, Craik died shortly after publishing his book and was unable to continue his work. In the 1980s, psychologist Philip Johnson-Laird updated this theory in his book Mental Models. Johnson-Laird demonstrated that our logical errors are not failures of intelligence but rather limitations of our working memory: we fail when we are unable to simulate all the alternative models of a given situation. Neurologist Antonio Damasio also highlights the importance of models in his remarkable book Descartes’ Error (he refers to them as images of the world). I often use this concept to understand decision-making in uncertain situations and the difficulty of change.

The current limitations of AI

As Yann LeCun, a pioneer in the field and former chief AI scientist at Meta, points out, the lack of internal models limits current AI built on large language models (LLMs). They lack “common sense.” In essence, language alone is insufficient for understanding the world; a knowledge structure is needed. By adopting Craik’s approach, AI shifts from performing complex statistical calculations to constructing a structured representation. This addresses the problem of fragility: with a model of the world, an AI can simulate situations it has never encountered, plan complex actions, and understand that a falling cup will break without having read it a thousand times. We transcend the limitations of pure induction. The evolution toward these models also bridges the gap between raw data and real understanding. While current AIs are content to correlate pixels or words, next-generation architectures attempt to capture the deep structure of reality.
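To make the contrast concrete, here is a deliberately toy sketch in Python (the names, numbers, and rules are all invented for illustration, not drawn from any actual system): the first function can only repeat associations it has already seen, while the second derives an outcome from a small causal rule, in the spirit of Craik’s internal simulation.

```python
# Toy illustration only: a purely statistical predictor versus a tiny
# hand-written "world model" that simulates a falling cup.
# All names, thresholds, and the corpus below are made up.

def statistical_predictor(sentence: str, corpus: dict) -> str:
    """Return the continuation most often seen in the corpus, if any."""
    return corpus.get(sentence, "unknown")

def world_model_predictor(height_m: float, material: str) -> str:
    """Simulate the outcome from simple causal rules instead of lookups."""
    fragile = material in {"glass", "ceramic"}
    impact_speed = (2 * 9.81 * height_m) ** 0.5  # free fall, no air resistance
    return "breaks" if fragile and impact_speed > 2.0 else "survives"

corpus = {"the cup falls off the table": "it breaks"}
print(statistical_predictor("the cup falls off the table", corpus))   # seen before: "it breaks"
print(statistical_predictor("the vase falls off the shelf", corpus))  # never seen: "unknown"
print(world_model_predictor(height_m=0.8, material="ceramic"))        # simulated: "breaks"
print(world_model_predictor(height_m=0.05, material="steel"))         # simulated: "survives"
```

The point of the sketch is not the physics but the structure: the second function can answer questions it was never shown, because the answer is generated by an internal model rather than retrieved from past data.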

However, this approach raises several questions. First, how can such models be created automatically and made effective? Second, a model is by definition subjective. While the law of gravity is easy to encode causally, economic models are not as simple, because they rest not on physical laws but on values and beliefs, let alone subjects in the humanities. Is there a risk of falling into the scientistic illusion that everything can be objectively formalized? And if models are subjective, who is the subject? Can models be created independently of the notion of personality? Finally, Craik’s model is rooted in biology and sensory experience, whereas AI models are mathematical constructs. Can a machine acquire true causal intuition without a body to experience physical reality? Damasio would answer no because, for him, consciousness requires a body. Fortunately, significant progress thanks to models does not require answering these questions beforehand.

Paradigm shift

Innovation progresses in stages through paradigm shifts. Without such a shift, progress can remain blocked, as is the case with Alzheimer’s disease or NATO’s strategy after Ukraine. AI has experienced such a blockage. For a long time, the paradigm was the expert system, which assumed that knowledge could be formalized as explicit rules. For example: “If the patient’s temperature is above 38 °C and they have a rash, then consider disease X.” However, this approach proved too brittle and led to the near-total failure of AI in the 1990s. Machine learning succeeded it and is the current paradigm. It has been remarkably successful, but it is already reaching its limits, and these limits cannot be overcome by doing more of the same. Adopting mental models would represent a third paradigm and a significant advance. It is difficult to say whether it will succeed, but many are already betting on it, including Yann LeCun with his new startup and Google with its Project Genie.
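For readers who have never seen one, the diagnostic rule quoted above might be written, very schematically, like this (a minimal sketch; the disease, thresholds, and rule set are purely illustrative): every piece of knowledge has to be hand-coded as an explicit condition, which is precisely why such systems proved so brittle outside narrow domains.

```python
# Minimal sketch of the expert-system paradigm described above.
# The disease name, threshold, and rules are illustrative only.

def diagnose(temperature_c: float, has_rash: bool) -> str:
    # Knowledge encoded by hand as explicit IF-THEN rules.
    if temperature_c > 38.0 and has_rash:
        return "consider disease X"
    if temperature_c > 38.0:
        return "consider a common infection"
    return "no rule applies"

print(diagnose(38.6, has_rash=True))   # -> consider disease X
print(diagnose(37.0, has_rash=False))  # -> no rule applies
```

Machine learning replaced the hand-written rules with parameters fitted to data; the mental-model paradigm would go one step further and have the system maintain an internal, manipulable representation of the situation itself.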

✚ If you want to know more about mental models and their role in management, browse this site: I have published numerous articles on the topic.

🔎 Original source for this article: The Economist, “I can show you the world.”

🇫🇷 A version in French of this article is available here.

📬 If you enjoyed this article, feel free to subscribe to be notified of future ones.
