This post is a reply to the article "AI Is Evolving — And Changing Our Understanding Of Intelligence" by Blaise Agüera y Arcas and James Manyika in Noema Magazine.
I want to start with a few quotes from that article:
"The knowledge frontier expands when those new learnings are shared — whether they arise from scientific experimentation, discussion or extended creative thinking offline". I thought offline a lot and now I want to share my findings and discuss.
"combine such advances with sideways steps into adjacent fields or paradigms to discover rich new intellectual territory, rethink our assumptions and reimagine our foundations. New paradigms will be needed to develop intelligence that will benefit humanity, advance science, and ultimately help us understand ourselves." From what I see, the currently applied metaphors and assumptions are natural but misleading. In order to adopt a new paradigm for AI, we need to shift our research focus away from those misleading assumptions.
"many considerations that we must get right" and "we must begin by reassessing our assumptions about the nature of computing." Luckily, intelligence leaves a lot of hints about its principles, components, and algorithm. Criticizing, I will offer an alternative paradigm - efficient, meaningful, reliable.
"bring the theoretical foundations of machine learning, neuroscience and even theoretical biology onto a common footing." The paradigm I am going to propose will provide that "common footing."
"learn cumulatively and open-endedly" One of the problems of current models is that adding one more fact updates the fully trained model's weights ever so slightly, even if for humans that fact may change a lot. Not to mention open-endedness, which is considered challenging by most AI researchers. Consider taking a look at my analysis of open-endedness - Opening Open-Endedness.
Now let's "rethink our assumptions" about intelligence.
Nature of cognitive computation
"Mind is computer" is a natural but misleading metaphor. The reason why I say that lies in what most people think about what computation is. AI experts are no exception. We think about arithmetical operations and those derived from them. Intelligence is based on a different kind of computation.
"In the beginning was the Word." Let's pay attention to the very word. From Latin 'intellegentia' - discernment, understanding, intelligence. Discernment, telling the difference is the key to understanding intelligence. Romans knew better.
To differentiate, we need to compare. Comparison is a kind of computation, but not the first one to come to mind. That is why "mind is computer" is a misleading metaphor: it naturally leads us to vector embeddings and matrix multiplications instead of the two most important questions for intelligence, "Does it make a difference?" and "What difference does it make?" Let's take a look at what difference comparisons make.
Lessons from 20 Questions
I mentioned above that intelligence leaves hints about its mechanisms. The game 20 Questions is one such hint. I consider this game the most important one for cognitive scientists. It fascinated Charles Sanders Peirce, who was so occupied with rigorous logic that he missed the game's full potential. The failure of GOFAI, mentioned in the article, hints that intelligence does not rely on rigorous logic. Allen Newell claimed that "You can't play 20 questions with nature and win." I claim that we do play and do win.
Given an unknown object, we can call it a "phenomenon," but that does not give us much. We want to classify/categorize/conceptualize that object. The game proceeds by asking yes/no questions, constructing a binary tree. Note that at each level the question relies on a new property of the object. That property, in turn, relies on the previous answers. The use of properties makes the whole process a "semantic binary search."
There is some freedom in what questions to ask at each level, but there are also key principles to respect. The first question usually is, "Is it tangible?" The answer gives us a category: No - "Abstract," Yes - "Tangible." Note the first lesson: we categorize Tangible phenomena not by the similarities among that category's instances but by the difference from its sibling category. Being tangible (or not) may be framed as a similarity, but it is crucial for performance to view it as a difference.
Let's talk about performance. Recall exemplar theory: it explains categorization by comparing the similarities of an unknown object to a given category. You may also recall Fodor's idea of atomic concepts. Imagine you know one million categories. If each category is atomic and defined by the similarities of its instances, recognition is O(N): one million comparisons, each over multiple properties, followed by selecting the category with the best score. If a category is defined by differences from its sibling category, as in the game 20 Questions, recognition is O(log N): at most about 20 comparisons, one property each, with no need to select a winner at the end. Even people without a Computer Science background will see the difference.
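To make the difference concrete, here is a minimal Python sketch of both strategies. The toy tree, the property names, and the similarity function are my illustrative assumptions, not anything prescribed by the game.

```python
# Exemplar-style recognition: compare the unknown object against every
# known category and pick the best match -- O(N) in the number of categories.
def recognize_exemplar(obj, categories, similarity):
    return max(categories, key=lambda cat: similarity(obj, cat))

# 20-Questions-style recognition: descend a tree of yes/no questions,
# each testing a single property -- O(log N) in the number of categories.
def recognize_tree(obj, node):
    while "question" in node:              # inner node: ask about one property
        answer = obj[node["question"]]     # True/False for that property
        node = node["yes"] if answer else node["no"]
    return node["category"]                # leaf: the recognized category

# Toy tree: the first question of the game, then one more level.
tree = {
    "question": "tangible",
    "no":  {"category": "Abstract"},
    "yes": {
        "question": "edible",
        "no":  {"category": "Tool"},
        "yes": {"category": "Fruit"},
    },
}

print(recognize_tree({"tangible": True, "edible": True}, tree))  # Fruit
```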
Think about atomic concepts. Apples are different from peaches, from fruits, from hammers, from galaxies, from democracy. But clearly these differences differ among themselves. Concepts cannot be independent. Where concepts are concerned, everything is recognized in comparison. Because of that, concepts should not be considered in isolation: always take into account a parent category, the defining feature, and a sibling category. Intelligence leaves hints. Note how dictionaries define words: C is P but with D (a concept is its parent plus a difference from the other parent instances, which fall into sibling categories). In fact, there may be many sibling categories - it is not just "red" and "non-red" colors. Accounting for that makes the procedure non-binary and even more efficient.
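A minimal sketch of that "C is P but with D" structure, assuming my own field names for the parent, the difference, and the siblings:

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    name: str
    parent: "Concept"                 # the category this one specializes
    difference: str                   # the defining feature ("but with D")
    siblings: list = field(default_factory=list)   # the contrast set

    def definition(self) -> str:
        # The dictionary pattern: "C is P but with D."
        return f"{self.name} is {self.parent.name} with {self.difference}"

color = Concept("Color", None, "a visual property")
red = Concept("Red", color, "a long-wavelength hue")
print(red.definition())   # Red is Color with a long-wavelength hue
```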
So, what is a category? It is an important question. Many believe that categories stand for objects. It is not so. Any category is defined by a set of defining features (the answers in the game 20 Questions), but that set is not exhaustive - there are many more features. Some features are immediate: we can ask the category about them and differentiate further. Some features are distant: for example, we cannot ask Tangible about taste; for that we need to go down to Edible. Children do try to eat/taste everything, but for efficiency it is meaningful to delay using that property for differentiation.
We want each new property to divide the remaining possible categories roughly in half. Then, whatever the answer, we are guaranteed to proceed as fast as binary search promises.
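A sketch of that principle, assuming a toy representation of candidate categories as property tables: pick the property whose split is closest to an even halving.

```python
def best_question(candidates, properties):
    # Prefer the property whose "yes" count is closest to half the candidates.
    def imbalance(prop):
        yes = sum(1 for cat in candidates if cat[prop])
        return abs(yes - len(candidates) / 2)   # 0 means a perfect halving
    return min(properties, key=imbalance)

candidates = [
    {"tangible": True,  "edible": True},
    {"tangible": True,  "edible": False},
    {"tangible": False, "edible": False},
    {"tangible": False, "edible": False},
]
print(best_question(candidates, ["tangible", "edible"]))  # tangible (2 vs 2 split)
```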
Consider each property as an axis. Answers in the game break it into ranges. That is to be expected: we cannot rely on point-accurate measurement in categorization. Instances are supposed to be interchangeable, and ranges allow for that. A category, then, is a set of ranges (over different properties, related by the descent from "Phenomenon" according to the game procedure). We may thus treat a category as a filter for objects: some will pass the filter if their respective properties project into the right ranges; some will be filtered out.
In terms of properties, objects are multidimensional, but note that recognition uses only a subset of those properties. Hence, recognition is a dimensionality-reducing procedure. Actions, too, affect only a subset of properties ("rename" affects the "name" property but not the "weight" property, for example) - another instance of dimensionality reduction. Linguistic references also reduce dimensionality. Dimensionality reduction, along with semantic binary search, makes cognitive functions efficient.
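A sketch of a category as a filter of ranges, with made-up property names and bounds. Note that it consults only a subset of the object's properties - the dimensionality reduction at work:

```python
# "Apple-like" as a set of ranges over two properties (illustrative values).
apple_like = {
    "diameter_cm": (5, 10),
    "weight_g":    (70, 250),
}

def passes(obj, category):
    # An object passes if every constrained property projects into its range.
    return all(lo <= obj[prop] <= hi for prop, (lo, hi) in category.items())

obj = {"diameter_cm": 7, "weight_g": 150, "color": "red", "name": "Fuji"}
print(passes(obj, apple_like))  # True -- only 2 of the object's 4 properties were consulted
```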
The game 20 Questions relies on the specialization procedure: introducing differences and moving down the tree. For example, it breaks the Fruit category into multiple categories - Apples, Peaches, etc. What happens when we consider Apples and Peaches and decide to ignore the differences between them? We generalize to the Fruit category. Consider the Furniture category: we may break it down by the Function property, by the Material property, or by the Color property. By selecting which property's differences to ignore, we may generalize differently. The same person may be generalized to Relative, Employee, or Human.
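A sketch of generalization as dropping a defining property, with toy feature names of my own:

```python
def generalize(category, prop_to_ignore):
    # Drop one defining property; the remaining features name the parent category.
    return {p: v for p, v in category.items() if p != prop_to_ignore}

apples  = {"edible": True, "grows_on_trees": True, "fuzzy_skin": False}
peaches = {"edible": True, "grows_on_trees": True, "fuzzy_skin": True}

# Ignoring the skin difference collapses both into the same parent: Fruit.
assert generalize(apples, "fuzzy_skin") == generalize(peaches, "fuzzy_skin")
```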
We may differentiate, specialize, and generalize not only objects but also actions. We may apply these procedures to any constituent - object, action, adverb of time/place, etc.
For details on generalization, consider taking a look at my post on Foundations of Generalization.
Consider now the general algorithm of the game. It starts with a set of possible categories. It proceeds by filtering the set on the basis of properties of a given object (the game can be adapted to actions or anything else worth categorizing). It stops at a desired level. We may generalize this algorithm to any cognitive function or task. It sounds like this: "Selection of the most fitting option from the available ones, respecting relevant constraints." We need to consider only available options because of real-time limitations. If those limitations are relaxed, the set of options may be expanded by transporting, producing, buying, or creating additional ones.
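A sketch of that general algorithm; the options, constraints, and fitness function here are placeholders to be supplied per task:

```python
def select(options, constraints, fitness):
    # Keep only the options that respect every constraint...
    feasible = [o for o in options if all(c(o) for c in constraints)]
    # ...then pick the most fitting one (None if nothing is feasible).
    return max(feasible, key=fitness) if feasible else None

# Toy usage: pick the strongest shot that is still safe enough.
shots = [{"name": "drop",    "risk": 0.2, "power": 3},
         {"name": "smash",   "risk": 0.7, "power": 9},
         {"name": "topspin", "risk": 0.4, "power": 6}]
print(select(shots, [lambda s: s["risk"] < 0.5], lambda s: s["power"]))  # topspin
```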
The article mentions "discovering some special algorithm." Above, I have described exactly that: the core algorithm of cognition, based on comparisons.
Given the tree, which we can traverse up (generalization) and down (specialization), we can add rules at each category level and exceptions at subcategory levels.
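A sketch of that rule/exception mechanism, using the classic toy example of flightless penguins (my example, not the article's):

```python
rules = {
    "Bird":    {"flies": True},     # rule at the category level
    "Penguin": {"flies": False},    # exception at the subcategory level
}
parents = {"Penguin": "Bird", "Bird": "Animal"}

def lookup(category, prop):
    # Walk up the tree; the first (most specific) rule found wins.
    while category is not None:
        if prop in rules.get(category, {}):
            return rules[category][prop]
        category = parents.get(category)
    return None

print(lookup("Penguin", "flies"))   # False -- the exception overrides the rule
print(lookup("Bird", "flies"))      # True
```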
We may map the input parameters of actions to their results in terms of ranges. Differences in results define range boundaries for the input parameters. Intelligence leaves hints: the general rule of scientific experimentation is to change one parameter, keep the others fixed, and observe the results.
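A sketch of that procedure, assuming a stand-in experiment function: vary one parameter, hold the rest fixed, and record where the outcome changes.

```python
def find_boundaries(experiment, values, fixed):
    # Vary one parameter, hold the rest fixed, and note where the outcome changes.
    boundaries, last = [], None
    for v in values:
        outcome = experiment(v, **fixed)
        if last is not None and outcome != last:
            boundaries.append(v)     # the result changed: a new range begins here
        last = outcome
    return boundaries

# Toy experiment: water's state as temperature varies, pressure held fixed.
state = lambda t, pressure: "solid" if t < 0 else ("liquid" if t < 100 else "gas")
print(find_boundaries(state, range(-20, 121, 10), {"pressure": 1}))   # [0, 100]
```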
The game 20 Questions is amazing! If you are a cognitive scientist, please do not repeat the mistakes of C.S. Peirce and A. Newell.
Prediction vs Selection
The article analyzed how we solved flight and claimed that planes and birds "serve the same function" even though built differently. I agree with that.
The article continues: "brains evolved to continually model and predict the future." The authors then claim that prediction is the "common footing" for all cognitive functions. Above, I described an algorithm based on selection as an alternative to prediction. I do not agree that prediction, be it of the next token or the next action, is the underlying mechanism of cognition. Let's consider a few cases.
I love to mention Roger Federer and how many shots he missed - almost 50%. If he had predicted missing a shot, would it have been intelligent to perform the shot that way? No one will deny that Roger Federer is a highly intelligent player. Real-time limitations simply do not allow for prediction computations - there, in one phrase, is why both prediction and computation are misleading metaphors.
Roger Federer knows many kinds of shots. Given a specific situation, he selects the kind of shot to perform next, shoots, and ... "Be it what it may." Intelligence is not about guaranteed results. It is about selecting meaningful actions that are the best given the available options and constraints. In a calm situation, a sportsman may shoot an arrow into a bull's eye (figuratively speaking). Tired, stressed, firing from a galloping horse, the same shooter may miss the whole target.
Not to mention competition. Does Roger Federer predict losing a match and still decide to play it that way? Is he the only factor defining the result? Prediction in real time is computationally infeasible even when an agent controls all the parameters. In the presence of competition or opposition, the prediction metaphor fails outright.
Let's select and go with a new paradigm and ... "Be it what it may."
The next paradigm
I love the "genius of language" hypothesis. One reason for that is that I consider language to be intelligent, as it is created and used by intelligence. I have a post on how language relies on the use of concepts, comparisons, and the core algorithm - Symbolic Communication.
Another reason concerns how to proceed with this new paradigm. Language allows humans to address everything relevant to them, and it already contains all the concepts, properties, and actions affecting them. By analyzing language, we may extract many rules and construct all those trees to use as a starting point. Upon validating them, we will see the gaps in our knowledge, and the existing semantic knowledge about properties will guide the process of filling those gaps.
Intelligence leaves hints - shadows on the walls of Plato's cave. Intelligence looks at objects and sees properties. We, too, need to switch to comparable properties if we want to adopt a new paradigm for AI.
***
Hey, Google, will your "Paradigms of Intelligence team" consider this one? Maybe even invite one more member? I am available for collaboration.