In this post, I will address several issues raised in the paper "Naturalistic Computational Cognitive Science: Towards generalizable models and theories that capture the full range of natural behavior" by Wilka Carvalho and Andrew Lampinen in their "quest for building general models of natural intelligence" (all quotes are from that article unless specified otherwise). Since I propose a model of natural intelligence, it is only natural.
Developing "models that can work with real-world data" is, for now, like embracing the unembraceable, so brace yourself. It's going to be a bumpy road, as we want to shake off misleading assumptions about intelligence and shift your focus to more promising ones.
Some Metaphysics
The authors mention the search for "generalizable models that can explain a broad range of behavior from a simpler set of principles". I started my research in AI with NLP and, even narrower, NLU. I analyzed the three types of content words in natural languages - nouns, verbs, and adjectives. My first insight was: "Nouns stand for objects, verbs stand for actions, and adjectives stand for properties. But objects have properties, and actions change properties. We may be fine with properties only."
Of course, objects have their role in my model, and this role is important. I even extend them to cover anything we can talk about, including actions and properties. But properties are of such critical importance that I urge a shift of the research focus from objects and actions to properties. And what is most important about properties is their comparability.
Think about how we compare objects - one property at a time. Comparison may be considered the "cognitive computation" everyone is talking about.
Recognition/categorization of objects is based on the use of comparable properties. But to think that it relies on similarities of instances within a class is one of those misleading assumptions I mentioned in the beginning. I propose that cognitive scientists adopt the game of 20 Questions as a hint at how cognition works. It relies on what I call "semantic binary search", which has amazing performance: in 20 simple comparisons we can recognize more than a million categories (2^20 = 1,048,576).
In the beginning, all categories are possible. With one question, "Is it tangible?", we separate two broad categories: Tangible vs Abstract. Each following question depends on the previous answers (which is what makes the search semantic) and relies on a new comparable property (which confirms their importance). But notice that a class is defined by the stack of previously used properties and their selected values. One may say that instances of that category are "similar" in those properties. True, but what is more important is that those properties differentiate instances of that category from instances of sibling categories at each level. That is why I like to say that categorization relies on differences, not on similarities.
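The descent described above can be sketched in a few lines of code. This is a minimal, hypothetical illustration of "semantic binary search" over a category tree: each internal node tests one comparable property, and the answers route the search down until a single leaf category remains (the names `Node`, `classify`, and the example properties are my own, not from the paper).

```python
# Hypothetical sketch of "semantic binary search" over a category tree.
# Each internal node asks about one comparable property; the answer routes
# the search until a leaf category remains.

class Node:
    def __init__(self, prop=None, yes=None, no=None, category=None):
        self.prop = prop          # property the question tests, e.g. "tangible"
        self.yes, self.no = yes, no
        self.category = category  # set only at leaves

def classify(node, properties):
    """Walk down the tree, asking one comparable property per question."""
    while node.category is None:
        node = node.yes if properties.get(node.prop) else node.no
    return node.category

# Tiny tree: "Is it tangible?" then "Is it alive?"
tree = Node("tangible",
            yes=Node("alive",
                     yes=Node(category="organism"),
                     no=Node(category="artifact")),
            no=Node(category="abstract"))

print(classify(tree, {"tangible": True, "alive": False}))  # artifact
# With 20 question levels, such a tree separates 2**20 = 1,048,576 leaves.
print(2 ** 20)  # 1048576
```

Note that the path taken - the stack of (property, answer) pairs - is exactly the category's definition in this scheme.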
Compare that approach to what Wittgenstein's family-resemblance theory, exemplar theory, and prototype theory advocate. These three theories share one problem: they cannot specify how to select a category to compare an unknown object to, or which properties to use for the comparison. That problem makes them ill-suited for real-time applications. The 20 Questions approach provides defining features (note that they are only a tiny subset of the object's properties) for each category, and it determines a procedure that ensures narrowing down the range of categories for a considered phenomenon (we may update the game to recognize actions, for example).
Please note that the game of 20 Questions teaches us about generalization as well. The tree it constructs grows from the root by introducing differences that specialize the remaining categories. By ignoring differences and moving up the tree, we generalize. Depending on which properties we decide to ignore, generalization may proceed differently.
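Generalization as "moving up the tree" can be sketched the same way. In this hypothetical illustration (the example path and `generalize` helper are mine), a category is represented as the stack of (property, value) pairs chosen on the way down, and generalizing means dropping the most specific of them:

```python
# Hypothetical sketch: a category as the stack of (property, value) pairs
# accumulated on the way down the tree. Generalizing = ignoring properties,
# i.e. removing entries from the stack, which names a broader class.

poodle = [("tangible", True), ("alive", True), ("animal", True),
          ("dog", True), ("breed", "poodle")]

def generalize(path, ignore):
    """Drop the properties we decide to ignore; what remains is a broader category."""
    return [(p, v) for p, v in path if p not in ignore]

print(generalize(poodle, {"breed"}))         # up one level: any dog
print(generalize(poodle, {"breed", "dog"}))  # broader still: any animal
```

Which properties we choose to ignore determines which of several possible generalizations we reach, as noted above.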
Having that tree, we may introduce rules at any level, adding exceptions at the levels below. Exceptions, in fact, help us define the boundaries between ranges of properties. This is reflected in Einstein's definition of insanity - the opposite of intelligence.
Finally, I want to mention that the game of 20 Questions demonstrates the core algorithm of cognition: the selection of the most fitting option from the available ones, respecting relevant constraints. In the case of object categorization, we use the set of known categories as the available options and the observed properties of the object to be categorized as the constraints. How far down the tree to go depends on the current task or purpose.
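The core algorithm - constrained selection among available options - can be sketched generically. Everything here is an illustrative assumption of mine (the `select` function, the split into hard and soft constraints, and the toy categorization example), not a specification from the paper:

```python
# Hypothetical sketch of the core algorithm: select the most fitting option
# from the available ones, respecting relevant constraints.

def select(options, constraints):
    """Keep options satisfying every hard constraint; rank the rest by soft fit."""
    feasible = [o for o in options if all(c(o) for c in constraints["hard"])]
    if not feasible:
        return None
    return max(feasible, key=lambda o: sum(s(o) for s in constraints["soft"]))

# Object categorization as selection: known categories are the options,
# observed properties act as the constraints.
options = ["cat", "dog", "statue"]
observed = {"alive": True, "barks": True}
constraints = {
    "hard": [lambda o: not (observed["alive"] and o == "statue")],
    "soft": [lambda o: 1 if (o == "dog") == observed["barks"] else 0],
}
print(select(options, constraints))  # dog
```

The same skeleton covers other tasks once the options and constraints are swapped out; only the sets change, not the algorithm.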
What I described above may well deserve to be called "a simpler set of principles". Now let's see what it gives us in the light of proposals from the paper.
The authors "argue that naturalistic computational cognitive science should seek theories of cognitive phenomena that consist of two components:
1. Task-performing (predictive) models that reproduce the phenomena across the same range of naturalistic stimuli and paradigms as the human/animal subjects.
2. Reductions of these task-performing models to simpler mechanisms, properties, and theories of why these models reproduce the phenomena."
Any task can be framed as a selection problem with options and constraints. The core algorithm then explains how cognitive functions work under the hood. That is why the shift to comparable properties and comparison-based selection is crucial for cognitive science.
Naturalistic computational cognitive science
The authors propose the following definition, "Naturalistic computational cognitive science is a research strategy for theory-driven cognitive science that aims to predict human behavior across increasingly naturalistic stimuli and tasks." Then they proceed with, "Our goal is to encourage cognitive scientists to embrace experimental paradigms and models that capture a broader spectrum of the variability in inputs, interactions, and tasks present in the natural environment of humans or animals". According to the authors, "naturalistic computational cognitive science seeks to (1) explain real-world intelligent behavior by developing models that operate over the scope of naturalistic inputs, produce naturalistic outputs, and are optimized according to the actual constraints and affordances of a person's environment, and (2) develop cognitive theories that unite these models with reductive understanding."
The authors propose to expand tasks, environments, architectures, and algorithms in the direction corresponding to more naturalistic ones. In that respect, moving a recognition task from hand-written digits to photos of cats and dogs is a good move, but we have to agree that broader is not the same as natural. Natural intelligence processes dynamic stimuli of different modalities simultaneously and in real time.
A gradual increase in model complexity, in terms of handling more naturalistic stimuli, is understandable given the lack of adequate compute and, above all, of proper theories of intelligence. The theory I propose may help address those issues, but a lot of work will have to be done from scratch.
We also need to remember that human intelligence is not a "single model". Rather, it's a "model utilizing models utilizing models ..." Human culture has moved in the direction of specializing knowledge into multiple domains and even subdomains. The number of algorithms available for any task may be quite large. The selection of the most fitting one should consider not only the available options but also the relevant constraints.
Why we need Naturalistic computational cognitive science
The authors provide many reasons for the above:
"naturalistic experimental paradigms can lead to different behavior, engage brain systems differently, and expose computational challenges that engage mechanisms differently";
"models that learn from naturalistic data ... perform a wide range of tasks, and yield qualitatively different patterns of generalization than models trained in simplified settings";
"Learning with naturalistic data can yield good performance across a range of seemingly disparate tasks", for example, "the features discovered by AlexNet could be repurposed to novel tasks";
"Learning from naturalistic data allows us to ask new questions about the origins of knowledge".
I agree that we have observed significant progress using the deep learning paradigm, but other researchers warn us about the limitations of that paradigm. There are even claims that it is "hitting the wall".
I have been skeptical about LLMs. According to the authors, "these models acquire both syntax and semantics" - I am not sure this is so, especially regarding the latter. Without the Rosetta Stone, Egyptian hieroglyphs were hard to decipher even though we had plenty of them to observe. But LLMs train on text only - they do not have grounding for words, which may be polysemous, on the one hand, and require context to be properly understood, on the other.
I may say, "Look how the red dog shows its tongue." Using the word "red" allows me to differentiate the relevant dog in the picture, but the dog is not all red. My listener will know that, but it does not follow from the sentence. The information was received via perception guided by the sentence. Further discussion may rely on the actual colors of the dog, but LLMs cannot pick up those nuances from the sentence alone.
We want AI assistants in every real-world, real-time situation. To develop such assistants in software and hardware, we need to understand ourselves and our intelligence. The theory I propose may be the first step in that quest. More work will be required.
How we should do Naturalistic computational cognitive science
Here I depart from the paper. The authors provide "Recommendations for cognitive science", and I agree with them, but having my model, I propose additional ones.
We need feasible theories. Above, I mentioned the exemplar theory as flawed. We need theories without obvious problems of real-time performance or enormous energy requirements. As the authors put it, "building computational models that perform the task in as naturalistic a way as possible, across as wide a variety of settings as possible, imposes much stronger constraints on our theories than an abstracted model at a higher-level".
I propose to rely on comparable properties. In the process of cultural development, humans polished their knowledge of them and reflected that knowledge in natural languages. We need to reverse-engineer the relevant comparable properties and the defining features of various categories from language (which is genius, according to a well-known hypothesis).
Intelligence respects context: external (a park where we walk, a park in our memory where we walked yesterday, or a park in our imagination on an imaginary planet), internal (feelings, needs, beliefs), and inertial (the task at hand - should we continue or switch to a new one? in the latter case we need to perform a transition). Context is often ignored, especially when discussing language. Real-time constraints do not allow us to address everything even in our immediate context. Context is the first factor limiting the number of available options. Discretization, or chunking, is another. After the set of options is formed, we start the selection with respect to relevant constraints.
The authors mention "explanations of that behavior as a rational solution to constraints". Having multiple algorithms of possible behavior, we select among them based on relevant constraints. We do not solve constraints; we use them to guide the selection of algorithms or their parameters.
***
Italians have a wonderful saying, "Che sarà sarà" - be it what it may. I think that intelligence is not about accurate predictions; rather, it is about moving the needle in a meaningful way. Given the opportunities and constraints of the situation, we select the most fitting option and ... che sarà sarà. I do not know the fate of my model or the results of its implementation and deployment. Be it what it may!
I think your idea seems more likely to be correct in concept than the other proposals you compare it to.
Whatever the correct answer is, I do believe in principle that the search for efficiency should overlap significantly with intelligence.
Intelligence is extraordinarily efficient, and much of what is happening right now in AI seems to ignore this principle, as if we were going with "eventually, with enough compute and power, it will just happen".
If we are on the right path, we should see less power and less data required as we improve.