Memory
Not to forget anything, let me start with the list of issues to be covered in this post:
1. The CogSieve Theory reiterated
- comparable properties
- definition of intelligence
- the core algorithm of intelligence
2. Perception
- is object recognition based on similarities or differences?
- modalities priorities, distorted perception
- the role of schema, its obligatory and optional components
- the role of our goals and needs in guiding our attention, change blindness
- how attention projects multi-dimensional objects onto single-dimensional concepts
- relevant objects and context, chiseling them out
3. Knowledge
- is generalization based on similarities or differences?
- different kinds of knowledge, information, and differences
- language and meaning, limitations
4. Learning
- learning from the mistakes of others
- embodiment
5. Memory
5.1 Differences
- primacy and recency effects
- extraverts vs introverts
- meaningful information vs gibberish
- explicit vs implicit
- rehearsal vs test querying
5.2 Keys
- efficient lookup requirements
- remembering vs forgetting
- rewriting, same key but different value
- generalization revisited
- retrieval and reconstruction of schema
- spacing effects - more keys
- cache and priming
- rehearsal and imposing POV
- information about information
- context-dependent learning - context is also encoded in keys
- firing threshold and optional/obligatory components of schema
- "sound-alike" and "same hand-shape" errors, the role of language
5.3 Hippocampus and amygdala
- graph vs state machine
- hippocampus and new records, relevant information vs context
- imagination and understanding of stories
- amygdala and the role of humor
5.4 Syndromes
- amnesia - retrograde, anterograde, source, etc.
- damage to the amygdala but not the hippocampus
- Capgras syndrome and others
5.5 Where is it?
- synapses vs DNA/RNA
- energy efficiency
6. Consciousness
- competency and awareness assemblies
- from cells to self
- boundaries
- limits of interest and/or influence
7. On current events
- manipulation and propaganda, truth blurred
- uniting people
Why was it necessary to include the above list? I could start covering those issues without it just as well. First, in the case of memory, repetitions are not bad. Second, that is not exactly a repetition.
Our memory also prefers meaningful information over gibberish. I hope you will find this post meaningful.
1. The CogSieve Theory reiterated
Witkowski et al. 2023 define intelligence as "observer-relative competencies to solve problems in some specified space", later refine this definition with "different spaces and with different competencies", and apply it to all biological beings.
In order to solve different problems in different situations using different competencies, we first need to recognize in each case a specific problem out of many possible ones and select the most fitting competency to apply. Bateson 1972 questions the nature of differences: "But what is a difference? A difference is a very peculiar and obscure concept. It is certainly not a thing or an event." The importance of differences is further stressed by Derrida 1982 and Deleuze 1994.
I propose to respect differences. I even define intelligence as the ability to handle differences.
Differentiation is based on comparison. Objects are compared by static values of properties. Actions are compared by dynamic changes of properties. This makes comparable properties the first fundamental component of intelligence.
The core algorithm of intelligence then is based on comparisons. The best illustration is the game 20 Questions. Venn diagrams illustrate the other way of applying comparisons to narrow down the set of fitting objects.
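To make the core algorithm concrete, here is a minimal Python sketch of 20 Questions style narrowing. All concepts, properties, and values below are my illustrative assumptions, not part of any formal apparatus:

```python
# Sketch: narrowing a set of candidate concepts by differentiating
# questions, as in 20 Questions. Names and properties are invented
# for illustration only.

candidates = {
    "sparrow": {"alive": True, "flies": True, "legs": 2},
    "whale":   {"alive": True, "flies": False, "legs": 0},
    "chair":   {"alive": False, "flies": False, "legs": 4},
    "drone":   {"alive": False, "flies": True, "legs": 0},
}

def ask(candidates, prop, answer):
    """Keep only the candidates whose property value matches the answer."""
    return {name: props for name, props in candidates.items()
            if props[prop] == answer}

# Each question cuts the candidate set roughly in half, so about
# log2(N) questions suffice - no need to compare the object against
# every known concept one by one.
remaining = ask(candidates, "alive", False)   # chair and drone survive
remaining = ask(remaining, "flies", True)     # only drone survives
print(list(remaining))                        # ['drone']
```

The point of the sketch is the shape of the computation, not the particular questions: differences prune the search space, while a similarity check would have to visit every concept.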
2. Perception
The world contains a lot of objects and those objects are all unique. But for practical purposes, some objects are interchangeable (Deleuze 1994). How do we group such objects under an umbrella of a concept?
Reisberg's "Cognition" mentions several possible approaches: definitions, family resemblance, prototypes, and exemplars. Definitions are no good because for any definition we can find examples that belong to a concept but do not satisfy the definition (Wittgenstein 1953). Wittgenstein proposed to consider family resemblance based on shared features. Prototype and exemplar theories also use the idea of similarities of objects within a class.
Reisberg then mentions several studies where the similarities-based approaches are tested or criticized. For example, the category "counterfeit money" deals a serious blow to such theories. So, similarities are not good. Reisberg then calls for "Different Profiles for Different Concepts".
To understand why differences are better for understanding concepts, consider the sentence "Robin is a bird". Similar sentences are used in a sentence verification task - a common technique used in experiments to test various theories. I have concerns of a philosophical nature with respect to that technique. Why are robins evaluated as birds? What about spiders, whales, or snakes? How long does each check take?
What I mean is - consider the object recognition task. You are given an object, and you know hundreds or thousands of categories. Similarities-based approaches require checking how similar the object is to each known concept. Note also that we do it in real-time. Is it realistic and energy-efficient to do it that way?
Now recall the amazing efficiency and speed of 20 Questions. Does that game check similarities? No, it checks differences. This is important. We do not check for similarities of objects within a class, we check for differences among same-level subclasses. When we end up within a category, the features of objects inside of it are determined on a residual basis - we have what we have. Consider mammals, differentiated from other animals on the basis of how they feed their young. But we do not impose the similarity of features on whales, giraffes, mice, and bears.
Let's analyze the rainbow colors. To describe basic colors we cut the rainbow with 6 or 7 decisive cuts. Yes, there is always a place for vagueness - consider reading the SEP entry on vagueness. If borderline case errors are tolerable, we do not care for higher accuracy. If it is needed, we introduce millions of shades. But even then, each category (a shade in this case) is represented with a range, not a dot on some axis. We compare ranges of comparable properties with respect to vagueness.
If any property is considered as an axis then concepts having multiple properties may be considered as multi-dimensional cubes or buckets. Each bucket contains instances and can be subdivided in the process of specialization.
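A minimal sketch of such multi-dimensional buckets - the property names, ranges, and values are illustrative assumptions, not empirical data:

```python
# Sketch: a concept as a "cube" of ranges over comparable properties.
# Membership is a range comparison per property, not a similarity score.

concepts = {
    "orange_shade": {"hue": (20, 45), "saturation": (50, 100)},
    "red_shade":    {"hue": (0, 20),  "saturation": (50, 100)},
}

def matches(instance, cube):
    """An instance belongs to a concept if every one of its property
    values falls within that concept's range for the property."""
    return all(lo <= instance[prop] <= hi
               for prop, (lo, hi) in cube.items())

sample = {"hue": 30, "saturation": 80}
print([name for name, cube in concepts.items() if matches(sample, cube)])
# ['orange_shade']
```

Subdividing a range (say, splitting the hue interval) yields sub-cubes, which is exactly the specialization step described above.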
Errare humanum est. It is not only vagueness that can introduce errors. Our perception channels are not equal. This is reflected in the size of the respective brain areas responsible for processing signals from our senses. In a movie theater we "hear" sound coming from the lips of characters on the screen, not from the loudspeakers on each side of it. We almost believe that it is a doll speaking, not a ventriloquist. This is because vision is more accurate than hearing, which is less reliable in locating the source of a sound. So we willingly relocate its source to the lips on the screen - but we do it in our head. Our perception is deliberately distorted. Memory is no different - it also can be easily manipulated.
Apart from primitive objects, there are also composite objects - mechanisms, animals, groups, etc. We recognize them using the same differentiating mechanism. However, it is important to understand that such composite objects have components and those components may be optional or obligatory. For example, it is obligatory for a forest to contain trees but the type of trees is optional. A person may wear glasses or not - it should not affect the identity.
Intelligence is believed to solve tasks, satisfy needs, handle adaptation, etc. I claim that it is a high-level view of it. Consider an intelligent agent - it receives signals from inside and outside, differentiates them, and based on the current situation chooses how to proceed. For example, if the dominating signal is hunger, then the logical thing to do is to search for food. If the dominating signal is danger, then it is necessary to search for weapons or shelter. If there is an apple tree nearby then in the first case, we will appreciate apples as food, in the second case we will use them as projectile weapons or we will climb the tree to hide there.
Note that we considered the same object (apples) as different concepts (food vs weapons vs shelter) based on our current needs/goals. When all you have is a hammer then everything seems like a nail. But if you don't have a hammer and need to hit a nail then you will look at the surrounding objects through the prism of your need. You will ignore all the other objects that do not qualify and you will ignore changes going on around you if those are irrelevant. That may explain the phenomenon of change blindness.
Note that the ability of apples to serve as food or weapons is always there but if we do not need one or the other those extra properties are ignored thus we essentially perform projection. We drastically decrease the number of dimensions to enable faster and more efficient processing.
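Here is how such a projection might look in code. The objects and their "affordance" flags are hypothetical - the point is only that filtering by the one needed property discards all other dimensions:

```python
# Sketch: projecting multi-dimensional objects onto the single
# dimension relevant to the current need. Objects and flags are
# invented for illustration.

objects = [
    {"name": "apple", "edible": True,  "throwable": True,  "climbable": False},
    {"name": "rock",  "edible": False, "throwable": True,  "climbable": False},
    {"name": "tree",  "edible": False, "throwable": False, "climbable": True},
]

def project(objects, need):
    """Keep only the objects useful for the current need, ignoring
    all their other properties (dimensions)."""
    return [obj["name"] for obj in objects if obj[need]]

print(project(objects, "edible"))     # hungry: ['apple']
print(project(objects, "throwable"))  # in danger: ['apple', 'rock']
```

The same scene yields different "relevant object" lists depending on the active need, which is the change-blindness point in miniature: everything outside the projected dimension is simply not processed.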
The last thing to mention in this section, but not the least, is context. It is necessary to remember that context encompasses all the objects (viewed broadly - abstractions and actions are included) and relationships among them. It is a mistake to include in the context only those objects that are relevant to an agent. Relevance plays the role of a chisel - it separates the relevant objects from all the other objects in the context. The fact that in most practical situations the number of objects in each context is quite limited makes the core algorithm of intelligence based on comparisons and filtering especially efficient.
3. Knowledge
We have seen that differences may help in recognizing concepts. Can they help with generalization? We have also mentioned above that in the process of specialization, any multi-dimensional cube may be subdivided based on some comparable property into multiple subclasses. So specialization may be achieved using differences. But if we ignore those differentiating factors we will achieve generalization.
Almost all concepts are essentially subclasses of some other concepts. This implies that the atomic nature of concepts proposed by Fodor is not plausible: with respect to concepts, everything is recognized in comparison.
Note that specialization may proceed along many different properties. Therefore, there may be many possible paths for generalization as well. Ignore the color of "red chairs" and "green chairs" and you will generalize to "chairs". Ignore the material of "wooden chairs" and "plastic chairs" and you will generalize to "chairs".
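A sketch of generalization as property-ignoring; the property names and values are illustrative:

```python
# Sketch: generalization by dropping differentiating properties.
# Whatever remains and agrees across instances is the parent concept.

red_chairs   = {"shape": "chair", "color": "red",   "material": "wood"}
green_chairs = {"shape": "chair", "color": "green", "material": "plastic"}

def generalize(a, b, ignore):
    """Drop the ignored properties and keep only those on which
    both concepts agree."""
    return {k: v for k, v in a.items()
            if k not in ignore and b.get(k) == v}

# Different generalization paths lead to the same parent concept:
print(generalize(red_chairs, green_chairs, {"color", "material"}))
# {'shape': 'chair'}
```

Ignoring color alone, or material alone, ends up at the same "chairs" bucket here, mirroring the multiple-paths point above.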
Claude Shannon 1948, in the paper "A Mathematical Theory of Communication", defined information as a set of possible messages. The word "different" is implied.
Let's now return to our agent and analyze what kinds of knowledge it may encounter. The first type of knowledge comes from its sensors. The second type is general information. The third type is textual - it may represent sensory or general information from other agents, for example.
The third type leads us to communication, language, and meaning. In this post, I will not dive deep into these broad issues. Let me just mention that there are mechanisms for encoding and decoding information of the first and second types and transferring it between two or more agents. In my previous posts, I tried to explain how differences enable both encoding and decoding of information. If that is not enough, let me know if another post or two are needed.
Language has its limitations. Text alone will not explain emotions, just like it will not allow for mastering any skill. Still, the communication function of language makes humans who we are and stands behind our culture.
4. Learning
I have come to believe that the idea that we learn from the mistakes of others is wrong. We may observe those mistakes. We may understand them. We may try to simulate the observed situation in our head and, in those simulations, try various alternative actions. Essentially, in those simulations, we train and learn. If possible, we can train in real life as well.
But note the role of embodiment in learning. The role of learning is to help us in our future interactions with the world. During training, we are allowed to be inefficient. But without that process of trying multiple parameters regulating our decisions and actions, we will never achieve the desired level of efficiency. Maybe one day robots will come up with a skill transfer technology similar to that from the Matrix movie. But even that will work only for identical robots. Differences are messy but worth it.
5. Memory
This post was intended to be about memory. We are finally ready to talk about it.
5.1 Differences
We will start by mentioning several interesting observations about memory. They demonstrate the presence of clearly different mechanisms behind memory.
One of the simplest experiments in the studies of memory is remembering lists - of words, numbers, pictures, etc. It was noted that people remember better the elements from the beginning (primacy effect) and the end (recency effect) of such lists.
A number of studies investigated who has better memory - extraverts or introverts. Extraverts won.
Other studies showed that meaningful information is remembered better than gibberish. Do you remember that it has already been mentioned?
There are clear indications that we remember a lot of information but some of it we remember explicitly and we can reproduce it. Other knowledge is implicit and we cannot even say what we know or if we know something of that kind at all.
In text understanding tasks, it was shown that reading the text one more time is less efficient than testing your understanding in a test mode. The details of the text are better remembered in the latter case as well.
There are many other facts like those above but even those demonstrate that memory is far from simple and involves many moving parts if I may use that term.
5.2 Keys
As a programmer by trade, I know multiple data structures and the principles of their operation. Since the computational metaphor has long been applied to processes in the brain, data structures from programming are not a bad source of inspiration when it comes to understanding memory.
Because I already mentioned 20 Questions, the idea of a "key-value" map based on binary search trees (BST) is a natural candidate to consider. If you recall the limited number of objects in the context and that BSTs have logarithmic time complexity, we will get a decent performance using that data structure to model memory. But surely, we do not want to forget about keys. In this section, we will explore their possible structure.
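To make the idea concrete, here is a minimal key-value memory sketch. Python's `bisect` over a sorted key list stands in for a balanced BST (both give logarithmic lookup), and the keys and records are illustrative:

```python
import bisect

# Sketch: a key-value memory with O(log N) lookup over sorted keys.
# A real BST would behave the same asymptotically; bisect keeps the
# sketch short. Keys and stored "records" are invented.

class KeyedMemory:
    def __init__(self):
        self.keys = []    # kept sorted at all times
        self.values = []

    def store(self, key, value):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value      # rewriting: same key, new value
        else:
            self.keys.insert(i, key)
            self.values.insert(i, value)

    def recall(self, key):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None                     # nothing stored under this key

memory = KeyedMemory()
memory.store("capital-of-France", "Paris")
print(memory.recall("capital-of-France"))  # Paris
```

Note that `store` with an existing key overwrites the old value - the "rewriting, same key but different value" case from the outline falls out of the data structure for free.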
In his "Forgetting", Small 2021 shares an important idea that memory involves two processes - writing and deleting, remembering and forgetting. We are not interested in what molecules are responsible for each process but we are definitely interested in including both processes in modeling memory.
Let's return to the list remembering task. Can we explain primacy and recency effects using keys and those two mechanisms? The traditional explanation is that the recency effect takes place naturally because the latest elements are still fresh in our memory, while the primacy effect happens because of the rehearsal - it is more efficient for the starting elements. I like the whole concept of rehearsal and its implications but in this case, it is unlikely the right explanation.
Consider the following idea - we create a key in our head, something like "words-to-remember", and we start writing the whole list under that one key. The brain allows creating several links for one key, but that key becomes overloaded quickly, and our brain starts deleting the links that are least important. This does not necessarily mean going in exact FIFO order. Later, we will discuss the effect of emotions on memory; for now, we can assume that emotions at the start of an experiment may play some role in the better retention of the initial elements.
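A toy model of this "overloaded key" idea. The capacity and the importance weights are my illustrative assumptions, not measured values:

```python
# Sketch: one key with limited capacity. Early items get an
# emotion/novelty bonus (primacy), late items a freshness bonus
# (recency); the weakest links in the middle are forgotten first.
# Capacity and bonus formulas are hypothetical.

def remember_list(words, capacity):
    n = len(words)
    def strength(pos):
        primacy = max(0, 3 - pos)          # bonus for the first items
        recency = max(0, pos - (n - 4))    # bonus for the last items
        return primacy + recency
    # forget the weakest links until the key fits its capacity
    kept = sorted(range(n), key=strength, reverse=True)[:capacity]
    return [words[i] for i in sorted(kept)]

words = ["one", "two", "three", "four", "five", "six", "seven", "eight"]
print(remember_list(words, capacity=4))
# ['one', 'two', 'seven', 'eight'] - the middle of the list is lost
```

The U-shaped serial position curve appears here not from two separate memory stores but from one overloaded key plus a deletion policy, which is the point of the section.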
Let's now revisit generalization. We may observe multiple instances of multiple concepts. For example, each observed dog will occupy a sub-cube within the general "dogs" cube. Note, however, that each property may be considered separately and transferred to other concepts having that property, with adjustments if necessary. For example, if we observe a dog with an injured leg, we may assume that we can observe a cat with an injured leg or the same dog with two injured legs.
Each time we remember information about a subclass we also update information about its parent class - at least about the differentiating factors involving the subclass. Since subclasses are formed as sub-cubes, it should not be difficult to reflect information about their parent class in their keys. In real-time, it may be too much to expect all such updates to happen on time. It is easier to expect some marker left for updates to be performed during sleep. It's just a suggestion.
We may update our information about composite objects and their schemas. Not only may we reconsider the allowed boundaries within each property, but we may also reconsider whether some property of that object is optional or obligatory. Here we may need to design keys with care.
Something similar to schemas may be at work when something "pops up" in our memory after hearing a mention of something else. When some concepts are often mentioned together they become friends in our memory. If one mentions Chandler, get ready for Monica to pop up. If emotions were involved when records of those concepts were stored in our memory the effect would be even stronger.
The previous observation illustrates another point - information about information is also worth remembering. Coincidences, co-occurrences, and events often observed in succession are all worth remembering. Our memory has a mechanism of forgetting, remember? If something is not reaffirmed by repetitive occurrence, it will be removed from memory. Keys may benefit from this mechanism as well. We remember information and a key, then we remember that the key and that piece of information are related. Later we retrieve information by the key and update it or we encounter the same information under different circumstances, which may lead to updating the key.
It was long observed that learning is more efficient if sessions are spaced out (https://en.wikipedia.org/wiki/Spacing_effect) as opposed to learning in one massive session. If we consider that each session has one master key and each key is only capable of linking to a limited number of information nodes then it becomes clear that multiple sessions have an advantage.
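A back-of-the-envelope sketch of that advantage, under the assumption (mine, for illustration) that each session's master key holds at most a fixed number of links:

```python
# Sketch: why spaced sessions beat one massed session if each
# session's master key can link to only a limited number of items.
# The capacity value is a hypothetical constant.

KEY_CAPACITY = 5  # links one master key can hold (illustrative)

def retained(items, sessions):
    """Split the material across sessions; each session key keeps
    at most KEY_CAPACITY of its items."""
    per_session = [items[i::sessions] for i in range(sessions)]
    return sum(min(len(chunk), KEY_CAPACITY) for chunk in per_session)

items = list(range(20))
print(retained(items, sessions=1))  # massed: only 5 items survive
print(retained(items, sessions=4))  # spaced: all 20 items survive
```

More sessions means more keys, and more keys means more surviving links - "spacing effects - more keys" from the outline, in two lines of arithmetic.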
Let's now consider priming (https://en.wikipedia.org/wiki/Priming_(psychology)). Exposure (even subconscious) to some signals makes them "warmed up", and if the following tasks query for something similar, those signals "pop up" in our memory. In programming, a similar mechanism is known as a "cache" - think "browser history" or "recently opened files".
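A minimal sketch of priming as a small recently-activated cache; the capacity and items are illustrative:

```python
from collections import OrderedDict

# Sketch: priming as a bounded "recently activated" cache, in the
# spirit of an LRU cache. Capacity and signals are invented.

class PrimedCache:
    def __init__(self, capacity=3):
        self.capacity = capacity
        self.items = OrderedDict()

    def expose(self, signal):
        """Exposure (even subconscious) warms a signal up."""
        self.items[signal] = True
        self.items.move_to_end(signal)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # cold signals fade away

    def primed(self, signal):
        """Warmed-up signals 'pop up' first in later tasks."""
        return signal in self.items

cache = PrimedCache()
for word in ["doctor", "bread", "nurse"]:
    cache.expose(word)
print(cache.primed("nurse"))   # True: recently seen, pops up easily
print(cache.primed("lawyer"))  # False: not primed
```

The bounded capacity matters: priming fades as new exposures push old ones out, just as "recently opened files" lists do.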
A good insight into the structure of keys is provided by context-dependent learning. Reisberg in "Cognition" mentions an illustrative study of divers studying and passing exams in a library and underwater. The results were better for those groups of participants who studied and passed exams in the same environment. Therefore, context is present in the structure of keys.
We can associate the previous observation with photos taken on smartphones - they all have a timestamp and a geolocation tag. Not bad information to include in the keys of our model. Note that along with the question words "who" and "what" we also have the question words "where" and "when". Having such information encoded into keys allows for more reliable retrieval.
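A sketch of such composite keys, with the records and contexts invented for illustration (the underwater example nods to the diver study above):

```python
# Sketch: composite keys that encode context ("where", "when")
# along with content ("what"), mirroring timestamp and geolocation
# tags in smartphone photos. All records are hypothetical.

records = {
    ("equations", "library",    "morning"): "study session 1",
    ("equations", "underwater", "noon"):    "study session 2",
}

def recall(what, where=None, when=None):
    """Retrieval is more reliable when the query context matches
    the context encoded in the key."""
    return [value for (w, loc, t), value in records.items()
            if w == what
            and (where is None or loc == where)
            and (when is None or t == when)]

print(recall("equations", where="underwater"))  # ['study session 2']
```

Querying with matching context narrows the result to one record; querying by content alone returns both, which is a crude but serviceable picture of context-dependent retrieval.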
Why do extraverts remember better and more? Because they pay attention and are interested in all the relationships and intricate details of the events around them. They actively absorb keys. No wonder that character trait was called "The Luck Factor" by Wiseman in his book.
Language plays an important role in both querying memory and forming it. It was observed and confirmed that we are more likely to make mistakes remembering words that sound like other words than those that look like other words. The same is true for deaf people - they are more likely to confuse words that use similar hand shapes.
By the way, suppressing internal rehearsal, or hand movements in the case of deaf people, significantly worsens our ability to remember information presented in textual form - words, numbers, phrases, etc. This may be related to another observation that we remember better information related to us. By rehearsing we "internalize" what we see or hear.
Last but not least, rehearsal in my opinion may also play the role of turning our implicit needs and goals into explicit representation. When we observe a scene with an apple tree and say, "Those apples are yummy", most likely we are not looking for projectile weapons or shelter, we are just hungry. So, by rehearsing we impose our POV on what we observe and start projecting, simplifying, and modeling the context according to our needs/goals.
5.3 Hippocampus and amygdala
Currently popular LLMs based on the statistical approach to AI often make mistakes that can be explained by their missing a "model of the world". Multiple studies of the hippocampus show that it may be involved in constructing such a model. But there are some nuances involved that I would like to mention. Note that I am not a neuroscientist, so I may miss some aspects of the big picture.
"The Tolman-Eichenbaum Machine" article by Whittington et al. proposes to use graphs as the underlying model for tracking relationships among objects. The examples provided to support this idea demonstrate that graphs can model relationships of many types and even generalize them. I am concerned about the dynamics though. What about adding and deleting nodes or transitions between scenes? Most likely, those can be easily added to the model in the form, for example, of state machines.
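A toy sketch of how state-machine transitions might sit on top of scene graphs - the scenes, objects, and events are invented, and this is only one of many ways such dynamics could be added:

```python
# Sketch: relationship graphs (scenes) plus a state-machine layer
# of transitions between them. Everything here is illustrative.

# A scene is a small graph: objects plus relationships among them.
scenes = {
    "kitchen": {"objects": {"apple", "knife"},
                "relations": {("apple", "on", "table")}},
    "garden":  {"objects": {"apple", "tree"},
                "relations": {("apple", "hangs-on", "tree")}},
}

# The state-machine layer: which events move us between scenes.
transitions = {("garden", "pick apple"): "kitchen"}

def step(scene, event):
    """Follow a transition if one exists, otherwise stay in place."""
    return transitions.get((scene, event), scene)

state = "garden"
state = step(state, "pick apple")
print(state)  # kitchen
```

Adding or deleting a node is a plain dictionary edit on a scene, and a transition is one more entry in the table - which is the kind of dynamics the graph model alone leaves implicit.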
The story of Henry Molaison, whose hippocampus was removed to treat epilepsy, is a classic. The conclusion was that the hippocampus stores long-term memory. Does it really? That small piece of our brain? Let's be serious.
The discovery of place cells in the hippocampus and the agile plasticity of those cells do not support the "long-term memory storing" story either. Time cells were also discovered in the hippocampus. Don't you think that storing all of our long-term memory while keeping track of time and space is a bit of a stretch for a tiny area in our brain?
On the other hand, if we forget for a second about long-term memory, focus on time and place cells, and recall about timestamps and geolocation tags in photos on our smartphones then maybe we will return to the idea of keys for storing records in memory.
Here is a great quote from Eichenbaum and Cohen 2001: "When a memory is retrieved, all its fragments are put back together rarely identical to the initial experience". It looks like we use modeling and simplify a lot. With the help of schemas, we may remember only the essential/obligatory aspects of some composite scene or object and later reconstruct the pieces that "most likely" could be there. Reisberg mentions the study by Brewer and Treyens 1981 in which participants "recalled" shelves filled with books in an academic office, while the books were not there.
The above may be a useful hint into what the hippocampus does. It simplifies the environment around us to create a model of the context, and this model is stored in memory. Raw sensory data are too resource-demanding to store. Besides, we can always take another look or ask an interlocutor to repeat.
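A minimal sketch of storing only a schema's obligatory components and reconstructing the optional ones from defaults - the schema contents are illustrative, and this is only a caricature of the Brewer and Treyens effect:

```python
# Sketch: remember the obligatory essentials of a scene, then
# reconstruct optional pieces from schema defaults at retrieval.
# The office schema and its contents are invented.

office_schema = {
    "obligatory": {"desk", "chair"},
    "optional_defaults": {"books", "shelves"},
}

def store(observed, schema):
    """Keep only the schema-relevant essentials of the scene."""
    return observed & schema["obligatory"]

def reconstruct(stored, schema):
    """Retrieval fills gaps with what 'most likely' was there."""
    return stored | schema["optional_defaults"]

seen = {"desk", "chair", "poster"}        # note: no books in the room
memory_trace = store(seen, office_schema)
recalled = reconstruct(memory_trace, office_schema)
print("books" in recalled)                # True: reconstructed, not seen
```

The sketch "recalls" books that were never observed and drops the poster that was - both storage savings and false memories fall out of the same mechanism.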
One important note - the hippocampus makes each of us the center of the Universe. Every model it creates is "me"-centric. Therefore, it is involved in storing memories that require or involve our personal experience. The other story is when you learn general facts, like "Paris is the capital of France". On the first mention, you will remember the context when you learned that astonishing fact. On the hundredth mention, the key to that fact will get rid of irrelevant pieces, like who told you that, where you read it, or when. Those are chiseled out. Forgetting mechanism, remember?
This is why the hippocampus is involved when we imagine ourselves somewhere, even in the role of observers, as we do reading sci-fi books. But imagination is almost impossible without evoking emotions. That is why in this section I also include the amygdala.
I will not claim that the amygdala alone is involved in emotions and our experience of them. It may not even be the most important module responsible for emotions. Let's just make it the front module of the "Emotions" band.
Memories of events rich with emotions are stored faster and for longer. That is a fact. In "Amygdala Models", Guntu et al. 2017 state, "Emotional memory is viewed as an implicit or unconscious form of memory and contrasts with explicit or declarative memory mediated by the hippocampus". In my model, emotions are just another type of key.
Using humor as an example, I would like to illustrate how our collective, implicit intelligence uses emotions for good. Again, I am not claiming anything here - just a hypothesis. If we consider situations when people laugh at someone, most often that person was behaving inefficiently or without proper care and got into trouble. One obviously does not feel comfortable when laughed at, so laughter serves as a stimulus to remember not to behave as that person did. But on the other side, this situation is remembered also by those who laughed and so they will also get the stimulus to remember how not to behave. This way, laughter discourages inefficiency in a community. Not bad for implicit mechanisms.
5.4 Syndromes
The above-described mechanisms allow for imperfections, which are still tolerable. But there are cases when whole areas of the brain go offline because of infection, trauma, or some other reason. Interestingly, these cases are a major source of our knowledge about memory. We have already mentioned Henry Molaison.
I suggest analyzing various kinds of amnesia as hints to the existence of modules responsible for processing information of some kind.
Bechara et al. 1995 experimented with patients who had either the hippocampus, the amygdala, or both completely damaged. Without a hippocampus, a patient demonstrated signs of functioning implicit memory (tested via skin reactions) but no explicit memory. Without an amygdala, a patient showed a lack of implicit memory but fully functioning explicit memory. Without both the hippocampus and the amygdala, a patient was unable to form any memory, explicit or implicit.
Capgras syndrome and prosopagnosia may suggest that just like the hippocampus models our environment some other brain area models the faces of people we encounter. Think about police sketches as an example. Bauer 1984 showed that face recognition may also be explicit and implicit. If face recognition is impaired, the less accurate object-recognition system may be used.
5.5 Where is it?
Tee and Taylor 2021, discussing "Where is Memory Information Stored in the Brain?", mentioned sea slugs used in two different experiments confirming two different theories.
In the first experiment, sea slugs were conditioned to fear a certain stimulus, supporting the theory that memory is stored in synapses. It was for scaring sea slugs that, as the authors note, "Based on his discovery of the synapse as the physiological basis of memory storage, Kandel was awarded the 2000 Nobel Prize in Physiology or Medicine".
In the second experiment, RNA from experienced sea slugs was transplanted into naive ones, and as a result, the latter exhibited the acquired memories. Thus, it was shown that RNA may store memory unrelated to genes.
Tee and Taylor 2021 proceed to mention the analysis of Gershman et al. 2021 and their estimate that "a synaptic memory substrate requires that computations operate via the propagation of spiking activity, incurring an energetic cost roughly 13 orders of magnitude greater than the cost incurred if the computations are implemented using intracellular molecules". I think multiple orders of magnitude with respect to energy consumption are worth paying attention to.
6. Consciousness
Let's start from the above and agree with Professor Michael Levin that all living forms are intelligent and conscious - from unicellular organisms to humans. Each has its own boundaries, areas of interest and influence, needs and competencies. There is even more - organs also behave as separate entities. Do not forget autonomous agents in our organisms - immune cells, viruses, symbionts. We have a whole zoo inside us and yet in most cases, each of us manifests only one self, not the choir.
How does nature add one cell to another? What is the result? Why is there only one self at the top? Why are others just like us?
In this section, I will not attack the problem of consciousness, neither easy nor hard. But everything may change. I believe in differences. What is change if not differences?
7. On current events
I am from Ukraine. I know what propaganda may lead to. Ukraine is not the only country affected. Consider this section as my position on current events.
Reisberg in "Cognition" discusses multiple studies and experiments that show how easy it is to manipulate memory, our likes, and our beliefs, even when the information is obviously absurd. What's more, even prepared people find it difficult not to fall for those tricks.
According to Wikipedia, there were times when fasces were considered a symbol of union, as in a Greek fable depicting how individual sticks can be easily broken but a bundle cannot. Later they became the symbol of fascism. Getting together is OK, competition is OK, but crossing the line is not. People are different, and we should respect our differences. When we fail to respect them, we cease to be human.
K. Lorenz in "King Solomon's Ring" explains how our biology makes us dangerous. Unfortunately, evolution takes millennia to kick in with corrective measures. However, understanding our limitations may help in preventing problems. If we make conscious efforts.