I am a philosopher of language rather than a linguist. There are misconceptions about symbols that severely limit the progress of linguistics. Fixing them will shed light on categorization, references, generalization, polysemy, representations, and the role of language in communication. Many of these misconceptions follow from the lack of a useful metaphor for what symbols are and how they enable our cognitive functions. The main goal of this post is to get away from "somehow" (which is often implied in descriptions of how cognitive functions work) and provide a more specific view of symbols. I would like to use a tight set of tools and will try to show how they can be used consistently across many linguistic phenomena.
Symbols
The first misconception about symbols is that they refer to objects. For example, "dog" refers to/describes/encodes an animal of the family Canidae. It is so natural to make that statement. Yet there are multiple issues with it. The minor, almost negligible one is that words like "dog" are merely labels for the phenomena that perform the cognitive heavy lifting; words themselves are not those phenomena. The major issue is that the phenomena are not about objects. They are about properties. In particular, each one indicates a range of values of a property.
For clarity, let's call words "symbols" and those phenomena "concepts". Each concept is unique, but many symbols may refer to it (synonymy), and the same symbol may refer to different concepts (polysemy). Natural Language Processing deals in words and thereby loses the ability to differentiate concepts from symbols.
Fodor claimed that concepts are atomic and independent. This is another misconception. Consider "apple" and "pear"; now consider "apple" and "chair"; now "apple" and "curiosity"; now "apple" and "unicorn". It looks like there are degrees of independence. Now consider "apple" and "fruit". Sometimes independence breaks down.
Consider polysemy. The word "key" may have different meanings. Or, as I claim, it refers to ranges of different properties. Those ranges are discrete (there is no blending between them) and distinct from each other. That is why it is important for linguists to differentiate symbols/words from concepts/ranges-of-properties.
Conventions
I am not sure whether de Saussure was the first to introduce the idea of social conventions about words, but I agree with it. Whenever someone needs to differentiate some range of some property (for example, to categorize some group of objects), one may easily introduce a word/symbol for that range. Any word can be used for that. I coined the term "referential flexibility" to refer to that ease.
There are certain constraints. A word should preferably be easy to pronounce, it should follow the grammatical transformations characteristic of the language, and it should not already be in use for another range of the same property (naming all colors "red" is impractical). There should be a need for that range among a broad group of speakers, and they should all agree on the usage of that word for that range. The broader the group, the higher the cost of the convention. This explains the higher linguistic innovation of small groups relative to larger ones. The use of a word in this new capacity should also pass the test of time. I will return to this later.
One good example of referential flexibility is provided in Dune by Frank Herbert. There is a scene where Jessica mentions "garment", for which she and Paul share an additional meaning - "prepare for violence". A banker participating in that conversation obviously did not catch that additional meaning. Note that Jessica and Paul would not have interpreted that word in that meaning if the banker had uttered it.
By the way, the task of squeezing a word (code word) into a conversation is interesting - it needs to be used with a usual meaning and the phrase should be consistent with the flow of the conversation.
Objects
By objects, I propose to name all the phenomena we can talk about. Those include real or abstract objects; things that exist, that no longer exist, or that do not yet exist; imaginary ("unicorns") or hypothetical ones; actions, collections, properties, symbols, concepts, facts, etc.
All objects carry multiple properties and have a particular value with respect to each of them. For example, an apple has the property of "taste" and its value is "sweet". In turn, the property "taste" has properties of its own. An apple can be eaten, but the action of "eating" also has properties. For example, there is a "manner" of eating. We use words as symbols for concepts. Words have their own properties. Do not confuse those properties with the properties of underlying concepts.
In order to categorize objects we need to compare them. Yet we do not compare objects as wholes. We compare their properties.
Limits of knowledge
Why do I claim that concepts are about properties and not objects? It is clear in the case of "red" or "high" but not so clear in the case of "dog". The reason for my claim is that objects are more than any of their categories. Dogs have breeds, names, age, color, habits, injuries, etc. All those are not covered by the "dog" concept.
I will further claim that we are unable to know everything about any object or concept. We can only learn a bit more on each occasion and operate with what we know so far at any given moment. A good example was demonstrated by soldiers marching across a bridge: the bridge collapsed because its engineers had not accounted for resonance. Another example is Newton's famous laws. It took about 300 years and one Einstein to demonstrate that those laws were inaccurate.
Amulets exploit this incompleteness of our knowledge. We may believe that certain manipulations with an amulet lead to certain results in our favor. At some moment, the procedure fails. Some amulets have absolutely no effect on the desired results but some may affect them. The pursuit of knowledge is to find out what works and what doesn't.
Projection
When I first decided to engage with NLP I wanted to avoid "multiple moving parts". That is when I got an insight, "There are objects/nouns, actions/verbs, and properties/adjectives. But objects have properties and actions change properties. We only need properties."
Actions do not affect the whole object but only a subset of its properties. For example, "eating" decreases "hunger", and increases "energy", but does not affect "age" or "name". One may consider properties as dimensions. Then objects are multidimensional. Then actions are devices for decreasing the dimensionality of objects.
Actions are important because they accept input parameters and lead to results. Recall Einstein's definition of insanity: "doing the same thing over and over again and expecting different results." "The same thing" refers to an input parameter drawn from the same range of a property. "Different results" refer to different ranges of the affected property. It turns out that actions determine relevant ranges.
Comparison
Talking about "computation" is not a misconception per se but it is not specific enough. I claim that intelligence is based on comparisons. Language as both a product created and a tool used by intelligence is also based on comparisons.
The core algorithm of cognition starts with a set of objects. Then constraints are applied to the set in terms of properties. Only the fitting objects remain in the set.
Comparisons are fast operations. No other computation can ensure adequate real-time performance.
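The "core algorithm" above can be sketched in a few lines. This is a toy illustration, not a claim about actual cognitive machinery; all object data and property names below are invented.

```python
# Core algorithm sketch: start with a set of objects, apply constraints
# in terms of properties, keep only the fitting objects.

def filter_objects(objects, constraints):
    """Keep objects whose property values satisfy every constraint.

    `constraints` maps a property name to a predicate over its value.
    Objects missing a constrained property are filtered out.
    """
    return [
        obj for obj in objects
        if all(prop in obj and fits(obj[prop])
               for prop, fits in constraints.items())
    ]

# A hypothetical mini-context: each object is a dict of property -> value.
context = [
    {"kind": "dog",  "height_m": 0.5},
    {"kind": "tree", "height_m": 12.0},
    {"kind": "tree", "height_m": 3.0},
]

# "tall tree": two cheap comparisons per object, nothing heavier.
tall_trees = filter_objects(context, {
    "kind": lambda v: v == "tree",
    "height_m": lambda v: v > 10,  # the range named by "tall" here
})
```

Note that each constraint is just a range-membership check, which is the point: filtering stays within fast comparisons.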
However, there may be errors. For example, when the measured value of a property is close to the boundary between two ranges. The SEP entry on Vagueness has a good quote, "Where there is no perceived need for a decision, criteria are left undeveloped." To deal with such cases I propose to consider two types of ranges: 1) from X to Y, 2) around Z. If the value is close to the boundary between the ranges of the first type, consider using the range of the second type around the boundary.
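The two proposed range types can be made concrete with a small sketch. The boundary values below (e.g. "warm" ending at 30) are invented for illustration.

```python
def in_span(value, lo, hi):
    """Type 1 range: 'from X to Y'."""
    return lo <= value <= hi

def around(value, center, tolerance):
    """Type 2 range: 'around Z'."""
    return abs(value - center) <= tolerance

# Suppose "warm" is 20..30 and "hot" is 30..40 (invented boundaries).
# A measured 29.8 sits right at the boundary between the two spans, so
# the safer classification is the type-2 range "around 30".
value = 29.8
boundary_case = around(value, 30, 0.5)
```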
Properties
The symbol grounding problem, introduced by Harnad in 1990, is about how words acquire their meanings. We have already discussed many related issues. The first is the distinction between symbols and concepts. The second is about social conventions. In general, the process relies on the well-known scientific principle of "other things being equal". Of course, we understand now that we need to fix not "things" but properties, for which we do not require an exact match. Falling into the same range is enough.
Categorization
The exemplar theory and the prototype theory propose to keep track of a representative item for each class/concept and to compare an object being categorized against that exemplar or prototype. The problem with both theories is that they propose to keep track of an object, which is multidimensional, and to compare an unknown object to it. We have mentioned above that we do not compare objects, we compare properties. Which properties should be compared is not specified by those theories.
Another problem is related to the "atomicity" of concepts, which in those theories, and for Fodor, are understood as referring to objects. If concepts are atomic, then an unknown object would have to be compared to every possible concept, after which we would compare the obtained scores to select a winner. Think for a second and you will agree that we do not perform categorization that way. Real-time constraints do not leave enough time for all those operations.
For the purposes of categorization, I propose to play the game 20 Questions. In the beginning, we consider all known concepts as our set of objects. They have properties of their own, in particular differentiating properties with boundaries between ranges. At each step of the game, we find a property with two broad ranges such that roughly half of the remaining concepts fit each one. The question asked at each step establishes a defining feature for the two resulting subsets, and the answer tells us which subset to keep. For example, "abstract vs material", "animate vs inanimate", "animal vs plant", etc. The process is not limited to 20 comparisons, but we can use that number to estimate its efficiency. Within 20 simple comparisons, we can single out one of about a million concepts (2 to the power of 20). Definitely better than either of the two theories above.
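The efficiency estimate and a single step of the game can be sketched as follows. The concept names and the "abstract vs material" predicate are invented illustrations.

```python
import math

# Each yes/no question splits the remaining concepts roughly in half,
# so k questions distinguish up to 2**k concepts.

def questions_needed(n_concepts):
    """Questions required, assuming perfect binary splits."""
    return math.ceil(math.log2(n_concepts))

def split(concepts, fits_range):
    """One step of the game: partition concepts by a property's range."""
    yes = {c for c in concepts if fits_range(c)}
    return yes, concepts - yes

concepts = {"apple", "chair", "curiosity", "unicorn"}
# "material vs abstract" as a toy range predicate:
material, abstract = split(concepts, lambda c: c in {"apple", "chair"})
```

With perfect splits, a million concepts need only 20 questions, which is the efficiency argument made above.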
The game 20 Questions points at another misconception. Categories are not determined based on similarities of objects within a class; they are determined based on differences between sibling subclasses all the way up the specialization tree. Only those properties are relevant in comparisons (a point missed by the two theories above).
One may say that those differences are shared by the class instances. That is true. However, keep in mind that we are talking about ranges, and within a range discrepancies are possible. Also, ranges were introduced as differentiating devices. Besides, of an object's many properties, only the subset that participates in determining the defining features is relevant.
Generalization
Here is a wonderful quote from Jorge Luis Borges, "To think is to forget a difference, to generalize, to abstract."
Above we categorized objects by introducing differences. By forgetting differences and moving up the tree we will generalize. If we consider classes as multidimensional cubes, generalization widens those cubes.
Note that we may forget differences in different properties which allows for different paths of generalization. Consider "red wooden chairs". Forget differences in color and generalize to "wooden chairs". Forget differences in "material" and generalize to "red chairs".
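The two generalization paths can be sketched as dropping a dimension. The objects and property names are invented for illustration.

```python
# Generalization as "forgetting a difference": remove one property and
# previously distinct objects merge into a wider class.

def generalize(objects, forget):
    """Return the objects with one property/dimension removed."""
    return [{k: v for k, v in o.items() if k != forget} for o in objects]

chairs = [
    {"color": "red",  "material": "wood"},
    {"color": "blue", "material": "wood"},
]

wooden_chairs = generalize(chairs, "color")        # path 1: forget color
colored_chairs = generalize(chairs, "material")    # path 2: forget material
```

After forgetting color, the two chairs become indistinguishable, which is exactly what moving up to "wooden chairs" means.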
Generalization of actions may be even more versatile. Recall that actions can be considered objects and therefore have properties. But they also affect properties. Depending on which properties' differences we decide to forget, we get two types of generalization for actions. "Dusting a table" may generalize to "tidying up a table" or to "dusting a room".
Polysemy
Consider the word "high". It may refer to the properties "height", "percentage", "sound", etc. All those properties are distinct and enable different operations with the respective objects. Those properties have different properties of their own. I suspect that some of those secondary properties may have something in common, and this justifies the use of the same word for referring to one of their ranges. It is a suspicion, not a claim.
Please note that in the case of polysemy, the same word may refer to only one range in any property. But there may be many such properties.
Disambiguation
Recall that objects have multiple properties. Here is the key to solving polysemy - objects are multidimensional but not omnidimensional. The set of properties of different objects contains many properties but not all possible properties.
Consider the phrase "high key". This phrase refers to one object, which should have properties that correspond to both polysemous words, "high" and "key". Among many possible objects, only "sound" fits both. The use of verbs or nouns related to sound, like "sing" or "guitar", additionally supports such a choice.
Note that both words constrain the choice of meaning for the other. I use the term "coherence constraints" to refer to these mutual constraints. Another quote that fits here is from Firth, "you shall know a word by the company it keeps."
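These mutual coherence constraints can be sketched as set intersection: each polysemous word names several candidate properties, and the phrase only makes sense where the candidate sets overlap. The lexicon below is invented for illustration.

```python
# Hypothetical mapping from words to the properties they can refer to.
LEXICON = {
    "high": {"height", "percentage", "pitch"},
    "key":  {"door", "cryptography", "pitch", "map-legend"},
}

def disambiguate(words, lexicon):
    """Return the properties compatible with every word in the phrase."""
    candidates = None
    for word in words:
        props = lexicon[word]
        candidates = set(props) if candidates is None else candidates & props
    return candidates

meaning = disambiguate(["high", "key"], LEXICON)
```

Each word keeps the company of the other, and only the shared property survives the intersection.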
The process of disambiguating polysemous words is complicated and may well lead to confusion. Above we mentioned referential flexibility and the need for newly acquired meanings to pass the test of time. If the confusion caused by linking a word to a new meaning is too great, the community may settle on a different word for that meaning.
Synonymy
Now you are ready to understand synonyms. They are words that may refer to the same range of the same property. For example, "high" and "dangerous" are synonymous with respect to the "blood sugar" property, while "high" and "feminine" are synonymous with respect to the "voice" property. It is probably difficult to find a property that makes "feminine" and "dangerous" synonymous. Therefore, we may talk about synonyms only after disambiguation, and only if the resolved property and range are the same.
Ranges of properties may also explain antonyms. If for some property there exists a "middle" range then antonyms refer to ranges symmetrical around that range. For example, "cool vs warm" or "hot vs cold". But be careful. There are antonyms "light vs dark" but not "red vs blue". The black and white palette may have some middle shade, but in the case of the rainbow, even though it is possible to determine the "middle color", we do not treat these colors that way.
The role of language in communication
If by communication we mean the transfer of information, then language alone is a poor tool for it. I claim that the role of language is to point at relevant phenomena: objects, actions, and all the other constituents. One specific phenomenon language can point at is the connection of constituents into one fact.
The word "apple" does not convey all the information about an apple to which a speaker refers. That information is collected by the listener's perception after one's attention has been guided to the apple by the reference. Similar mechanisms engage the listener's memory or imagination.
The way a speaker connects constituents is subject to verification by the listener, where possible, against what one can perceive or against one's experience. In the process of verification, the information lost in linguistic encoding is recovered through the listener's perception.
Communication, thus, involves the speaker's utterances guiding the listener's attention so as to engage the latter's perception/memory/imagination to collect or recreate the relevant information.
Context
Before we discuss how language performs its narrow function, we need to cover context. Even though I stress the importance of properties, context is about objects. Recently the term "context" has been applied to text or discourse, and it is fine to use it that way as long as we understand that context is about objects. So when we read a text, it is not the words that form the context; it is the objects that stand behind those words.
Consider the following tricky question: "What contains 4 letters but sometimes 9?" Words may stand for objects, or they may themselves be objects.
Texts start with the set of experiences that the author and readers share. When a "boy" is mentioned, the range of possibilities is wide. Therefore, if needed, the author adds details to narrow that range to the extent that the differences between the author's imagination and that of the readers become negligible.
If we describe a strange house, we say "a house, but it has no ..." or "but it has ...", because otherwise readers will imagine what houses usually have and will not imagine what houses usually don't have. The same applies to any person or object mentioned.
We are animals living in the real world and in real time. We need to react fast when necessary. That necessarily makes us focus on a narrow context just within our reach. It contains a limited number of objects. It provides a limited number of opportunities (which can also be considered objects, remember?). These limited sets of objects can be filtered fast based on relevant constraints.
The problem with humans is that we have the broadest sphere of interests. We are interested in subatomic particles and in those tiny shiny dots that are in fact galaxies billions of light years away from us. And for sure we are interested in unicorns. This broadens the sets and even creates the need for System 2 thinking.
With respect to the speaker, one's experiences, needs, and wishes should also be considered a part of context. One's ideas about those of listeners may also be included.
References
Given the objects in some context already shared with listeners, we select the part worth their attention. I will not touch on how we select what is relevant. Let's start with context and relevant objects there from the speaker's point of view. Please note that context is not just those relevant objects. Context is all the objects in our (with listeners) neighborhood. It is important and relevant, pun intended.
Before you proceed, please take a look at the SEP entry on Reference to see that the philosophy of language does not have a general and reasonable theory of how references refer to objects. When you realize that, come back and read what I have to offer.
The context has many objects and relevant objects form a subset of it. Therefore, the reference has to throw away non-relevant objects and leave only relevant ones in the set. We already mentioned the core algorithm of cognition - filtering the set of objects based on relevant constraints. How does it apply to references?
We need to evaluate all the objects in the context in terms of their properties. With respect to some properties, the relevant objects will fall into one range. Some other objects may fall into that range along with the relevant ones. But many other objects will fall into different ranges. Those other objects will be filtered out as non-fitting if we use any symbol referring to the range where our relevant objects are. Categories often narrow down the set of objects very fast. That is why they are so useful.
Consider yourself in a park. There may be just one dog there. But imagine that your relevant object is a tree. There are many trees in the park. Using the symbol "tree" will filter out the dog, the people, the grass, the benches, etc. But there are still many objects left in the set. The category did its job; now it is time to apply another property. This time we evaluate the properties of trees only. We may use height, kind of tree, location, or any other property that differentiates our relevant tree from the rest. Do you remember that concepts are differentiating devices? References rely on this in referring to relevant objects.
As long as the set contains not only relevant objects we add one more concept to filter out non-relevant objects. This stacking of filters has nothing to do with compositionality or complex concepts. We just deal with a particular context and the statistics of properties of objects there with respect to relevant ones. We do not "describe" relevant objects, we differentiate them. We do not communicate all the relevant (with respect to our purpose) information about the objects, we rely on the differentiating properties of relevant objects.
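The park example reads naturally as stacked filters: each added concept throws out more non-relevant objects until only the relevant one remains. The objects and their properties below are invented for illustration.

```python
# Hypothetical park context: objects as dicts of property -> value.
park = [
    {"category": "dog"},
    {"category": "bench", "location": "hill"},
    {"category": "tree",  "location": "hill"},
    {"category": "tree",  "location": "pond"},
]

def narrow(objects, prop, wanted):
    """Keep objects whose property falls in the named range."""
    return [o for o in objects if o.get(prop) == wanted]

step1 = narrow(park, "category", "tree")    # "tree" removes dog, bench, ...
step2 = narrow(step1, "location", "hill")   # "on the hill" finishes the job
```

Each filter differentiates, rather than describes, the relevant tree; the stack stops as soon as the set holds only relevant objects.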
Usually, references are considered only with respect to objects - dogs, trees, etc. But relationships and actions also require pointing and differentiating. If there are many trees in the park, we may use location as a differentiating property. This is where auxiliary words may help - "ON the hill", "BEHIND the church", etc. With respect to actions, even a sitting dog performs many of them - it sits, looks, wags its tail, breathes, etc. It does not just "do". Differentiation is required, and verbs help with that. Different forms of verbs allow us to address various nuances of actions - finished or in progress, now or in the past, etc.
Based on what the relevant phenomena are, we also select among different grammatical forms of sentences when connecting references together - "The dog looks tired" (SVC) vs "The dog is chasing a butterfly" (SVO). You may consider this as a higher-order reference - to the fact as a connection of relevant phenomena.
Negation
After our discussion of filtering, negation is NOT difficult to understand (sorry, could NOT help it!). Sometimes the set is narrowed down fast by using properties that relevant objects have. But sometimes the set is narrowed down fast by using properties that relevant objects don't have. Consider a typical example, "Looking for a partner with NO bad habits."
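The negative filter is the same algorithm with the predicate flipped: keep objects that do NOT fall into a range. The data below is invented for illustration.

```python
# Hypothetical candidates with a set-valued "habits" property.
candidates = [
    {"name": "A", "habits": {"smoking"}},
    {"name": "B", "habits": set()},
]

def lacking(objects, prop, unwanted):
    """Keep objects whose property does NOT include the unwanted value."""
    return [o for o in objects if unwanted not in o.get(prop, set())]

partners = lacking(candidates, "habits", "smoking")
```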
Meaning
So are we now ready to discuss what the meaning of "pointing" is? I will leave it as an exercise for the reader. So much for the most widely discussed topic in the philosophy of language!
Perception/Memory/Imagination
Imagine a typical situation in which a painting hangs crooked and someone says, "Fix the painting!" Does that sentence "communicate" the essence of the issue? Are you able to fix it based on the "information" provided? Probably we need to agree that language plays a role in communication, but that role is not as prominent as many believe.
When you hear the word "church", is that word intended to "describe" the church you visited last, which was built centuries after the word was invented? Yet your memory readily provides you with visuals based on that word.
And of course, I will not forget to mention unicorns. Wait, whom?
Communication heavily involves these modules but all the glory goes to language. OK, language communicates information. Here I provided you with information using language. Now you have to agree with me and adopt my theory. Or wait, your acceptance of information depends on your life experiences and whether my claims correspond to those experiences. Sorry for challenging your views about the role of language in communication.
Truth/Lie
Above we mentioned that a speaker and a listener have to share context to communicate efficiently. Now I will introduce private contexts. Imagine a detective interrogating a suspect. Both have private contexts - information unknown to the other. The suspect misreports one's private context in order to evade punishment. The detective does not mention the available evidence in order to collect more false testimonies. Both lie in order to get better off in the end.
Now take a look at the following statement, "This text is written in Italian!" Can we call it a lie? Can I expect to get better off in the end? For example, I may hope that you will respect me more believing that I know Italian.
I do not insist but I strongly believe that "lie" is about misreporting private context only. If context is shared misreporting it makes no sense.
Knowledge update
The previous section is directly related to our false beliefs. For example, "The Earth is the center of the Universe." The immediate consequence is, "Other objects rotate around it." Now comes Nicolaus Copernicus and we have to reconsider. Do we need to forget the Earth, the Moon, planets, and stars? Center and rotation? No. We only need to rearrange how all those are connected.
Forgetting is an important part of our memory mechanisms. We do not ensure "backward compatibility." New versions of knowledge replace the old ones. As we start using it, old neural connections get weaker and new ones get stronger. Keys that previously triggered false beliefs now lead to updated facts.
Questions
Our memory holds a lot of knowledge. No one knows how it is represented. In such cases, even wrong hypotheses are better than "somehow".
We have already mentioned that we store knowledge about properties and ranges. Also, we store "rules" about how actions affect properties and what objects have what properties.
Let's consider the hypothesis that facts are stored as sentences, only instead of words we store constituents - objects, actions, relations. We also store information about contexts. We do not need to store references to constituents; we store the constituents themselves - information packages about the respective objects. By adding timestamps, we can keep track of what was known about them at any moment in time.
But whatever the internal representations are, we are restricted to language as the interface for information exchange with memory. The basic form of queries is questions. Questions are declarative statements where one of the constituents is unknown (at least one, but we will not complicate matters for now).
How do we answer questions? To start with, any question is asked against some context. So we resolve references to provided constituents against that context. Then we start searching for those constituents among the stored ones. Finally, we search among facts that involve those constituents. If we find the required fact we extract the missing piece from it, otherwise we "don't know."
Again, consider the set of known facts and apply the provided constituents as the constraints for selection. If the resulting set is empty then we do not know the answer.
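Question answering as constraint-based selection over stored facts can be sketched as follows. The fact format and slot names are invented for illustration.

```python
# Hypothetical store of facts as constituent slots.
facts = [
    {"subject": "dog", "action": "chase", "object": "butterfly"},
    {"subject": "dog", "action": "sit",   "object": None},
]

def answer(facts, known, unknown):
    """Apply known constituents as constraints; extract the missing slot."""
    matches = [f for f in facts
               if all(f.get(slot) == value for slot, value in known.items())]
    if not matches:
        return None  # the resulting set is empty: we "don't know"
    return matches[0][unknown]

# "What does the dog chase?"
what = answer(facts, {"subject": "dog", "action": "chase"}, "object")
```

The provided constituents act as the filters; the answer is whatever fills the one unknown slot of a surviving fact.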
I think it is important to mention "good questions". Those are questions with well-established constituents. Imagine a prehistoric man asking, "What is a rocket?" and you answer, "It's a projectile that uses reactive propulsion to overcome gravity and reach outer space." The words "projectile, reactive propulsion, gravity, outer space" do not differentiate any constituents known to the "listener". Therefore, the answer will not sink in. As a house is built brick by brick, knowledge is added on top of the existing, known facts. You cannot add a roof before the walls.
Cooperative completion
The mechanism of answering questions is at work when we perform "cooperative completion" or figure out an unknown word (or even an OOV word, out-of-vocabulary). The constituents already provided constrain the possible "meaning" of a missing or unknown word. It is especially helpful if we are "in the context". Therefore, unknown words are not that unknown when they have a company. Thank you, Firth.
Metaphors
Knowledge is often about metaphors. Whenever we encounter some abstract phenomenon we try to apply our existing knowledge to that domain based on similarity of properties. For example, we may waste time if we apply the metaphor of money to the abstract concept of time or time may flow if the metaphor of river is applied.
Metaphors are always applied from specific concepts to abstract ones. A good discussion of how pervasive metaphors are in our language is provided in Lakoff and Johnson's Metaphors We Live By.
***
I want to finish this post with a short statement. I made several claims above but those are no more than attempts to propose new metaphors to our understanding of language and intelligence. Those metaphors are necessarily incomplete and in some respects may be wrong. Please take them with a grain of salt and try to find better ones.
I just saw this article on x.com and have read through it. I like the approach you are taking, as I have done similar work. My mentors gave me some tips for communicating by using the same terms others have used.
For example, the symbol/concept distinction attributed to de Saussure above was also discovered at about the same time by someone else, who seems to have developed a better model: C. S. Peirce. In his semiotics (sign theory), Peirce maps signs/representamens to interpretants to objects. The signs are the words we write/say, the interpretants are the choices of meaning, and the objects are the real-world things we refer to.
He also talked about what signs can refer to, which was great to learn in my NLU work. A sign can be a symbol (an arbitrary relation), an index (like a pronoun that refers to something else in context), or an icon (something that resembles its object, as a map represents the relationships of places).
Moving from philosophy to linguistics, Robert Van Valin's Role and Reference Grammar models the meaning of words with a system of decomposition that is very handy. That system builds on the distinction between activities (like walking) and states (like being happy or knowing something).
I always wished someone had pointed me to good resources rather than finding out later on!