Language is often described as a means of communication and knowledge transfer. However, its components and mechanisms are designed for a different function, namely pointing at relevant phenomena. I provide arguments in favor of that claim and explain how the machinery of language works in unison to perform that role.
Introduction
Humans are social animals. We need information transfer for learning purposes in childhood and coordination purposes in adulthood. Language is one of the tools used to facilitate those processes.
The role of language has been debated for centuries. Fodor (1975) proposed the hypothesis that our thoughts use some form of language. Fedorenko et al. (2024) considered two major roles, thought and communication, and based on their experimental findings favored the latter.
In this post, I argue that the role of language in communication is limited: it does not itself transfer information. I propose to consider communication as a two-stage process. Language is involved in the first stage; however, information is not sent by the speaker. Information is received by listeners in the second stage through their perception, memory, or imagination.
To explain how language is used in the communication process, I analyze its components and mechanisms. This analysis consistently supports the main claim.
The Main Claim
I propose to consider communication as a two-stage process. The role of language is revealed during the first stage when a speaker uses language to guide a listener's attention to relevant phenomena in the current context. During the second stage, the listener's perception collects information about the relevant phenomena.
It is possible to refer to past or imaginary events. In those cases, listeners extract all the necessary information about relevant phenomena from memory or imagination.
Hence, the role of language is to point or refer to relevant phenomena.
Language does not encode and transfer information about them. Other cognitive abilities are responsible for that.
Uniqueness vs Interchangeability
Referring to relevant phenomena in the current context relies on comparison-based selection. To see how this works, we first need to understand comparison and its peculiarities.
Heraclitus claimed, "No man ever steps in the same river twice. For it's not the same river and he's not the same man." Everything is unique; even the same object is different at different points in time. But from a practical perspective, for some purposes, some objects are interchangeable.
When we know that for some purpose some objects are interchangeable, and we know how to recognize them as such, it makes sense to keep that information in memory and use it across different locations and at various points in time.
Judging interchangeability relies on comparison. We compare objects one property at a time. If we view a property as an axis, it makes sense to consider not points on that axis but ranges. Objects whose values of that property fall in the same range are then considered interchangeable. It would be impossible to track interchangeability if we required point-accurate estimates.
Boundaries between ranges deserve special attention. The Vagueness entry in the Stanford Encyclopedia of Philosophy gives us the following hint: "Where there is no perceived need for a decision, criteria are left undeveloped." Each boundary makes a difference for some purpose. However, borderline cases often cause confusion. I propose to distinguish two types of ranges: one type contains the values between two boundaries; the other contains the values in the vicinity of a single boundary. We may treat ranges of the two types differently.
Ranges should not be considered fixed once and for all. Rather, they are flexible and context-dependent. A tall child in a kindergarten may be well below a short NBA player.
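To make this concrete, here is a minimal Python sketch of interchangeability as "same range of the same property". All property names, boundaries, and numbers are invented for illustration; think of them as one possible encoding of the idea, not a definitive model.

```python
def range_of(value, boundaries):
    """Return the index of the range the value falls into,
    given the sorted boundaries between ranges."""
    for i, boundary in enumerate(boundaries):
        if value < boundary:
            return i
    return len(boundaries)

def interchangeable(a, b, prop, boundaries):
    """Two objects are interchangeable (for the purpose keyed to prop)
    when their values of that property fall in the same range."""
    return range_of(a[prop], boundaries) == range_of(b[prop], boundaries)

# Ranges are context-dependent: "tall" has different boundaries
# in a kindergarten and in the NBA (heights in cm, numbers invented).
kindergarten = [95, 110]   # ranges: short | average | tall
nba = [195, 210]

child = {"height": 112}
player = {"height": 198}

print(range_of(child["height"], kindergarten))  # 2 -> "tall" in kindergarten
print(range_of(player["height"], nba))          # 1 -> only "average" in the NBA
print(interchangeable(child, {"height": 115}, "height", kindergarten))  # True
```

Note that the same comparison machinery yields opposite verdicts once the boundaries change with the context, which is exactly why point-accurate estimates are neither needed nor helpful.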
I propose to call ranges of comparable properties concepts. Next, we need to understand one more aspect of them.
Concepts via Object Classification/Recognition
Names were the first attempts at referring to objects. I will address shortly how words and concepts are related. Now is the time to analyze what the hierarchy of concepts reveals about our modeling of the world.
Concepts serve two important roles: they group instances of a class, and they differentiate those instances from instances of all other classes (an exception related to specialization/generalization will be addressed shortly).
In disagreement with Fodor's idea of atomic concepts (Fodor 1998), I claim that classification is hierarchical, which makes concepts interrelated.
We could label everything under the concept of "phenomenon", but it would hardly serve the practical need of differentiating phenomena from each other. Hence the need to introduce more concepts. The process of their introduction is best illustrated by the game 20 Questions. At each level, we consider only one concept (a range of some comparable property). We consider a new property and its ranges. The respective boundaries differentiate subclasses from each other and serve as the defining features of each subclass. The stack of defining features up to the root element is all that is required for classification. Instances of a class have many other properties, but classification relies only on the defining features, like the tip of an iceberg.
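Here is a minimal Python sketch of such "20 Questions"-style classification. The tree, the property names, and the class labels are invented examples; the point is only that each level checks one defining feature and everything else about the object is ignored.

```python
# Each node checks one property against an expected value; the path of
# answers down the tree is the "stack of defining features".
TREE = {
    "question": ("is_alive", True),
    "yes": {
        "question": ("can_move", True),
        "yes": "animal",
        "no": "plant",
    },
    "no": {
        "question": ("is_natural", True),
        "yes": "mineral",
        "no": "artifact",
    },
}

def classify(obj, node=TREE):
    """Walk the tree, inspecting one defining feature per level.
    Classification never consults the object's other properties."""
    if isinstance(node, str):          # reached a leaf: a class label
        return node
    prop, expected = node["question"]
    branch = "yes" if obj.get(prop) == expected else "no"
    return classify(obj, node[branch])

print(classify({"is_alive": True, "can_move": True, "color": "grey"}))
# -> "animal"; the "color" property was never consulted (tip of the iceberg)
```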
It is worth noting that any noun refers to only a subset of the corresponding class's properties. This is one of the reasons language encoding is lossy, and it is why language is a poor tool for transferring fine details.
Please note that instances are recognized as belonging to a class not on the basis of their mutual similarities but on the basis of their differences from other classes. This brings order to the recognition process. It also makes recognition fast and computationally cheap, since it relies on comparing a fraction of an object's properties and does not require comparison with every possible concept out there.
Everything is recognized by comparison. Because of that, concepts are not atomic; they are interrelated.
Facts vs Generalization
The process described above builds a specialization tree, each level of which introduces differences for new classes.
Generalization is the opposite process. Jorge Luis Borges explains it this way, "To think is to forget a difference, to generalize, to abstract." Differentiating the class Chair on the basis of color, we may have subclasses Red Chair, Green Chair, Blue Chair, etc. Forgetting the color differences we generalize back to Chair.
When we record facts, we may evaluate properties with point accuracy. Moreover, we may record more properties than are needed for the current task. By dropping accuracy and some data columns, we may achieve higher clarity about the task, contrary to natural expectations.
Statics vs Dynamics
Language can refer to objects: real and imaginary, existing, gone, or yet to come, singular or multiple, tangible or not, etc. Language can also refer to actions, scenes, conditions, relationships, etc. This is achieved by words and phrases.
The most important pointing capability of language lies in referring to connections among phenomena. This is achieved by sentences. SVC (subject-verb-complement) sentences shed light on the static properties of phenomena. SVO (subject-verb-object) sentences address dynamic changes in those properties and interactions between phenomena.
Words - Synonyms vs Polysemy
Words are symbols initially devoid of any meaning. This is easy to demonstrate: take any unfamiliar word from a foreign language. However, words easily acquire meanings. I call this referential flexibility.
Content words, such as nouns and adjectives, refer to concepts, that is, to ranges of comparable properties. One property has many ranges, and a word may refer to one of them. Using the same word to refer to multiple ranges of the same property would be self-defeating, as it decreases the word's differentiating capacity.
Now we are ready to understand synonyms. Synonyms are words that may refer to the same range of the same property. For example, "high" and "dangerous" are synonymous with respect to the "blood sugar" property, while "high" and "feminine" are synonymous with respect to the "voice" property. It is probably difficult to find a property that makes "feminine" and "dangerous" synonymous. Therefore, we may speak of synonyms only after disambiguation, when the resolved property and range are the same.
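A toy Python sketch of this view of synonymy follows. The lexicon entries and range labels are invented for illustration; the only point is that synonymy is relative to a property and holds only after disambiguation.

```python
# A word maps to (property -> range) pairs; two words are synonyms
# only with respect to a property on which their ranges coincide.
LEXICON = {
    "high":      {"blood sugar": "upper", "voice": "upper", "height": "upper"},
    "dangerous": {"blood sugar": "upper", "road": "risky"},
    "feminine":  {"voice": "upper", "style": "feminine"},
}

def synonyms_wrt(word1, word2, prop):
    """Words are synonymous w.r.t. a property iff both resolve
    to the same range of that property."""
    r1 = LEXICON.get(word1, {}).get(prop)
    r2 = LEXICON.get(word2, {}).get(prop)
    return r1 is not None and r1 == r2

print(synonyms_wrt("high", "dangerous", "blood sugar"))  # True
print(synonyms_wrt("high", "feminine", "voice"))         # True
print(synonyms_wrt("feminine", "dangerous", "road"))     # False
```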
Ranges of properties may also explain antonyms. If for some property there exists a "middle" range, then antonyms refer to ranges symmetrical around it. For example, "cool vs warm" or "hot vs cold". But be careful: "light vs dark" are antonyms, while "red vs blue" are not. The black-and-white palette has a middle shade, but in the case of the rainbow, even though it is possible to determine the "middle color", we do not treat colors that way.
Consider the word "high". It may refer to the properties "height", "percentage", "sound", etc. That makes the word polysemous. All those properties are distinct and enable different operations with the respective objects.
Polysemous words create the need for disambiguation. I will address it below.
Context
Before we discuss how language performs its narrow pointing function, we need to cover context. Even though I stress the importance of properties, context is about objects. Recently the term "context" has been applied to text or discourse, and it is fine to use it that way as long as we understand that context is about objects. When we read a text, it is not the words that form the context but the objects that stand behind those words.
Consider the following tricky puzzle: "What contains 4 letters but sometimes 9." The trick is that the words themselves are the objects here: "what" has four letters and "sometimes" has nine. Words may stand for objects, or they may themselves be objects.
Texts start from the set of experiences that the author and readers share. When a "boy" is mentioned, the range of possibilities is wide. Therefore, if needed, the author adds details to narrow that range to the point where the differences between the author's imagination and the readers' will be negligible.
If we describe a strange house, we say "a house, but it has no ..." or "but it has ...", because otherwise readers will imagine what houses usually have and will fail to imagine what houses usually don't have. The same applies to any person or object mentioned.
We are animals living in the real world and in real time. We need to react fast when necessary. This necessarily makes us focus on a narrow context just within our reach. Such a context holds a limited number of objects and provides a limited number of opportunities. These limited sets of objects can be filtered quickly based on relevant constraints.
The peculiarity of humans is that we have the broadest sphere of interests. We are interested in subatomic particles and in those tiny shiny dots that are in fact galaxies billions of light years away from us. And we are certainly interested in unicorns. This broadens our contexts, the sets of objects possibly of interest to us.
A speaker's own experiences, needs, and wishes should also be considered part of the context, as should the speaker's ideas about those of the listeners.
One important feature of contexts is the statistics of their contents. Every park is unique in how many trees of what kinds it has, among other details. This feature affects how we refer to objects in each context.
Compositionality
Consider context as a set of objects. If every object belongs to a unique class, referring to any of them is easy. The task becomes more interesting when there are multiple objects of the same class. In a park, the word "tree" is only the beginning of the process of constructing a reference.
Many linguists apply the principle of compositionality to the analysis of phrases and sentences. I claim that "composite phrases" do not stand for "complex concepts"; instead, they are stacked filters. They rely on those properties of the relevant objects that make them stand out in the given context.
As long as the remaining set contains more than the relevant objects, we add one more concept to filter out the non-relevant ones. This stacking of filters has nothing to do with compositionality or complex concepts. We simply deal with a particular context and the statistics of the properties of its objects relative to the relevant ones. We do not "describe" relevant objects; we differentiate them. We do not communicate all the relevant (with respect to our purpose) information about the objects; we rely on their differentiating properties.
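Here is a minimal Python sketch of a "composite phrase" as stacked filters over a context. The park, its objects, and their properties are invented for illustration.

```python
# A context is just a set of objects with properties.
park = [
    {"class": "tree", "species": "oak",   "size": "tall"},
    {"class": "tree", "species": "oak",   "size": "short"},
    {"class": "tree", "species": "birch", "size": "tall"},
    {"class": "bench"},
]

def refine(candidates, prop, value):
    """Each content word filters the remaining set; nothing is 'composed'."""
    return [obj for obj in candidates if obj.get(prop) == value]

# "the tall oak tree": stack filters until only the relevant object remains
candidates = refine(park, "class", "tree")          # 3 candidates left
candidates = refine(candidates, "species", "oak")   # 2 left
candidates = refine(candidates, "size", "tall")     # 1 left: reference resolved
print(candidates)
```

In a different park, with different statistics of properties, the word "tree" alone, or "the oak", might already suffice; the phrase is as long as the context demands, not as long as the object's "full description" would be.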
Negation
After our discussion of filtering, negation is NOT difficult to understand (sorry, I could NOT help it). Sometimes the set is narrowed down quickly by using properties that the relevant objects have. But sometimes it is narrowed down quickly by using properties that the relevant objects don't have. Consider a typical example: "Looking for a partner with NO bad habits."
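Continuing the sketch above, negation simply flips the filter: we keep the objects that lack certain properties. The candidate profiles below are invented for illustration.

```python
# Negation as a filter on properties the relevant objects DON'T have.
candidates = [
    {"name": "A", "habits": {"smoking"}},
    {"name": "B", "habits": set()},
    {"name": "C", "habits": {"smoking", "gambling"}},
]

BAD_HABITS = {"smoking", "gambling"}

# "a partner with NO bad habits": keep those whose habits avoid the bad set
matches = [c for c in candidates if not (c["habits"] & BAD_HABITS)]
print([c["name"] for c in matches])  # ['B']
```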
Lies
Above I mentioned that a speaker and a listener have to share context to communicate efficiently. Now I will introduce private contexts. Imagine a detective interrogating a suspect. Both have private contexts: information unknown to the other. The suspect misreports his private context in order to evade punishment. The detective withholds the available evidence in order to collect more false testimony. Both lie in order to end up better off.
Now take a look at the following statement: "This text is written in Italian!" Can we call it a lie? Can I expect to end up better off? For example, I may hope that you will respect me more, believing that I know Italian.
I do not insist, but I strongly believe that "lie" applies to misreporting private context only. If the context is shared and any claims about it are easily verifiable, misreporting it makes no sense.
I do not discuss the second stage of communication here. It may well include not only the collection of information but also the validation of the suggested connections between the phenomena mentioned by the speaker.
Mathematical Subdomain for Precision
What I discussed above may create the impression that natural languages are not precise. This is not so: they can achieve an arbitrary level of precision. One way to do that is to engage the mathematical sublanguage.
Metaphors
Knowledge often relies on metaphors. Whenever we encounter some abstract phenomenon, we try to apply our existing knowledge to that domain based on the "similarity" of properties. Possibly, we may speak here of properties of properties. For example, time may be wasted if the metaphor of money is applied to the abstract concept of time, or time may flow if the metaphor of a river is applied.
Metaphors are always applied from specific concepts to abstract ones. A good discussion of how pervasive metaphors are in our language is provided by Lakoff and Johnson (1980).
Disambiguation
Recall that objects have multiple properties. Here is the key to solving polysemy: objects are multidimensional but not omnidimensional. The set of properties of a given object contains many properties, but not all possible ones.
Consider the phrase "high key". It refers to one object, which should have properties that fit both polysemous words: "high" (height, sound, blood sugar level, etc.) and "key" (locking tool, sound, programming term, etc.). Among the many possible objects, only "sound" fits both.
Note that each word constrains the choice of meaning for the other. I use the term "coherence constraints" to refer to these mutual constraints. Another quote fits here, from Firth (1957): "you shall know a word by the company it keeps." Ranges of comparable properties and coherence constraints provide the mechanism for that.
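A minimal Python sketch of coherence constraints follows: each polysemous word offers a set of candidate properties, and disambiguation is their intersection. The candidate sets are invented for illustration.

```python
# Each word's possible senses, modeled as the properties it can refer to.
SENSES = {
    "high": {"height", "sound", "blood sugar", "percentage"},
    "key":  {"locking tool", "sound", "programming term", "map legend"},
}

def disambiguate(*words):
    """Each word constrains the others: keep only the properties
    that every word in the phrase can refer to."""
    return set.intersection(*(SENSES[w] for w in words))

print(disambiguate("high", "key"))  # {'sound'}
```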
The process of disambiguating polysemous words is complicated and may well lead to confusion. Above I mentioned referential flexibility. Newly acquired meanings need to pass the test of time. If linking a word to a new meaning causes too much confusion, the community may decide to use a different word for that meaning.
Consider the following example of anaphora resolution and the available candidates: "Stiven saw a car. It was red. It was yesterday. It was raining." We are interested in resolving each "it". All the available candidates are provided by the first sentence, some of them implicitly. Overall, four "objects" are mentioned: Stiven, the car, the day (implicitly), and the event of seeing (indirectly). The property "color" is only available for the car. The property "time" is only available for the event. The property "weather" is available for the day. This example also demonstrates coherence constraints in action.
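The same intersection idea resolves these anaphora: "it" binds to the candidate that actually has the property the sentence ascribes. The candidates and their property lists below are invented for illustration.

```python
# Candidates from the first sentence, with the properties each can carry.
candidates = {
    "Stiven": {"name", "occupation"},
    "car":    {"color", "make"},
    "day":    {"weather"},
    "event":  {"time"},
}

def resolve_it(ascribed_property):
    """Return the candidates coherent with the ascribed property."""
    return [c for c, props in candidates.items() if ascribed_property in props]

print(resolve_it("color"))    # ['car']    <- "It was red."
print(resolve_it("time"))     # ['event']  <- "It was yesterday."
print(resolve_it("weather"))  # ['day']    <- "It was raining."
```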
Related to anaphora is cataphora, when a "relative reference" is used before anything is said about the "object". For example, "Because he was in a hurry, John took a taxi." The word "he" already provides enough information to include an object in our set: a person, male. Later we add the "name" property to the entry. The use of coherence constraints remains: "being in a hurry" is coherent with "taking a taxi".
Levesque et al. (2012) proposed to prepare pairs of sentences with small differences that affect the resolution of an ambiguity contained in them. Consider the following two sentences from that dataset:
Joan made sure to thank Susan for all the help she received.
Joan made sure to thank Susan for all the help she had given.
How should "she" be resolved in each case? Is it Joan or Susan? In my theory, I propose to keep track of the expected direct/indirect objects of each action. In this case, "thank smb for smth" expects something good, to which "help" generalizes well. If she received the help, she should be the subject of "thank"; otherwise, she should be the direct object.
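A minimal Python sketch of this bookkeeping for the Winograd pair above follows. The rule encoding is my own invented illustration, not a general resolver: it captures only that the receiver of the help is the thanker and the giver is the one thanked.

```python
def resolve_she(relative_clause_verb, subject="Joan", obj="Susan"):
    """In 'X thanked Y for all the help she <verb>':
    the receiver of the help is the thanker; the giver is the thanked."""
    if relative_clause_verb == "received":
        return subject   # she received help -> she is the thanker: Joan
    if relative_clause_verb == "had given":
        return obj       # she gave help -> she is the one thanked: Susan
    return None          # no rule applies

print(resolve_she("received"))   # Joan
print(resolve_she("had given"))  # Susan
```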
Sometimes, the information in a sentence is not sufficient to disambiguate its grammatical structure and therefore its constituents. Consider the example, "Visiting relatives can be annoying." Is it me visiting the relatives, or the relatives visiting me? Imagine now that the following sentence appears in the context: "I don't like the climate there." The "visit" action may change, among other things, the property "exposure to climate". Note the use of the anaphoric "there", and that "don't like" resolves to the same range of the "attitude" property as "annoying". All these considerations support the reading that I visited my relatives.
As an alternative, consider the following hint instead: "I don't like when my house gets crowded." In this case, we consider "visit" as changing the property "number of people in my house", and "gets crowded" means an increase in that number. Therefore, this hint supports the reading that the relatives visited me.
Questions
Let's consider the hypothesis that facts are stored as sentences, except that we store not words but resolved constituents: objects, actions, relations. We also store information about contexts. We do not need to store references to constituents; we store the constituents themselves, information packages about the respective objects. By adding timestamps we can keep track of what was known about them at any moment in time.
But whatever the internal representations are, we are restricted to language as the interface for information exchange with memory. The basic form of query is the question. Questions are declarative statements in which one of the constituents is unknown (at least one, but we will not complicate matters for now).
How do we answer questions? To start with, any question is asked against some context, so we resolve references to the provided constituents against that context. Then we search for those constituents among the stored ones. Finally, we search among the facts that involve those constituents. If we find the required fact, we extract the missing piece from it; otherwise we "don't know."
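Here is a minimal Python sketch of this query process. The fact format (tuples of already-resolved constituents) and the "?" placeholder convention are invented for illustration.

```python
# Facts as tuples of resolved constituents; a question is a fact
# with one unknown slot, marked "?".
FACTS = [
    ("Stiven", "saw", "car", "yesterday"),
    ("Joan", "thanked", "Susan", "yesterday"),
]

def answer(pattern):
    """Match a question against stored facts and return the missing
    constituent, or "don't know" if no fact matches."""
    for fact in FACTS:
        if len(fact) == len(pattern) and all(
            p == "?" or p == f for p, f in zip(pattern, fact)
        ):
            return fact[pattern.index("?")]
    return "don't know"

print(answer(("Stiven", "saw", "?", "yesterday")))     # 'car'
print(answer(("?", "thanked", "Susan", "yesterday")))  # 'Joan'
print(answer(("Stiven", "ate", "?", "yesterday")))     # "don't know"
```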
I think it is important to mention "good questions": those are questions with well-established constituents. Imagine a prehistoric man asking, "What is a rocket?" and you answering, "It's a projectile that uses reactive propulsion to overcome gravity and reach outer space." The words "projectile", "reactive propulsion", "gravity", and "outer space" do not differentiate any constituents known to the "listener". Therefore, the answer will not sink in. As a house is built brick by brick, knowledge is added on top of existing, known facts. You cannot add the roof before the walls.
Meaning
Are we now ready to discuss what the meaning of "pointing" is? I will leave it as an exercise for the reader. So much for the most widely discussed topic in the philosophy of language!
Conclusion
If by communication we mean the transfer of information, then language alone is a poor tool for it. I claim that the role of language is to point at relevant phenomena: objects, actions, and all the other constituents. One specific phenomenon language can point at is the connection of constituents into one fact.
The word "apple" does not convey all the information about the apple to which a speaker refers. That information is collected by the listener's perception after one's attention has been guided to the apple by the reference. Similar mechanisms engage the listener's memory or imagination.
The way a speaker connects constituents is subject to verification by the listener, where possible, against what one can perceive or check against one's experience. During the second stage of communication, the information lost in linguistic encoding is recovered through the listener's perception and verified.
Communication thus involves the speaker's utterances guiding the listener's attention so as to engage the latter's perception, memory, or imagination to collect or recreate the relevant information.
References
1. Fodor, J. A. (1975). The Language of Thought. Harvard University Press.
2. Fedorenko, E., Piantadosi, S. T., & Gibson, E. A. F. (2024). Language is primarily a tool for communication rather than thought. Nature, 630, 575–586. https://doi.org/10.1038/s41586-024-07522-w
3. Sorensen, R. (2022). Vagueness. In E. N. Zalta & U. Nodelman (Eds.), The Stanford Encyclopedia of Philosophy (Winter 2022 Edition). https://plato.stanford.edu/archives/win2022/entries/vagueness/
4. Fodor, J. A. (1998). Concepts: Where Cognitive Science Went Wrong. Oxford University Press. https://doi.org/10.1093/0198236360.001.0001
5. Borges, J. L. (2002). Funes, the Memorious. In G. Nouzeilles, G. Montaldo, R. Kirk, & O. Starn (Eds.), The Argentina Reader: History, Culture, Politics (pp. 306–312). Duke University Press. https://doi.org/10.1515/9780822384182-045
6. Lakoff, G., & Johnson, M. (1980). Metaphors We Live By. University of Chicago Press.
7. Firth, J. R. (1957). A Synopsis of Linguistic Theory, 1930–55. In Studies in Linguistic Analysis (pp. 1–31). Special Volume of the Philological Society. Blackwell.
8. Levesque, H. J., Davis, E., & Morgenstern, L. (2012). The Winograd Schema Challenge. In Proceedings of the 13th International Conference on Principles of Knowledge Representation and Reasoning (KR 2012) (pp. 552–561).