Philosophical Misconceptions in Artificial Intelligence
Consider this piece a thought experiment. Let's assume that the whole approach described in this Substack is a valid theory of intelligence. Don't worry - we will reiterate its main components. With that theory in mind, we will take a look at major philosophical approaches to intelligence. Will our theory stand that test?
We will start with definitions.
John McCarthy coined the term artificial intelligence in the 1955 proposal for the Dartmouth Summer Research Project on Artificial Intelligence, which took place in 1956. The proposal [1] states: “The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.”
A definition from the Oxford Dictionary [2]: “The theory and development of computer systems able to perform tasks normally requiring human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.”
The above definition sounds smart and mentions many activities that require intelligence, but is it useful for implementation purposes? Many experts believe in a single algorithm running in a brain (yes, in a mouse brain as well) that facilitates all those and all the other intellectual activities. Can we see that algorithm in that definition?
The above definition is not alone in that respect. Take a look at the definitions collected in [3].
We propose to think about the definition later. First, we will discuss all the other philosophical conceptions about artificial intelligence.
Now let's discuss several citations from [4] about meaning. The first one, "No general coverage is available for the notion of meaning", continues with "We propose to fill this lack by a system approach to meaning generation." In our model, meaning is covered, but most importantly we claim that meaning is not generated. Let's reiterate the first fundamental component of the model.
It is true that the world is about objects and actions, statics and dynamics. But how do we differentiate/compare them? "That tree is higher. That tree grew higher." Note that we compare the height of the tree, not the tree as a whole. We compare objects by the static values of properties. We compare actions by dynamic changes in properties. That is why we consider properties the first fundamental component of intelligence. Not objects or actions.
Properties have ranges. The simplest set of ranges includes "above average", "average", and "below average". Note the use of the average or midrange as a reference point. If necessary, we can fine-tune the ranges. Also, each property has a comparison function, and these functions may differ from property to property. Some ranges allow ordering, some don't, but they all allow differentiation. There are binary properties. Some properties allow multiple changes back and forth; some allow only one change - for example, we cannot unmeet someone, and an egg cannot be unbroken.
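To make this concrete, here is a minimal sketch (in Python) of a property with named ranges and a comparison function. The property name, the normalized 0-to-1 scale, and the three default ranges are illustrative assumptions rather than part of the model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

@dataclass
class Property:
    name: str
    # Ranges map a label to a (low, high) interval on some reference scale.
    ranges: Dict[str, Tuple[float, float]] = field(default_factory=dict)
    # Comparison function; it may differ from property to property.
    compare: Callable[[float, float], int] = lambda a, b: (a > b) - (a < b)

    def range_of(self, value: float) -> Optional[str]:
        """Return the label of the range that contains the value, if any."""
        for label, (low, high) in self.ranges.items():
            if low <= value <= high:
                return label
        return None

# The simplest set of ranges, with the average as the reference point.
height = Property(
    name="height",
    ranges={
        "below average": (0.0, 0.45),
        "average": (0.45, 0.55),
        "above average": (0.55, 1.0),
    },
)

print(height.range_of(0.8))            # "above average"
print(height.compare(0.8, 0.3) > 0)    # True: "That tree is higher."
```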
Now back to meaning. We use meaning in two contexts - actions and texts. An action can be meaningless or a text can be meaningful, for example. We will return to "meaningless actions" later. Let's handle the meaning in the text first. The word "high", according to dictionaries, has multiple meanings. We claim that taken alone it has none. Words are references or variables if you like.
We claim that language serves the purpose of delivering information about the world, real or imaginary, from producers of utterances to consumers. Any information is a code of some sort. Producers take a screenshot of the world, encode it into words, and transfer the words to consumers. Consumers decode the words to arrive at the screenshot. In order to be successful, both parties have to share the same agreement about the code used. In general, this leads to a word's referring ability being fixed to a limited set of concepts from the world. Careful here, a word can refer to some concept from that set. But which one? It is decided at the moment of producing an utterance when a word is entangled with other words, which also get their specific meaning, from their respective sets, in the process.
But any word is still a reference. Consider the following situation. A mother tells her husband in private "When you are back, mention a butterfly in a positive or negative sentence and I will understand whether you bought a present for our daughter's birthday." The father comes home in the evening and says "I saw a big and beautiful butterfly today." The daughter reacts with "I also saw a butterfly today." How will the mother understand those two almost identical messages? That example illustrates the power of words’ referential nature.
What does all of the above have to do with our fundamental properties? The word "high" may "mean" or refer to the "above average" range of "height", "pitch", or "blood sugar level", to name a few properties. When we resolve a sentence with that word semantically and determine which of those it refers to, we understand or decode the meaning.
Note that nouns also refer to properties, though one may call these object categories. We prefer to keep things simple and stick to properties as a single term. Note further that actions also refer to properties - not to their ranges but to their changes. Also, actions have expectations, or restrictions if you will, on subjects and objects, which are categorized by ... properties. As you can expect, verbs may refer to different actions. When we resolve meaning, we determine which action a given verb refers to in this sentence.
Let's consider two more quotes from [4]:
"A meaning is meaningful information that is generated by a system submitted to a constraint when it receives an external information that has a connection with the constraint. The meaning is formed of the connection existing between the received information and the constraint of the system. The function of the meaning is to participate to the determination of an action that will be implemented in order to satisfy the constraint of the system." and "A meaning does not exist by itself but relatively to a system having a constraint to satisfy."
Is it OK to define meaning as meaningful information? Up to you, my dear readers. You have two views on meaning. Which one makes more sense? Which one is easier to implement? As you see, we are collecting pieces for our definition.
It's worth mentioning the role of properties in generalization/specialization and associations. In terms of properties, objects may be viewed as multi-dimensional cubes. Classes of objects are characterized by the allowable ranges of each property the class has. Each instance will then be a point within that cube. A derived or specialized class will be a smaller cube completely contained within the parent class's cube. Note that for some class a property may have the value "any", covering the total range, while in a subclass that property may be restricted to a narrow range. That does not mean the parent class lacks the property.
Note that as specialization may happen within different properties of an object, generalization may also follow different paths. In terms of occupation, some person may be generalized to an employee, but not to a mammal, which is a perfect biological generalization. What we claim is that both processes involve properties and ranges.
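The nested-cube view fits in a few lines of code. In the sketch below, a class is simply a map from property names to allowed ranges, and specialization is containment of one cube in another; the class names and numeric ranges are assumptions made for illustration only.

```python
from typing import Dict, Tuple

ANY = (float("-inf"), float("inf"))  # the "any" value covering the whole range

def is_specialization(child: Dict[str, Tuple[float, float]],
                      parent: Dict[str, Tuple[float, float]]) -> bool:
    """True if the child's cube is completely contained within the parent's cube."""
    for prop, (p_low, p_high) in parent.items():
        c_low, c_high = child.get(prop, ANY)
        if c_low < p_low or c_high > p_high:
            return False
    return True

# Illustrative classes: broad ranges for the parent, narrower ones for the child.
mammal = {"leg count": (0, 4), "body temperature": (35.0, 40.0)}
dog = {"leg count": (4, 4), "body temperature": (37.5, 39.5), "barks": (1, 1)}

print(is_specialization(dog, mammal))   # True: the dog cube sits inside the mammal cube
print(is_specialization(mammal, dog))   # False: generalization goes the other way
```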
Actions are also multi-dimensional, but their multi-dimensionality is different. In addition to the dimensions defined by their own properties, actions have one more layer of multi-dimensionality: the properties affected by each action. A single action may affect multiple properties of different objects; therefore, we cannot use the convenient multi-dimensional cubes to describe this aspect of actions. The action "explosion" demonstrates well that the same action may affect different objects and different properties. This additional layer of multi-dimensionality implies that for actions the possibilities for specialization and generalization are richer than those for objects.
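A rough sketch of that additional layer: besides its own property dimensions, an action carries a list of (object, property, change) effects. The specific effects listed for "explosion" below are assumptions chosen to illustrate the point.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Action:
    name: str
    # The action's own property dimensions.
    properties: Dict[str, str] = field(default_factory=dict)
    # The additional layer: which properties of which objects the action changes, and how.
    effects: List[Tuple[str, str, str]] = field(default_factory=list)

explosion = Action(
    name="explosion",
    properties={"intensity": "above average", "duration": "below average"},
    effects=[
        ("building", "integrity", "decreases"),
        ("window", "wholeness", "broken"),      # a one-way change: it cannot be undone
        ("bystander", "hearing", "decreases"),
    ],
)

# The same action affects different objects and different properties,
# which is why a single cube cannot capture this aspect of actions.
for obj, prop, change in explosion.effects:
    print(f"{explosion.name}: {obj}.{prop} -> {change}")
```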
With respect to multi-dimensionality, we may consider objects and actions as crossroads for associations. Usually, text reports mention only a handful of properties involved in some fact. Knowing what objects and actions were mentioned there, it is possible to use our knowledge about their other dimensions and view that event from a different angle.
Now we will change our angle and consider the symbol grounding problem [5]:
“How can the meanings of the meaningless symbol tokens, manipulated solely on the basis of their (arbitrary) shapes, be grounded in anything but other meaningless symbols?”
We know about the zero semantic commitment condition (Z condition) from [6]. If its authors expect that one day a machine will correctly guess how to name objects in the language of each observer without any prior hints, then we can only wish good luck with the implementation of that machine. We observe that children get lots of references to objects before they can name them properly. We believe that observation is strong enough to discard the Z condition.
Our approach to this problem will be explained shortly when we analyze the reference resolution algorithm. For now, note that the idea of tokens (words) being essentially meaningless is not new.
Consider the following claim from [7]: "The common complaint about traditional AI that its symbols do not represent might seem clearer now. A certain symbol, in order to represent, must either serve a particular function for a person or it must have a particular causal function. Neither of these criteria seem easy to come by in a computing machine." Clearly, the author struggles with the definition of meaning or the process of arriving at it. At the same time, humans have long had a perfect example of referential symbols - money. When you carry cash in your pocket, do you know what it will turn into? It is in the process of purchasing that money, for a short moment, corresponds to the bought item. The cash left in your wallet remains meaningless. Sure, the potential is there. Just like with words.
Now let's say a few words about the Language of Thought Hypothesis [8]: "The Language of Thought Hypothesis (LOTH) postulates that thought and thinking take place in a mental language. This language consists of a system of representations that is physically realized in the brain of thinkers and has a combinatorial syntax (and semantics) such that operations on representations are causally sensitive only to the syntactic properties of representations. According to LOTH, thought is, roughly, the tokening of a representation that has a syntactic (constituent) structure with an appropriate semantics. Thinking thus consists in syntactic operations defined over such representations."
Our idea of properties and ranges, to which references can be made, broadly supports that idea and even clarifies some of its inconsistencies. The use of the word "Language" confuses many experts, who get distracted by the language of our internal dialogue, which is conducted in one of the languages a person is fluent in. In the case of multi-linguals, it gets difficult to decide which language to call "the language of thought".
We suggest considering natural languages as operating at a level above the level of Mentalese as they call it. In our model, all the properties are identified in a unique way, and each range is identified in a unique way even if some ranges partially overlap. These unique identifiers allow us to flexibly differentiate how different languages assign words to multiple possible meanings.
Another critical point concerns the restriction to "syntactic operations" only. We will show later that natural languages are more than just syntax. Semantics also affects the choice of words and syntax; therefore, we should add a bit more complexity. In general, the above hypothesis does not contradict our model.
[9] supports the above with the following claim: "The "symbols" that Newell, Simon and Dreyfus discussed were word-like and high level—symbols that directly correspond with objects in the world". The only problem is that words "directly correspond with objects in the world" but in different sentences, the correspondence may be different.
We are closer to our definition. Soon we will be able to test it. But let's first decide on the test. [9] continues with the following claim: "One criticism of the Turing test is that it only measures the "humanness" of the machine's behavior, rather than the "intelligence" of the behavior."
In [10], Turing wrote: "we cannot so easily convince ourselves of the absence of complete laws of behaviour ... The only way we know of for finding such laws is scientific observation, and we certainly know of no circumstances under which we could say, 'We have searched enough. There are no such laws.'"
We claim that we "have searched enough and found such laws." Natural languages are intelligent enough. A Turing test based on them will be a good test of intelligence because natural languages play by the rules of intelligence.
In [11], the authors write that "aeronautical engineering texts do not define the goal of their field as 'making machines that fly so exactly like pigeons that they can fool other pigeons'". We also refer to the difference between birds and planes (and other flying machinery) but we focus on the function that may be implemented in different ways. With respect to fooling humans, we do believe that if a machine can do that against a well-prepared expert then why not?
So how does intelligence work? Is it like suggested in [12]? - "For REASON, in this sense, is nothing but Reckoning." We agree, but that gives us little progress in terms of our definition or implementation. So we want to be more specific.
Maybe, it is like in [13]? - "What unites them [the cognitive creatures] is that […] they are all computing the same, or some part of the same abstract <<sensory input, prior state>, <motor output, subsequent state>> function." Again, we do not get any hint in terms of how that function is computed. "Some function is somehow computed" - really, we want something more helpful.
What about the suggestion from [14]? - "in computational semantics some variant of first-order logic is generally preferred. This choice is sensible for at least two reasons. First, ... first-order theorem provers ... now offer levels of performance which make them genuinely useful for certain reasoning tasks. Second, ... first-order logic is able to deal (at least to a good approximation) with a wide range of interesting phenomena. In short, first-order logic offers an attractive compromise between the conflicting demands of expressivity and inferential effectiveness." The fact that first-order logic is useless with commands and questions is quite telling. Are we ready to limit ourselves and our artificial intelligence only to a subset of language or more broadly to a subset of intellectual activities? Surely, we want everything everywhere all at once and several Oscars.
We will start the explanation of the algorithm with an obvious statement - anything can be an object with some properties. Actions, modifiers, words, letters, imaginary objects, ideas, states - anything. If we can differentiate something, it can be viewed as an object. Our algorithm keeps all the objects in a set.
Next, we need to mention formulas. We view sentences as formulas with constituents being their components. Those are not mathematical formulas. They are more like physical formulas because constituents have a type (property) and value (range). We use formulas for knowledge representation only after all the references are resolved. We all remember the relationship between distance, velocity, and time. Knowing any two of them, we can calculate the other one. Intelligence doesn't work that way - forget about that aspect of formulas. What you need to remember is that we use sentences/formulas with resolved constituents for knowledge representation. And our intelligent agent will remember them in its memory. Formulas are the second fundamental component of intelligence.
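As a sketch, a formula can be stored as a list of constituents, each carrying a role, a resolved property, and a resolved range. The roles, the sample sentence, and the data structure below are illustrative assumptions; the only point is that memory holds formulas whose references are already resolved.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Constituent:
    role: str     # e.g. "subject", "complement"
    prop: str     # the resolved property, e.g. "height"
    value: str    # the resolved range, e.g. "above average"

@dataclass
class Formula:
    text: str
    constituents: List[Constituent] = field(default_factory=list)

# "That tree is higher." remembered with its references already resolved.
memory: List[Formula] = [
    Formula(
        text="That tree is higher.",
        constituents=[
            Constituent(role="subject", prop="object category", value="tree"),
            Constituent(role="complement", prop="height", value="above average"),
        ],
    ),
]
```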
Intelligence starts any task by asking a question. If you recall, natural languages have a way of asking questions about any constituent, not only syntactically but also semantically. Do you remember the possible meanings of the word "high", which we have mentioned above? Now consider the question "How high is that tree?" Not only are we asking about the complement constituent, but we are specifically interested in the one referring to the "height" property. Questions are the third fundamental component of intelligence.
We have formulas in memory with resolved constituents - that is our set of objects. The question filters out those that lack a complement constituent or whose complement does not refer to the "height" property. All the other sentences with a fitting constituent remain in the set. These are our available candidates. The answer will be selected among them after considering all the restrictions. "That tree" is considered a restriction: it must also be present in the sentence from which we will extract the answer.
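Continuing the Formula sketch above, the question becomes a filter over memory. The helper name "matches_question" and the way the restriction is checked are assumptions introduced only for illustration.

```python
def matches_question(formula: Formula, asked_prop: str, restriction: str) -> bool:
    """Keep formulas with a complement for the asked property that also mention the restriction."""
    has_asked = any(c.role == "complement" and c.prop == asked_prop
                    for c in formula.constituents)
    has_restriction = any(c.value == restriction for c in formula.constituents)
    return has_asked and has_restriction

# "How high is that tree?" asks about the "height" property, restricted to "that tree".
candidates = [f for f in memory if matches_question(f, "height", "tree")]

for f in candidates:
    # Extract the answer from the fitting constituent of the selected candidate.
    answer = next(c.value for c in f.constituents
                  if c.role == "complement" and c.prop == "height")
    print(f"{f.text} -> height: {answer}")
```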
There is a great motto mentioned in [15]: "Learn to use what you have got, and you won't need what you have not." Remember that one. Intelligence does not calculate the solution, it picks the most fitting candidate among the available ones. Consider how the HR department hires a new employee. There is a list of requirements. Do they hire the ideal, "calculated" person? No, they hire the most fitting one among the candidates who applied for the job. Intelligence is the HR department if you want to stick labels.
We will illustrate the algorithm using a simple example - what does "high key" mean? We start with the "key" - a root element of the reference. It may refer to different properties - "locking tool", "pitch", "programming term". Of those three, "high" refers only to "pitch" - bingo. We have "high key" = "pitch above average". Note that we do not proceed with the Cartesian product of options after each additional restriction is applied. The requirement that all the pieces in one reference need to refer to the same object makes the complexity of this algorithm logarithmic.
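The same resolution step can be written as a plain set intersection over the candidate properties of each word; the tiny lexicon below is an assumption standing in for real dictionary data.

```python
# Candidate properties per word; requiring every word in one reference
# to refer to the same thing reduces resolution to set intersection.
lexicon = {
    "key": {"locking tool", "pitch", "programming term"},
    "high": {"height", "pitch", "blood sugar level"},
}

def resolve(words):
    """Intersect the candidate property sets of all words in the reference."""
    candidates = None
    for word in words:
        options = lexicon.get(word, set())
        candidates = options if candidates is None else candidates & options
    return candidates

print(resolve(["high", "key"]))   # {'pitch'}: "high key" = pitch above average
```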
Do we have anything similar in our culture? What about the game "20 questions"? Or Venn diagrams? Or triangulation? Intelligence leaves us hints about how it works everywhere.
How do producers pick properties to include in the reference? Imagine you want to refer to a specific butterfly in a big collection. This example illustrates reference composition. We use those properties that differentiate that butterfly and narrow down the set of available candidates fast. According to [16], "speakers include more information in their referring expressions than is strictly necessary to identify an object." The important point is that the speaker refers to a real object, and all of his excessive references will point to that object; therefore, the set of available candidates will never become empty. That can happen only if the speaker uses contradicting references. An uncooperative counterparty in a conversation may use other properties, which will not narrow down the set, and the process will be slow.
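Here is a greedy sketch of reference composition, assuming a toy butterfly collection: at each step the speaker mentions the property of the target that eliminates the most remaining candidates, and stops when the reference is unique, or when no property narrows the set any further (the uncooperative case).

```python
def compose_reference(target: dict, candidates: list) -> list:
    """Return (property, range) pairs that single out the target among the candidates."""
    remaining = [c for c in candidates if c is not target]
    mentioned = []
    while remaining:
        # Pick the property of the target that eliminates the most remaining candidates.
        prop = max(target, key=lambda p: sum(c.get(p) != target[p] for c in remaining))
        if all(c.get(prop) == target[prop] for c in remaining):
            break  # no property narrows the set any further
        mentioned.append((prop, target[prop]))
        remaining = [c for c in remaining if c.get(prop) == target[prop]]
    return mentioned

# An illustrative collection; the intended butterfly is the first one.
collection = [
    {"wing color": "blue", "size": "above average", "pattern": "spotted"},
    {"wing color": "blue", "size": "below average", "pattern": "striped"},
    {"wing color": "orange", "size": "above average", "pattern": "spotted"},
]

print(compose_reference(collection[0], collection))
# [('wing color', 'blue'), ('size', 'above average')]
```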
Surely, such a matter-of-fact explanation of how intelligence works is too simple: most experts should have come up with it already and even improved on it. Let's go through several publications and try to find such improvements. We will provide only brief comments after each quote.
In [17]:
"what it is to perform the speech act of saying" - we should shift focus from objects and actions to properties
"The sock is not about anything. ... The sentence ‘The Eiffel Tower is in Paris’, on the other hand, is about something — it exhibits what philosophers imaginatively call ‘aboutness’." - the author never played with a chess set where pieces were missing, most likely, a sock would not substitute a queen but it is not so "imaginatively" to miss the referential ability of almost anything touched by intelligence.
"by virtue of its representational properties, it is the kind of thing that can be true or false." - note that shifting focus from the true/false logic to meaning changes a lot, now questions and commands can be easily handled.
"most of the contributions to the philosophy of AI have drawn on work done in the philosophy of mind. It has not been based on work done in the philosophy of language" - we claim that one algorithm can be applied to both mind and language, and more.
"externalist views hold that we represent the way we do in part because of features external to us." - we explained the mechanism behind that
"Some of our linguistic utterances might mean what they mean in part because of the way that we are related to our larger speech community and because of what the words we use mean in that speech community." - we only hope that the birthday present example was more illustrative than this statement.
In [18]:
"It is a crucial and defining feature of our mental states that they have semantic content – that they are meaningful states." - awkward sounding statements are usually a sign of missing a clear picture.
"Our mental states are meaningful by virtue of being about things. In other words, meaningful mental states are representational states – they represent or stand for things." - no, this does not look like an improvement.
"Mental representations are categorial. My mental representation of ‘dog’ picks out all and only dogs." - a basic dictionary search provides multiple possible meanings of the word, "picks out all"?
"The other important feature of mental representations I want to highlight is that they are compositional. Mental representations compose into more complex mental representations. Given, for instance, my possession of mental representations of ‘brown’ and ‘dog’, I need nothing further to compose the more complex mental representation of ‘brown dog’." - there are sets of possible meanings and a matching algorithm behind that "nothing further"
"The crucial – and most difficult – question to answer, however, is how it is that primitive mental representations are conferred with their intentional content. How is it that our atomic mental representations come to be about their objects of representation? In other words, what is the nature of the relation between mental representations and the (categories of) objects they represent?" - we provided the answer, which we considered simple
"On one hand we have theories according to which mental representations are essentially discrete. On the other hand we have theories according to which mental representations are essentially interrelated." - definitely discrete, but not set in stone.
"The mechanism by which symbols are conferred with their content is understood as some kind of direct relation between tokens of the symbol and objects of representation. Crucially, this mechanism is such that the content of a symbol in no way depends on the content of other symbols." - it depends and a lot, without this dependence the mechanism does not work
"The compositionality of mental representation is understood to be simple syntactic concatenation. When symbols compose to give more complex representations, each symbol always brings the same content to the complex in which it participates. In other words, the content of symbols is taken to be contextually insensitive." - it's not "simple syntactic concatenation", it's not "always the same", it's always "contextually sensitive". No wonder, the symbolic AI failed.
"It is a further feature of symbols that their presence is binary – a symbol token is either present or not. If a symbol token is present, it is fully present and if it is absent, it is fully absent." - when the father mentioned a butterfly, was the "butterfly" fully present?
"the way distributed representations compose is contextually modulated. In other words, the content that a particular representation brings to the complex representation in which it participates will vary in a way that is dependent on the other particular representations also participating in the complex." - this is closer, but the author describes neural networks, which are based on statistics of words appearing next to each other. It's hard to accept that as an improvement.
In [19]:
"prototypical effects in categorisation and category representation generally are not only crucial for the empirical study of human cognition, but are also of the utmost importance in representing concepts in artificial systems. Let us first consider human cognition. Under what conditions should we say that somebody knows the concept DOG (or, in other words, that they possess an adequate mental representation of it)? It is not easy to say. However, if a person does not know that, for example, dogs usually bark, that they typically have four legs and that their body is covered with fur, that in most cases they have a tail and that they wag it when they are happy, then we probably should conclude that this person does not grasp the concept DOG." - in our model "dog" will be mentioned in different properties - "animal", "pet", etc. Information general to all dogs will be stored as formulas mentioning "dog", exceptions specific to certain breeds will be mentioned in formulas mentioning those breeds. Breeds will be ranges on the property "dog" - that is how specialization/generalization works, remember?
"According to the prototype view, knowledge about categories is stored in terms of prototypes, where a prototype is a representation of the “best” instance of a category." - no, we store information about a class
"According to the exemplar view, categories are not mentally represented as specific, local structures such as prototypes. Rather, a category is represented as a set of specific exemplars explicitly stored within memory." - this is what we mean by specialization and exceptions
"Theory-theory approaches adopt some form of holistic point of view regarding concepts. According to some versions of theory theories, concepts are analogous to theoretical terms in a scientific theory. For example, the concept CAT is individuated by the role it plays in our mental theory of zoology." - "dog" will refer to different properties, remember? "animal", "mammal", "canid" - properties make that easy
"Another important source of evidence for the exemplar model stems from the study of linearly separable categories" - differentiation based on properties is fundamental to intelligence
In [20]:
"a definition of what a cognitive system is: a system that learns from experience and uses its acquired knowledge (both declarative and practical) in a flexible manner to achieve its own goals." - we stand for shifting focus from autonomous agents to cooperative agents, in a team, it will be easier for them to extermi..., sorry, to benefit all of humanity, of course, accumulate knowledge, learn faster by transferring knowledge to each other.
"Natural cognitive beings constitute a way to deal with an uncertain world." - first, it's just "stuff" all around, then we specialize and learn to fine-tune. Generalization leaves no space for an "uncertain world". Even a small child knows stuff, adults know just a bit more.
"all of these approaches work with abstract data sets, rather than with real environments, and assume a passive view of the system (which is conceived as computational). This seems far from the way natural cognitive systems learn from experience: in an active, situated, way; by exploring the world; and by reconfiguring one’s own skills and capabilities. On the other hand, the standard strategy of “annotated” data sets can be seen as a form of social learning" - "semantical agreement" is important for exchanging information, education is important, exploration is important, relating all the acquired knowledge to oneself (be one bio- or arti-) is important
"The current challenge clearly stems from the classical problem of knowledge representation. Classical AI got stuck with the idea of explicit, formal logic-like, propositional representations, and the conception of reasoning as a kind of theorem-proving by transforming those propositional data structures." - I contribute these ideas and invite others to improve them
"Extracting world regularities and contingencies would be useless unless such knowledge can guide future action in real-time in an uncertain environment." - really, differentiation is more precious to intelligence than "extracting regularities"
Finally, let's have some fun. The following quote is from [21]: "This is a version of the well-known ‘AI curse’: in the formulation known as ‘Larry Tesler’s Theorem’ (ca. 1970): “Intelligence is whatever machines haven’t done yet.”" Needless to say, we hope that our model will overcome that curse. Machines will be able to think and do other stuff like humans. What about the following, mentioned in [22] - "Human creativity generates novel ideas to solve real-world problems." Oops, sorry. Do you remember about available candidates? Intelligence, both human and artificial, is only able to pick the best-fitting candidate among the available ones. What we call creativity is not truly creative. Consider, for example, a bully who learned to throw stones at people. Is it creative if one day he throws an apple? "Human creativity" is a complex result of education, observations, generalization, combination, associations, etc. Have you read "Steal like an artist"?
I tried to find an improvement for my model but instead compromised creativity. It's not that I want to be wrong, but if you can provide examples that will improve my model - be my guest.
References (sorry for the mess)
1. McCarthy, J., Minsky, M. L., Rochester, N., & Shannon, C. E. (2006). A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence, August 31, 1955. AI Magazine, 27(4), 12. https://doi.org/10.1609/aimag.v27i4.1904
2. https://www.oxfordreference.com/display/10.1093/oi/authority.20110803095426960
3. https://digitalwellbeing.org/artificial-intelligence-defined-useful-list-of-popular-definitions-from-business-and-science/
4. Menant, C. (2014). Introduction to a Systemic Theory of Meaning. Bordeaux, France, July 3, 2014.
5. Harnad, S. (1990), ‘The Symbol Grounding Problem’, Physica D 42, pp. 335-346.
6. Taddeo, M., et al. (2005). Solving the Symbol Grounding Problem: A Critical Review of Fifteen Years of Research. Journal of Experimental and Theoretical Artificial Intelligence 17, 419-445.
7. Müller, Vincent C. (2007), ‘Is there a future for AI without representation?’, Minds and Machines, 17 (1), 101-115. https://doi.org/10.1007/s11023-007-9067-1
8. The Language of Thought Hypothesis. Stanford Encyclopedia of Philosophy. First published Thu May 28, 1998; substantive revision Fri Sep 17, 2010
9. Wikipedia. Philosophy of artificial intelligence
10. Turing, Alan (October 1950), "Computing Machinery and Intelligence", Mind, LIX(236): 433–460.
11. Russell, Stuart J.; Norvig, Peter (2003), Artificial Intelligence: A Modern Approach (http://aima.cs.berkeley.edu/) (2nd ed.), Upper Saddle River, New Jersey: Prentice Hall, ISBN 0-13-790395-2
12. Hobbes, Thomas (1991 [1651]), Leviathan, ed. Richard Tuck, Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511808166
13. Churchland, Paul M. (2005), ‘Functionalism at forty: A critical retrospective’, Journal of Philosophy, (102/1), 33-50.
14. Blackburn, Patrick and Bos, Johan. Computational Semantics.
15. Around the World in Eighty Days (1972 TV series)
16. Luccioni, A., Benotti, L. and Landragin, F. Overspecified references: An experiment on lexical acquisition in a virtual environment. Computers in Human Behavior, 49, 94-101.
17. Herman Cappelen and Josh Dever. Making AI Intelligible: Philosophical Foundations. Oxford University Press, 2021.
18. M. Carter, Minds and Computers, Edinburgh University Press, Edinburgh, 2007.
19. Marcello Frixione, Antonio Lieto. Dealing with Concepts: From Cognitive Psychology to Knowledge Representation. Frontiers in Psychological and Behavioral Science Jul. 2013, Vol. 2 Iss. 3, PP. 96-106
20. Gomila, Antoni and Müller, Vincent C. (2012), ‘Challenges for artificial cognitive systems’, Journal of Cognitive Science, 13 (4), 453-469.
21. Müller, Vincent C. (2016), ‘New developments in the philosophy of AI’, in Vincent C. Müller (ed.), Fundamental Issues of Artificial Intelligence (Synthese Library; Berlin: Springer).
22. Georgi V. Georgiev, Danko D. Georgiev. Enhancing user creativity: Semantic measures for idea generation. Knowledge-Based Systems 151 (2018) 1–15