Recognition
Not from the stars do I my judgement pluck;
And yet methinks I have Astronomy,
But not to tell of good or evil luck,
Of plagues, of dearths, or seasons’ quality;
Nor can I fortune to brief minutes tell,
Pointing to each his thunder, rain and wind,
Or say with princes if it shall go well
By oft predict that I in heaven find:
But from thine eyes my knowledge I derive,
And, constant stars, in them I read such art
As truth and beauty shall together thrive,
If from thyself, to store thou wouldst convert;
Or else of thee this I prognosticate:
Thy end is truth’s and beauty’s doom and date.
William Shakespeare, Sonnet XIV
Everything is recognized in comparison. My favorite saying. It sounds simple, but there is so much more to it if one thinks about the details. Heraclitus on one side of the spectrum and Sinéad O’Connor on the other. I propose to add depth to it by considering the hierarchies of categories and comparing only sibling categories. It gives a lot in terms of efficiency and clarity. So, let’s dive deep.
Constellations
Consider reading the tweet “AGI Pipedream Pt. 2: Computable intelligence is not probable” by @Dr_Gingerballs, and pay attention to the section about the failure of current AI with respect to constellations. With his permission, I use here the images he used in that post.
The idea is simple - propose a tiny patch of a starry sky to a generative model and ask it to propose a constellation for those stars. The model fails because its result ignores the locations of the original stars.
Original patch:
A constellation proposed by a model:
Overlay to show the differences:
I intentionally looked at real constellations, and I can tell you that our ancestors had a good imagination. To start with, my initial suspicion was that the number of stars in real constellations should be quite limited, like up to 7. I was wrong about that. There are constellations with 40+ stars, 30+, and 20+.
But another of my suspicions was correct - they are differentiable. Especially those with the same number of stars. Of course, the “similarity” to the object/animal is not the defining feature. Those are more like keys to place the patterns in memory.
Before I finish with constellations, let me address some preliminaries.
Differentiation
Everything is recognized in comparison. What is important here is to compare commensurate items. It’s not that chairs cannot be compared to apples. It’s that height cannot be compared to weight. We compare values of the same property. Well, technically, height and width are different properties, but are they?
Categories are about interchangeability. Interchangeability, in turn, relies on ranges of properties, not point-accurate measurements. The often-praised compression can be achieved by using ranges instead of point-accurate measurements. Calculations are possible over point-accurate measurements, but in real time, calculating “predictions” for all possible “points” is infeasible. Ranges are computationally efficient.
Given that any property is broken into ranges, comparison boils down to projecting objects into ranges of the property and then performing operations with ranges. You have to remember that ranges may have vague boundaries, ranges may overlap, or smaller ranges may be included in bigger ranges. There are also peculiar ranges like “odd” and “even” numbers.
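Here is a minimal sketch of what "projecting into ranges" could look like for a single numeric property. The `Range` class, the `HEIGHT_RANGES` values, and the helper names are made up for illustration; real ranges would have vaguer boundaries than the hard cut-offs used here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """A named closed interval [low, high] on one property."""
    name: str
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high

    def overlaps(self, other: "Range") -> bool:
        return self.low <= other.high and other.low <= self.high

    def includes(self, other: "Range") -> bool:
        return self.low <= other.low and other.high <= self.high


# Hypothetical ranges for the property "height", in centimeters.
HEIGHT_RANGES = [
    Range("short", 0, 165),
    Range("medium", 160, 185),     # overlaps "short": a vague boundary
    Range("tall", 180, 250),
    Range("very tall", 200, 250),  # a smaller range included in "tall"
]


def project(value: float, ranges: list) -> list:
    """Return the names of all ranges the value falls into."""
    return [r.name for r in ranges if r.contains(value)]


def same_range(a: float, b: float, ranges: list) -> bool:
    """Two values are interchangeable here if they share at least one range."""
    return bool(set(project(a, ranges)) & set(project(b, ranges)))


def parity(n: int) -> str:
    """A peculiar 'range' that is not an interval at all."""
    return "even" if n % 2 == 0 else "odd"
```

With these toy numbers, `project(162, HEIGHT_RANGES)` lands in both "short" and "medium", which is the overlap case. Note that the comparison works on range names, not on the raw measurements.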
Recognition or categorization involves several rounds of comparisons. Each involves a single property. Each property has its own semantics and rules for comparison. Some allow numeric assessments, while others don’t. That unique semantics of each property cannot be abstracted away.
Properties
Objects are multidimensional in terms of properties. Their categorization proceeds as shown by the game 20 Questions. At each level, one property is used with its unique semantics of comparisons.
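A rough sketch of the 20 Questions idea: each round asks about exactly one property and keeps only the candidates that agree with the observation. The `CATALOGUE` and its property values are invented for this example, and plain equality stands in for the property-specific comparison rules.

```python
# Hypothetical catalogue: every category is described by coarse property ranges.
CATALOGUE = {
    "apple":  {"shape": "round",     "size": "small", "color": "red"},
    "cherry": {"shape": "round",     "size": "tiny",  "color": "red"},
    "banana": {"shape": "elongated", "size": "small", "color": "yellow"},
    "melon":  {"shape": "round",     "size": "large", "color": "green"},
}


def categorize(observed: dict, order: list) -> list:
    """Narrow down the candidates one property at a time."""
    candidates = list(CATALOGUE)
    for prop in order:
        if prop not in observed:
            continue  # this property could not be observed, skip the question
        candidates = [c for c in candidates
                      if CATALOGUE[c][prop] == observed[prop]]
        if len(candidates) <= 1:
            break  # nothing left to differentiate
    return candidates


print(categorize({"shape": "round", "size": "small"}, ["shape", "size", "color"]))
```

If only the color is observable, `categorize({"color": "red"}, ...)` leaves two candidates, and another property is needed to tell them apart.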
However, there is an important step before comparisons. It’s a projection or figuring out a range. Consider the property “shape.” If an object is behind a fence, how do we recognize the shape? If you think that it applies only to complex properties, consider the property “length.” How do you perceive the length of objects if their “length” is at different angles to your line of sight? Again, some of the “length” may be occluded.
That’s why the “single property” approach is important. The number of options is limited. Given available (observable) pieces of an object, we may “place” them onto different options and see if they match.
Let’s proceed with the shape example. Here are a few principles that come to mind. Prioritize edge pixels over internal ones. Prioritize vertices over edges. If only a few pieces of an edge are available, interpolate. If vertices are not available, figure them out by continuing edges to the intersection. Perception relies on taking multiple “snapshots” of a dynamic environment. Pieces missing in one snapshot may be recovered from a different one.
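One of those principles, recovering a hidden vertex by continuing two visible edge fragments until they intersect, can be sketched with the standard two-line intersection formula. The function name and the sample coordinates are assumptions for illustration only.

```python
def line_intersection(p1, p2, p3, p4):
    """Intersect the infinite line through p1, p2 with the one through p3, p4.

    Each point is an (x, y) pair. Returns None when the lines are parallel.
    """
    x1, y1 = p1
    x2, y2 = p2
    x3, y3 = p3
    x4, y4 = p4
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < 1e-12:
        return None  # parallel edges never meet, so no vertex to recover
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    x = (a * (x3 - x4) - (x1 - x2) * b) / denom
    y = (a * (y3 - y4) - (y1 - y2) * b) / denom
    return (x, y)


# A visible piece of a bottom edge and a visible piece of a right edge;
# the corner itself is assumed to be occluded (say, by a fence post).
bottom_fragment = ((0, 0), (2, 0))
right_fragment = ((4, 1), (4, 3))
print(line_intersection(*bottom_fragment, *right_fragment))
```

The recovered point is where the corner should be, even though no pixel of the corner was observed.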
Those principles may apply to other domains as well.
Relationships
Relations introduce formations with placeholders for objects in relationships with each other. Have you seen flowers planted so that, in due time, they produce words or pictures? Or trees planted to show a smiley face for a short period of time.
Source: This giant smiley face of trees greets drivers in Polk County every fall
Ignore that those are flowers or trees, replace them with “pixels,” and proceed with the derivatives of the above principles.
How do we differentiate football teams? Consider the “uniform color” property. But it provides 5 types of participants - field players (2 kinds), goalkeepers (2 kinds), and referees. To differentiate the latter, check for a whistle, red and yellow cards, and running back and forth. To differentiate goalkeepers, check for gloves and being in the vicinity of the woodwork. Essentially, a team is determined by the woodwork and a corresponding goalkeeper. Then, of those two groups of field players, a corresponding team is differentiated by being, on average, closer to the woodwork and the goalkeeper. Notice how a single property is used each time. The formation “team” is recognized by filling in individual items in respective placeholders.
Respect the “optionality.” Some placeholders may be empty. For example, in football (I hope you understand by now that I refer to its European version) it is possible for a team to have fewer than 10 field players either temporarily or till the end of a match. In a company, employees at any level take vacations or get fired, which leads to vacancies. A paper may have the “Discussion” section, or that section may be omitted or irrelevant.
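The football example, together with optional placeholders, might be sketched like this. The `Person` and `Team` classes, the one-dimensional pitch, and the greedy matching are all simplifying assumptions of mine; the point is only that each step looks at a single property.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Optional


@dataclass
class Person:
    x: float               # position along the pitch: 0 = one goal, 100 = the other
    color: str             # uniform color
    gloves: bool = False
    whistle: bool = False


@dataclass
class Team:
    goalkeeper: Optional[Person] = None                 # placeholder, may be empty
    field_players: list = field(default_factory=list)   # may hold fewer than 10


def recognize_teams(people: list) -> list:
    # Property "whistle": referees drop out.
    rest = [p for p in people if not p.whistle]
    # Property "gloves": goalkeepers vs. field players.
    keepers = [p for p in rest if p.gloves]
    players = [p for p in rest if not p.gloves]
    # Property "uniform color": groups of field players.
    groups = {}
    for p in players:
        groups.setdefault(p.color, []).append(p)
    # Property "average distance to the goalkeeper": fill the placeholders.
    teams = []
    for keeper in keepers:
        team = Team(goalkeeper=keeper)
        if groups:
            closest = min(
                groups,
                key=lambda c: mean(abs(q.x - keeper.x) for q in groups[c]),
            )
            team.field_players = groups.pop(closest)
        teams.append(team)
    return teams
```

A team with only two or three field players is still recognized as a team here, since the formation does not require every placeholder to be filled.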
Transitions
In the case of transitions, the defining features include “affected property,” “time period,” “change,” and most importantly, “how the change occurred over the time period.” The latter may be occluded, making recognition challenging, but not impossible.
Respect pixels corresponding to the same “pocket” in 3D space as representing the same “object.” Pixels moving in unison may also be grouped. Chaotic movements will not match any of the patterns.
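Grouping pixels that move in unison can be sketched as follows, assuming we already have a displacement vector per tracked pixel between two snapshots. The function name, the tolerance, and the sample values are hypothetical, and comparing against the first member of each group is a deliberate simplification.

```python
def group_by_motion(displacements: dict, tol: float = 0.5) -> list:
    """Group pixel ids whose (dx, dy) displacement is roughly the same.

    `displacements` maps a pixel id to its (dx, dy) between two snapshots.
    A pixel joins the first group whose reference motion is within `tol`
    on both axes; otherwise it starts a new group.
    """
    groups = []  # each entry: {"motion": (dx, dy), "members": [ids]}
    for pid, (dx, dy) in displacements.items():
        for g in groups:
            gx, gy = g["motion"]
            if abs(dx - gx) <= tol and abs(dy - gy) <= tol:
                g["members"].append(pid)
                break
        else:
            groups.append({"motion": (dx, dy), "members": [pid]})
    return [g["members"] for g in groups]


moves = {
    "a": (1.0, 0.0), "b": (1.1, 0.1),    # moving right together
    "c": (-3.0, 2.0), "d": (-3.2, 2.1),  # moving up-left together
    "e": (0.0, 5.0),                     # on its own
}
print(group_by_motion(moves))
```

Pixels with chaotic movement would end up in many single-member groups, which is one way of saying they do not match any pattern.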
By placing individual specific transitions in a “dynamic formation” or “transformation,” we may recognize abstract actions like “playing football” or “making tea.” People in uniform running with a ball are not necessarily recognized as playing a match. If only one team is present, it may be a training session or an ad shoot. Boiling water and pouring it into a cup are required not only for making tea. Pay attention to defining properties, one property at a time, be it static or dynamic.
Constellations 2
How do we recognize constellations? One property at a time. How many stars are there? What are the possible constellations for that number? Adjusting for scale and rotation, which ones fit the provided picture?
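Those questions can be turned into a small sketch. The first property is the number of stars; the second is a signature made of sorted pairwise distances divided by the longest one, which stays the same under shifting, rotation, and scaling. The patterns in `CATALOGUE` are toy shapes, not real star coordinates, and this signature is one possible choice rather than a complete solution (for instance, it cannot tell a pattern from its mirror image).

```python
from itertools import combinations
from math import cos, dist, radians, sin

# Hypothetical catalogue of toy patterns (not real constellations).
CATALOGUE = {
    "triangle": [(0, 0), (1, 0), (0, 1)],
    "line":     [(0, 0), (1, 0), (2, 0)],
    "square":   [(0, 0), (1, 0), (1, 1), (0, 1)],
    "kite":     [(0, 0), (1, 1), (0, 3), (-1, 1)],
}


def signature(stars: list) -> list:
    """Sorted pairwise distances, normalized by the longest one."""
    d = sorted(dist(a, b) for a, b in combinations(stars, 2))
    return [x / d[-1] for x in d]


def candidates(patch: list, catalogue: dict, tol: float = 0.05) -> list:
    """Return the names of catalogue patterns that fit the patch."""
    if len(patch) < 2:
        return []  # a single star has no distances to compare
    sig = signature(patch)
    matches = []
    for name, stars in catalogue.items():
        if len(stars) != len(patch):  # property 1: number of stars
            continue
        ref = signature(stars)        # property 2: relative distances
        if all(abs(a - b) <= tol for a, b in zip(sig, ref)):
            matches.append(name)
    return matches


def transform(stars: list, angle_deg: float, scale: float, shift: tuple) -> list:
    """Rotate, scale, and shift a pattern to imitate a different view of it."""
    t = radians(angle_deg)
    return [
        (scale * (x * cos(t) - y * sin(t)) + shift[0],
         scale * (x * sin(t) + y * cos(t)) + shift[1])
        for x, y in stars
    ]


patch = transform(CATALOGUE["kite"], angle_deg=40, scale=3.0, shift=(5, -2))
print(candidates(patch, CATALOGUE))
```

When two patterns with the same number of stars have close signatures, more than one name comes back, and that is where additional properties are needed to make sure.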
Other possible properties (brightness and colors of stars, distances between them, other (smaller) stars, etc.) can help to make sure. This is important. An object is a source of abundant information. Any representation relies on dimensionality reduction. To enable reliable differentiation, add more properties to the process. Overspecification does not hurt.
***
Think about “déjà vu,” how we sometimes recognize the current situation in its dynamics as “already experienced.” Our memory also relies on dimensionality reduction and reconstructs the missing pieces of past events based on the pieces stored. Reconstruction is flexible, and events unfolding before our eyes may heavily affect that process. Instead of thinking about mysterious stuff, don’t be so serious about the reliability of your memory.
Which reminds me to reconsider some stuff. Given some progress in my understanding of intelligence, I want to write a paper with all those ideas in one place. It will take some time. It is possible that I will take a pause here, but if any idea comes up, I will definitely post about it. So if you notice a gap in my writing and would like me to write about it, just let me know.







