# Musing on Intelligence – Part 10: Generalization, maps and schemas


If you have been following this series, we have covered some important topics related to artificial intelligence. The main premise of this series is that an AI implementation should combine knowledge from psychology, biology, brain study and computer science to arrive at a holistic solution. In this post I will cover the concepts of generalization, maps and schemas.

## What is generalization?

One of the hallmarks of intelligence is the ability to generalize.  A simple example of generalization is interpolation.  For instance, given the `(x, y)` data pairs,

```
(1, 2), (3, 6), (8, 16), ...
```

What is `y` in `(2, y)`? Most will quickly recognize a linear relationship and answer `y = 4`. In this scenario, `x` is the stimulus, `y` is the output, and the function mapping `x→y` is unspecified. Interpolation is a form of generalization: it can generate an approximate answer for a stimulus that has never been seen before. While the mathematical construct of interpolation may be obvious to many, what is the neural construct underlying the human ability to generalize?
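To make the interpolation concrete, here is a minimal sketch that fits a straight line to the three pairs above and evaluates it at the unseen stimulus `2`. The least-squares fit and the helper name `fit_line` are my own choices for illustration; nothing in the argument depends on this particular method.

```python
# The (x, y) pairs given in the text.
samples = [(1, 2), (3, 6), (8, 16)]

def fit_line(points):
    """Least-squares fit of y = slope * x + intercept (hypothetical helper)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(samples)

# Generalize to a stimulus that was never in the training pairs.
prediction = slope * 2 + intercept
print(prediction)  # 4.0
```

The fitted line is `y = 2x`, so the unseen stimulus `2` maps to `4` even though that pair was never provided.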

## Not every decision is black and white

The most elementary information in a digital system is a “bit”, encoding true or false (or 1 and 0), the simplest decision boundary; but not every decision is “black and white”. Some decision boundaries are complex; some are not even linearly separable, as illustrated by XOR, a mapping that cannot be realized by a single linear decision boundary. Adding a layer, which in effect superimposes a second linear boundary, makes the XOR mapping possible. Please see this nice article for a detailed explanation.
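As a small illustration of the two-boundary idea, the sketch below wires up a two-layer threshold network by hand. The weights and thresholds are hand-picked assumptions rather than learned values: one hidden unit draws the boundary `a + b = 0.5` (OR), the other draws `a + b = 1.5` (AND), and the output fires only between the two lines.

```python
def step(value):
    """Threshold unit: fires (1) when its net input is positive."""
    return 1 if value > 0 else 0

def xor_two_layer(a, b):
    # Hidden unit 1: linear boundary a + b = 0.5 (behaves like OR).
    h_or = step(a + b - 0.5)
    # Hidden unit 2: linear boundary a + b = 1.5 (behaves like AND).
    h_and = step(a + b - 1.5)
    # Output unit: fires when OR is on and AND is off.
    return step(h_or - h_and - 0.5)

truth_table = {(a, b): xor_two_layer(a, b) for a in (0, 1) for b in (0, 1)}
for inputs, output in truth_table.items():
    print(inputs, "->", output)
```

Neither hidden unit alone separates the XOR cases; it is the combination of the two linear boundaries in the second layer that does.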

The XOR example illustrates that decision boundaries can be complex, yet they are still “crisp”. Human experience suggests that not every decision is so “black-and-white”. Take feelings and intuition, for example: both are subjective and difficult to quantify. Attempts to assess them often rely on ratings (e.g., 5 stars) or scores (e.g., 1-10). Whether stars or scales, the boundaries between the ratings are fuzzy. Fuzzy logic, conceived by Lotfi Zadeh, is a relatively successful decision model that blurs the decision boundaries by treating an observation or measurement not as a single data point, but as a distribution. A decision is then made based on a summation of distributed lookups. The Kalman filter is another example of optimal decision-making based on multiple observations with different uncertainties, or probability distributions.

Figure 1 – Fuzzy representation of temperature.
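A rough sketch of the fuzzy temperature idea follows. The three sets, their breakpoints in degrees Celsius, and the fan-speed lookup are all hypothetical values chosen for illustration, and the “summation of distributed lookup” is sketched here as a membership-weighted average, which is only one of several defuzzification schemes.

```python
def triangle(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), 1 at the peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical fuzzy sets over temperature (degrees Celsius).
SETS = {"cold": (-10, 5, 20), "warm": (10, 20, 30), "hot": (20, 35, 50)}
# Hypothetical lookup: the fan speed (percent) each label recommends.
FAN_SPEED = {"cold": 0.0, "warm": 50.0, "hot": 100.0}

def fuzzify(temp):
    """Spread one crisp reading over all overlapping sets."""
    return {label: triangle(temp, *abc) for label, abc in SETS.items()}

def fan_speed(temp):
    """Defuzzify: membership-weighted average of the per-label lookups."""
    memberships = fuzzify(temp)
    total = sum(memberships.values())
    if total == 0.0:
        raise ValueError("temperature falls outside every fuzzy set")
    weighted = sum(m * FAN_SPEED[label] for label, m in memberships.items())
    return weighted / total

print(fuzzify(25))
print(fan_speed(25))
```

A reading of 25 degrees is partly “warm” and partly “hot”, so the output lands between the two recommendations instead of jumping at a crisp threshold.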

Neural networks, with their massively parallel interactions, may also exhibit properties of distribution. For example, when a “point” stimulus is applied, a distribution of neurons (a receptive field) is activated, a form of “fuzzification”; and knowledge is recalled when a distribution of neurons is activated to recall a specific value, a form of “defuzzification”. Encoding information over a distribution has fault-tolerance benefits, much like how RAID storage works. When information is stored in a distributed fashion, it is more robust to noise and corruption. Distributed encoding of inputs forms internal maps, which can be layered with increasing levels of abstraction.

Figure 2 – Receptive field in visual cortex is a form of layered, distributed encoding.
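The fault-tolerance claim can be sketched with a toy population code. Everything here is an assumption made for illustration: eleven neurons with preferred values 0 through 10, Gaussian tuning with a width of 1.5, and decoding by an activity-weighted average of the preferred values.

```python
import math

PREFERRED = list(range(11))  # hypothetical preferred value of each neuron
SIGMA = 1.5                  # hypothetical tuning width

def encode(value):
    """'Fuzzification': one point stimulus activates a whole neighborhood."""
    return [math.exp(-((value - p) ** 2) / (2 * SIGMA ** 2)) for p in PREFERRED]

def decode(activities):
    """'Defuzzification': activity-weighted average of preferred values."""
    total = sum(activities)
    return sum(a * p for a, p in zip(activities, PREFERRED)) / total

activities = encode(5.0)
intact = decode(activities)

# Simulate corruption: silence one strongly active neuron.
damaged_activities = list(activities)
damaged_activities[4] = 0.0
damaged = decode(damaged_activities)

print(intact, damaged)
```

With the neuron next to the peak silenced, the decoded value drifts a little but stays close to the stimulus. In a one-neuron-per-value code, losing the single responsible neuron would lose the value entirely.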

## Learning maps

Internal maps in neural systems map one space to another. For instance, grasping objects involves mapping visual space to motor space. Jean Piaget proposed that children learn eye-hand coordination through repeated self-exploration, called circular reaction. In this process, the child randomly “babbles” the hands while the eyes track them. The random motor commands, together with the visual cues, form a self-consistent training set for visuomotor map learning.

Figure 3 – Circular learning of eye-hand mapping in a child.

Maps can interconnect, forming hierarchical associations capable of high-level abstraction, even to the point of symbolic thought. Piaget theorized from observation that infants pass through several stages of sensorimotor development, from reflexes to simple association, to complex sensorimotor association involving external objects and goals. These stages represent rising levels of interconnected maps. These maps, when learned, form an internal model of the external world. Having an internal model of the external world is necessary to achieve object permanence.

The concept of learning through circular reaction is actually deployed in the calibration of eye-hand robotic systems; however, these implementations rely on having a priori kinematics models of the cameras and robots tied back to a common reference frame, and the purpose of calibration is to extract the parameters of those kinematics models. In contrast, a neuromorphic implementation of circular-reaction learning of eye-hand coordination does not require kinematics models. Instead, distributed maps are used to encode the visuomotor relationship.

The circular-reaction learning concept, as applied to robot eye-hand coordination, is illustrated in Figure 4. During learning, the robot, through random motor commands, positions the target in front of the cameras while the motor map records the joint positions and the cameras capture the scene. The cognitive network identifies and locates the salient object in the image, extracts the object’s visual “coordinates”, and represents them in the visual map in a distributed fashion. The object’s visual coordinates, together with the corresponding motor map, train the network to learn the sensorimotor transform. During recall, the cognitive network identifies the salient target in the camera images, extracts its visual coordinates and represents them in a distributed fashion within the visual map. The visual map then recalls the corresponding motor commands, through the sensorimotor network, to cause the robot to reach for the target.

Figure 4 – Learning visually guided arm movement through circular reaction.

In this scenario, the ability to generalize is critical: unless every visual coordinate and its corresponding motor command has been learned, not every target position can recall a motor command. With generalization, approximate motor commands can be recalled. A target’s visual coordinate can be generalized by activating not only the neuron associated with that coordinate, but also the neighbors around it, much like a receptive field. The distribution of receptive field activity could be any radially decaying function, such as a Gaussian, centered on the neuron associated with the target’s visual coordinates.
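The learn-then-recall loop can be sketched with a toy planar two-joint arm. This is a simplified stand-in, not the neuromorphic implementation described above: the link lengths, joint ranges, sample count and Gaussian width are all hypothetical, and `observe_hand` merely plays the role of the camera. The learner never inspects that function's formula; it only stores (visual coordinate, joint angle) pairs from random babbling and recalls a Gaussian-weighted blend of them.

```python
import math
import random

LINK_1, LINK_2 = 1.0, 1.0  # hypothetical link lengths

def observe_hand(shoulder, elbow):
    """Stand-in for the camera: where the hand is seen for given joint angles."""
    x = LINK_1 * math.cos(shoulder) + LINK_2 * math.cos(shoulder + elbow)
    y = LINK_1 * math.sin(shoulder) + LINK_2 * math.sin(shoulder + elbow)
    return x, y

def babble(n_samples, rng):
    """Circular reaction: random motor commands paired with what is seen."""
    samples = []
    for _ in range(n_samples):
        shoulder = rng.uniform(0.0, math.pi / 2)
        # Elbow kept on one side so each hand position has a single posture.
        elbow = rng.uniform(0.2, math.pi - 0.2)
        samples.append((observe_hand(shoulder, elbow), (shoulder, elbow)))
    return samples

def recall(samples, target, sigma=0.1):
    """Gaussian receptive field around the target blends nearby memories."""
    total = shoulder_sum = elbow_sum = 0.0
    for (x, y), (shoulder, elbow) in samples:
        dist_sq = (x - target[0]) ** 2 + (y - target[1]) ** 2
        weight = math.exp(-dist_sq / (2 * sigma ** 2))
        total += weight
        shoulder_sum += weight * shoulder
        elbow_sum += weight * elbow
    if total == 0.0:
        raise ValueError("no learned sample is close enough to the target")
    return shoulder_sum / total, elbow_sum / total

rng = random.Random(0)             # fixed seed so the run is repeatable
samples = babble(2000, rng)
target = observe_hand(0.8, 1.5)    # a position not deliberately trained on
joints = recall(samples, target)
reached = observe_hand(*joints)
error = math.hypot(reached[0] - target[0], reached[1] - target[1])
print(joints, error)
```

The recalled joint angles are approximate, which is the point: the target was never stored exactly, yet the neighborhood of learned samples around it is enough to produce a usable reach. A wider `sigma` generalizes further at the cost of precision.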

## Schema as association of knowledge

Piaget went on to propose “schema” as a unit of knowledge.   He defined it as:

“a cohesive, repeatable action sequence possessing component actions that are tightly interconnected and governed by a core meaning.”

This carefully worded definition suggests that a schema has the following properties:

• Physical structure
• Component actions
• Sequences
• Core governance

One can infer that a schema must have an associated physical structure in order to exhibit repeatable action sequences.  Actions having components means that the underlying structure has parts and perhaps hierarchy, so that the parts operate in sequences to achieve the desired outcome.  Cohesion and core governance mean the structure and its components are acting as a whole; and the act is sustained until the schema’s intended purpose is either achieved or abandoned.

Imagine there are different pools of neurons, each encoding and therefore representing a different object, say table, chair, fork, book and computer. When table, chair and fork are observed together, the associated pools activate, and their collective activation recalls the corresponding abstraction. For the rest of this discussion, consider Figure 5. There are three higher-level concepts, library, diner and office, which I will treat as schemas. Library is associated with table, chair and books; diner is associated with table, chair and fork; office is associated with table, chair, book and computer. The idea is that a schema can be represented by layered networks of increasing abstraction, and the association can be bidirectional: the set (table, chair, books) can activate library, and library can elicit the set (table, chair, books).

Figure 5 – schema example.

That means, if you see tables, chairs and books, you are likely in a library or an office. If you also see computers sitting around, you may conclude that you are in an office. Working this backwards: if you are told that you are in a library, then you expect to see tables, chairs and books. But if you see tables, chairs and forks instead, you will find the presence of forks odd, and reconsider that you may be in a dining facility of some kind. Now, obviously there are many more subtleties that help distinguish office from library from diner, and indeed these subtleties are what we continuously search for when we scan a scene, testing different hypotheses until there is sufficiently strong agreement between expectation and observation; in other words, until resonance has occurred.
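The bidirectional association can be sketched with plain sets. The three schemas and their members come from the example above; the scoring rule (overlap divided by union of observed and expected objects) and the function names are my own assumptions, standing in for whatever the layered network would actually compute.

```python
# Schema-to-object associations taken from the library/diner/office example.
SCHEMAS = {
    "library": {"table", "chair", "book"},
    "diner": {"table", "chair", "fork"},
    "office": {"table", "chair", "book", "computer"},
}

def match_score(observed, expected):
    """Agreement between observation and expectation (assumed: overlap / union)."""
    return len(observed & expected) / len(observed | expected)

def rank_schemas(observed):
    """Bottom-up: observed objects activate candidate schemas."""
    scores = {name: match_score(observed, parts) for name, parts in SCHEMAS.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

def expected_objects(schema):
    """Top-down: an active schema elicits the objects it expects."""
    return SCHEMAS[schema]

def surprises(schema, observed):
    """Observed objects that the assumed schema does not predict."""
    return observed - SCHEMAS[schema]

seen = {"table", "chair", "book"}
print(rank_schemas(seen))                 # library ranks first, office second
print(rank_schemas(seen | {"computer"}))  # office now ranks first
print(surprises("library", {"table", "chair", "fork"}))  # {'fork'}
```

The odd fork is what triggers a re-ranking: scoring the same observation against every schema puts diner on top, which is a crude version of testing hypotheses until expectation and observation agree.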

## Key takeaways

In this blog post I explained the importance of generalization and discussed some known mechanisms that can implement it, such as fuzzy logic, the Kalman filter and receptive fields. The key is spreading each learning point as a distribution, so that not only the exact point is learned, but also the neighborhood around it. The distributed representations are layered maps that either model the external world or provide intermodal mapping. The layers provide increasing levels of abstraction. The concept of schema also fits well under this framework, as each schema embodies a combination of sensory and motor components that work as a whole to represent or achieve a specific purpose.