How do you learn from very little data? Analogy! Analogizers can learn from as little as one example because they never form a model.
Analogical reasoning has a distinguished intellectual pedigree.
- Aristotle: law of similarity: if two things are similar, the thought of one will tend to trigger the thought of the other
- Locke and Hume
- Nietzsche: Truth is a mobile army of metaphors
- William James: this sense of sameness is the very keel and backbone of our thinking
- Contemporary psychologists: human cognition in its entirety is a fabric of analogies
- Teenagers who insert “like” into every sentence they say would probably, like, agree that analogy is important, dude.
Analogy got off to a slow start, and was initially overshadowed by neural networks.
- Nearest-neighbor algorithm
- Analogical reasoning
The nearest-neighbor algorithm is the simplest and fastest learning algorithm. Learning consists of doing exactly nothing, and takes zero time to run. Can’t beat that. But there is a price to pay, and the price comes at test time: every new example has to be compared against the stored ones.
The entire genre of learning that nearest-neighbor is part of is sometimes called “lazy learning”.
NN is able to implicitly form a very intricate border.
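A minimal sketch of nearest-neighbor, to make the "lazy" part concrete. The function name `nearest_neighbor` and the toy data are made up for illustration, and Euclidean distance is an assumed choice of similarity measure.

```python
import math

def nearest_neighbor(train, query):
    """Return the label of the stored example closest to `query`.

    `train` is a list of (point, label) pairs. "Learning" is just keeping
    this list around; all the work happens here, at test time.
    """
    closest_point, closest_label = min(
        train, key=lambda pair: math.dist(pair[0], query)
    )
    return closest_label

# Toy data (invented for this sketch): two classes in the plane.
train = [
    ((0.0, 0.0), "a"),
    ((1.0, 0.0), "a"),
    ((5.0, 5.0), "b"),
    ((6.0, 5.0), "b"),
]

print(nearest_neighbor(train, (0.5, 0.2)))  # a
print(nearest_neighbor(train, (5.2, 4.9)))  # b
```

No border is ever written down. It exists only implicitly, as the set of points that are equally close to examples of different classes.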
As the number of dimensions goes up, things fall apart pretty quickly. In high dimensionality, the notion of similarity itself breaks down.
Consider an orange: a tasty ball of pulp surrounded by a thin shell of skin. Let’s say 90% of the radius of an orange is occupied by pulp, and the remaining 10% by skin. That means about 73% of the volume of the orange is pulp (0.9^3). Now for a hyperorange in 100 dimensions. The pulp has shrunk to 0.9^100 of the hyperorange’s volume, which is less than three thousandths of a percent. The hyperorange is all skin, and you’ll never be done peeling it!
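The hyperorange arithmetic is easy to check. A small sketch, where `pulp_fraction` is a name made up for these notes: volume scales with the radius raised to the number of dimensions, so the pulp's share is just the radius ratio to that power.

```python
def pulp_fraction(dimensions, pulp_radius_ratio=0.9):
    """Share of a ball's volume lying within `pulp_radius_ratio` of its radius."""
    return pulp_radius_ratio ** dimensions

for d in (3, 10, 100):
    print(f"{d:>3} dimensions: pulp is {pulp_fraction(d):.6%} of the volume")
```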
With a high-dimensional normal distribution, you’re more likely to get a sample far from the mean than close to it. A bell curve in hyperspace looks more like a doughnut than a bell.
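The doughnut can be seen by sampling. This is a sketch under assumptions: a standard normal with independent coordinates, and a helper name `distances_from_mean` invented here. For d dimensions the distance from the mean clusters around the square root of d, so in 100 dimensions almost no sample lands near the center.

```python
import math
import random

def distances_from_mean(dimensions, n_samples=1000, seed=0):
    """Distances from the origin of samples from a standard normal in `dimensions` dims."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dimensions)))
        for _ in range(n_samples)
    ]

for d in (1, 10, 100):
    dists = distances_from_mean(d)
    print(
        f"d={d:>3}: closest sample at {min(dists):.2f}, "
        f"average distance {sum(dists) / len(dists):.2f}, "
        f"sqrt(d) = {math.sqrt(d):.2f}"
    )
```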
Decision trees are not immune to the curse of dimensionality. A decision tree can approximate a sphere by the smallest cube it fits inside. Not perfect, but not too bad either: only the corners of the cube get misclassified. But in high dimensions, almost the entire volume of the hypercube lies outside of the hypersphere. For every example you correctly classify as positive, you incorrectly classify many negative ones as positive, causing your accuracy to plummet.
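How bad the cube approximation gets can be computed directly. A sketch using the standard volume formula for a d-dimensional ball of radius r, pi^(d/2) r^d / Gamma(d/2 + 1), with the function name `inscribed_ball_fraction` made up for these notes. It works in logarithms to avoid overflow in high dimensions.

```python
import math

def inscribed_ball_fraction(dimensions):
    """Fraction of a hypercube's volume taken up by the hypersphere inscribed in it."""
    # Unit-radius ball: volume pi^(d/2) / Gamma(d/2 + 1).
    log_ball = (dimensions / 2) * math.log(math.pi) - math.lgamma(dimensions / 2 + 1)
    # Smallest enclosing cube has side 2, so volume 2^d.
    log_cube = dimensions * math.log(2)
    return math.exp(log_ball - log_cube)

for d in (2, 3, 10, 100):
    print(f"d={d:>3}: sphere fills {inscribed_ball_fraction(d):.3e} of the cube")
```

Everything in the cube but outside the sphere is a negative example that gets labeled positive, and that region is nearly the whole cube once d gets large.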
No learner is immune to the curse of dimensionality. It’s the second worst problem in ML, after overfitting.
- Get rid of the irrelevant dimensions.
- Learn attribute weights, shrink the less-relevant ones.
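Both remedies amount to putting a weight on each attribute inside the distance. A sketch with invented data and hand-set weights (a real learner would fit the weights from data, which is not shown here). The names `weighted_distance` and `nearest_label` are hypothetical. A weight of zero is the same as dropping the dimension.

```python
import math

def weighted_distance(a, b, weights):
    """Euclidean distance with a non-negative weight on each attribute."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def nearest_label(train, query, weights):
    """Label of the training example closest to `query` under the weighted distance."""
    return min(train, key=lambda pair: weighted_distance(pair[0], query, weights))[1]

# Invented example: the first attribute decides the class, the second is noise.
train = [((0.0, 9.0), "neg"), ((1.0, 0.0), "pos")]
query = (0.1, 0.5)

print(nearest_label(train, query, weights=(1.0, 1.0)))  # pos: misled by the noisy attribute
print(nearest_label(train, query, weights=(1.0, 0.0)))  # neg: noise attribute switched off
```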
With the advent of deep learning, connectionists have regained the upper hand. Networks with many layers can express many functions more compactly than SVMs, which always have just one layer.
A notable early success of SVM was in text classification.
The fewer support vectors an SVM selects, the better it generalizes.
The most important question in any analogical learner is how to measure similarity. Another is what we can infer about a new object based on the similar ones we’ve found.
Analogizers’ neatest trick is learning across problem domains.
Douglas Hofstadter, cognitive scientist and author of Gödel, Escher, Bach: An Eternal Golden Braid, is probably the world’s best-known analogizer. He argued passionately that all intelligent behavior reduces to analogy.