Hebb’s rule is the cornerstone of connectionism. It says knowledge is stored in the connections between neurons.
Donald Hebb stated in his 1949 book The Organization of Behavior: “When an axon of cell A is near enough cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one the cells firing B, is increased.” It’s often paraphrased as “neurons that fire together wire together.”
In symbolist learning, there is a one-to-one correspondence between symbols and the concepts they represent. In contrast, connectionist representations are distributed: each concept is represented by many neurons, and each neuron participates in representing many different concepts.
Symbolist learning is sequential: figure out one step at a time what new rules are needed to arrive at the desired conclusion from the premises. Connectionist learning, in contrast, is parallel: many small adjustments to the connections happen at once across the network.
The first formal model of a neuron was proposed by Warren McCulloch and Walter Pitts in 1943.
Perceptrons were invented in the late 1950s by Frank Rosenblatt, a Cornell psychologist.
The perceptron is like a tiny parliament where the majority wins. It's not altogether democratic, though, because in general not everyone has an equal vote. A neural network is more like a social network, where a few close friends count for more than thousands of Facebook ones, and it's the friends you trust most that influence you the most.
Perceptrons can only learn linear boundaries; in particular, they can't learn XOR.
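A quick way to see both points, the weighted vote and the XOR failure, is a bare-bones perceptron in code. This is an illustrative sketch, not anything from the book; the training loop is the classic perceptron learning rule.

```python
# Minimal perceptron: a weighted vote with a threshold, trained by the
# classic perceptron learning rule. (Illustrative sketch only.)

def predict(weights, bias, x):
    # Fire (output 1) if the weighted vote clears the threshold.
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def train(examples, epochs=20, lr=0.1):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in examples:
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

w, b = train(AND)
print([predict(w, b, x) for x, _ in AND])  # learns AND: [0, 0, 0, 1]

w, b = train(XOR)
# No single line separates XOR's classes, so no weights can ever get all four right.
print([predict(w, b, x) for x, _ in XOR])
```

AND is linearly separable, so the rule converges; XOR is not, so the weights cycle forever without ever matching all four cases.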
If the history of ML were a Hollywood movie, the villain would be Marvin Minsky. He’s the evil queen who gives Snow White a poisoned apple.
In practice, learning in Boltzmann machines was very slow and painful.
The logistic, sigmoid, or S curve is the most important curve in the world. It is the shape of phase transitions of all kinds: ice melting, water evaporating, the spread of rumors and epidemics, revolutions, the fall of empires, and so on.
S curves are a nice halfway house between the dumbness of linear functions and the hardness of step functions.
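The "halfway house" is easy to see numerically: near zero the sigmoid behaves almost like a (scaled) linear function, while far from zero it flattens out into what is effectively a step. A small sketch (my own, for illustration):

```python
import math

def step(x):
    # Hard threshold: all or nothing.
    return 1.0 if x > 0 else 0.0

def sigmoid(x):
    # S curve: roughly linear near 0, saturating toward 0 and 1 at the extremes.
    return 1.0 / (1.0 + math.exp(-x))

for x in (-6, -1, 0, 1, 6):
    print(f"x={x:>3}  step={step(x)}  sigmoid={sigmoid(x):.4f}")
```

At x = ±6 the sigmoid is already within 1% of the step function's 0 and 1, but unlike the step it is smooth everywhere, which is exactly what gradient-based learning needs.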
The universe is a vast symphony of phase transitions, from the cosmic to the microscopic, from the mundane to the life changing.
The global minimum is hidden somewhere in the unfathomable vastness of hyperspace.
Most of the time a local minimum is fine. It may even be preferable, because it is less likely to have overfit the data than the global one.
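The idea that gradient descent just slides into whichever valley is downhill from where it starts can be shown on a one-dimensional "bumpy" function. This toy example (my own choice of function, f(x) = cos(x) + 0.1x, which has a local minimum in every period) is purely illustrative:

```python
import math

def f(x):
    # A bumpy landscape: a local minimum in every period of the cosine.
    return math.cos(x) + 0.1 * x

def grad(x):
    return -math.sin(x) + 0.1

def descend(x, lr=0.1, steps=500):
    # Plain gradient descent: slide downhill from wherever we start.
    for _ in range(steps):
        x -= lr * grad(x)
    return x

a = descend(2.0)   # settles into one valley
b = descend(8.0)   # a different start settles into a different valley
print(a, f(a))
print(b, f(b))
```

Both runs stop at points where the gradient vanishes (sin x = 0.1), but they are different minima, one period of the cosine apart, and neither search ever sees the deeper valleys elsewhere in the landscape.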
Backprop was invented in 1986 by David Rumelhart, with the help of Geoff Hinton and Ronald Williams; as it turns out, it was invented more than once. Backprop can learn XOR.
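Here is a bare-bones sketch of backprop learning XOR with one hidden layer of two sigmoid units, written from scratch for illustration (the initial weights are arbitrary small values I chose to break the symmetry between the hidden units; from some initializations training can still get stuck in a local minimum):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]

# Small deterministic weights that break the symmetry between hidden units.
w1 = [[0.5, -0.4], [0.9, 0.7]]   # w1[j][i]: input i -> hidden unit j
b1 = [0.0, 0.0]
w2 = [0.3, -0.8]                 # hidden unit j -> output
b2 = 0.0

def forward(x):
    h = [sigmoid(sum(w1[j][i] * x[i] for i in range(2)) + b1[j]) for j in range(2)]
    y = sigmoid(sum(w2[j] * h[j] for j in range(2)) + b2)
    return h, y

def loss():
    return sum((forward(x)[1] - t) ** 2 for x, t in XOR) / len(XOR)

lr = 0.5
before = loss()
for _ in range(5000):
    for x, t in XOR:
        h, y = forward(x)
        # Output error signal, then propagate it backward via the chain rule.
        dy = (y - t) * y * (1 - y)
        dh = [dy * w2[j] * h[j] * (1 - h[j]) for j in range(2)]
        for j in range(2):
            w2[j] -= lr * dy * h[j]
            for i in range(2):
                w1[j][i] -= lr * dh[j] * x[i]
            b1[j] -= lr * dh[j]
        b2 -= lr * dy
print(before, loss())  # squared error before vs. after training
```

The key move, impossible for a single perceptron, is that the error at the output is pushed backward through the hidden layer, so each hidden unit learns its own piece of the nonlinear boundary.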
Linear models are blind to phase transitions; neural networks soak them up like a sponge.
Symbolists point to a long list of things that humans can do but neural networks can't. Take commonsense reasoning.
Neural networks are not compositional, and compositionality is a big part of human cognition. Humans, and symbolic models like sets of rules and decision trees, can explain their reasoning, while neural networks are big piles of numbers that no one can understand.