If ML is a continent divided into the territories of the 5 tribes, then the MA is its capital city, standing on the unique spot where all 5 territories meet.

- The outer and by far widest circle is Optimization Town
- Each house is an algorithm, and they come in all shapes and sizes
- Some are under construction, the locals busy around them
- Some are gleaming new
- Some look old and abandoned

- Higher up the hill lies the Citadel of Evaluation
- Towers of Representation
- Home of the rulers of the city
- Their immutable laws set forth what can and cannot be done not just in the city but throughout the continent

- Atop the central tallest tower flies the flag of the master algorithm
- Red and black
- With a five-pointed star surrounding an inscription that you cannot yet make out

Representation: the formal language in which the learner expresses its models

- Symbolists: logic, rules, decision trees
- Connectionists: neural networks
- Evolutionaries: genetic programs, classifier systems
- Bayesians: graphical models
- Analogizers: specific instances

Evaluation: scoring function that says how good a model is

- Symbolists: accuracy, information gain
- Connectionists: a continuous error measure, e.g. squared error
- Bayesian: posterior probability
- Analogizers: margin (for SVMs)
- Evolutionaries: fitness
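A minimal sketch of two of the evaluation functions listed above (function names are my own): accuracy for symbolist-style classifiers, squared error for connectionists.

```python
# Two evaluation functions: how good is a model's set of predictions?

def accuracy(y_true, y_pred):
    """Fraction of examples the model gets right (symbolist scoring)."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def squared_error(y_true, y_pred):
    """Continuous error measure (connectionist scoring):
    sum of squared differences between truth and prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
```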

Optimization: algorithm that searches for the highest-scoring model and returns it

- Symbolists: inverse deduction
- Connectionists: gradient descent
- Evolutionaries: genetic search, crossover, mutation
- Bayesians: probabilistic inference, MCMC
- Analogizers: SVMs use constrained optimization
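A toy sketch of the connectionists' optimizer, gradient descent: fit a one-parameter model y = w·x by repeatedly stepping against the gradient of the squared error (the evaluation function picks the score, the optimizer searches for the model that maximizes it).

```python
# Gradient descent on sum_i (w*x_i - y_i)^2 for a single weight w.

def gradient_descent(xs, ys, lr=0.01, steps=1000):
    w = 0.0
    for _ in range(steps):
        # derivative of the squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys))
        w -= lr * grad  # step downhill
    return w

# With data generated by y = 3x, the search converges to w close to 3.
w = gradient_descent([1.0, 2.0, 3.0], [3.0, 6.0, 9.0])
```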

Bayesian district:

- Clusters around the Cathedral of Bayes’ Theorem
- MCMC Alley zigzags randomly along the way
- Belief Propagation Street, seems to loop around forever
- Most Likely Avenue, rising majestically toward the Posterior Probability Gate. Rather than average over all models, you can head straight for the most probable one, confident that the resulting predictions will be almost the same
- Let the genetic search pick the model’s structure and gradient descent its parameters

When we learn SVMs, we let margins be violated in order to avoid overfitting, provided each violation pays a penalty.
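The penalty in question is the hinge loss. A sketch (not a full SVM trainer) of the soft-margin objective, with an illustrative constant C trading margin width against the total violation penalty:

```python
# Soft-margin SVM objective: ||w||^2 plus C times the margin violations.

def soft_margin_objective(w, b, X, y, C=1.0):
    """X: list of feature vectors; y: labels in {-1, +1}."""
    norm_sq = sum(wi * wi for wi in w)
    penalty = 0.0
    for x, label in zip(X, y):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        # hinge loss: zero if the point clears the margin,
        # otherwise the size of the violation
        penalty += max(0.0, 1.0 - label * score)
    return norm_sq + C * penalty
```

A point that clears the margin (label·score ≥ 1) contributes nothing; every violation pays in proportion to how far inside the margin it falls.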

Inverse Deduction district:

- A place of broad avenues and ancient stone buildings
- Architecture here is geometric, austere, made of straight lines and right angles
- Start with Conclusions, fill in Premises, get Rules

An SVM is just an MLP with a hidden layer composed of kernels instead of S curves and an output that’s a linear combination instead of another S curve.
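A sketch of that correspondence (function names are illustrative): the hidden layer is one kernel unit per support vector, and the output is a plain weighted sum, no S curve.

```python
import math

def rbf_kernel(a, b, gamma=1.0):
    """One 'hidden unit': similarity between x and a support vector."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def svm_predict(x, support_vectors, weights, bias=0.0):
    """Hidden layer: kernels centered on the support vectors.
    Output: a linear combination of the hidden activations."""
    hidden = [rbf_kernel(x, sv) for sv in support_vectors]
    return sum(w * h for w, h in zip(weights, hidden)) + bias
```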

Tower of Logic

- A set of rules in the center
- Each rule is just a highly stylized neuron
- A set of rules is an MLP with a hidden layer containing one neuron for each rule and an output neuron to form the disjunction of the rules
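A sketch of the stylized-neuron idea: a rule like "IF a AND b THEN fire" is a threshold unit whose threshold equals its number of conditions, and the rule set's output neuron is an OR (threshold 1) over the hidden layer.

```python
# Rules as threshold neurons over 0/1 inputs.

def rule_neuron(inputs, threshold):
    """Fires (1) when at least `threshold` of its 0/1 inputs are on."""
    return 1 if sum(inputs) >= threshold else 0

def rule_set(x, rules):
    """rules: each rule is a list of input positions that must all be on
    (a conjunction). Hidden layer: one neuron per rule.
    Output neuron: the disjunction of the rules."""
    hidden = [rule_neuron([x[i] for i in idxs], len(idxs)) for idxs in rules]
    return rule_neuron(hidden, 1)  # fires if any rule fires
```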

Tower of Genetic Programs:

- Genetic programs are just programs, and programs are just logic constructs
- The sculpture of a genetic program is in the shape of a tree, subroutines branching into more subroutines
- Leaves are just simple rules

- Programs boil down to rules
- If rules can be reduced to neurons, so can programs

Tower of Graphical Models

- A graphical model is a product of factors: conditional probabilities, in the case of bayesian networks
- Non-negative functions of the state, in the case of Markov networks
- “Loggles” replace every function by its logarithm; a product of factors then becomes a sum of terms, just like an SVM, a voting set of rules, or an MLP without the output S curve
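The loggles trick in three lines: taking logs turns the graphical model's product of factors into the same additive shape as the other tribes' models.

```python
import math

# A product of factors (e.g. conditional probabilities)...
factors = [0.5, 0.8, 0.25]

product = 1.0
for f in factors:
    product *= f

# ...becomes a sum of terms after putting on the loggles.
log_sum = sum(math.log(f) for f in factors)

# Exponentiating the sum of logs recovers the original product.
recovered = math.exp(log_sum)
```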

Combining logic and graphical models?

- Graphical models don’t let us represent rules involving more than one object; they also can’t represent arbitrary programs, which pass sets of variables from one subroutine to another
- Logic can easily do both of these things, but it can’t represent uncertainty, ambiguity, or degrees of similarity

The hydra-headed complexity monster pounces on you. You slash desperately at it with the sword of learning.

After an arduous climb, you reach the top. A wedding is in progress. Praedicatus, First Lord of Logic, ruler of the symbolic realm and Protector of the Programs, says to Markovia, Princess of Probability, Empress of Networks: “Let us unite our realms. To my rules thou shalt add weights, begetting a new representation that will spread far across the land.” The princess says, “And we shall call our progeny Markov logic networks.”
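The offspring, in miniature: in a Markov logic network, each rule carries a weight, and a world's probability is proportional to exp of the weighted sum of how many groundings of each rule it satisfies. A tiny sketch with illustrative names, normalizing over an explicit set of worlds:

```python
import math

def world_score(weights, counts):
    """Unnormalized log-probability of a world:
    sum of weight_i * (number of true groundings of rule i)."""
    return sum(w * n for w, n in zip(weights, counts))

def world_probability(weights, all_counts, world_index):
    """Normalize exp(score) over a (tiny) explicit list of worlds."""
    scores = [math.exp(world_score(weights, c)) for c in all_counts]
    return scores[world_index] / sum(scores)
```

With one rule of weight log 3, a world satisfying it is 3 times as probable as one that doesn't; a hard logical rule is the limit of infinite weight, and weight 0 makes the rule irrelevant.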

You go outside to the balcony. You gaze out over the rooftops to the countryside beyond. Forests of servers stretch away in all directions, waiting for the master algorithm. Convoys move along the roads, carrying gold from the data mines. Far to the west, the land gives way to a sea of information, dotted with ships.