Rationalists:

- Believe that the senses deceive and that logical reasoning is the only sure path to knowledge
- Like to plan everything in advance before making the first move
- Pundits, lawyers, mathematicians, theorists, and knowledge engineers in CS
- Plato was an early rationalist; later came Descartes, Spinoza, and Leibniz

Empiricists:

- Believe that all reasoning is fallible and that knowledge must come from observation and experimentation
- Like to try things and see how they turn out
- Journalists, doctors, scientists, hackers, and machine learners in CS
- Aristotle was an early empiricist; later came Locke, Berkeley, and Hume
David Hume was the greatest of the empiricists. Thinkers like Adam Smith and Charles Darwin count him among their key influences.
ML is at heart a kind of alchemy, transmuting data into knowledge with the aid of a philosopher’s stone.
The no free lunch theorem sets a limit on how good a learner can be: averaged over all possible worlds, no learner can do better than random guessing. Pairing each world with its antiworld, your learner is, on average, equivalent to flipping coins.
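A minimal sketch of the "average over all worlds" argument (toy setup, entirely my own construction): treat every possible labeling of four inputs as a world, train any fixed learner on three of them, and score it on the fourth. Whatever rule the learner uses, the average comes out to coin flipping.

```python
from itertools import product

inputs = list(product([0, 1], repeat=2))   # the four possible 2-bit inputs
train, held_out = inputs[:3], inputs[3]

def learner(train_labels):
    # any fixed rule works here; this one predicts the majority training label
    return int(sum(train_labels) * 2 > len(train_labels))

worlds = list(product([0, 1], repeat=4))   # all 16 labelings = possible worlds
correct = 0
for world in worlds:
    labels = dict(zip(inputs, world))
    prediction = learner([labels[x] for x in train])
    correct += int(prediction == labels[held_out])

print(correct / len(worlds))  # 0.5: averaged over all worlds, coin flipping
```

Each world that the learner gets right is paired with an antiworld (same training labels, opposite held-out label) that it gets wrong, so the average accuracy is exactly 0.5.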
In ML, preconceived notions are indispensable; you can’t learn without them.
Errors are the rule, not the exception.
The power of rule sets is a double-edged sword.
Overfitting is the central problem in ML. It happens when you have too many hypotheses and not enough data to tell them apart.
Learning is a race between the amount of data you have and the number of hypotheses you consider.
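A toy illustration of that race (the setup and numbers are my own, not from the text): fit a small hypothesis class and a large one to the same noisy sample. The flexible model wins the race on the training data but loses on unseen data.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy 1-D regression: a quadratic truth plus noise (illustrative assumption)
x_train = np.linspace(0, 1, 12)
x_test = np.linspace(0.02, 0.98, 50)
truth = lambda x: 1 + 2 * x - 3 * x ** 2
y_train = truth(x_train) + rng.normal(0, 0.3, x_train.size)
y_test = truth(x_test)

def errors(degree):
    # fit a polynomial of the given degree; return (train MSE, test MSE)
    coef = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coef, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coef, x_test) - y_test) ** 2)
    return train_mse, test_mse

tr2, te2 = errors(2)   # few hypotheses: degree-2 polynomials
tr9, te9 = errors(9)   # many hypotheses: degree-9 polynomials
# tr9 < tr2 (the flexible model memorizes the noise), but te9 > te2
```

With too many hypotheses for 12 data points, the degree-9 fit drives training error toward zero by fitting the noise, which is exactly what shows up as inflated test error.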
Even test-set accuracy is not foolproof.
Prefer simpler hypotheses.
Data mining means “torturing the data until it confesses”.
Occam’s razor: entities should not be multiplied beyond necessity. Just choose the simplest theory that fits the data. Simple theories are preferable because they incur a lower cognitive cost (for us) and a lower computational cost (for our algorithms), not because we necessarily expect them to be more accurate.
A clock that’s always an hour late has high bias but low variance.
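The clock analogy can be made numeric. In this sketch (numbers are my own, purely illustrative), one clock is always exactly an hour late while the other is right on average but jittery:

```python
import numpy as np

rng = np.random.default_rng(1)
true_time = 12.0                               # the quantity being estimated

# clock A is always exactly one hour late: high bias, zero variance
clock_a = np.full(1000, true_time - 1.0)
# clock B is right on average but jittery: negligible bias, high variance
clock_b = true_time + rng.normal(0.0, 1.0, 1000)

bias_a, var_a = clock_a.mean() - true_time, clock_a.var()
bias_b, var_b = clock_b.mean() - true_time, clock_b.var()
print(bias_a, var_a)   # -1.0 0.0
```

If you only need to measure intervals, clock A is actually the more useful instrument, which is why a consistently biased learner can still beat an erratic unbiased one.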
Induction is the inverse of deduction.
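A toy sketch of what "inverse" means here (the representation and helper names are my own invention): deduction runs a rule forward from a premise to a conclusion; induction starts from the premise and the conclusion and fills in the missing general rule.

```python
# facts are (entity, property) pairs, e.g. ("Socrates", "human")
def induce_rule(premise, conclusion):
    # given two facts about the same entity, propose the rule that
    # connects them: "every P is Q" (a leap of generalization)
    entity, p = premise
    entity2, q = conclusion
    assert entity == entity2, "facts must concern the same entity"
    return (p, q)

def deduce(fact, rule):
    # run the rule forward: if the fact matches the antecedent,
    # conclude the consequent for the same entity
    entity, p = fact
    return (entity, rule[1]) if p == rule[0] else None

rule = induce_rule(("Socrates", "human"), ("Socrates", "mortal"))
# rule == ("human", "mortal"), read as "if X is human then X is mortal"
print(deduce(("Plato", "human"), rule))  # ('Plato', 'mortal')
```

The induced rule generalizes from a single example, which is precisely where the power (and the riskiness) of induction lives.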
A cell is like a tiny computer, and DNA is the program running on it.
One limitation of inverse deduction is that it’s very computationally intensive, which makes it hard to scale to massive data sets. For these, the symbolist algorithm of choice is decision tree induction.
A decision tree can be viewed as an answer to the question of what to do when the rules of more than one concept match an instance.
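A bare-bones ID3-style sketch of decision tree induction (the toy data and function names are my own): at each node, pick the attribute with the highest information gain, split the data on its values, and recurse until the labels are pure.

```python
from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, labels, attrs):
    def gain(a):
        split = {}
        for row, y in zip(rows, labels):
            split.setdefault(row[a], []).append(y)
        remainder = sum(len(ys) / len(labels) * entropy(ys)
                        for ys in split.values())
        return entropy(labels) - remainder
    return max(attrs, key=gain)

def grow(rows, labels, attrs):
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]   # leaf: majority class
    a = best_attribute(rows, labels, attrs)
    children = {}
    for v in set(r[a] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[a] == v]
        children[v] = grow([rows[i] for i in idx], [labels[i] for i in idx],
                           [b for b in attrs if b != a])
    return (a, children)

def classify(tree, row):
    while isinstance(tree, tuple):          # descend until we hit a leaf
        attribute, children = tree
        tree = children[row[attribute]]
    return tree

# toy data (hypothetical): predict whether to play from outlook and wind
rows = [{"outlook": "sunny", "wind": "weak"},
        {"outlook": "sunny", "wind": "strong"},
        {"outlook": "rain",  "wind": "weak"},
        {"outlook": "rain",  "wind": "strong"}]
labels = ["yes", "yes", "yes", "no"]
tree = grow(rows, labels, ["outlook", "wind"])
print(classify(tree, {"outlook": "rain", "wind": "strong"}))  # prints "no"
```

The resulting tree answers the multiple-matching-rules question by construction: every instance follows exactly one root-to-leaf path, so exactly one rule ever fires.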
The symbolist core belief is that all intelligence can be reduced to manipulating symbols.
Symbolist machine learning is an offshoot of the knowledge engineering school of AI.
In the 1970s, knowledge-based systems scored some impressive successes; in the 1980s they spread rapidly, but then they died out. The main reason: extracting knowledge from experts and encoding it as rules is too difficult, labor-intensive, and failure-prone to be viable for most problems.
Inverse deduction:
- Is clean and beautiful
- But is easily confused by noise
- And real concepts can seldom be concisely defined by a set of rules
Connectionists are highly critical of symbolist learning. They think that concepts you can define with logical rules are only the tip of the iceberg; there's a lot going on under the surface that formal reasoning can't see, in the same way that most of what goes on in our minds is subconscious.