## Introduction

A good ML practitioner needs to know:

- how to choose an algorithm
- how to monitor and respond to feedback from experiments
- how to improve an ML system

Possible decisions:

- gather more data
- increase or decrease model capacity
- add or remove regularizing features
- improve optimization of the model
- improve approximate inference in the model
- debug the software implementation

All of these are time-consuming to try.

Suggestions:

- it is better to determine the right course of action than to guess blindly
- correctly applying a commonplace algorithm usually beats sloppily applying an obscure one

Follow this design process:

- determine your goals
  - which error metric to use
  - what target value to reach for that metric
  - both the goal and the metric should be driven by the problem
- build a working end-to-end pipeline as soon as possible
- determine the bottlenecks in performance
- diagnose the bottlenecks
  - are they due to overfitting, underfitting, or a defect in the data or software?
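
The diagnosis step can be made concrete with a rough rule of thumb. The sketch below is a hypothetical helper, not something from the notes: the name `diagnose`, its arguments, and the returned labels are all assumptions. It compares training and validation error against the target chosen in the first step.

```python
def diagnose(train_error, val_error, target_error):
    """Guess the likely bottleneck from training and validation error rates.

    Hypothetical heuristic for illustration only. All three arguments are
    error rates in [0, 1]; `target_error` is the goal set in the first step.
    """
    if val_error <= target_error:
        return "target met"
    if train_error > target_error:
        # Even the training set is not fit well enough: the model may be too
        # weak, optimization may be failing, or the data/code has a defect.
        return "underfitting or defect"
    # Training error is acceptable but validation error is not:
    # the model does not generalize.
    return "overfitting"


# Example: the model fits the training set but misses the 5% target on validation.
verdict = diagnose(train_error=0.01, val_error=0.15, target_error=0.05)
```

A label like "underfitting or defect" cannot separate a weak model from a bug, so it is only a starting point for closer inspection of the data and the code.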

## Performance metrics

- in most cases it is impossible to reach zero error
- the Bayes error is the minimum achievable error rate, even with infinite training data and the true probability distribution recovered
- first decide on a realistic desired error rate, then let that target guide your design decisions
- decide which metrics to use; many applications need more advanced metrics than accuracy
- when training a binary classifier to detect a rare event:
  - accuracy is a poor way to measure performance, since always predicting "no event" already scores highly
  - instead, measure precision and recall
  - plot a PR curve, with precision on the y-axis and recall on the x-axis
  - by varying the decision threshold, we can trade precision for recall
- in many cases we want to summarize performance in a single number, so:
  - either calculate the F-score, F = 2pr / (p + r), where p is precision and r is recall
  - or report the total area under the PR curve
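
To make the rare-event metrics concrete, here is a minimal sketch in plain Python. The toy labels and scores are made up for illustration, and the helper names (`precision_recall`, `f_score`, `pr_curve`, `area_under_curve`) are assumptions rather than any library's API. The area uses the trapezoidal rule, which is only one of several conventions for summarizing a PR curve.

```python
def precision_recall(labels, scores, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if s >= threshold and y == 1)
    fp = sum(1 for y, s in pairs if s >= threshold and y == 0)
    fn = sum(1 for y, s in pairs if s < threshold and y == 1)
    # Convention (an assumption): precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def pr_curve(labels, scores):
    """(recall, precision) points, one per distinct score used as a threshold."""
    points = [(0.0, 1.0)]  # starting point: threshold above every score
    for t in sorted(set(scores), reverse=True):
        p, r = precision_recall(labels, scores, t)
        points.append((r, p))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (recall, precision) points."""
    return sum((r1 - r0) * (p0 + p1) / 2
               for (r0, p0), (r1, p1) in zip(points, points[1:]))


# Toy data: 2 positives among 10 examples (made up for illustration).
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.3, 0.2, 0.2, 0.1, 0.1]

# A classifier that always predicts "negative" looks accurate but finds nothing.
baseline_accuracy = sum(1 for y in labels if y == 0) / len(labels)

# Lowering the threshold trades precision for recall.
strict = precision_recall(labels, scores, threshold=0.85)
loose = precision_recall(labels, scores, threshold=0.5)

f1_loose = f_score(*loose)
pr_area = area_under_curve(pr_curve(labels, scores))
```

On this toy data the always-negative baseline reaches 80% accuracy with zero recall, which is why accuracy says little for rare events. Moving the threshold from 0.85 to 0.5 raises recall from 0.5 to 1.0 and lowers precision from 1.0 to about 0.67.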
