Reading Notes: Practical Methodology in Machine Learning


A good ML practitioner needs to know:

  • how to choose an algorithm
  • how to monitor and respond to feedback from experiments
  • how to improve an ML system

Possible decisions:

  • gather more data
  • increase or decrease model capacity
  • add or remove regularizing features
  • improve the optimization of the model
  • improve approximate inference in the model
  • debug the software implementation

All of these are time consuming, so:

  • it is better to determine the right course of action than to guess blindly
  • correctly applying a commonplace algorithm usually beats sloppily applying an obscure one

Follow this design process:

  • determine goals
    • which error metric to use
    • what target value to reach for that metric
    • both the goal and the metric should be driven by the problem
  • build a working end-to-end pipeline as soon as possible
  • determine bottlenecks in performance
    • diagnose which components perform worse than expected
    • decide whether the cause is overfitting, underfitting, or a defect in the data or software
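
The diagnosis step can be sketched as a comparison of training error, validation error, and the target error rate. This is a rough heuristic, not a method taken from the notes: the function name `diagnose`, the `gap_tol` parameter, and its default value are all assumptions made for illustration.

```python
def diagnose(train_err, val_err, target_err, gap_tol=0.02):
    """Guess the likely bottleneck from error rates.

    All thresholds are illustrative assumptions; tune them per problem.
    """
    # Training error above target: the model cannot even fit the training
    # set, which points to underfitting or a defect in the data or software.
    if train_err > target_err:
        return "underfitting or defect"
    # Training error is fine but validation error is much worse:
    # the model is not generalizing, which points to overfitting.
    if val_err - train_err > gap_tol:
        return "overfitting"
    # Both errors are at or below the target.
    if val_err <= target_err:
        return "target met"
    # Small gap, validation error slightly above target: no clear signal.
    return "inconclusive"


print(diagnose(train_err=0.20, val_err=0.22, target_err=0.05))
print(diagnose(train_err=0.01, val_err=0.15, target_err=0.05))
print(diagnose(train_err=0.03, val_err=0.04, target_err=0.05))
```

A check like this only suggests where to look next. It cannot tell underfitting apart from a software defect, which still needs separate debugging.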

Performance metrics

  • in most applications, it is impossible to achieve zero error
  • the Bayes error defines the minimum achievable error rate, even if you have infinite training data and can recover the true probability distribution
  • first decide on a realistic desired error rate, then let that target guide design decisions
  • decide which metrics to use; many applications need more advanced metrics than accuracy
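  • when training a binary classifier to detect a rare event:
    • accuracy is a poor way to measure performance, because always predicting "no event" already scores high
    • instead, measure precision (fraction of detections that are correct) and recall (fraction of true events that are detected)
    • plot a PR curve, with precision on the y-axis and recall on the x-axis
    • by varying the decision threshold, we can trade precision for recall
    • in many cases, we want to summarize performance in a single number, so
      • either calculate the F-score, F = 2pr / (p + r), for precision p and recall r
      • or report the total area under the PR curve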
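
The precision, recall, and F-score bullets above can be illustrated with a small pure-Python sketch. The toy `labels` and `scores`, the function names, and the convention of reporting precision as 1.0 when nothing is flagged are assumptions for illustration only.

```python
def precision_recall(labels, scores, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if s >= threshold and y == 1)
    fp = sum(1 for y, s in pairs if s >= threshold and y == 0)
    fn = sum(1 for y, s in pairs if s < threshold and y == 1)
    # Assumed convention: precision is 1.0 when no example is flagged.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f_score(precision, recall):
    """F = 2pr / (p + r), defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: 2 positives among 10 examples (hypothetical scores).
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05]

# A classifier that never fires is 80% accurate here but has zero recall.
accuracy_never_fire = sum(1 for y in labels if y == 0) / len(labels)
print("accuracy of always-negative:", accuracy_never_fire)

# Sweeping the threshold traces out points on the PR curve.
for threshold in (0.85, 0.75, 0.5):
    p, r = precision_recall(labels, scores, threshold)
    print(threshold, round(p, 3), round(r, 3), round(f_score(p, r), 3))
```

Lowering the threshold from 0.85 to 0.5 raises recall from 0.5 to 1.0 while precision falls from 1.0 to 2/3, which is the trade-off described above.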

