This effectively changes the hypothesis space, which makes it a powerful strategy for encoding “prior knowledge” about the function we are trying to approximate.

Assessing Underfitting & Overfitting

Training/Test Split

Overfitting (high variance)

• High capacity model capable of fitting complex data
• Insufficient data to constrain it

Underfitting (high bias)

• Low capacity model that can only fit simple data
• Sufficient data but poor fit
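A minimal sketch of how a training/test split exposes both failure modes. It assumes synthetic data from a hypothetical target function sin(2πx) with Gaussian noise, and uses polynomial degree as the "capacity" knob; the degrees 1, 3, and 12 are illustrative choices, not values from these notes.

```python
import numpy as np
from numpy.polynomial import Polynomial

def make_data(n, rng):
    """Noisy samples of a hypothetical target function sin(2*pi*x) on [0, 1]."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)
    return x, y

def mse(model, x, y):
    """Mean squared error of a callable model on (x, y)."""
    return float(np.mean((model(x) - y) ** 2))

def train_test_errors(degree, x_train, y_train, x_test, y_test):
    """Fit a least-squares polynomial on the training split only,
    then return (train MSE, test MSE)."""
    model = Polynomial.fit(x_train, y_train, deg=degree)
    return mse(model, x_train, y_train), mse(model, x_test, y_test)

rng = np.random.default_rng(0)
x_train, y_train = make_data(15, rng)   # small training set
x_test, y_test = make_data(200, rng)    # held-out test set

results = {}
for degree in (1, 3, 12):
    results[degree] = train_test_errors(degree, x_train, y_train, x_test, y_test)
    train_err, test_err = results[degree]
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

Typically the degree-1 fit has high error on both splits (underfitting), while the degree-12 fit has very low training error but a larger test error (overfitting); the exact numbers depend on the random seed.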

How to Fix Underfitting/Overfitting

Choose the right model

Regularization

Modifying the loss function

L2

Original loss + regularization

$\|\beta\|_2^2=\sum\limits_i\beta_i^2$
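Written as a single objective, assuming a scalar weight $\lambda \ge 0$ (notation introduced here; the notes do not name it) that sets how strongly the penalty counts relative to the original loss $\mathcal{L}(\beta)$:

$$\mathcal{L}_{\text{reg}}(\beta) = \mathcal{L}(\beta) + \lambda\|\beta\|_2^2 = \mathcal{L}(\beta) + \lambda\sum_i \beta_i^2$$

Setting $\lambda = 0$ recovers the unregularized loss; larger $\lambda$ favors smaller coefficients.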

Intuition on L2 Regularization

Encourages "simple" functions

Pulls coefficients toward 0
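A small sketch of the shrinkage effect, assuming a squared-error loss without an intercept, so the L2-regularized minimizer has the closed form $(X^\top X + \lambda I)^{-1}X^\top y$. The data and the grid of $\lambda$ values are made up for illustration.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Minimize ||y - X @ beta||^2 + lam * ||beta||_2^2 (no intercept).
    Setting the gradient to zero gives (X^T X + lam * I) beta = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
true_beta = np.array([3.0, -2.0, 0.5, 0.0, 1.0])  # hypothetical ground truth
y = X @ true_beta + rng.normal(scale=0.5, size=50)

lams = [0.0, 1.0, 10.0, 100.0, 1000.0]
norms = []
for lam in lams:
    beta = ridge_closed_form(X, y, lam)
    norms.append(float(np.linalg.norm(beta)))
    print(f"lambda={lam:7.1f}  ||beta||_2={norms[-1]:.3f}  beta={np.round(beta, 2)}")
```

As $\lambda$ grows, $\|\beta\|_2$ shrinks steadily, but the individual coefficients generally do not become exactly 0.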

L1

$\|\beta\|_1=\sum\limits_i|\beta_i|$
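To contrast the two penalties, here is a sketch that fits an L1-penalized least-squares model (the lasso) with ISTA, i.e. proximal gradient descent. ISTA is a swapped-in solver choice, not something these notes specify, and the data, `lam=50.0`, and iteration count are illustrative assumptions. The soft-thresholding step is what lets L1 set coefficients exactly to 0, unlike the L2 fit on the same data.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward 0 by t,
    setting entries with |z_i| <= t exactly to 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iters=1000):
    """Minimize 0.5 * ||y - X @ beta||^2 + lam * ||beta||_1 with ISTA,
    using step size 1/L where L is the largest eigenvalue of X^T X."""
    step = 1.0 / np.linalg.eigvalsh(X.T @ X).max()
    beta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (X @ beta - y)  # gradient of the squared-error term
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

def ridge(X, y, lam):
    """Closed-form L2-regularized least squares, for comparison."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 8))
true_beta = np.array([4.0, 0.0, 0.0, -3.0, 0.0, 0.0, 2.0, 0.0])  # hypothetical, sparse
y = X @ true_beta + rng.normal(scale=0.5, size=100)

beta_l1 = lasso_ista(X, y, lam=50.0)
beta_l2 = ridge(X, y, lam=50.0)
print("L1:", np.round(beta_l1, 3))
print("L2:", np.round(beta_l2, 3))
```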

Hyperparameter Tuning & Model Selection

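• Training data: used to fit the model parameters
• Validation (val) data: used to tune hyperparameters and choose between models
• Test data: held out until the end to estimate performance on unseen data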
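A minimal sketch of this workflow, assuming L2-regularized linear regression as the model family, a 60/20/20 shuffle split, and a hand-picked grid of candidate $\lambda$ values (all illustrative choices). Parameters are fit on the training split, $\lambda$ is chosen by validation error, and the test split is used once at the end.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form L2-regularized least squares (no intercept)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def mse(X, y, beta):
    return float(np.mean((X @ beta - y) ** 2))

rng = np.random.default_rng(3)
n, d = 200, 20
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + rng.normal(scale=1.0, size=n)

# Shuffle once, then carve out 60% train / 20% validation / 20% test.
idx = rng.permutation(n)
train, val, test = idx[:120], idx[120:160], idx[160:]

# Fit on training data only; compare candidate hyperparameters on validation data.
candidates = [0.01, 0.1, 1.0, 10.0, 100.0]
val_errors = {lam: mse(X[val], y[val], ridge(X[train], y[train], lam)) for lam in candidates}
best_lam = min(val_errors, key=val_errors.get)

# Evaluate on the test set only once, with the chosen hyperparameter.
best_beta = ridge(X[train], y[train], best_lam)
test_error = mse(X[test], y[test], best_beta)
print("validation MSE per lambda:", {k: round(v, 3) for k, v in val_errors.items()})
print(f"chosen lambda={best_lam}, test MSE={test_error:.3f}")
```

A common variant refits on training plus validation data with the chosen $\lambda$ before the final test evaluation.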

Choice of Learning Rate
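A toy illustration of why the learning rate matters, assuming plain gradient descent on $f(\beta) = \beta^2$ (gradient $2\beta$, minimum at 0). Each step multiplies $\beta$ by $(1 - 2\,\text{lr})$, so a small rate converges slowly, a moderate rate converges quickly, and a rate above 1 makes the iterates grow. The three rates are illustrative.

```python
def gradient_descent_1d(grad, beta0, lr, n_iters):
    """Run n_iters plain gradient-descent steps from beta0."""
    beta = beta0
    for _ in range(n_iters):
        beta = beta - lr * grad(beta)
    return beta

def grad_f(beta):
    """Gradient of f(beta) = beta**2."""
    return 2.0 * beta

results = {}
for lr in (0.01, 0.4, 1.1):
    results[lr] = gradient_descent_1d(grad_f, beta0=1.0, lr=lr, n_iters=50)
    print(f"lr={lr}: beta after 50 steps = {results[lr]:.3e}")
```

With `lr=0.01` the iterate is still about 0.36 after 50 steps, `lr=0.4` is essentially at 0, and `lr=1.1` has diverged to roughly 9 × 10³.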

L2-Regularized Linear Regression

The L2 penalty acts as weight decay, encouraging $\beta$ to be small
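To see where the name "weight decay" comes from, take one gradient-descent step on $\mathcal{L}(\beta) + \lambda\|\beta\|_2^2$, assuming a learning rate $\eta$ and penalty weight $\lambda$ (both notation introduced here). The penalty's gradient is $2\lambda\beta$, so:

$$\beta_{t+1} = \beta_t - \eta\left(\nabla\mathcal{L}(\beta_t) + 2\lambda\beta_t\right) = (1 - 2\eta\lambda)\,\beta_t - \eta\,\nabla\mathcal{L}(\beta_t)$$

When $0 < 2\eta\lambda < 1$, every step first scales $\beta_t$ down by the factor $(1 - 2\eta\lambda)$ and then applies the ordinary gradient update, so the weights decay toward 0.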

Minimizing the MSE Loss

• Closed-form solution: Compute using matrix operations
• Optimization-based solution: Search over candidate values of $\beta$

Closed-form solution
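A minimal sketch of the closed-form route, assuming a design matrix $X$ with full column rank so that $X^\top X$ is invertible and the MSE minimizer satisfies the normal equations $X^\top X\beta = X^\top y$. It also shows `np.linalg.lstsq`, which reaches the same minimizer without forming $X^\top X$ explicitly. The data are synthetic.

```python
import numpy as np

def ols_normal_equations(X, y):
    """Solve X^T X beta = X^T y (assumes X has full column rank)."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def ols_lstsq(X, y):
    """Same minimizer via a least-squares solver, which is numerically
    safer than forming X^T X explicitly."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(4)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 3))])  # first column = intercept
true_beta = np.array([1.0, 2.0, -1.0, 0.5])  # hypothetical ground truth
y = X @ true_beta + rng.normal(scale=0.1, size=100)

beta_ne = ols_normal_equations(X, y)
beta_ls = ols_lstsq(X, y)
print(np.round(beta_ne, 3), np.round(beta_ls, 3))
```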

Stochastic gradient descent
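A sketch of minibatch SGD on the MSE loss $\frac{1}{n}\|X\beta - y\|^2$ for linear regression. The learning rate, batch size, and epoch count are illustrative assumptions. Each update uses the gradient on a small random batch as a noisy estimate of the full gradient, and the result is compared with the exact least-squares solution.

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.01, n_epochs=50, batch_size=8, seed=0):
    """Minibatch SGD on the MSE loss (1/n) * ||X @ beta - y||^2."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(n_epochs):
        order = rng.permutation(n)  # reshuffle the examples every epoch
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            # Gradient of the MSE restricted to this minibatch.
            grad = 2.0 / len(batch) * Xb.T @ (Xb @ beta - yb)
            beta -= lr * grad
    return beta

rng = np.random.default_rng(5)
X = rng.normal(size=(500, 3))
true_beta = np.array([1.5, -2.0, 0.7])  # hypothetical ground truth
y = X @ true_beta + rng.normal(scale=0.1, size=500)

beta_sgd = sgd_linear_regression(X, y)
beta_exact, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_sgd, 3), np.round(beta_exact, 3))
```

Because each step sees only a few examples, SGD ends up close to, but not exactly at, the closed-form solution; its advantage is that each step is cheap even when $n$ is large.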

Iterative Optimization Algorithms

Iteratively optimize $\beta$:

• Initialize $\beta_1 \leftarrow \mathrm{Init}(\ldots)$
• For some number of iterations $T$, update $\beta_t \leftarrow \mathrm{Step}(\ldots)$
• Return $\beta_T$
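The template above can be sketched as a generic loop that takes the initializer and the update rule as functions. The names `iterative_optimize`, `init`, `step`, and `make_gd_step`, and the choice to pass the iteration index `t` to `step`, are hypothetical. The instantiation plugs in full-batch gradient descent on the MSE loss as one possible `Step`.

```python
from typing import Callable
import numpy as np

def iterative_optimize(init: Callable[[], np.ndarray],
                       step: Callable[[np.ndarray, int], np.ndarray],
                       T: int) -> np.ndarray:
    """beta_1 <- init(); beta_t <- step(beta_{t-1}, t) for t = 2..T; return beta_T."""
    beta = init()
    for t in range(2, T + 1):
        beta = step(beta, t)
    return beta

def make_gd_step(X, y, lr):
    """Hypothetical Step: one full-batch gradient-descent update on the MSE loss."""
    def step(beta, t):  # t is unused here but would allow a learning-rate schedule
        grad = 2.0 / len(y) * X.T @ (X @ beta - y)
        return beta - lr * grad
    return step

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 3))
y = X @ np.array([0.5, -1.0, 2.0]) + rng.normal(scale=0.1, size=200)

beta_T = iterative_optimize(lambda: np.zeros(X.shape[1]), make_gd_step(X, y, lr=0.1), T=500)
beta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_T, 4), np.round(beta_ref, 4))
```

Swapping in a different `step` (for example, a minibatch SGD update) changes the algorithm without touching the loop.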