CS229a - Week 2

Environment Setup Instructions

    brew install octave

Octave doc

Multivariate Linear Regression

Multiple Features

Notation:

Hypothesis:

\[h_\theta(x) = \theta_0x_0 + \theta_1x_1 + \theta_2x_2 + \dots + \theta_nx_n \quad (x_0 = 1)\]

Feature vector (an \((n+1)\)-dimensional vector, with \(x_0 = 1\)):

\[X = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix}\]

Parameter vector (an \((n+1)\)-dimensional vector):

\[\Theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix}\]

Compact form:

\[h_\theta(x) = \Theta^TX = \begin{bmatrix} \theta_0 & \theta_1 & \dots & \theta_n \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix}\]

Cost function

\[J(\theta_0,\theta_1,\dots,\theta_n) = \frac{1}{2m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2\]
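The course implements this in Octave, but a minimal NumPy sketch of the same cost (variable names here are illustrative, not from the assignments) makes the formula concrete:

```python
import numpy as np

def compute_cost(X, y, theta):
    """Squared-error cost J(theta) = (1/2m) * sum((X @ theta - y)^2).

    X is m x (n+1) with a leading column of ones (x_0 = 1),
    y has length m, theta has length n+1.
    """
    m = len(y)
    residuals = X @ theta - y
    return residuals @ residuals / (2 * m)

# Tiny example: the data lie exactly on y = 1 + 2x, so the cost at
# theta = [1, 2] is zero.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
print(compute_cost(X, y, np.array([1.0, 2.0])))  # 0.0
```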

Gradient Descent for Multiple Variables

Hypothesis:

\[h_\theta(x) = \theta_0x_0 + \theta_1x_1 + ... + \theta_nx_n\]

Parameters (an \((n+1)\)-dimensional vector):

\[\Theta\]

Cost function:

\[J(\Theta) = \frac{1}{2m}\sum_{i=1}^m(\Theta^Tx^{(i)}-y^{(i)})^2 = \frac{1}{2m}\sum_{i=1}^m( \sum_{j=0}^n(\theta_jx^{(i)}_j) -y^{(i)})^2\]

Gradient descent, repeat until convergence (simultaneously updating \(\theta_j\) for every \(j = 0, \dots, n\)):

\[\theta_j := \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\Theta)\\ = \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta_0,\dots,\theta_n)\\ = \theta_j - \alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x^{(i)}_j\]
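The update above, with all \(\theta_j\) updated simultaneously, collapses to one matrix expression. A NumPy sketch (the course does this in Octave; the data below are a toy example, not course data):

```python
import numpy as np

def gradient_descent(X, y, theta, alpha, num_iters):
    """Batch gradient descent for linear regression.

    Each iteration updates every theta_j at once:
    theta := theta - (alpha/m) * X^T (X theta - y).
    """
    m = len(y)
    for _ in range(num_iters):
        gradient = X.T @ (X @ theta - y) / m
        theta = theta - alpha * gradient
    return theta

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])  # generated by y = 1 + 2x
theta = gradient_descent(X, y, np.zeros(2), alpha=0.1, num_iters=5000)
print(theta)  # approaches [1, 2]
```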

Gradient descent in practice 1 - Feature Scaling
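When features differ widely in scale (e.g. house size vs. number of bedrooms), gradient descent converges much faster after mean normalization, \(x_j := (x_j - \mu_j)/s_j\). A sketch, with hypothetical housing numbers:

```python
import numpy as np

def feature_normalize(X):
    """Mean-normalize each column: subtract the mean, divide by the std dev.

    Returns mu and sigma as well, so the identical transform can be
    applied to new examples at prediction time.
    """
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

# Columns: size in square feet, number of bedrooms (hypothetical data).
X = np.array([[2104.0, 3.0], [1600.0, 3.0], [2400.0, 4.0]])
X_norm, mu, sigma = feature_normalize(X)
# Every column of X_norm now has mean 0 and standard deviation 1.
```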

Gradient descent in practice 2 - Learning rate
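A practical diagnostic is to record \(J(\Theta)\) at every iteration: with a well-chosen \(\alpha\) the cost decreases on every step, while a too-large \(\alpha\) makes it blow up and a too-small one converges slowly. A sketch on toy data (the specific \(\alpha\) values are illustrative):

```python
import numpy as np

def cost_history(X, y, alpha, num_iters):
    """Run gradient descent and record J(theta) at every iteration."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    history = []
    for _ in range(num_iters):
        residuals = X @ theta - y
        history.append(residuals @ residuals / (2 * m))
        theta = theta - alpha / m * X.T @ residuals
    return history

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
good = cost_history(X, y, alpha=0.1, num_iters=50)  # steadily decreasing
bad = cost_history(X, y, alpha=1.5, num_iters=50)   # diverges on this data
```

Plotting `good` and `bad` against the iteration number makes the two behaviors obvious at a glance.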

Gradient descent debugging

Features and Polynomial Regression

Defining new features: Housing prices prediction
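From a single raw feature such as house size, one can define new features \(x, x^2, x^3\) (or \(\sqrt{x}\)) and still fit them with the same linear machinery, since the hypothesis stays linear in the parameters. Feature scaling then becomes essential because the columns span wildly different ranges. A sketch with hypothetical sizes:

```python
import numpy as np

size = np.array([1000.0, 1500.0, 2000.0, 2500.0])  # hypothetical house sizes

# Polynomial features x, x^2, x^3 -- still "linear regression" in theta.
X_poly = np.column_stack([size, size**2, size**3])

# Scaling matters here: the columns range from ~1e3 up to ~1e10.
X_scaled = (X_poly - X_poly.mean(axis=0)) / X_poly.std(axis=0)
X = np.column_stack([np.ones(len(size)), X_scaled])  # prepend x_0 = 1
```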

Computing Parameters Analytically

Normal Equation

\[\Theta = (X^TX)^{-1}X^Ty\]
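This closed-form solution needs no learning rate and no iteration, and it is easy to check numerically. A NumPy sketch (the course writes the equivalent `pinv(X'*X)*X'*y` in Octave):

```python
import numpy as np

def normal_equation(X, y):
    """Solve theta = (X^T X)^{-1} X^T y in one step.

    pinv (the pseudoinverse) is used instead of a plain inverse so the
    computation still works when X^T X is singular.
    """
    return np.linalg.pinv(X.T @ X) @ X.T @ y

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])  # exactly y = 1 + 2x
print(normal_equation(X, y))  # approximately [1., 2.]
```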

(Figure: multivariate.png)

Normal Equation vs Gradient descent

| Gradient Descent | Normal Equation |
| --- | --- |
| Need to choose \(\alpha\) | No need to choose \(\alpha\) |
| Needs many iterations | No need to iterate |
| \(O(kn^2)\) | \(O(n^3)\): needs the inverse of \(X^TX\) |
| Works well even when \(n\) is large | Slow if \(n\) is very large |

Normal Equation Noninvertibility

\[\Theta = (X^TX)^{-1}X^Ty\]

If \(X^TX\) is noninvertible (singular/degenerate), the usual causes are redundant features (linearly dependent, e.g. size in feet² and size in m²) or too many features (\(m \le n\)); the fixes are deleting redundant features, or deleting some features / using regularization. In Octave, `pinv` (pseudoinverse) still returns a usable \(\Theta\) even when `inv` would fail.

Vectorization

Vectorized code (avoiding explicit `for` loops) runs much faster.
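The unvectorized sum \(\sum_{j} \theta_j x_j\) versus the vectorized \(\Theta^Tx\), side by side in NumPy (toy values; in Octave the vectorized form would be `theta' * x`):

```python
import numpy as np

theta = np.array([1.0, 2.0, 3.0])
x = np.array([1.0, 4.0, 5.0])  # x[0] = 1

# Unvectorized: an explicit loop over the terms.
h_loop = 0.0
for j in range(len(theta)):
    h_loop += theta[j] * x[j]

# Vectorized: a single inner product -- same result, far faster at scale.
h_vec = theta @ x

print(h_loop, h_vec)  # both 24.0
```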