CS229a - Week 1

Introduction

What is Machine Learning?

In general, any machine learning problem can be assigned to one of two broad classifications: Supervised learning and Unsupervised learning.

Supervised Learning

In supervised learning, we are given a data set and already know what the correct output should look like. Supervised learning problems are categorized into regression (predicting a continuous output) and classification (predicting a discrete output).

Unsupervised Learning

In unsupervised learning, we approach problems with little or no idea what the results should look like; we derive structure from the data, for example by clustering it based on relationships among the variables.

Linear regression with one variable

Model and Cost function

To describe the supervised learning problem slightly more formally, our goal is, given a training set, to learn a function h : X → Y so that h(x) is a “good” predictor for the corresponding value of y. For historical reasons, this function h is called a hypothesis.


Cost function, also called the “squared error function” or “mean squared error”. Note that J is a function of the parameters \(\theta_0, \theta_1\), not of x:

\[J(\theta_0,\theta_1) =\frac{1}{2m}\sum_{i=1}^m(h_\theta(x_i)-y_i)^2\] \[h_\theta(x_i) = \theta_0 + \theta_1x_i\]
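As a quick check, the cost can be computed directly in Octave; the toy data and candidate parameters below are made up for illustration:

    % Hypothetical toy training set with m = 4 examples
    x = [1; 2; 3; 4];
    y = [2; 4; 6; 8];
    m = length(y);

    theta0 = 0; theta1 = 1.5;            % candidate parameters
    h = theta0 + theta1 * x;             % h_theta(x_i) for every example
    J = (1 / (2 * m)) * sum((h - y).^2)  % squared error cost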


Gradient descent algorithm


Starting from an initial point, follow the slope downhill, stepping toward smaller values of the cost.


At each step, both parameters must be updated simultaneously.

repeat until convergence (simultaneously update for \(j = 0\) and \(j = 1\)):

\[\theta_j := \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1)\]
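The point of the simultaneous update is that both new values are computed from the same old \(\theta_0, \theta_1\). A minimal Octave sketch of one step, where dJ_dtheta0 and dJ_dtheta1 are hypothetical names for the partial derivatives, assumed already computed:

    temp0 = theta0 - alpha * dJ_dtheta0;  % compute both updates first...
    temp1 = theta1 - alpha * dJ_dtheta1;
    theta0 = temp0;                       % ...then assign them together
    theta1 = temp1;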

Gradient descent for Linear regression

repeat until convergence (simultaneously update \(j = 0\) and \(j = 1\)):

\[\theta_j := \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1)\]

Linear regression model

\[J(\theta_0,\theta_1) =\frac{1}{2m}\sum_{i=1}^m(h_\theta(x_i)-y_i)^2 \\ h_\theta(x_i) = \theta_0 + \theta_1x_i\]

Plugging the definition of the cost function into the derivative term:

\[\frac{\partial}{\partial\theta_j}\frac{1}{2m}\sum_{i=1}^m(h_\theta(x_i)-y_i)^2\]

Plugging in the definition of the hypothesis:

\[\frac{\partial}{\partial\theta_j}\frac{1}{2m}\sum_{i=1}^m(\theta_0 + \theta_1x_i-y_i)^2\]

Calculating the partial derivative for each parameter gives the update rules:

\[\theta_0 := \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})\] \[\theta_1 := \theta_1 - \alpha\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})x^{(i)}\]

Cost function of linear regression: the squared error cost is convex (bowl-shaped), so it has a single global minimum, and gradient descent converges to it for a suitable learning rate.

Batch Gradient Descent: “batch” means that each step of gradient descent uses all m training examples.
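Putting the update rules together, a minimal batch gradient descent loop in Octave might look like the following; the data, learning rate, and iteration count are made-up illustrations:

    % Hypothetical data: y is exactly linear in x, y = 2x + 1
    x = [1; 2; 3; 4; 5];
    y = [3; 5; 7; 9; 11];
    m = length(y);

    alpha = 0.05;                % learning rate
    theta0 = 0; theta1 = 0;      % initial parameters

    for iter = 1:1000
        h = theta0 + theta1 * x;                              % all m predictions
        temp0 = theta0 - alpha * (1 / m) * sum(h - y);        % j = 0 update
        temp1 = theta1 - alpha * (1 / m) * sum((h - y) .* x); % j = 1 update
        theta0 = temp0;                                       % simultaneous
        theta1 = temp1;                                       % assignment
    end

    theta0, theta1   % should approach 1 and 2 for this data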

Linear algebra review

Matrices

The dimension of a matrix is written as the number of rows by the number of columns.

\[A =\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}\]
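So A above is a 2 × 3 matrix. In Octave, size reports the dimension as rows by columns:

    A = [1 2 3; 4 5 6];
    size(A)    % ans = 2  3  (2 rows, 3 columns)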

Vectors

A vector is a matrix with only one column (an n × 1 matrix).

\[y =\begin{bmatrix} 1 \\ 4 \\ 5 \end{bmatrix}\]

Matrix Addition

\[\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} + \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} = \begin{bmatrix} 2 & 4 & 6 \\ 8 & 10 & 12 \end{bmatrix}\]

Scalar Multiplication

\[3 \times \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} = \begin{bmatrix} 3 & 6 & 9 \\ 12 & 15 & 18 \end{bmatrix}\]
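Both operations are element-wise, so they check out directly in Octave:

    A = [1 2 3; 4 5 6];
    A + A    % [2 4 6; 8 10 12], the matrix addition above
    3 * A    % [3 6 9; 12 15 18], the scalar multiplication above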

Matrix Vector Multiplication

\[\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} \times \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 5 \\ 11 \\ 17 \end{bmatrix}\]

Multiplying a 3 × 2 matrix by a 2 × 1 vector yields a 3 × 1 matrix (that is, a vector).

Using matrix-vector multiplication in linear regression, the hypothesis can be evaluated on every training example with a single operation, as in the sketch below.
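A sketch in Octave; the house sizes and parameters are made-up values:

    % Hypothetical house sizes and hypothesis h(x) = -40 + 0.25x
    sizes = [2104; 1416; 1534; 852];
    X = [ones(4, 1), sizes];   % prepend a column of 1s for theta0
    theta = [-40; 0.25];

    predictions = X * theta    % one matrix-vector product = all predictions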

Matrix Matrix Multiplication

\[\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \times \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} = \begin{bmatrix} 22 & 28 \\ 49 & 64 \end{bmatrix}\]

Multiplying a 2 × 3 matrix by a 3 × 2 matrix yields a 2 × 2 matrix.

Matrix-matrix multiplication, in turn, can evaluate several competing hypotheses on all examples at once, as in the sketch below.
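A sketch along the same lines, with three made-up hypotheses stored as columns:

    % Same hypothetical house sizes, three competing hypotheses
    sizes = [2104; 1416; 1534; 852];
    X = [ones(4, 1), sizes];
    Thetas = [ -40,  200, -150;    % theta0 of each hypothesis
              0.25,  0.1,  0.4];   % theta1 of each hypothesis

    predictions = X * Thetas       % column k = predictions of hypothesis k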

Explore further

Matrix Multiplication properties

Matrix multiplication is not commutative (\(A \times B \neq B \times A\) in general), but it is associative (\((A \times B) \times C = A \times (B \times C)\)).
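A quick Octave check of non-commutativity; the matrices are arbitrary examples:

    A = [1 2; 3 4];
    B = [0 1; 1 0];
    A * B    % [2 1; 4 3]
    B * A    % [3 4; 1 2] -- a different result, so A*B != B*A here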

Identity Matrix

\[\begin{bmatrix} 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1\end{bmatrix} ...\]
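Multiplying by the identity leaves any matrix unchanged: \(A \times I = I \times A = A\). In Octave, eye(n) builds the n × n identity:

    A = [1 2; 3 4];
    I = eye(2);
    A * I    % returns A unchanged
    I * A    % likewise A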

Matrix Inverse and Transpose

Inverse Matrix

Only square matrices can have an inverse, and not every square matrix does; a matrix without an inverse is called singular (or degenerate). In the Octave session below, pinv computes the pseudo-inverse, which coincides with \(A^{-1}\) when A is invertible.

\[A \times A^{-1} = A^{-1} \times A = I\]
    octave:1> A = [3 4; 2 16]
    A =
    
        3    4
        2   16
    
    octave:2> inverseOfA = pinv(A)
    inverseOfA =
    
       0.400000  -0.100000
      -0.050000   0.075000
    
    octave:3> A * inverseOfA
    ans =
    
       1.0000e+00   5.5511e-17
      -2.2204e-16   1.0000e+00
    
    octave:4> inverseOfA * A
    ans =
    
       1.00000  -0.00000
       0.00000   1.00000

Matrix Transpose

Transposing reverses the dimensions: an m × n matrix becomes an n × m matrix.

\[A_{ij} = B_{ji}, B = A^T\] \[A =\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}, A^T = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}\]
    octave:1> A = [1,2,0;0,5,6;7,0,9]
    A =
    
       1   2   0
       0   5   6
       7   0   9
    
    octave:2> A_trans = A'
    A_trans =
    
       1   0   7
       2   5   0
       0   6   9