cost function

Cost Function (a function that measures how well our hypothesis fits the training set)

h(x) = Q0 + Q1x

  • Q0 and Q1 are parameters

How do we come up with Q0 and Q1 so that h(x) fits the training data well?

  • let's choose Q0 and Q1 so that h(x) is close to y for our training examples (x, y)
  • This is a minimization problem
  • we want the errors h(x^(i)) - y^(i), for i = 1 to m, to be small, so we minimize their squared sum
  • minimize over Q0, Q1: J(Q0, Q1) = (1/(2m)) * sum from i = 1 to m of (h(x^(i)) - y^(i))^2

So we take the difference between the prediction h(x) and the actual value y for each training example, square it, sum over all m examples, take the average, and divide by 2. This is the cost function J(Q0, Q1) for linear regression (a sketch of it in code follows the bullets below).

  • This function is also called the squared error cost function
  • It works very well for linear regression problems
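
As a concrete illustration, here is a minimal sketch of the squared error cost function (Python is my choice, not specified in these notes, and the name compute_cost and the toy dataset are made up for illustration):

```python
def compute_cost(theta0, theta1, xs, ys):
    """Squared error cost J(Q0, Q1) = (1/(2m)) * sum((h(x) - y)^2)."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        h = theta0 + theta1 * x   # hypothesis h(x) = Q0 + Q1*x
        total += (h - y) ** 2     # squared error for this training example
    return total / (2 * m)

# Example: a tiny training set where y = x exactly
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]
print(compute_cost(0.0, 1.0, xs, ys))  # 0.0, because this line fits perfectly
```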

Cost Function intuition:

Simplified cost function: set Q0 = 0, so the line passes through the origin

  • h(x) = Q1x

Try a few values of Q1 and look at the resulting cost J(Q1):

  • Q1 = 1: the line fits the training data exactly, so J(1) = 0
  • Q1 = 0.5: a worse fit, so J(0.5) is larger
  • Q1 = 0: an even worse fit, so J(0) is larger still

Finally, plotting J(Q1) over a range of values traces out a parabola with its minimum at Q1 = 1, and our goal is to minimize J, i.e. pick the Q1 at the bottom of that curve (see the sketch below).
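
To make this concrete, the sketch below (my own toy dataset where y = x, not necessarily the lecture's example) evaluates J(Q1) at the three values above with Q0 fixed at 0:

```python
def cost_simplified(theta1, xs, ys):
    """J(Q1) for the simplified hypothesis h(x) = Q1*x (Q0 fixed at 0)."""
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]   # toy data where Q1 = 1 is a perfect fit

for theta1 in (1.0, 0.5, 0.0):
    print(f"Q1 = {theta1}: J(Q1) = {cost_simplified(theta1, xs, ys):.3f}")
# J(1) = 0; J grows as Q1 moves away from 1, tracing out the parabola
```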

Previously, when Q0 = 0 and we had only Q1, the cost function J(Q1) was a parabola with its minimum at Q1 = 1, which corresponded to the straight line passing through the origin.

  • Now let's also consider Q0. With both parameters, the resulting plot of J(Q0, Q1) is a bow-shaped 3D surface

We can also plot this in 2D as a contour plot

  • Just imagine the bow-shaped 3D surface coming out of the screen, with its base at the innermost ellipse. J(Q0, Q1) is the same for all points that lie on the same ellipse

  • Now we keep looking for the h(x), i.e. the (Q0, Q1) pair, that gives us the minimum

  • So we try a few candidate (Q0, Q1) choices by hand and see how close each lands to the center of the contours; one way to reproduce the contour picture is sketched below
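
This is my own illustration of the contour plot, assuming numpy and matplotlib are available; it just evaluates J over a grid of (Q0, Q1) values:

```python
import numpy as np
import matplotlib.pyplot as plt

xs = np.array([1.0, 2.0, 3.0])
ys = np.array([1.0, 2.0, 3.0])
m = len(xs)

theta0_vals = np.linspace(-3.0, 3.0, 100)
theta1_vals = np.linspace(-1.0, 3.0, 100)
J = np.zeros((len(theta1_vals), len(theta0_vals)))
for i, t1 in enumerate(theta1_vals):
    for j, t0 in enumerate(theta0_vals):
        errors = t0 + t1 * xs - ys               # h(x) - y for each example
        J[i, j] = (errors ** 2).sum() / (2 * m)  # squared error cost at (Q0, Q1)

# Each level curve is an ellipse; every point on one curve has the same J(Q0, Q1)
plt.contour(theta0_vals, theta1_vals, J, levels=30)
plt.xlabel("Q0")
plt.ylabel("Q1")
plt.title("Contours of J(Q0, Q1)")
plt.show()
```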

We want an efficient algorithm to find the Q0 and Q1 that minimize J

Gradient descent:

  • an algorithm for minimizing J
  • keep changing Q0 and Q1 until we wind up at a local minimum
  • What we do in this algorithm is spin 360 degrees, take a look around, and ask:
    • if I want to take a baby step in some direction, which direction will decrease J the most?
    • gradient descent depends on the starting point, so different starting points can result in different local optima

The update rule, repeated until convergence, is Qj := Qj - alpha * (d/dQj) J(Q0, Q1), applied to j = 0 and j = 1 together.

  • alpha in the update rule is called the learning rate. If alpha is large, gradient descent is aggressive and takes big steps instead of baby steps
  • Note that we must do a simultaneous update of Q0 and Q1; if we update them one after the other we get an incorrect result (a sketch follows below)
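
Here is a minimal sketch of one gradient descent step with the simultaneous update (my own illustration; grad0 and grad1 stand for the two partial derivatives, computed elsewhere):

```python
def gradient_descent_step(theta0, theta1, grad0, grad1, alpha):
    """One gradient descent step with a simultaneous update.

    grad0 and grad1 are d/dQ0 J and d/dQ1 J evaluated at (theta0, theta1).
    """
    # Compute both new values from the OLD theta0 and theta1 first ...
    temp0 = theta0 - alpha * grad0
    temp1 = theta1 - alpha * grad1
    # ... and only then assign, so neither update sees a half-updated parameter
    return temp0, temp1
```

Updating theta0 in place and then computing grad1 from the already-updated theta0 would be the incorrect, non-simultaneous version.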

Gradient descent intuition

So basically the update rule has 2 terms

  • alpha
  • derivative term

  • the derivative term is the slope of the tangent line at the current Q1. So if the derivative is positive (and alpha is positive), then Q1 := Q1 - (some positive number)

  • So Q1 moves to the left, which is correct because the cost is increasing at that point (positive slope). If alpha is reasonably small we keep getting closer to the minimum

  • But if alpha is too large, gradient descent can overshoot the minimum and fail to converge, or even diverge

Also, if Q1 is already at a local minimum

Then it will stay there, since the derivative is 0 and the update leaves Q1 unchanged.

Also, as we approach a local minimum there is no need to decrease alpha over time: gradient descent automatically takes smaller steps because the derivative (the slope) becomes less steep.
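
To see the effect of the learning rate, here is a toy illustration of my own that runs gradient descent on the simplified cost J(Q1) (using the y = x dataset from earlier, so the derivative is easy to write down) with a small and a large alpha:

```python
def dJ_dtheta1(theta1, xs, ys):
    """Derivative of J(Q1) for h(x) = Q1*x: (1/m) * sum((Q1*x - y) * x)."""
    m = len(xs)
    return sum((theta1 * x - y) * x for x, y in zip(xs, ys)) / m

xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]   # minimum of J(Q1) is at Q1 = 1

for alpha in (0.05, 0.5):          # a small and a (too) large learning rate
    theta1 = 3.0                   # same starting point for both runs
    for _ in range(20):
        theta1 -= alpha * dJ_dtheta1(theta1, xs, ys)
    print(f"alpha = {alpha}: Q1 after 20 steps = {theta1:.4f}")
# alpha = 0.05 steps steadily toward 1; alpha = 0.5 overshoots and diverges
```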

Gradient descent for linear regression: our first learning algorithm

Plugging the squared error cost function into the update rule, the derivatives work out to:

  • d/dQ0 J(Q0, Q1) = (1/m) * sum from i = 1 to m of (h(x^(i)) - y^(i))
  • d/dQ1 J(Q0, Q1) = (1/m) * sum from i = 1 to m of (h(x^(i)) - y^(i)) * x^(i)

So the updates, repeated until convergence and applied simultaneously, are:

  • Q0 := Q0 - alpha * (1/m) * sum from i = 1 to m of (h(x^(i)) - y^(i))
  • Q1 := Q1 - alpha * (1/m) * sum from i = 1 to m of (h(x^(i)) - y^(i)) * x^(i)
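
Putting the pieces together, a minimal sketch of batch gradient descent for linear regression (my own code; the function name, dataset, and hyperparameters are arbitrary choices):

```python
def gradient_descent(xs, ys, alpha=0.1, iterations=2000):
    """Batch gradient descent for h(x) = Q0 + Q1*x with the squared error cost."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]  # h(x) - y
        grad0 = sum(errors) / m                                     # d/dQ0 J
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m          # d/dQ1 J
        # Simultaneous update: both gradients were computed from the old thetas
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]          # generated by y = 1 + 2x
print(gradient_descent(xs, ys))    # approaches (1.0, 2.0)
```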

Convex function

  • The cost function for linear regression is always a bow-shaped function and is convex
  • So it does not have multiple local optima; it has only a single global optimum, which means gradient descent always converges to the global minimum (given a reasonable learning rate)

results for ""

    No results matching ""