Cost Function (measures how well our hypothesis h(x) fits the training set)
h(x) = Q0 + Q1x
How do we come up with Q0 and Q1 so that h(x) fits the training data well?
- choose Q0 and Q1 so that h(x) is close to y for our training examples (x, y)
- This is a minimization problem
- the error h(x^(i)) - y^(i) should be small for each example, so we minimize the sum of the squared errors
- minimize over Q0, Q1:  J(Q0, Q1) = (1/2m) * sum from i = 1 to m of ( h(x^(i)) - y^(i) )^2
So for each training example we take the difference between the prediction h(x^(i)) and the actual value y^(i) and square it. Then we sum over all examples, take the average, and divide by 2. This is the cost function for linear regression
- This function is also called the squared error cost function
- This works very well for linear regression problems
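A minimal sketch of this cost function in Python/NumPy (the training set x, y below is made up, just for illustration):

    import numpy as np

    def compute_cost(x, y, q0, q1):
        # squared error cost: J(Q0, Q1) = (1/2m) * sum( (h(x) - y)^2 )
        m = len(y)                       # number of training examples
        h = q0 + q1 * x                  # hypothesis h(x) = Q0 + Q1*x
        return np.sum((h - y) ** 2) / (2 * m)

    # made-up example usage
    x = np.array([1.0, 2.0, 3.0])
    y = np.array([1.0, 2.0, 3.0])
    print(compute_cost(x, y, 0.0, 1.0))  # 0.0 -> a perfect fit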
Cost Function intuition:
Simplified cost function: set Q0 = 0 so the line passes through the origin, i.e. h(x) = Q1*x
- plug in a few values of Q1 and compute J(Q1) for each:
- Q1 = 1 gives J(1) = 0 (the line passes exactly through the training points)
- Q1 = 0.5 gives a larger J(0.5)
- Q1 = 0 gives an even larger J(0)
- Finally, plotting J(Q1) against Q1 traces out a bow-shaped curve
- Minimize: pick the Q1 at the bottom of that curve, which here is Q1 = 1
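A quick sketch of this walkthrough, assuming a toy training set like (1,1), (2,2), (3,3) (an assumption consistent with J(1) = 0; the actual data isn't recorded in these notes):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])   # assumed toy data: points lying on y = x
    y = np.array([1.0, 2.0, 3.0])
    m = len(y)

    def J(q1):
        # simplified cost with Q0 = 0, so h(x) = Q1*x
        return np.sum((q1 * x - y) ** 2) / (2 * m)

    for q1 in (1.0, 0.5, 0.0):
        print(q1, J(q1))   # J(1) = 0, J(0.5) ~ 0.58, J(0) ~ 2.33 for this data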

Previously, when Q0 = 0 and we only had Q1, the cost function was a parabola with its minimum at Q1 = 1, which gave a straight line through the origin
- Now let's also consider Q0. With both parameters, the resulting plot of J(Q0, Q1) is a bow-shaped 3D surface

We can also plot this in 2D as a contour plot

Just imagine the bow-shaped 3D surface coming out of the screen, with its base at the innermost ellipse. J(Q0, Q1) has the same value for all points that lie on the same ellipse
Now we go on looking for which h(x), i.e. which (Q0, Q1), gives us the minimum
So we try a few iterations, checking one h(x) at a time; a small sketch of evaluating J over a grid of values follows below
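A rough sketch of how such a contour plot can be produced: evaluate J(Q0, Q1) over a grid of parameter values (the training data and grid ranges below are placeholders):

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.array([1.0, 2.0, 3.0])       # placeholder training data
    y = np.array([1.0, 2.0, 3.0])
    m = len(y)

    q0_vals = np.linspace(-2, 2, 100)
    q1_vals = np.linspace(-1, 3, 100)
    Q0, Q1 = np.meshgrid(q0_vals, q1_vals)

    # J(Q0, Q1) at every grid point, summed over the training examples
    J = np.zeros_like(Q0)
    for xi, yi in zip(x, y):
        J += (Q0 + Q1 * xi - yi) ** 2
    J /= 2 * m

    plt.contour(Q0, Q1, J, levels=30)   # each ellipse = points with equal J
    plt.xlabel("Q0")
    plt.ylabel("Q1")
    plt.show()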

We want an efficient algorithm to find the Q0 and Q1 that minimize J
Gradient descent:
- for minimizing J
- keep changing Q0 and Q1 till we wind up at a local minimum
- What we do in this algorithm: from the current point, spin 360 degrees, take a look around, and ask
- if I want to take a baby step in some direction, which direction reduces J the most?
- gradient descent depends on the starting point; different starting points can end up at different local optima

- Update rule (repeat until convergence): Qj := Qj - alpha * (d/dQj) J(Q0, Q1), for j = 0 and j = 1
- alpha above is called the learning rate. If alpha is large, gradient descent is aggressive and takes big steps instead of baby steps
- Note: we must do a simultaneous update of Q0 and Q1; if we update them one after the other we get an incorrect result
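A minimal sketch of a single simultaneous-update step (dJ_dq0 and dJ_dq1 are placeholder functions standing in for the two partial derivatives, which are worked out for linear regression further below):

    def gradient_step(q0, q1, alpha, dJ_dq0, dJ_dq1):
        # evaluate BOTH derivatives at the current (old) q0, q1 first...
        temp0 = q0 - alpha * dJ_dq0(q0, q1)
        temp1 = q1 - alpha * dJ_dq1(q0, q1)
        # ...then assign them together (simultaneous update)
        return temp0, temp1

    # incorrect version for contrast: q0 is overwritten before dJ_dq1 is
    # evaluated, so q1's update sees the NEW q0 instead of the old one
    def wrong_step(q0, q1, alpha, dJ_dq0, dJ_dq1):
        q0 = q0 - alpha * dJ_dq0(q0, q1)
        q1 = q1 - alpha * dJ_dq1(q0, q1)
        return q0, q1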

Gradient descent intuition
So basically there are 2 terms in the update:
- alpha, the learning rate
- the derivative term
The derivative is the slope of the tangent line at the current point. So if the derivative is +ve (and alpha is positive), then Q1 := Q1 - (some positive number)
So Q1 will move to the left, which is correct because J is increasing to the right. If alpha is small we take baby steps and gradually get closer to the minimum
But what happens when alpha is too large? Gradient descent can overshoot the minimum, fail to converge, or even diverge
Also, if Q1 is already at a local minimum, then it stays there, since the derivative is 0 and the update changes nothing
Also, as we approach a local minimum there is no need to decrease alpha over time: gradient descent automatically takes smaller steps because the derivative becomes less steep
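A tiny sketch of this intuition on a made-up 1D cost J(Q1) = (Q1 - 1)^2, just to see how the steps behave for a small vs. a too-large alpha:

    def descend(q1, alpha, steps=10):
        for _ in range(steps):
            grad = 2 * (q1 - 1)      # derivative of (Q1 - 1)^2; positive when q1 > 1
            q1 = q1 - alpha * grad   # positive derivative -> q1 moves left
        return q1

    print(descend(3.0, alpha=0.1))   # small alpha: creeps toward the minimum at 1
    print(descend(3.0, alpha=1.1))   # too-large alpha: overshoots and diverges
    print(descend(1.0, alpha=0.1))   # already at the minimum: derivative is 0, stays put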
Gradient descent for linear regression: our first learning algorithm
Result of taking the derivatives of J(Q0, Q1):
- d/dQ0 J(Q0, Q1) = (1/m) * sum from i = 1 to m of ( h(x^(i)) - y^(i) )
- d/dQ1 J(Q0, Q1) = (1/m) * sum from i = 1 to m of ( h(x^(i)) - y^(i) ) * x^(i)

Plug these into the update rule and update Q0 and Q1 simultaneously
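Putting it together, a minimal sketch of batch gradient descent for linear regression (the training data, alpha, and iteration count below are placeholder choices):

    import numpy as np

    def gradient_descent(x, y, alpha=0.1, iters=1000):
        m = len(y)
        q0, q1 = 0.0, 0.0                        # initial guess
        for _ in range(iters):
            h = q0 + q1 * x                      # h(x) with the current parameters
            grad0 = np.sum(h - y) / m            # d/dQ0 J(Q0, Q1)
            grad1 = np.sum((h - y) * x) / m      # d/dQ1 J(Q0, Q1)
            q0, q1 = q0 - alpha * grad0, q1 - alpha * grad1   # simultaneous update
        return q0, q1

    x = np.array([1.0, 2.0, 3.0])                # placeholder training set
    y = np.array([1.0, 2.0, 3.0])
    print(gradient_descent(x, y))                # approaches Q0 = 0, Q1 = 1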

Convex function
- The cost function for linear regression is always a bow-shaped function and is convex (the bow-shaped surface from before)
- So it does not have any local optima; it only has the single global optimum