Types of Algorithms
- Parametric algorithm (Linear model)
- Based on a mathematical model that defines the relationship between inputs and outputs
- Fixed and pre-determined set of parameters and coefficients
- Can learn quickly
- Can be too simple for the real world - prone to underfitting
- Non-Parametric algorithm
- Does not make assumptions about the relationship b/w input and output ahead of building the model
- Flexible and adapts well to non-linear data relationships
- Often performs better than a linear model when the true relationship is non-linear (contrasted in the sketch after this list)
- Requires more data to train; prone to overfitting
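Below is a minimal sketch contrasting the two families on the same data, assuming scikit-learn and NumPy are available; the synthetic dataset and the model choices (LinearRegression vs. KNeighborsRegressor) are illustrative, not prescriptive.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=200)  # a non-linear relationship

# Parametric: learns a fixed, pre-determined set of parameters (w0, w1) and nothing more
linear = LinearRegression().fit(X, y)

# Non-parametric: keeps the training data around and adapts to its shape
knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)

print("linear R^2:", linear.score(X, y))  # low - the straight line underfits the sine curve
print("k-NN   R^2:", knn.score(X, y))     # much higher - flexible fit, but needs enough data
```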
Linear Regression
- Linear relationship b/w features and targets (the inputs and the output we are looking to predict), defined by a set of coefficients
- Simple, easy to interpret and see the relationship between inputs and outputs
- Yet it is the basis for many more complex models
- Example: Home Price predicted from # of Bedrooms
- y = W0 + W1*X; W0 is the bias; W1 is the coefficient/weight
- Multiple Linear regression
- y = W0 + W1*X1 + W2*X2 +…+Wp*Xp
- Additional W's for additional features - ex: house size, location, etc.
- Total Error of a Model
- SSE - Cost function
- Sum of Squared Errors - for each sample, take (prediction minus actual) squared, then sum over all samples (see the sketch after this list)
- The goal is to minimize it
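A minimal NumPy sketch of the multiple linear regression equation and the SSE cost; the weights and the bedroom/size data here are made up purely for illustration.

```python
import numpy as np

X = np.array([[3, 1500], [4, 2000], [2, 900]], dtype=float)  # features: [# of bedrooms, house size]
y = np.array([300_000, 420_000, 180_000], dtype=float)       # actual home prices

w0 = 50_000.0                    # bias (W0)
w = np.array([40_000.0, 90.0])   # one coefficient per feature (W1, W2)

predictions = w0 + X @ w              # y_hat = W0 + W1*X1 + W2*X2
sse = np.sum((predictions - y) ** 2)  # Sum of Squared Errors: sum of (prediction - actual)^2
print(predictions, sse)
```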
Polynomial Regression
- Non-linear relationships can be modeled as well
- Ex: use a non-linear function to create a feature, e.g. x² or log(x); then use it as an input to a linear model
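A minimal sketch of that idea, assuming scikit-learn: engineer non-linear features (here x² and log(x), on synthetic data) and fit an ordinary linear model on top of them.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
x = rng.uniform(1, 10, size=100)
y = 2 * x**2 + np.log(x) + rng.normal(0, 1, size=100)  # synthetic non-linear target

# Engineered features: the model is still linear in its coefficients
X_poly = np.column_stack([x, x**2, np.log(x)])

model = LinearRegression().fit(X_poly, y)
print(model.intercept_, model.coef_)  # should roughly recover the weights used above
```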
Regularization
- Used when a complex model does not predict well on new data
- Add a penalty factor to the cost function to penalize model complexity
- This reduces the complexity of the model and decreases the probability of overfitting, thus helping the model's performance
- I.e. there is a coefficient per feature - the more features there are, the more coefficients are added to the penalty, which weighs on the overall regression cost
- Use another variable, lambda, to multiply the sum of the coefficients - this controls the severity of the penalty
- Penalty factor
- LASSO regression
- Penalty factor is the sum of the absolute value of the coefficients, multiplied by lambda
- Reduces the coefficients of irrelevant features to 0
- Suitable when a simpler and more interpretable model is desired
- Ridge regression
- Penalty factor is the sum of the coefficients squared, multiplied by lambda
- Reduces the coefficients close to 0 but does not completely remove them
- Suitable when the target has a complex relationship to many features with collinearity/correlation among them (both are sketched below)
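A minimal sketch comparing the two, assuming scikit-learn; alpha plays the role of lambda here, and its value, like the synthetic data, is chosen arbitrarily for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.5, size=200)  # features 3-5 are irrelevant noise

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)

print("LASSO coefficients:", lasso.coef_)  # irrelevant coefficients driven to exactly 0
print("Ridge coefficients:", ridge.coef_)  # all shrunk toward 0 but kept non-zero
```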
Logistic Regression
- Mainly used for classification tasks
- Linear regression is not the best solution when an outcome of 0 or 1 is required
- Linear regression produces a continuous prediction value, not a 1 or 0
- Solution option - predict the probability p(y=1) instead
- Use the logistic/sigmoid function - its input is the output of the linear model
- The output of the sigmoid is between 0 and 1; this can then be interpreted as the probability of belonging to class 1 (see the first sketch after this list)
- Gradient descent - "an iterative first-order optimization algorithm, used to find a local minimum/maximum of a given function."
- Use gradient descent to determine the values of the weights that minimize the cost
- Update the values of the weights by a small amount in each step to move closer to the weights which minimize our loss/cost function
- Need: a variable to use in every step called the learning rate; the gradient of the cost/loss function itself; the weight values from the previous iteration
- To visualize in a graph - iterate against the gradient slowly until reaching the bottom (i.e. the minimum cost point); a minimal loop is sketched after this list
- What if there are more than two classes - predicting multiple classes?
- Use the softmax function instead of sigmoid - it provides the probability of belonging to each class, normalized to sum to 1
- Then take the class with the highest probability value as our prediction (see the final sketch after this list)
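First, a minimal sketch of how the sigmoid turns the linear model's output into a probability; the bias and weights below are hypothetical, not learned.

```python
import numpy as np

def sigmoid(z):
    """Squash any real-valued score into the (0, 1) range."""
    return 1.0 / (1.0 + np.exp(-z))

w0, w = -1.0, np.array([0.8, -0.5])  # hypothetical bias and weights
x = np.array([2.0, 1.0])             # one input sample

z = w0 + x @ w              # output of the linear model
p = sigmoid(z)              # interpreted as p(y = 1)
prediction = int(p >= 0.5)  # classify with a 0.5 threshold
print(z, p, prediction)
```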
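Next, a minimal gradient descent loop for the logistic regression loss; the learning rate, iteration count, and synthetic data are arbitrary illustration values, not recommendations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # synthetic 0/1 labels

w = np.zeros(2)       # weight values carried over from the previous iteration (start at 0)
b = 0.0               # bias
learning_rate = 0.1   # how big a step to take against the gradient

for _ in range(1000):
    p = sigmoid(X @ w + b)           # current predicted probabilities
    grad_w = X.T @ (p - y) / len(y)  # gradient of the log-loss w.r.t. the weights
    grad_b = np.mean(p - y)          # gradient w.r.t. the bias
    w -= learning_rate * grad_w      # small step toward lower cost
    b -= learning_rate * grad_b

print(w, b)
```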
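Finally, a minimal softmax sketch for the multi-class case; the class scores here are made up, where normally they would come from one linear output per class.

```python
import numpy as np

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    exp = np.exp(scores - np.max(scores))  # subtract the max for numerical stability
    return exp / exp.sum()

scores = np.array([2.0, 0.5, -1.0])  # hypothetical linear outputs for classes 0, 1, 2
probs = softmax(scores)
prediction = int(np.argmax(probs))   # pick the class with the highest probability
print(probs, probs.sum(), prediction)
```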