Monday, November 6, 2023

Linear Regression and Regularization

Types of Algorithms

  • Parametric algorithm (Linear model)
    • Based on a mathematical model that defines the relationship between inputs and outputs
    • Fixed and pre-determined set of parameters and coefficients
    • Can learn quickly
    • Can be too simple for real world - prone to underfitting
  • Non-Parametric algorithm
    • Does not make assumptions about relationship b/w input and output ahead of building a model
    • Flexible and adapts well to non-linear data relationships
    • Often performs better than a linear model when the true relationship is non-linear
    • Requires more data to train; prone to overfitting

Linear Regression

  • Linear relations b/w features and targets (the inputs and the output we are looking to predict), defined by a set of coefficients
  • Simple, easy to interpret and see the relationship between inputs and outputs
  • Yet is a basis for many more complex models
  • Example: Home Price as a function of # of Bedrooms
  • Y = W0 + W1*X; W0 is Bias;  W1 is Coefficient/weight
  • Multiple Linear regression
    • y = W0 + W1*X1 + W2*X2 +…+Wp*Xp
    • Additional features, each with its own weight W - ex: House Size, Location, etc.
  • Total Error of a Model
    • SSE - Cost function
      • Sum of Squared Errors - the sum over all samples of (prediction minus actual) squared
    • The goal is to minimize it (see the sketch below)
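
A minimal NumPy sketch of the ideas above - the multiple linear regression prediction y = W0 + W1*X1 + ... + Wp*Xp and the SSE cost; the feature values and weights are made-up illustration numbers, not from the notes.

    import numpy as np

    # Made-up example: 4 houses, 2 features (# of bedrooms, size in 1000 sqft)
    X = np.array([[2, 1.0],
                  [3, 1.5],
                  [3, 2.0],
                  [4, 2.5]])
    y = np.array([200.0, 270.0, 300.0, 360.0])   # actual prices, in $1000s

    W0 = 50.0                      # bias
    W = np.array([30.0, 60.0])     # one coefficient/weight per feature

    y_hat = W0 + X @ W             # y = W0 + W1*X1 + W2*X2 for every row

    # SSE cost: sum over all samples of (prediction minus actual) squared
    sse = np.sum((y_hat - y) ** 2)
    print(y_hat, sse)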

Polynomial Regression

  • Non-linear relationships can be modeled as well
  • Ex: use a non-linear function to create a feature, such as x² or log(x); then use it as an input (see the sketch below)
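
A small sketch, assuming NumPy: engineer non-linear features (x squared, log(x)) from an original feature and feed them to an otherwise ordinary linear model; the values are made up for illustration.

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])              # original feature
    X_poly = np.column_stack([x, x**2, np.log(x)])  # engineered non-linear features

    # X_poly can now be used as the input matrix of a plain linear regression;
    # the model is still linear in the weights even though the features are non-linear in x
    print(X_poly)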

Regularization

  • When a complex model does not predict well on new data
    • Add a penalty factor to the cost function to penalize feature complexity
    • This reduces the complexity of the model and decreases the probability of overfitting, which helps model performance on new data
    • I.e. there is a coefficient per feature - the more features there are, the more coefficients are added to the penalty term, which weighs on the overall regression cost
    • Multiply the sum of the coefficients by another variable, lambda - this controls the severity of the penalty
    • Penalty factor
      • LASSO regression
        • Penalty factor is the sum of the absolute value of the coefficients, multiplied by lambda
        • Reduces the coefficients of irrelevant features to 0
        • Suitable when a simpler and more interpretable model is desired
      • Ridge regression
        • Penalty factor is the sum of the coefficients squared, multiplied by lambda
        • Reduces the coefficients close to 0 but does not completely remove them
        • Suitable when the target has a complex relationship with many features that exhibit collinearity/correlation (see the sketch below)
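
A minimal scikit-learn sketch comparing the two penalties (scikit-learn calls the lambda from these notes alpha); the data is synthetic, generated just for illustration.

    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))     # 5 features, but only the first 2 matter
    y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

    lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: lambda * sum of |coefficients|
    ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: lambda * sum of coefficients squared

    print(lasso.coef_)   # coefficients of irrelevant features are typically driven to exactly 0
    print(ridge.coef_)   # coefficients shrink toward 0 but usually stay non-zero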

Logistic Regression

  • Mainly used for classification tasks
  • Linear regression is not the best solution when an outcome of 0 or 1 is required
  • Linear regression produces an actual prediction value, not 1 or 0
  • Solution option - predict probability of p(y=1)
  • Use the logistic/sigmoid function - the input for this is the output of the linear model
  • The output of the sigmoid would be between 0 and 1; this can then be interpreted as the probability of either 0 or 1
  • Gradient descent - "an iterative first-order optimization algorithm, used to find a local minimum/maximum of a given function."
  • Use gradient descent to determine the values of the weights that minimize the cost
    • Update the values of the weights by a small amount to move closer to the weights that minimize our loss/cost function
    • Need: a variable used in every step called the learning rate; the gradient of the cost/loss function itself; and the weight values from the previous iteration
  • To visualize on a graph - step slowly against the gradient until reaching the bottom (i.e. the minimum cost point)
  • What if there are more than two classes - predicting multiple classes?
    • Use softmax function instead of sigmoid - it provides the probability of belonging to each class normalized to sum of 1
    • Then take the class with the highest probability value as our prediction (see the sketch below)
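
A minimal NumPy sketch of the ideas above: the sigmoid applied to the linear model's output, a basic gradient-descent weight update for the log-loss, and a softmax for the multi-class case; the data, learning rate, and iteration count are made-up illustration values.

    import numpy as np

    def sigmoid(z):
        # squashes the linear model's output into (0, 1), read as p(y=1)
        return 1.0 / (1.0 + np.exp(-z))

    def softmax(z):
        # multi-class case: probabilities over all classes, normalized to sum to 1
        e = np.exp(z - np.max(z))
        return e / e.sum()

    # Made-up binary classification data: 4 samples, 2 features
    X = np.array([[0.5, 1.2], [1.0, 0.3], [2.0, 2.5], [3.0, 1.8]])
    y = np.array([0, 0, 1, 1])

    w = np.zeros(2)          # weight values carried over from the previous iteration
    b = 0.0                  # bias
    learning_rate = 0.1      # the small step size used in every update

    for _ in range(1000):
        p = sigmoid(X @ w + b)              # predicted p(y=1)
        grad_w = X.T @ (p - y) / len(y)     # gradient of the log-loss w.r.t. the weights
        grad_b = np.mean(p - y)             # gradient w.r.t. the bias
        w -= learning_rate * grad_w         # step against the gradient
        b -= learning_rate * grad_b

    print(sigmoid(X @ w + b))                  # probabilities of class 1 for each sample
    print(softmax(np.array([2.0, 1.0, 0.1])))  # pick the class with the highest probability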

