Types of Algorithms
- Parametric algorithm (Linear model)
- Based on a mathematical model that defines the relationship between inputs and outputs
- Fixed and pre-determined set of parameters and coefficients
- Can learn quickly
- Can be too simple for the real world - prone to underfitting
- Non-Parametric algorithm
- Does not make assumptions about the relationship b/w input and output ahead of building the model
- Flexible and adapts well to non-linear data relationships
- Often performs better than a linear model when the true relationship is non-linear (contrasted in the sketch after this list)
- Requires more data to train; prone to overfitting
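Below is a minimal sketch contrasting the two families on the same data, assuming scikit-learn and NumPy are available; the synthetic dataset and the model choices (LinearRegression vs. KNeighborsRegressor) are illustrative, not prescriptive.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=200)  # a non-linear relationship

# Parametric: learns a fixed, pre-determined set of parameters (w0, w1) and nothing more
linear = LinearRegression().fit(X, y)

# Non-parametric: keeps the training data around and adapts to its shape
knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)

print("linear R^2:", linear.score(X, y))  # low - the straight line underfits the sine curve
print("k-NN   R^2:", knn.score(X, y))     # much higher - flexible fit, but needs enough data
```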
Linear Regression
- Linear relationship b/w features and targets (the inputs and the output we are looking to predict), defined by a set of coefficients
- Simple, easy to interpret and see the relationship between inputs and outputs
- Yet it is the basis for many more complex models
- Example: Home Price predicted from # of Bedrooms
- y = W0 + W1*X; W0 is the bias; W1 is the coefficient/weight
- Multiple Linear regression
- y = W0 + W1*X1 + W2*X2 +…+Wp*Xp
- Additional W's for additional features - ex: house size, location, etc.
- Total Error of a Model
- SSE - Cost function
- Sum of Squared Errors - for each sample, take (prediction minus actual) squared, then sum over all samples (see the sketch after this list)
- The goal is to minimize it
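A minimal NumPy sketch of the multiple linear regression equation and the SSE cost; the weights and the bedroom/size data here are made up purely for illustration.

```python
import numpy as np

X = np.array([[3, 1500], [4, 2000], [2, 900]], dtype=float)  # features: [# of bedrooms, house size]
y = np.array([300_000, 420_000, 180_000], dtype=float)       # actual home prices

w0 = 50_000.0                    # bias (W0)
w = np.array([40_000.0, 90.0])   # one coefficient per feature (W1, W2)

predictions = w0 + X @ w              # y_hat = W0 + W1*X1 + W2*X2
sse = np.sum((predictions - y) ** 2)  # Sum of Squared Errors: sum of (prediction - actual)^2
print(predictions, sse)
```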
Polynomial Regression
- Non-linear relationships can be modeled as well
- Ex: use a non-linear function to create a feature, e.g. x² or log(x); then use it as an input to a linear model
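A minimal sketch of that idea, assuming scikit-learn: engineer non-linear features (here x² and log(x), on synthetic data) and fit an ordinary linear model on top of them.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
x = rng.uniform(1, 10, size=100)
y = 2 * x**2 + np.log(x) + rng.normal(0, 1, size=100)  # synthetic non-linear target

# Engineered features: the model is still linear in its coefficients
X_poly = np.column_stack([x, x**2, np.log(x)])

model = LinearRegression().fit(X_poly, y)
print(model.intercept_, model.coef_)  # should roughly recover the weights used above
```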
Regularization
- Used when a complex model does not predict well on new data
- Add a penalty factor to the cost function to penalize model complexity
- This reduces the complexity of the model and decreases the probability of overfitting, thus helping the model's performance
- I.e. there is a coefficient per feature - the more features there are, the more coefficients are added to the penalty, which weighs on the overall regression cost
- Use another variable, lambda, to multiply the sum of the coefficients - this controls the severity of the penalty
- Penalty factor
- LASSO regression
- Penalty factor is the sum of the absolute value of the coefficients, multiplied by lambda
- Reduces the coefficients of irrelevant features to 0
- Suitable when a simpler and more interpretable model is desired
- Ridge regression
- Penalty factor is the sum of the coefficients squared, multiplied by lambda
- Reduces the coefficients close to 0 but does not completely remove them
- Suitable when the target has a complex relationship to many features with collinearity/correlation among them (both are sketched below)
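A minimal sketch comparing the two, assuming scikit-learn; alpha plays the role of lambda here, and its value, like the synthetic data, is chosen arbitrarily for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.5, size=200)  # features 3-5 are irrelevant noise

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)

print("LASSO coefficients:", lasso.coef_)  # irrelevant coefficients driven to exactly 0
print("Ridge coefficients:", ridge.coef_)  # all shrunk toward 0 but kept non-zero
```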
Logistic Regression
- Mainly used for classification tasks
- Linear regression is not the best solution when an outcome of 0 or 1 is required
- Linear regression produces a continuous prediction value, not a 1 or 0
- Solution option - predict the probability p(y=1) instead
- Use the logistic/sigmoid function - its input is the output of the linear model
- The output of the sigmoid is between 0 and 1; this can then be interpreted as the probability of belonging to class 1 (see the first sketch after this list)
- Gradient descent - "an iterative first-order optimization algorithm, used to find a local minimum/maximum of a given function."
- Use gradient descent to determine the values of the weights that minimize the cost
- Update the values of the weights by a small amount in each step to move closer to the weights which minimize our loss/cost function
- Need: a variable to use in every step called the learning rate; the gradient of the cost/loss function itself; the weight values from the previous iteration
- To visualize in a graph - iterate against the gradient slowly until reaching the bottom (i.e. the minimum cost point); a minimal loop is sketched after this list
- What if there are more than two classes - predicting multiple classes?
- Use the softmax function instead of sigmoid - it provides the probability of belonging to each class, normalized to sum to 1
- Then take the class with the highest probability value as our prediction (see the final sketch after this list)
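First, a minimal sketch of how the sigmoid turns the linear model's output into a probability; the bias and weights below are hypothetical, not learned.

```python
import numpy as np

def sigmoid(z):
    """Squash any real-valued score into the (0, 1) range."""
    return 1.0 / (1.0 + np.exp(-z))

w0, w = -1.0, np.array([0.8, -0.5])  # hypothetical bias and weights
x = np.array([2.0, 1.0])             # one input sample

z = w0 + x @ w              # output of the linear model
p = sigmoid(z)              # interpreted as p(y = 1)
prediction = int(p >= 0.5)  # classify with a 0.5 threshold
print(z, p, prediction)
```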
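Next, a minimal gradient descent loop for the logistic regression loss; the learning rate, iteration count, and synthetic data are arbitrary illustration values, not recommendations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # synthetic 0/1 labels

w = np.zeros(2)       # weight values carried over from the previous iteration (start at 0)
b = 0.0               # bias
learning_rate = 0.1   # how big a step to take against the gradient

for _ in range(1000):
    p = sigmoid(X @ w + b)           # current predicted probabilities
    grad_w = X.T @ (p - y) / len(y)  # gradient of the log-loss w.r.t. the weights
    grad_b = np.mean(p - y)          # gradient w.r.t. the bias
    w -= learning_rate * grad_w      # small step toward lower cost
    b -= learning_rate * grad_b

print(w, b)
```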
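Finally, a minimal softmax sketch for the multi-class case; the class scores here are made up, where normally they would come from one linear output per class.

```python
import numpy as np

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    exp = np.exp(scores - np.max(scores))  # subtract the max for numerical stability
    return exp / exp.sum()

scores = np.array([2.0, 0.5, -1.0])  # hypothetical linear outputs for classes 0, 1, 2
probs = softmax(scores)
prediction = int(np.argmax(probs))   # pick the class with the highest probability
print(probs, probs.sum(), prediction)
```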