Testing a Model
- Divide the data into:
 - Training Set - used to build / train the model
 - Test Set - used to evaluate model performance; data held out separately from the Training Set
 - Be careful with data leakage - test data getting into the model-building process
 - Leakage makes the performance estimate overly optimistic compared to performance on genuinely new data
 - Further split the Training Set into:
 - Training Set
 - Validation Set - used for tuning / improving the model and monitoring its performance, without touching the Test Set
 - Only then evaluate on the Test Set - an unbiased check on a separate set of data (see the split / cross-validation sketch after this list)
 - K-Folds Cross Validation
 - Another test strategy
 - Split the data into K subsets ("folds")
 - Run multiple training iterations, each time holding out a different fold for validation
 - Calculate the error as the average error across the K validation runs
 - This approach is an industry standard because it:
 - Maximizes the use of training data
 - Provides better insight into performance (illustrated in the sketch below)
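Below is a minimal sketch of both strategies using scikit-learn. The synthetic regression dataset, the RandomForestRegressor model, the 80/20 and 75/25 split ratios, and K = 5 are illustrative assumptions, not details from these notes.

```python
# Minimal sketch: hold-out split plus K-fold cross validation with scikit-learn.
# Dataset, model choice, and split ratios are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, train_test_split
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=500, n_features=10, noise=0.3, random_state=42)

# 1) Hold out a Test Set first so it never leaks into model building.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 2) Split the remaining data into Training and Validation sets for tuning.
X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)

model = RandomForestRegressor(random_state=42)
model.fit(X_tr, y_tr)
print("Validation MSE:", mean_squared_error(y_val, model.predict(X_val)))

# 3) K-Fold cross validation on the training data: train on K-1 folds,
#    validate on the held-out fold, and average the error across folds.
kf = KFold(n_splits=5, shuffle=True, random_state=42)
fold_errors = []
for train_idx, val_idx in kf.split(X_train):
    m = RandomForestRegressor(random_state=42)
    m.fit(X_train[train_idx], y_train[train_idx])
    preds = m.predict(X_train[val_idx])
    fold_errors.append(mean_squared_error(y_train[val_idx], preds))
print("Average CV MSE:", np.mean(fold_errors))

# 4) Only at the very end, report the unbiased estimate on the Test Set.
print("Test MSE:", mean_squared_error(y_test, model.predict(X_test)))
```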
 
Model Evaluation
- Metrics used
 - Outcome - business impact, usually $
 - Output - the model's direct output; usually remains internal, not customer-facing
 - Regression Metrics:
 - MSE - Mean Squared Error
 - Most commonly used
 - Influenced by large errors (these are penalized heavily) and by the scale of the data
 - This is the metric to use if concerned with minimizing large errors on outlier data points
 - MAE - Mean Absolute Error
 - Influenced by scale
 - Can be easier to interpret
 - Doesn’t penalize outlier errors as heavily as MSE
 - MAPE - Mean Absolute Percent Error
 - Expresses the error as a percentage of the actual value
 - More easily understood by non-technical audiences
 - An error can be small in absolute terms, yet large as a percentage when the actual target value is small
 - Coefficient of Determination (R-squared)
 - Definition - the proportion of the variation in the dependent variable that is predictable from the independent variable(s)
 - Displays how much of the variability in the target variable is explained by the model
 - Total sum of squares (SST, squared deviation from the mean) = sum of squares due to regression (SSR) + sum of squared errors (SSE)
 - R-squared = SSR/SST = 1 - SSE/SST
 - Usually between 0 and 1; the closer it is to 1, the more of the variability in the target variable the model explains (see the metrics sketch after this list)
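A minimal sketch of the regression metrics above, computed directly with NumPy; the y_true / y_pred arrays are made-up illustrative values, not data from these notes.

```python
# Minimal sketch of MSE, MAE, MAPE, and R-squared on illustrative arrays.
import numpy as np

y_true = np.array([10.0, 12.0, 15.0, 20.0, 30.0])   # illustrative actual values
y_pred = np.array([11.0, 11.5, 14.0, 24.0, 28.0])   # illustrative predictions

errors = y_true - y_pred

mse = np.mean(errors ** 2)                      # penalizes large errors heavily
mae = np.mean(np.abs(errors))                   # same units as the target
mape = np.mean(np.abs(errors / y_true)) * 100   # percent error; assumes no zero targets

# R-squared: SST = SSR + SSE, so R^2 = 1 - SSE/SST
sse = np.sum(errors ** 2)
sst = np.sum((y_true - np.mean(y_true)) ** 2)
r2 = 1 - sse / sst

print(f"MSE={mse:.3f}  MAE={mae:.3f}  MAPE={mape:.2f}%  R^2={r2:.3f}")
```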
 
Accuracy
- When relying on accuracy, be careful with class imbalance in the input data (e.g., how many days a year is a healthy person actually sick?)
 - Confusion Matrix - True/False Positives vs True/False Negatives
 - FPR - False Positive Rate, i.e. False Positives / (False Positives + True Negatives)
 - Precision - True Positives / (True Positives + False Positives)
 - Recall - True Positive Rate (TPR), i.e. True Positives / (True Positives + False Negatives)
 - Receiver Operating Characteristic (ROC) Curves
 - FPR and TPR across different threshold values for the given model
 - I.e. set a threshold - a prediction below the threshold is a Negative, above - Positive
 - Typical threshold is 0.5
 - Use the Positives/Negatives to generate TPR/FPR by comparing these to the actual targets
 - Re-run this across different thresholds and plot TPR (y-axis) vs FPR (x-axis) on a graph - the ROC Curve
 - Area Under ROC (AUROC)
 - Area under ROC Curve
 - Higher AUROC - better quality model
 - Precision Recall Curve (PR)
 - Measures Precision vs Recall values across multiple thresholds
 - Used in situations of high class imbalance (many 0's, few 1's)
 - Does not factor in True Negatives (see the classification-metrics sketch after this list)
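A minimal sketch of the classification metrics above with scikit-learn; the imbalanced synthetic dataset (about 95% negatives), the logistic-regression model, and the 0.5 threshold are illustrative assumptions rather than details from these notes.

```python
# Minimal sketch: confusion matrix, precision/recall, ROC/AUROC, and PR curve.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import (confusion_matrix, precision_score, recall_score,
                             roc_curve, roc_auc_score, precision_recall_curve)

# Imbalanced data: roughly 95% negatives, 5% positives.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]      # predicted probability of the positive class
labels = (scores >= 0.5).astype(int)          # typical 0.5 threshold

# Confusion matrix: rows = actual, columns = predicted -> [[TN, FP], [FN, TP]].
print(confusion_matrix(y_test, labels))
print("Precision:", precision_score(y_test, labels))
print("Recall (TPR):", recall_score(y_test, labels))

# ROC curve: TPR vs FPR across many thresholds; AUROC summarizes it in one number.
fpr, tpr, roc_thresholds = roc_curve(y_test, scores)
print("AUROC:", roc_auc_score(y_test, scores))

# Precision-Recall curve: preferred under heavy class imbalance,
# since it does not factor in True Negatives.
precision, recall, pr_thresholds = precision_recall_curve(y_test, scores)
```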
 
Errors
- Common error causes:
 - Problem framing and metric selection
 - Data quality
 - Feature selection
 - Model fit
 - Inherent error
 