Sunday, November 5, 2023

Testing and Evaluation of a Model

Testing a Model

  • Divide data into
    • Training Set - build / train the model
    • Test Set - model performance evaluation set; data different from Training set
  • Need to be careful with data leakage - test data getting into model building process
  • This makes the performance estimate overly optimistic compared to performance on genuinely new data
    • Further split Training set into:
      • Training Set
      • Validation Set - used for tuning and improving the model and for monitoring performance during training
    • And only then test using the Test set - as an unbiased exercise on a separate set of data (see the sketch after this list)
  • K-Folds Cross Validation
    • Another test strategy
    • Split the data into subsets ("folds")
    • Run multiple iterations of the model, each time training on all folds but one and evaluating on the held-out fold
    • Calculate the Error as the Average Error across the K validation runs
    • This approach is an industry standard as it:
      • Maximizes the use of training data
      • Provides better insight into performance
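
A minimal sketch of both strategies, assuming scikit-learn; the data, the Ridge model, and the split sizes are illustrative placeholders, not from these notes:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Placeholder data: 1000 rows, 5 features, a noisy linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ rng.normal(size=5) + rng.normal(size=1000)

# 1) Hold out the Test set first, so it never leaks into model building.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# 2) Split the remainder into Training and Validation sets; tune on the
#    Validation set and touch the Test set only once at the very end.
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)

model = Ridge(alpha=1.0).fit(X_train, y_train)
print("Validation R^2:", model.score(X_val, y_val))
print("Test R^2 (unbiased, used once):", model.score(X_test, y_test))

# 3) K-fold cross-validation: K runs, each holding out a different fold,
#    with the error averaged across the K runs.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
mse_per_fold = -cross_val_score(Ridge(alpha=1.0), X_trainval, y_trainval,
                                cv=kfold, scoring="neg_mean_squared_error")
print("Average MSE across 5 folds:", mse_per_fold.mean())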

Model Evaluation

  • Metrics used
    • Outcome metrics - measure business impact, usually in $
    • Output metrics - measure the quality of the model's output; usually remain internal, not customer-facing
  • Regression Metrics:
    • MSE - Mean Squared Error
      • Most commonly used
      • Influenced by large errors (these are penalized heavily) and by the scale of the data
      • This is the metric to use if concerned with minimizing large errors on outlier datapoints
    • MAE - Mean Absolute Error
      • Influenced by scale
      • Can be easier to interpret
      • Doesn’t penalize outliers as heavily as MSE
    • MAPE - Mean Absolute Percent Error
      • Converts an error to %
      • Easier for non-technical stakeholders to understand
      • An error could be small - but when compared to the value of the target, the % could be large
    • Coefficient of Determination (R-squared)
      • Definition - the proportion of the variation in the dependent variable that is predictable from the independent variable(s)
      • Displays how much of the variability in the target variable is explained by the model
      • Total sum of squares (SST) = Regression sum of squares (SSR) + Error sum of squares (SSE)
      • R-squared = SSR/SST = 1 - SSE/SST
      • Usually between 0 and 1; the closer it is to 1, the more of the variability in the target variable the model explains (see the metrics sketch after this list)
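
A minimal sketch of these regression metrics, assuming a recent scikit-learn (mean_absolute_percentage_error was added in version 0.24); y_true and y_pred are small made-up arrays used purely for illustration:

import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_absolute_percentage_error,
                             mean_squared_error, r2_score)

# Made-up targets and predictions, purely for illustration.
y_true = np.array([100.0, 150.0, 200.0, 250.0])
y_pred = np.array([110.0, 140.0, 190.0, 300.0])

print("MSE :", mean_squared_error(y_true, y_pred))              # penalizes large errors heavily
print("MAE :", mean_absolute_error(y_true, y_pred))             # same units as the target
print("MAPE:", mean_absolute_percentage_error(y_true, y_pred))  # scale-free, expressed as a fraction
print("R^2 :", r2_score(y_true, y_pred))                        # 1 - SSE/SST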

Accuracy

  • To get a meaningful accuracy figure, be careful with class imbalance in the input data (ex: how many days a year is a healthy person sick?) - see the classification metrics sketch after this list
  • Confusion Matrix - True/False Positives vs True/False Negatives
    • FPR - False Positive Rate, i.e. False Positives / (False Positives + True Negatives)
    • TPR - True Positive Rate (Recall), i.e. True Positives / (True Positives + False Negatives)
    • Precision - True Positives / (True Positives + False Positives)
  • Receiver Operating Characteristic (ROC) Curves
    • FPR and TPR across different threshold values for the given model
    • I.e. set a threshold - a prediction below the threshold is a Negative, above - Positive
    • Typical threshold is 0.5
    • Use the Positives/Negatives to generate TPR/FPR by comparing these to the actual targets
    • Re-run this across different thresholds and plot TPR (y-axis) vs FPR (x-axis) - the ROC Curve
  • Areas Under ROC (AUROC)
    • Area under ROC Curve
    • Higher AUROC - better quality model
  • Precision Recall Curve (PR)
    • Measures Precision vs Recall values across multiple thresholds
    • Used in situations of high class imbalance (many 0's, few 1's)
    • Does not factor True Negatives
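
A minimal sketch of these classification metrics, assuming scikit-learn; the labels and predicted scores are made-up and deliberately imbalanced:

import numpy as np
from sklearn.metrics import (auc, confusion_matrix, precision_recall_curve,
                             roc_auc_score, roc_curve)

# Made-up true labels (few 1's) and predicted probabilities.
y_true  = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 0])
y_score = np.array([0.1, 0.3, 0.2, 0.6, 0.4, 0.05, 0.8, 0.7, 0.55, 0.9])

# Apply a single threshold (0.5 here) to get hard predictions,
# then count TP/FP/TN/FN via the confusion matrix.
y_pred = (y_score >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TPR (recall):", tp / (tp + fn))   # TP / (TP + FN)
print("FPR         :", fp / (fp + tn))   # FP / (FP + TN)
print("Precision   :", tp / (tp + fp))   # TP / (TP + FP)

# ROC curve: TPR vs FPR across all thresholds, summarized by AUROC.
fpr, tpr, roc_thresholds = roc_curve(y_true, y_score)
print("AUROC       :", roc_auc_score(y_true, y_score))

# Precision-Recall curve: better suited to heavy class imbalance,
# since it does not factor in True Negatives.
precision, recall, pr_thresholds = precision_recall_curve(y_true, y_score)
print("PR AUC      :", auc(recall, precision))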

Errors

  • Common error causes:
    • Problem framing and metric selection
    • Data quality
    • Feature selection
    • Model fit
    • Inherent error
