Wednesday, November 8, 2023

Deep Learning

Deep learning

  • "A subfield of machine learning that focuses on artificial neural networks and deep neural networks."
  • Neural Network - "computational model that is inspired by the structure and functioning of the human brain" (ChatGPT definition)
  • A neural network with many layers = a deep neural network
  • Pre-history:
    • 1943, Warren McCulloch and Walter Pitts - publish a model for how neurons work together to perform computations
    • The 2000s saw a boom in deep learning as more computing power and data became available
  • Deep learning shines when
    • A large amount of data is available
    • There is a large number of features (e.g., unstructured data)
    • The relationships between the features and the target are complex
    • Explainability is not highly required (black box!)

Artificial Neurons

  • Perceptron
    • A simple model - a set of inputs is multiplied by a set of weights and summed
    • The result is passed through a threshold function
    • If the result is >= the threshold, the neuron is activated
    • The output is a 0/1 prediction; no probability values for the classes are calculated
    • Used for binary classification tasks; in reality it is a simple linear model (see the sketch after this list)
  • Logistic Regression
    • Same as above, but the weighted sum of the inputs is passed through an Activation (sigmoid) Function before going into the Threshold Function
    • The model output is a probability from 0 to 1; it is then converted to a 0 or 1 prediction
    • If the probability is >= 0.5, the prediction is 1; otherwise 0
    • Goal - find the weight values that minimize the cost (error); see the second sketch after this list
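
A minimal sketch of the perceptron described above, assuming NumPy; the weight, bias, and input values are illustrative, not learned parameters:

    import numpy as np

    def perceptron_predict(x, w, b, threshold=0.0):
        # Weighted sum of the inputs, then a hard threshold (step) function
        z = np.dot(w, x) + b
        return 1 if z >= threshold else 0  # 0/1 output, no probability

    w = np.array([0.6, -0.4])  # hypothetical weights
    b = 0.1
    print(perceptron_predict(np.array([1.0, 2.0]), w, b))  # prints 0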
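
The logistic neuron differs only in passing the weighted sum through a sigmoid before thresholding; same illustrative weights as above:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def logistic_predict(x, w, b):
        # Weighted sum -> sigmoid activation -> probability in (0, 1)
        p = sigmoid(np.dot(w, x) + b)
        return p, (1 if p >= 0.5 else 0)  # probability, then 0/1 prediction

    w = np.array([0.6, -0.4])  # hypothetical weights
    b = 0.1
    prob, label = logistic_predict(np.array([1.0, 2.0]), w, b)
    print(round(prob, 3), label)  # ~0.475 -> 0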

Training a Neuron

  • Forward Propagation - the inputs are passed through the model using the current set of weights, and a prediction is calculated
  • Calculate the cost (error) and the gradient of that cost with respect to each weight; go through each weight and update its value using gradient descent
  • Eventually the weight values that result in minimal cost are found - these weights are used in the final model
  • Need to focus on the Learning rate - how big a step is taken in each gradient descent update (see the training-loop sketch after this list)
    • If too small - the algorithm converges too slowly
    • If too large - the updates may overshoot the minimum and jump all over the place
  • Gradient descent methods (compared in the mini-batch sketch after this list)
    • Stochastic (SGD)
      • Iterate through the observations one at a time; calculate the gradient and update the weights after each observation
      • Pros - works well for large data sets and for online learning
      • Cons - cannot use vectorized linear algebra operations
    • Batch
      • Use the entire data set in each update; calculate the gradient and update the weights based on all observations in each iteration
      • Pros - can use vectorized matrix operations, and use them efficiently
      • Cons - may be impossible to achieve on large data sets due to the compute power required
    • Mini-batch
      • Divide the data set into smaller batches; perform batch gradient descent on each
      • Pros - works well for large data sets, uses vectorized operations
      • Cons - not as good as SGD for online learning (when observations come in one at a time)
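
A sketch of the forward propagation / gradient descent loop for a single logistic neuron, assuming NumPy, log-loss as the cost, and a synthetic data set made up for illustration:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_neuron(X, y, lr=0.1, epochs=100):
        # Batch gradient descent on one logistic neuron with log-loss
        n, d = X.shape
        w, b = np.zeros(d), 0.0
        for _ in range(epochs):
            p = sigmoid(X @ w + b)        # forward propagation: predictions
            error = p - y                 # d(log-loss)/dz for the sigmoid
            w -= lr * (X.T @ error) / n   # step size scaled by the learning rate
            b -= lr * error.mean()
        return w, b

    rng = np.random.default_rng(0)        # synthetic, illustrative data
    X = rng.normal(size=(100, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(float)
    print(train_neuron(X, y))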
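
And a mini-batch sketch showing how the three methods differ only in how many observations feed each update: batch_size=1 reduces to SGD, batch_size=len(X) is full-batch descent (function and parameter names here are my own, not a standard API):

    import numpy as np

    def minibatch_gd(X, y, lr=0.1, epochs=50, batch_size=16):
        # Shuffle each epoch, then update the weights once per mini-batch
        n, d = X.shape
        w, b = np.zeros(d), 0.0
        rng = np.random.default_rng(0)
        for _ in range(epochs):
            idx = rng.permutation(n)
            for start in range(0, n, batch_size):
                rows = idx[start:start + batch_size]
                Xb, yb = X[rows], y[rows]
                p = 1.0 / (1.0 + np.exp(-(Xb @ w + b)))
                error = p - yb
                w -= lr * (Xb.T @ error) / len(rows)  # vectorized update
                b -= lr * error.mean()
        return w, b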
