Thursday, November 9, 2023

Neural Networks

  • Multilayer perceptron (MLP)
    • Multiple perceptrons are stacked in layers; the outputs of one layer are fed into the perceptrons of the next
    • Multiclass - if multiple output classes are possible, can use multiple units in the output layer
  • Neural Networks
    • Instead of a (linear) perceptron in each unit, can use units with non-linear activation functions in each layer
      • Sigmoid, Hyperbolic tangent (Tanh), ReLU
    • This will improve modeling for non-linear relationships
    • Network architecture - Forward propagation
      • Input Layer - input data, features
      • Hidden Layer - receives the features, multiplies them by weights, and passes the weighted sums through an activation function
      • Output layer - the outputs of the hidden layer are fed in, multiplied by weights, and passed through an activation function; the prediction is produced
      • In general - there is one input layer, one output layer; the rest are hidden layers
      • If we have more than two classes, need one output unit per class to get a probability score for each (see the forward-pass sketch after this list)
    • Strengths
      • Can model complicated relationships with large number of features
      • No or little feature engineering needed
      • Modern tools make these accessible - multiple pre-built models exist and are ready to use
    • Weaknesses
      • Computationally expensive
      • Large power consumption
      • Outputs are difficult to interpret - black box producing results
      • Can easily be overfitted, especially if input data set is small
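
A minimal forward-propagation sketch in NumPy; the layer sizes, random weights, and 3-class output here are made up for illustration, not taken from any particular model:

    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)

    def softmax(z):
        e = np.exp(z - z.max())        # subtract max for numerical stability
        return e / e.sum()

    rng = np.random.default_rng(0)

    x = rng.normal(size=4)             # input layer: 4 features
    W1 = rng.normal(size=(5, 4))       # hidden layer: 5 units
    b1 = np.zeros(5)
    W2 = rng.normal(size=(3, 5))       # output layer: 3 classes
    b2 = np.zeros(3)

    h = relu(W1 @ x + b1)              # hidden layer: weighted sums -> activation
    p = softmax(W2 @ h + b2)           # output layer: one probability per class

    print(p, p.sum())                  # class probabilities sum to 1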

Training Neural Networks

  • Backpropagation
    • Working in reverse from the output to distribute the total output error among the layers
    • Calculate the gradient of the cost function, i.e. the gradient of the error with respect to each weight, then perform gradient descent on the weights (a worked sketch follows this list)
  • Approaches to network design
    • Stretch pants - start with a network too large for the problem; work to reduce overfitting
    • Transfer learning - use a pre-built / pre-trained neural network built for a relevant problem
      • Fine-tuning - cut off the final layers; add new ones and re-train only those
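
A minimal sketch of backpropagation and gradient descent on a single made-up training example, using a hypothetical one-hidden-layer network (tanh hidden layer, sigmoid output, squared-error loss); real frameworks compute these gradients automatically:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(1)
    x, y = rng.normal(size=3), 1.0           # one hypothetical training example
    W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
    w2, b2 = rng.normal(size=4), 0.0
    lr = 0.1                                 # learning rate

    for step in range(50):
        # forward propagation
        h = np.tanh(W1 @ x + b1)
        y_hat = sigmoid(w2 @ h + b2)
        loss = 0.5 * (y_hat - y) ** 2

        # backpropagation: chain rule, working in reverse from the output error
        dz2 = (y_hat - y) * y_hat * (1 - y_hat)   # error at the output unit
        dw2, db2 = dz2 * h, dz2                   # gradients for output weights
        dz1 = (dz2 * w2) * (1 - h ** 2)           # error distributed to the hidden layer
        dW1, db1 = np.outer(dz1, x), dz1          # gradients for hidden weights

        # gradient descent on the weights
        w2 -= lr * dw2; b2 -= lr * db2
        W1 -= lr * dW1; b1 -= lr * db1

    print(loss)   # the loss shrinks toward 0 over the steps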

Common Use Cases

Computer Vision
  • Image classification - facial recognition, x-ray analysis
  • Object detection - find objects within an image by drawing boxes around objects and analyzing the inside (ex: self-driving cars)
  • Semantic segmentation - analyze each pixel; find exactly where the object begins and ends (no boxes)
  • Image generation - e.g. anatomical images

Convolutional neural network

  • These are most commonly used in image recognition models
  • In a typical (fully connected) setup, layers are fully interconnected
    • Every value in a previous layer is connected by a weight to every value in the following layer
  • As the number of features grows, the number of potential weights can become unmanageable
  • This weighs on performance and maintenance
  • Convolutional neural networks (CNNs) use additional types of layers:
    • Convolutional Layers
      • Act as filters; a set of weights is applied across the entire data set
      • A node is connected only to a small region of nodes in the previous layer, and the same weights are shared across regions; i.e. not all nodes connect to all nodes
      • In a nutshell, the way this works is (see the sketch after this list):
        • Apply a filter across a section of values; multiply each value by its weight; add up the results
        • Shift over and apply the same filter to the next section within the dataset; sum the results again
        • Combine the summed-up results into a feature map
    • Pooling Layers
      • Pooling layers shrink the feature maps further: a section of a feature map is taken and its max or mean is calculated; that single value represents the original section and is passed on to the next layer. The aim is to minimize the number of weights the model needs to train
  • ImageNet - a database of labeled images that many image recognition models are pre-trained on
    • ~14 million images; ~20K categories
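
A naive sketch of one convolutional filter followed by a 2x2 max pool, on a tiny made-up single-channel "image"; the kernel values are arbitrary, and library layers do the same work far more efficiently:

    import numpy as np

    image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 image
    kernel = np.array([[1., 0.], [0., -1.]])           # one 2x2 filter (shared weights)

    # convolution: slide the same filter across the image,
    # multiplying values by weights and summing at each position
    fh, fw = kernel.shape
    feature_map = np.empty((6 - fh + 1, 6 - fw + 1))
    for i in range(feature_map.shape[0]):
        for j in range(feature_map.shape[1]):
            patch = image[i:i + fh, j:j + fw]
            feature_map[i, j] = (patch * kernel).sum()

    # max pooling: each 2x2 section of the feature map is represented
    # by its maximum, shrinking what the next layer has to process
    ph, pw = 2, 2
    pooled = np.empty((feature_map.shape[0] // ph, feature_map.shape[1] // pw))
    for i in range(pooled.shape[0]):
        for j in range(pooled.shape[1]):
            pooled[i, j] = feature_map[i*ph:(i+1)*ph, j*pw:(j+1)*pw].max()

    print(feature_map.shape, pooled.shape)   # (5, 5) -> (2, 2)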

Natural Language Processing

  • Text classification (ex: spam/not-spam)
  • Sentiment analysis (ex: determine positive/negative sentiment of a consumer towards a product)
  • Search (ex: search for the answer given a question)
  • Machine translation (ex: language translation)
  • Text generation (ex: generated automated email response to an email)

Text Representation

  • Techniques for fitting text into models - converting text to numerical values for use as inputs to a model:
  • Vocabulary / bag of words - each word in the vocabulary is assigned an index; count up how many times each word appears in the text (see the sketch after this list)
  • Embedding - a word or a document is assigned a numerical value aimed to capture the meaning of the word/text
    • Word2Vec, GloVe
  • Attention - used in Transformer models. Transformers use:
    • Word embedding
    • Positional encoding - determining where in a sentence a word is placed
    • Attention - measure of how strongly words within a sentence are related, regardless of their position (sketch after this list)
      • Ex: "The man ate food, because he was hungry" - attention ties "he" back to "the man"
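
A minimal bag-of-words sketch over a hypothetical two-document corpus; each document becomes a vector of word counts indexed by the vocabulary:

    from collections import Counter

    docs = ["the man ate food", "the food was good food"]   # hypothetical corpus

    # vocabulary: every distinct word gets an index
    vocab = sorted({w for d in docs for w in d.split()})

    # bag of words: one count vector per document
    for d in docs:
        counts = Counter(d.split())
        print([counts[w] for w in vocab], "<-", d)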
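
And a sketch of scaled dot-product attention over the example sentence. The embeddings here are random stand-ins, and the query/key/value projections are taken as identity to keep the sketch short (a real Transformer learns separate projection matrices), so the resulting weights are arbitrary; with trained embeddings, the row for "he" would attend strongly to "man":

    import numpy as np

    rng = np.random.default_rng(2)
    tokens = "The man ate food , because he was hungry".split()
    d = 8
    E = rng.normal(size=(len(tokens), d))   # stand-in embeddings, one row per token

    Q, K, V = E, E, E                       # identity projections (simplification)

    scores = Q @ K.T / np.sqrt(d)           # how strongly each pair of words relates
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)       # softmax per row -> attention weights
    contexts = w @ V                        # each token's attention-weighted mix

    print(w[tokens.index("he")].round(2))   # attention weights for "he" over all tokens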
