Neural Networks
- Multilayer perceptron (MLP)
- Multiple perceptrons are stacked in layers; the outputs of one layer are fed into the perceptrons of the next
- Multiclass - if multiple output classes are possible, can use multiple units in the output layer
- Neural Networks
- Instead of a plain (linear) perceptron in each unit, can use units with non-linear activation functions in each layer
- Sigmoid, Hyperbolic tangent (Tanh), ReLU
- This will improve modeling for non-linear relationships
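The three activation functions above can be sketched in a few lines of NumPy; the sample values are illustrative only:

```python
import numpy as np

# Common activation functions, applied element-wise to a unit's
# weighted input z
def sigmoid(z):
    # Squashes values into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Squashes values into (-1, 1); zero-centered
    return np.tanh(z)

def relu(z):
    # Passes positive values through, zeroes out negatives
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
print(relu(z))  # [0. 0. 2.]
```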
- Network architecture - Forward propagation
- Input Layer - input data, features
- Hidden layer - receives the features, multiplies them by weights, and passes the results through the activation function
- Output layer - the outputs of the hidden layer are fed in, multiplied by weights, and passed through an activation function; the prediction is produced
- In general - there is one input layer, one output layer; the rest are hidden layers
- If we have more than two classes, need one output unit per class to get a probability score for each class
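The forward-propagation steps above can be sketched as follows; the layer sizes (4 features, 5 hidden units, 3 classes) and the random weights are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    # Turns output-layer scores into one probability per class
    e = np.exp(z - z.max())
    return e / e.sum()

W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)   # input -> hidden weights
W2, b2 = rng.normal(size=(3, 5)), np.zeros(3)   # hidden -> output weights

def forward(x):
    h = relu(W1 @ x + b1)         # hidden layer: weights, then activation
    return softmax(W2 @ h + b2)   # output layer: one score per class

x = rng.normal(size=4)            # one example with 4 features
p = forward(x)
print(p)  # three class probabilities that sum to 1
```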
- Strengths
- Can model complicated relationships with large number of features
- Little or no feature engineering needed
- Modern tools make these accessible - multiple models exist and are ready to use
- Weaknesses
- Computationally expensive
- Large power consumption
- Outputs are difficult to interpret - black box producing results
- Can easily be overfitted, especially if input data set is small
Training Neural Networks
- Backpropagation
- Working in reverse to distribute the total output error among layers
- Calculate the gradient of the cost function, i.e. the gradient of the error with respect to each weight; perform gradient descent on the weights
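A minimal backpropagation sketch, using a tiny 2-4-1 network trained on XOR with plain gradient descent; the layer sizes, learning rate, and iteration count are illustrative, not a recommendation:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([[0], [1], [1], [0]], float)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def forward(X):
    h = sigmoid(X @ W1 + b1)
    return h, sigmoid(h @ W2 + b2)

def mse(p):
    return ((p - y) ** 2).mean()

_, p = forward(X)
before = mse(p)
for _ in range(5000):
    h, p = forward(X)
    # Backward pass: the total output error is distributed back layer by layer
    d2 = (p - y) * p * (1 - p)        # gradient at the output layer
    d1 = (d2 @ W2.T) * h * (1 - h)    # error pushed back to the hidden layer
    # Gradient descent: move each weight against its gradient
    W2 -= 1.0 * (h.T @ d2); b2 -= 1.0 * d2.sum(axis=0)
    W1 -= 1.0 * (X.T @ d1); b1 -= 1.0 * d1.sum(axis=0)

_, p = forward(X)
print(before, mse(p))  # training error shrinks as the weights are updated
```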
- Approaches to network design
- Stretch pants - start with a network too large for the problem; work to reduce overfitting
- Transfer learning - use a pre-built / pre-trained neural network built for a related problem
- Fine-tuning - cut off the final layers; add new ones and re-train those
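The transfer-learning idea can be sketched as follows: re-use a "pretrained" layer as a frozen feature extractor and train only a new final layer on top. The pretrained weights, data, and labels here are random stand-ins for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

W_frozen = rng.normal(size=(10, 6)) * 0.3   # "pretrained" layer, never updated

def extract_features(X):
    return np.maximum(0.0, X @ W_frozen)    # ReLU features from the old network

X = rng.normal(size=(50, 10))
F = extract_features(X)
true_w = rng.normal(size=6)
y = (F @ true_w > 0).astype(float)          # toy labels for the new task

# New final layer (logistic regression) trained from scratch;
# only these weights change - the frozen layer stays as-is
w, b = np.zeros(6), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))
    w -= 0.1 * F.T @ (p - y) / len(y)
    b -= 0.1 * (p - y).mean()

acc = ((1.0 / (1.0 + np.exp(-(F @ w + b))) > 0.5) == y).mean()
print(acc)  # training accuracy of the re-trained final layer
```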
Computer Vision
- Image classification - facial recognition, x-ray analysis
- Object detection - find objects within an image by drawing boxes around objects and analyzing the inside (ex: self-driving cars)
- Semantic segmentation - analyze each pixel; find exactly where the object begins and ends (no boxes)
- Image generation - anatomical images
Convolutional neural network
- These are most commonly used in image recognition models
- In a typical setup, layers are fully interconnected
- Every value in the previous layer is connected by a weight to every value in the following layer
- The number of features can grow, and the number of potential weights can become unmanageable
- This can weigh on performance and maintenance
- Convolutional neural networks (CNNs) use additional types of layers:
- Convolutional Layers
- Act as filters; a set of weights is applied across the entire data set
- A node is connected only to a local region of nodes in the layer before, and the same weights are re-used at every position; i.e. not all nodes connect to all nodes
- In a nutshell, the way this works is:
- Apply a filter set across a set of values; multiply value by weight; add up the result
- Shift over and apply the same filter set to the next section within the dataset; sum the result again
- Combine the summed-up set of results into a feature map
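The filter-sliding steps above can be sketched with a single 3x3 filter (one shared set of weights) slid across a 5x5 input; each placement multiplies values by weights and sums them up. The filter values here are illustrative (a Laplacian edge detector):

```python
import numpy as np

image = np.arange(25.0).reshape(5, 5)       # a smooth intensity ramp
kernel = np.array([[ 0, -1,  0],
                   [-1,  4, -1],
                   [ 0, -1,  0]], float)

fh, fw = kernel.shape
out = np.zeros((5 - fh + 1, 5 - fw + 1))
for i in range(out.shape[0]):
    for j in range(out.shape[1]):
        patch = image[i:i+fh, j:j+fw]       # the section under the filter
        out[i, j] = (patch * kernel).sum()  # multiply by weights, add up

print(out)  # the 3x3 feature map; all zeros here, since the ramp has no edges
```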
- Pooling Layers
- The feature maps can be reduced even further: an additional pooling layer is introduced after the convolutional layer
- A section of a feature map is taken and its mean or max is calculated; the resulting value represents the original section and is passed on to the next layer
- The aim is to minimize the number of weights the model needs to train
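Max pooling on a toy 4x4 feature map, as a sketch: each 2x2 section is replaced by its maximum, shrinking the map (mean pooling would average instead). The map values are made up:

```python
import numpy as np

fmap = np.array([[1, 3, 2, 0],
                 [4, 6, 1, 1],
                 [0, 2, 9, 5],
                 [3, 1, 4, 7]], float)

# Split the 4x4 map into 2x2 blocks and take the max of each block
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[6. 2.]
               #  [3. 9.]]
```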
- ImageNet - a database of images on which many image recognition models are trained
- 14 million images; 20K categories
Natural Language Processing
- Text classification (ex: spam/not-spam)
- Sentiment analysis (ex: determine positive/negative sentiment of a consumer towards a product)
- Search (ex: search for the answer given a question)
- Machine translation (ex: language translation)
- Text generation (ex: generated automated email response to an email)
Text Representation
- Techniques for fitting text into models - converting text to numerical values for use as inputs to a model:
- Bag of words - a vocabulary is built and each word is assigned an index; count how many times each word appears in the text
- Embedding - a word or a document is assigned a numerical value aimed to capture the meaning of the word/text
- Word2Vec, GloVe
- Attention - used in Transformer models. Transformers use:
- Word embedding
- Positional encoding - determining where in a sentence a word is placed
- Attention - measure of how strongly words within a sentence are related, regardless of their position
- Ex: "The man ate food, because he was hungry" - attention relates "he" back to "the man"
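The bag-of-words representation above can be sketched in plain Python; the corpus is a toy example:

```python
from collections import Counter

# Build a vocabulary from the corpus, then represent a text as a
# vector of word counts (word order is ignored)
docs = ["the cat sat on the mat", "the dog sat"]
vocab = sorted({w for d in docs for w in d.split()})

def bag_of_words(text):
    counts = Counter(text.split())
    return [counts[w] for w in vocab]

print(vocab)                  # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(bag_of_words(docs[0]))  # [1, 0, 1, 1, 1, 2]
```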