Thursday, November 9, 2023

Neural Networks

  • Multilayer perceptron (MLP)
    • Multiple perceptrons are stacked in layers; the outputs of one layer are fed into the perceptrons of the next
    • Multiclass - if multiple output classes are possible, can use multiple units in the output layer
  • Neural Networks
    • Instead of a (linear) perceptron in each unit, can use units with non-linear activation functions in each layer
      • Sigmoid, Hyperbolic tangent (Tanh), ReLU
    • This will improve modeling for non-linear relationships
    • Network architecture - Forward propagation
      • Input Layer - input data, features
      • Hidden Layer - receives the features, multiplies them by weights, and passes the weighted sums through an activation function
      • Output layer - the outputs of the hidden layer are fed in, multiplied by weights, and passed through an activation function; the prediction is produced
      • In general - there is one input layer, one output layer; the rest are hidden layers
      • If we have more than two classes, need one output unit per class to get a probability score for each (see the forward-pass sketch after this list)
    • Strengths
      • Can model complicated relationships with large number of features
      • No or little feature engineering needed
      • Modern tools make these accessible - multiple pre-built models exist and are ready to use
    • Weaknesses
      • Computationally expensive
      • Large power consumption
      • Outputs are difficult to interpret - black box producing results
      • Can easily be overfitted, especially if input data set is small
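
A minimal forward-propagation sketch in NumPy; the layer sizes, random weights, and 3-class output here are made up for illustration, not taken from any particular model:

    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)

    def softmax(z):
        e = np.exp(z - z.max())        # subtract max for numerical stability
        return e / e.sum()

    rng = np.random.default_rng(0)

    x = rng.normal(size=4)             # input layer: 4 features
    W1 = rng.normal(size=(5, 4))       # hidden layer: 5 units
    b1 = np.zeros(5)
    W2 = rng.normal(size=(3, 5))       # output layer: 3 classes
    b2 = np.zeros(3)

    h = relu(W1 @ x + b1)              # hidden layer: weighted sums -> activation
    p = softmax(W2 @ h + b2)           # output layer: one probability per class

    print(p, p.sum())                  # class probabilities sum to 1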

Training Neural Networks

  • Backpropagation
    • Working in reverse from the output to distribute the total output error among the layers
    • Calculate the gradient of the cost function, i.e. the gradient of the error with respect to each weight, then perform gradient descent on the weights (a worked sketch follows this list)
  • Approaches to network design
    • Stretch pants - start with a network too large for the problem; work to reduce overfitting
    • Transfer learning - use a pre-built / pre-trained neural network built for a relevant problem
      • Fine-tuning - cut off the final layers; add new ones and re-train only those
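
A minimal sketch of backpropagation and gradient descent on a single made-up training example, using a hypothetical one-hidden-layer network (tanh hidden layer, sigmoid output, squared-error loss); real frameworks compute these gradients automatically:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(1)
    x, y = rng.normal(size=3), 1.0           # one hypothetical training example
    W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
    w2, b2 = rng.normal(size=4), 0.0
    lr = 0.1                                 # learning rate

    for step in range(50):
        # forward propagation
        h = np.tanh(W1 @ x + b1)
        y_hat = sigmoid(w2 @ h + b2)
        loss = 0.5 * (y_hat - y) ** 2

        # backpropagation: chain rule, working in reverse from the output error
        dz2 = (y_hat - y) * y_hat * (1 - y_hat)   # error at the output unit
        dw2, db2 = dz2 * h, dz2                   # gradients for output weights
        dz1 = (dz2 * w2) * (1 - h ** 2)           # error distributed to the hidden layer
        dW1, db1 = np.outer(dz1, x), dz1          # gradients for hidden weights

        # gradient descent on the weights
        w2 -= lr * dw2; b2 -= lr * db2
        W1 -= lr * dW1; b1 -= lr * db1

    print(loss)   # the loss shrinks toward 0 over the steps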

Common Use Cases

Computer Vision
  • Image classification - facial recognition, x-ray analysis
  • Object detection - find objects within an image by drawing boxes around objects and analyzing the inside (ex: self-driving cars)
  • Semantic segmentation - analyze each pixel; find exactly where the object begins and ends (no boxes)
  • Image generation - e.g. anatomical images

Convolutional neural network

  • These are most commonly used in image recognition models
  • In a typical (fully connected) setup, layers are fully interconnected
    • Every value in a previous layer is connected by a weight to every value in the following layer
  • As the number of features grows, the number of potential weights can become unmanageable
  • This weighs on performance and maintenance
  • Convolutional neural networks (CNNs) use additional types of layers:
    • Convolutional Layers
      • Act as filters; a set of weights is applied across the entire data set
      • A node is connected only to a small region of nodes in the previous layer, and the same weights are shared across regions; i.e. not all nodes connect to all nodes
      • In a nutshell, the way this works is (see the sketch after this list):
        • Apply a filter across a section of values; multiply each value by its weight; add up the results
        • Shift over and apply the same filter to the next section within the dataset; sum the results again
        • Combine the summed-up results into a feature map
    • Pooling Layers
      • Pooling layers shrink the feature maps further: a section of a feature map is taken and its max or mean is calculated; that single value represents the original section and is passed on to the next layer. The aim is to minimize the number of weights the model needs to train
  • ImageNet - a database of labeled images that many image recognition models are pre-trained on
    • ~14 million images; ~20K categories
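
A naive sketch of one convolutional filter followed by a 2x2 max pool, on a tiny made-up single-channel "image"; the kernel values are arbitrary, and library layers do the same work far more efficiently:

    import numpy as np

    image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 image
    kernel = np.array([[1., 0.], [0., -1.]])           # one 2x2 filter (shared weights)

    # convolution: slide the same filter across the image,
    # multiplying values by weights and summing at each position
    fh, fw = kernel.shape
    feature_map = np.empty((6 - fh + 1, 6 - fw + 1))
    for i in range(feature_map.shape[0]):
        for j in range(feature_map.shape[1]):
            patch = image[i:i + fh, j:j + fw]
            feature_map[i, j] = (patch * kernel).sum()

    # max pooling: each 2x2 section of the feature map is represented
    # by its maximum, shrinking what the next layer has to process
    ph, pw = 2, 2
    pooled = np.empty((feature_map.shape[0] // ph, feature_map.shape[1] // pw))
    for i in range(pooled.shape[0]):
        for j in range(pooled.shape[1]):
            pooled[i, j] = feature_map[i*ph:(i+1)*ph, j*pw:(j+1)*pw].max()

    print(feature_map.shape, pooled.shape)   # (5, 5) -> (2, 2)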

Natural Language Processing

  • Text classification (ex: spam/not-spam)
  • Sentiment analysis (ex: determine positive/negative sentiment of a consumer towards a product)
  • Search (ex: search for the answer given a question)
  • Machine translation (ex: language translation)
  • Text generation (ex: generated automated email response to an email)

Text Representation

  • Techniques for fitting text into models - converting text to numerical values for use as inputs to a model:
  • Vocabulary / bag of words - each word in the vocabulary is assigned an index; count up how many times each word appears in the text (see the sketch after this list)
  • Embedding - a word or a document is assigned a numerical value aimed to capture the meaning of the word/text
    • Word2Vec, GloVe
  • Attention - used in Transformer models. Transformers use:
    • Word embedding
    • Positional encoding - determining where in a sentence a word is placed
    • Attention - measure of how strongly words within a sentence are related, regardless of their position (sketch after this list)
      • Ex: "The man ate food, because he was hungry" - attention ties "he" back to "the man"
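
A minimal bag-of-words sketch over a hypothetical two-document corpus; each document becomes a vector of word counts indexed by the vocabulary:

    from collections import Counter

    docs = ["the man ate food", "the food was good food"]   # hypothetical corpus

    # vocabulary: every distinct word gets an index
    vocab = sorted({w for d in docs for w in d.split()})

    # bag of words: one count vector per document
    for d in docs:
        counts = Counter(d.split())
        print([counts[w] for w in vocab], "<-", d)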
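
And a sketch of scaled dot-product attention over the example sentence. The embeddings here are random stand-ins, and the query/key/value projections are taken as identity to keep the sketch short (a real Transformer learns separate projection matrices), so the resulting weights are arbitrary; with trained embeddings, the row for "he" would attend strongly to "man":

    import numpy as np

    rng = np.random.default_rng(2)
    tokens = "The man ate food , because he was hungry".split()
    d = 8
    E = rng.normal(size=(len(tokens), d))   # stand-in embeddings, one row per token

    Q, K, V = E, E, E                       # identity projections (simplification)

    scores = Q @ K.T / np.sqrt(d)           # how strongly each pair of words relates
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)       # softmax per row -> attention weights
    contexts = w @ V                        # each token's attention-weighted mix

    print(w[tokens.index("he")].round(2))   # attention weights for "he" over all tokens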
