Lab 7: Neural Nets

This lab is due by Monday, November 23 at 10:00pm.

To work on this lab, you will need to get the code, much like you did for the first two labs.

Online tests will be made available by the end of Tuesday, November 10. In the meantime, the local tester provides thorough unit tests for each section of the lab.

Your answers for this lab belong in the main file lab6.py.

Problems

This lab is divided into two independent parts. In the first part, you'll code subroutines necessary for training and using neural networks. In the second part, you'll code subroutines for using and validating support vector machines.

Neural Nets

Wiring a neural net

A neural net is composed of individual neurons, which generally take this form:


Image:Lab6_SimpleNeuron.png


We form the net by combining the neurons into a structure, such as the example shown below.


Image:Lab6_SimpleNeuralNet.png


In a neural net with two inputs x and y, each input-layer neuron draws a line and shades one side of it, satisfying the equation ax + by >= T. The remaining neurons in the later layers of the neural net perform logic functions on the shadings.

Each of the following pictures can be produced by a neural net with two inputs x and y. For each one, determine the minimum number of neurons necessary to produce the picture. Express your answer as a list indicating the number of nodes per layer.

As an example, the neural net shown above would be represented by the list [3, 2, 3, 1].

Image:Neural_net_pictures_2015.jpg

The last picture (nn_grid) is optional. To test it locally, change the boolean flag TEST_NN_GRID to True. (Note: If you downloaded the lab files in the first two hours after the lab was released, there was a mistake in the answer for nn_grid, which has since been corrected.)

Neural net equations (reference only)

For reference, here are the fundamental equations that define a neural net:

Image:Lab6_NN_Eqns.png

Helper functions

First, you'll code helper functions for the neural nets. The stairstep and sigmoid functions are threshold functions; each neuron in a neural net uses a threshold function to determine whether its input stimulation is large enough for it to emit a non-zero output. Fill in each of the functions below.

stairstep: Computes the output of the stairstep function using the given threshold (T)

def stairstep(x, threshold=0):

sigmoid: Computes the output of the sigmoid function using the given steepness (S) and midpoint (M)

def sigmoid(x, steepness=1, midpoint=0):
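
If it helps, here is a minimal sketch of both functions, assuming the standard stairstep and logistic formulas:

  from math import exp

  def stairstep(x, threshold=0):
      # Output 1 once the input reaches the threshold T, else 0
      return 1 if x >= threshold else 0

  def sigmoid(x, steepness=1, midpoint=0):
      # Logistic function: 1 / (1 + e^(-S * (x - M)))
      return 1 / (1 + exp(-steepness * (x - midpoint)))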


The accuracy function is used when training the neural net with back propagation. It measures the performance of the neural net as a function of its desired output and its actual output (given some set of inputs). Note that the accuracy function is symmetric -- that is, it doesn't care which argument is the desired output and which is the actual output.

accuracy: Computes accuracy using desired_output and actual_output. If the neurons in the network are using the stairstep threshold function, the accuracy will range from -0.5 to 0

def accuracy(desired_output, actual_output):
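
One formulation consistent with the stated range is the squared-error measure below (an assumption on our part, but it is symmetric in its arguments and yields values in [-0.5, 0] for stairstep outputs):

  def accuracy(desired_output, actual_output):
      # Symmetric: swapping the arguments does not change the result
      return -0.5 * (desired_output - actual_output) ** 2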

Forward propagation

Next, you'll code forward propagation, which takes in a dictionary of inputs and computes the output of every neuron in a neural net. As part of coding forward propagation, you should understand how a single neuron computes its output as a function of its input: each input into the neuron is multiplied by the weight on the wire, the weighted inputs are summed together, and the sum is passed through a specified threshold function to produce the output.

To compute the output of each neuron in a neural net, iterate over each neuron in the network in order, starting from the input neurons and working toward the output neuron. (Hint: The function net.topological_sort() may be useful; see the API for details). The algorithm is called forward propagation because the outputs you calculate for earlier neurons will be propagated forward through the network and used to calculate outputs for later neurons.


Implement the method forward_prop:

def forward_prop(net, input_values, threshold_fn=stairstep):

Here, net is a neural network, input_values is a dictionary mapping input variables to their values, and threshold_fn is a function that each neuron will use to decide what value to output. This function should return a tuple containing

  1. The overall output value of the network, i.e. the output value associated with the output neuron of the network.
  2. A dictionary mapping neurons to their immediate outputs.

The dictionary of outputs is permitted to contain extra keys (for example, the input values). The function should not modify the neural net in any way.
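
Here is a sketch of one possible implementation; it assumes (per the API below) that a wire's startNode is a number for constant inputs, a name appearing in input_values for variable inputs, or a neuron whose output was computed earlier in the topological order:

  def forward_prop(net, input_values, threshold_fn=stairstep):
      outputs = dict(input_values)  # copy, so the caller's dict isn't modified

      def value_of(node):
          # Constant inputs are numbers and act as their own value
          return node if isinstance(node, (int, float)) else outputs[node]

      for neuron in net.topological_sort():
          weighted_sum = sum(wire.weight * value_of(wire.startNode)
                             for wire in net.get_incoming_wires(neuron))
          outputs[neuron] = threshold_fn(weighted_sum)

      return (outputs[net.get_output_neuron()], outputs)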

Backward propagation

Backward propagation is the process of training a neural net on a particular training point by modifying the weights of the network, with the goal of improving the network's performance.

To perform back propagation on a given training point, or set of inputs:

  1. Use forward propagation (with the sigmoid threshold function) to compute the output of each neuron in the network.
  2. Compute the update coefficient delta_B for each neuron in the network, starting from the output neuron and working backward toward the input neurons. (The function net.topological_sort() can be useful here; see the API for details).
  3. Use the update coefficients delta_B to compute new weights for the network.
  4. Update all of the weights in the network.


You have already coded the forward propagation routine (forward_prop). To complete the definition of back propagation, you'll define a helper function calculate_deltas for computing the update coefficients delta_B of each neuron in the network, and a function update_weights that retrieves the list of update coefficients using calculate_deltas, then modifies the weights of the network accordingly.

Implement calculate_deltas to return a dictionary mapping neurons to update coefficients (delta_B values):

def calculate_deltas(net, input_values, desired_output):
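
A sketch, assuming the usual sigmoid delta rules (for the output neuron, delta_B = out_B * (1 - out_B) * (desired_output - out_B); for any other neuron, delta_B = out_B * (1 - out_B) * the sum of weight * delta over its outgoing wires):

  def calculate_deltas(net, input_values, desired_output):
      # Forward pass with the sigmoid threshold to get each neuron's output
      _, outputs = forward_prop(net, input_values, threshold_fn=sigmoid)
      deltas = {}
      # Walk the topological order backward: output neuron first
      for neuron in reversed(net.topological_sort()):
          out = outputs[neuron]
          if net.is_output_neuron(neuron):
              deltas[neuron] = out * (1 - out) * (desired_output - out)
          else:
              downstream = sum(wire.weight * deltas[wire.endNode]
                               for wire in net.get_outgoing_wires(neuron))
              deltas[neuron] = out * (1 - out) * downstream
      return deltas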


Next, use calculate_deltas to implement update_weights, which performs a single step of back propagation. The function should compute delta_B values and weight updates for the entire neural net, then update all weights. The function update_weights should return the modified neural net with appropriately updated weights. (Hint: You can update the weight of a Wire instance by setting its wire.weight to a new value.)

def update_weights(net, input_values, desired_output, r=1):
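
A sketch using the update formula new_weight = old_weight + r * out_A * delta_B (see the debugging note below):

  def update_weights(net, input_values, desired_output, r=1):
      deltas = calculate_deltas(net, input_values, desired_output)
      _, outputs = forward_prop(net, input_values, threshold_fn=sigmoid)
      for wire in net.get_wires():
          if wire.endNode in deltas:  # skip the fixed weight-1 wire into NeuralNet.OUT
              out_A = (wire.startNode if isinstance(wire.startNode, (int, float))
                       else outputs[wire.startNode])
              wire.weight += r * out_A * deltas[wire.endNode]
      return net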


Now you're ready to complete the back_prop function, which continues to update weights in the neural net until the accuracy surpasses the accuracy threshold. back_prop should return a tuple containing:

  1. The modified neural net, with trained weights
  2. The number of iterations (that is, the number of weight updates)

def back_prop(net, input_values, desired_output, r=1, accuracy_threshold=-.001):
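
A sketch, reading "surpasses" as strictly greater than the threshold:

  def back_prop(net, input_values, desired_output, r=1, accuracy_threshold=-.001):
      iterations = 0
      while True:
          actual_output, _ = forward_prop(net, input_values, threshold_fn=sigmoid)
          if accuracy(desired_output, actual_output) > accuracy_threshold:
              return (net, iterations)
          net = update_weights(net, input_values, desired_output, r)
          iterations += 1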


If your code is failing the back_prop tests but passing everything else:

  • Make sure you have the current version of tests.py and neural_net_api.py, which were updated on 11/7.
  • Make sure you're using out_A in the weight update formula (r * out_A * delta_B), not out_B.

Extra Credit Project: Train a neural net on a dataset

In practice, we would want to use multiple training points to train a neural net, not just one. There are many possible implementations -- for instance, you could put all the training points into a queue and perform back propagation with each point in turn. Alternatively, you could use a multidimensional accuracy function and try to train with multiple training points simultaneously.

Here are some examples of datasets that you could use to train a 2-input neural net:

1. The letter L

4 + -
3 + - 
2 + -
1 + - - - -
0 - + + + +
  0 1 2 3 4

2. This moat-like shape:

4 - - - - -
3 -       - 
2 -   +   -
1 -       -
0 - - - - -
  0 1 2 3 4

3. This patchy shape:

4 - -   + +
3 - -   + +
2        
1 + +   - -
0 + +   - -
  0 1 2 3 4

It should be possible to train a 5-neuron neural net to classify any one of the three shapes.

For some unspecified amount of extra credit, extend your code and/or the API to train a neural net on multiple inputs. The steps include:

  • Wire up a neural net (see nn_problems.py for some examples)
  • Encode the training data (you can use one of the example datasets above, a quiz problem, or make up your own)
  • Develop an algorithm (e.g. extend your back_prop function) for training your neural net on multiple training points
  • Initialize your neural net with random weights
  • Train the neural net and see if it works!

To receive your extra credit, send your code to 6.034-2015-support@mit.edu, ideally with some sort of documentation. (Briefly explain what you did and how you did it.)

Support Vector Machines

Vector Math

We'll start with some basic functions for manipulating vectors. For now, we will represent an n-dimensional vector as a list or tuple of n coordinates. dot_product should compute the dot product of two vectors, while norm computes the length of a vector. (There is a simple implementation of norm that uses dot_product.) Implement both functions:

def dot_product(u, v):
def norm(v):
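
Both can be one-liners; for example:

  from math import sqrt

  def dot_product(u, v):
      # Sum of pairwise products of coordinates
      return sum(u_i * v_i for u_i, v_i in zip(u, v))

  def norm(v):
      # Length of v, via the dot product of v with itself
      return sqrt(dot_product(v, v))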


Later, when you need to actually manipulate vectors, note that we have also provided methods for adding two vectors (vector_add) and for multiplying a scalar by a vector (scalar_mult). (See the API, below.)

SVM Equations (reference only)

Next, we will use the following five SVM equations. Recall from recitation that Equations 1-3 define the decision boundary and the margin width, while Equations 4 & 5 can be used to calculate the alpha (supportiveness) values for the training points.


Image:Lab6 Eqns.png


Using the SVM Boundary Equations

We will start by using Equation 1 to classify a point with a given SVM, ignoring the point's actual classification (if any). According to Equation 1, a point's classification is +1 when w·x + b > 0, or -1 when w·x + b < 0. If w·x + b = 0, the point lies on the decision boundary, so its classification is ambiguous.

First, evaluate the expression w·x + b to calculate how positive a point is. Here, svm is a SupportVectorMachine object, and point is a Point (see the API below for details).

def positiveness(svm, point):


Next, classify a point as +1 or -1. If the point lies on the boundary, return 0.

def classify(svm, point):
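
A sketch of both functions, directly applying Equation 1:

  def positiveness(svm, point):
      # Equation 1: w . x + b
      return dot_product(svm.boundary.w, point.coords) + svm.boundary.b

  def classify(svm, point):
      p = positiveness(svm, point)
      if p > 0:
          return 1
      if p < 0:
          return -1
      return 0  # the point lies on the decision boundary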


Then, use the SVM's current decision boundary to calculate its margin width (Equation 2):

def margin_width(svm):
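
Assuming Equation 2 is the familiar margin-width formula 2 / ||w||, this is another one-liner:

  def margin_width(svm):
      # Equation 2: margin width = 2 / ||w||
      return 2 / norm(svm.boundary.w)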


Finally, we will check that the gutter constraint is satisfied. The gutter constraint requires that positiveness be +1 for positive support vectors, and -1 for negative support vectors. Our function will also check that no training points lie between the gutters -- that is, every training point should have a positiveness value indicating that it either lies on a gutter or outside the margin. Implement check_gutter_constraint, which should return a set (not a list) of the training points that violate one or both conditions:

def check_gutter_constraint(svm):
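
A sketch that checks both conditions (using exact comparisons; we assume the test data makes floating-point tolerances unnecessary):

  def check_gutter_constraint(svm):
      violations = set()
      for pt in svm.training_points:
          p = positiveness(svm, pt)
          # Support vectors must lie exactly on a gutter: positiveness == classification
          if pt in svm.support_vectors and p != pt.classification:
              violations.add(pt)
          # No training point may lie strictly between the gutters
          if -1 < p < 1:
              violations.add(pt)
      return violations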

Using the Supportiveness Equations

To train a support vector machine, every training point is assigned a supportiveness value (also known as an alpha value, or a Lagrange multiplier), representing how important the point is in determining (or "supporting") the decision boundary. The supportiveness values must satisfy a number of conditions, which we will check below.

First, implement check_alpha_signs, which should check that each point's supportiveness value is non-negative, and that all non-support vectors have a supportiveness of 0. This function, like check_gutter_constraint above, should return a set of the training points that violate either of the supportiveness conditions.

def check_alpha_signs(svm):
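
A sketch checking exactly the two conditions described above:

  def check_alpha_signs(svm):
      violations = set()
      for pt in svm.training_points:
          if pt.alpha < 0:
              violations.add(pt)  # supportiveness must be non-negative
          if pt not in svm.support_vectors and pt.alpha != 0:
              violations.add(pt)  # non-support vectors must have alpha == 0
      return violations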


Implement check_alpha_equations to check that the SVM's supportiveness values are consistent with its boundary equation and the classifications of its training points. Return True if both Equations 4 and 5 are satisfied, otherwise False.

def check_alpha_equations(svm):
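
A sketch, assuming Equations 4 and 5 take their standard forms (the alpha-weighted classifications sum to zero, and the alpha-weighted sum of the training points reproduces w); vector_add and scalar_mult are the helpers described in the API below:

  def check_alpha_equations(svm):
      class_sum = 0
      w_sum = [0] * len(svm.boundary.w)
      for pt in svm.training_points:
          class_sum += pt.alpha * pt.classification
          w_sum = vector_add(w_sum, scalar_mult(pt.alpha * pt.classification,
                                                pt.coords))
      # Equation 4: the alpha-weighted classifications sum to zero
      # Equation 5: the alpha-weighted sum of points reproduces w
      return class_sum == 0 and list(w_sum) == list(svm.boundary.w)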

Classification Accuracy

Once a support vector machine has been trained -- or even while it is being trained -- we want to know how well it has classified the training data. Write a function that checks whether the training points were classified correctly and returns a set containing the training points that were misclassified, if any.

def misclassified_training_points(svm):
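
Given classify from above, this reduces to a set comprehension:

  def misclassified_training_points(svm):
      # A point is misclassified when the SVM's label disagrees with its known one
      return {pt for pt in svm.training_points
              if classify(svm, pt) != pt.classification}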


Extra Credit Project: Train a support vector machine

In class and in this lab, we have seen how to calculate the final parameters of an SVM (given the decision boundary), and we've used the equations to assess how well an SVM has been trained, but we haven't actually attempted to train an SVM. In practice, training an SVM is a hill-climbing problem in alpha-space using the Lagrangian. There's a bit of math involved.


For some unspecified amount of extra credit, extend your code and/or the API to train a support vector machine. Possible extensions include implementing kernel functions or writing some code to graphically display your training data and SVM boundary. If you can come up with a reasonably simple procedure for training an SVM, preferably using only built-in Python packages, we may even use your code in a future 6.034 lab! (With your permission, of course.)

To receive your extra credit, send your code to 6.034-2015-support@mit.edu, ideally with some sort of documentation. (Briefly explain what you did and how you did it.)

Survey

Please answer these questions at the bottom of your lab6.py file:

  • NAME: What is your name? (string)
  • COLLABORATORS: Other than 6.034 staff, whom did you work with on this lab? (string, or empty string if you worked alone)
  • HOW_MANY_HOURS_THIS_LAB_TOOK: Approximately how many hours did you spend on this lab? (number or string)
  • WHAT_I_FOUND_INTERESTING: Which parts of this lab, if any, did you find interesting? (string)
  • WHAT_I_FOUND_BORING: Which parts of this lab, if any, did you find boring or tedious? (string)
  • (optional) SUGGESTIONS: What specific changes would you recommend, if any, to improve this lab for future years? (string)


(We'd ask which parts you find confusing, but if you're confused you should really ask a TA.)

When you're done, run the online tester to submit your code.

API

Neural Nets

The file neural_net_api.py defines the Wire and NeuralNet classes, described below.

NeuralNet

A neural net is represented as a directed graph whose edges are Wires and whose nodes can be neurons, inputs, or the output OUT.

A NeuralNet has the following attributes:

  • net.inputs, a list of named inputs to the network. An input can be either variable or constant: variable inputs are represented by strings denoting their names, while constant inputs are represented as numbers (e.g. -1).
  • net.neurons, a list of neurons. Neurons are represented as strings denoting their names.
  • net.wires, a list of the wires (edges) that connect the nodes in the network. Each wire is a Wire object (see below).

In this lab, inputs are supplied to neural nets in the form of a dictionary input_values that associates each named (variable) input with an input value.


The constant string NeuralNet.OUT represents the final output of a neural net. It is not a neuron, but it is connected to the output neuron by a wire with weight 1.


You can retrieve the nodes (neurons, inputs, or NeuralNet.OUT) in a network:

  • net.get_incoming_neighbors(node). Return a list of the nodes that are connected as inputs to node.
  • net.get_outgoing_neighbors(node). Return a list of the nodes to which node sends its output.
  • net.get_output_neuron(). Return the output neuron of the network, which is the neuron that leads to NeuralNet.OUT. (In this lab, each neural net has exactly one output neuron.)
  • net.topological_sort(). Return a sorted list of all the neurons in the network. The list is "topologically" sorted, which means that each neuron appears in the list after all the neurons that provide its inputs. Thus, the input layer neurons are first, the output neuron is last, etc.


You can also retrieve the wires:

  • net.get_wires(startNode=None, endNode=None). Return a list of all the wires in the network. If startNode or endNode is provided, return only the wires that start or end at that node. Note that there is a wire leading from the output neuron to NeuralNet.OUT whose weight should always be 1.
  • net.get_incoming_wires(node). Return a list of wires that feed in to the given node.
  • net.get_outgoing_wires(node). Return a list of wires that the node feeds into. (That is, wires that lead out of the node.)


Finally, you can query the parts of the network:

  • net.has_incoming_neuron(node). Return True if the node has at least one incoming neuron, otherwise False. (Note: this method was added on 11/7)
  • net.is_output_neuron(neuron). Return True if the neuron is the final, output neuron in the network, otherwise False.
  • net.is_connected(startNode, endNode). Return True if there is a wire from startNode to endNode in the network, otherwise False.


Wire

A Wire is represented as a weighted, directed edge in a graph. A wire can connect an input to a neuron, a neuron to a neuron, or a neuron to NeuralNet.OUT. A wire's attributes are:

  • wire.startNode, the input or neuron at which the wire starts. An input can be either a string or a number (e.g. -1).
  • wire.endNode, the neuron (or NeuralNet.OUT) at which the wire ends.
  • wire.weight, the weight on the wire (a float or int).

Support Vector Machines

The file svm_api.py defines the Point, DecisionBoundary, and SupportVectorMachine classes, as well as some helper functions for vector math, all described below.

Point

A Point has the following attributes:

  • point.name, the name of the point (a string).
  • point.coords, the coordinates of the point, represented as a vector (a tuple or list of numbers).
  • point.classification, the classification of the point, if known. Note that classification (if any) is a number, typically +1 or -1.
  • point.alpha, the supportiveness (alpha) value of the point, if assigned.


DecisionBoundary

A DecisionBoundary is defined by two parameters: a normal vector w, and an offset b. w is represented as a vector: a list or tuple of coordinates. You can access these parameters using the attributes .w and .b.


SupportVectorMachine

A SupportVectorMachine is a classifier that uses a DecisionBoundary to classify points. It has a list of training points and optionally a list of support vectors. You can access these parameters using these attributes:

  • svm.boundary, the SVM's DecisionBoundary.
  • svm.training_points, a list of Point objects with known classifications.
  • svm.support_vectors, a list of Point objects that serve as support vectors for the SVM. Every support vector is also a training point.


Helper functions for vector math

vector_add: Given two vectors represented as lists or tuples of coordinates, returns their sum as a list of coordinates.

scalar_mult: Given a constant scalar and a vector (as a tuple or list of coordinates), returns a scaled list of coordinates.
