Lab 7: Neural Nets


This lab is due by Wednesday, November 2 at 10:00pm.

To work on this lab, you will need to get the code, much like you did for the previous labs.


Your answers for this lab belong in the main file lab6.py.

Problems: Neural Nets

Neural Net Subroutines

Wiring a neural net

A neural net is composed of individual neurons, which generally take this form:


[Image: Lab6_SimpleNeuron.png: the general form of a single neuron]


We form the net by combining the neurons into a structure, such as the example shown below.


[Image: Lab6_SimpleNeuralNet.png: an example neural net]


In a neural net with two inputs x and y, each input-layer neuron draws a line and shades one side of it, satisfying the equation ax + by >= T. The remaining neurons in the later layers of the neural net perform logic functions on the shadings.

Each of the following pictures can be produced by a neural net with two inputs x and y. For each one, determine the minimum number of neurons necessary to produce the picture. Express your answer as a list indicating the number of nodes per layer.

As an example, the neural net shown above would be represented by the list [3, 2, 3, 1].

[Image: Lab6_nn_pictures.png: the pictures for this exercise]

Neural net equations (reference only)

For reference, here are the fundamental equations that define a neural net:

[Image: Lab6_nn_equations.png: the fundamental neural net equations]
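
In case the image doesn't load, here is a reconstruction of the standard equations in LaTeX. Treat it as a summary consistent with the rest of this lab, not a copy of the image; the delta_B rules assume the sigmoid threshold function, whose derivative is $o(1-o)$:

  Neuron output:        $o_B = f\big(\sum_i w_{iB}\, o_i\big)$, where $f$ is the threshold function
  Stairstep:            $f(x) = 1$ if $x \ge T$, else $0$
  Sigmoid:              $f(x) = 1 / (1 + e^{-s(x-m)})$
  ReLU:                 $f(x) = \max(0, x)$
  Accuracy:             $A = -\tfrac{1}{2}(d - o)^2$
  Output-neuron delta:  $\delta_B = o_B(1 - o_B)(d - o_B)$
  Hidden-neuron delta:  $\delta_B = o_B(1 - o_B)\sum_j w_{Bj}\,\delta_j$
  Weight update:        $w_{AB}' = w_{AB} + r\, o_A\, \delta_B$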

Threshold functions

First, you'll code some threshold functions for the neural nets. The stairstep, sigmoid, and ReLU functions are threshold functions; each neuron in a neural net uses a threshold function to determine whether its input stimulation is large enough for it to emit a non-zero output. Fill in each of the functions below.

stairstep: Computes the output of the stairstep function using the given threshold (T)

def stairstep(x, threshold=0):

sigmoid: Computes the output of the sigmoid function using the given steepness (S) and midpoint (M)

def sigmoid(x, steepness=1, midpoint=0):

ReLU: Computes the output of the ReLU (rectified linear unit) function

def ReLU(x):

For your convenience, the constant e is defined in lab6.py.
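
To make the expected behavior concrete, here is a minimal sketch of what these three functions might look like. It uses the convention from the wiring section that a neuron fires when its input meets the threshold (x >= T), and math.e in place of the constant provided in lab6.py:

from math import e

def stairstep(x, threshold=0):
    # Output 1 if x clears the threshold, otherwise 0
    return 1 if x >= threshold else 0

def sigmoid(x, steepness=1, midpoint=0):
    # Logistic curve, shifted to midpoint and scaled by steepness
    return 1.0 / (1 + e ** (-steepness * (x - midpoint)))

def ReLU(x):
    # Pass positive values through; clamp negatives to 0
    return max(0, x)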

Measuring performance with the accuracy function

The accuracy function is used when training the neural net with back propagation. It measures the performance of the neural net as a function of its desired output and its actual output (given some set of inputs). Note that the accuracy function is symmetric -- that is, it doesn't care which argument is the desired output and which is the actual output.

accuracy: Computes accuracy using desired_output and actual_output. If the neurons in the network are using the stairstep threshold function, the accuracy can only be -0.5 or 0.

def accuracy(desired_output, actual_output):
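
A natural definition consistent with the hint above is the squared-error form sketched below: with stairstep, desired and actual outputs are both 0 or 1, so the squared difference is 0 or 1 and the accuracy is 0 or -0.5. Treat this as a sketch rather than the official solution:

def accuracy(desired_output, actual_output):
    # Symmetric in its two arguments: depends only on the squared difference
    return -0.5 * (desired_output - actual_output) ** 2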

Forward propagation

Notice: Before starting this section, please read the entirety of the API section so that you know what functions and features are available to you!

Next, you'll code forward propagation, which takes in a dictionary of inputs and computes the output of every neuron in a neural net. As part of coding forward propagation, you should understand how a single neuron computes its output as a function of its input: each input into the neuron is multiplied by the weight on the wire, the weighted inputs are summed together, and the sum is passed through a specified threshold function to produce the output.

To compute the output of each neuron in a neural net, iterate over each neuron in the network in order, starting from the input neurons and working toward the output neuron. (Hint: The function net.topological_sort() may be useful.) The algorithm is called forward propagation because the outputs you calculate for earlier neurons will be propagated forward through the network and used to calculate outputs for later neurons.

To help you, we've provided a function node_value, which takes in a node (either an input or a neuron), a dictionary mapping input names (e.g. 'x') to their values, and a dictionary mapping neuron names to their outputs; it returns the output value of the node.

def node_value(node, input_values, neuron_outputs):

For example:

>>> input_values = {'x': 3, 'y': 7}
>>> neuron_outputs = {'Neuron1': 0}
>>> node_value('Neuron1', input_values, neuron_outputs)
0
>>> node_value('y', input_values, neuron_outputs)
7
>>> node_value(-1, input_values, neuron_outputs)
-1
>>>

Implement the method forward_prop:

def forward_prop(net, input_values, threshold_fn=stairstep):

Here, net is a neural network, input_values is a dictionary mapping input variables to their values, and threshold_fn is a function* that each neuron will use to decide what value to output. This function should return a tuple containing

  1. The overall output value of the network, i.e. the output value associated with the output neuron of the network.
  2. A dictionary mapping neurons to their immediate outputs.

The dictionary of outputs is permitted to contain extra keys (for example, the input values). The function should not modify the neural net in any way.

* The threshold_fn argument will be one of the threshold functions you implemented at the beginning of the lab. The astute reader will recognize that each of the three threshold functions takes a different number (and different types) of arguments, so any algorithm leveraging this threshold_fn input can't technically be implementation-agnostic with respect to the threshold function type. However, for this lab, we have decided to simplify your lives by only requiring that you call threshold_fn with the first argument, x.
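
Here is one possible sketch, using only API calls documented in the API section below (net.topological_sort, net.get_wires, net.get_output_neuron) plus the provided node_value helper:

def forward_prop(net, input_values, threshold_fn=stairstep):
    neuron_outputs = {}
    # topological_sort guarantees every neuron's inputs are computed first
    for neuron in net.topological_sort():
        # Weighted sum of everything feeding into this neuron
        incoming_sum = sum(
            wire.get_weight() * node_value(wire.startNode, input_values, neuron_outputs)
            for wire in net.get_wires(endNode=neuron))
        neuron_outputs[neuron] = threshold_fn(incoming_sum)
    return (neuron_outputs[net.get_output_neuron()], neuron_outputs)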

Backward propagation

Backward propagation is the process of training a neural network using a particular training point to modify the weights of the network, with the goal of improving the network's performance. In the big picture, the goal is to perform gradient ascent on an n-dimensional surface defined by the n weights in the neural net. We compute the update for some weight w_i using the partial derivative of the accuracy with respect to w_i. As a mathematical shortcut, we can calculate delta_B values instead of repeatedly taking derivatives.

Gradient ascent

Conceptually, the idea of the network's performance is abstracted away inside a huge and complex accuracy function, and gradient ascent (or descent, depending on your point of view) is simply a form of hill-climbing used to find the best output possible from the function.

To get a feel for this concept, we will first ask you to implement a very simplified gradient ascent algorithm that performs a single step of (pseudo-)gradient ascent. gradient_ascent_step should take in a function of three arguments (func), a list of three values representing the three current numerical inputs into the function (values), and a step_size which represents how much to perturb each variable:

def gradient_ascent_step(func, values, step_size):

This function should perturb each of the inputs by either +step_size, -step_size, or 0, in every combination possible (a total of 3^3 = 27 combinations), and evaluate the function with each possible set of inputs. Find the assignments that maximize the output of func, then return a tuple containing

  1. the function output at the highest point found, and
  2. the list of variable assignments (input values) that yielded the highest function output.

For example, if the highest point is func(3, 9, 4) = 92, you would return (92, [3, 9, 4]).
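
A sketch of one straightforward implementation, enumerating all 27 perturbations with itertools.product:

from itertools import product

def gradient_ascent_step(func, values, step_size):
    best = None
    # All 3^3 = 27 ways of nudging the three inputs by -step, 0, or +step
    for steps in product((-step_size, 0, step_size), repeat=3):
        candidate = [v + s for v, s in zip(values, steps)]
        result = (func(*candidate), candidate)
        if best is None or result[0] > best[0]:
            best = result
    return best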

Back prop dependencies

Recall from class that in back propagation, calculating a particular weight's update coefficient has dependencies on certain neuron outputs, inputs, and other weights. In particular, updating the weight between nodes A and B requires the output from node A, the current weight on the wire from A to B, the output of node B, and all neurons and weights downstream to the final layer.

Implement a function that takes in a neural net and a Wire object, then returns a set containing all Wires, inputs, and neurons that are necessary to compute the update coefficient for the given wire's weight. You may assume that the output of each neuron has already been calculated via forward propagation.

def get_back_prop_dependencies(net, wire):

If you're not sure how to approach this function, you can skip ahead to the next section, and/or look at an example in 2015 Quiz 3, Problem 1, Part B.
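
One possible approach, sketched below: seed the set with the wire and its two endpoints, then follow wires downstream until the output neuron is reached.

def get_back_prop_dependencies(net, wire):
    # The wire itself, its endpoints, and everything downstream of its endNode
    dependencies = set([wire, wire.startNode, wire.endNode])
    agenda = [wire.endNode]
    while agenda:
        node = agenda.pop()
        for downstream_wire in net.get_wires(startNode=node):
            dependencies.add(downstream_wire)
            dependencies.add(downstream_wire.endNode)
            agenda.append(downstream_wire.endNode)
    return dependencies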

Basic back propagation

Now let's go over the basic back-propagation algorithm. To perform back propagation on a given training point, or set of inputs:

  1. Use forward propagation with the sigmoid threshold function to compute the output of each neuron in the network.
  2. Compute the update coefficient delta_B for each neuron in the network, starting from the output neuron and working backward toward the input neurons. Note that you may not need to use get_back_prop_dependencies, depending on your implementation.
  3. Use the update coefficients delta_B to compute new weights for the network.
  4. Update all of the weights in the network.

You have already coded the forward_prop routine. To complete the definition of back propagation, you'll define a helper function calculate_deltas for computing the update coefficients delta_B of each neuron in the network, and a function update_weights that retrieves the dictionary of update coefficients using calculate_deltas, then modifies the weights of the network accordingly.

Implement calculate_deltas to return a dictionary mapping neurons to update coefficients (delta_B values):

def calculate_deltas(net, desired_output, neuron_outputs):

Note that this function takes in neuron_outputs, a dictionary mapping neurons to the outputs yielded in one iteration of forward propagation; this is the same dictionary that is returned from forward_prop.
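
A sketch of the usual approach, assuming the sigmoid threshold function (so a neuron's derivative term is out * (1 - out)):

def calculate_deltas(net, desired_output, neuron_outputs):
    deltas = {}
    # Visit neurons output-first, so each neuron's downstream deltas already exist
    for neuron in reversed(net.topological_sort()):
        out = neuron_outputs[neuron]
        if net.is_output_neuron(neuron):
            deltas[neuron] = out * (1 - out) * (desired_output - out)
        else:
            downstream = sum(wire.get_weight() * deltas[wire.endNode]
                             for wire in net.get_wires(startNode=neuron))
            deltas[neuron] = out * (1 - out) * downstream
    return deltas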


Next, use calculate_deltas to implement update_weights, which performs a single step of back propagation. The function should compute delta_B values and weight updates for the entire neural net, then update all weights. update_weights should return the modified neural net with appropriately updated weights.

def update_weights(net, input_values, desired_output, neuron_outputs, r=1):
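
A sketch consistent with the update rule w' = w + r * out_A * delta_B:

def update_weights(net, input_values, desired_output, neuron_outputs, r=1):
    deltas = calculate_deltas(net, desired_output, neuron_outputs)
    for wire in net.get_wires():
        # w' = w + r * (output of start node) * delta_B(end node)
        start_output = node_value(wire.startNode, input_values, neuron_outputs)
        wire.set_weight(wire.get_weight() + r * start_output * deltas[wire.endNode])
    return net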


Now you're ready to complete the back_prop function, which repeatedly updates weights in the neural net until the accuracy surpasses the accuracy threshold. back_prop should return a tuple containing:

  1. The modified neural net, with trained weights, and
  2. The number of iterations (that is, the number of times you batch-updated the weights)
def back_prop(net, input_values, desired_output, r=1, minimum_accuracy=-0.001):
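
A sketch of the training loop; note that it re-runs forward_prop after every weight update:

def back_prop(net, input_values, desired_output, r=1, minimum_accuracy=-0.001):
    iterations = 0
    actual_output, neuron_outputs = forward_prop(net, input_values, sigmoid)
    while accuracy(desired_output, actual_output) < minimum_accuracy:
        net = update_weights(net, input_values, desired_output, neuron_outputs, r)
        actual_output, neuron_outputs = forward_prop(net, input_values, sigmoid)
        iterations += 1
    return (net, iterations)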

Once you finish, you're all done writing code in this lab!

Training a Neural Net

In practice, we would want to use multiple training points to train a neural net, not just one. There are many possible implementations -- for instance, you could put all the training points into a queue and perform back propagation with each point in turn. Alternatively, you could use a multidimensional accuracy function and try to train with multiple training points simultaneously.

In training.py, we've provided code to train a neural net, written by past 6.034 students Joel Gustafson and Kenny Friedman. training.py imports functions that you wrote in lab6.py and generalizes the functions to consider multiple training points, instead of just one.

Example datasets

Here are six example datasets that you could use to train a 2-input neural net, all of which are defined in training.py. In these 2D graphs, each axis represents an input value, and + and - mark the two possible classifications:

1. A horizontally divided space ("horizontal")

4 - - - - -
3 - - - - -
2 - - - - -
1 + + + + +
0 + + + + +
  0 1 2 3 4

2. A diagonally divided space ("diagonal")

4 + + + + -
3 + + + - -
2 + + - - -
1 + - - - -
0 - - - - -
  0 1 2 3 4

3. A diagonal stripe ("stripe")

4 - - - - +
3 - - - + -
2 - - + - -
1 - + - - -
0 + - - - -
  0 1 2 3 4

4. This patchy checkerboard shape ("checkerboard")

4 - -   + +
3 - -   + +
2        
1 + +   - -
0 + +   - -
  0 1 2 3 4

5. The letter L ("letterL")

4 + -
3 + - 
2 + -
1 + - - - -
0 - + + + +
  0 1 2 3 4

6. This moat-like shape ("moat")

4 - - - - -
3 -       - 
2 -   +   -
1 -       -
0 - - - - -
  0 1 2 3 4

With the correct wiring, it's possible to train a neural net with 6 or fewer neurons to classify any one of the shapes.

What training.py does

  • Provides two fully connected neural net architectures: get_nn() returns a large [3, 2, 1] neural net, and get_small_nn() returns a smaller [2, 1] neural net.
  • Randomly initializes the weights in a neural net
  • Provides the six encoded training data sets illustrated above
  • Generalizes forward and backward propagation to train on arbitrary datasets of multiple training points by considering multiple data points in parallel.
  • Uses NumPy and Matplotlib (Python libraries) to display the neural net's output in realtime as a heatmap, with a spectrum of colors representing sigmoid output values from 0 (blue) to 1 (red).

How to run training.py

To run the code, you'll need the Python packages Matplotlib and NumPy. If you don't have them, see below. Once you have the packages, you can use either a terminal command line or an interactive Python prompt.

On the command line

Run python2 training.py with up to three optional arguments:

  • -net [large|small]: Selects which neural net configuration to train. small is a two-input, two-layer, three-neuron net, with shape [2, 1]. large is a two-input, three-layer, six-neuron net, with shape [3, 2, 1]. Default: large.
  • -resolution [POSITIVE_INT]: Sets the resolution of the dynamic heatmap -- a resolution of 1 will display a 5x5 grid, and a resolution of 10 will display a 50x50 grid on a 1:10 scale. Be aware that the time per simulation iteration grows quadratically with the resolution (the number of heatmap cells is proportional to its square), so resolutions of over 10 are not recommended. Default: 1.
  • -data [diagonal|horizontal|stripe|checkerboard|letterL|moat]: Selects the training dataset from the six shown above. Default: diagonal.

For example, to train the large [3, 2, 1] neural net on the checkerboard dataset, and display the progress with a resolution of 10 pixels per integer coordinate, run:

python2 training.py -data checkerboard -net large -resolution 10

from the command line. Any of the parameters that are omitted will be replaced with their default values. For example,

python2 training.py

will train the default large net on the diagonal dataset with resolution 1.

At a Python prompt (e.g. IDLE)

Run the file training.py, then call the function start_training() with up to three optional arguments:

  • net = ['small'|'large']: Selects which neural net configuration to train. 'small' is a two-input, two-layer, three-neuron net, with shape [2, 1]. 'large' is a two-input, three-layer, six-neuron net, with shape [3, 2, 1]. Default: 'large'.
  • resolution = POSITIVE_INT: Sets the resolution of the dynamic heatmap -- a resolution of 1 will display a 5x5 grid, and a resolution of 10 will display a 50x50 grid on a 1:10 scale. Be aware that the time per simulation iteration grows quadratically with the resolution (the number of heatmap cells is proportional to its square), so resolutions of over 10 are not recommended. Default: 1.
  • data = ['diagonal'|'horizontal'|'stripe'|'checkerboard'|'letterL'|'moat']: Selects the training dataset from the six shown above. Default: 'diagonal'.

For example, to train the large [3, 2, 1] neural net on the checkerboard dataset, and display the progress with a resolution of 10 pixels per integer coordinate, call:

start_training(data='checkerboard', net='large', resolution=10)

at the prompt. Any of the parameters that are omitted will be replaced with their default values. For example,

start_training()

will train the default large net on the diagonal dataset with resolution 1.

What if I don't have Matplotlib and NumPy?

If you don't have Matplotlib and NumPy installed, you can:

  • install them (just Google them)
  • install a stand-alone Python distribution that comes with the packages and won't interfere with your current Python installation (e.g. Anaconda or Python(x,y))
  • work with a friend, running the code on their computer
  • use an Athena cluster computer (the Athena version of Python 2 should include Matplotlib and NumPy)
  • use Athena locally via ssh -X (which enables Athena to display GUI windows, including colored plots, on your screen):
   ssh username@athena.dialup.mit.edu -X

Your task: Multiple-choice questions based on training.py

Questions 1-5: How many iterations?

For each of the combinations of neural nets and datasets below, try training the neural net on the dataset a few times. Then, fill in the appropriate ANSWER_n with an int to answer the question: When the neural net didn't get stuck, how many iterations did it generally take to train?

(The tester will check that your answer is in the right range of numbers, based on the repeated trials that we ran. Note that all the answers are expected to be under 200; if the neural net seems to be stuck or is taking more than 200 steps to train, feel free to abort by typing Ctrl+C or the equivalent keyboard interrupt.)

You may do this with any resolution; the resolution shouldn't affect the number of steps required for back prop to converge. Higher resolution makes it easier to see how the neural net is dividing up the space, but it also greatly increases the real time required for training because it has to perform forward propagation to compute the color output for each cell.

Question 1: small neural net, diagonal dataset

Question 2: medium neural net, diagonal dataset

Question 3: large neural net, diagonal dataset


Question 4: medium neural net, checkerboard dataset

Question 5: large neural net, checkerboard dataset

Questions 6-9: Identifying parameters

Suppose that after training for 200 iterations, the neural net heatmap looks like this: (todo insert image)

Question 6: What is the training resolution? (Fill in ANSWER_6 with an int.)

Question 7: Of the six datasets, which one is the neural net probably being trained on? (Fill in ANSWER_7 with a string.)

Question 8: Which neural net could be producing the heatmap? (Fill in ANSWER_8 with a list of one or more strings, choosing from 'small', 'medium', and 'large'. For example: ['small', 'large'])

Question 9: What is likely the state of the simulation? (Fill in ANSWER_9 with a one-letter string representing the one best answer, e.g. 'A'.)

A. Training is complete, and the data is fully classified.
B. Training is stuck at a local maximum.
C. The neural net is overfitting to the data.
D. The neural net is classifying three classes of points.

Questions 10-12: Conceptual Questions

Question 10: Why does the diagonal dataset generally take fewer iterations to train than the checkerboard dataset? (Fill in ANSWER_10 with a letter representing the one best answer.)

A. Diagonal lines are easier for neural nets to draw than horizontal or vertical lines.
B. The neural nets tended to overfit to the checkerboard data more than to the diagonal data.
C. The neural nets tended to underfit to the checkerboard data more than to the diagonal data.
D. The diagonal data requires fewer lines than the checkerboard data.
E. The diagonal data was more constrained than the checkerboard data.

Question 11: The large neural net generally requires fewer iterations to train than the small or medium neural nets. What are some reasons why it might not be the best choice for these datasets? (Fill in ANSWER_11 with a list of one-letter strings, representing all answers that apply.)

A. It might overfit the data because there are too many parameters.
B. It might underfit the data because it spends too few iterations training.
C. It takes more time to compute each iteration.
D. Because there are more parameters, it's more likely to get stuck on a local maximum.

Question 12: You may have noticed that the neural net often either converged on a solution quickly, or got stuck with a partial solution early on and had trouble escaping from the local maximum. Suppose you're training an arbitrary neural net on an arbitrary dataset, and it just got stuck at a local maximum. Which of the following changes would be likely to help it to not get stuck the next time? (Fill in ANSWER_12 with a list of one-letter strings, representing all answers that apply, e.g. ['A', 'B', 'C'].)

A. Restart training with different randomly initialized weights.
B. Re-train multiple times using the same initial weights.
C. Use the current weights as the initial weights, but restart training.
D. Use fewer neurons.
E. Use more neurons.

Survey

Please answer these questions at the bottom of your lab6.py file:

  • NAME: What is your name? (string)
  • COLLABORATORS: Other than 6.034 staff, whom did you work with on this lab? (string, or empty string if you worked alone)
  • HOW_MANY_HOURS_THIS_LAB_TOOK: Approximately how many hours did you spend on this lab? (number or string)
  • WHAT_I_FOUND_INTERESTING: Which parts of this lab, if any, did you find interesting? (string)
  • WHAT_I_FOUND_BORING: Which parts of this lab, if any, did you find boring or tedious? (string)
  • (optional) SUGGESTIONS: What specific changes would you recommend, if any, to improve this lab for future years? (string)


(We'd ask which parts you find confusing, but if you're confused you should really ask a TA.)

When you're done, run the online tester to submit your code.

API

Neural Nets

The file neural_net_api.py defines the Wire and NeuralNet classes, described below.

NeuralNet

A neural net is represented as a directed graph defined by a set of edges. In particular, the topology of the neural net is determined entirely by its edges, each of which specifies how two nodes are connected.

In our case, each edge of a neural net is a Wire object, and each node of a neural net is either

  • an input, representing a constant or variable value that's fed into the input layer of the neural net, or
  • a neuron, which conceptually takes in values via wires and amalgamates them into an output.

There is not a dedicated class for nodes in our implementation of neural nets. Instead, different types of nodes may be represented in different ways:

  • An input node is either represented by
    • a string denoting its variable name (e.g. "x" represents the variable x), if the input is a variable input, or
    • a raw number denoting its constant value (e.g. 2.5 represents the constant input 2.5), if the input is a constant input.
  • A neuron node is represented as a string denoting its name, e.g. "N1" or "AND-neuron". Note that these strings have no semantic meaning or association to the neuron's function or position in the neural net; the strings are only used as unique identifiers so that Wire objects (edges) know which neurons they are connected to.

As a consequence of how Wire objects store start and end nodes, no variable input node may have the same name as a neuron node.


A NeuralNet instance has the following attributes:

  • net.inputs, a list of named input nodes to the network.
  • net.neurons, a list of named neuron nodes in the network.

In this lab, input values (for non-constant inputs) are supplied to neural nets in the form of a dictionary input_values that associates each named (variable) input with an input value.


You can retrieve particular nodes (neurons or inputs) in a network:

  • net.get_incoming_neighbors(node). Return a list of the nodes which are connected as inputs to node.
  • net.get_outgoing_neighbors(node). Return a list of the nodes to which node sends its output.
  • net.get_output_neuron(). Return the output neuron of the network, which is the final neuron that computes a response. In this lab, each neural net has exactly one output neuron.
  • net.topological_sort(). Return a sorted list of all the neurons in the network. The list is "topologically" sorted, which means that each neuron appears in the list after all the neurons that provide its inputs. Thus, the input layer neurons are first, the output neuron is last, etc.


You can also retrieve the wires (edges) of a neural net:

  • net.get_wires(startNode=None, endNode=None). Return a list of all the wires in the network. If startNode or endNode are provided, returns only wires that start/end at the particular nodes.


Finally, you can query specific parts of the network:

  • net.is_output_neuron(neuron). Return True if the neuron is the final output neuron in the network, otherwise False.
  • net.is_connected(startNode, endNode). Return True if there is a wire from startNode to endNode in the network, otherwise False.


Wire

A Wire is represented as a weighted, directed edge in a graph. A wire can connect an input to a neuron or a neuron to another neuron. A Wire's attributes are:

  • wire.startNode, the input or neuron at which the wire starts. Recall that an input can be a string (e.g. "x") or a number (e.g. 2.5).
  • wire.endNode, the neuron at which the wire ends.

In addition, one can access and modify a Wire's weight:

  • wire.get_weight(), returns the weight of the wire.
  • wire.set_weight(new_weight), sets the weight of the wire and returns the new weight.
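
As an illustration, a session might look like this (the net variable, the node names 'x' and 'N1', and the weight values are all hypothetical):

>>> wire = net.get_wires('x', 'N1')[0]
>>> wire.startNode
'x'
>>> wire.endNode
'N1'
>>> wire.get_weight()
1.0
>>> wire.set_weight(2.5)
2.5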