Lab 4 2014

From 6.034 Wiki

(Difference between revisions)
Jump to: navigation, search
Line 3: Line 3:
This problem set is due Friday, November 5th. If you have questions about it, ask the list ''6034tas@csail.mit.edu''.
This problem set is due Friday, November 5th. If you have questions about it, ask the list ''6034tas@csail.mit.edu''.
-
<!-- commented out until release FT2010 -kp
+
<!--commented out until release FT2010 -kp
To work on this problem set, you will need to get the code:
To work on this problem set, you will need to get the code:
Line 10: Line 10:
* Or, on Athena, <tt>attach 6.034</tt> and copy it from <tt>/mit/6.034/www/labs/lab4/</tt>.
* Or, on Athena, <tt>attach 6.034</tt> and copy it from <tt>/mit/6.034/www/labs/lab4/</tt>.
-
This lab has two parts, the first part is on CSPs and the second Part is on Learning algorithms: KNN and decision trees.
+
This lab has two parts, the first part is on CSPs and the second part is on learning algorithms, specifically KNN and decision trees.
== Constraint Satisfaction Problems ==
== Constraint Satisfaction Problems ==
-
Your task here is simple.  Complete the implementation of a simple constraint satisfaction problem solver! Then test it on
+
This portion of Lab 4 is the implementation of a constraint satisfaction problem solver, using the algorithms we've learned in class. You even have the opportunity to test it on problems you've seen worked out in class.
-
problems you've seen worked out in the class. I could just imagine your excitement and anticipation already!
+
-
We have provided you a basic CSP implementation in <tt>csp.py</tt>.  Your job is to use your understanding of the algorithms you've learned to complete the implementation.
+
We have provided you a basic CSP implementation in <tt>csp.py</tt>.
-
You will need to fill in
+
You will need to complete:
<pre>
<pre>
forward_checking(state):
forward_checking(state):
Line 27: Line 26:
forward_checking_prop_singleton(state):
forward_checking_prop_singleton(state):
</pre>
</pre>
-
in <tt>lab4.py</tt>.  <tt>state</tt> is an instance of <tt>CSPState</tt> an object that keep track of the current variable assignments and domains.
+
in <tt>lab4.py</tt>.  <tt>state</tt> is an instance of <tt>CSPState</tt> an object that keep track of the current variable assignments and domains.  Both these functions should return False at the point at which the Domain Reduction Algorithm would backtrack, and True otherwise.
-
For review here is roughly a sketch of how these two constraint checking algorithms work.
+
For review, here is (unrefined) pseudocode for the two algorithms.
=== Forward Checking ===
=== Forward Checking ===
# Let X be the variable currently being assigned.
# Let X be the variable currently being assigned.
# Let x be the value being assigned to X.
# Let x be the value being assigned to X.
-
# Finds all the binary constraints that are associated with X.
+
# Find all the binary constraints that are associated with X.
-
# For each constraint
+
# For each constraint:
-
 
+
## For each neighbor variable, Y, connected to X by a binary constraint.
## For each neighbor variable, Y, connected to X by a binary constraint.
### For each variable value y in Y's domain
### For each variable value y in Y's domain
#### If constraint checking fails for X=x and Y=y
#### If constraint checking fails for X=x and Y=y
##### Remove y from Y's domain
##### Remove y from Y's domain
-
#### If the domain of Y is reduced down to the empty set, then the entire check fails return False.
+
#### If the domain of Y is reduced down to the empty set, then the entire check fails: return False.
-
# If all constraints passed declare success and return True
+
# If all constraints passed declare success, return True
=== Forward Checking with Propagation through Singletons ===
=== Forward Checking with Propagation through Singletons ===
Line 48: Line 46:
# Find variables with domains of size 1.
# Find variables with domains of size 1.
# Build a queue of singleton variables.
# Build a queue of singleton variables.
-
# For each singleton variable in the queue
+
# For each singleton variable in the queue:
-
## Finds all the binary constraints that singleton X is associated with.
+
## Find all the binary constraints that singleton X is associated with.
-
## For each constraint
+
## For each constraint therein:
-
### For each neighbor variable, Y, connected to X by a binary constraint.
+
### For each neighbor variable, Y, connected to X by a binary constraint:
-
#### For each value of y in Y's domain
+
#### For each value of y in Y's domain:
-
##### If constraint check fails for X = (X's singleton value) and Y = y
+
##### If constraint check fails for X = (X's singleton value) and Y = y:
-
###### Remove y from the domain of Y for values
+
###### Remove y from Y's domain
-
##### If the domain of Y is reduced down to the empty set, then the entire check fails return False.
+
##### If the domain of Y is reduced down to the empty set, then the entire check fails, return False.
-
## Check if new unvisited singletons are introduced; if so add them to your queue.
+
## Check to see if domain reduction produced new, unvisited singletons; if so, add them to your queue.
-
## Stop when there are no more singletons.
+
## Discontinue when all singleton domains produced have been evaluated, and if no domains have been reduced to the empty set, return True.
-
=== Useful Functions ===
+
=== API ===
-
Here are some useful functions defined in <tt>csp.py</tt>
+
These are some useful functions defined in <tt>csp.py</tt>
that you should use in your code to implement the above algorithms:
that you should use in your code to implement the above algorithms:
-
<tt>Variable</tt>: representation of a variable in the problem
+
<tt>CSPState</tt>: one of the many possible search states (nodes) in the CSP problem.
 +
* <tt>get_current_variable()</tt> - gets the name of the variable being currently assigned.  Returns None if this is the initial variable assignment.
 +
* <tt>get_constraints_by_name(variable_name)</tt> - retrieves all the BinaryConstraint objects associated with <tt>variable_name</tt>.
 +
* <tt>get_variable_by_name(variable_name)</tt> - retrieves the Variable object associated with <tt>variable_name</tt>.
 +
* <tt>get_all_variables()</tt> - gets the list of all Variable objects in this CSP problem.
 +
 
 +
<tt>Variable</tt>: representation of a variable in these problems.
* <tt>get_name()</tt> - returns the name of this variable.
* <tt>get_name()</tt> - returns the name of this variable.
-
* <tt>is_assigned()</tt> - returns true if we've made an assignment for this variable.
+
* <tt>is_assigned()</tt> - returns True if we've made an assignment for this variable.
-
* <tt>get_domain()</tt> - returns a copy of the current domain of this variable.  Use this to iterate over values of Y.
+
* <tt>get_domain()</tt> - returns a copy of the list of the current domain of this variable.  Use this to iterate over values of Y.
* <tt>reduce_domain(value)</tt> - remove <tt>value</tt> from this variable's domain.
* <tt>reduce_domain(value)</tt> - remove <tt>value</tt> from this variable's domain.
* <tt>domain_size()</tt> - returns the size of this variable's domain
* <tt>domain_size()</tt> - returns the size of this variable's domain
-
 
-
<tt>CSPState</tt>: one of the many possible search states (nodes) in the CSP problem
 
-
* <tt>get_current_variable()</tt> - gets the name of the variable being currently assigned.
 
-
* <tt>get_constraints_by_name(variable_name)</tt> - retrieves all the BinaryConstraint objects associated with <tt>variable_name</tt>.
 
-
* <tt>get_variable_by_name(variable_name)</tt> - retrieves the Variable object associated with <tt>variable_name</tt>
 
-
* <tt>get_all_variables()</tt> - gets the list of all Variable objects in this CSP problem.
 
<tt>BinaryConstraint</tt>:  a binary constraint on variable i, j:  i -> j.
<tt>BinaryConstraint</tt>:  a binary constraint on variable i, j:  i -> j.
* <tt>get_variable_i_name()</tt> - name of the i variable
* <tt>get_variable_i_name()</tt> - name of the i variable
* <tt>get_variable_j_name()</tt> - name of the j variable
* <tt>get_variable_j_name()</tt> - name of the j variable
-
* <tt>check(state, value_i=value, value_j=value)</tt> - checks the binary constraint for a given CSP state, with variable i set by value i, and variable j set by value j.  Returns false if the constraint fails.
+
* <tt>check(state, value_i=value, value_j=value)</tt> - checks the binary constraint for a given CSP state, with variable i set by value i, and variable j set by value j.  Returns False if the constraint fails.
=== Testing ===
=== Testing ===
Line 99: Line 97:
</pre>
</pre>
-
Should return the expected search trees for the B,Y,R, state coloring problem from the 2006 Quiz 2.
+
Should return the expected search trees for the B,Y,R, state coloring problem from the 2nd Quiz in 2006.
== LEARNING ==
== LEARNING ==
Line 107: Line 105:
== Classifying Congress ==
== Classifying Congress ==
-
During Obama's visit to MIT you got a chance to impress him with your analytical thinking.  Now he has hired you to do some political modeling for him.  He seems to surround himself with smart people that way.
+
During Obama's visit to MIT, you got a chance to impress him with your analytical thinking.  Now, he has hired you to do some political modeling for him.  He seems to surround himself with smart people that way.
He takes a moment out of his busy day to explain what you need to do. "I need a better way to tell which of my plans are going to be supported by Congress," he explains. "Do you think we can get a model of Democrats and Republicans in Congress, and which votes separate them the most?"
He takes a moment out of his busy day to explain what you need to do. "I need a better way to tell which of my plans are going to be supported by Congress," he explains. "Do you think we can get a model of Democrats and Republicans in Congress, and which votes separate them the most?"
-
"Yes we can!", you answer.
+
"Yes, we can!" You answer.
-
== The data ==
+
== The Data ==
You acquire the data on how everyone in the previous Senate and House of Representatives voted on every issue. (These data are available in machine-readable form via <tt>voteview.com</tt>. We've included it in the lab directory, in the files beginning with <tt>H110</tt> and <tt>S110</tt>.)
You acquire the data on how everyone in the previous Senate and House of Representatives voted on every issue. (These data are available in machine-readable form via <tt>voteview.com</tt>. We've included it in the lab directory, in the files beginning with <tt>H110</tt> and <tt>S110</tt>.)
Line 128: Line 126:
The lab file reads in the provided data, storing them in the variables <tt>senate_people</tt>, <tt>senate_votes</tt>, <tt>house_people</tt>, and <tt>house_votes</tt>.
The lab file reads in the provided data, storing them in the variables <tt>senate_people</tt>, <tt>senate_votes</tt>, <tt>house_people</tt>, and <tt>house_votes</tt>.
-
== Nearest neighbors ==
+
== Nearest Neighbors ==
You decide to start by making a nearest-neighbors classifier that can tell Democrats apart from Republicans in the Senate.
You decide to start by making a nearest-neighbors classifier that can tell Democrats apart from Republicans in the Senate.
We've provided a <tt>nearest_neighbors</tt> function that classifies data based on training data and a distance function. In particular, this is a third-order function:
We've provided a <tt>nearest_neighbors</tt> function that classifies data based on training data and a distance function. In particular, this is a third-order function:
-
* First you call <tt>nearest_neighbors(distance, k)</tt>, with <tt>distance</tt> being the distance function you wish to use and <tt>k</tt> being the number of neighbors to check. This returns a ''classifier factory''.
+
* First, call <tt>nearest_neighbors(distance, k)</tt>, with <tt>distance</tt> being the distance function you wish to use and <tt>k</tt> being the number of neighbors to check. This returns a ''classifier factory''.
* A classifier factory is a function that makes classifiers. You call it with some training data as an argument, and it returns a classifier.
* A classifier factory is a function that makes classifiers. You call it with some training data as an argument, and it returns a classifier.
* Finally, you call the classifier with a data point (here, a Congressperson) and it returns the classification as a string.
* Finally, you call the classifier with a data point (here, a Congressperson) and it returns the classification as a string.
Line 148: Line 146:
Examine the results of this evaluation. In addition to the problems caused by independents, it's classifying Senator Johnson from South Dakota as a Republican instead of a Democrat, mainly because he missed a lot of votes while he was being treated for cancer. This is a problem with the distance function -- when one Senator votes yes and another is absent, that is less of a "disagreement" than when one votes yes and the other votes no.
Examine the results of this evaluation. In addition to the problems caused by independents, it's classifying Senator Johnson from South Dakota as a Republican instead of a Democrat, mainly because he missed a lot of votes while he was being treated for cancer. This is a problem with the distance function -- when one Senator votes yes and another is absent, that is less of a "disagreement" than when one votes yes and the other votes no.
-
You should fix this. Euclidean distance is a reasonable measure for the distance between lists of discrete numeric features. Recall that the formula for Euclidean distance is:
+
You should address this. Euclidean distance is a reasonable measure for the distance between lists of discrete numeric features, and is the alternative to Hamming distance that you decide to try. Recall that the formula for Euclidean distance is:
''[(x1 - y1)^2 + (x2 - y2)^2 + ... + (xn - yn)^2] ^ (1/2)''
''[(x1 - y1)^2 + (x2 - y2)^2 + ... + (xn - yn)^2] ^ (1/2)''
Line 167: Line 165:
* Did this legislator vote NO on this vote, or not?
* Did this legislator vote NO on this vote, or not?
-
These are different because it is possible for a legislator to abstain or be absent.
+
(These are different because it is possible for a legislator to abstain or be absent.)
You can also use <tt>CongressIDTree</tt> directly to make an ID tree over the entire data set.
You can also use <tt>CongressIDTree</tt> directly to make an ID tree over the entire data set.

Revision as of 20:02, 23 October 2010


This problem set is due Friday, November 5th. If you have questions about it, ask the list 6034tas@csail.mit.edu.