From 6.034 Wiki
This problem set will be due on Thursday, October 2, at 11:59pm.
To work on this problem set, you will need to get the code, much like you did for the first two problem sets.
- You can view it at: http://web.mit.edu/6.034/www/labs/lab2/
- Download it as a ZIP file: http://web.mit.edu/6.034/www/labs/lab2/lab2.zip
- Or, on Athena, add 6.034 and copy it from /mit/6.034/www/labs/lab2/.
Your answers for the problem set belong in the main file lab2.py.
How our system implements graphs, nodes, edges, and heuristics
The file search.py contains definitions for graphs, nodes, edges, and heuristics. You probably won't need to read it, because the relevant information is described below.
A graph is an object of type Graph.
- You can get a list of all nodes in the graph via the .nodes attribute.
- You can get a list of all edges in the graph via the .edges attribute.
- You can get a list of the neighbors of a node via graph.get_connected_nodes(node) : Given a node, it returns a list of all nodes that are directly connected to it.
- You can get an object representing the edge between two nodes via graph.get_edge(node1, node2) : Given two nodes, it returns the edge that directly connects them (or None if there is no such edge).
- You can get a heuristic estimate of distance using graph.get_heuristic(node1, node2). This is implemented as follows: Each graph contains a dictionary of estimated distances between its nodes. Given two nodes node1 and node2, the method graph.get_heuristic(node1, node2) looks up and returns the estimated distance between those two nodes (or returns 0, if no estimate is found).
A node is just a string — the name of the node.
An edge is an object of type Edge. It has attributes .name (a string), .length (a number), .node1 (a string, the node at one end of the edge), and .node2 (a string, the node at the other end of the edge).
The heuristic estimates are handled by the Graph object; see above.
The following methods are not as essential as the ones already mentioned, but you may still find them helpful:
- graph.are_connected(node1, node2): Return True iff there is an edge running directly between node1 and node2; False otherwise.
- graph.is_valid_path(path): Given 'path' as an ordered list of node names, return True iff there is an edge between every two adjacent nodes in the list, False otherwise.
- You will undoubtably need to sort Python lists during this lab (using either the in-place .sort method or sorted). Python has built-in sorting functionality, documented at <http://wiki.python.org/moin/HowTo/Sorting>. If you read that document, note that Solaris-Athena computers (the purple Athena computers, not the black ones) might still have an older version of Python prior to v2.4 installed.
- You will need to know how to access elements in lists and dictionaries. For some portions of this lab, you may want to treat lists like either stacks or queues, as documented at <http://docs.python.org/tut/node7.html>. However, you should not import other modules (such as deque) for this purpose because they will confuse the tester.
Different search techniques explore nodes in different orders, and we will keep track of the nodes remaining to explore in a list we will call the agenda (in class we called this the queue). Some techniques will add paths to the top of the agenda, treating it like a stack, while others will add to the back of the agenda, treating it like a queue. Some agendas are organized by heuristic value, others are ordered by path distance, and others by depth in the search tree. Your job will be to show your knowledge of search techniques by implementing different types of search and making slight modifications to how the agenda is accessed and updated.
Extending a path in the agenda
In this problem set, a path consists of a list of node names. When it comes time to extend a new path, a path is selected from the agenda. The last node in the path is identified as the node to be extended. The nodes that connect to the extended node, the adjacent nodes, are the possible extensions to the path. Of the possible extensions, the following nodes are NOT added:
- nodes that already appear in the path.
- nodes that have already been extended (if an extended-nodes set is being used.)
As an example, if node A is connected to nodes S, B, and C, then the path ['S', 'A'] is extended to the new paths ['S', 'A', 'B'] and ['S', 'A', 'C'] (but not ['S', 'A', 'S']).
The paths you create should be new objects. If you try to extend a path by modifying (or "mutating") the existing path, for example by using list .append(), you will probably find yourself in a world of hurt.
The Extended-Nodes Set
An extended-set, sometimes called an "extended list" or "visited set" or "closed list", consists of nodes that have been extended, and lets the algorithm avoid extending the same node multiple times, sometimes significantly speeding up search. You will be implementing types of search that use extended-sets. Note that an extended-nodes set is a set, so if, e.g., you're using a list to represent it, then be careful that a maximum of one of each node name should appear in it. Python offers other options for representing sets, which may help you avoid this issue. The main point is to check that nodes are not in the set before you extend them, and to put nodes into the extended-set when you do choose to extend them.
Returning a Search Result
A search result is a path which should consist of a list of node names, ordered from the start node, following existing edges, to the goal node. If no path is found, the search should return an empty list, .
Exiting the search
Non-optimal searches such as DFS, BFS, Hill-Climbing and Beam may exit either:
- when it finds a path-to-goal in the agenda
- when a path-to-goal is first removed from the agenda.
Optimal searches such as branch and bound and A* must always exit:
- When a path-to-goal is first removed from the agenda.
(This is because the agenda is ordered by path length (or heuristic path length), so a path-to-goal is not necessarily the best when it is added to the agenda, but when it is removed, it is guaranteed to have the shortest path length (or heuristic path length).)
For the sake of consistency, you should implement all your searches to exit:
- When a path-to-goal is first removed from the agenda.
This section contains the first graded questions for the problem set. The questions are located in lab2.py and let you check your knowledge of different types of search. You should, of course, try to answer them before checking the answers in the tester.
Optional Warm-up: Breadth-first Search and Depth-first Search
This section is optional. You should do it if you want to get a better understanding of the basics of search.
The first two types of search to implement are breadth-first search and depth-first search. You should allow backtracking for the search.
Your task is to implement the following functions:
def bfs(graph, start, goal): def dfs(graph, start, goal):
The inputs to the functions are:
- graph: The graph
- start: The name of the node that you want to start traversing from
- goal: The name of the node that you want to reach
When a path to the goal node has been found, return the result as explained in the section Returning a Search Result (above).
Hill climbing is very similar to depth first search. There is only a slight modification to the ordering of paths that are added to the agenda. For this part, implement the following function:
def hill_climbing(graph, start, goal):
The hill-climbing procedure you define here should use backtracking, for consistency with the other methods, even though hill-climbing typically is not implemented with backtracking. Hill-climbing does not use an extended-set.
For an explanation of hill-climbing, refer to Chapter 4 of the textbook (page 8 of the pdf, page 70 of the textbook).
Beam search is very similar to breadth first search, but there is a modification to what paths are in the agenda. The agenda at any time can have up to k paths of each length n; k is also known as the beam width, and n corresponds to the level of the search graph. You will need to sort your paths by the graph heuristic to ensure that only the best k paths at each level are in your agenda. You may want to use an array or dictionary to keep track of paths of different lengths. Beam search does NOT use an extended-set, and does NOT use backtracking to the paths that are eliminated at each level.
Remember that k is the number of paths to keep at an entire level, not the number of paths to keep from each extended node.
Also remember that beam search sorts paths by just the heuristic value, not the path length (as branch-and-bound does) or the path length + heuristic value (as A* does).
For this part, implement the following function:
def beam_search(graph, start, goal, beam_size):
For an explanation and an example of beam search, refer to Chapter 4 of the textbook (pages 13-14 of the pdf). However, this example is incorrect according to the way we do beam search: at the third level, the node B (with heuristic value 6.7) should not be included, only C and F should be included (since the beam width is 2), so at the fourth level, B should not be extended to A and C.
The search techniques you have implemented so far have not taken into account the edge distances. Instead we were just trying to find one possible solution of many. This part of the problem set involves finding the path with the shortest distance from the start node to the goal node. The search types that guarantee optimal solutions are branch and bound and A*.
Since this type of problem requires knowledge of the length of a path, implement the following function that computes the length of a path:
def path_length(graph, node_names):
The function takes in a graph and a list of node names that make up a path in that graph, and it computes the length of that path, according to the "LENGTH" values for each relevant edge. You can assume the path is valid (there are edges between each node in the graph), so you do not need to test to make sure there is actually an edge between consecutive nodes in the path. If there is only one node in the path, your function should return 0.
Branch and Bound
Now that you have a way to measure path distance, this part should be easy to complete. You might find the list procedure remove, and/or the Python 'del' keyword, useful (though not necessary). For this part, complete the following:
def branch_and_bound(graph, start, goal):
Branch-and-bound does not use an extended-set.
For an explanation of branch-and-bound, refer to Chapter 5 of the textbook (page 3 of the pdf).
You're almost there! You've used heuristic estimates to speed up search and edge lengths to compute optimal paths. Let's combine the two ideas now to get the A* search method. In A*, the path with the least (heuristic estimate + path length) is taken from the agenda to extend. A* always uses an extended-set -- make sure to use one. (Note: If the heuristic is not consistent, then using an extended-set can sometimes prevent A* from finding an optimal solution.)
def a_star(graph, start, goal):
A heuristic value gives an approximation from a node to a goal. You've learned that in order for the heuristic to be admissible, the heuristic value for every node in a graph must be less than or equal to the distance of the shortest path from the goal to that node. In order for a heuristic to be consistent, for each edge in the graph, the edge length must be greater than or equal to the absolute value of the difference between the two heuristic values of its nodes.
In lecture and tutorials, you've seen examples of graphs that have admissible heuristics that were not consistent. Have you seen graphs with consistent heuristics that were not admissible? Why is this so? For this part, complete the following functions, which return True iff the heuristics for the given goal are admissible or consistent, respectively, and False otherwise:
def is_admissible(graph, goal): def is_consistent(graph, goal):
Please answer these questions at the bottom of your lab2.py file:
- How many hours did this problem set take?
- Which parts of this problem set, if any, did you find interesting?
- Which parts of this problem set, if any, did you find boring or tedious?
(We'd ask which parts you find confusing, but if you're confused you should really ask a TA.)
Complaint: All of the search functions we have to implement are so similar.
Response: They're similar, but you should understand in detail how each one works. If you want, you can try implementing a single search function (like the flow-chart shown in lecture), and then implementing each of the individual searches as a call to this function with different parameters.
Complaint: Most of these searches are non-optimal and not even used in practice.
Response: The goal is to teach you the theory behind search, not just the best way to do it.
Complaint: This lab is tedious to debug.
Response: Make sure to read the problem description thoroughly, and feel free to ask the TAs for help.
If you find what you think is an error in the problem set, tell email@example.com about it.
Q: For SAQG, my implementation of depth-first search finds the path SG, but the tester only accepts SAG and SQG.
A: This is based on some assumptions about the order in which you're traversing the search tree. If you use the graph.get_connected_nodes function, you'll end up getting them in the right order.
Q: For AGRAPH, my implementation of A* finds the path SBACG (which has a length of 13), but the tester says it should be SACG (which has a length of 14). Isn't A* always supposed to find the shortest path?
A: Not necessarily. Make sure you're using your extended-list correctly.
Q: One of the A* tests is failing because it times out after 1 minute.
A: Try submitting. The local tests run slower on some machines, which can cause them to time out. If the test also fails when you submit online, you'll need to make your A* implementation more efficient.