Lab 1: Search

From 6.034 Wiki

(Difference between revisions)
Jump to: navigation, search
(Extending a path)
(Graphs for Testing)
Line 43: Line 43:
== Graphs for Testing ==
== Graphs for Testing ==
-
Here are some sample graphs that are used in the tester and which you can use to test and debug your functions.  The graphs have been imported into lab2.py by their names -- that is, if you want to access GRAPH_2, it's stored in the variable <tt>GRAPH_2</tt>.
+
Here are some sample graphs that are used in the tester and which you can use to test and debug your functions.  The graphs have been imported into <tt>lab2.py</tt> by their names -- that is, if you want to access GRAPH_2, it's stored in the variable <tt>GRAPH_2</tt>.
[http://web.mit.edu/jmn/www/6.034/lab2-graphs_0123.jpg GRAPH_0, GRAPH_1, GRAPH_2, GRAPH_3]
[http://web.mit.edu/jmn/www/6.034/lab2-graphs_0123.jpg GRAPH_0, GRAPH_1, GRAPH_2, GRAPH_3]
(todo: typeset graphs and add to wiki)
(todo: typeset graphs and add to wiki)
 +
 +
The graphs come from the files <tt>graph.txt</tt> and <tt>read_graphs.py</tt>, which you should not need to read.
== Python advice ==
== Python advice ==

Revision as of 16:21, 18 August 2015

Contents


Your answers for the problem set belong in the main file lab2.py.

API

Graphs and Edges

The file search.py contains definitions for graphs and edges. The relevant information is described here, so you probably won't need to read the file.

A graph is an object of type UndirectedGraph.

For an UndirectedGraph g, you can use the following attributes:

  • You can get a list of all nodes in the graph via the .nodes attribute.
  • You can get a list of all edges in the graph via the .edges attribute. todo

For an UndirectedGraph g, you can use the following methods:

  • g.get_neighbors(node1): Given a node node1, returns a list of all nodes that are directly connected to node1.
  • g.get_neighboring_edges(node1): Given a node node1, returns a list of all edges that have node1 as their startNode.
  • g.get_edge(node1, node2): Given two nodes, returns the edge that directly connects node1 to node2 (or None if there is no such edge)
  • g.is_neighbor(node1, node2): Given two nodes, returns True if there is an edge that connects them, or False if there is no such edge
  • g.get_heuristic_value(node1, goalNode): Given two nodes node1 and goalNode, returns the heuristic estimate of the distance from node1 to goalNode.

An edge is an object of type Edge. It has the following attributes:

  • .startNode: a node
  • .endNode: a node
  • .length: a positive number, or None if the edge has no length specified

For this lab, you may assume that edge weights will always be positive numbers (or None).

You can check whether two edges e1 and e2 are the same with e1 == e2.

Graphs for Testing

Here are some sample graphs that are used in the tester and which you can use to test and debug your functions. The graphs have been imported into lab2.py by their names -- that is, if you want to access GRAPH_2, it's stored in the variable GRAPH_2.

GRAPH_0, GRAPH_1, GRAPH_2, GRAPH_3

(todo: typeset graphs and add to wiki)

The graphs come from the files graph.txt and read_graphs.py, which you should not need to read.

Python advice

  • You will undoubtably need to sort Python lists during this lab (using either the in-place .sort method or sorted). Python has built-in sorting functionality, documented at <http://wiki.python.org/moin/HowTo/Sorting>.
  • You will need to know how to access elements in lists and dictionaries. For some portions of this lab, you may want to treat lists like either stacks or queues, as documented at <http://docs.python.org/tut/node7.html>. However, you should not import other modules (such as deque) for this purpose because they will confuse the tester.

Search

The Agenda

Different search techniques explore nodes in different orders, and we will keep track of the nodes remaining to explore in a list we will call the agenda (sometimes called the queue). Some search techniques will add paths to the front (or top) of the agenda, treating it like a stack, while others will add to the back (or end) of the agenda, treating it like a queue. Some agendas are organized by heuristic value, others are ordered by path distance, and others by depth in the search tree. Your job will be to show your knowledge of search techniques by implementing different types of search and making slight modifications to how the agenda is accessed and updated.

Extending a Path

In this lab, a path consists of a list of node names. When it comes time to extend a new path, a path is selected from the agenda. The last node in the path is identified as the node to be extended. The nodes that connect to the extended node -- the adjacent, or neighboring, nodes -- are the possible extensions to the path. Of the possible extensions, the following nodes are NOT added:

  • nodes that already appear in the path
  • nodes that have already been extended (iff an extended set is being used)

As an example, if node A is connected to nodes S, B, and C, then the path ['S', 'A'] is extended to the new paths ['S', 'A', 'B'] and ['S', 'A', 'C'] (but not ['S', 'A', 'S']).

The paths you create should be new objects. If you try to extend a path by modifying (or "mutating") the existing path -- for example by using list .append() -- you may find yourself in a quagmire.

Extended Set

An extended set -- sometimes called an "extended-nodes set", "extended list", "visited set", or "closed list" -- consists of nodes that have been extended. It lets the algorithm avoid extending the same node multiple times, sometimes significantly speeding up search. Some of the types of search you will be implementing use extended sets. Python offers other options for representing sets, which may help you avoid this issue. The main point is to check that nodes are not in the set before you extend them, and to put nodes into the extended-set when you do choose to extend them.

When to Exit a Search

A non-optimal search such as DFS, BFS, hill climbing and beam may exit either:

  • when it finds a path-to-goal in the agenda
  • when a path-to-goal is first removed from the agenda.

Optimal searches such as branch and bound and A* must always exit:

  • when a path-to-goal is first removed from the agenda.

(This is because the agenda is ordered by path length (or heuristic path length), so a path-to-goal is not necessarily the best when it is added to the agenda, but when it is removed, it is guaranteed to have the shortest path length (or heuristic path length).)

For the sake of consistency, you should implement all your searches to exit:

  • when a path-to-goal is first removed from the agenda.

Problems

Part 1: Helper Functions

Your first task is to write four helper functions which may be useful in Part 2:

def path_length(graph, path):

def has_loops(path):

def extensions(graph, path):

def sort_by_heuristic(graph, goalNode, nodes):

A path is a list of nodes where each neighboring pair is connected by an edge in the graph. By convention, the start of the path is at index [0] and the end of the path is at index [-1].

path_length(graph, path) should return the length of a path. You may assume that the path is a valid sequence of edge-connected nodes in the graph. If there is only one node in the path, your function should return 0.

has_loops(path) should return True if the path contains a loop, otherwise False. (Hint: One efficient solution involves using a set.)

extensions(graph, path) should take a path and return a list of possible one-node extensions of that path. That is, your function should return a list of all possible paths that can be created by adding one more node to the given path (as described in Extending a Path, above).

sort_by_heuristic(graph, goalNode, nodes) should takes a list of any nodes in the graph and sort them based on their heuristic value to the goalNode, from smallest to largest. The function should return the sorted list. (Hint: Use the Python keyword sorted with the "key" argument.)

Part 2: Search Algorithms

In this section, you will implement each of the search algorithms we study in 6.034. (Alternatively, you may skip to Part 3: Generic Search and complete Part 2 by calling generic_search with the arguments from Part 3.)

The inputs to each search function are:

  • graph: The graph, as an UndirectedGraph object
  • startNode: The node that you want to start traversing from
  • goalNode: The node that you want to reach

Each algorithm should return an appropriate path from startNode to goalNode, or None if no such path is found. A path should consist of a list of nodes, beginning with the start node and following existing edges to the goal node.

Blind Search

The first two types of search to implement are depth-first search and breadth-first search. You should allow backtracking.

def dfs(graph, startNode, goalNode):

def bfs(graph, startNode, goalNode):

For an explanation of DFS and BFS, refer to Chapter 4 of the textbook (pages 4-6 of the pdf, pages 65-67 of the textbook).

Heuristic Search

Next, you will implement three types of heuristic search.

Hill Climbing

Hill climbing is very similar to depth-first search. There is only a slight modification to the ordering of paths that are added to the agenda. Implement the following function:

def hill_climbing(graph, startNode, goalNode):

The hill-climbing procedure you define here should use backtracking, for consistency with the other methods, even though hill climbing typically is not implemented with backtracking. Hill climbing does not use an extended set.

For an explanation of hill climbing, refer to Chapter 4 of the textbook (page 8 of the pdf, page 70 of the textbook).

Best-First Search

Best-first search is similar to hill climbing in that it uses a heuristic, but best-first search always extends the "best" path, as determined by heuristic values to the goal. Like hill climbing and beam search, best-first search sorts paths by only the heuristic value, not the path length (as branch and bound does) or the path length + heuristic value (as A* does). Implement the following function:

def best_first(graph, startNode, goalNode):

For a brief explanation of best-first search, refer to Chapter 4 of the textbook (page 13 of the pdf, page 75 of the textbook).

Beam Search

Beam search is very similar to breadth-first search, but there is a modification to which paths are in the agenda. The agenda at any time can have up to w paths of each length n; w is also known as the beam width, and n corresponds to the level of the search graph. Beam search can be useful in situations where you want to reduce memory usage, especially when working with a large graph.

You will need to sort your paths by the graph's heuristic to ensure that only the best w paths at each level are in your agenda. You may want to use an array or dictionary to keep track of paths of different lengths. Beam search does NOT use an extended set, and does NOT use backtracking to the paths that are eliminated at each level.

Remember that w is the number of paths to keep at an entire level, not the number of paths to keep from each extended node.

Implement the following function:

def beam(graph, startNode, goalNode, beam_width):

Unlike the other search functions, beam has an additional argument, beam_width, which is the same as w, the number of paths to keep at each level.

For an explanation and an example of beam search, refer to Chapter 4 of the textbook (pages 13-14 of the pdf, pages 75-76 of the textbook). However, note that this example is incorrect according to the way we do beam search: At the third level, the node B (with heuristic value 6.7, at the end of path S-D-E-B) should not be included -- only C and F should be included (because the beam width is 2) -- so at the fourth level, B should not be extended to A and C.

Optimal Search

The search techniques you have implemented so far have not taken into account the edge lengths. Instead, we were just trying to find one possible solution of many. Optimal searches try to find the path with the shortest distance from the start node to the goal node. The search types that guarantee optimal solutions are branch and bound and A* (under certain conditions).

Branch and Bound

Using the path_length helper function, this part should be easy to complete. You might find the list procedure remove, and/or the Python 'del' keyword, useful (though not necessary). Implement basic branch and bound with no extended set:

def branch_and_bound(graph, startNode, goalNode):

For an explanation of branch-and-bound, refer to Chapter 5 of the textbook (page 3 of the pdf).

Branch and Bound Variations

Now you've used heuristic estimates to speed up search, and you've used edge lengths to compute optimal paths. If you add a heuristic to branch and bound, the path with the least {heuristic estimate + path length} is taken from the agenda to extend. Implement this variation of branch and bound (using a heuristic but no extended set):

def branch_and_bound_with_heuristic(graph, startNode, goalNode):

Next, implement a variation on branch and bound that uses an extended set (but no heuristic):

def branch_and_bound_with_extended_set(graph, startNode, goalNode):

A*

A* is simply branch and bound using both a heuristic and an extended set. (Note: If the heuristic is not consistent, then using an extended-set can sometimes prevent A* from finding an optimal solution.) Implement A*:

def a_star(graph, startNode, goalNode):

Part 3: Generic Search

generic_search is a generic search algorithm that has already been written for you:

def generic_search(sort_new_paths_fn, add_paths_to_front_of_agenda, sort_agenda_fn, use_extended_set):

Arguments for generic_search

generic_search takes in two functions and two boolean values as arguments, and it returns a function representing a search algorithm. By defining appropriate values for the arguments, you can recreate any of the search algorithms we study in 6.034.

Here's an explanation of each of the arguments in generic_search:

  • When you extend a node, you create a list of new paths. Some algorithms will sort these paths before adding them to the agenda. The argument sort_new_paths_fn is a function that takes the list of new paths and performs the appropriate sorting routine; it returns the sorted list of new paths. If you don't want any sorting to happen, you can pass a function that simply returns the list of paths without changing it. The function do_nothing_fn accomplishes this.
  • Some algorithms will add new paths to the front of the agenda (like a stack). Others will add new paths to the back of the agenda (like a queue). When the argument add_paths_to_front_of_agenda is True, paths will get added to the front of the agenda. Otherwise, paths will get added to the back.
  • After new paths are added to the agenda, some algorithms will then sort the entire agenda. The argument sort_agenda_fn takes the agenda as an argument, and performs the appropriate sorting routine; it returns a sorted agenda. If you don't want any sorting to happen, you can pass a function that simply returns the list of paths without changing them (do_nothing_fn). Note that if you do sort the agenda, then the previous two arguments become irrelevant (it no longer matters whether you added new paths to the front, or whether you sorted them before adding them to the agenda).
  • Some algorithms use an extended set. Set use_extended_set to True to maintain an extended set, else set it to False.

Examples for using generic_search

To create a search algorithm that does not sort new paths, adds paths to the front of the agenda, does not sort the agenda, and does not use an extended set, you can call generic_search like this:

my_search_fn = generic_search(do_nothing_fn, True, do_nothing_fn, False)

You can then call your search algorithm on a graph:

my_path = my_search_fn(graph, startNode, goalNode)

You can also call generic_search directly on a list of arguments (such as generic_dfs) like this:

path = generic_search(*generic_dfs)(graph, startNode, goalNode)

For beam search, the search algorithm produced by generic_search will take an additional argument, beam_width. For example:

my_beam_fn = generic_search(...) # where "..." represents your four arguments for beam search

my_beam_path = my_beam_fn(graph, startNode, goalNode, beam_width)

If you want to test your search algorithm on a provided graph, you can use one of the graphs shown in Graphs for Testing, above. For example:

path = generic_search(*generic_dfs)(graph1, 'S', 'G')

Your Task

Please implement the following search algorithms by designing the right arguments to pass to the generic search algorithm. Your answer to each should be an ordered list of the appropriate four arguments to generic_search. No argument should be None.

generic_dfs = [None, None, None, None]
generic_bfs = [None, None, None, None]
generic_hill_climbing = [None, None, None, None]
generic_best_first = [None, None, None, None]
generic_beam = [None, None, None, None]
generic_branch_and_bound = [None, None, None, None]
generic_branch_and_bound_with_heuristic = [None, None, None, None]
generic_branch_and_bound_with_extended_set = [None, None, None, None]
generic_a_star = [None, None, None, None]

You will need to write your own path-sorting functions. Each sorting function should take in a graph, goalNode, and list of paths, and then return a list of paths. For example: sorted_paths = my_sorting_fn(graph, goalNode, paths)

The only exception is the sort_agenda_fn for beam search, which should take a fourth argument, beam_width. For example: sorted_beam_agenda = my_beam_sorting_fn(graph, goalNode, paths, beam_width)

Break ties lexicographically. (For example, the path ['a', 'c', 'f'] comes before the path ['a', 'd', 'b'], because 'c' comes before 'd'.) We recommend that you avoid modifying the original list of paths in your sorting function.

Hint for beam search: You can implement beam search in a way almost identical to bfs. The only difference is that beam search will need to limit the number of paths whenever an entire level has been expanded. You can do this by defining a sort_agenda_fn function that checks whether an entire level has been extended. If it has, the function should limit the number of paths in the agenda. If it hasn't, the function should do nothing to the agenda. To tell whether an entire level has been expanded, check whether all paths in the agenda have the same number of nodes. Use the variable beam_width to refer to the limiting size of the agenda.

As taught in 6.034, depth-first search and breadth-first search use backtracking but do not use an extended set. Refer to Part 2 above for more details about individual search methods.

Part 4: Counting Extensions

generic_search_counter is identical to the generic_search algorithm used in Part 3. Your task is to slightly modify the search_algorithm code in generic_search_counter so that instead of returning a path to the goal, the search algorithm returns the number of extensions it had to make.

By convention, please include the final extension of the goal node, if any, in your extension count.

Hint: One possible solution involves replacing the extended set with a list.

Once you have made your changes, remove this line from generic_search_counter so that the tester can run your code:

    raise NotImplementedError # TODO: REMOVE THIS!

Part 5: Heuristics

Admissibility and Consistency

A heuristic value gives an approximation from a node to a goal. You've learned that in order for the heuristic to be admissible, the heuristic value for every node in a graph must be less than or equal to the distance of the shortest path from the goal to that node. In order for a heuristic to be consistent, for each edge in the graph, the edge length must be greater than or equal to the absolute value of the difference between the two heuristic values of its nodes.

For this part, complete the following functions, which return True iff the heuristics for the given goal node are admissible or consistent, respectively, and False otherwise:

def is_admissible(graph, goalNode):

def is_consistent(graph, goalNode):

Hint: For is_consistent, it's sufficient to check only neighboring nodes, rather than checking all pairs of nodes in the graph.

Picking a Heuristic

The next questions use the following graph:

GRAPH_FOR_HEURISTICS

(todo: typeset graph and add to wiki)

For heuristic_1, pick heuristic values so that, for the goal node G, the heuristic is both admissible and consistent. Fill in the values by replacing the list of five None's with five numbers:

[h1_S, h1_A, h1_B, h1_C, h1_G] = [None, None, None, None, None]

For heuristic_2, pick heuristic values so that, for the goal node G, the heuristic is admissible but NOT consistent.

For heuristic_3, pick heuristic values so that, for the goal node G, the heuristic is admissible but A* returns a non-optimal (not shortest) path to the goal (S-A-C-G).

For heuristic_4, pick heuristic values so that, for the goal node G, the heuristic is admissible but NOT consistent, yet A* still finds the optimal (shortest) path to the goal (S-B-C-G).

Euclidean Heuristic

Search can be applied both to abstract problems like finding recipes, and to concrete problems like finding a route on a map. When you are finding a route on a map, you may have nodes that look like the following, where each node has spatial coordinates:

node_F = {'coordinates' : [2.5, 3.5], 'label' : 'F'}

Whenever you are using spatial coordinates, "Euclidean distance to the goal" can be a useful heuristic. Fill in the following function, which takes two nodes with spatial coordinates as input, and returns the Euclidean distance between them:

def euclidean_heuristic(startNode, goalNode):

You can test your heuristic function with the sample graph below by changing the boolean flag "if False:" to "if True:"

euclidean_graph

(todo: typeset graph and add to wiki)

Survey

Please answer the following questions:

NAME: What is your name? (string)

COLLABORATORS: Other than 6.034 staff, whom did you work with on this lab? (string, or empty string if you worked alone)

HOW_MANY_HOURS_THIS_LAB_TOOK: Approximately how many hours did you spend on this lab? (number)

WHAT_I_FOUND_INTERESTING: Which parts of this lab, if any, did you find interesting? (string, or empty string if none)

WHAT_I_FOUND_BORING = Which parts of this lab, if any, did you find boring or tedious? (string, or empty string if none)

(optional) SUGGESTIONS = What specific changes would you recommend, if any, to improve this lab for future years? (string)


Personal tools