5 contestants participated in the First Planning Systems Competition:
In the second round, IPP ran with the RIFO meta strategy doing the following: If more than 3500 ground instantiated actions and 35 objects are contained in the domain, RIFO is activated with its most stringent pruning strategy (corresponding to IPP switches -rr 1 -rm 4 -rl 3). If the planner cannot solve the planning problem within the pruned search space, the space is enlarged (corresponding to RIFO switches -rr 1 -rm 4 -rl 1). If the planning problem is still unsolvable, IPP tries to find a solution in the original search space. To prune search spaces by hand edit file ipp.c, set DO_META to FALSE, and follow instructions on RIFO options from the RIFO homepage.
Note that several planners used a termination test and were able to prove problems from the mystery and mprime test sets as unsolvable. These were not counted as solved in the competition, i.e. solved only refers to generated plans for solvable problems. IPP could prove several problems as unsolvable, which we verified by repetition of the tests (they cannot be extracted from the competition output). We plan to add some information about the search space size and number of possible actions for each problem.
Av. time is the total time that was spent to generate all plans divided by the number of plans that was found. Fastest means that no other planner found the same plan in shorter time and shortest means that no other planner found a shorter plan, but there can be planners that found a plan of equal length. In this case, all planners receive a point. All times are given in seconds.
| Planner | Total Time | Solved | Fastest | Shortest |
| IPP | 22 (1421) | 69 | 68 | 68 |
| SGP | 575 | 38 | 1 | 35 |
For ADL we list the total time spent on the
problems. IPP solved the 38 problems that solved SGP plus 31 more
difficult problems from the gripper, mystery, and mprime domains. No
system solved any of the assembly problems, where IPP ran out of space
because the ADL preprocessing generated several millions of
instantiated actions. We show the total runtime for both systems for
the 38 problems (where IPP is more than 25 times faster than SGP) and
in braces the total runtime IPP needed to solve the other 31
problems. In total, IPP generated 662 actions with an average search
time of 0.45 seconds per generated action. Even if IPP is written in C
and SGP is written in LISP, IPP 3.3 is still significantly faster than
SGP. Click on the figure above to see how IPP and SPG compare in the
movie domain. Only on movie5, SPG could outperform IPP by 10
milliseconds. But the 30 movie runtimes of both systems
are too small to allow for useful comparison.
The following table compares SGP and IPP on the remaining problems in
the ADL set that both solved showing planning time in seconds and plan length (in braces):
| Problem | SGP | IPP |
| mystery1 | 2.2 (4) | 0.1 (4) |
| mystery2 | 203.3 (14) | 12.2 (9) |
| mystery3 | 3.5 (4) | 1.0 (4) |
| mprime1 | 7.1 (9) | 1.1 (11) |
| mprime2 | 266.2 (12) | 1.5 (9) |
| mprime3 | 22.9 (5) | 3.2 (5) |
| mprime4 | 39.7 (17) | 1.9 (11) |
| gripper1 | 0.4 (11) | 0.04 (11) |
IPP was also able to prove 6 problems in the mystery domain as unsolvable when repeating the competition tests on our machines under 10 minutes runtime restriction and 120 Mb memory restriction.
| Planner | Av. Time | Solved | Fastest | Shortest | |
| IPP | 7.4 | 63 | 29 | 49 | |
| Blackbox | 1.5 | 63 | 16 | 55 | |
| HSP | 35.5 | 82 | 19 | 61 | |
| Stan | 55.4 | 64 | 24 | 47 |
This picture shows the runtime behavior of the
four planners in the STRIPS version of the Gripper domain. The Y-axis
lists the number of balls that have to be transported by a robot with
two gripper hands. The columns show how far each planner scaled and
how much runtime was spent. This domain was originally designed to
motivate the use of resources in IPP because the STRIPS representation
leads to a combinatorial explosion in the search space. All planner
obviously suffer from this explotion, with the exception of HSP that
does quite well. A plot of its runtime
data seems to indicate non-exponential runtime (quadratic?).
Interestingly, HSP plans only transport one ball at a time leading to
lots of unnecessary move actions, while STAN and BLACKBOX generate
optimal solutions. IPP has been run with RIFO switched on, which
excludes one gripper as irrelevant, ie. non-optimal plans are found
using only the left gripper. The plot
of STAN's, IPP's, and BLACKBOX' runtime shows how hard STAN hits the
combinatorial explosion, but IPP and BLACKBOX don't even provide data
on the harder problem instances.
The movie domain was no challenge for any of
the planners running in the STRIPS track. Since the goals remain
constant over all 30 test problems, but only the number of objects to
choose from changed, planning time was not an issue in this test set,
but the systems just spent a slightly increasing amount of time on
generating more action instances and building larger planning
graphs. In fact, this domain contains lots of irrelevant
information. RIFO detects that in all problems only 9 of the up to 174
actions and 5 of the up to 170 original initial facts are relevant.
In the picture, we had to divide HSP's runtime by 100 to allow
for a meaningful plot. Even if the system is much slower than the
others, it seems to scale almost linearly on the problem instances. Closer inspection of IPP's solution file revealed that the
system was not run on the movie8 problem, which was probably
accidentally skipped because the shellscript was not used all the
time.
The logistics domain was quite difficult for all planners as the following table shows listing runtimes in seconds and in braces the length of the generated solution. Out of the 30 test problems, only between 3 and 5 problems were solved by the planners.
| Problem | BB | HSP | STAN | IPP |
| log1 | 2.0 (27) | 79.7 (43) | 0.8 (27) | 0.9 (26) |
| log2 | 6.4 (32) | 97.1 (44) | 4.3 (32) | - |
| log5 | - | 14.4 (27) | 364.9 (29) | 2.4 (24) |
| log7 | - | 788.9 (112) | - | - |
| log11 | 6.5 (32) | 86.2 (30) | 12.8 (34) | 6.9 (33) |
The mystery and mprime domains were slightly easier for all planners. We show the number of plans that IPP found in a domain, the total runtime to generate them, the total number of actions that were generated, and the number of problems that can be proven as unsolvable by IPP when repeating the results on a SPARC I Ultra (which is almost exactly twice as slow as the PCs used in the competition.) Unsolvability could be proven during graph construction because the goals remained exclusive after the graph has leveled off and no search was necessary.
| Domain | solved | total time | # actions | unsolvable proven |
| mprime | 13 | 132.8 | 111 | 1 |
| mystery | 13 | 66.1 | 87 | 8 |
| Planner | Av. Time | Solved | Fastest | Shortest | |
| IPP | 17.3 | 11 | 3 | 8 | |
| Blackbox | 2.5 | 8 | 3 | 6 | |
| HSP | 25.8 | 9 | 1 | 5 | |
| Stan | 1.3 | 7 | 5 | 4 |
Note that IPP could solve more problems than any other planner, but was never the fastest planner. The 3 points in the fastest category result from the fact that no other planner solved these problems. This is caused by the RIFO strategy to prune the search space that is currrently quite time consuming. We show again runtime and plan length (in braces) on the various problems:
| Problem | BB | HSP | STAN | IPP |
| log1 | 0.5 (13) | 4.3 (13) | 0.1 (13) | 0.2 (13) |
| log2 | 0.6 (20) | 5.2 (27) | 0.2 (20) | 0.3 (20) |
| log3 | 0.9 (27) | 10.5 (32) | 0.4 (27) | 0.6 (27) |
| log4 | - | 170.4 (59) | - | - |
| log5 | - | - | - | 66.5 (31) |
| mprime1 | 0.9 (5) | 7.2 (4) | 0.9 (5) | 4.1 (5) |
| mprime2 | 2.5 (7) | 12.8 (8) | 4.8 (9) | 2.6 (8) |
| mprime4 | 1.7 (5) | 8.3 (4) | 0.4 (5) | 11.4 (4) |
| mprime5 | 0.5 (6) | 4.6 (6) | - | 2.9 (6) |
| grid1 | 12.1 (14) | 9.3 (24) | 2.5 (14) | 4.6 (20) |
| grid2 | - | - | - | 13.9 (31) |
| grid4 | - | - | - | 84.3 (47) |
The influence of the RIFO strategy on the search space reduction is shown in the following table. It lists the number of ground actions and objects in the original problem description and in braces following the length of the plan if IPP could find one given a 10 minutes cpu time limit and a 120 Mb memory limit as in the competition, but on a SPARC 1/170 (which is approx. twice as slow.) Then we list the number of selected ground actions and objects after the strong and weaker RIFO selection process. (#) means the planning problem became unsolvable, - means RIFO was inactive, (-) means no plan was found because the planner either exceeded the cpu time or memory limit. The weaker RIFO heuristic is activated when either the strong heuristic makes a planning problem unsolvable as e.g. in the case of the grid3 problem or when the number of ground actions remains below the threshold parameter of 3500 and only the number of objects exceeds 35 as in the grid1 problem. A number in braces means shows the number of actions in the generated plan.
| problem | original | RIFO strong | RIFO weak |
| log1 | 571/25 (13) | - | - |
| log2 | 502/21 (20) | - | - |
| log3 | 958/26 (27) | - | - |
| log4 | 3561/42 (-) | 189/30 (-) | - |
| log5 | 4985/53 (-) | 119/26 (31) | - |
| mprime1 | 7809/36 (5) | 7/9 (#) | 11/9 (#) |
| mprime2 | 3281/32 (8) | - | - |
| mprime3 | 97259/68 (-) | - | - |
| mprime4 | 8485/42 (5) | 7/10 (4) | - |
| mprime5 | 6773/22 (6) | - | - |
| grid1 | 2610/38 (14) | - | 80/11 (20) |
| grid2 | 4501/50 (-) | 69/19 (#) | 260/19 (31) |
| grid3 | 7256/64 (-) | 315/21 (#) | 557/21 (-) |
| grid4 | 11151/80 (-) | 135/24 (47) | - |
| grid5 | 16240/98 (-) | 1481/49(-) | - |
As the analysis shows, only 8 problems can be solved without RIFO, but in using the meta strategy 3 more solutions are found. The meta strategy combines only 2 out of the 12 selection heuristics. Which combinations work for which examples has to be found out by experimentation. In the competition, no such experimentation was possible and therefore the selection of the heuristics was done long before the competition simply following the intuition that one should try the strongest possible heuristic first because it leads to the smallest search space, but then relax it by allowing more actions when incompleteness occurs. Only the threshold parameters that decided whether RIFO is activated at all have been set to 3500 actions and 35 objects after a few trials on some of the competition problems.