Institut für
Informatik, Universität
Freiburg
Project2: Action/Skill Prediction Module
Part I - Collecting data
Function:
Implement a module that allows the prediction of the
temporal
outcome of
actions/skills given the current state of the map. Generally it is
assumed that a skill is defined by an assignment of K agents to L targets on the map, e.g. five
fire brigades are assigned to ten burning buildings. It is furthermore
assumed that the agents are already situated at the target position and
all targets are valid (pre-condition), that the success can be
expressed by a Boolean value (post-condition).
Given a vector of summed state variables S
(e.g. ground area, # floors, material distribution, amount of
fire/water) of an area F and
the number of assigned agents K,
the module learns to predict the number of cycles N the skill needs to complete the
task, e.g. P(F=extinguished, N |
S, K). It is
important that the module also learns from cases available during the
execution of the agents, e.g. P(F,N | St+1, K), P(F, N | St+2,K), ..., P(F, N | St+N-1,K).
Implementation:
The module has to provide
an abstract interface for defining an agent specific learning tasks:
- public abstract
boolean preCondition(vector s, vector a);
- public abstract
boolean postCondition(vector s, vector a);
- public abstract
Iterator getStateVarIterator();
- public abstract int
getObjectType(); // Returns the object type of interest
- public abstract int
getObjectTypeName(); // Returns the object type name
Given preCondition(S,A) returns true, the module has to
automatically maintain a cycle counter for S,A that terminates if postCondition(S,A) returns true. The module furthermore has to
merge/cluster skills if necessary, e.g. two fire brigades extinguishing
at the same fire cluster, however also has to learn from single object
experiences (e.g. two agents extinguish one building at the same time).
In order to check pre-conditions efficiently, the module has to
maintain a (hierarchical) clustering that returns all possible clusters
of the object type returned by getObjectType()
and also a function that returns all agents assigned to these
clusters. Collected
experiences have to be continuously appended to a file for following
evaluations (Part II). Each line in the file represents an positive
experience <S1, S2, S3,...Sn, K, N>,
where S1, S2, ..., Sn are the random variables returned by getStateVarIterator(). Files have
to be named according to getObjectTypeName().
Note that pre- and post-conditions might not be simple as they
look like at the first glance. For example a cluster of burning
buildings should be considered as extinguished iff there is no
potential of expansion. Thus it is important, in the case of fires, to
introduce a state variable that expresses the fire's potential of
expansion, which also allows distinguishing different cases more
efficiently. From our former experiences we know, that the expansion
potential is invariant to the building distribution/properties around
the fire (except there are no buildings at all). Therefore it probably
suffices to consider the number of houses in danger (e.g. houses that
might be ignited) around the fire cluster.
Generate manually some data sets and visualize some counted frequencies
by histograms in order to get an idea of the inherent correlations.
Part II -
Function approximation (incomplete)
Approach:
Naive
Bayes, ANNs, Tile Coding, Decision Trees and RBF networks (see: Machine Learning T. Michell, A Modern Approach to
AI Russell Norvig)
Part III -
Evaluation
Implementation:
In order to collect an sufficient amount of training
and test data, it will be necessary to automate simulation runs. This
includes to randomize the map and to restart the server. Have a look to
the "autorun" tool from the PAKRescue team under
http://www.damas.ift.ulaval.ca/projets/RobocupRescue/Tools.php and to
the "random map generator" from the BlackSheep team under
http://sourceforge.net/project/showfiles.php?group_id=87802.
Given the labelled data (cluster states with the measured time),
perform an evaluation of the predictive capability of your chosen
function approximator.
Deliverables:
A graph that documents the learning performance of the function
approximator.
A graph that documents the prediction error of the
approximator for varying cluster sizes.
A graph that documents the prediction error of the
approximator for varying numbers of agents.
Part IV -
Advanced topics (optional)
Suppose
F={F1,F2,...Fn} is a set of N clusters, e.g. fire clusters, and C={{a1},{a1,a2}, {a1,a2,a3}, ...
{a1,...,an}} are all possible cooperations among agents A. Let Si
be the corresponding state variable of cluster Fi and Q(Si,Ci)
the expected cumulative reward for assigning cooperation Ci to state Si (which is equal to -summ_over_all_N(P(F,N
| Si, card(Ci)) if a elapsed cycle is
rewarded with -1).
Extend the approach towards SMDP learning which takes the agents'
travel time, the possibility of reassignments during skill execution
and the overall outcome into account.
kleiner@informatik.uni-freiburg.de,
20. October 2003