Institut für Informatik, Universität Freiburg


Project2: Action/Skill Prediction Module


Part I - Collecting data



Function:

Implement a module that allows the prediction of the temporal outcome of actions/skills given the current state of the map. Generally it is assumed that a skill is defined by an assignment of K agents to L targets on the map, e.g. five fire brigades are assigned to ten burning buildings. It is furthermore assumed that the agents are already situated at the target position and all targets are valid (pre-condition), that the success can be expressed by a Boolean value (post-condition).

Given a vector of summed state variables S (e.g. ground area, # floors, material distribution, amount of fire/water) of an area F and the number of assigned agents K, the module learns to predict the number of cycles N the skill needs to complete the task, e.g. P(F=extinguished, N | S, K). It is important that the module also learns from cases available during the execution of the agents, e.g. P(F,N | St+1, K), 
P(F, N | St+2,K), ..., P(F, N | St+N-1,K).


Implementation:
The module has to provide an abstract interface for defining an agent specific learning tasks:


Given preCondition(S,A) returns true, the module has to automatically maintain a cycle counter for S,A that terminates if postCondition(S,A) returns true. The module furthermore has to merge/cluster skills if necessary, e.g. two fire brigades extinguishing at the same fire cluster, however also has to learn from single object experiences (e.g. two agents extinguish one building at the same time).

In order to check pre-conditions efficiently, the module has to maintain a (hierarchical) clustering that returns all possible clusters of the object type returned by getObjectType() and also a function that returns all agents assigned to these clusters. Collected experiences have to be continuously appended to a file for following evaluations (Part II). Each line in the file represents an positive experience <S1, S2, S3,...Sn, K, N>, where S1, S2, ..., Sn are the random variables returned by getStateVarIterator(). Files have to be named according to getObjectTypeName().


Note that pre- and post-conditions might not be simple as they look like at the first glance. For example a cluster of burning buildings should be considered as extinguished iff there is no potential of expansion. Thus it is important, in the case of fires, to introduce a state variable that expresses the fire's potential of expansion, which also allows distinguishing different cases more efficiently. From our former experiences we know, that the expansion potential is invariant to the building distribution/properties around the fire (except there are no buildings at all). Therefore it probably suffices to consider the number of houses in danger (e.g. houses that might be ignited) around the fire cluster.

Generate manually some data sets and visualize some counted frequencies by histograms in order to get an idea of the inherent correlations.

                   

Part II - Function approximation (incomplete)


Approach:
Naive Bayes, ANNs, Tile Coding, Decision Trees and RBF networks (see:
Machine Learning T. Michell, A Modern Approach to AI Russell Norvig)

 

Part III - Evaluation
 
Implementation:
In order to collect an sufficient amount of training and test data, it will be necessary to automate simulation runs. This includes to randomize the map and to restart the server. Have a look to the "autorun" tool from the PAKRescue team under http://www.damas.ift.ulaval.ca/projets/RobocupRescue/Tools.php and to the "random map generator" from the BlackSheep team under http://sourceforge.net/project/showfiles.php?group_id=87802.

Given the labelled data (cluster states with the measured time), perform an evaluation of the predictive capability of your chosen function approximator.


Deliverables:
A graph that documents the learning performance of the function approximator.
A graph that documents the prediction error of the approximator for varying cluster sizes.
A graph that documents the prediction error of the approximator for varying numbers of agents.



Part IV - Advanced topics (optional)

Suppose F={F1,F2,...Fn} is a set of N clusters, e.g. fire clusters, and C={{a1},{a1,a2},  {a1,a2,a3}, ... {a1,...,an}} are all possible cooperations among agents A. Let Si be the corresponding state variable of cluster Fi and Q(Si,Ci) the expected cumulative reward for assigning cooperation Ci to state Si (which is equal to -summ_over_all_N(P(F,N | Si, card(Ci)) if a elapsed cycle is rewarded with -1).

Extend the approach towards SMDP learning which takes the agents' travel time, the possibility of reassignments during skill execution and the overall outcome into account.


kleiner@informatik.uni-freiburg.de, 20. October 2003