Team Empowerment

This research is the subject of a paper submitted for publication at The 2nd International Conference on Interactive Collaborative Robotics (SPECOM/ICR 2017).

Empowerment as a Generic Utility Function for Agents in a Simple Team Sport Simulation

Marcus Clements and Daniel Polani
Adaptive Systems Research Group, School of Computer Science,
University of Hertfordshire, Hatfield, UK

Abstract. Players in team sports cooperate in a coordinated manner to achieve common goals. Automated players in academic and commercial team sports simulations have traditionally been driven by complex externally motivated value functions with heuristics based on knowledge of game tactics and strategy. Empowerment is an information-theoretic measure of an agent’s potential to influence its environment, which has been shown to provide a useful intrinsic value function, without the need for external goals and motivation, for agents in single agent models. In this paper we expand on the concept of empowerment to propose the concept of team empowerment as an intrinsic, generic utility function for cooperating agents. We show that agents motivated by team empowerment exhibit recognizable team behaviors in a simple team sports simulation based on Ultimate Frisbee.

This page includes diagrams and screen captures of the software model used to investigate Team Empowerment. It is intended as supporting material for readers of the above paper.

A few years ago I played Ultimate Frisbee for the first time. For a beginner without an armoury of learned behaviours, trying to contribute to team success whilst keeping track of the movements of the players and the flight of the disc is a challenging cognitive process. When I read in “All Else Being Equal Be Empowered” (Klyubin et al., 2005) how empowerment could be colloquially described as “keep your options open”, my early Frisbee strategy came to mind. I was attracted by the idea that intelligent behaviour could emerge in agents in a computer model, motivated only by the application of an elegant mathematical theory of communication.

Marcus Clements, MxCog

Calculating empowerment by constructing a tree of possible future states

The empowerment calculation was performed by constructing a graph of possible future states, where the nodes of the graph represent states and the edges represent actions. The recursive algorithm that constructs the graph keeps a running total of terminal nodes (end states) as it builds the graph, so that when the recursion unwinds back to level 0 it yields the total number of possible end states.
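The recursive counting described above can be illustrated with a minimal Java sketch. It is not the code of the model: the `State` class, its one-dimensional toy dynamics and the `countEndStates` method are hypothetical names introduced purely for illustration. The sketch counts every terminal node, so end states reached by different action sequences are counted separately; whether the model merges such duplicates is not stated here.

```java
import java.util.ArrayList;
import java.util.List;

final class StateTreeSketch {

    /** Hypothetical toy state: a player standing on one cell of a line of cells 0..size-1. */
    static final class State {
        final int position;
        final int size;

        State(int position, int size) {
            this.position = position;
            this.size = size;
        }

        /** One successor per legal action: step left, stay, or step right. */
        List<State> successors() {
            List<State> next = new ArrayList<>();
            for (int delta = -1; delta <= 1; delta++) {
                int p = position + delta;
                if (p >= 0 && p < size) {
                    next.add(new State(p, size));
                }
            }
            return next;
        }
    }

    /**
     * Builds the tree implicitly by recursion and returns the number of
     * terminal nodes (end states) reachable in exactly `depth` actions.
     */
    static long countEndStates(State state, int depth) {
        if (depth == 0) {
            return 1; // a terminal node counts as one end state
        }
        long total = 0;
        for (State child : state.successors()) {
            total += countEndStates(child, depth - 1); // totals accumulate as the recursion unwinds
        }
        return total;
    }
}
```

Calling `StateTreeSketch.countEndStates(new StateTreeSketch.State(0, 3), 2)` walks two levels of the toy tree and returns the total once the recursion has unwound to the root.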

The main findings were encapsulated in repeatable JUnit tests. As well as demonstrating the findings, these tests were run during development, together with model-checking tests, to ensure that changes to the code did not alter the behaviour of the model in undesirable ways.

Fig 1. Main: Graph of states and actions for move, pickup disc and throw. Inset: Plan view of move, pick-up, throw. Only a subset of throws are shown for clarity.
Fig 2. Screen capture of the Java development environment (JetBrains IntelliJ IDEA 15) showing the state tree in an experiment with two players and the disc. Note that each node has a textual representation of the complete state of the model.

Videos of the simulation in Online Mode

The simulation prunes the state tree according to the configurable settings visible in the user interface. Without pruning, the simulation is too slow to watch in real time because of the explosion in the number of states.
NB. The results of the experiment as outlined in the paper are demonstrated by the offline tests, where none of the pruning options are used and the full state tree is assembled. The optimisations are only available to expedite rapid investigation and discovery in online mode.

The simulation's user interface provides the following controls:

  • Scenario A number of pre-configured scenarios are provided and may be selected by the user in the dropdown box under "Scenario"
  • Disc Velocity Number of squares the disc moves per time step.
  • Include team moves [Tree pruning optimisation] If the checkbox is not ticked, the move options of the player's teammates are not considered (they are pruned from the tree).
  • Throw range [Tree pruning optimisation] The dynamics of the model and the nature of empowerment ensure that, with a sufficient lookahead, players will always throw towards a teammate. This optimisation restricts the possible throws to targets within the configured number of squares of the teammate's position.
  • Empowerment lookahead Number of steps ahead included in the empowerment calculation.
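As an illustration of the throw range optimisation, the following Java sketch filters candidate throw targets by their distance from a teammate. The `Square` class and `pruneThrows` method are hypothetical, and measuring distance as the larger of the horizontal and vertical offsets is an assumption; the distance measure used by the simulation is not stated here.

```java
import java.util.ArrayList;
import java.util.List;

final class ThrowRangeSketch {

    /** Hypothetical grid square on the pitch. */
    static final class Square {
        final int x;
        final int y;

        Square(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    /** Distance in squares, assumed here to be the larger of the x and y offsets. */
    static int distance(Square a, Square b) {
        return Math.max(Math.abs(a.x - b.x), Math.abs(a.y - b.y));
    }

    /** Keeps only the throw targets within `range` squares of the teammate's position. */
    static List<Square> pruneThrows(List<Square> targets, Square teammate, int range) {
        List<Square> kept = new ArrayList<>();
        for (Square target : targets) {
            if (distance(target, teammate) <= range) {
                kept.add(target);
            }
        }
        return kept;
    }
}
```

A smaller range leaves fewer throw actions at each node with the disc in play, which is what makes the tree cheaper to build in online mode.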
Fig 3. Video screen capture of the simulation running with two players on two teams. The empowerment lookahead is sufficient that when a player is in possession of the disc the action with the highest empowerment is to throw to a location where the teammate can catch it. However, because of the arrangement of the players, the opposing players are close enough to intercept and catch the disc. Interceptions happen repeatedly until a blue player steps far enough away that they do not intercept. Passing between the green team ensues and their proximity makes it unlikely that another interception will occur for some time.

References

Klyubin, A., Polani, D., Nehaniv, C.: Empowerment: A universal agent-centric measure of control. In: The 2005 IEEE Congress on Evolutionary Computation, vol. 1, pp. 128–135 (2005)