Numbers -- the Quantitative Side of War Game Analysis
Multivariate Statistical Methods
Questionnaires with Multiple Choices or Likert Scales
Players (and other participants) may be asked to complete questionnaires that indicate personal preferences, e.g., the attractiveness of some course of action. As with questionnaires in other realms, the responses can be analysed with various statistical methods; see Feedback.
Scoring of Tasks and Using Radar Scope Graphs
One numerical method used for war games is STAMPER, for "Systematic Task Analysis for Measuring Performance and Evaluating Risk". See a paper and a presentation by Eugenia Kalantzis from ISMOR 2004. In this method, the Study Team, working with military subject matter experts, developed a task list organized in three levels of refinement; see diagram.
The Study Team used five operational functions at the highest level, namely Command, Sense, Act, Shield, and Sustain. These were decomposed into Level 1, Level 2, and Level 3 sub-tasks with progressively more detailed descriptions.
During the progress of the game, players were periodically asked to 'score' effectiveness at the lowest level of granularity (Level 3). The Level 3 results were combined for Level 2 results, and these were combined in turn for Level 1 results. These Level 1 results were normalized onto a scale from 0 to 100.
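The roll-up described above can be sketched in a few lines. This is only an illustration: the source does not say how the levels were combined, so a plain average is assumed here, and the task names, the 1 to 5 raw scale, and all the numbers are invented.

```python
from statistics import mean

# Hypothetical task tree: operational function -> Level 2 task -> Level 3 scores.
# Task names, the 1-5 scale, and the scores are invented for illustration.
scores = {
    "Command": {
        "Plan": [4, 3, 5],
        "Communicate": [2, 3],
    },
    "Sense": {
        "Detect": [3, 4],
        "Identify": [5, 4, 4],
    },
}

SCALE_MIN, SCALE_MAX = 1, 5  # assumed raw scoring scale


def roll_up(level2_tasks):
    """Average Level 3 scores into Level 2, then Level 2 into one raw score.

    Averaging is an assumption; a real study might use weights instead.
    """
    level2 = {name: mean(values) for name, values in level2_tasks.items()}
    return mean(level2.values()), level2


def normalize(raw):
    """Map a raw score linearly onto the 0-100 scale."""
    return 100 * (raw - SCALE_MIN) / (SCALE_MAX - SCALE_MIN)


level1 = {}
for function, tasks in scores.items():
    raw, _ = roll_up(tasks)
    level1[function] = normalize(raw)

for function, value in level1.items():
    print(f"{function}: {value:.1f}")
```

If some tasks matter more than others, the plain averages could be replaced by weighted ones without changing the overall structure.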
At a summary level, the normalized scores on five axes (corresponding to the five operational functions) were displayed on a so-called 'spider web' or 'radar scope' diagram. The example shown here has results from two scenarios, one for operations of a brigade in open terrain (played on Day 1) and one for the same brigade in urban terrain (played on Day 2).
Since numerical scores of this sort are largely meaningless in absolute terms, they are best displayed side by side, so that contrasting situations can be compared relative to each other.
In this example, we can see that the brigade appeared to be somewhat more successful in open terrain than in urban terrain in the Command and Sense functions (and about equal for the other three functions). The short lines of sight due to urban obstructions created problems with communications (a component of Command) and with sensor detections. However, the conclusions were more nuanced than this simple explanation.
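A relative comparison of this kind can also be made numerically, axis by axis. The sketch below uses invented scores, not the results of the study, and the threshold below which two scores count as "about equal" is an assumption that an analyst would have to choose.

```python
FUNCTIONS = ["Command", "Sense", "Act", "Shield", "Sustain"]

# Invented normalized scores (0-100); these are NOT the study's results.
open_terrain = {"Command": 70, "Sense": 75, "Act": 60, "Shield": 55, "Sustain": 65}
urban_terrain = {"Command": 55, "Sense": 58, "Act": 61, "Shield": 54, "Sustain": 64}

THRESHOLD = 10  # assumed: gaps of this size or smaller count as "about equal"


def compare(first, second, threshold=THRESHOLD):
    """Return {function: (difference, label)} for first minus second."""
    result = {}
    for function in FUNCTIONS:
        diff = first[function] - second[function]
        if diff > threshold:
            label = "first scenario higher"
        elif diff < -threshold:
            label = "second scenario higher"
        else:
            label = "about equal"
        result[function] = (diff, label)
    return result


comparison = compare(open_terrain, urban_terrain)
for function, (diff, label) in comparison.items():
    print(f"{function:8s} {diff:+4d}  {label}")
```

Only the differences are reported, in keeping with the point that the absolute values carry little meaning.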
Apart from the radar scope diagrams, the scoring technique allowed the comparison of the different situations (represented by different scenarios) as illustrated in the spreadsheet of the scores at all three levels, see example. The colour coding immediately draws the eye to issues that may demand attention.
This application of STAMPER was quite successful. It did require a considerable investment of time from subject matter experts to develop a task list oriented to the objectives of the war game. But once developed, the task list allowed the players to evaluate various aspects of the results in a systematic and rigorous way.
In many ways STAMPER parallels the scoring methods used in sporting events like figure skating or gymnastics. The qualitative assessment of several judges is transferred to a number of numeric scales, e.g., for 'style' or 'difficulty'. Then a composite scale represents a combination of the sub-scales from the various judges. While such methods rely on the credibility, competency, and fairness of the judges, they provide a framework that can remove a considerable amount of arbitrariness.
Analysis of Individual Preferences for Alternatives
Another set of numerical methods can be applied when participants have indicated personal preferences for some candidate courses of action; it is called Schools of Thought Analysis (SOTA). In this method, a distance is determined between each pair of participants. This distance represents the level of agreement or disagreement between the two members of that pair -- a short distance (close to 0) means considerable agreement.
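As a minimal sketch of the pairwise distance step, suppose each participant has rated every candidate course of action on a numeric scale. The source does not state which distance measure SOTA uses, so Euclidean distance is assumed here; the participant labels and ratings are invented.

```python
from itertools import combinations
from math import dist

# Invented ratings: one list per hypothetical participant,
# one entry per candidate course of action (assumed 1-5 scale).
ratings = {
    "p1": [5, 4, 1],
    "p2": [5, 3, 1],
    "p3": [1, 2, 5],
}

# Euclidean distance between every pair of participants (an assumed measure).
distances = {
    (a, b): dist(ratings[a], ratings[b])
    for a, b in combinations(ratings, 2)
}

for (a, b), d in distances.items():
    print(f"{a}-{b}: {d:.2f}")
```

Here p1 and p2 end up close together (considerable agreement), while p3 is far from both.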
When all the distances are combined, the participants can be mapped out as a 'constellation' in a multi-dimensional space. Participants who are close together in this space are in considerable agreement. Unfortunately, humans cannot visualize space of more than three dimensions, so methods have been developed to represent multidimensional constellations in three or even two dimensions.
Using cluster analysis and multi-dimensional scaling, SOTA provides simpler configurations of the participants in two dimensions. From this we can see which of the participants are largely in agreement and which are not.
The figure provides cluster trees using three methods -- 'single linkage', 'complete linkage', and 'average linkage'. The two-dimensional map is the result of applying multi-dimensional scaling.
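The three linkage rules differ only in how the distance between two groups of participants is defined: the smallest, the largest, or the average of the pairwise distances. The sketch below builds a cluster tree bottom-up under each rule. It is a plain illustration of agglomerative clustering rather than the SOTA implementation, the distances and participant labels are invented, and multi-dimensional scaling is not shown.

```python
from itertools import combinations

# Invented symmetric distances between hypothetical participants.
D = {
    ("p1", "p2"): 1.0,
    ("p1", "p3"): 6.0,
    ("p1", "p4"): 7.0,
    ("p2", "p3"): 5.0,
    ("p2", "p4"): 6.5,
    ("p3", "p4"): 2.0,
}
PARTICIPANTS = ["p1", "p2", "p3", "p4"]

LINKAGES = {
    "single": min,                               # closest pair of members
    "complete": max,                             # farthest pair of members
    "average": lambda ds: sum(ds) / len(ds),     # mean over all pairs
}


def pair_distance(a, b):
    """Look up the distance between two participants in either order."""
    return D[(a, b)] if (a, b) in D else D[(b, a)]


def cluster_distance(c1, c2, linkage):
    """Distance between two clusters under the chosen linkage rule."""
    ds = [pair_distance(a, b) for a in c1 for b in c2]
    return LINKAGES[linkage](ds)


def agglomerate(linkage):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [frozenset([p]) for p in PARTICIPANTS]
    history = []
    while len(clusters) > 1:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], linkage),
        )
        height = cluster_distance(c1, c2, linkage)
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        history.append((sorted(c1 | c2), height))
    return history


for name in LINKAGES:
    print(name, agglomerate(name))
```

With these invented distances all three rules form the same two groups first and differ only in the height at which the groups finally merge; with real data the trees themselves can differ, which is one reason to look at more than one method.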
We can see that three individuals ('rmk', 'cgk', and 'wrd') seem to be close together and at some distance from the developing consensus of the majority. A facilitator can then ask representatives from the various 'schools of thought' why they have preferences that differ from the other schools.
Thus, results from SOTA can be used by the facilitator in a war game to tease out the sources of disagreement between the schools of thought. A paper on SOTA provides more detail on the method and how it has been applied in group decision making.
Analysis of Questionnaires
During war gaming, the participants may be obliged to complete one or more questionnaires. Many of the questions will include numeric results or selections from a Likert scale. Another aspect from questionnaires that is frequently of interest is the demographics of the participants.
Some aspects of the demographics of the participants, especially the players, should be collected. The nature and depth of demographics will depend on the focus of the war game. Some aspects, e.g., age, rank, military speciality, and experience, are common features in demographics, as these may influence the perspectives of players during the game and may be a source of dissension or disagreement (or agreement). Other aspects that may or may not be relevant include nationality, gender, specific training courses, and operational tours. It is easy to be too ambitious in collecting demographic information, so a "common sense" filter should be applied: don't collect for the sake of collecting, but if there is a plausible connection between personal attributes of players and the intent of the game, try to cover this in a demographic survey.
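One simple way to connect Likert responses with demographics is to summarize the responses separately for each demographic group. The sketch below assumes a single question answered on a 1 to 5 scale and a single attribute (experience group); the group labels and responses are invented. The median is used because Likert ratings are ordinal.

```python
from collections import Counter, defaultdict
from statistics import median

# Invented responses: (experience group, Likert rating on an assumed 1-5 scale).
responses = [
    ("junior", 4), ("junior", 5), ("junior", 4),
    ("senior", 2), ("senior", 3), ("senior", 2), ("senior", 4),
]

# Collect the ratings for each demographic group.
by_group = defaultdict(list)
for group, rating in responses:
    by_group[group].append(rating)

# Per group: number of respondents, median rating, and count of each rating.
summary = {
    group: {
        "n": len(ratings),
        "median": median(ratings),
        "counts": dict(Counter(ratings)),
    }
    for group, ratings in by_group.items()
}

for group, stats in summary.items():
    print(group, stats)
```

A visible gap between groups is only a prompt for further questions to the players; with so few respondents it should not be read as a statistical finding.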
Numbers from Physics-based Models
Combat models used in analysis are usually configured to provide data on the many interactions during their application: detections, shots fired, kills, and so on. For the combat portion of a war game, these sources can be used to supplement other data collection methods.
Other Physics-Based Models
Apart from combat, computer-based models may be used for logistics and other aspects of combat service support. Numbers from these models can also augment other sources of data. When such matters are critical to success in the war game, the models should provide information on levels of supply, medical capacity, availability of replacement personnel, and platform recovery and repair.
Numbers from External Sources
Network load statistics
Games that rely on communications over networks (a typical feature of Command Post Exercises) can generate network performance data, such as network load or latency of messages. Generally, the IT support personnel who are responsible for the provision of network services will be collecting such data for their own real-time role in network management. Data from these sources can augment other information to determine, for example, whether latency in the C2 systems as noted by players was due to some transient communications difficulty on the underlying network.
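Such a cross-check can be as simple as asking whether each player report falls close in time to a latency spike in the network log. The sketch below is illustrative only: the log format, the spike threshold, the matching window, and all the timestamps and values are assumptions, not features of any particular network management tool.

```python
from datetime import datetime, timedelta

# Invented network log: (sample time, measured latency in milliseconds).
network_log = [
    (datetime(2024, 1, 1, 10, 0), 40),
    (datetime(2024, 1, 1, 10, 5), 45),
    (datetime(2024, 1, 1, 10, 10), 900),
    (datetime(2024, 1, 1, 10, 15), 50),
]

# Invented times at which players reported slow C2 systems.
player_reports = [
    datetime(2024, 1, 1, 10, 11),
    datetime(2024, 1, 1, 10, 30),
]

SPIKE_MS = 500                 # assumed threshold for a latency spike
WINDOW = timedelta(minutes=3)  # assumed tolerance when matching times


def coincides_with_spike(report_time):
    """True if a network latency spike was logged near the report time."""
    return any(
        latency >= SPIKE_MS and abs(sample_time - report_time) <= WINDOW
        for sample_time, latency in network_log
    )


matches = {report: coincides_with_spike(report) for report in player_reports}
for report, matched in matches.items():
    print(report.strftime("%H:%M"), "network spike nearby" if matched else "no spike logged")
```

A report with no spike nearby suggests that the cause lies elsewhere, for example in the C2 application itself or in the players' workload, and is worth following up by other means.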
Overload of Participants
Each form of data collection involving participants, and especially players, should be subject to a filter of whether it will overload them. It is all too easy to develop a new questionnaire if some issue should come up where participant feedback would be desirable. Unless there is a significant value seen in some new data collection procedure, sudden inspirations for more demands on participants should be carefully filtered.
Insults to intelligence
Due to the dynamic nature of war games, with players possibly taking scenarios in unanticipated directions, questions that seemed useful in advance may be overtaken by events. If aspects of a questionnaire have become meaningless, or even misleading, given the play that has unfolded, players will not appreciate being asked about something they now consider irrelevant.
Cost-effective use of limited time
In a well-run game, players will become enthusiastic about their play time. It will be seen as an opportunity to test their ideas, to interact with peers on professional issues, and to deepen their understanding of operations. So time spent filling in questionnaires or participating in interviews may be viewed as a waste, or at least of lower priority. Consequently the data collection and analysis should always be planned for the efficient use of the time available, given that time will quickly become a scarce resource.