
Seminar War Game Course

Defence Technology Agency, Devonport, April 2013

The Zefra Scenario

The Zefra scenario provides the background for a short role-playing game. It will be used as an example during discussions about planning and conducting analysis of a seminar war game.

Numbers -- the Quantitative Side of a Qualitative Method

The Spectrum of Operational Analysis Methods

In the military context of wargaming, operational analysis methods have been widely applied, with considerable success. The methods applied to wargaming have generally been adapted from other areas of operations research, and, indeed, from other disciplines too. Some of the methods have been quantitative approaches traditionally associated with operations research, but many have been the 'soft' methods of judgement-based operational analysis and problem structuring methods.

One reason that wargaming is so frequently used in military operations research is that wargaming is widely practiced by military personnel for their own purposes. For example, US Army doctrine, US Marine Corps doctrine, US Air Force doctrine, US Navy doctrine, and joint doctrine all incorporate wargaming in course of action analysis (as well as other aspects of military decision making). Thus wargaming is a method that will be very familiar to many participants in an OR study, and personnel from all services will have been exposed to wargaming as an analysis method throughout their military careers.

War games, however, come in a wide variety of formats. Not all of these formats will be appropriate to some specific operations research study of the moment. The game format must be selected such that it is suited to the analysis objectives. And the associated quantitative methods for analyzing the game results must be chosen so they are compatible with the format of games to which they are being applied.

In decades past one of the quantitative methods applied was to count casualties and report something like a loss-exchange ratio: say, the ratio of RED losses to BLUE losses. The war game format was generally such that combat engagements were expected during the game and the result (achieving some operational objective) would be based on the success (or failure) in those combat engagements. This was quite appropriate when the main focus was on attrition: "Are we killing more than we are losing?" In more recent times, however, attrition has rarely been the focus of wargaming, so meticulous counting of casualties and reckoning of the loss-exchange scoreboard has been of little value. Now, other factors are more likely indicators of success in the larger mission: "Have we won over the local population?", "Have the insurgents been discredited?", or "Has the legitimacy of the local government been established?"
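As a small illustration of the arithmetic, the loss-exchange ratio described above can be computed as follows. This is a minimal sketch: the function name and the casualty tallies are hypothetical and are not drawn from any actual game.

```python
def loss_exchange_ratio(red_losses: int, blue_losses: int) -> float:
    """Return RED losses divided by BLUE losses.

    A value above 1.0 means RED lost more than BLUE.
    """
    if blue_losses == 0:
        # The ratio is undefined when BLUE took no losses: report infinity
        # if RED lost anything, otherwise there is nothing to compare.
        return float("inf") if red_losses > 0 else float("nan")
    return red_losses / blue_losses


# Hypothetical tallies from one game run.
ler = loss_exchange_ratio(red_losses=30, blue_losses=12)
print(f"Loss-exchange ratio (RED:BLUE) = {ler:.2f}")
```

A single number of this kind says nothing about whether the operational objective was achieved, which is why it has fallen out of favour for games where attrition is not the focus.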

Outside of the military sphere, professional games may be used to investigate many aspects of decision making. Wargaming for Leaders provides one perspective on how methods adapted from military war gaming might be used by civilian leaders to make better business decisions.

Some Quantitative Methods for Use in Professional Games

Command and Control

Code of Best Practice for the Assessment of Command and Control. Many military OR studies deal with command and control. Outside of the military domain, command and control might be understood as operations management -- making decisions for the firm in real time. Military OR studies have ranged from evaluating procedures to testing prototype equipment. Many of these studies have used a war game to provide a cocoon around a team comprising a commander and staff to walk them through command and control activities while measuring some aspect of performance.

Computer Network Performance Metrics. Command and control systems (and real-time management systems in the civilian world) generally rely on networks of computers. Metrics are available for how these systems function under a simulated load created by playing a game with participants manning their positions on such systems. Typically network administrators have various software packages that can determine available bandwidth, latency, and such -- to determine if their network is functioning properly, and to diagnose problems when it is not. Thus network administrators can provide useful measures to include in OR studies where the issue is the complete system, both the hardware and the humans.

Command Team Effectiveness

A specific part of command and control is the effectiveness of the people engaged in the process. Command is often viewed as a highly personal matter, linked to leadership abilities and command style. As such, the idiosyncratic nature of command makes it difficult to analyze. However, the activity in the staff surrounding a commander can be viewed as more procedural, albeit with the human factors of the participants affecting the team's performance. The human factors community has provided assessment guidelines for command team effectiveness. These assessment procedures can be used in real military operations, but they are more likely to be seen in some simulated situations, like an exercise or war game.

The effectiveness of the command team is typically an area of interest for OR analysts -- it can have an enormous impact on the success or failure of military operations. Rather than developing OR assessment techniques from scratch (and reinventing the wheel), OR practitioners should adopt from the human factors community those specific techniques that may apply to their current study. While some tailoring may be appropriate for some specific study objectives, the material available from the human factors community in this domain is extensive. For those working outside of the military domain, the procedures should translate well if the term command team is simply replaced by management team.

The command team effectiveness (CTEF) model and assessment procedures were first published as CTEF in 2005. After application in several studies and subsequent review, revised methods were published as CTEF 2.0 in 2010.

Situational Awareness

Situational Awareness (SA) is "the perception of the elements in the environment within a volume of time and space, the comprehension of their meaning and the projection of their status in the near future" (Endsley's definition, 1995). In the military domain, this definition overlaps with command team effectiveness in the sense that one aspect of a team's performance is generally how well that team can understand its operational environment. But it can also be applied to individuals, e.g., how well an air traffic controller can monitor the aircraft in his or her airspace and predict their future geometry.

While situational awareness is a term that has attracted considerable attention lately from the military community, the practices associated with assessing an individual's or a team's SA can be, and have been, applied in the civilian domain as well. Two techniques that have been popular for some time are SAGAT and SART: Situation Awareness Global Assessment Technique and Situational Awareness Rating Technique.

As Endsley points out, methods like SAGAT can be used to evaluate alternative design concepts for a monitoring system or changes in training programs in terms of their contributions to situational awareness. She also acknowledges that there are no criteria to determine what level of situational awareness is required for successful performance (how much is enough?). Endsley offers that better situational awareness will generally contribute to successful performance, but does not guarantee it. Such methods are not intended to determine if people are making the right decisions. She highlights experience, for example, as something that should affect decision making but which may not be a large part of situational awareness.

Situation Awareness Global Assessment Technique (SAGAT). One of the first applications of SAGAT was to civilian air traffic control. Controllers operated within an air traffic control simulator. The simulator was paused from time to time and the controllers asked about their awareness.

SAGAT has subsequently become one of the standard instruments for measuring SA. Here at random times a simulator, simulation, or game is frozen, a rapid battery of queries administered to selected participants, and these are then scored on the basis of objective data. This battery of queries is issued when the operator's display is blanked, so that he or she must rely on working memory to answer the questions. (Clearly, under the time pressure of certain real-world activity, such a process is inappropriate, but modern simulation makes it more feasible when the operator is working in an artificial environment.)
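The scoring step for one freeze can be sketched as below. This is only an illustration under stated assumptions: the query names, the responses, and the exact-match scoring rule are all hypothetical, and a real study would probably allow tolerance bands for numeric answers.

```python
def score_freeze(responses: dict, ground_truth: dict) -> float:
    """Return the fraction of queries answered correctly at one freeze."""
    if not ground_truth:
        raise ValueError("no queries to score")
    correct = sum(
        1
        for query, truth in ground_truth.items()
        if responses.get(query) == truth  # unanswered queries count as wrong
    )
    return correct / len(ground_truth)


# Hypothetical queries put to one participant at one freeze, with the
# objective answers taken from the simulation state at that moment.
ground_truth = {
    "number_of_aircraft_in_sector": 7,
    "aircraft_nearest_boundary": "AC12",
    "any_conflict_within_5_min": True,
    "heading_of_AC03": "north",
}
responses = {
    "number_of_aircraft_in_sector": 7,
    "aircraft_nearest_boundary": "AC09",
    "any_conflict_within_5_min": True,
    # no answer was given for heading_of_AC03
}

score = score_freeze(responses, ground_truth)
print(f"SAGAT score for this freeze: {score:.2f}")
```

Scores from several freezes can then be averaged per participant or per query type, depending on what the study needs to compare.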

In the military domain, SAGAT has been applied down to the level of an infantry platoon operating within a simulated urban environment. One conclusion was: "The results of this analysis demonstrate that these measures, particularly SAGAT, show good promise for measuring SA in future studies of Infantry operations."

SAGAT is best applied in a simulated environment as opposed to a real environment since it involves subjects pausing to answer a number of questions. Thus, applying it when the subject is under the pressure of a situation demanding constant attention is fraught with problems. Past applications for air crew or air traffic controllers were in simulated environments where the situation could be frozen while the subjects responded.

Situational Awareness Rating Technique (SART) provides its assessments through an operator's subjective opinion. It includes some 14 components which were developed from past analysis of SA factors that were relevant to pilots. Operators are asked to rate the components on bipolar scales. They are asked for opinions on how they perceive (1) demands on their resources, (2) the supply of these resources, and (3) their understanding of the situation.

SAGAT and SART (as Endsley proposes) can be used to measure the actual quality of SA and a person's perception of the quality of their own SA, respectively, and the two can then be compared. This combination makes it possible, for example, to determine when operators are overconfident in their own SA.
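One way to make such a comparison concrete is sketched below. It assumes the commonly cited SART combination, SA = understanding - (demand - supply), with each of the three groupings rated on a 1 to 7 scale. The rescaling onto 0 to 1 and the 0.2 margin are hypothetical choices made only for this illustration, not part of either technique.

```python
def sart_score(demand: int, supply: int, understanding: int) -> int:
    """Combine three 1-7 ratings as understanding - (demand - supply)."""
    for rating in (demand, supply, understanding):
        if not 1 <= rating <= 7:
            raise ValueError("ratings must be on a 1-7 scale")
    return understanding - (demand - supply)


def normalise_sart(score: int) -> float:
    """Map the possible range of sart_score (-5 to 13) onto 0-1."""
    return (score + 5) / 18


def is_overconfident(sagat_fraction: float, sart: int, margin: float = 0.2) -> bool:
    """Flag when perceived SA exceeds measured SA by more than `margin`."""
    return normalise_sart(sart) - sagat_fraction > margin


# Hypothetical operator who rates their own SA highly but answered
# only 40% of the SAGAT queries correctly.
sart = sart_score(demand=3, supply=6, understanding=6)
print(sart, round(normalise_sart(sart), 2), is_overconfident(0.40, sart))
```

Because the two instruments measure different things on different scales, any such flag is best treated as a prompt for discussion in the debrief rather than as a finding in itself.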

Task Analysis

Task Analysis within Teams

NASA developed a widely used technique for measuring the ability of a team to conduct various tasks. NASA-TLX (the NASA Task Load Index) was originally applied within the aerospace domain to develop workload estimates from one or more operators while they are performing a task or immediately afterwards. NASA-TLX has been a valuable tool in this area for over 20 years. It has proven to be a useful technique to break team activity into component tasks and then evaluate the participants in terms of ability to perform those tasks. Aspects of NASA-TLX were subsequently incorporated into the NATO Command Team Effectiveness (CTEF) form.
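For readers unfamiliar with how a NASA-TLX workload estimate is produced, the sketch below follows the standard published procedure as generally described: six subscales rated from 0 to 100, each weighted by the number of times it was chosen in 15 pairwise comparisons. The subscale keys, ratings, and weights shown are hypothetical.

```python
SUBSCALES = (
    "mental", "physical", "temporal", "performance", "effort", "frustration",
)


def tlx_weighted(ratings: dict, weights: dict) -> float:
    """Weighted workload: sum of rating x weight, divided by 15."""
    if sum(weights[s] for s in SUBSCALES) != 15:
        raise ValueError("weights must sum to 15 (one per pairwise comparison)")
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / 15


def tlx_raw(ratings: dict) -> float:
    """Unweighted ('raw') workload: the plain mean of the six ratings."""
    return sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)


# Hypothetical ratings (0-100) and pairwise-comparison tallies for one player.
ratings = {"mental": 70, "physical": 10, "temporal": 60,
           "performance": 40, "effort": 50, "frustration": 30}
weights = {"mental": 5, "physical": 0, "temporal": 4,
           "performance": 2, "effort": 3, "frustration": 1}

print(f"weighted = {tlx_weighted(ratings, weights):.1f}, raw = {tlx_raw(ratings):.1f}")
```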

Task Analysis of Organizational Components

A study for the Canadian Army conducted analysis of future force structures by decomposing missions into tasks and sub-tasks, conducting war games, scoring alternative structures against the tasks, and aggregating the scores. The same approach was also used to assess a given force structure in different aspects of operations, e.g., in open versus urban terrain.

[Figure: STAMPER task list example]

STAMPER, for "Systematic Task Analysis for Measuring Performance and Evaluating Risk", provides a means to trace an overall evaluation of mission success back to specific tasks that were accomplished well, or otherwise. See a paper and a presentation by Eugenia Kalantzis from ISMOR 2004. In this method, the Study Team, working with military subject matter experts, developed a task list organized in three levels of refinement (see diagram).

The Study Team used five operational functions at the highest level, namely Command, Sense, Act, Shield, and Sustain. These were decomposed into Level 1, Level 2, and Level 3 sub-tasks with progressively more detailed descriptions.

During the progress of the game, players were periodically asked to 'score' effectiveness at the lowest level of granularity (Level 3). The Level 3 results were combined for Level 2 results, and these were combined in turn for Level 1 results. These Level 1 results were normalized onto a scale from 0 to 100.
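The roll-up described above can be sketched as follows. The source does not say how scores were combined or normalized, so this sketch assumes simple averaging at each level, a 1 to 5 rating scale, and linear rescaling onto 0 to 100. The task names and scores are hypothetical.

```python
from statistics import mean

# Hypothetical fragment of a task list:
# operational function -> Level 2 task -> list of Level 3 scores.
# Each Level 3 score is assumed to be a player rating on a 1-5 scale.
scores = {
    "Command": {
        "Plan operations": [4, 3, 5],
        "Communicate orders": [2, 3],
    },
    "Sense": {
        "Detect targets": [3, 3, 4],
        "Track targets": [4, 5],
    },
}

SCALE_MIN, SCALE_MAX = 1, 5


def roll_up(level2_tasks: dict) -> float:
    """Average Level 3 scores into Level 2, then Level 2 into Level 1."""
    level2 = [mean(level3) for level3 in level2_tasks.values()]
    return mean(level2)


def normalise(score: float) -> float:
    """Rescale a score from the rating scale onto 0-100."""
    return 100 * (score - SCALE_MIN) / (SCALE_MAX - SCALE_MIN)


level1 = {fn: normalise(roll_up(tasks)) for fn, tasks in scores.items()}
for fn, value in level1.items():
    print(f"{fn}: {value:.1f}")
```

A weighted average could be substituted at either level if the subject matter experts judge some tasks to matter more than others.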

[Figure: STAMPER spider-web diagram]

At a summary level the normalized scores on five axes (corresponding to the five operational functions) were displayed on a so-called 'spider web' or 'radar scope' diagram. The example shown here has results from two scenarios, one for operations of a brigade in open terrain (played on Day 1) and one for the same brigade in urban terrain (played on Day 2).

Since numerical scores of this sort are largely meaningless in absolute terms, they are best displayed so that contrasting situations can be compared relative to each other.

In this example, we can see that the brigade appeared to be somewhat more successful in open terrain than in urban terrain in the Command and Sense functions (and about equal for the other three functions). The short lines of sight due to urban obstructions created problems with communications (a component of Command) and with sensor detections. However, the conclusions were more nuanced than this simple explanation.

[Figure: STAMPER numerical comparison]

Apart from the radar scope diagrams, the scoring technique allowed the comparison of the different situations (represented by different scenarios) as illustrated in the spreadsheet of the scores at all three levels, see example. The colour coding immediately draws the eye to issues that may demand attention.

This application of STAMPER was quite successful. It did require a considerable investment of time from subject matter experts to develop a task list oriented to the objectives of the seminar war game. But once developed, the task list allowed the players to evaluate various aspects of the results in a systematic and rigorous way.

In many ways STAMPER parallels the scoring methods used in sporting events like figure skating or diving. The qualitative assessment of several judges is transferred to a number of numeric scales, e.g., for 'style' or 'difficulty'. Then a composite scale represents a combination of the sub-scales from the various judges. While such methods rely on the credibility, competency, and fairness of the judges, they provide a framework that can remove a considerable amount of arbitrariness.

Analysis of Individual Preferences for Alternatives

Another set of numerical methods can be applied when participants have indicated personal preferences for some candidate courses of action; it is called Schools of Thought Analysis (SOTA). In this method, a distance is determined between each pair of participants. This distance represents the level of agreement or disagreement between the two members of that pair -- a short distance (close to 0) means considerable agreement.
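The pairwise distance step can be illustrated as below. The source does not specify the distance measure, so this sketch assumes each participant rates every candidate course of action on a 1 to 5 scale and uses the Euclidean distance between rating vectors. The initials and ratings are hypothetical.

```python
from itertools import combinations
from math import dist

# Hypothetical ratings (1-5) of four courses of action by each participant.
ratings = {
    "abc": [5, 4, 2, 1],
    "def": [5, 3, 2, 1],
    "ghi": [1, 2, 4, 5],
}

# One distance per unordered pair of participants.
distances = {
    (a, b): dist(ratings[a], ratings[b])
    for a, b in combinations(sorted(ratings), 2)
}

for pair, d in distances.items():
    print(pair, round(d, 2))
```

Here 'abc' and 'def' differ on only one course of action and so sit close together, while 'ghi' holds nearly the opposite preferences and is distant from both.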

When all the distances are combined, the participants can be mapped out as a 'constellation' in a multi-dimensional space. Participants who are close together in this space are in considerable agreement. Unfortunately, humans cannot visualize space of more than three dimensions, so methods have been developed to represent multidimensional constellations in three or even two dimensions.

Using cluster analysis and multi-dimensional scaling, SOTA provides simpler configurations of the participants in two dimensions. From this we can see which of the participants are largely in agreement and which are not.

[Figure: SOTA sample diagram]

A sample diagram resulting from the application of SOTA is shown here. Each participant is represented by his or her initials. Those who are shown close together are largely in agreement.

The figure provides cluster trees using three methods -- 'single linkage', 'complete linkage', and 'average linkage'. The two-dimensional map is the result of applying multi-dimensional scaling.
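The three linkage rules differ only in how the distance between two clusters is derived from the distances between their members. The sketch below is a plain agglomerative clustering written for illustration, not the SOTA software itself. The participant labels and distances are hypothetical.

```python
from itertools import combinations


def agglomerate(labels, distance, linkage="average"):
    """Repeatedly merge the two closest clusters; return the merge history.

    `distance` maps frozenset({a, b}) to the distance between a and b.
    Each history entry is (cluster_1, cluster_2, merge_distance).
    """
    def between(c1, c2):
        pair_d = [distance[frozenset((a, b))] for a in c1 for b in c2]
        if linkage == "single":      # closest pair of members
            return min(pair_d)
        if linkage == "complete":    # farthest pair of members
            return max(pair_d)
        if linkage == "average":     # mean over all cross-cluster pairs
            return sum(pair_d) / len(pair_d)
        raise ValueError(f"unknown linkage: {linkage}")

    clusters = [(label,) for label in labels]
    history = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: between(*p))
        history.append((c1, c2, between(c1, c2)))
        merged = tuple(sorted(c1 + c2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
    return history


# Hypothetical pairwise distances between four participants.
labels = ["p1", "p2", "p3", "p4"]
distance = {
    frozenset(("p1", "p2")): 1.0, frozenset(("p1", "p3")): 4.0,
    frozenset(("p1", "p4")): 5.0, frozenset(("p2", "p3")): 3.0,
    frozenset(("p2", "p4")): 6.0, frozenset(("p3", "p4")): 2.0,
}

for method in ("single", "complete", "average"):
    print(method, agglomerate(labels, distance, method))
```

With these distances all three rules merge the same pairs first and differ only in the height of the final merge, which is why the cluster trees in such a figure can look similar in shape but not in scale.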

We can see that three individuals ('rmk', 'cgk', and 'wrd') seem to be close together and at some distance from the developing consensus of the majority. A facilitator can then ask representatives from the various 'schools of thought' why they have preferences that differ from the other schools.

Thus, results from SOTA can be used by the facilitator in a game to tease out the sources of disagreement between the schools of thought. A paper on SOTA provides more detail on the method and how it has been applied in group decision making.

Analysis of Questionnaires

During gaming, the participants may be obliged to complete one or more questionnaires. Many of the questions will include numeric results or selections from a Likert scale. Another aspect from questionnaires that is frequently of interest is the demographics of the participants.


Some aspects of the demographics of the participants, especially the players, should be collected. The nature and depth of demographics will depend on the focus of the game. Some aspects, e.g., age, rank, military speciality, and experience are common features in demographics as these may influence the perspectives of players during the game and may be a source of dissension or disagreements (or agreements). Other aspects that may or may not be relevant include nationality, gender, specific training courses, and operational tours. It is easy to be too ambitious in collecting demographic information, so a "common sense" filter should be applied: don't collect for the sake of collecting, but if there is a plausible connection between personal attributes of players and the intent of the game, try to cover this in a demographic survey.

Multivariate Statistical Methods

Questionnaires with Multiple Choices or Likert Scales

Players (and other participants) may be asked to complete questionnaires that indicate personal preferences in some regards, e.g., attractiveness of some course of action. As with questionnaires in other realms, these can be subjected to various statistical methods, see Feedback.
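As a starting point before any multivariate work, Likert responses can be summarized per question. The sketch below is illustrative only: the question wording and responses are hypothetical, and the median is used because Likert data are ordinal.

```python
from collections import Counter
from statistics import median

# Hypothetical responses on a 5-point Likert scale
# (1 = strongly disagree ... 5 = strongly agree), one list per question.
responses = {
    "COA 1 is feasible": [4, 5, 4, 3, 5, 4],
    "COA 2 is feasible": [2, 1, 3, 2, 4, 2],
}


def summarise(answers):
    """Return count, median, per-level tallies, and share answering 4 or 5."""
    counts = Counter(answers)
    return {
        "n": len(answers),
        "median": median(answers),
        "counts": [counts.get(level, 0) for level in range(1, 6)],
        "share_agree": sum(1 for a in answers if a >= 4) / len(answers),
    }


summary = {question: summarise(a) for question, a in responses.items()}
for question, stats in summary.items():
    print(question, stats)
```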

Numbers from Physics-based Models Supporting the Game

Combat Models

Combat models used in analysis are usually configured to provide data on the many interactions during their application: detections, shots fired, kills, and so on. For the combat portion of a seminar war game, these sources can be used to supplement other data collection methods.

Other Physics-Based Models

Apart from combat, computer-based models may be used for logistics and other aspects of combat service support. Numbers from these models can also augment other sources of data and should provide information on levels of supply, medical capacity, replacement personnel availability, and platform recovery and repair when such matters are critical to success in the seminar war game.

Models for Other Applications

For other applications, e.g., in commerce or civilian government, there may be other models that can be used to support the game. For example, there could be a model of market forces for a business game where investments in marketing or R&D can affect the success of a new product for the firm. Numerical results from this model may be used to diagnose when players are having success or failure, and why.

Numbers from External Sources

Network load statistics

Games that rely on communications over networks (a typical feature of Command Post Exercises) can generate network performance data, such as network load or latency of messages. Generally the IT support personnel who are responsible for the provision of network services will be collecting such data for their own real-time role in network management. Data from these sources can augment other information to determine, for example, if latency in the C2 systems as noted by players was due to some transient communications difficulty on the underlying network.
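A check of this kind might look like the sketch below. Everything in it is an assumption made for illustration: the sample format, the rule that a spike is any sample above three times the median latency, and the 60-second matching window.

```python
from statistics import median

# Hypothetical latency samples from the network team:
# (seconds into the game, latency in milliseconds).
samples = [(0, 40), (60, 42), (120, 45), (180, 400),
           (240, 380), (300, 44), (360, 41)]

# Hypothetical times (seconds into the game) at which players
# reported that the C2 system felt sluggish.
player_reports = [190, 330]


def spike_times(samples, factor=3.0):
    """Return sample times whose latency exceeds `factor` times the median."""
    baseline = median(latency for _, latency in samples)
    return [t for t, latency in samples if latency > factor * baseline]


def explained_by_network(report_time, spikes, window=60):
    """True if any latency spike falls within `window` seconds of the report."""
    return any(abs(report_time - t) <= window for t in spikes)


spikes = spike_times(samples)
for report in player_reports:
    print(report, explained_by_network(report, spikes))
```

A report that does not coincide with a network spike points the analyst toward other explanations, such as the C2 application itself or the procedures the players were following.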

Overload of Participants

Each form of data collection involving participants, and especially players, should be subject to a filter of whether it will overload them. It is all too easy to develop a new questionnaire if some issue should come up where participant feedback would be desirable. Unless there is a significant value seen in some new data collection procedure, sudden inspirations that impose more demands on participants should be carefully filtered.

Insults to intelligence

Due to the dynamic nature of games, with players possibly taking scenarios in unanticipated directions, questions that seemed useful in advance may be overtaken by events. If aspects of a questionnaire have become meaningless, or even misleading, given the play that has unfolded, players will not appreciate being asked about something they now consider irrelevant.

Cost-effective use of limited time

In a well-run game, players will become enthusiastic about their play time. It will be seen as an opportunity to test their ideas, to interact with peers on professional issues, and to deepen their understanding of operations. So time spent filling in questionnaires or participating in interviews may be viewed as a waste, or at least of lower priority. Consequently the data collection and analysis should always be planned for the efficient use of the time available, given that time will quickly become a scarce resource.

Guides for Visualization, Analysis, and Presentation

Simple Graphs and Charts

Edward Tufte has made a life's work of seeking to improve the visual display of quantitative information. Many of those who are familiar with tables and charts in Microsoft Excel may feel that, since they can turn a small table into a pie chart, and change the colours and labels, there is little left to learn. However, presenting numerical material so it is easy to absorb and hard to misinterpret is a perpetual challenge involving not only mathematical skills, but a considerable amount of practical psychology.

"Bauman's Inferno"

Michael Bauman, Director of the TRADOC Analysis Center, has compiled a summary of his experience over decades of analysis support to the US Army. His summary, called "Bauman's Inferno", includes valuable advice on the presentation of results of operations research studies. "Bauman's Inferno" is available at the OA 4604 Sakai site.

Open-Source Software for Data Scrubbing, Visualization, Analysis, and Presentation

A recent article in Computerworld provides a survey of available open-source software for various sorts of data analysis. It includes a side-by-side comparison of data visualization tools that are freely available on the web.