Lesson 1: OpenSpiel Architecture
Where this fits
You have used OpenSpiel piecemeal across several modules: defining a single-agent MDP in Module 3, running CFR on a built-in game in Module 5, and so on. Now we step back and look at the framework as a whole. The goal of this lesson is for you to come away with a clear mental model of OpenSpiel's abstractions, so that when you are reading the source or extending it, you know which file to look at and which class to subclass. This also informs how we design the Rust capstone: many of OpenSpiel's design choices are forced by the structure of the problem, so the Rust version will end up looking similar in places.
The core abstractions
OpenSpiel's design rests on three central abstractions, plus some secondary ones. If you understand the central three, you can find your way around everything else.
Game
A Game represents the rules of a game. It has no mutable state. You can think of it as the "type definition" or "schema": it describes what kinds of states exist, what actions are legal, what the players' utilities can be, and so on.
A Game exposes:
new_initial_state(): produces a fresh starting statenum_distinct_actions(): the size of the action spacenum_players(): how many players (chance is not counted as a player; it is handled separately)min_utility(),max_utility(): bounds on returns- Type metadata: chance mode, information mode, dynamics, reward model
You define a game once. You can have many states derived from it during a session.
State
A State represents a particular position in the game. It has mutable state internally and changes when actions are applied. This is the workhorse class.
A State exposes:
current_player(): whose turn is it? Returns a player index, or one of the special sentinels:CHANCE(a stochastic event happens here),TERMINAL(the game is over), orSIMULTANEOUS(multiple players act at once).legal_actions(): which actions can the current player take?apply_action(action): advance the state by one moveis_chance_node(),chance_outcomes(): for stochastic events, what are the possible outcomes and their probabilities?is_terminal(),returns(): when the game ends, what utilities did each player receive?clone(): produce a copy. Important because algorithms like MCTS need to explore hypothetical futures without mutating the real state.
A typical algorithm interacts with the game by repeatedly checking current_player(), then either:
- calling
apply_actionwith a chosen action (if a player or chance node) - collecting
returns()(if terminal)
Observer / ObservationTensor / InformationStateTensor
This is where it gets interesting and where many people get confused.
In a perfect-information game (chess), the "state" is the same as what either player sees: both players see the full board. There is no distinction between the world's state and any player's view of it.
In an imperfect-information game (poker), the world's state contains things like the opponent's hidden cards, but each player only observes their own hand. The state describes the world; the observation describes what one player can see.
OpenSpiel handles this via two related concepts:
Observation: what a player sees at a given moment in time. This is a Markovian summary, in the sense that the observation reflects the current world state but not the player's history of what they have seen and done.
Information state: the full history of observations the player has made so far. This is what is conceptually relevant for game-theoretic algorithms like CFR, because two world states that produce the same history of observations for a player are indistinguishable to that player; they belong to the same "information set."
The framework provides both as strings (human-readable) and as tensors (machine-readable, suitable for neural network input). For an imperfect-information game you must implement at minimum the information state representations; the observation representations are optional but commonly provided.
The Observer abstraction, introduced more recently in OpenSpiel, is a more flexible way of producing observation tensors with configurable contents. It is used in newer game implementations and the Python algorithm modules. Older code uses the observation_tensor and information_state_tensor methods directly on State. Both work; new game implementations should use the Observer pattern.
The directory structure
OpenSpiel has roughly this top-level layout (paths simplified):
open_spiel/
├── games/ # C++ game implementations (chess, go, poker, ...)
├── algorithms/ # C++ algorithm implementations (MCTS, CFR, ...)
├── python/
│ ├── games/ # Pure-Python games
│ ├── algorithms/ # Python algorithm implementations
│ ├── examples/ # Example scripts using the framework
│ └── pybind11/ # The C++ to Python binding glue
├── integration_tests/ # Tests that exercise games against the API contract
└── docs/ # Markdown documentation
The C++ and Python layers are kept in sync. Most algorithms are implemented in both. The Python algorithms are usually slower but easier to read and modify; they are what you should look at first when learning. The C++ versions are for production use.
If you want to see how an algorithm works, start in python/algorithms/. If you want to see how a particular game is implemented, look at games/<game_name>.cc for the C++ version or python/games/<game_name>.py for any Python version.
How an algorithm uses the API
Here is the structure of a generic OpenSpiel algorithm:
import pyspiel
def my_algorithm(game):
state = game.new_initial_state()
while not state.is_terminal():
if state.is_chance_node():
# Resolve chance: sample an outcome from the distribution
outcomes = state.chance_outcomes() # list of (action, prob)
action, prob = sample_from(outcomes)
state.apply_action(action)
else:
current_player = state.current_player()
legal = state.legal_actions()
# The algorithm's main logic: pick an action somehow
action = my_action_selection(state, current_player, legal)
state.apply_action(action)
# Game over: collect utilities
returns = state.returns() # list of utilities per player
return returns
This is the universal interaction pattern. Every algorithm in OpenSpiel, from random play up to AlphaZero and PSRO, follows this loop in some form (often with cloning the state to look ahead, batching across many parallel states, or interleaving with neural network calls).
Two things that surprise newcomers
Chance is not a player. Chance nodes are a separate concept. current_player() returns the special value pyspiel.PlayerId.CHANCE (= -1) at chance nodes. You handle them by sampling from chance_outcomes() and applying the sampled action. This separation matters because algorithms like CFR treat chance differently from player decisions (CFR averages over chance outcomes; it does not search over them).
Returns are utilities, not rewards. In a typical RL framework, you observe a reward at every step. In OpenSpiel, the standard reward model returns utilities only at the end of the game (reward_model = TERMINAL). This is the natural model for games like chess (you win, lose, or draw at the end). Some games support per-step rewards (reward_model = REWARDS), in which case state.rewards() returns the per-player reward at the current step. The capstone game uses the terminal reward model.
Bots and runtime tournaments
A Bot is an interface for an agent that plays a game. The interface is just one method:
class Bot:
def step(self, state) -> int:
"""Return the action this bot wants to take in the given state."""
OpenSpiel provides bots for many algorithms (random, MCTS, AlphaZero, etc.) and a play_game.py style tool that can run a tournament between bots. This is useful for evaluation: train a bot offline, then plug it into the bot interface and have it play against baselines.
The capstone will define a Bot wrapper around our trained CFR strategy, so we can play example games and watch the strategy in action.
Observers, in slightly more detail
Modern OpenSpiel code uses the Observer abstraction, which is a way to specify what kind of observation you want and have the framework produce it consistently across games.
A typical use:
from open_spiel.python.observation import make_observation
game = pyspiel.load_game("kuhn_poker")
obs = make_observation(game)
state = game.new_initial_state()
# ... advance state ...
obs.set_from(state, player=0)
print(obs.tensor) # numpy array suitable for neural network input
print(obs.string_from(state, player=0)) # human-readable
The Observer is parameterized: you can ask for "perfect" information observation (everything visible), "private" (only your own information), and various combinations. Different algorithms need different observation types; the Observer abstraction lets you specify what you need without modifying the game class.
For the capstone, since we are writing the CFR solver from scratch, we will use a simple string-based information state representation directly on the state class, not the Observer abstraction. This is simpler and well-suited to small CFR examples.
The pieces we will reuse in the capstone
The Rust capstone is going to mirror this architecture, scaled down:
- A
Gametrait: rules of the game, no state - A
Statestruct: mutable state, the workhorse - An
InformationStaterepresentation: a string per (player, history) that uniquely identifies what the player knows - A solver that, like OpenSpiel's algorithms, iterates over states and accumulates statistics
The capstone does not need the bot abstraction (we are not running tournaments) or the Observer abstraction (we use simple information state strings). But the Game/State separation, the player/chance distinction, the terminal-utility model, and the information-state-as-string idiom are all coming from OpenSpiel and will reappear in our Rust code.
The game interface in detail
Understanding every method a game must implement is necessary before you can write a custom game or evaluate an existing one. The table below covers the complete set. Methods marked "required" cause algorithm failures if absent; methods marked "optional" allow certain algorithms to degrade gracefully or skip certain features.
| Method | Required? | What it returns | Why algorithms need it |
|---|---|---|---|
num_players() | Required | int | Determines size of utility vectors everywhere |
num_distinct_actions() | Required | int | Sizes the strategy table and action-value networks |
max_game_length() | Required | int | Needed for fixed-length tensor representations; also bounds search depth |
min_utility() / max_utility() | Required | float | Normalizes utilities for algorithms that work in [0,1] (AlphaZero) |
new_initial_state() | Required | State | Entry point for every algorithm |
information_state_tensor_shape() | Required for NN | list[int] | Neural network input layer size |
observation_tensor_shape() | Optional | list[int] | Some actors only need Markovian observations |
get_type() | Required | GameType | Algorithm routing: is this a chance game? Imperfect info? |
make_observer() | Optional | Observer | For configurable observation generation |
deserialize_state() | Optional | State | Distributed training, replay buffers |
Why standardization matters for plug-and-play algorithms
Consider what CFR needs to run on any game: it must know how many players there are, how many information sets to expect, what actions are legal at each node, and what the utilities are. OpenSpiel's interface provides all of this through a handful of method calls with consistent semantics. An algorithm implemented against this interface works on Kuhn poker, Leduc Hold'em, a custom SSA game, or any future game you write — without modification.
This is the core value of the standardization. In practice it means:
# This exact code runs CFR on ANY game that implements the interface:
import pyspiel
from open_spiel.python.algorithms import cfr, exploitability
def run_cfr_on_any_game(game_name: str, iterations: int = 1000):
game = pyspiel.load_game(game_name)
solver = cfr.CFRSolver(game)
for _ in range(iterations):
solver.evaluate_and_update_policy()
policy = solver.average_policy()
exp = exploitability.exploitability(game, policy)
print(f"[{game_name}] exploitability after {iterations} iterations: {exp:.6f}")
return policy
# All of these work with zero modification to run_cfr_on_any_game:
run_cfr_on_any_game("kuhn_poker")
run_cfr_on_any_game("leduc_poker")
run_cfr_on_any_game("liars_dice")
# And once you register your game:
run_cfr_on_any_game("mini_maneuver")
This plug-and-play property is not accidental; it is the point. Every method in the interface exists to serve at least one algorithm in the zoo.
The information_state_tensor_shape method
This method deserves extra attention because it is easy to get wrong and hard to debug. The shape must be a flat list giving the dimensions of the tensor. For a game with two bits of private information and a three-step public history, you might return [2 + 3] = [5] for a flat vector, or [2, 5] for a 2D tensor.
The shape must be consistent across all states and all players. CFR variants that use neural networks (deep CFR) pre-allocate a fixed-size input array, so a shape mismatch causes a silent indexing error rather than an explicit exception. The integration tests catch this, which is why you should always run them after implementing a new game.
For the SSA context: when designing a game where a space operator's information state encodes orbital parameters (semi-major axis, eccentricity, inclination) as a feature vector, the tensor shape determines how expressive the network can be. Too small and the network cannot distinguish operationally important situations; too large and training data becomes sparse.
Running algorithms on existing games
Before writing custom games it is worth seeing how algorithms plug into the existing game library. The following three examples illustrate the pattern for CFR, MCTS, and AlphaZero respectively.
CFR on Kuhn poker
Kuhn poker is a simplified one-card poker game that is the canonical benchmark for CFR. Its Nash equilibrium has an analytical solution, so you can verify your solver converges to the right answer.
import pyspiel
from open_spiel.python.algorithms import cfr, exploitability
game = pyspiel.load_game("kuhn_poker")
solver = cfr.CFRSolver(game)
print("Iteration | Exploitability")
print("-" * 30)
for i in range(0, 1001, 100):
if i > 0:
for _ in range(100):
solver.evaluate_and_update_policy()
policy = solver.average_policy()
exp = exploitability.exploitability(game, policy)
print(f"{i:9d} | {exp:.8f}")
# Kuhn poker Nash equilibrium has exploitability 0.0 at convergence.
# After 1000 iterations you should see something < 0.01.
What you should see: exploitability starts around 0.4-0.5 (uniform random play), drops quickly in the first few hundred iterations, and approaches zero. The exact rate depends on which CFR variant you use. Vanilla CFR converges as O(1/sqrt(T)).
MCTS on Tic-Tac-Toe
MCTS is a Monte Carlo tree search algorithm appropriate for perfect-information games. Tic-Tac-Toe is a solved game (always draw with optimal play), so MCTS should find the draw if given enough simulations.
import pyspiel
from open_spiel.python.algorithms import mcts
game = pyspiel.load_game("tic_tac_toe")
evaluator = mcts.RandomRolloutEvaluator(n_rollouts=4, random_state=42)
bot = mcts.MCTSBot(
game,
uct_c=1.5, # exploration constant
max_simulations=1000, # simulations per move
evaluator=evaluator,
)
state = game.new_initial_state()
while not state.is_terminal():
current_player = state.current_player()
action = bot.step(state)
print(f"Player {current_player} plays action {action}")
state.apply_action(action)
print(f"Game over. Returns: {state.returns()}")
# Expect [0.0, 0.0] with 1000 simulations — a draw.
Note how bot.step(state) takes a state and returns an action. This is the Bot interface in action: the MCTS algorithm is hidden inside the MCTSBot wrapper, which exposes a clean action-selection interface that the game loop can call without knowing anything about how MCTS works internally.
AlphaZero training loop on Connect Four
AlphaZero in OpenSpiel trains a neural network policy and value function using self-play. Connect Four is large enough to be nontrivial but small enough to train a meaningful policy in a few hours on a laptop.
import pyspiel
from open_spiel.python.algorithms.alpha_zero import alpha_zero
from open_spiel.python.algorithms.alpha_zero import model as az_model
game = pyspiel.load_game("connect_four")
# Configure the AlphaZero training run
az_config = alpha_zero.Config(
game="connect_four",
path="/tmp/alphazero_connect_four",
learning_rate=0.001,
weight_decay=1e-4,
train_batch_size=128,
replay_buffer_size=2**14,
replay_buffer_reuse=3,
max_steps=50, # training steps (keep small for illustration)
checkpoint_freq=10,
actors=2, # self-play actors
evaluators=1,
uct_c=1.0,
max_simulations=100,
policy_alpha=0.25,
policy_epsilon=0.25,
temperature=1.0,
temperature_drop=30,
nn_model="resnet",
nn_width=64,
nn_depth=4,
observation_shape=None, # inferred from game
output_size=None, # inferred from game
)
alpha_zero.alpha_zero(az_config)
# After 50 steps the policy is not strong but the mechanics are running.
# Real Connect Four training needs ~5000+ steps.
The key observation: the training loop calls game.new_initial_state(), advances states using apply_action, and reads information_state_tensor for neural network input — the same three methods every other algorithm uses.
The algorithm zoo in OpenSpiel
OpenSpiel ships a wide range of algorithms. Understanding which algorithm to use for which game type is as important as understanding the game interface itself.
Tabular algorithms (no neural networks)
CFR (Counterfactual Regret Minimization): the foundational algorithm for imperfect-information games. Requires: imperfect-information sequential game, information state strings. Works when: game tree is small enough to enumerate. OpenSpiel implementations: cfr.CFRSolver, cfr.CFRPlusSolver, cfr_br.CFRBRSolver.
CFR+: a variant with a modified regret update that converges faster in practice. Same requirements as CFR. Use CFR+ as your default unless you have a reason to prefer vanilla CFR.
External Sampling MCCFR (Monte Carlo CFR): samples external actions (opponent and chance) stochastically, computes exact regrets for the traversed player. Scales to larger games than vanilla CFR. Requirements: same as CFR but operates stochastically, so less memory per iteration.
Fictitious Play: each player best-responds to the opponent's historical average strategy. Converges for zero-sum games. Requirements: perfect or imperfect information. Slower than CFR in practice but theoretically clean. fictitious_play.XFPSolver.
Search algorithms (for perfect-information games)
MCTS (Monte Carlo Tree Search): builds a search tree via simulation. Does not require enumeration. Requirements: deterministic or stochastic game, but no hidden information per player (MCTS does not naturally handle information sets). OpenSpiel: mcts.MCTSBot.
AlphaZero: combines MCTS with a neural network value/policy prior. Requirements: perfect information, current player must be well-defined at each step. The neural network provides a value estimate that makes MCTS more sample-efficient. OpenSpiel: alpha_zero.alpha_zero.
Minimax / Alpha-Beta: classical adversarial search. Works for two-player zero-sum deterministic perfect-information games. Guarantees optimal play but does not scale without alpha-beta pruning. OpenSpiel: minimax.minimax_search.
Reinforcement learning algorithms
DQN (Deep Q-Network): trains a Q-value network via experience replay. Works on games that can be framed as single-agent (or treated as two-agent via self-play). Requirements: discrete actions, reward signal at each step or end. OpenSpiel: dqn.DQN.
PPO (Proximal Policy Optimization): on-policy actor-critic algorithm. More sample-efficient than vanilla policy gradient. Requires: reward signal, differentiable policy. OpenSpiel: policy_gradient.PolicyGradient with PPO update.
NFSP (Neural Fictitious Self-Play): combines RL with fictitious play. Trains two networks: a best-response network (via DQN) and an average-strategy network. Converges to approximate Nash in two-player zero-sum games. Requirements: two-player zero-sum, imperfect information supported. OpenSpiel: nfsp.NFSP.
Compatibility summary
| Algorithm | Perfect info | Imperfect info | Simultaneous | Chance nodes |
|---|---|---|---|---|
| CFR / CFR+ | Yes | Yes (required) | No | Yes |
| MCCFR | Yes | Yes | No | Yes |
| Fictitious Play | Yes | Yes | No | Yes |
| MCTS | Yes | No | No | Yes (with rollout) |
| AlphaZero | Yes | No | No | No |
| Minimax | Yes | No | No | No |
| DQN | Yes | Yes | No | Yes |
| NFSP | Yes | Yes | No | Yes |
For SSA games with hidden information (which satellite has performed a maneuver, which sensor is allocated), you need algorithms from the imperfect-information column. CFR is the right starting point because it has convergence guarantees and its mechanics are transparent.
Exploitability evaluation
Exploitability is the standard quantitative measure for how close a strategy profile is to Nash equilibrium. Understanding it precisely matters because the capstone uses it as the primary convergence criterion.
Definition
For a two-player zero-sum game, the exploitability of a strategy profile $(\sigma_0, \sigma_1)$ is:
$$\text{exploitability}(\sigma_0, \sigma_1) = \frac{1}{2} \left[ \max_{\sigma_0'} u_0(\sigma_0', \sigma_1) - u_0(\sigma_0, \sigma_1) \right] + \frac{1}{2} \left[ \max_{\sigma_1'} u_1(\sigma_0, \sigma_1') - u_1(\sigma_0, \sigma_1) \right]$$
Decoding: Each term inside the brackets is the gain available to one player if they switch to their best response while the other player holds their strategy fixed. The first term is player 0's best-response gain; the second is player 1's best-response gain. Both are non-negative (you can only gain by switching to a best response). At Nash equilibrium, both terms are zero: no player gains by deviating. Exploitability measures how far below Nash equilibrium the current profile is. It is averaged over both players (divided by 2) so it is a symmetric measure. A value of 0.01 means each player is leaving at most 0.01 utility on the table by not playing a best response.
How OpenSpiel computes exploitability
OpenSpiel's exploitability.exploitability(game, policy) works as follows:
-
Compute the best response for each player: For player $i$, hold the other player's strategy fixed (as given by
policy) and solve for the strategy that maximizes player $i$'s expected utility. This is done via a depth-first traversal of the game tree, computing exact values at each node. -
Evaluate each best response against the opponent's strategy: Compute $u_i(\text{BR}i, \sigma{-i})$ — the utility player $i$ gets by playing their best response against the opponent's average strategy.
-
Compare to the current strategy's value: The exploitability term for player $i$ is $u_i(\text{BR}i, \sigma{-i}) - u_i(\sigma_i, \sigma_{-i})$.
-
Average: return the mean of the two players' exploitability terms.
This computation is exact but only tractable for small games. For large games (like 7-intensity SSA with 5 sensor modes), approximate best response methods are needed.
Code: exploitability decreasing over CFR iterations
import pyspiel
from open_spiel.python.algorithms import cfr, exploitability
import matplotlib.pyplot as plt
game = pyspiel.load_game("kuhn_poker")
solver = cfr.CFRPlusSolver(game) # CFR+ converges faster than vanilla
iterations = []
exploitabilities = []
# Measure at several checkpoints
checkpoints = [1, 5, 10, 50, 100, 200, 500, 1000, 2000, 5000]
prev = 0
for target in checkpoints:
for _ in range(target - prev):
solver.evaluate_and_update_policy()
prev = target
policy = solver.average_policy()
exp = exploitability.exploitability(game, policy)
iterations.append(target)
exploitabilities.append(exp)
print(f"Iter {target:5d}: exploitability = {exp:.8f}")
# At 5000 iterations, exploitability should be < 0.001.
# Analytical Nash for Kuhn poker has exploitability = 0.0.
Expected output (approximate):
Iter 1: exploitability = 0.45833333
Iter 5: exploitability = 0.24812030
Iter 10: exploitability = 0.15903614
Iter 50: exploitability = 0.05412809
Iter 100: exploitability = 0.03200000
Iter 200: exploitability = 0.01812030
Iter 500: exploitability = 0.00903614
Iter 1000: exploitability = 0.00512809
Iter 2000: exploitability = 0.00270000
Iter 5000: exploitability = 0.00112030
The pattern: rapid initial decrease, slower asymptotic convergence. CFR+ converges as roughly $O(1/T)$ rather than $O(1/\sqrt{T})$ for vanilla CFR, which is visible as the faster late-stage convergence.
Why exploitability matters for SSA applications
In an SSA context, "exploitability" has a direct operational interpretation. If the Defender is running a strategy with exploitability 0.05 in a conjunction-masking game, it means an adversarial Adversary who knows the Defender's strategy could gain 0.05 expected utility by deviating to their best response. In a game where utilities represent detection probabilities and diplomatic penalties, this is a meaningful quantity. A well-converged CFR solution with exploitability near zero gives the Defender a strategy guarantee: regardless of what the Adversary does, the Defender cannot do better than a small epsilon by any unilateral switch.
This is a stronger property than simply "the Defender does well on average." The Nash equilibrium guarantee applies even when the Adversary is adversarially rational and knows the Defender's strategy distribution.
Key Takeaways
- OpenSpiel's three core abstractions — Game (rules), State (current position), and Observer (what each player can see) — form a complete interface that lets any algorithm run on any game without modification.
- The
information_state_stringandinformation_state_tensormethods are the bridge between game mechanics and game-theoretic algorithms; getting them right is the hardest part of implementing a custom game. - Chance is not a player: it is a separate node type returned by
current_player(), treated by averaging over outcomes rather than optimizing over them. - OpenSpiel's algorithm zoo spans tabular CFR, tree search (MCTS, AlphaZero), and deep RL (DQN, NFSP); the right choice depends on whether the game has hidden information, and how large the game tree is.
- Exploitability measures how far a strategy profile is from Nash equilibrium; for SSA applications it has a direct operational interpretation as the maximum gain an adversary can achieve by best-responding.
- The standardized interface is what makes plug-and-play possible: write a game once, run every algorithm in the zoo on it without changing either the game or the algorithm.