
NPC Control Networks Guide

Math Helpers

These are helper functions that provide mathematical utilities to support the NPC belief-update system. 

clamp01(x) limits a value x so that it always lies between 0.0 and 1.0. This is useful for probabilistic variables such as trust, uncertainty, and confidence, ensuring a valid probability range. If x is below 0.0 the function returns 0.0, whilst if x is above 1.0 it returns 1.0.

The softplus(x) function is a smooth alternative to the ReLU activation function:

softplus(x) = log(1 + e^x)

Unlike ReLU, softplus is differentiable everywhere and always produces positive outputs. The conditional branch avoids numerical overflow, because exp(x) blows up for large x values.

inverse_softplus(y) performs the reverse transformation:

inverse_softplus(y) = log(e^y − 1)

It converts a positive value back into the unconstrained raw neural value prior to the softplus activation. This matters during supervised training, where the inverse function maps target values correctly into the network's raw prediction space.
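As a minimal sketch, the three helpers could be implemented as follows. The overflow threshold of 30.0 stands in for the conditional branch mentioned above and is an assumption:

import math

def clamp01(x: float) -> float:
    # Pin x into the valid probability range [0.0, 1.0].
    return max(0.0, min(1.0, x))

def softplus(x: float) -> float:
    # Smooth, always-positive alternative to ReLU.
    # For large x, softplus(x) is numerically equal to x, so we skip
    # exp(x) entirely and avoid overflow.
    return x if x > 30.0 else math.log1p(math.exp(x))

def inverse_softplus(y: float) -> float:
    # Map a positive target back into the unconstrained raw space.
    # Only valid for y > 0; mirrors the overflow guard above.
    return y if y > 30.0 else math.log(math.expm1(y))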

Story Model Types

This section defines the core story-state objects that the model operates on. These classes underpin the predictive-coding encoder, the TensorFlow heads, belief updates, policy selection, and scene logic.

NPCAction defines the behavioural outputs available to the NPC. It inherits from both str and Enum, meaning each action is both a fixed enum value and a usable string, so code can safely compare actions whilst also printing or logging the associated string value. These actions are predicted by the TensorFlow policy head, which outputs logits over DISMISS, PROBE, REVEAL, and CONFRONT.
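A minimal sketch of such an enum; the lowercase string values are assumptions matching the action labels used later in this guide:

from enum import Enum

class NPCAction(str, Enum):
    # Each member is simultaneously an enum constant and a plain string.
    DISMISS = "dismiss"
    PROBE = "probe"
    REVEAL = "reveal"
    CONFRONT = "confront"

# Safe comparison as an enum, readable output as a string:
assert NPCAction.REVEAL == "reveal"
print(f"NPC chose to {NPCAction.PROBE.value}")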

Engram represents a belief-object in the story world. An engram is a psychologically meaningful story fact that the NPC can hold beliefs about. The prior field gives the NPC's starting expectation before new evidence arrives. An engram is therefore not just story text; it is also a target for probabilistic belief updating.

Observation represents evidence gathered from the player's dialogue choices and behaviour. It stores the engram the evidence refers to, how strongly it supports that engram, how reliable the evidence is, and where it came from. The weighted_strength property combines strength and reliability: a strong but unreliable signal is weakened, while a strong and reliable signal has more impact on belief updating. This value is later compared against the NPC's current belief when computing prediction error and free energy.
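A sketch of these two types as Python dataclasses; the field names follow the prose and the defaults are assumptions:

from dataclasses import dataclass

@dataclass
class Engram:
    key: str             # identifier for the story fact
    text: str            # the narrative content itself
    prior: float = 0.4   # NPC's starting expectation for this fact

@dataclass
class Observation:
    engram_key: str      # which engram the evidence refers to
    strength: float      # how strongly it supports the engram (0..1)
    reliability: float   # how trustworthy the source is (0..1)
    source: str = "dialogue"

    @property
    def weighted_strength(self) -> float:
        # Strong-but-unreliable signals are damped before belief updating.
        return self.strength * self.reliability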

NPCState is the main internal state of the character. It stores the NPC's name, belief parameters, engrams, and emotional and social variables. It is built around a Beta distribution, where each engram has an alpha and a beta value: alpha represents accumulated evidence for the engram, while beta represents accumulated evidence against it. The method default_alpha_beta() creates an initial Beta distribution from default_belief and default_concentration. With the defaults, the NPC starts with a mean belief of 0.40 and a concentration of 5.0, giving alpha = 0.40 * 5.0 = 2.0 and beta = 0.60 * 5.0 = 3.0.

So the NPC begins slightly sceptical, but not certain. get_params() retrieves the current alpha and beta values for a given engram; if the NPC has never encountered a particular engram before, it automatically creates a default belief state for it. This links to the memory system: engrams become part of the NPC's tracked psychological state once they are updated. set_params() writes new alpha and beta values back into the NPC state.

belief() returns the mean of the Beta distribution: belief = alpha / (alpha + beta)

variance() measures how uncertain that belief is. A Beta distribution with little evidence has higher variance; one with lots of accumulated evidence has lower variance.

uncertainty() normalises the Beta variance onto a rough 0.0 to 1.0 scale using clamp01(). Since the maximum variance for a probability-like variable is around 0.25, dividing by 0.25 gives a convenient normalised uncertainty score.
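Pulling these accessors together, a minimal sketch of the Beta-belief bookkeeping might look like this. It reuses clamp01 from the helper sketch above; the social/emotional attribute defaults are assumptions:

class NPCState:
    def __init__(self, name, default_belief=0.40, default_concentration=5.0):
        self.name = name
        self.default_belief = default_belief
        self.default_concentration = default_concentration
        self.params = {}  # engram key -> (alpha, beta)
        # Social/emotional variables (illustrative defaults).
        self.trust, self.suspicion, self.instability = 0.5, 0.3, 0.2

    def default_alpha_beta(self):
        # Mean 0.40, concentration 5.0 -> alpha = 2.0, beta = 3.0.
        c = self.default_concentration
        return self.default_belief * c, (1.0 - self.default_belief) * c

    def get_params(self, key):
        # First contact with an engram creates a default belief state.
        if key not in self.params:
            self.params[key] = self.default_alpha_beta()
        return self.params[key]

    def set_params(self, key, alpha, beta):
        self.params[key] = (alpha, beta)

    def belief(self, key):
        alpha, beta = self.get_params(key)
        return alpha / (alpha + beta)

    def variance(self, key):
        # Standard Beta-distribution variance; shrinks as evidence accumulates.
        alpha, beta = self.get_params(key)
        total = alpha + beta
        return (alpha * beta) / (total * total * (total + 1.0))

    def uncertainty(self, key):
        # Normalise against ~0.25, the maximum variance of a probability.
        return clamp01(self.variance(key) / 0.25)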

Engram defines what the NPC can believe, Observation defines what evidence the player supplies, NPCState stores the NPC's evolving belief distribution, and NPCAction defines how the NPC can respond.

These classes are then consumed by the later controller classes, where raw features are built, predictive-coding latent states are generated, TensorFlow heads predict belief updates and actions, and the game scene prints NPC's response.

 

JPC Predictive Coding Encoder

 

This class is the predictive coding representation model in the system. It acts as the NPC’s learned internal inference engine, transforming raw psychological and observational state variables into a compact latent representation that the TensorFlow decision heads can use.

The constructor (__init__) defines the architecture and optimisation setup. input_dim is the size of the raw feature vector coming from the NPC state and observations; in this demo it corresponds to values such as belief mean, uncertainty, trust, suspicion, observation strength, and free-energy-related terms. latent_dim is the dimensionality of the learned predictive-coding latent representation, which in this implementation is an 8-dimensional latent state.

The JAX random key is created with self.key = jr.PRNGKey(seed), where jr is jax.random.

The actual predictive-coding network is created here:
self.model = jpc.make_mlp(...)

This constructs a multilayer perceptron inside the JPC framework with:

  • input size = input_dim

  • hidden width = width

  • network depth = depth

  • output size = latent_dim

  • ReLU activation functions

 

So structurally this resembles a standard neural network, but it is trained using predictive-coding dynamics rather than conventional backpropagation alone. The optimiser, self.optim = optax.adam(learning_rate), uses Adam from Optax to update the predictive-coding network parameters. self.opt_state = self.optim.init((eqx.filter(self.model, eqx.is_array), None)) initialises the optimiser state only over the trainable array parameters inside the Equinox/JAX model; this is necessary because JAX models often contain both trainable tensors and static Python objects.
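Putting those pieces together, a minimal constructor sketch might look like the following. The keyword names passed to jpc.make_mlp are assumptions based on the description above, so check the JPC documentation for the exact signature:

import equinox as eqx
import jax.random as jr
import jpc
import optax

class JPCPredictiveCodingEncoder:
    def __init__(self, input_dim, latent_dim=8, width=64, depth=3,
                 learning_rate=1e-3, seed=0):
        self.key = jr.PRNGKey(seed)
        # Predictive-coding MLP: input_dim -> (width x depth) -> latent_dim.
        self.model = jpc.make_mlp(
            self.key,
            input_dim=input_dim,
            width=width,
            depth=depth,
            output_dim=latent_dim,
            act_fn="relu",
        )
        self.optim = optax.adam(learning_rate)
        # Initialise optimiser state only over trainable arrays; JAX models
        # mix trainable tensors with static Python objects.
        self.opt_state = self.optim.init(
            (eqx.filter(self.model, eqx.is_array), None)
        )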

The train_pc_step() method performs one predictive-coding learning update. It receives:

  • x: the raw NPC feature vector

  • target_latent: the desired latent representation

 

The batching lines: 

  • x_batch = jnp.asarray([x], dtype=jnp.float32)

  • target_batch = jnp.asarray([target_latent], dtype=jnp.float32)

convert the vectors into JAX tensors with an explicit batch dimension: (1, input_dim) and (1, latent_dim).

Even though only one sample is used, JPC expects batched input tensors. The predictive-coding update itself occurs here: 

result = jpc.make_pc_step(...). This is the central inference-learning step.

Internally, JPC performs iterative predictive-coding optimisation to reduce the mismatch between the network’s prediction and the target latent state. Rather than simply propagating gradients backward once, predictive coding treats hidden states as inference variables and relaxes them toward equilibrium. The returned result dictionary contains updated network parameters and optimiser state:

self.model = result["model"]
self.opt_state = result["opt_state"]
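A condensed sketch of train_pc_step(); the keyword names passed to jpc.make_pc_step are assumptions and should be checked against the JPC documentation:

import jax.numpy as jnp
import jpc

def train_pc_step(self, x, target_latent):
    # Method of JPCPredictiveCodingEncoder: one predictive-coding update.
    x_batch = jnp.asarray([x], dtype=jnp.float32)                    # (1, input_dim)
    target_batch = jnp.asarray([target_latent], dtype=jnp.float32)  # (1, latent_dim)
    result = jpc.make_pc_step(
        model=self.model,
        optim=self.optim,
        opt_state=self.opt_state,
        output=target_batch,  # desired latent representation
        input=x_batch,        # raw NPC feature vector
    )
    # Keep the relaxed parameters and optimiser state for the next step.
    self.model = result["model"]
    self.opt_state = result["opt_state"]
    return result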

So the encoder gradually learns a latent representation space that captures:

  • belief state,

  • uncertainty,

  • prediction error,

  • free energy,

  • social tension,

  • and instability dynamics.

 

The encode() method performs inference only — no learning. It transforms a raw feature vector into a learned latent representation.

First the input is converted into a JAX tensor:

y = jnp.asarray(x, dtype=jnp.float32)

Then the code manually iterates through each block of the model:

for block in self.model:
    y = block(y)

This effectively performs a forward pass through the predictive-coding MLP layer by layer.

 

Finally: return np.asarray(y, dtype=np.float32) converts the JAX tensor back into a NumPy array so it can be consumed by TensorFlow.

This class therefore acts as the bridge between:

  1. symbolic/story state (NPCState, Observation, Engram)

  2. predictive-coding latent inference (JPCPredictiveCodingEncoder)

  3. behavioural prediction (TensorFlowNPCHeads)

The raw handcrafted psychological variables are not fed directly into the policy network. Instead, the predictive-coding encoder learns an internal latent cognitive representation first, making it function as the NPC’s learned “mental model” layer.

TensorFlow Heads

This class defines the behavioural decision layer of the NPC system. While the JPC predictive-coding encoder learns an internal latent psychological representation, the TensorFlowNPCHeads class converts that latent state into concrete behavioural outputs: belief updates and NPC actions. Conceptually, the architecture looks like this:

raw psychological state
          ↓
JPC predictive-coding encoder
          ↓
latent cognitive representation
          ↓
TensorFlow shared hidden layers
        ↙         ↘
belief-update head     policy head
(delta α, delta β)     (action logits)
        ↓                  ↓
updated Beta belief    NPC behaviour

The model is built using TensorFlow/Keras and is described as a multi-head network because two different prediction tasks share the same latent representation and hidden layers.

The constructor begins by defining the input layer:

inputs = tf.keras.Input(shape=(latent_dim,), name="jpc_pc_latent")

This means the network does not directly observe raw game variables. Instead, it consumes the compressed latent vector produced by the predictive-coding encoder. In this implementation the latent vector has dimension 8.

 

Two hidden layers are then created:

  • hidden = tf.keras.layers.Dense(hidden_dim, activation="relu")(inputs)

  • hidden = tf.keras.layers.Dense(hidden_dim, activation="relu")(hidden)

 

These shared hidden layers allow the network to learn internal abstractions useful for both belief updating and policy selection simultaneously.

 

After the shared representation, the network splits into two separate output heads.

The first head:

belief_raw_delta = tf.keras.layers.Dense(2)(hidden)

predicts two values:

  • raw delta-alpha

  • raw delta-beta

These are updates to the Beta-distribution belief parameters stored inside NPCState.

 

The system later transforms these raw outputs using the softplus function so they become strictly positive:

raw network output
        ↓
    softplus()
        ↓
positive α / β increments

This head therefore controls how strongly the NPC updates belief evidence for or against an engram.

 

The second head:

policy_logits = tf.keras.layers.Dense(action_count)(hidden)

predicts logits over the possible NPCAction values:

  • dismiss

  • probe

  • reveal

  • confront

 

These are not probabilities yet — they are raw scores. Later, the action with the highest logit is selected via argmax.

 

The overall TensorFlow model is assembled here: self.model = tf.keras.Model(...)

with named outputs: { "belief_raw_delta": ..., "policy_logits": ... }

This makes it easy to train both objectives simultaneously.

The optimiser: self.optimizer = tf.keras.optimizers.Adam(learning_rate) uses Adam optimisation for gradient descent.

 

Two separate loss functions are defined:

  1. self.mse = tf.keras.losses.MeanSquaredError()

  2. self.ce = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
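As a sketch, the constructor described above could be assembled like this; hidden_dim, the learning rate, and the variable defaults are illustrative assumptions:

import tensorflow as tf

latent_dim, hidden_dim, action_count = 8, 32, 4

inputs = tf.keras.Input(shape=(latent_dim,), name="jpc_pc_latent")
hidden = tf.keras.layers.Dense(hidden_dim, activation="relu")(inputs)
hidden = tf.keras.layers.Dense(hidden_dim, activation="relu")(hidden)

# Head 1: unconstrained delta-alpha / delta-beta predictions.
belief_raw_delta = tf.keras.layers.Dense(2, name="belief_raw_delta")(hidden)
# Head 2: logits over DISMISS / PROBE / REVEAL / CONFRONT.
policy_logits = tf.keras.layers.Dense(action_count, name="policy_logits")(hidden)

model = tf.keras.Model(
    inputs=inputs,
    outputs={"belief_raw_delta": belief_raw_delta,
             "policy_logits": policy_logits},
)
optimizer = tf.keras.optimizers.Adam(1e-3)
mse = tf.keras.losses.MeanSquaredError()
ce = tf.keras.losses.CategoricalCrossentropy(from_logits=True)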

 

The _train_step() method performs a single TensorFlow training update. The @tf.function decorator compiles it into a TensorFlow graph for faster execution.

Inside the gradient tape: with tf.GradientTape() as tape: TensorFlow records all operations so gradients can later be computed automatically.

The forward pass: outputs = self.model(latent_batch, training=True) produces both output heads simultaneously.

The belief loss: belief_loss = self.mse(...) measures how closely the network predicted the desired Beta update targets.

The policy loss: policy_loss = self.ce(...) measures whether the correct NPC action was predicted.

These are combined: total_loss = belief_loss + policy_loss, meaning the model jointly optimises both psychological belief updating and behavioural policy selection.

Gradients are then computed with grads = tape.gradient(total_loss, self.model.trainable_variables) and applied via self.optimizer.apply_gradients(...). This is standard backpropagation training over the TensorFlow network.

The public train_step() wrapper prepares the data for TensorFlow. The latent vector is reshaped into a batch with latent.reshape(1, -1), because TensorFlow expects batched tensors.

The action target is converted into a one-hot vector: [0, 0, 1, 0] for example, representing "reveal".

Finally, the method returns the total loss, belief loss, and policy loss so training progress can be monitored.
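A sketch of the training step with the loss combination and gradient application in one place; this is written as a method of TensorFlowNPCHeads, and the tensor shapes follow the prose:

import tensorflow as tf

@tf.function
def _train_step(self, latent_batch, belief_target, action_target):
    with tf.GradientTape() as tape:
        # Forward pass produces both heads at once.
        outputs = self.model(latent_batch, training=True)
        belief_loss = self.mse(belief_target, outputs["belief_raw_delta"])
        policy_loss = self.ce(action_target, outputs["policy_logits"])
        # Joint objective: belief updating + policy selection.
        total_loss = belief_loss + policy_loss
    grads = tape.gradient(total_loss, self.model.trainable_variables)
    self.optimizer.apply_gradients(zip(grads, self.model.trainable_variables))
    return total_loss, belief_loss, policy_loss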

The predict() method performs inference only. It runs the latent vector through the trained network:

outputs = self.model(..., training=False)

and extracts:

  • raw_delta

  • logits

 

The belief deltas are later converted into positive Beta updates using Softplus, while the logits determine which NPC action is selected.

This class therefore acts as the NPC’s learned behavioural layer. The predictive-coding network builds an internal cognitive representation, and the TensorFlow heads translate that representation into:

  1. how beliefs should change,

  2. and how the NPC should behave socially.

 

Combined JPC + TensorFlow NPC Controller

 

This class is the central orchestration layer of the entire NPC system. It connects together:

  1. the symbolic story state (NPCState, Engram, Observation)

  2. the JPC predictive-coding encoder

  3. the TensorFlow behavioural heads

  4. the free-energy calculations

  5. the runtime NPC belief-update and action-selection loop

In effect, this class acts as the NPC’s cognitive controller.

 

Overall Architecture

player choice
      ↓
Observation
      ↓
raw handcrafted psychological features
      ↓
JPC predictive-coding encoder
      ↓
latent cognitive representation
      ↓
TensorFlow belief/policy heads
      ↓
belief update + NPC action
      ↓
updated NPC psychological state

Action Labels

The class begins by defining:

action_labels = [ NPCAction.DISMISS, NPCAction.PROBE, NPCAction.REVEAL, NPCAction.CONFRONT, ]

This establishes the fixed ordering of possible NPC actions.

The TensorFlow policy head predicts logits by index, meaning:

 

Index    Action

0           dismiss

1           probe

2           reveal

3           confront

 

This mapping is important because the network internally predicts numerical outputs rather than strings.

The raw feature vector passed into the encoder is built from the following handcrafted variables:

Feature                  Meaning

belief_mean              Current Beta-distribution belief

alpha_scaled             Evidence-for parameter

beta_scaled              Evidence-against parameter

uncertainty              Normalised variance

confidence               Inverse uncertainty

prior                    Prior belief expectation

evidence_strength        Raw observation strength

reliability              Observation reliability

weighted_strength        strength × reliability

trust                    NPC social trust

suspicion                NPC suspicion

instability              Emotional instability

These are explicitly engineered psychological variables rather than raw dialogue tokens.

 

Initialisation

The constructor initialises both neural systems.

Predictive-Coding Encoder: self.pc_encoder = JPCPredictiveCodingEncoder(...) creates the JPC predictive-coding representation model.

TensorFlow Behavioural Heads: self.tf_heads = TensorFlowNPCHeads(...) creates the TensorFlow decision network.

The controller therefore owns:

  • the cognitive inference system,

  • and the behavioural prediction system.

 

Raw Feature Construction

The raw_features() method converts symbolic story state into a numerical feature vector.

def raw_features(...)

This is the bridge between:

narrative state
      ↓
machine-learning representation

 

Notably, alpha / 10.0 and beta / 10.0 rescale the Beta parameters so they remain numerically stable relative to the other features.
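A sketch of raw_features() following the feature table above; the attribute names on NPCState and Observation match the earlier sketches, and the exact ordering is an assumption:

import numpy as np

def raw_features(self, npc, engram, obs):
    # Method of the combined controller: symbolic state -> feature vector.
    alpha, beta = npc.get_params(engram.key)
    u = npc.uncertainty(engram.key)
    return np.asarray([
        alpha / (alpha + beta),   # belief_mean
        alpha / 10.0,             # alpha_scaled (keeps magnitudes comparable)
        beta / 10.0,              # beta_scaled
        u,                        # uncertainty
        1.0 - u,                  # confidence
        engram.prior,             # prior
        obs.strength,             # evidence_strength
        obs.reliability,          # reliability
        obs.weighted_strength,    # strength x reliability
        npc.trust,                # trust
        npc.suspicion,            # suspicion
        npc.instability,          # instability
    ], dtype=np.float32)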

Free Energy Calculation

The free_energy() method computes a simplified Friston-style variational free-energy objective.

F = ½ · π_e · (o − b)² + ½ · π_p · (b − p)²

Where:

Symbol     Meaning

o          Weighted observation

b          Current belief

p          Prior belief

π_e        Evidence precision

π_p        Prior precision

 

This produces a scalar measure of mismatch between:

  1. what the NPC believes,

  2. what evidence suggests,

  3. and what the NPC expected beforehand.

Higher free energy corresponds to greater prediction error and psychological tension.
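A direct translation of this objective into code might look like the following; the precision parameter names and their default values are assumptions:

def free_energy(self, obs_weighted, belief, prior,
                precision_evidence=1.0, precision_prior=1.0):
    # F = 0.5 * pi_e * (o - b)^2 + 0.5 * pi_p * (b - p)^2
    sensory_error = obs_weighted - belief   # evidence vs. current belief
    prior_error = belief - prior            # current belief vs. expectation
    return (0.5 * precision_evidence * sensory_error ** 2
            + 0.5 * precision_prior * prior_error ** 2)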

Predictive-Coding Teacher Latent

The pc_latent_teacher() method generates the synthetic latent training target for the predictive-coding network.

In a larger system this target could come from:

  • replay memory,

  • episode trajectories,

  • a generative world model,

  • or a true predictive-coding objective

In this demo, however, the latent target is handcrafted from psychologically meaningful variables:

  • belief,

  • weighted observation,

  • sensory prediction error,

  • prior error,

  • free energy,

  • uncertainty,

  • trust − suspicion,

  • and instability.

The predictive-coding encoder therefore learns to compress socially and cognitively meaningful information into an 8-dimensional latent space.
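As a sketch, the teacher could assemble that 8-dimensional target like this; the exact ordering and scaling are assumptions:

import numpy as np

def pc_latent_teacher(self, npc, engram, obs):
    # Handcrafted latent target mirroring the list above.
    belief = npc.belief(engram.key)
    o = obs.weighted_strength
    return np.asarray([
        belief,                          # current belief
        o,                               # weighted observation
        o - belief,                      # sensory prediction error
        belief - engram.prior,           # prior error
        self.free_energy(o, belief, engram.prior),
        npc.uncertainty(engram.key),
        npc.trust - npc.suspicion,       # net social stance
        npc.instability,
    ], dtype=np.float32)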

Belief Delta Teacher

The belief_delta_teacher() method creates the supervised training target for the TensorFlow belief-update head.

First, prediction error is calculated:

error = obs.weighted_strength - belief

This acts as a predictive-coding sensory prediction error.

A gain term is then computed:

gain = 0.10 + 1.80 * obs.reliability

Meaning more reliable evidence produces stronger belief updates.

 

Supporting Evidence

If evidence supports the current belief:

increase alpha strongly and increase beta only slightly.

 

Contradictory Evidence

If evidence contradicts belief:

increase beta strongly and increase alpha only slightly.

This creates asymmetric evidence accumulation inside the Beta distribution.

The outputs are transformed using: inverse_softplus(...) because the TensorFlow model predicts unconstrained raw values internally, while runtime later applies Softplus to guarantee positive belief updates.
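A sketch of the teacher under these rules; the 0.05 floor and the exact asymmetric split are illustrative assumptions:

def belief_delta_teacher(self, npc, engram, obs):
    belief = npc.belief(engram.key)
    error = obs.weighted_strength - belief    # sensory prediction error
    gain = 0.10 + 1.80 * obs.reliability      # reliable evidence updates harder
    if error >= 0:
        # Evidence supports the engram: grow alpha strongly, beta slightly.
        d_alpha, d_beta = gain * abs(error) + 0.05, 0.05
    else:
        # Evidence contradicts it: grow beta strongly, alpha slightly.
        d_alpha, d_beta = 0.05, gain * abs(error) + 0.05
    # Map the positive increments back into the network's raw output space.
    return inverse_softplus(d_alpha), inverse_softplus(d_beta)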

Policy Teacher

The policy_teacher() method defines the synthetic supervision target for NPC actions.

The policy is currently rule-based:

  • low belief → dismiss

  • high uncertainty → probe

  • high belief + suspicion → confront

  • moderate belief → reveal

The TensorFlow policy head therefore learns to imitate a handcrafted behavioural policy.
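A sketch of that rule-based supervision; the numeric thresholds and the precedence between the rules are assumptions:

def policy_teacher(self, npc, engram):
    belief = npc.belief(engram.key)
    if belief < 0.30:
        return NPCAction.DISMISS            # low belief
    if npc.uncertainty(engram.key) > 0.60:
        return NPCAction.PROBE              # high uncertainty
    if belief > 0.70 and npc.suspicion > 0.50:
        return NPCAction.CONFRONT           # high belief + suspicion
    return NPCAction.REVEAL                 # moderate belief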

 

Training Pipeline

The train_step() method performs the full learning cycle.

Step 1 — Build Raw Features

x = self.raw_features(...)

Step 2 — Create Predictive-Coding Latent Target

target_latent = self.pc_latent_teacher(...)

Step 3 — Update JPC Predictive-Coding Network

self.pc_encoder.train_pc_step(...)

Step 4 — Encode Psychological State

latent = self.pc_encoder.encode(x)

Step 5 — Train TensorFlow Heads

The TensorFlow heads are trained using:

  • belief-update targets,

  • and action labels.

 

This creates a two-stage architecture:

raw psychological state
      ↓
predictive-coding representation learning
      ↓
behavioural supervised learning
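Put together, the five steps might read as follows; a sketch that assumes the helper signatures used in the earlier sketches:

def train_step(self, npc, engram, obs, action_label):
    x = self.raw_features(npc, engram, obs)                    # 1. raw features
    target_latent = self.pc_latent_teacher(npc, engram, obs)   # 2. PC latent target
    self.pc_encoder.train_pc_step(x, target_latent)            # 3. update JPC net
    latent = self.pc_encoder.encode(x)                         # 4. encode state
    belief_target = self.belief_delta_teacher(npc, engram, obs)
    return self.tf_heads.train_step(latent, belief_target, action_label)  # 5.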

 

Runtime Inference Loop

The update_belief_and_act() method performs the actual gameplay cognition cycle.

The method:

  1. computes free energy before updating,

  2. encodes the current state,

  3. predicts belief updates,

  4. predicts action logits,

  5. updates Beta beliefs,

  6. selects the highest-scoring action,

  7. recomputes free energy,

  8. records a full cognitive trace.

 

Belief Updates

The predicted outputs are transformed using Softplus:

delta_alpha = softplus(...)
delta_beta = softplus(...)

This guarantees positive evidence increments.

The NPC’s Beta-distribution belief state is then updated.

 

Action Selection

The action is selected via:

np.argmax(logits)

Meaning the highest-scoring policy output becomes the NPC’s chosen behaviour.
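Combining the steps above, a sketch of the runtime cycle might look like this; the return shape of tf_heads.predict() and the trace keys are assumptions:

import numpy as np

def update_belief_and_act(self, npc, engram, obs):
    # 1. Free energy before the update.
    f_before = self.free_energy(obs.weighted_strength,
                                npc.belief(engram.key), engram.prior)
    # 2-4. Encode the current state, predict belief deltas + action logits.
    x = self.raw_features(npc, engram, obs)
    latent = self.pc_encoder.encode(x)
    raw_delta, logits = self.tf_heads.predict(latent)
    # 5. Softplus guarantees positive evidence increments.
    alpha, beta = npc.get_params(engram.key)
    npc.set_params(engram.key,
                   alpha + softplus(float(raw_delta[0])),
                   beta + softplus(float(raw_delta[1])))
    # 6. Highest-scoring logit becomes the chosen behaviour.
    action = self.action_labels[int(np.argmax(logits))]
    # 7-8. Free energy after the update, plus a cognitive trace.
    f_after = self.free_energy(obs.weighted_strength,
                               npc.belief(engram.key), engram.prior)
    self.last_trace = {"free_energy_before": f_before,
                       "free_energy_after": f_after,
                       "policy_logits": logits,
                       "action": action}
    return action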

 

Cognitive Trace System

The last_trace dictionary records the NPC’s internal cognition state for debugging and inspection.

It stores:

  • raw features,

  • latent representation,

  • belief changes,

  • uncertainty changes,

  • free energy before/after,

  • TensorFlow outputs,

  • policy logits,

  • selected action.

This is important because it makes the NPC psychologically interpretable rather than a pure black-box neural agent.

Overall Interpretation

This controller implements a hybrid architecture combining:

Component:                         Role:

symbolic state                     story/world representation

Beta distributions                 probabilistic beliefs

predictive coding                  latent cognitive inference

TensorFlow heads                   behavioural prediction

free energy                        prediction-error objective

handcrafted rules                  synthetic supervision

latent representation learning     cognitive abstraction

 

The result is not “pure” active inference, but rather an amortised hybrid cognitive architecture combining:

  • predictive coding,

  • probabilistic belief modelling,

  • supervised behavioural learning,

  • and narrative state modelling

inside an interactive story/game system.

System Summary

This prototype implements a hybrid cognitive NPC architecture that combines predictive coding, probabilistic belief modelling, supervised neural learning, and narrative game logic into a unified interactive system.

At the symbolic level, the system represents psychologically meaningful story concepts as engrams — internal beliefs or memories that NPCs can reason about. Player dialogue and behaviour generate observations, which act as evidence for or against these engrams. Each NPC maintains probabilistic beliefs over these narrative states using Beta distributions, allowing beliefs to evolve gradually rather than changing in a binary manner.

The NPC’s internal psychological state is transformed into a handcrafted feature vector containing variables such as:

  • belief strength,

  • uncertainty,

  • confidence,

  • trust,

  • suspicion,

  • instability,

  • and observation reliability.

These features are passed into a JPC predictive-coding encoder, which learns a compressed latent cognitive representation of the NPC’s current mental state. Rather than directly mapping inputs to actions, the system first performs a representation-learning stage inspired by predictive coding and variational free-energy minimisation.

This latent representation is then consumed by a TensorFlow multi-head neural network. One output head predicts updates to the NPC’s Beta-distribution beliefs (alpha and beta evidence parameters), while the second predicts behavioural policy logits corresponding to narrative actions such as:

  • dismiss,

  • probe,

  • reveal,

  • and confront.

The architecture also incorporates a simplified free-energy objective, measuring mismatch between:

  1. current beliefs,

  2. incoming evidence,

  3. and prior expectations.

This creates an internal prediction-error signal that influences both representation learning and belief updating.

Importantly, the system is designed to remain psychologically interpretable. Internal cognitive traces — including latent states, prediction errors, uncertainty changes, free-energy values, and behavioural logits — are explicitly logged and inspectable during runtime. This avoids the NPC becoming a completely opaque black-box neural agent.

Overall, the project represents an amortised active-inference-inspired narrative AI system rather than a fully autonomous active inference agent. Behaviour is learned through a combination of predictive-coding latent learning, supervised policy imitation, probabilistic belief updating, and handcrafted narrative structure. The result is a cognitively motivated NPC framework capable of maintaining evolving beliefs, reacting dynamically to player behaviour, and producing psychologically grounded narrative interactions.
