Building a Policy Config
A policy attached to a scene needs more than just an ONNX file — the browser runtime has to know which observations to feed in, how to interpret the action, what commands to expose in the UI, and when to reset the episode. mjswan exposes all of that as Python kwargs on add_policy(), modelled after mjlab's config classes.
This page is the practical reference for those kwargs. For a runnable end-to-end policy built entirely in Python — including a hand-crafted ONNX graph — see examples/tutorial/minimum_policy.py.
Top level: add_policy(...)
scene.add_policy(
    name="Locomotion",
    policy=onnx.load("locomotion.onnx"),
    policy_joint_names=["FL_hip", "FL_thigh", "FL_calf", ...],
    default_joint_pos=[0.1, 0.8, -1.5, ...],
    observations={"policy": ObservationGroupCfg(terms={...})},
    actions={"joint_pos": JointPositionActionCfg(...)},
    commands={"velocity": mjswan.velocity_command()},
    terminations={"time_out": TerminationTermCfg(func=term_fns.time_out)},
)
The relevant kwargs (see the API reference for the full list):
| Kwarg | Purpose |
|---|---|
| policy_joint_names | Ordered list of joint names the policy controls. Required for browser-side actuator mapping. |
| default_joint_pos | Default pose, one entry per joint in policy_joint_names. Used when use_default_offset=True on the action term and when an observation subtracts the default pose. |
| observations | dict[str, ObservationGroupCfg] keyed by ONNX input tensor name (e.g. "policy"). |
| actions | dict[str, ActionTermCfg] keyed by term name (e.g. "joint_pos"). |
| commands | dict[str, CommandTermConfig] keyed by policy-visible command name. |
| terminations | dict[str, TerminationTermCfg] keyed by termination name. |
| encoder_bias | Optional per-joint bias; the browser writes processed_action - encoder_bias to the actuators (mirrors mjlab). |
| extras | Arbitrary JSON payload merged verbatim into the generated policy config. |
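The per-joint alignment and the encoder_bias subtraction described in the table can be illustrated in plain Python. This is a minimal sketch, not mjswan code: pair_default_pose and actuator_targets are hypothetical helpers, and the numbers are made up for illustration.

```python
from typing import Dict, List, Sequence


def pair_default_pose(
    joint_names: Sequence[str], default_joint_pos: Sequence[float]
) -> Dict[str, float]:
    """Map each policy joint to its default position, rejecting mismatched lengths."""
    if len(joint_names) != len(default_joint_pos):
        raise ValueError(
            f"default_joint_pos has {len(default_joint_pos)} entries, "
            f"expected {len(joint_names)} (one per policy joint)"
        )
    return dict(zip(joint_names, default_joint_pos))


def actuator_targets(
    processed_action: Sequence[float], encoder_bias: Sequence[float]
) -> List[float]:
    """Per-joint value written to the actuators: processed_action - encoder_bias."""
    return [a - b for a, b in zip(processed_action, encoder_bias)]


# Illustrative values only; a real config lists every joint the policy controls.
joint_names = ["FL_hip", "FL_thigh", "FL_calf"]
default_pose = pair_default_pose(joint_names, [0.1, 0.8, -1.5])
targets = actuator_targets([0.5, 1.0, -1.0], [0.25, 0.0, -0.5])
print(default_pose)
print(targets)
```

Running a length check like this before calling add_policy() can surface a joint-ordering mistake early, before it shows up as a misbehaving robot in the browser.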
Observations
Each ONNX input tensor maps to one ObservationGroupCfg. A group holds an ordered dict of ObservationTermCfg entries, and the runtime concatenates the term outputs in declaration order.
from mjswan.envs.mdp import observations as obs_fns
from mjswan.managers.observation_manager import (
    ObservationGroupCfg,
    ObservationTermCfg,
)

obs = {
    "policy": ObservationGroupCfg(
        terms={
            "base_ang_vel": ObservationTermCfg(func=obs_fns.base_ang_vel),
            "projected_gravity": ObservationTermCfg(func=obs_fns.projected_gravity_isaac),
            "joint_pos": ObservationTermCfg(
                func=obs_fns.joint_positions_isaac, scale=1.0,
            ),
            "joint_vel": ObservationTermCfg(
                func=obs_fns.joint_vel_rel, scale=0.05,
            ),
            "last_action": ObservationTermCfg(func=obs_fns.previous_actions),
            "velocity_cmd": ObservationTermCfg(
                func=obs_fns.generated_commands,
                params={"command_name": "velocity"},
            ),
        },
    ),
}
The ObservationTermCfg fields used at runtime are func (one of the built-in sentinels listed below, or a custom one registered via register_obs_func), params (forwarded to the browser-side class), scale, clip, and history_length. Other mjlab fields (noise, delay_*) are accepted for config compatibility but ignored, since no training happens in the browser.
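To make the "concatenate in declaration order" rule concrete, here is a minimal pure-Python sketch of how a group could be flattened into one input vector. Term is a hypothetical stand-in for ObservationTermCfg and compute_group is not an mjswan function. Applying clip before scale is an assumption borrowed from common manager-based conventions, and history_length is not modelled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, List[float]]


@dataclass
class Term:
    """Hypothetical stand-in for ObservationTermCfg (only the fields used here)."""

    func: Callable[[State], List[float]]
    scale: Optional[float] = None
    clip: Optional[Tuple[float, float]] = None


def compute_group(terms: Dict[str, Term], state: State) -> List[float]:
    """Evaluate terms in declaration order and concatenate their outputs."""
    flat: List[float] = []
    for term in terms.values():  # dicts preserve insertion order
        values = list(term.func(state))
        if term.clip is not None:  # assumption: clip is applied before scale
            low, high = term.clip
            values = [min(max(v, low), high) for v in values]
        if term.scale is not None:
            values = [v * term.scale for v in values]
        flat.extend(values)
    return flat


state = {"ang_vel": [0.5, -0.5, 0.0], "joint_vel": [10.0, -40.0]}
terms = {
    "base_ang_vel": Term(func=lambda s: s["ang_vel"]),
    "joint_vel": Term(func=lambda s: s["joint_vel"], scale=0.05, clip=(-20.0, 20.0)),
}
observation = compute_group(terms, state)
print(len(observation))  # 3 + 2 = 5 entries for the "policy" input tensor
```

Because the order of the terms dict fixes the layout of the tensor, reordering terms in Python changes what the ONNX model sees, so the declaration order should match the order used during training.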
Built-in observation sentinels
Defined in mjswan.envs.mdp.observations:
| Sentinel | Runtime class | Notes |
|---|---|---|
| base_lin_vel | BaseLinearVelocity | Linear velocity of the base, base frame. |
| base_ang_vel | BaseAngularVelocity | Angular velocity of the base, base frame. |
| projected_gravity | ProjectedGravityB | Gravity in the base frame (legacy implementation). |
| projected_gravity_isaac | ProjectedGravity | Isaac-compatible; defaults to joint_name="floating_base_joint". |
| joint_pos_rel | JointPos | Joint positions − default pose. |
| joint_vel_rel | JointVelocities | Joint velocities − default velocities. |
| joint_positions_isaac | JointPositions | Isaac joint ordering, default-subtracted. |
| last_action | PrevActions | Most recent action tensor. |
| previous_actions | PreviousActions | Isaac-compatible most-recent action tensor. |
| generated_commands | GeneratedCommands | Requires params={"command_name": "<name>"}. |
| velocity_command_with_oscillators | VelocityCommandWithOscillators | 16-dim velocity command + oscillator signals. |
| impedance_command | ImpedanceCommand | Impedance command tensor. |
| joint_pos_cos_sin | JointPosCosSin | [cos(q), sin(q)] for one joint. |
| motion_anchor_pos_b | MotionAnchorPosB | Tracking: anchor position in the robot anchor frame. |
| motion_anchor_ori_b | MotionAnchorOriB | Tracking: anchor orientation in the robot anchor frame. |
| robot_body_pos_b | RobotBodyPosB | Tracking: robot body positions in the robot anchor frame. |
| robot_body_ori_b | RobotBodyOriB | Tracking: robot body orientations in the robot anchor frame. |
| builtin_sensor | BuiltinSensor | Raw data from a named MuJoCo sensor. |
height_scan is exported for mjlab compatibility but raises NotImplementedError at build time (the browser has no ray-cast sensor).
For a custom observation backed by your own TypeScript class, see register_obs_func in the API reference.
Actions
Each entry in actions is an instance of an ActionTermCfg subclass. Two are supported in the browser:
from mjswan.envs.mdp.actions import (
    JointPositionActionCfg,
    JointEffortActionCfg,
)

# Joint-position control with external PD
actions = {
    "joint_pos": JointPositionActionCfg(
        actuator_names=(".*",),
        scale=0.25,
        offset=0.0,
        use_default_offset=True,  # action=0 commands the default pose
        stiffness=40.0,           # kp (scalar, per-joint list, or dict by joint name)
        damping=1.0,              # kd
    ),
}

# Direct torque output
actions = {
    "thrust": JointEffortActionCfg(
        actuator_names=("lift",),
    ),
}
stiffness and damping are mjswan-specific — in mjlab they live on the actuator, but the browser runtime computes PD externally for motor actuators with biastype=none, so we need them in the policy config. Both accept a scalar, a per-joint list (aligned with policy_joint_names), or a dict keyed by joint name.
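The three accepted gain forms and the external PD step can be sketched in plain Python. This is an illustration under stated assumptions rather than the runtime's actual code: expand_gain and pd_torques are hypothetical helpers, the target is taken to be default pose + scale × action (consistent with use_default_offset=True making action 0 command the default pose), and the torque uses the textbook PD law kp·(target − q) − kd·q̇.

```python
from typing import Dict, List, Sequence, Union

Gain = Union[float, Sequence[float], Dict[str, float]]


def expand_gain(gain: Gain, joint_names: Sequence[str]) -> List[float]:
    """Normalise a scalar, per-joint list, or name-keyed dict to one gain per joint."""
    if isinstance(gain, dict):
        return [float(gain[name]) for name in joint_names]
    if isinstance(gain, (int, float)):
        return [float(gain)] * len(joint_names)
    values = [float(g) for g in gain]
    if len(values) != len(joint_names):
        raise ValueError("per-joint gain list must align with policy_joint_names")
    return values


def pd_torques(action, q, qd, default_pos, scale, kp, kd) -> List[float]:
    """Assumed PD law: torque = kp * (target - q) - kd * qd, per joint."""
    torques = []
    for a, qi, qdi, q0, kpi, kdi in zip(action, q, qd, default_pos, kp, kd):
        target = q0 + scale * a  # action 0 commands the default pose
        torques.append(kpi * (target - qi) - kdi * qdi)
    return torques


names = ["hip", "knee"]  # illustrative joint names
kp = expand_gain(40.0, names)                       # scalar form
kd = expand_gain({"hip": 1.0, "knee": 2.0}, names)  # dict form
torques = pd_torques(
    action=[0.0, 1.0], q=[0.1, 0.8], qd=[0.0, 0.5],
    default_pos=[0.1, 0.8], scale=0.25, kp=kp, kd=kd,
)
print(torques)
```

In this example the hip sits at its default pose with zero action, so its torque is zero, while the knee is pushed 0.25 rad past its default and damped by its joint velocity.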
JointVelocityActionCfg, TendonLengthActionCfg, TendonVelocityActionCfg, TendonEffortActionCfg, and SiteEffortActionCfg are exported so mjlab configs import cleanly, but they raise NotImplementedError at build time — the browser runtime doesn't support them yet.
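When porting an existing mjlab config, it can help to flag unsupported action terms before the build fails. The following is a hypothetical pre-flight check, not part of mjswan: it matches on class names from the list above, and the two classes defined here are local stand-ins so the sketch runs without mjswan installed.

```python
from typing import Dict, List

# Names taken from the list of action terms the browser runtime does not support yet.
UNSUPPORTED_ACTION_TERMS = frozenset({
    "JointVelocityActionCfg",
    "TendonLengthActionCfg",
    "TendonVelocityActionCfg",
    "TendonEffortActionCfg",
    "SiteEffortActionCfg",
})


def unsupported_action_terms(actions: Dict[str, object]) -> List[str]:
    """Return the term names whose config class is on the unsupported list."""
    return [
        name
        for name, cfg in actions.items()
        if type(cfg).__name__ in UNSUPPORTED_ACTION_TERMS
    ]


# Local stand-ins for the real config classes (assumption: only the name matters here).
class JointPositionActionCfg:
    pass


class TendonLengthActionCfg:
    pass


actions = {
    "joint_pos": JointPositionActionCfg(),
    "tendon": TendonLengthActionCfg(),
}
flagged = unsupported_action_terms(actions)
print(flagged)  # ['tendon']
```

A name-based check will miss user-defined subclasses of the unsupported terms, so the build-time NotImplementedError remains the authoritative signal.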
Commands
commands is a dict of CommandTermConfig values keyed by the policy-visible name your observations reference (e.g. params={"command_name": "velocity"} looks up commands["velocity"]).
commands = {
    "velocity": mjswan.velocity_command(),  # standard 3-DoF locomotion
    "target": mjswan.ui_command([           # arbitrary UI inputs
        mjswan.Slider("target_height", "Target Height (m)", range=(0.3, 1.8), default=1.0),
    ]),
}
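Because observation terms refer to commands by name, a typo in command_name is easy to miss. The sketch below shows one way to cross-check the two dicts before building. It is a hypothetical helper working on plain dicts that stand in for the real config objects, so the field access would need adapting to the actual classes.

```python
from typing import Dict, List, Tuple


def missing_command_refs(
    obs_terms: Dict[str, dict], commands: Dict[str, object]
) -> List[Tuple[str, str]]:
    """List (term name, command name) pairs that point at an undefined command."""
    missing = []
    for term_name, term in obs_terms.items():
        command_name = term.get("params", {}).get("command_name")
        if command_name is not None and command_name not in commands:
            missing.append((term_name, command_name))
    return missing


# Plain-dict stand-ins for ObservationTermCfg entries (illustrative only).
obs_terms = {
    "velocity_cmd": {"func": "generated_commands", "params": {"command_name": "velocity"}},
    "height_cmd": {"func": "generated_commands", "params": {"command_name": "height"}},
    "base_ang_vel": {"func": "base_ang_vel"},
}
commands = {"velocity": object(), "target": object()}

problems = missing_command_refs(obs_terms, commands)
print(problems)  # [('height_cmd', 'height')]
```

Here height_cmd asks for a command named "height", but the commands dict only defines "velocity" and "target", so the check reports it.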
To adapt a custom mjlab command class to a browser-side TS class, use mjswan.register_command_term(mjlab_name, spec) — see the API reference.
Terminations
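Termination terms are configured like observation terms: each TerminationTermCfg takes a func sentinel and optional params. Unlike observations, terminations is a flat dict keyed by termination name, with no grouping layer: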
from mjswan.envs.mdp import terminations as term_fns
from mjswan.managers.termination_manager import TerminationTermCfg

terminations = {
    "time_out": TerminationTermCfg(func=term_fns.time_out, time_out=True),
    "fallen": TerminationTermCfg(
        func=term_fns.bad_orientation,
        params={"limit_angle": 1.0},
    ),
}
Supported termination sentinels
| Sentinel | Runtime class | Required params |
|---|---|---|
| time_out | TimeOut | none |
| bad_orientation | BadOrientation | limit_angle (radians) |
| root_height_below_minimum | RootHeightBelowMinimum | minimum_height (metres) |
Other mjlab termination sentinels are exported for config compatibility but raise NotImplementedError at build time. Register a custom one with register_termination_func.
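For intuition about what the three supported checks test, here is a pure-Python sketch of their conventional definitions. These are assumptions about the semantics, not the code of the TimeOut, BadOrientation, or RootHeightBelowMinimum runtime classes: the tilt is taken as the angle between the base-frame gravity direction and straight down, and max_steps is a hypothetical parameter.

```python
import math
from typing import Sequence


def time_out(step_count: int, max_steps: int) -> bool:
    """Episode length limit reached (max_steps is a hypothetical parameter)."""
    return step_count >= max_steps


def bad_orientation(projected_gravity: Sequence[float], limit_angle: float) -> bool:
    """Tilt from upright exceeds limit_angle (radians).

    projected_gravity is the unit gravity direction in the base frame;
    an upright base sees (0, 0, -1).
    """
    gz = max(-1.0, min(1.0, projected_gravity[2]))  # guard acos against rounding
    return math.acos(-gz) > limit_angle


def root_height_below_minimum(root_height: float, minimum_height: float) -> bool:
    """Root body has dropped below minimum_height (metres)."""
    return root_height < minimum_height


upright = bad_orientation((0.0, 0.0, -1.0), limit_angle=1.0)
on_its_side = bad_orientation((1.0, 0.0, 0.0), limit_angle=1.0)
collapsed = root_height_below_minimum(0.2, minimum_height=0.3)
print(upright, on_its_side, collapsed)  # False True True
```

With limit_angle=1.0 the episode ends once the base tilts more than about 57 degrees from upright, which a robot lying on its side (90 degrees) clearly exceeds.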
End-to-end examples
| Example | What it shows |
|---|---|
| examples/tutorial/minimum_policy.py | Smallest possible policy — a hand-built ONNX PD controller plus observations, actions, and a ui_command, all in one file. |
| examples/demo/gentle_humanoid/main.py | Realistic tracking policy with JointPositionActionCfg, motions attached via add_motion(...), and per-motion metadata. |
Legacy: passing a JSON file via config_path=
add_policy(config_path="policy.json") still accepts a hand-written JSON file. mjswan reads it, merges the Python-side commands / observations / actions / terminations into it, and writes the result alongside the ONNX. Prefer the Python kwargs above for new policies — the JSON form is mostly useful when importing a config that was already produced by another tool.
For the on-disk JSON schema, run git log -- docs/docs/notes/policy-config.md and read the version prior to this rewrite.
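The read, merge, and write flow of the legacy path can be pictured with a short sketch. Everything here is an assumption made for illustration: merge_policy_config is a hypothetical helper, the JSON keys and payloads are placeholders rather than the real schema, and letting Python-side entries override same-named JSON entries is a guess about precedence that this page does not confirm.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict

SECTIONS = ("commands", "observations", "actions", "terminations")


def merge_policy_config(
    config_path: Path, out_path: Path, **python_side: Dict[str, Any]
) -> Dict[str, Any]:
    """Read a JSON config, merge Python-side sections into it, and write the result."""
    config = json.loads(Path(config_path).read_text())
    for section in SECTIONS:
        merged = dict(config.get(section, {}))
        merged.update(python_side.get(section, {}))  # assumption: Python side wins
        if merged:
            config[section] = merged
    Path(out_path).write_text(json.dumps(config, indent=2))
    return config


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "policy.json"
    # Placeholder content; the real on-disk schema is not reproduced here.
    src.write_text(json.dumps({
        "policy_joint_names": ["hip"],
        "commands": {"velocity": {"source": "json"}},
    }))
    out = Path(tmp) / "merged.json"
    merged = merge_policy_config(src, out, commands={"target": {"source": "python"}})
    written = json.loads(out.read_text())

print(sorted(merged["commands"]))  # ['target', 'velocity']
```

Keys outside the four merged sections pass through untouched in this sketch, which matches the idea of importing a config produced by another tool and layering Python-side terms on top.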