
Building a Policy Config

A policy attached to a scene needs more than just an ONNX file — the browser runtime has to know which observations to feed in, how to interpret the action, what commands to expose in the UI, and when to reset the episode. mjswan exposes all of that as Python kwargs on add_policy(), modelled after mjlab's config classes.

This page is the practical reference for those kwargs. For a runnable end-to-end policy built entirely in Python — including a hand-crafted ONNX graph — see examples/tutorial/minimum_policy.py.

Top level: add_policy(...)

scene.add_policy(
    name="Locomotion",
    policy=onnx.load("locomotion.onnx"),
    policy_joint_names=["FL_hip", "FL_thigh", "FL_calf", ...],
    default_joint_pos=[0.1, 0.8, -1.5, ...],
    observations={"policy": ObservationGroupCfg(terms={...})},
    actions={"joint_pos": JointPositionActionCfg(...)},
    commands={"velocity": mjswan.velocity_command()},
    terminations={"time_out": TerminationTermCfg(func=term_fns.time_out)},
)

The relevant kwargs (see the API reference for the full list):

| Kwarg | Purpose |
| --- | --- |
| policy_joint_names | Ordered list of joint names the policy controls. Required for browser-side actuator mapping. |
| default_joint_pos | Default pose, one entry per joint in policy_joint_names (same order). Used when use_default_offset=True on the action term and when an observation subtracts the default pose. |
| observations | dict[str, ObservationGroupCfg] keyed by ONNX input tensor name (e.g. "policy"). |
| actions | dict[str, ActionTermCfg] keyed by term name (e.g. "joint_pos"). |
| commands | dict[str, CommandTermConfig] keyed by policy-visible command name. |
| terminations | dict[str, TerminationTermCfg] keyed by termination name. |
| encoder_bias | Optional per-joint bias; the browser writes processed_action - encoder_bias to the actuators (mirrors mjlab). |
| extras | Arbitrary JSON payload merged verbatim into the generated policy config. |
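
As an illustration of the encoder_bias row, the subtraction can be sketched in plain Python. The helper name actuator_targets and the length check are assumptions made for this example; they are not part of mjswan's API, and the real subtraction happens in the browser runtime.

```python
from typing import Optional, Sequence


def actuator_targets(
    processed_action: Sequence[float],
    encoder_bias: Optional[Sequence[float]] = None,
) -> list[float]:
    """Return processed_action - encoder_bias, element by element.

    Hypothetical helper: mirrors the rule stated in the table, nothing more.
    """
    if encoder_bias is None:
        # No bias configured: the processed action is written unchanged.
        return [float(a) for a in processed_action]
    if len(encoder_bias) != len(processed_action):
        raise ValueError("encoder_bias needs one entry per controlled joint")
    return [float(a) - float(b) for a, b in zip(processed_action, encoder_bias)]


# Two joints, with a small bias on the first one only.
targets = actuator_targets([0.35, 0.8], encoder_bias=[0.05, 0.0])
unbiased = actuator_targets([0.35, 0.8])
```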

Observations

Each ONNX input tensor maps to one ObservationGroupCfg. A group holds an ordered dict of ObservationTermCfg terms; the runtime evaluates each term and concatenates the outputs in declaration order to form the input tensor.

from mjswan.envs.mdp import observations as obs_fns
from mjswan.managers.observation_manager import (
    ObservationGroupCfg,
    ObservationTermCfg,
)

obs = {
    "policy": ObservationGroupCfg(
        terms={
            "base_ang_vel": ObservationTermCfg(func=obs_fns.base_ang_vel),
            "projected_gravity": ObservationTermCfg(func=obs_fns.projected_gravity_isaac),
            "joint_pos": ObservationTermCfg(
                func=obs_fns.joint_positions_isaac, scale=1.0,
            ),
            "joint_vel": ObservationTermCfg(
                func=obs_fns.joint_vel_rel, scale=0.05,
            ),
            "last_action": ObservationTermCfg(func=obs_fns.previous_actions),
            "velocity_cmd": ObservationTermCfg(
                func=obs_fns.generated_commands,
                params={"command_name": "velocity"},
            ),
        },
    ),
}

ObservationTermCfg fields used at runtime: func (one of the built-in sentinels listed below, or a custom one registered via register_obs_func), params (forwarded to the browser-side class), scale, clip, and history_length. Other mjlab fields (noise, delay_*) are accepted for config compatibility but ignored, because no training happens in the browser.
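
To make the declaration-order concatenation concrete, here is a minimal sketch that uses plain dicts in place of ObservationTermCfg. The helper build_observation, the state dict, and the clip-then-scale order are assumptions for illustration and may not match the browser runtime; history_length is left out.

```python
from typing import Callable, Sequence


def build_observation(terms: dict, state: dict) -> list[float]:
    """Evaluate each term and concatenate the results in declaration order.

    Python dicts preserve insertion order, so the order in which terms are
    declared is the order in which they appear in the output vector.
    """
    out: list[float] = []
    for term in terms.values():
        func: Callable[[dict], Sequence[float]] = term["func"]
        values = [float(v) for v in func(state)]
        clip = term.get("clip")
        if clip is not None:
            low, high = clip
            # Assumed order: clip first, then scale.
            values = [min(max(v, low), high) for v in values]
        scale = term.get("scale")
        if scale is not None:
            values = [v * scale for v in values]
        out.extend(values)
    return out


# Hypothetical terms standing in for ObservationTermCfg entries.
terms = {
    "base_ang_vel": {"func": lambda s: s["ang_vel"]},
    "joint_vel": {"func": lambda s: s["qvel"], "scale": 0.05, "clip": (-20.0, 20.0)},
}
state = {"ang_vel": [0.1, 0.0, -0.2], "qvel": [10.0, -40.0]}
obs = build_observation(terms, state)
```

Swapping the two entries in terms would swap the two slices of obs, which is why the declaration order has to match the layout the ONNX model was trained with.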

Built-in observation sentinels

Defined in mjswan.envs.mdp.observations:

| Sentinel | Runtime class | Notes |
| --- | --- | --- |
| base_lin_vel | BaseLinearVelocity | Linear velocity of the base, base frame. |
| base_ang_vel | BaseAngularVelocity | Angular velocity of the base, base frame. |
| projected_gravity | ProjectedGravityB | Gravity in the base frame (legacy implementation). |
| projected_gravity_isaac | ProjectedGravity | Isaac-compatible; defaults to joint_name="floating_base_joint". |
| joint_pos_rel | JointPos | Joint positions − default pose. |
| joint_vel_rel | JointVelocities | Joint velocities − default velocities. |
| joint_positions_isaac | JointPositions | Isaac joint ordering, default-subtracted. |
| last_action | PrevActions | Most recent action tensor. |
| previous_actions | PreviousActions | Isaac-compatible most-recent action tensor. |
| generated_commands | GeneratedCommands | Requires params={"command_name": "<name>"}. |
| velocity_command_with_oscillators | VelocityCommandWithOscillators | 16-dim velocity command + oscillator signals. |
| impedance_command | ImpedanceCommand | Impedance command tensor. |
| joint_pos_cos_sin | JointPosCosSin | [cos(q), sin(q)] for one joint. |
| motion_anchor_pos_b | MotionAnchorPosB | Tracking: anchor position in the robot anchor frame. |
| motion_anchor_ori_b | MotionAnchorOriB | Tracking: anchor orientation in the robot anchor frame. |
| robot_body_pos_b | RobotBodyPosB | Tracking: robot body positions in the robot anchor frame. |
| robot_body_ori_b | RobotBodyOriB | Tracking: robot body orientations in the robot anchor frame. |
| builtin_sensor | BuiltinSensor | Raw data from a named MuJoCo sensor. |

height_scan is exported for mjlab compatibility but raises NotImplementedError at build time (the browser has no ray-cast sensor).

For a custom observation backed by your own TypeScript class, see register_obs_func in the API reference.

Actions

Each value in actions is an instance of an ActionTermCfg subclass. Two subclasses are supported in the browser:

from mjswan.envs.mdp.actions import (
    JointPositionActionCfg,
    JointEffortActionCfg,
)

# Joint-position control with external PD
actions = {
    "joint_pos": JointPositionActionCfg(
        actuator_names=(".*",),
        scale=0.25,
        offset=0.0,
        use_default_offset=True,     # action=0 commands the default pose
        stiffness=40.0,              # kp (scalar, per-joint list, or dict by joint name)
        damping=1.0,                 # kd
    ),
}

# Direct torque output
actions = {
    "thrust": JointEffortActionCfg(
        actuator_names=("lift",),
    ),
}

stiffness and damping are mjswan-specific — in mjlab they live on the actuator, but the browser runtime computes PD externally for motor actuators with biastype=none, so we need them in the policy config. Both accept a scalar, a per-joint list (aligned with policy_joint_names), or a dict keyed by joint name.
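
The three accepted gain forms and the external PD step can be sketched as follows. This is an illustration under stated assumptions, not mjswan's implementation: expand_gain and pd_torques are hypothetical helpers, dict keys are treated as exact joint names, and the PD law uses a zero target velocity.

```python
from typing import Sequence, Union

Gain = Union[float, Sequence[float], dict]


def expand_gain(gain: Gain, joint_names: Sequence[str]) -> list[float]:
    """Expand a scalar, per-joint list, or name-keyed dict to one gain per joint."""
    if isinstance(gain, dict):
        return [float(gain[name]) for name in joint_names]
    if isinstance(gain, (list, tuple)):
        if len(gain) != len(joint_names):
            raise ValueError("per-joint gains must align with policy_joint_names")
        return [float(g) for g in gain]
    return [float(gain)] * len(joint_names)


def pd_torques(action, q, qd, default_pos, scale, kp, kd, use_default_offset=True):
    """tau = kp * (target - q) - kd * qd, with target = offset + scale * action."""
    torques = []
    for i, a in enumerate(action):
        offset = default_pos[i] if use_default_offset else 0.0
        target = offset + scale * a
        torques.append(kp[i] * (target - q[i]) - kd[i] * qd[i])
    return torques


joint_names = ["FL_hip", "FL_thigh", "FL_calf"]
default_pos = [0.1, 0.8, -1.5]
kp = expand_gain(40.0, joint_names)                                    # scalar form
kd = expand_gain({"FL_hip": 1.0, "FL_thigh": 1.0, "FL_calf": 2.0}, joint_names)  # dict form

# A zero action at the default pose with zero velocity yields zero torque.
rest = pd_torques([0.0, 0.0, 0.0], default_pos, [0.0] * 3, default_pos, 0.25, kp, kd)
# A unit action on the first joint moves its target by scale = 0.25 rad.
push = pd_torques([1.0, 0.0, 0.0], default_pos, [0.0] * 3, default_pos, 0.25, kp, kd)
```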

JointVelocityActionCfg, TendonLengthActionCfg, TendonVelocityActionCfg, TendonEffortActionCfg, and SiteEffortActionCfg are exported so mjlab configs import cleanly, but they raise NotImplementedError at build time — the browser runtime doesn't support them yet.

Commands

commands is a dict of CommandTermConfig values keyed by the policy-visible name your observations reference (e.g. params={"command_name": "velocity"} looks up commands["velocity"]).

commands = {
    "velocity": mjswan.velocity_command(),           # standard 3-DoF locomotion
    "target": mjswan.ui_command([                    # arbitrary UI inputs
        mjswan.Slider("target_height", "Target Height (m)", range=(0.3, 1.8), default=1.0),
    ]),
}
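
Because observation terms refer to commands by name, a mismatch between the two dicts is an easy mistake to make. The check below is a hypothetical pre-flight helper (missing_command_names is not an mjswan function) that works on plain dicts standing in for the config objects.

```python
def missing_command_names(observations: dict, commands: dict) -> list[tuple]:
    """List (group, term, command_name) triples whose command is not defined."""
    missing = []
    for group_name, terms in observations.items():
        for term_name, term in terms.items():
            name = term.get("params", {}).get("command_name")
            if name is not None and name not in commands:
                missing.append((group_name, term_name, name))
    return missing


# Plain-dict stand-ins: one term references "velocity", another "target".
example_observations = {
    "policy": {
        "base_ang_vel": {},
        "velocity_cmd": {"params": {"command_name": "velocity"}},
        "height_cmd": {"params": {"command_name": "target"}},
    },
}
example_commands = {"velocity": object()}

problems = missing_command_names(example_observations, example_commands)
```

Here problems contains one entry for height_cmd, since no command named "target" was supplied.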

To adapt a custom mjlab command class to a browser-side TS class, use mjswan.register_command_term(mjlab_name, spec) — see the API reference.

Terminations

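Terminations follow the same pattern as observation terms: a dict of TerminationTermCfg values keyed by termination name, each wrapping a sentinel func plus optional params.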

from mjswan.envs.mdp import terminations as term_fns
from mjswan.managers.termination_manager import TerminationTermCfg

terminations = {
    "time_out": TerminationTermCfg(func=term_fns.time_out, time_out=True),
    "fallen":   TerminationTermCfg(
        func=term_fns.bad_orientation,
        params={"limit_angle": 1.0},
    ),
}

Supported termination sentinels

| Sentinel | Runtime class | Required params |
| --- | --- | --- |
| time_out | TimeOut | none |
| bad_orientation | BadOrientation | limit_angle (radians) |
| root_height_below_minimum | RootHeightBelowMinimum | minimum_height (metres) |

Other mjlab termination sentinels are exported for config compatibility but raise NotImplementedError at build time. Register a custom one with register_termination_func.
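
For intuition about what the three supported checks test, here is a rough Python sketch. The function signatures are hypothetical, and the formulas follow the usual definitions of these terms (tilt angle from the gravity vector expressed in the base frame, root height against a threshold); the browser-side classes may differ in detail.

```python
import math


def time_out(step: int, max_steps: int) -> bool:
    """Episode length reached (max_steps is a hypothetical argument)."""
    return step >= max_steps


def bad_orientation(projected_gravity, limit_angle: float) -> bool:
    """True when the base tilts more than limit_angle radians from upright.

    projected_gravity is the unit gravity vector in the base frame;
    upright corresponds to (0, 0, -1).
    """
    gz = max(-1.0, min(1.0, projected_gravity[2]))  # guard acos domain
    return math.acos(-gz) > limit_angle


def root_height_below_minimum(root_z: float, minimum_height: float) -> bool:
    """True when the root body drops below minimum_height metres."""
    return root_z < minimum_height


upright = bad_orientation((0.0, 0.0, -1.0), limit_angle=1.0)
on_its_side = bad_orientation((1.0, 0.0, 0.0), limit_angle=1.0)
```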

End-to-end examples

| Example | What it shows |
| --- | --- |
| examples/tutorial/minimum_policy.py | Smallest possible policy: a hand-built ONNX PD controller plus observations, actions, and a ui_command, all in one file. |
| examples/demo/gentle_humanoid/main.py | Realistic tracking policy with JointPositionActionCfg, motions attached via add_motion(...), and per-motion metadata. |

Legacy: passing a JSON file via config_path=

add_policy(config_path="policy.json") still accepts a hand-written JSON file. mjswan reads it, merges the Python-side commands / observations / actions / terminations into it, and writes the result alongside the ONNX. Prefer the Python kwargs above for new policies — the JSON form is mostly useful when importing a config that was already produced by another tool.
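
One way to picture the merge is the sketch below. It is only an assumption about the behaviour: the helper merge_policy_config, the per-key merge, and the rule that Python-side entries win on a name collision are all hypothetical, so check the generated file if precedence matters to you.

```python
import json

SECTIONS = ("commands", "observations", "actions", "terminations")


def merge_policy_config(json_text: str, python_side: dict) -> dict:
    """Merge Python-side sections into a hand-written JSON config.

    Assumed rule: entries are merged per key inside each section, and the
    Python-side value replaces a JSON value with the same key.
    """
    config = json.loads(json_text)
    for section in SECTIONS:
        if section in python_side:
            merged = dict(config.get(section, {}))
            merged.update(python_side[section])
            config[section] = merged
    return config


# Hypothetical JSON file content and Python-side kwargs.
json_text = '{"policy_joint_names": ["a"], "commands": {"velocity": {"source": "json"}}}'
python_side = {
    "commands": {"velocity": {"source": "python"}, "target": {"source": "python"}},
}
merged_config = merge_policy_config(json_text, python_side)
```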

For the on-disk JSON schema, run git log -- docs/docs/notes/policy-config.md and read the version prior to this rewrite.