Frozen schemas
A few things in Humanoid Control are described as frozen. This page
explains which ones, why, and what to do when you want to change them.
What "frozen" means
A schema is frozen when changing it requires retraining (or re-trainable replacement of) any policy that depends on it. The moment a trained ONNX file has been generated against the schema, the schema's contract is binding for that file. Changing it silently produces wrong outputs.
The point of marking schemas as frozen is to make this contract explicit so a reviewer in six months knows "if I add a joint here, I have to retrain". It also tells the rest of the project where the risk lives — most of the codebase is fine to refactor; these specific surfaces are not.
What's frozen
1. Canonical joint order
The order of joints in humanoid_control_lite_controllers.yaml's joints: list
is canonical:
joints:
- left_shoulder_pitch
- left_shoulder_roll
- left_shoulder_yaw
- left_elbow_pitch
- left_wrist_yaw
- left_wrist_roll
- left_wrist_pitch
- right_shoulder_pitch
...
Every consumer that needs an index — the C++ MITState struct, the
in-process ObservationManager, the MITCommand message's parallel
arrays, the ONNX policy's input vector — refers to this order by
index. Reordering means every policy's input layer wires to the
wrong joint. (The order is transcoded from the ONNX custom_metadata_map
into the rl_policy_controller overlay by prepare, so the checkpoint
is the source of truth — see Architecture.)
When to extend: appending a new joint at the end (e.g. the neck) is
safe — old policies that consume only the first N entries still
work. Inserting or reordering is not safe. The Lite URDF supports
this directly via mode:=arms (14 joints, default) and
mode:=arms_neck (17 joints, requires neck silicon).
2. humanoid_control_msgs/MITCommand fields
The command schema the controllers write/consume — used internally by
the in-process RLPolicyController and on the wire by any System 1/2
source feeding RemotePolicyController:
std_msgs/Header header
string[] joint_names
float64[] position
float64[] velocity
float64[] effort
float64[] stiffness
float64[] damping
Five parallel arrays + names. The arrays are indexed in the same canonical joint order above. Adding a field breaks parsers; renaming a field breaks everything.
Off-ROS (Tier-3) clients don't hand-mirror this struct: humanoid_control_msgs_dds
generates the wire-compatible cyclonedds type from this .msg via
pixi run gen-dds, so the wire form follows the schema automatically. That
makes humanoid_control_msgs_dds one more frozen-schema consumer to regenerate on any
change (see the change drill).
3. The 5 MIT command interface names
position, velocity, effort, stiffness, damping
These names are not arbitrary — they match
mujoco_ros2_control::MujocoSystem's HW_IF_STIFFNESS = "stiffness"
/ HW_IF_DAMPING = "damping" constants exactly. Renaming
stiffness to e.g. kp would break the binding against the sim
plugin while leaving the silicon plugin happy — silent skew.
4. The ONNX custom_metadata_map schema (humanoid_control_policy)
humanoid_control_policy/policy_metadata.py parses 13 fields baked into every
trained .onnx:
joint_names, default_joint_position, action_scale,
stiffness, damping, observation_terms, body_names,
dataset_repo_id, policy_dt, ...
These are the deployment contract of the policy. If training forgets to bake one of them in, the runner refuses to load the checkpoint — there is no YAML fallback, by design. Adding a field requires both training-side and runner-side support, in lockstep.
5. SafetyStatus.flags bit positions
FLAG_BUS_OFF = 1 << 0
FLAG_RX_TIMEOUT = 1 << 1
FLAG_TX_QUEUE_OVERRUN = 1 << 2
FLAG_MOTOR_FAULT = 1 << 3
FLAG_TEMPERATURE_LIMIT = 1 << 4
FLAG_INVALID_FRAME = 1 << 5
The bit positions can't be renumbered without a recorded-bag compatibility break. Adding new bits is fine (use the next free position); removing them is not.
What's NOT frozen
To draw the contrast: most of the rest of the project is free to refactor.
- Per-joint default
K_p/K_dinhumanoid_control_lite_controllers.yamlare tuning numbers, not schema. Change at will. - CAN ids in the URDF are wiring facts; if the physical robot changes, the URDF changes.
calibration.yamlvalues are per-physical-robot; regenerating them is the normal case, not a schema change.- Internal C++ field names in the plugin / controller classes — refactor freely.
- The set of available controllers — adding a new mode-FSM member is a versioned change, not a freeze. (Removing one is more delicate: any policy that ran against the old controller may have been validated against its specific behavior.)
How to change a frozen schema
The drill, in order:
- Write down what depends on it. Every trained policy, every
subscriber, every rosbag from the past — and, for
humanoid_control_msgs, the generatedhumanoid_control_msgs_ddsmirror that every Tier-3 (non-rclpy) client consumes vialite_sdk2. - Stage the change — branch + commit + describe in the PR exactly which schemas move and why.
- Bump a version. For
humanoid_control_msgs, that's the message'spackage.xmlversion + a deprecation period (parallel old + new field, or a v2 message type, for at least one release). - Regenerate the Tier-3 mirror. Run
pixi run gen-ddsto re-emithumanoid_control_msgs_dds/_generated.pyfrom the new.msg, and commit it. CI re-runs the emitter (pixi run test-dds) plus a CDR wire round-trip test, failing on an uncommitted diff or a changed field layout. - Retrain every trained policy against the new schema. Document the (old → new) mapping for anyone whose checkpoint isn't being retrained.
- Land everything together. A half-landed schema change is the silent-skew failure mode — old policy loading against new message types and silently mis-indexing.
For the canonical joint order specifically, the easiest "safe"
change is appending at the end. Old policies that consume only
the first N entries continue to work; new policies that consume
N+1+ entries pick up the additions. The neck is the obvious example
— when its actuators land, indices 14–16 are reserved for
neck_yaw / neck_roll / neck_pitch.
See also
- Reference → Messages — the field-by-field schemas.
- Reference → Policy runner — the
ONNX
custom_metadata_mapreader. - The AGENTS.md "Custom interfaces" section in the project root states the freezing rule.