Reward#

The reward function in URBAN-SIM is modular: it is composed of multiple weighted terms. Each term corresponds to a specific behavior or outcome in the Markov Decision Process (MDP) and is defined as a RewTerm with an associated function, a weight, and optional parameters.

Taking COCO (a wheeled robot) as an example:

COCO Reward Binding#

When robot_name = "coco", the following class is loaded:

from urbansim.primitives.robot.coco import COCORewardCfg
rewards = COCORewardCfg()

The class COCORewardCfg defines the following reward terms:

@configclass
class COCORewardCfg:
    """Reward terms for the MDP."""

    arrived_reward = RewTerm(
        func=loc_mdp.is_terminated_term,
        weight=2000.0,
        params={"term_keys": "arrive"}
    )

    collision_penalty = RewTerm(
        func=loc_mdp.is_terminated_term,
        weight=-200.0,
        params={"term_keys": "collision"}
    )

    position_tracking = RewTerm(
        func=nav_mdp.position_command_error_tanh,
        weight=10.0,
        params={"std": 5.0, "command_name": "pose_command"}
    )

    position_tracking_fine = RewTerm(
        func=nav_mdp.position_command_error_tanh,
        weight=50.0,
        params={"std": 1.0, "command_name": "pose_command"}
    )

    moving_towards_goal = RewTerm(
        func=nav_mdp.moving_towards_goal_reward,
        weight=20.0,
        params={"command_name": "pose_command"}
    )

    target_vel_rew = RewTerm(
        func=nav_mdp.target_vel_reward,
        weight=10.0,
        params={"command_name": "pose_command"}
    )

Reward Term Descriptions#

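arrived_reward: Provides a large positive reward (weight 2000.0) when the episode terminates because the robot reached the goal (the "arrive" termination term).
collision_penalty: Applies a negative reward (weight -200.0) when the episode terminates due to a collision with objects or pedestrians (the "collision" termination term).
position_tracking: Rewards small positional error relative to the commanded pose, using a smooth tanh shaping with coarse tolerance (std=5.0). The weight is positive, so the term contributes more as the robot nears the target.
position_tracking_fine: Uses the same shaping with a tighter tolerance (std=1.0) and a larger weight (50.0), so it mainly takes effect close to the target and encourages precise final alignment.
moving_towards_goal: Encourages movement in the direction of the goal based on heading alignment.
target_vel_rew: Provides a dense reward for matching the commanded target velocity.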
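To make the difference between the coarse and fine tracking terms concrete, here is a minimal sketch. It assumes the shaping has the form 1 - tanh(distance / std); that form is an assumption for illustration and is not confirmed by this page. The helper names tanh_position_reward and shaped_terms are hypothetical; only the weights and std values come from the config above.

```python
import math

# Weights and std values copied from the COCO reward config above.
COARSE = {"weight": 10.0, "std": 5.0}
FINE = {"weight": 50.0, "std": 1.0}


def tanh_position_reward(distance: float, std: float) -> float:
    """Assumed shaping: 1 at the goal, decaying smoothly toward 0 with distance."""
    return 1.0 - math.tanh(distance / std)


def shaped_terms(distance: float) -> tuple[float, float]:
    """Return the weighted (coarse, fine) contributions at a given distance."""
    coarse = COARSE["weight"] * tanh_position_reward(distance, COARSE["std"])
    fine = FINE["weight"] * tanh_position_reward(distance, FINE["std"])
    return coarse, fine


# Far from the goal the coarse term dominates; near the goal the fine term does.
for d in (10.0, 5.0, 1.0, 0.0):
    coarse, fine = shaped_terms(d)
    print(f"distance={d:5.1f}  coarse={coarse:7.3f}  fine={fine:7.3f}")
```

Under this assumed form, the coarse term supplies a usable gradient far from the goal, where the fine term is nearly flat, while the fine term takes over in the last stretch.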

Each RewTerm is evaluated during environment stepping and combined (via weighted sum) to produce the total reward for each timestep.
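The weighted sum can be sketched in plain Python as follows. Term is a hypothetical stand-in for RewTerm, the state dictionary and the two term functions are invented for illustration, and the real manager likely operates on batched tensors across parallel environments and may apply additional scaling.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Term:
    """Hypothetical stand-in for RewTerm: a function, a weight, optional params."""
    func: Callable[..., float]
    weight: float
    params: dict = field(default_factory=dict)


def is_terminated(state: dict, term_keys: str) -> float:
    """Illustrative: 1.0 if the episode ended for the given reason, else 0.0."""
    return 1.0 if state.get("terminated_by") == term_keys else 0.0


def progress(state: dict) -> float:
    """Illustrative dense term read directly from the toy state."""
    return state["progress"]


def total_reward(terms: dict, state: dict) -> float:
    """Evaluate every term on the state and combine the results by weighted sum."""
    return sum(t.weight * t.func(state, **t.params) for t in terms.values())


terms = {
    "arrived_reward": Term(is_terminated, 2000.0, {"term_keys": "arrive"}),
    "collision_penalty": Term(is_terminated, -200.0, {"term_keys": "collision"}),
    "moving_towards_goal": Term(progress, 20.0),
}

# A step that ends in a collision while making some progress toward the goal.
step_state = {"terminated_by": "collision", "progress": 0.5}
print(total_reward(terms, step_state))
```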

Customizing Rewards#

To customize rewards:

Subclass the reward config class (e.g., COCORewardCfg) or override terms in YAML.
Adjust the weights or add new RewTerm entries.
Define custom reward functions in nav_mdp or loc_mdp.

You can toggle or ablate specific terms (e.g., disable collision_penalty) by setting their weight to 0 or removing them from the config.
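The two ablation options can be illustrated with a small stand-alone sketch. It does not use the URBAN-SIM config classes: Term, zero_weight, and remove_term are hypothetical names, and a plain dictionary stands in for the config object. Only the term names and weights are taken from the COCO config above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Term:
    """Hypothetical stand-in for RewTerm, reduced to the fields needed here."""
    func_name: str
    weight: float


# Names and weights copied from the COCO config above (subset).
BASE_TERMS = {
    "arrived_reward": Term("is_terminated_term", 2000.0),
    "collision_penalty": Term("is_terminated_term", -200.0),
    "position_tracking": Term("position_command_error_tanh", 10.0),
    "moving_towards_goal": Term("moving_towards_goal_reward", 20.0),
}


def zero_weight(terms: dict, name: str) -> dict:
    """Return a copy in which `name` stays registered but contributes nothing."""
    return {k: replace(t, weight=0.0) if k == name else t for k, t in terms.items()}


def remove_term(terms: dict, name: str) -> dict:
    """Return a copy of the term set without `name`."""
    return {k: t for k, t in terms.items() if k != name}


zeroed = zero_weight(BASE_TERMS, "collision_penalty")
removed = remove_term(BASE_TERMS, "collision_penalty")

print(zeroed["collision_penalty"].weight)
print(sorted(removed))
```

Both helpers return new dictionaries and leave BASE_TERMS untouched, which keeps the baseline available for comparison runs.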
