Trust & Policies

Waitroom's trust scoring and policy engine together form its permissions model for agents: they define what agents can do freely, what requires approval, and what is forbidden.

Trust Scoring

Every agent has a trust score per room. The score reflects how reliably the agent has behaved over time and is used by the policy engine for auto-approve decisions.

| Parameter | Value |
|---|---|
| initial_score | 15 |
| min_score | 0 |
| max_score | 100 |

Score Adjustments

An agent's trust score is adjusted each time one of its check-ins is resolved:

| Event | Weight | Effect |
|---|---|---|
| approval | +1.0 | Agent acted appropriately, trust increases |
| modification | +0.6 | Action was mostly correct but needed adjustment |
| rejection | -0.3 | Agent proposed something inappropriate |
| expiry | -0.1 | Check-in was abandoned or ignored |

Note: Trust scores are tracked per agent per room. An agent trusted in "content-approvals" doesn't automatically have the same trust in "vendor-payments". The counters total_check_ins, approved_count, rejected_count, modified_count, and expired_count are also tracked.
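
The adjustment table can be read as a simple update rule. The sketch below is illustrative only: it assumes the weight is added directly to the score and the result is clamped to the min_score/max_score range, which this page does not state explicitly. The RoomTrust class and record_decision function are hypothetical names, and incrementing total_check_ins at decision time is also an assumption.

```python
from dataclasses import dataclass

# Values taken from the parameter and adjustment tables above.
INITIAL_SCORE, MIN_SCORE, MAX_SCORE = 15.0, 0.0, 100.0
WEIGHTS = {"approval": 1.0, "modification": 0.6, "rejection": -0.3, "expiry": -0.1}

# Counter names from the note above, keyed by event.
COUNTERS = {
    "approval": "approved_count",
    "modification": "modified_count",
    "rejection": "rejected_count",
    "expiry": "expired_count",
}


@dataclass
class RoomTrust:
    """Hypothetical trust record for one agent in one room."""
    score: float = INITIAL_SCORE
    total_check_ins: int = 0
    approved_count: int = 0
    modified_count: int = 0
    rejected_count: int = 0
    expired_count: int = 0


def record_decision(trust: RoomTrust, event: str) -> RoomTrust:
    # ASSUMPTION: the weight is added to the score, then clamped to [0, 100].
    trust.score = max(MIN_SCORE, min(MAX_SCORE, trust.score + WEIGHTS[event]))
    # ASSUMPTION: total_check_ins is counted when the check-in is resolved.
    trust.total_check_ins += 1
    counter = COUNTERS[event]
    setattr(trust, counter, getattr(trust, counter) + 1)
    return trust
```

Under these assumptions, a new agent starting at 15 that receives one approval moves to 16, and a rejection at a score near zero is clamped to 0 rather than going negative.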

Policy Engine

When an agent creates a check-in, the policy engine evaluates the room's rules to determine the initial status. Rules are evaluated in strict priority order:

  1. Forbid rules — highest priority. If any forbid rule matches, the check-in is immediately rejected with a POLICY_FORBIDS error.
  2. Auto-approve rules — if a rule with auto_approve action matches, the check-in is approved instantly.
  3. Trust-based thresholds — if the agent's trust score meets the room's threshold for the check-in's risk level, it is auto-approved.
  4. Default action — fallback. Usually require_approval, meaning the check-in stays pending for human review.
Evaluation order:
Check-in arrives
  ├─ Forbid rules match?   ──▸ REJECTED (403 POLICY_FORBIDS)
  ├─ Auto-approve match?   ──▸ APPROVED (auto)
  ├─ Trust threshold met?  ──▸ APPROVED (trust)
  └─ Default action        ──▸ PENDING (require_approval)
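
The four-step order above can be sketched as a single function. This is not Waitroom's implementation: the evaluate function, the check-in field names, and the returned status strings are hypothetical, and rule matching is passed in as a callable so the sketch stays focused on ordering. It assumes trust thresholds apply only to the low and medium risk levels.

```python
from typing import Callable


def evaluate(policies: dict, check_in: dict, trust_score: float,
             matches: Callable[[dict, dict], bool]) -> str:
    """Return the initial status of a check-in (hypothetical status names)."""
    rules = policies.get("rules", [])

    # 1. Forbid rules win over everything else.
    if any(r["action"] == "forbid" and matches(r, check_in) for r in rules):
        return "rejected"

    # 2. Explicit auto-approve rules.
    if any(r["action"] == "auto_approve" and matches(r, check_in) for r in rules):
        return "approved"

    # 3. Trust thresholds, limited to low and medium risk.
    risk = check_in["risk_level"]
    if risk in ("low", "medium"):
        threshold = policies.get("trust_thresholds", {}).get(f"auto_approve_{risk}")
        if threshold is not None and trust_score >= threshold:
            return "approved"

    # 4. Fall back to the room's default action.
    default = policies.get("default_action", "require_approval")
    return {"auto_approve": "approved", "forbid": "rejected"}.get(default, "pending")


def by_risk(rule: dict, check_in: dict) -> bool:
    """Stand-in matcher for the example: checks only the risk_level condition."""
    levels = rule.get("conditions", {}).get("risk_level")
    return levels is None or check_in["risk_level"] in levels


policies = {
    "default_action": "require_approval",
    "trust_thresholds": {"auto_approve_low": 50, "auto_approve_medium": 80},
    "rules": [{"action": "forbid", "conditions": {"risk_level": ["critical"]}}],
}
```

With this policy, a low-risk check-in from an agent scoring 55 is approved by the trust threshold, while a medium-risk check-in from the same agent falls through to the default action and stays pending.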

Policy Actions

| Action | Behavior |
|---|---|
| auto_approve | Check-in is approved immediately without human review |
| require_approval | Check-in stays pending until a human decides |
| forbid | Check-in is rejected immediately; the agent cannot proceed |

Timeout Actions

| Action | Behavior |
|---|---|
| auto_approve | If no human responds within the timeout, approve automatically |
| cancel | If no human responds, expire the check-in (default) |
| hold | Keep the check-in pending indefinitely until someone decides |
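
To make the three timeout behaviors concrete, here is a small sketch of how a pending check-in might be resolved. The resolve_timeout function and the status strings are hypothetical, and it assumes the timeout clock starts when the check-in is created, which this page does not specify.

```python
from datetime import datetime, timedelta, timezone


def resolve_timeout(created_at: datetime, now: datetime,
                    timeout_minutes: int = 60,
                    timeout_action: str = "cancel") -> str:
    """Return the status of a check-in that no human has decided yet."""
    # ASSUMPTION: the timeout is measured from the check-in's creation time.
    if now - created_at < timedelta(minutes=timeout_minutes):
        return "pending"  # still inside the review window
    if timeout_action == "auto_approve":
        return "approved"
    if timeout_action == "cancel":
        return "expired"
    if timeout_action == "hold":
        return "pending"  # stays pending until someone decides
    raise ValueError(f"unknown timeout_action: {timeout_action!r}")


created = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
status = resolve_timeout(created, created + timedelta(minutes=90))
```

Here `status` is "expired", because the defaults used in the sketch are a 60-minute timeout with the cancel action.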

Policy Configuration

Policies are stored as JSONB on the rooms table. The full schema:

RoomPolicies schema
{
  "default_action":     "require_approval",  // auto_approve | require_approval | forbid
  "timeout_minutes":    60,                   // 1 - 10080 (7 days)
  "timeout_action":     "cancel",             // auto_approve | cancel | hold
  "rules": [
    {
      "action":     "forbid",
      "conditions": {                           // all listed conditions must match
        "risk_level":      ["critical"],        // match any risk level in array
        "action_type":     ["delete", "drop"],  // match any action keyword
        "agent_id":        ["agent_123"],       // match specific agents
        "min_trust_score": 80                   // agent's trust score must be >= 80
      },
      "reason":     "Critical actions are always blocked"
    }
  ],
  "trust_thresholds": {               // optional
    "auto_approve_low":    60,          // auto-approve low-risk if score >= 60
    "auto_approve_medium": 85           // auto-approve medium-risk if score >= 85
  }
}
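
The value ranges in the schema comments can be checked before a policy is saved. The following is a minimal sketch, not part of Waitroom: validate_policies is a hypothetical helper, it only checks fields that are present, and it assumes rule actions accept the same three values as default_action.

```python
POLICY_ACTIONS = {"auto_approve", "require_approval", "forbid"}
TIMEOUT_ACTIONS = {"auto_approve", "cancel", "hold"}
MAX_TIMEOUT_MINUTES = 10080  # 7 days, per the schema comment


def validate_policies(policies: dict) -> list:
    """Return a list of human-readable problems; empty means no issues found."""
    errors = []

    if "default_action" in policies and policies["default_action"] not in POLICY_ACTIONS:
        errors.append(f"default_action: unknown value {policies['default_action']!r}")

    minutes = policies.get("timeout_minutes")
    if minutes is not None:
        is_int = isinstance(minutes, int) and not isinstance(minutes, bool)
        if not (is_int and 1 <= minutes <= MAX_TIMEOUT_MINUTES):
            errors.append("timeout_minutes: must be an integer from 1 to 10080")

    if "timeout_action" in policies and policies["timeout_action"] not in TIMEOUT_ACTIONS:
        errors.append(f"timeout_action: unknown value {policies['timeout_action']!r}")

    for i, rule in enumerate(policies.get("rules", [])):
        # ASSUMPTION: rules use the same action names as default_action.
        if rule.get("action") not in POLICY_ACTIONS:
            errors.append(f"rules[{i}].action: unknown value {rule.get('action')!r}")
        score = rule.get("conditions", {}).get("min_trust_score")
        if score is not None and not 0 <= score <= 100:
            errors.append(f"rules[{i}].conditions.min_trust_score: must be 0-100")

    return errors
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in a policy at once.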

Rule Conditions

Each rule has a conditions object. All specified conditions must match (AND logic). Within an array condition (e.g. risk_level), any value can match (OR logic).

| Condition | Type | Description |
|---|---|---|
| risk_level | string[] | Match if check-in risk level is in this array |
| action_type | string[] | Match if check-in action contains any of these keywords |
| agent_id | string[] | Match if the check-in agent is in this list |
| min_trust_score | number | Match if agent's trust score is at or above this value (0-100) |
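
The AND/OR semantics described above can be sketched as one matching function. The conditions_match name and the check-in fields (risk_level, action, agent_id) are hypothetical. The sketch assumes action_type keywords are matched as case-insensitive substrings of the action text, and that an empty conditions object matches every check-in; neither detail is confirmed by this page.

```python
def conditions_match(conditions: dict, check_in: dict, trust_score: float) -> bool:
    """True only if every specified condition matches (AND across conditions)."""
    # OR within the array: the check-in's risk level must be one of the listed values.
    if "risk_level" in conditions and check_in["risk_level"] not in conditions["risk_level"]:
        return False

    # ASSUMPTION: a keyword matches if it appears anywhere in the action text.
    if "action_type" in conditions:
        action = check_in["action"].lower()
        if not any(keyword.lower() in action for keyword in conditions["action_type"]):
            return False

    if "agent_id" in conditions and check_in["agent_id"] not in conditions["agent_id"]:
        return False

    if "min_trust_score" in conditions and trust_score < conditions["min_trust_score"]:
        return False

    return True


check_in = {
    "risk_level": "low",
    "action": "Delete stale cache entries",
    "agent_id": "untrusted-bot-id",
}
destructive = {
    "agent_id": ["untrusted-bot-id"],
    "action_type": ["delete", "drop", "destroy"],
}
blocked = conditions_match(destructive, check_in, trust_score=15)
```

In this example `blocked` is True because both the agent and the action keyword match. Note that under the substring assumption a short keyword such as "get" would also match an action containing "target", so keyword lists deserve some care.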

Rule Examples

Forbid all critical actions

{
  "action": "forbid",
  "conditions": { "risk_level": ["critical"] },
  "reason": "Critical actions require manual execution"
}

Auto-approve low-risk from trusted agents

{
  "action": "auto_approve",
  "conditions": {
    "risk_level": ["low"],
    "min_trust_score": 70
  }
}

Auto-approve read-only actions

{
  "action": "auto_approve",
  "conditions": {
    "action_type": ["read", "list", "get", "view"]
  }
}

Forbid a specific agent from destructive ops

{
  "action": "forbid",
  "conditions": {
    "agent_id": ["untrusted-bot-id"],
    "action_type": ["delete", "drop", "destroy"]
  },
  "reason": "This agent is not authorized for destructive operations"
}

Trust-based auto-approve thresholds

{
  "default_action": "require_approval",
  "trust_thresholds": {
    "auto_approve_low": 50,    // agents with score >= 50 auto-approved for low risk
    "auto_approve_medium": 80  // agents with score >= 80 auto-approved for medium risk
  },
  "rules": [
    {
      "action": "forbid",
      "conditions": { "risk_level": ["critical"] }
    }
  ]
}

Best Practices

  1. Start strict, loosen gradually. Begin with require_approval as the default action. Add auto-approve rules only after agents have built trust.
  2. Always forbid destructive operations. Actions like "delete database", "drop table", or "revoke access" should have explicit forbid rules regardless of trust score.
  3. Use trust thresholds for routine work. Once an agent consistently gets approved for low-risk actions, set a trust threshold to auto-approve them — reducing human burden without sacrificing safety.
  4. Scope agents to rooms. Use room_scopes when registering agents to limit which rooms a key can access.
  5. Review audit logs regularly. The audit trail shows every decision, trust score change, and policy evaluation. Use it to tune policies over time.
Important: High-risk and critical check-ins are never auto-approved by trust thresholds; only low and medium risk levels have configurable auto-approve scores. Critical actions should always require explicit human approval or be forbidden entirely.