Trust & Policies

Waitroom's trust scoring and policy engine together form the permissions model for agents: they define what an agent can do freely, what requires approval, and what is forbidden.

Trust Scoring

Every agent has a trust score per room. The score reflects how reliably the agent has behaved over time and is used by the policy engine for auto-approve decisions.

Parameter	Value
initial_score	15
min_score	0
max_score	100

Score Adjustments

The trust score changes after every check-in outcome:

Event	Weight	Effect
approval	+1.0	Agent acted appropriately, trust increases
modification	+0.6	Action was mostly correct but needed adjustment
rejection	-0.3	Agent proposed something inappropriate
expiry	-0.1	Check-in was abandoned or ignored
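
The table does not say how a weight is applied to the score. As a rough sketch, assuming each weight is simply added to the current score and the result is clamped to the min_score and max_score bounds, the update could look like the following Python. The function name apply_decision and the additive rule are assumptions for illustration, not Waitroom's documented implementation.

```python
# Hypothetical sketch: assumes weights are added directly and the result is clamped.
INITIAL_SCORE = 15.0
MIN_SCORE = 0.0
MAX_SCORE = 100.0

# Weights taken from the table above.
EVENT_WEIGHTS = {
    "approval": 1.0,
    "modification": 0.6,
    "rejection": -0.3,
    "expiry": -0.1,
}


def apply_decision(score: float, event: str) -> float:
    """Return the new score after one check-in outcome (assumed additive)."""
    if event not in EVENT_WEIGHTS:
        raise ValueError(f"unknown event: {event}")
    updated = score + EVENT_WEIGHTS[event]
    # Keep the score inside the documented [min_score, max_score] range.
    return max(MIN_SCORE, min(MAX_SCORE, updated))


# A new agent that is approved, then modified, then rejected.
score = INITIAL_SCORE
for event in ["approval", "modification", "rejection"]:
    score = apply_decision(score, event)
```

Under this additive assumption, rejections and expiries cost less than an approval gains, so an agent with a mixed record still drifts upward over time.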

Note

Trust scores are tracked per agent per room. An agent trusted in "content-approvals" doesn't automatically have the same trust in "vendor-payments". The counters total_check_ins, approved_count, rejected_count, modified_count, and expired_count are also tracked.

Policy Engine

When an agent creates a check-in, the policy engine evaluates the room's rules to determine the initial status. Rules are evaluated in strict priority order:

1. Forbid rules: highest priority. If any forbid rule matches, the check-in is immediately rejected with a POLICY_FORBIDS error.
2. Auto-approve rules: if a rule with the auto_approve action matches, the check-in is approved instantly.
3. Trust-based thresholds: if the agent's trust score meets the room's threshold for the check-in's risk level (low or medium only), it is auto-approved.
4. Default action: the fallback. Usually require_approval, meaning the check-in stays pending for human review.

Evaluation order

Check-in arrives
  │
  ├─ Forbid rules match?   ──▸ REJECTED (403 POLICY_FORBIDS)
  │
  ├─ Auto-approve match?  ──▸ APPROVED (auto)
  │
  ├─ Trust threshold met? ──▸ APPROVED (trust)
  │
  └─ Default action       ──▸ PENDING (require_approval)
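
The priority order in the diagram can be sketched as a small Python function. It takes the results of the three checks as booleans so that only the ordering is shown. The function name decide, the lowercase status strings, and the second tuple element naming the deciding step are illustrative assumptions, not part of Waitroom's API.

```python
from typing import Tuple

# Assumed mapping from a room's default_action to an initial status.
DEFAULT_ACTION_STATUS = {
    "auto_approve": "approved",
    "require_approval": "pending",
    "forbid": "rejected",
}


def decide(
    forbid_matched: bool,
    auto_approve_matched: bool,
    trust_threshold_met: bool,
    default_action: str = "require_approval",
) -> Tuple[str, str]:
    """Return (status, decided_by) following the documented priority order."""
    if forbid_matched:
        # Forbid always wins; the agent would see 403 POLICY_FORBIDS.
        return ("rejected", "forbid_rule")
    if auto_approve_matched:
        return ("approved", "auto_approve_rule")
    if trust_threshold_met:
        return ("approved", "trust_threshold")
    # Nothing matched, so fall back to the room's default action.
    return (DEFAULT_ACTION_STATUS[default_action], "default_action")
```

Because the forbid check runs first, a check-in that matches both a forbid rule and an auto-approve rule is still rejected.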

Policy Actions

Action	Behavior
auto_approve	Check-in is approved immediately without human review
require_approval	Check-in stays pending until a human decides
forbid	Check-in is rejected immediately — agent cannot proceed

Timeout Actions

Action	Behavior
auto_approve	If no human responds within the timeout, approve automatically
cancel	If no human responds, expire the check-in (default)
hold	Keep the check-in pending indefinitely until someone decides
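
As an illustration of the three timeout actions, the sketch below resolves the status of a check-in that is still waiting for a human. The function name resolve_timeout, the status strings, and the choice to treat the deadline as inclusive are assumptions. The timeout_minutes parameter mirrors the field of the same name in the room policy.

```python
from datetime import datetime, timedelta, timezone

TIMEOUT_ACTIONS = {"auto_approve", "cancel", "hold"}


def resolve_timeout(
    created_at: datetime,
    now: datetime,
    timeout_minutes: int = 60,
    timeout_action: str = "cancel",
) -> str:
    """Return the status of an undecided check-in at time `now`."""
    if timeout_action not in TIMEOUT_ACTIONS:
        raise ValueError(f"unknown timeout_action: {timeout_action}")
    if timeout_action == "hold":
        # hold never times out; the check-in waits for a human decision.
        return "pending"
    deadline = created_at + timedelta(minutes=timeout_minutes)
    if now < deadline:
        return "pending"
    # Deadline reached: approve automatically or expire the check-in.
    return "approved" if timeout_action == "auto_approve" else "expired"


created = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
status = resolve_timeout(created, created + timedelta(minutes=90))
```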

Policy Configuration

Policies are stored as JSONB on the rooms table. The full schema:

RoomPolicies schema

{
  "default_action":     "require_approval",  // auto_approve | require_approval | forbid
  "timeout_minutes":    60,                   // 1 - 10080 (7 days)
  "timeout_action":     "cancel",             // auto_approve | cancel | hold
  "rules": [
    {
      "action":     "forbid",
      "conditions": {
        "risk_level":     ["critical"],        // match any risk level in array
        "action_type":    ["delete", "drop"],  // match any action keyword
        "agent_id":       ["agent_123"],       // match specific agents
        "min_trust_score": 80                  // match if agent's trust score is >= 80
      },
      "reason":     "Critical actions are always blocked"
    }
  ],
  "trust_thresholds": {               // optional
    "auto_approve_low":    60,          // auto-approve low-risk if score >= 60
    "auto_approve_medium": 85           // auto-approve medium-risk if score >= 85
  }
}
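
The value ranges in the schema comments lend themselves to a quick sanity check before a policy is saved. The following is a minimal sketch, not Waitroom's own validation: the function name validate_policies is hypothetical, it only checks fields that are present (the source does not say which fields are required), and it assumes rules accept the same three actions as default_action.

```python
ACTIONS = {"auto_approve", "require_approval", "forbid"}
TIMEOUT_ACTIONS = {"auto_approve", "cancel", "hold"}
THRESHOLD_KEYS = {"auto_approve_low", "auto_approve_medium"}
MAX_TIMEOUT_MINUTES = 10080  # 7 days


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_policies(policies: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    errors = []
    if "default_action" in policies and policies["default_action"] not in ACTIONS:
        errors.append("default_action is not a known policy action")
    if "timeout_minutes" in policies:
        minutes = policies["timeout_minutes"]
        if not _is_number(minutes) or not 1 <= minutes <= MAX_TIMEOUT_MINUTES:
            errors.append("timeout_minutes must be between 1 and 10080")
    if "timeout_action" in policies and policies["timeout_action"] not in TIMEOUT_ACTIONS:
        errors.append("timeout_action is not a known timeout action")
    for index, rule in enumerate(policies.get("rules", [])):
        if rule.get("action") not in ACTIONS:
            errors.append(f"rules[{index}].action is not a known policy action")
        if not isinstance(rule.get("conditions"), dict):
            errors.append(f"rules[{index}].conditions must be an object")
    for key, value in (policies.get("trust_thresholds") or {}).items():
        if key not in THRESHOLD_KEYS:
            errors.append(f"trust_thresholds.{key} is not a known threshold")
        elif not _is_number(value) or not 0 <= value <= 100:
            errors.append(f"trust_thresholds.{key} must be between 0 and 100")
    return errors


example = {
    "default_action": "require_approval",
    "timeout_minutes": 60,
    "timeout_action": "cancel",
    "rules": [{"action": "forbid", "conditions": {"risk_level": ["critical"]}}],
    "trust_thresholds": {"auto_approve_low": 60, "auto_approve_medium": 85},
}
problems = validate_policies(example)
```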

Rule Conditions

Each rule has a conditions object. All specified conditions must match (AND logic). Within an array condition (e.g. risk_level), any value can match (OR logic).

Condition	Type	Description
risk_level	string[]	Match if check-in risk level is in this array
action_type	string[]	Match if check-in action contains any of these keywords
agent_id	string[]	Match if the check-in agent is in this list
min_trust_score	number	Match if agent's trust score is at or above this value (0-100)
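
The matching semantics in the table (AND across conditions, OR within an array) can be sketched as follows. The check-in field names risk_level, action, and agent_id, the case-insensitive substring match for action_type, and the choice to let an empty conditions object match everything are assumptions made for illustration.

```python
def rule_matches(conditions: dict, checkin: dict, trust_score: float) -> bool:
    """Return True only if every specified condition matches the check-in."""
    if "risk_level" in conditions:
        if checkin["risk_level"] not in conditions["risk_level"]:
            return False
    if "action_type" in conditions:
        # Keyword match: the action text must contain at least one keyword.
        action = checkin["action"].lower()
        if not any(keyword.lower() in action for keyword in conditions["action_type"]):
            return False
    if "agent_id" in conditions:
        if checkin["agent_id"] not in conditions["agent_id"]:
            return False
    if "min_trust_score" in conditions:
        if trust_score < conditions["min_trust_score"]:
            return False
    # No specified condition failed, so the rule matches.
    return True


checkin = {
    "risk_level": "low",
    "action": "delete user record",
    "agent_id": "untrusted-bot-id",
}
destructive = {
    "agent_id": ["untrusted-bot-id"],
    "action_type": ["delete", "drop", "destroy"],
}
matched = rule_matches(destructive, checkin, trust_score=90)
```

Here matched is True because both the agent_id and action_type conditions hold, and the trust score is ignored since the rule does not specify min_trust_score.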

Rule Examples

Forbid all critical actions

{
  "action": "forbid",
  "conditions": { "risk_level": ["critical"] },
  "reason": "Critical actions require manual execution"
}

Auto-approve low-risk from trusted agents

{
  "action": "auto_approve",
  "conditions": {
    "risk_level": ["low"],
    "min_trust_score": 70
  }
}

Auto-approve read-only actions

{
  "action": "auto_approve",
  "conditions": {
    "action_type": ["read", "list", "get", "view"]
  }
}

Forbid a specific agent from destructive ops

{
  "action": "forbid",
  "conditions": {
    "agent_id": ["untrusted-bot-id"],
    "action_type": ["delete", "drop", "destroy"]
  },
  "reason": "This agent is not authorized for destructive operations"
}

Trust-based auto-approve thresholds

{
  "default_action": "require_approval",
  "trust_thresholds": {
    "auto_approve_low": 50,    // agents with score >= 50 auto-approved for low risk
    "auto_approve_medium": 80  // agents with score >= 80 auto-approved for medium risk
  },
  "rules": [
    {
      "action": "forbid",
      "conditions": { "risk_level": ["critical"] }
    }
  ]
}

Best Practices

Start strict, loosen gradually. Begin with require_approval as the default action. Add auto-approve rules only after agents have built trust.
Always forbid destructive operations. Actions like "delete database", "drop table", or "revoke access" should have explicit forbid rules regardless of trust score.
Use trust thresholds for routine work. Once an agent consistently gets approved for low-risk actions, set a trust threshold to auto-approve them — reducing human burden without sacrificing safety.
Scope agents to rooms. Use room_scopes when registering agents to limit which rooms a key can access.
Review audit logs regularly. The audit trail shows every decision, trust score change, and policy evaluation. Use it to tune policies over time.

Important

High-risk and critical check-ins are never auto-approved by trust thresholds — only low and medium risk levels have configurable auto-approve scores. Critical actions should always require explicit human approval or be forbidden entirely.
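
The restriction above can be expressed as a lookup that only knows about low and medium risk. This is a sketch under assumptions: the function name trust_threshold_met is hypothetical, and the risk level strings "high" and "critical" are assumed to be the values Waitroom uses.

```python
from typing import Optional

# Only these two risk levels have a configurable auto-approve score.
THRESHOLD_KEY_BY_RISK = {
    "low": "auto_approve_low",
    "medium": "auto_approve_medium",
}


def trust_threshold_met(
    risk_level: str, trust_score: float, thresholds: Optional[dict]
) -> bool:
    """Return True if the trust score qualifies for auto-approval."""
    key = THRESHOLD_KEY_BY_RISK.get(risk_level)
    if key is None or not thresholds or key not in thresholds:
        # High and critical risk, or no threshold configured: never auto-approve.
        return False
    return trust_score >= thresholds[key]


thresholds = {"auto_approve_low": 50, "auto_approve_medium": 80}
low_ok = trust_threshold_met("low", 50, thresholds)
high_ok = trust_threshold_met("high", 100, thresholds)
```

Even a perfect score of 100 does not qualify a high-risk check-in, so high_ok is False while low_ok is True.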