Runbooks
Operator-guided playbooks that connect policy matches to safe manual actions using the existing billing tools.
Runbooks are static guidance. They do not execute actions automatically or persist shared state.
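One way to picture "static guidance" is as an immutable record that holds text for the operator and nothing executable. The sketch below is illustrative only: the `Runbook` class, its field names, and the `is_read_only` helper are hypothetical and are not part of the existing tooling.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Tuple


@dataclass(frozen=True)
class Runbook:
    """Hypothetical read-only runbook record: text for the operator, no actions."""
    title: str
    summary: str
    guidance: Tuple[str, ...]  # ordered guidance lines shown to the operator


validate_stale = Runbook(
    title="Validate stale reservations",
    summary="Use validation first when a small stale or inconsistent "
            "reservation set looks recoverable.",
    guidance=(
        "Review the selected reservation identifiers and ages.",
        "Escalate manually only if the refreshed state still looks unhealthy.",
    ),
)


def is_read_only(runbook: Runbook) -> bool:
    """Return True if an attempt to mutate the runbook is rejected."""
    try:
        runbook.title = "changed"  # frozen dataclasses refuse assignment
    except FrozenInstanceError:
        return True
    return False
```

Because the record is frozen and carries no callable steps, anything the operator actually does still has to go through the existing tools, where it is audited.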
Validate stale reservations
Use validation first when a small stale or inconsistent reservation set looks recoverable.
This runbook is for stale or inconsistent reservations where the safest first move is to validate backend state before escalating to retry or release.
- Review the selected reservation identifiers and ages.
- Confirm recent reconciliation state looks healthy before treating the match as actionable.
- Inspect the matching reservations and confirm the stale or inconsistent status is still current.
- Run the validate action using the existing reservation tooling.
- Refresh quota data and compare the refreshed state against the original policy match.
- Escalate manually only if the refreshed state still looks unhealthy.
- This is guidance only. It does not execute anything automatically.
- Durable operator traceability still comes from audit, not from runbook state.
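The compare-after-refresh step above can be sketched as a small pure function. This is a minimal illustration under stated assumptions: the status strings, the `Recommendation` enum, and `recommend_after_refresh` are hypothetical names, and the real reservation tooling may expose state differently. The function only returns a suggestion; it performs no action.

```python
from enum import Enum
from typing import Dict, List, Sequence, Tuple

# Hypothetical status labels treated as unhealthy for this runbook.
UNHEALTHY_STATUSES = frozenset({"stale", "inconsistent"})


class Recommendation(Enum):
    NO_FURTHER_ACTION = "no further action"
    ESCALATE_MANUALLY = "escalate manually"


def recommend_after_refresh(
    original_match: Sequence[str],
    refreshed_status: Dict[str, str],
) -> Tuple[Recommendation, List[str]]:
    """Compare refreshed state against the original policy match.

    original_match: reservation ids from the policy match.
    refreshed_status: id -> status after validate and quota refresh.
    Assumption: ids absent from the refreshed data are treated as cleared.
    """
    still_unhealthy = [
        rid for rid in original_match
        if refreshed_status.get(rid) in UNHEALTHY_STATUSES
    ]
    if still_unhealthy:
        return Recommendation.ESCALATE_MANUALLY, still_unhealthy
    return Recommendation.NO_FURTHER_ACTION, still_unhealthy


recommendation, remaining = recommend_after_refresh(
    ["r1", "r2"], {"r1": "active", "r2": "stale"}
)
print(recommendation.value, remaining)
```

Keeping the comparison free of side effects matches the guidance-only nature of the runbook: the operator decides whether to escalate.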
Long-running active retry review
Review long-running active reservations before approving a retry.
Use this when active reservations have remained open long enough to warrant an operator check before retrying them.
- Check whether the linked workload still appears to be progressing.
- Make sure the reservation set is small enough to handle safely with the existing action tooling.
- Inspect the affected reservations and confirm they are genuinely long-running rather than recently refreshed.
- Validate state first if anything looks ambiguous.
- Approve and run retry only if the allocation should still proceed.
- Refresh quota data and inspect the result after the action completes.
- Avoid repeated retries when state is unclear.
- Use release only when the hold is clearly stale and no longer needed.
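The review above amounts to a short decision order: keep the set small, validate when anything is unclear, retry only when the allocation should proceed, and release only when the hold is clearly stale. The sketch below encodes that order as a suggestion function. All names are hypothetical, and `MAX_SAFE_SET_SIZE` is an assumed placeholder because the source does not state what counts as "small enough".

```python
from enum import Enum
from typing import Optional, Sequence

MAX_SAFE_SET_SIZE = 10  # assumed placeholder, not a documented limit


class Suggestion(Enum):
    NARROW_SET = "narrow the reservation set before acting"
    VALIDATE_FIRST = "validate state first"
    RETRY = "approve and run retry"
    RELEASE = "approve and run release"


def suggest_action(
    reservation_ids: Sequence[str],
    allocation_should_proceed: Optional[bool],
    hold_clearly_stale: bool,
    max_set_size: int = MAX_SAFE_SET_SIZE,
) -> Suggestion:
    """Suggest the next manual step for long-running active reservations.

    allocation_should_proceed: True or False once the operator has checked
    the linked workload, or None while the state is still ambiguous.
    """
    if len(reservation_ids) > max_set_size:
        return Suggestion.NARROW_SET
    if allocation_should_proceed is None:
        # Ambiguous state: validate instead of retrying repeatedly.
        return Suggestion.VALIDATE_FIRST
    if allocation_should_proceed:
        return Suggestion.RETRY
    if hold_clearly_stale:
        return Suggestion.RELEASE
    # Not needed, but not clearly stale either: gather more evidence.
    return Suggestion.VALIDATE_FIRST
```

The function never returns a retry for an ambiguous state, which reflects the advice to avoid repeated retries when state is unclear.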
Expired reservation release review
Use explicit operator review before releasing expired reservations.
This runbook covers expired reservations where release is usually appropriate, but only after confirming that the hold should not remain active.
- Confirm the reservations are genuinely expired and not in the middle of a legitimate recovery path.
- Check recent audit activity for any operator intervention already in progress.
- Inspect the expired reservations and confirm the target set is correct.
- Validate current state if there is any doubt about whether the reservations are still recognised consistently.
- Approve and run release using the existing reservation tools.
- Refresh quota state and verify the expired hold has been cleared.
- This is a manual-first recovery path, not an automatic expiry handler.
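The two pre-release checks above (genuinely expired, and no operator intervention already in progress) can be sketched as a filter that splits the target set before the operator approves a release. This is a hedged illustration: `ExpiredHold`, `split_release_candidates`, and the one-hour `quiet_period` are hypothetical, and the shape of the audit data is assumed to be a mapping from reservation id to the time of its most recent audit entry.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class ExpiredHold:
    reservation_id: str
    expires_at: datetime


def split_release_candidates(
    holds: Sequence[ExpiredHold],
    last_audit_activity: Dict[str, datetime],
    now: datetime,
    quiet_period: timedelta = timedelta(hours=1),  # assumed window
) -> Tuple[List[str], List[str]]:
    """Return (candidates, held_back) reservation ids for operator review.

    A hold is a release candidate only if it has expired and has no audit
    activity within the quiet period. Everything else is held back.
    """
    candidates: List[str] = []
    held_back: List[str] = []
    for hold in holds:
        expired = hold.expires_at <= now
        last_touch = last_audit_activity.get(hold.reservation_id)
        recently_touched = (
            last_touch is not None and now - last_touch < quiet_period
        )
        if expired and not recently_touched:
            candidates.append(hold.reservation_id)
        else:
            held_back.append(hold.reservation_id)
    return candidates, held_back


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
holds = [
    ExpiredHold("a", now - timedelta(hours=2)),
    ExpiredHold("b", now - timedelta(hours=2)),
    ExpiredHold("c", now + timedelta(hours=1)),
]
audit = {"b": now - timedelta(minutes=10)}
print(split_release_candidates(holds, audit, now))
```

The split only narrows what the operator looks at. The release itself still goes through the existing reservation tools after explicit approval.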
Tenant pressure investigation
Investigate elevated tenant pressure before taking stronger action.
Use this suggest-only runbook when tenant-level signals from unhealthy reservations indicate rising pressure but there is no immediate automatic action to take.
- Review tenant quota signals, recent reconciliation runs, and exceptions together.
- Inspect quota health, stale counts, expired counts, and inconsistent signals for the tenant.
- Review recent reconciliation runs and exceptions for the same tenant.
- Open the tenant quota and reservation views to identify the highest-risk reservations.
- Use the existing reservation tools to validate or intervene on the specific reservations that justify action.
- This runbook is advisory. It helps the operator investigate but does not imply a single bulk action.
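As a rough aid for the inspection steps above, the sketch below tallies unhealthy reservations for one tenant and lists the oldest ones first. It is an assumption-laden illustration: `TenantReservation`, the status labels, and the use of age as a proxy for risk are all hypothetical, and the real quota and reservation views may rank risk differently.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Hypothetical labels matching the stale, expired, and inconsistent signals.
UNHEALTHY_STATUSES = ("stale", "expired", "inconsistent")


@dataclass(frozen=True)
class TenantReservation:
    reservation_id: str
    status: str
    age_minutes: int


def unhealthy_counts(reservations: Sequence[TenantReservation]) -> Dict[str, int]:
    """Count reservations per unhealthy status, including zero counts."""
    counts = Counter(
        r.status for r in reservations if r.status in UNHEALTHY_STATUSES
    )
    return {status: counts.get(status, 0) for status in UNHEALTHY_STATUSES}


def oldest_unhealthy(
    reservations: Sequence[TenantReservation], limit: int = 5
) -> List[TenantReservation]:
    """Return up to `limit` unhealthy reservations, oldest first.

    Assumption: age is used here as a simple stand-in for risk.
    """
    unhealthy = [r for r in reservations if r.status in UNHEALTHY_STATUSES]
    return sorted(unhealthy, key=lambda r: r.age_minutes, reverse=True)[:limit]


tenant_reservations = [
    TenantReservation("r1", "active", 5),
    TenantReservation("r2", "stale", 90),
    TenantReservation("r3", "expired", 240),
    TenantReservation("r4", "stale", 30),
]
print(unhealthy_counts(tenant_reservations))
print([r.reservation_id for r in oldest_unhealthy(tenant_reservations, limit=2)])
```

The output is a shortlist for investigation, not a bulk action list. Each reservation on it still needs its own validate or intervene decision.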