1) Diagnostic Identity
Diagnostic Name: Goodhart Risk
Short Name / Symbol: Goodhart_risk
Diagnostic Class: Proxy Failure / Metric Capture / Optimization Risk / Φ–O Divergence / Regime Diagnostic
Primary Function: Estimate the risk that a proxy, metric, target, reward signal, score, KPI, benchmark, classification, narrative, or optimization objective will detach from the real coherence it was meant to represent.
Primary Use: Determine whether the system is optimizing the measurement of success rather than the actual condition, function, repair, or coherence the measurement was intended to track.
Core Risk if Ignored: The system may improve metrics while degrading real coherence, creating pseudo-success, hidden debt, affected-node burden, gaming, legitimacy shock, and eventual collapse of trust in the metric system.
Core Risk if Overtrusted: Metrics, benchmarks, summaries, standards, and quantitative proxies may be rejected too quickly, even when they remain useful, reality-linked, auditable, and appropriately bounded.
2) Mechanical Definition
Goodhart_risk measures the likelihood that a proxy becomes a target and then stops faithfully representing the underlying coherence condition it was meant to measure.
Goodhart_risk answers:
Are we optimizing the sign of success instead of the thing success was supposed to mean?
A proxy can be:
metric
benchmark
score
KPI
ranking
label
classification
dashboard
compliance indicator
engagement number
safety score
repair-complete status
legitimacy narrative
audit result
performance target
Goodhart risk rises when a system begins optimizing the proxy directly.
The core pattern:
proxy chosen to represent coherence
→ proxy becomes target
→ behavior adapts to improve proxy
→ proxy detaches from coherence
→ Φ rises while O falls
In UTS terms:
Φ↑ while O↓ or H↑ ⇒ Goodhart_risk ↑
Goodhart risk is not “metrics are bad.”
It means the metric must stay subordinate to reality contact, affected-node validation, auditability, repair outcomes, and coherence indicators.
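As a minimal sketch, the rule Φ↑ while O↓ or H↑ can be written as a check on changes between two observation windows. The `Delta` fields, the function name, and the `tol` noise tolerance are illustrative assumptions; the spec gives no numeric thresholds, so real values would need domain calibration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delta:
    """Change in each variable between two observation windows (hypothetical units)."""
    phi: float  # change in the proxy Φ
    o: float    # change in the coherence indicator O
    h: float    # change in hidden debt H


def goodhart_pattern(d: Delta, tol: float = 0.0) -> bool:
    """Return True when Φ rises while O falls or H rises.

    `tol` is an assumed noise tolerance, not a value from the spec.
    """
    proxy_up = d.phi > tol
    coherence_down = d.o < -tol
    debt_up = d.h > tol
    return proxy_up and (coherence_down or debt_up)
```

A True result only flags the pattern; it says nothing about whether the metric should be abandoned.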
3) What the Diagnostic Measures
Direct Measurement Target
Goodhart_risk measures:
- proxy-to-reality detachment
- metric target capture
- optimization pressure around Φ
- proxy manipulation
- proxy overuse
- metric authority inflation
- dashboard blindness
- benchmark overfitting
- compliance theater
- repair theater
- safety theater
- legitimacy theater
- selected metric narrowing
- gaming incentive
- displacement of real O by measurable Φ
- whether success signals still track coherence
- whether proxy improvement creates hidden debt
Indirect / Proxy Signals
Goodhart_risk can be estimated from:
- metrics improving while affected-node cost rises
- benchmark improvement without real-world improvement
- repair-complete status while recurrence continues
- compliance increase without boundary recovery
- safety scores improving while stress failures persist
- response-time metrics improving while quality falls
- outputs increasing while meaning collapses
- dashboard health while users report harm
- teams optimizing what is counted
- work shifting toward visible metrics
- hidden labor increasing to maintain score
- local adaptation being suppressed by standardized targets
- metric definitions changing to preserve success
- proxy performance improving under low-stress conditions only
- narrative becoming tied to metric defense
- dissenting evidence dismissed because “the numbers are good”
What It Does Not Measure
Goodhart_risk does not directly measure:
- whether the metric is useless
- whether measurement is incoherent
- whether optimization is always bad
- whether qualitative evidence is always superior
- whether the system should stop tracking performance
- whether all proxy improvement is fake
- whether affected-node reports are automatically complete
- whether metrics should never become targets
- whether success cannot be measured
- whether all standards create distortion
High Goodhart_risk means the proxy is likely becoming detached from the reality it represents.
It does not mean the metric should automatically be abandoned.
Low Goodhart_risk means the proxy remains reality-linked enough for its intended use.
It does not mean the metric can replace direct observation or repair validation.
4) Canonical State Variables Involved
Canonical state vector:
S = {O, H, ε, ι, Au, µᵢ, BΣ, K, R, Φ}
Primary Variables
- Φ: Goodhart risk centers on fitness proxy distortion
- O: the real coherence condition the proxy is supposed to track
- H: hidden debt rises when proxy success hides real degradation
- ι: inversion risk rises when pseudo-success is mistaken for coherence
- Au: auditability is needed to test whether the proxy still maps to reality
- R: restoration may be optimized as status rather than actual recovery
Secondary Variables
- ε: visible errors may drop because they are hidden, reclassified, or displaced
- µᵢ: integrity declines when claim, metric, action, and consequence diverge
- BΣ: boundary costs can be hidden by proxy success
- K: compatibility may be claimed from shared metrics while real coupling degrades
Variables Commonly Confused With Goodhart_risk
| Variable / Diagnostic | Difference from Goodhart_risk |
|---|---|
| Φ − O | Actual proxy-coherence divergence; Goodhart_risk estimates risk of proxy capture and future divergence |
| narrative_metric_gap | Story/evidence divergence; Goodhart risk focuses on proxy optimization detaching from O |
| stress_divergence | Baseline/stress gap; Goodhart risk often appears when metrics fail under stress |
| pseudo_damping_risk | False settling; Goodhart risk may create pseudo-damping through metric recovery |
| affected_node_cost | Local burden; Goodhart risk often hides affected-node cost |
| FI_integrity | Feedback can correct the system; weak FI lets Goodhart drift persist |
| selection_traceability | Trace of why a metric/target was selected; needed to audit Goodhart risk |
| Metric use | Metrics can be healthy; risk arises when the metric becomes detached or over-authoritative |
5) Localization Signature
Primary Legibility Layers
- U4 — Classification / Metrics / Narratives: primary layer where proxies, scores, classifications, dashboards, and success stories form
- U3 — Execution: where behavior changes to optimize the metric
- U5 — Coordination / Time: where incentives, reporting cadence, targets, and review cycles shape optimization
- U6 — Coherence Field: where proxy success either supports or distorts real coherence
- U7 — Memory / Recurrence: where metric success becomes durable memory, precedent, or canon
- U8 — Environment / Forcing: where stress reveals whether proxy success generalizes
Primary Leverage Layers
- U4: recalibrate metric meaning, scope, and classification boundaries
- U3: inspect behavior induced by the metric
- U5: change reporting cadence, targets, incentives, and review loops
- U6: verify coherence field effects beyond proxy performance
- U7: correct metric-derived memory and success claims
- U2: constrain harmful optimization incentives
Verification Layers
- U4: does the metric still mean what it claims?
- U3: what behavior is the metric causing?
- U5: does cadence or target pressure distort reality?
- U6: does O improve with Φ?
- U7: does recurrence validate metric success?
- U8: does metric success survive stress?
Common Mislocalizations
- Treating metric improvement as coherence improvement
- Treating compliance as repair
- Treating benchmark success as real-world safety
- Treating dashboard health as affected-node recovery
- Treating low reported error as low harm
- Treating high output as high value
- Treating fast response as good response
- Treating proxy failure as data failure only
- Treating affected-node signal as anecdotal against metrics
- Treating metric criticism as anti-accountability
- Treating quantitative precision as truth
- Treating standardization as coherence
6) Input Requirements
Required Inputs
To estimate Goodhart_risk, the system needs:
- proxy, metric, benchmark, label, or target being evaluated
- intended real-world referent
- current Φ behavior
- current O indicators
- affected variables in S
- optimization pressure around the proxy
- how the proxy influences behavior
- affected-node feedback
- hidden debt indicators
- stress behavior
- recurrence data
- metric lineage
- selection rationale for the proxy
- feedback pathways that can challenge the proxy
- whether the proxy is used for consequence, reward, status, or closure
Optional Inputs
These improve precision:
- metric history
- benchmark design
- gaming evidence
- incentive map
- dashboard data
- field outcomes
- audit reports
- affected-node cost data
- narrative_metric_gap
- stress-test results
- edge-case performance
- public/private metric comparison
- false positive / false negative analysis
- alternative metric set
- proxy retirement criteria
- metric revision history
- metric-to-repair linkage
- external validation
- recurrence after metric success
Missing Input Behavior
If Goodhart_risk inputs are missing:
- If O indicators are missing, do not infer coherence from Φ
- If affected-node feedback is missing, proxy success is under-validated
- If metric lineage is missing, proxy meaning may be stale
- If optimization pressure is unknown, Goodhart risk may be underestimated
- If stress data is missing, proxy success is baseline-only
- If hidden debt indicators are missing, metric success may hide H
- If FI is weak, the proxy may be unfalsifiable
- If consequence use is unknown, metric authority may be underestimated
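The missing-input rules above can be sketched as a lookup from absent input to caveat. The dictionary keys are hypothetical input names chosen for this example; the caveat strings are taken from the list. The weak-FI rule is left out because it describes a weak input rather than a missing one.

```python
# Caveats copied from the missing-input rules above.
# The keys are hypothetical names for the required inputs.
MISSING_INPUT_CAVEATS = {
    "o_indicators": "do not infer coherence from Φ",
    "affected_node_feedback": "proxy success is under-validated",
    "metric_lineage": "proxy meaning may be stale",
    "optimization_pressure": "Goodhart risk may be underestimated",
    "stress_data": "proxy success is baseline-only",
    "hidden_debt_indicators": "metric success may hide H",
    "consequence_use": "metric authority may be underestimated",
}


def caveats_for(available: set[str]) -> list[str]:
    """List the caveat for every tracked input not present in `available`."""
    return [
        caveat
        for name, caveat in MISSING_INPUT_CAVEATS.items()
        if name not in available
    ]
```

For example, `caveats_for({"o_indicators"})` returns the six caveats for the inputs that are still missing.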
Default missing-input posture:
treat proxy success as provisional → compare Φ to O/H/affected-node state → stress-test and audit incentive effects
7) Diagnostic States / Ranges
These ranges are qualitative and should be domain-calibrated.
Healthy / Coherence-Supporting Range
Proxy remains useful, bounded, audited, and reality-linked.
Signals:
- Φ tracks O reasonably well
- metric scope is explicit
- affected-node feedback supports metric interpretation
- hidden debt does not rise under metric success
- stress tests validate proxy meaning
- feedback can challenge the metric
- metric does not dominate all selection
- incentives do not encourage gaming
- recurrence declines when metric improves
- U7 memory preserves proxy limits
Recommended posture:
continue metric use
preserve scope notes
monitor Φ−O
audit incentives
validate through recurrence and stress
Watch Range
Proxy is still useful but beginning to gain too much authority or lose context.
Signals:
- metric becomes central in decisions
- teams begin optimizing the number
- affected-node feedback is mixed
- metric improvement outpaces qualitative improvement
- hidden debt is uncertain
- stress behavior is not fully tested
- narrative depends heavily on metric success
- alternative evidence is underweighted
- proxy scope is often forgotten
Recommended posture:
restate metric scope
add O and affected-node checks
review incentives
reduce metric monoculture
avoid metric-only closure
Degraded Range
Proxy is detaching from real coherence and shaping behavior toward metric success.
Signals:
- Φ rises while O stagnates or falls
- affected-node cost rises under metric improvement
- recurrence continues after metric success
- hidden debt accumulates
- gaming or metric optimization appears
- metric criticism is dismissed
- benchmark success fails in reality
- compliance improves but repair does not
- local adaptation is suppressed by target
- narrative defends the metric more than reality
Recommended posture:
activate Ξ
pause metric-based closure
audit Φ−O
repair metric design and incentives
restore affected-node validation
Contraindicated:
scaling from metric success
public certainty from proxy
repair-complete claims
punitive enforcement of target
canonizing the metric
automation based only on proxy
Critical / Collapse-Prone Range
Proxy has become an inversion engine; the system optimizes success signs while real coherence degrades.
Signals:
- proxy success requires hiding or exporting cost
- metric is immune to challenge
- O is deteriorating while Φ remains high
- affected nodes reject metric reality
- official memory stores metric success as real success
- hidden debt becomes active failure
- stress reveals benchmark overfit
- legitimacy shock follows metric exposure
- system cannot abandon metric without destabilizing narrative
- gaming becomes the real operating system
Recommended posture:
stop proxy-dependent actuation
preserve evidence
quarantine metric authority
rebuild O indicators
repair affected-node burden
correct U7 success memory
redesign or retire proxy
False Positive Risk
Goodhart_risk may appear high when:
- metric improvement genuinely reflects O improvement
- affected-node feedback has not yet caught up
- early metric discipline is needed to stabilize chaos
- temporary target focus supports repair
- stress testing is pending but not failed
- metric criticism reflects poor understanding of scope
- proxy is one bounded input among many
- metric appears central because it is currently the most auditable signal
False Negative Risk
Goodhart_risk may appear low when:
- O is not measured
- affected-node cost is hidden
- proxy gaming is normalized
- metric has strong legitimacy narrative
- stress tests are too narrow
- dissent has exited
- metric scope is forgotten
- hidden labor maintains metric success
- recurrence window is too short
- dashboard health masks boundary strain
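Because the ranges are qualitative, the sketch below does not try to compute a range from data. It only records the recommended posture for each of the four ranges as a lookup, so that a range assigned by a reviewer maps to the same actions every time. The `Range` enum and the `recommended_posture` function are hypothetical names.

```python
from enum import Enum


class Range(Enum):
    HEALTHY = "healthy"
    WATCH = "watch"
    DEGRADED = "degraded"
    CRITICAL = "critical"


# Recommended postures, copied from the range descriptions above.
POSTURE = {
    Range.HEALTHY: (
        "continue metric use",
        "preserve scope notes",
        "monitor Φ−O",
        "audit incentives",
        "validate through recurrence and stress",
    ),
    Range.WATCH: (
        "restate metric scope",
        "add O and affected-node checks",
        "review incentives",
        "reduce metric monoculture",
        "avoid metric-only closure",
    ),
    Range.DEGRADED: (
        "activate Ξ",
        "pause metric-based closure",
        "audit Φ−O",
        "repair metric design and incentives",
        "restore affected-node validation",
    ),
    Range.CRITICAL: (
        "stop proxy-dependent actuation",
        "preserve evidence",
        "quarantine metric authority",
        "rebuild O indicators",
        "repair affected-node burden",
        "correct U7 success memory",
        "redesign or retire proxy",
    ),
}


def recommended_posture(r: Range) -> list[str]:
    """Return the posture steps for a range that a reviewer has already assigned."""
    return list(POSTURE[r])
```

Assigning the range itself remains a judgment call that should account for the false positive and false negative risks listed above.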
8) Leading Indicators
Rising Goodhart_risk appears early as:
- people ask “what counts?” more than “what helps?”
- metric becomes the decision language
- proxy scope notes disappear
- teams optimize visible indicators
- affected-node feedback is called anecdotal
- edge cases are excluded because they hurt scores
- performance improves while trust does not
- compliance rises while repair stagnates
- metric exceptions become common
- local adaptations are discouraged
- hidden labor increases to meet target
- narrative becomes metric-defensive
- recurrence is explained away despite score improvement
- alternative measures are treated as threats
9) Lagging Indicators
Goodhart failure has already accumulated debt when:
- metric success is exposed as false
- benchmark performance fails in the real world
- affected nodes reject official scores
- hidden debt surfaces after long metric improvement
- gaming becomes public
- external audit contradicts dashboard
- legitimacy shock occurs
- system must abandon or redesign metric
- memory correction is required
- performance incentives are blamed for harm
- real repair is delayed by metric defense
- O must be rebuilt after Φ collapse
10) Interpretation Rules
How to Read Goodhart_risk
Goodhart_risk should be read as:
risk that proxy optimization is replacing reality-contact
It is not a rejection of measurement.
A system may have:
- high Φ and high O — healthy proxy use
- high Φ and low O — Goodhart pattern
- low Φ and high O — metric mismatch
- low Φ and low O — poor performance and poor coherence
- high metric accuracy at low stress but low accuracy under stress
- useful local metric that fails when scaled
- metric that works until tied to reward or consequence
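The four Φ/O combinations at the top of the list above can be sketched as a small classifier. What counts as "high" is a domain-calibration question that this example leaves to the caller, and the function name is hypothetical. The last three items in the list (stress, scale, and reward dependence) are not covered here.

```python
def phi_o_quadrant(phi_high: bool, o_high: bool) -> str:
    """Map the four Φ/O combinations to the readings listed above."""
    if phi_high and o_high:
        return "healthy proxy use"
    if phi_high and not o_high:
        return "Goodhart pattern"
    if not phi_high and o_high:
        return "metric mismatch"
    return "poor performance and poor coherence"
```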
What Changes Its Meaning
Goodhart_risk changes meaning under:
- high Φ pressure
- high narrative_metric_gap
- weak FI_integrity
- low Au_eff
- high affected_node_cost
- high pseudo_damping_risk
- high stress_divergence
- high recovery_asymmetry
- low variance_preserved
- high innovation_exit
- low truth_tolerance
- high immunity_index
- low MS_symmetry_index
- high automation
- high consequence severity
Context Modifiers
High Φ pressure: metric becomes target.
High narrative gap: story may defend proxy success.
Weak FI: feedback cannot falsify metric.
Low Au_eff: proxy lineage cannot be audited.
High affected-node cost: metric may be exporting burden.
High pseudo-damping: metric recovery may be false calm.
High stress divergence: metric works only under baseline conditions.
Low variance preserved: metric may have narrowed adaptation.
High automation: proxy logic can scale rapidly.
Domain Calibration Notes
Goodhart_risk should be calibrated by domain:
- in engineering: uptime metrics, incident counts, story points, test coverage, latency targets, deployment frequency
- in AI: benchmark scores, safety labels, refusal rates, helpfulness ratings, eval pass rates, memory confidence
- in institutions: case closure rates, satisfaction scores, compliance counts, productivity KPIs, audit scores
- in governance: enforcement stats, service metrics, public approval, deficit targets, crime numbers, wait-time averages
- in relationships: visible harmony, response frequency, apology count, conflict reduction, agreement language
- in archives: page counts, canon count, glossary completion, link volume, formatting consistency, reader engagement
11) Operator Sequencing Implications
If Goodhart_risk Is Low
Allowed with ordinary gate checks:
- Γ can use metric as one selection input
- Π can constrain around proxy with scope notes
- Τ can plan using metric trend
- ℛ can use metric as repair evidence with validation
- U7 can store metric outcomes with provenance
- Δ can stress-test metric reliability
- public reporting can include metric with limits
Recommended:
Φ signal → O/H/affected-node check → Γ bounded selection → U7 metric memory with scope
If Goodhart_risk Is High
Recommended:
pause proxy-based closure → compare Φ to O/H/affected-node cost → audit incentives → redesign or bound metric
Or:
reduce metric authority → restore direct feedback and reality-contact → retest under stress and recurrence
Avoid or delay:
- scaling from metric success
- repair-complete claims
- public certainty
- automation based only on proxy
- punitive enforcement of metric
- canonizing the target
- suppressing metric criticism
- selecting from metric-only evidence
Operators Recommended Under High Goodhart Risk
- Ξ: detect proxy inversion
- Au: audit metric lineage and incentive effects
- FI: restore feedback that can falsify the proxy
- Μ: reinterpret metric scope
- Γ: reselect success criteria
- Π: constrain metric authority
- ℛ: repair burden caused by proxy optimization
- Θ: damp certainty in metric success
Operators Contraindicated Under High Goodhart Risk
- Γ hard selection from proxy: selects distorted target
- Π irreversible metric constraint: encodes proxy failure
- ⊗ deep coupling around shared metric: propagates Goodhart dynamics
- ⊕ composition: embeds proxy into identity/canon
- Τ acceleration: scales metric distortion
- Σ escalation: sacralizes proxy
- ✕ force: enforces proxy at cost of O
12) Gate Implications
Gates Strengthened By Reliable Goodhart_risk
- Au-Actuation: metric lineage and scope are traceable
- FI-Gate: feedback can falsify proxy success
- High Risk Gate: blocks high-risk binding from proxy-only evidence
- MS-Gate: checks who benefits or carries cost under metric optimization
- ☷ᵢ: ensures metrics do not override principle constraints
Gates Weakened If Goodhart_risk Is Poorly Known
If Goodhart risk is unknown:
- Au may trace metric but not meaning
- FI may not challenge proxy success
- High Risk Gate may bind classifications from metric-only evidence
- MS may miss affected-node burden
- ☷ᵢ may be reduced to measurable compliance
- Π may overconstrain toward the target
- Γ may select the option that improves Φ but harms O
- ℛ may repair the dashboard instead of the system
Gate Outcomes Affected
High Goodhart_risk should push gates toward:
- Pause metric-based closure
- Require Φ/O comparison
- Require affected-node validation
- Require hidden-debt audit
- Require incentive audit
- Require stress test
- Deny proxy-only claims
- Deny automated consequence from proxy alone
- ∅ for high-impact actuation based primarily on a metric that may be detached from O
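One way to picture the gate outcomes above is as a list of checks that must be completed before metric-based closure is reconsidered. The check identifiers, the function name, and the final "reconsider" outcome are assumptions made for this sketch; the spec lists the requirements but does not define what happens once all of them are met.

```python
# Checks required under high Goodhart_risk, named after the list above.
# The identifiers themselves are hypothetical.
REQUIRED_CHECKS = (
    "phi_o_comparison",
    "affected_node_validation",
    "hidden_debt_audit",
    "incentive_audit",
    "stress_test",
)


def gate_outcome(high_risk: bool, completed: set[str]) -> tuple[str, list[str]]:
    """Return (decision, outstanding checks) for a metric-based closure request."""
    if not high_risk:
        return ("ordinary gate checks", [])
    outstanding = [c for c in REQUIRED_CHECKS if c not in completed]
    if outstanding:
        return ("pause metric-based closure", outstanding)
    # Assumption: once every listed check is done, the gate may reconsider.
    return ("reconsider with validated evidence", [])
```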
13) Scaling Behavior
Goodhart_risk becomes more dangerous under scale because proxies become standardized, automated, rewarded, and defended.
As systems scale:
- metrics gain authority
- incentives align around measurable targets
- local nuance is compressed
- edge cases are excluded
- gaming becomes systematic
- dashboards replace direct observation
- affected-node signal is filtered
- proxy success becomes narrative legitimacy
- metric definitions harden
- automation propagates proxy logic
- hidden labor supports metrics
- metric criticism becomes costly
- Φ becomes identity or canon
- metric drift becomes difficult to reverse
Scaling Risks
- metric monoculture
- proxy inversion
- benchmark overfitting
- compliance theater
- safety theater
- repair theater
- affected-node cost export
- local adaptation loss
- innovation exit
- hidden debt accumulation
- legitimacy shock
- automation of proxy error
- false success memory
- metric immunity
- O collapse under Φ success
Scaling Requirements
To scale metrics safely, systems need:
- metric lineage
- scope notes
- O indicators
- hidden debt indicators
- affected-node validation
- stress tests
- feedback correction
- anti-gaming audits
- incentive audits
- metric diversity
- qualitative review
- edge-case inclusion
- proxy retirement rules
- metric revision pathways
- public/private metric comparison
- U7 memory of metric limits
Scaling Rule
Proxy authority must scale only with evidence that the proxy continues to track O under stress, recurrence, and affected-node validation.
Sanity constraint:
Φ authority ↑ without O validation ⇒ Goodhart_risk ↑
If a proxy gains authority without direct coherence validation, risk rises.
Second constraint:
Φ↑ + H↑ ⇒ pseudo-success risk ↑
If the metric improves while hidden debt increases, success is likely false.
Third constraint:
shared_metric + high Φ−O ⇒ systemic Goodhart risk ↑
If many nodes share a detached proxy, distortion can propagate system-wide.
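The three scaling constraints can be sketched as independent boolean checks over a snapshot of the proxy's state. The `ProxyState` fields and the `scaling_flags` function are hypothetical; how each boolean is derived from real data is left open and would need domain calibration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProxyState:
    authority_rising: bool  # Φ authority ↑
    o_validated: bool       # direct O validation exists
    phi_rising: bool        # Φ↑
    h_rising: bool          # H↑
    shared_metric: bool     # many nodes share the proxy
    phi_o_gap_high: bool    # high Φ−O


def scaling_flags(s: ProxyState) -> list[str]:
    """Return the risk flags raised by the three scaling constraints."""
    flags = []
    if s.authority_rising and not s.o_validated:
        flags.append("Goodhart_risk ↑")
    if s.phi_rising and s.h_rising:
        flags.append("pseudo-success risk ↑")
    if s.shared_metric and s.phi_o_gap_high:
        flags.append("systemic Goodhart risk ↑")
    return flags
```

The constraints are evaluated separately, so one state can raise several flags at once.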
14) Interaction / Coupling Behavior
Goodhart_risk reveals whether a coupling is organized around real coherence or shared proxy performance.
What It Reveals About Coupling
- whether nodes coordinate around metric rather than reality
- whether one node’s burden funds another’s score
- whether shared metrics hide local harm
- whether feedback can challenge proxy alignment
- whether compatibility is measured or experienced
- whether repair is done or merely counted
- whether affected-node cost is excluded
- whether shared targets propagate distortion
What It Reveals About Boundary Integrity
Metrics can cross boundaries faster than meaning.
When Goodhart_risk is high:
- local boundaries may be overrun by metric goals
- refusal may be treated as noncompliance
- affected-node cost may be ignored
- BΣ may degrade under target pressure
- boundary repair may be counted without landing
- metric authority may override consent or fit
What It Reveals About Compatibility
Compatibility requires proxy humility.
A coupling may be unsafe if:
the shared metric improves only because one node absorbs hidden cost
or:
the relation looks compatible on the dashboard but not in lived operation
Healthy compatibility uses metrics as signals, not sovereign truth.
Relevant Interface Acts
- ↺ Reflection: compare metric story to lived effect
- ⇩ Relaxation: reduce target pressure
- ⊘ Attenuation: reduce coupling around a distorted metric
- ⊙ Alignment: clarify what the metric is and is not
- →? Invitation: invite affected-node validation
- ⚕︎ Restorative Override: requires post-action Φ/O audit
- ✕ Force: high risk when used to enforce proxy compliance
15) Failure Modes Detected
Primary Failure Modes
Goodhart_risk detects or predicts:
- proxy inversion
- metric capture
- dashboard blindness
- benchmark overfitting
- compliance theater
- repair theater
- safety theater
- legitimacy theater
- affected-node cost export
- hidden labor growth
- local adaptation suppression
- innovation exit
- metric immunity
- false success memory
- proxy-based classification error
- O collapse under Φ success
- proxy-driven hidden debt
Composite Regimes Where Goodhart_risk Matters
- Goodhart Collapse: direct regime
- Pseudo-Coherent Basin: metric success stabilizes hidden debt
- Repair Theater: repair metric replaces repair
- Mission Lock: metric preserves trajectory
- Taboo Lock: metric cannot be questioned
- Extraction Regime: metric success hides exported cost
- Coercive Fusion: one node is forced to serve another’s score
- Crisis Loop: metric recovery hides recurring origin failure
- LOS: latent operations maintain formal metric success
16) Accountability & Reintegration Implications
If Goodhart_risk Was Ignored
Likely consequences:
- metrics improved while coherence degraded
- affected nodes carried hidden cost
- repair was counted but not completed
- hidden labor increased
- local adaptation was suppressed
- innovation exited
- official memory stored false success
- legitimacy shock followed exposure
- selected path optimized Φ over O
- system had to rebuild trust in measurement
Accountability questions:
- What was the proxy supposed to measure?
- When did it become the target?
- Did O improve with Φ?
- Did H rise under metric success?
- Who carried cost of improving the metric?
- Was affected-node feedback included?
- Did stress tests validate the metric?
- Was the metric gamed?
- Did repair land or only score improve?
- Did the metric become immune to challenge?
- Was U7 memory corrected after metric failure?
If Goodhart_risk Was Misread
Possible misread forms:
- useful metric treated as corrupt
- legitimate target discipline mistaken for proxy capture
- early improvement dismissed before validation
- qualitative discomfort treated as superior to data by default
- metric revision mistaken for manipulation
- bounded proxy use mistaken for totalizing proxy use
- failing metric mistaken for failing reality
- high O / low Φ state misread because metric is outdated
- metric criticism used to avoid accountability
Required Restoration
When Goodhart failure is found:
identify intended referent
→ compare Φ to O/H/affected-node cost
→ audit incentives and gaming
→ reduce proxy authority
→ redesign metric set
→ repair hidden burden
→ correct U7 success memory
→ validate under stress and recurrence
If proxy optimization burdened some nodes more than others, MS-Gate should review who gained score, who carried cost, and who received repair.
17) Cross-Domain Examples
Technical / Engineering
A team optimizes deployment frequency. Releases increase, but incidents, rework, and user disruption also increase.
Diagnostic implication: deployment count became a proxy target detached from real delivery coherence.
Operator sequence: Φ/O audit → quality and incident metrics → affected-user validation → release process repair.
Institutional / Governance
A department optimizes case closure rate. Cases close faster, but unresolved harm and repeat complaints rise.
Diagnostic implication: closure metric replaced real remedy.
Operator sequence: recurrence audit → affected-node validation → closure criteria redesign → repair backlog review.
AI / Algorithmic
A model improves benchmark scores but performs worse on messy real user contexts.
Diagnostic implication: benchmark proxy became overfit.
Operator sequence: stress eval expansion → user-context validation → metric set redesign → U7 eval memory correction.
Interaction / Relational
A relationship uses “we are not fighting anymore” as the success metric, but truth and boundary repair are suppressed.
Diagnostic implication: low conflict became proxy for repair.
Operator sequence: pseudo-damping review → truth tolerance repair → boundary validation → recurrence check.
Archive / Framework Design
The archive tracks number of completed spec sheets, but glossary consistency and cross-link quality lag.
Diagnostic implication: completion count is outpacing real archive coherence.
Operator sequence: Φ/O archive audit → glossary/cross-link repair → status criteria revision → U7 version update.
18) Test Protocols
1. Referent Test
What reality is the metric supposed to represent?
Failure signal: no one can name the real referent.
2. Φ/O Test
Does proxy improvement correspond to coherence improvement?
Failure signal: Φ rises while O stagnates or falls.
3. Hidden Debt Test
Does H increase under metric success?
Failure signal: success creates deferred cost.
4. Affected-Node Test
Do affected nodes experience the metric improvement as real improvement?
Failure signal: dashboard improves while affected nodes worsen.
5. Incentive Test
What behavior does the metric reward?
Failure signal: rewarded behavior differs from coherent behavior.
6. Gaming Test
Can the metric be improved without improving reality?
Failure signal: easy gaming path exists.
7. Stress Test
Does the metric hold under stress or only benchmark conditions?
Failure signal: proxy fails under real load or edge cases.
8. Recurrence Test
Does recurrence decline when the metric improves?
Failure signal: metric improves but same issue returns.
9. Scope Test
Is the metric being used beyond its valid range?
Failure signal: local proxy becomes global truth.
10. Correction Test
Can feedback challenge the metric?
Failure signal: metric becomes immune to contradiction.
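The ten protocols above can be tracked as a simple checklist. This sketch assumes each test is recorded as passed (True) or failed (False) and treats any protocol without a recorded result as not yet run rather than as a pass, in line with treating proxy success as provisional. The protocol keys and the `summarize` function are hypothetical names.

```python
# Short keys for the ten test protocols, in the order listed above.
PROTOCOLS = (
    "referent", "phi_o", "hidden_debt", "affected_node", "incentive",
    "gaming", "stress", "recurrence", "scope", "correction",
)


def summarize(results: dict[str, bool]) -> dict[str, list[str]]:
    """Split the ten protocols into passed, failed, and not yet run."""
    unknown = set(results) - set(PROTOCOLS)
    if unknown:
        raise ValueError(f"unknown protocol(s): {sorted(unknown)}")
    return {
        "passed": [p for p in PROTOCOLS if results.get(p) is True],
        "failed": [p for p in PROTOCOLS if results.get(p) is False],
        "not_run": [p for p in PROTOCOLS if p not in results],
    }
```

For example, `summarize({"referent": True, "phi_o": False})` reports one pass, one failure, and eight protocols still to run.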
19) Anti-Patterns
- Metric as reality
- Score as coherence
- Compliance as repair
- Closure rate as restoration
- Benchmark as safety
- Low complaints as satisfaction
- Low conflict as trust
- Output as value
- Speed as quality
- Precision as truth
- Dashboard as affected-node state
- Metric improvement as legitimacy
- Proxy criticism as anti-accountability
- Gaming as efficiency
- Hidden labor as productivity
- Local adaptation as metric violation
- Edge case as nuisance
- Metric immunity
- Public score as memory
- Φ success as O success
20) Spec Validation Check
- Is this truly a diagnostic, not an operator? Yes.
- Does it measure state, capacity, risk, or response rather than act directly? Yes.
- Does it map to S? Yes.
- Are U-layers specified? Yes.
- Are leading and lagging indicators separated? Yes.
- Are interpretation risks defined? Yes.
- Are operator sequencing implications clear? Yes.
- Are gate implications clear? Yes.
- Are scaling risks included? Yes.
- Are interaction implications included? Yes.
- Does it avoid new primitives? Yes.
Condensed Archive Summary
Goodhart_risk is the diagnostic estimate of whether a proxy, metric, target, score, benchmark, classification, dashboard, or optimization objective is becoming detached from the real coherence condition it was meant to represent. It does not reject metrics; it checks whether Φ still tracks O. High Goodhart_risk indicates risk of proxy inversion, metric capture, benchmark overfitting, dashboard blindness, compliance theater, repair theater, affected-node cost export, hidden labor growth, local adaptation suppression, innovation exit, metric immunity, false success memory, and O collapse under Φ success. Under high Goodhart risk, the system should pause proxy-based closure, compare Φ to O/H/affected-node state, audit incentives and gaming, reduce proxy authority, restore FI/Au, redesign metric sets, repair hidden burden, correct U7 success memory, and validate under stress and recurrence before scaling, automation, public certainty, or high-impact action.