Universal Theories

1) Diagnostic Identity

Diagnostic Name: Goodhart Risk

Short Name / Symbol: Goodhart_risk

Diagnostic Class: Proxy Failure / Metric Capture / Optimization Risk / Φ–O Divergence / Regime Diagnostic

Primary Function: Estimate the risk that a proxy, metric, target, reward signal, score, KPI, benchmark, classification, narrative, or optimization objective will detach from the real coherence it was meant to represent.

Primary Use: Determine whether the system is optimizing the measurement of success rather than the actual condition, function, repair, or coherence the measurement was intended to track.

Core Risk if Ignored: The system may improve metrics while degrading real coherence, creating pseudo-success, hidden debt, affected-node burden, gaming, legitimacy shock, and eventual collapse of trust in the metric system.

Core Risk if Overtrusted: Metrics, benchmarks, summaries, standards, and quantitative proxies may be rejected too quickly, even when they remain useful, reality-linked, auditable, and appropriately bounded.

2) Mechanical Definition

Goodhart_risk measures the likelihood that a proxy becomes a target and then stops faithfully representing the underlying coherence condition it was meant to measure.

Goodhart_risk answers:

Are we optimizing the sign of success instead of the thing success was supposed to mean?

A proxy can be:

metric
benchmark
score
KPI
ranking
label
classification
dashboard
compliance indicator
engagement number
safety score
repair-complete status
legitimacy narrative
audit result
performance target

Goodhart risk rises when a system begins optimizing the proxy directly.

The core pattern:

proxy chosen to represent coherence
→ proxy becomes target
→ behavior adapts to improve proxy
→ proxy detaches from coherence
→ Φ rises while O falls

In UTS terms:

Φ↑ while (O↓ or H↑) ⇒ Goodhart_risk ↑

Goodhart risk is not “metrics are bad.”

It means the metric must stay subordinate to reality contact, affected-node validation, auditability, repair outcomes, and coherence indicators.
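
The UTS rule above can be read as a conjunction: Φ rising together with either O falling or H rising. A minimal sketch of that reading follows, assuming Φ, O, and H are available as numeric time series. The function names, the first-to-last trend measure, and the `tol` parameter are illustrative assumptions, not part of the framework.

```python
from typing import Sequence


def trend(series: Sequence[float]) -> float:
    """Net change from the first to the last observation (an assumed, crude trend measure)."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    return series[-1] - series[0]


def goodhart_signal(
    phi: Sequence[float],
    o: Sequence[float],
    h: Sequence[float],
    tol: float = 0.0,
) -> bool:
    """Return True when Φ rises while O falls or H rises.

    `tol` is a hypothetical dead band: changes no larger than `tol`
    are treated as flat.
    """
    phi_up = trend(phi) > tol
    o_down = trend(o) < -tol
    h_up = trend(h) > tol
    return phi_up and (o_down or h_up)
```

A True result only flags the pattern. It says nothing about whether the metric should be abandoned, which is consistent with the note that Goodhart risk is not "metrics are bad."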

3) What the Diagnostic Measures

Direct Measurement Target

Goodhart_risk measures:

proxy-to-reality detachment
metric target capture
optimization pressure around Φ
proxy manipulation
proxy overuse
metric authority inflation
dashboard blindness
benchmark overfitting
compliance theater
repair theater
safety theater
legitimacy theater
selected metric narrowing
gaming incentive
displacement of real O by measurable Φ
whether success signals still track coherence
whether proxy improvement creates hidden debt

Indirect / Proxy Signals

Goodhart_risk can be estimated from:

metrics improving while affected-node cost rises
benchmark improvement without real-world improvement
repair-complete status while recurrence continues
compliance increase without boundary recovery
safety scores improving while stress failures persist
response-time metrics improving while quality falls
outputs increasing while meaning collapses
dashboard health while users report harm
teams optimizing what is counted
work shifting toward visible metrics
hidden labor increasing to maintain score
local adaptation being suppressed by standardized targets
metric definitions changing to preserve success
proxy performance improving under low-stress conditions only
narrative becoming tied to metric defense
dissenting evidence dismissed because “the numbers are good”

What It Does Not Measure

Goodhart_risk does not directly measure:

whether the metric is useless
whether measurement is incoherent
whether optimization is always bad
whether qualitative evidence is always superior
whether the system should stop tracking performance
whether all proxy improvement is fake
whether affected-node reports are automatically complete
whether metrics should never become targets
whether success cannot be measured
whether all standards create distortion

High Goodhart_risk means the proxy is likely becoming detached from the reality it represents.

It does not mean the metric should automatically be abandoned.

Low Goodhart_risk means the proxy remains reality-linked enough for its intended use.

It does not mean the metric can replace direct observation or repair validation.

4) Canonical State Variables Involved

Canonical state vector:

S = {O, H, ε, ι, Au, µᵢ, BΣ, K, R, Φ}

Primary Variables

Φ: Goodhart risk centers on fitness proxy distortion
O: the real coherence condition the proxy is supposed to track
H: hidden debt rises when proxy success hides real degradation
ι: inversion risk rises when pseudo-success is mistaken for coherence
Au: auditability is needed to test whether the proxy still maps to reality
R: restoration may be optimized as status rather than actual recovery

Secondary Variables

ε: visible errors may drop because they are hidden, reclassified, or displaced
µᵢ: integrity declines when claim, metric, action, and consequence diverge
BΣ: boundary costs can be hidden by proxy success
K: compatibility may be claimed from shared metrics while real coupling degrades

Variables Commonly Confused With Goodhart_risk

Variable / Diagnostic	Difference from Goodhart_risk
Φ − O	Actual proxy-coherence divergence; Goodhart_risk estimates risk of proxy capture and future divergence
narrative_metric_gap	Story/evidence divergence; Goodhart risk focuses on proxy optimization detaching from O
stress_divergence	Baseline/stress gap; Goodhart risk often appears when metrics fail under stress
pseudo_damping_risk	False settling; Goodhart risk may create pseudo-damping through metric recovery
affected_node_cost	Local burden; Goodhart risk often hides affected-node cost
FI_integrity	Feedback can correct the system; weak FI lets Goodhart drift persist
selection_traceability	Trace of why a metric/target was selected; needed to audit Goodhart risk
Metric use	Metrics can be healthy; risk arises when the metric becomes detached or over-authoritative

5) Localization Signature

Primary Legibility Layers

U4 — Classification / Metrics / Narratives: primary layer where proxies, scores, classifications, dashboards, and success stories form
U3 — Execution: where behavior changes to optimize the metric
U5 — Coordination / Time: where incentives, reporting cadence, targets, and review cycles shape optimization
U6 — Coherence Field: where proxy success either supports or distorts real coherence
U7 — Memory / Recurrence: where metric success becomes durable memory, precedent, or canon
U8 — Environment / Forcing: where stress reveals whether proxy success generalizes

Primary Leverage Layers

U4: recalibrate metric meaning, scope, and classification boundaries
U3: inspect behavior induced by the metric
U5: change reporting cadence, targets, incentives, and review loops
U6: verify coherence field effects beyond proxy performance
U7: correct metric-derived memory and success claims
U2: constrain harmful optimization incentives

Verification Layers

U4: does the metric still mean what it claims?
U3: what behavior is the metric causing?
U5: does cadence or target pressure distort reality?
U6: does O improve with Φ?
U7: does recurrence validate metric success?
U8: does metric success survive stress?

Common Mislocalizations

Treating metric improvement as coherence improvement
Treating compliance as repair
Treating benchmark success as real-world safety
Treating dashboard health as affected-node recovery
Treating low reported error as low harm
Treating high output as high value
Treating fast response as good response
Treating proxy failure as data failure only
Treating affected-node signal as anecdotal against metrics
Treating metric criticism as anti-accountability
Treating quantitative precision as truth
Treating standardization as coherence

6) Input Requirements

Required Inputs

To estimate Goodhart_risk, the system needs:

proxy, metric, benchmark, label, or target being evaluated
intended real-world referent
current Φ behavior
current O indicators
affected variables in S
optimization pressure around the proxy
how the proxy influences behavior
affected-node feedback
hidden debt indicators
stress behavior
recurrence data
metric lineage
selection rationale for the proxy
feedback pathways that can challenge the proxy
whether the proxy is used for consequence, reward, status, or closure

Optional Inputs

These improve precision:

metric history
benchmark design
gaming evidence
incentive map
dashboard data
field outcomes
audit reports
affected-node cost data
narrative_metric_gap
stress-test results
edge-case performance
public/private metric comparison
false positive / false negative analysis
alternative metric set
proxy retirement criteria
metric revision history
metric-to-repair linkage
external validation
recurrence after metric success

Missing Input Behavior

If Goodhart_risk inputs are missing:

If O indicators are missing, do not infer coherence from Φ
If affected-node feedback is missing, proxy success is under-validated
If metric lineage is missing, proxy meaning may be stale
If optimization pressure is unknown, Goodhart risk may be underestimated
If stress data is missing, proxy success is baseline-only
If hidden debt indicators are missing, metric success may hide H
If FI is weak, the proxy may be unfalsifiable
If consequence use is unknown, metric authority may be underestimated

Default missing-input posture:

treat proxy success as provisional → compare Φ to O/H/affected-node state → stress-test and audit incentive effects
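
The missing-input rules above amount to a lookup from each absent input to the caveat it triggers. The sketch below encodes them that way. The key names are hypothetical labels chosen for illustration, and the weak-FI rule is left out because it describes a degraded input rather than a missing one.

```python
from typing import Iterable, List

# Caveat text is taken from the rules above; the keys are assumed labels.
MISSING_INPUT_CAVEATS = {
    "o_indicators": "do not infer coherence from Φ",
    "affected_node_feedback": "proxy success is under-validated",
    "metric_lineage": "proxy meaning may be stale",
    "optimization_pressure": "Goodhart risk may be underestimated",
    "stress_data": "proxy success is baseline-only",
    "hidden_debt_indicators": "metric success may hide H",
    "consequence_use": "metric authority may be underestimated",
}


def caveats_for(available: Iterable[str]) -> List[str]:
    """List the caveats triggered by every input that is not available."""
    have = set(available)
    return [
        f"{name}: {caveat}"
        for name, caveat in MISSING_INPUT_CAVEATS.items()
        if name not in have
    ]
```

Any non-empty result corresponds to the default posture: treat proxy success as provisional until the gap is filled.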

7) Diagnostic States / Ranges

These ranges are qualitative and should be domain-calibrated.

Healthy / Coherence-Supporting Range

Proxy remains useful, bounded, audited, and reality-linked.

Signals:

Φ tracks O reasonably well
metric scope is explicit
affected-node feedback supports metric interpretation
hidden debt does not rise under metric success
stress tests validate proxy meaning
feedback can challenge the metric
metric does not dominate all selection
incentives do not encourage gaming
recurrence declines when metric improves
U7 memory preserves proxy limits

Recommended posture:

continue metric use
preserve scope notes
monitor Φ−O
audit incentives
validate through recurrence and stress

Watch Range

Proxy is still useful but beginning to gain too much authority or lose context.

Signals:

metric becomes central in decisions
teams begin optimizing the number
affected-node feedback is mixed
metric improvement outpaces qualitative improvement
hidden debt is uncertain
stress behavior is not fully tested
narrative depends heavily on metric success
alternative evidence is underweighted
proxy scope is often forgotten

Recommended posture:

restate metric scope
add O and affected-node checks
review incentives
reduce metric monoculture
avoid metric-only closure

Degraded Range

Proxy is detaching from real coherence and shaping behavior toward metric success.

Signals:

Φ rises while O stagnates or falls
affected-node cost rises under metric improvement
recurrence continues after metric success
hidden debt accumulates
gaming or metric optimization appears
metric criticism is dismissed
benchmark success fails in reality
compliance improves but repair does not
local adaptation is suppressed by target
narrative defends the metric more than reality

Recommended posture:

activate Ξ
pause metric-based closure
audit Φ−O
repair metric design and incentives
restore affected-node validation

Contraindicated:

scaling from metric success
public certainty from proxy
repair-complete claims
punitive enforcement of target
canonizing the metric
automation based only on proxy

Critical / Collapse-Prone Range

Proxy has become an inversion engine; the system optimizes success signs while real coherence degrades.

Signals:

proxy success requires hiding or exporting cost
metric is immune to challenge
O is deteriorating while Φ remains high
affected nodes reject metric reality
official memory stores metric success as real success
hidden debt becomes active failure
stress reveals benchmark overfit
legitimacy shock follows metric exposure
system cannot abandon metric without destabilizing narrative
gaming becomes the real operating system

Recommended posture:

stop proxy-dependent actuation
preserve evidence
quarantine metric authority
rebuild O indicators
repair affected-node burden
correct U7 success memory
redesign or retire proxy
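
The four ranges and their recommended postures can be held as a small lookup table, which may be convenient when the diagnostic feeds a review checklist. This is a sketch only: the `RiskRange` enum and the function name are assumptions, and assigning a system to a range remains a qualitative, domain-calibrated judgment that the code does not attempt.

```python
from enum import Enum
from typing import Dict, List, Tuple


class RiskRange(Enum):
    HEALTHY = "healthy"
    WATCH = "watch"
    DEGRADED = "degraded"
    CRITICAL = "critical"


# Posture items are copied from the ranges described above.
RECOMMENDED: Dict[RiskRange, List[str]] = {
    RiskRange.HEALTHY: [
        "continue metric use", "preserve scope notes", "monitor Φ−O",
        "audit incentives", "validate through recurrence and stress",
    ],
    RiskRange.WATCH: [
        "restate metric scope", "add O and affected-node checks",
        "review incentives", "reduce metric monoculture",
        "avoid metric-only closure",
    ],
    RiskRange.DEGRADED: [
        "activate Ξ", "pause metric-based closure", "audit Φ−O",
        "repair metric design and incentives",
        "restore affected-node validation",
    ],
    RiskRange.CRITICAL: [
        "stop proxy-dependent actuation", "preserve evidence",
        "quarantine metric authority", "rebuild O indicators",
        "repair affected-node burden", "correct U7 success memory",
        "redesign or retire proxy",
    ],
}

# The text lists contraindications for the Degraded range only.
CONTRAINDICATED: Dict[RiskRange, List[str]] = {
    RiskRange.DEGRADED: [
        "scaling from metric success", "public certainty from proxy",
        "repair-complete claims", "punitive enforcement of target",
        "canonizing the metric", "automation based only on proxy",
    ],
}


def posture_for(risk_range: RiskRange) -> Tuple[List[str], List[str]]:
    """Return (recommended, contraindicated) actions for a range."""
    return RECOMMENDED[risk_range], CONTRAINDICATED.get(risk_range, [])
```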

False Positive Risk

Goodhart_risk may appear high when:

metric improvement genuinely reflects O improvement
affected-node feedback has not yet caught up
early metric discipline is needed to stabilize chaos
temporary target focus supports repair
stress testing is pending but not failed
metric criticism reflects poor understanding of scope
proxy is one bounded input among many
metric appears central because it is currently the most auditable signal

False Negative Risk

Goodhart_risk may appear low when:

O is not measured
affected-node cost is hidden
proxy gaming is normalized
metric has strong legitimacy narrative
stress tests are too narrow
dissent has exited
metric scope is forgotten
hidden labor maintains metric success
recurrence window is too short
dashboard health masks boundary strain

8) Leading Indicators

Rising Goodhart_risk appears early as:

people ask “what counts?” more than “what helps?”
metric becomes the decision language
proxy scope notes disappear
teams optimize visible indicators
affected-node feedback is called anecdotal
edge cases are excluded because they hurt scores
performance improves while trust does not
compliance rises while repair stagnates
metric exceptions become common
local adaptations are discouraged
hidden labor increases to meet target
narrative becomes metric-defensive
recurrence is explained away despite score improvement
alternative measures are treated as threats

9) Lagging Indicators

Goodhart failure has already accumulated debt when:

metric success is exposed as false
benchmark performance fails in the real world
affected nodes reject official scores
hidden debt surfaces after long metric improvement
gaming becomes public
external audit contradicts dashboard
legitimacy shock occurs
system must abandon or redesign metric
memory correction is required
performance incentives are blamed for harm
real repair is delayed by metric defense
O must be rebuilt after Φ collapse

10) Interpretation Rules

How to Read Goodhart_risk

Goodhart_risk should be read as:

risk that proxy optimization is replacing reality-contact

It is not a rejection of measurement.

A system may have:

high Φ and high O — healthy proxy use
high Φ and low O — Goodhart pattern
low Φ and high O — metric mismatch
low Φ and low O — poor performance and poor coherence
high metric accuracy at low stress but low accuracy under stress
useful local metric that fails when scaled
metric that works until tied to reward or consequence
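
The first four combinations in this list form a two-by-two reading of Φ against O. A minimal sketch is given below, assuming Φ and O have already been judged high or low by some domain-calibrated method. The function name and the boolean inputs are illustrative.

```python
def phi_o_pattern(phi_high: bool, o_high: bool) -> str:
    """Name the Φ/O combination using the labels from the list above."""
    if phi_high and o_high:
        return "healthy proxy use"
    if phi_high and not o_high:
        return "Goodhart pattern"
    if not phi_high and o_high:
        return "metric mismatch"
    return "poor performance and poor coherence"
```

The remaining three items (stress-dependent accuracy, failure at scale, failure once tied to reward) depend on conditions rather than levels, so a two-input function of this kind does not capture them.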

What Changes Its Meaning

Goodhart_risk changes meaning under:

high Φ pressure
high narrative_metric_gap
weak FI_integrity
low Au_eff
high affected_node_cost
high pseudo_damping_risk
high stress_divergence
high recovery_asymmetry
low variance_preserved
high innovation_exit
low truth_tolerance
high immunity_index
low MS_symmetry_index
high automation
high consequence severity

Context Modifiers

High Φ pressure: metric becomes target.

High narrative gap: story may defend proxy success.

Weak FI: feedback cannot falsify metric.

Low Au_eff: proxy lineage cannot be audited.

High affected-node cost: metric may be exporting burden.

High pseudo-damping: metric recovery may be false calm.

High stress divergence: metric works only under baseline conditions.

Low variance preserved: metric may have narrowed adaptation.

High automation: proxy logic can scale rapidly.

Domain Calibration Notes

Goodhart_risk should be calibrated by domain:

in engineering: uptime metrics, incident counts, story points, test coverage, latency targets, deployment frequency
in AI: benchmark scores, safety labels, refusal rates, helpfulness ratings, eval pass rates, memory confidence
in institutions: case closure rates, satisfaction scores, compliance counts, productivity KPIs, audit scores
in governance: enforcement stats, service metrics, public approval, deficit targets, crime numbers, wait-time averages
in relationships: visible harmony, response frequency, apology count, conflict reduction, agreement language
in archives: page counts, canon count, glossary completion, link volume, formatting consistency, reader engagement

11) Operator Sequencing Implications

If Goodhart_risk Is Low

Allowed with ordinary gate checks:

Γ can use metric as one selection input
Π can constrain around proxy with scope notes
Τ can plan using metric trend
ℛ can use metric as repair evidence with validation
U7 can store metric outcomes with provenance
Δ can stress-test metric reliability
public reporting can include metric with limits

Recommended:

Φ signal → O/H/affected-node check → Γ bounded selection → U7 metric memory with scope

If Goodhart_risk Is High

Recommended:

pause proxy-based closure → compare Φ to O/H/affected-node cost → audit incentives → redesign or bound metric

Or:

reduce metric authority → restore direct feedback and reality-contact → retest under stress and recurrence

Avoid or delay:

scaling from metric success
repair-complete claims
public certainty
automation based only on proxy
punitive enforcement of metric
canonizing the target
suppressing metric criticism
selecting from metric-only evidence

Operators Recommended Under High Goodhart Risk

Ξ: detect proxy inversion
Au: audit metric lineage and incentive effects
FI: restore feedback that can falsify the proxy
Μ: reinterpret metric scope
Γ: reselect success criteria
Π: constrain metric authority
ℛ: repair burden caused by proxy optimization
Θ: damp certainty in metric success

Operators Contraindicated Under High Goodhart Risk

Γ hard selection from proxy: selects distorted target
Π irreversible metric constraint: encodes proxy failure
⊗ deep coupling around shared metric: propagates Goodhart dynamics
⊕ composition: embeds proxy into identity/canon
Τ acceleration: scales metric distortion
Σ escalation: sacralizes proxy
✕ force: enforces proxy at cost of O

12) Gate Implications

Gates Strengthened By Reliable Goodhart_risk

Au-Actuation: metric lineage and scope are traceable
FI-Gate: feedback can falsify proxy success
High Risk Gate: blocks high-risk binding from proxy-only evidence
MS-Gate: checks who benefits or carries cost under metric optimization
☷ᵢ: ensures metrics do not override principle constraints

Gates Weakened If Goodhart_risk Is Poorly Known

If Goodhart risk is unknown:

Au may trace metric but not meaning
FI may not challenge proxy success
High Risk Gate may bind classifications from metric-only evidence
MS may miss affected-node burden
☷ᵢ may be reduced to measurable compliance
Π may overconstrain toward the target
Γ may select the option that improves Φ but harms O
ℛ may repair the dashboard instead of the system

Gate Outcomes Affected

High Goodhart_risk should push gates toward:

Pause metric-based closure
Require Φ/O comparison
Require affected-node validation
Require hidden-debt audit
Require incentive audit
Require stress test
Deny proxy-only claims
Deny automated consequence from proxy alone
∅ for high-impact actuation based primarily on a metric that may be detached from O
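
One way to read the gate outcomes above is as a small decision rule: deny proxy-only claims, pause while any required check is outstanding, and otherwise fall back to ordinary gate checks. The sketch below follows that reading. The check names, the return labels, and the fall-through to "ordinary gate checks" once every check is complete are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical labels for the "Require ..." items listed above.
REQUIRED_CHECKS = (
    "phi_o_comparison",
    "affected_node_validation",
    "hidden_debt_audit",
    "incentive_audit",
    "stress_test",
)


def gate_outcome(
    high_goodhart_risk: bool,
    completed_checks: Iterable[str],
    proxy_only: bool,
) -> Tuple[str, List[str]]:
    """Return (outcome, missing_checks) for a metric-based claim or action."""
    if not high_goodhart_risk:
        return "ordinary gate checks", []
    done = set(completed_checks)
    missing = [check for check in REQUIRED_CHECKS if check not in done]
    if proxy_only:
        # Proxy-only claims are denied under high risk.
        return "deny", missing
    if missing:
        # Metric-based closure pauses until the required checks are done.
        return "pause", missing
    return "ordinary gate checks", []
```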

13) Scaling Behavior

Goodhart_risk becomes more dangerous under scale because proxies become standardized, automated, rewarded, and defended.

As systems scale:

metrics gain authority
incentives align around measurable targets
local nuance is compressed
edge cases are excluded
gaming becomes systematic
dashboards replace direct observation
affected-node signal is filtered
proxy success becomes narrative legitimacy
metric definitions harden
automation propagates proxy logic
hidden labor supports metrics
metric criticism becomes costly
Φ becomes identity or canon
metric drift becomes difficult to reverse

Scaling Risks

metric monoculture
proxy inversion
benchmark overfitting
compliance theater
safety theater
repair theater
affected-node cost export
local adaptation loss
innovation exit
hidden debt accumulation
legitimacy shock
automation of proxy error
false success memory
metric immunity
O collapse under Φ success

Scaling Requirements

To scale metrics safely, systems need:

metric lineage
scope notes
O indicators
hidden debt indicators
affected-node validation
stress tests
feedback correction
anti-gaming audits
incentive audits
metric diversity
qualitative review
edge-case inclusion
proxy retirement rules
metric revision pathways
public/private metric comparison
U7 memory of metric limits

Scaling Rule

Proxy authority must scale only with evidence that the proxy continues to track O under stress, recurrence, and affected-node validation.

Sanity constraint:

Φ authority ↑ without O validation ⇒ Goodhart_risk ↑

If a proxy gains authority without direct coherence validation, risk rises.

Second constraint:

Φ↑ + H↑ ⇒ pseudo-success risk ↑

If the metric improves while hidden debt increases, success is likely false.

Third constraint:

shared_metric + high Φ−O ⇒ systemic Goodhart risk ↑

If many nodes share a detached proxy, distortion can propagate system-wide.
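
The three constraints above can be checked together against a snapshot of the proxy's situation. The sketch below assumes each condition has already been reduced to a yes/no judgment. The `ProxyState` fields and the flag strings are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProxyState:
    phi_authority_rising: bool  # proxy is gaining decision authority
    o_validated: bool           # direct coherence validation exists
    phi_rising: bool            # the metric itself is improving
    h_rising: bool              # hidden debt is increasing
    shared_metric: bool         # many nodes share this proxy
    phi_o_gap_high: bool        # Φ−O divergence is already large


def scaling_flags(state: ProxyState) -> List[str]:
    """Return the scaling constraints that the given state triggers."""
    flags = []
    if state.phi_authority_rising and not state.o_validated:
        flags.append("Goodhart_risk ↑")
    if state.phi_rising and state.h_rising:
        flags.append("pseudo-success risk ↑")
    if state.shared_metric and state.phi_o_gap_high:
        flags.append("systemic Goodhart risk ↑")
    return flags
```

An empty list means none of the three constraints fired. It does not establish that the proxy is safe to scale, which still requires the evidence named in the Scaling Rule.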

14) Interaction / Coupling Behavior

Goodhart_risk reveals whether a coupling is organized around real coherence or shared proxy performance.

What It Reveals About Coupling

whether nodes coordinate around metric rather than reality
whether one node’s burden funds another’s score
whether shared metrics hide local harm
whether feedback can challenge proxy alignment
whether compatibility is measured or experienced
whether repair is done or merely counted
whether affected-node cost is excluded
whether shared targets propagate distortion

What It Reveals About Boundary Integrity

Metrics can cross boundaries faster than meaning.

When Goodhart_risk is high:

local boundaries may be overrun by metric goals
refusal may be treated as noncompliance
affected-node cost may be ignored
BΣ may degrade under target pressure
boundary repair may be counted without landing
metric authority may override consent or fit

What It Reveals About Compatibility

Compatibility requires proxy humility.

A coupling may be unsafe if:

the shared metric improves only because one node absorbs hidden cost

or:

the relation looks compatible on the dashboard but not in lived operation

Healthy compatibility uses metrics as signals, not sovereign truth.

Relevant Interface Acts

↺ Reflection: compare metric story to lived effect
⇩ Relaxation: reduce target pressure
⊘ Attenuation: reduce coupling around a distorted metric
⊙ Alignment: clarify what the metric is and is not
→? Invitation: invite affected-node validation
⚕︎ Restorative Override: requires post-action Φ/O audit
✕ Force: high risk when used to enforce proxy compliance

15) Failure Modes Detected

Primary Failure Modes

Goodhart_risk detects or predicts:

proxy inversion
metric capture
dashboard blindness
benchmark overfitting
compliance theater
repair theater
safety theater
legitimacy theater
affected-node cost export
hidden labor growth
local adaptation suppression
innovation exit
metric immunity
false success memory
proxy-based classification error
O collapse under Φ success
proxy-driven hidden debt

Composite Regimes Where Goodhart_risk Matters

Goodhart Collapse: direct regime
Pseudo-Coherent Basin: metric success stabilizes hidden debt
Repair Theater: repair metric replaces repair
Mission Lock: metric preserves trajectory
Taboo Lock: metric cannot be questioned
Extraction Regime: metric success hides exported cost
Coercive Fusion: one node is forced to serve another’s score
Crisis Loop: metric recovery hides recurring origin failure
LOS: latent operations maintain formal metric success

16) Accountability & Reintegration Implications

If Goodhart_risk Was Ignored

Likely consequences:

metrics improved while coherence degraded
affected nodes carried hidden cost
repair was counted but not completed
hidden labor increased
local adaptation was suppressed
innovation exited
official memory stored false success
legitimacy shock followed exposure
selected path optimized Φ over O
system had to rebuild trust in measurement

Accountability questions:

What was the proxy supposed to measure?
When did it become the target?
Did O improve with Φ?
Did H rise under metric success?
Who carried the cost of improving the metric?
Was affected-node feedback included?
Did stress tests validate the metric?
Was the metric gamed?
Did repair land, or did only the score improve?
Did the metric become immune to challenge?
Was U7 memory corrected after metric failure?

If Goodhart_risk Was Misread

Possible misread forms:

useful metric treated as corrupt
legitimate target discipline mistaken for proxy capture
early improvement dismissed before validation
qualitative discomfort treated as superior to data by default
metric revision mistaken for manipulation
bounded proxy use mistaken for totalizing proxy use
failing metric mistaken for failing reality
high O / low Φ state misread because metric is outdated
metric criticism used to avoid accountability

Required Restoration

When Goodhart failure is found:

identify intended referent
→ compare Φ to O/H/affected-node cost
→ audit incentives and gaming
→ reduce proxy authority
→ redesign metric set
→ repair hidden burden
→ correct U7 success memory
→ validate under stress and recurrence

If proxy optimization burdened some nodes more than others, MS-Gate should review who gained score, who carried cost, and who received repair.

17) Cross-Domain Examples

Technical / Engineering

A team optimizes deployment frequency. Releases increase, but incidents, rework, and user disruption also increase.

Diagnostic implication: deployment count became a proxy target detached from real delivery coherence.

Operator sequence: Φ/O audit → quality and incident metrics → affected-user validation → release process repair.

Institutional / Governance

A department optimizes case closure rate. Cases close faster, but unresolved harm and repeat complaints rise.

Diagnostic implication: closure metric replaced real remedy.

Operator sequence: recurrence audit → affected-node validation → closure criteria redesign → repair backlog review.

AI / Algorithmic

A model improves benchmark scores but performs worse on messy real user contexts.

Diagnostic implication: benchmark proxy became overfit.

Operator sequence: stress eval expansion → user-context validation → metric set redesign → U7 eval memory correction.

Interaction / Relational

A relationship uses “we are not fighting anymore” as the success metric, but truth and boundary repair are suppressed.

Diagnostic implication: low conflict became a proxy for repair.

Operator sequence: pseudo-damping review → truth tolerance repair → boundary validation → recurrence check.

Archive / Framework Design

The archive tracks number of completed spec sheets, but glossary consistency and cross-link quality lag.

Diagnostic implication: completion count is outpacing real archive coherence.

Operator sequence: Φ/O archive audit → glossary/cross-link repair → status criteria revision → U7 version update.

18) Test Protocols

1. Referent Test

What reality is the metric supposed to represent?

Failure signal: no one can name the real referent.

2. Φ/O Test

Does proxy improvement correspond to coherence improvement?

Failure signal: Φ rises while O stagnates or falls.

3. Hidden Debt Test

Does H increase under metric success?

Failure signal: success creates deferred cost.

4. Affected-Node Test

Do affected nodes experience the metric improvement as real improvement?

Failure signal: dashboard improves while affected nodes worsen.

5. Incentive Test

What behavior does the metric reward?

Failure signal: rewarded behavior differs from coherent behavior.

6. Gaming Test

Can the metric be improved without improving reality?

Failure signal: easy gaming path exists.

7. Stress Test

Does the metric hold under stress or only benchmark conditions?

Failure signal: proxy fails under real load or edge cases.

8. Recurrence Test

Does recurrence decline when the metric improves?

Failure signal: metric improves but same issue returns.

9. Scope Test

Is the metric being used beyond its valid range?

Failure signal: local proxy becomes global truth.

10. Correction Test

Can feedback challenge the metric?

Failure signal: metric becomes immune to contradiction.
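
The ten tests above lend themselves to a simple checklist runner that separates failed tests from tests not yet run. The sketch below is one possible shape, not a prescribed procedure. The short protocol keys and the convention of `True` for passed, `False` for failed, and `None` or absent for untested are assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Keys are assumed short names; failure signals are taken from the tests above.
PROTOCOLS: Dict[str, str] = {
    "referent": "no one can name the real referent",
    "phi_o": "Φ rises while O stagnates or falls",
    "hidden_debt": "success creates deferred cost",
    "affected_node": "dashboard improves while affected nodes worsen",
    "incentive": "rewarded behavior differs from coherent behavior",
    "gaming": "easy gaming path exists",
    "stress": "proxy fails under real load or edge cases",
    "recurrence": "metric improves but same issue returns",
    "scope": "local proxy becomes global truth",
    "correction": "metric becomes immune to contradiction",
}


def review(results: Dict[str, Optional[bool]]) -> Tuple[List[str], List[str]]:
    """Split protocol results into (failed with signal, untested)."""
    failed: List[str] = []
    untested: List[str] = []
    for name, signal in PROTOCOLS.items():
        outcome = results.get(name)
        if outcome is None:
            untested.append(name)
        elif outcome is False:
            failed.append(f"{name}: {signal}")
    return failed, untested
```

Keeping untested protocols separate from failed ones mirrors the missing-input posture: an unrun test leaves proxy success provisional rather than confirmed or refuted.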

19) Anti-Patterns

Metric as reality
Score as coherence
Compliance as repair
Closure rate as restoration
Benchmark as safety
Low complaints as satisfaction
Low conflict as trust
Output as value
Speed as quality
Precision as truth
Dashboard as affected-node state
Metric improvement as legitimacy
Proxy criticism as anti-accountability
Gaming as efficiency
Hidden labor as productivity
Local adaptation as metric violation
Edge case as nuisance
Metric immunity
Public score as memory
Φ success as O success

20) Spec Validation Check

Is this truly a diagnostic, not an operator? Yes.
Does it measure state, capacity, risk, or response rather than act directly? Yes.
Does it map to S? Yes.
Are U-layers specified? Yes.
Are leading and lagging indicators separated? Yes.
Are interpretation risks defined? Yes.
Are operator sequencing implications clear? Yes.
Are gate implications clear? Yes.
Are scaling risks included? Yes.
Are interaction implications included? Yes.
Does it avoid new primitives? Yes.

Condensed Archive Summary

Goodhart_risk is the diagnostic estimate of whether a proxy, metric, target, score, benchmark, classification, dashboard, or optimization objective is becoming detached from the real coherence condition it was meant to represent. It does not reject metrics; it checks whether Φ still tracks O. High Goodhart_risk indicates risk of proxy inversion, metric capture, benchmark overfitting, dashboard blindness, compliance theater, repair theater, affected-node cost export, hidden labor growth, local adaptation suppression, innovation exit, metric immunity, false success memory, and O collapse under Φ success. Under high Goodhart risk, the system should pause proxy-based closure, compare Φ to O/H/affected-node state, audit incentives and gaming, reduce proxy authority, restore FI/Au, redesign metric sets, repair hidden burden, correct U7 success memory, and validate under stress and recurrence before scaling, automation, public certainty, or high-impact action.
