schema_version: "1.0"
id: "FM-BIO-023"
title: "FM-BIO-023 — Reward Engineering Gain Amplification"
slug: "fm-bio-023-reward-engineering-gain-amplification"
type: "failure_mode"
status: "draft"
version: "0.1.0"
last_updated: "2026-06-18"
summary: "Reward engineering gain amplification occurs when biological reward, reinforcement, motivation, craving, relief, pleasure, avoidance, or feedback pathways are artificially or structurally amplified beyond coherence, repair, boundary, timing, or restoration capacity."
canonical_url: "/archive/failure-modes/registry/biology/fm-bio-023-reward-engineering-gain-amplification"
citation_id: "FM-BIO-023-v0-1-0"
canon:
tier: "registry"
state: "draft"
source: "UTS — Failure Modes Registry"
source_id: "FM-BIO-023"
classification:
family: "failure-modes"
module: "biology"
module_group: "biology-medicine"
density: "advanced-reference"
audience:
- "UTS readers"
- "biology systems modelers"
- "medicine systems modelers"
- "restoration researchers"
- "health systems designers"
- "coherence researchers"
- "machine readers"
tags:
- "failure-modes"
- "biology"
- "biology-medicine"
- "reward-engineering-gain-amplification"
- "fm-bio-023-reward-engineering-gain-amplification"
- "reward"
- "gain"
- "reinforcement"
- "feedback"
- "restoration"
aliases:
- "Reward Engineering Gain Amplification"
- "Reward Gain Amplification"
- "Biological Reward Engineering"
- "Reinforcement Gain Failure"
- "Craving Gain Amplification"
- "Relief Loop Amplification"
- "Pleasure Signal Overweighting"
- "Avoidance Reward Lock"
- "Engineered Reward Basin"
- "Former FM-BIOX-022"
related:
laws:
- "Goodhart Collapse"
- "Success Proxy Substitution"
- "Hidden Debt Accumulation"
- "Compression Collapse"
- "Temporal Audit Asymmetry"
- "Restoration Starvation"
- "Boundary Collapse"
invariants:
- "Reward Must Remain Coherence-Governed"
- "Relief Is Not Restoration"
- "Reinforcement Must Not Outrun Repair"
- "Gain Must Be Damped by Long-Term Coherence"
- "Pleasure Signal Is Not Full-System Truth"
- "Reward Loops Require Exit and Integration"
operators:
- "Γ — Selection"
- "K — Constraint / Load"
- "O — Coherence"
- "H — Hidden Debt"
- "R — Restoration Capacity"
- "Φ — Flow / Phase"
- "Τ — Trajectory / Time"
- "BΣ — Boundary Integrity"
- "Ψ — Observation / Interface"
- "Au — Auditability"
- "ℛ — Restoration"
gates:
- "Gain Gate"
- "Restoration Gate"
- "Damping Gate"
- "Classifier Gate"
- "Boundary Gate"
- "Timing Gate"
- "Auditability Gate"
diagnostics:
- "Reward Gain"
- "Reinforcement Loop Strength"
- "Relief / Restoration Gap"
- "Damping Capacity"
- "Repair Capacity"
- "Boundary Integrity"
- "Hidden Burden"
- "Recurrence Pattern"
- "Coherence Level"
- "Time Validation"
failure_modes:
- "FM-CORE-002 — Hidden Debt Accumulation"
- "FM-CORE-003 — Success Proxy Substitution"
- "FM-CORE-004 — Auditability Collapse"
- "FM-BIO-001 — Chronic Low-Coherence Basin"
- "FM-BIO-002 — Wrong-Solution Basin"
- "FM-BIO-003 — False Recovery"
- "FM-BIO-004 — Energy-First Compression"
- "FM-BIO-008 — Signal Flood"
- "FM-BIO-009 — Threshold Stack Overload"
- "FM-BIO-011 — Biological Inversion / Pseudo-Health"
- "FM-BIO-017 — Chronic Urgency Tone"
- "FM-BIO-022 — Timing Failure"
- "FM-BIO-024 — Burden Opacity"
- "FM-BIO-026 — Distortion Normalization"
restoration_arcs:
- "Reward Loop Recalibration"
- "Gain Reduction"
- "Signal Damping Restoration"
- "Relief / Restoration Separation"
- "Repair Capacity Rebuild"
- "Boundary Repair"
- "Staged Slack Restoration"
- "Origin-Layer Repair"
- "Time-Validated Restoration"
modules:
- "Biology / Medicine"
- "Coherence"
- "Restoration"
- "Cybernetics"
- "Scaling"
- "Diagnostics"
- "Meta Theory"
navigation:
order: 623
parent: "failure-modes"
visible: true
provenance:
created_from: "failure-mode-registry-production"
source_thread: "UTS Failure Modes Registry production"
previous_id: "FM-BIOX-022"
renumbered_as: "FM-BIO-023"
source_file: "content/archive/failure-modes/registry/biology/fm-bio-023-reward-engineering-gain-amplification.md"
notes: "Former BIOX series entry migrated into unified FM-BIO numbering. Non-clinical and mapping-first."
entry:
failure_mode_id: "FM-BIO-023"
failure_family: "Biology / Medicine"
production_treatment: "Standalone Entry"
first_gate_failure: "Gain Gate"
primary_hidden_debt: "Hidden debt accumulates when reward, relief, pleasure, craving, avoidance, or reinforcement loops are amplified beyond the system's capacity to dampen, integrate, repair, clear, and maintain whole-system coherence."
primary_inversion: "Reward intensity, relief, or reinforcement success is mistaken for restoration, even when the amplified loop is increasing burden, dependency, threshold sensitivity, or coherence loss."
primary_boundary_pattern: "The boundary between coherent reward and engineered reinforcement collapses; reward signals cross into command authority and begin selecting behavior against long-term restoration."
primary_signature: "Reward gain rises; damping weakens; relief is overvalued; recurrence increases; repair and clearance lag; hidden burden accumulates; coherence becomes reward-dependent or brittle."
FM-BIO-023 — Reward Engineering Gain Amplification
Status: Draft
Archive Type: Failure Mode
System: Universal Theory Stack
Parent: Failure Modes
Canon Tier: Registry
Registry: Failure Modes Registry
Entry ID: FM-BIO-023
Former ID: FM-BIOX-022
Family: Biology / Medicine
0. Non-Clinical Scope Note
This entry is non-clinical and mapping-first.
It does not diagnose, treat, or prescribe for medical conditions. It names a UTS system pattern that may be used for conceptual modeling of biological, physiological, reward-system, reinforcement, behavior-loop, signal-processing, or restoration dynamics.
1. Definition
Reward engineering gain amplification occurs when biological reward, reinforcement, motivation, craving, relief, pleasure, avoidance, urgency relief, novelty, or feedback pathways are artificially, environmentally, structurally, or recurrently amplified beyond coherence, repair, boundary, timing, damping, or restoration capacity.
The system receives a strong signal that says:
repeat this
approach this
avoid that
relieve this
prioritize this now
But the strength of the signal may no longer correspond to whole-system restoration.
The core failure is:
reward gain↑
coherence governance↓
reinforcement loop strengthens
hidden burden↑
Reward engineering gain amplification is a biological expression of gain failure, Goodhart collapse, and success proxy substitution.
It appears when reward intensity, relief, craving, repetition, or short-term reinforcement becomes more authoritative than long-term coherence.
In UTS terms, the reward signal becomes over-weighted.
The system starts optimizing for the signal that feels like success instead of the state that restores the system.
2. Core Pattern
The core pattern is:
- A living system encounters a reward, relief, novelty, pleasure, avoidance, or reinforcement signal.
- The signal produces a change in attention, motivation, repetition, or prioritization.
- External conditions, internal loops, environment design, compensation, craving, urgency, or relief amplify the signal.
- The amplified reward signal becomes more salient than its actual restoration value.
- The system repeats the behavior or state that produces the signal.
- Damping, boundary integrity, timing, and long-term coherence checks weaken.
- Repair capacity, clearance, adaptation, or origin-layer restoration may be bypassed.
- Relief or reward is mistaken for resolution.
- Hidden burden accumulates beneath the reinforced loop.
- The loop becomes more stable because the signal selects itself.
This failure mode often appears when the system learns:
this feels better
and then compresses that into:
this is better
The failure is not that reward exists.
The failure is that reward gain outruns coherence governance.
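As a rough illustration of gain outrunning governance, the sketch below treats reward gain as a single positive number that is multiplied each cycle by `1 + reinforcement - damping`. The update rule, the name `simulate_gain`, and every numeric value are assumptions made for illustration; the registry does not define a quantitative model.

```python
def simulate_gain(initial_gain, reinforcement, damping, cycles):
    """Toy model: each cycle scales gain by (1 + reinforcement - damping).

    All parameters are hypothetical, unitless illustration values.
    """
    gain = initial_gain
    history = [gain]
    for _ in range(cycles):
        gain = gain * (1.0 + reinforcement - damping)
        history.append(gain)
    return history


# Damping keeps pace with reinforcement: gain stays roughly flat.
governed = simulate_gain(1.0, reinforcement=0.2, damping=0.2, cycles=10)

# Damping has weakened: the same reinforcement now compounds each cycle.
amplified = simulate_gain(1.0, reinforcement=0.2, damping=0.05, cycles=10)

print(round(governed[-1], 3), round(amplified[-1], 3))
```

In this toy model the reward itself is identical in both runs; only the damping term differs, which mirrors the claim that the failure lies in governance rather than in the existence of reward.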
3. Failure Signature
Typical signature:
reward gain↑
reinforcement loop↑
damping capacity↓
relief / restoration gap↑
repair bypassed
recurrence↑
H↑
O unstable
Extended signature:
pleasure or relief receives excessive priority
avoidance becomes rewarding
short-term signal overrides long-term coherence
novelty gain distorts selection
repetition strengthens the loop
reward signal narrows attention
repair capacity is spent maintaining reward access
boundary and timing checks weaken
Common forms:
relief is mistaken for repair
pleasure signal is mistaken for coherence
avoidance becomes self-reinforcing
short-term improvement produces long-term burden
the system repeats a loop because the signal is strong
damping cannot reduce reward salience
a behavior becomes selected because it quiets urgency
reward access becomes more important than restoration access
The key diagnostic is whether reward signal strength remains aligned with whole-system coherence.
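One way to make the directional part of the typical signature checkable is to compare observed trends against the expected arrows. The sketch below assumes each marker is tracked as a non-empty numeric series. The marker keys, the `trend` helper, and the first-versus-last comparison are hypothetical choices, and the non-directional items (repair bypassed, O unstable) are left out.

```python
# Expected direction for each directional marker in the typical signature.
# +1 means rising, -1 means falling. Keys are hypothetical labels.
SIGNATURE = {
    "reward_gain": +1,
    "reinforcement_loop": +1,
    "damping_capacity": -1,
    "relief_restoration_gap": +1,
    "recurrence": +1,
    "hidden_debt": +1,
}


def trend(series):
    """Return +1, -1, or 0 by comparing the last value with the first."""
    delta = series[-1] - series[0]
    return (delta > 0) - (delta < 0)


def matched_markers(observations):
    """List signature markers whose observed trend matches the expected one."""
    return [
        name
        for name, expected in SIGNATURE.items()
        if name in observations and trend(observations[name]) == expected
    ]


observations = {
    "reward_gain": [1.0, 1.4, 2.1],
    "damping_capacity": [0.8, 0.6, 0.3],
    "recurrence": [2, 2, 2],
}
print(matched_markers(observations))  # ['reward_gain', 'damping_capacity']
```

A partial match like this one is only a prompt for closer mapping; the entry's false-positive rules still apply.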
4. Primary U-Layer Origin
Common origin layers:
- U1 — Power / Budgets: Energy, attention, time, or resources are diverted toward reward access or relief maintenance.
- U2 — Configuration / Boundaries: Boundaries weaken around high-gain reward signals or avoidance loops.
- U3 — Execution: Repeated behavior or biological response becomes reinforced even when restoration does not occur.
- U4 — Information / Truth: Reward, relief, or pleasure is misclassified as coherence or recovery.
- U5 — Coordination / Time: Short-term reward overrides delayed cost and long-term restoration timing.
- U6 — Coherence Field: Whole-system coherence becomes organized around reinforced signal loops.
- U7 — Memory / Recurrence: Rewarded patterns become recurrent basins.
Common manifestation layers:
- U3 — Execution: The system repeats the reinforced action or state.
- U4 — Information / Truth: Reward signal becomes a false success indicator.
- U5 — Coordination / Time: Short-term relief dominates long-term repair.
- U6 — Coherence Field: The system becomes reward-dependent or reward-shaped.
Reward engineering gain amplification is primarily a gain-governance failure.
The reward signal becomes too strong relative to the coherence checks that should govern it.
5. Typical Development Sequence
A common development sequence is:
- A biological system encounters burden, discomfort, deficit, uncertainty, low coherence, or unresolved demand.
- A reward, relief, pleasure, novelty, avoidance, or urgency-reduction signal appears.
- The signal produces short-term improvement, relief, repetition, or prioritization.
- The system begins selecting for the signal.
- Gain increases through repetition, environment design, availability, craving, urgency, or reinforcement.
- The system’s classifier treats the reward or relief signal as evidence of success.
- Damping weakens because the loop feels useful or necessary.
- Repair, clearance, boundary repair, or origin-layer restoration is delayed.
- Hidden burden accumulates beneath the reinforced loop.
- The system becomes more dependent on the reward signal to maintain apparent coherence.
- Recurrence increases when reward access drops or when delayed burden appears.
- Restoration requires lowering gain, separating relief from repair, and rebuilding coherence-governed selection.
This sequence often produces the loop:
burden → reward / relief → repetition → gain amplification → repair bypass → more burden
Another common loop is:
urgency → relief signal → reward selection → urgency returns → stronger relief seeking
The system becomes organized around repeatedly reducing signal pressure instead of repairing the origin-layer source.
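The two loops above can be sketched as a toy simulation in which relief lowers only the pressure the system notices, while repair is the only term that lowers the burden itself. The function `relief_loop`, its parameters, and the linear arithmetic are illustrative assumptions, not a model taken from the registry.

```python
def relief_loop(cycles, load=1.0, relief_fraction=0.9, repair=0.0):
    """Return (burden, relief, felt_pressure) per cycle for a toy relief loop.

    All quantities are hypothetical and unitless.
    """
    burden = 0.0
    rows = []
    for _ in range(cycles):
        burden += load                      # new unresolved demand arrives
        relief = relief_fraction * burden   # relief sought scales with burden
        felt = burden - relief              # pressure the system notices
        burden = max(0.0, burden - repair)  # only repair lowers the burden
        rows.append((burden, relief, felt))
    return rows


unrepaired = relief_loop(5)            # relief without repair
repaired = relief_loop(5, repair=1.0)  # repair keeps pace with load

print(unrepaired[-1][0], repaired[-1][0])
```

In the unrepaired run the felt pressure stays small while both the burden and the amount of relief required keep growing, which is the "stronger relief seeking" step of the loop.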
6. Diagnostic Markers
Diagnostic markers include:
- Relief is repeatedly mistaken for restoration.
- Reward or pleasure signal receives disproportionate priority.
- Avoidance becomes self-reinforcing because it reduces discomfort or urgency.
- The same loop repeats despite long-term coherence cost.
- Damping becomes difficult once reward access is available.
- The system’s attention narrows around the reward or relief pathway.
- Short-term improvement is followed by recurrence or hidden burden.
- Repair capacity is consumed by maintaining access to the reward loop.
- Boundary checks weaken under high-gain reward conditions.
- Timing checks fail because immediate relief overrides delayed cost.
- The system becomes less tolerant of non-reward restoration pathways.
- Reward intensity increases while baseline coherence does not.
- Time validation shows the reward loop does not reduce recurrence.
Useful diagnostics:
- Reward Gain: Measures signal strength relative to whole-system restoration value.
- Reinforcement Loop Strength: Tracks repetition, dependency, and self-selection.
- Relief / Restoration Gap: Separates short-term relief from durable repair.
- Damping Capacity: Tests whether reward salience can decrease without rebound.
- Repair Capacity: Measures whether restoration improves or is bypassed.
- Boundary Integrity: Checks whether reward signal bypasses limits or consent of the whole system.
- Hidden Burden: Tracks unresolved load beneath reward-driven behavior.
- Recurrence Pattern: Tests whether the same loop returns.
- Coherence Level: Evaluates whether reward improves whole-system organization.
- Time Validation: Confirms whether benefits persist beyond immediate reinforcement.
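Three of these diagnostics lend themselves to a small numeric sketch. The functions below assume that signal strength, restoration value, relief, and repair have already been scored on a shared scale, which the registry does not specify. The function names and the split-window comparison used for time validation are hypothetical.

```python
from statistics import mean


def reward_gain(signal_strength, restoration_value):
    """Signal strength relative to restoration value; above 1 suggests over-weighting."""
    if restoration_value <= 0:
        return float("inf") if signal_strength > 0 else 0.0
    return signal_strength / restoration_value


def relief_restoration_gap(short_term_relief, durable_repair):
    """Positive values mean relief is running ahead of durable repair."""
    return short_term_relief - durable_repair


def time_validated(recurrence_counts):
    """True when mean recurrence in the later half is below the earlier half."""
    if len(recurrence_counts) < 2:
        raise ValueError("need at least two observations")
    half = len(recurrence_counts) // 2
    return mean(recurrence_counts[half:]) < mean(recurrence_counts[:half])


print(reward_gain(3.0, 1.5))           # 2.0
print(time_validated([4, 4, 2, 1]))    # True
```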
7. Related Gates
Relevant gates include:
- Gain Gate: Fails when reward signal strength exceeds coherence governance and damping.
- Restoration Gate: Fails when reward or relief substitutes for origin-layer repair.
- Damping Gate: Fails when high-gain reward cannot decay or be deprioritized.
- Classifier Gate: Fails when reward is misclassified as coherence, repair, or true need.
- Boundary Gate: Fails when reward pathways bypass system limits or coherent containment.
- Timing Gate: Fails when short-term reinforcement overrides delayed restoration.
- Auditability Gate: Fails when the system cannot distinguish relief, pleasure, avoidance, and restoration.
The first common gate failure is usually the Gain Gate.
The reward signal becomes stronger than the system’s ability to govern it.
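The gate descriptions can be read as a set of checks over a system state. The sketch below is one possible encoding, assuming a plain dictionary of hypothetical state fields; none of the keys or comparisons are defined by the registry. It reports which gates currently fail, in the order listed in this section, and says nothing about which gate failed first in time.

```python
# Each entry pairs a gate name with a check that returns True when the gate holds.
# State keys are hypothetical placeholders chosen for illustration.
GATES = [
    ("Gain Gate", lambda s: s["reward_gain"] <= s["damping_capacity"]),
    ("Restoration Gate", lambda s: s["origin_repair_active"]),
    ("Damping Gate", lambda s: s["salience_decays"]),
    ("Classifier Gate", lambda s: not s["relief_labelled_as_repair"]),
    ("Boundary Gate", lambda s: s["within_limits"]),
    ("Timing Gate", lambda s: not s["immediate_overrides_delayed"]),
    ("Auditability Gate", lambda s: s["signals_distinguishable"]),
]


def failed_gates(state):
    """Return the names of gates whose check does not hold for this state."""
    return [name for name, holds in GATES if not holds(state)]


state = {
    "reward_gain": 2.0,
    "damping_capacity": 1.0,
    "origin_repair_active": True,
    "salience_decays": True,
    "relief_labelled_as_repair": False,
    "within_limits": True,
    "immediate_overrides_delayed": False,
    "signals_distinguishable": True,
}
print(failed_gates(state))  # ['Gain Gate']
```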
8. Related Operators
Relevant operators include:
- Γ — Selection: Selects rewarded patterns and reinforces repetition.
- K — Constraint / Load: Rises as reward loops create dependency, burden, or narrowed options.
- O — Coherence: Declines when reward signal overrides whole-system viability.
- H — Hidden Debt: Accumulates when relief replaces repair.
- R — Restoration Capacity: Is bypassed or consumed by loop maintenance.
- Φ — Flow / Phase: Governs reward timing, recurrence, and relief cycles.
- Τ — Trajectory / Time: Reveals delayed cost and recurrence.
- BΣ — Boundary Integrity: Governs whether reward remains contained within coherent limits.
- Ψ — Observation / Interface: Determines which reward signals become salient or visible.
- Au — Auditability: Declines when reward intensity becomes proof of value.
- ℛ — Restoration: Requires coherence-governed selection and repair.
Reward engineering gain amplification often follows this operator pattern:
burden or desire signal appears
reward / relief pathway activates
Γ selects reinforced loop
gain↑
damping↓
R bypassed
H accumulates
Τ reveals recurrence
O unstable
9. Related Laws and Invariants
Related Laws
- Goodhart Collapse: The reward signal becomes the target and diverges from whole-system value.
- Success Proxy Substitution: Relief, pleasure, or reinforcement replaces restoration as proof of success.
- Hidden Debt Accumulation: Burden persists when reward loops bypass repair.
- Compression Collapse: High-gain reinforcement compresses attention, timing, and choice space.
- Temporal Audit Asymmetry: Short-term reward hides delayed cost.
- Restoration Starvation: Repair is starved when reward access consumes capacity.
- Boundary Collapse: Reward pathways bypass limits, containment, or coherent filtering.
Related Invariants
- Reward Must Remain Coherence-Governed: Reward is useful only when subordinate to whole-system viability.
- Relief Is Not Restoration: Reduced pressure does not prove burden resolution.
- Reinforcement Must Not Outrun Repair: Repetition without restoration increases debt.
- Gain Must Be Damped by Long-Term Coherence: High reward intensity requires stronger timing and damping checks.
- Pleasure Signal Is Not Full-System Truth: Positive signal is partial evidence, not sovereign proof.
- Reward Loops Require Exit and Integration: Repetition must resolve, not merely continue.
10. Common False Positives
Not every reward, pleasure, relief, or repeated behavior is reward engineering gain amplification.
Common false positives include:
- Healthy reward that supports repair, learning, adaptation, or coherence.
- Temporary relief that accurately reflects reduced burden.
- Pleasure or motivation that remains boundary-governed.
- Repetition that strengthens coherent function.
- Reward pathways that decay appropriately after use.
- Avoidance of a genuinely incoherent input.
- High motivation that remains aligned with restoration.
- A reward loop that improves coherence under time validation.
Clarifying rule:
This is not reward engineering gain amplification unless reward, relief, pleasure, craving, avoidance, novelty, or reinforcement signal strength exceeds the system’s damping, boundary, timing, repair, or coherence-governance capacity.
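The clarifying rule is an "exceeds any capacity" test, which can be sketched as a simple comparison. The example assumes signal strength and each capacity are scored on the same hypothetical scale; the function name and the capacity keys are illustrative.

```python
def exceeded_capacities(signal_strength, capacities):
    """Return the names of governance capacities that the signal strength exceeds."""
    return [name for name, cap in capacities.items() if signal_strength > cap]


capacities = {
    "damping": 1.0,
    "boundary": 2.0,
    "timing": 0.5,
    "repair": 3.0,
    "coherence_governance": 2.5,
}

# An empty list means the pattern does not qualify under the clarifying rule.
print(exceeded_capacities(1.5, capacities))  # ['damping', 'timing']
print(exceeded_capacities(0.4, capacities))  # []
```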
11. Common False Repairs
Common false repairs include:
- suppressing all reward instead of recalibrating gain
- treating relief as full restoration
- intensifying reward to overcome low coherence
- replacing one high-gain loop with another
- moralizing the reward signal instead of mapping the reinforcement geometry
- ignoring delayed burden because short-term relief is strong
- treating craving or urgency as proof of need
- removing reward access without restoring repair capacity
- forcing abstention without restoring underlying coherence architecture
- increasing output while reward loops consume repair capacity
- optimizing a reward marker while whole-system coherence declines
- ignoring boundary and timing checks around reinforced behavior
False repair often produces the loop:
reward loop → suppression → hidden burden remains → rebound urgency → stronger reward seeking
Another common loop is:
low coherence → relief signal → relief mistaken for repair → recurrence → higher gain relief seeking
The system either obeys the reward loop or attacks it, while the deeper reinforcement geometry remains unrepaired.
12. Restoration Direction
Restoration requires reducing reward gain, separating relief from restoration, rebuilding damping, restoring boundary and timing checks, and reorienting selection toward whole-system coherence.
Primary restoration direction:
reduce reward gain,
separate relief from restoration,
restore damping and boundaries,
and reorient reinforcement toward coherence
A fuller restoration path includes:
- Map the reward loop. Identify the reward, relief, novelty, avoidance, craving, or reinforcement pathway.
- Measure gain. Determine how strongly the signal selects attention, action, repetition, or priority.
- Separate relief from repair. Distinguish immediate signal reduction from origin-layer restoration.
- Identify hidden burden. Map what the reward loop masks, delays, or bypasses.
- Restore damping. Rebuild the ability for reward salience to decline without rebound.
- Repair boundaries. Reestablish limits around reward access, repetition, and reinforcement.
- Restore timing. Prevent immediate reward from overriding long-term restoration windows.
- Rebuild repair capacity. Strengthen non-reward pathways of restoration.
- Reorient selection. Reward coherence, integration, clearance, and durable repair rather than loop repetition.
- Validate across time. Confirm reduced recurrence, lower hidden burden, and stable coherence without high-gain dependency.
A valid restoration path should reduce:
reward over-weighting
relief / restoration gap
reinforcement dependency
damping failure
boundary bypass
timing distortion
hidden burden
recurrence
coherence brittleness
reward-driven audit opacity
Reward engineering gain amplification is not repaired by removing all reward.
It is repaired when reward becomes a coherent signal again instead of a command system.
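The "should reduce" list above can double as a before-and-after check on a candidate restoration path. The sketch below assumes each item has been scored numerically at two points in time. The scoring itself, the function name `not_reduced`, and the sample values are hypothetical.

```python
# Items a valid restoration path should reduce, as listed in this section.
SHOULD_REDUCE = [
    "reward over-weighting",
    "relief / restoration gap",
    "reinforcement dependency",
    "damping failure",
    "boundary bypass",
    "timing distortion",
    "hidden burden",
    "recurrence",
    "coherence brittleness",
    "reward-driven audit opacity",
]


def not_reduced(before, after):
    """Return listed items whose score did not drop between the two snapshots."""
    return [item for item in SHOULD_REDUCE if after[item] >= before[item]]


before = {item: 5 for item in SHOULD_REDUCE}
after = {item: 3 for item in SHOULD_REDUCE}
after["hidden burden"] = 5  # relief improved, but the burden did not move

print(not_reduced(before, after))  # ['hidden burden']
```

A result like this one would correspond to the false repair described in section 11, where reward access falls but the underlying burden stays in place.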
13. Cross-Module Links
- Biology / Medicine: Standalone expression of reward, reinforcement, relief, and biological gain distortion.
- Coherence: Shows how reward can diverge from whole-system coherence.
- Restoration: Requires relief / restoration separation, damping, boundary repair, and time validation.
- Cybernetics: Appears as gain amplification, reward hacking, Goodhart collapse, and feedback capture.
- Scaling: Reward amplification becomes more destabilizing as availability, intensity, speed, and repetition increase.
- Diagnostics: Requires distinguishing reward intensity from durable restoration.
- Meta Theory: Demonstrates that positive feedback must remain subordinate to coherence.
14. Relationship to Parent / Child Modes
Production treatment: Standalone Entry
This mode maps upward to:
- FM-CORE-002 — Hidden Debt Accumulation
- FM-CORE-003 — Success Proxy Substitution
- FM-CORE-004 — Auditability Collapse
- FM-BIO-002 — Wrong-Solution Basin
- FM-BIO-003 — False Recovery
- FM-BIO-004 — Energy-First Compression
- FM-BIO-017 — Chronic Urgency Tone
Sibling or related Biology / Medicine modes include:
- FM-BIO-001 — Chronic Low-Coherence Basin
- FM-BIO-008 — Signal Flood
- FM-BIO-009 — Threshold Stack Overload
- FM-BIO-011 — Biological Inversion / Pseudo-Health
- FM-BIO-012 — Phase Error
- FM-BIO-018 — Artifact Signal Inversion
- FM-BIO-021 — Biological Clearance Failure
- FM-BIO-022 — Timing Failure
- FM-BIO-024 — Burden Opacity
- FM-BIO-026 — Distortion Normalization
Aliases preserved from source material:
- Reward Engineering Gain Amplification
- Reward Gain Amplification
- Biological Reward Engineering
- Reinforcement Gain Failure
- Craving Gain Amplification
- Relief Loop Amplification
- Pleasure Signal Overweighting
- Avoidance Reward Lock
- Engineered Reward Basin
- Former FM-BIOX-022
15. Minimal Entry Version
Definition: Reward engineering gain amplification occurs when biological reward, reinforcement, motivation, craving, relief, pleasure, avoidance, or feedback pathways are artificially or structurally amplified beyond coherence, repair, boundary, timing, or restoration capacity.
Signature:
reward gain↑
reinforcement loop↑
damping capacity↓
relief / restoration gap↑
repair bypassed
recurrence↑
H↑
O unstable
Restoration direction:
- map the reward loop
- measure gain
- separate relief from repair
- identify hidden burden
- restore damping
- repair boundaries
- restore timing
- rebuild repair capacity
- reorient selection
- validate across time
16. Machine-Readable Summary
failure_mode:
id: "FM-BIO-023"
name: "Reward Engineering Gain Amplification"
family: "Biology / Medicine"
production_treatment: "Standalone Entry"
previous_id: "FM-BIOX-022"
primary_failure: "Reward, relief, pleasure, craving, avoidance, novelty, or reinforcement signal strength exceeds the system's damping, boundary, timing, repair, or coherence-governance capacity."
source: "UTS — Failure Modes Registry"
source_id: "FM-BIO-023"
scope_note: "Non-clinical and mapping-first; does not diagnose or treat medical conditions."
aliases:
- "Reward Engineering Gain Amplification"
- "Reward Gain Amplification"
- "Biological Reward Engineering"
- "Reinforcement Gain Failure"
- "Craving Gain Amplification"
- "Relief Loop Amplification"
- "Pleasure Signal Overweighting"
- "Avoidance Reward Lock"
- "Engineered Reward Basin"
- "Former FM-BIOX-022"
signature:
- "reward gain↑"
- "reinforcement loop↑"
- "damping capacity↓"
- "relief / restoration gap↑"
- "repair bypassed"
- "recurrence↑"
- "H↑"
- "O unstable"
primary_layers:
origin:
- "U1 — Power / Budgets"
- "U2 — Configuration / Boundaries"
- "U3 — Execution"
- "U4 — Information / Truth"
- "U5 — Coordination / Time"
- "U6 — Coherence Field"
- "U7 — Memory / Recurrence"
manifestation:
- "U3 — Execution"
- "U4 — Information / Truth"
- "U5 — Coordination / Time"
- "U6 — Coherence Field"
state_variables:
- "Γ"
- "K"
- "O"
- "H"
- "R"
- "Φ"
- "Τ"
- "BΣ"
- "Ψ"
- "Au"
first_gate_failure: "Gain Gate"
restoration:
- "Reward Loop Recalibration"
- "Gain Reduction"
- "Signal Damping Restoration"
- "Relief / Restoration Separation"
- "Repair Capacity Rebuild"
- "Boundary Repair"
- "Staged Slack Restoration"
- "Origin-Layer Repair"
- "Time-Validated Restoration"