Energy-Based Causal Memory
A common assumption hides inside most software systems. Memory is treated like a box. Put something in, and it stays there until someone deletes it. The past is assumed to be stable by default.
Reality does not work like that. Keeping information is not free. It must be maintained. Forgetting is not a bug, it is the baseline.
To remember is to spend.
This is an architectural constraint, not a design choice.
Time is the cost of keeping a past
There is a price to irreversible operations. Erasing information, merging states, committing to a single history — all of these create a one-way arrow. That arrow is what shows up as time.
So memory cannot be treated as static storage. Memory is a process that continuously pays to preserve structure, and continuously forgets what is not reinforced.
This immediately reframes the goal. The goal is not "store everything forever." The goal is:
Preserve what matters, by paying for it.
Why current LLM "memory" feels broken
Large language models operate on a context window. That creates a definition of memory that is purely positional: what is inside the window is "present," what falls outside is "gone." The past is not metabolized — it is clipped.
When teams patch this with retrieval-augmented generation (RAG), the dominant move is similarity search: find text chunks that look related, paste them back into the window. This works for facts. It breaks for understanding.
Because similarity is not relationship.
Similarity says: these two things look alike. Memory should say: this led to this.
That difference is the entire problem.
Causality is Memory
Similarity is useful. It can wake the past. It can pull relevant material toward the present. But similarity is not what makes a memory a memory.
A memory is a causal structure laid down in time:
- what co-occurred
- what preceded what
- what decision produced what consequence
- what assumption broke where
- what invariant survived
This is why real memory is not a list of stored facts. It is a graph of connections. The past becomes usable when it is linked, not when it is merely retrieved.
Causality is memory. Similarity is only the trigger.
The Quadratic Wall
There is a deeper cost behind context windows. When a model tries to attend to everything at once, the amount of pairwise interaction grows rapidly as the window grows (quadratic complexity).
That is the hidden tax of "just make the context bigger."
So there are two failures happening at once:
- The past is clipped, so continuity breaks
- When the past is expanded, the cost explodes
This is not a parameter problem. It is a memory architecture problem.
Infinite Context via Energy Allocation
The phrase "infinite context" is misleading if it sounds like infinite tokens. Infinite context means something else:
- The full lived history can grow without bound
- But only a bounded, energy-funded subset is active at any moment
- What becomes active is determined by causal relevance, not by recency alone
So context becomes a dynamic state, not a transcript. The past is present only when it is earned — meaning when it is activated by the current situation and paid for by energy.
The Mechanics
| Mechanism | What It Does |
|---|---|
| Energy Decay | All memories lose energy over time. Unused memories fade. |
| Causal Reinforcement | Connected memories share energy. Structure sustains structure. |
| Resonance | Current input adds energy to causally relevant past nodes. |
| Active Frontier | Only memories above energy threshold enter working context. |
This is how infinite context works: bounded active frontier over unbounded archive.
The Memory Cycle
Each time step follows a natural rhythm:
- Input — New information enters, creates a node with high energy
- Wake — The input resonates with the archive, exciting relevant past nodes
- Think — The active set (above threshold) is processed
- Wire — Co-occurring nodes strengthen their connections (Hebbian learning: what fires together, wires together)
- Cool — All nodes lose energy (decay)
- Sleep — Nodes below threshold drop out of the active set
This is a self-organizing system. We do not write code to "delete" memories. We define the cost of energy, and the system naturally forgets what it does not use.
Structure Over Substance
Isolated facts are unstable. Facts connected to a larger structure are stable.
When "Physics" is active, it transfers energy to "Math" (connected neighbor), keeping it alive even if Math isn't directly mentioned. Isolated nodes ("fluff") lose energy and fade.
The labels of a thing are gauge. The relations are real.
This is why causal memory is a graph, not a list. The edges matter more than the nodes.
Mainland vs Tissue
Not all knowledge should compete inside the same pool.
Mainland: Stable truths — physics, mathematics, definitions, established constraints, conserved quantities, evidences that have survived repeated tests.
Tissue: Living, evolving narrative — conversations, projects, relationships, local assumptions, local goals.
If these are mixed naively:
- Global concepts hijack local context (they're always "relevant" in some abstract sense)
- Internally consistent but false worlds can become self-reinforcing
Energy-based memory treats this as a stability problem:
- Mainland knowledge acts as constraints, not competitors for attention
- Tissue forms, decays, and reconfigures under causal reinforcement
- False worlds cannot become "immortal" just because they're internally coherent
Memory is not only about recall. Memory is about governance.
Why Trits Matter Here
A major reason memory systems stay noisy is that their substrate cannot represent "nothing" cleanly. Floating point representations treat "almost zero" as "something." That "something" still occupies space, still interacts, still burns energy.
With trits, each coordinate is one of three states:
- Supporting evidence (+1)
- Opposing evidence (−1)
- Nothing at this resolution (0)
This makes "nothingness" a first-class state. In sparse form, "nothing" is literally free — it is not even written down. Energy is spent only on active distinctions.
This matters because energy-based memory is an energy allocation system. A representation that makes silence free is aligned with forgetting as the default.
The Collapse
| Old View | New View |
|---|---|
| Memory is storage | Memory is metabolism |
| Past persists by default | Past decays unless paid for |
| Retrieval = similarity search | Similarity wakes, causality decides |
| More context = bigger window | More context = bounded frontier over unbounded archive |
| All knowledge in one pool | Mainland constraints + living tissue |
From "how much can be stored" to "what deserves to stay alive."
Memory Respects Structure
Memory operations must be representation-independent:
| Memory Operation | Requirement |
|---|---|
| Retrieval | Results depend on causal structure, not label encoding |
| Activation | What becomes "hot" is determined by structural relevance |
| Decay | What "cools" is based on lack of causal reinforcement |
| Consolidation | Stable truths are canonical, not representation-dependent |
Two memory states are equivalent if they produce the same outcomes for all valid queries. This ensures memory doesn't create hidden channels through label-dependent retrieval.
Prerequisites: Trits: The Distinction Alphabet
Next: Gauge-Invariant Truth Machine (GITM) — See the full architecture