Skip to main content

Energy-Based Causal Memory

A common assumption hides inside most software systems. Memory is treated like a box. Put something in, and it stays there until someone deletes it. The past is assumed to be stable by default.

Reality does not work like that. Keeping information is not free. It must be maintained. Forgetting is not a bug, it is the baseline.

This is an architectural constraint, not a design choice.


Time is the cost of keeping a past

There is a price to irreversible operations. Erasing information, merging states, committing to a single history — all of these create a one-way arrow. That arrow is what shows up as time.

So memory cannot be treated as static storage. Memory is a process that continuously pays to preserve structure, and continuously forgets what is not reinforced.

This immediately reframes the goal. The goal is not "store everything forever." The goal is:

Preserve what matters, by paying for it.


Why current LLM "memory" feels broken

Large language models operate on a context window. That creates a definition of memory that is purely positional: what is inside the window is "present," what falls outside is "gone." The past is not metabolized — it is clipped.

When teams patch this with retrieval-augmented generation (RAG), the dominant move is similarity search: find text chunks that look related, paste them back into the window. This works for facts. It breaks for understanding.

Because similarity is not relationship.

Similarity says: these two things look alike. Memory should say: this led to this.

That difference is the entire problem.


Causality is Memory

Similarity is useful. It can wake the past. It can pull relevant material toward the present. But similarity is not what makes a memory a memory.

A memory is a causal structure laid down in time:

  • what co-occurred
  • what preceded what
  • what decision produced what consequence
  • what assumption broke where
  • what invariant survived

This is why real memory is not a list of stored facts. It is a graph of connections. The past becomes usable when it is linked, not when it is merely retrieved.


The Quadratic Wall

There is a deeper cost behind context windows. When a model tries to attend to everything at once, the amount of pairwise interaction grows rapidly as the window grows (quadratic complexity).

That is the hidden tax of "just make the context bigger."

So there are two failures happening at once:

  1. The past is clipped, so continuity breaks
  2. When the past is expanded, the cost explodes

This is not a parameter problem. It is a memory architecture problem.


Infinite Context via Energy Allocation

The phrase "infinite context" is misleading if it sounds like infinite tokens. Infinite context means something else:

  • The full lived history can grow without bound
  • But only a bounded, energy-funded subset is active at any moment
  • What becomes active is determined by causal relevance, not by recency alone

So context becomes a dynamic state, not a transcript. The past is present only when it is earned — meaning when it is activated by the current situation and paid for by energy.

The Mechanics

MechanismWhat It Does
Energy DecayAll memories lose energy over time. Unused memories fade.
Causal ReinforcementConnected memories share energy. Structure sustains structure.
ResonanceCurrent input adds energy to causally relevant past nodes.
Active FrontierOnly memories above energy threshold enter working context.

This is how infinite context works: bounded active frontier over unbounded archive.

The Memory Cycle

Each time step follows a natural rhythm:

  1. Input — New information enters, creates a node with high energy
  2. Wake — The input resonates with the archive, exciting relevant past nodes
  3. Think — The active set (above threshold) is processed
  4. Wire — Co-occurring nodes strengthen their connections (Hebbian learning: what fires together, wires together)
  5. Cool — All nodes lose energy (decay)
  6. Sleep — Nodes below threshold drop out of the active set

This is a self-organizing system. We do not write code to "delete" memories. We define the cost of energy, and the system naturally forgets what it does not use.


Structure Over Substance

Isolated facts are unstable. Facts connected to a larger structure are stable.

When "Physics" is active, it transfers energy to "Math" (connected neighbor), keeping it alive even if Math isn't directly mentioned. Isolated nodes ("fluff") lose energy and fade.

The labels of a thing are gauge. The relations are real.

This is why causal memory is a graph, not a list. The edges matter more than the nodes.


Mainland vs Tissue

Not all knowledge should compete inside the same pool.

Mainland: Stable truths — physics, mathematics, definitions, established constraints, conserved quantities, evidences that have survived repeated tests.

Tissue: Living, evolving narrative — conversations, projects, relationships, local assumptions, local goals.

If these are mixed naively:

  • Global concepts hijack local context (they're always "relevant" in some abstract sense)
  • Internally consistent but false worlds can become self-reinforcing

Energy-based memory treats this as a stability problem:

  • Mainland knowledge acts as constraints, not competitors for attention
  • Tissue forms, decays, and reconfigures under causal reinforcement
  • False worlds cannot become "immortal" just because they're internally coherent

Memory is not only about recall. Memory is about governance.


Why Trits Matter Here

A major reason memory systems stay noisy is that their substrate cannot represent "nothing" cleanly. Floating point representations treat "almost zero" as "something." That "something" still occupies space, still interacts, still burns energy.

With trits, each coordinate is one of three states:

  • Supporting evidence (+1)
  • Opposing evidence (−1)
  • Nothing at this resolution (0)

This makes "nothingness" a first-class state. In sparse form, "nothing" is literally free — it is not even written down. Energy is spent only on active distinctions.

This matters because energy-based memory is an energy allocation system. A representation that makes silence free is aligned with forgetting as the default.


The Collapse

Old ViewNew View
Memory is storageMemory is metabolism
Past persists by defaultPast decays unless paid for
Retrieval = similarity searchSimilarity wakes, causality decides
More context = bigger windowMore context = bounded frontier over unbounded archive
All knowledge in one poolMainland constraints + living tissue

Memory Respects Structure

Memory operations must be representation-independent:

Memory OperationRequirement
RetrievalResults depend on causal structure, not label encoding
ActivationWhat becomes "hot" is determined by structural relevance
DecayWhat "cools" is based on lack of causal reinforcement
ConsolidationStable truths are canonical, not representation-dependent

Two memory states are equivalent if they produce the same outcomes for all valid queries. This ensures memory doesn't create hidden channels through label-dependent retrieval.


Prerequisites: Trits: The Distinction Alphabet

Next: Gauge-Invariant Truth Machine (GITM) — See the full architecture