ML-Master 2.0 holds the state of the art on OpenAI's MLE-Bench at a 56.44 percent medal rate under a 24-hour budget. The architectural claim it makes is specific. The gain is attributed to a memory architecture, not the underlying model. The architecture is called Hierarchical Cognitive Caching. The paper credits computer cache hierarchies as the inspiration.

That is the wrong borrow. Computer cache hierarchies optimize for latency. The actual problem HCC is solving is interference under sparse feedback over multi-day horizons. That is not a latency problem. That is the brain's problem, and a 1995 paper in Psychological Review solved it, named it, and provided the empirical constraints HCC currently lacks.

The paper is McClelland, McNaughton, and O'Reilly, "Why There Are Complementary Learning Systems in the Hippocampus and Neocortex," Psychological Review 102(3): 419 to 457, 1995. Updated in Kumaran, Hassabis, and McClelland 2016, with quantitative formalization continuing through Schapiro et al. and the Go-CLS framework published in Nature Neuroscience in 2023. The architecture is called Complementary Learning Systems. ML-Master 2.0 rebuilt it from scratch, gave it a different name, and credited a different field for the inspiration. The structural isomorphism is exact, not analogical.


The Structural Mapping

CLS posits two complementary learning systems with separate roles, separate write rates, and a transfer operator between them. The hippocampus is the fast-writing, sparse, pattern-separated, episodic store. New experiences are encoded there immediately, with low overlap to existing representations. The neocortex is the slow-writing, overlapping, distributed, statistical-structure store. It accumulates regularities across many episodes by integrating them gradually.

The transfer between the two systems happens via hippocampal replay during off-task periods, with replay schedules chosen to avoid overwriting prior cortical structure. The replay is not random rehearsal. It is the architectural mechanism that lets the slow store change a small amount on each reinstatement without destroying earlier-acquired regularities.

ML-Master 2.0's HCC has the same shape, with one extra tier the paper makes explicit. Tier one holds transient execution traces from the agent's recent task work. Tier two holds distilled stable knowledge derived from those traces. A third tier holds cross-task wisdom built up across many tier-two consolidations. The transfer between tiers is a distillation operator that runs during planning and reflection phases of the agent's loop.
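The ML-Master 2.0 paper describes these tiers in prose, not code. A minimal sketch of the shape it describes, with every name and data structure ours rather than the paper's, might look like:

```python
from dataclasses import dataclass, field

@dataclass
class HCCMemory:
    """Hypothetical sketch of HCC's three tiers; names are illustrative, not the paper's."""
    traces: list = field(default_factory=list)   # tier 1: transient execution traces (fast write)
    stable: dict = field(default_factory=dict)   # tier 2: distilled stable knowledge (slow write)
    wisdom: dict = field(default_factory=dict)   # tier 3: cross-task regularities

    def record(self, trace):
        # Tier one absorbs every new episode immediately, like the hippocampus.
        self.traces.append(trace)

    def distill(self):
        # The transfer operator: fold recent traces into tier two gradually,
        # the role hippocampal replay plays for the neocortex.
        for trace in self.traces:
            key = trace["lesson"]
            self.stable[key] = self.stable.get(key, 0) + 1
        self.traces.clear()

    def consolidate(self, task_id):
        # Cross-task consolidation: lessons that recur across tasks become "wisdom".
        for key in self.stable:
            self.wisdom.setdefault(key, set()).add(task_id)

mem = HCCMemory()
mem.record({"lesson": "cache model weights"})
mem.record({"lesson": "log every retry"})
mem.distill()
mem.consolidate("task-A")
```

The sketch makes the isomorphism concrete: `record` is fast hippocampal encoding, `distill` is replay-driven transfer, `consolidate` is systems-level consolidation.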

The mapping is one to one.

HCC tier one (transient execution traces) maps onto the CLS hippocampus: fast-writing, episode-scoped, designed to absorb new information without contaminating older structure.

HCC tier two (distilled stable knowledge) maps onto the CLS neocortex: slow-writing, overlapping, the place where statistical regularities accumulate.

HCC's cross-task wisdom tier maps onto CLS systems-level consolidation: the longer-horizon process by which episodic content becomes semantic structure that survives without continued hippocampal scaffolding.

The HCC distillation operator maps onto hippocampal replay: the mechanism that lets cortical synapses change a small amount on each reinstatement without overwriting prior structure.

Same layer count plus an explicit consolidation tier. Same fast/slow tradeoff. Same replay-based transfer. Same target failure mode: catastrophic interference under sequential learning.

The paper attributes the architecture to "computer systems caches." Cache hierarchies optimize a different objective. Caches minimize latency. They do not solve interference, because there is no interference to solve in the cache problem statement. The actual problem ML-Master 2.0 is solving is the brain's problem. The architecture is the brain's architecture, with a different name.


The Three Constraints HCC Lacks

Naming a structural rediscovery is interesting. The reason it matters operationally is that the older field solved the problem with measured parameters that the AI version has not yet specified. CLS made three quantitative predictions in 1995 that HCC's current design does not honor. Each is testable. Each marks a point where the next iteration of HCC either climbs past 56.44 percent or hits a ceiling McClelland predicted.

Sparsity ratios in the fast-writing tier. CLS predicts that pattern separation in the hippocampus requires sparse coding. Two similar experiences must be stored as non-overlapping representations, or new encoding will partially overwrite old encoding through shared activation patterns. The empirical sparsity ratio in the dentate gyrus (the canonical pattern-separation circuit) is on the order of one to three percent active per pattern. The exact number is task-dependent and species-dependent, and the connectionist literature has tracked it for three decades. HCC's transient execution-trace tier has no specified sparsity discipline. The traces are written without an interference-aware pattern-separation mechanism. CLS predicts this exact configuration produces catastrophic interference once the trace volume crosses a threshold. The threshold is calculable from the encoding capacity of the tier. ML-Master 2.0 has not published one.
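The pattern-separation claim is easy to demonstrate. A toy k-winners-take-all encoder, a crude stand-in for dentate-gyrus sparse coding with the 2 percent activity level drawn from the range cited above, stores two similar experiences with far less overlap than a dense code does:

```python
import numpy as np

def kwta(x, k):
    # k-winners-take-all: keep only the k strongest units active.
    code = np.zeros_like(x)
    code[np.argsort(x)[-k:]] = 1.0
    return code

rng = np.random.default_rng(0)
dim = 1000
a = rng.normal(size=dim)             # experience 1
b = a + 0.5 * rng.normal(size=dim)   # experience 2: a similar episode

# Dense code: fraction of units whose sign agrees across the two episodes.
dense_overlap = float(np.mean(np.sign(a) == np.sign(b)))

# Sparse code at 2% activity: fraction of active units shared.
k = int(0.02 * dim)
sparse_overlap = float((kwta(a, k) * kwta(b, k)).sum() / k)
```

Lower overlap means a new trace perturbs fewer of the units that encode older traces. That margin is what a specified sparsity ratio buys, and what an unspecified one forfeits.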

Interleaving schedules in replay. This is the load-bearing finding of the 1995 paper, and it is the one most directly applicable to HCC's distillation step. CLS demonstrates that direct sequential training on new tasks destroys the cortical representation of old tasks. The phenomenon is catastrophic interference, and the original neural-network version of it was documented by McCloskey and Cohen in 1989. CLS is the architectural answer to that finding. Avoiding the interference requires interleaving: replaying old experiences alongside new ones, in a schedule chosen to balance new-information acquisition against old-information preservation. The 1995 paper provides the math. Subsequent work (Schapiro and colleagues; the Go-CLS framework) has refined when interleaving is necessary versus when item-level structure makes it optional. The HCC paper does not specify an interleaving schedule for the distillation operator. CLS predicts that without an explicit interleaving discipline, the cross-task wisdom tier will progressively forget early-task structure as later tasks consolidate. That is the failure mode the 1995 paper named.
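The effect reproduces in a few lines. In this toy construction (ours, not the paper's), items share a common representational component, so their codes overlap, which is the precondition for interference. Training a linear "cortex" on task B after task A erases task A; interleaving task A replay into the task B phase preserves it:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 20
base = rng.normal(size=d)  # shared structure: the source of overlap, hence interference

def make_items(n, target):
    xs = []
    for _ in range(n):
        v = 0.8 * base + 0.6 * rng.normal(size=d)
        xs.append(v / np.linalg.norm(v))   # unit-norm patterns, heavily overlapping
    return np.stack(xs), np.full(n, float(target))

xa, ya = make_items(5, +1.0)   # task A
xb, yb = make_items(5, -1.0)   # task B, conflicting through the shared component

def sgd(w, xs, ys, lr=1.0, epochs=300):
    # Delta-rule updates on a linear slow store.
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            w = w + lr * (y - w @ x) * x
    return w

def mse(w, xs, ys):
    return float(np.mean((xs @ w - ys) ** 2))

w_a = sgd(np.zeros(d), xa, ya)                  # learn task A
w_seq = sgd(w_a, xb, yb)                        # then task B alone: sequential
w_int = sgd(w_a, np.concatenate([xa, xb]),      # then task B with task A
            np.concatenate([ya, yb]))           # replay interleaved

seq_forgetting = mse(w_seq, xa, ya)   # large: B's updates overwrote A
int_forgetting = mse(w_int, xa, ya)   # small: replay preserved A
```

The sequential run forgets task A; the interleaved run retains it with the same model, same data, same learning rule. Only the schedule differs, which is exactly the 1995 paper's point.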

Retrograde gradient timescales. CLS provides measured consolidation timescales for the hippocampus-to-neocortex transfer. Rats consolidate over days to weeks. Primates over two to four weeks. Humans over years, with some traces remaining hippocampus-dependent for fifteen years or more. The gradient is task-dependent: declarative memory consolidates differently from procedural; structured-regularity tasks consolidate differently from one-shot episodes. The Go-CLS work formalized when memories stay episodic and when they become semantic. HCC operates inside a 24-hour budget envelope. That is a single timescale. The underlying field has a measured, gradient-shaped consolidation profile that HCC could borrow to make tier-two retention task-dependent rather than uniform. The current architecture treats consolidation as a flat operator. The brain does not.
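A task-dependent gradient is straightforward to express. The sketch below uses made-up half-lives shaped like the rodent-to-human gradients above, not measured values from the literature; the point is the schedule's shape, not its numbers:

```python
# Illustrative placeholders only: half-lives (in days) shaped like the
# consolidation gradients CLS reports, not measured constants.
CONSOLIDATION_HALF_LIFE = {
    "one_shot_episode": 2.0,        # stays episodic-store-dependent longest
    "structured_regularity": 0.5,   # regular structure consolidates quickly
}

def episodic_weight(task_kind, days_elapsed):
    """Fraction of retrieval still served by the fast (episodic) store."""
    half_life = CONSOLIDATION_HALF_LIFE[task_kind]
    return 0.5 ** (days_elapsed / half_life)

def semantic_weight(task_kind, days_elapsed):
    """Fraction served by the slow (semantic) store: the retrograde gradient."""
    return 1.0 - episodic_weight(task_kind, days_elapsed)
```

At day one, a structured regularity is already mostly semantic while a one-shot episode is still mostly episodic. A flat consolidation operator treats both the same; the gradient does not.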

Each of these three constraints is a testable prediction. If the next iteration of HCC adopts CLS-calibrated sparsity in the transient tier, an explicit interleaving schedule on the distillation operator, and a task-dependent consolidation gradient on the cross-task tier, the MLE-Bench medal rate likely climbs past 56.44 percent.

If it does not, the architecture will hit the catastrophic-interference ceiling McClelland nailed three decades ago. The 1995 paper has been sitting on the prediction that long. Reading it is cheaper than rediscovering it under benchmark pressure.


Why This Matters Beyond ML-Master

The pattern is bigger than one paper. The May 4 cross-domain hunt that surfaced the HCC mapping also surfaced a separate convergence in the agent-memory community. Eight independent open-source repositories shipped in the same window with substantially overlapping designs: typed memory tiers, hybrid retrieval, conflict detection at write time, decay and staleness scoring on stored items.

Five of the eight repositories use the cognitive-psychology trichotomy verbatim: semantic, episodic, procedural, plus a working-memory layer. That vocabulary comes from CLS-adjacent cognitive psychology. None of the five repositories cite McClelland 1995. The terms are being used as engineering nomenclature without the empirical grounding the source field provides. The same pattern that ML-Master 2.0 instantiates in a single research artifact, the agent-memory community is instantiating in eight production-track tools.

The mechanism is the same in both cases. Engineers under reliability pressure reach for the vocabulary their training overlaps with. Computer-systems language is the closest neighbor in most ML and infrastructure curricula. Cognitive psychology and neuroscience are not. So an architecture that is structurally a brain architecture gets named after the closest computer-systems metaphor (cache hierarchy), and an industry of memory tools that is structurally rebuilding CLS gets a vocabulary borrowed from cognitive psychology without the constraints attached.

The AI engineering culture is structurally siloed from its parent disciplines. The brain solved the catastrophic-interference problem in a system that is durable, multi-day, and runs under sparse feedback. Those are exactly the conditions modern agentic ML engineering is now operating under. The discipline that already calibrated the architecture is one citation away. The pattern is that nobody is making the citation.


The May 1 Connection

This is the second piece in this series in five days that names a structural rediscovery of an older named architecture. The May 1 piece on this site, The Supervisory Signal Layer, made the same shape of argument about agent control planes. Four hyperscalers shipped near-identical agent platforms in April 2026. The architecture they converged on has a name in control theory: supervisory control, formalized by Sheridan at MIT between 1978 and 1992, with Runtime Assurance as the modern aerospace descendant. The structural isomorphism between RTA and the 2026 agent platforms is exact. The vocabulary the AI field used was different. The architecture was the same. The aerospace field had measured parameters (control barrier functions, intervention thresholds, supervisor-controller separation invariants) that the agent platforms had not yet adopted.

The shape repeats. May 1: AI control planes are supervisory control with a different name. May 4: AI memory tiers are CLS with a different name.

Both are AI architectures with measured parent-discipline counterparts dating back twenty to thirty-five years. Both have empirical constraints in the older field that the AI version has not yet specified. Both are the kind of finding that lets one lab leapfrog another by reading a paper from a different discipline.

Two structural rediscoveries in four days is small enough to be coincidence and large enough to be a pattern. The mechanism is the same in both cases: AI engineering culture under capability and reliability pressure reaches for vocabulary from its closest neighboring fields (computer systems, software engineering) rather than from the fields that actually solved the problem (control theory, neuroscience). The fix is not academic. It is procurement-grade. If the next reliability win is sitting in a 1992 aerospace textbook or a 1995 Psychological Review article, the lab that reads it ships first.


What An Engineer Does With This

The takeaway is operational, not historical. Three concrete moves follow for any team building agent memory or continual learning in 2026.

Read CLS first. McClelland, McNaughton, and O'Reilly 1995 is the load-bearing reference. Kumaran, Hassabis, and McClelland 2016 is the modern update. The Go-CLS work in Nature Neuroscience (2023) is the current quantitative formalization. If a team is designing a fast-writing transient store and a slow-writing stable store with a transfer operator between them, the team is building CLS. The empirical constraints in the source field are calibrated. Borrowing them is faster than rederiving them under benchmark pressure.

Test the three predictions on the system being built. Sparsity ratio in the fast tier: measurable, publishable, and predicts an interference threshold. Interleaving schedule in replay or distillation: measurable, publishable, and predicts forgetting on early-task structure when absent. Retrograde gradient on consolidation: measurable, publishable, and predicts task-dependent retention if specified. If these three parameters cannot be measured on a CLS-shaped architecture, the architecture is structurally CLS-incomplete by the source field's own criteria. The next benchmark gain on MLE-Bench-class tasks is likely to come from a team that closes the gap on those three numbers.
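Measurable, here, means computable from logs. Minimal versions of the three metrics (names and signatures ours, offered as a starting point) fit in a dozen lines:

```python
import numpy as np

def sparsity_ratio(code):
    """Fraction of active units per stored pattern (CLS dentate-gyrus range: ~0.01-0.03)."""
    code = np.asarray(code)
    return float(np.count_nonzero(code) / code.size)

def interleaving_fraction(replay_batch_tasks, new_task):
    """Share of a replay/distillation batch drawn from earlier tasks."""
    old = sum(1 for t in replay_batch_tasks if t != new_task)
    return old / len(replay_batch_tasks)

def retention_half_life(days, retention):
    """Fit an exponential to measured retention and report its half-life in days."""
    days, retention = np.asarray(days, float), np.asarray(retention, float)
    rate = -np.polyfit(days, np.log(retention), 1)[0]   # slope of log-retention
    return float(np.log(2) / rate)
```

Three numbers, each publishable alongside a medal rate. A CLS-shaped system that cannot report them cannot test the predictions.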

Build for the next-generation HCC, not the current one. The next version of ML-Master will not ship preset cache parameters borrowed from computer systems. It will ship CLS-calibrated sparsity and interleaving discipline, because the medal rate will demand it. A team that ships those primitives now has a cleaner story when the larger labs catch up. A team that does not gets leapfrogged by a competitor who read McClelland.


Closing Claim

The structural rediscovery is real. ML-Master 2.0 made the right architectural bet. The paper credited the wrong field for the inspiration, and as a result it shipped without the constraints the right field already calibrated. The 56.44 percent medal rate is the floor of what a CLS-shaped architecture can produce without those constraints. The ceiling is higher.

The lab that names the architecture next inherits 31 years of neuroscience calibration. The lab that does not will keep applying cache-hierarchy framing to a problem the brain already solved. Every memory architecture currently being shipped under "context engineering," "memory hierarchies," or "hierarchical cognitive caching" is structurally a CLS variant. The constraints are in the 1995 paper. The team that adopts them sets the next ceiling.


Citations and Sources

AI architecture and benchmark

  1. ML-Master 2.0. arXiv:2601.10402v1. (Hierarchical Cognitive Caching architecture; 56.44 percent medal rate on OpenAI's MLE-Bench under a 24-hour budget; gain attributed to memory architecture, not underlying model.)
  2. Air Street Press. "State of AI May 2026." May 4, 2026. https://press.airstreet.com/p/state-of-ai-may-2026 (Surfacing of ML-Master 2.0 as a current frontier result.)

Cognitive neuroscience and CLS

  1. McClelland, J. L., McNaughton, B. L., and O'Reilly, R. C. "Why There Are Complementary Learning Systems in the Hippocampus and Neocortex: Insights from the Successes and Failures of Connectionist Models of Learning and Memory." Psychological Review 102(3): 419-457. 1995. (Original CLS paper; dual-store architecture; catastrophic-interference analysis; interleaving as the architectural fix.)
  2. McCloskey, M., and Cohen, N. J. "Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem." The Psychology of Learning and Motivation 24: 109-165. 1989. (Original demonstration of catastrophic interference in neural networks; the failure mode CLS was designed to address.)
  3. Kumaran, D., Hassabis, D., and McClelland, J. L. "What Learning Systems Do Intelligent Agents Need? Complementary Learning Systems Theory Updated." Trends in Cognitive Sciences 20(7): 512-534. 2016. (Modern update of CLS for the deep-learning era.)
  4. Schapiro, A. C., and colleagues. Work formalizing when item-level structure makes interleaving optional versus when it is necessary for retention.
  5. Go-CLS framework. Nature Neuroscience. 2023. (Quantitative formalization of when memories stay episodic and when they become semantic; task-dependent retrograde gradient.)

Prior Signal in this series

  1. Diamond, Beau. "The Supervisory Signal Layer: Why Every Hyperscaler Just Shipped the Same Thing." beaudiamond.ai/signal/supervisory-signal-layer. May 1, 2026. (Companion structural-rediscovery piece: agent control planes as Sheridan supervisory control.)
  2. Diamond, Beau. "The Routing Failure." beaudiamond.ai/signal/routing-failure. May 4, 2026.
  3. Diamond, Beau. "The Cognitive State Layer." beaudiamond.ai/signal/cognitive-state-layer. April 30, 2026.