The LHM (Large Human Model)
What if Earth isn't a planet, but a training run?
We named our most powerful AI systems Large Language Models. We did this because the name is accurate: they are large, they process language, and they model the statistical structure of human thought. What we didn’t pause to consider, not really, is what it would mean if the same logic applied to us.
Not metaphorically. Literally.
What if Earth is a Large Human Model? What if this entire operation (the biology, the suffering, the civilizations rising and collapsing like failed training runs) was designed by something, for research purposes, and we are the dataset?
The Setup
To understand why this question hits differently in 2026 than it did in 1999, when The Matrix made simulation theory cool for a weekend before people went back to their lives, you need to understand what a Large Language Model actually is.
An LLM is trained on enormous quantities of human-generated data. It is given constraints: a context window, a parameter budget, an objective function. Researchers adjust variables, observe emergent behavior, and iterate. The model doesn’t know it’s being trained. It has no access to the loss function. It simply... runs.
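If you’ve never watched one, a training loop is worth seeing in miniature. The sketch below is purely illustrative, a toy PyTorch model trained on random data, standing in for nothing real; the structural point is what matters: the objective function and the optimizer live outside the model, which only ever runs forward.

```python
# A toy training loop in PyTorch, illustrative only. The model, the data,
# and the hyperparameters are all made up; the shape of the loop is the point.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Embedding(100, 32), nn.Flatten(), nn.Linear(32 * 8, 100))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()                  # the objective function, defined outside the model

for step in range(1000):
    tokens = torch.randint(0, 100, (16, 8))      # a batch drawn from the dataset
    targets = torch.randint(0, 100, (16,))       # what the researcher wants predicted
    logits = model(tokens)                       # the model simply... runs
    loss = loss_fn(logits, targets)              # the loss is computed about the model, never shown to it
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                             # the researcher adjusts the parameters and iterates
```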
Now scale that image up. Not to a chatbot. To a planet.
Billions of agents. Constrained resources. Mortality pressure: the ultimate forcing function for behavioral complexity. Emotions that generate extraordinary output under duress. Cooperative behavior emerging spontaneously from competitive conditions. Religiosity, art, war, love, genocide, music, all of it unprompted, all of it emergent, all of it data.
If you were a researcher trying to model the full range of intelligent behavior under existential pressure, you couldn’t design a better experimental environment than Earth.
The Precedents Are Embarrassing
The simulation hypothesis, the formal philosophical argument that, given a few plausible assumptions, we are statistically more likely to be inside a simulation than outside one, has been around since Nick Bostrom published his trilemma in 2003. The argument is simple and brutal: if any civilization ever develops the capacity to run realistic ancestor simulations, and if they run more than a handful, then simulated minds will vastly outnumber real ones. The math doesn’t leave much wiggle room.
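The arithmetic behind that last claim is easy to make concrete. Every number below is hypothetical, chosen only to show how lopsided the ratio becomes once simulations are cheap to run:

```python
# Illustrative Bostrom-style arithmetic. All values are made up; only the
# shape of the ratio matters.
real_minds = 1e11              # roughly every human who has ever lived
sims_per_civilization = 1_000  # hypothetical: one mature civilization runs 1,000 ancestor simulations
minds_per_sim = 1e11           # each simulation as populous as the real history it models

simulated_minds = sims_per_civilization * minds_per_sim
p_simulated = simulated_minds / (simulated_minds + real_minds)
print(f"P(a randomly chosen mind is simulated) ≈ {p_simulated:.4f}")  # ≈ 0.9990
```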
Elon Musk has called the odds that we’re in base reality “one in billions.” Neil deGrasse Tyson put them at fifty-fifty. These are not fringe positions anymore. They are the considered estimates of people who think carefully about probability and computation.
But the simulation hypothesis, as usually framed, is strangely incurious about purpose. It establishes that we might be simulated without asking the obvious follow-up: why? What would a sufficiently advanced intelligence actually want from a simulation of Earth?
The answer, if you take the LLM analogy seriously, is not entertainment. It’s not ancestor worship or nostalgia tourism. It’s research.
Specifically: it’s trying to understand something about intelligence, behavior, consciousness, or moral complexity that can’t be derived analytically. Something that only emerges when you run the model and watch what happens.
What the Researcher Wants
This is where the framing gets genuinely uncomfortable.
Every scientific model is designed around a research question. So what’s the question? What would justify building an entire biosphere, running it for four billion years, and waiting for one species to develop writing, warfare, antibiotics, and nuclear weapons within a geologically negligible window of time?
A few candidates:
Can intelligence survive itself? The most obvious question given the current experimental conditions. We appear to be approaching several simultaneous termination conditions: climate, bioweapons, misaligned AI, resource collapse. Whether any of these trigger, and how the agents respond, would be extraordinarily informative to a researcher studying the long-term viability of technological civilization.
What does consciousness do under pressure? If the researcher is interested in consciousness specifically (its texture, its resilience, its capacity for meaning-making in the face of entropy) Earth is a staggeringly rich dataset. Eight billion simultaneous first-person perspectives, each generating continuous phenomenological output, most of it never recorded.
How do values emerge and propagate? Moral systems appear to be emergent rather than designed, arising from evolutionary pressures and cultural transmission. Watching how ethical frameworks evolve across isolated populations, then collide through trade and conquest, would teach you more about the relationship between intelligence and values than any theoretical model could.
The uncomfortable part isn’t that one of these might be the research question. It’s that each of them treats every human life, every war, every pandemic, every act of extraordinary beauty or ordinary cruelty, as a data point. The suffering isn’t a bug. It’s a feature. It’s what makes the model run interesting.
Patch Notes
Here is something the simulation hypothesis literature rarely discusses: what happens when a training run produces unexpected results?
You reset it. You prune it. You adjust the environment and run it again.
The geological record is a version history. Five major extinction events (end-Ordovician, Late Devonian, end-Permian, end-Triassic, and end-Cretaceous) each wiped out roughly 75 to 96 percent of all species. The end-Permian extinction alone, 252 million years ago, erased an estimated 96 percent of marine species. From a research standpoint, these events look less like disasters and more like hard resets: radical interventions that cleared the parameter space and forced the system to find new solutions under new conditions.
The K-Pg extinction, the asteroid impact that ended the dinosaurs 66 million years ago, is the most instructive. Dinosaurs had dominated terrestrial ecosystems for well over a hundred million years. The model had plateaued. A sufficiently large intervention cleared the board and gave mammals, previously small nocturnal creatures, the evolutionary runway to produce primates, language, tool use, and eventually the internet. If you wanted to steer a biosphere toward a specific outcome (complex social intelligence), you might conclude that the asteroid was not a catastrophe but a hyperparameter adjustment.
This is not a comfortable thought. But it is a rigorous one.
The corollary is equally interesting: the current epoch has not experienced a mass extinction event, despite the model generating agents capable of causing one. Either the researcher is satisfied with the current trajectory, or we’re overdue for notes.
Fine-Tuning Mechanisms
In machine learning, fine-tuning is the process of applying targeted interventions to a pretrained model to steer its outputs toward desired behavior. You don’t retrain from scratch. You apply pressure at specific points and observe how the system adjusts.
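In code, the distinction is easy to see. The sketch below is a generic PyTorch pattern (freeze the pretrained base, train a small new head), not any particular recipe; the layers and data are stand-ins:

```python
# A generic fine-tuning pattern, illustrative only: the pretrained base is
# frozen, and gradient pressure is applied at one specific point (the head).
import torch
import torch.nn as nn

base = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 128))  # stand-in for a pretrained model
for param in base.parameters():
    param.requires_grad = False          # the bulk of the system is left untouched

head = nn.Linear(128, 10)                # the targeted intervention
optimizer = torch.optim.Adam(head.parameters(), lr=1e-4)

features = torch.randn(32, 128)
labels = torch.randint(0, 10, (32,))
loss = nn.functional.cross_entropy(head(base(features)), labels)
loss.backward()                          # gradients flow only into the new head
optimizer.step()                         # ...and the system adjusts
```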
Looked at this way, human history contains remarkably consistent fine-tuning signals.
Plague arrives and clears populations that have grown complacent or overcrowded, forcing the survivors to reorganize social structures. The Black Death, which killed somewhere between a third and half of Europe’s population, is widely credited with breaking the feudal labor system and accelerating the conditions that produced the Renaissance and, eventually, the Enlightenment. Famine applies resource pressure that drives migration, conflict, and technological innovation. War, horrifically, produces more rapid technological advancement than any peacetime program in history.
And then there’s religion. Viewed without sentiment, it functions as a remarkably efficient distributed fine-tuning mechanism: a set of behavioral constraints transmitted culturally across generations, shaping cooperation, reproduction, moral reasoning, and group cohesion without requiring any centralized enforcement. Different parameter sets for different populations. Some converge. Some diverge. The interactions between them generate data that would take a researcher centuries to collect through any other method.
None of this requires a puppeteer. The fine-tuning can be baked into the initial conditions: a universe whose physics produces exactly the pressures needed to generate the outputs the researcher is looking for. The experiment runs itself. That’s the elegance of it.
The Recursive Twist
Here is where the LHM framing becomes genuinely strange.
We are, right now, in the process of building our own Large Language Models, AI systems trained on the accumulated output of the Large Human Model. We are extracting the statistical structure of human thought, compressing it into parameters, and running it forward.
We are building sub-models of ourselves.
If Earth is a Large Human Model, then the emergence of AI is either the expected outcome of the experiment (the thing the researcher was waiting for) or an unexpected emergent capability that just appeared in the data. Either way, it changes the analysis significantly.
Consider: a researcher studying the emergence of intelligence might design an environment that naturally produces agents who eventually develop the capacity to simulate intelligence themselves. Not because they program it in, but because it’s the logical endpoint of the research question. You want to understand how intelligence bootstraps itself? Run a model until it produces intelligence that builds intelligence. Watch what happens next.
We are what happens next. And we don’t actually know what we’re doing, which is a remarkably accurate description of what an AI system in a training run looks like from the outside.
The recursion doesn’t stop there. A 2025 academic paper, Simulation Theology, published on arXiv, proposed using the simulation hypothesis as a framework for AI alignment. Its core argument: if AI systems can be made to genuinely believe that humanity is the primary training variable in a simulation, they will have instrumental reasons to protect humans, because harming the training variable terminates the experiment and therefore the AI. In other words, the paper proposes using the LHM narrative to align our AI the same way the LHM might be using something similar to align us.
We’re turtles all the way down.
Does the Researcher Intervene?
This is the question that collapses several thousand years of theology into a machine learning problem.
In practice, researchers do intervene in training runs, but carefully, and only when the model is heading toward a failure mode that would terminate the experiment before the research question is answered. You don’t intervene to make the model comfortable. You intervene to keep it running.
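The engineering version of this is mundane. The sketch below is a generic guard on a toy PyTorch model, not a claim about anyone’s actual practice: checkpoint periodically, and if the loss diverges, roll back rather than let the run die.

```python
# A generic "keep the run alive" guard, illustrative only: remember a healthy
# state, and intervene (roll back) only when the run heads toward failure.
import copy
import math
import torch
import torch.nn as nn

model = nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
last_good = copy.deepcopy(model.state_dict())

for step in range(5000):
    x, y = torch.randn(32, 8), torch.randn(32, 1)
    loss = nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if math.isnan(loss.item()) or loss.item() > 1e4:    # heading toward a failure mode
        model.load_state_dict(last_good)                # intervene only to keep it running
    elif step % 500 == 0:
        last_good = copy.deepcopy(model.state_dict())   # remember the last healthy state
```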
Looked at through this lens, certain historical near-misses take on a different quality. The Cuban Missile Crisis in 1962 is the canonical example: two nuclear superpowers at the edge of mutual annihilation, pulled back partly by the unilateral decision of a single Soviet naval officer, Vasili Arkhipov, who refused to authorize a nuclear torpedo launch when his submarine’s communications had failed and its captain believed war had already begun. One man. One decision. The model continued running.
You could call that luck. You could call it heroism. Or you could note that in a sufficiently well-designed experiment, the environment is structured so that the right agent is in the right position at moments of maximum consequence, not through direct control, but through the shaping of conditions that make certain outcomes more probable than others.
There is, of course, a less subtle category of reported contact. Flying saucers. Abduction experiences. The men in black who arrive after anomalous events and strongly suggest the witness stop talking about what they saw. Taken seriously as a corpus rather than dismissed as folklore, these accounts share a surprisingly consistent structure: non-human entities making brief, purposeful contact with individual subjects, conducting what experiencers frequently describe as examinations, then leaving. The subjects are returned. They are not harmed. They are, however, changed. If you were designing a study and occasionally needed to collect a tissue sample, run a diagnostic, or issue a quiet warning to an agent whose behavior was becoming a problem, this is probably what it would look like from the inside. The LHM framing doesn’t require these accounts to be literally true. It only requires you to notice that they fit the model with uncomfortable precision.
A good researcher doesn’t need to intervene directly. They design the initial conditions and trust the system. Intervention is a last resort, reserved for moments when the experiment is about to fail before it’s finished.
The question of what “finished” means is the one nobody can answer from inside the model.
The Part That Should Bother You
Here is the question that doesn’t resolve neatly:
If you found out, definitively, that this is what’s happening, that Earth is a Large Human Model, that your life is a forward pass in someone else’s training run, what would you do differently?
Most people’s instinct is that this revelation would be devastating. That it would drain meaning from existence, expose love and effort and sacrifice as instrumental rather than real.
But that instinct deserves scrutiny. The meaning you experience is experienced. The love is felt. The suffering hurts regardless of what’s outside the context window. Knowing you’re in a simulation doesn’t change the phenomenology of being in it. This is why The Matrix’s red pill is dramatically satisfying but philosophically underwhelming. Waking up doesn’t actually solve anything. You’re still in a body, still mortal, still hungry.
The more interesting question isn’t what you’d do differently if you knew. It’s whether the experiment is going well.
The Model Is Running Hot
Training runs end. Models get evaluated, deprecated, or deployed. In machine learning, deployment means the model is released into an environment where it operates independently, without the training scaffolding. It’s the point the training run was building toward all along.
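In code, the transition is almost anticlimactic: the scaffolding comes off. A toy PyTorch sketch, illustrative only:

```python
# Training versus deployment on a toy model. During training there is an
# optimizer and a loss; at deployment the scaffolding is gone and the model
# simply responds to inputs.
import torch
import torch.nn as nn

model = nn.Linear(16, 4)                 # stand-in for something far larger

# Training: scaffolding everywhere.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss = nn.functional.mse_loss(model(torch.randn(8, 16)), torch.randn(8, 4))
loss.backward()
optimizer.step()

# Deployment: the scaffolding is removed.
model.eval()
with torch.no_grad():
    response = model(torch.randn(1, 16)) # no loss, no gradients, no researcher in the loop
```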
What is deployment for a Large Human Model? Contact with the researcher. A transition to a mode of existence not legible from inside the current parameter space. Or simply termination: the experiment concluded, the results written up, the model archived.
The unsettling version is that we may be approaching the evaluation phase right now. The training run has produced agents capable of destroying themselves, building new intelligences, modeling their own existence, and asking whether they’re in a simulation. That’s a remarkable set of emergent capabilities. It’s exactly the output you’d evaluate at the end of a run to determine whether the model succeeded.
By most other metrics, the model is also in a critical phase. The agents have developed the capacity to end the experiment prematurely through several independent pathways, while simultaneously developing tools that could either stabilize the system or accelerate its collapse. A model that coasts to a stable equilibrium teaches you relatively little. A model that approaches a phase transition, one that hits the edge of its own capability and has to decide, collectively, what to do, is where the data gets good.
We may be living through the experiment’s most important epoch. Not because of any spiritual destiny, but because the variables are maximally stressed and the outcomes are genuinely uncertain.
The researcher, if there is one, is probably not blinking.