How to make progress on metaphysical puzzles in AI?

New paper from Lossfunk

May 21, 2026

The following is a brief summary of my position paper, which was accepted to ICML 2026. Read it here: https://lossfunk.com/papers/ai-metaphysics.pdf

AI debates often get stuck between two bad options.

• Realism: concepts like intelligence/consciousness have true essences, and our job is to discover them.

• Quietism: metaphysical debates are just word games, so ignore them.

We argue for a third option: pragmatism.

Pragmatism (in the tradition of James, Peirce, Quine and Wittgenstein) asks us to judge ideas by their consequences: what they let us predict, measure, build, or explain. For example, there may not be an essence of “intelligence”, waiting to be discovered. Rather, there are more or less useful framings of “intelligence” that we could hope to invent.

A pragmatic framing avoids both essence-hunting and giving up on metaphysical questions entirely. On this view, a concept earns its keep by the research programs it opens.

As a practical framework, whenever a metaphysical puzzle or question comes up in AI, I propose a two-step procedure (let’s call it productive confusion):

Step 1: Clarify - What different things might this loaded term mean?
Step 2: Invent - What empirically tractable questions does each meaning suggest?

In Step 1, I recommend asking:

“In what sense is this word being used, and how would we know if an answer was right?”

Often the puzzle turns out to be one of four things:
• a language trap
• an idle/unverifiable question
• a family-resemblance concept
• an empirical question in disguise

In Step 2, I recommend asking:

“What nearby empirically tractable question would make us think or act differently?”

Example:

Instead of asking “Do LLMs really understand?”, ask:
• What different capacities do we bundle under “understanding”: recall, abstraction, causal reasoning, grounding, generalization?
• What failures distinguish shallow pattern matching from robust abstraction?
• What internal structures support those behaviors?

In the paper, I apply the framework to four AI debates (Searle’s Chinese Room, o1 scheming, the definition of AGI, and world models in LLMs). As one example, on the question “Do LLMs have world models?”, here’s what I write:

Step 1 (Clarify). The question contains two terms each carrying multiple family resemblances, and this turns out to be where the disagreement lives. Visibly conflicting conclusions in the literature (Li et al., 2023; Vafa et al., 2024; Kambhampati et al., 2024) are not really conflicts about the same proposition: each paper is rigorous on its own terms but operationalizes the question differently.
“World” admits at least the following senses: the physical world we inhabit (with emphasis on physics), an abstract environment the model encounters (a game, a simulator), a specific domain (chemistry, geography), or the space of all possible worlds (including the worlds of mathematics and logic). “Model” similarly admits: symbolic representation of dynamics (e.g., ODEs/PDEs), subsymbolic prediction of the next state, pixel-level video rollouts, or a coherent latent representation recoverable by a probe. Models can further emphasize different desiderata: high vs. low fidelity, internal consistency vs. behavioral adequacy, causal vs. purely correlational structure. The question “Do LLMs have a world model?” as posed therefore does not admit a single answer; it admits at least one answer per cell of the (world-sense × model-sense) grid.
Step 2 (Invent). Rather than dismissing the original question, we use the grid itself as a generator. Each cell of the (world-sense × model-sense) cross-product suggests a different empirical program. Some examples:
• Can LLMs predict outcomes of novel physical experiments absent from their training data? (dynamics model × physical world)
• If fine-tuned on the rules of a toy physics never seen in pretraining, can LLMs simulate trajectories whose state representations are linearly recoverable via probes (Li et al., 2023)? (latent representation × specific environment)
• Are LLMs’ implicit world models coherent under Myhill-Nerode-style sequence-distinction tests (Vafa et al., 2024), or merely adequate for typical-distribution next-token prediction? (coherent vs. adequate × abstract world)
• Can LLMs answer counterfactual questions requiring causal intervention rather than mere statistical conditioning? (causal model × actual world)
• Can LLMs plan in domains requiring action-effect models (Kambhampati et al., 2024), or do they require external symbolic components? (procedural model × task environment)
Note what the framework enables: a range of empirically grounded research programs, each inspired by a different sense implied by the original family-resemblance question.
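
To make the "grid as a generator" idea concrete, here is a minimal Python sketch that enumerates the (world-sense × model-sense) cells described above. It is only an illustration, not code from the paper: the short labels, the `question_grid` helper, and the question template are my own assumptions, paraphrased from the senses listed in the excerpt.

```python
from itertools import product

# Senses paraphrased from the excerpt; the short labels are illustrative.
WORLD_SENSES = [
    "physical world",
    "abstract environment (game, simulator)",
    "specific domain (chemistry, geography)",
    "space of all possible worlds",
]
MODEL_SENSES = [
    "symbolic representation of dynamics",
    "subsymbolic next-state prediction",
    "pixel-level video rollout",
    "probe-recoverable latent representation",
]


def question_grid(world_senses, model_senses):
    """Return one question stub per (world-sense, model-sense) cell.

    Hypothetical helper: each stub is a starting point that a researcher
    would still need to turn into a concrete, testable question.
    """
    return [
        {
            "world": w,
            "model": m,
            "question": (
                f"Do LLMs have a model in the sense of '{m}' "
                f"for a world in the sense of '{w}'?"
            ),
        }
        for w, m in product(world_senses, model_senses)
    ]


grid = question_grid(WORLD_SENSES, MODEL_SENSES)
print(len(grid))  # 4 world senses x 4 model senses = 16 cells
for cell in grid[:3]:
    print(cell["question"])
```

Not every cell will be interesting, and the desiderata mentioned in the excerpt (fidelity, consistency, causal structure) could be added as a third axis in the same way. The point of the sketch is only that each cell is a separate question with its own possible answer.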

As another example, take intelligence. Excerpting from the paper:

As a case study, let us ask: what is intelligence? This question becomes urgent in AI as we try to evaluate progress and set research agendas. Yet popular definitions reveal no consensus:
• Goal achievement: Intelligence is the ability to achieve goals in environments (McCarthy, 2007). This emphasizes capability and effectiveness.
• Learning efficiency: Intelligence is rate of skill acquisition, i.e. how quickly an agent learns new tasks (Chollet, 2019). This emphasizes adaptability.
• Generalization: Intelligence is satisfying diverse goals in varied contexts (Legg and Hutter, 2007). This emphasizes breadth.
• Handling uncertainty: Intelligence is adaptation with insufficient knowledge and resources (Wang, 2019). This emphasizes robustness.
• Scientific reasoning: Intelligence is doing science, involving generating and testing hypotheses (Bennett, 2025). This emphasizes discovery.
• Navigation: Intelligence is competence in navigating abstract and physical spaces (Levin, 2024). This emphasizes spatial reasoning.
There is no one “correct” definition here. Rather we have overlapping aspects of what is colloquially understood by “intelligence”, each proving useful for different research purposes. Our criteria for engaging with different definitions of intelligence shouldn’t be which one is “true”, but which ones help us build better systems, design better experiments, or understand cognition more deeply.

My recommendation (in the paper) is that instead of asking: “Which definition of intelligence is correct?”

Ask: “What does this definition help us do?”

• Does it suggest benchmarks?
• Does it expose failure modes?
• Does it predict behavior?
• Does it guide system design?

Different definitions have different consequences, and focusing on those is more important than trying to settle on one true definition.
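
One way to keep this checklist honest is to record, for each definition, which of the four questions has actually been answered. The Python sketch below is only an illustration under my own assumptions: the `DefinitionReview` class, the criterion names, and the placeholder note are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field

# The four checklist questions above, as hypothetical field names.
CRITERIA = (
    "suggests_benchmarks",
    "exposes_failure_modes",
    "predicts_behavior",
    "guides_system_design",
)


@dataclass
class DefinitionReview:
    """Notes on what one definition of intelligence helps us do."""

    name: str
    citation: str
    # Maps a criterion name to a short free-text note written by the reviewer.
    notes: dict = field(default_factory=dict)

    def unanswered(self):
        """Return the criteria that have no note yet, in checklist order."""
        return [c for c in CRITERIA if not self.notes.get(c)]


# Placeholder entry: the note is illustrative, not a finding.
review = DefinitionReview(
    name="Learning efficiency",
    citation="Chollet, 2019",
    notes={
        "suggests_benchmarks": "placeholder: tasks that measure how quickly new skills are acquired",
    },
)
print(review.unanswered())
```

Comparing definitions then becomes a matter of comparing what each one lets you fill in, rather than arguing about which one is true.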

The full paper has a lot more detail and nuance. Read it here: https://lossfunk.com/papers/ai-metaphysics.pdf

Would love feedback and pushback on my position.

Grant Castillou

May 21

It's becoming clear that with all the brain and consciousness theories out there, the proof will be in the pudding. By this I mean: can any particular theory be used to create a human-adult-level conscious machine? My bet is on the late Gerald Edelman's Extended Theory of Neuronal Group Selection (TNGS). The lead group in robotics based on this theory is the Neurorobotics Lab at UC Irvine. Dr. Edelman distinguished between primary consciousness, which came first in evolution and which humans share with other conscious animals, and higher-order consciousness, which came only to humans with the acquisition of language. A machine with only primary consciousness will probably have to come first.

What I find special about the TNGS is the Darwin series of automata created at the Neurosciences Institute by Dr. Edelman and his colleagues in the 1990s and 2000s. These machines perform in the real world, not in a restricted simulated world, and display convincing physical behavior indicative of higher psychological functions necessary for consciousness, such as perceptual categorization, memory, and learning. They are based on realistic models of the parts of the biological brain that the theory claims subserve these functions. The extended TNGS allows for the emergence of consciousness based only on further evolutionary development of the brain areas responsible for these functions, in a parsimonious way. No other research I've encountered is anywhere near as convincing.

I post because on almost every video and article about the brain and consciousness that I encounter, the attitude seems to be that we still know next to nothing about how the brain and consciousness work; that there's lots of data but no unifying theory. I believe the extended TNGS is that theory. My motivation is to keep that theory in front of the public. And obviously, I consider it the route to a truly conscious machine, primary and higher-order.

My advice to people who want to create a conscious machine is to seriously ground themselves in the extended TNGS and the Darwin automata first, and proceed from there, by applying to Jeff Krichmar's lab at UC Irvine, possibly. Dr. Edelman's roadmap to a conscious machine is at https://arxiv.org/abs/2105.10461, and here is a video of Jeff Krichmar talking about some of the Darwin automata, https://www.youtube.com/watch?v=J7Uh9phc1Ow
