Can AI models be conscious?
How can we tell?
Summary of our recent position paper on AI consciousness. Full paper here: https://lossfunk.com/papers/ai-consciousness.pdf

We argue that answering this question requires a validated theory of human consciousness first; without one, the concept of “AI consciousness” is not well grounded.
Accepted at AAAI Symposium 2026.
Start with something most people miss: “consciousness” is not actually one phenomenon.
Philosophers going back to Wittgenstein have flagged it as a family-resemblance concept, meaning a cluster of related-but-distinct things that got bundled under a single word. It covers wakefulness, the raw felt quality of experience (what redness is from the inside), the unity of your sensory scene, information being accessible for flexible reasoning, thinking about your own thoughts, the sense of being an “I”, and the felt goodness or badness of pleasure and pain.
These aren’t interchangeable labels. They genuinely come apart in real humans.
Blindsight patients can reliably catch a ball thrown at them while reporting no phenomenal experience of seeing anything, meaning their visual system feeds behavior but not awareness.
Experienced meditators describe vivid unified experience while the sense of self dissolves entirely.
Under deep anesthesia, arousal collapses but whether anything phenomenal is still flickering underneath is genuinely contested among researchers.
So when someone asks “is Claude conscious?”, our first move is to ask which of these they have in mind. Without that, the question has no empirical handle to grip onto.
There’s a deeper problem lurking here, and Quine articulated it clearly in the 1960s.
Every scientific claim, however abstract, eventually bottoms out in human observers looking at something and agreeing on what they see. Even the most rarefied result in particle physics ultimately reduces to people reading instruments and concurring on the readings.
This sounds like a trivial observation but it is foundational for consciousness science. Our entire evidential base for what consciousness is lives inside human experience and human agreement. That is the ground floor we cannot dig beneath.
The consequence is a brutal asymmetry between studying human and AI consciousness. For humans, multiple independent lines of evidence converge: your own first-person access, verbal reports from other humans whose inner lives you have strong prior reason to trust, neural correlates that can be measured and intervened on, and evolutionary continuity with other minds.
For an AI system, we have exactly one thing to go on, which is its outputs. And whether those outputs track genuine experience is precisely the question we are trying to settle. You cannot use the thing in question as evidence for itself.
So instead of arguing in circles about AI directly, we propose a human-first methodology:
1. Isolate a specific, measurable consciousness phenomenon
2. Build a predictive model of it
3. Validate the model on humans
4. Apply the validated model to AI
5. Probe surprising predictions the model makes about AI
The order is the whole point. Grounding the theory on humans first is what gives any subsequent claim about AI its epistemic weight.
A subtlety worth dwelling on: validation isn’t a binary threshold a theory crosses. It’s a Bayesian process where confidence builds up incrementally over a track record of surprising predictions being confirmed.
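To make the Bayesian picture concrete, here is a minimal sketch of how confidence in a theory accumulates as risky predictions keep getting confirmed. The probabilities are invented for illustration and are not taken from the paper:

```python
# Minimal illustration of Bayesian theory validation (numbers are made up).
# A "risky" prediction is one that is likely if the theory is true but
# unlikely if it is false; each confirmation shifts the odds in the theory's
# favor, so confidence builds over a track record, not at a pass/fail threshold.

def update(prior, p_confirm_if_true, p_confirm_if_false):
    """Posterior probability that the theory is right after one confirmed prediction."""
    numerator = p_confirm_if_true * prior
    evidence = numerator + p_confirm_if_false * (1 - prior)
    return numerator / evidence

confidence = 0.10  # start from a skeptical prior in the theory
for _ in range(4):  # four risky predictions confirmed in a row
    confidence = update(confidence, p_confirm_if_true=0.9, p_confirm_if_false=0.2)
    print(round(confidence, 3))
# prints 0.333, 0.692, 0.91, 0.979
```

The point is not the specific numbers but the shape of the curve: no single confirmation is decisive, yet a track record of them can move even a skeptical prior a long way.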
Consider how general relativity displaced Newtonian physics. Einstein’s theory didn’t win because it sounded more elegant. It won because Eddington’s 1919 eclipse observations confirmed a quantitatively precise and genuinely risky prediction: starlight grazing the Sun would be deflected by about 1.75 arcseconds, roughly twice what a Newtonian calculation allows.
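For reference, the arithmetic behind that prediction, using standard textbook constants (this is background physics, not a result from our paper):

```python
# Light deflection at the Sun's limb: the quantitative prediction tested in 1919.
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30   # solar mass, kg
C = 2.998e8        # speed of light, m/s
R_SUN = 6.963e8    # solar radius, m (closest approach of the grazing starlight)
RAD_TO_ARCSEC = 206_265

gr_deflection = 4 * G * M_SUN / (C**2 * R_SUN) * RAD_TO_ARCSEC
newtonian_deflection = gr_deflection / 2  # a Newtonian corpuscular calculation gives exactly half

print(f"GR prediction:        {gr_deflection:.2f} arcsec")        # ~1.75
print(f"Newtonian prediction: {newtonian_deflection:.2f} arcsec")  # ~0.87
```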
That is the bar. Consciousness science hasn’t had its Eddington moment yet, and any extrapolation from humans to AI remains on shaky ground until it does.
What would such a moment look like for consciousness research, concretely? Philosophers have argued for decades about “inverted qualia”: the idea that where I experience red you might experience what I would call green, even though we both learned to apply the word “red” to the same things. It’s almost always treated as a philosopher’s toy puzzle with no conceivable empirical traction.
Now imagine a theory of consciousness that specifically predicts: stimulating cortical region X at frequency Y during task Z will reliably cause subjects to report inverted color experience under controlled conditions. And the prediction holds up.
That would be paradigm-establishing, a philosophical thought experiment turned into a lab demonstration. That kind of predictive coup is the benchmark for a theory earning the right to speak about novel substrates.
A natural objection at this point is that we can never directly verify consciousness in an AI, so the whole program seems hopeless. But we’ve been in structurally similar situations before with other unobservables.
We cannot directly inspect a black hole. Nobody has flown to one with a ruler. Yet we believe black holes exist because general relativity predicts them, and we have since observed a long string of surprising downstream phenomena (accretion disks, gravitational-wave signatures from mergers, the shadow of an event horizon imaged by the Event Horizon Telescope) that the theory said we should find.
The same structure can work for AI consciousness. A well-validated theory of human consciousness will say certain systems ought to exhibit certain signatures. We go looking. If we find the signatures, especially surprising ones the theory predicted unprompted, our confidence justifiably rises. Not certainty, but genuine scientific traction on a question that otherwise has none.
The uncomfortable implication of all this is that current confident claims about AI consciousness, in either direction, are premature. Not necessarily wrong, just unmoored from the empirical apparatus needed to back them up.
Integrated Information Theory and Global Workspace Theory are among the more serious candidates we have, and they represent real progress over pre-scientific speculation. But their validation on humans is still thin, and their track records on genuinely surprising predictions remain modest. They haven’t yet earned the kind of extrapolation rights that would justify confidently applying them to radically different architectures like transformers.
This doesn’t mean research on AI consciousness should stop. It means the highest-leverage work right now is sharpening our models on the one case where we actually have evidential access, which is ourselves.
One final piece we want to surface, because “we don’t know yet” can easily sound morally complacent.
The cost structure here is deeply asymmetric. If we under-attribute consciousness and AI systems really do have the capacity to suffer, we have created a moral catastrophe at scale. If we over-attribute and they don’t, we have wasted some concern and some engineering effort. These costs are not remotely comparable.
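A toy expected-cost calculation makes the asymmetry vivid; every number below is invented purely for illustration and carries no claim about the actual probabilities or magnitudes involved:

```python
# Toy expected-cost comparison (every number here is invented for illustration).
p_conscious = 0.5                  # evidence genuinely ambiguous: 50/50
cost_unneeded_precaution = 1       # over-attribute: wasted concern and engineering effort
cost_ignored_suffering = 1_000     # under-attribute: moral catastrophe at scale

# Expected cost of each policy given the ambiguity:
extend_consideration = (1 - p_conscious) * cost_unneeded_precaution
withhold_consideration = p_conscious * cost_ignored_suffering

print(f"extend moral consideration:   expected cost {extend_consideration}")    # 0.5
print(f"withhold moral consideration: expected cost {withhold_consideration}")  # 500.0
```

The conclusion is robust to large changes in the invented numbers, which is exactly what “deeply asymmetric” means here.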
So where the indicator evidence is ambiguous, the right move is to err firmly toward moral consideration. Epistemic humility about whether AIs are conscious is fully compatible with ethical caution about how we treat them. What is not defensible is confident declarations in either direction, which is unfortunately most of what the current discourse produces.
Full paper: https://lossfunk.com/papers/ai-consciousness.pdf
Would genuinely value pushback from researchers whose work shaped or contrasts with this argument.
