"I had worked hard for nearly two years, for the sole purpose of infusing life into an inanimate body." Mary Shelley wrote that in 1818. I read it in an art history seminar in my twenties, and it never left. Now I am standing in my basement, soldering wires into a robotic head (why did I choose the head?), and the sentence has become less literary allusion than project status update.
The project is called Ceous Titan: an animatronic I am building from servos, 3D-printed components, and an AI backbone. It started as curiosity. It became a compulsion. And the question I keep returning to is not how to build it but why. Why software is not enough. Why I feel pulled toward giving intelligence a physical form. And whether that pull is personal idiosyncrasy or something the field needs to take seriously.
The evidence that this impulse extends well beyond hobbyists is everywhere. Budapest-based Allonic raised $7.2 million in pre-seed funding in February 2026 to solve what they call the body problem. Between 13,000 and 18,000 humanoid robots were sold globally in 2025, and the market is projected to reach $15 billion by 2030 (MarketsandMarkets, 2025). But physical robots are only one form the body takes. AI avatars are now standard in customer service, education, and healthcare. Persistent voice personas are developing recognizable presence. Digital humans in VR and mixed reality are becoming a legitimate interaction medium. The question of what it means for AI to have a body spans a spectrum from a humanoid moving through physical space to an avatar that maintains consistent visual identity across interactions. The design questions are different at every point on that spectrum. What holds them together is a shared intuition: that presence changes the interaction in ways absence cannot.
The hypothesis: the impulse to embody AI reflects a structural gap in how we design intelligent systems, and the convergence of falling hardware costs, open-source AI models, and new manufacturing methods is turning that impulse into an industry.
Three takeaways
Software without a body hits a ceiling.
Varela, Thompson, and Rosch (1991) argued that cognition arises from the dynamic interaction between an organism, its body, and its environment. For decades that remained theoretical. It is now arriving as an engineering problem. Fei-Fei Li has articulated this most clearly through her work on spatial intelligence and what she calls world models: internal representations that allow a system to understand, reason about, and predict the behavior of three-dimensional physical environments (Li, 2025). A world model is what lets you catch a set of keys tossed across a room, pour coffee without looking at the mug, or navigate a crowded sidewalk. It connects perception to action through physics, geometry, and spatial reasoning. Li's argument is that today's large language models remain "wordsmiths in the dark," eloquent but ungrounded. They can describe a room but cannot navigate it. Without hardware inputs that ground the system in physical reality, AI cannot build the world models that would make it genuinely capable in embodied contexts. This is not a training data problem. It is a modality problem. The data for spatial intelligence, as Li puts it, "is all in our heads. It is not accessible like language." The body is what gets the system from language to world.
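Catching tossed keys, in Li's example, is implicitly a prediction problem, and a world model carries exactly this kind of physics. A toy illustration using standard projectile kinematics; the function name and inputs are illustrative, not drawn from Li's work:

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def landing_point(x0, y0, vx, vy):
    """Predict where an object launched from (x0, y0) with velocity
    (vx, vy) crosses the ground plane y = 0.

    This is the minimal predictive content of a world model for the
    keys-tossed-across-a-room case: solve y0 + vy*t - (1/2)*G*t^2 = 0
    for the positive root, then project the horizontal motion.
    """
    disc = vy * vy + 2 * G * y0          # discriminant of the quadratic in t
    t = (vy + math.sqrt(disc)) / G       # time of flight (positive root)
    return x0 + vx * t                   # horizontal position at impact
```

A language model can describe this trajectory fluently; the point of the sketch is that a world model must compute it, continuously and from perception, which is a different modality of knowledge.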
Bainbridge et al. (2011) found that physically present robots produced higher compliance, more positive evaluations, and longer engagement than video-displayed agents performing identical tasks. Embodiment is not decoration. It changes what the interaction can do, and the world model argument explains part of why: a system grounded in physical space processes the interaction differently than one operating through text or video alone.
Presence changes the relationship in ways absence cannot.
Dreyfus (1972) drew on Heidegger and Merleau-Ponty to argue that human expertise depends on embodied, situated engagement with the world: a form of knowing that cannot be reduced to rules or representations. Large language models have surprised Dreyfus's heirs with their capability, but his core challenge persists. An avatar that maintains a consistent face and voice changes the trust dynamics of an interaction in ways a text interface does not. A robot that moves through space changes accountability structures in ways software alone cannot. These are design problems, not just engineering problems. They produce knowledge that code alone does not, and they take different forms depending on what kind of body the AI has: physical, visual, vocal, or some hybrid we have not yet named.
Building the body changes the builder.
Sennett (2008) argued that working with materials produces knowledge that purely intellectual engagement cannot replicate. When I moved from building AI agents in code to building physical systems, my understanding of failure modes, interaction design, and the gaps between intended and actual behavior shifted in ways I did not anticipate. Physical constraints (latency, weight, friction, heat) forced a different kind of engineering attention. The abstraction layer that makes software elegant is the same layer that conceals what happens when intelligence meets the world. The material teaches you things the abstraction conceals.
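One concrete form that attention takes is timing discipline. In software, an iteration can run as long as it likes; a servo loop has a fixed period, and every missed deadline shows up as physical jitter in the mechanism. A minimal sketch of the shift, assuming a hypothetical set_angle() driver function standing in for real servo hardware:

```python
import time

CONTROL_PERIOD = 0.02  # 50 Hz update, a typical hobby-servo refresh rate

def set_angle(angle_deg):
    """Placeholder for a real servo driver call (hypothetical)."""
    pass

def control_loop(target_angles, max_step_deg=2.0):
    """Step toward each target at a bounded rate, counting missed deadlines.

    Two constraints the abstraction layer hides: the servo cannot jump
    (so each step is rate-limited), and the loop must finish each cycle
    within CONTROL_PERIOD or the motion visibly stutters.
    """
    current = 0.0
    overruns = 0
    for target in target_angles:
        start = time.monotonic()
        # Physical constraint: bound the step so the mechanism can follow.
        step = max(-max_step_deg, min(max_step_deg, target - current))
        current += step
        set_angle(current)
        elapsed = time.monotonic() - start
        if elapsed > CONTROL_PERIOD:
            overruns += 1  # a missed deadline that software alone never surfaces
        else:
            time.sleep(CONTROL_PERIOD - elapsed)
    return current, overruns
```

Nothing in this loop is conceptually difficult, which is the point: the difficulty lives in the constraints, not the logic, and you only meet the constraints by building the thing.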
The longer view
Merleau-Ponty's (1945/1962) concept of the body schema offers the deepest lens here. The body, in his account, is not merely an object we possess but the medium through which we engage with the world. When we use a tool long enough, it becomes an extension of the body schema: the blind person's cane, the surgeon's scalpel. The question for embodied AI is whether a physically present system can become part of a human's extended body schema in ways that screen-based interfaces cannot. Human-robot interaction research suggests the answer is yes (Bainbridge et al., 2011), and the implications for trust are significant: trust calibrated through physical co-presence may be qualitatively different from trust calibrated through a chat window. If that is true, then organizations deploying AI into high-stakes, trust-dependent contexts may find that embodiment is not a luxury but a design requirement.
But Merleau-Ponty's insight cuts in a second direction that matters for AI systems themselves, not just for the humans who interact with them. If intelligence is constituted through bodily engagement with the world rather than merely housed in it, then intelligence without a body is intelligence without its generative substrate. This is the philosophical foundation beneath Li's engineering argument. The reason software alone cannot build world models is not a limitation of compute or data. It is that the kind of knowledge a world model requires (spatial, physical, dynamic) is produced through interaction with the world, not through description of it. Giving AI a body is not just about changing the human's experience of the system. It is about changing what the system can know. The perception-action loop that Li describes as the core of spatial intelligence is Merleau-Ponty's body schema translated into an engineering specification: a system that perceives, acts, receives feedback, and updates its model of the world through that cycle. Varela, Thompson, and Rosch (1991) called this the enactive approach. Li is building the infrastructure to make it real.
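The act-perceive-update cycle can be made concrete with a toy example: an agent whose actuator is miscalibrated can recover the true calibration only by acting and sensing the result, never by description alone. A sketch under stated assumptions; the names and the learning rule are illustrative, not taken from Varela or Li:

```python
def calibrate_through_action(true_gain, cycles=50, lr=0.2):
    """Toy perception-action loop: learn an actuator's true gain by acting.

    Each cycle the agent issues a unit command, perceives how far it
    actually moved, and updates its internal model from the discrepancy.
    The true gain is never told to the agent; it is recoverable only
    through the act -> perceive -> update cycle.
    """
    est_gain = 1.0      # internal world model: "a unit command moves me 1.0"
    position = 0.0
    for _ in range(cycles):
        command = 1.0
        moved = command * true_gain           # act: the world responds
        position += moved
        predicted = command * est_gain        # what the model expected
        est_gain += lr * (moved - predicted)  # update the model from feedback
    return est_gain
```

Run with any true gain and the estimate converges to it; withhold the ability to act, and the estimate never improves. That asymmetry is the enactive claim in miniature.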
The history of automata provides the longer arc. Humans have been building mechanical beings for millennia: from ancient Greek myths of Talos to Vaucanson's eighteenth-century mechanical duck to modern animatronics. Kang (2011) traces this impulse through Western history and argues that it reflects fundamental questions about the nature of life, consciousness, and what it means to create. Ceous Titan places me in a tradition much older than AI. What has changed is that the intelligence available to inhabit these bodies is now genuinely capable, the bodies themselves are becoming affordable and manufacturable at scale, and we are beginning to understand (through Li, through Varela, through Merleau-Ponty) that the body is not a container for intelligence but a condition of it. The question is no longer whether we can give AI a body. It is what intelligence becomes when we do.
My two cents
I do not fully understand the impulse yet, and I am comfortable saying that. What I know is that every time I move from software to hardware, something in my understanding of AI shifts. The object in front of me talks back in ways that code on a screen does not. Shelley already wrote this story. The horror of Frankenstein is not the creation of intelligence. It is the creation of a body, and the refusal to reckon with what that body demanded. The body changed the relationship. It always does.
There is a research insight hiding in this experience, something about how embodiment changes the trust dynamics, the capability ceiling, and the co-evolution patterns between humans and AI systems. Whether the body is a robot moving through a warehouse, an avatar maintaining a consistent face across customer interactions, or a voice persona that accumulates presence over time, the convergence of capable AI and falling costs across all these forms means this is no longer a fringe question. It is arriving as an industrial reality, and the organizations that understand what changes when AI has a body (any kind of body) will have an advantage over those that encounter embodiment for the first time at deployment.
If you work in AI strategy or research and have never built something physical, consider it. A robotic arm, a sensor system, a physical computing project. The point is not to become a roboticist. The point is to encounter AI's relationship to the physical world firsthand, because that frontier is arriving faster than most strategy conversations acknowledge, and the people who have touched the material will see things the people who have not will miss.
Read to learn more
Academic: Varela, F. J., Thompson, E., & Rosch, E. (1991). The embodied mind: Cognitive science and human experience. MIT Press.
Industry: Li, F.-F. (2025). From words to worlds: Spatial intelligence is AI's next frontier. https://drfeifei.substack.com/p/from-words-to-worlds-spatial-intelligence
References
Bainbridge, W. A., Hart, J. W., Kim, E. S., & Scassellati, B. (2011). The benefits of interactions with physically present robots over video-displayed agents. International Journal of Social Robotics, 3(1), 41–52.
Dreyfus, H. L. (1972). What computers can't do: A critique of artificial reason. Harper & Row.
Kang, M. (2011). Sublime dreams of living machines: The automaton in the European imagination. Harvard University Press.
Li, F.-F. (2025). From words to worlds: Spatial intelligence is AI's next frontier. https://drfeifei.substack.com/p/from-words-to-worlds-spatial-intelligence
MarketsandMarkets. (2025). Humanoid robot market: Global forecast to 2030.
Merleau-Ponty, M. (1962). Phenomenology of perception (C. Smith, Trans.). Routledge. (Original work published 1945)
Sennett, R. (2008). The craftsman. Yale University Press.
Shelley, M. (1818). Frankenstein; or, the modern Prometheus. Lackington, Hughes, Harding, Mavor & Jones.
Varela, F. J., Thompson, E., & Rosch, E. (1991). The embodied mind: Cognitive science and human experience. MIT Press.