OpenAI is developing new voice models and aims to launch its first voice-first device by 2026, in line with the industry's shift toward screen-less interfaces.
OpenAI is quietly reorganizing part of its internal structure to focus on a front that has so far taken second place in the public conversation about artificial intelligence: audio. It's not just about making ChatGPT sound more natural or less robotic. As revealed by The Information, in recent months the company has merged several engineering, product and research teams with a specific goal: to prepare a new generation of audio models and, alongside them, a device designed from scratch as audio-first, with a launch planned for 2026.
The move raises an uncomfortable question for an industry that has orbited ever larger and more ubiquitous screens for more than a decade. What if the next step in personal computing isn't about seeing more, but hearing better? In Silicon Valley, that hypothesis is no longer marginal.
OpenAI's commitment is part of a broader trend sweeping much of the tech sector. More than a third of American homes already have a voice assistant, typically integrated into a smart speaker that acts as home infrastructure. Meanwhile, major players are exploring ways to translate such interactions into a less static context. Meta recently added a feature to its Ray-Ban glasses that uses a five-microphone array to isolate conversations in noisy environments, effectively turning the user's head into a directional listening system. Google, for its part, began experimenting in June with audio overviews, spoken summaries that turn search results into conversational narration. Even Tesla has integrated Grok, the chatbot developed by xAI, into its vehicles to handle navigation, climate control, or general questions through natural conversation.
In this context, OpenAI, while revealing few details, is reportedly working on a new voice model slated for the first quarter of 2026, with the goal of getting closer to a true personal assistant with its own personality. Unlike current voice features, which cannot speak and listen at the same time, this system would be able to respond while the user is still talking. That may sound like a technical detail, but it signals a profound change in the relationship between humans and machines.
The project is not limited to software. OpenAI is exploring a family of devices that would range from screen-less speakers to smart glasses, designed less as disposable tools and more as persistent companions. The idea aligns with the vision of ambient computing that Sam Altman, the company's CEO, has repeatedly advocated: systems that are present and attentive, but do not demand constant attention.
These ambitions have been boosted by the hiring of former Apple design chief Jony Ive following the $6.5 billion acquisition of his studio io in May. Ive has been critical of the addictive drift of many consumer devices and sees audio as an opportunity to correct some of the excesses of the past. That is no small argument at a time when screen saturation is beginning to be treated as a social and regulatory problem.
However, recent history offers reasons for caution. The Humane AI Pin, a screenless device that promised voice-based experiences and laser projection, ended up as a poster child for inflated expectations and poor execution after burning through hundreds of millions of dollars. Other experiments, such as the Friend AI pendant, which records its users' lives to provide companionship and context, have raised concerns about privacy and constant surveillance.
Despite these setbacks, the startup ecosystem continues to press on. Companies like Sandbar, as well as a new project led by Pebble founder Eric Migicovsky, are working on rings with listening capabilities and voice response that could reach the market in 2026.
On the industrial side, OpenAI is also preparing its supply chain for this push into hardware. According to UDN and other outlets cited by Benzinga, the company has shifted initial hardware orders from Luxshare to Foxconn, seeking to reduce its dependence on China and explore assembly in Vietnam or the United States. Known internally as "Gumdrop", the project is still in the planning stage and could debut as a compact device reminiscent of the iPod Shuffle, with a built-in microphone and camera and functions such as transcribing handwritten notes into ChatGPT.
This emphasis on sound poses particular technical challenges. Models need to be light enough to run, at least partially, on the device itself. Local processing reduces latency and cost, and alleviates some privacy concerns. Google has already moved in this direction with Gemini Nano on its Pixel phones. OpenAI could follow a similar path by developing customized versions of its models that run at the edge, without requiring the cloud to be available for every interaction.
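The edge-first logic described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not OpenAI's or Google's actual architecture: the function names (`run_local`, `run_cloud`) and the confidence threshold are invented for the example. The point is the routing pattern itself: try the small on-device model first, and only escalate to the cloud when it is unsure.

```python
# Hypothetical sketch of hybrid edge/cloud routing for a voice assistant.
# All names and thresholds are illustrative, not a real API.

def run_local(query: str) -> tuple[str, float]:
    """Stand-in for a small on-device model: fast and private, but limited.

    Returns an answer plus a confidence score in [0, 1].
    """
    simple_answers = {"what time is it": ("It's 10:00.", 0.9)}
    return simple_answers.get(query, ("", 0.2))


def run_cloud(query: str) -> str:
    """Stand-in for a large cloud model: slower and costlier, more capable."""
    return f"[cloud answer to: {query}]"


def answer(query: str, min_confidence: float = 0.7) -> str:
    """Prefer the local model; fall back to the cloud when it is unsure.

    Local-first routing is what cuts latency and keeps simple requests
    (and their audio) on the device.
    """
    text, confidence = run_local(query)
    if confidence >= min_confidence:
        return text
    return run_cloud(query)


print(answer("what time is it"))     # answered on-device
print(answer("summarize my notes"))  # escalated to the cloud
```

In a real system the confidence signal would come from the model itself, and the fallback decision would also weigh connectivity and battery state, but the trade-off is the same one the paragraph above describes: latency, cost, and privacy on one side, capability on the other.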
At the same time, the rise of audio is not limited to speech. AI-generated music is growing rapidly, with startups like Suno reaching annual revenues of more than $200 million, according to The Wall Street Journal. It's unclear whether OpenAI's new lineup will include music generation, but the commercial incentives are there, especially as the company looks to diversify its consumer business.
Beyond the technology, the shift toward voice interaction is reopening a debate that seemed settled. Voice is intimate, contextual, and often communal. Talking aloud to a machine in a shared space is not the same as typing on a screen. The issues around privacy, social ergonomics, and accessibility run deep and remain under-explored. The promise of more "natural" interaction comes with the risk of pervasive listening and constant surveillance.
In essence, OpenAI's bet on audio does not eliminate the screen, but it does displace it, making it a secondary element, activated only when necessary. It is an ambitious hypothesis born of sensory fatigue with the visual attention economy. But it also depends on the technology delivering on its promise, and on users embracing a new kind of digital presence that is less conspicuous yet potentially more intrusive.
The industry seems willing to explore this territory, though the outcome is far from clear. Whether audio establishes itself as the primary interface or remains a complement will depend on technological advances as well as cultural and regulatory constraints. OpenAI has placed a significant chip on the table. The board, however, is still in motion.
