"" What is ChatGPT 4o?

What is ChatGPT 4o?

ChatGPT 4o enables real-time audio and video conversations with an "emotional" chatbot.

In the GPT-4o demo, the new AI model reads a user's facial expressions and sings a bedtime story.


GPT-4o (o for "omni"), a significant new AI model was unveiled by OpenAI on Monday. It is said to be able to read emotional cues, respond to visual input and engage in real-time spoken conversations. It functions more quickly than GPT-4 Turbo, OpenAI's previous top model. ChatGPT users will be able to access it for free and it will be made accessible as an API service over the course of the next few weeks.


In a YouTube livestream titled "OpenAI Spring Update", OpenAI CTO Mira Murati and staff members Mark Chen and Barret Zoph unveiled the enhanced audio-conversation and image-comprehension capabilities, along with live demonstrations of GPT-4o in action.



ChatGPT 4o ("omni") can read a user's facial expressions



According to OpenAI, GPT-4o responds to audio inputs in an average of 320 milliseconds, comparable to the human response times in conversation reported in a 2009 study of conversational turn-taking. OpenAI says it trained GPT-4o as an entirely new model end-to-end across text, vision and audio, so that all inputs and outputs are processed by the same neural network.


Because GPT-4o is a new model merging all of these different modalities, OpenAI stated, the company is still just scratching the surface of what the model can achieve and where its limits lie.


OpenAI showcased GPT-4o's real-time audio conversation capabilities during the webcast, demonstrating genuinely responsive back-and-forth conversation without the 1-3 second latency typical of earlier models. The AI assistant even laughed, sang and added sound effects in its responses, and it appeared able to identify and respond to the user's emotions with ease.


The presenters also emphasised GPT-4o's improved visual comprehension. By sharing screenshots, text documents, photographs or charts, users can discuss the visual content with GPT-4o and get data analysis from it. During the demonstration, the model recognised emotions in selfies, analysed them and joked about the photos.


Furthermore, GPT-4o demonstrated improved speed and quality in more than 50 languages, which OpenAI says cover 97% of the global population. The model also showed off its real-time translation capabilities, enabling near-instantaneous translation during conversations between speakers of different languages.
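The live demo ran on spoken audio end-to-end, which a plain API call cannot replicate, but the translation behaviour itself can be approximated in text. The snippet below is a minimal, hedged sketch using the openai Python SDK's chat completions API; the model name "gpt-4o" and the example sentence are assumptions, not details confirmed by the announcement.

```python
# Hedged sketch: text-only translation via the chat completions API.
# Assumes the model is exposed under the name "gpt-4o"; the demo itself
# worked on live speech, which this example does not reproduce.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def translate(text: str, source: str, target: str) -> str:
    """Ask the model to translate a message between two languages."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model name
        messages=[
            {
                "role": "system",
                "content": f"Translate the user's {source} message into {target}. "
                           "Reply with the translation only.",
            },
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content


print(translate("¿Dónde está la estación de tren?", "Spanish", "English"))
```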


OpenAI first integrated conversational voice capabilities into ChatGPT in September 2023. Those features used a bespoke voice-synthesis technology for the output and Whisper, OpenAI's AI speech-recognition model, for the input.


OpenAI's multimodal ChatGPT interface previously chained three separate processes: transcription, intelligence and text-to-speech, each of which added latency. GPT-4o is said to do all of that at once. It "reasons across text, voice and vision," says Murati. A slide displayed behind Murati during the broadcast referred to this as an "omnimodel".
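For readers unfamiliar with that older pipeline, the sketch below shows roughly how the three stages fit together using the openai Python SDK: transcribe the user's audio, generate a text reply, then synthesise speech. Model names, the voice choice and file paths are illustrative assumptions; this is a rough approximation of the kind of pipeline described above, not OpenAI's actual implementation, and each network round trip adds the latency the unified model is meant to remove.

```python
# Rough sketch of the older three-stage voice pipeline
# (transcription -> language model -> text-to-speech).
# Model names, voice and file paths are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Transcription: convert the user's spoken audio to text with Whisper.
with open("user_question.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# 2. Intelligence: generate a text reply with a chat model.
reply = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": transcript.text}],
)

# 3. Text-to-speech: synthesise the reply back into audio.
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=reply.choices[0].message.content,
)
with open("assistant_reply.mp3", "wb") as out:
    out.write(speech.read())
```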



All ChatGPT users will have access to GPT-4o, according to OpenAI, with paying customers getting five times the rate limits of free users. The API has also been updated: GPT-4o comes with five times higher rate limits, half the cost and double the speed of GPT-4 Turbo.
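Once the model is available through the API, calling it should look much like calling GPT-4 Turbo today. The sketch below is a hedged example of a text-plus-image request through the chat completions API; the model name "gpt-4o" and the image URL are assumptions based on the announcement rather than confirmed details.

```python
# Hedged sketch: a text + image request via the chat completions API.
# Assumes the new model is exposed as "gpt-4o"; the image URL is a placeholder.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # assumed model name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this chart?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/chart.png"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```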


The features are reminiscent of the intelligent AI agent from the science-fiction movie Her (2013), whose main character grows attached to the AI's personality. Given how emotionally expressive GPT-4o is, it's not hard to imagine users developing a similar attachment to OpenAI's assistant. On safety, Murati acknowledged the unique problems presented by GPT-4o's real-time audio and image capabilities, and said the company will continue its incremental deployment over the next few weeks.

