Gemini Live, Google’s answer to the recently launched (in limited alpha) Advanced Voice Mode for OpenAI’s ChatGPT, is rolling out on Tuesday, months after being announced at Google’s I/O 2024 developer conference. The rollout was announced at Google’s Made by Google 2024 event.
Gemini Live lets users have “in-depth” voice chats with Gemini, Google’s generative AI-powered chatbot, on their smartphones. Thanks to an enhanced speech engine that delivers what Google claims is more consistent, emotionally expressive and realistic multi-turn dialogue, people can interrupt Gemini while the chatbot’s speaking to ask follow-up questions, and it’ll adapt to their speech patterns in real time.
Here’s how Google describes it in a blog post: “With Gemini Live [via the Gemini app], you can talk to Gemini and choose from [10 new] natural-sounding voices it can respond with. You can even speak at your own pace or interrupt mid-response with clarifying questions, just like you would in any conversation.”
Gemini Live is hands-free if you want it to be. You can keep speaking with the Gemini app in the background or when your phone’s locked, and conversations can be paused and resumed at any time.
So how might this be useful? Google gives the example of rehearsing for a job interview — a bit of an ironic scenario, but OK. Gemini Live can practice with you, Google says, giving speaking tips and suggesting skills to highlight when speaking with a hiring manager (or AI, as the case may be).
One advantage Gemini Live might have over ChatGPT’s Advanced Voice Mode is a better memory. The generative AI models underpinning Live, Gemini 1.5 Pro and Gemini 1.5 Flash, have a longer-than-average “context window,” meaning they can take in and reason over a lot of data (theoretically hours of back-and-forth conversations) before crafting a response.
“Live uses our Gemini Advanced models that we have adapted to be more conversational,” a Google spokesperson told TechCrunch via email. “The model’s large context window is utilized when users have long conversations with Live.”
We’ll have to see how well this all works in practice, of course. If OpenAI’s setbacks with Advanced Voice Mode are any indication, rarely do demos translate seamlessly to the real world.
On that subject, Gemini Live doesn’t have one of the capabilities Google showcased at I/O just yet: multimodal input. Back in May, Google released pre-recorded videos showing Gemini Live seeing and responding to users’ surroundings via photos and footage captured by their phones’ cameras, for example naming a part on a broken bicycle or explaining what a portion of code on a computer screen does.
Multimodal input will arrive “later this year,” Google said, declining to provide specifics. Also later this year, Live will expand to additional languages and to iOS via the Google app; it’s only available in English for the time being.
Gemini Live, like Advanced Voice Mode, isn’t free. It’s exclusive to Gemini Advanced, a more sophisticated version of Gemini that’s gated behind the Google One AI Premium Plan, priced at $20 per month.
Other new Gemini features on the way are free, though.
Android users can soon (in the coming weeks) bring up Gemini’s overlay on top of any app they’re using to ask questions about what’s on the screen (a YouTube video, for example) by holding their phone’s power button or saying “Hey Google.” Gemini will be able to generate images (but still not images of people, unfortunately) directly from the overlay — images that can be dragged and dropped into apps like Gmail and Google Messages.
Gemini is also gaining new integrations with Google services (or “extensions,” as the company prefers to call them) both on mobile and the web. In the coming weeks, Gemini will be able to take more actions with Google Calendar, Keep, Tasks, YouTube Music and Utilities, the apps that control on-device features like timers and alarms, media controls, the flashlight, volume, Wi-Fi, Bluetooth and so on.
In a blog post, Google gives a few ideas of how people might take advantage. Sounds nifty, assuming it all works reliably:
- Ask Gemini to “make a playlist of songs that remind me of the late ’90s.”
- Snap a photo of a concert flier and ask Gemini if you’re free that day — and even set a reminder to buy tickets.
- Have Gemini dig out a recipe in your Gmail and ask it to add the ingredients to your shopping list in Keep.
Lastly, starting later this week, Gemini will be available on Android tablets.