A glimpse of the (near) future: ChatGPT voice app coaches a learner in real time while they solve a maths problem. See the whole video here.
Dear reader,
Welcome to BN Edition: concise analysis on the stories that offer us hints at our unfolding future.
Each edition takes the most important stories from recent weeks and asks three things:
What? The story in a few sentences.
So what? Why do I need to know?
What next? What do I need to do or watch out for?
It’s been A WEEK in AI.
There was a blizzard of announcements from two of the biggest players in AI: OpenAI and Google.
This week, find out about the two most important announcements:
OpenAI launched 4o
Google launched AI features at I/O
OpenAI launches 4o
What?
On Monday, ChatGPT creator OpenAI announced a faster, better, stronger version of its GPT-4 model, calling it 4o (the lowercase o stands for omni).
Here are the top headlines:
It’s connected to the web: ChatGPT-4 was trained on information only up to December 2023; 4o can search for information in real time, retrieve the latest news, and find specific data or sources.
You can talk to it: There have been impressive improvements to its conversation feature (accessed via the headphones symbol in the app), including faster responses and the ability to read and express simulated emotions. Side note: if you choose the voice ‘Sky’ in your Settings, it sounds remarkably like Scarlett Johansson in the movie Her.
It’s multimodal: It accepts any combination of text, audio, and images as input and can generate any combination of text, audio, and images as output (a minimal API sketch follows this list).
It’s multilingual: A big upgrade is its new expressive, natural voice chat and translation features. Previous models weren’t great at languages other than English; this model can perform real-time translation – you speak and it immediately speaks for you – in 50 languages.
There’s a desktop app incoming: Soon, all Plus users will be able to download a ChatGPT macOS app (a Windows version is coming later in the year). By pressing Option + Space, you can instantly ask ChatGPT a question about something (anything) on your screen.
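For any developers reading, here’s a minimal sketch of what that multimodal claim looks like in practice: one request mixing text and an image, sent to the new model via OpenAI’s API. It’s a sketch under assumptions, not a definitive implementation – it assumes the official `openai` Python SDK (v1.x), an `OPENAI_API_KEY` in your environment, access to the “gpt-4o” model, and a hypothetical image URL. (The new voice features live in the ChatGPT apps rather than this code path, so the example sticks to text and images.)

```python
# Minimal sketch: one request mixing text and an image with the OpenAI Python SDK.
# Assumes: `pip install openai` (v1.x), OPENAI_API_KEY set in the environment,
# and account access to the "gpt-4o" model. The image URL is hypothetical.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            # A single message can carry several content parts of different types.
            "content": [
                {"type": "text", "text": "What maths concept does this diagram show?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/triangle-diagram.png"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

The call shape is the same as a text-only prompt; the image is simply another entry in the content list – which is what “any combination of inputs” means in practice.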
Three important things to note about this update:
The improvements were made immediately available to customers of ChatGPT Plus and Team.
Standard GPT-4 will be free to everyone from next week, and free users will also be able to use the new 4o in limited amounts. So if you previously only had access to the 3.5 model, you’ll be upgraded to the much better GPT-4 model for free!
The GPT Store (the ChatGPT version of the App Store) will also be open to everyone for free.
All of this can seem abstract until you try it.
Here are some demo videos. Watching these, you can see how big a change is coming:
Source: OpenAI
So what?
This is a hugely democratising, yet highly political move from OpenAI.
This is how they positioned it:
“In line with our mission, we are focused on advancing AI technology and ensuring it is accessible and beneficial to everyone. Today we are introducing our newest model, GPT-4o, and will be rolling out more intelligence and advanced tools to ChatGPT for free.”
Released a day before Google was due to share major news of its own, OpenAI got out ahead to start the announcement season: OpenAI on Monday, Google on Tuesday, Microsoft next week, and then in June, Cisco, Apple and others making their own AI announcements.
Usually, costs come down gradually as technologies mature. But by making its most powerful models free for all right now, OpenAI is trying to win by putting its product in as many hands as possible.
What next?
Clearly, the race is on.
Our advice is:
Double down on AI plans: team literacy and capability, security, innovation, strategy and policy are even more important.
Compare with Google’s I/O announcements: the timing of OpenAI’s launch was a spoiler tactic, and it seriously upstaged Google’s AI news. We’re now living in a world where Google is behind. It feels like the axis is shifting.
Get hands on: As with the previous version, the best way to understand the new model and features is to try them out for yourself.
FAQs
We had a few immediate client questions about this update:
Q: What's the point of paying for premium ChatGPT Team licences?
ChatGPT Team is still the most secure option – chats held within a Team account are not shared with OpenAI and not used to train the model. Within a Team account, or indeed a Plus account (the one you previously needed to access GPT-4), you now get access to all the new features at no extra cost. And it’s only $25 a month (still the “bargain of the century” in our view).
This sudden shift in capabilities and pricing also shows why paying for monthly services at the moment is a better option than committing to multi-year contracts and solutions. Gen AI isn’t software as we know it, and it shouldn’t be bought like traditional software. Not yet.
Q: Is this ChatGPT-5?
No. OpenAI still plans to launch ChatGPT-5 this year. This is a sort of bonus update to the existing model – faster and easier to use.
Q: Is it better than ChatGPT-4?
Yes. After a day’s experimenting, we’ve found it to be considerably better in terms of:
Speed: The model is 2x faster
Image generation: It can now include text in an image
Voice capability: It sounds and feels more human and emotive
What’s NOT different, though, is its level of intelligence.
It will still get things wrong and make things up. But because this model has been trained on voice, text and image at the same time, it doesn’t have that one-dimensional text-only output of previous models.
That’s what makes it seem like it has more human-level intelligence.
Google launches AI features at I/O
What?
24 hours later, on Tuesday, Google held its I/O event – announcing a collection of AI features for Google users, primarily focused on the new capabilities of the Gemini AI models.
Key features include:
Video-based searches via Google Lens
Integration of Gemini in Google Workspace
The launch of Gemini 1.5 Flash for quick-response tasks
They also launched two new innovations:
Project Astra: a multimodal AI assistant aimed at transforming virtual assistance (think Siri, but better?)
Veo: a generative video model (similar to OpenAI’s Sora)
Source: Google
So what?
The short version of Google’s three-hour event was: “let Google do the Googling for you”. Their intention is clear – to keep their edge by redefining search with AI.
Google and OpenAI’s presentations this week could not have been more different in style. Google’s was very much your typical big tech update – huge stage, celebrity ambassadors, the CEO reading from an autocue and a rapturously applauding audience. OpenAI are positioning themselves as the antidote to that: their announcement was essentially a fireside chat, almost entirely demos. Google had a lot to talk about, but it felt conceptual and intangible. (Admittedly, Google’s was a ‘developer conference’ and OpenAI’s was for everyone, but Google must have known the world was watching.)
But despite the difference in delivery, it’s clear both Google and OpenAI share a vision for how AI will integrate with our lives: as a kind of omnipresent, omniscient digital assistant, or as multiple, connected ‘agents’.
What next?
Multimodal redefined: Both Google and OpenAI have launched models that can see what’s around you and comment on it.
Multimodal AI systems can process and understand data in different forms, like text, speech, images and video. Previous applications of this meant creating content in one format from input in another, or describing photographs and diagrams supplied as input data.
OpenAI and Google’s new apps this week mean AI can act as a co-pilot in the real world, interpreting and commenting on what it sees around you or on your screen. OpenAI brought this real-time conversation feature to life in a demo where the founder of Khan Academy and his son worked on a maths problem on an iPad, with ChatGPT coaching them as they went.
Still, nobody knows anything: Although we have big tech telling us what the future is going to be like, nobody really knows what this will mean long-term yet.
Amara's Law states:
"We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run."
Initial hype over new technologies leads to early disappointment, only to be followed by significant, transformative impacts over time. This insight is crucial for understanding the trajectory of AI and digital transformation, where real value unfolds gradually but profoundly. And the only way to stay ahead is to stay engaged and stay learning.
What will next week bring?
The Brilliant Noise team