Code is a Four Letter Word: February 2024

An artistic rendition of the constellation Gemini on the ceiling of Grand Central Terminal, New York city. It shows the major stars of the constellation along with an illustration of twin boys superimposed over the stars. The stars and illustration are golden on a dark turquoise background.

Gemini, the Constellation
Grand Central Terminal Ceiling
Photograph by Allen Firstenberg

In 2023, Google made a number of announcements around new Generative AI features, but there were two that were most notable:

In February, they announced a conversational AI system called Bard would be available to the public to answer questions and help with creative tasks.
In May, they announced that an upcoming model known as Gemini would be powering many Google services in the future, and it would be available for outside developers to use.

During 2023, both these products had several updates, culminating in a recent announcement that the names of these two products were being merged and both would be known as Gemini.

Which now raises the question: What, exactly, do we mean when we talk about Gemini?
Let's try to untangle all the terminology.

Gemini, the Model

Julius: "My name is Julius Benedict and I'm your twin brother."
Vincent: "Oh, obviously!"
-- Arnold Schwarzenegger and Danny DeVito,Twins (1988)

At the heart of all this discussion is a multimodal machine learning model family known as Gemini.

Although it was announced in May of 2023 at Google I/O, details and public announcements of what it could do weren't released until December 2023. At this time, we learned that it was a machine learning model that was specifically trained on multimodal content. This means that it was trained to handle words, pictures, videos, and other media "modes" natively.

Gemini was divided into three sizes, with the understanding that the larger versions were more capable or could handle more complex tasks. All three, however, were multimodal. From smallest to largest, the three sizes were:

Nano
Pro
Ultra

When released in December 2023, there were also several announcements about how the Gemini model would be used:

Google was switching all of its products that used the previous generations of models (in the LaMDA or PaLM families) to use Gemini.
The first product to make this switch would be the Google Bard chatbot, where Gemini Pro would be the underlying model for most regions in the world.
Developers would have access to the Gemini Pro model through a cloud-based API.
Some early testers would have access to Gemini Nano on select Android devices through a library.

Gemini, the API

Well... the APIs.

"The best part of working with your twin? You always have someone to blame if things go wrong"
-- Unknown

Shortly after the Gemini announcement in December 2023, the model was made available to developers through an Application Programming Interface (API) and a set of libraries for a variety of different programming languages. The API provided access to two different variants of the model:

gemini-pro
gemini-pro-vision

Both are similar, but the gemini-pro-vision version was trained to take images (and sometimes videos) along with text for the input, while the gemini-pro version was better trained to be more conversational. Both could only return text.

Both of these models were available using two different developer platforms:

The Google Generative AI platform, sometimes known as the MakerSuite platform or the Google AI Studio platform
The Google Cloud Vertex AI platform

The two platforms were substantially the same, but there were slight differences between the two:

The MakerSuite platform was simpler to get up and running since developers could use a simple authentication scheme known as an API Key.
The VertexAI platform had a few more features, including video support, since it built on other Google Cloud features including authentication.

Importantly, however, the underlying model used by both is the same: Gemini Pro.

Gemini, the application

What's in a name? That which we call a rose,
By any other word would smell as sweet.
-- Romeo and Juliet, Act II, Scene 2, by William "The Bard" Shakespeare

In February 2024, Google announced several major developments with the Bard chatbot, the most surprising of which was that it was being renamed to Gemini. It also indicated that the entire suite of professional assistance tools, formerly known as Duet AI, would also come under the Gemini brand.

Other changes and updates included:

A split in features:

The basic Gemini chat would be using the Gemini Pro model for text-based work for all countries that can access the chatbot
The introduction of a premium level called Gemini Advanced which uses the Gemini Ultra model.

New features, including the ability to generate images using the Imagen 2 model.
The initial launch of an app for Android and iOS

So the natural question is how does Gemini, the chat application or assistant, differ from Gemini, the API, or Gemini, the model?

Gemini chat is a consumer-level application that provides a way for people to ask conversational questions that are handled by the Gemini model. It also has features that go beyond what the Gemini model or API handle, including:

Generating images using the Imagen 2 model
Accessing a user's personal email or files in Google Drive
Having access to up-to-date information from the internet

While the Gemini tools in Workspace provide specialized assistance about Google Cloud and Google Workspace, such as code assistance.

With this change, it is important to understand two things about the Gemini API:

It does not provide the same features that Gemini chat does.
It does not let you access Gemini chat through an API.

While developers can do things like write programs that use the Gemini API and have similar features to Gemini chat or the other Gemini assistants - developers must write the code to implement those features..

The Gemini model is used by all of of these products, along with several other products from Google. It may have more features and capabilities than either use or make available at this time.

Gemini, the Conclusion

“When I use a word,” Humpty Dumpty said in rather a scornful tone, “it means just what I choose it to mean—neither more nor less.”
“The question is,” said Alice, “whether you can make words mean so many different things.”
“The question is,” said Humpty Dumpty, “which is to be master—that’s all.”
-- Through the Looking Glass, Lewis Carroll

While sometimes it is fine to use the term "Gemini" generically, we should make sure that it is clear what we're talking about.

If we're talking about the model, we should specify "the Gemini model" or a particular size such as "Gemini Pro".

If we're talking about the chat application, we should say "Gemini chat" or "the Gemini app" or talk about "Gemini Advanced chat". While if we're talking about other Google products under the Gemini name, we should be clear which we're talking about (such as "Gemini for Code Assistance").

If we're talking about developing, we'll probably talk about the "Gemini API" and possibly say which platform (Google AI Studio or Vertex AI) we're on. We may even talk about a particular model such as "gemini-ultra" or "gemini-pro-vision".

By following this guidance, we should make sure we are clearly understood. By human and AI model alike.

Code is a Four Letter Word

2024-02-08

Gemini Versus Gemini: Understanding Google's Latest... Thing

Gemini, the Model

Gemini, the API

Gemini, the application

Gemini, the Conclusion