2024-02-08

Gemini Versus Gemini: Understanding Google's Latest... Thing

An artistic rendition of the constellation Gemini on the ceiling of Grand Central Terminal, New York city. It shows the major stars of the constellation along with an illustration of twin boys superimposed over the stars. The stars and illustration are golden on a dark turquoise background.
Gemini, the Constellation
Grand Central Terminal Ceiling
Photograph by Allen Firstenberg


In 2023, Google made a number of announcements around new Generative AI features, but there were two that were most notable:

  • In February, they announced a conversational AI system called Bard would be available to the public to answer questions and help with creative tasks.
  • In May, they announced that an upcoming model known as Gemini would be powering many Google services in the future, and it would be available for outside developers to use.


During 2023, both these products had several updates, culminating in a recent announcement that the names of these two products were being merged and both would be known as Gemini.

Which now raises the question: What, exactly, do we mean when we talk about Gemini?
Let's try to untangle all the terminology.


Gemini, the Model

Julius: "My name is Julius Benedict and I'm your twin brother."
Vincent: "Oh, obviously!"
-- Arnold Schwarzenegger and Danny DeVito,Twins (1988)

At the heart of all this discussion is a multimodal machine learning model family known as Gemini.

Although it was announced in May of 2023 at Google I/O, details and public announcements of what it could do weren't released until December 2023. At this time, we learned that it was a machine learning model that was specifically trained on multimodal content. This means that it was trained to handle words, pictures, videos, and other media "modes" natively.

Gemini was divided into three sizes, with the understanding that the larger versions were more capable or could handle more complex tasks. All three, however, were multimodal. From smallest to largest, the three sizes were:

  • Nano
  • Pro
  • Ultra

When released in December 2023, there were also several announcements about how the Gemini model would be used:
  • Google was switching all of its products that used the previous generations of models (in the LaMDA or PaLM families) to use Gemini.
  • The first product to make this switch would be the Google Bard chatbot, where Gemini Pro would be the underlying model for most regions in the world.
  • Developers would have access to the Gemini Pro model through a cloud-based API.
  • Some early testers would have access to Gemini Nano on select Android devices through a library.

 

Gemini, the API

Well... the APIs.
"The best part of working with your twin? You always have someone to blame if things go wrong"
-- Unknown
Shortly after the Gemini announcement in December 2023, the model was made available to developers through an Application Programming Interface (API) and a set of libraries for a variety of different programming languages. The API provided access to two different variants of the model:
  • gemini-pro
  • gemini-pro-vision

Both are similar, but the gemini-pro-vision version was trained to take images (and sometimes videos) along with text for the input, while the gemini-pro version was better trained to be more conversational. Both could only return text.

Both of these models were available using two different developer platforms:
  • The Google Generative AI platform, sometimes known as the MakerSuite platform or the Google AI Studio platform
  • The Google Cloud Vertex AI platform

The two platforms were substantially the same, but there were slight differences between the two:
  • The MakerSuite platform was simpler to get up and running since developers could use a simple authentication scheme known as an API Key.
  • The VertexAI platform had a few more features, including video support, since it built on other Google Cloud features including authentication.


Importantly, however, the underlying model used by both is the same: Gemini Pro.


Gemini, the application

What's in a name? That which we call a rose,
By any other word would smell as sweet.
-- Romeo and Juliet, Act II, Scene 2, by William "The Bard" Shakespeare
In February 2024, Google announced several major developments with the Bard chatbot, the most surprising of which was that it was being renamed to Gemini. It also indicated that the entire suite of professional assistance tools, formerly known as Duet AI, would also come under the Gemini brand.

Other changes and updates included:
  • A split in features:
    • The basic Gemini chat would be using the Gemini Pro model for text-based work for all countries that can access the chatbot
    • The introduction of a premium level called Gemini Advanced which uses the Gemini Ultra model.
  • New features, including the ability to generate images using the Imagen 2 model.
  • The initial launch of an app for Android and iOS

So the natural question is how does Gemini, the chat application or assistant, differ from Gemini, the API, or Gemini, the model?

Gemini chat is a consumer-level application that provides a way for people to ask conversational questions that are handled by the Gemini model. It also has features that go beyond what the Gemini model or API handle, including:
  • Generating images using the Imagen 2 model
  • Accessing a user's personal email or files in Google Drive
  • Having access to up-to-date information from the internet
While the Gemini tools in Workspace provide specialized assistance about Google Cloud and Google Workspace, such as code assistance.

With this change, it is important to understand two things about the Gemini API:
  • It does not provide the same features that Gemini chat does.
  • It does not let you access Gemini chat through an API.

While developers can do things like write programs that use the Gemini API and have similar features to Gemini chat or the other Gemini assistants - developers must write the code to implement those features..

The Gemini model is used by all of of these products, along with several other products from Google. It may have more features and capabilities than either use or make available at this time.


Gemini, the Conclusion

“When I use a word,” Humpty Dumpty said in rather a scornful tone, “it means just what I choose it to mean—neither more nor less.”
“The question is,” said Alice, “whether you can make words mean so many different things.”
“The question is,” said Humpty Dumpty, “which is to be master—that’s all.”
-- Through the Looking Glass, Lewis Carroll
While sometimes it is fine to use the term "Gemini" generically, we should make sure that it is clear what we're talking about.

If we're talking about the model, we should specify "the Gemini model" or a particular size such as "Gemini Pro".

If we're talking about the chat application, we should say "Gemini chat" or "the Gemini app" or talk about "Gemini Advanced chat". While if we're talking about other Google products under the Gemini name, we should be clear which we're talking about (such as "Gemini for Code Assistance").

If we're talking about developing, we'll probably talk about the "Gemini API" and possibly say which platform (Google AI Studio or Vertex AI) we're on. We may even talk about a particular model such as "gemini-ultra" or "gemini-pro-vision".

By following this guidance, we should make sure we are clearly understood. By human and AI model alike.



2024-01-04

Gemini, MakerSuite, API Keys, and "The caller does not have permission"

 On the Google Developer Community #gemini-api Discord channel, there have been a rise in the number of problems of people generating keys. Folks would say something like:

"I'm using MakerSuite with Gemini and I deleted an API Key. I went to create a new one, but I'm getting an error saying the caller does not have permission. What does that mean and how can I get a key?"


It took a few days to figure out what was going on, but we think we have a good solution. Let's take a look at what is going on, why, and what you can do about it.

API Keys and What Is Causing This

API Keys are a basic authorization system that lets Google authorize developers to access the Generative AI platform API, including API access to the Gemini model. Getting a key should be fairly easy - you select "Get API Key" at the Google AI Studio website and can then copy the key. You can then use this key in your code when you try to access the API.

Google uses this key to get a good idea how many different projects are accessing Gemini and makes sure it isn't being abused. Abuse prevention is important because the free tier for Gemini is limited to 60 queries per minute. It also leads to the problem people are now encountering.

Previously, you were able to create an unlimited number of keys. However, Google has apparently limited this to one key per project, probably as part of a plan to make sure people don't use multiple keys to get around the rate limit.

There appears to be a bug, however, where you can delete an API Key... but you won't be able to create a new one because Google thinks the key still exists.

In a way - it does.

The API Keys are actually associated with a Google Cloud project that gets created when you create a new key (the "Create API key in new project" button in the screen shot above). The MakerSuite console, however, hides this information behind the scenes to make it easier for developers to get started. When you delete a key, Google Cloud makes it so it can't be used - but also allows you to "undelete" it within 30 days in case there was a mistake. We'll use this fact to get your key back so you can use it.

Getting the Key Back

To get the key back, we'll go into the Google Cloud Console credentials page and restore the deleted key.

Go directly to the credentials page in Google Cloud Console at https://console.cloud.google.com/apis/credentials

  • Make sure the account is the same as the one you're using for MakerSuite. You should be able to see the account in the upper right hand corner.
  • If you have more than one project, make sure you're using the right one. The default keys are created in a project named "Generative Language Client", but you may have done it in a different project.

Select the "Restore Deleted Credentials" link.

For the key with the name "Generative Language API Key", select the "RESTORE" link.

On the pop-up, click the "RESTORE" button again.

Then click the back arrow to leave the "Deleted credentials" page.


On the Credentials page, you'll see that the credential has been restored.

And if you go back to the MakerSuite Google AI Studio, you'll see that the key now shows up there as well.

Creating a new API Key

In some cases, however, you actually want to keep the key deleted and need a new one. For example, you may have accidentally included the API Key in code that you posted on GitHub, and now need to invalidate it so nobody else can use it. But that means you'll need a new one.

You won't be able to use the Google AI Studio page to do this, but you will be able to do it through the Google Cloud Console page.

As above, you would go directly to the credentials page in Google Cloud Console at https://console.cloud.google.com/apis/credentials

    This time, however, you would select the "Create Credentials" link along the top and in the drop-down menu select "API key"

    The system will create the key and pop up a message saying it has done so, and let you copy the key at this time. More importantly, however, you'll see a warning that the key is unrestricted. This is a bad idea from a security perspective, so you should click on "Edit API key" to restrict how the key will be used.

    We want to restrict this so it can only use the Generative Language API, so we'll select the radio button to "Restrict key" and then make sure we locate and check the box next to "Generative Language API" and click on OK.


    We can also do other things from this page, such as change the name so it will be more obvious what it is used for, but that isn't necessary.

    Instead, we'll just save these settings.


    While the key will show up in the Google Cloud Console, it won't appear in the MakerSuite Google AI Studio key page. Instead, if you want to manage this key in the future, you'll need to do it from this page.

    Conclusion

    As I hope you've seen, while the MakerSuite Google AI Studio page simplifies managing your API Key for Gemini, you may sometimes need to use the Google Cloud Console Credentials page to address some issues you may encounter. Hopefully, this guide has made it relatively easy to navigate these tasks.

    If you have found this useful, please let me know. You can find my contact information on my website, prisoner.com. Or feel free to join the #gemini-api channel on the Google Developer Community Discord server.

    My thanks to the Googlers who have assisted in helping diagnose the problem and all the members of the #gemini-api channel who reported the problem and helped test this solution.