2024-06-05

LangChain.js and Gemini: Getting Started

Gemini is an exciting new set of generative AI models from Google that work with text and, in some forms, other media such as images or video. While getting started is simple, the number of options for running Gemini and securely authenticating yourself can seem complicated. Fortunately, LangChain.js makes this process easier.


In this tutorial, we will go over the two platforms you can use to run Gemini: the Google AI Studio API (sometimes called the "Gemini API") and Google Cloud's Vertex AI platform. These two platforms support different authentication methods. We'll look at the methods available, when each is the best choice, and how to use them. Finally, we'll look at some code and see how this all fits into the LangChain.js architecture.


If you're eager - you can skip ahead to the code section to see what it will look like. But be sure to return to the top to understand how to pick a platform and authentication method.

Where Gemini runs

The Gemini family of models from Google forms the underlying infrastructure for a number of different products. We won't be focusing on all these platforms, but for more info you can check out the article I've already written about them.




As developers, we are focusing on the platforms we can access using LangChain.js - in particular, the two platforms that let us use an API to access the Gemini models:

  • Google's Generative AI platform, which can be accessed using Google AI Studio (previously known as MakerSuite)

  • Google Cloud's Vertex AI platform


Gemini works quite similarly on these two platforms, and the LangChain.js classes hide most of the differences between the two, so you can switch from one platform to the other with little effort. If you're familiar with how OpenAI provides access to the GPT models, and how the GPT models are also available using Microsoft's Azure cloud platform, the concepts here are similar.


Why are there two platforms? As strange as it sounds - simplicity.

  • The AI Studio-based platform provides an easy way to get started using Gemini and a free tier to experiment with. While there are some restrictions on this free tier (including what countries it can be used in), it is good to get started with since authentication can be done fairly easily.

  • The Vertex AI-based platform is better suited for projects that are already using Google Cloud. It provides some additional features that work well with other Google Cloud components, but has additional authentication requirements.


Much of the time, you will likely start with the AI Studio-based system and authentication, and then move to Vertex AI as your requirements change. LangChain.js makes this relatively easy, as we'll see.


Since the biggest difference between the two is how we authenticate to the platform, let's take a look at this. 

Authenticating to Gemini

In order to prevent abuse, Gemini ties each of your requests to a project in your Google account. To identify which account is being used, Google needs each request to be authenticated. There are two major ways this authentication is done:

  • An API Key, which you will send with each request.

  • OAuth with Google Cloud Authentication, which is usually done either with an explicit service account from Google Cloud that has been granted access to the API, or with Application Default Credentials associated with the Google Cloud project.


API Keys are fairly straightforward to use, but are somewhat less secure, since anyone who gets hold of the key can use it - so the key must be protected. While API Keys can be used from web pages, it is advised that this only be done during prototyping, since the key is accessible through the page source code, allowing anyone to use your key to access Gemini. Only AI Studio's API allows access through an API Key (although some methods require OAuth).


OAuth and Google Cloud Authentication is more secure, but does require a bit more setup. If you're using other Google services, or running your code on a Google Cloud Platform service, this can simplify some tasks. Both AI Studio and Vertex AI can use this method.


Let's go into some details about how to create the authentication credentials we can use. You will only need to use one of these strategies - but it can be useful to understand all of them for the future.

Getting an API Key

This is a pretty straightforward process that will give you a string that you can use as an API Key. (We'll see how to use this below.)


  1. Go to https://aistudio.google.com and click on the "Get API Key" button at the top of the left navigation, or go directly to https://aistudio.google.com/app/apikey

  2. If this is your first API key, click on "Create API Key in new project". (If you already have an API key and you need another, you can use "Create API key in existing project".)

  3. It will generate a new API Key for you. You can copy it at this time, or you can return to this page later and view your keys to get it.


This key will allow you to make calls using the Generative AI platform's Gemini API, but not take other actions. If you do need to do more, you should consider switching to a service account, described below.


Above all - remember to protect this key. You shouldn't check it into source control anywhere, for example. If your key is ever compromised, you can delete it from this page and replace it with a new one. Some tips to protect this key:

  • Don't hard code it directly into your code. LangChain.js allows it - but we won't show you how, and you shouldn't do it.

  • If you do store it in a file in your project (such as .env), make sure you add that file to your .gitignore file. Really - do that right away. Like now. We'll wait.
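For example, if your key lives in a .env file, adding a single line containing just `.env` to your .gitignore is enough to keep the key file from ever being committed.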

Setting up a Google Cloud Service Account Key

Service accounts are a common way to grant access to a limited set of APIs in Google Cloud. They are "virtual" accounts in the sense that you can't actually log in with them - but you can run programs that use the service account credentials, or permit all programs running on a Google Cloud machine (for example) to run with its credentials.


First, you will need to make sure the Gemini API is available in the Google Cloud project you're working on. This is easiest to do using the Google Cloud Console:

  1. Open the Google Cloud Console and select the project you'll be using (or create a new project).

  2. Select "APIs and Services" on the left navigation.

  3. Click on "Enable APIs and Services"

  4. Search for the API to enable:

    1. If you want to access Gemini through the AI Studio or Generative Language API, search for "Generative Language API" and select the result

    2. If you want to access Gemini through the Vertex AI API, search for "Vertex AI API" and select the result

  5. Click on the "Enable" button
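If you have the gcloud command line tool installed and prefer it to the console, the same APIs can be enabled with `gcloud services enable generativelanguage.googleapis.com` (for the Generative Language API) or `gcloud services enable aiplatform.googleapis.com` (for the Vertex AI API).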


You provide access to the service account using a service account key file. There are several ways to do this (see https://cloud.google.com/iam/docs/service-accounts-create for all the gory details) but the most common is using the Google Cloud Console: 


  1. In the Google Cloud Console, choose the project, select "IAM & Admin" from the left navigation, and then "Service accounts" on the sub-menu.

  2. Select "Create service account"

  3. Enter a service account name. Make it clear and easy to identify the purpose.

  4. It will use this to create a default service account ID and an email address based on this ID. You can change the ID right now to something else, but once created, it cannot be changed.

  5. If you wish, you can enter a description.

  6. Click "Create and Continue"

  7. Select roles for this service account

    1. If you are using Vertex AI, search for "Vertex AI User" and select it.

    2. If you are using the Generative Language API, you don't need to set anything.

  8. Click "Continue"

  9. You don't need to grant users access to this service account, so click "Done"

  10. You'll then see a list of service accounts. Click on the email of the service account you just created.

  11. Select "Keys" along the top menu.

  12. Click on "Add Key" then "Create new key"

  13. Make sure the JSON key type is selected and then click the "Create" button

  14. It will create a key and download it to your machine. You should save and guard this key file in a safe place. If you lose it or lose track of it, you'll need to repeat starting at step 10 to get a new key file. We'll be using this key file below.

  15. You can close the popup window that tells you the key has been downloaded
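As an aside, the gcloud command line tool can also create and download a key in one step with something like `gcloud iam service-accounts keys create key.json --iam-account=NAME@PROJECT.iam.gserviceaccount.com` (substituting your own service account email).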


Other Authentication Strategies

There are other authentication strategies that Google Cloud allows, each suitable in different cases. We won't cover them in depth here, but expect to see them in future articles.

  • Application Default Credentials (ADC) are valid when your code is running on a Google Cloud service or you have configured your development environment to use them. See https://cloud.google.com/docs/authentication/provide-credentials-adc for how to set them up and whether they are appropriate.

  • OAuth allows users to authenticate to your service and provides per-user credentials for some services.


LangChain.js handles all of these authentication strategies through two libraries that you use depending on your environment, as we'll see next.

Authentication with LangChain.js

Once we have either an API Key or our Service Account Key, we're ready to start writing code! Well... almost.


LangChain.js provides two different packages that do authentication, and you should only pick one of them! But how do you decide?


Which one you choose depends on the environment you'll be running in. While you can easily change later if you need to, it is good if you start with the right one:

  • @langchain/google-gauth is used if you're running in a Google-hosted environment, since it can provide streamlined access to service accounts or other Application Default Credentials (ADC).

  • @langchain/google-webauth lets you use service accounts in other environments that may not have access to a file system or to Google's ADC configuration. This could be inside a web browser, or in various edge server environments.


Both of them work with API Keys or with Service Accounts (although they treat service accounts slightly differently). When you're developing on your local machine, both work similarly. Both have the same classes (with the same names) to access the same Google APIs. Aside from the authentication portion, they work identically. The only difference will be the package you import your classes from.


So which one should you pick?

  • In most cases, you should go with the @langchain/google-gauth package, particularly if you are going to deploy on the Google Cloud Platform.

  • In some cases, particularly if you don't have access to a file system or environment variables, you may need to use @langchain/google-webauth

Fortunately, you can start with either and switch fairly easily later.


As an aside, there are some other packages that have Google-related classes. More details about these packages will be in a future article.

Using google-gauth

Installation: yarn add langchain @langchain/core @langchain/google-gauth


To use this package, you'll be importing from @langchain/google-gauth, so your imports may look something like:


import { GoogleLLM } from "@langchain/google-gauth"

import { ChatGoogle } from "@langchain/google-gauth"


LangChain.js will attempt to authenticate you using one of the following methods, in this order (a short sketch follows the list):

  1. An API Key that is passed to the constructor using the apiKey attribute. (Usually this isn't necessary - you'll use the API Key from an environment variable.)

  2. Credentials that are passed to the constructor using the authInfo attribute

    1. This is an object of type GoogleAuthOptions defined by the google-auth-library package from Google and is passed to the constructor of a GoogleAuth object.

    2. See https://cloud.google.com/nodejs/docs/reference/google-auth-library/latest for more about how this can be used

  3. An API Key that is set in the environment variable API_KEY

  4. The Service Account credentials provided via an appropriate ADC method documented at https://cloud.google.com/docs/authentication/provide-credentials-adc. Usually this means one of:

    1. The Service Account credentials that are saved in a file. The path to this file is set in the GOOGLE_APPLICATION_CREDENTIALS environment variable.

    2. If you are running on a Google Cloud Platform resource, or if you have logged in using `gcloud auth application-default login`, then the default credentials.
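For example, here is a minimal sketch of the first and last of these methods. GEMINI_KEY is a hypothetical environment variable name used only for illustration - normally you would just set API_KEY and let LangChain.js find it on its own.


import { ChatGoogle } from "@langchain/google-gauth";

// Method 1: an explicit API key passed to the constructor.
const keyModel = new ChatGoogle({
  apiKey: process.env.GEMINI_KEY, // hypothetical variable name
});

// Method 4: no auth attributes at all. This relies on ADC - either
// GOOGLE_APPLICATION_CREDENTIALS pointing at your key file, or the default
// credentials of the Google Cloud resource the code is running on.
const adcModel = new ChatGoogle();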

Using google-webauth

Installation: yarn add langchain @langchain/core @langchain/google-webauth


To use this package, you'll be importing from @langchain/google-webauth, so your imports may look something like:


import { GoogleLLM } from "@langchain/google-webauth"

import { ChatGoogle } from "@langchain/google-webauth"


If you're using the google-webauth package, LangChain.js goes through the following steps, in this order, attempting to authenticate you (a short sketch follows the list):

  1. An API Key that is passed to the constructor using the apiKey attribute. (Usually this isn't necessary - you'll use the API Key from an environment variable.)

  2. Credentials that are passed to the constructor using the authInfo attribute

    1. This is an object of type WebGoogleAuth that, when created, includes a credentials attribute.

    2. This credentials attribute can either be a string containing the exact contents of the Service Account key JSON file that you downloaded,

    3. or a Credentials object from the "web-auth-library/google" package with the values that are in this file, including the project_id, private_key, and client_id

  3. An API Key that is set in the environment variable API_KEY

  4. The Service Account credentials that are saved directly into the GOOGLE_WEB_CREDENTIALS environment variable.

    1. These should be the exact contents of the Service Account key JSON file that you downloaded - not a pointer to the file.
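As a sketch of what the authInfo route might look like - assuming, per the description above, that the credentials can be supplied as the raw JSON string, and treating this as illustrative rather than definitive:


import { ChatGoogle } from "@langchain/google-webauth";

// getSecret is a hypothetical helper - in a real deployment the key file
// contents might come from your platform's secrets store.
const serviceAccountJson = getSecret("gemini-service-account");

const model = new ChatGoogle({
  authInfo: {
    credentials: serviceAccountJson,
  },
});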

Yeah, whatever. Show me some code!

Once we have our authentication strategy clear, our authentication information saved or referenced in an environment variable, and the class we need imported, we can actually get to some code.

Setting environment variables

A very common way to set environment variables when using Node.js is with a .env file that you include alongside your source. This is useful because you can save the information in a file, but it is easy to make sure the file isn't checked into version control.


Keep in mind these are just examples! You should make sure you know how to set environment variables in your own setup.


If you're using AI Studio with an API Key, your .env file may have contents like this:


API_KEY=XIzaSyDv4XX0vqtJb-f5x4JnrBvalZfVQmGN0HI


If you're using gauth, you may not need a .env file if you're running your code on GCP. But if you're not, you may have it point to your service account key file with something like this:


GOOGLE_APPLICATION_CREDENTIALS=/etc/credentials/project-name-123456-a213e257d2d9.json


If you're using webauth, the variable needs to contain the contents of that file, with something like this:


GOOGLE_WEB_CREDENTIALS="{\"type\": \"service_account\" <lots of stuff omitted>"
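One way (though not the only one) to have Node.js read a .env file is the dotenv package, imported before anything that needs those variables:


// Loads the values from .env into process.env before anything else runs.
import "dotenv/config";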


There will be other articles in the future going over some of these details, but this should help get you started.

Simple text prompting

The most straightforward way to use an LLM for text completion is to invoke it with a prompt and get a string back as a response. It could look something like this:


// import GoogleLLM for your auth method, as described above


async function getJoke() {

  const prompt = "Tell me a short joke about ice cream.";

  const model = new GoogleLLM();

  const response = await model.invoke(prompt);

  return response;

}


console.log(await getJoke());


This lets us provide a static prompt in a string and get a string back that we can display. 

Conversations

Simple text prompts are good, but they are insufficient for two reasons:

  • Typically we take a prompt or message from the user and combine it with system instructions or other information as part of the prompt.

  • This is usually done as part of a back-and-forth conversation where we will need to save the history (or context) of the conversation in between messages from the user.

 

LangChain.js supports these in a few ways:

  • Chat models specifically return additional information about the conversation, along with tools that format this output as needed, including plain text

  • Prompt templates and chat prompt templates so you can combine user input with your pre-defined instructions and structure

  • Tools to manage the history of a conversation, save and retrieve it to various data stores, and use them with prompt templates.

  • Letting you combine these tools into chains - components (called Runnables) that take input in a standard form and return output in a standard form that can be handed to the next step in the chain.


Here is a joke-telling bot that builds a chain to take a topic as a parameter, use it in a template, pass that to a model, and then format the output. We'll then try it out (or "invoke" it) with a value that we pass in.


// import the ChatGoogle class for your auth method, as described above


import { ChatPromptTemplate } from "@langchain/core/prompts";

import { StringOutputParser } from "@langchain/core/output_parsers";


async function getJoke(topic) {

  // Setup the prompt template

  const template = "Tell me a joke about {topic}.";

  const prompt = ChatPromptTemplate.fromMessages([

    ["system", "You are a comedian, and your specialty is telling short jokes about a topic."],

    ["human", template],

  ]);


  // Setup our model

  const model = new ChatGoogle();


  // The output of a chat model contains a lot of additional information,

  // so we want to convert it to a string for this particular use.

  const outputParser = new StringOutputParser();


  // Create (but don't run) the chain

  const chain = prompt.pipe(model).pipe(outputParser);


  // Run the chain, providing it the values to put into the template.

  const response = await chain.invoke({

    topic,

  });

  return response;

}


console.log(await getJoke("ice cream"));


Some important notes about how Gemini works with the prompt messages (a sketch illustrating these rules follows the list):

  • The full history must start and end with "user" messages when it is sent to the model. (But see below about "system" messages.)

  • The history must alternate between "user" and "ai" messages. That is, this is a conversation where the user sends a message and the AI sends a response.

  • Older Gemini models do not support the "system" message. On these models, LangChain.js will map any "system" message to a pair of "user" and "ai" messages. (It does it this way to preserve the alternation of "user" and "ai" messages.) Unless you want to force "system" message behavior, you don't need to worry about this.
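Here is a minimal sketch of a conversation history that follows these rules, using the message classes from @langchain/core (the joke in the "ai" message is just placeholder content):


// import the ChatGoogle class for your auth method, as described above
import { SystemMessage, HumanMessage, AIMessage } from "@langchain/core/messages";

// Starts and ends with "user" (human) messages, alternating with "ai" messages.
const history = [
  new SystemMessage("You are a comedian who tells short jokes."),
  new HumanMessage("Tell me a joke about ice cream."),
  new AIMessage("What do you call a sad scoop? A melt-down."),
  new HumanMessage("Now one about pizza."),
];

const model = new ChatGoogle();
const response = await model.invoke(history);
console.log(response.content);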


Like many other models, you can pass attributes to the model constructor to influence some of its behavior. For example, adjusting the temperature will adjust the level of randomness used in selecting the next token for the output. Google also has parameters called topK and topP, which determine how many candidate tokens are considered during that selection. For all of these, larger values tend to create more random and "creative" responses, while lower ones tend to be more consistent.


So if we wanted a fairly random set of responses, we might create our model with something like:


const model = new GoogleLLM({

  temperature: 0.9,

  topK: 5,

  topP: 0.8,

});


while one that tends to be less random might have values such as:


const model = new GoogleLLM({

  temperature: 0.2,

  topK: 2,

  topP: 0.5,

});


You may wish to experiment as you seek good responses from Gemini.


Another common attribute is the name of the model to use. It defaults to the "gemini-pro" model, which will always point to the "latest stable" version. If, however, you want to lock it to a specific version, you may wish to provide a different name:


const model = new GoogleLLM({

  modelName: "gemini-1.0-pro-001",

});


The model names available may vary between AI Studio and Vertex AI, so you should check the list for each platform, and a future article will discuss how you can get the list yourself.

Prompts with images

Gemini also offers models that allow you to include images (and, in some cases, audio and video) as part of your prompt. To do so, we'll need to build a prompt that contains multiple parts - in this case, a text part and an image part. The image part should specify the image as a "Data URL". We'll build a few functions that help us put things together.


First, we'll need a function that loads the image and returns a Data URL. This function works in Node and uses the local file system, but it could be adapted to get the image in other ways. It also assumes that the image is a PNG file, which isn't always a good assumption.


import fs from "fs";

function filenameToDataUrl(filename) {

  const buffer = fs.readFileSync(filename);

  const base64 = buffer.toString("base64");


  // FIXME - Get the real image type

  const mimeType = "image/png";


  const url = `data:${mimeType};base64,${base64}`;

  return url;

}
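One possible way to address that FIXME is to pick the MIME type from the file extension. The lookup table below is a deliberately minimal, hypothetical one - a package such as mime-types would be more robust:


import path from "path";

// A small extension-to-MIME-type table, for illustration only.
const MIME_TYPES = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".webp": "image/webp",
};

function mimeTypeForFilename(filename) {
  const ext = path.extname(filename).toLowerCase();
  return MIME_TYPES[ext] ?? "application/octet-stream";
}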


We'll also need a function that will take the Data URL for the file and create the prompt that we want for it.


The prompt parts need to be an array of MessageContentComplex types. In our case, we will need one text type and one image_url type, although you can have multiple of each. We will assign these to the content attribute passed into a HumanMessageChunk. We'll then use this to build the ChatPromptTemplate that we'll be returning to use in the chain.


import { HumanMessageChunk } from "@langchain/core/messages";
import { ChatPromptTemplate } from "@langchain/core/prompts";

function buildPrompt(dataUrl) {
  const message = [
    {
      type: "text",
      text: "What is in this image?",
    },
    {
      type: "image_url",
      image_url: dataUrl,
    },
  ];
  const messages = [
    new HumanMessageChunk({ content: message }),
  ];
  const template = ChatPromptTemplate.fromMessages(messages);
  return template;
}


Finally, we can tie this together, creating our model class with a stable Gemini 1.5 Flash model that supports images and other media. (We're using a different model here since Gemini 1.5 has better multimodal support.)


async function describeImageFile(filename) {

  // Use the functions we defined above to build the prompt
  const dataUrl = filenameToDataUrl(filename);
  const prompt = buildPrompt(dataUrl);

  const model = new ChatGoogle({
    modelName: "gemini-1.5-flash-001",
  });

  // The output of a chat model contains a lot of additional information,
  // so we want to convert it to a string again.
  const outputParser = new StringOutputParser();

  // Create, but don't execute, the chain
  const chain = prompt.pipe(model).pipe(outputParser);

  // Execute the chain. Our prompt has no template parameters,
  // so we pass an empty object.
  const response = await chain.invoke({});
  return response;

}


const filename = "img/langchain-logo.png";

const text = await describeImageFile(filename);

console.log(text);


Dealing with video and audio requires some additional techniques that we'll discuss in a future article.

Beyond the Basics - What's Next?

This introduction seems like a lot - but you've gotten past the most difficult parts!


You should have a better understanding of the different Gemini API platforms that are available, how you can authenticate to each of them, and which LangChain.js packages you'll need. We've also looked at the basics of using LangChain.js with Gemini to do some simple text prompting, prompting that includes images in your prompt, and conversational prompting.


One great thing about LangChain.js is that the models are, for the most part, interchangeable. So if you see an example that uses ChatOpenAI, then in many cases you can swap it for ChatGoogle. Give it a try and see how it works!


Future articles will explore how we can use LangChain.js with other Google tools, including:

  • Advanced Safety features that Gemini offers to screen both prompts and potential responses for dangerous information, how you can tune how sensitive Gemini is to this, and how to handle safety and other content violations.

  • More about the various media capabilities that the Gemini 1.5 models offer.

  • Other settings that are useful in various situations, including streaming responses, JSON responses with schema definitions, and using built-in tool definitions.

  • Getting the Gemini API to turn human statements into well-defined structures that you can use to make function calls with parameters.

  • Using embeddings to turn text into vectors of numbers that can be used to do semantic searches and the variety of vector databases (from Google and others) that can assist with these tasks.

  • Accessing other models in Google Cloud's Vertex AI Model Garden, fine tuned models in AI Studio, or even locally.

  • The internal design decisions about the LangChain.js Google modules so you can leverage the authentication system to access other tools offered by Google.

