2025-02-05

Grounding Results with Google Search, Gemini, and LangChainJS

Have you ever used a Large Language Model (LLM) to help answer factual questions, only to find that it seems to be making up, or hallucinating, the results? Retrieval Augmented Generation (RAG) is a frequent solution to this problem, and modern LLMs can use tool and function calling with LangChainJS to make RAG solutions even easier.

Gemini takes this several steps further by providing tools that are well integrated into the model, and Gemini 2.0 improves this even more. One tool in particular, Grounding with Google Search, helps you bring factual and current information from Google Search into your results. It even provides references for those results!

We’ll take a look at how Grounding with Google Search works, how you can enable this tool in your calls to Gemini, the differences between how it works in Gemini 1.5 and Gemini 2.0 (and how LangChainJS hides the differences between the two), and how you can easily format the results using the LangChain Expression Language.

The beginning… and the BIG problem.

As you may remember, Gemini can be accessed on two different platforms, AI Studio and Vertex AI, with several different auth methods involved. If you’re not familiar with developing for Gemini with LangChainJS, you should check out LangChain.js and Gemini: Getting Started.

For simplicity, we’ll be using the ChatGoogle class from the google-gauth package, but you can use one that meets your needs.

Our code to ask a question and print the results might look something like this:

import { ChatGoogle } from "@langchain/google-gauth";

const question = "Who won the Nobel Prize in physics in 2024?";
const modelName = "gemini-2.0-flash-001";
const model = new ChatGoogle({
  modelName,
  temperature: 0,
});
const result = await model.invoke(question);
console.log(result.content);

Since Gemini 2.0 Flash has a knowledge cutoff date before the Nobel Prizes were awarded in 2024, the answer might be something like this:

I don't know, since the Nobel Prizes are usually awarded in October.

Not terribly useful, eh? What can we do about it?

Getting answers with Google Search

If we had done a normal Google Search, or even used Google’s Gemini chatbot, to ask this question, we’d have gotten the names of the recipients along with other information. But when calling the Gemini API, we will need some other tools to help us ground our query.

Gemini 1.5 provides this using a tool named googleSearchRetrieval, while Gemini 2.0 has a similar tool called googleSearch. While there are some differences between the two, LangChainJS lets you use either, no matter which model you choose.

We can import the GeminiTool type to serve as a reference in TypeScript:

import { GeminiTool } from "@langchain/google-common";

Then we’ll configure the tool and create a model object that is aware of it, like this:

const searchTool: GeminiTool = {
  googleSearch: {},
};
const tools = [searchTool];
const model = new ChatGoogle({
  modelName,
  temperature: 0,
}).bindTools(tools);

With this, we can invoke the model with our question as we did in the previous section:

const result = await model.invoke(question);
console.log(result.content);

and get a significantly different answer. Perhaps something like this:

The 2024 Nobel Prize in Physics was awarded jointly to 
Geoffrey E. Hinton and John J. Hopfield.

One difference between this result and what you might get by doing a “traditional” Google Search on the website or in the Google app is that a search lets us follow links to help verify the information we get. Calling the Gemini API this way doesn’t appear to give us access to this information.

Or does it?

Cite your sources!

Gemini’s Google Search Tool provides much of the same reference information that we could have gotten through Google Search itself, although it is in a slightly different form. LangChainJS provides this as part of the result object that we get from invoking the model with our prompt, specifically in the response_metadata object.

We can look at the groundingMetadata attribute with something like this:

const result = await model.invoke(question);
const grounding = result.response_metadata.groundingMetadata ?? {};
console.log(JSON.stringify(grounding, null, 2));

The groundingMetadata object contains several fields that can be useful to us. These are all objects provided directly from Gemini, and you can read the details at the documentation that I've linked to.

webSearchQueries and searchEntryPoint

documentation

Google requires you to display the search queries as links to a Google Search results page. To help you do this, it provides the list of search queries you need to show in webSearchQueries, along with pre-formatted HTML that accomplishes this in searchEntryPoint.
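For example, if your application renders HTML, you could surface the suggestions by embedding that block directly. Here is a minimal sketch using the grounding object from the snippet above (renderedContent is the same field the output parsers described below use):

// Sketch: reuse the pre-rendered search-suggestion HTML that Gemini returns.
const entryPointHtml: string = grounding.searchEntryPoint?.renderedContent ?? "";
const queries: string[] = grounding.webSearchQueries ?? [];

console.log("Queries Google used:", queries.join(", "));
console.log("HTML block to embed:", entryPointHtml);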

groundingChunks

documentation

This is an array of web objects that contain a title and uri. These are references that were used to ground the information. You can provide them as citations. The title is usually the name of a domain, while the uri is a specialized URL that redirects through Google to go to the site with the information itself.

groundingSupports

documentation

This is an array of objects, each containing three attributes:

  • segment - Which part of the content we're talking about. This contains the start and end index into the string content.
  • groundingChunkIndices - An array of numeric indexes into the groundingChunks array, indicating which of those chunks were used to create, or are referenced by, that part of the output.
  • confidenceScores - An array of numbers between 0 and 1 indicating how relevant the reference specified by the corresponding groundingChunkIndices element is to the reply.

You can use all this information to format the output so it provides the reference information to the person using your application.
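A rough sketch of doing this by hand might look like the following. (The field names follow the descriptions above; in particular, segment.endIndex and the shape of each web chunk are assumptions based on that documentation, not something to copy blindly.)

const result = await model.invoke(question);
const meta = result.response_metadata.groundingMetadata ?? {};
let text = typeof result.content === "string" ? result.content : "";

// Insert citation markers from the end of the text backwards so that
// earlier segment indexes stay valid as the string grows.
const supports = [...(meta.groundingSupports ?? [])].sort(
  (a, b) => b.segment.endIndex - a.segment.endIndex
);
for (const support of supports) {
  const marker = ` [${support.groundingChunkIndices.map((i: number) => i + 1).join(", ")}]`;
  text =
    text.slice(0, support.segment.endIndex) +
    marker +
    text.slice(support.segment.endIndex);
}

// List the sources at the end, numbered to match the markers.
const sources = (meta.groundingChunks ?? [])
  .map((chunk: any, i: number) => `${i + 1}. ${chunk.web?.title} - ${chunk.web?.uri}`)
  .join("\n");
console.log(`${text}\n\n${sources}`);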

That seems like it might end up being complicated, however, doesn’t it? Is there a better way?

Formatting the response and understanding the source

Fortunately, LangChainJS provides a way to take the output of a model and transform it into a more suitable format. These are known as output parsers and work with the LangChain Expression Language (LCEL), a core component of the library.


The Google modules provide a BaseGoogleSearchOutputParser abstract class which can take an AIMessage and, if there is groundingMetadata, format it to include this metadata. We'll look at how to create your own formatter in a moment, but there are some formatters already provided to address a couple of common use cases.

The SimpleGoogleSearchOutputParser does some basic formatting and is useful for examining the output from the search grounding. More useful is probably the MarkdownGoogleSearchOutputParser which is used to create output that looks very similar to how AI Studio formats the output when search grounding is turned on.

When using these classes, we’ll need to import the parser we want from the google-common package with something like this:

import { SimpleGoogleSearchOutputParser } from "@langchain/google-common";

We then set up a chain where the model pipes its output to this output parser, and we’ll invoke the chain. Since the output of this chain is a string, we can just display it. This part might look something like this:

const parser = new SimpleGoogleSearchOutputParser();
const chain = model.pipe(parser);
const result = await chain.invoke(question);
console.log(result);

This might give us output that looks something like this:

Google Says:
The 2024 Nobel Prize in Physics was awarded jointly to John J. Hopfield
and Geoffrey E. Hinton. [1, 2, 3, 4] They were recognized for their
foundational discoveries and inventions that enable machine learning with
artificial neural networks. [1, 2, 4, 3] Hinton's contributions included
work on the Boltzmann machine, building upon the Hopfield network. [3]
1. aip.org - https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUBnsYv92K2WO8rhfenrR_l8VlkUrXzm4saKwerbhp50YzLAfYZpOUVIknxhZgjQwLy2i1phmdH_zfWquaBFhfSwuMemcI9UvHls0UCuBraT7M0XWOrEl-yegUWO5lz8wS-WGb9NSpXyKJ2LEHa4IVqKZaur9I9pQxN1
2. youtube.com - https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUBnsYtjy5nhKdekL8InSP-JiGxdvPYgbfcp27IgM96nAohukAkHqsY4pVJ4X4liElC7-dZyzDVWdyu-rsrRJWBXjB0gSChWaz2wTAMzbirTyxfYLbPiHL6lp_QaX8trlUD9V3a4Y92OYg==
3. utoronto.ca - https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUBnsYufnzAWAqKJkUgLlseRJW_Q2paQlbO0siY0W9jtDtQ9XittUnt4yaK76wtBNv95qHXYA9NbtUl7akTnRwZsVU4UqR84XxOX1UGsnMfts1_lQpUHi_7eOKy5EtPEPoK07_fq8bY0OEQh9YaS_2SgPqRe8zLydqOr3xp38i8cvUTfmFlK-ZCfaWRmlVZI
4. nsf.gov - https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUBnsYsc9tzlR82NAGHfI2wGz8GpTqnEyP5FXcnSkCYb2zWcj2GSbPq15disZJvYuu3yrxpi7IKFEeeELpaTN_EVwX4T6QxPPVxnyW7NaQ3XizRmZR3AZpLx1oidL57OEsTQJ3zFXJTQwvWnmnvMonkpF4PWYD116vL2101py-vGLaHAtCh9Fuj93_o=
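If you would rather have Markdown-formatted output, swapping in the other parser is a one-line change. A sketch, assuming it is exported from the same package as the Simple parser and reusing the same model and question as above:

import { MarkdownGoogleSearchOutputParser } from "@langchain/google-common";

// Same chain as before, just with the Markdown parser substituted in.
const markdownChain = model.pipe(new MarkdownGoogleSearchOutputParser());
console.log(await markdownChain.invoke(question));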

While the MarkdownGoogleSearchOutputParser works similarly, just with a slightly different format, and should work for most needs, there may be cases where you want to modify either of these or create a completely different formatter. Let's take a look at how you can do this.

Building your own formatter

Although LangChainJS provides some useful formatters with the SimpleGoogleSearchOutputParser and MarkdownGoogleSearchOutputParser, there are certainly cases where it might make sense to either modify one of these or to create your own.

The BaseGoogleSearchOutputParser is an abstract class that provides four abstract methods that you would need to implement, plus one other that might be useful to either use or modify.

textPrefix() and textSuffix()

These two methods each take two parameters:

  • text - a string with the body of the reply from the Model.
  • grounding - the GroundingInfo which contains a metadata field with all the metadata specified above in the reply.

Each is expected to return either a string or undefined; undefined is treated as an empty string.

As the names suggest, you use these to specify a string that you wish to put either before or after the content from the model.

searchSuggestion()

This method is already defined, is passed a GroundingInfo object, and returns a string with the renderedContent field from the info. You may wish to include this as part of your textSuffix() implementation.
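For example, a textSuffix() implementation that appends Google's rendered suggestions might look something like this (a sketch, not part of the built-in parsers):

protected textSuffix(_text: string, grounding: GroundingInfo): string | undefined {
  // Append the pre-rendered search suggestion block after the model's reply.
  return `\n${this.searchSuggestion(grounding)}`;
}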

segmentPrefix() and segmentSuffix()

These methods are called for each available segment - the part of the content referenced by each element of groundingSupports. As the names suggest, they insert text either just before or just after the text in the segment.

They each take three parameters:

  • grounding - the GroundingInfo, which contains the metadata for the entire response.
  • support - The support information about this specific segment.
  • index - Which segment index we're working with, starting with 0.

An example

With this, we can look at how the SimpleGoogleSearchOutputParser is built. Our goal is a formatter that puts numbered references in brackets after the relevant text and then lists those references at the bottom.

Our class is straightforward. It needs to extend from the BaseGoogleSearchOutputParser and doesn't need a constructor, since there are no other parameters to configure:

export class SimpleGoogleSearchOutputParser extends BaseGoogleSearchOutputParser {
// ...
}

There isn’t anything we need to print before each text segment, so we can have that function return undefined:

protected segmentPrefix(
  _grounding: GroundingInfo,
  _support: GeminiGroundingSupport,
  _index: number
): string | undefined {
  return undefined;
}

But after each segment, we want the link indices to be a comma-separated list in square brackets. The index number is 0-based, but we want to start with “1” since that is more human-friendly:

protected segmentSuffix(
  _grounding: GroundingInfo,
  support: GeminiGroundingSupport,
  _index: number
): string | undefined {
  const indices: number[] = support.groundingChunkIndices.map((i) => i + 1);
  return ` [${indices.join(", ")}]`;
}

Before the entire block of text, we just want to print a simple message about where the results come from:

protected textPrefix(_text: string, _grounding: GroundingInfo): string {
  return "Google Says:\n";
}

And after the entire block, we need to print the numbered list of references (starting with 1). To do this, we’ll make a utility method that gets the information we need (the title and URI) and formats it along with the reference number. We can then loop over all of the groundingChunks and add each formatted string to the value returned:

protected chunkToString(chunk: GeminiGroundingChunk, index: number): string {
  const info = chunk.retrievedContext ?? chunk.web;
  return `${index + 1}. ${info.title} - ${info.uri}`;
}

protected textSuffix(_text: string, grounding: GroundingInfo): string {
  let ret = "\n";
  const chunks: GeminiGroundingChunk[] = grounding.metadata.groundingChunks;
  chunks.forEach((chunk, index) => {
    ret = `${ret}${this.chunkToString(chunk, index)}\n`;
  });
  return ret;
}
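Once your class implements these methods, it plugs into a chain exactly like the built-in parsers. A sketch, where MyCitationParser is a hypothetical name for your own subclass of BaseGoogleSearchOutputParser:

// "MyCitationParser" is a hypothetical subclass implementing the methods above.
const parser = new MyCitationParser();
const chain = model.pipe(parser);
console.log(await chain.invoke(question));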

Conclusions

Large Language Models, such as Gemini, aren’t always reliable when it comes to answering factual questions. We should never rely on an LLM alone to provide “truth”, and this undermines our trust in these systems.

Fortunately, there are approaches that help us “ground” an LLM’s answers in sets of data to help them be more trustworthy.

Grounding with Google Search is a powerful tool that provides additional information about where the results come from, letting people better understand the context of the answers and giving them even more information to use when deciding how much to trust those answers.

LangChainJS further enhances this, providing tools that you can use to format these results in ways that people would expect. These OutputParsers are flexible and let you tailor the output to your needs.

I hope you enjoyed this article! I look forward to hearing about how Grounding with Google Search and LangChainJS has helped you develop better AI agents.

Acknowledgements

The development of LangChainJS support for Search Grounding in Gemini 2.0, the Output Parsers supporting it, and this documentation were all supported by Google Cloud Platform Credits provided by Google. My thanks to the teams at Google for their support.

Special thanks to Linda Lawton, Denis V., Steven Gray, and Noble Ackerson for their help, feedback, and friendship.

2024-06-05

LangChain.js and Gemini: Getting Started

 Gemini is an exciting new set of generative AI models from Google that work with text and, in some forms, other media such as images or video. While getting started is simple, the number of options for running Gemini and securely authenticating yourself can seem complicated. Fortunately, LangChain.js makes this process easier.


In this tutorial, we will go over the two platforms you can use to run Gemini: the Google AI Studio API (sometimes called the "Gemini API") and Google Cloud's Vertex AI platform. These two platforms support different authentication methods. We'll look at the different methods available, when to best use each, and how. Finally, we'll look at some code and see how this all fits into the LangChain.js architecture.


If you're eager - you can skip ahead to the code section to see what it will look like. But be sure to return to the top to understand how to pick a platform and authentication method.

Where Gemini runs

The Gemini family of models from Google forms the underlying infrastructure for a number of different products. We won't be focusing on all these platforms, but for more info you can check out the article I've already written about them.




As developers, we are focusing specifically on those in blue since we can access them using LangChain.js. In particular, we are focusing on the two platforms that let us use an API to access the Gemini model:

  • Google's Generative AI platform, which can be accessed using Google AI Studio, previously known as MakerSuite

  • Google Cloud's Vertex AI platform


Gemini works quite similarly on these two platforms, and the LangChain.js classes hide most of the differences between the two, so you can switch from one platform to the other. If you're familiar with OpenAI's access to the GPT models, and how the GPT models are also available using Microsoft's Azure cloud platform, there are some similarities between the concepts.


Why are there two platforms? As strange as it sounds - simplicity.

  • The AI Studio-based platform provides an easy way to get started using Gemini and a free tier to experiment with. While there are some restrictions on this free tier (including which countries it can be used in), it is a good place to start since authentication can be done fairly easily.

  • The Vertex AI-based platform is better suited for projects that are already using Google Cloud. It provides some additional features that work well with other Google Cloud components, but it has additional authentication requirements.


Much of the time, you will likely start with the AI Studio-based system and authentication, and then move to Vertex AI as your requirements change. LangChain.js makes this relatively easy, as we'll see.


Since the biggest difference between the two is how we authenticate to the platform, let's take a look at this. 

Authenticating to Gemini

In order to prevent abuse, Gemini ties each of your requests to a project in your Google account. To identify which account is being used, Google needs each request to be authenticated. There are two major ways this authentication is done:

  • An API Key, which you will send with each request.

  • OAuth with Google Cloud Authentication, which is usually done either with an explicit service account from Google Cloud that has been permitted to the API, or by Application Default Credentials associated with the Google Cloud project.


API Keys are fairly straightforward to use, but are somewhat less secure since the key itself must be protected. While they can be used from web pages, it is advised that this only be done during prototyping, since the key is accessible through the page source code, allowing anyone to use your key to access Gemini. Only AI Studio's API allows access through an API Key (although some methods require OAuth).


OAuth and Google Cloud Authentication is more secure, but does require a bit more setup. If you're using other Google services, or running your code on a Google Cloud Platform service, this can simplify some tasks. Both AI Studio and Vertex AI can use this method.


Let's go into some details about how to create the authentication credentials we can use. You will only need to use one of these strategies - but it can be useful to understand all of them for the future.

Getting an API Key

This is a pretty straightforward process that will give you a string that you can use as an API Key. (We'll see how to use this below.)


  1. Go to https://aistudio.google.com and click on the "Get API Key" button at the top of the left navigation. Or you can go there directly at https://aistudio.google.com/app/apikey

  2. If this is your first API key, click on "Create API Key in new project". (If you already have an API key and you need another, you can use "Create API key in existing project".)

  3. It will generate a new API Key for you. You can copy it at this time, or you can return to this page later and view your keys to get it.


This key will allow you to make calls using the Generative AI platform's Gemini API, but not take other actions. If you do need to do more, you should consider switching to a service account, described below.


Above all - remember to protect this key. You shouldn't check it in as code anywhere, for example. If you do compromise your key, you can delete it from this page and replace it with a new one. Some tips to protect this key:

  • Don't hard code it directly into your code. LangChain.js allows it - but we won't show you how, and you shouldn't do it.

  • If you do store it in a file in your project (such as .env), make sure you add that file to your .gitignore file. Really - do that right away. Like now. We'll wait.

Setting up a Google Cloud Service Account Key

Service accounts are common ways to permit access to a limited set of APIs in Google Cloud. They are "virtual" accounts in the sense that you can't actually log in with them - but you can run programs that use the service account credentials, or permit all programs running on a Google Cloud machine (for example) to run with its credentials.


First, you will need to make sure the Gemini API is available in the Google Cloud project you're working on. This is easiest to do using the Google Cloud Console:

  1. Open the Google Cloud Console and select the project you'll be using (or create a new project).

  2. Select "APIs and Services" on the left navigation.

  3. Click on "Enable APIs and Services"

  4. Search for the API to enable:

    1. If you want to access Gemini through the AI Studio or Generative Language API, search for "Generative Language API" and select the result

    2. If you want to access Gemini through the Vertex AI API, search for "Vertex AI API" and select the result

  5. Click on the "Enable" button


You provide access to the service account using a service account key file. There are several ways to do this (see https://cloud.google.com/iam/docs/service-accounts-create for all the gory details) but the most common is using the Google Cloud Console: 


  1. In the Google Cloud Console, choose the project, select "IAM & Admin" from the left navigation, and then select "Service accounts" from the sub-menu.

  2. Select "Create service account"

  3. Enter a service account name. Make it clear and easy to identify the purpose.

  4. It will use this to create a default service account ID and an email address based on this ID. You can change the ID right now to something else, but once created, it cannot be changed.

  5. If you wish, you can enter a description.

  6. Click "Create and Continue"

  7. Select roles for this service account

    1. If you are using Vertex AI, search for "Vertex AI User" and select it.

    2. If you are using the Generative Language API, you don't need to set anything.

  8. Click "Continue"

  9. You don't need to grant users access to this service account, so click "Done"

  10. You'll then see a list of service accounts. Click on the email of the service account you just created.

  11. Select "Keys" along the top menu.

  12. Click on "Add Key" then "Create new key"

  13. Make sure the JSON key type is selected and then click the "Create" button

  14. It will create a key and download it to your machine. You should save and guard this key file in a safe place. If you lose it or lose track of it, you'll need to repeat starting at step 10 to get a new key file. We'll be using this key file below.

  15. You can close the popup window that tells you the key has been downloaded


Other Authentication Strategies

There are other authentication strategies that Google Cloud allows that are suitable in different cases. We won't cover them here, but expect to see them in future articles.

  • Application Default Credentials (ADC) are valid when your code is running on a Google Cloud service or you have configured your development environment to use them. See  https://cloud.google.com/docs/authentication/provide-credentials-adc for how to set them up and if they are appropriate.

  • OAuth allows for users to authenticate to your service and provide per-user credentials for some services.


LangChain.js handles all of these authentication strategies through two libraries that you use depending on your environment, as we'll see next.

Authentication with LangChain.js

Once we have either an API Key or our Service Account Key, we're ready to start writing code! Well... almost.


LangChain.js provides two different packages that do authentication, and you should only pick one of them! But how do you decide?


Which one you choose depends on the environment you'll be running in. While you can easily change later if you need to, it is good if you start with the right one:

  • @langchain/google-gauth is used if you're running in a Google hosted environment since it can provide streamlined access to service accounts or other Application Default Credentials (ADC).

  • @langchain/google-webauth lets you use service accounts in other environments that may not have access to a file system or to Google's ADC configuration. This could be inside a web browser, or in various edge server environments.


Both of them work with API Keys or with Service Accounts (although they treat service accounts slightly differently). When you're developing on your local machine, both work similarly. Both have the same classes (with the same names) to access the same Google APIs. Aside from the authentication portion, they work identically. The only difference will be the package you import your classes from.


So which one should you pick?

  • In most cases, you should go with the @langchain/google-gauth package. Particularly if you are going to deploy it in the Google Cloud Platform.

  • In some cases, particularly if you don't have access to a file system or environment, you may need to use @langchain/google-webauth

Fortunately, you can start with either and switch fairly simply.


As an aside, there are some other packages that have Google-related classes. More details about these packages will be in a future article.

Using google-gauth

Installation: yarn add langchain @langchain/core @langchain/google-gauth


To use this package, you'll be importing from @langchain/google-gauth, so your imports may look something like


import { GoogleLLM } from "@langchain/google-gauth"

import { ChatGoogle } from "@langchain/google-gauth"


LangChain.js will attempt to authenticate you using one of the following methods, in this order:

  1. An API Key that is passed to the constructor using the apiKey attribute. (Usually this isn't necessary - you'll use the API Key from an environment variable.)

  2. Credentials that are passed to the constructor using the authInfo attribute

    1. This is an object of type GoogleAuthOptions defined by the google-auth-library package from Google and is passed to the constructor of a GoogleAuth object.

    2. See https://cloud.google.com/nodejs/docs/reference/google-auth-library/latest for more about how this can be used

  3. An API Key that is set in the environment variable API_KEY

  4. The Service Account credentials provided via an appropriate ADC method documented at https://cloud.google.com/docs/authentication/provide-credentials-adc. Usually this means one of:

    1. The Service Account credentials that are saved in a file. The path to this file is set in the GOOGLE_APPLICATION_CREDENTIALS environment variable.

    2. If you are running on a Google Cloud Platform resource, or if you have logged in using `gcloud auth application-default login`, then the default credentials.
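Putting that together, a constructor call with explicit credentials might look something like the sketch below. In most cases you'd rely on the environment variables instead, and the environment variable name here is just an example of my own, not one the package looks for:

import { ChatGoogle } from "@langchain/google-gauth";

// Sketch only: prefer environment variables over hard-coding credentials.
const model = new ChatGoogle({
  apiKey: process.env.MY_GEMINI_API_KEY, // hypothetical variable name for illustration
});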

Using google-webauth

Installation: yarn add langchain @langchain/core @langchain/google-webauth


To use this package, you'll be importing from @langchain/google-webauth, so your imports may look something like


import { GoogleLLM } from "@langchain/google-webauth"

import { ChatGoogle } from "@langchain/google-webauth"


If you're using the google-webauth package, LangChain.js goes through the following steps, in this order, attempting to authenticate you:

  1. An API Key that is passed to the constructor using the apiKey attribute. (Usually this isn't necessary - you'll use the API Key from an environment variable.)

  2. Credentials that are passed to the constructor using the authInfo attribute

    1. This is an object of type WebGoogleAuth that, when created, includes a credentials attribute.

    2. This credentials attribute can be either a string containing the exact contents of the Service Account key JSON file that you downloaded,

    3. or a Credentials object from the "web-auth-library/google" package with values from this file, including the project_id, private_key, and client_id

  3. An API Key that is set in the environment variable API_KEY

  4. The Service Account credentials that are saved directly into the GOOGLE_WEB_CREDENTIALS environment variable.

    1. These should be the exact contents of the Service Account key JSON file that you downloaded. Not a pointer to the file.

Yeah, whatever. Show me some code!

Once we have our authentication strategy clear, our authentication information saved or referenced in an environment variable, and the class we need imported, we can actually get to some code.

Setting environment variables

A very common way to set an environment variable when using Node.js is to set them using a .env file that you include alongside your source. This is useful because you can save the information in a file, but it is easy to make sure the file isn't checked into version control.


Keep in mind these are just examples! You should make sure you know how to set environment variables in your own setup.


If you're using AI Studio with an API Key, your .env file may have contents like this:


API_KEY=XIzaSyDv4XX0vqtJb-f5x4JnrBvalZfVQmGN0HI


If you're using gauth, you may not need a .env file if you're running your code on GCP. But if you're not, you may have it point to a certificate file with something like this:


GOOGLE_APPLICATION_CREDENTIALS=/etc/credentials/project-name-123456-a213e257d2d9.json


While if you're using webauth, it needs to contain the contents of that file with something like this:


GOOGLE_WEB_CREDENTIALS="{\"type\": \"service_account\" <lots of stuff omitted>"
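One thing to keep in mind: Node doesn't read a .env file on its own by default. A common approach (my suggestion here, not something the LangChain.js packages require) is to load it with the dotenv package before creating your model:

// Load variables from .env into process.env before anything else runs.
// Assumes dotenv has been installed (for example: yarn add dotenv).
import "dotenv/config";

import { ChatGoogle } from "@langchain/google-gauth";

// With the environment set, no credentials need to appear in the code itself.
const model = new ChatGoogle();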


There will be other articles in the future going over some of these details, but this should help get you started.

Simple text prompting

The most straightforward way to access an LLM model with text completion is by invoking it with a prompt and getting a string back as a response. It could look something like this:


// import GoogleLLM for your auth method, as described above


async function getJoke(){
  const prompt = "Tell me a short joke about ice cream.";
  const model = new GoogleLLM();
  const response = await model.invoke(prompt);
  return response;
}

console.log(await getJoke());


This lets us provide a static prompt in a string and get a string back that we can display. 

Conversations

Simple text prompts are good, but they are insufficient for two reasons:

  • Typically we take a prompt or message from the user and combine it with system instructions or other information as part of the prompt.

  • This is usually done as part of a back-and-forth conversation where we will need to save the history (or context) of the conversation in between messages from the user.

 

LangChain.js supports these in a few ways

  • Chat models specifically return additional information about the conversation, along with tools that format this output as needed, including plain text

  • Prompt templates and chat prompt templates so you can combine user input with your pre-defined instructions and structure

  • Tools to manage the history of a conversation, save and retrieve it to various data stores, and use them with prompt templates.

  • Letting you combine these tools into chains - components (called Runnables) that take input in a standard form and return output in a standard form that can be handed to the next step in the chain.


Here is a joke-telling bot that builds a chain to take a topic as a parameter, use it in a template, pass that to a model, and then format the output. We'll then try it out (or "invoke" it) with a value that we pass in.


// import the ChatGoogle class for your auth method, as described above

import { ChatPromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";

async function getJoke(topic){
  // Setup the prompt template
  const template = "Tell me a joke about {topic}.";
  const prompt = ChatPromptTemplate.fromMessages([
    ["system", "You are a comedian, and your specialty is telling short jokes about a topic."],
    ["human", template],
  ]);

  // Setup our model
  const model = new ChatGoogle();

  // The output of a chat model contains a lot of additional information,
  // so we want to convert it to a string for this particular use.
  const outputParser = new StringOutputParser();

  // Create (but don't run) the chain
  const chain = prompt.pipe(model).pipe(outputParser);

  // Run the chain, providing it the values to put into the template.
  const response = await chain.invoke({
    topic,
  });
  return response;
}

console.log(await getJoke("ice cream"));


Some important notes about how Gemini works with the prompt messages:

  • The full history must start and end with "user" messages when it is sent to the model. (But see below about "system" messages.)

  • The history must alternate between "user" and "ai" messages. ie - this is a conversation where the user sends a message and the AI will send a response.

  • Older Gemini models do not support the "system" message. On these models, LangChain.js will map any "system" message to a pair of "user" and "ai" messages. (It does it this way to preserve the alternating "user" and "ai" messages.) Unless you want to force "system" message behavior, you don't need to worry about this. (See the sketch after this list for what an explicit history looks like.)
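To make these rules concrete, here is a minimal sketch of passing an explicit history to the model. The message classes come from @langchain/core, the import shown is for the gauth package, and the joke content is just illustrative:

import { ChatGoogle } from "@langchain/google-gauth";
import { SystemMessage, HumanMessage, AIMessage } from "@langchain/core/messages";

const model = new ChatGoogle();

// The history alternates between "user" (human) and "ai" messages,
// and it ends with the human message we want answered.
const history = [
  new SystemMessage("You are a comedian who tells short jokes."),
  new HumanMessage("Tell me a joke about ice cream."),
  new AIMessage("Why did the ice cream truck break down? Rocky road."),
  new HumanMessage("Now tell me one about pizza."),
];

const response = await model.invoke(history);
console.log(response.content);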


Like many other models, you can pass attributes to the model constructor to influence some of its behavior. For example, adjusting the temperature will adjust the level of randomness used in selecting the next token for the output. Google also has parameters called topK and topP which determine how many tokens are submitted for random selection. For all of these, larger values tend to create more random and "creative" responses while lower ones tend to be more consistent.


So if we wanted a fairly random set of responses, we might create our model with something like


const model = new GoogleLLM({
  temperature: 0.9,
  topK: 5,
  topP: 0.8,
});


while one that tends to be less random might have values such as


const model = new GoogleLLM({
  temperature: 0.2,
  topK: 2,
  topP: 0.5,
});


You may wish to experiment as you seek good responses from Gemini.


Another common attribute is the name of the model to use. It defaults to the "gemini-pro" model, which will always point to the "latest stable" version. If, however, you want to lock it to a specific version, you may wish to provide a different name:


const model = new GoogleLLM({
  modelName: "gemini-1.0-pro-001",
});


The model names available may vary between AI Studio and Vertex AI, so you should check the list for each platform, and a future article will discuss how you can get the list yourself.

Prompts with images

Gemini also offers models that allow you to include images (and, in some cases, audio and video) as part of your prompt. To do so, we'll need to build a prompt that contains multiple parts - in this case, a text part and an image part. The image part should specify the image as a "Data URL". We'll build a few functions that help us put things together.


First, we'll need a function that loads the image and returns a Data URL. This function works in Node and uses the local file system, but it could be adapted to get the image in other ways. It also assumes the image is a PNG file, which isn't always a good assumption.


import fs from "fs";

function filenameToDataUrl(filename) {
  const buffer = fs.readFileSync(filename);
  const base64 = buffer.toString("base64");

  // FIXME - Get the real image type
  const mimeType = "image/png";

  const url = `data:${mimeType};base64,${base64}`;
  return url;
}
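One simple way to address that FIXME is to guess the MIME type from the file extension. This is just a sketch that covers common image types, not a robust detector:

import path from "path";

// Map common image file extensions to MIME types instead of assuming PNG.
function filenameToMimeType(filename) {
  const types = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
  };
  return types[path.extname(filename).toLowerCase()] ?? "image/png";
}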


We'll also need a function that will take the Data URL for the file and create the prompt that we want for it.


The prompt parts need to be an array of MessageContentComplex types. In our case, we will need one text type and one image_url type, although you can have multiple of each. We will assign these to the content attribute passed into a HumanMessageChunk. We'll then use this to build the ChatPromptTemplate that we'll be returning to use in the chain.


import { HumanMessageChunk } from "@langchain/core/messages";
import { ChatPromptTemplate } from "@langchain/core/prompts";

function buildPrompt(dataUrl) {
  const message = [
    {
      type: "text",
      text: "What is in this image?",
    },
    {
      type: "image_url",
      image_url: dataUrl,
    },
  ];
  const messages = [
    new HumanMessageChunk({ content: message }),
  ];
  const template = ChatPromptTemplate.fromMessages(messages);
  return template;
}


Finally, we can tie this together with our model class that has been created with a stable Gemini 1.5 Flash model that supports images and other media. (We're using a different model here since Gemini 1.5 has better multimodal support.)


async function describeImageFile(filename) {
  // Use the functions we defined above to build the prompt
  const dataUrl = filenameToDataUrl(filename);
  const prompt = buildPrompt(dataUrl);

  const model = new ChatGoogle({
    modelName: "gemini-1.5-flash-001",
  });

  // The output of a chat model contains a lot of additional information,
  // so we want to convert it to a string again.
  const outputParser = new StringOutputParser();

  // Create, but don't execute, the chain
  const chain = prompt.pipe(model).pipe(outputParser);

  // Execute the chain. We don't have any other parameters.
  const response = await chain.invoke({});
  return response;
}


const filename = "img/langchain-logo.png";

const text = await describeImageFile(filename)

console.log(text);


Dealing with video and audio requires some additional techniques that we'll discuss in a future article.

Beyond the Basics - What's Next?

This introduction seems like a lot - but you've gotten past the most difficult parts!


You should have a better understanding of the different Gemini API platforms that are available, how you can authenticate to each of them, and which LangChain.js packages you'll need. We've also looked at the basics of using LangChain.js with Gemini to do some simple text prompting, prompting that includes images in your prompt, and conversational prompting.


One great thing about LangChain.js is that the models are, for the most part, interchangeable. So if you see an example that uses ChatOpenAI, then in many cases you can swap it for ChatGoogle. Give it a try and see how it works!
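As a sketch of what that swap looks like, only the import and the model class change (this assumes the @langchain/openai package for the OpenAI side):

// OpenAI version:
//   import { ChatOpenAI } from "@langchain/openai";
//   const model = new ChatOpenAI({ temperature: 0 });

// Gemini version, with the rest of the chain unchanged:
import { ChatGoogle } from "@langchain/google-gauth";
const model = new ChatGoogle({ temperature: 0 });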


Future articles will explore how we can use LangChain.js with other Google tools, including:

  • Advanced Safety features that Gemini offers to screen both prompts and potential responses for dangerous information, how you can tune how sensitive Gemini is to this, and how to handle safety and other content violations.

  • More about the various media capabilities that the Gemini 1.5 models offer.

  • Other settings that are useful in various situations, including streaming responses, JSON responses with schema definitions, and using built-in tool definitions.

  • Getting the Gemini API to turn human statements into well defined structures that you can use to make function calls with parameters.

  • Using embeddings to turn text into vectors of numbers that can be used to do semantic searches and the variety of vector databases (from Google and others) that can assist with these tasks.

  • Accessing other models in Google Cloud's Vertex AI Model Garden, fine tuned models in AI Studio, or even locally.

  • The internal design decisions about the LangChain.js Google modules so you can leverage the authentication system to access other tools offered by Google.