Google AI Gemini

https://ai.google.dev/gemini-api/docs

Maven Dependency

<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-google-ai-gemini</artifactId>
    <version>1.5.0</version>
</dependency>

API Key

Get an API key for free here: https://ai.google.dev/gemini-api/docs/api-key .
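The examples on this page read the key from the GEMINI_AI_KEY environment variable. A minimal sketch, using only the JDK, of failing fast when that variable is missing so a null key never reaches the model builder. The class name ApiKeySketch and the method requireKey are illustrative names, not part of LangChain4j.

```java
import java.util.Map;

final class ApiKeySketch {

    // Returns the trimmed value stored under `name`,
    // or throws if the variable is missing or blank.
    static String requireKey(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException(
                "Environment variable " + name + " is not set");
        }
        return value.trim();
    }

    public static void main(String[] args) {
        try {
            String key = requireKey(System.getenv(), "GEMINI_AI_KEY");
            // Print only the length so the key itself never lands in logs.
            System.out.println("Key loaded (" + key.length() + " characters)");
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

The returned value can then be passed to .apiKey(...) in place of the direct System.getenv(...) call.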

Available Models

See the list of available models in the documentation.

gemini-2.0-flash
gemini-1.5-flash
gemini-1.5-pro
gemini-1.0-pro

GoogleAiGeminiChatModel

The usual chat(...) methods are supported:

ChatModel gemini = GoogleAiGeminiChatModel.builder()
    .apiKey(System.getenv("GEMINI_AI_KEY"))
    .modelName("gemini-1.5-flash")
    ...
    .build();

String response = gemini.chat("Hello Gemini!");

The ChatResponse chat(ChatRequest req) method is supported as well:

ChatModel gemini = GoogleAiGeminiChatModel.builder()
    .apiKey(System.getenv("GEMINI_AI_KEY"))
    .modelName("gemini-1.5-flash")
    .build();

ChatResponse chatResponse = gemini.chat(ChatRequest.builder()
    .messages(UserMessage.from(
        "How many R's are there in the word 'strawberry'?"))
    .build());

String response = chatResponse.aiMessage().text();

Configuration

ChatModel gemini = GoogleAiGeminiChatModel.builder()
    .httpClientBuilder(...)
    .defaultRequestParameters(...)
    .apiKey(System.getenv("GEMINI_AI_KEY"))
    .baseUrl(...)
    .modelName("gemini-1.5-flash")
    .maxRetries(...)
    .temperature(1.0)
    .topP(0.95)
    .topK(64)
    .seed(42)
    .frequencyPenalty(...)
    .presencePenalty(...)
    .maxOutputTokens(8192)
    .timeout(Duration.ofSeconds(60))
    .responseFormat(ResponseFormat.JSON) // or .responseFormat(ResponseFormat.builder()...build())
    .stopSequences(List.of(...))
    .toolConfig(GeminiFunctionCallingConfig.builder()...build()) // or as follows
    .toolConfig(GeminiMode.ANY, List.of("fnOne", "fnTwo"))
    .allowCodeExecution(true)
    .includeCodeExecutionOutput(true)
    .logRequestsAndResponses(true)
    .safetySettings(List<GeminiSafetySetting> or Map<GeminiHarmCategory, GeminiHarmBlockThreshold>)
    .thinkingConfig(...)
    .returnThinking(true)
    .sendThinking(true)
    .responseLogprobs(...)
    .logprobs(...)
    .enableEnhancedCivicAnswers(...)
    .listeners(...)
    .supportedCapabilities(...)
    .build();

GoogleAiGeminiStreamingChatModel

GoogleAiGeminiStreamingChatModel streams the text of a response token by token.
The response must be handled by a StreamingChatResponseHandler.

StreamingChatModel gemini = GoogleAiGeminiStreamingChatModel.builder()
        .apiKey(System.getenv("GEMINI_AI_KEY"))
        .modelName("gemini-1.5-flash")
        .build();

CompletableFuture<ChatResponse> futureResponse = new CompletableFuture<>();

gemini.chat("Tell me a joke about Java", new StreamingChatResponseHandler() {

    @Override
    public void onPartialResponse(String partialResponse) {
        System.out.print(partialResponse);
    }

    @Override
    public void onCompleteResponse(ChatResponse completeResponse) {
        futureResponse.complete(completeResponse);
    }

    @Override
    public void onError(Throwable error) {
        futureResponse.completeExceptionally(error);
    }
});

futureResponse.join();
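Note that futureResponse.join() blocks indefinitely if neither completion callback is ever invoked. The sketch below shows the same callback-to-future bridge with a bounded wait, using only the JDK. TokenHandler and fakeStream are hypothetical stand-ins for a streaming model and its handler so that the example runs without an API key; they are not LangChain4j types.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a streaming callback interface (NOT the LangChain4j API).
interface TokenHandler {
    void onPartial(String token);
    void onComplete(String fullText);
    void onError(Throwable error);
}

final class StreamingBridgeSketch {

    // Simulates a model that emits tokens on a background thread.
    static void fakeStream(List<String> tokens, TokenHandler handler) {
        CompletableFuture.runAsync(() -> {
            try {
                StringBuilder full = new StringBuilder();
                for (String token : tokens) {
                    handler.onPartial(token);
                    full.append(token);
                }
                handler.onComplete(full.toString());
            } catch (RuntimeException e) {
                handler.onError(e);
            }
        });
    }

    // Bridges the callbacks to a future and waits at most `timeoutSeconds`.
    static String collect(List<String> tokens, long timeoutSeconds) {
        CompletableFuture<String> future = new CompletableFuture<>();
        fakeStream(tokens, new TokenHandler() {
            @Override public void onPartial(String token) { System.out.print(token); }
            @Override public void onComplete(String fullText) { future.complete(fullText); }
            @Override public void onError(Throwable error) { future.completeExceptionally(error); }
        });
        // orTimeout (Java 9+) fails the future instead of blocking forever.
        return future.orTimeout(timeoutSeconds, TimeUnit.SECONDS).join();
    }

    public static void main(String[] args) {
        String text = collect(List.of("Why ", "do ", "Java ", "devs ", "wear ", "glasses?"), 5);
        System.out.println();
        System.out.println("Collected " + text.length() + " characters");
    }
}
```

Under the same assumption, the real example could replace futureResponse.join() with futureResponse.orTimeout(60, TimeUnit.SECONDS).join().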

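Tools

Tools (also known as function calling) are supported, including parallel calls.
You can use the chat(ChatRequest) method and configure the ChatRequest with one or more ToolSpecification instances to tell Gemini which functions it may request.
Alternatively, you can use LangChain4j's AiServices to define tools.

Here is an example of a weather tool defined with AiServices: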

record WeatherForecast(
    String location,
    String forecast,
    int temperature) {}

class WeatherForecastService {
    @Tool("Get the weather forecast for a location")
    WeatherForecast getForecast(
        @P("Location to get the forecast for") String location) {
        if (location.equals("Paris")) {
            return new WeatherForecast("Paris", "sunny", 20);
        } else if (location.equals("London")) {
            return new WeatherForecast("London", "rainy", 15);
        } else if (location.equals("Tokyo")) {
            return new WeatherForecast("Tokyo", "warm", 32);
        } else {
            return new WeatherForecast("Unknown", "unknown", 0);
        }
    }
}

interface WeatherAssistant {
    String chat(String userMessage);
}

WeatherForecastService weatherForecastService =
    new WeatherForecastService();

ChatModel gemini = GoogleAiGeminiChatModel.builder()
    .apiKey(System.getenv("GEMINI_AI_KEY"))
    .modelName("gemini-1.5-flash")
    .temperature(0.0)
    .build();

WeatherAssistant weatherAssistant =
    AiServices.builder(WeatherAssistant.class)
        .chatModel(gemini)
        .tools(weatherForecastService)
        .build();

String tokyoWeather = weatherAssistant.chat(
        "What is the weather forecast for Tokyo?");

System.out.println("Gemini> " + tokyoWeather);
// Gemini> The weather forecast for Tokyo is warm
//         with a temperature of 32 degrees.

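Structured Outputs

For more information on structured outputs, see the Structured Outputs tutorial.

Type-safe extraction of structured data from free-form text

Large language models are good at extracting structured information from unstructured text.
In the following example, we use AiServices to extract a type-safe WeatherForecast object from the text of a weather forecast:

// Type-safe weather forecast object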
record WeatherForecast(
    @Description("minimum temperature")
    Integer minTemperature,
    @Description("maximum temperature")
    Integer maxTemperature,
    @Description("chances of rain")
    boolean rain
) { }

// Interface contract for interacting with Gemini
interface WeatherForecastAssistant {
    WeatherForecast extract(String forecast);
}

// Extract the data:
ChatModel gemini = GoogleAiGeminiChatModel.builder()
    .apiKey(System.getenv("GEMINI_AI_KEY"))
    .modelName("gemini-1.5-flash")
    .supportedCapabilities(RESPONSE_FORMAT_JSON_SCHEMA) // required to enable structured outputs
    .build();

WeatherForecastAssistant forecastAssistant =
    AiServices.builder(WeatherForecastAssistant.class)
        .chatModel(gemini)
        .build();

WeatherForecast forecast = forecastAssistant.extract("""
    Morning: The day dawns bright and clear in Osaka, with crisp
    autumn air and sunny skies. Expect temperatures to hover
    around 18°C (64°F) as you head out for your morning stroll
    through Namba.
    Afternoon: The sun continues to shine as the city buzzes with
    activity. Temperatures climb to a comfortable 22°C (72°F).
    Enjoy a leisurely lunch at one of Osaka's many outdoor cafes,
    or take a boat ride on the Okawa River to soak in the beautiful
    scenery.
    Evening: As the day fades, expect clear skies and a slight chill
    in the air. Temperatures drop to 15°C (59°F). A cozy dinner at a
    traditional Izakaya will be the perfect way to end your day in
    Osaka.
    Overall: A beautiful autumn day in Osaka awaits, perfect for
    exploring the city's vibrant streets, enjoying the local cuisine,
    and soaking in the sights.
    Don't forget: Pack a light jacket for the evening and wear
    comfortable shoes for all the walking you'll be doing.
    """);

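Response Format / Response Schema

You can specify a ResponseFormat either when creating the GoogleAiGeminiChatModel or at call time.
Here is an example that defines a JSON schema for recipes when creating the GoogleAiGeminiChatModel: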

ResponseFormat responseFormat = ResponseFormat.builder()
        .type(ResponseFormatType.JSON)
        .jsonSchema(JsonSchema.builder() // see [1] below
                .rootElement(JsonObjectSchema.builder()
                        .addStringProperty("title")
                        .addIntegerProperty("preparationTimeMinutes")
                        .addProperty("ingredients", JsonArraySchema.builder()
                                .items(new JsonStringSchema())
                                .build())
                        .addProperty("steps", JsonArraySchema.builder()
                                .items(new JsonStringSchema())
                                .build())
                        .build())
                .build())
        .build();

ChatModel gemini = GoogleAiGeminiChatModel.builder()
        .apiKey(System.getenv("GEMINI_AI_KEY"))
        .modelName("gemini-1.5-flash")
        .responseFormat(responseFormat)
        .build();

String recipeResponse = gemini.chat("Suggest a dessert recipe with strawberries");

System.out.println(recipeResponse);

Notes:

[1] - You can generate a JsonSchema automatically from your class with the JsonSchemas.jsonSchemaFrom() helper method.

JsonSchema jsonSchema = JsonSchemas.jsonSchemaFrom(TripItinerary.class).get();

Here is an example that defines a JSON schema for recipes when calling the GoogleAiGeminiChatModel:

ChatModel gemini = GoogleAiGeminiChatModel.builder()
        .apiKey(System.getenv("GEMINI_AI_KEY"))
        .modelName("gemini-1.5-flash")
        .build();

ResponseFormat responseFormat = ...;

ChatRequest chatRequest = ChatRequest.builder()
        .messages(UserMessage.from("Suggest a dessert recipe with strawberries"))
        .responseFormat(responseFormat)
        .build();

ChatResponse chatResponse = gemini.chat(chatRequest);

System.out.println(chatResponse.aiMessage().text());

JSON Mode

You can force Gemini to reply in JSON:

ChatModel gemini = GoogleAiGeminiChatModel.builder()
    .apiKey(System.getenv("GEMINI_AI_KEY"))
    .modelName("gemini-1.5-flash")
    .responseFormat(ResponseFormat.JSON)
    .build();

String roll = gemini.chat("Roll a 6-sided dice");

System.out.println(roll);
// {"roll": "3"}

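A system prompt can further describe the structure of the JSON output.
Gemini usually follows the suggested schema, but this is not guaranteed.
If you need the JSON schema to be applied strictly, define a response format as described earlier.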

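Python Code Execution

In addition to function calling, Google AI Gemini can write and execute Python code in a sandboxed environment.
This is particularly useful when a task requires more advanced calculation or logic.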

ChatModel gemini = GoogleAiGeminiChatModel.builder()
    .apiKey(System.getenv("GEMINI_AI_KEY"))
    .modelName("gemini-1.5-flash")
    .allowCodeExecution(true)
    .includeCodeExecutionOutput(true)
    .build();

There are two builder methods:

allowCodeExecution(true): tells Gemini that it may execute Python code
includeCodeExecutionOutput(true): includes the Python script Gemini generated and its execution output in the response

ChatResponse mathQuizz = gemini.chat(
    SystemMessage.from("""
        You are an expert mathematician.
        When asked a math problem or logic problem,
        you can solve it by creating a Python program,
        and execute it to return the result.
        """),
    UserMessage.from("""
        Implement the Fibonacci and Ackermann functions.
        What is the result of `fibonacci(22)` - ackermann(3, 4)?
        """)
);

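Gemini writes a Python script, executes it on its servers, and returns the result.
Because we asked to see the code and its execution output, the answer looks like this:

Code executed: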
```python
def fibonacci(n):
    if n <= 1:
        return n
    else:
        return fibonacci(n-1) + fibonacci(n-2)

def ackermann(m, n):
    if m == 0:
        return n + 1
    elif n == 0:
        return ackermann(m - 1, 1)
    else:
        return ackermann(m - 1, ackermann(m, n - 1))

print(fibonacci(22) - ackermann(3, 4))
```
Output:
```
17586
```
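The result of `fibonacci(22) - ackermann(3, 4)` is **17586**.

I implemented the Fibonacci and Ackermann functions.
Then I called `fibonacci(22) - ackermann(3, 4)` and printed the result.

If we had not asked to see the code and output, we would receive only the following text:

The result of `fibonacci(22) - ackermann(3, 4)` is **17586**.

I implemented the Fibonacci and Ackermann functions.
Then I called `fibonacci(22) - ackermann(3, 4)` and printed the result.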
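The value in the sample answer can be reproduced locally. A minimal sketch that ports the two Python functions to plain Java; the class name CodeExecutionCheck is illustrative, and no Gemini call is involved:

```java
final class CodeExecutionCheck {

    // Naive recursive Fibonacci, mirroring the Python script above.
    static int fibonacci(int n) {
        return n <= 1 ? n : fibonacci(n - 1) + fibonacci(n - 2);
    }

    // Ackermann function, mirroring the Python script above.
    static int ackermann(int m, int n) {
        if (m == 0) return n + 1;
        if (n == 0) return ackermann(m - 1, 1);
        return ackermann(m - 1, ackermann(m, n - 1));
    }

    public static void main(String[] args) {
        // fibonacci(22) = 17711 and ackermann(3, 4) = 125, so this prints 17586.
        System.out.println(fibonacci(22) - ackermann(3, 4));
    }
}
```

A check like this is a cheap way to validate model-reported results in tests, since the text portion of the answer is generated by the model rather than copied from the sandbox output.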

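Multimodality

Gemini is a multimodal model, meaning it can accept and generate modalities other than text.

Input modalities

Gemini supports the following input modalities:

Images (ImageContent)
Videos (VideoContent)
Audio files (AudioContent)
PDF files (PdfFileContent)

The following example shows how to mix a text prompt with an image:

// PNG image of the cute colorful parrot mascot of the LangChain4j project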
String base64Img = b64encoder.encodeToString(readBytes(
  "https://avatars.githubusercontent.com/u/132277850?v=4"));

ChatModel gemini = GoogleAiGeminiChatModel.builder()
    .apiKey(System.getenv("GEMINI_AI_KEY"))
    .modelName("gemini-1.5-flash")
    .build();

ChatResponse response = gemini.chat(
    UserMessage.from(
        ImageContent.from(base64Img, "image/png"),
        TextContent.from("""
            Do you think this logo fits well
            with the project description?
            """)
    )
);
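The snippet above relies on two helpers it does not define, b64encoder and readBytes. A minimal sketch of what they might look like using only the JDK follows. The class name ImageEncodingSketch and the method toBase64 are illustrative names and are not part of LangChain4j.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.Base64;

final class ImageEncodingSketch {

    // Downloads the resource at `url` and returns its raw bytes.
    static byte[] readBytes(String url) {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read " + url, e);
        }
    }

    // Encodes raw bytes as standard (non-URL-safe) Base64 text.
    static String toBase64(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: pass an image URL as the first argument");
            return;
        }
        String base64Img = toBase64(readBytes(args[0]));
        System.out.println(base64Img.length() + " Base64 characters");
    }
}
```

With these helpers, the first statement of the example would read String base64Img = ImageEncodingSketch.toBase64(ImageEncodingSketch.readBytes(url)).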

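Image Generation Output

Some Gemini models (for example, gemini-2.5-flash-image-preview) can generate images as part of the response. Generated images are stored in the AiMessage attributes and can be accessed with the GeneratedImageHelper utility class.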

ChatModel gemini = GoogleAiGeminiChatModel.builder()
    .apiKey("Your API Key")
    .modelName("gemini-2.5-flash-image-preview")
    .build();

ChatResponse response = gemini.chat(UserMessage.from("A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug in matte black"));

// Extract the generated images from the response
AiMessage aiMessage = response.aiMessage();
List<Image> generatedImages = GeneratedImageHelper.getGeneratedImages(aiMessage);

if (GeneratedImageHelper.hasGeneratedImages(aiMessage)) {
    System.out.println("Generated " + generatedImages.size() + " image(s)");
    System.out.println("Text response: " + aiMessage.text());

    for (Image image : generatedImages) {
        String base64Data = image.base64Data();
        String mimeType = image.mimeType();
        
        // You can save, display, or further process the image
        // For example, save it to a file:
        byte[] imageBytes = Base64.getDecoder().decode(base64Data);
        Files.write(Paths.get("generated_image.png"), imageBytes);
    }
} else {
    System.out.println("Text response: " + aiMessage.text());
}
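The loop above writes every image to the same generated_image.png path, so a response with several images overwrites earlier ones, and the extension ignores the reported MIME type. A minimal JDK-only sketch of one way to avoid both issues follows. The class GeneratedImageFilesSketch, its methods, and the MIME-to-extension table are assumptions made for illustration, not LangChain4j API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;
import java.util.Locale;

final class GeneratedImageFilesSketch {

    // Maps a few common image MIME types to file extensions; falls back to "bin".
    static String extensionFor(String mimeType) {
        if (mimeType == null) return "bin";
        switch (mimeType.toLowerCase(Locale.ROOT)) {
            case "image/png": return "png";
            case "image/jpeg": return "jpg";
            case "image/webp": return "webp";
            case "image/gif": return "gif";
            default: return "bin";
        }
    }

    // Decodes Base64 data and writes it to <dir>/generated_image_<index>.<ext>.
    static Path save(Path dir, int index, String base64Data, String mimeType) {
        byte[] bytes = Base64.getDecoder().decode(base64Data);
        Path target = dir.resolve(
            "generated_image_" + index + "." + extensionFor(mimeType));
        try {
            return Files.write(target, bytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("gemini-images");
        // "aGVsbG8=" is placeholder data, not a real image.
        Path saved = save(dir, 0, "aGVsbG8=", "image/png");
        System.out.println("Wrote " + saved);
    }
}
```

Inside the loop, a call such as save(dir, i, image.base64Data(), image.mimeType()) with a running index i would then replace the fixed file name.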

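Thinking

Both GoogleAiGeminiChatModel and GoogleAiGeminiStreamingChatModel support thinking.

The following parameters control thinking behavior:

GeminiThinkingConfig.includeThoughts and thinkingBudget: enable thinking; see the Gemini thinking documentation for more details.
returnThinking: controls whether thinking (if available) is returned in AiMessage.thinking(), and whether the StreamingChatResponseHandler.onPartialThinking() and TokenStream.onPartialThinking() callbacks are invoked when using GoogleAiGeminiStreamingChatModel. Disabled by default. If enabled, thinking signatures are also stored and returned in AiMessage.attributes().
sendThinking: controls whether the thinking and signatures stored in an AiMessage are sent to the LLM in follow-up requests. Disabled by default.

Note

When returnThinking is not set (is null) and thinkingConfig is set, the thinking text is prepended to the actual response in the AiMessage.text() field, and StreamingChatResponseHandler.onPartialResponse() is invoked instead of StreamingChatResponseHandler.onPartialThinking().

Here is an example of how to configure thinking: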

GeminiThinkingConfig thinkingConfig = GeminiThinkingConfig.builder()
        .includeThoughts(true)
        .thinkingBudget(250)
        .build();

ChatModel model = GoogleAiGeminiChatModel.builder()
        .apiKey(System.getenv("GOOGLE_AI_GEMINI_API_KEY"))
        .modelName("gemini-2.5-flash")
        .thinkingConfig(thinkingConfig)
        .returnThinking(true)
        .sendThinking(true)
        .build();

Learn More

To learn more about the Google AI Gemini models, see their documentation.