Generate text from multimodal prompts using the Gemini API

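When you call the Gemini API from your app using a Vertex AI in Firebase SDK, you can prompt the Gemini model to generate text based on multimodal input. A multimodal prompt can combine several modalities (input types), such as text together with images, PDFs, plain-text files, video, and audio.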

In each multimodal request, you must always provide the following:

The file's mimeType. Learn about the supported MIME types for each type of input file.
The file. You can provide the file either as inline data (as shown on this page) or by its URL or URI.
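
To make the two requirements concrete, here is a minimal sketch of the two ways a file can be described in a request. The `inlineData` shape matches the Web samples on this page; the `fileData` shape with `fileUri` is an assumption about how the Web SDK represents a file referenced by URL or URI, and the helper names `inlinePart` and `fileUriPart` are hypothetical.

```javascript
// Build a part that embeds the file as base64-encoded inline data.
function inlinePart(base64Data, mimeType) {
  if (!mimeType) throw new Error("mimeType is required for every file part");
  return { inlineData: { data: base64Data, mimeType } };
}

// Build a part that references the file by URL or URI instead of embedding it.
// Assumption: the SDK accepts a { fileData: { fileUri, mimeType } } part.
function fileUriPart(fileUri, mimeType) {
  if (!mimeType) throw new Error("mimeType is required for every file part");
  return { fileData: { fileUri, mimeType } };
}

const parts = [
  "Describe these files.",
  inlinePart("aGVsbG8=", "text/plain"), // "hello" encoded as base64
  fileUriPart("https://storage.googleapis.com/BUCKET_NAME/PATH/TO/FILE", "image/jpeg"),
];
console.log(JSON.stringify(parts, null, 2));
```

Either helper refuses to build a part without a MIME type, which mirrors the rule that every file in a request needs one.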

For testing and iterating on multimodal prompts, we recommend using Vertex AI Studio.

Other options for working with the Gemini API

Optionally experiment with an alternative "Google AI" version of the Gemini API
Get free-of-charge access (within limits and where available) using Google AI Studio and the Google AI client SDKs. Use these SDKs for prototyping only in mobile and web apps.

Once you're familiar with how the Gemini API works, migrate to the Vertex AI in Firebase SDKs (this documentation), which add many features important for mobile and web apps, such as protecting the API from abuse with Firebase App Check and support for large media files in requests.

Optionally call the Vertex AI Gemini API server-side (for example with Python, Node.js, or Go)
Use the server-side Vertex AI SDKs, Genkit, or Firebase Extensions for the Gemini API.

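Before you begin

If you haven't already, complete the getting started guide, which describes how to set up your Firebase project, connect your app to Firebase, add the SDK, initialize the Vertex AI service, and create a GenerativeModel instance.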

Generate text from text and a single image
Generate text from text and multiple images
Generate text from text and a video

Sample media files

If you don't already have media files, you can use the following publicly available files. Because these files are stored in buckets that aren't in your Firebase project, you need to use the https://storage.googleapis.com/BUCKET_NAME/PATH/TO/FILE format for the URL.

Image: https://storage.googleapis.com/cloud-samples-data/generative-ai/image/scones.jpg (MIME type: image/jpeg)
PDF: https://storage.googleapis.com/cloud-samples-data/generative-ai/pdf/2403.05530.pdf (MIME type: application/pdf)
Video: https://storage.googleapis.com/cloud-samples-data/video/animals.mp4 (MIME type: video/mp4)
Audio: https://storage.googleapis.com/cloud-samples-data/generative-ai/audio/pixel.mp3 (MIME type: audio/mp3)
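
As a rough sketch of the URL format described above, the sample files can be kept in a small lookup table and expanded into full URLs on demand. The bucket name, paths, and MIME types come from the list above; the names `SAMPLE_FILES` and `publicUrl` are hypothetical helpers, not part of any SDK.

```javascript
// Sample files from the list above, keyed by modality.
const SAMPLE_BUCKET = "cloud-samples-data";
const SAMPLE_FILES = {
  image: { path: "generative-ai/image/scones.jpg", mimeType: "image/jpeg" },
  pdf: { path: "generative-ai/pdf/2403.05530.pdf", mimeType: "application/pdf" },
  video: { path: "video/animals.mp4", mimeType: "video/mp4" },
  audio: { path: "generative-ai/audio/pixel.mp3", mimeType: "audio/mp3" },
};

// Build a URL in the https://storage.googleapis.com/BUCKET_NAME/PATH/TO/FILE format.
function publicUrl(bucket, path) {
  // Encode each path segment but keep the "/" separators intact.
  const encodedPath = path
    .split("/")
    .map((segment) => encodeURIComponent(segment))
    .join("/");
  return `https://storage.googleapis.com/${bucket}/${encodedPath}`;
}

for (const [kind, file] of Object.entries(SAMPLE_FILES)) {
  console.log(kind, publicUrl(SAMPLE_BUCKET, file.path), file.mimeType);
}
```

Keeping the MIME type next to each path makes it harder to send a file without one.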

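Generate text from text and a single image

Make sure that you've completed the Before you begin section of this guide before trying this sample.

You can call the Gemini API with multimodal prompts that include both text and a single file (such as an image, as in this example).

Make sure to review the requirements and recommendations for input files.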

Swift

You can call generateContent() to generate text from a multimodal prompt request that includes text and a single image.

import FirebaseVertexAI
import UIKit // UIImage is defined in UIKit

// Initialize the Vertex AI service
let vertex = VertexAI.vertexAI()

// Create a `GenerativeModel` instance with a model that supports your use case
let model = vertex.generativeModel(modelName: "gemini-2.0-flash")

guard let image = UIImage(systemName: "bicycle") else { fatalError() }

// Provide a text prompt to include with the image
let prompt = "What's in this picture?"

// To generate text output, call generateContent and pass in the prompt
let response = try await model.generateContent(image, prompt)
print(response.text ?? "No text in response.")

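Note: The example above uses a simplified way of handling platform-native image types (UIImage, NSImage, CIImage, and CGImage) in multimodal prompts. Regardless of their original format, these image types are converted client-side to JPEG at 80% quality before being sent to the server. This means that when you provide images inline, as in the example above, you don't need to specify the MIME type.

For more control over the image format and conversion, provide the image as an InlineDataPart and supply the specific MIME type. For example: InlineDataPart(data: Data(/* PNG Data */), mimeType: "image/png").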

Kotlin

You can call generateContent() to generate text from a multimodal prompt request that includes text and a single image.

Note: For Kotlin, the methods in this SDK are suspend functions and need to be called from a Coroutine scope.

// Initialize the Vertex AI service and create a `GenerativeModel` instance
// Specify a model that supports your use case
val generativeModel = Firebase.vertexAI.generativeModel("gemini-2.0-flash")

// Loads an image from the app/res/drawable/ directory
val bitmap: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky)

// Provide a prompt that includes the image specified above and text
val prompt = content {
  image(bitmap)
  text("What developer tool is this mascot from?")
}

// To generate text output, call generateContent with the prompt
val response = generativeModel.generateContent(prompt)
print(response.text)

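Note: The example above uses a simplified way of handling the platform-native image type (Bitmap) in multimodal prompts. Regardless of its original format, the image is converted client-side to JPEG at 80% quality before being sent to the server. This means that when you provide images inline, as in the example above, you don't need to specify the MIME type.

For more control over the image format and conversion, provide the image as an InlineDataPart and supply the specific MIME type. For example: content { inlineData(/* PNG as byte array */, "image/png") }.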

Java

You can call generateContent() to generate text from a multimodal prompt request that includes text and a single image.

Note: For Java, the methods in this SDK return a ListenableFuture.

// Initialize the Vertex AI service and create a `GenerativeModel` instance
// Specify a model that supports your use case
GenerativeModel gm = FirebaseVertexAI.getInstance()
        .generativeModel("gemini-2.0-flash");
GenerativeModelFutures model = GenerativeModelFutures.from(gm);

Bitmap bitmap = BitmapFactory.decodeResource(getResources(), R.drawable.sparky);

// Provide a prompt that includes the image specified above and text
Content content = new Content.Builder()
        .addImage(bitmap)
        .addText("What developer tool is this mascot from?")
        .build();

// To generate text output, call generateContent with the prompt
ListenableFuture<GenerateContentResponse> response = model.generateContent(content);
Futures.addCallback(response, new FutureCallback<GenerateContentResponse>() {
    @Override
    public void onSuccess(GenerateContentResponse result) {
        String resultText = result.getText();
        System.out.println(resultText);
    }

    @Override
    public void onFailure(Throwable t) {
        t.printStackTrace();
    }
}, executor);

Web

You can call generateContent() to generate text from a multimodal prompt request that includes text and a single image.

import { initializeApp } from "firebase/app";
import { getVertexAI, getGenerativeModel } from "firebase/vertexai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Vertex AI service
const vertexAI = getVertexAI(firebaseApp);

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(vertexAI, { model: "gemini-2.0-flash" });

// Converts a File object to a Part object.
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

async function run() {
  // Provide a text prompt to include with the image
  const prompt = "What's in this picture?";

  const fileInputEl = document.querySelector("input[type=file]");
  const imagePart = await fileToGenerativePart(fileInputEl.files[0]);

  // To generate text output, call generateContent with the text and image
  const result = await model.generateContent([prompt, imagePart]);

  const response = result.response;
  const text = response.text();
  console.log(text);
}

run();

Dart

You can call generateContent() to generate text from a multimodal prompt request that includes text and a single image.

import 'dart:io'; // Provides File
import 'package:firebase_vertexai/firebase_vertexai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Vertex AI service and create a `GenerativeModel` instance
// Specify a model that supports your use case
final model =
      FirebaseVertexAI.instance.generativeModel(model: 'gemini-2.0-flash');

// Provide a text prompt to include with the image
final prompt = TextPart("What's in the picture?");
// Prepare images for input
final image = await File('image0.jpg').readAsBytes();
final imagePart = InlineDataPart('image/jpeg', image);

// To generate text output, call generateContent with the text and image
final response = await model.generateContent([
  Content.multi([prompt,imagePart])
]);
print(response.text);

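Learn how to choose a model and, optionally, a location appropriate for your use case and app.

Generate text from text and multiple images

Make sure that you've completed the Before you begin section of this guide before trying this sample.

You can call the Gemini API with multimodal prompts that include both text and multiple files (such as images, as in this example).

Make sure to review the requirements and recommendations for input files.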

Swift

You can call generateContent() to generate text from a multimodal prompt request that includes text and multiple images.

import FirebaseVertexAI
import UIKit // UIImage is defined in UIKit

// Initialize the Vertex AI service
let vertex = VertexAI.vertexAI()

// Create a `GenerativeModel` instance with a model that supports your use case
let model = vertex.generativeModel(modelName: "gemini-2.0-flash")

guard let image1 = UIImage(systemName: "car") else { fatalError() }
guard let image2 = UIImage(systemName: "car.2") else { fatalError() }

// Provide a text prompt to include with the images
let prompt = "What's different between these pictures?"

// To generate text output, call generateContent and pass in the prompt
let response = try await model.generateContent(image1, image2, prompt)
print(response.text ?? "No text in response.")

Kotlin

You can call generateContent() to generate text from a multimodal prompt request that includes text and multiple images.

Note: For Kotlin, the methods in this SDK are suspend functions and need to be called from a Coroutine scope.

// Initialize the Vertex AI service and create a `GenerativeModel` instance
// Specify a model that supports your use case
val generativeModel = Firebase.vertexAI.generativeModel("gemini-2.0-flash")

// Loads an image from the app/res/drawable/ directory
val bitmap1: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky)
val bitmap2: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky_eats_pizza)

// Provide a prompt that includes the images specified above and text
val prompt = content {
  image(bitmap1)
  image(bitmap2)
  text("What is different between these pictures?")
}

// To generate text output, call generateContent with the prompt
val response = generativeModel.generateContent(prompt)
print(response.text)

Java

You can call generateContent() to generate text from a multimodal prompt request that includes text and multiple images.

Note: For Java, the methods in this SDK return a ListenableFuture.

// Initialize the Vertex AI service and create a `GenerativeModel` instance
// Specify a model that supports your use case
GenerativeModel gm = FirebaseVertexAI.getInstance()
        .generativeModel("gemini-2.0-flash");
GenerativeModelFutures model = GenerativeModelFutures.from(gm);

Bitmap bitmap1 = BitmapFactory.decodeResource(getResources(), R.drawable.sparky);
Bitmap bitmap2 = BitmapFactory.decodeResource(getResources(), R.drawable.sparky_eats_pizza);

// Provide a prompt that includes the images specified above and text
Content prompt = new Content.Builder()
    .addImage(bitmap1)
    .addImage(bitmap2)
    .addText("What's different between these pictures?")
    .build();

// To generate text output, call generateContent with the prompt
ListenableFuture<GenerateContentResponse> response = model.generateContent(prompt);
Futures.addCallback(response, new FutureCallback<GenerateContentResponse>() {
    @Override
    public void onSuccess(GenerateContentResponse result) {
        String resultText = result.getText();
        System.out.println(resultText);
    }

    @Override
    public void onFailure(Throwable t) {
        t.printStackTrace();
    }
}, executor);

Web

You can call generateContent() to generate text from a multimodal prompt request that includes text and multiple images.

import { initializeApp } from "firebase/app";
import { getVertexAI, getGenerativeModel } from "firebase/vertexai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Vertex AI service
const vertexAI = getVertexAI(firebaseApp);

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(vertexAI, { model: "gemini-2.0-flash" });

// Converts a File object to a Part object.
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

async function run() {
  // Provide a text prompt to include with the images
  const prompt = "What's different between these pictures?";

  // Prepare images for input
  const fileInputEl = document.querySelector("input[type=file]");
  const imageParts = await Promise.all(
    [...fileInputEl.files].map(fileToGenerativePart)
  );

  // To generate text output, call generateContent with the text and images
  const result = await model.generateContent([prompt, ...imageParts]);

  const response = result.response;
  const text = response.text();
  console.log(text);
}

run();

Dart

You can call generateContent() to generate text from a multimodal prompt request that includes text and multiple images.

import 'dart:io'; // Provides File
import 'package:firebase_vertexai/firebase_vertexai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Vertex AI service and create a `GenerativeModel` instance
// Specify a model that supports your use case
final model =
      FirebaseVertexAI.instance.generativeModel(model: 'gemini-2.0-flash');

final (firstImage, secondImage) = await (
  File('image0.jpg').readAsBytes(),
  File('image1.jpg').readAsBytes()
).wait;
// Provide a text prompt to include with the images
final prompt = TextPart("What's different between these pictures?");
// Prepare images for input
final imageParts = [
  InlineDataPart('image/jpeg', firstImage),
  InlineDataPart('image/jpeg', secondImage),
];

// To generate text output, call generateContent with the text and images
final response = await model.generateContent([
  Content.multi([prompt, ...imageParts])
]);
print(response.text);

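Learn how to choose a model and, optionally, a location appropriate for your use case and app.

Generate text from text and a video

Make sure that you've completed the Before you begin section of this guide before trying this sample.

You can call the Gemini API with multimodal prompts that include both text and video files (as shown in this example).

Make sure to review the requirements and recommendations for input files.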
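
Video files are often large, and the Web samples on this page send a file inline as base64 text, which is roughly a third larger than the raw bytes. The following sketch estimates the encoded size before sending. The helper names are hypothetical, and `maxRequestBytes` is a caller-supplied assumption, not a limit stated on this page; check the input file requirements for the actual value.

```javascript
// Length of the base64 text produced from `byteLength` raw bytes (padding included).
function base64Length(byteLength) {
  return 4 * Math.ceil(byteLength / 3);
}

// Returns true when the encoded file fits in the given budget.
// `maxRequestBytes` is an assumption supplied by the caller, not an SDK constant.
function fitsInline(byteLength, maxRequestBytes) {
  return base64Length(byteLength) <= maxRequestBytes;
}

console.log(base64Length(5)); // 8
console.log(fitsInline(3 * 1024 * 1024, 4 * 1024 * 1024)); // true
```

A file that fails this check is a candidate for being passed by URL or URI instead of inline.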

Swift

You can call generateContent() to generate text from a multimodal prompt request that includes text and a single video.

import FirebaseVertexAI

// Initialize the Vertex AI service
let vertex = VertexAI.vertexAI()

// Create a `GenerativeModel` instance with a model that supports your use case
let model = vertex.generativeModel(modelName: "gemini-2.0-flash")

// Provide the video as `Data` with the appropriate MIME type.
let video = InlineDataPart(data: try Data(contentsOf: videoURL), mimeType: "video/mp4")

// Provide a text prompt to include with the video
let prompt = "What is in the video?"

// To generate text output, call generateContent with the text and video
let response = try await model.generateContent(video, prompt)
print(response.text ?? "No text in response.")

Kotlin

You can call generateContent() to generate text from a multimodal prompt request that includes text and a single video.

Note: For Kotlin, the methods in this SDK are suspend functions and need to be called from a Coroutine scope.

// Initialize the Vertex AI service and create a `GenerativeModel` instance
// Specify a model that supports your use case
val generativeModel = Firebase.vertexAI.generativeModel("gemini-2.0-flash")

val contentResolver = applicationContext.contentResolver
contentResolver.openInputStream(videoUri).use { stream ->
  stream?.let {
    val bytes = stream.readBytes()

    // Provide a prompt that includes the video specified above and text
    val prompt = content {
        inlineData(bytes, "video/mp4")
        text("What is in the video?")
    }

    // To generate text output, call generateContent with the prompt
    val response = generativeModel.generateContent(prompt)
    Log.d(TAG, response.text ?: "")
  }
}

Java

You can call generateContent() to generate text from a multimodal prompt request that includes text and a single video.

Note: For Java, the methods in this SDK return a ListenableFuture.

// Initialize the Vertex AI service and create a `GenerativeModel` instance
// Specify a model that supports your use case
GenerativeModel gm = FirebaseVertexAI.getInstance()
        .generativeModel("gemini-2.0-flash");
GenerativeModelFutures model = GenerativeModelFutures.from(gm);

ContentResolver resolver = getApplicationContext().getContentResolver();
try (InputStream stream = resolver.openInputStream(videoUri)) {
    if (stream != null) {
        // Read the whole stream in a loop: a single read() call may return only part
        // of the file, and a content URI cannot reliably be turned into a File
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int bytesRead;
        while ((bytesRead = stream.read(chunk)) != -1) {
            buffer.write(chunk, 0, bytesRead);
        }
        byte[] videoBytes = buffer.toByteArray();

        // Provide a prompt that includes the video specified above and text
        Content prompt = new Content.Builder()
                .addInlineData(videoBytes, "video/mp4")
                .addText("What is in the video?")
                .build();

        // To generate text output, call generateContent with the prompt
        ListenableFuture<GenerateContentResponse> response = model.generateContent(prompt);
        Futures.addCallback(response, new FutureCallback<GenerateContentResponse>() {
            @Override
            public void onSuccess(GenerateContentResponse result) {
                String resultText = result.getText();
                System.out.println(resultText);
            }

            @Override
            public void onFailure(Throwable t) {
                t.printStackTrace();
            }
        }, executor);
    }
} catch (IOException e) {
    e.printStackTrace();
}

Web

You can call generateContent() to generate text from a multimodal prompt request that includes text and a single video.

import { initializeApp } from "firebase/app";
import { getVertexAI, getGenerativeModel } from "firebase/vertexai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Vertex AI service
const vertexAI = getVertexAI(firebaseApp);

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(vertexAI, { model: "gemini-2.0-flash" });

// Converts a File object to a Part object.
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

async function run() {
  // Provide a text prompt to include with the video
  const prompt = "What do you see?";

  const fileInputEl = document.querySelector("input[type=file]");
  const videoPart = await fileToGenerativePart(fileInputEl.files[0]);

  // To generate text output, call generateContent with the text and video
  const result = await model.generateContent([prompt, videoPart]);

  const response = result.response;
  const text = response.text();
  console.log(text);
}

run();

Dart

You can call generateContent() to generate text from a multimodal prompt request that includes text and a single video.

import 'dart:io'; // Provides File
import 'package:firebase_vertexai/firebase_vertexai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Vertex AI service and create a `GenerativeModel` instance
// Specify a model that supports your use case
final model =
      FirebaseVertexAI.instance.generativeModel(model: 'gemini-2.0-flash');

// Provide a text prompt to include with the video
final prompt = TextPart("What's in the video?");

// Prepare video for input
final video = await File('video0.mp4').readAsBytes();

// Provide the video as `Data` with the appropriate mimetype
final videoPart = InlineDataPart('video/mp4', video);

// To generate text output, call generateContent with the text and video
final response = await model.generateContent([
  Content.multi([prompt, videoPart])
]);
print(response.text);

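Learn how to choose a model and, optionally, a location appropriate for your use case and app.

Stream the response

Make sure that you've completed the Before you begin section of this guide before trying these samples.

You can achieve faster interactions by not waiting for the entire result from the model generation, and instead using streaming to handle partial results. To stream the response, call generateContentStream.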
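
The sketch below illustrates the general pattern of handling partial results as they arrive while also keeping the full text. It does not call the Gemini API: `fakeStream` is a hypothetical stand-in for the stream of chunks returned by a streaming call, and `collectText` is an illustrative helper, not an SDK function.

```javascript
// Stand-in for a streamed response: an async generator that yields chunk-like
// objects, each exposing a text() method as the Web samples on this page do.
async function* fakeStream(pieces, delayMs = 10) {
  for (const piece of pieces) {
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    yield { text: () => piece };
  }
}

// Pass each partial result to onChunk as soon as it arrives,
// and return the concatenated text once the stream ends.
async function collectText(stream, onChunk) {
  let full = "";
  for await (const chunk of stream) {
    const text = chunk.text();
    onChunk(text);
    full += text;
  }
  return full;
}

collectText(fakeStream(["Scones ", "with ", "blueberries."]), (text) => console.log(text))
  .then((full) => console.log("Full response:", full));
```

With a real streaming call, the `onChunk` callback is where a UI would append text to the screen instead of waiting for the whole response.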

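Example: Stream generated text from text and a single image

Swift

You can call generateContentStream() to stream generated text from a multimodal prompt request that includes text and a single image.

import FirebaseVertexAI
import UIKit // UIImage is defined in UIKit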

// Initialize the Vertex AI service
let vertex = VertexAI.vertexAI()

// Create a `GenerativeModel` instance with a model that supports your use case
let model = vertex.generativeModel(modelName: "gemini-2.0-flash")

guard let image = UIImage(systemName: "bicycle") else { fatalError() }

// Provide a text prompt to include with the image
let prompt = "What's in this picture?"

// To stream generated text output, call generateContentStream and pass in the prompt
let contentStream = try model.generateContentStream(image, prompt)
for try await chunk in contentStream {
  if let text = chunk.text {
    print(text)
  }
}

Kotlin

Note: For Kotlin, the methods in this SDK are suspend functions and need to be called from a Coroutine scope.

// Initialize the Vertex AI service and create a `GenerativeModel` instance
// Specify a model that supports your use case
val generativeModel = Firebase.vertexAI.generativeModel("gemini-2.0-flash")

// Loads an image from the app/res/drawable/ directory
val bitmap: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky)

// Provide a prompt that includes the image specified above and text
val prompt = content {
  image(bitmap)
  text("What developer tool is this mascot from?")
}

// To stream generated text output, call generateContentStream with the prompt
var fullResponse = ""
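generativeModel.generateContentStream(prompt).collect { chunk ->
  // chunk.text is nullable; fall back to an empty string so "null" is never appended
  val text = chunk.text ?: ""
  print(text)
  fullResponse += text
}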

Java

Note: For Java, the streaming methods in this SDK return a Publisher type from the Reactive Streams library.

// Initialize the Vertex AI service and create a `GenerativeModel` instance
// Specify a model that supports your use case
GenerativeModel gm = FirebaseVertexAI.getInstance()
        .generativeModel("gemini-2.0-flash");
GenerativeModelFutures model = GenerativeModelFutures.from(gm);

Bitmap bitmap = BitmapFactory.decodeResource(getResources(), R.drawable.sparky);

// Provide a prompt that includes the image specified above and text
Content prompt = new Content.Builder()
        .addImage(bitmap)
        .addText("What developer tool is this mascot from?")
        .build();

// To stream generated text output, call generateContentStream with the prompt
Publisher<GenerateContentResponse> streamingResponse = model.generateContentStream(prompt);

final String[] fullResponse = {""};

streamingResponse.subscribe(new Subscriber<GenerateContentResponse>() {
    @Override
    public void onNext(GenerateContentResponse generateContentResponse) {
        String chunk = generateContentResponse.getText();
        fullResponse[0] += chunk;
    }

    @Override
    public void onComplete() {
        System.out.println(fullResponse[0]);
    }

    @Override
    public void onError(Throwable t) {
        t.printStackTrace();
    }

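    @Override
    public void onSubscribe(Subscription s) {
        // Reactive Streams delivers no items until the subscriber signals demand
        s.request(Long.MAX_VALUE);
    }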
});

Web

import { initializeApp } from "firebase/app";
import { getVertexAI, getGenerativeModel } from "firebase/vertexai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Vertex AI service
const vertexAI = getVertexAI(firebaseApp);

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(vertexAI, { model: "gemini-2.0-flash" });

// Converts a File object to a Part object.
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

async function run() {
  // Provide a text prompt to include with the image
  const prompt = "What do you see?";

  // Prepare image for input
  const fileInputEl = document.querySelector("input[type=file]");
  const imagePart = await fileToGenerativePart(fileInputEl.files[0]);

  // To stream generated text output, call generateContentStream with the text and image
  const result = await model.generateContentStream([prompt, imagePart]);

  for await (const chunk of result.stream) {
    const chunkText = chunk.text();
    console.log(chunkText);
  }
}

run();

Dart

import 'dart:io'; // Provides File
import 'package:firebase_vertexai/firebase_vertexai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Vertex AI service and create a `GenerativeModel` instance
// Specify a model that supports your use case
final model =
      FirebaseVertexAI.instance.generativeModel(model: 'gemini-2.0-flash');

// Provide a text prompt to include with the image
final prompt = TextPart("What's in the picture?");
// Prepare images for input
final image = await File('image0.jpg').readAsBytes();
final imagePart = InlineDataPart('image/jpeg', image);

// To stream generated text output, call generateContentStream with the text and image
final response = await model.generateContentStream([
  Content.multi([prompt,imagePart])
]);
await for (final chunk in response) {
  print(chunk.text);
}

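Example: Stream generated text from text and multiple images

Swift

You can call generateContentStream() to stream generated text from a multimodal prompt request that includes text and multiple images.

import FirebaseVertexAI
import UIKit // UIImage is defined in UIKit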

// Initialize the Vertex AI service
let vertex = VertexAI.vertexAI()

// Create a `GenerativeModel` instance with a model that supports your use case
let model = vertex.generativeModel(modelName: "gemini-2.0-flash")

guard let image1 = UIImage(systemName: "car") else { fatalError() }
guard let image2 = UIImage(systemName: "car.2") else { fatalError() }

// Provide a text prompt to include with the images
let prompt = "What's different between these pictures?"

// To stream generated text output, call generateContentStream and pass in the prompt
let contentStream = try model.generateContentStream(image1, image2, prompt)
for try await chunk in contentStream {
  if let text = chunk.text {
    print(text)
  }
}

Kotlin

Note: For Kotlin, the methods in this SDK are suspend functions and need to be called from a Coroutine scope.

// Initialize the Vertex AI service and create a `GenerativeModel` instance
// Specify a model that supports your use case
val generativeModel = Firebase.vertexAI.generativeModel("gemini-2.0-flash")

// Loads an image from the app/res/drawable/ directory
val bitmap1: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky)
val bitmap2: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky_eats_pizza)

// Provide a prompt that includes the images specified above and text
val prompt = content {
    image(bitmap1)
    image(bitmap2)
    text("What's different between these pictures?")
}

// To stream generated text output, call generateContentStream with the prompt
var fullResponse = ""
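generativeModel.generateContentStream(prompt).collect { chunk ->
  // chunk.text is nullable; fall back to an empty string so "null" is never appended
  val text = chunk.text ?: ""
  print(text)
  fullResponse += text
}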

Java

Note: For Java, the streaming methods in this SDK return a Publisher type from the Reactive Streams library.

// Initialize the Vertex AI service and create a `GenerativeModel` instance
// Specify a model that supports your use case
GenerativeModel gm = FirebaseVertexAI.getInstance()
        .generativeModel("gemini-2.0-flash");
GenerativeModelFutures model = GenerativeModelFutures.from(gm);

Bitmap bitmap1 = BitmapFactory.decodeResource(getResources(), R.drawable.sparky);
Bitmap bitmap2 = BitmapFactory.decodeResource(getResources(), R.drawable.sparky_eats_pizza);

// Provide a prompt that includes the images specified above and text
Content prompt = new Content.Builder()
    .addImage(bitmap1)
    .addImage(bitmap2)
    .addText("What's different between these pictures?")
    .build();

// To stream generated text output, call generateContentStream with the prompt
Publisher<GenerateContentResponse> streamingResponse = model.generateContentStream(prompt);

final String[] fullResponse = {""};

streamingResponse.subscribe(new Subscriber<GenerateContentResponse>() {
    @Override
    public void onNext(GenerateContentResponse generateContentResponse) {
        String chunk = generateContentResponse.getText();
        fullResponse[0] += chunk;
    }

    @Override
    public void onComplete() {
        System.out.println(fullResponse[0]);
    }

    @Override
    public void onError(Throwable t) {
        t.printStackTrace();
    }

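    @Override
    public void onSubscribe(Subscription s) {
        // Reactive Streams delivers no items until the subscriber signals demand
        s.request(Long.MAX_VALUE);
    }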
});

Web

import { initializeApp } from "firebase/app";
import { getVertexAI, getGenerativeModel } from "firebase/vertexai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Vertex AI service
const vertexAI = getVertexAI(firebaseApp);

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(vertexAI, { model: "gemini-2.0-flash" });

// Converts a File object to a Part object.
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

async function run() {
  // Provide a text prompt to include with the images
  const prompt = "What's different between these pictures?";

  const fileInputEl = document.querySelector("input[type=file]");
  const imageParts = await Promise.all(
    [...fileInputEl.files].map(fileToGenerativePart)
  );

  // To stream generated text output, call generateContentStream with the text and images
  const result = await model.generateContentStream([prompt, ...imageParts]);

  for await (const chunk of result.stream) {
    const chunkText = chunk.text();
    console.log(chunkText);
  }
}

run();

Dart

This example shows how to use generateContentStream to stream generated text from a multimodal prompt request that includes text and multiple images.

import 'dart:io'; // Provides File
import 'package:firebase_vertexai/firebase_vertexai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Vertex AI service and create a `GenerativeModel` instance
// Specify a model that supports your use case
final model =
      FirebaseVertexAI.instance.generativeModel(model: 'gemini-2.0-flash');

final (firstImage, secondImage) = await (
  File('image0.jpg').readAsBytes(),
  File('image1.jpg').readAsBytes()
).wait;
// Provide a text prompt to include with the images
final prompt = TextPart("What's different between these pictures?");
// Prepare images for input
final imageParts = [
  InlineDataPart('image/jpeg', firstImage),
  InlineDataPart('image/jpeg', secondImage),
];

// To stream generated text output, call generateContentStream with the text and images
final response = await model.generateContentStream([
  Content.multi([prompt, ...imageParts])
]);
await for (final chunk in response) {
  print(chunk.text);
}

Show example: Stream generated text from text and a video

Swift

You can call generateContentStream() to stream generated text from a multimodal prompt request that includes text and a single video.

import FirebaseVertexAI

// Initialize the Vertex AI service
let vertex = VertexAI.vertexAI()

// Create a `GenerativeModel` instance with a model that supports your use case
let model = vertex.generativeModel(modelName: "gemini-2.0-flash")

// Provide the video as `Data` with the appropriate MIME type
let video = InlineDataPart(data: try Data(contentsOf: videoURL), mimeType: "video/mp4")

// Provide a text prompt to include with the video
let prompt = "What is in the video?"

// To stream generated text output, call generateContentStream with the text and video
let contentStream = try model.generateContentStream(video, prompt)
for try await chunk in contentStream {
  if let text = chunk.text {
    print(text)
  }
}

Kotlin

For Kotlin, the methods in this SDK are suspend functions and need to be called from a Coroutine scope.

// Initialize the Vertex AI service and create a `GenerativeModel` instance
// Specify a model that supports your use case
val generativeModel = Firebase.vertexAI.generativeModel("gemini-2.0-flash")

val contentResolver = applicationContext.contentResolver
contentResolver.openInputStream(videoUri).use { stream ->
  stream?.let {
    val bytes = stream.readBytes()

    // Provide a prompt that includes the video specified above and text
    val prompt = content {
        inlineData(bytes, "video/mp4")
        text("What is in the video?")
    }

    // To stream generated text output, call generateContentStream with the prompt
    var fullResponse = ""
    generativeModel.generateContentStream(prompt).collect { chunk ->
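        // chunk.text is nullable; fall back to an empty string so "null" is never appended
        val chunkText = chunk.text ?: ""
        Log.d(TAG, chunkText)
        fullResponse += chunkText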
    }
  }
}

Java

For Java, the streaming methods in this SDK return a Publisher type from the Reactive Streams library.

// Initialize the Vertex AI service and create a `GenerativeModel` instance
// Specify a model that supports your use case
GenerativeModel gm = FirebaseVertexAI.getInstance()
        .generativeModel("gemini-2.0-flash");
GenerativeModelFutures model = GenerativeModelFutures.from(gm);

ContentResolver resolver = getApplicationContext().getContentResolver();
try (InputStream stream = resolver.openInputStream(videoUri)) {
    if (stream != null) {
        // Read the entire stream into memory. A single read() call is not
        // guaranteed to fill the buffer, so loop until the end of the stream.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] block = new byte[8192];
        int bytesRead;
        while ((bytesRead = stream.read(block)) != -1) {
            buffer.write(block, 0, bytesRead);
        }
        byte[] videoBytes = buffer.toByteArray();

        // Provide a prompt that includes the video specified above and text
        Content prompt = new Content.Builder()
                .addInlineData(videoBytes, "video/mp4")
                .addText("What is in the video?")
                .build();

        // To stream generated text output, call generateContentStream with the prompt
        Publisher<GenerateContentResponse> streamingResponse =
                model.generateContentStream(prompt);

        final StringBuilder fullResponse = new StringBuilder();

        streamingResponse.subscribe(new Subscriber<GenerateContentResponse>() {
            @Override
            public void onNext(GenerateContentResponse generateContentResponse) {
                // getText() may return null for chunks that carry no text
                String chunk = generateContentResponse.getText();
                if (chunk != null) {
                    fullResponse.append(chunk);
                }
            }

            @Override
            public void onComplete() {
                System.out.println(fullResponse);
            }

            @Override
            public void onError(Throwable t) {
                t.printStackTrace();
            }

            @Override
            public void onSubscribe(Subscription s) {
                // A Reactive Streams publisher emits nothing until demand is signaled
                s.request(Long.MAX_VALUE);
            }
        });
    }
} catch (IOException e) {
    e.printStackTrace();
}

Web

import { initializeApp } from "firebase/app";
import { getVertexAI, getGenerativeModel } from "firebase/vertexai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Vertex AI service
const vertexAI = getVertexAI(firebaseApp);

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(vertexAI, { model: "gemini-2.0-flash" });

// Converts a File object to a Part object.
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

async function run() {
  // Provide a text prompt to include with the video
  const prompt = "What do you see?";

  const fileInputEl = document.querySelector("input[type=file]");
  const videoPart = await fileToGenerativePart(fileInputEl.files[0]);

  // To stream generated text output, call generateContentStream with the text and video
  const result = await model.generateContentStream([prompt, videoPart]);

  for await (const chunk of result.stream) {
    const chunkText = chunk.text();
    console.log(chunkText);
  }
}

run();

Dart

import 'dart:io';

import 'package:firebase_vertexai/firebase_vertexai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Vertex AI service and create a `GenerativeModel` instance
// Specify a model that supports your use case
final model =
      FirebaseVertexAI.instance.generativeModel(model: 'gemini-2.0-flash');

// Provide a text prompt to include with the video
final prompt = TextPart("What's in the video?");

// Prepare video for input
final video = await File('video0.mp4').readAsBytes();

// Provide the video bytes with the appropriate MIME type
final videoPart = InlineDataPart('video/mp4', video);

// To stream generated text output, call generateContentStream with the text and video
final response = await model.generateContentStream([
  Content.multi([prompt, videoPart])
]);
await for (final chunk in response) {
  print(chunk.text);
}
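The Web examples above log each chunk as it arrives. When an app also needs the complete response (for example, to store it once streaming finishes), the same loop can accumulate the chunks. The following is a minimal sketch of that idea, not part of the SDK: `collectStreamText`, `onChunk`, and `fakeStream` are hypothetical names introduced here, and the stand-in stream only mimics the `chunk.text()` shape that the examples read from `result.stream`.

```javascript
// Collects streamed chunks into one string while reporting each chunk as it arrives.
// `stream` is any async iterable whose items expose a text() method.
// `onChunk` is a hypothetical callback name introduced for this sketch.
async function collectStreamText(stream, onChunk = () => {}) {
  let fullText = "";
  for await (const chunk of stream) {
    const chunkText = chunk.text();
    onChunk(chunkText);
    fullText += chunkText;
  }
  return fullText;
}

// Stand-in stream for trying the helper without a network call.
async function* fakeStream(pieces) {
  for (const piece of pieces) {
    yield { text: () => piece };
  }
}

collectStreamText(fakeStream(["Two cats ", "and a dog."]), (t) => console.log(t))
  .then((full) => console.log("Full response:", full));
```

In an app, the first argument would be `result.stream` from `model.generateContentStream(...)`, as in the Web examples.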

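Requirements and recommendations for input files

See Supported input files and requirements for the Gemini API in Vertex AI to learn about the following:

Different options for providing a file in a request
Supported file types
Supported MIME types and how to specify them
Requirements and best practices for files and multimodal requests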
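Because every file in a request needs a MIME type, it can help to derive it in one place before building the inline part. The sketch below runs in Node.js and is only an illustration: `MIME_TYPES`, `mimeTypeForFileName`, and `toInlineDataPart` are hypothetical helpers, and the table covers only the MIME types of the sample media files on this page, not the full supported list.

```javascript
// Illustrative extension-to-MIME-type table (an assumption for this sketch;
// consult the supported MIME types list for the complete set).
const MIME_TYPES = new Map([
  ["jpg", "image/jpeg"],
  ["jpeg", "image/jpeg"],
  ["pdf", "application/pdf"],
  ["mp4", "video/mp4"],
  ["mp3", "audio/mp3"],
]);

// Looks up the MIME type from the file extension, failing loudly when unknown.
function mimeTypeForFileName(fileName) {
  const extension = fileName.split(".").pop().toLowerCase();
  const mimeType = MIME_TYPES.get(extension);
  if (!mimeType) {
    throw new Error(`No MIME type known for "${fileName}"`);
  }
  return mimeType;
}

// Builds the same { inlineData: { data, mimeType } } shape used in the Web examples,
// base64-encoding the raw bytes with Node's Buffer.
function toInlineDataPart(bytes, fileName) {
  return {
    inlineData: {
      data: Buffer.from(bytes).toString("base64"),
      mimeType: mimeTypeForFileName(fileName),
    },
  };
}

const sconesPart = toInlineDataPart(new Uint8Array([0xff, 0xd8, 0xff]), "scones.jpg");
console.log(sconesPart.inlineData.mimeType);
```

In the browser, `file.type` from a `File` object (as in `fileToGenerativePart`) usually makes the lookup unnecessary; a table like this is mainly useful when only a file name or path is available.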

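What else can you do?

Learn how to count tokens before sending long prompts to the model.
Set up Cloud Storage for Firebase so that you can include large files in your multimodal requests and have a more managed solution for providing files in prompts. Files can include images, PDFs, video, and audio.
Start thinking about preparing for production, including setting up Firebase App Check to protect the Gemini API from abuse by unauthorized clients. Also, make sure to review the production checklist.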
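One way to picture the Cloud Storage option is as a fallback for files that are too large to send inline. The Node.js sketch below is an assumption-laden illustration, not SDK behavior: `choosePart`, `base64Length`, and `maxInlineChars` are hypothetical, the budget is caller-chosen rather than a documented limit, the bucket URL is made up, and the `fileData` / `fileUri` field names reflect my understanding of the SDK's file-data part and should be checked against the reference documentation.

```javascript
// Length of the base64 text for `byteLength` raw bytes:
// every 3 bytes become 4 characters, and the last group is padded.
function base64Length(byteLength) {
  return Math.ceil(byteLength / 3) * 4;
}

// Returns a prompt part for a file, preferring inline data when it fits the budget.
// `maxInlineChars` is a caller-chosen budget (hypothetical, not a documented limit).
// `storageUrl` is the file's Cloud Storage URL, if it has been uploaded.
function choosePart(file, maxInlineChars) {
  const { bytes, mimeType, storageUrl } = file;
  if (base64Length(bytes.length) <= maxInlineChars) {
    return {
      inlineData: { data: Buffer.from(bytes).toString("base64"), mimeType },
    };
  }
  if (!storageUrl) {
    throw new Error("File exceeds the inline budget and has no Cloud Storage URL");
  }
  return { fileData: { mimeType, fileUri: storageUrl } };
}

// Hypothetical file and bucket, for illustration only.
const part = choosePart(
  {
    bytes: new Uint8Array(5000),
    mimeType: "video/mp4",
    storageUrl: "gs://example-bucket/videos/clip.mp4",
  },
  4096
);
console.log(Object.keys(part)[0]);
```

Measuring the base64 length rather than the raw byte count matters because inline data grows by about one third when encoded.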

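Try out other capabilities

Build multi-turn conversations (chat).
Generate text from text-only prompts.
Generate structured output (like JSON) from both text and multimodal prompts.
Generate images from text prompts.
Use function calling to connect generative models to external systems and information.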

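Learn how to control content generation

Understand prompt design, including best practices, strategies, and example prompts.
Configure model parameters like temperature and maximum output tokens (for Gemini) or aspect ratio and person generation (for Imagen).
Use safety settings to adjust the likelihood of getting responses that may be considered harmful.

You can also experiment with prompts and model configurations using Vertex AI Studio.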
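As a rough illustration of configuring model parameters for the Web SDK, the sketch below assembles the options object that is passed as the second argument to `getGenerativeModel`. `buildModelParams` is a hypothetical helper, the values are arbitrary, and the `generationConfig`, `temperature`, and `maxOutputTokens` field names should be confirmed against the SDK reference before relying on them.

```javascript
// Builds the options object for getGenerativeModel(vertexAI, options).
// Only parameters that were actually provided are included in generationConfig.
function buildModelParams({ model, temperature, maxOutputTokens }) {
  const generationConfig = {};
  if (temperature !== undefined) {
    generationConfig.temperature = temperature;
  }
  if (maxOutputTokens !== undefined) {
    if (!Number.isInteger(maxOutputTokens) || maxOutputTokens <= 0) {
      throw new RangeError("maxOutputTokens must be a positive integer");
    }
    generationConfig.maxOutputTokens = maxOutputTokens;
  }
  return { model, generationConfig };
}

// Arbitrary example values, chosen for illustration only.
const params = buildModelParams({
  model: "gemini-2.0-flash",
  temperature: 0.2,
  maxOutputTokens: 256,
});
console.log(JSON.stringify(params));

// In an app, the result would be used like this:
// const model = getGenerativeModel(vertexAI, params);
```

Keeping the configuration in one function makes it easier to try different values in Vertex AI Studio first and then copy the chosen ones into the app.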

Learn more about the supported models

Learn about the models available for various use cases and their quotas and pricing.

Give feedback about your experience with Vertex AI in Firebase
