Advanced Video Search: Leveraging Twelve Labs and Milvus for Semantic Search

James Le, Manish Maheshwari

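Developers can build semantic video search applications by integrating Twelve Labs' Embed API with Milvus. Marengo generates multimodal video embeddings, which are stored in a Milvus collection; relevant video segments are then found and retrieved through vector similarity search that supports hybrid text and video queries.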

2024/08/02

11 min read

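TLDR: Learn how to build a semantic video search application by integrating Twelve Labs' Embed API, which generates multimodal embeddings, with Milvus, an open-source vector database that stores and searches them efficiently. The guide runs from setting up the development environment to advanced features such as hybrid search and temporal video analysis, giving you a comprehensive foundation for building sophisticated video analysis and retrieval systems. Many thanks to the Zilliz team (Jiang Chen and Chen Zhang) for their help in preparing this integration guide.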

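Introduction

Welcome to this comprehensive tutorial on implementing semantic video search with the Twelve Labs Embed API and Milvus, the open-source vector database developed by Zilliz. The guide explores how to combine Twelve Labs' multimodal embeddings with Milvus' efficient vector database to build a robust video search solution. Integrating the two opens up new possibilities in video content analysis, including content-based video retrieval, recommendation systems, and search engines that understand the nuances of video data.

The tutorial walks through the entire process, from setting up the development environment to implementing a working semantic video search application. It covers the key concepts: generating multimodal embeddings from videos, storing them efficiently in Milvus, and running similarity searches to retrieve relevant content. Whether you are building a video analytics platform or a content discovery tool, or adding video search to an existing application, this guide gives you the knowledge and practical steps to combine the strengths of Twelve Labs and Milvus in your own projects.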

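Prerequisites

Before you begin, make sure you have the following:

A Milvus server installed and running (see the Milvus installation guide for detailed instructions)
An API key for Twelve Labs (sign up at https://playground.twelvelabs.io if you don't have one)
Python 3.7 or later installed on your system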

Setting Up the Development Environment

Create a new directory for your project and navigate into it:

mkdir video-search-tutorial
cd video-search-tutorial

Set up a virtual environment (optional, but recommended):

python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`

Install the required Python libraries:

pip install twelvelabs pymilvus

Create a new Python file for your project:

touch video_search.py

This video_search.py file will be the main script for the tutorial. Next, set your Twelve Labs API key as an environment variable so that it stays out of your source code:

export TWELVE_LABS_API_KEY='your_api_key_here'
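
Because the script later reads this variable with os.getenv, which returns None when the variable is missing, it can help to fail early with a clear message. The following is a minimal sketch of such a check; the helper name require_api_key and the treatment of the placeholder value as "unset" are assumptions made for illustration, not part of the Twelve Labs SDK.

```python
import os


def require_api_key(env=None, name="TWELVE_LABS_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper: `env` defaults to os.environ and can be replaced
    with a plain dict for testing.
    """
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    # Treat the tutorial's placeholder value as "not set".
    if not key or key == "your_api_key_here":
        raise RuntimeError(
            f"{name} is not set; export it before running video_search.py"
        )
    return key
```

Calling require_api_key() once at the top of video_search.py surfaces a missing key immediately instead of as an authentication error later on.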

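Connecting to Milvus

To connect to Milvus, use the MilvusClient class. Passing it a local file path simplifies the connection process and gives you a file-based Milvus instance (Milvus Lite), which is well suited to this tutorial.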

from pymilvus import MilvusClient

# Initialize the Milvus client
milvus_client = MilvusClient("milvus_twelvelabs_demo.db")

print("Successfully connected to Milvus")

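This code creates a new Milvus client instance that stores all data in a file named milvus_twelvelabs_demo.db. The file-based approach is well suited to development and testing.

Creating a Milvus Collection for Video Embeddings

Now that you are connected to Milvus, create a collection to store the video embeddings and their associated metadata. The code drops any existing collection with the same name and then creates a fresh one.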

# Initialize the collection name
collection_name = "twelvelabs_demo_collection"

# Check if the collection already exists and drop it if it does
if milvus_client.has_collection(collection_name=collection_name):
    milvus_client.drop_collection(collection_name=collection_name)

# Create the collection
milvus_client.create_collection(
    collection_name=collection_name,
    dimension=1024  # The dimension of the Twelve Labs embeddings
)

print(f"Collection '{collection_name}' created successfully")

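The code first checks whether the collection already exists and drops it if it does, so that you start from a clean slate. It then creates the collection with a dimension of 1024, which matches the output dimension of the Twelve Labs embeddings.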
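
Since the collection dimension and the embedding length have to agree, a small guard before insertion can catch a mismatch early. This is a sketch under the assumption that vectors are plain Python lists of floats; check_dimensions and EXPECTED_DIM are illustrative names, not part of pymilvus.

```python
EXPECTED_DIM = 1024  # same value as `dimension` in create_collection


def check_dimensions(vectors, expected_dim=EXPECTED_DIM):
    """Raise ValueError if any vector's length differs from expected_dim.

    Returns the number of vectors checked.
    """
    for index, vector in enumerate(vectors):
        if len(vector) != expected_dim:
            raise ValueError(
                f"vector {index} has {len(vector)} dimensions, "
                f"expected {expected_dim}"
            )
    return len(vectors)
```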

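Generating Embeddings with the Twelve Labs Embed API

To generate video embeddings with the Twelve Labs Embed API, use the Twelve Labs Python SDK. The process involves creating an embedding task, waiting for it to complete, and retrieving the results. Here is how to implement it:

First, make sure the Twelve Labs SDK is installed, then import the required modules: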

from twelvelabs import TwelveLabs
from twelvelabs.models.embed import EmbeddingsTask
import os

# Retrieve the API key from environment variables
TWELVE_LABS_API_KEY = os.getenv('TWELVE_LABS_API_KEY')

Initialize the Twelve Labs client:

twelvelabs_client = TwelveLabs(api_key=TWELVE_LABS_API_KEY)

Create a function that generates embeddings for a given video URL:

def generate_embedding(video_url):
    """
    Generate embeddings for a given video URL using the Twelve Labs API.

    This function creates an embedding task for the specified video URL using
    the Marengo-retrieval-2.6 engine. It monitors the task progress and waits
    for completion. Once done, it retrieves the task result and extracts the
    embeddings along with their associated metadata.

    Args:
        video_url (str): The URL of the video to generate embeddings for.

    Returns:
        tuple: A tuple containing two elements:
            1. list: A list of dictionaries, where each dictionary contains:
                - 'embedding': The embedding vector as a list of floats.
                - 'start_offset_sec': The start time of the segment in seconds.
                - 'end_offset_sec': The end time of the segment in seconds.
                - 'embedding_scope': The scope of the embedding (e.g., 'clip', 'video').
            2. EmbeddingsTaskResult: The complete task result object from Twelve Labs API.

    Raises:
        Any exceptions raised by the Twelve Labs API during task creation,
        execution, or retrieval.
    """

    # Create an embedding task
    task = twelvelabs_client.embed.task.create(
        engine_name="Marengo-retrieval-2.6",
        video_url=video_url
    )
    print(f"Created task: id={task.id} engine_name={task.engine_name} status={task.status}")

    # Define a callback function to monitor task progress
    def on_task_update(task: EmbeddingsTask):
        print(f"  Status={task.status}")

    # Wait for the task to complete
    status = task.wait_for_done(
        sleep_interval=2,
        callback=on_task_update
    )
    print(f"Embedding done: {status}")

    # Retrieve the task result
    task_result = twelvelabs_client.embed.task.retrieve(task.id)

    # Extract and return the embeddings
    embeddings = []
    for v in task_result.video_embeddings:
        embeddings.append({
            'embedding': v.embedding.float,
            'start_offset_sec': v.start_offset_sec,
            'end_offset_sec': v.end_offset_sec,
            'embedding_scope': v.embedding_scope
        })
    
    return embeddings, task_result

Use this function to generate embeddings for a video:

# Example usage
video_url = "https://example.com/your-video.mp4"

# Generate embeddings for the video
embeddings, task_result = generate_embedding(video_url)

print(f"Generated {len(embeddings)} embeddings for the video")
for i, emb in enumerate(embeddings):
    print(f"Embedding {i+1}:")
    print(f"  Scope: {emb['embedding_scope']}")
    print(f"  Time range: {emb['start_offset_sec']} - {emb['end_offset_sec']} seconds")
    print(f"  Embedding vector (first 5 values): {emb['embedding'][:5]}")
    print()

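With this implementation you can generate embeddings for any video URL using the Twelve Labs Embed API. The generate_embedding function handles the whole process, from creating the task to retrieving the results. It returns a list of dictionaries, each containing an embedding vector and its metadata (time range and scope), together with the full task result. In production, remember to handle potential errors such as network issues and API limits; depending on your use case, retries and more robust error handling are also advisable.

Inserting Embeddings into Milvus

After generating embeddings with the Twelve Labs Embed API, the next step is to insert them, along with their metadata, into the Milvus collection. This stores and indexes the video embeddings so that similarity searches can run efficiently later.

Here is how to insert the embeddings into Milvus: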

def insert_embeddings(milvus_client, collection_name, task_result, video_url):
    """
    Insert embeddings into the Milvus collection.

    Args:
        milvus_client: The Milvus client instance.
        collection_name (str): The name of the Milvus collection to insert into.
        task_result (EmbeddingsTaskResult): The task result containing video embeddings.
        video_url (str): The URL of the video associated with the embeddings.

    Returns:
        MutationResult: The result of the insert operation.

    This function takes the video embeddings from the task result and inserts them
    into the specified Milvus collection. Each embedding is stored with additional
    metadata including its scope, start and end times, and the associated video URL.
    """
    data = []

    for i, v in enumerate(task_result.video_embeddings):
        data.append({
            "id": i,
            "vector": v.embedding.float,
            "embedding_scope": v.embedding_scope,
            "start_offset_sec": v.start_offset_sec,
            "end_offset_sec": v.end_offset_sec,
            "video_url": video_url
        })

    insert_result = milvus_client.insert(collection_name=collection_name, data=data)
    print(f"Inserted {len(data)} embeddings into Milvus")
    return insert_result

# Usage example
video_url = "https://example.com/your-video.mp4"

# generate_embedding() is defined in the previous step
embeddings, task_result = generate_embedding(video_url)

# Insert embeddings into the Milvus collection
insert_result = insert_embeddings(milvus_client, collection_name, task_result, video_url)
print(insert_result)

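This function prepares the rows to insert, each carrying the embedding vector together with its metadata: scope, time range, and source video URL. It then uses the Milvus client to insert the rows into the specified collection.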
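
One detail to watch: insert_embeddings numbers rows from 0 for every video, so rows from a second video would reuse the same id values. A possible way around this, sketched here as an assumption rather than as the tutorial's method, is to build rows with a running id offset. The helper build_rows is hypothetical and works on the list of segment dictionaries returned by generate_embedding.

```python
def build_rows(segments, video_url, start_id=0):
    """Turn segment dicts (as returned by generate_embedding) into rows.

    Returns (rows, next_id); pass next_id as start_id for the next video
    so that ids do not collide across videos.
    """
    rows = []
    for offset, seg in enumerate(segments):
        rows.append({
            "id": start_id + offset,
            "vector": seg["embedding"],
            "embedding_scope": seg["embedding_scope"],
            "start_offset_sec": seg["start_offset_sec"],
            "end_offset_sec": seg["end_offset_sec"],
            "video_url": video_url,
        })
    return rows, start_id + len(rows)
```

The counter here lives only in memory; a longer-running application would need to persist the next id or rely on automatically generated ids instead.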

Performing Similarity Search

Once the embeddings are stored in Milvus, you can run a similarity search to find the video segments most relevant to a query vector. Here is how to implement it:

def perform_similarity_search(milvus_client, collection_name, query_vector, limit=5):
    """
    Perform a similarity search on the Milvus collection.

    Args:
        milvus_client: The Milvus client instance.
        collection_name (str): The name of the Milvus collection to search in.
        query_vector (list): The query vector to search for similar embeddings.
        limit (int, optional): The maximum number of results to return. Defaults to 5.

    Returns:
        list: A list of search results, where each result is a dictionary containing
              the matched entity's metadata and similarity score.

    This function searches the specified Milvus collection for embeddings similar to
    the given query vector. It returns the top matching results, including metadata
    such as the embedding scope, time range, and associated video URL for each match.
    """
    search_results = milvus_client.search(
        collection_name=collection_name,
        data=[query_vector],
        limit=limit,
        output_fields=["embedding_scope", "start_offset_sec", "end_offset_sec", "video_url"]
    )

    return search_results
    
# Define the query vector.
# This example reuses an embedding inserted earlier; in practice, replace it
# with any video embedding you want to query.
query_vector = task_result.video_embeddings[0].embedding.float

# Perform a similarity search on the Milvus collection
search_results = perform_similarity_search(milvus_client, collection_name, query_vector)

print("Search Results:")
for i, result in enumerate(search_results[0]):
    print(f"Result {i+1}:")
    print(f"  Video URL: {result['entity']['video_url']}")
    print(f"  Time Range: {result['entity']['start_offset_sec']} - {result['entity']['end_offset_sec']} seconds")
    print(f"  Similarity Score: {result['distance']}")
    print()

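This implementation does the following:

Defines a perform_similarity_search function that takes a query vector and searches the Milvus collection for similar embeddings.
Uses the Milvus client's search method to find the most similar vectors.
Specifies the output fields to retrieve, which include metadata about the matching video segments.
Shows how to call the function with a query vector; for simplicity, the example reuses the first embedding inserted earlier rather than generating one from a separate query video.
Prints the search results, including the relevant metadata and similarity scores.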
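
What the printed similarity score means depends on the metric the collection was created with. Assuming the collection uses cosine similarity (believed to be the default for the quick-setup create_collection call, but worth verifying for your pymilvus version), the score for a pair of vectors can be reproduced in plain Python. The function below is an illustrative sketch and is not how Milvus computes scores internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Under this metric, querying with a vector that is already stored, as the example does, should return that same segment first with a score close to 1.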

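With these functions in place, you have a complete workflow for storing video embeddings in Milvus and running similarity searches against them. This setup lets you efficiently retrieve similar video content based on the multimodal embeddings generated by Twelve Labs' Embed API.

Optimizing Performance

You are now ready to take the application to the next level. With large video collections, performance is key. One optimization is to batch both embedding generation and insertion into Milvus, which lets you process multiple videos at once and significantly reduces overall processing time. You can also use Milvus partitioning to organize data more efficiently, for example by video category or time period; queries that target only the relevant partitions run faster.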
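
As a rough illustration of the batching idea, rows collected from several videos can be sent to Milvus in fixed-size groups rather than one call per video. This sketch assumes only the insert(collection_name=..., data=...) call already used in this tutorial; the helper names and the default batch size of 500 are arbitrary choices for illustration, not tuned recommendations.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def insert_in_batches(milvus_client, collection_name, rows, batch_size=500):
    """Insert rows with one insert() call per batch; return the row count."""
    inserted = 0
    for batch in chunked(rows, batch_size):
        milvus_client.insert(collection_name=collection_name, data=batch)
        inserted += len(batch)
    return inserted
```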

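Another optimization technique is to cache frequently accessed embeddings or search results, which can dramatically improve response times for popular queries. Also remember to tune Milvus index parameters for your specific dataset and query patterns; small adjustments here can noticeably improve search performance.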
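
A simple in-process cache for search results could look like the sketch below. It assumes the search is exposed as a function of (query_vector, limit); make_cached_search is a hypothetical wrapper built on the standard library's functools.lru_cache, and the cache size of 128 is an arbitrary default.

```python
from functools import lru_cache


def make_cached_search(search_fn, maxsize=128):
    """Wrap search_fn(query_vector, limit) with an in-memory LRU cache."""

    @lru_cache(maxsize=maxsize)
    def _cached(vector_key, limit):
        return search_fn(list(vector_key), limit)

    def cached_search(query_vector, limit=5):
        # Lists are not hashable, so the vector is converted to a tuple key.
        return _cached(tuple(query_vector), limit)

    # Expose cache_clear so callers can invalidate after new inserts.
    cached_search.cache_clear = _cached.cache_clear
    return cached_search
```

For example, it could wrap the tutorial's function as make_cached_search(lambda v, k: perform_similarity_search(milvus_client, collection_name, v, limit=k)). Cached results go stale once new embeddings are inserted, so cache_clear() should be called after writes.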

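Advanced Features

Next, let's add a few features that make the app stand out. You can implement hybrid search that combines text and video queries; the Twelve Labs Embed API can also generate text embeddings for text queries. Imagine a user supplying both a text description and a sample video clip: the system generates embeddings for both and runs a weighted search in Milvus, which can produce very precise results.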
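
The article does not spell out how the weighted search is carried out. One simple option, shown here as an assumption and not as the authors' method, is late score fusion: run one search with the text embedding and one with the video embedding, then combine the scores per segment. The sketch assumes that each hit is a dictionary with an "id" and a "distance" where higher means more similar, and that text and video embeddings can be searched against the same collection; fuse_results is a hypothetical helper.

```python
def fuse_results(text_hits, video_hits, text_weight=0.5, limit=5):
    """Merge two hit lists into one ranking by weighted score.

    A segment missing from one list contributes 0 for that list.
    Returns a list of (id, fused_score) pairs, best first.
    """
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    video_weight = 1.0 - text_weight
    scores = {}
    for hit in text_hits:
        scores[hit["id"]] = scores.get(hit["id"], 0.0) + text_weight * hit["distance"]
    for hit in video_hits:
        scores[hit["id"]] = scores.get(hit["id"], 0.0) + video_weight * hit["distance"]
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]
```

Raising text_weight favors segments that match the description; lowering it favors segments that resemble the sample clip.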

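Another useful addition is temporal search within videos. You can split long videos into smaller segments, each with its own embedding, so that users can find specific moments inside a video rather than only whole clips. You could also build in some basic video analytics: use the embeddings to cluster similar video segments, detect trends, or identify outliers in a large video collection.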
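
Because every stored embedding already carries start_offset_sec and end_offset_sec, matching segments from one video can be collapsed into continuous "moments" for display. The helper below is a hypothetical sketch that works on (start, end) pairs taken from the search hits of a single video; the max_gap_sec parameter is an illustrative knob, not something defined by either library.

```python
def merge_segments(segments, max_gap_sec=0.0):
    """Merge (start_sec, end_sec) ranges that overlap or lie within max_gap_sec."""
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= max_gap_sec:
            # Extend the previous range instead of starting a new one.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(pair) for pair in merged]
```

With hits at 0-6 s, 6-12 s, and 30-36 s, this yields two moments, 0-12 s and 30-36 s.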

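Error Handling and Logging

Things can go wrong in real-world operation, and you need to be prepared. Robust error handling is crucial: wrap API calls and database operations in try-except blocks and show users informative error messages when something fails. For network-related issues, retries with exponential backoff handle temporary glitches gracefully.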
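
A retry helper with exponential backoff might be sketched as follows. Which exception types the Twelve Labs SDK or pymilvus raise for transient failures is not covered in this article, so the default retry_on tuple uses Python's built-in ConnectionError and TimeoutError purely as placeholders; the function name and all parameters are assumptions.

```python
import random
import time


def retry_with_backoff(func, max_attempts=4, base_delay=1.0,
                       retry_on=(ConnectionError, TimeoutError),
                       sleep=time.sleep):
    """Call func(); on a retryable error wait base_delay * 2**n and retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            delay = base_delay * (2 ** attempt)
            # A little random jitter avoids many clients retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

It could wrap a call from this tutorial, for example retry_with_backoff(lambda: generate_embedding(video_url)), once retry_on lists the exceptions you actually observe.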

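Logging is your best friend for debugging and monitoring. Use Python's logging module to track important events, errors, and performance metrics throughout the application. Set different log levels: DEBUG for development, INFO for general operation, and ERROR for critical problems. Also remember to implement log rotation to keep file sizes under control. With proper logging in place you can identify and resolve issues quickly and keep the video search app running smoothly as it scales.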
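
The logging setup described above can be sketched with the standard library alone. The logger name, file name, size limit, and backup count below are illustrative assumptions; adjust them to your deployment.

```python
import logging
from logging.handlers import RotatingFileHandler


def setup_logging(log_path="video_search.log", level=logging.INFO,
                  max_bytes=1_000_000, backup_count=3):
    """Configure a named logger that writes to a size-rotated log file."""
    logger = logging.getLogger("video_search")
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = RotatingFileHandler(
            log_path, maxBytes=max_bytes, backupCount=backup_count
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

During development you might call setup_logging(level=logging.DEBUG) and then replace the tutorial's print statements with logger.info(...) or logger.error(...).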

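Conclusion

Congratulations! You have built a semantic video search application using Twelve Labs' Embed API and Milvus. This integration lets you process, store, and retrieve video content with high accuracy and efficiency. By using multimodal embeddings, you have created a system that understands the nuances of video data, which opens up possibilities in content discovery, recommendation systems, and advanced video analytics.

As you continue to develop and refine the application, keep in mind that the combination of Twelve Labs' embedding generation and Milvus' scalable vector storage provides a solid foundation for tackling more complex video understanding challenges. We encourage you to experiment with the advanced features discussed here and push the boundaries of what is possible in video search and analysis.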

We're excited to see what you build! Please share your projects and experiences with the Twelve Labs and Milvus communities. Happy coding!
