Tutorial


Building Brand Integration Assistant and Ad Break Finder App with Twelve Labs

Meeran Kim

Brand Integration Assistant and Ad Break Finder is a tool that helps you instantly understand and filter ad videos through auto-generated tags, and discover the most contextually aligned content based on a selected ad. You can even simulate ad breaks using AI-suggested insertion points and preview how your ad would appear within the content.

Jul 14, 2025

12 Min

Introduction

Viewers are often overwhelmed by irrelevant ads that don’t align with the content they’re watching. This disconnect leads to frustration and makes ads feel intrusive or poorly timed.

The Brand Integration Assistant & Ad Break Finder App solves this by delivering contextually relevant ad recommendations—ensuring the right message reaches the right audience at the right moment of engagement.

In this tutorial, you’ll learn how the app works across its core features:

  • Automatic Tag Generation: Each uploaded ad is analyzed to generate rich metadata—Topic Category, Emotions, Brands, Target Demographics (Gender & Age), and Location—enabling smart filtering, search, and content matching.

  • Search for Contextually Aligned Content: Use AI-powered similarity search to find content videos that are semantically aligned with your ad, based on both video and text embeddings.

  • Ad Break Recommendation & Simulation: Automatically segment content into chapters and simulate mid-roll ad insertions - creating a seamless, immersive ad experience.


Prerequisites

  • Sign up for the Twelve Labs Playground, generate your API key, and create two indexes: one for ads and one for content videos.

  • Set up a Pinecone account and create an index to store video embeddings.

    • Make sure to set Dimensions to 1024 and Metric to Cosine (a minimal index-creation sketch follows this list).

  • Find the application’s source code in the corresponding GitHub repository.

  • It’s helpful to have familiarity with JavaScript, TypeScript, and Next.js for a smoother setup and development experience.
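
You can create the Pinecone index either from the Pinecone console or programmatically. Below is a minimal sketch using the Pinecone Node SDK; the index name and the serverless cloud/region are placeholders, so substitute your own values.

import { Pinecone } from '@pinecone-database/pinecone';

async function createEmbeddingIndex() {
  const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });

  // Dimension and metric must match the Twelve Labs video embeddings (1024-dim, cosine)
  await pc.createIndex({
    name: 'ad-content-embeddings', // placeholder: use your own index name
    dimension: 1024,
    metric: 'cosine',
    spec: { serverless: { cloud: 'aws', region: 'us-east-1' } }, // placeholder serverless config
  });
}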


Demo

Check out the demo application to try it yourself, or watch the quick demo video below to see how it works in action: https://www.loom.com/share/233cc8cb66ae44218e3cff69afb772d7

You can watch the webinar recording below for the entire demo:


How the App Works  

Once inside the app, users will find two main menus: Ads Library and Contextual Alignment Analysis.

Ads Library - The Ads Library provides brand marketers with an organized view of their ad videos, each enriched with automatically generated tags. Users can filter ads by Topic Category, Emotions, Brands, Gender, Age, and Location, or search by short-tail or long-tail keywords using the Twelve Labs Search API. In this tutorial, we’ll focus on the auto-tag generation feature within the Ads Library.

Contextual Alignment Analysis - This section enables users to find the most contextually relevant content videos for each ad. Powered by the Twelve Labs Embed and Retrieve Video Information APIs (for tag and video embeddings) and Pinecone (for similarity-based filtering), it surfaces highly aligned content.

Users can then select a content video, generate auto-chapters for ad break insertion, and simulate ad playback at chapter transitions.

The following tutorial will walk through both the content matching and ad simulation features in detail.


Three Main Features of the App and How They Work


Main Feature 1. Automatic Tag Generation (in ‘Ads Library’)

The Ads Library allows users to browse a collection of indexed videos and view their auto-generated tags. These tags help categorize videos by topic, emotion, brand, demographics, and more — all extracted using Twelve Labs' Analyze API.


Step 1 - Generate tags for each video

When a video is loaded and determined to have incomplete or missing metadata, the system calls generateMetadata to obtain tags using Twelve Labs' Analyze API.

⭐️Check out details here for Twelve Labs’ Analyze API

🔁 Where it’s used

You’ll find this call inside processVideoMetadataSingle in page.tsx, like this:

ads-library/page.tsx (line 279-290)

     if (!video.user_metadata ||
         Object.keys(video.user_metadata).length === 0 ||
         (!video.user_metadata.topic_category &&
          !video.user_metadata.emotions &&
          !video.user_metadata.brands &&
          !video.user_metadata.locations)) {


       setVideosInProcessing(prev => [...prev, videoId]);


       const hashtagText = await generateMetadata(videoId);


       if (hashtagText) {
         const metadata = parseHashtags(hashtagText);

The generateMetadata function is a custom hook that triggers a server-side API call to request AI-generated tags from the Twelve Labs engine.
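
For reference, such a hook can be a thin wrapper around the /api/analyze route. The sketch below is illustrative rather than the repository's exact code; it assumes the route takes a videoId query parameter (which the backend handler below confirms) and that the analyze result is returned in a data field.

// Illustrative sketch of the generateMetadata hook
export const generateMetadata = async (videoId: string): Promise<string | null> => {
  try {
    // The backend route reads videoId from the query string
    const response = await fetch(`/api/analyze?videoId=${videoId}`);
    if (!response.ok) return null;

    const result = await response.json();
    // Assumption: the route returns the generated hashtag text in result.data
    return result?.data ?? null;
  } catch (error) {
    console.error('Failed to generate metadata', error);
    return null;
  }
};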

This triggers the backend handler in api/analyze/route.ts, which constructs a structured and specific prompt for the Twelve Labs analyze API. The prompt ensures that the returned data is well-categorized and consistently formatted — making it easy to convert into tags and display them in the Filter Menu. Here’s the key part of the backend route:

api/analyze/route.ts (line 1 - 85)

import { NextResponse } from 'next/server';


const API_KEY = process.env.TWELVELABS_API_KEY;
const TWELVELABS_API_BASE_URL = process.env.TWELVELABS_API_BASE_URL;


export const maxDuration = 60;


export async function GET(req: Request) {
   const { searchParams } = new URL(req.url);
   const videoId = searchParams.get("videoId");
   const prompt =
   `You are a marketing assistant specialized in generating hashtags for video content.


Based on the input video metadata, generate a list of hashtags labeled by category.


**Output Format:**
Each line must be in the format:
[Category]: [Hashtag]
(e.g., sector: #beauty)




**Allowed Values:**


Gender: Male, Female
Age: 18-25, 25-34, 35-44, 45-54, 55+
Topic: Beauty, Fashion, Tech, Travel, CPG, Food & Bev, Retail, Other
Emotions: sorrow, happiness, laughter, anger, empathy, fear, love, trust, sadness, belonging, guilt, compassion, pride


**Instructions:**


1. Use only the values provided in Allowed Values.
2. Do not invent new values except for Brands and Location. Only use values from the Allowed Values.
3. Output must contain at least one hashtag for each of the following categories:
 - Gender
 - Age
 - Topic
 - Emotions
 - Location
 - Brands


4. Do not output any explanations or category names—only return the final hashtag list.


**Output Example:**


Gender: female
Age: 25-34
Topic: beauty
Emotions: happiness
Location: Los Angeles
Brands: Fenty Beauty


---
`


  


   const url = `${TWELVELABS_API_BASE_URL}/analyze`;
   const options = {
       method: "POST",
       headers: {
           "Content-Type": "application/json",
           "x-api-key": API_KEY,
       },
       body: JSON.stringify({
           prompt: prompt,
           video_id: videoId,
           stream: false
       })
   };


   try {
     const response = await fetch(url, options);

Step 2 - PUT each video to save generated tags

After the tags are generated using the /api/analyze route, the next step is to save them back to the video object in your indexed library. This is done through a PUT API call that updates the video’s metadata in the Twelve Labs index.

⭐️ Check out details here for Twelve Labs’ Update Video Information API

This operation is handled by the updateVideoMetadata hook, which ultimately calls the backend route at api/videos/metadata/route.ts.

❗️To store your custom metadata, make sure to use the key user_metadata when updating each video.

api/videos/metadata/route.ts (line 1-68)

import { NextRequest, NextResponse } from 'next/server';



export async function PUT(request: NextRequest) {
 try {
   // Parse request body
   const body: MetadataUpdateRequest = await request.json();
   const { videoId, indexId, metadata } = body;
  


   // Prepare API request
   const url = `${TWELVELABS_API_BASE_URL}/indexes/${indexId}/videos/${videoId}`;


   const requestBody = {
     user_metadata: {
       source: metadata.source || '',
       sector: metadata.sector || '',
       emotions: metadata.emotions || '',
       brands: metadata.brands || '',
       locations: metadata.locations || '',
       demographics: metadata.demographics || ''
     }
   };


   const options = {
     method: 'PUT',
     headers: {
       'Content-Type': 'application/json',
       'x-api-key': API_KEY,
     },
     body: JSON.stringify(requestBody)
   };


   // Call Twelve Labs API
   const response = await fetch(url, options);


🔁 Where it’s used

You’ll find this call inside processVideoMetadataSingle in page.tsx, like this:

ads-library/page.tsx (line 289-292)

 if (hashtagText) {
         const metadata = parseHashtags(hashtagText);


         await updateVideoMetadata(videoId, adsIndexId, metadata);

📌 What’s in the user_metadata?

The saved user_metadata object includes key fields like:

{
  "gender": "female",
  "age": "25-34",
  "topic": "beauty",
  "emotions": "happiness",
  "location": "Los Angeles",
  "brands": "Fenty Beauty"
}

This consistent format enables filter-by-category UX, search, and visual grouping in dashboards. The custom metadata is stored on each video in Twelve Labs, so you can simply fetch a video with a GET request and display the metadata as needed.

⭐️Check out details here for Twelve Labs’ Retrieve Video Information API
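
For illustration, a minimal server-side fetch of a single video might look like the sketch below, reusing the same TWELVELABS_API_BASE_URL and API_KEY environment variables as the other routes in this app (the helper name is hypothetical).

// Illustrative sketch: retrieve one video and read back its saved user_metadata
async function fetchVideoMetadata(indexId: string, videoId: string) {
  const url = `${TWELVELABS_API_BASE_URL}/indexes/${indexId}/videos/${videoId}`;

  const response = await fetch(url, {
    method: 'GET',
    headers: { 'x-api-key': `${API_KEY}`, 'Accept': 'application/json' },
  });

  const video = await response.json();
  // user_metadata holds the tags saved in Step 2 (e.g. emotions, brands, locations)
  return video.user_metadata;
}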


Main Feature 2. Search Similar Videos (in ‘Contextual Alignment Analysis’)

The Contextual Alignment Analysis feature helps you find content videos that are most relevant to a selected ad by comparing video and text embeddings. These embeddings are:

  • Generated by Twelve Labs

  • Stored and queried via Pinecone for similarity search

To enable this, we must ensure that:

  • Embeddings exist for all content videos

  • Embeddings exist for the selected ad video

  • All embeddings are stored in the same Pinecone index

❗️When you index a video via Twelve Labs, embeddings are automatically generated and can be retrieved with a Retrieve Video Information API call.


Step 1 - Process Content Video Embeddings

Before performing similarity search, all content videos need their embeddings stored in Pinecone. This is handled by the client-side function processContentVideoEmbeddings().

💡 Internal Flow
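
At a high level, the flow for each content video is: check whether its vectors already exist in Pinecone, and if not, fetch the embedding from Twelve Labs and store it. The sketch below is a simplified illustration of that orchestration using the two helper functions described next; the repository's implementation may differ in naming, batching, and error handling.

// Illustrative sketch of processContentVideoEmbeddings()
async function processContentVideoEmbeddings(contentVideoIds: string[], contentIndexId: string) {
  for (const videoId of contentVideoIds) {
    // 1. Skip videos whose embedding vectors are already stored in Pinecone
    const exists = await checkVectorExists(videoId);
    if (exists) continue;

    // 2. Otherwise, fetch the embedding from Twelve Labs and upsert it into Pinecone
    await getAndStoreEmbeddings(videoId, contentIndexId);
  }
}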

🔧 Core Functions

checkVectorExists checks whether the video’s embedding vector is already present in Pinecone. It internally calls the backend route:

api/vectors/exists (line 15-27)

   // Fetch vectors using metadata filter instead of direct ID
   const queryResponse = await index.query({
     vector: new Array(1024).fill(0),
     filter: {
       tl_video_id: videoId
     },
     topK: 1,
     includeMetadata: true
   });


   return NextResponse.json({
     exists: queryResponse.matches.length > 0
   });

If the embedding does not exist, getAndStoreEmbeddings:

  1. Fetches the embedding from Twelve Labs (/api/videos/[videoId]?embed=true)

  2. Stores it in Pinecone via /api/vectors/store
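
On the client side, getAndStoreEmbeddings might chain those two calls roughly as shown in the sketch below; the query parameters and request-body field names are assumptions based on the route handlers that follow.

// Illustrative sketch of getAndStoreEmbeddings()
async function getAndStoreEmbeddings(videoId: string, indexId: string): Promise<void> {
  // 1. Fetch the video details, including its embedding, from the app's own API route
  const videoRes = await fetch(`/api/videos/${videoId}?embed=true&indexId=${indexId}`);
  const videoData = await videoRes.json();

  // 2. Forward the embedding to the vector-store route, which upserts it into Pinecone
  await fetch('/api/vectors/store', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      videoId,
      indexId,
      embedding: videoData.embedding, // assumed field name for the returned embedding
    }),
  });
}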

api/videos/[videoId] (line 77-95)

// Base URL
 let url = `${TWELVELABS_API_BASE_URL}/indexes/${indexId}/videos/${videoId}`;


 // Always include embedding query parameters if requested
 if (requestEmbeddings) {
   // Include only supported embedding options
   url += `?embedding_option=visual-text&embedding_option=audio`;
 }


 const options = {
   method: "GET",
   headers: {
     "x-api-key": `${API_KEY}`,
     "Accept": "application/json"
   },
 };


 try {
   const response = await fetch(url, options);

api/vectors/store (line 126-173)

// Create vectors from embedding segments
   const vectors = embedding.video_embedding.segments.map((segment: Segment, index: number) => {
     // Create a meaningful and unique vector ID
     const vectorId = `${vectorIdBase}_segment${index + 1}`;


     const vector = {
       id: vectorId,
       values: segment.float,
       metadata: {
         video_file: actualFileName,
         video_title: videoTitle,
         video_segment: index + 1,
         start_time: segment.start_offset_sec,
         end_time: segment.end_offset_sec,
         scope: segment.embedding_scope,
         tl_video_id: videoId,
         tl_index_id: indexId,
         category
       }
     };


     return vector;
   });


   try {
     const index = getPineconeIndex();


     // Upload vectors in batches
     const batchSize = 100;
     const totalBatches = Math.ceil(vectors.length / batchSize);


     console.log(`🚀 FILENAME DEBUG - Starting vector upload with ${totalBatches} batches...`);


     for (let i = 0; i < vectors.length; i += batchSize) {
       const batch = vectors.slice(i, i + batchSize);
       const batchNumber = Math.floor(i / batchSize) + 1;


       try {
         // Test Pinecone connection before upserting
         try {
           await index.describeIndexStats();
         } catch (statsError) {
           console.error(`❌ Pinecone connection test failed:`, statsError);
           throw new Error(`Failed to connect to Pinecone: ${statsError instanceof Error ? statsError.message : 'Unknown error'}`);
         }


         // Perform the actual upsert
         await index.upsert(batch);


Step 2 - Process Selected Ad Video Embedding

Once a user selects an ad, the app automatically checks whether its embedding is ready. This logic runs inside a useEffect() hook that watches the selected ad:

contextual-analysis/page.tsx (line 296-318)

// Automatically check ONLY the ad video embedding when a video is selected
 useEffect(() => {
   if (selectedVideoId && !isLoadingEmbeddings) {
     const cachedStatus = queryClient.getQueryData(['embeddingStatus', selectedVideoId]) as
       { checked: boolean, ready: boolean } | undefined;


     if (!cachedStatus?.checked) {
       setIsLoadingEmbeddings(true);


       ensureEmbeddings().then(success => {
         queryClient.setQueryData(['embeddingStatus', selectedVideoId], {
           checked: true,
           ready: success
         });


         setEmbeddingsReady(success);
         setIsLoadingEmbeddings(false);
       });
     } else {
       setEmbeddingsReady(cachedStatus.ready);
     }
   }
 }, [selectedVideoId, isLoadingEmbeddings, queryClient]);

🔧 Core Functions

ensureEmbeddings calls checkAndEnsureEmbeddings(), which verifies that the selected ad’s embedding already exists in Pinecone and, if it doesn’t, fetches the embedding from Twelve Labs and stores it.

❗️The internal workings of checkVectorExists() and getAndStoreEmbeddings() were already explained in Step 1, so we refer to them here without repeating.
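
For context, a minimal sketch of such a helper is shown below. It simply reuses the check/store pair from Step 1 for the selected ad and reports back whether the embedding is ready; names and return values follow the ones referenced in this tutorial, not necessarily the repository's exact code.

// Illustrative sketch of checkAndEnsureEmbeddings() for the selected ad
async function checkAndEnsureEmbeddings(adVideoId: string, adsIndexId: string): Promise<boolean> {
  try {
    // Reuse the same check/store pair described in Step 1
    const exists = await checkVectorExists(adVideoId);
    if (!exists) {
      await getAndStoreEmbeddings(adVideoId, adsIndexId);
    }
    return true;  // embedding is ready for similarity search
  } catch (error) {
    console.error('Failed to ensure ad embedding', error);
    return false; // surfaced to the UI via setEmbeddingsReady(false)
  }
}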


Step 3 - Similarity Search in Pinecone + Twelve Labs Search

Once all video embeddings (ad + content) are in place, clicking the "Run Contextual Analysis" button runs two types of similarity searches in parallel:

  • Text-to-Video Search: Uses the textual tags (e.g. sector and emotions) of the selected ad to find semantically relevant content videos.

  •  Video-to-Video Search: Uses the frame-level video embeddings of the selected ad to find visually/contextually similar content clips.

  • Both results are merged and scored, prioritizing matches found in both searches.

Text-to-Video Search

contextual-analysis/page.tsx (line 351-384)

const handleContextualAnalysis = async () => {
  // ...

  try {
    textResults = await textToVideoEmbeddingSearch(selectedVideoId, adsIndexId, contentIndexId);
    // ...
  } catch (error) { /* ... */ }

  try {
    videoResults = await videoToVideoEmbeddingSearch(selectedVideoId, adsIndexId, contentIndexId);
    // ...
  } catch (error) { /* ... */ }

textToVideoEmbeddingSearch extracts sector and emotions tags, and the video title from the selected ad.

  • Sends them as text prompts to the api/embeddingSearch/textToVideo route.

  • Twelve Labs generates a text embedding, which is used to query Pinecone for semantically similar content videos.

api/embeddingSearch/textToVideo (line 20-45)

   // Request a text embedding from Twelve Labs (request config elided)
   const { data: embedData } = await axios.post(url, formData, { /* ...headers... */ });

   // Extract the embedding vector from the text_embedding object
   const textEmbedding = embedData.text_embedding.segments[0].float;

   // Get index and search
   const searchResults = await index.query({
     vector: textEmbedding,
     filter: {
       // video_type: 'ad',
       tl_index_id: indexId,
       scope: 'clip'
     },
     topK: 10,
     includeMetadata: true,
   });

Video-to-Video Search

videoToVideoEmbeddingSearch finds the frame-level segments (vector values) of the selected ad.

  • For each segment, runs a similarity query against the content index in Pinecone.

  • Each result reflects a clip-level match in video embeddings.

api/embeddingSearch/videoToVideo (line 22-50)

// First, get the original video's clip embedding
   const originalClipQuery = await index.query({
     filter: {
       tl_video_id: videoId,
       scope: 'clip'
     },
     topK: 100,
     includeMetadata: true,
     includeValues: true,
     vector: new Array(1024).fill(0)
   });


    // If we found matching clips, search the content index for similar clips for each match
   const similarResults = [];
   if (originalClipQuery.matches.length > 0) {
     for (const originalClip of originalClipQuery.matches) {
       const vectorValues = originalClip.values || new Array(1024).fill(0);
       const queryResult = await index.query({
         vector: vectorValues,
         filter: {
           tl_index_id: indexId,
           scope: 'clip'
         },
         topK: 5,
         includeMetadata: true,
       });
       similarResults.push(queryResult);
     }
   }

Merging Results

Results from both searches are merged by video ID. If a video appears in both searches, its score is boosted by 2x.

contextual-analysis/page.tsx (line 412-428)

 if (combinedResultsMap.has(videoId)) {
           // This is a match found in both searches - update it
           const existingResult = combinedResultsMap.get(videoId);


            // Apply a significant boost for results found in both searches (2x boost)
            const boostMultiplier = 2;


           // Combine the scores: use the max of both scores and apply the boost
           const maxScore = Math.max(existingResult.textScore, result.score);
           const boostedScore = maxScore * boostMultiplier;


           combinedResultsMap.set(videoId, {
             ...existingResult,
             videoScore: result.score,
             finalScore: boostedScore,  // Boosted score for appearing in both searches
             source: "BOTH"
           });

Main Feature 3. Generate Chapters and Implement Ad Breaks (in ‘Contextual Alignment Analysis’)

This feature enhances the contextual video recommendation experience by breaking selected content into meaningful chapters and inserting a relevant ad at the end of a selected chapter — simulating a real-world mid-roll ad break.


Step 1 - Auto-Generate Chapters of Selected Content

When a user opens the VideoModal, the app automatically calls the generateChapters API to segment the selected content video.

VideoModal.tsx (line 42-47)

 // Fetch chapters data
 const { data: chaptersData, isLoading: isChaptersLoading } = useQuery({
   queryKey: ["chapters", videoId],
   queryFn: () => generateChapters(videoId),
   enabled: isOpen && !!videoId,
 });

Each chapter includes:

  • end: end time of the chapter (used as the ad cue point)

  • chapter_title: a generated short title

  • chapter_summary: a one-sentence description of the scene and why it’s a good ad break

These chapters are visualized on a chapter timeline bar, where each dot marks an end point. Clicking a dot simulates inserting an ad right before that chapter ends.
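
For reference, the chapter data consumed by the player can be typed roughly as follows. The field names match the list above and the code later in this section; the start field and the ChaptersData wrapper are assumptions for illustration.

// Illustrative TypeScript shape for the generateChapters() response
interface Chapter {
  start: number;           // chapter start time in seconds (assumed field)
  end: number;             // chapter end time in seconds, used as the ad cue point
  chapter_title: string;   // generated short title
  chapter_summary: string; // why this moment works as an ad break
}

interface ChaptersData {
  chapters: Chapter[];
}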

UX Behavior:

  • Dots appear as chapter end markers.

  • Clicking a dot triggers a "Show Ad" overlay at that cue point.

  • The ad is skippable and plays like a real mid-roll.

Server Logic: Chapter Generation via Twelve Labs

Chapters are generated using Twelve Labs' summarize endpoint with a custom prompt.

api/generateChapters (line 19-30)

  const url = `${TWELVELABS_API_BASE_URL}/summarize`;
  const options = {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-api-key": `${API_KEY}`,
    },
    body: JSON.stringify({
      type: "chapter",
      video_id: videoId,
      prompt: "Chapterize this video into 3 chapters. For every chapter, describe why it is a strategically appropriate point for placing an advertisement. Do not mention what type of advertisement would be suitable, as the ad content has already been determined."
    })
  };


  try {
    const response = await fetch(url, options);

⭐️Check out details here for Twelve Labs’ Summarize API

Client-side Helper: generateChapters

Used by React Query to fetch chapter data:

apiHooks.tsx (line 981-984)

export const generateChapters = async (videoId: string): Promise<ChaptersData> => {
 try {
   const response = await fetch(`/api/generateChapters?videoId=${videoId}`);


Step 2 - Insert and Play Ad at Selected Chapter Break

When a user clicks on a chapter marker:

  1. The content video seeks to 3 seconds before the chapter ends.

  2. When playback reaches the chapter’s end time, the app transitions into an ad playback sequence.

  3. After the ad finishes, the original content resumes from just after the chapter break.

Chapter Marker Click Logic

When a chapter marker is clicked, the player seeks to just before that chapter ends, setting the stage for a mid-roll ad.

VideoModal.tsx (line 102-126)

 // Chapter click handler
 const handleChapterClick = (index: number) => {
   if (playbackSequence === 'ad') {
     return;
   }


   if (!adVideoDetail?.hls?.video_url) {
     console.warn("No ad selected. Please select an ad in the contextual analysis page.");
     return;
   }


   if (!chaptersData) return;


   const chapter = chaptersData.chapters[index];
   setSelectedChapter(index);
   setHasPlayedAd(false);
   setPlaybackSequence('video');
   setShowChapterInfo(true);


   if (playerRef.current) {
     // Start 3 seconds before the chapter end time
     const startTime = Math.max(0, chapter.end - 3);
     playerRef.current.seekTo(startTime, 'seconds');
   }
 };

Progress Monitoring – Trigger Ad at Chapter End

While the content is playing, the app checks if the current play time has reached a chapter endpoint and switches to ad playback if conditions are met.

VideoModal.tsx (line 82-100)

 // Track video progress
 const handleProgress = (state: { playedSeconds: number }) => {
   if (selectedChapter === null || !chaptersData || !adVideoDetail) {
     return;
   }


   const chapter = chaptersData.chapters[selectedChapter];
   const timeDiff = state.playedSeconds - chapter.end;
   const isLastChapter = selectedChapter === chaptersData.chapters.length - 1;


   if (
     playbackSequence === 'video' &&
     !hasPlayedAd &&
     ((isLastChapter && Math.abs(timeDiff) < 0.5) || (!isLastChapter && timeDiff >= 0))
   ) {
     setPlaybackSequence('ad');
     setHasPlayedAd(true);
   }
 };

Ad Playback & Resume Content

After the ad finishes, the app automatically resumes the content from where the chapter left off.

VideoModal.tsx (line 128-136)

 // Ad ended handler
 const handleAdEnded = () => {
   if (selectedChapter === null || !chaptersData) return;
   const chapter = chaptersData.chapters[selectedChapter];
   setPlaybackSequence('video');
   setReturnToTime(chapter.end);
   setIsPlaying(true);
 };

This creates an immersive, chapter-aware viewing experience with smart ad insertions aligned to meaningful content breaks — ideal for showcasing contextually relevant ads at natural pause points.


Conclusion

In this tutorial, we walked through the full flow of Contextual Analysis — from generating and storing video embeddings, to running similarity searches, and finally simulating mid-roll ad insertions using chapter segmentation. By combining Twelve Labs’ multimodal embeddings with Pinecone’s vector filtering, you can deliver smart, content-aware ad experiences. This foundation can be further extended for real-time targeting, A/B testing, or personalized ad delivery at scale.

Introduction

Viewers are often overwhelmed by irrelevant ads that don’t align with the content they’re watching. This disconnect leads to frustration and makes ads feel intrusive or poorly timed.

The Brand Integration Assistant & Ad Break Finder App solves this by delivering contextually relevant ad recommendations—ensuring the right message reaches the right audience at the right moment of engagement.

In this tutorial, you’ll learn how the app works across its core features:

  • Automatic Tag Generation: Each uploaded ad is analyzed to generate rich metadata—Topic Category, Emotions, Brands, Target Demographics (Gender & Age), and Location—enabling smart filtering, search, and content matching.

  • Search for Contextually Aligned Content: Use AI-powered similarity search to find content videos that are semantically aligned with your ad, based on both video and text embeddings.

  • Ad Break Recommendation & Simulation: Automatically segment content into chapters and simulate mid-roll ad insertions - creating a seamless, immersive ad experience.


Prerequisites

  • Sign up for the Twelve Labs Playground and generate your API key and create two indexes each for ads and content videos. 

  • Set up a Pinecone account and create an index to store video embeddings.

    • Make sure to set Dimensions to 1024 and Metric to Cosine

  • Find the application’s source code in the corresponding GitHub repository.

  • It’s helpful to have familiarity with JavaScript, TypeScript, and Next.js for a smoother setup and development experience.


Demo

Check out the demo application to try it yourself, or watch the quick demo video below to see how it works in action: https://www.loom.com/share/233cc8cb66ae44218e3cff69afb772d7

You can watch the webinar recording below for the entire demo:


How the App Works  

Once inside the app, users will find two main menus: Ads Library and Contextual Alignment Analysis.

Ads Library - The Ads Library provides brand marketers with an organized view of their ad videos, each enriched with automatically generated tags. Users can filter ads by Topic Category, Emotions, Brands, Gender, Age, and Location, or search by short-tail or long-tail keywords using the Twelve Labs Search API. In this tutorial, we’ll focus on the auto-tag generation feature within the Ads Library.

Contextual Alignment Analysis - This section enables users to find the most contextually relevant content videos for each ad. Powered by Twelve Labs EMBED and GET video API (for tag and video embeddings) and Pinecone (for similarity-based filtering), it surfaces highly aligned content.

Users can then select a content video, generate auto-chapters for ad break insertion, and simulate ad playback at chapter transitions.

The following tutorial will walk through both the content matching and ad simulation features in detail.


Three Main Features of the App and How They Work


Main Feature 1. Automatic Tag Generation (in ‘Ads Library’)

The Ads Library allows users to browse a collection of indexed videos and view their auto-generated tags. These tags help categorize videos by topic, emotion, brand, demographics, and more — all extracted using Twelve Labs' Analyze API.


Step 1 - Generate tags of each video

When a video is loaded and determined to have incomplete or missing metadata, the system calls generateMetadata to obtain tags using Twelve Labs' Analyze API.

⭐️Check out details here for Twelve Labs’ Analyze API

🔁 Where it’s used

You’ll find this call inside processVideoMetadataSingle in page.tsx, like this:

ads-library/page.ts (line 279-290)

     if (!video.user_metadata ||
         Object.keys(video.user_metadata).length === 0 ||
          !video.user_metadata.topic_category &&
          !video.user_metadata.emotions && 
          !video.user_metadata.brands &&
          !video.user_metadata.locations)) {


       setVideosInProcessing(prev => [...prev, videoId]);


       const hashtagText = await generateMetadata(videoId);


       if (hashtagText) {
         const metadata = parseHashtags(hashtagText);

The generateMetadata function is a custom hook that triggers a server-side API call to request AI-generated tags from the Twelve Labs engine.

This triggers the backend handler in api/analyze/route.ts, which constructs a structured and specific prompt for the Twelve Labs analyze API. The prompt ensures that the returned data is well-categorized and consistently formatted — making it easy to convert into tags and display them in the Filter Menu. Here’s the key part of the backend route:

api/analyze/route.ts (line 1 - 85)

import { NextResponse } from 'next/server';


const API_KEY = process.env.TWELVELABS_API_KEY;
const TWELVELABS_API_BASE_URL = process.env.TWELVELABS_API_BASE_URL;


export const maxDuration = 60;


export async function GET(req: Request) {
   const { searchParams } = new URL(req.url);
   const videoId = searchParams.get("videoId");
   const prompt =
   `You are a marketing assistant specialized in generating hashtags for video content.


Based on the input video metadata, generate a list of hashtags labeled by category.


**Output Format:**
Each line must be in the format:
[Category]: [Hashtag]
(e.g., sector: #beauty)




**Allowed Values:**


Gender: Male, Female
Age: 18-25, 25-34, 35-44, 45-54, 55+
Topic: Beauty, Fashion, Tech, Travel, CPG, Food & Bev, Retail, Other
Emotions: sorrow, happiness, laughter, anger, empathy, fear, love, trust, sadness, belonging, guilt, compassion, pride


**Instructions:**


1. Use only the values provided in Allowed Values.
2. Do not invent new values except for Brands and Location. Only use values from the Allowed Values.
3. Output must contain at least one hashtag for each of the following categories:
 - Gender
 - Age
 - Topic
 - Emotions
 - Location
 - Brands


4. Do not output any explanations or category names—only return the final hashtag list.


**Output Example:**


Gender: female
Age: 25-34
Topic: beauty
Emotions: happiness
Location: Los Angeles
Brands: Fenty Beauty


---
`


  


   const url = `${TWELVELABS_API_BASE_URL}/analyze`;
   const options = {
       method: "POST",
       headers: {
           "Content-Type": "application/json",
           "x-api-key": API_KEY,
       },
       body: JSON.stringify({
           prompt: prompt,
           video_id: videoId,
           stream: false
       })
   };


   try {
     const response = await fetch(url, options);

Step 2 - PUT each video to save generated tags

After the tags are generated using the /api/analyze route, the next step is to save them back to the video object in your indexed library. This is done through a PUT API call that updates the video’s metadata in the Twelve Labs index.

⭐️ Check out details here for Twelve Labs’ Update Video Information API

This operation is handled by the updateVideoMetadata hook, which ultimately calls the backend route at api/videos/metadata/route.ts.

❗️To store your custom metadata, make sure to use the key user_metadata when updating each video.

api/videos/metadata/route.ts (line 1-68)

import { NextRequest, NextResponse } from 'next/server';



export async function PUT(request: NextRequest) {
 try {
   // Parse request body
   const body: MetadataUpdateRequest = await request.json();
   const { videoId, indexId, metadata } = body;
  


   // Prepare API request
   const url = `${TWELVELABS_API_BASE_URL}/indexes/${indexId}/videos/${videoId}`;


   const requestBody = {
     user_metadata: {
       source: metadata.source || '',
       sector: metadata.sector || '',
       emotions: metadata.emotions || '',
       brands: metadata.brands || '',
       locations: metadata.locations || '',
       demographics: metadata.demographics || ''
     }
   };


   const options = {
     method: 'PUT',
     headers: {
       'Content-Type': 'application/json',
       'x-api-key': API_KEY,
     },
     body: JSON.stringify(requestBody)
   };


   // Call Twelve Labs API
   const response = await fetch(url, options);


🔁 Where it’s used

You’ll find this call inside processVideoMetadataSingle in page.tsx, like this:

ads-library/page.tsx (line 289-292)

 if (hashtagText) {
         const metadata = parseHashtags(hashtagText);


         await updateVideoMetadata(videoId, adsIndexId, metadata);

📌 What’s in the user_metadata?

The saved user_metadata object includes key fields like:

{
  "gender": "female",
  "age": "25-34",
  "topic": "beauty",
  "emotions": "happiness",
  "location": "Los Angeles",
  "brands": "Fenty Beauty"

This consistent format enables filter-by-category UX, search, and visual grouping in dashboards. These custom metadata are now embedded in the videos retrieved from Twelve Labs, so you can simply use a GET request to fetch each video and display the metadata as needed.

⭐️Check out details here for Twelve Labs’ Retrieve Video Information API


Main Feature 2. Search Similar videos (in ‘Contextual Alignment Analysis’)

The Contextual Alignment Analysis feature helps you find content videos that are most relevant to a selected ad by comparing video and text embeddings. These embeddings are:

  • Generated by Twelve Labs

  • Stored and queried via Pinecone for similarity search

To enable this, we must ensure that:

  • Embeddings exist for all content videos

  • Embeddings exist for the selected ad video

  • All embeddings are stored in the same Pinecone index

❗️When you index a video via Twelve Labs, embeddings are automatically generated and can be retrieved with a Retrieve Video Information API call.


Step 1 - Process Content Video Embeddings

Before performing similarity search, all content videos need their embeddings stored in Pinecone. This is handled by the client-side function processContentVideoEmbeddings().

💡 Internal Flow

🔧 Core Functions

checkVectorExists Checks if the video’s embedding vector is already present in Pinecone. It internally calls the backend route.

api/vectors/exists (line 15-27)

   // Fetch vectors using metadata filter instead of direct ID
   const queryResponse = await index.query({
     vector: new Array(1024).fill(0),
     filter: {
       tl_video_id: videoId
     },
     topK: 1,
     includeMetadata: true
   });


   return NextResponse.json({
     exists: queryResponse.matches.length > 0
   });

If the embedding does not exist, getAndStoreEmbeddings:

  1. Fetches the embedding from Twelve Labs (/api/videos/[videoId]?embed=true)

  2. Stores it in Pinecone via /api/vectors/store

api/videos/[videoId] (line 77-95)

// Base URL
 let url = `${TWELVELABS_API_BASE_URL}/indexes/${indexId}/videos/${videoId}`;


 // Always include embedding query parameters if requested
 if (requestEmbeddings) {
   // Include only supported embedding options
   url += `?embedding_option=visual-text&embedding_option=audio`;
 }


 const options = {
   method: "GET",
   headers: {
     "x-api-key": `${API_KEY}`,
     "Accept": "application/json"
   },
 };


 try {
   const response = await fetch(url, options);

api/vectors/store (line 126-173)

// Create vectors from embedding segments
   const vectors = embedding.video_embedding.segments.map((segment: Segment, index: number) => {
     // Create a meaningful and unique vector ID
     const vectorId = `${vectorIdBase}_segment${index + 1}`;


     const vector = {
       id: vectorId,
       values: segment.float,
       metadata: {
         video_file: actualFileName,
         video_title: videoTitle,
         video_segment: index + 1,
         start_time: segment.start_offset_sec,
         end_time: segment.end_offset_sec,
         scope: segment.embedding_scope,
         tl_video_id: videoId,
         tl_index_id: indexId,
         category
       }
     };


     return vector;
   });


   try {
     const index = getPineconeIndex();


     // Upload vectors in batches
     const batchSize = 100;
     const totalBatches = Math.ceil(vectors.length / batchSize);


     console.log(`🚀 FILENAME DEBUG - Starting vector upload with ${totalBatches} batches...`);


     for (let i = 0; i < vectors.length; i += batchSize) {
       const batch = vectors.slice(i, i + batchSize);
       const batchNumber = Math.floor(i / batchSize) + 1;


       try {
         // Test Pinecone connection before upserting
         try {
           await index.describeIndexStats();
         } catch (statsError) {
           console.error(`❌ Pinecone connection test failed:`, statsError);
           throw new Error(`Failed to connect to Pinecone: ${statsError instanceof Error ? statsError.message : 'Unknown error'}`);
         }


         // Perform the actual upsert
         await index.upsert(batch);


Step 2 - Process Selected Ad Video Embedding

Once a user selects an ad, the app automatically checks whether its embedding is ready. This logic runs inside a useEffect() hook that watches the selected ad:

contextual-analysis/page.tsx (line 296-318)

// Automatically check ONLY the ad video embedding when a video is selected
 useEffect(() => {
   if (selectedVideoId && !isLoadingEmbeddings) {
     const cachedStatus = queryClient.getQueryData(['embeddingStatus', selectedVideoId]) as
       { checked: boolean, ready: boolean } | undefined;


     if (!cachedStatus?.checked) {
       setIsLoadingEmbeddings(true);


       ensureEmbeddings().then(success => {
         queryClient.setQueryData(['embeddingStatus', selectedVideoId], {
           checked: true,
           ready: success
         });


         setEmbeddingsReady(success);
         setIsLoadingEmbeddings(false);
       });
     } else {
       setEmbeddingsReady(cachedStatus.ready);
     }
   }
 }, [selectedVideoId, isLoadingEmbeddings, queryClient]);

🔧 Core Functions

ensureEmbeddings calls checkAndEnsureEmbeddings() to:

❗️The internal workings of checkVectorExists() and getAndStoreEmbeddings() were already explained in Step 1, so we refer to them here without repeating.


Step 3 - Similarity search in Pinecone + TL search 

Once all video embeddings (ad + content) are in place, clicking the "Run Contextual Analysis" button runs two types of similarity searches in parallel:

  • Text-to-Video Search: Uses the textual tags (e.g. sector and emotions) of the selected ad to find semantically relevant content videos.

  •  Video-to-Video Search: Uses the frame-level video embeddings of the selected ad to find visually/contextually similar content clips.

  • Both results are merged and scored, prioritizing matches found in both searches.

Text-to-Video Search

contextual-analysis/page.tsx (line 351-384)

const handleContextualAnalysis = async () => {
  
     try {
       textResults = await textToVideoEmbeddingSearch(selectedVideoId, adsIndexId, contentIndexId);
      
     try {
       videoResults = await videoToVideoEmbeddingSearch(selectedVideoId, adsIndexId, contentIndexId);

textToVideoEmbeddingSearch extracts sector and emotions tags, and the video title from the selected ad.

  • Sends them as text prompts to the api/embeddingSearch/textToVideo route.

  • Twelve Labs generates a text embedding, which is used to query Pinecone for semantically similar content videos.

api/embeddingSearch/textToVideo (line 20-45)

const { data: embedData } = await axios.post(url, formData, {
    // extract embedding vector from text_embedding object
   const textEmbedding = embedData.text_embedding.segments[0].float;

   // Get index and search
   const searchResults = await index.query({
     vector: textEmbedding,
     filter: {
       // video_type: 'ad',
       tl_index_id: indexId,
       scope: 'clip'
     },
     topK: 10,
     includeMetadata: true,
   });

Video-to-Video Search

videoToVideoEmbeddingSearch finds the frame-level segments (vector values) of the selected ad.

  • For each segment, runs a similarity query against the content index in Pinecone.

  • Each result reflects a clip-level match in video embeddings.

api/embeddingSearch/videoToVideo (line 22-50)

// First, get the original video's clip embedding
   const originalClipQuery = await index.query({
     filter: {
       tl_video_id: videoId,
       scope: 'clip'
     },
     topK: 100,
     includeMetadata: true,
     includeValues: true,
     vector: new Array(1024).fill(0)
   });


   // If we found matching clips, search for similar ads for each match
   const similarResults = [];
   if (originalClipQuery.matches.length > 0) {
     for (const originalClip of originalClipQuery.matches) {
       const vectorValues = originalClip.values || new Array(1024).fill(0);
       const queryResult = await index.query({
         vector: vectorValues,
         filter: {
           tl_index_id: indexId,
           scope: 'clip'
         },
         topK: 5,
         includeMetadata: true,
       });
       similarResults.push(queryResult);
     }
   }

Merging Results

Results from both searches are merged by video ID. If a video appears in both searches, its score is boosted by 2x.

contextual-analysis/page.tsx (line 412-428)

 if (combinedResultsMap.has(videoId)) {
           // This is a match found in both searches - update it
           const existingResult = combinedResultsMap.get(videoId);


           // Apply a significant boost for results found in both searches (50% boost)
           const boostMultiplier = 2;


           // Combine the scores: use the max of both scores and apply the boost
           const maxScore = Math.max(existingResult.textScore, result.score);
           const boostedScore = maxScore * boostMultiplier;


           combinedResultsMap.set(videoId, {
             ...existingResult,
             videoScore: result.score,
             finalScore: boostedScore,  // Boosted score for appearing in both searches
             source: "BOTH"
           });

Main Feature 3. GENERATE chapters and implement ad break (in ‘Contextual Alignment Analysis’)

This feature enhances the contextual video recommendation experience by breaking selected content into meaningful chapters and inserting a relevant ad at the end of a selected chapter — simulating a real-world mid-roll ad break.


Step 1 - Auto-Generate Chapters of Selected Content

When a user opens the VideoModal, the app automatically calls the generateChapters API to segment the selected content video.

videoModal.tsx (line 42-47)

 // Fetch chapters data
 const { data: chaptersData, isLoading: isChaptersLoading } = useQuery({
   queryKey: ["chapters", videoId],
   queryFn: () => generateChapters(videoId),
   enabled: isOpen && !!videoId,
 });

Each chapter includes:

  • end: end time of the chapter (used as the ad cue point)

  • chapter_title: a generated short title

  • chapter_summary: a one-sentence description of the scene and why it’s a good ad break

These chapters are visualized on a chapter timeline bar, where each dot marks an end point. Clicking a dot simulates inserting an ad right before that chapter ends.

UX Behavior:

  • Dots appear as chapter end markers.

  • Clicking a dot triggers a "Show Ad" overlay at that cue point.

  • The ad is skippable and plays like a real mid-roll.

Server Logic: Chapter Generation via Twelve Labs

Chapters are generated using Twelve Labs' summarize endpoint with a custom prompt.

api/generateChapters (line 19-30)

 const url = `${TWELVELABS_API_BASE_URL}/summarize`;
     const options = {
         method: "POST",
         headers: {
             "Content-Type": "application/json",
             "x-api-key": `${API_KEY}`,
           },
           body: JSON.stringify({type: "chapter", video_id: videoId, prompt: "Chapterize this video into 3 chapters. For every chapter, describe why it is a strategically appropriate point for placing an advertisement. Do not mention what type of advertisement would be suitable, as the ad content has already been determined. "})
       };


     try {
       const response = await fetch(url, options);

⭐️Check out details here for Twelve Labs’ Summarize API

Client-side Helper: generateChapters

Used by React Query to fetch chapter data:

apiHooks.tsx (line 981-984)

export const generateChapters = async (videoId: string): Promise<ChaptersData> => {
 try {
   const response = await fetch(`/api/generateChapters?videoId=${videoId}`);


Step 2 - Insert and Play Ad at Selected Chapter Break

When a user clicks on a chapter marker:

  1. The content video seeks to 3 seconds before the chapter ends.

  2. When playback reaches the chapter’s end time, the app transitions into an ad playback sequence.

  3. After the ad finishes, the original content resumes from just after the chapter break.

Chapter Marker Click Logic

When a chapter marker is clicked, the player seeks to just before that chapter ends, setting the stage for a mid-roll ad.

VideoModal.tsx (line 102-126)

 // Chapter click handler
 const handleChapterClick = (index: number) => {
   if (playbackSequence === 'ad') {
     return;
   }


   if (!adVideoDetail?.hls?.video_url) {
     console.warn("No ad selected. Please select an ad in the contextual analysis page.");
     return;
   }


   if (!chaptersData) return;


   const chapter = chaptersData.chapters[index];
   setSelectedChapter(index);
   setHasPlayedAd(false);
   setPlaybackSequence('video');
   setShowChapterInfo(true);


   if (playerRef.current) {
     // Start 3 seconds before the chapter end time
     const startTime = Math.max(0, chapter.end - 3);
     playerRef.current.seekTo(startTime, 'seconds');
   }
 };

Progress Monitoring – Trigger Ad at Chapter End

While the content is playing, the app checks if the current play time has reached a chapter endpoint and switches to ad playback if conditions are met.

VideoModal.tsx (line 82-100)

 // Track video progress
 const handleProgress = (state: { playedSeconds: number }) => {
   if (selectedChapter === null || !chaptersData || !adVideoDetail) {
     return;
   }


   const chapter = chaptersData.chapters[selectedChapter];
   const timeDiff = state.playedSeconds - chapter.end;
   const isLastChapter = selectedChapter === chaptersData.chapters.length - 1;


   if (
     playbackSequence === 'video' &&
     !hasPlayedAd &&
     ((isLastChapter && Math.abs(timeDiff) < 0.5) || (!isLastChapter && timeDiff >= 0))
   ) {
     setPlaybackSequence('ad');
     setHasPlayedAd(true);
   }
 };

Ad Playback & Resume Content

After the ad finishes, the app automatically resumes the content from where the chapter left off.

VideoModal.tsx (line 128-136)

 // Ad ended handler
 const handleAdEnded = () => {
   if (selectedChapter === null || !chaptersData) return;
   const chapter = chaptersData.chapters[selectedChapter];
   setPlaybackSequence('video');
   setReturnToTime(chapter.end);
   setIsPlaying(true);
 };

This creates an immersive, chapter-aware viewing experience with smart ad insertions aligned to meaningful content breaks — ideal for showcasing contextually relevant ads at natural pause points.


Conclusion

In this tutorial, we walked through the full flow of Contextual Analysis — from generating and storing video embeddings, to running similarity searches, and finally simulating mid-roll ad insertions using chapter segmentation. By combining Twelve Labs’ multimodal embeddings with Pinecone’s vector filtering, you can deliver smart, content-aware ad experiences. This foundation can be further extended for real-time targeting, A/B testing, or personalized ad delivery at scale.

Introduction

Viewers are often overwhelmed by irrelevant ads that don’t align with the content they’re watching. This disconnect leads to frustration and makes ads feel intrusive or poorly timed.

The Brand Integration Assistant & Ad Break Finder App solves this by delivering contextually relevant ad recommendations—ensuring the right message reaches the right audience at the right moment of engagement.

In this tutorial, you’ll learn how the app works across its core features:

  • Automatic Tag Generation: Each uploaded ad is analyzed to generate rich metadata—Topic Category, Emotions, Brands, Target Demographics (Gender & Age), and Location—enabling smart filtering, search, and content matching.

  • Search for Contextually Aligned Content: Use AI-powered similarity search to find content videos that are semantically aligned with your ad, based on both video and text embeddings.

  • Ad Break Recommendation & Simulation: Automatically segment content into chapters and simulate mid-roll ad insertions - creating a seamless, immersive ad experience.


Prerequisites

  • Sign up for the Twelve Labs Playground and generate your API key and create two indexes each for ads and content videos. 

  • Set up a Pinecone account and create an index to store video embeddings.

    • Make sure to set Dimensions to 1024 and Metric to Cosine

  • Find the application’s source code in the corresponding GitHub repository.

  • It’s helpful to have familiarity with JavaScript, TypeScript, and Next.js for a smoother setup and development experience.


Demo

Check out the demo application to try it yourself, or watch the quick demo video below to see how it works in action: https://www.loom.com/share/233cc8cb66ae44218e3cff69afb772d7

You can watch the webinar recording below for the entire demo:


How the App Works  

Once inside the app, users will find two main menus: Ads Library and Contextual Alignment Analysis.

Ads Library - The Ads Library provides brand marketers with an organized view of their ad videos, each enriched with automatically generated tags. Users can filter ads by Topic Category, Emotions, Brands, Gender, Age, and Location, or search by short-tail or long-tail keywords using the Twelve Labs Search API. In this tutorial, we’ll focus on the auto-tag generation feature within the Ads Library.

Contextual Alignment Analysis - This section enables users to find the most contextually relevant content videos for each ad. Powered by Twelve Labs EMBED and GET video API (for tag and video embeddings) and Pinecone (for similarity-based filtering), it surfaces highly aligned content.

Users can then select a content video, generate auto-chapters for ad break insertion, and simulate ad playback at chapter transitions.

The following tutorial will walk through both the content matching and ad simulation features in detail.


Three Main Features of the App and How They Work


Main Feature 1. Automatic Tag Generation (in ‘Ads Library’)

The Ads Library allows users to browse a collection of indexed videos and view their auto-generated tags. These tags help categorize videos by topic, emotion, brand, demographics, and more — all extracted using Twelve Labs' Analyze API.


Step 1 - Generate tags of each video

When a video is loaded and determined to have incomplete or missing metadata, the system calls generateMetadata to obtain tags using Twelve Labs' Analyze API.

⭐️Check out details here for Twelve Labs’ Analyze API

🔁 Where it’s used

You’ll find this call inside processVideoMetadataSingle in page.tsx, like this:

ads-library/page.ts (line 279-290)

     if (!video.user_metadata ||
         Object.keys(video.user_metadata).length === 0 ||
          !video.user_metadata.topic_category &&
          !video.user_metadata.emotions && 
          !video.user_metadata.brands &&
          !video.user_metadata.locations)) {


       setVideosInProcessing(prev => [...prev, videoId]);


       const hashtagText = await generateMetadata(videoId);


       if (hashtagText) {
         const metadata = parseHashtags(hashtagText);

The generateMetadata function is a custom hook that triggers a server-side API call to request AI-generated tags from the Twelve Labs engine.

This triggers the backend handler in api/analyze/route.ts, which constructs a structured and specific prompt for the Twelve Labs analyze API. The prompt ensures that the returned data is well-categorized and consistently formatted — making it easy to convert into tags and display them in the Filter Menu. Here’s the key part of the backend route:

api/analyze/route.ts (line 1 - 85)

import { NextResponse } from 'next/server';


const API_KEY = process.env.TWELVELABS_API_KEY;
const TWELVELABS_API_BASE_URL = process.env.TWELVELABS_API_BASE_URL;


export const maxDuration = 60;


export async function GET(req: Request) {
   const { searchParams } = new URL(req.url);
   const videoId = searchParams.get("videoId");
   const prompt =
   `You are a marketing assistant specialized in generating hashtags for video content.


Based on the input video metadata, generate a list of hashtags labeled by category.


**Output Format:**
Each line must be in the format:
[Category]: [Hashtag]
(e.g., sector: #beauty)




**Allowed Values:**


Gender: Male, Female
Age: 18-25, 25-34, 35-44, 45-54, 55+
Topic: Beauty, Fashion, Tech, Travel, CPG, Food & Bev, Retail, Other
Emotions: sorrow, happiness, laughter, anger, empathy, fear, love, trust, sadness, belonging, guilt, compassion, pride


**Instructions:**


1. Use only the values provided in Allowed Values.
2. Do not invent new values except for Brands and Location. Only use values from the Allowed Values.
3. Output must contain at least one hashtag for each of the following categories:
 - Gender
 - Age
 - Topic
 - Emotions
 - Location
 - Brands


4. Do not output any explanations or category names—only return the final hashtag list.


**Output Example:**


Gender: female
Age: 25-34
Topic: beauty
Emotions: happiness
Location: Los Angeles
Brands: Fenty Beauty


---
`


  


   const url = `${TWELVELABS_API_BASE_URL}/analyze`;
   const options = {
       method: "POST",
       headers: {
           "Content-Type": "application/json",
           "x-api-key": API_KEY,
       },
       body: JSON.stringify({
           prompt: prompt,
           video_id: videoId,
           stream: false
       })
   };


   try {
     const response = await fetch(url, options);

Step 2 - PUT each video to save generated tags

After the tags are generated using the /api/analyze route, the next step is to save them back to the video object in your indexed library. This is done through a PUT API call that updates the video’s metadata in the Twelve Labs index.

⭐️ Check out details here for Twelve Labs’ Update Video Information API

This operation is handled by the updateVideoMetadata hook, which ultimately calls the backend route at api/videos/metadata/route.ts.

❗️To store your custom metadata, make sure to use the key user_metadata when updating each video.

api/videos/metadata/route.ts (line 1-68)

import { NextRequest, NextResponse } from 'next/server';



export async function PUT(request: NextRequest) {
 try {
   // Parse request body
   const body: MetadataUpdateRequest = await request.json();
   const { videoId, indexId, metadata } = body;
  


   // Prepare API request
   const url = `${TWELVELABS_API_BASE_URL}/indexes/${indexId}/videos/${videoId}`;


   const requestBody = {
     user_metadata: {
       source: metadata.source || '',
       sector: metadata.sector || '',
       emotions: metadata.emotions || '',
       brands: metadata.brands || '',
       locations: metadata.locations || '',
       demographics: metadata.demographics || ''
     }
   };


   const options = {
     method: 'PUT',
     headers: {
       'Content-Type': 'application/json',
       'x-api-key': API_KEY,
     },
     body: JSON.stringify(requestBody)
   };


   // Call Twelve Labs API
   const response = await fetch(url, options);


🔁 Where it’s used

You’ll find this call inside processVideoMetadataSingle in page.tsx, like this:

ads-library/page.tsx (line 289-292)

 if (hashtagText) {
         const metadata = parseHashtags(hashtagText);


         await updateVideoMetadata(videoId, adsIndexId, metadata);

📌 What’s in the user_metadata?

The saved user_metadata object includes key fields like:

{
  "gender": "female",
  "age": "25-34",
  "topic": "beauty",
  "emotions": "happiness",
  "location": "Los Angeles",
  "brands": "Fenty Beauty"

This consistent format enables filter-by-category UX, search, and visual grouping in dashboards. These custom metadata are now embedded in the videos retrieved from Twelve Labs, so you can simply use a GET request to fetch each video and display the metadata as needed.

⭐️Check out details here for Twelve Labs’ Retrieve Video Information API


Main Feature 2. Search Similar videos (in ‘Contextual Alignment Analysis’)

The Contextual Alignment Analysis feature helps you find content videos that are most relevant to a selected ad by comparing video and text embeddings. These embeddings are:

  • Generated by Twelve Labs

  • Stored and queried via Pinecone for similarity search

To enable this, we must ensure that:

  • Embeddings exist for all content videos

  • Embeddings exist for the selected ad video

  • All embeddings are stored in the same Pinecone index

❗️When you index a video via Twelve Labs, embeddings are automatically generated and can be retrieved with a Retrieve Video Information API call.


Step 1 - Process Content Video Embeddings

Before performing similarity search, all content videos need their embeddings stored in Pinecone. This is handled by the client-side function processContentVideoEmbeddings().

💡 Internal Flow

🔧 Core Functions

checkVectorExists checks whether the video’s embedding vector is already present in Pinecone. Internally, it calls the backend route:

api/vectors/exists (line 15-27)

   // Fetch vectors using metadata filter instead of direct ID
   const queryResponse = await index.query({
     vector: new Array(1024).fill(0),
     filter: {
       tl_video_id: videoId
     },
     topK: 1,
     includeMetadata: true
   });


   return NextResponse.json({
     exists: queryResponse.matches.length > 0
   });

If the embedding does not exist, getAndStoreEmbeddings (a client-side sketch follows this list):

  1. Fetches the embedding from Twelve Labs (/api/videos/[videoId]?embed=true)

  2. Stores it in Pinecone via /api/vectors/store
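Here is a minimal client-side sketch of that flow; the function signature and request shapes are assumptions, while the backend routes it calls are shown next:

async function getAndStoreEmbeddings(videoId: string, indexId: string): Promise<boolean> {
  // 1. Retrieve the video together with its embedding from Twelve Labs via our backend route
  const videoRes = await fetch(`/api/videos/${videoId}?embed=true&indexId=${indexId}`);
  if (!videoRes.ok) return false;
  const video = await videoRes.json();

  // 2. Hand the embedding to the store route, which upserts segment vectors into Pinecone
  const storeRes = await fetch("/api/vectors/store", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      videoId,
      indexId,
      embedding: video.embedding, // assumed shape: { video_embedding: { segments: [...] } }
      videoTitle: video.system_metadata?.filename,
      category: "content",
    }),
  });
  return storeRes.ok;
}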

api/videos/[videoId] (line 77-95)

// Base URL
 let url = `${TWELVELABS_API_BASE_URL}/indexes/${indexId}/videos/${videoId}`;


 // Always include embedding query parameters if requested
 if (requestEmbeddings) {
   // Include only supported embedding options
   url += `?embedding_option=visual-text&embedding_option=audio`;
 }


 const options = {
   method: "GET",
   headers: {
     "x-api-key": `${API_KEY}`,
     "Accept": "application/json"
   },
 };


 try {
   const response = await fetch(url, options);

api/vectors/store (line 126-173)

// Create vectors from embedding segments
   const vectors = embedding.video_embedding.segments.map((segment: Segment, index: number) => {
     // Create a meaningful and unique vector ID
     const vectorId = `${vectorIdBase}_segment${index + 1}`;


     const vector = {
       id: vectorId,
       values: segment.float,
       metadata: {
         video_file: actualFileName,
         video_title: videoTitle,
         video_segment: index + 1,
         start_time: segment.start_offset_sec,
         end_time: segment.end_offset_sec,
         scope: segment.embedding_scope,
         tl_video_id: videoId,
         tl_index_id: indexId,
         category
       }
     };


     return vector;
   });


   try {
     const index = getPineconeIndex();


     // Upload vectors in batches
     const batchSize = 100;
     const totalBatches = Math.ceil(vectors.length / batchSize);


     console.log(`🚀 FILENAME DEBUG - Starting vector upload with ${totalBatches} batches...`);


     for (let i = 0; i < vectors.length; i += batchSize) {
       const batch = vectors.slice(i, i + batchSize);
       const batchNumber = Math.floor(i / batchSize) + 1;


       try {
         // Test Pinecone connection before upserting
         try {
           await index.describeIndexStats();
         } catch (statsError) {
           console.error(`❌ Pinecone connection test failed:`, statsError);
           throw new Error(`Failed to connect to Pinecone: ${statsError instanceof Error ? statsError.message : 'Unknown error'}`);
         }


         // Perform the actual upsert
         await index.upsert(batch);


Step 2 - Process Selected Ad Video Embedding

Once a user selects an ad, the app automatically checks whether its embedding is ready. This logic runs inside a useEffect() hook that watches the selected ad:

contextual-analysis/page.tsx (line 296-318)

// Automatically check ONLY the ad video embedding when a video is selected
 useEffect(() => {
   if (selectedVideoId && !isLoadingEmbeddings) {
     const cachedStatus = queryClient.getQueryData(['embeddingStatus', selectedVideoId]) as
       { checked: boolean, ready: boolean } | undefined;


     if (!cachedStatus?.checked) {
       setIsLoadingEmbeddings(true);


       ensureEmbeddings().then(success => {
         queryClient.setQueryData(['embeddingStatus', selectedVideoId], {
           checked: true,
           ready: success
         });


         setEmbeddingsReady(success);
         setIsLoadingEmbeddings(false);
       });
     } else {
       setEmbeddingsReady(cachedStatus.ready);
     }
   }
 }, [selectedVideoId, isLoadingEmbeddings, queryClient]);

🔧 Core Functions

ensureEmbeddings calls checkAndEnsureEmbeddings(), which verifies that the selected ad’s embedding already exists in Pinecone and, if it doesn’t, generates and stores it.

❗️The internal workings of checkVectorExists() and getAndStoreEmbeddings() were already explained in Step 1, so we refer to them here without repeating.
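As a rough sketch (names and signatures are assumed), checkAndEnsureEmbeddings for the selected ad could look like this:

async function checkAndEnsureEmbeddings(videoId: string, indexId: string): Promise<boolean> {
  // 1. Ask the backend whether a vector for this ad already exists in Pinecone
  const existsRes = await fetch(`/api/vectors/exists?videoId=${videoId}`);
  const { exists } = await existsRes.json();
  if (exists) return true;

  // 2. If not, fetch the embedding from Twelve Labs and store it in Pinecone
  //    (same getAndStoreEmbeddings helper as in Step 1)
  return getAndStoreEmbeddings(videoId, indexId);
}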


Step 3 - Similarity search in Pinecone + TL search 

Once all video embeddings (ad + content) are in place, clicking the "Run Contextual Analysis" button runs two types of similarity searches in parallel:

  • Text-to-Video Search: Uses the textual tags (e.g. sector and emotions) of the selected ad to find semantically relevant content videos.

  • Video-to-Video Search: Uses the clip-level video embeddings of the selected ad to find visually/contextually similar content clips.

  • Both results are merged and scored, prioritizing matches found in both searches.

Text-to-Video Search

contextual-analysis/page.tsx (line 351-384)

const handleContextualAnalysis = async () => {
    // ...state setup omitted...
    try {
      textResults = await textToVideoEmbeddingSearch(selectedVideoId, adsIndexId, contentIndexId);
    } catch (error) { /* error handling omitted */ }

    try {
      videoResults = await videoToVideoEmbeddingSearch(selectedVideoId, adsIndexId, contentIndexId);
    } catch (error) { /* error handling omitted */ }
    // ...results are merged below...

textToVideoEmbeddingSearch extracts the sector and emotions tags, along with the video title, from the selected ad, then:

  • Sends them as text prompts to the api/embeddingSearch/textToVideo route.

  • Twelve Labs generates a text embedding, which is used to query Pinecone for semantically similar content videos.

api/embeddingSearch/textToVideo (line 20-45)

const { data: embedData } = await axios.post(url, formData, {
      headers: {
        "x-api-key": API_KEY,
        // multipart/form-data headers omitted for brevity
      },
    });

    // Extract the embedding vector from the text_embedding object
    const textEmbedding = embedData.text_embedding.segments[0].float;

   // Get index and search
   const searchResults = await index.query({
     vector: textEmbedding,
     filter: {
       // video_type: 'ad',
       tl_index_id: indexId,
       scope: 'clip'
     },
     topK: 10,
     includeMetadata: true,
   });

Video-to-Video Search

videoToVideoEmbeddingSearch retrieves the clip-level segments (vector values) of the selected ad, then:

  • For each segment, runs a similarity query against the content index in Pinecone.

  • Each result reflects a clip-level match in video embeddings.

api/embeddingSearch/videoToVideo (line 22-50)

// First, get the original video's clip embedding
   const originalClipQuery = await index.query({
     filter: {
       tl_video_id: videoId,
       scope: 'clip'
     },
     topK: 100,
     includeMetadata: true,
     includeValues: true,
     vector: new Array(1024).fill(0)
   });


   // If we found matching clips, search for similar content clips for each match
   const similarResults = [];
   if (originalClipQuery.matches.length > 0) {
     for (const originalClip of originalClipQuery.matches) {
       const vectorValues = originalClip.values || new Array(1024).fill(0);
       const queryResult = await index.query({
         vector: vectorValues,
         filter: {
           tl_index_id: indexId,
           scope: 'clip'
         },
         topK: 5,
         includeMetadata: true,
       });
       similarResults.push(queryResult);
     }
   }
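The per-segment query results then need to be collapsed into one score per content video. A possible aggregation is sketched below; the metadata field names come from the store route shown earlier, but the exact logic in the app may differ:

// Keep the best similarity score seen for each content video across all segments
const bestScoreByVideo = new Map<string, number>();
for (const queryResult of similarResults) {
  for (const match of queryResult.matches) {
    const id = match.metadata?.tl_video_id as string;
    const score = match.score ?? 0;
    if (!bestScoreByVideo.has(id) || score > bestScoreByVideo.get(id)!) {
      bestScoreByVideo.set(id, score);
    }
  }
}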

Merging Results

Results from both searches are merged by video ID. If a video appears in both searches, its score is boosted by 2x.

contextual-analysis/page.tsx (line 412-428)

 if (combinedResultsMap.has(videoId)) {
           // This is a match found in both searches - update it
           const existingResult = combinedResultsMap.get(videoId);


           // Apply a significant boost for results found in both searches (2x boost)
           const boostMultiplier = 2;


           // Combine the scores: use the max of both scores and apply the boost
           const maxScore = Math.max(existingResult.textScore, result.score);
           const boostedScore = maxScore * boostMultiplier;


           combinedResultsMap.set(videoId, {
             ...existingResult,
             videoScore: result.score,
             finalScore: boostedScore,  // Boosted score for appearing in both searches
             source: "BOTH"
           });
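For context, the merge that surrounds this branch might be structured roughly as follows; the types and field names are assumptions based on the snippet above, not the app's exact code:

interface SearchResult { videoId: string; score: number; }
interface CombinedResult {
  videoId: string;
  textScore: number;
  videoScore: number;
  finalScore: number;
  source: "TEXT" | "VIDEO" | "BOTH";
}

const combinedResultsMap = new Map<string, CombinedResult>();

// Seed the map with text-search results
for (const r of textResults as SearchResult[]) {
  combinedResultsMap.set(r.videoId, {
    videoId: r.videoId, textScore: r.score, videoScore: 0, finalScore: r.score, source: "TEXT",
  });
}

// Fold in video-search results; overlaps get the 2x boost shown above
for (const result of videoResults as SearchResult[]) {
  if (!combinedResultsMap.has(result.videoId)) {
    combinedResultsMap.set(result.videoId, {
      videoId: result.videoId, textScore: 0, videoScore: result.score, finalScore: result.score, source: "VIDEO",
    });
  }
  // ...the "found in both searches" branch applies the boost as shown above...
}

// Rank by finalScore for display
const ranked = [...combinedResultsMap.values()].sort((a, b) => b.finalScore - a.finalScore);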

Main Feature 3. GENERATE chapters and implement ad break (in ‘Contextual Alignment Analysis’)

This feature enhances the contextual video recommendation experience by breaking selected content into meaningful chapters and inserting a relevant ad at the end of a selected chapter — simulating a real-world mid-roll ad break.


Step 1 - Auto-Generate Chapters of Selected Content

When a user opens the VideoModal, the app automatically calls the generateChapters API to segment the selected content video.

videoModal.tsx (line 42-47)

 // Fetch chapters data
 const { data: chaptersData, isLoading: isChaptersLoading } = useQuery({
   queryKey: ["chapters", videoId],
   queryFn: () => generateChapters(videoId),
   enabled: isOpen && !!videoId,
 });

Each chapter includes:

  • end: end time of the chapter (used as the ad cue point)

  • chapter_title: a generated short title

  • chapter_summary: a one-sentence description of the scene and why it’s a good ad break

These chapters are visualized on a chapter timeline bar, where each dot marks an end point. Clicking a dot simulates inserting an ad right before that chapter ends.
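As a rough idea of the markup (a hypothetical sketch, not the app's exact component; the total duration is assumed to be known, and the click handler maps to handleChapterClick covered in Step 2), the dots can be rendered by mapping over the chapters:

function ChapterTimeline({ chapters, duration, onMarkerClick }: {
  chapters: { end: number }[];
  duration: number;                       // total video length in seconds
  onMarkerClick: (index: number) => void; // wired to handleChapterClick
}) {
  return (
    <div style={{ position: 'relative', height: 8, width: '100%' }}>
      {chapters.map((chapter, index) => (
        <button
          key={index}
          // Position each dot as a percentage of the total duration, at the chapter's end time
          style={{ position: 'absolute', left: `${(chapter.end / duration) * 100}%` }}
          onClick={() => onMarkerClick(index)}
        />
      ))}
    </div>
  );
}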

UX Behavior:

  • Dots appear as chapter end markers.

  • Clicking a dot triggers a "Show Ad" overlay at that cue point.

  • The ad is skippable and plays like a real mid-roll.

Server Logic: Chapter Generation via Twelve Labs

Chapters are generated using Twelve Labs' summarize endpoint with a custom prompt.

api/generateChapters (line 19-30)

 const url = `${TWELVELABS_API_BASE_URL}/summarize`;
     const options = {
         method: "POST",
         headers: {
             "Content-Type": "application/json",
             "x-api-key": `${API_KEY}`,
           },
           body: JSON.stringify({type: "chapter", video_id: videoId, prompt: "Chapterize this video into 3 chapters. For every chapter, describe why it is a strategically appropriate point for placing an advertisement. Do not mention what type of advertisement would be suitable, as the ad content has already been determined. "})
       };


     try {
       const response = await fetch(url, options);

⭐️Check out details here for Twelve Labs’ Summarize API

Client-side Helper: generateChapters

Used by React Query to fetch chapter data:

apiHooks.tsx (line 981-984)

export const generateChapters = async (videoId: string): Promise<ChaptersData> => {
 try {
   const response = await fetch(`/api/generateChapters?videoId=${videoId}`);
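The ChaptersData type used above can be assumed to look roughly like this, based on the chapter fields used in this tutorial (the actual Summarize response may include additional fields):

interface Chapter {
  end: number;             // end time in seconds, used as the ad cue point
  chapter_title: string;   // short generated title
  chapter_summary: string; // why this point works as an ad break
}

interface ChaptersData {
  chapters: Chapter[];
}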


Step 2 - Insert and Play Ad at Selected Chapter Break

When a user clicks on a chapter marker:

  1. The content video seeks to 3 seconds before the chapter ends.

  2. When playback reaches the chapter’s end time, the app transitions into an ad playback sequence.

  3. After the ad finishes, the original content resumes from just after the chapter break.

Chapter Marker Click Logic

When a chapter marker is clicked, the player seeks to just before that chapter ends, setting the stage for a mid-roll ad.

VideoModal.tsx (line 102-126)

 // Chapter click handler
 const handleChapterClick = (index: number) => {
   if (playbackSequence === 'ad') {
     return;
   }


   if (!adVideoDetail?.hls?.video_url) {
     console.warn("No ad selected. Please select an ad in the contextual analysis page.");
     return;
   }


   if (!chaptersData) return;


   const chapter = chaptersData.chapters[index];
   setSelectedChapter(index);
   setHasPlayedAd(false);
   setPlaybackSequence('video');
   setShowChapterInfo(true);


   if (playerRef.current) {
     // Start 3 seconds before the chapter end time
     const startTime = Math.max(0, chapter.end - 3);
     playerRef.current.seekTo(startTime, 'seconds');
   }
 };

Progress Monitoring – Trigger Ad at Chapter End

While the content is playing, the app checks if the current play time has reached a chapter endpoint and switches to ad playback if conditions are met.

VideoModal.tsx (line 82-100)

 // Track video progress
 const handleProgress = (state: { playedSeconds: number }) => {
   if (selectedChapter === null || !chaptersData || !adVideoDetail) {
     return;
   }


   const chapter = chaptersData.chapters[selectedChapter];
   const timeDiff = state.playedSeconds - chapter.end;
   const isLastChapter = selectedChapter === chaptersData.chapters.length - 1;


   if (
     playbackSequence === 'video' &&
     !hasPlayedAd &&
     ((isLastChapter && Math.abs(timeDiff) < 0.5) || (!isLastChapter && timeDiff >= 0))
   ) {
     setPlaybackSequence('ad');
     setHasPlayedAd(true);
   }
 };

Ad Playback & Resume Content

After the ad finishes, the app automatically resumes the content from where the chapter left off.

VideoModal.tsx (line 128-136)

 // Ad ended handler
 const handleAdEnded = () => {
   if (selectedChapter === null || !chaptersData) return;
   const chapter = chaptersData.chapters[selectedChapter];
   setPlaybackSequence('video');
   setReturnToTime(chapter.end);
   setIsPlaying(true);
 };
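How returnToTime is consumed isn’t shown above; conceptually (a hypothetical effect, not the app’s exact code), the player seeks to it once content playback resumes:

useEffect(() => {
  // Once we're back in content playback, jump to the saved resume point
  if (playbackSequence === 'video' && returnToTime !== null && playerRef.current) {
    playerRef.current.seekTo(returnToTime, 'seconds');
    setReturnToTime(null);
  }
}, [playbackSequence, returnToTime]);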

This creates an immersive, chapter-aware viewing experience with smart ad insertions aligned to meaningful content breaks — ideal for showcasing contextually relevant ads at natural pause points.


Conclusion

In this tutorial, we walked through the full flow of the app: auto-generating ad tags, generating and storing video embeddings, running similarity searches, and finally simulating mid-roll ad insertions using chapter segmentation. By combining Twelve Labs’ multimodal embeddings with Pinecone’s vector filtering, you can deliver smart, content-aware ad experiences. This foundation can be further extended for real-time targeting, A/B testing, or personalized ad delivery at scale.