Media and Entertainment

From Manual Search to Instant Discovery: Bringing TwelveLabs Video Intelligence Capabilities to Frame.io Workflows

Brice Penven, James Le

How creative teams can find any shot, generate metadata, and automate compliance—all without leaving Frame.io


Nov 20, 2025

22 Minutes


TLDR

This post demonstrates how TwelveLabs' multimodal video understanding AI integrates directly into Frame.io V4, transforming how creative teams search, organize, and manage video content at scale.

  • Index and search your entire video library using natural language—find specific shots by describing what you're looking for ("wide aerial drone shot of coastline") instead of manually scrubbing through footage or relying on incomplete metadata.

  • Automatically generate rich metadata with Pegasus—populate Frame.io fields with AI-generated descriptions, tags, summaries, themes, and emotional tone that understand narrative context, not just objects on screen.

  • Discover visually similar content with image-to-video search—upload a reference photo and instantly locate matching footage across your entire library, perfect for finding B-roll, alternate takes, or thematic compilations.

  • Automate compliance checks with timestamped violations—flag potential regulatory issues, brand guideline violations, or content policy breaches at exact timestamps, integrated directly into Frame.io's comment system.

  • Seamless workflow integration powered by Frame.io V4's Custom Actions—trigger indexing, metadata generation, semantic search, and compliance checks with a simple right-click, with all results organized in Frame.io's familiar interface.


1 - Overview & Introduction

Why TwelveLabs × Frame.io

The media and entertainment industry faces mounting challenges in managing increasingly large video libraries. Post-production teams, broadcasters, and content creators need efficient ways to search, analyze, and derive insights from their video assets. Frame.io, now part of Adobe's Creative Cloud ecosystem, has become the industry standard for video collaboration and review, serving creative teams across film, television, advertising, and digital media production. However, traditional text-based metadata and manual tagging can't keep pace with modern content volumes, creating bottlenecks in creative workflows and making it difficult to unlock the full value of video archives.

TwelveLabs' integration with Frame.io brings advanced multimodal video understanding capabilities directly into the collaborative review workflow. With the introduction of Frame.io V4, two key features made this integration particularly powerful: Custom Actions and customizable metadata fields.

  • Custom Actions allow users to trigger video understanding workflows with a simple right-click, enabling on-demand AI analysis without leaving the platform.

  • Flexible metadata fields enable TwelveLabs to write structured video intelligence data back into Frame.io's native interface, making AI-generated insights immediately accessible alongside traditional review tools.

These V4 capabilities provide the extensibility and flexibility needed to seamlessly embed multimodal video understanding into production workflows, transforming Frame.io from a review platform into an intelligent video content management system.

By combining Frame.io's collaborative features with TwelveLabs' Pegasus and Marengo video foundation models, teams can index video content, generate rich metadata, perform semantic searches across their libraries, find related content, and ensure compliance—all within the familiar Frame.io environment. Marengo 2.7 creates vector embeddings that enable semantic search and pattern recognition across video, audio, and text, while Pegasus 1.2 generates human-readable descriptions, summaries, and structured metadata from video content. Together, these models provide both the "what" and the "why" of video content, empowering creative teams to work faster and smarter.


Who It's For

This integration serves organizations that rely on Frame.io for video-centric workflows and need intelligent content management at scale. Our customers using Frame.io span multiple industries: broadcasting and news organizations managing extensive footage libraries, production studios coordinating complex post-production workflows, marketing agencies repurposing creative assets across campaigns, and enterprise content teams maintaining brand consistency across thousands of videos.

While their specific workflows differ, these organizations share common challenges: searching vast video libraries for specific moments or themes, maintaining consistent metadata across thousands of assets, ensuring brand and regulatory compliance, and discovering relevant content buried in archives. Manual tagging is time-consuming and inconsistent, while traditional keyword search fails to capture the visual and contextual richness of video. The need for intelligent video understanding at scale, integrated seamlessly into existing review and collaboration workflows, has never been greater.

Frame.io V4's Custom Actions and flexible metadata system provide the foundation for embedding TwelveLabs' video understanding capabilities directly where teams already work, eliminating the friction of switching between multiple tools and platforms. Whether you're a creative director searching for the perfect B-roll shot, a compliance officer reviewing thousands of hours of content, or a producer mining archives for reusable assets, this integration brings AI-powered video intelligence into your daily workflow.


2 - Key Capabilities

2.1 - Indexing Assets
What It Does & When to Use It

The indexing capability is the foundation of all TwelveLabs functionality within Frame.io. Users can index individual assets or entire folders directly from the Frame.io interface by right-clicking and selecting the "Index Asset(s)" custom action. This process creates a multimodal understanding of your video content, analyzing visual, audio, and textual information to build a searchable representation that powers all downstream AI capabilities.

Indexing is the critical first step that transforms raw video files into intelligent, queryable assets. Once indexed, videos become fully searchable through semantic queries, can generate automated metadata, enable compliance checks, and support image-based content discovery.


Key Features
  • Manual or automated triggering: Users can manually trigger indexing on-demand via Custom Actions directly from the Frame.io UI, or configure automated workflows that index assets when they're moved or copied to specific Frame.io projects or folders. This flexibility supports both ad-hoc indexing needs and systematic processing of incoming content.

  • Asset and folder-level processing: Trigger indexing on a single video file or select an entire folder to batch-index all contained assets in a single action, ideal for project archives or bulk content libraries. Batch processing dramatically reduces the manual effort required to prepare large video collections for semantic search and analysis.

  • Status tracking: Indexing status is visible within Frame.io custom metadata fields, showing whether assets are indexed, currently processing, or encountered errors. This transparency ensures teams know exactly which assets are ready for advanced video intelligence workflows and which may require attention.

  • Seamless integration: No need to download assets or leave the Frame.io environment. Indexing happens in the background while teams continue their collaborative review work, with TwelveLabs processing video content through its Marengo and Pegasus foundation models to generate multimodal embeddings and structured representations. The entire process is transparent to end users, requiring only a simple right-click action to initiate.

Once indexed, assets become searchable and analyzable through all other TwelveLabs capabilities, creating a foundation for advanced video intelligence workflows that transform how creative teams discover, understand, and leverage their video content.
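For a sense of what the custom action kicks off behind the scenes, here is a minimal sketch of creating an indexing task from a Frame.io download URL using plain HTTP calls. The base URL, endpoint path, field names, and response key are assumptions drawn from the public TwelveLabs docs; verify them against the API version you deploy.

```python
import os
import requests

TL_API = "https://api.twelvelabs.io/v1.3"                  # assumed base URL
HEADERS = {"x-api-key": os.environ["TWELVELABS_API_KEY"]}  # assumed auth header

def index_frameio_asset(index_id: str, download_url: str) -> str:
    """Create a TwelveLabs indexing task from a Frame.io download URL (hypothetical helper)."""
    resp = requests.post(
        f"{TL_API}/tasks",
        headers=HEADERS,
        # (None, value) tuples force multipart/form-data; field names are assumptions.
        files={"index_id": (None, index_id), "video_url": (None, download_url)},
    )
    resp.raise_for_status()
    return resp.json()["_id"]  # task id used later to poll indexing status
```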


Demo Video

Watch how indexing works in Frame.io: https://www.loom.com/share/5140914df15f4b9b83b30407daacf0c1


2.2 - Metadata Generation
What It Does & When to Use It

Metadata generation transforms unstructured video content into organized, searchable information. Users can manually trigger this capability via Custom Actions, or configure workflows to automatically generate metadata once assets complete indexing. This capability uses TwelveLabs' Pegasus video language model to analyze indexed content and generate structured metadata that's written directly back to Frame.io asset fields.

Pegasus is a state-of-the-art multimodal AI model designed for advanced video-language understanding and interaction. Unlike traditional tagging approaches that rely on manual input or simple object detection, Pegasus comprehends objects, people, actions, events, and their relationships within video context to generate rich, semantically meaningful metadata. This capability is particularly valuable for organizations managing thousands of assets where manual metadata entry is impractical.


Examples of Fields Populated

The screenshot demonstrates the range of metadata fields that can be automatically populated:

  • Description by TwelveLabs: Rich, detailed scene descriptions capturing visual elements, setting, and atmosphere

  • Emotions by TwelveLabs: Mood and emotional tone detection (e.g., "calm, peaceful, serene")

  • Genre by TwelveLabs: Content categorization (e.g., "Nature documentary")

  • Summary by TwelveLabs: Concise overviews capturing key topics and content highlights

  • Tags by TwelveLabs: Extracted keywords covering entities, locations, objects, and themes (e.g., "aerial view, island, ocean, cliffs, natural pool, waves, rocky formations, scenic landscape")

  • Theme by TwelveLabs: High-level content themes (e.g., "Natural landscape")

  • Status: Custom workflow fields (e.g., "Approved" with visual indicators)

  • Rating: Quality or compliance ratings using visual star systems

Frame.io V4's flexible metadata system allows you to define fields that match your exact workflow needs, whether following broadcast standards, marketing guidelines, or custom organizational taxonomies.


Key Features
  • Custom metadata fields: Configure which Frame.io metadata fields to populate based on your organization's requirements. Frame.io V4's account-level custom metadata feature allows you to define fields once and apply them across every project and workspace, ensuring consistency in how assets are tagged, searched, and managed at enterprise scale.

  • Intelligent summarization: Generate concise summaries capturing key topics, themes, and content highlights. Pegasus understands narrative structure and can produce descriptions that reflect the semantic content of video, not just surface-level object detection.

  • Automatic tagging: Extract relevant keywords, entities, people, locations, and objects detected in the video. These tags are generated with contextual awareness, understanding how elements relate to each other within the video's narrative.

  • Scene-aware descriptions: Create descriptions that understand narrative structure and content context, going beyond simple object detection. Pegasus generates natural language descriptions that capture what's happening, who's doing what, when, and where—creating a semantic narrative of the video.

  • Configurable prompts: Tailor the metadata generation to your specific needs, from broadcast standards to marketing guidelines. The Pegasus model supports structured output formats, allowing you to define JSON schemas that match your organization's metadata taxonomy and ensure predictable, parseable results.

The generated metadata enhances searchability within Frame.io's native search, improves team collaboration by providing context at a glance, and ensures consistent tagging across large content libraries. Teams can quickly assess content without watching entire videos, accelerate shot selection and content discovery, and maintain standardized metadata practices across distributed creative operations.
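To make "configurable prompts" concrete, the snippet below shows one shape such a configuration entry could take, mapping a Pegasus prompt to the Frame.io fields listed above. The structure and wording are illustrative, not a required format.

```python
# Illustrative prompt configuration; field names mirror the Frame.io fields shown above,
# but the schema and wording are assumptions you would adapt to your own taxonomy.
METADATA_PROMPT_CONFIG = {
    "prompt": (
        "Analyze this video and return JSON with the keys: "
        "description, summary, tags (list of keywords), genre, emotions, theme."
    ),
    "field_mapping": {
        "description": "Description by TwelveLabs",
        "summary": "Summary by TwelveLabs",
        "tags": "Tags by TwelveLabs",
        "genre": "Genre by TwelveLabs",
        "emotions": "Emotions by TwelveLabs",
        "theme": "Theme by TwelveLabs",
    },
}
```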


Demo Video

Watch how automated metadata generation works in Frame.io: https://www.loom.com/share/58e6dee73045429288b62b310c949959


2.3 - Semantic Search
Shot/Segment Discovery Workflow

Semantic search enables teams to find specific moments within videos using natural language queries, going far beyond filename or tag-based search. Instead of remembering exact keywords or manually tagging every moment, users describe what they're looking for in conversational language, and the system returns relevant video segments with precise timestamps. This capability leverages TwelveLabs' Marengo embedding model to understand the semantic meaning behind queries and match them to corresponding video content.

Traditional keyword search requires exact matches and fails to understand context, forcing users to guess at specific terms that might appear in metadata or transcripts. Semantic search understands the meaning behind queries, enabling content discovery based on visual cues, actions, context, and concepts—even if those exact words were never explicitly tagged. This dramatically reduces the time teams spend manually scrubbing through footage to find specific shots or moments.


Key Features
  • Natural language queries: Search using conversational phrases like "people shaking hands in an office" or "wide aerial drone shot of rocky coastline meeting the ocean". Users can describe scenes based on visual elements, actions, emotions, settings, or any combination of multimodal attributes without needing to know technical terminology.

  • Multimodal understanding: Searches across visual content, spoken dialogue, on-screen text, and audio elements simultaneously. Marengo creates embeddings that place video, audio, and text in a shared vector space, allowing the system to understand connections across modalities. A query like "when the person in the red shirt enters the restaurant" can successfully retrieve the exact moment based on visual cues (red shirt), actions (enters), and context (restaurant setting)—even if those specific words never appeared in any metadata.

  • Timestamped results: Returns specific moments within videos with frame-accurate timestamps, not just entire files. This moment-level precision means users jump directly to the relevant scene rather than watching entire videos or manually seeking through timelines. Results include timestamp references (e.g., "0.00s—6.50s," "6.50s—12.75s") that correspond to exact segments where query matches occur.

  • Cross-project search: Query across multiple assets or entire folder structures to find content regardless of where it's stored. The integration displays results through Frame.io's folder structure, organizing matches for immediate review. Users can search an entire library spanning multiple projects and receive consolidated results showing every relevant moment across all indexed videos.

Search results are presented through Frame.io's folder structure, organized by asset and timestamp for immediate review. Users can click through to the exact moment in the video where their query match occurs, with comments and annotations automatically generated showing the matching segments. This workflow transforms content discovery from a time-consuming manual process into an instant, AI-powered operation.

Note: The subfolder organization and naming conventions are fully configurable to match your team's preferences, allowing you to structure search results in ways that align with your existing project hierarchies and workflows.
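As a rough sketch of the call behind this capability, the helper below submits a natural-language query and returns matching segments with timestamps and scores. The endpoint path, field names, and response shape are assumptions to verify against your TwelveLabs API version.

```python
import os
import requests

TL_API = "https://api.twelvelabs.io/v1.3"                  # assumed base URL
HEADERS = {"x-api-key": os.environ["TWELVELABS_API_KEY"]}  # assumed auth header

def search_text(index_id: str, query: str) -> list[dict]:
    """Return matching segments (video_id, start, end, score) for a natural-language query."""
    resp = requests.post(
        f"{TL_API}/search",
        headers=HEADERS,
        # (None, value) tuples force multipart/form-data; repeated keys send array fields.
        files=[
            ("index_id", (None, index_id)),
            ("query_text", (None, query)),        # e.g. "wide aerial drone shot of rocky coastline"
            ("search_options", (None, "visual")),
            ("search_options", (None, "audio")),
        ],
    )
    resp.raise_for_status()
    return [
        {"video_id": h["video_id"], "start": h["start"], "end": h["end"], "score": h["score"]}
        for h in resp.json().get("data", [])
    ]
```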


Demo Video

Watch how semantic search works in Frame.io: https://www.loom.com/share/a92b4b9787094977b20cd566c9ed0894


2.4 - Search Related Content
Creative Reuse and Shot Matching

The search related content capability helps teams discover similar or thematically connected assets across their video library. This is particularly valuable for finding complementary B-roll, locating alternate takes, or building content collections around specific themes. Instead of manually browsing through hundreds of clips, creative teams can use visual references to instantly locate relevant footage.

This capability extends beyond simple text-based search by allowing users to provide an image as their query, then finding video segments that match the visual composition, subject matter, or semantic context. The screenshot demonstrates how this works in practice—after providing a reference image, the system returns ranked video matches with similarity scores and timestamp ranges.


Key Features
  • Image-to-video search: Find videos with similar compositions, settings, objects, or subject matter by using a still image as your query. This makes it easy to find video moments that match a reference frame or photo, whether sourced from existing footage, client-provided mockups, or external inspirational images. For example, if a client provides a reference photo showing a person in an outdoor athletic pose, you can upload that image and instantly retrieve all indexed video segments featuring similar compositions, settings, and activities.

  • Semantic understanding: Leveraging the same multimodal search capabilities as text-based semantic search, the system understands the meaning and context of visual elements. Unlike simple pixel-matching or tag-based approaches, TwelveLabs' Marengo embedding model captures what's actually happening in the scene—objects, actions, settings, and their relationships. For example, searching with an image of a tree will find videos featuring contextually similar trees, focusing on the overall semantic content rather than just detecting the presence of tree-shaped pixels.

  • Ranked results: Related content is presented in order of relevance with similarity scores, organized in a subfolder created at the location where you triggered the action. Each result includes timestamp ranges showing exactly where matching content appears, along with a confidence ranking (e.g., "Rank 9," "Rank 8") that helps prioritize review of the most relevant matches. Results include annotations that describe the match quality, such as "Content similar to: pexels-chevanon-317157 | 5.25s—10.50s Rank 9".

This capability excels in scenarios like finding alternative footage when a client requests changes, building thematic reels or compilation videos, discovering archived content relevant to current projects, and identifying duplicate or near-duplicate content across libraries. Creative teams can quickly build shot lists by providing visual references, locate matching coverage for editorial continuity, or discover forgotten assets that match current production needs.

The related content feature leverages TwelveLabs' Marengo embedding models to understand video content at a deep semantic level, going beyond simple tag matching to find truly relevant connections between assets. Marengo creates embeddings in a unified vector space where images and video segments are directly comparable, enabling accurate visual similarity matching across modalities. This technology captures temporal coherence and multimodal relationships that traditional computer vision approaches miss, ensuring that recommended content is contextually appropriate, not just visually similar at the pixel level.
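A minimal image-query variant of the same search call might look like the following; the query_media_type and query_media_file parameter names are assumptions based on the public docs.

```python
import os
import requests

TL_API = "https://api.twelvelabs.io/v1.3"                  # assumed base URL
HEADERS = {"x-api-key": os.environ["TWELVELABS_API_KEY"]}  # assumed auth header

def search_by_image(index_id: str, image_path: str) -> list[dict]:
    """Find video segments semantically similar to a reference image (hypothetical helper)."""
    with open(image_path, "rb") as image:
        resp = requests.post(
            f"{TL_API}/search",
            headers=HEADERS,
            files=[
                ("index_id", (None, index_id)),
                ("query_media_type", (None, "image")),   # parameter names assumed
                ("query_media_file", image),
                ("search_options", (None, "visual")),
            ],
        )
    resp.raise_for_status()
    return resp.json().get("data", [])  # each hit: video_id, start, end, score (assumed fields)
```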


Demo Video

Watch how image-to-video search works in Frame.io: https://www.loom.com/share/86b9ae43199a4b7baa221732df5e5a61


2.5 - Compliance Actions
Approve/Flag Workflow

Compliance checking automates the review process for content that must meet specific regulatory, brand, or quality standards. This capability analyzes video content against defined compliance criteria and flags potential issues directly on the Frame.io timeline as comments with precise timestamps. Rather than manually reviewing every frame for compliance violations, teams can leverage AI-powered analysis that identifies issues instantly and documents them in the collaborative environment where creative teams already work.

The screenshot below demonstrates this workflow in action—compliance checks are automatically posted as Frame.io comments at the exact timestamps where potential violations occur. Each comment includes the timestamp (e.g., "00:00", "00:02", "00:19", "00:23", "00:24"), the compliance status ("Compliance REJECTED"), and a detailed description of the flagged issue (e.g., "Violence: Soldier firing rocket launcher," "Violence: Explosion in field," "Violence: Soldiers adjusting and aiming rifles").


Key Features
  • Custom compliance rules: Define specific criteria based on your organization's standards, including brand guidelines, regulatory requirements, and content policies. Whether you need to flag violence for broadcast standards, detect missing disclaimers for regulatory compliance, identify incorrect logos for brand consistency, or catch off-brand messaging before publication, the system can be configured to your exact specifications.

  • Automated detection: Identify potential compliance issues through AI-powered video analysis that understands visual content, spoken dialogue, on-screen text, and contextual meaning. TwelveLabs' Pegasus model can analyze video content against complex compliance criteria, detecting violations that would require hours of manual review. The system processes prohibited content, missing required elements, brand guideline violations, and regulatory non-compliance automatically.

  • Timeline annotations: Issues are marked as Frame.io comments at the exact timestamp where they occur, integrated seamlessly with existing review workflows. This approach ensures compliance findings appear alongside creative feedback, making it natural for teams to address both simultaneously during review cycles. Comments are timestamped and linked directly to the problematic frames, allowing immediate review and correction.

  • Detailed explanations: Each flagged issue includes context about why it was identified and what compliance rule it violates. The screenshot shows examples like "Compliance REJECTED at 0:00: Violence: Soldier firing rocket launcher" and "Compliance REJECTED at 0:19: Violence: Soldiers adjusting and aiming rifles," providing clear justification for each flag. This transparency helps teams understand the specific violation and make informed decisions about how to address it.

  • Review workflow integration: Compliance checks fit naturally into existing Frame.io review cycles, allowing teams to address issues alongside creative feedback. Creative directors, compliance officers, and legal reviewers can collaborate in a single interface, commenting on violations, requesting changes, and approving resolutions without switching between multiple systems.

  • Configurable severity levels: Classify issues by priority—critical, warning, or informational—to help teams triage effectively. High-severity violations can be escalated immediately, while lower-priority issues can be batched for periodic review. This prioritization ensures that teams focus on the most important compliance risks first, streamlining the approval process.

Compliance actions are particularly valuable for broadcast teams ensuring content meets FCC or regional broadcasting standards; brand managers verifying assets align with brand guidelines before publication; advertising teams checking that ads meet platform-specific requirements across YouTube, broadcast TV, and social media; and legal departments identifying content that may require clearance or documentation. By automating the detection process, organizations can review content faster, reduce compliance risk, and maintain consistent standards across all published materials.

The system processes the entire video and creates a comprehensive compliance report embedded directly in Frame.io's comment system, where teams already collaborate and resolve issues. This integration eliminates the need for separate compliance tracking tools or spreadsheets, ensuring all feedback—creative and compliance-related—lives in one centralized location.

Note: This is a demonstration workflow. The specific compliance criteria, comment formatting, and detection thresholds are fully customizable to your organization's needs.
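For illustration, a compliance prompt that requests structured, timestamped output might look like the sketch below. The categories, wording, and JSON shape are placeholders to adapt to your own broadcast, brand, or regulatory standards.

```python
# Illustrative compliance prompt; the categories and output shape are placeholders.
COMPLIANCE_PROMPT = """
Review this video against these categories: violence, explicit language, sexual content,
substance use, disturbing imagery, discrimination.

Return JSON only, in this shape:
{
  "status": "APPROVED" | "REJECTED" | "NEEDS_REVIEW",
  "violations": [
    {"timestamp": "M:SS", "category": "...", "description": "..."}
  ]
}
If no violations are found, return status APPROVED and an empty violations list.
"""
```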


Demo Video

Watch how compliance checking works in Frame.io: https://www.loom.com/share/0e7308a9beac438db49c7855783825e3


3 - Workflow Implementation

The general workflow diagram below shows how a generic architecture applies across all TwelveLabs capabilities. Frame.io UI actions trigger webhook events that flow to the workflow orchestrator, which coordinates API requests to both the TwelveLabs Platform and Frame.io API. Analysis results are returned through the same orchestrator and written back to Frame.io via REST API calls, creating a seamless bidirectional integration.


3.1 - Indexing Workflow

When a user triggers the indexing custom action from the Frame.io interface, Frame.io sends a webhook event to the workflow orchestrator containing asset details and metadata. This same webhook mechanism is used whether indexing is triggered manually via Custom Actions or automatically when assets are moved or copied to designated folders. The orchestrator distinguishes between single asset and folder-level triggers, retrieving the appropriate file list when processing folders.

The indexing workflow diagram above illustrates the complete end-to-end process:

  1. Step 1: Frame.io Indexing Triggered — The workflow begins when a user right-clicks on an asset or folder in Frame.io and selects the "Index Asset(s)" custom action. Frame.io immediately sends a webhook event (manual or automated) to the workflow orchestrator with details about which assets need to be indexed.

  2. Step 2: Webhook → Workflow Orchestrator — The orchestrator receives the webhook payload and parses the asset information. For folder-level indexing, it queries the Frame.io API to retrieve the complete list of video files contained within the folder hierarchy.

  3. Step 3: Get Download URL from Frame.io API — For each video to be indexed, the orchestrator calls the Frame.io API to retrieve a secure, time-limited download URL. This URL provides temporary access to the video file with proper authentication, allowing the orchestrator to download the asset without requiring permanent storage credentials.

  4. Step 4: Upload Video to TwelveLabs API — Using the download URL, the orchestrator retrieves the video file and uploads it directly to the TwelveLabs platform via the Index Video API endpoint. The Marengo model begins processing the content immediately, analyzing visual, audio, and contextual information to create a multimodal understanding of the video.

  5. Step 5: Poll Status — Since video indexing is computationally intensive and may take several minutes depending on video length and complexity, the orchestrator polls the TwelveLabs API to monitor indexing status. This polling continues at regular intervals until the indexing task reaches a terminal state (ready, failed, or error).

  6. Step 6: video_id & status → Frame.io API Update Metadata — Once indexing completes successfully, the orchestrator writes the TwelveLabs video_id and indexing status back to Frame.io custom metadata fields. These metadata fields become visible in the Frame.io UI, confirming that the asset is now indexed and ready for advanced video intelligence operations. If indexing fails, the status field reflects the error state, alerting users to investigate.

This background process requires no user intervention beyond the initial trigger, making it ideal for automated archival pipelines. Once the workflow is configured, organizations can automatically index all incoming content by routing assets through designated Frame.io folders, ensuring that every video becomes immediately searchable and analyzable without manual action.
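Step 5 reduces to a simple polling loop; a sketch is shown below. The base URL, auth header, status values ("ready", "failed", "error"), and response fields are assumptions to verify against your TwelveLabs API version.

```python
import os
import time
import requests

TL_API = "https://api.twelvelabs.io/v1.3"                  # assumed base URL
HEADERS = {"x-api-key": os.environ["TWELVELABS_API_KEY"]}  # assumed auth header

def wait_for_indexing(task_id: str, poll_seconds: int = 15, timeout_seconds: int = 3600) -> dict:
    """Poll a TwelveLabs indexing task until it reaches a terminal state (Step 5)."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        resp = requests.get(f"{TL_API}/tasks/{task_id}", headers=HEADERS)
        resp.raise_for_status()
        task = resp.json()
        if task.get("status") in ("ready", "failed", "error"):
            return task  # contains the video_id and final status to write back to Frame.io (Step 6)
        time.sleep(poll_seconds)
    raise TimeoutError(f"Indexing task {task_id} did not finish within {timeout_seconds}s")
```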


3.2 - Metadata Generation Workflow

The metadata generation workflow retrieves TwelveLabs video IDs from Frame.io metadata, then calls the Analyze API using prompts stored in an external configuration system. This approach allows non-technical teams to modify prompts without changing workflow code, enabling content managers, compliance officers, and brand teams to customize metadata fields based on evolving organizational needs.

The metadata generation workflow diagram above illustrates the complete process:

  1. Step 1: Frame.io Metadata Generation Triggered — When a user triggers the metadata generation custom action, Frame.io sends a webhook to the workflow orchestrator. This action can be triggered manually after indexing completes, or configured to run automatically as part of an automated workflow.

  2. Step 2: Webhook → Workflow Orchestrator — The orchestrator receives the webhook event and identifies which assets require metadata generation. It then queries the Frame.io API to retrieve the TwelveLabs video_id that was stored during the indexing workflow.

  3. Step 3: Get video_id from Frame.io API (Read Metadata) — The orchestrator calls the Frame.io API to read custom metadata fields containing the video_id. This identifier is essential for subsequent TwelveLabs API calls, as it references the indexed video content stored in the TwelveLabs platform.

  4. Step 4: Load Prompts from Config System — Rather than hardcoding prompts directly into the workflow logic, the orchestrator retrieves prompt templates from an external configuration system. These prompts define what metadata to generate (e.g., "Generate a concise summary," "List all visible objects and people," "Identify the emotional tone") and specify the output format, often using JSON schemas to ensure structured responses. This separation of configuration from code enables business users to iterate on metadata strategies without developer involvement.

  5. Step 5: Analyze Request with video_id and Prompts → TwelveLabs Analyze API — The orchestrator sends an analysis request to the TwelveLabs Analyze API, providing the video_id and the configured prompts. Pegasus processes the indexed video content and generates metadata based on the prompt instructions. For example, a prompt might request a JSON response containing { "title": "...", "description": "...", "tags": [...], "genre": "...", "mood": "..." }. Pegasus comprehends objects, people, actions, events, and their relationships in video, then assigns appropriate classes and metadata according to the prompt's specifications.

  6. Step 6: Generated Text → Workflow Orchestrator — The TwelveLabs API returns the generated metadata as structured text (often JSON). The orchestrator receives this response and prepares it for integration with Frame.io.

  7. Step 7: Parse & Format, Batch Update → Frame.io API (Write Metadata) — The workflow parses the Pegasus-generated responses and formats them to match Frame.io field requirements, including character limits, data types, and field naming conventions. Updates are batched where possible using Frame.io's batch metadata endpoint, reducing API overhead and improving performance when processing multiple assets. Requests are queued and throttled to respect API rate limits, implementing strategies such as progressive rate limiting, request distribution, and adaptive inter-batch cooldowns. Frame.io V4's API uses a "leaky bucket" algorithm for rate limiting, where limits refresh gradually during their allotted time window, requiring the orchestrator to carefully manage request pacing.

Progress is tracked through Frame.io status fields, with the workflow continuing to process remaining assets even if individual updates fail. Error handling ensures that transient API failures (such as rate limit exceeded responses) trigger retries with exponential backoff, while permanent errors are logged for manual review. This resilient design allows large-scale batch processing to complete successfully despite occasional API hiccups.
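The retry behavior described above can be sketched as a small wrapper around any single HTTP call. This is a generic illustration of exponential backoff with jitter, not Frame.io-specific client code.

```python
import random
import time
import requests

RETRYABLE = {429, 500, 502, 503, 504}

def with_backoff(request_fn, max_attempts: int = 5, base_delay: float = 1.0) -> requests.Response:
    """Retry a callable that performs one HTTP request, backing off on rate limits and 5xx errors."""
    for attempt in range(1, max_attempts + 1):
        resp = request_fn()
        if resp.status_code not in RETRYABLE:
            resp.raise_for_status()  # permanent errors (other 4xx) surface immediately for logging
            return resp
        if attempt == max_attempts:
            resp.raise_for_status()  # give up after the final attempt
        # Exponential backoff with jitter: ~1s, 2s, 4s, ... plus a random fraction.
        time.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.5))

# Usage (illustrative; names are placeholders):
# with_backoff(lambda: requests.patch(frameio_metadata_url, headers=auth, json=payload))
```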

The entire process transforms raw video content into richly tagged, searchable assets without requiring manual metadata entry, enabling teams to manage thousands of videos with consistent, AI-generated metadata that aligns with organizational standards.
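Putting Steps 5 and 6 together, a minimal Analyze call and parse might look like this. The /analyze endpoint name, request body, and the "data" response field are assumptions drawn from the public docs; the Frame.io write-back in Step 7 is omitted.

```python
import json
import os
import requests

TL_API = "https://api.twelvelabs.io/v1.3"                  # assumed base URL
HEADERS = {"x-api-key": os.environ["TWELVELABS_API_KEY"]}  # assumed auth header

def generate_metadata(video_id: str, prompt: str) -> dict:
    """Ask Pegasus for structured metadata for an indexed video (Steps 5-6, sketched)."""
    resp = requests.post(
        f"{TL_API}/analyze",
        headers=HEADERS,
        json={"video_id": video_id, "prompt": prompt},  # request shape assumed
    )
    resp.raise_for_status()
    raw = resp.json().get("data", "")
    try:
        return json.loads(raw)      # the prompt asks for JSON, so parse it into field values
    except json.JSONDecodeError:
        return {"summary": raw}     # fall back to plain text if the model returned prose
```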


3.3 - Semantic Search Workflow

Users initiate semantic search by entering a natural language query through Frame.io's modal interface. The workflow queries the TwelveLabs index, searching across all indexed videos within the scope to find segments matching the query. Results are returned with precise timestamps indicating where relevant content appears in each video.

The semantic search workflow diagram above illustrates the complete process:

  1. Step 1: Frame.io Semantic Search Triggered — When a user triggers the semantic search custom action, Frame.io presents a modal interface where users enter their natural language query. For example, queries like "wide aerial drone shot of rocky coastline meeting the ocean" or "people shaking hands in an office" are processed as conversational descriptions rather than exact keyword matches.

  2. Step 2: Webhook + User Enters Query → Workflow Orchestrator — Frame.io sends a webhook to the workflow orchestrator containing both the trigger event details and the user's search query. The orchestrator prepares to execute the search across the TwelveLabs index.

  3. Step 3: Search Request → TwelveLabs Search API — The orchestrator submits the natural language query to the TwelveLabs Search API. TwelveLabs' Marengo embedding model converts the text query into a multimodal vector representation, then searches across all indexed video embeddings to find semantically similar segments. The search operates at the video segment level, identifying specific timestamp ranges where the query's semantic meaning matches the video content.

  4. Step 4: Matching Segments + Timestamps → Workflow Orchestrator — The TwelveLabs Search API returns a ranked list of matching video segments, each with precise start and end timestamps (e.g., "0.00s—6.50s," "6.50s—12.75s"). Each result includes the video_id, segment boundaries, and a relevance score indicating how closely the segment matches the query.

  5. Step 5: Get Asset FPS → Frame.io API — Before creating timeline comments, the orchestrator retrieves each matching video's frames-per-second (FPS) metadata from Frame.io. This metadata is essential for converting TwelveLabs' second-based timestamps into Frame.io's frame-accurate timeline positions. Different videos may have different frame rates (24fps, 30fps, 60fps, etc.), so the orchestrator must query this information to ensure comments appear at the exact correct frames.

  6. Step 6: FPS Metadata → Workflow Orchestrator — Frame.io returns the FPS information for each asset. The orchestrator uses this data to calculate frame-accurate positions for timeline comments.

  7. Step 7: Create Subfolder, Copy Assets, Post Comments → Frame.io API — The orchestrator performs several Frame.io API operations to organize and present search results:

    1. Create subfolder: A dedicated subfolder is created at the trigger location to organize search results. This folder is typically named with a timestamp and search query description (e.g., "TL_Search_wide_aerial_drone_shot_2025-11-07").

    2. Copy assets: For each video containing matches, the workflow copies the asset into this subfolder. This consolidates all relevant footage in one location, making it easy for users to review search results without navigating through the original project hierarchy.

    3. Post comments: Timeline comments are created at the exact timestamps where relevant moments occur. Each comment includes the timestamp range, relevance ranking, and the original search query, allowing users to understand why each segment was matched (e.g., "Content similar to: wide aerial drone shot of rocky coastline meeting the ocean | 5.25s—10.50s Rank 9"). Users can click these comments to jump directly to the specific frames that match their query.

  8. Step 8: Return Form ← Frame.io — Once the subfolder is created, assets are copied, and comments are posted, the orchestrator sends a completion notification back to Frame.io. The user receives confirmation that their search is complete, along with a link to the newly created results subfolder.

This workflow allows users to review all search results in one place and click directly to the specific moments that match their query, dramatically reducing the time spent manually searching through footage. Instead of scrubbing through hours of video or relying on incomplete metadata, creative teams can instantly locate relevant shots using natural language descriptions. The subfolder organization ensures that search results remain accessible for future reference, while timestamped comments provide immediate navigation to the exact frames of interest.
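The frame-rate handling in Steps 5 through 7 comes down to a small conversion: multiply the segment timestamp in seconds by the asset's FPS and round to the nearest frame. A sketch is below; the comment format mirrors the examples in this post, and how the frame value is passed to the comment endpoint depends on your Frame.io API version.

```python
def seconds_to_frame(timestamp_s: float, fps: float) -> int:
    """Convert a TwelveLabs segment timestamp (seconds) into a frame number for the asset."""
    return round(timestamp_s * fps)

def search_comment_text(query: str, start_s: float, end_s: float, rank: int) -> str:
    """Build the timeline annotation text; the format mirrors the examples in this post."""
    return f"Content similar to: {query} | {start_s:.2f}s-{end_s:.2f}s Rank {rank}"

# Example: a match starting at 5.25s in a 23.976 fps asset lands on frame 126.
assert seconds_to_frame(5.25, 23.976) == 126
```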


3.4 - Search Related Content Workflow

When triggered from a still image, the workflow performs image-to-video search using TwelveLabs' semantic understanding capabilities. The image is sent to the Search API, which returns videos containing contextually similar visual elements based on the meaning and context of the image, not just pixel-level similarity.

The search related content workflow diagram above illustrates the complete process:

  1. Step 1: Frame.io Related Content Triggered — When a user triggers the related content custom action from a still image in Frame.io, a webhook is sent to the workflow orchestrator. This action can be initiated from any image asset within Frame.io, whether it's a reference photo, a production still, or a frame extracted from existing footage.

  2. Step 2: Webhook → Workflow Orchestrator — The orchestrator receives the webhook event containing the source image's asset ID and metadata. This triggers the image-to-video search workflow.

  3. Step 3: Get Image URL → Frame.io API — The orchestrator calls the Frame.io API to retrieve a download URL for the source image. This URL provides temporary access to the image file with proper authentication.

  4. Step 4: Image Download URL → Workflow Orchestrator — Frame.io returns the image download URL, which the orchestrator uses to retrieve the image for analysis. The image is then prepared for submission to TwelveLabs.

  5. Step 5: Image-to-Video Search → TwelveLabs Search API — The orchestrator sends the image to the TwelveLabs Search API. TwelveLabs' Marengo embedding model generates a vector representation of the image, capturing its visual features, composition, objects, setting, and semantic context. This embedding is then compared against all indexed video embeddings to find segments with high visual and semantic similarity. Unlike simple pixel matching or color histogram comparison, this approach understands what's actually in the image—for example, recognizing "a person in an outdoor athletic pose" rather than just matching pixel patterns.

  6. Step 6: Similar Videos + Scores → Workflow Orchestrator — The TwelveLabs Search API returns a ranked list of matching video segments along with similarity scores. Each result includes the video_id, timestamp range, and a confidence score indicating how closely the video segment matches the source image. Results are filtered by similarity threshold to ensure only high-quality matches are included. Organizations can configure this threshold based on their use case—tighter thresholds for precise shot matching, looser thresholds for broader thematic discovery.

  7. Step 7: Filter & Rank, Create Subfolder, Copy Assets, Add Comments → Frame.io API — The orchestrator performs several Frame.io API operations to organize and present related content results:

    1. Filter & Rank: Results are filtered by the configured similarity threshold and ranked by relevance score. Only videos exceeding the minimum similarity threshold are included in the results, ensuring users receive high-quality matches.

    2. Create subfolder: A dedicated subfolder is created at the trigger location to organize related content results. This folder is typically named with descriptive information that includes the source image name and timestamp (e.g., "TL_ImageSearch_pexels-chevanon-317157_2025-11-07").

    3. Copy assets: Each related video asset is copied into the results folder. This consolidates all visually similar footage in one location, making it easy for users to review matches without navigating through the original project hierarchy.

    4. Add comments: Timeline comments are created explaining the similarity match and relevance scores. Each comment includes the timestamp range where the visual similarity occurs, the similarity rank (e.g., "Rank 9," "Rank 8"), and a reference to the source image (e.g., "Content similar to: pexels-chevanon-317157 | 5.25s—10.50s Rank 9"). These annotations help users understand why each video was matched and prioritize which results to review first.

Results are filtered by similarity threshold and organized into a subfolder with descriptive naming that includes the source image name and timestamp. Each related asset is copied to the results folder with comments explaining the similarity match and relevance scores. This workflow enables creative teams to instantly discover visually similar footage, find alternative takes, build thematic compilations, or locate archived content that matches current production needs—all without manually browsing through thousands of assets.
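Two of the orchestrator steps above are easy to show in isolation: filtering matches by the configured similarity threshold and deriving the results-folder name. The score scale, the default threshold, and the naming pattern below are placeholders to adjust for your library.

```python
from datetime import date

def filter_and_rank(matches: list[dict], min_score: float = 80.0) -> list[dict]:
    """Keep only matches at or above the similarity threshold, highest score first (Step 7)."""
    kept = [m for m in matches if m.get("score", 0.0) >= min_score]
    return sorted(kept, key=lambda m: m["score"], reverse=True)

def results_folder_name(source_image_name: str) -> str:
    """Mirror the naming convention used in the demo, e.g. TL_ImageSearch_<image>_<date>."""
    return f"TL_ImageSearch_{source_image_name}_{date.today().isoformat()}"
```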


3.5 - Compliance Actions Workflow

The compliance workflow sends the video to Pegasus with a detailed prompt specifying evaluation categories (violence, language, sexual content, substance use, disturbing content, discrimination) and requesting timestamped violations. This structured approach enables automated detection of compliance issues that would otherwise require hours of manual review.

The compliance actions workflow diagram above illustrates the complete process:

  1. Step 1: Frame.io Compliance Check Triggered — When a user triggers the compliance check custom action, Frame.io sends a webhook to the workflow orchestrator. This action can be triggered manually for on-demand compliance review or configured to run automatically as part of pre-publication workflows.

  2. Step 2: Webhook → Workflow Orchestrator — The orchestrator receives the webhook event containing the asset details and prepares to execute the compliance analysis. It identifies which video requires compliance checking and initiates the workflow.

  3. Step 3: Get video_id → Frame.io API (Read Metadata) — The orchestrator queries the Frame.io API to retrieve the TwelveLabs video_id that was stored during the indexing workflow. This identifier is required to reference the indexed video content in the TwelveLabs platform.

  4. Step 4: video_id → Workflow Orchestrator — Frame.io returns the video_id, which the orchestrator uses to construct the compliance analysis request. The orchestrator also loads the compliance prompt from the configuration system, defining specific evaluation criteria and output format.

  5. Step 5: Analyze with Compliance Prompt → TwelveLabs Analyze API — The orchestrator sends an analysis request to the TwelveLabs Analyze API, providing the video_id and a detailed compliance prompt. The prompt specifies evaluation categories such as violence, explicit language, sexual content, substance use, disturbing imagery, and discrimination. It also requests structured output including an overall compliance status (APPROVED/REJECTED/NEEDS_REVIEW) and timestamped violations for each detected issue. Pegasus processes the video content, analyzing visual elements, spoken dialogue, on-screen text, and contextual meaning to identify compliance violations. For example, it can detect violent actions ("soldier firing rocket launcher"), explicit language in audio, or discriminatory imagery, providing timestamps for each violation.

  6. Step 6: Status + Violations + Timestamps → Workflow Orchestrator — The TwelveLabs Analyze API returns the compliance analysis results. The response includes the overall compliance status, a list of individual violations, and precise timestamps indicating where each violation occurs in the video. For example: { "status": "REJECTED", "violations": [{ "timestamp": "0:00", "category": "violence", "description": "Soldier firing rocket launcher" }, { "timestamp": "0:02", "category": "violence", "description": "Explosion in field" }] }.

  7. Step 7: Get Asset FPS → Frame.io API — Before creating timeline comments, the orchestrator retrieves the video's frames-per-second (FPS) metadata from Frame.io. This information is essential for converting second-based timestamps from TwelveLabs into frame-accurate positions in Frame.io's timeline.

  8. Step 8: FPS Metadata → Workflow Orchestrator — Frame.io returns the FPS metadata, which the orchestrator uses to calculate exact frame positions for each violation comment.

  9. Step 9: Update Status Field, Create Timeline Comments → Frame.io API — The orchestrator performs two Frame.io API operations to document the compliance results:

    1. Update Status Field: The overall compliance status (APPROVED/REJECTED/NEEDS_REVIEW) is written to a Frame.io dropdown metadata field. This provides immediate visual feedback on the asset's compliance status, allowing teams to filter and prioritize review workflows.

    2. Create Timeline Comments: For each detected violation, the orchestrator creates a Frame.io timeline comment at the exact frame position where the issue occurs. Each comment includes the timestamp, violation category, and a detailed description (e.g., "Compliance REJECTED at 0:00: Violence: Soldier firing rocket launcher"). This allows reviewers to address each compliance issue individually within Frame.io's existing review interface, alongside creative feedback from the rest of the team.

The workflow then processes the response by extracting the overall compliance status (APPROVED/REJECTED/NEEDS_REVIEW), updating a Frame.io dropdown field with the appropriate status, parsing individual violations with their timestamps, and creating Frame.io timeline comments at exact frame positions for each violation. This approach allows reviewers to address each compliance issue individually within Frame.io's existing review interface, alongside creative feedback from the rest of the team.
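As a sketch of Steps 6 through 9, the helper below turns the structured response shown in Step 6 into an overall status plus per-violation comment payloads at frame positions. Actually updating the dropdown field and posting the comments is left to your Frame.io client; the comment text format mirrors the examples in this post.

```python
def parse_compliance(result: dict, fps: float) -> tuple[str, list[dict]]:
    """Turn the Analyze response (shape shown in Step 6) into a status and comment payloads."""
    status = result.get("status", "NEEDS_REVIEW")
    comments = []
    for violation in result.get("violations", []):
        minutes, seconds = violation["timestamp"].split(":")
        start_s = int(minutes) * 60 + int(seconds)
        comments.append({
            "frame": round(start_s * fps),  # frame-accurate position for the timeline comment
            "text": (
                f"Compliance {status} at {violation['timestamp']}: "
                f"{violation['category'].capitalize()}: {violation['description']}"
            ),
        })
    return status, comments

# With the Step 6 example and a 24 fps asset, this yields ("REJECTED", [...]) where the first
# comment reads "Compliance REJECTED at 0:00: Violence: Soldier firing rocket launcher".
```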

By automating the detection process and integrating findings directly into Frame.io's collaborative review workflow, organizations can review content faster, reduce compliance risk, maintain consistent standards across all published materials, and ensure that compliance checks don't create workflow bottlenecks. Compliance officers can focus their attention on addressing flagged violations rather than manually screening hours of footage, while creative teams receive compliance feedback in the same interface where they handle all other review comments.


4 - Bringing It All Together: Transform Your Video Workflows Today

The TwelveLabs × Frame.io integration demonstrates how advanced video AI can seamlessly embed into production workflows that creative teams already trust. By combining Frame.io's industry-leading collaborative review platform with TwelveLabs' Pegasus and Marengo video foundation models, Media & Entertainment organizations gain powerful capabilities that were previously impossible or prohibitively expensive.


Real-World Impact Across M&E Workflows

Throughout this post, we've seen how this integration delivers measurable value across every stage of the content lifecycle:

  • Production & Post-Production Teams can instantly locate specific shots using natural language search ("wide aerial drone shot of rocky coastline"), eliminating hours of manual footage review. Semantic search with timestamped results means editors jump directly to the exact frames they need, accelerating assembly and reducing time-to-delivery.

  • Creative Libraries & Asset Management benefit from automated metadata generation that populates Frame.io fields with rich descriptions, tags, themes, and summaries—all generated by AI that understands narrative context, not just object detection. Image-to-video search enables teams to find visually similar footage for B-roll, alternate takes, or thematic compilations, unlocking the value of archived content that would otherwise remain undiscovered.

  • Compliance & Brand Teams can automate regulatory review processes, with AI-powered analysis flagging potential violations at exact timestamps and writing them directly into Frame.io's comment system. This transforms compliance from a bottleneck into a parallel workflow, where reviewers address issues alongside creative feedback without switching tools.

  • Broadcast Networks & Advertising Agencies managing thousands of assets across multiple campaigns gain consistent, scalable metadata practices and compliance checks that ensure brand guidelines and regulatory standards are met before publication. The integration fits naturally into existing Frame.io review cycles, preserving established workflows while adding AI intelligence.


Key Technical Innovations

The integration leverages Frame.io V4's Custom Actions and flexible metadata system to create a seamless user experience. Webhook-driven workflows orchestrate complex multi-step processes—indexing, metadata generation, semantic search, and compliance checking—without requiring users to leave the Frame.io interface. Configurable prompts, batch processing, rate limiting, and error handling ensure the system operates reliably at enterprise scale.

TwelveLabs' multimodal video understanding goes beyond traditional computer vision by analyzing visual content, spoken dialogue, on-screen text, and contextual meaning simultaneously. This enables semantic search that understands what users are asking for, not just keyword matches, and generates metadata that captures the narrative essence of video content.


Getting Started

Ready to bring multimodal video intelligence into your Frame.io workflows? The TwelveLabs × Frame.io integration is designed for organizations that need intelligent content management at scale:

Prerequisites:

  • Frame.io V4 account with enterprise features

  • Access to Frame.io Custom Actions and custom metadata fields

  • TwelveLabs API access for indexing and analysis

Enablement Path:

Organizations interested in deploying this integration should contact TwelveLabs to discuss implementation, configuration, and training. Our team will work with you to customize workflows, configure compliance prompts, and integrate the system with your existing production infrastructure. Reach out to brice@twelvelabs.io for hands-on support.


The future of video content management isn't about replacing human creativity—it's about eliminating the tedious manual work that prevents creative teams from focusing on what they do best. With TwelveLabs and Frame.io working together, your organization can search vast libraries instantly, maintain consistent metadata at scale, ensure compliance automatically, and discover connections between assets that would otherwise remain hidden.



Examples of Fields Populated

The screenshot demonstrates the range of metadata fields that can be automatically populated:

  • Description by TwelveLabs: Rich, detailed scene descriptions capturing visual elements, setting, and atmosphere

  • Emotions by TwelveLabs: Mood and emotional tone detection (e.g., "calm, peaceful, serene")

  • Genre by TwelveLabs: Content categorization (e.g., "Nature documentary")

  • Summary by TwelveLabs: Concise overviews capturing key topics and content highlights

  • Tags by TwelveLabs: Extracted keywords covering entities, locations, objects, and themes (e.g., "aerial view, island, ocean, cliffs, natural pool, waves, rocky formations, scenic landscape")

  • Theme by TwelveLabs: High-level content themes (e.g., "Natural landscape")

  • Status: Custom workflow fields (e.g., "Approved" with visual indicators)

  • Rating: Quality or compliance ratings using visual star systems

Frame.io V4's flexible metadata system allows you to define fields that match your exact workflow needs, whether following broadcast standards, marketing guidelines, or custom organizational taxonomies.


Key Features
  • Custom metadata fields: Configure which Frame.io metadata fields to populate based on your organization's requirements. Frame.io V4's account-level custom metadata feature allows you to define fields once and apply them across every project and workspace, ensuring consistency in how assets are tagged, searched, and managed at enterprise scale.

  • Intelligent summarization: Generate concise summaries capturing key topics, themes, and content highlights. Pegasus understands narrative structure and can produce descriptions that reflect the semantic content of video, not just surface-level object detection.

  • Automatic tagging: Extract relevant keywords, entities, people, locations, and objects detected in the video. These tags are generated with contextual awareness, understanding how elements relate to each other within the video's narrative.

  • Scene-aware descriptions: Create descriptions that understand narrative structure and content context, going beyond simple object detection. Pegasus generates natural language descriptions that capture what's happening, who's doing what, when, and where—creating a semantic narrative of the video.

  • Configurable prompts: Tailor the metadata generation to your specific needs, from broadcast standards to marketing guidelines. The Pegasus model supports structured output formats, allowing you to define JSON schemas that match your organization's metadata taxonomy and ensure predictable, parseable results.

The generated metadata enhances searchability within Frame.io's native search, improves team collaboration by providing context at a glance, and ensures consistent tagging across large content libraries. Teams can quickly assess content without watching entire videos, accelerate shot selection and content discovery, and maintain standardized metadata practices across distributed creative operations.
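
To make the configurable-prompts idea concrete, here is a minimal sketch of what a prompt configuration might look like, expressed as a Python dictionary. The prompt wording, schema shape, and field names are illustrative assumptions; in practice they should match the custom metadata fields defined in your own Frame.io account and the structured output options supported by the Pegasus Analyze API.

```python
# Illustrative prompt configuration for Pegasus-driven metadata generation.
# Field names and the schema shape are examples only; align them with the
# custom metadata fields defined in your own Frame.io account.
METADATA_PROMPT_CONFIG = {
    "prompt": (
        "Analyze this video and return JSON with the following keys: "
        "description (2-3 sentences), summary (1 sentence), "
        "tags (list of keywords), genre, emotions, theme."
    ),
    # A JSON schema keeps Pegasus output predictable and easy to parse
    # before writing it back to Frame.io metadata fields.
    "response_schema": {
        "type": "object",
        "properties": {
            "description": {"type": "string"},
            "summary": {"type": "string"},
            "tags": {"type": "array", "items": {"type": "string"}},
            "genre": {"type": "string"},
            "emotions": {"type": "string"},
            "theme": {"type": "string"},
        },
        "required": ["description", "summary", "tags"],
    },
    # Maps Pegasus output keys to Frame.io custom field names.
    "frameio_field_map": {
        "description": "Description by TwelveLabs",
        "summary": "Summary by TwelveLabs",
        "tags": "Tags by TwelveLabs",
        "genre": "Genre by TwelveLabs",
        "emotions": "Emotions by TwelveLabs",
        "theme": "Theme by TwelveLabs",
    },
}
```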


Demo Video

Watch how automated metadata generation works in Frame.io: https://www.loom.com/share/58e6dee73045429288b62b310c949959


2.3 - Semantic Search
Shot/Segment Discovery Workflow

Semantic search enables teams to find specific moments within videos using natural language queries, going far beyond filename or tag-based search. Instead of remembering exact keywords or manually tagging every moment, users describe what they're looking for in conversational language, and the system returns relevant video segments with precise timestamps. This capability leverages TwelveLabs' Marengo embedding model to understand the semantic meaning behind queries and match them to corresponding video content.

Traditional keyword search requires exact matches and fails to understand context, forcing users to guess at specific terms that might appear in metadata or transcripts. Semantic search understands the meaning behind queries, enabling content discovery based on visual cues, actions, context, and concepts—even if those exact words were never explicitly tagged. This dramatically reduces the time teams spend manually scrubbing through footage to find specific shots or moments.


Key Features
  • Natural language queries: Search using conversational phrases like "people shaking hands in an office" or "wide aerial drone shot of rocky coastline meeting the ocean". Users can describe scenes based on visual elements, actions, emotions, settings, or any combination of multimodal attributes without needing to know technical terminology.

  • Multimodal understanding: Searches across visual content, spoken dialogue, on-screen text, and audio elements simultaneously. Marengo creates embeddings that place video, audio, and text in a shared vector space, allowing the system to understand connections across modalities. A query like "when the person in the red shirt enters the restaurant" can successfully retrieve the exact moment based on visual cues (red shirt), actions (enters), and context (restaurant setting)—even if those specific words never appeared in any metadata.

  • Timestamped results: Returns specific moments within videos with frame-accurate timestamps, not just entire files. This moment-level precision means users jump directly to the relevant scene rather than watching entire videos or manually seeking through timelines. Results include timestamp references (e.g., "0.00s—6.50s," "6.50s—12.75s") that correspond to exact segments where query matches occur.

  • Cross-project search: Query across multiple assets or entire folder structures to find content regardless of where it's stored. The integration displays results through Frame.io's folder structure, organizing matches for immediate review. Users can search an entire library spanning multiple projects and receive consolidated results showing every relevant moment across all indexed videos.

The semantic search capability is displayed through Frame.io's folder structure, with results organized by asset and timestamp for immediate review. Users can click through to the exact moment in the video where their query match occurs, with comments and annotations automatically generated showing the matching segments. This workflow transforms content discovery from a time-consuming manual process into an instant, AI-powered operation.

Note: The subfolder organization and naming conventions are fully configurable to match your team's preferences, allowing you to structure search results in ways that align with your existing project hierarchies and workflows.


Demo Video

Watch how semantic search works in Frame.io: https://www.loom.com/share/a92b4b9787094977b20cd566c9ed0894


2.4 - Search Related Content
Creative Reuse and Shot Matching

The search related content capability helps teams discover similar or thematically connected assets across their video library. This is particularly valuable for finding complementary B-roll, locating alternate takes, or building content collections around specific themes. Instead of manually browsing through hundreds of clips, creative teams can use visual references to instantly locate relevant footage.

This capability extends beyond simple text-based search by allowing users to provide an image as their query, then finding video segments that match the visual composition, subject matter, or semantic context. The screenshot demonstrates how this works in practice—after providing a reference image, the system returns ranked video matches with similarity scores and timestamp ranges.


Key Features
  • Image-to-video search: Find videos with similar compositions, settings, objects, or subject matter by using a still image as your query. This makes it easy to find video moments that match a reference frame or photo, whether sourced from existing footage, client-provided mockups, or external inspirational images. For example, if a client provides a reference photo showing a person in an outdoor athletic pose, you can upload that image and instantly retrieve all indexed video segments featuring similar compositions, settings, and activities.

  • Semantic understanding: Leveraging the same multimodal search capabilities as text-based semantic search, the system understands the meaning and context of visual elements. Unlike simple pixel-matching or tag-based approaches, TwelveLabs' Marengo embedding model captures what's actually happening in the scene—objects, actions, settings, and their relationships. For example, searching with an image of a tree will find videos featuring contextually similar trees, focusing on the overall semantic content rather than just detecting the presence of tree-shaped pixels.

  • Ranked results: Related content is presented in order of relevance with similarity scores, organized in a subfolder created at the location where you triggered the action. Each result includes timestamp ranges showing exactly where matching content appears, along with a confidence ranking (e.g., "Rank 9," "Rank 8") that helps prioritize review of the most relevant matches. Results include annotations that describe the match quality, such as "Content similar to: pexels-chevanon-317157 | 5.25s—10.50s Rank 9".

This capability excels in scenarios like finding alternative footage when a client requests changes, building thematic reels or compilation videos, discovering archived content relevant to current projects, and identifying duplicate or near-duplicate content across libraries. Creative teams can quickly build shot lists by providing visual references, locate matching coverage for editorial continuity, or discover forgotten assets that match current production needs.

The related content feature leverages TwelveLabs' Marengo embedding models to understand video content at a deep semantic level, going beyond simple tag matching to find truly relevant connections between assets. Marengo creates embeddings in a unified vector space where images and video segments are directly comparable, enabling accurate visual similarity matching across modalities. This technology captures temporal coherence and multimodal relationships that traditional computer vision approaches miss, ensuring that recommended content is contextually appropriate, not just visually similar at the pixel level.
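
Conceptually, "directly comparable in a unified vector space" means an image embedding and a video-segment embedding can be scored against each other with a standard similarity measure such as cosine similarity. The toy sketch below uses random placeholder vectors to illustrate the idea; in the actual integration the embeddings are produced by Marengo and the comparison happens inside the TwelveLabs Search API rather than in your own code.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder vectors standing in for a Marengo image embedding and two
# video-segment embeddings; real embeddings come from the TwelveLabs platform,
# and the actual comparison happens server-side inside the Search API.
image_embedding = np.random.rand(1024)
video_segments = {
    "0.00s-6.50s": np.random.rand(1024),
    "6.50s-12.75s": np.random.rand(1024),
}

# Rank segments by similarity to the reference image, highest first.
ranked = sorted(video_segments.items(),
                key=lambda item: cosine_similarity(image_embedding, item[1]),
                reverse=True)
print([(segment, round(cosine_similarity(image_embedding, emb), 3))
       for segment, emb in ranked])
```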


Demo Video

Watch how image-to-video search works in Frame.io: https://www.loom.com/share/86b9ae43199a4b7baa221732df5e5a61


2.5 - Compliance Actions
Approve/Flag Workflow

Compliance checking automates the review process for content that must meet specific regulatory, brand, or quality standards. This capability analyzes video content against defined compliance criteria and flags potential issues directly on the Frame.io timeline as comments with precise timestamps. Rather than manually reviewing every frame for compliance violations, teams can leverage AI-powered analysis that identifies issues instantly and documents them in the collaborative environment where creative teams already work.

The screenshot below demonstrates this workflow in action—compliance checks are automatically posted as Frame.io comments at the exact timestamps where potential violations occur. Each comment includes the timestamp (e.g., "00:00", "00:02", "00:19", "00:23", "00:24"), the compliance status ("Compliance REJECTED"), and a detailed description of the flagged issue (e.g., "Violence: Soldier firing rocket launcher," "Violence: Explosion in field," "Violence: Soldiers adjusting and aiming rifles").


Key Features
  • Custom compliance rules: Define specific criteria based on your organization's standards, including brand guidelines, regulatory requirements, and content policies. Whether you need to flag violence for broadcast standards, detect missing disclaimers for regulatory compliance, identify incorrect logos for brand consistency, or catch off-brand messaging before publication, the system can be configured to your exact specifications.

  • Automated detection: Identify potential compliance issues through AI-powered video analysis that understands visual content, spoken dialogue, on-screen text, and contextual meaning. TwelveLabs' Pegasus model can analyze video content against complex compliance criteria, detecting violations that would require hours of manual review. The system processes prohibited content, missing required elements, brand guideline violations, and regulatory non-compliance automatically.

  • Timeline annotations: Issues are marked as Frame.io comments at the exact timestamp where they occur, integrated seamlessly with existing review workflows. This approach ensures compliance findings appear alongside creative feedback, making it natural for teams to address both simultaneously during review cycles. Comments are timestamped and linked directly to the problematic frames, allowing immediate review and correction.

  • Detailed explanations: Each flagged issue includes context about why it was identified and what compliance rule it violates. The screenshot shows examples like "Compliance REJECTED at 0:00: Violence: Soldier firing rocket launcher" and "Compliance REJECTED at 0:19: Violence: Soldiers adjusting and aiming rifles," providing clear justification for each flag. This transparency helps teams understand the specific violation and make informed decisions about how to address it.

  • Review workflow integration: Compliance checks fit naturally into existing Frame.io review cycles, allowing teams to address issues alongside creative feedback. Creative directors, compliance officers, and legal reviewers can collaborate in a single interface, commenting on violations, requesting changes, and approving resolutions without switching between multiple systems.

  • Configurable severity levels: Classify issues by priority—critical, warning, or informational—to help teams triage effectively. High-severity violations can be escalated immediately, while lower-priority issues can be batched for periodic review. This prioritization ensures that teams focus on the most important compliance risks first, streamlining the approval process.

Compliance actions are particularly valuable for broadcast teams ensuring content meets FCC or regional broadcasting standards, brand managers verifying assets align with brand guidelines before publication, advertising teams checking that ads meet platform-specific requirements across YouTube, broadcast TV, and social media, and legal departments identifying content that may require clearance or documentation. By automating the detection process, organizations can review content faster, reduce compliance risk, and maintain consistent standards across all published materials.

The system processes the entire video and creates a comprehensive compliance report embedded directly in Frame.io's comment system, where teams already collaborate and resolve issues. This integration eliminates the need for separate compliance tracking tools or spreadsheets, ensuring all feedback—creative and compliance-related—lives in one centralized location.

Note: This is a demonstration workflow. The specific compliance criteria, comment formatting, and detection thresholds are fully customizable to your organization's needs.


Demo Video

Watch how compliance checking works in Frame.io: https://www.loom.com/share/0e7308a9beac438db49c7855783825e3


3 - Workflow Implementation

The general workflow diagram below shows how a generic architecture applies across all TwelveLabs capabilities. Frame.io UI actions trigger webhook events that flow to the workflow orchestrator, which coordinates API requests to both the TwelveLabs Platform and Frame.io API. Analysis results are returned through the same orchestrator and written back to Frame.io via REST API calls, creating a seamless bidirectional integration.
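
As a rough illustration of this architecture, the sketch below shows how a webhook-driven orchestrator might receive Frame.io Custom Action events and route them to per-capability handlers. It assumes a Flask service, and the action names and payload field names are illustrative rather than the exact Frame.io V4 webhook schema.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

HANDLERS = {}

def custom_action(name):
    """Register a handler for a named Frame.io Custom Action."""
    def register(fn):
        HANDLERS[name] = fn
        return fn
    return register

@custom_action("index_assets")
def handle_index(payload):
    # Kick off the indexing workflow described in section 3.1.
    ...

@custom_action("generate_metadata")
def handle_metadata(payload):
    # Kick off the metadata generation workflow described in section 3.2.
    ...

@app.route("/frameio/webhook", methods=["POST"])
def frameio_webhook():
    payload = request.get_json(force=True)
    action = payload.get("type")              # assumed payload field
    handler = HANDLERS.get(action)
    if handler is None:
        return jsonify({"error": f"no handler for action '{action}'"}), 400
    handler(payload)                          # in production, enqueue to a worker
    return jsonify({"status": "accepted"}), 202
```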


3.1 - Indexing Workflow

When a user triggers the indexing custom action from the Frame.io interface, Frame.io sends a webhook event to the workflow orchestrator containing asset details and metadata. This same webhook mechanism is used whether indexing is triggered manually via Custom Actions or automatically when assets are moved or copied to designated folders. The orchestrator distinguishes between single asset and folder-level triggers, retrieving the appropriate file list when processing folders.

The indexing workflow diagram above illustrates the complete end-to-end process:

  1. Step 1: Frame.io Indexing Triggered — The workflow begins when a user right-clicks on an asset or folder in Frame.io and selects the "Index Asset(s)" custom action. Frame.io immediately sends a webhook event (manual or automated) to the workflow orchestrator with details about which assets need to be indexed.

  2. Step 2: Webhook → Workflow Orchestrator — The orchestrator receives the webhook payload and parses the asset information. For folder-level indexing, it queries the Frame.io API to retrieve the complete list of video files contained within the folder hierarchy.

  3. Step 3: Get Download URL from Frame.io API — For each video to be indexed, the orchestrator calls the Frame.io API to retrieve a secure, time-limited download URL. This URL provides temporary access to the video file with proper authentication, allowing the orchestrator to download the asset without requiring permanent storage credentials.

  4. Step 4: Upload Video to TwelveLabs API — Using the download URL, the orchestrator retrieves the video file and uploads it directly to the TwelveLabs platform via the Index Video API endpoint. The Marengo model begins processing the content immediately, analyzing visual, audio, and contextual information to create a multimodal understanding of the video.

  5. Step 5: Poll Status — Since video indexing is computationally intensive and may take several minutes depending on video length and complexity, the orchestrator polls the TwelveLabs API to monitor indexing status. This polling continues at regular intervals until the indexing task reaches a terminal state (ready, failed, or error).

  6. Step 6: video_id & status → Frame.io API Update Metadata — Once indexing completes successfully, the orchestrator writes the TwelveLabs video_id and indexing status back to Frame.io custom metadata fields. These metadata fields become visible in the Frame.io UI, confirming that the asset is now indexed and ready for advanced video intelligence operations. If indexing fails, the status field reflects the error state, alerting users to investigate.

This background process requires no user intervention beyond the initial trigger, making it ideal for automated archival pipelines. Once the workflow is configured, organizations can automatically index all incoming content by routing assets through designated Frame.io folders, ensuring that every video becomes immediately searchable and analyzable without manual action.
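
A minimal sketch of steps 3 through 6 is shown below, using plain HTTP calls. The endpoint paths, payload fields, and metadata field names are illustrative assumptions and should be checked against the current Frame.io V4 and TwelveLabs API references.

```python
import time
import requests

TL_API = "https://api.twelvelabs.io/v1.3"         # assumed base URL
FIO_API = "https://api.frame.io/v4"               # assumed base URL
TL_HEADERS = {"x-api-key": "<TWELVELABS_API_KEY>"}
FIO_HEADERS = {"Authorization": "Bearer <FRAMEIO_TOKEN>"}

def index_asset(asset_id: str, index_id: str) -> str:
    # Step 3: ask Frame.io for a temporary, authenticated download URL.
    asset = requests.get(f"{FIO_API}/assets/{asset_id}",
                         headers=FIO_HEADERS).json()
    download_url = asset["download_url"]           # assumed field name

    # Step 4: hand the video to TwelveLabs for indexing by URL.
    task = requests.post(f"{TL_API}/tasks", headers=TL_HEADERS,
                         json={"index_id": index_id,
                               "video_url": download_url}).json()

    # Step 5: poll until the indexing task reaches a terminal state.
    while True:
        status = requests.get(f"{TL_API}/tasks/{task['_id']}",   # assumed id field
                              headers=TL_HEADERS).json()
        if status["status"] in ("ready", "failed", "error"):
            break
        time.sleep(30)

    # Step 6: write the video_id and indexing status back to Frame.io metadata.
    requests.patch(f"{FIO_API}/assets/{asset_id}/metadata", headers=FIO_HEADERS,
                   json={"TwelveLabs Video ID": status.get("video_id", ""),
                         "TwelveLabs Status": status["status"]})
    return status["status"]
```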


3.2 - Metadata Generation Workflow

The metadata generation workflow retrieves TwelveLabs video IDs from Frame.io metadata, then calls the Analyze API using prompts stored in an external configuration system. This approach allows non-technical teams to modify prompts without changing workflow code, enabling content managers, compliance officers, and brand teams to customize metadata fields based on evolving organizational needs.

The metadata generation workflow diagram above illustrates the complete process:

  1. Step 1: Frame.io Metadata Generation Triggered — When a user triggers the metadata generation custom action, Frame.io sends a webhook to the workflow orchestrator. This action can be triggered manually after indexing completes, or configured to run automatically as part of an automated workflow.

  2. Step 2: Webhook → Workflow Orchestrator — The orchestrator receives the webhook event and identifies which assets require metadata generation. It then queries the Frame.io API to retrieve the TwelveLabs video_id that was stored during the indexing workflow.

  3. Step 3: Get video_id from Frame.io API (Read Metadata) — The orchestrator calls the Frame.io API to read custom metadata fields containing the video_id. This identifier is essential for subsequent TwelveLabs API calls, as it references the indexed video content stored in the TwelveLabs platform.

  4. Step 4: Load Prompts from Config System — Rather than hardcoding prompts directly into the workflow logic, the orchestrator retrieves prompt templates from an external configuration system. These prompts define what metadata to generate (e.g., "Generate a concise summary," "List all visible objects and people," "Identify the emotional tone") and specify the output format, often using JSON schemas to ensure structured responses. This separation of configuration from code enables business users to iterate on metadata strategies without developer involvement.

  5. Step 5: Analyze Request with video_id and Prompts → TwelveLabs Analyze API — The orchestrator sends an analysis request to the TwelveLabs Analyze API, providing the video_id and the configured prompts. Pegasus processes the indexed video content and generates metadata based on the prompt instructions. For example, a prompt might request a JSON response containing { "title": "...", "description": "...", "tags": [...], "genre": "...", "mood": "..." }. Pegasus comprehends objects, people, actions, events, and their relationships in video, then assigns appropriate classes and metadata according to the prompt's specifications.

  6. Step 6: Generated Text → Workflow Orchestrator — The TwelveLabs API returns the generated metadata as structured text (often JSON). The orchestrator receives this response and prepares it for integration with Frame.io.

  7. Step 7: Parse & Format, Batch Update → Frame.io API (Write Metadata) — The workflow parses the Pegasus-generated responses and formats them to match Frame.io field requirements, including character limits, data types, and field naming conventions. Updates are batched where possible using Frame.io's batch metadata endpoint, reducing API overhead and improving performance when processing multiple assets. Requests are queued and throttled to respect API rate limits, implementing strategies such as progressive rate limiting, request distribution, and adaptive inter-batch cooldowns. Frame.io V4's API uses a "leaky bucket" algorithm for rate limiting, where limits refresh gradually during their allotted time window, requiring the orchestrator to carefully manage request pacing.

Progress is tracked through Frame.io status fields, with the workflow continuing to process remaining assets even if individual updates fail. Error handling ensures that transient API failures (such as rate limit exceeded responses) trigger retries with exponential backoff, while permanent errors are logged for manual review. This resilient design allows large-scale batch processing to complete successfully despite occasional API hiccups.
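
A small retry helper along these lines might look like the following sketch; the set of status codes treated as transient is an assumption to tune for the APIs you call.

```python
import time
import requests

def request_with_backoff(method: str, url: str, retries: int = 5, **kwargs):
    """Retry transient failures (e.g., HTTP 429) with exponential backoff."""
    delay = 1.0
    for _ in range(retries):
        response = requests.request(method, url, **kwargs)
        if response.status_code not in (429, 502, 503):   # assumed transient set
            response.raise_for_status()
            return response
        time.sleep(delay)        # let the leaky bucket refill before retrying
        delay *= 2               # exponential backoff between attempts
    response.raise_for_status()  # surface the final failure for logging
    return response
```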

The entire process transforms raw video content into richly tagged, searchable assets without requiring manual metadata entry, enabling teams to manage thousands of videos with consistent, AI-generated metadata that aligns with organizational standards.
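
Putting the pieces together, here is a hedged sketch of a single-asset metadata pass covering steps 3 through 7, using a prompt configuration like the dictionary sketched in section 2.2. Endpoint paths, response shapes, and field names are illustrative assumptions, and in production the plain requests calls could be routed through a retry helper like the one above.

```python
import json
import requests

TL_API = "https://api.twelvelabs.io/v1.3"         # assumed base URL
FIO_API = "https://api.frame.io/v4"               # assumed base URL
TL_HEADERS = {"x-api-key": "<TWELVELABS_API_KEY>"}
FIO_HEADERS = {"Authorization": "Bearer <FRAMEIO_TOKEN>"}

def as_text(value):
    """Flatten list values (e.g., tags) into a comma-separated string."""
    return ", ".join(value) if isinstance(value, list) else value

def generate_metadata(asset_id: str, config: dict) -> dict:
    # Step 3: read the stored video_id from Frame.io custom metadata.
    fields = requests.get(f"{FIO_API}/assets/{asset_id}/metadata",
                          headers=FIO_HEADERS).json()
    video_id = fields["TwelveLabs Video ID"]          # assumed field name

    # Step 5: ask Pegasus for structured metadata via the Analyze API.
    response = requests.post(f"{TL_API}/analyze", headers=TL_HEADERS,
                             json={"video_id": video_id,
                                   "prompt": config["prompt"]}).json()

    # Steps 6-7: parse the generated JSON and map it onto Frame.io fields.
    generated = json.loads(response["data"])          # assumed response shape
    updates = {frameio_field: as_text(generated.get(key, ""))
               for key, frameio_field in config["frameio_field_map"].items()}
    requests.patch(f"{FIO_API}/assets/{asset_id}/metadata",
                   headers=FIO_HEADERS, json=updates)
    return updates
```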


3.3 - Semantic Search Workflow

Users initiate semantic search by entering a natural language query through Frame.io's modal interface. The workflow queries the TwelveLabs index, searching across all indexed videos within the scope to find segments matching the query. Results are returned with precise timestamps indicating where relevant content appears in each video.

The semantic search workflow diagram above illustrates the complete process:

  1. Step 1: Frame.io Semantic Search Triggered — When a user triggers the semantic search custom action, Frame.io presents a modal interface where users enter their natural language query. For example, queries like "wide aerial drone shot of rocky coastline meeting the ocean" or "people shaking hands in an office" are processed as conversational descriptions rather than exact keyword matches.

  2. Step 2: Webhook + User Enters Query → Workflow Orchestrator — Frame.io sends a webhook to the workflow orchestrator containing both the trigger event details and the user's search query. The orchestrator prepares to execute the search across the TwelveLabs index.

  3. Step 3: Search Request → TwelveLabs Search API — The orchestrator submits the natural language query to the TwelveLabs Search API. TwelveLabs' Marengo embedding model converts the text query into a multimodal vector representation, then searches across all indexed video embeddings to find semantically similar segments. The search operates at the video segment level, identifying specific timestamp ranges where the query's semantic meaning matches the video content.

  4. Step 4: Matching Segments + Timestamps → Workflow Orchestrator — The TwelveLabs Search API returns a ranked list of matching video segments, each with precise start and end timestamps (e.g., "0.00s—6.50s," "6.50s—12.75s"). Each result includes the video_id, segment boundaries, and a relevance score indicating how closely the segment matches the query.

  5. Step 5: Get Asset FPS → Frame.io API — Before creating timeline comments, the orchestrator retrieves each matching video's frames-per-second (FPS) metadata from Frame.io. This metadata is essential for converting TwelveLabs' second-based timestamps into Frame.io's frame-accurate timeline positions. Different videos may have different frame rates (24fps, 30fps, 60fps, etc.), so the orchestrator must query this information to ensure comments appear at the exact correct frames.

  6. Step 6: FPS Metadata → Workflow Orchestrator — Frame.io returns the FPS information for each asset. The orchestrator uses this data to calculate frame-accurate positions for timeline comments.

  7. Step 7: Create Subfolder, Copy Assets, Post Comments → Frame.io API — The orchestrator performs several Frame.io API operations to organize and present search results:

    1. Create subfolder: A dedicated subfolder is created at the trigger location to organize search results. This folder is typically named with a timestamp and search query description (e.g., "TL_Search_wide_aerial_drone_shot_2025-11-07").

    2. Copy assets: For each video containing matches, the workflow copies the asset into this subfolder. This consolidates all relevant footage in one location, making it easy for users to review search results without navigating through the original project hierarchy.

    3. Post comments: Timeline comments are created at the exact timestamps where relevant moments occur. Each comment includes the timestamp range, relevance ranking, and the original search query, allowing users to understand why each segment was matched (e.g., "Content similar to: wide aerial drone shot of rocky coastline meeting the ocean | 5.25s—10.50s Rank 9"). Users can click these comments to jump directly to the specific frames that match their query.

  8. Step 8: Return Form ← Frame.io — Once the subfolder is created, assets are copied, and comments are posted, the orchestrator sends a completion notification back to Frame.io. The user receives confirmation that their search is complete, along with a link to the newly created results subfolder.

This workflow allows users to review all search results in one place and click directly to the specific moments that match their query, dramatically reducing the time spent manually searching through footage. Instead of scrubbing through hours of video or relying on incomplete metadata, creative teams can instantly locate relevant shots using natural language descriptions. The subfolder organization ensures that search results remain accessible for future reference, while timestamped comments provide immediate navigation to the exact frames of interest.
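
The sketch below outlines steps 3 through 7 for a single query, including the seconds-to-frames conversion that makes comments land on exact frames. The search payload, response fields, and comment endpoint are illustrative assumptions.

```python
import requests

TL_API = "https://api.twelvelabs.io/v1.3"         # assumed base URL
FIO_API = "https://api.frame.io/v4"               # assumed base URL
TL_HEADERS = {"x-api-key": "<TWELVELABS_API_KEY>"}
FIO_HEADERS = {"Authorization": "Bearer <FRAMEIO_TOKEN>"}

def seconds_to_frame(seconds: float, fps: float) -> int:
    """Convert a TwelveLabs timestamp in seconds to a frame-accurate position."""
    return round(seconds * fps)

def semantic_search(index_id: str, query: str, video_to_asset: dict) -> None:
    # Step 3: submit the natural language query to the Search API.
    results = requests.post(f"{TL_API}/search", headers=TL_HEADERS,
                            json={"index_id": index_id,
                                  "query_text": query}).json()

    for rank, hit in enumerate(results.get("data", []), start=1):  # assumed shape
        # video_to_asset maps TwelveLabs video_ids to Frame.io asset ids,
        # e.g., built from the metadata written during indexing.
        asset_id = video_to_asset[hit["video_id"]]

        # Steps 5-6: fetch the asset's FPS so comments land on exact frames.
        asset = requests.get(f"{FIO_API}/assets/{asset_id}",
                             headers=FIO_HEADERS).json()
        fps = asset["fps"]                          # assumed field name

        # Step 7: post a timeline comment at the start of the matching segment.
        text = (f"Content similar to: {query} | "
                f"{hit['start']:.2f}s-{hit['end']:.2f}s Rank {rank}")
        requests.post(f"{FIO_API}/assets/{asset_id}/comments", headers=FIO_HEADERS,
                      json={"text": text,
                            "frame": seconds_to_frame(hit["start"], fps)})
```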


3.4 - Search Related Content Workflow

When triggered from a still image, the workflow performs image-to-video search using TwelveLabs' semantic understanding capabilities. The image is sent to the Search API, which returns videos containing contextually similar visual elements based on the meaning and context of the image, not just pixel-level similarity.

The search related content workflow diagram above illustrates the complete process:

  1. Step 1: Frame.io Related Content Triggered — When a user triggers the related content custom action from a still image in Frame.io, a webhook is sent to the workflow orchestrator. This action can be initiated from any image asset within Frame.io, whether it's a reference photo, a production still, or a frame extracted from existing footage.

  2. Step 2: Webhook → Workflow Orchestrator — The orchestrator receives the webhook event containing the source image's asset ID and metadata. This triggers the image-to-video search workflow.

  3. Step 3: Get Image URL → Frame.io API — The orchestrator calls the Frame.io API to retrieve a download URL for the source image. This URL provides temporary access to the image file with proper authentication.

  4. Step 4: Image Download URL → Workflow Orchestrator — Frame.io returns the image download URL, which the orchestrator uses to retrieve the image for analysis. The image is then prepared for submission to TwelveLabs.

  5. Step 5: Image-to-Video Search → TwelveLabs Search API — The orchestrator sends the image to the TwelveLabs Search API. TwelveLabs' Marengo embedding model generates a vector representation of the image, capturing its visual features, composition, objects, setting, and semantic context. This embedding is then compared against all indexed video embeddings to find segments with high visual and semantic similarity. Unlike simple pixel matching or color histogram comparison, this approach understands what's actually in the image—for example, recognizing "a person in an outdoor athletic pose" rather than just matching pixel patterns.

  6. Step 6: Similar Videos + Scores → Workflow Orchestrator — The TwelveLabs Search API returns a ranked list of matching video segments along with similarity scores. Each result includes the video_id, timestamp range, and a confidence score indicating how closely the video segment matches the source image. Results are filtered by similarity threshold to ensure only high-quality matches are included. Organizations can configure this threshold based on their use case—tighter thresholds for precise shot matching, looser thresholds for broader thematic discovery.

  7. Step 7: Filter & Rank, Create Subfolder, Copy Assets, Add Comments → Frame.io API — The orchestrator performs several Frame.io API operations to organize and present related content results:

    1. Filter & Rank: Results are filtered by the configured similarity threshold and ranked by relevance score. Only videos exceeding the minimum similarity threshold are included in the results, ensuring users receive high-quality matches.

    2. Create subfolder: A dedicated subfolder is created at the trigger location to organize related content results. This folder is typically named with descriptive information that includes the source image name and timestamp (e.g., "TL_ImageSearch_pexels-chevanon-317157_2025-11-07").

    3. Copy assets: Each related video asset is copied into the results folder. This consolidates all visually similar footage in one location, making it easy for users to review matches without navigating through the original project hierarchy.

    4. Add comments: Timeline comments are created explaining the similarity match and relevance scores. Each comment includes the timestamp range where the visual similarity occurs, the similarity rank (e.g., "Rank 9," "Rank 8"), and a reference to the source image (e.g., "Content similar to: pexels-chevanon-317157 | 5.25s—10.50s Rank 9"). These annotations help users understand why each video was matched and prioritize which results to review first.

Results are filtered by similarity threshold and organized into a subfolder with descriptive naming that includes the source image name and timestamp. Each related asset is copied to the results folder with comments explaining the similarity match and relevance scores. This workflow enables creative teams to instantly discover visually similar footage, find alternative takes, build thematic compilations, or locate archived content that matches current production needs—all without manually browsing through thousands of assets.
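
Below is a hedged sketch of the image-to-video portion of this workflow, covering the reference-image download, the similarity search, threshold filtering, and creation of a descriptively named results folder. The multipart payload, score field, and folder endpoint are assumptions for illustration, and the threshold should be tuned per use case as noted above.

```python
from datetime import date
import requests

TL_API = "https://api.twelvelabs.io/v1.3"         # assumed base URL
FIO_API = "https://api.frame.io/v4"               # assumed base URL
TL_HEADERS = {"x-api-key": "<TWELVELABS_API_KEY>"}
FIO_HEADERS = {"Authorization": "Bearer <FRAMEIO_TOKEN>"}

def related_content(image_asset_id: str, index_id: str,
                    parent_folder_id: str, min_score: float = 0.8) -> list:
    # Steps 3-4: fetch the reference image via its Frame.io download URL.
    image = requests.get(f"{FIO_API}/assets/{image_asset_id}",
                         headers=FIO_HEADERS).json()
    image_bytes = requests.get(image["download_url"]).content    # assumed field

    # Step 5: run an image-to-video search against the index.
    results = requests.post(f"{TL_API}/search", headers=TL_HEADERS,
                            data={"index_id": index_id,
                                  "query_media_type": "image"},
                            files={"query_media_file": image_bytes}).json()

    # Step 6: keep only matches above the configured similarity threshold.
    matches = [hit for hit in results.get("data", [])
               if hit.get("score", 0) >= min_score]               # assumed field

    # Step 7: create a descriptively named subfolder for the results.
    folder_name = f"TL_ImageSearch_{image['name']}_{date.today()}"
    requests.post(f"{FIO_API}/folders/{parent_folder_id}/children",
                  headers=FIO_HEADERS,
                  json={"name": folder_name, "type": "folder"})
    return matches
```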


3.5 - Compliance Actions Workflow

The compliance workflow sends the video to Pegasus with a detailed prompt specifying evaluation categories (violence, language, sexual content, substance use, disturbing content, discrimination) and requesting timestamped violations. This structured approach enables automated detection of compliance issues that would otherwise require hours of manual review.

The compliance actions workflow diagram above illustrates the complete process:

  1. Step 1: Frame.io Compliance Check Triggered — When a user triggers the compliance check custom action, Frame.io sends a webhook to the workflow orchestrator. This action can be triggered manually for on-demand compliance review or configured to run automatically as part of pre-publication workflows.

  2. Step 2: Webhook → Workflow Orchestrator — The orchestrator receives the webhook event containing the asset details and prepares to execute the compliance analysis. It identifies which video requires compliance checking and initiates the workflow.

  3. Step 3: Get video_id → Frame.io API (Read Metadata) — The orchestrator queries the Frame.io API to retrieve the TwelveLabs video_id that was stored during the indexing workflow. This identifier is required to reference the indexed video content in the TwelveLabs platform.

  4. Step 4: video_id → Workflow Orchestrator — Frame.io returns the video_id, which the orchestrator uses to construct the compliance analysis request. The orchestrator also loads the compliance prompt from the configuration system, defining specific evaluation criteria and output format.

  5. Step 5: Analyze with Compliance Prompt → TwelveLabs Analyze API — The orchestrator sends an analysis request to the TwelveLabs Analyze API, providing the video_id and a detailed compliance prompt. The prompt specifies evaluation categories such as violence, explicit language, sexual content, substance use, disturbing imagery, and discrimination. It also requests structured output including an overall compliance status (APPROVED/REJECTED/NEEDS_REVIEW) and timestamped violations for each detected issue. Pegasus processes the video content, analyzing visual elements, spoken dialogue, on-screen text, and contextual meaning to identify compliance violations. For example, it can detect violent actions ("soldier firing rocket launcher"), explicit language in audio, or discriminatory imagery, providing timestamps for each violation.

  6. Step 6: Status + Violations + Timestamps → Workflow Orchestrator — The TwelveLabs Analyze API returns the compliance analysis results. The response includes the overall compliance status, a list of individual violations, and precise timestamps indicating where each violation occurs in the video. For example: { "status": "REJECTED", "violations": [{ "timestamp": "0:00", "category": "violence", "description": "Soldier firing rocket launcher" }, { "timestamp": "0:02", "category": "violence", "description": "Explosion in field" }] }.

  7. Step 7: Get Asset FPS → Frame.io API — Before creating timeline comments, the orchestrator retrieves the video's frames-per-second (FPS) metadata from Frame.io. This information is essential for converting second-based timestamps from TwelveLabs into frame-accurate positions in Frame.io's timeline.

  8. Step 8: FPS Metadata → Workflow Orchestrator — Frame.io returns the FPS metadata, which the orchestrator uses to calculate exact frame positions for each violation comment.

  9. Step 9: Update Status Field, Create Timeline Comments → Frame.io API — The orchestrator performs two Frame.io API operations to document the compliance results:

    1. Update Status Field: The overall compliance status (APPROVED/REJECTED/NEEDS_REVIEW) is written to a Frame.io dropdown metadata field. This provides immediate visual feedback on the asset's compliance status, allowing teams to filter and prioritize review workflows.

    2. Create Timeline Comments: For each detected violation, the orchestrator creates a Frame.io timeline comment at the exact frame position where the issue occurs. Each comment includes the timestamp, violation category, and a detailed description (e.g., "Compliance REJECTED at 0:00: Violence: Soldier firing rocket launcher"). This allows reviewers to address each compliance issue individually within Frame.io's existing review interface, alongside creative feedback from the rest of the team.

In short, the orchestrator extracts the overall compliance status (APPROVED/REJECTED/NEEDS_REVIEW), updates the Frame.io dropdown field, parses individual violations with their timestamps, and creates timeline comments at exact frame positions for each one, as illustrated in the sketch below.
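
A hedged sketch of that response handling, assuming the JSON shape shown in step 6; endpoint paths and field names are illustrative assumptions.

```python
import requests

FIO_API = "https://api.frame.io/v4"               # assumed base URL
FIO_HEADERS = {"Authorization": "Bearer <FRAMEIO_TOKEN>"}

def mmss_to_seconds(timestamp: str) -> float:
    """Convert an "m:ss" timestamp (e.g., "0:19") into seconds."""
    minutes, seconds = timestamp.split(":")
    return int(minutes) * 60 + float(seconds)

def apply_compliance_result(asset_id: str, result: dict, fps: float) -> None:
    # Step 9a: write the overall status to a dropdown metadata field.
    requests.patch(f"{FIO_API}/assets/{asset_id}/metadata", headers=FIO_HEADERS,
                   json={"Compliance Status": result["status"]})   # assumed field

    # Step 9b: create a frame-accurate timeline comment for each violation.
    for violation in result.get("violations", []):
        seconds = mmss_to_seconds(violation["timestamp"])
        text = (f"Compliance {result['status']} at {violation['timestamp']}: "
                f"{violation['category'].title()}: {violation['description']}")
        requests.post(f"{FIO_API}/assets/{asset_id}/comments", headers=FIO_HEADERS,
                      json={"text": text, "frame": round(seconds * fps)})
```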

By automating the detection process and integrating findings directly into Frame.io's collaborative review workflow, organizations can review content faster, reduce compliance risk, maintain consistent standards across all published materials, and ensure that compliance checks don't create workflow bottlenecks. Compliance officers can focus their attention on addressing flagged violations rather than manually screening hours of footage, while creative teams receive compliance feedback in the same interface where they handle all other review comments.


4 - Bringing It All Together: Transform Your Video Workflows Today

The TwelveLabs × Frame.io integration demonstrates how advanced video AI can seamlessly embed into production workflows that creative teams already trust. By combining Frame.io's industry-leading collaborative review platform with TwelveLabs' Pegasus and Marengo video foundation models, Media & Entertainment organizations gain powerful capabilities that were previously impossible or prohibitively expensive.


Real-World Impact Across M&E Workflows

Throughout this post, we've seen how this integration delivers measurable value across every stage of the content lifecycle:

  • Production & Post-Production Teams can instantly locate specific shots using natural language search ("wide aerial drone shot of rocky coastline"), eliminating hours of manual footage review. Semantic search with timestamped results means editors jump directly to the exact frames they need, accelerating assembly and reducing time-to-delivery.

  • Creative Libraries & Asset Management benefit from automated metadata generation that populates Frame.io fields with rich descriptions, tags, themes, and summaries—all generated by AI that understands narrative context, not just object detection. Image-to-video search enables teams to find visually similar footage for B-roll, alternate takes, or thematic compilations, unlocking the value of archived content that would otherwise remain undiscovered.

  • Compliance & Brand Teams can automate regulatory review processes, with AI-powered analysis flagging potential violations at exact timestamps and writing them directly into Frame.io's comment system. This transforms compliance from a bottleneck into a parallel workflow, where reviewers address issues alongside creative feedback without switching tools.

  • Broadcast Networks & Advertising Agencies managing thousands of assets across multiple campaigns gain consistent, scalable metadata practices and compliance checks that ensure brand guidelines and regulatory standards are met before publication. The integration fits naturally into existing Frame.io review cycles, preserving established workflows while adding AI intelligence.


Key Technical Innovations

The integration leverages Frame.io V4's Custom Actions and flexible metadata system to create a seamless user experience. Webhook-driven workflows orchestrate complex multi-step processes—indexing, metadata generation, semantic search, and compliance checking—without requiring users to leave the Frame.io interface. Configurable prompts, batch processing, rate limiting, and error handling ensure the system operates reliably at enterprise scale.

TwelveLabs' multimodal video understanding goes beyond traditional computer vision by analyzing visual content, spoken dialogue, on-screen text, and contextual meaning simultaneously. This enables semantic search that understands what users are asking for, not just keyword matches, and generates metadata that captures the narrative essence of video content.


Getting Started

Ready to bring multimodal video intelligence into your Frame.io workflows? The TwelveLabs × Frame.io integration is designed for organizations that need intelligent content management at scale:

Prerequisites:

  • Frame.io V4 account with enterprise features

  • Access to Frame.io Custom Actions and custom metadata fields

  • TwelveLabs API access for indexing and analysis

Enablement Path:

Organizations interested in deploying this integration should contact TwelveLabs to discuss implementation, configuration, and training. Our team will work with you to customize workflows, configure compliance prompts, and integrate the system with your existing production infrastructure. Reach out to brice@twelvelabs.io for hands-on support.

Resources:

The future of video content management isn't about replacing human creativity—it's about eliminating the tedious manual work that prevents creative teams from focusing on what they do best. With TwelveLabs and Frame.io working together, your organization can search vast libraries instantly, maintain consistent metadata at scale, ensure compliance automatically, and discover connections between assets that would otherwise remain hidden.

TLDR

This post demonstrates how TwelveLabs' multimodal video understanding AI integrates directly into Frame.io V4, transforming how creative teams search, organize, and manage video content at scale.

  • Index and search your entire video library using natural language—find specific shots by describing what you're looking for ("wide aerial drone shot of coastline") instead of manually scrubbing through footage or relying on incomplete metadata.

  • Automatically generate rich metadata with Pegasus—populate Frame.io fields with AI-generated descriptions, tags, summaries, themes, and emotional tone that understand narrative context, not just objects on screen.

  • Discover visually similar content with image-to-video search—upload a reference photo and instantly locate matching footage across your entire library, perfect for finding B-roll, alternate takes, or thematic compilations.

  • Automate compliance checks with timestamped violations—flag potential regulatory issues, brand guideline violations, or content policy breaches at exact timestamps, integrated directly into Frame.io's comment system.

  • Seamless workflow integration powered by Frame.io V4's Custom Actions—trigger indexing, metadata generation, semantic search, and compliance checks with a simple right-click, with all results organized in Frame.io's familiar interface.


1 - Overview Introduction

Why TwelveLabs × Frame.io

The media and entertainment industry faces mounting challenges in managing increasingly large video libraries. Post-production teams, broadcasters, and content creators need efficient ways to search, analyze, and derive insights from their video assets. Frame.io, now part of Adobe's Creative Cloud ecosystem, has become the industry standard for video collaboration and review, serving creative teams across film, television, advertising, and digital media production. However, traditional text-based metadata and manual tagging can't keep pace with modern content volumes, creating bottlenecks in creative workflows and making it difficult to unlock the full value of video archives.

TwelveLabs' integration with Frame.io brings advanced multimodal video understanding capabilities directly into the collaborative review workflow. With the introduction of Frame.io V4, two key features made this integration particularly powerful: Custom Actions and customizable metadata fields.

  • Custom Actions allow users to trigger video understanding workflows with a simple right-click, enabling on-demand AI analysis without leaving the platform.

  • Flexible metadata fields enable TwelveLabs to write structured video intelligence data back into Frame.io's native interface, making AI-generated insights immediately accessible alongside traditional review tools.

These V4 capabilities provide the extensibility and flexibility needed to seamlessly embed multimodal video understanding into production workflows, transforming Frame.io from a review platform into an intelligent video content management system.

By combining Frame.io's collaborative features with TwelveLabs' Pegasus and Marengo video foundation models, teams can index video content, generate rich metadata, perform semantic searches across their libraries, find related content, and ensure compliance—all within the familiar Frame.io environment. Marengo 2.7 creates vector embeddings that enable semantic search and pattern recognition across video, audio, and text, while Pegasus 1.2 generates human-readable descriptions, summaries, and structured metadata from video content. Together, these models provide both the "what" and the "why" of video content, empowering creative teams to work faster and smarter.


Who It's For

This integration serves organizations that rely on Frame.io for video-centric workflows and need intelligent content management at scale. Our customers using Frame.io span multiple industries: broadcasting and news organizations managing extensive footage libraries, production studios coordinating complex post-production workflows, marketing agencies repurposing creative assets across campaigns, and enterprise content teams maintaining brand consistency across thousands of videos.

While their specific workflows differ, these organizations share common challenges: searching vast video libraries for specific moments or themes, maintaining consistent metadata across thousands of assets, ensuring brand and regulatory compliance, and discovering relevant content buried in archives. Manual tagging is time-consuming and inconsistent, while traditional keyword search fails to capture the visual and contextual richness of video. The need for intelligent video understanding at scale, integrated seamlessly into existing review and collaboration workflows, has never been greater.

Frame.io V4's Custom Actions and flexible metadata system provide the foundation for embedding TwelveLabs' video understanding capabilities directly where teams already work, eliminating the friction of switching between multiple tools and platforms. Whether you're a creative director searching for the perfect B-roll shot, a compliance officer reviewing thousands of hours of content, or a producer mining archives for reusable assets, this integration brings AI-powered video intelligence into your daily workflow.


2 - Key Capabilities

2.1 - Indexing Assets
What It Does & When to Use It

The indexing capability is the foundation of all TwelveLabs functionality within Frame.io. Users can index individual assets or entire folders directly from the Frame.io interface by right-clicking and selecting the "Index Asset(s)" custom action. This process creates a multimodal understanding of your video content, analyzing visual, audio, and textual information to build a searchable representation that powers all downstream AI capabilities.

Indexing is the critical first step that transforms raw video files into intelligent, queryable assets. Once indexed, videos become fully searchable through semantic queries, can generate automated metadata, enable compliance checks, and support image-based content discovery.


Key Features
  • Manual or automated triggering: Users can manually trigger indexing on-demand via Custom Actions directly from the Frame.io UI, or configure automated workflows that index assets when they're moved or copied to specific Frame.io projects or folders. This flexibility supports both ad-hoc indexing needs and systematic processing of incoming content.

  • Asset and folder-level processing: Trigger indexing on a single video file or select an entire folder to batch-index all contained assets in a single action, ideal for project archives or bulk content libraries. Batch processing dramatically reduces the manual effort required to prepare large video collections for semantic search and analysis.

  • Status tracking: Indexing status is visible within Frame.io custom metadata fields, showing whether assets are indexed, currently processing, or encountered errors. This transparency ensures teams know exactly which assets are ready for advanced video intelligence workflows and which may require attention.

  • Seamless integration: No need to download assets or leave the Frame.io environment. Indexing happens in the background while teams continue their collaborative review work, with TwelveLabs processing video content through its Marengo and Pegasus foundation models to generate multimodal embeddings and structured representations. The entire process is transparent to end users, requiring only a simple right-click action to initiate.

Once indexed, assets become searchable and analyzable through all other TwelveLabs capabilities, creating a foundation for advanced video intelligence workflows that transform how creative teams discover, understand, and leverage their video content.


Demo Video

Watch how indexing works in Frame.io: https://www.loom.com/share/5140914df15f4b9b83b30407daacf0c1


2.2 - Metadata Generation
What It Does & When to Use It

Metadata generation transforms unstructured video content into organized, searchable information. Users can manually trigger this capability via Custom Actions, or configure workflows to automatically generate metadata once assets complete indexing. This capability uses TwelveLabs' Pegasus video language model to analyze indexed content and generate structured metadata that's written directly back to Frame.io asset fields.

Pegasus is a state-of-the-art multimodal AI model designed for advanced video-language understanding and interaction. Unlike traditional tagging approaches that rely on manual input or simple object detection, Pegasus comprehends objects, people, actions, events, and their relationships within video context to generate rich, semantically meaningful metadata. This capability is particularly valuable for organizations managing thousands of assets where manual metadata entry is impractical.


Examples of Fields Populated

The screenshot demonstrates the range of metadata fields that can be automatically populated:

  • Description by TwelveLabs: Rich, detailed scene descriptions capturing visual elements, setting, and atmosphere

  • Emotions by TwelveLabs: Mood and emotional tone detection (e.g., "calm, peaceful, serene")

  • Genre by TwelveLabs: Content categorization (e.g., "Nature documentary")

  • Summary by Twelve Labs: Concise overviews capturing key topics and content highlights

  • Tags by TwelveLabs: Extracted keywords covering entities, locations, objects, and themes (e.g., "aerial view, island, ocean, cliffs, natural pool, waves, rocky formations, scenic landscape")

  • Theme by TwelveLabs: High-level content themes (e.g., "Natural landscape")

  • Status: Custom workflow fields (e.g., "Approved" with visual indicators)

  • Rating: Quality or compliance ratings using visual star systems

Frame.io V4's flexible metadata system allows you to define fields that match your exact workflow needs, whether following broadcast standards, marketing guidelines, or custom organizational taxonomies.


Key Features
  • Custom metadata fields: Configure which Frame.io metadata fields to populate based on your organization's requirements. Frame.io v4's account-level custom metadata feature allows you to define fields once and apply them across every project and workspace, ensuring consistency in how assets are tagged, searched, and managed at enterprise scale.

  • Intelligent summarization: Generate concise summaries capturing key topics, themes, and content highlights. Pegasus understands narrative structure and can produce descriptions that reflect the semantic content of video, not just surface-level object detection.

  • Automatic tagging: Extract relevant keywords, entities, people, locations, and objects detected in the video. These tags are generated with contextual awareness, understanding how elements relate to each other within the video's narrative.

  • Scene-aware descriptions: Create descriptions that understand narrative structure and content context, going beyond simple object detection. Pegasus generates natural language descriptions that capture what's happening, who's doing what, when, and where—creating a semantic narrative of the video.

  • Configurable prompts: Tailor the metadata generation to your specific needs, from broadcast standards to marketing guidelines. The Pegasus model supports structured output formats, allowing you to define JSON schemas that match your organization's metadata taxonomy and ensure predictable, parseable results.

The generated metadata enhances searchability within Frame.io's native search, improves team collaboration by providing context at a glance, and ensures consistent tagging across large content libraries. Teams can quickly assess content without watching entire videos, accelerate shot selection and content discovery, and maintain standardized metadata practices across distributed creative operations.
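
To make the "Configurable prompts" idea above concrete, here is a minimal sketch of a structured-output request to the Analyze API. The endpoint version, header name, and response field are assumptions for illustration, and the JSON schema in the prompt simply mirrors the Frame.io fields described earlier.

```python
import json
import os
import requests

# Prompt asking Pegasus for JSON that maps directly onto Frame.io metadata fields.
PROMPT = """Analyze this video and respond with JSON only, using this schema:
{
  "description": "one detailed paragraph describing the scene",
  "summary": "two sentences capturing the key topics",
  "tags": ["keyword", "..."],
  "genre": "single genre label",
  "emotions": "comma-separated emotional tones",
  "theme": "high-level content theme"
}"""

def generate_metadata(video_id: str) -> dict:
    resp = requests.post(
        "https://api.twelvelabs.io/v1.3/analyze",                 # assumed endpoint version
        headers={"x-api-key": os.environ["TWELVELABS_API_KEY"]},
        json={"video_id": video_id, "prompt": PROMPT},
    )
    resp.raise_for_status()
    return json.loads(resp.json()["data"])                        # "data" is an assumed response field

# Example (hypothetical video_id): fields = generate_metadata("65f0c1d2e3a4b5")
```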


Demo Video

Watch how automated metadata generation works in Frame.io: https://www.loom.com/share/58e6dee73045429288b62b310c949959


2.3 - Semantic Search
Shot/Segment Discovery Workflow

Semantic search enables teams to find specific moments within videos using natural language queries, going far beyond filename or tag-based search. Instead of remembering exact keywords or manually tagging every moment, users describe what they're looking for in conversational language, and the system returns relevant video segments with precise timestamps. This capability leverages TwelveLabs' Marengo embedding model to understand the semantic meaning behind queries and match them to corresponding video content.

Traditional keyword search requires exact matches and fails to understand context, forcing users to guess at specific terms that might appear in metadata or transcripts. Semantic search understands the meaning behind queries, enabling content discovery based on visual cues, actions, context, and concepts—even if those exact words were never explicitly tagged. This dramatically reduces the time teams spend manually scrubbing through footage to find specific shots or moments.


Key Features
  • Natural language queries: Search using conversational phrases like "people shaking hands in an office" or "wide aerial drone shot of rocky coastline meeting the ocean". Users can describe scenes based on visual elements, actions, emotions, settings, or any combination of multimodal attributes without needing to know technical terminology.

  • Multimodal understanding: Searches across visual content, spoken dialogue, on-screen text, and audio elements simultaneously. Marengo creates embeddings that place video, audio, and text in a shared vector space, allowing the system to understand connections across modalities. A query like "when the person in the red shirt enters the restaurant" can successfully retrieve the exact moment based on visual cues (red shirt), actions (enters), and context (restaurant setting)—even if those specific words never appeared in any metadata.

  • Timestamped results: Returns specific moments within videos with frame-accurate timestamps, not just entire files. This moment-level precision means users jump directly to the relevant scene rather than watching entire videos or manually seeking through timelines. Results include timestamp references (e.g., "0.00s—6.50s," "6.50s—12.75s") that correspond to exact segments where query matches occur.

  • Cross-project search: Query across multiple assets or entire folder structures to find content regardless of where it's stored. The integration displays results through Frame.io's folder structure, organizing matches for immediate review. Users can search an entire library spanning multiple projects and receive consolidated results showing every relevant moment across all indexed videos.

The semantic search capability is displayed through Frame.io's folder structure, with results organized by asset and timestamp for immediate review. Users can click through to the exact moment in the video where their query match occurs, with comments and annotations automatically generated showing the matching segments. This workflow transforms content discovery from a time-consuming manual process into an instant, AI-powered operation.

Note: The subfolder organization and naming conventions are fully configurable to match your team's preferences, allowing you to structure search results in ways that align with your existing project hierarchies and workflows.
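
A minimal sketch of what such a query looks like at the API level is shown below. The endpoint version, request fields, and response shape are assumptions for illustration rather than the exact contract.

```python
import os
import requests

def semantic_search(index_id: str, query: str) -> list[dict]:
    resp = requests.post(
        "https://api.twelvelabs.io/v1.3/search",                  # assumed endpoint version
        headers={"x-api-key": os.environ["TWELVELABS_API_KEY"]},
        data={
            "index_id": index_id,
            "query_text": query,                                  # conversational description, not keywords
            "search_options": ["visual", "audio"],                # search across modalities
        },
    )
    resp.raise_for_status()
    return resp.json().get("data", [])                            # assumed response shape

# Each hit is assumed to carry a video_id plus start/end offsets in seconds.
for hit in semantic_search("idx_123", "wide aerial drone shot of rocky coastline meeting the ocean"):
    print(hit["video_id"], f'{hit["start"]:.2f}s-{hit["end"]:.2f}s', hit.get("score"))
```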


Demo Video

Watch how semantic search works in Frame.io: https://www.loom.com/share/a92b4b9787094977b20cd566c9ed0894


2.4 - Search Related Content
Creative Reuse and Shot Matching

The search related content capability helps teams discover similar or thematically connected assets across their video library. This is particularly valuable for finding complementary B-roll, locating alternate takes, or building content collections around specific themes. Instead of manually browsing through hundreds of clips, creative teams can use visual references to instantly locate relevant footage.

This capability extends beyond simple text-based search by allowing users to provide an image as their query, then finding video segments that match the visual composition, subject matter, or semantic context. The screenshot demonstrates how this works in practice—after providing a reference image, the system returns ranked video matches with similarity scores and timestamp ranges.


Key Features
  • Image-to-video search: Find videos with similar compositions, settings, objects, or subject matter by using a still image as your query. This makes it easy to find video moments that match a reference frame or photo, whether sourced from existing footage, client-provided mockups, or external inspirational images. For example, if a client provides a reference photo showing a person in an outdoor athletic pose, you can upload that image and instantly retrieve all indexed video segments featuring similar compositions, settings, and activities.

  • Semantic understanding: Leveraging the same multimodal search capabilities as text-based semantic search, the system understands the meaning and context of visual elements. Unlike simple pixel-matching or tag-based approaches, TwelveLabs' Marengo embedding model captures what's actually happening in the scene—objects, actions, settings, and their relationships. For example, searching with an image of a tree will find videos featuring contextually similar trees, focusing on the overall semantic content rather than just detecting the presence of tree-shaped pixels.

  • Ranked results: Related content is presented in order of relevance with similarity scores, organized in a subfolder created at the location where you triggered the action. Each result includes timestamp ranges showing exactly where matching content appears, along with a confidence ranking (e.g., "Rank 9," "Rank 8") that helps prioritize review of the most relevant matches. Results include annotations that describe the match quality, such as "Content similar to: pexels-chevanon-317157 | 5.25s—10.50s Rank 9".

This capability excels in scenarios like finding alternative footage when a client requests changes, building thematic reels or compilation videos, discovering archived content relevant to current projects, and identifying duplicate or near-duplicate content across libraries. Creative teams can quickly build shot lists by providing visual references, locate matching coverage for editorial continuity, or discover forgotten assets that match current production needs.

The related content feature leverages TwelveLabs' Marengo embedding models to understand video content at a deep semantic level, going beyond simple tag matching to find truly relevant connections between assets. Marengo creates embeddings in a unified vector space where images and video segments are directly comparable, enabling accurate visual similarity matching across modalities. This technology captures temporal coherence and multimodal relationships that traditional computer vision approaches miss, ensuring that recommended content is contextually appropriate, not just visually similar at the pixel level.
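
Below is a minimal sketch of an image-to-video query, assuming the Search API accepts an image file as the query; the endpoint version, parameter names, and response fields are illustrative rather than definitive.

```python
import os
import requests

def image_to_video_search(index_id: str, image_path: str) -> list[dict]:
    with open(image_path, "rb") as f:
        resp = requests.post(
            "https://api.twelvelabs.io/v1.3/search",              # assumed endpoint version
            headers={"x-api-key": os.environ["TWELVELABS_API_KEY"]},
            data={
                "index_id": index_id,
                "query_media_type": "image",                      # query by image instead of text
                "search_options": ["visual"],
            },
            files={"query_media_file": f},                        # the reference still image
        )
    resp.raise_for_status()
    return resp.json().get("data", [])                            # assumed response shape

# Example: matches = image_to_video_search("idx_123", "pexels-chevanon-317157.jpg")
```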


Demo Video

Watch how image-to-video search works in Frame.io: https://www.loom.com/share/86b9ae43199a4b7baa221732df5e5a61


2.5 - Compliance Actions
Approve/Flag Workflow

Compliance checking automates the review process for content that must meet specific regulatory, brand, or quality standards. This capability analyzes video content against defined compliance criteria and flags potential issues directly on the Frame.io timeline as comments with precise timestamps. Rather than manually reviewing every frame for compliance violations, teams can leverage AI-powered analysis that identifies issues instantly and documents them in the collaborative environment where creative teams already work.

The screenshot below demonstrates this workflow in action—compliance checks are automatically posted as Frame.io comments at the exact timestamps where potential violations occur. Each comment includes the timestamp (e.g., "00:00", "00:02", "00:19", "00:23", "00:24"), the compliance status ("Compliance REJECTED"), and a detailed description of the flagged issue (e.g., "Violence: Soldier firing rocket launcher," "Violence: Explosion in field," "Violence: Soldiers adjusting and aiming rifles").


Key Features
  • Custom compliance rules: Define specific criteria based on your organization's standards, including brand guidelines, regulatory requirements, and content policies. Whether you need to flag violence for broadcast standards, detect missing disclaimers for regulatory compliance, identify incorrect logos for brand consistency, or catch off-brand messaging before publication, the system can be configured to your exact specifications.

  • Automated detection: Identify potential compliance issues through AI-powered video analysis that understands visual content, spoken dialogue, on-screen text, and contextual meaning. TwelveLabs' Pegasus model can analyze video content against complex compliance criteria, detecting violations that would require hours of manual review. The system processes prohibited content, missing required elements, brand guideline violations, and regulatory non-compliance automatically.

  • Timeline annotations: Issues are marked as Frame.io comments at the exact timestamp where they occur, integrated seamlessly with existing review workflows. This approach ensures compliance findings appear alongside creative feedback, making it natural for teams to address both simultaneously during review cycles. Comments are timestamped and linked directly to the problematic frames, allowing immediate review and correction.

  • Detailed explanations: Each flagged issue includes context about why it was identified and what compliance rule it violates. The screenshot shows examples like "Compliance REJECTED at 0:00: Violence: Soldier firing rocket launcher" and "Compliance REJECTED at 0:19: Violence: Soldiers adjusting and aiming rifles," providing clear justification for each flag. This transparency helps teams understand the specific violation and make informed decisions about how to address it.

  • Review workflow integration: Compliance checks fit naturally into existing Frame.io review cycles, allowing teams to address issues alongside creative feedback. Creative directors, compliance officers, and legal reviewers can collaborate in a single interface, commenting on violations, requesting changes, and approving resolutions without switching between multiple systems.

  • Configurable severity levels: Classify issues by priority—critical, warning, or informational—to help teams triage effectively. High-severity violations can be escalated immediately, while lower-priority issues can be batched for periodic review. This prioritization ensures that teams focus on the most important compliance risks first, streamlining the approval process.

Compliance actions are particularly valuable for broadcast teams ensuring content meets FCC or regional broadcasting standards, brand managers verifying assets align with brand guidelines before publication, advertising teams checking that ads meet platform-specific requirements across YouTube, broadcast TV, and social media, and legal departments identifying content that may require clearance or documentation. By automating the detection process, organizations can review content faster, reduce compliance risk, and maintain consistent standards across all published materials.

The system processes the entire video and creates a comprehensive compliance report embedded directly in Frame.io's comment system, where teams already collaborate and resolve issues. This integration eliminates the need for separate compliance tracking tools or spreadsheets, ensuring all feedback—creative and compliance-related—lives in one centralized location.

Note: This is a demonstration workflow. The specific compliance criteria, comment formatting, and detection thresholds are fully customizable to your organization's needs.
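
As one illustration of how such criteria can be phrased, the sketch below shows a hypothetical compliance prompt that names the evaluation categories and requests a structured verdict with timestamped violations; the actual prompt used in production workflows may differ.

```python
# A hypothetical compliance prompt for Pegasus, requesting a structured verdict.
COMPLIANCE_PROMPT = """You are reviewing this video for broadcast compliance.
Evaluate it against these categories: violence, explicit language, sexual content,
substance use, disturbing content, discrimination.

Respond with JSON only:
{
  "status": "APPROVED" | "REJECTED" | "NEEDS_REVIEW",
  "violations": [
    {"timestamp": "m:ss", "category": "...", "description": "..."}
  ]
}
If no violations are found, return status APPROVED and an empty violations list."""
```

The timestamped comments shown in the screenshot follow this shape: an overall status plus one entry per violation.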


Demo Video

Watch how compliance checking works in Frame.io: https://www.loom.com/share/0e7308a9beac438db49c7855783825e3


3 - Workflow Implementation

The workflow diagram below shows the shared architecture that underpins all TwelveLabs capabilities. Frame.io UI actions trigger webhook events that flow to the workflow orchestrator, which coordinates API requests to both the TwelveLabs Platform and the Frame.io API. Analysis results are returned through the same orchestrator and written back to Frame.io via REST API calls, creating a seamless bidirectional integration.
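
As a rough sketch of that entry point, the snippet below shows a webhook receiver that routes Custom Action events to per-capability handlers. The route path, action identifiers, and payload field names are assumptions, not the exact Frame.io V4 schema.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Placeholder handlers for the workflows described in Sections 3.1 through 3.5.
def handle_indexing(asset_id: str) -> None: ...
def handle_metadata(asset_id: str) -> None: ...
def handle_search(asset_id: str, query: str) -> None: ...
def handle_compliance(asset_id: str) -> None: ...

# Map each Custom Action (hypothetical identifiers) to its workflow handler.
ROUTES = {
    "tl.index": lambda p: handle_indexing(p["resource"]["id"]),
    "tl.generate_metadata": lambda p: handle_metadata(p["resource"]["id"]),
    "tl.semantic_search": lambda p: handle_search(p["resource"]["id"], p.get("data", {}).get("query", "")),
    "tl.compliance_check": lambda p: handle_compliance(p["resource"]["id"]),
}

@app.route("/frameio/webhook", methods=["POST"])
def frameio_webhook():
    payload = request.get_json(force=True)
    handler = ROUTES.get(payload.get("type", ""))
    if handler is None:
        return jsonify({"error": "unhandled action"}), 400
    handler(payload)  # in production this would be queued rather than run inline
    return jsonify({"title": "TwelveLabs", "description": "Workflow started"}), 200
```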


3.1 - Indexing Workflow

When a user triggers the indexing custom action from the Frame.io interface, Frame.io sends a webhook event to the workflow orchestrator containing asset details and metadata. This same webhook mechanism is used whether indexing is triggered manually via Custom Actions or automatically when assets are moved or copied to designated folders. The orchestrator distinguishes between single asset and folder-level triggers, retrieving the appropriate file list when processing folders.

The indexing workflow diagram above illustrates the complete end-to-end process:

  1. Step 1: Frame.io Indexing Triggered — The workflow begins when a user right-clicks on an asset or folder in Frame.io and selects the "Index Asset(s)" custom action. Frame.io immediately sends a webhook event (manual or automated) to the workflow orchestrator with details about which assets need to be indexed.

  2. Step 2: Webhook → Workflow Orchestrator — The orchestrator receives the webhook payload and parses the asset information. For folder-level indexing, it queries the Frame.io API to retrieve the complete list of video files contained within the folder hierarchy.

  3. Step 3: Get Download URL from Frame.io API — For each video to be indexed, the orchestrator calls the Frame.io API to retrieve a secure, time-limited download URL. This URL provides temporary access to the video file with proper authentication, allowing the orchestrator to download the asset without requiring permanent storage credentials.

  4. Step 4: Upload Video to TwelveLabs API — Using the download URL, the orchestrator retrieves the video file and uploads it directly to the TwelveLabs platform via the Index Video API endpoint. The Marengo model begins processing the content immediately, analyzing visual, audio, and contextual information to create a multimodal understanding of the video.

  5. Step 5: Poll Status — Since video indexing is computationally intensive and may take several minutes depending on video length and complexity, the orchestrator polls the TwelveLabs API to monitor indexing status. This polling continues at regular intervals until the indexing task reaches a terminal state (ready, failed, or error).

  6. Step 6: video_id & status → Frame.io API Update Metadata — Once indexing completes successfully, the orchestrator writes the TwelveLabs video_id and indexing status back to Frame.io custom metadata fields. These metadata fields become visible in the Frame.io UI, confirming that the asset is now indexed and ready for advanced video intelligence operations. If indexing fails, the status field reflects the error state, alerting users to investigate.

This background process requires no user intervention beyond the initial trigger, making it ideal for automated archival pipelines. Once the workflow is configured, organizations can automatically index all incoming content by routing assets through designated Frame.io folders, ensuring that every video becomes immediately searchable and analyzable without manual action.
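
A condensed sketch of steps 3 through 6 is shown below. The Frame.io helpers are hypothetical placeholders, and the TwelveLabs endpoint paths, parameter names, and response fields are assumptions for illustration.

```python
import os
import time
import requests

TL_BASE = "https://api.twelvelabs.io/v1.3"                         # assumed API version
TL_HEADERS = {"x-api-key": os.environ["TWELVELABS_API_KEY"]}

def frameio_get_download_url(asset_id: str) -> str:
    """Placeholder for the Frame.io call that returns a time-limited download URL."""
    raise NotImplementedError

def frameio_update_metadata(asset_id: str, fields: dict) -> None:
    """Placeholder for writing custom metadata fields back to Frame.io."""
    raise NotImplementedError

def index_asset(asset_id: str, index_id: str) -> str:
    video_url = frameio_get_download_url(asset_id)                  # Step 3

    # Step 4: create an indexing task that points TwelveLabs at the download URL.
    resp = requests.post(
        f"{TL_BASE}/tasks",
        headers=TL_HEADERS,
        data={"index_id": index_id, "video_url": video_url},        # assumed parameter names
    )
    resp.raise_for_status()
    task_id = resp.json()["_id"]                                    # assumed task identifier field

    # Step 5: poll until the indexing task reaches a terminal state.
    while True:
        status = requests.get(f"{TL_BASE}/tasks/{task_id}", headers=TL_HEADERS).json()
        if status.get("status") in ("ready", "failed", "error"):
            break
        time.sleep(10)

    # Step 6: write the video_id and final status back to Frame.io custom fields.
    frameio_update_metadata(asset_id, {
        "TwelveLabs Video ID": status.get("video_id", ""),
        "TwelveLabs Status": status.get("status", "unknown"),
    })
    return status.get("status", "unknown")
```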


3.2 - Metadata Generation Workflow

The metadata generation workflow retrieves TwelveLabs video IDs from Frame.io metadata, then calls the Analyze API using prompts stored in an external configuration system. This approach allows non-technical teams to modify prompts without changing workflow code, enabling content managers, compliance officers, and brand teams to customize metadata fields based on evolving organizational needs.

The metadata generation workflow diagram above illustrates the complete process:

  1. Step 1: Frame.io Metadata Generation Triggered — When a user triggers the metadata generation custom action, Frame.io sends a webhook to the workflow orchestrator. This action can be triggered manually after indexing completes, or configured to run automatically as part of an automated workflow.

  2. Step 2: Webhook → Workflow Orchestrator — The orchestrator receives the webhook event and identifies which assets require metadata generation. It then queries the Frame.io API to retrieve the TwelveLabs video_id that was stored during the indexing workflow.

  3. Step 3: Get video_id from Frame.io API (Read Metadata) — The orchestrator calls the Frame.io API to read custom metadata fields containing the video_id. This identifier is essential for subsequent TwelveLabs API calls, as it references the indexed video content stored in the TwelveLabs platform.

  4. Step 4: Load Prompts from Config System — Rather than hardcoding prompts directly into the workflow logic, the orchestrator retrieves prompt templates from an external configuration system. These prompts define what metadata to generate (e.g., "Generate a concise summary," "List all visible objects and people," "Identify the emotional tone") and specify the output format, often using JSON schemas to ensure structured responses. This separation of configuration from code enables business users to iterate on metadata strategies without developer involvement.

  5. Step 5: Analyze Request with video_id and Prompts → TwelveLabs Analyze API — The orchestrator sends an analysis request to the TwelveLabs Analyze API, providing the video_id and the configured prompts. Pegasus processes the indexed video content and generates metadata based on the prompt instructions. For example, a prompt might request a JSON response containing { "title": "...", "description": "...", "tags": [...], "genre": "...", "mood": "..." }. Pegasus comprehends objects, people, actions, events, and their relationships in video, then assigns appropriate classes and metadata according to the prompt's specifications.

  6. Step 6: Generated Text → Workflow Orchestrator — The TwelveLabs API returns the generated metadata as structured text (often JSON). The orchestrator receives this response and prepares it for integration with Frame.io.

  7. Step 7: Parse & Format, Batch Update → Frame.io API (Write Metadata) — The workflow parses the Pegasus-generated responses and formats them to match Frame.io field requirements, including character limits, data types, and field naming conventions. Updates are batched where possible using Frame.io's batch metadata endpoint, reducing API overhead and improving performance when processing multiple assets. Requests are queued and throttled to respect API rate limits, implementing strategies such as progressive rate limiting, request distribution, and adaptive inter-batch cooldowns. Frame.io V4's API uses a "leaky bucket" algorithm for rate limiting, where limits refresh gradually during their allotted time window, requiring the orchestrator to carefully manage request pacing.

Progress is tracked through Frame.io status fields, with the workflow continuing to process remaining assets even if individual updates fail. Error handling ensures that transient API failures (such as rate limit exceeded responses) trigger retries with exponential backoff, while permanent errors are logged for manual review. This resilient design allows large-scale batch processing to complete successfully despite occasional API hiccups.
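
A small sketch of that retry policy, assuming a generic batch-update endpoint and a standard Retry-After header, might look like this:

```python
import time
import requests

def update_with_backoff(url: str, payload: dict, token: str, max_retries: int = 5) -> requests.Response:
    delay = 1.0
    for _ in range(max_retries):
        resp = requests.post(url, json=payload, headers={"Authorization": f"Bearer {token}"})
        if resp.status_code == 429:                               # rate limit exceeded: back off and retry
            time.sleep(max(float(resp.headers.get("Retry-After", delay)), delay))
            delay *= 2                                            # exponential backoff
            continue
        if resp.status_code >= 500:                               # transient server error: retry
            time.sleep(delay)
            delay *= 2
            continue
        resp.raise_for_status()                                   # permanent errors surface for manual review
        return resp
    raise RuntimeError(f"update failed after {max_retries} attempts: {url}")
```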

The entire process transforms raw video content into richly tagged, searchable assets without requiring manual metadata entry, enabling teams to manage thousands of videos with consistent, AI-generated metadata that aligns with organizational standards.


3.3 - Semantic Search Workflow

Users initiate semantic search by entering a natural language query through Frame.io's modal interface. The workflow queries the TwelveLabs index, searching across all indexed videos within the scope to find segments matching the query. Results are returned with precise timestamps indicating where relevant content appears in each video.

The semantic search workflow diagram above illustrates the complete process:

  1. Step 1: Frame.io Semantic Search Triggered — When a user triggers the semantic search custom action, Frame.io presents a modal interface where users enter their natural language query. For example, queries like "wide aerial drone shot of rocky coastline meeting the ocean" or "people shaking hands in an office" are processed as conversational descriptions rather than exact keyword matches.

  2. Step 2: Webhook + User Enters Query → Workflow Orchestrator — Frame.io sends a webhook to the workflow orchestrator containing both the trigger event details and the user's search query. The orchestrator prepares to execute the search across the TwelveLabs index.

  3. Step 3: Search Request → TwelveLabs Search API — The orchestrator submits the natural language query to the TwelveLabs Search API. TwelveLabs' Marengo embedding model converts the text query into a multimodal vector representation, then searches across all indexed video embeddings to find semantically similar segments. The search operates at the video segment level, identifying specific timestamp ranges where the query's semantic meaning matches the video content.

  4. Step 4: Matching Segments + Timestamps → Workflow Orchestrator — The TwelveLabs Search API returns a ranked list of matching video segments, each with precise start and end timestamps (e.g., "0.00s—6.50s," "6.50s—12.75s"). Each result includes the video_id, segment boundaries, and a relevance score indicating how closely the segment matches the query.

  5. Step 5: Get Asset FPS → Frame.io API — Before creating timeline comments, the orchestrator retrieves each matching video's frames-per-second (FPS) metadata from Frame.io. This metadata is essential for converting TwelveLabs' second-based timestamps into Frame.io's frame-accurate timeline positions. Different videos may have different frame rates (24fps, 30fps, 60fps, etc.), so the orchestrator must query this information to ensure comments appear at the exact correct frames.

  6. Step 6: FPS Metadata → Workflow Orchestrator — Frame.io returns the FPS information for each asset. The orchestrator uses this data to calculate frame-accurate positions for timeline comments.

  7. Step 7: Create Subfolder, Copy Assets, Post Comments → Frame.io API — The orchestrator performs several Frame.io API operations to organize and present search results:

    1. Create subfolder: A dedicated subfolder is created at the trigger location to organize search results. This folder is typically named with a timestamp and search query description (e.g., "TL_Search_wide_aerial_drone_shot_2025-11-07").

    2. Copy assets: For each video containing matches, the workflow copies the asset into this subfolder. This consolidates all relevant footage in one location, making it easy for users to review search results without navigating through the original project hierarchy.

    3. Post comments: Timeline comments are created at the exact timestamps where relevant moments occur. Each comment includes the timestamp range, relevance ranking, and the original search query, allowing users to understand why each segment was matched (e.g., "Content similar to: wide aerial drone shot of rocky coastline meeting the ocean | 5.25s—10.50s Rank 9"). Users can click these comments to jump directly to the specific frames that match their query.

  8. Step 8: Return Form ← Frame.io — Once the subfolder is created, assets are copied, and comments are posted, the orchestrator sends a completion notification back to Frame.io. The user receives confirmation that their search is complete, along with a link to the newly created results subfolder.

This workflow allows users to review all search results in one place and click directly to the specific moments that match their query, dramatically reducing the time spent manually searching through footage. Instead of scrubbing through hours of video or relying on incomplete metadata, creative teams can instantly locate relevant shots using natural language descriptions. The subfolder organization ensures that search results remain accessible for future reference, while timestamped comments provide immediate navigation to the exact frames of interest.
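
The timestamp math behind steps 5 through 7 is simple but easy to get wrong, so here is a small sketch of the seconds-to-frames conversion and the comment format; the comment payload fields are assumed for illustration.

```python
def seconds_to_frame(seconds: float, fps: float) -> int:
    """Convert a second-based offset into a frame index for Frame.io's timeline."""
    return round(seconds * fps)

def build_comment(query: str, start_s: float, end_s: float, rank: int, fps: float) -> dict:
    return {
        "text": f"Content similar to: {query} | {start_s:.2f}s-{end_s:.2f}s Rank {rank}",
        "frame": seconds_to_frame(start_s, fps),                  # anchor the comment at the segment start
    }

# Example: a match starting at 6.50s in 24 fps footage anchors at frame 156.
print(build_comment("wide aerial drone shot of rocky coastline", 6.50, 12.75, 9, 24.0))
```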


3.4 - Search Related Content Workflow

When triggered from a still image, the workflow performs image-to-video search using TwelveLabs' semantic understanding capabilities. The image is sent to the Search API, which returns videos containing contextually similar visual elements based on the meaning and context of the image, not just pixel-level similarity.

The search related content workflow diagram above illustrates the complete process:

  1. Step 1: Frame.io Related Content Triggered — When a user triggers the related content custom action from a still image in Frame.io, a webhook is sent to the workflow orchestrator. This action can be initiated from any image asset within Frame.io, whether it's a reference photo, a production still, or a frame extracted from existing footage.

  2. Step 2: Webhook → Workflow Orchestrator — The orchestrator receives the webhook event containing the source image's asset ID and metadata. This triggers the image-to-video search workflow.

  3. Step 3: Get Image URL → Frame.io API — The orchestrator calls the Frame.io API to retrieve a download URL for the source image. This URL provides temporary access to the image file with proper authentication.

  4. Step 4: Image Download URL → Workflow Orchestrator — Frame.io returns the image download URL, which the orchestrator uses to retrieve the image for analysis. The image is then prepared for submission to TwelveLabs.

  5. Step 5: Image-to-Video Search → TwelveLabs Search API — The orchestrator sends the image to the TwelveLabs Search API. TwelveLabs' Marengo embedding model generates a vector representation of the image, capturing its visual features, composition, objects, setting, and semantic context. This embedding is then compared against all indexed video embeddings to find segments with high visual and semantic similarity. Unlike simple pixel matching or color histogram comparison, this approach understands what's actually in the image—for example, recognizing "a person in an outdoor athletic pose" rather than just matching pixel patterns.

  6. Step 6: Similar Videos + Scores → Workflow Orchestrator — The TwelveLabs Search API returns a ranked list of matching video segments along with similarity scores. Each result includes the video_id, timestamp range, and a confidence score indicating how closely the video segment matches the source image. Results are filtered by similarity threshold to ensure only high-quality matches are included. Organizations can configure this threshold based on their use case—tighter thresholds for precise shot matching, looser thresholds for broader thematic discovery.

  7. Step 7: Filter & Rank, Create Subfolder, Copy Assets, Add Comments → Frame.io API — The orchestrator performs several Frame.io API operations to organize and present related content results:

    1. Filter & Rank: Results are filtered by the configured similarity threshold and ranked by relevance score. Only videos exceeding the minimum similarity threshold are included in the results, ensuring users receive high-quality matches.

    2. Create subfolder: A dedicated subfolder is created at the trigger location to organize related content results. This folder is typically named with descriptive information that includes the source image name and timestamp (e.g., "TL_ImageSearch_pexels-chevanon-317157_2025-11-07").

    3. Copy assets: Each related video asset is copied into the results folder. This consolidates all visually similar footage in one location, making it easy for users to review matches without navigating through the original project hierarchy.

    4. Add comments: Timeline comments are created explaining the similarity match and relevance scores. Each comment includes the timestamp range where the visual similarity occurs, the similarity rank (e.g., "Rank 9," "Rank 8"), and a reference to the source image (e.g., "Content similar to: pexels-chevanon-317157 | 5.25s—10.50s Rank 9"). These annotations help users understand why each video was matched and prioritize which results to review first.

Results are filtered by similarity threshold and organized into a subfolder with descriptive naming that includes the source image name and timestamp. Each related asset is copied to the results folder with comments explaining the similarity match and relevance scores. This workflow enables creative teams to instantly discover visually similar footage, find alternative takes, build thematic compilations, or locate archived content that matches current production needs—all without manually browsing through thousands of assets.
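
A minimal sketch of the filter-and-rank stage from step 7 is shown below; the result field names and the default threshold are assumptions for illustration.

```python
def filter_and_rank(results: list[dict], min_score: float = 0.7) -> list[dict]:
    """Keep matches above the similarity threshold, highest-scoring first."""
    kept = [r for r in results if r.get("score", 0.0) >= min_score]
    return sorted(kept, key=lambda r: r["score"], reverse=True)

# Tighter thresholds (e.g., 0.85) suit precise shot matching; looser ones (e.g., 0.6)
# suit broader thematic discovery.
matches = filter_and_rank([
    {"video_id": "a", "start": 5.25, "end": 10.50, "score": 0.91},
    {"video_id": "b", "start": 0.00, "end": 6.50, "score": 0.55},
])
print(matches)  # only video "a" survives the default 0.7 threshold
```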


3.5 - Compliance Actions Workflow

The compliance workflow sends the video to Pegasus with a detailed prompt specifying evaluation categories (violence, language, sexual content, substance use, disturbing content, discrimination) and requesting timestamped violations. This structured approach enables automated detection of compliance issues that would otherwise require hours of manual review.

The compliance actions workflow diagram above illustrates the complete process:

  1. Step 1: Frame.io Compliance Check Triggered — When a user triggers the compliance check custom action, Frame.io sends a webhook to the workflow orchestrator. This action can be triggered manually for on-demand compliance review or configured to run automatically as part of pre-publication workflows.

  2. Step 2: Webhook → Workflow Orchestrator — The orchestrator receives the webhook event containing the asset details and prepares to execute the compliance analysis. It identifies which video requires compliance checking and initiates the workflow.

  3. Step 3: Get video_id → Frame.io API (Read Metadata) — The orchestrator queries the Frame.io API to retrieve the TwelveLabs video_id that was stored during the indexing workflow. This identifier is required to reference the indexed video content in the TwelveLabs platform.

  4. Step 4: video_id → Workflow Orchestrator — Frame.io returns the video_id, which the orchestrator uses to construct the compliance analysis request. The orchestrator also loads the compliance prompt from the configuration system, defining specific evaluation criteria and output format.

  5. Step 5: Analyze with Compliance Prompt → TwelveLabs Analyze API — The orchestrator sends an analysis request to the TwelveLabs Analyze API, providing the video_id and a detailed compliance prompt. The prompt specifies evaluation categories such as violence, explicit language, sexual content, substance use, disturbing imagery, and discrimination. It also requests structured output including an overall compliance status (APPROVED/REJECTED/NEEDS_REVIEW) and timestamped violations for each detected issue. Pegasus processes the video content, analyzing visual elements, spoken dialogue, on-screen text, and contextual meaning to identify compliance violations. For example, it can detect violent actions ("soldier firing rocket launcher"), explicit language in audio, or discriminatory imagery, providing timestamps for each violation.

  6. Step 6: Status + Violations + Timestamps → Workflow Orchestrator — The TwelveLabs Analyze API returns the compliance analysis results. The response includes the overall compliance status, a list of individual violations, and precise timestamps indicating where each violation occurs in the video. For example: { "status": "REJECTED", "violations": [{ "timestamp": "0:00", "category": "violence", "description": "Soldier firing rocket launcher" }, { "timestamp": "0:02", "category": "violence", "description": "Explosion in field" }] }.

  7. Step 7: Get Asset FPS → Frame.io API — Before creating timeline comments, the orchestrator retrieves the video's frames-per-second (FPS) metadata from Frame.io. This information is essential for converting second-based timestamps from TwelveLabs into frame-accurate positions in Frame.io's timeline.

  8. Step 8: FPS Metadata → Workflow Orchestrator — Frame.io returns the FPS metadata, which the orchestrator uses to calculate exact frame positions for each violation comment.

  9. Step 9: Update Status Field, Create Timeline Comments → Frame.io API — The orchestrator performs two Frame.io API operations to document the compliance results:

    1. Update Status Field: The overall compliance status (APPROVED/REJECTED/NEEDS_REVIEW) is written to a Frame.io dropdown metadata field. This provides immediate visual feedback on the asset's compliance status, allowing teams to filter and prioritize review workflows.

    2. Create Timeline Comments: For each detected violation, the orchestrator creates a Frame.io timeline comment at the exact frame position where the issue occurs. Each comment includes the timestamp, violation category, and a detailed description (e.g., "Compliance REJECTED at 0:00: Violence: Soldier firing rocket launcher"). This allows reviewers to address each compliance issue individually within Frame.io's existing review interface, alongside creative feedback from the rest of the team.

The workflow then processes the response by extracting the overall compliance status (APPROVED/REJECTED/NEEDS_REVIEW), updating a Frame.io dropdown field with the appropriate status, parsing individual violations with their timestamps, and creating Frame.io timeline comments at exact frame positions for each violation. This approach allows reviewers to address each compliance issue individually within Frame.io's existing review interface, alongside creative feedback from the rest of the team.
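
A small sketch of that parsing step is shown below, with the Frame.io helpers left as hypothetical placeholders and the m:ss timestamp format assumed from the example response in step 6.

```python
def parse_timestamp(ts: str) -> float:
    """Convert an 'm:ss' timestamp string into seconds."""
    minutes, seconds = ts.split(":")
    return int(minutes) * 60 + int(seconds)

def set_status_field(asset_id: str, status: str) -> None: ...        # hypothetical Frame.io helper
def post_timeline_comment(asset_id: str, frame: int, text: str) -> None: ...  # hypothetical Frame.io helper

def apply_compliance_result(asset_id: str, result: dict, fps: float) -> None:
    set_status_field(asset_id, result["status"])                      # update the dropdown field
    for v in result.get("violations", []):
        frame = round(parse_timestamp(v["timestamp"]) * fps)          # convert seconds to a frame position
        post_timeline_comment(
            asset_id,
            frame=frame,
            text=f'Compliance {result["status"]} at {v["timestamp"]}: '
                 f'{v["category"].capitalize()}: {v["description"]}',
        )

# Example input mirroring the response shape shown in step 6:
apply_compliance_result(
    "asset_123",
    {"status": "REJECTED",
     "violations": [{"timestamp": "0:00", "category": "violence",
                     "description": "Soldier firing rocket launcher"}]},
    fps=24.0,
)
```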

By automating the detection process and integrating findings directly into Frame.io's collaborative review workflow, organizations can review content faster, reduce compliance risk, maintain consistent standards across all published materials, and ensure that compliance checks don't create workflow bottlenecks. Compliance officers can focus their attention on addressing flagged violations rather than manually screening hours of footage, while creative teams receive compliance feedback in the same interface where they handle all other review comments.


4 - Bringing It All Together: Transform Your Video Workflows Today

The TwelveLabs × Frame.io integration demonstrates how advanced video AI can seamlessly embed into production workflows that creative teams already trust. By combining Frame.io's industry-leading collaborative review platform with TwelveLabs' Pegasus and Marengo video foundation models, Media & Entertainment organizations gain powerful capabilities that were previously impossible or prohibitively expensive.


Real-World Impact Across M&E Workflows

Throughout this post, we've seen how this integration delivers measurable value across every stage of the content lifecycle:

  • Production & Post-Production Teams can instantly locate specific shots using natural language search ("wide aerial drone shot of rocky coastline"), eliminating hours of manual footage review. Semantic search with timestamped results means editors jump directly to the exact frames they need, accelerating assembly and reducing time-to-delivery.

  • Creative Libraries & Asset Management benefit from automated metadata generation that populates Frame.io fields with rich descriptions, tags, themes, and summaries—all generated by AI that understands narrative context, not just object detection. Image-to-video search enables teams to find visually similar footage for B-roll, alternate takes, or thematic compilations, unlocking the value of archived content that would otherwise remain undiscovered.

  • Compliance & Brand Teams can automate regulatory review processes, with AI-powered analysis flagging potential violations at exact timestamps and writing them directly into Frame.io's comment system. This transforms compliance from a bottleneck into a parallel workflow, where reviewers address issues alongside creative feedback without switching tools.

  • Broadcast Networks & Advertising Agencies managing thousands of assets across multiple campaigns gain consistent, scalable metadata practices and compliance checks that ensure brand guidelines and regulatory standards are met before publication. The integration fits naturally into existing Frame.io review cycles, preserving established workflows while adding AI intelligence.


Key Technical Innovations

The integration leverages Frame.io V4's Custom Actions and flexible metadata system to create a seamless user experience. Webhook-driven workflows orchestrate complex multi-step processes—indexing, metadata generation, semantic search, and compliance checking—without requiring users to leave the Frame.io interface. Configurable prompts, batch processing, rate limiting, and error handling ensure the system operates reliably at enterprise scale.

TwelveLabs' multimodal video understanding goes beyond traditional computer vision by analyzing visual content, spoken dialogue, on-screen text, and contextual meaning simultaneously. This enables semantic search that understands what users are asking for, not just keyword matches, and generates metadata that captures the narrative essence of video content.


Getting Started

Ready to bring multimodal video intelligence into your Frame.io workflows? The TwelveLabs × Frame.io integration is designed for organizations that need intelligent content management at scale:

Prerequisites:

  • Frame.io V4 account with enterprise features

  • Access to Frame.io Custom Actions and custom metadata fields

  • TwelveLabs API access for indexing and analysis

Enablement Path:

Organizations interested in deploying this integration should contact TwelveLabs to discuss implementation, configuration, and training. Our team will work with you to customize workflows, configure compliance prompts, and integrate the system with your existing production infrastructure. Reach out to brice@twelvelabs.io for hands-on support.


The future of video content management isn't about replacing human creativity—it's about eliminating the tedious manual work that prevents creative teams from focusing on what they do best. With TwelveLabs and Frame.io working together, your organization can search vast libraries instantly, maintain consistent metadata at scale, ensure compliance automatically, and discover connections between assets that would otherwise remain hidden.