
Marengo 3.0: Video Intelligence Turns Video Into Strategic Assets

Travis Couture


Nov 30, 2025

5 minutes


Video is 90% of the world's data, but most of it is effectively invisible. At TwelveLabs, we're focused on building the best video intelligence in the world: not video as a bolt-on feature, not video from forked image models, but explicit, purpose-built, production-grade video intelligence that actually works for the most consequential video workloads.

Today, we're launching Marengo 3.0, which sets a new standard as the highest-performing video foundation model that's also production-ready for real-world deployment.

The Performance Gap Is Massive

For starters, most embedding models fail to handle meaningful video workloads. We tested Marengo 3.0 against embedding models that can actually handle video, Amazon Nova and Google Vertex, across comprehensive benchmarks spanning video, image, text, and audio retrieval. The results aren't close:

Marengo 3.0 handily wins across categories, with an overall composite score of 78.5% (vs. 61.8% for Nova and 50.2% for Vertex) and clear leads in video, image, and audio retrieval.

It's easy for model teams to cherry-pick data to tell you they're the best. But here's what matters for production: Marengo 3.0 delivers superior accuracy at far lower latency, while competitors fail outright or crawl.

  • Amazon Nova: 10-15× slower on long videos, fails completely on 4K content

  • Google Vertex: Fails on videos over 60 seconds, no audio support

This isn't just better benchmarks for ML researchers to argue about; it's the difference between a model that works in production and one that doesn't.

Smarter. Faster. Leaner.

Industry-First Capabilities:

  • Composed multimodal search: Combine image + text in a single query (see the sketch after this list)

  • Entity search: Define any person or object and search for them performing specific actions

  • 4-hour video support: 2× increase over Marengo 2.7, while competitors fail on long content

  • 50% storage reduction: 512-dimension embeddings vs. 1024-dimension in Marengo 2.7 and 3072-dimension in Nova

  • 2× faster indexing: Get your video libraries searchable in half the time

Built for Real-World Video

Unlike competitors that shoehorn video into forked variants of broader models, Marengo 3.0 treats video as a living, dynamic system. It follows dialogue, hears audio, tracks motion through time, and maintains context across hours of content.

The proof is in production performance:

  • Sports, Media & Entertainment: One customer reduced content preparation from days to minutes

  • Security and Government: Sensitive and critical video data understood with precision and speed

  • Advertising: Precise brand safety and contextual ad placement without manual review

Get Started

Marengo 3.0 is available now. Whether you're processing millions of videos or building your first video-powered application, it delivers the intelligence and performance to turn video from a storage burden into a strategic asset. Get started with either:

  • Amazon Bedrock: Enterprise-grade integration with AWS infrastructure

  • TwelveLabs SaaS: Developer-friendly APIs with Python and Node.js SDKs (a minimal quick-start sketch follows below)

If you'd like to learn more, see our technical blog for a deep dive on all our research.
