Partnerships
Building Semantic Video Recommendations with TwelveLabs and LanceDB


James Le
By combining TwelveLabs, LanceDB, and Geneva you can build a recommendation system that understands video content directly. TwelveLabs provides embeddings and summaries that capture meaning beyond keywords. LanceDB is simple to use, runs as an embedded database, and stores multimodal data like video, images, text, and vectors. Geneva with Ray lets you scale from local development to distributed clusters with the same code.


Sep 1, 2025
5 Minutes
Note: The examples in this post use trimmed sample code for readability. If you want the complete code in a runnable notebook, you can find it linked under “Try it out” at the end of this post.
Most recommendation systems rely on metadata like titles, tags, or transcripts. That approach works, but it misses what is actually happening inside a video. What if your system could understand the visual and audio content itself?
In this post we will show how to build a semantic recommendation engine using TwelveLabs, LanceDB, and Geneva, LanceDB’s feature engineering package. In this scenario, TwelveLabs provides powerful multimodal embeddings that represent the meaning of a video. LanceDB stores those embeddings with metadata and gives you fast vector search through a simple Python API. Geneva builds on LanceDB to scale the pipeline across clusters using Ray, which means the exact same code can run on your laptop or on hundreds of machines.
Why TwelveLabs, LanceDB, and Geneva?
TwelveLabs lets you embed video in a way that captures narrative flow, mood, and action. Queries like “a surfer riding a wave at sunset” can return matches even if no one tagged the clip that way.
LanceDB is a vector database built on Apache Arrow. It has three key strengths:
A simple Python API that feels natural for developers.
An embedded database that runs locally with no external services required.
Native multimodal support, so it can store video, images, text, and vectors with equal ease.
Geneva is built on LanceDB and adds distributed data processing. With Ray underneath, it scales embedding generation and queries across many workers.
This combination covers the full pipeline: ingest, embed, store, search, and scale.
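If you want to follow along, everything here installs from pip. A typical setup, with the caveat that the geneva package name is an assumption; the docs linked at the end have the authoritative install commands:

pip install lancedb geneva twelvelabs datasets numpy pandas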
Loading and Materializing Videos
We start with a sample dataset from Hugging Face called FineVideo. The loader creates a RecordBatch with raw video bytes plus captions, titles, IDs, and metadata.
import pyarrow as pa
from datasets import load_dataset

def load_videos():
    # Stream FineVideo so the full dataset is never downloaded;
    # this demo takes just the first 10 rows.
    dataset = load_dataset("HuggingFaceFV/finevideo", split="train", streaming=True)
    batch = []
    processed = 0
    for row in dataset:
        if processed >= 10:
            break
        video_bytes = row['mp4']
        json_metadata = row['json']
        batch.append({
            "video": video_bytes,
            "caption": json_metadata.get("youtube_title", "No description"),
            "youtube_title": json_metadata.get("youtube_title", ""),
            "video_id": f"video_{processed}",
            "duration": json_metadata.get("duration_seconds", 0),
            "resolution": json_metadata.get("resolution", ""),
        })
        processed += 1
    return pa.RecordBatch.from_pylist(batch)
This creates a table that holds both raw video bytes and human-readable metadata. The benefit is that you do not need to separate video data from structured data — LanceDB handles both seamlessly.
With Geneva, we materialize this dataset into a table backed by LanceDB.
import geneva

# Materialize the RecordBatch into an embedded, LanceDB-backed table.
db = geneva.connect("/content/quickstart/")
tbl = db.create_table("videos", load_videos(), mode="overwrite")
At this point we have an embedded database of videos. Even before embeddings are added, this structure makes it easy to run queries, transformations, or visualizations.
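Even at this stage you can pull the table into pandas for a quick sanity check. A minimal sketch, opening the same table through the LanceDB client just as the search example later in this post does:

import lancedb

# Open the table Geneva materialized and peek at the metadata columns.
lance_db = lancedb.connect("/content/quickstart/")
lance_tbl = lance_db.open_table("videos")

df = lance_tbl.to_pandas()
print(df[["video_id", "caption", "duration", "resolution"]])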
Embedding Videos with TwelveLabs
The next step is generating embeddings with TwelveLabs’ Marengo 2.7 model, which outputs 1024-dimensional vectors representing video meaning. We request both “clip” and “video” scopes, then keep the video-scope segment as the single whole-video embedding.
import numpy as np

task = client.embed.tasks.create(
    model_name="Marengo-retrieval-2.7",
    video_file=video_file,
    video_embedding_scope=["clip", "video"],
)

# Embedding runs as an asynchronous task; block until it completes.
status = client.embed.tasks.wait_for_done(task.id)
result = client.embed.tasks.retrieve(task.id)

# Keep the single "video"-scope segment: one vector for the whole video.
video_segments = [
    seg for seg in result.video_embedding.segments
    if seg.embedding_scope == "video"
]
embedding_array = np.array(video_segments[0].float_, dtype=np.float32)
The clip scope yields per-segment vectors while the video scope yields one vector for the entire video, so requesting both captures the full context. The resulting vector encodes patterns in visuals and sound, so similar activities cluster together even when the metadata is sparse.
With Geneva, embeddings become another column in the table:
import os

# GenVideoEmbeddings is the embedding UDF from the notebook (sketched below).
tbl.add_columns({
    "embedding": GenVideoEmbeddings(
        twelve_labs_api_key=os.environ['TWELVE_LABS_API_KEY']
    )
})
tbl.backfill("embedding", concurrency=1)
The backfill call processes all rows and computes embeddings. In development we set concurrency to 1, but in production Geneva can run with high concurrency and let Ray parallelize across workers. That is how you scale from a dozen videos to millions.
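GenVideoEmbeddings is defined in the full notebook. As a rough sketch, it is a callable class that wraps the embed-task flow shown above so Geneva can apply it to each row’s video bytes. The body below is an illustrative assumption rather than the notebook’s exact code, and Geneva may additionally require a UDF decorator or a declared output type:

import tempfile
import numpy as np
from twelvelabs import TwelveLabs

class GenVideoEmbeddings:
    """Callable UDF: video bytes in, 1024-dim Marengo embedding out."""

    def __init__(self, twelve_labs_api_key: str):
        self.client = TwelveLabs(api_key=twelve_labs_api_key)

    def __call__(self, video: bytes) -> np.ndarray:
        # Write the bytes to a temp file, then run the same
        # embed-task flow shown earlier in this post.
        with tempfile.NamedTemporaryFile(suffix=".mp4") as f:
            f.write(video)
            f.flush()
            task = self.client.embed.tasks.create(
                model_name="Marengo-retrieval-2.7",
                video_file=f.name,
                video_embedding_scope=["clip", "video"],
            )
            self.client.embed.tasks.wait_for_done(task.id)
            result = self.client.embed.tasks.retrieve(task.id)
        video_segs = [s for s in result.video_embedding.segments
                      if s.embedding_scope == "video"]
        return np.array(video_segs[0].float_, dtype=np.float32)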
Searching with LanceDB
Once embeddings are stored, LanceDB gives you a clean API for vector search. A query can be plain text: TwelveLabs embeds it into the same vector space, and LanceDB compares it against the stored video vectors.
import lancedb
import numpy as np

# Embed the text query into the same vector space as the videos.
query = "educational tutorial"
query_result = client.embed.create(
    model_name="Marengo-retrieval-2.7",
    text=query,
)
qvec = np.array(query_result.text_embedding.segments[0].float_)

# Open the table and run a cosine-similarity search.
lance_db = lancedb.connect("/content/quickstart/")
lance_tbl = lance_db.open_table("videos")
results = (lance_tbl
           .search(qvec)
           .metric("cosine")
           .limit(3)
           .to_pandas())
Because LanceDB is Arrow-native, results come back as a pandas DataFrame. This makes it simple to integrate with analysis or a web application.
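From there, presenting recommendations is plain pandas work. A small sketch, using the _distance column LanceDB attaches to vector-search results (cosine distance, so smaller means more similar):

# Print the top matches with their similarity scores.
for _, row in results.iterrows():
    print(f"{row['video_id']}: {row['caption']} (distance={row['_distance']:.3f})")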
Summarizing with Pegasus
Sometimes a vector match is not enough. TwelveLabs also offers Pegasus, which generates summaries of videos. You can attach these summaries as another column in LanceDB, making search results more understandable.
import time

# Create an index backed by Pegasus 1.2 with visual and audio options.
index = client.indexes.create(
    index_name=f"lancedb_demo_{int(time.time())}",
    models=[{
        "model_name": "pegasus1.2",
        "model_options": ["visual", "audio"],
    }],
)
This step improves the user experience by letting you display a short, human-readable summary along with each recommendation.
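The trimmed snippet above only creates the Pegasus-enabled index. The remaining steps are to upload each video to that index, request a summary, and write it back into the table. A rough sketch follows; the TwelveLabs call names here are assumptions based on the SDK’s indexing and summarize endpoints, so check the notebook for the exact version:

# Upload the video to the Pegasus index and wait for indexing to finish
# (call names are assumptions; see the notebook for the exact SDK usage).
task = client.tasks.create(index_id=index.id, video_file=video_file)
client.tasks.wait_for_done(task.id)

# Ask Pegasus for a short natural-language summary of the video.
res = client.summarize(video_id=task.video_id, type="summary")

# Write it back to LanceDB, assuming a "summary" column was added.
lance_tbl.update(
    where="video_id = 'video_0'",  # illustrative row id
    values={"summary": res.summary},
)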
Scaling with Geneva and Ray
Without Geneva you would need to manage ingestion and embedding jobs manually. That might be fine for a dozen videos, but it quickly breaks down at scale. Geneva brings declarative pipelines, and Ray executes them in parallel.
Concern | LanceDB only | With Geneva and Ray
Ingestion | Manual loaders | Declarative pipelines
Embeddings | Sequential | Parallel across many workers
Storage | Local tables | Distributed LanceDB tables
ML and analytics | Custom scripts | Built-in distributed UDFs
This means you can prototype locally and move to production on a cluster without rewriting the workflow.
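Concretely, the claim is that only the connection target and the concurrency change between environments. A hypothetical before-and-after, assuming Geneva mirrors LanceDB’s open_table and accepts object-store URIs:

import geneva

# Local prototype: embedded database on disk, one worker.
db = geneva.connect("/content/quickstart/")
tbl = db.open_table("videos")
tbl.backfill("embedding", concurrency=1)

# Production (hypothetical): object-store-backed tables, Ray fan-out.
db = geneva.connect("s3://my-bucket/videos/")  # illustrative bucket
tbl = db.open_table("videos")
tbl.backfill("embedding", concurrency=64)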
Conclusion
By combining TwelveLabs, LanceDB, and Geneva you can build a recommendation system that understands video content directly.
TwelveLabs provides embeddings and summaries that capture meaning beyond keywords.
LanceDB is simple to use, runs as an embedded database, and stores multimodal data like video, images, text, and vectors.
Geneva with Ray lets you scale from local development to distributed clusters with the same code.
This stack is a practical foundation for media platforms, education apps, or analytics tools that need semantic video recommendations at scale.
Try it out
TwelveLabs Playground – Sign up for an API key and start generating video embeddings right away: https://playground.twelvelabs.io
LanceDB Quickstart – Install LanceDB locally and try your first vector search with Python: https://lancedb.com/docs/quickstart
Geneva Documentation – Learn how to scale pipelines and run distributed embedding jobs with Ray: https://lancedb.com/docs/geneva
Complete Notebook for this Tutorial – Explore the full runnable code with all the details: https://colab.research.google.com/drive/1jZiMT1QFYGvPgrps2Vpge9CtlRKFY1L0?usp=sharing#scrollTo=046o2pt62413