James Le
Date Published
July 4, 2024
This blog post is co-authored with Ronit Orga from Phyllo.


Social media platforms have shifted from being text-heavy to video dominant. 

The reason? Videos grab attention way better. Studies show video posts get up to 10 times more engagement than text-based posts. And it's not just mindless watching – 74% of users take action (like visiting a website or making a purchase) after viewing a brand's video on Instagram.

This video revolution creates a challenge: gathering insights from videos. Companies are missing out on valuable insights because video content is exploding on YouTube, TikTok, Snapchat, and Instagram. These platforms are where users are spending their time, sharing experiences, and responding to brands, all through video.

This is where the groundbreaking collaboration between Phyllo and Twelve Labs steps in, bridging the gap and revolutionizing how we derive insights from video content.

Phyllo cuts through the data overload. Forget scraping mountains of content – Phyllo offers a smarter, more cost-effective approach. It empowers you to unlock the power of social data with customizable search and retrieval. This means you can do a targeted fetch, instead of pulling down everything and wading through endless content. Plus, Phyllo integrates seamlessly with 15+ social media platforms. 

Twelve Labs helps decode the language of videos. Our powerful video understanding platform uses multimodal foundation models to analyze and understand videos in detail. It can grasp the nuance of what’s being communicated through a video by processing and understanding the visual, audio, and textual elements inside it. This is critical in a world where video content is increasingly dominant. By making videos understandable to machines, Twelve Labs' technology can unlock hidden insights and create new opportunities for businesses to connect with their audiences.

The Collaboration

The primary objective of this collaboration is to address the current limitations in video content analysis. By combining Phyllo's data access capabilities with Twelve Labs' expertise in video understanding, the two companies aim to create a comprehensive solution that captures valuable insights from video content that previously went unnoticed.

Possible Use Cases

Insights Co-Pilots for Videos

When working with text, we have always had the convenience of search to quickly find specific occurrences. With video, however, there is the friction of manually scrubbing through timelines or, worse, playing the video at different speeds to find the needed information. Now imagine having to do this for thousands of videos.

Marketers and content creators will benefit greatly from insights co-pilots that allow them to ask questions about a set of videos and receive detailed answers. These insights can include summaries, reviews, sentiments, and references, providing a deeper understanding of audience reactions and engagement. This technology can streamline the process of extracting valuable information from video content, making it easier for professionals to make informed decisions and optimize their strategies.

Example: A marketing team at a fashion company wants to understand the overall sentiment and key takeaways from customer reviews on a new clothing line featured in YouTube videos. By using the insights co-pilots developed through this collaboration, the team can ask questions and receive detailed summaries, sentiment analysis, and references from the videos.

Product Development

Companies can leverage this technology to understand how products are used in social videos. By analyzing usage patterns, color combinations, and product placements, businesses can gain valuable insights to inform product development and marketing strategies.

Example: A consumer electronics company wants to analyze the gaps in the smartphone market. By leveraging the Phyllo and Twelve Labs collaboration, they can extract data on usage patterns, popular color combinations, and product placements from social media videos. For instance, they might discover that a particular color variant is more popular among influencers.

Converting Long-Form Content into Bite-Sized Segments

Long-form videos can be broken down into bite-sized segments, making it easier for viewers to consume content and for marketers to repurpose and distribute it across different channels. These bite-sized segments are perfect for sharing on platforms like Instagram and TikTok, where shorter content performs better, thus increasing engagement and reach.

Example: A fitness brand has a series of hour-long workout videos on YouTube or Twitch that they want to repurpose for social media shorts and reels. Using the tools from Phyllo and Twelve Labs, they can automatically segment these long videos into short, engaging clips that highlight key exercises or tips.
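
As a rough illustration of the segmentation step, here is a minimal Python sketch. It assumes the highlight timestamps are already available as (start, end) pairs in seconds. The function name `to_short_clips`, the 60-second cap, and the sample timestamps are all hypothetical and are not taken from either platform.

```python
def to_short_clips(highlights, max_seconds=60.0):
    """Split (start, end) highlight ranges into clips of at most max_seconds.

    `highlights` is a hypothetical list of (start, end) pairs in seconds;
    it is not the output format of any specific API.
    """
    if max_seconds <= 0:
        raise ValueError("max_seconds must be positive")

    clips = []
    for start, end in highlights:
        if end <= start:
            continue  # skip empty or malformed ranges
        cursor = start
        while cursor < end:
            clip_end = min(cursor + max_seconds, end)
            clips.append((cursor, clip_end))
            cursor = clip_end
    return clips


# Placeholder highlights from an hour-long workout video.
highlights = [(0.0, 45.0), (300.0, 450.0)]
clips = to_short_clips(highlights, max_seconds=60.0)
print(clips)
```

A highlight shorter than the cap passes through unchanged, and a longer one is cut into consecutive pieces. A real pipeline would probably prefer to cut on scene or sentence boundaries rather than fixed lengths.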

Influencer Insights

Identifying which influencers are using specific products in their videos can be a game-changer for brands. This collaboration will provide detailed insights into influencer activities and their impact on product visibility and consumer behavior.

This allows the brand to discover new influencer partnerships and understand the impact of their products in the influencer community, helping to tailor their marketing efforts and collaborations.

Example: A skincare brand wants to identify which influencers are using their products in their daily routines. Through the collaboration, they can analyze social media videos to find influencers who mention or use their products, even if the products are not explicitly tagged.


The collaboration leverages the unique strengths of both Phyllo and Twelve Labs to deliver these innovative solutions:

Phyllo’s Role

Phyllo facilitates access to social data by enabling customized searches and retrieval of relevant video content. This targeted approach is more cost-effective and efficient than pulling all available content.

With Phyllo's APIs, you can fetch content from multiple platforms. The endpoints you can use to fetch content are detailed below:

  • Content fetch from a specific social handle: Retrieve information about a profile's content, or about a single content item given its content URL.
  • Content fetch using keywords, hashtags, or mentions: Retrieve content items pertaining to a specific keyword, hashtag, or mention provided by the user. The returned result includes the media URL as well as information about the content, including engagement and creator details.
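
To make the targeted-fetch idea concrete, here is a minimal Python sketch of what might happen after content items have been retrieved: keep only the most-engaged items and pass their media URLs on for video analysis. The `ContentItem` record, its field names, and the engagement formula are hypothetical illustrations and do not reflect Phyllo's actual response schema.

```python
from dataclasses import dataclass


@dataclass
class ContentItem:
    # Hypothetical record shape; NOT Phyllo's actual response schema.
    media_url: str
    creator_handle: str
    platform: str
    likes: int
    comments: int


def select_media_urls(items, min_engagement=100, limit=10):
    """Return media URLs of the most-engaged items, highest first."""

    def engagement(item):
        # Assumed engagement measure: likes plus comments.
        return item.likes + item.comments

    qualifying = [item for item in items if engagement(item) >= min_engagement]
    qualifying.sort(key=engagement, reverse=True)
    return [item.media_url for item in qualifying[:limit]]


# Placeholder data standing in for a keyword or hashtag search result.
items = [
    ContentItem("https://example.com/v/1.mp4", "@creator_a", "instagram", 950, 40),
    ContentItem("https://example.com/v/2.mp4", "@creator_b", "tiktok", 30, 5),
    ContentItem("https://example.com/v/3.mp4", "@creator_c", "youtube", 4000, 310),
]
selected = select_media_urls(items, min_engagement=100)
print(selected)
```

Filtering before analysis keeps the video-understanding step focused on the content most likely to matter, in the same spirit as fetching selectively instead of pulling everything.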

Twelve Labs’ Role

Twelve Labs Video Understanding Platform uses AI to extract information from videos. The platform identifies and interprets movements, actions, objects, individuals, sounds, on-screen text, and spoken words. Built on top of our state-of-the-art multimodal foundation model optimized for videos, the platform enables you to add rich, contextual video understanding to your applications through developer-friendly APIs.

More specifically, the following key capabilities are offered for developers:

  • Deep semantic search: Find the exact moment you need within your videos using natural language queries instead of tags or metadata with the Search API.
  • Zero-shot classification: Use natural language to create your custom taxonomy, facilitating accurate and efficient video classification tailored to your unique use case with the Classify API.
  • Dynamic video-to-text generation: Capture the essence of your videos into concise summaries or custom reports with the Generate API. The platform offers built-in formats to generate the following: titles, topics, summaries, hashtags, chapters, and highlights. Additionally, you can provide a prompt detailing the content and desired output format, such as a police report, to tailor the results to your needs.
  • Powerful multimodal embeddings: Create multimodal embeddings that are contextual vector representations for your videos and texts with the Embed API. You can utilize multimodal embeddings in various downstream tasks, including but not limited to training custom multimodal models for applications such as anomaly detection, diversity sorting, sentiment analysis, and recommendations.
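
As one example of a downstream task for embeddings, here is a minimal Python sketch that ranks videos against a text query by cosine similarity. The three-dimensional vectors are hand-written placeholders, not output of the Embed API, and the helper names `cosine_similarity` and `rank_videos` are hypothetical. Real embeddings have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_videos(query_embedding, video_embeddings):
    """Return (video_id, score) pairs sorted by descending similarity."""
    scored = [
        (video_id, cosine_similarity(query_embedding, embedding))
        for video_id, embedding in video_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder vectors standing in for real video and text embeddings.
video_embeddings = {
    "workout_clip": [0.9, 0.1, 0.0],
    "skincare_review": [0.1, 0.9, 0.2],
    "phone_unboxing": [0.0, 0.2, 0.9],
}
query_embedding = [0.2, 0.8, 0.1]

ranking = rank_videos(query_embedding, video_embeddings)
for video_id, score in ranking:
    print(f"{video_id}: {score:.3f}")
```

Because video and text embeddings share one vector space, the same comparison can in principle support recommendations (video to video) as well as search (text to video).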


The collaboration between Phyllo and Twelve Labs is set to revolutionize how we derive insights from video content on social media. By making video content searchable and providing detailed insights, this partnership empowers marketers, product developers, and brands with valuable information that was previously inaccessible.

As social media continues to evolve, the need for advanced video content analysis will only grow. The Phyllo and Twelve Labs collaboration is a significant step forward, promising continued innovation in social data retrieval and video analytics. Together, they are unlocking the full potential of video insights, transforming the way we understand and engage with video content in the digital age.
