James Le
Date Published
June 13, 2024
Media and Entertainment
Multimodal AI

Last weekend, Twelve Labs hosted an exhilarating hackathon at a stunning beach house in Los Angeles, in partnership with AWS, Fireworks AI, and other sponsors. The event brought together over 50 participants, forming 15 teams, for an intense 24-hour coding marathon. The focus was on showcasing how AI is transforming the media and entertainment industry, with participants working on innovative projects that push the boundaries of video understanding.

High-Level Details

The hackathon was a vibrant gathering of AI engineers, data scientists, and entertainment professionals, all eager to explore the cutting-edge applications of multimodal AI. By enabling computers to interpret video with the same nuanced context as humans, multimodal AI is transforming the industry through video understanding, which involves analyzing visual, audio, textual, and other data types together.

The Los Angeles AI community, uniquely positioned at the intersection of Silicon Valley tech innovation and Hollywood creativity, played a crucial role in this event. With a vibrant ecosystem of AI researchers, engineers, artists, and entrepreneurs collaborating closely, LA is poised to drive the future of AI-powered media and entertainment.

Hackathon Challenges

Participants tackled four exciting challenges designed to push the limits of AI in media and entertainment:

1 - Video Editing with Johnny Harris: Teams developed an AI-powered video editing tool capable of analyzing Johnny Harris' Switzerland bunker footage and script. The goal was to automate the process of finding relevant clips and creating montages based on the script, streamlining the video editing workflow.

2 - Highlight Reel Generation with Drew Binsky: This challenge involved creating an AI tool to analyze Drew Binsky's travel footage and automatically generate engaging highlight reels. The tool aimed to capture the best moments from his adventures around the world, making it easier to share captivating travel stories.

3 - Sports Press Conference Summarization and Highlight Generation: Participants developed an AI-powered tool to automatically generate concise summaries and engaging highlight reels from sports press conference videos. This tool helps fans and media quickly grasp the key takeaways and most memorable moments without watching the entire press conference.

4 - AWS-Powered Video Q&A Chatbot with RAG: Teams worked on developing an AI-powered chatbot that can answer questions about a library of movie and TV show trailers using the Retrieval-Augmented Generation (RAG) approach. The chatbot aimed to provide an engaging and informative Q&A experience for users interested in learning more about upcoming releases.

These challenges not only tested the participants' technical skills but also their creativity and ability to innovate in the rapidly evolving field of AI in media and entertainment.

Incredible Projects Built

The hackathon was a showcase of creativity and technical prowess, with participants developing a range of innovative projects. Here are the highlights:

Winning Projects
  1. 🏆 ThirteenLabs Smart AI Editor (1st Place): This advanced video processing application uses AI to edit videos based on user-defined prompts. It can create highlight reels from sports press conferences or YouTube videos, generate video descriptions, and dub videos in different languages. It leverages various AI models to deliver high-quality video editing and transcription services. 👉 Video Demo
  2. 🏆 AISR: AI Sports Recap (2nd Place): AI Sports Recap is a Streamlit-based application that generates video highlights and textual summaries of sports press conferences from a YouTube video link. The app uses OpenAI's GPT-4o and Pegasus-1 by Twelve Labs, with Docker for integration and deployment. 👉 Video Demo
  3. 🏆 AI-Assistant Editor (3rd Place): This smart AI editor assists in the video editing process by automating repetitive tasks and providing intelligent suggestions for scene assembly. Content creators can spend less time organizing footage or creating rough cuts and focus on storytelling. 👉 Video Demo
  4. 🏆 (3rd Place): Inspired by Twelve Labs’ Jockey, this project is a GPT-4o-based tool with elaborate LLM prompts and an architecture similar to an instructor-planner. It can extract relevant portions of a full press conference based on the subjectivity or objectivity of the prompt, accept feedback on the generated output, and share results to social media platforms. 👉 Video Demo
Other Notable Projects
  1. 🎥 Cactus: Leveraging SOTA multimodal foundation models from Twelve Labs, this cutting-edge content generation platform transforms long-form YouTube videos into captivating short-form reels with minimal effort from the creator. It analyzes video content, identifies the most engaging moments, and compiles them into optimized highlight reels tailored for various social media platforms.
  2. 📽️ SportRecap: A tool that analyzes press conference videos and text prompts to detect relevant content. It outputs the timestamps for the start and end of the video sections, cuts the video, and generates highlights. 👉 Video Demo
  3. 🎬 Eddie: An AI Assistant Editor that speeds up the process of digging through dailies and simplifies scene assembly through auto-segmentation, contextual matching, and compilation. It helps editors quickly find and organize the best takes.
  4. 🌐 Infinite Jest: An advanced AI-powered chatbot designed to enhance user interaction with a library of movie and TV show trailers. It leverages Twelve Labs Embed API, vector database, and LLMs to construct a multimodal RAG workflow. 👉 Video Demo
  5. 🚀 Trailer-GPT: Similar to Infinite Jest, Trailer-GPT is an AI-powered chatbot that answers questions about a library of movie and TV show trailers, offering an engaging and informative user experience. 👉 Video Demo
  6. ⚽ AlmazingClips: This tool converts press conference videos into engaging articles, making it easier for journalists to produce written content based on video footage. 👉 Video Demo
  7. 🐱 Hello Garfield: A VR/MR immersive movie theater app featuring a chatbot concierge (like Garfield) that offers personalized movie recommendations. The concierge suggests themed snacks, recipes, and merchandise to enhance the viewing experience. AR filters allow users to "try on" costumes and decorate their virtual spaces. A shared virtual theater fosters community interaction. 👉 Video Demo

These projects not only demonstrated the participants' technical skills but also their ability to innovate and create practical solutions for the media and entertainment industry.
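
Several of the highlight tools above, SportRecap for instance, end with the same step: turning a set of relevant video sections, each with a start and end timestamp, into a list of cuts. The sketch below illustrates one way that step might look. It is not taken from any team's code. The `Segment` class, the `score` field, the `build_cut_list` function, and the sample numbers are all hypothetical, and the scores stand in for whatever relevance signal a video model would return.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: float  # seconds from the beginning of the video
    end: float    # seconds from the beginning of the video
    score: float  # hypothetical relevance score, higher is better


def build_cut_list(segments, top_k=3, gap=1.0):
    """Keep the top_k highest-scoring segments, then merge any that
    overlap or sit within `gap` seconds of each other."""
    best = sorted(segments, key=lambda s: s.score, reverse=True)[:top_k]
    best.sort(key=lambda s: s.start)  # back to timeline order

    merged = []
    for seg in best:
        if merged and seg.start - merged[-1][1] <= gap:
            # Close enough to the previous cut: extend it instead of adding a new one.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, seg.end))
        else:
            merged.append((seg.start, seg.end))
    return merged


# Made-up segments for illustration only.
segments = [
    Segment(12.0, 20.0, 0.91),
    Segment(19.5, 27.0, 0.88),
    Segment(95.0, 104.0, 0.75),
    Segment(40.0, 45.0, 0.30),
]
cut_list = build_cut_list(segments, top_k=3)
print(cut_list)  # [(12.0, 27.0), (95.0, 104.0)]
```

Merging neighboring segments before cutting avoids jumpy reels where two adjacent clips repeat the same moment. The resulting `(start, end)` pairs could then be handed to any video cutting tool.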

Thanks to Judges and Sponsors

We extend our heartfelt gratitude to the esteemed judges who dedicated their time and expertise to evaluate the incredible projects developed during the hackathon. Their insights and feedback were invaluable in recognizing the most innovative solutions. Our panel of judges included:

  • Greg Young: Head of Production and Post Technology at Prime Video & Amazon Studios
  • Vivek Gangasani: AI/ML Solutions Architect working with Generative AI startups on AWS
  • Pranav Murthy: Senior Gen AI/ML Specialist Solutions Architect at AWS, WW Tech Lead SageMaker Studio
  • Brad Boim: Senior Director, Asset Management and Post Production at NFL Media
  • Eric Peters: Director, Post Production and Media Administration at NFL Media
  • Simran Butalia: Ex-CTO at BeBop Technology
  • Rachel Joy Victor: Co-Founder at
  • Manish Maheshwari: Product Manager - Developer Experience at Twelve Labs
  • Soyoung Lee: Co-Founder and Head of Business Development at Twelve Labs

We also want to express our deep appreciation to our sponsors, whose support made this event possible:

  • AWS: AWS provided prize credits for the winning teams and offered participants access to Amazon Bedrock, a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies through a single API. This service includes a broad set of capabilities needed to build generative AI applications with security, privacy, and responsible AI.
  • Fireworks AI: Fireworks AI offers a fast, reliable platform for developers to run and tune generative AI models at scale, such as Llama 3 and Stable Diffusion 3. Participants used Fireworks to deliver delightful, responsive experiences with up to 4x lower latency than alternative solutions, without compromising on model quality.
  • Our third sponsor is a market-based cloud computing platform focused on reducing the costs and friction of compute-intensive workloads. It enables anyone to easily leverage large-scale GPU liquidity, with thousands of GPUs available.

Their contributions were instrumental in providing the resources and infrastructure needed for participants to bring their innovative ideas to life. Thank you for your unwavering support and commitment to advancing AI in media and entertainment.

Product Feedback and Innovations

The hackathon provided an invaluable opportunity for Twelve Labs to gather product feedback and showcase our latest innovations. Two key products were highlighted during the event:

  1. Embed API: This new offering from Twelve Labs allows users to create multimodal embeddings, which are contextual vector representations for videos and texts. These embeddings can be utilized in various downstream tasks, including training custom multimodal models for applications such as anomaly detection, diversity sorting, sentiment analysis, and recommendations. Additionally, they can be used to construct Retrieval-Augmented Generation (RAG) systems, enhancing the capabilities of AI applications in media and entertainment. The Embed API is currently in limited Beta and accessible exclusively to a select group of users. More information can be found in our documentation.
  2. Jockey: Jockey is an open-source conversational video agent built on top of the Twelve Labs APIs and LangGraph. It allows workloads to be allocated to the appropriate foundation models for handling complex video workflows. Large Language Models (LLMs) are used to logically plan execution steps and interact with users, while video-related tasks are passed to Twelve Labs APIs, powered by video-foundation models (VFMs), to work with video natively, without the need for intermediary representations like pre-generated captions.
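
To make the RAG use case for embeddings more concrete, here is a minimal sketch of the retrieval step: rank stored items by cosine similarity to a query vector, then place the best matches into an LLM prompt. It does not call the Embed API or any Twelve Labs SDK. The three-dimensional vectors, the trailer titles and summaries, and the helper names (`cosine_similarity`, `retrieve`, `build_prompt`) are hand-written placeholders. In a real system the vectors would presumably come from an embedding model and live in a vector database.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, index, top_k=2):
    """Return the top_k index entries most similar to the query vector."""
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(query_vec, item["embedding"]),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, hits):
    """Assemble a grounded prompt from the retrieved entries."""
    context = "\n".join(f"- {h['title']}: {h['summary']}" for h in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Toy index: placeholder vectors stand in for real video embeddings.
index = [
    {"title": "Trailer A", "summary": "A heist crew plans one last job.",
     "embedding": [0.9, 0.1, 0.0]},
    {"title": "Trailer B", "summary": "Two friends road-trip across a desert.",
     "embedding": [0.1, 0.9, 0.1]},
    {"title": "Trailer C", "summary": "A detective chases a thief through a city.",
     "embedding": [0.8, 0.2, 0.1]},
]

query_vec = [1.0, 0.0, 0.0]  # placeholder for the embedded user question
hits = retrieve(query_vec, index, top_k=2)
prompt = build_prompt("Which trailers involve a robbery?", hits)
print(prompt)
```

The prompt would then be sent to an LLM of your choice. Keeping retrieval separate from generation makes it easy to swap the toy list for a vector database without touching the prompt logic.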

The feedback from participants was overwhelmingly positive, with many expressing excitement about the potential of these tools to revolutionize their workflows. This feedback will be instrumental in refining and improving our products to better meet the needs of our users.

A Teaser

We are excited to be involved in a massive hackathon during LA Tech Week in October. This event will be organized in conjunction with partners including the AI LA Community, and it promises to be even bigger and more exciting than our recent hackathon. Participants can look forward to new challenges, more opportunities for collaboration, and the chance to work with cutting-edge AI technologies. Stay tuned for more details, and mark your calendars for this can't-miss event!

