The Chosun Daily
Date Published
Apr 8, 2024
Generative AI
Video understanding
Multimodal AI

Twelve Labs, a South Korean generative artificial intelligence (AI) startup, made headlines last year after securing investment from U.S. tech giant Nvidia. Based in Seoul and San Francisco, the company specializes in AI technology that analyzes and understands video. Nvidia, Intel and two other companies jointly invested $10 million in Twelve Labs last October.

“Just like OpenAI’s ChatGPT pioneered the realm of text-based generative AI, Twelve Labs aims to pave the way for the advancement of video AI,” said Twelve Labs co-founder and CEO Lee Jae-sung, 30, in a video interview with the Chosunilbo on April 8.

Twelve Labs is developing a multimodal AI that understands videos. The company’s AI model analyzes all the images and sounds in a video and matches them with human language. For instance, the AI model can identify a scene with “a man holding a pen in the office” in an hour-long video within seconds.

When Lee founded Twelve Labs in 2020, the burgeoning AI market mainly focused on text or images. “AI startups were receiving astronomical funding for developing large language models like ChatGPT,” said Lee. “We believed video was a field where we could make a difference even with limited investment.”

Lee, who majored in computer science at UC Berkeley and interned at Samsung Electronics and Amazon, returned to Korea to fulfill mandatory military service. There, he met his future Twelve Labs co-founders. While serving in the Ministry of National Defense’s Cyber Operations Command, Lee and his colleagues, who were equally passionate about AI, spent time discussing research papers and exploring AI technologies, eventually starting Twelve Labs together in 2020.

“My co-founder, who was the first to finish military service, was so dedicated that he regularly visited us to study AI together,” Lee reflected. “Starting this company based on passion without worrying too much about the future turned out to be a good idea.”

Twelve Labs currently operates Pegasus, a video language foundation model that can summarize long videos into text and answer users’ questions about videos, and Marengo, a multimodal AI model that understands videos, images and audio. Over 30,000 developers and companies are using these AI models. One of the company’s most prominent partnerships is with the National Football League (NFL).

“Organizations like the NFL have amassed a treasure trove of video content that spans over a century, but monetizing such content requires advanced video search technology,” Lee said. “Companies with extensive data archives are seeking out Twelve Labs’ AI technology.”

By Park Ji-min, Lee Jae-eun

Published 2024.04.08. 16:04

