How a tiny startup beat the tech giants of the world and ranked #1 in video search
Aiden Lee
Date Published
Mar 16, 2022

Final ICCV VALUE Challenge 2021: Video Retrieval (Search) Track Result (2021.10)

One question I often got from our customers and investors was:

“How does your technology compare to Google’s or Microsoft’s?”

I’m sure what they REALLY wanted to ask is…

“Is your technology better than Google’s or Microsoft’s?”

It’s a difficult question to answer. It is even more difficult for a deep tech AI startup, especially if the founders have neither a strong publication record nor an academic background. The answer usually takes one of two routes:

  1. The bulldozer strategy: “Yes, we are better! Here is their technology’s benchmark performance and here is ours.”
    → Reaction: doubtful, questioning, and sometimes even resentful
  2. The sidestep strategy: “We provide better usability and can build out features for specific customer segments, and our customers love it!” (Talking about products and customers instead of technology)
    → Reaction: maybe persuaded, but still dissatisfied.

We always went with #2 despite having had a better benchmark performance than other companies. It gave a natural segue for us to talk more about our customers, and more importantly, we definitely did not want to argue with the folks we were trying to turn into believers in our product and vision!

As a technical founder who’s leading the AI research and product development, I was often discouraged. Each time I heard the same question repeated, again and again, I felt powerless to the point of having nervous breakdowns. I constantly repeated the word “sorry” to my team who worked day and night to build the incredible technology I knew we had.

Feeling powerless. At least the chair was comfy.

That’s when I knew we had to participate in the ICCV VALUE (Video-And-Language-Understanding-Evaluation) Challenge hosted by Microsoft. The challenge had already started two weeks earlier, but who cares? This was the perfect opportunity to prove ourselves.

There were three reasons:

  1. The challenge task, video retrieval, was spot on for us: it evaluates the performance of video search AI models.
  2. The evaluation would be objective and complete, with four different and diverse domains of benchmark video datasets.
  3. It was hosted and joined by the most prestigious AI institutions and tech giants such as Microsoft, Tencent, and Baidu, giving us the chance to directly compete against them.

If we could win the competition, there would be much to gain: credibility, branding, PR, hiring, confidence, …and most importantly, we would have a powerful, bulldozer answer to give to our customers and investors when asked something along the lines of, “Are you better than Google?”

Despite the shiny opportunities we imagined winning would bring, the odds were obviously against us.

  1. We had limited cloud GPU resources for training multiple models at the same time. We had been given $100K worth of free AWS credit upon joining Techstars and had already used up $50K, so we only had $50K to spare for the competition. For a competition of this size, $50K in compute is the same as not having compute at all.
  2. We had limited human “labor”. Our entire company consisted of fewer than 10 people. 10 people minus the non-engineers minus the engineers who had to focus on product and PoC tasks with beta customers...? That only left me with 2 engineers, and that’s including myself.
  3. We had limited datasets that we could use to train our model. Unlike the tech giants, who own practically unlimited video to pre-train their models, our only option was to use public video datasets that are available to everyone.

And so, I believed that we had less than a 10% chance of winning, but still decided to participate. Just like any startup at some point, we needed to take a leap of faith and arm ourselves with a winning mentality. As the startup saying goes, the odds will always be 0% if we don’t do anything.

Next Post: Part 2 — Nuts & Bolts of ICCV VALUE Challenge

