Author
SVG
Date Published
Apr 19, 2024
Tags
Applications
Multimodal AI
Startup

Low-code, gen AI, closed captions, and public-cloud multicast deserved more buzz

By Brian L. Ring, SVG Contributor

Friday, April 19, 2024 - 11:42 am

Although NAB 2024 was a bit light in terms of “new,” I couldn’t be more excited about what’s ahead for our industry. I had to dig a little deeper this year to find truly bold innovation on the show floor. Sure, generative AI (gen AI) was buzzed about, but there was nary a mention of multimodal, and there were no demos contrasting various LLMs (large language models). Few leaders talked about the amazing constellation and diversity of LLMs aside from OpenAI; Joe Inzerillo, EVP/chief product and technology officer, SiriusXM, was an exception. We are moving too slowly. And that’s why this is my favorite column of the year. Here are four things you may have missed at NAB 2024:

1. A Critical Category, Still Crawling: Low-Code

My top hope coming into the show was to be blown away by offerings around the next revolutionary iteration of workflow and media supply chain. Sometimes called no-code, it’s more accurately described as low-code, and it has been aspirational for a decade or more. But, this year, it finally seems to be coming to shore. I say that having just last week successfully built a Zapier workflow to create automated, data-driven ad slates and fillers, an experience that has truly blown me away.

And it’s not just low-code. It’s low-code combined with programmatic editing and torched into flames with gen AI.

After that experience, I figured NAB 2024 would be rife with vendors and hot debates about whether the agreed-upon buzzword should be no-code or low-code and about what rules would govern the interactions between LLMs as media companies began to create workflows that pit one model against another in adversarial and supervisory content workflows.

I was disappointed. Instead, I had to dig and search to find five hidden gems – several of them with amazing software but nary an instance of “broadcast-grade low-code” on exhibit signage.

Friends, this is an important trend for CTOs at major media companies. That includes Megan Mauck, SVP, media operations, NBC, who mentioned no-code on stage at the Devoncroft Executive Summit on Saturday. (It was in the context of the difficult exogenous economic factors we’ve had to endure and was referred to as a path toward reaching profitability.)

Industry scolding aside, here are, in no particular order, the five fabulous innovators that are on the right track: Tedial, Ateliere, Norsk, Qibb, and Blue Lacy. Congrats to all of you! You are pioneering a new software category that is going to be essential in helping us lift our media boats, create better TV, and restore economic health to our craft.

2. Gen AI? So Last Year

Apologies for another industry scolding, but I must say, while everyone seemed to be crowing about gen AI, I heard few mentions of the importance of “multimodal” gen AI. To be sure, this term also made the stage at NAB 2024, this time at an impressive AWS-hosted event featuring PGA Golf and the NFL. But I found only two hero vendors with demos worth writing home about.

First was Twelve Labs, not to be confused with Eleven Labs, a leader in voice cloning. Nor with Moments Lab, the bold rebrand from Newsbridge. (Why all the Labs?)

Twelve Labs has a strong website that says it all: “Understand video the way humans do.”

Pretty simple. Ridiculously powerful. Just take a pause. You’re in your office recliner. You’ve got your Vision Pro goggles on. And there’s a huge multiview in front of you, 100 channels 24/7/365.

Now pretend your job is to, say, monitor those channels. Or, let’s say, you sell CTV ads and can monitor the commercials for key brand targets to chase. Pretend you have a robot to do this task for you, and, no matter how many assignments or how complex the task, it seems to be doing a decent job.

That’s the kind of power these multimodal gen AI systems are going to have. And, oh yes, almost forgot: condensed games and next-gen highlights.

Just for context, it’s useful to know that I worked in the AI-generated-metadata business for two years, nearly eight years ago. Thuuz Sports, with its excitement-ratings IP, could fashion a stream with your chosen duration, players, and teams, all personalized to you. The company was eventually sold to Stats Perform, having lost a long, valiant, tough battle in AI 1.0 to today’s leader, WSC Sports.

(I’m excited to report that a brand-new era has been unleashed by multimodal gen AI in the domain of sports highlights. It’s not clear to me what advantage the 1.0 approach will have over the 2.0 approach.)

At least one other vendor at NAB 2024 fully “groks” gen AI, and it also had a slick sports-highlight demo that did something I’ve been dreaming about for many years. Moments Lab extracted the most meaningful comments by the on-air broadcaster at the most important times, giving an editor the ability to consume the content in an uber-productive fashion. (It seemed to me that a premium, paid super-fan user would also love this.) Sports fans want to chug their video, and Moments Lab harnesses multimodal gen AI to do that.

Equally impressive was its commitment to the space, which starts with a total rebrand and a beautiful exhibit with smart messaging, customer logos, and GTM (go-to-market) positioning painted brightly on the outside of the booth. The company used to be called Newsbridge.

Quite honestly, I’m not sure I’ve seen a growth use case with as much near-term possibility as multimodal gen AI metadata logging.

3. From Boring Utility to Kingmaker? Closed Captions

Are you sleeping yet? Don’t. Though an unsexy sector, closed captioning entangles tons of flows today, a lot of them tied to regulations that can result in big fines if not complied with. But, as gen AI hits translations, transcriptions, and dubbing, one company seems to be ahead in the craft I care about most: bold, meaningful innovation.

Caption Hub showed patent-pending technology that turns the lemons of latency into a lemonade of luxury loot. You know how ABR (adaptive bitrate) introduces 20-30 seconds of latency into streaming TV? Caption Hub takes advantage of that time to process captions in a maximal fashion, using the extra time, for instance, to ensure better quality, to align placement of the subtitles more perfectly with the action on the screen, or to drive any number of additional post-air enhancement workflows.

For example, Caption Hub enables synthetic multilingual voiceovers: not quite AI dubbing, but perhaps well-suited to many or most global distribution opportunities.

4. Multicast Routing to — and on — Public Clouds

SwXtch.io, along with its media-processing partner Cinnafilm, produces a cloud switch — an overlay fabric — that makes it easy to bring the benefits of multicast to public-cloud streaming contribution workflows. These are growing in quantity and complexity.

Today, most live feeds in cloud streaming workflows are sent and received as point-to-point, or unicast, connections. Public clouds don’t support multicast (one-to-many, or broadcast, workflows) for important network-level reasons.

However, it’s possible to create an overlay network that, given the proper characteristics, can outperform today’s point-to-point connections. This brings a multitude of benefits, including agility, simplification, and, in some cases, raw efficiencies: one feed instead of two.
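To make the "one feed instead of two" point concrete, here is a minimal back-of-the-envelope sketch in Python. It is not swXtch.io's implementation or API; the function names, the 20 Mbps feed rate, and the assumption that the overlay switch handles all replication are hypothetical, chosen only to compare how many copies of a feed the sender must emit under each model.

```python
# Illustrative comparison of sender-side load: unicast fan-out vs. an
# overlay multicast switch. All names and numbers here are hypothetical.

def unicast_sender_streams(receivers: int) -> int:
    """Point-to-point: the sender emits one copy of the feed per receiver."""
    return max(receivers, 0)

def overlay_sender_streams(receivers: int) -> int:
    """Overlay multicast (assumed model): the sender emits a single copy
    to the overlay switch, which replicates it to every receiver."""
    return 1 if receivers > 0 else 0

def sender_egress_mbps(streams: int, feed_mbps: float) -> float:
    """Total bandwidth leaving the sender for a given number of copies."""
    return streams * feed_mbps

if __name__ == "__main__":
    feed_mbps = 20.0  # assumed contribution-feed bitrate
    for receivers in (1, 2, 8):
        unicast = sender_egress_mbps(unicast_sender_streams(receivers), feed_mbps)
        overlay = sender_egress_mbps(overlay_sender_streams(receivers), feed_mbps)
        print(f"{receivers} receivers: unicast {unicast} Mbps, overlay {overlay} Mbps")
```

Under these assumptions the sender's load stays flat as receivers are added, while the replication cost moves into the overlay fabric. The sketch ignores that cost, along with any protocol overhead.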

Comments? Questions? Feedback at Brian@RingDigital.tv
