Composed retrieval doubles signal for long queries
Marengo 3.0 fuses text and visual intent into a single embedding, surfacing coherent grassy scenes in the top positions, where baselines drift.
Query text: "Replace the fallen autumn leaves covering the ground with a patch of long, unkempt green grass swaying gently in a light breeze."
Query clip (30 s)
Composed R@10: 97.0%
Text-only R@10: 90.9% (Marengo 3.0) vs 78.3% (Vertex)
Embedding size: 512d (Marengo 3.0) vs 3072d (Nova)
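R@10 (Recall@10) is commonly defined as the fraction of queries whose ground-truth (GT) clip appears among the top 10 results, and "GT at rank 1" means the GT clip is the first hit. The page does not spell out its evaluation protocol, so the following is only a minimal sketch assuming exactly one GT clip per query and a ranked list of clip IDs per query; the function names and toy IDs are hypothetical.

```python
from typing import List, Optional


def gt_rank(ranked_ids: List[str], gt_id: str) -> Optional[int]:
    """Return the 1-based rank of the ground-truth ID, or None if it was not retrieved."""
    try:
        return ranked_ids.index(gt_id) + 1
    except ValueError:
        return None


def recall_at_k(rankings: List[List[str]], gt_ids: List[str], k: int = 10) -> float:
    """Fraction of queries whose ground-truth ID appears in the top-k results.

    Assumes exactly one ground-truth item per query (hypothetical setup).
    """
    if not rankings or len(rankings) != len(gt_ids):
        raise ValueError("need one ranking per ground-truth ID")
    hits = 0
    for ranked_ids, gt_id in zip(rankings, gt_ids):
        rank = gt_rank(ranked_ids, gt_id)
        if rank is not None and rank <= k:
            hits += 1
    return hits / len(rankings)


# Toy example with made-up clip IDs (not data from the benchmark above).
rankings = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"], ["j", "k", "l"]]
gt_ids = ["a", "f", "x", "k"]
print(recall_at_k(rankings, gt_ids, k=1))  # 0.25
print(recall_at_k(rankings, gt_ids, k=3))  # 0.75
```

Under this definition, a composed R@10 of 97.0% would mean that 97 of every 100 queries have their GT clip somewhere in the top 10.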
Figure: top-5 retrieved clips for Marengo 3.0 (composed); ground truth (GT) at rank 1.
Speech retrieval stays faithful to exact utterances
Marengo 3.0 and 2.7 rank the ground-truth clip first; Nova does not retrieve speech reliably.
Query: "It is so made that everywhere we feel the sense of punishment"
Figure: retrieved clips for Marengo 3.0; ground truth (GT) at rank 1.