VLJ tracks meaning across video, outperforming CLIP in zero-shot tasks, so you get steadier captions and cleaner ...