Visual Programming Language Tutorial

Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection

Abstract: Automatic detection and prevention of open-set failures are crucial in closed-loop robotic systems. Recent studies often struggle to simultaneously identify unexpected failures reactively ...

GitHub

APPL: A Prompt Programming Language

APPL is A Prompt Programming Language that extends Python to provide a Natural, Intuitive, Convenient, and Efficient (NICE) way to utilize Large Language Models (LLMs) such as GPT in your program. We ...

IEEE

Implicit and Explicit Language Guidance for Diffusion-Based Visual Perception

Abstract: Text-to-image diffusion models have shown powerful ability on conditional image synthesis. With large-scale vision-language pre-training, diffusion models are able to generate high-quality ...

GitHub

This is the official implementation of ICLR 2024 paper "VDC: Versatile Data Cleanser based on Visual-Linguistic Inconsistency by Multimodal Large Language Models".

We find a commonality of various dirty samples is visual-linguistic inconsistency between images and associated labels. To capture the semantic inconsistency between modalities, we propose versatile ...

Microsoft

LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation

CLIP is one of the most important multimodal foundational models today. What powers CLIP’s capabilities? The rich supervision signals provided by natural language, the carrier of human knowledge, ...

Microsoft

LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation - Microsoft Research

CLIP is one of the most important multimodal foundational models today, aligning visual and textual signals into a shared feature space using a simple contrastive learning loss on large-scale ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results