The company trained Phi-4-reasoning-vision-15B mainly on open-source data. The data included images and text-based descriptions of the objects depicted in those images. Before it started training the ...
Phi-4-reasoning-vision-15B, an open-weight multimodal vision AI model designed to deliver strong math, science, document, and UI reasoning with far less training data and compute than much larger systems.
Start working toward program admission and requirements right away. Work you complete in the non-credit experience will transfer to the for-credit experience when you ...
Modern AI Models for Vision and Multimodal Understanding is a course that will enable you to understand and build systems that interpret images, text, and more—just like today’s leading AI models.
Twelve Labs Inc., the developer of generative AI foundation models that can understand videos the way humans do, said today that it has raised $50 million in early-stage ...
Explore how vision-language-action models like Helix, GR00T N1, and RT-1 are enabling robots to understand instructions and act autonomously.
“GPT-4o is especially better at vision and audio understanding compared to existing models,” OpenAI said in its announcement. During an on-stage event, Murati said GPT-4o will also have new memory ...
Sarvam has gained attention at the AI Impact Summit 2026 by unveiling its advanced AI model, Sarvam Vision, which claims ...
Canadian AI startup Cohere launched in 2019 specifically targeting the enterprise, but independent research has shown it has so far struggled to gain much of a market share among third-party ...
Chicago, Feb. 11, 2026 (GLOBE NEWSWIRE) -- The global Vision-Language Models (VLM) market was valued at USD 3.84 billion in 2025 and is projected to reach a valuation of USD 41.75 billion ...