Run Cambrian-1 Demo Online


Cambrian-1 is a family of multimodal LLMs (MLLMs) designed with a vision-centric approach. While powerful language models can enhance multimodal capabilities, the design choices for the visual components are often underexplored and disconnected from visual representation learning research.
Cambrian-1 is built around five key pillars, each of which provides important insights into the design space of MLLMs:
- Visual Representation: The research team explored various visual encoders and their combinations.
- Connector Design: The research team designed a new dynamic and spatially aware connector that integrates visual features from several models while reducing the number of tokens.
- Instruction Tuning Data: The research team curated high-quality visual instruction tuning data from public resources, emphasizing the importance of a balanced data distribution.
- Instruction Tuning Cookbook: The research team discussed instruction tuning strategies and practices.
- Benchmarks: The research team examined existing MLLM benchmarks and introduced a new vision-centric benchmark, "CV-Bench".
Cambrian-1 project website: https://cambrian-mllm.github.io/#visual-representation
Model performance
| Model | # Vis. Tok. | MMB | SQA-I | MathVistaM | ChartQA | MMVP |
|---|---|---|---|---|---|---|
| GPT-4V | UNK | 75.8 | – | 49.9 | 78.5 | 50.0 |
| Gemini-1.0 Pro | UNK | 73.6 | – | 45.2 | – | – |
| Gemini-1.5 Pro | UNK | – | – | 52.1 | 81.3 | – |
| Grok-1.5 | UNK | – | – | 52.8 | 76.1 | – |
| MM-1-8B | 144 | 72.3 | 72.6 | 35.9 | – | – |
| MM-1-30B | 144 | 75.1 | 81.0 | 39.4 | – | – |
| *Base LLM: LLaMA3-8B-Instruct* | | | | | | |
| Mini-Gemini-HD-8B | 2880 | 72.7 | 75.1 | 37.0 | 59.1 | 18.7 |
| LLaVA-NeXT-8B | 2880 | 72.1 | 72.8 | 36.3 | 69.5 | 38.7 |
| Cambrian-1-8B | 576 | 75.9 | 80.4 | 49.0 | 73.3 | 51.3 |
| *Base LLM: Vicuna1.5-13B* | | | | | | |
| Mini-Gemini-HD-13B | 2880 | 68.6 | 71.9 | 37.0 | 56.6 | 19.3 |
| LLaVA-NeXT-13B | 2880 | 70.0 | 73.5 | 35.1 | 62.2 | 36.0 |
| Cambrian-1-13B | 576 | 75.7 | 79.3 | 48.0 | 73.8 | 41.3 |
| *Base LLM: Hermes2-Yi-34B* | | | | | | |
| Mini-Gemini-HD-34B | 2880 | 80.6 | 77.7 | 43.4 | 67.6 | 37.3 |
| LLaVA-NeXT-34B | 2880 | 79.3 | 81.8 | 46.5 | 68.7 | 47.3 |
| Cambrian-1-34B | 576 | 81.4 | 85.6 | 53.2 | 75.6 | 52.7 |
Deploy the demo for inference
The model and environment are already deployed for this tutorial, so you can use the model for inference and dialogue directly by following the steps below:
1. Initial Setup
1. Open the workspace once resource configuration is complete

2. Open the terminal and enter the command `bash setup.sh`


3. After the system outputs `Environment variable added to .bashrc`, enter the command `source ~/.bashrc`
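Steps 2 and 3 amount to appending an `export` line to a shell rc file and re-sourcing it so the variable takes effect in the current session. A minimal illustration of that mechanism, using a temporary file and a hypothetical variable name (the real `setup.sh` sets its own):

```shell
# Stand-in for ~/.bashrc so we don't touch the real one here.
rc="$(mktemp)"

# What setup.sh effectively does: persist an environment variable.
# CAMBRIAN_HOME is a hypothetical name for illustration only.
echo 'export CAMBRIAN_HOME="$HOME/cambrian"' >> "$rc"

# What `source ~/.bashrc` does: load that variable into this shell.
source "$rc"
echo "$CAMBRIAN_HOME"
```

Without the `source` step, the variable would only be available in shells opened after the rc file was modified.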

2. Start the Controller
4. After initialization is complete, enter the command `bash control.sh` in the terminal

3. Open the Interface
5. Wait about 15 seconds, open a new terminal, and enter the command `bash gradio.sh`. Then click the link generated on the page to open the model interface

6. At this point there is no model to choose from in the interface, because the model has not been configured yet. Proceed to step 4 below.

4. Model Configuration
7. Open another new terminal and enter the command `bash model.sh`
When `Uvicorn running on...` appears, return to the Gradio web page you opened earlier. After refreshing, you will see that the model has been deployed. You can then upload images and enter prompts to converse with the model.
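The three helper scripts wrap a controller / web-server / model-worker trio. A sketch of what they likely run, assuming the LLaVA-style serving layout that the Cambrian-1 codebase follows (the module paths, ports, and model path here are assumptions; check the scripts themselves for the exact commands):

```shell
# control.sh: the controller that routes requests to workers (assumed layout).
python -m cambrian.serve.controller --host 0.0.0.0 --port 10000 &

# gradio.sh: the web UI, registered against the controller (assumed layout).
python -m cambrian.serve.gradio_web_server \
    --controller http://localhost:10000 --share &

# model.sh: the worker that loads the weights; the model only appears in the
# UI once this registers with the controller (model path is an assumption).
python -m cambrian.serve.model_worker \
    --controller http://localhost:10000 \
    --port 40000 --worker http://localhost:40000 \
    --model-path nyu-visionx/cambrian-8b
```

This layering explains the observed behavior in step 6: the UI comes up before any worker exists, so the model list stays empty until `model.sh` finishes loading and registers.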


The interface also exposes several generation parameters that the user can adjust:
- Temperature controls the randomness, and hence the creativity, of the output.
- Top p restricts sampling to the smallest set of candidate tokens whose cumulative probability reaches p, trading off quality against diversity.
- Max output tokens caps the length of the generated response.
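These knobs map onto standard sampling: temperature rescales the logits before the softmax, and top-p (nucleus) filtering truncates the candidate set before drawing. A minimal, dependency-free sketch of that logic (the function name and toy logits are illustrative, not the demo's internals):

```python
import math
import random

def sample_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Draw a token index using temperature scaling and top-p filtering."""
    # Temperature rescales the logits: <1.0 sharpens, >1.0 flattens.
    scaled = [l / temperature for l in logits]
    # Softmax (shifted by the max for numerical stability).
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Top-p: keep the smallest set of tokens, in descending probability,
    # whose cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # Renormalize over the kept set and draw.
    mass = sum(probs[i] for i in kept)
    r = rng.random() * mass
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]
```

With a very small top_p the kept set collapses to the single most likely token, which is why low top-p settings make the demo's answers more deterministic.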
