VLM-Lens (EMNLP 2025 System Demonstration)
This beta version runs an instruction together with up to two images through a VLM of your choice,
computes the cosine similarity between the embeddings extracted for each image at a specified layer,
and visualizes the probability distribution of the first response token for each image.
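As an illustration of the similarity step, here is a minimal sketch of cosine similarity between two embedding vectors. It is not the demo's actual code: the function name `cosine_similarity` is hypothetical, and it assumes each image's layer output has already been reduced to a single flat vector of floats (the description above does not say how that reduction is done).

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity between two equal-length vectors.

    Hypothetical helper: `a` and `b` stand in for the per-image
    embeddings taken from the selected layer.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # The angle between a zero vector and anything else is undefined.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy embeddings standing in for two images at one layer.
image_a = [0.2, 0.7, 0.1]
image_b = [0.3, 0.6, 0.2]
similarity = cosine_similarity(image_a, image_b)
```

The result lies in [-1, 1]: values near 1 mean the two embeddings point in nearly the same direction, and values near 0 mean they are close to orthogonal.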
Instructions:
- Select a VLM from the dropdown
- Select a layer from the available embedding layers
- Upload two images for comparison
- Enter your instruction/question about the images
- Adjust the number of top tokens to display (1-20)
- Click "Analyze" to see the first-token probability distributions side by side
Note: You can upload a single image if you prefer single-image analysis.
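The first-token distribution shown after clicking "Analyze" can be pictured with the sketch below, which turns a list of logits into probabilities with a softmax and keeps the top k tokens. This is an assumption about how such a view could be computed, not the demo's implementation: `top_k_first_token_probs`, `logits`, and `vocab` are hypothetical names, and the toy values are made up for illustration.

```python
import math


def top_k_first_token_probs(logits, vocab, k=5):
    """Return the k most probable (token, probability) pairs.

    Hypothetical helper: `logits` are the model's scores for the first
    response token and `vocab` holds the matching token strings.
    """
    if len(logits) != len(vocab):
        raise ValueError("logits and vocab must have the same length")
    if not 1 <= k <= 20:
        # Mirrors the 1-20 range offered in the instructions above.
        raise ValueError("k must be between 1 and 20")
    # Subtract the maximum logit before exponentiating for numerical stability.
    max_logit = max(logits)
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort tokens by probability, highest first, and keep the top k.
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy example with a three-token vocabulary.
for token, prob in top_k_first_token_probs([2.0, 1.0, 0.0], ["Yes", "No", "Maybe"], k=2):
    print(f"{token}: {prob:.3f}")
```

Running this once per image and plotting the two ranked lists next to each other gives a side-by-side comparison of the kind described above.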