VLM-Lens (EMNLP 2025 System Demonstration)

arXiv | GitHub

This beta version runs an instruction and up to two images through a selected VLM, computes the cosine similarity between the resulting embeddings at a specified layer, and visualizes the probability distribution of the first response token for each image.
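As a rough illustration of the similarity score, the sketch below computes cosine similarity between two embedding vectors in plain Python. It is not the demo's own code: the function name `cosine_similarity` and the toy vectors are hypothetical, and it assumes each image's layer output has already been reduced to a single flat vector (how the demo pools a layer's output is not stated here).

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings standing in for one layer's output per image.
emb_image_1 = [1.0, 0.0, 2.0]
emb_image_2 = [2.0, 0.0, 4.0]

similarity = cosine_similarity(emb_image_1, emb_image_2)
print(f"cosine similarity: {similarity:.4f}")
```

Values near 1 mean the two embeddings point in nearly the same direction, values near 0 mean they are close to orthogonal, and negative values mean they point in opposing directions.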

Instructions:

  1. Select a VLM from the dropdown
  2. Select a layer from the available embedding layers
  3. Upload two images for comparison
  4. Enter your instruction/question about the images
  5. Adjust the number of top tokens to display (1-20)
  6. Click "Analyze" to see the first-token probability distributions side by side

Note: You can upload just one image if you prefer single-image analysis.
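The top-token display in steps 5 and 6 can be pictured with the minimal sketch below, which turns first-token logits into probabilities with a softmax and keeps the k most likely tokens. The helper `top_k_token_probs`, the four-word vocabulary, and the logit values are all made up for illustration; a real VLM produces one logit per entry in its tokenizer's vocabulary, and the demo's actual implementation may differ.

```python
import math


def top_k_token_probs(logits, vocab, k):
    """Softmax the logits and return the k most probable (token, prob) pairs."""
    if len(logits) != len(vocab):
        raise ValueError("logits and vocab must have the same length")
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the vocabulary size")
    # Subtract the maximum logit before exponentiating for numerical stability.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical vocabulary and first-token logits for one image.
vocab = ["Yes", "No", "The", "A"]
logits = [2.0, 1.0, 0.5, -1.0]

top = top_k_token_probs(logits, vocab, k=2)
for token, prob in top:
    print(f"{token}: {prob:.3f}")
```

Running the same function on the logits obtained for each image gives two ranked lists that can be shown side by side.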
