Constructing Multilingual Visual-Text Datasets Revealing Visual Multilingual Ability of Vision Language Models

Published on arXiv, 2024