GROUNDHOG: Grounding Large Language Models to Holistic Segmentation

Grounded Image Captioning

Grounded image captioning with short descriptions.

Grounded image captioning with detailed descriptions.

Referring Expression Segmentation

Referential Dialogue

Grounded Visual Question Answering

Grounded visual question answering with multiple choices.

Grounded visual question answering with visual text.

BibTeX


@inproceedings{zhang2024groundhog,
    title={GROUNDHOG: Grounding Large Language Models to Holistic Segmentation},
    author={Zhang, Yichi and Ma, Ziqiao and Gao, Xiaofeng and Shakiah, Suhaila and Gao, Qiaozi and Chai, Joyce},
    booktitle={Conference on Computer Vision and Pattern Recognition 2024},
    year={2024}
}