Publications

Preprints


TRivia: Self-supervised Fine-tuning of Vision-Language Models for Table Recognition

Published on arXiv; under review at a CCF-A conference, 2025

TRivia is a self-supervised fine-tuning method that enables pretrained vision-language models to learn table recognition directly from unlabeled table images in the wild. Built upon Group Relative Policy Optimization, TRivia automatically identifies unlabeled samples that most effectively facilitate learning and eliminates the need for human annotations through a question-answering-based reward mechanism.

Recommended citation: Junyuan Zhang, Bin Wang, Qintong Zhang, Fan Wu, Zichen Wen, Jialin Lu, Junjie Shan, Ziqi Zhao, Shuya Yang, Ziling Wang, Ziyang Miao, Huaping Zhong, Yuhang Zang, Xiaoyi Dong, Ka-Ho Chow, Conghui He (2025). "TRivia: Self-supervised Fine-tuning of Vision-Language Models for Table Recognition." arXiv. 1(3).

Protego: User-Centric Pose-Invariant Privacy Protection Against Face Recognition-Induced Digital Footprint Exposure

Published on arXiv; under review at a CCF-A conference, 2025

Protego encapsulates a user’s 3D facial signatures into a pose-invariant 2D representation, which is dynamically deformed into a natural-looking 3D mask tailored to the pose and expression of any facial image of the user and applied before the image is shared online. Motivated by a critical limitation of existing methods, Protego amplifies the sensitivity of face recognition (FR) models so that protected images cannot be matched even among themselves.

Recommended citation: Ziling Wang, Shuya Yang, Jialin Lu, Ka-Ho Chow (2025). "Protego: User-Centric Pose-Invariant Privacy Protection Against Face Recognition-Induced Digital Footprint Exposure." arXiv. 1(3).