Time: 2026-04-17 14:00:00 to 2026-04-17 15:00:00
Venue: Teaching and Laboratory Building, Room 108
Online link: 911-701-350
Speaker: Prof. Trac-Duy Tran
Host: Chair Professor 梁杰
Lecture language: English
Organizer: Faculty of Information (信息学部)
In this talk, we introduce a hybrid U-Net architecture that pairs a multi-resolution Vision Transformer (ViT) encoder with a CNN decoder. The ViT encoder captures the global sparse support, whereas the CNN decoder concentrates reconstruction capacity on support-consistent regions, enabling the model to combine global high-level context with fine local low-level detail. We demonstrate that this framework consistently outperforms existing networks, improving representation accuracy and reducing hallucination artifacts while requiring substantially less training data. These gains are observed across multiple image processing tasks and benchmarks, including optical imaging, MRI, and ImageNet. Overall, our results show that attention-guided, transformer-based signal representation paired with local CNN kernels provides a principled and effective solution for low-level image processing.
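
For readers unfamiliar with this kind of pairing, the data flow described in the abstract can be sketched roughly as follows. This is a toy illustration only, not the speaker's model: the weights are random, the image has a single channel, and every name here (`patchify`, `self_attention`, `hybrid_forward`, the patch sizes, the feature width `d`) is a hypothetical choice made for the example. It shows attention over patch tokens at two resolutions (global context) feeding skip connections into a coarse-to-fine decoder built from local 3x3 convolutions.

```python
import numpy as np

rng = np.random.default_rng(0)  # random, untrained weights (illustration only)

def patchify(img, p):
    """Split an (H, W) image into non-overlapping p x p patches, one row per patch."""
    H, W = img.shape
    return img.reshape(H // p, p, W // p, p).transpose(0, 2, 1, 3).reshape(-1, p * p)

def self_attention(tokens, d):
    """Single-head self-attention: every patch token attends to all others."""
    k = tokens.shape[1]
    Wq, Wk, Wv = (rng.standard_normal((k, d)) / np.sqrt(k) for _ in range(3))
    q, key, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = q @ key.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ v  # shape (num_patches, d)

def encode(img, p, d):
    """One transformer-style encoder stage at patch size p: (H/p, W/p, d) grid."""
    H, W = img.shape
    return self_attention(patchify(img, p), d).reshape(H // p, W // p, d)

def upsample(grid, factor):
    """Nearest-neighbour upsampling of an (h, w) map by an integer factor."""
    return np.repeat(np.repeat(grid, factor, axis=0), factor, axis=1)

def conv3x3(x, kernel):
    """Same-size 3x3 cross-correlation with zero padding (a local CNN kernel)."""
    H, W = x.shape
    xp = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * xp[i:i + H, j:j + W]
    return out

def hybrid_forward(img, patch_sizes=(8, 4), d=16):
    """Coarse-to-fine decode: add each encoder map as a skip, then convolve locally."""
    out = np.zeros(img.shape)
    for p in patch_sizes:  # coarse resolution first, then finer
        feat = encode(img, p, d).mean(axis=-1)  # collapse channels for brevity
        out = conv3x3(out + upsample(feat, p), rng.standard_normal((3, 3)) / 3.0)
        out = np.maximum(out, 0.0)  # ReLU
    return out

image = rng.standard_normal((32, 32))
recon = hybrid_forward(image)
print(recon.shape)
```

The output has the same spatial size as the input, which is the property a U-Net-style encoder and decoder pair is meant to preserve. A real model would learn the weights, use multiple channels and attention heads, and typically add positional encodings, none of which is shown here.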
