Linjie (Lindsey) Li
Principal Researcher, Microsoft
Verified email at microsoft.com
Title
Cited by
Year
UNITER: Learning UNiversal Image-TExt Representations
YC Chen, L Li, L Yu, AE Kholy, F Ahmed, Z Gan, Y Cheng, J Liu
ECCV 2020, 2020
2696*  2020
Improving image generation with better captions
J Betker, G Goh, L Jing, T Brooks, J Wang, L Li, L Ouyang, J Zhuang, ...
https://cdn.openai.com/papers/dall-e-3.pdf, 2023
765*  2023
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
J Lei, L Li, L Zhou, Z Gan, TL Berg, M Bansal, J Liu
CVPR 2021, 2021
719  2021
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training
L Li, YC Chen, Y Cheng, Z Gan, L Yu, J Liu
EMNLP 2020, 2020
551  2020
The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
Z Yang*, L Li*, K Lin*, J Wang*, CC Lin*, Z Liu, L Wang*
arXiv preprint arXiv:2309.17421, 2023
545  2023
GIT: A Generative Image-to-text Transformer for Vision and Language
J Wang, Z Yang, X Hu, L Li, K Lin, Z Gan, Z Liu, C Liu, L Wang
TMLR, 2022
544  2022
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
Z Gan, YC Chen, L Li, C Zhu, Y Cheng, J Liu
NeurIPS 2020, 2020
540  2020
Segment Everything Everywhere All at Once
X Zou, J Yang, H Zhang, F Li, L Li, J Gao, YJ Lee
NeurIPS 2023, 2023
468  2023
Relation-aware graph attention network for visual question answering
L Li, Z Gan, Y Cheng, J Liu
ICCV 2019, 2019
426  2019
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
W Yu, Z Yang, L Li, J Wang, K Lin, Z Liu, X Wang, L Wang
ICML 2024, 2023
422  2023
Mitigating hallucination in large multi-modal models via robust instruction tuning
F Liu, K Lin, L Li, J Wang, Y Yacoob, L Wang
ICLR 2024, 2023
360*  2023
MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action
Z Yang*, L Li*, J Wang*, K Lin*, E Azarnasab*, F Ahmed*, Z Liu, C Liu, ...
arXiv preprint arXiv:2303.11381, 2023
319  2023
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning
K Lin, L Li, CC Lin, F Ahmed, Z Gan, Z Liu, Y Lu, L Wang
CVPR 2022, 2021
277  2021
Generalized Decoding for Pixel, Image, and Language
X Zou, ZY Dou, J Yang, Z Gan, L Li, C Li, X Dai, H Behl, J Wang, L Yuan, ...
CVPR 2023, 2022
235  2022
VIOLET: End-to-End Video-Language Transformers with Masked Visual-token Modeling
TJ Fu, L Li, Z Gan, K Lin, WY Wang, L Wang, Z Liu
arXiv preprint arXiv:2111.12681, 2021
217  2021
Multimodal foundation models: From specialists to general-purpose assistants
C Li*, Z Gan*, Z Yang*, J Yang*, L Li*, L Wang, J Gao
Foundations and Trends® in Computer Graphics and Vision 16 (1–2), 1–214, 2023
188  2023
Graph Optimal Transport for Cross-Domain Alignment
L Chen, Z Gan, Y Cheng, L Li, L Carin, J Liu
ICML 2020, 2020
188  2020
Vision-Language Pre-training: Basics, Recent Advances, and Future Trends
Z Gan, L Li, C Li, L Wang, Z Liu, J Gao
Foundations and Trends® in Computer Graphics and Vision 14 (3–4), 163-352, 2022
178  2022
ReCo: Region-Controlled Text-to-Image Generation
Z Yang, J Wang, Z Gan, L Li, K Lin, C Wu, N Duan, Z Liu, C Liu, M Zeng, ...
CVPR 2023, 2022
132  2022
Multi-step reasoning via recurrent dual attention for visual dialog
Z Gan, Y Cheng, AEI Kholy, L Li, J Liu, J Gao
ACL 2019, 2019
117  2019
Articles 1–20