GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher Y Yuan, W Jiao, W Wang, J Huang, P He*, S Shi, Z Tu ICLR 2024, 2023 | 160 | 2023 |
On the Humanity of Conversational AI: Evaluating the Psychological Portrayal of LLMs J Huang, W Wang, EJ Li, MH Lam, S Ren, Y Yuan, W Jiao, Z Tu, MR Lyu ICLR 2024 (Oral), 2023 | 59* | 2023 |
All Languages Matter: On the Multilingual Safety of LLMs W Wang, Z Tu, C Chen, Y Yuan, J Huang, W Jiao, MR Lyu ACL 2024 Findings, 2023 | 47* | 2023 |
How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments J Huang, EJ Li, MH Lam, T Liang, W Wang, Y Yuan, W Jiao, X Wang, Z Tu, ... ICLR 2025, 2024 | 37 | 2024 |
LogicAsker: Evaluating and Improving the Logical Reasoning Ability of Large Language Models Y Wan, W Wang, Y Yang, Y Yuan, J Huang, P He, W Jiao, MR Lyu EMNLP 2024, 2024 | 22* | 2024 |
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training Y Yuan, W Jiao, W Wang, J Huang, J Xu, T Liang, P He, Z Tu arXiv preprint arXiv:2407.09121, 2024 | 10 | 2024 |
New Job, New Gender? Measuring the Social Bias in Image Generation Models W Wang, H Bai, J Huang, Y Wan, Y Yuan, H Qiu, N Peng, MR Lyu ACM MM 2024 (Oral), 2024 | 7 | 2024 |
The Earth Is Flat? Unveiling Factual Errors in Large Language Models W Wang, J Shi, Z Tu, Y Yuan, J Huang, W Jiao, MR Lyu arXiv preprint arXiv:2401.00761, 2024 | 5 | 2024 |
Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability H Li, X Han, Z Zhai, H Mu, H Wang, Z Zhang, Y Geng, S Lin, R Wang, ... arXiv preprint arXiv:2412.18551, 2024 | 3 | 2024 |
Does ChatGPT Know That It Does Not Know? Evaluating the Black-Box Calibration of ChatGPT Y Yuan, W Wang, Q Guo, Y Xiong, C Shen, P He COLING 2024 (Oral), 5191-5201, 2024 | 3 | 2024 |
Learning to Ask: When LLMs Meet Unclear Instruction W Wang, J Shi, C Wang, C Lee, Y Yuan, J Huang, MR Lyu arXiv preprint arXiv:2409.00557, 2024 | 2 | 2024 |
On the Resilience of Multi-Agent Systems with Malicious Agents J Huang, J Zhou, T Jin, X Zhou, Z Chen, W Wang, Y Yuan, M Sap, MR Lyu arXiv preprint arXiv:2408.00989, 2024 | 2 | 2024 |
Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs S Zhao, Y Yuan, X Tang, P He EMNLP 2024 Findings, 2024 | 1 | 2024 |
Chain-of-Jailbreak Attack for Image Generation Models via Editing Step by Step W Wang, K Gao, Z Jia, Y Yuan, J Huang, Q Liu, S Wang, W Jiao, Z Tu arXiv preprint arXiv:2410.03869, 2024 | 1 | 2024 |
Insight Over Sight? Exploring the Vision-Knowledge Conflicts in Multimodal LLMs X Liu, W Wang, Y Yuan, J Huang, Q Liu, P He, Z Tu arXiv preprint arXiv:2410.08145, 2024 | | 2024 |