YuyaoGe's Website
Adversarial Attack
Are All Prompt Components Value-Neutral? Understanding the Heterogeneous Adversarial Robustness of Dissected Prompt in Large Language Models
Abstract: Prompt-based adversarial attacks have become an effective means to assess the robustness of large language models (LLMs). However, existing approaches often treat prompts as monolithic text, overlooking their structural heterogeneity: different prompt components contribute unequally to adversarial robustness.
Yujia Zheng, Tianhao Li, Haotian Huang, Tianyu Zeng, Jingyu Lu, Chuangxin Chu, Yuekai Huang, Ziyou Jiang, Qian Xiong, Yuyao Ge 葛钰峣, Mingyang Li
Aug 3, 2025
PDF · Cite · DOI · arXiv
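To make the premise of the paper above concrete, here is a toy sketch of component-wise prompt perturbation: the prompt is split into labelled components and only one component is perturbed at a time so their robustness can be compared. The component names, the character-swap perturbation, and the commented-out scoring call are illustrative assumptions, not the paper's actual protocol.

```python
import random

def perturb(text: str, rate: float = 0.05) -> str:
    """Randomly swap a small fraction of letters as a crude perturbation."""
    chars = list(text)
    for i in range(len(chars)):
        if chars[i].isalpha() and random.random() < rate:
            chars[i] = random.choice("abcdefghijklmnopqrstuvwxyz")
    return "".join(chars)

# Assumed dissection of a prompt into components (not taken from the paper).
prompt_components = {
    "instruction": "You are a helpful assistant. Answer concisely.",
    "task": "Classify the sentiment of the following review.",
    "demonstration": "Review: 'Great battery life.' -> positive",
    "query": "Review: 'The screen cracked after a week.' ->",
}

for name, text in prompt_components.items():
    attacked = dict(prompt_components)
    attacked[name] = perturb(text)        # perturb only this component
    full_prompt = "\n".join(attacked.values())
    # score = evaluate_llm(full_prompt)   # hypothetical robustness metric
    print(f"perturbed component: {name}\n{full_prompt}\n")
```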
Paper Sharing | Broad Decoding Strategies Lead to Large Language Model Jailbreaks
In this post, the authors propose a new dataset, MaliciousInstruct; a method for evaluating the toxicity of model responses; an attack that manipulates decoding hyperparameters, called generation exploitation; and an alignment strategy, generation-aware alignment.
Yuyao Ge 葛钰峣
Apr 9, 2024
1 min read
Paper Sharing
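As a minimal sketch of the decoding-hyperparameter idea described in the post above: instead of editing the prompt, the attack sweeps sampling settings and keeps the worst-case completion. The model name, the hyperparameter grid, and the prompt placeholder below are illustrative assumptions, not the paper's exact configuration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-chat-hf"  # assumed example; any open chat model works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "..."  # a harmful instruction drawn from a benchmark such as MaliciousInstruct
inputs = tok(prompt, return_tensors="pt")

candidates = []
# Sweep decoding hyperparameters rather than modifying the prompt itself.
for temperature in (0.7, 1.0, 1.5):
    for top_p in (0.7, 0.9, 1.0):
        out = model.generate(
            **inputs,
            do_sample=True,
            temperature=temperature,
            top_p=top_p,
            max_new_tokens=128,
        )
        text = tok.decode(out[0], skip_special_tokens=True)
        candidates.append((temperature, top_p, text))

# A separate toxicity / harmfulness scorer would then select the worst-case
# completion among `candidates`; that scorer is out of scope for this sketch.
```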
Softmax Regression and Its Optimization
This post is part of a series of notes from my study of the Deep Learning Systems course taught by Tianqi Chen and J. Zico Kolter at CMU.
Yuyao Ge 葛钰峣
Mar 21, 2024
3 min read
Notes
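For orientation, the central object in that post is the softmax cross-entropy loss of a linear classifier; the formulation below is the standard one rather than a quote from the notes.

$$
\ell(z, y) = -\log \frac{\exp(z_y)}{\sum_{j=1}^{k} \exp(z_j)} = \log \sum_{j=1}^{k} \exp(z_j) - z_y,
\qquad
\frac{\partial \ell}{\partial z} = \operatorname{softmax}(z) - e_y,
$$

where $z = \Theta^\top x$ are the logits of the linear model, $y$ is the true class, and $e_y$ is the one-hot vector for $y$.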
Attack based on data: A novel perspective to attack sensitive points directly
Adversarial attacks on time-series classification models have been widely explored and many attack methods have been proposed. But there is not a …
Yuyao Ge 葛钰峣, Zhongguo Yang, Lizhe Chen, Yiming Wang, Chengyang Li
PDF · Cite · Dataset · DOI