Currently, I focus on AI Safety and Alignment.
-
AI Alignment: Given the biases and discrimination that may exist in pre-training data, large models (LMs) may exhibit unintended behaviors. I am interested in alignment methods (e.g., Reinforcement Learning from Human Feedback (RLHF)) and post-hoc alignment methods to ensure the safety and trustworthiness of LLMs.
-
Theoretical Explanations and Mechanism Design for Alignment: Aligning AI systems (e.g., LLMs) effectively to ensure consistency with human intentions and values (though some views question whether universal values exist) is a significant current challenge. I am particularly interested in establishing the feasibility of these alignment methods through both theoretical analysis and practical mechanism design.
-
Applications (LM + X): I am interested in the application of large models in various domains, such as healthcare and education, and in the potential impact of the rapid industry development and iteration brought about by large models.