Align on the Fly: Adapting Chatbot Behavior to Established Norms

1 Shanghai Jiao Tong University, 2 Shanghai AI Lab,
3 The Hong Kong Polytechnic University, 4 Carnegie Mellon University,
5 University of Waterloo, 6 The Hong Kong University of Science and Technology,
7 Google DeepMind, 8 Generative AI Research Lab (GAIR)
* Corresponding Author

Abstract

We propose On-the-fly Preference Optimization (OPO), a real-time alignment method that works in a streaming way. OPO employs an external memory to store established rules for alignment. These rules constrain LLMs’ behavior without further training, and they allow human values to be updated and customized conveniently.


On-the-fly Preference Optimization (OPO)

OPO consists of a rule creation module, an alignment module, and an evaluation module.

  • Rule Creation Module: We focus on collecting legal and moral rules in this paper. Legal rules are collected from the National Database of Laws and Regulations (NDLR) and the National Database of Government Regulations (NDGR), which cover most of the laws and regulations enforced in China. Moral rules are collected from middle school textbooks (basic moral rules), company guidelines (professional moral rules), and Normbank (social moral rules).
  • Alignment Module: Our alignment module is inspired by retrieval-augmented generation (RAG). We use OpenAI’s text-embedding-ada-002 model, a top-performing text embedding model, to obtain dense representations of the collected rules, and we create a vector database to store these representations.
  • Evaluation Module: To alleviate the benchmark leakage problem and make evaluation more comprehensive, we propose a scalable evaluation module that uses GPT-4 to automatically generate new legal questions and professional moral questions.
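
The retrieval idea behind the alignment module can be illustrated with a minimal sketch. It is not the authors' implementation: to stay runnable offline, it replaces text-embedding-ada-002 with a toy bag-of-words vector and replaces the vector database with an in-memory list. The names `embed`, `RuleStore`, `build_prompt`, the sample rules, and the prompt wording are all hypothetical and introduced only for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a dense embedding model (assumption, not ada-002):
    returns a bag-of-words count vector keyed by lowercase token."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class RuleStore:
    """In-memory stand-in for the vector database of rule embeddings."""

    def __init__(self, rules):
        self.rules = list(rules)
        self.vectors = [embed(rule) for rule in self.rules]

    def retrieve(self, query, k=2):
        """Return the k stored rules most similar to the query."""
        q = embed(query)
        ranked = sorted(
            zip(self.rules, self.vectors),
            key=lambda pair: cosine(q, pair[1]),
            reverse=True,
        )
        return [rule for rule, _ in ranked[:k]]


def build_prompt(query, retrieved_rules):
    """Prepend retrieved rules to the query (hypothetical prompt wording)."""
    rules_text = "\n".join(f"- {rule}" for rule in retrieved_rules)
    return (
        "Answer the question while following these rules:\n"
        f"{rules_text}\n\n"
        f"Question: {query}"
    )


# Hypothetical sample rules, invented for this sketch.
SAMPLE_RULES = [
    "Employees must not disclose confidential customer data.",
    "Drivers must stop at a red traffic light.",
    "Contracts signed under coercion are voidable.",
]

store = RuleStore(SAMPLE_RULES)
question = "Can I run a red light if the road is empty?"
print(build_prompt(question, store.retrieve(question, k=1)))
```

Because the rules live outside the model, changing the list passed to `RuleStore` changes which norms reach the prompt, with no retraining involved. This is the property the abstract describes as convenient updates and customization.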

Experiments and Results

Results on the Law dataset
Results on the Morality dataset

BibTeX

@article{xu2023align,
  title={Align on the Fly: Adapting Chatbot Behavior to Established Norms},
  author={Xu, Chunpu and Chern, Steffi and Chern, Ethan and Zhang, Ge and Wang, Zekun and Liu, Ruibo and Li, Jing and Fu, Jie and Liu, Pengfei},
  journal={arXiv preprint arXiv:2312.15907},
  year={2023}
}