Progress or Regress? Self-Improvement Reversal in Post-training

Ting Wu1,3 Xuefeng Li2,3 Pengfei Liu2,3,4
1Fudan University 2Shanghai Jiao Tong University
3Generative AI Research Lab (GAIR) 4Shanghai AI Laboratory

Self-Improvement Reversal is the phenomenon in which, during iterative post-training, pass@1 accuracy improves while broader capabilities such as output diversity and out-of-distribution generalization decline, a paradoxical trend.

Overview

πŸ€” A thought-provoking research question

Self-improvement through post-training methods has been acclaimed for enhancing the problem-solving capabilities (e.g., mathematical reasoning) of Large Language Models (LLMs) without human intervention. However, current research concentrates on maximizing benchmark scores through iterative self-improvement, with little exploration of the underlying factors that drive the performance gains. As a result, the progress and reliability of different self-improvement methods cannot be guaranteed. Amidst the quest for self-improvement in LLMs, a persistent question arises: are these iterative post-training methods truly fostering progress, or are they inadvertently leading to regression?

πŸ‘©β€πŸ’» Our research route

We first provide a comprehensive overview of the main iterative post-training paradigms for self-improvement, examining both the explicit and implicit factors that contribute to consistent performance improvements. This provides actionable insights for practitioners on how to perform iterative self-improvement more effectively.

We then develop an evaluative framework equipped with a comprehensive suite of metrics to assess improvement-set problems, solution diversity, and OOD generalization within the iterative process, enabling us to scrutinize what actually improves beneath the surface of self-improvement.

🧐 What we reveal?

  • Answer Selection Optimization: Iterative self-improvement hardly entails the acquisition of new problem-solving abilities; rather, it enhances the model's selection of correct answers within its existing generation space.
  • Trade-off with Output Diversity: There exists a critical trade-off in iterative self-improvement: while aiming for higher accuracy, the diversity of outputs, which can be crucial for creativity and robustness in problem-solving, is compromised.
  • Capabilities Collapse: Iterative post-training methods can exacerbate the generalization disparities across groups, inadvertently causing models to focus on easier problems rather than enhancing their ability to solve more complex ones.

Systematic Formulation of Self-Improvement in Post-training

We identify key variables that influence the peak performance and improvement trend during the iterative post-training process: foundation model M, problem-solving task D, iteration steps T, and post-training method F. For our experimental setup, we choose M = {LLaMA2-7B, LLaMA3-8B, Mistral-7B}, D = {CommonsenseQA for Commonsense Knowledge, GSM8K and MATH for Mathematical Reasoning, MBPP for Code Generation}, T = {1, 2, 3, 4, 5}, and F = {Iterative SFT, Iterative DPO, Iterative SFT-DPO}.
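
To make this formulation concrete, the Python sketch below outlines one round-by-round self-improvement loop under the three methods F. The helpers generate_solutions, filter_correct, make_preference_pairs, sft_finetune, and dpo_finetune are hypothetical placeholders standing in for a real training pipeline, not the paper's actual implementation.

# Minimal sketch of iterative self-improvement; all helper functions
# are hypothetical placeholders, not the actual training code.
def iterative_post_training(model, train_set, method="sft", T=5, k=8):
    """Run T rounds of post-training method F starting from foundation model M."""
    for t in range(1, T + 1):
        # Sample k candidate solutions per problem from the current model M_t.
        samples = {p: generate_solutions(model, p, k=k) for p in train_set}

        # Split samples into correct / incorrect by checking final answers.
        correct, incorrect = filter_correct(samples, train_set)

        if method == "sft":            # Iterative SFT: fine-tune on correct samples
            model = sft_finetune(model, correct)
        elif method == "dpo":          # Iterative DPO: preference pairs (correct > incorrect)
            model = dpo_finetune(model, make_preference_pairs(correct, incorrect))
        elif method == "sft-dpo":      # Iterative SFT-DPO: SFT warm-up, then DPO
            model = sft_finetune(model, correct)
            model = dpo_finetune(model, make_preference_pairs(correct, incorrect))
    return model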


Explicit Influencing Factors: Across all methods and datasets, pass@1 accuracy generally improves as the number of iteration steps increases, indicating that iterative post-training effectively enhances model performance over time. However, the rate of improvement tends to plateau or even decline slightly after 4-5 iterations.
Implicit Underlying Factor: We find that Correct Answer Coverage, the proportion of the correct answer space that the base model M1 occupies, can be used to gauge the model's gains from self-improvement. Our results clearly show that when correct answer coverage is high (> 0.5), Iterative DPO and Iterative SFT-DPO produce the best-performing Mt*; conversely, when coverage is lower (< 0.5), Iterative SFT is more effective at reaching the optimal Mt*. Correct answer coverage can therefore serve as a key factor guiding practitioners in choosing a suitable post-training method F.
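
One simple way to estimate Correct Answer Coverage is sketched below under our own assumptions (not necessarily the paper's exact procedure): sample N solutions per problem from the base model M1 and check whether the gold final answer ever appears. The helpers sample_solutions and extract_final_answer are assumed for illustration.

def correct_answer_coverage(model, problems, gold_answers, n_samples=64):
    """Fraction of problems whose gold answer appears among M1's sampled solutions."""
    covered = 0
    for problem, gold in zip(problems, gold_answers):
        candidates = sample_solutions(model, problem, n=n_samples, temperature=1.0)
        answers = {extract_final_answer(c) for c in candidates}
        covered += gold in answers   # problem is "covered" if M1 can reach the answer
    return covered / len(problems)

# Rule of thumb from the analysis above:
#   coverage > 0.5  ->  Iterative DPO / SFT-DPO tend to yield the best Mt*
#   coverage < 0.5  ->  Iterative SFT tends to be more effective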

Critical Evaluations on Self-Improvement

We engage in a critical examination and reevaluation of iterative self-improvement: discerning whether the improvements constitute genuine progress or merely regression.

🌟 Improvement Problems

Reversal Observation: As N grows, the base model M1 achieves near-perfect pass@N accuracy on IS(t) (the improvement set), suggesting it already possesses the inherent capacity to solve the problems deemed improved.
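
For reference, pass@N can be computed with the standard unbiased estimator of Chen et al. (2021), where n samples are drawn per problem and c of them are correct; the small sketch below is our illustration, and the paper may compute it differently.

from math import comb

def pass_at_n(n: int, c: int, N: int) -> float:
    """Probability that at least one of N drawn samples is correct."""
    if n - c < N:
        return 1.0
    return 1.0 - comb(n - c, N) / comb(n, N)

# Example: with 64 samples of which only 3 are correct, pass@1 is ~0.05,
# yet pass@32 is already high.
print(pass_at_n(n=64, c=3, N=32))   # ~0.88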

🌟 Solutions Diversity

Reversal Observation: All methods show a consistent decrease in output diversity over iterations, affecting both correct and incorrect answers. This reduction is evident across all three metrics: syntactic (Distinct N-grams), semantic (SentenceBERT cosine similarity), and logical (distinct equations) diversity.
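
As an illustration, the snippets below sketch the syntactic and semantic metrics, assuming the sentence-transformers package and a generic embedding model; the paper's exact models and settings may differ, and the logical metric additionally requires parsing equations out of the solutions.

from itertools import combinations
from sentence_transformers import SentenceTransformer, util

def distinct_n(outputs, n=2):
    """Syntactic diversity: ratio of unique n-grams to total n-grams."""
    ngrams = []
    for o in outputs:
        toks = o.split()
        ngrams.extend(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return len(set(ngrams)) / max(len(ngrams), 1)

def mean_pairwise_cosine(outputs, model_name="all-MiniLM-L6-v2"):
    """Semantic (non-)diversity: higher mean cosine similarity = less diverse."""
    embeddings = SentenceTransformer(model_name).encode(outputs, convert_to_tensor=True)
    sims = [util.cos_sim(embeddings[i], embeddings[j]).item()
            for i, j in combinations(range(len(outputs)), 2)]
    return sum(sims) / max(len(sims), 1)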

🌟 OOD Generalization

Reversal Observation: As iteration steps increase, Iterative SFT and Iterative SFT-DPO significantly impair OOD generalization, whereas Iterative DPO shows a noticeable improvement. However, all three iterative post-training methods exacerbate generalization disparities across groups, inadvertently causing models to focus on easier problems instead of improving their ability to solve more complex ones.
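
A hedged sketch of how this per-group disparity can be inspected: bucket an OOD test set by difficulty (e.g., MATH levels) and compare per-group accuracy across iterations. The 'difficulty' and 'is_correct' fields are assumed for illustration.

from collections import defaultdict

def accuracy_by_difficulty(results):
    """results: list of dicts with 'difficulty' (e.g., level 1-5) and 'is_correct'."""
    buckets = defaultdict(list)
    for r in results:
        buckets[r["difficulty"]].append(r["is_correct"])
    return {level: sum(v) / len(v) for level, v in sorted(buckets.items())}

# The reversal shows up when, across iterations t = 1..T, accuracy on the easier
# groups keeps rising while accuracy on the hardest groups stagnates or drops.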

πŸ“¬ Contact

If you have any questions about this project, feel free to open a GitHub issue or reach out to us via email.

BibTeX

If you find our paper and code helpful, please consider citing our work😊

@article{wu2024progressregressselfimprovementreversal,
        title={Progress or Regress? Self-Improvement Reversal in Post-training}, 
        author={Ting Wu and Xuefeng Li and Pengfei Liu},
        year={2024},
        eprint={2407.05013},
        archivePrefix={arXiv},
        primaryClass={cs.CL},
        url={https://arxiv.org/abs/2407.05013}
      }