The first generation of Large Language Models—what might be called "Act I" of generative AI (2020-2023)—achieved remarkable success through massive parameter and data scaling, yet exhibited fundamental limitations in knowledge latency, shallow reasoning, and constrained cognitive processes. During this era, prompt engineering emerged as our primary interface with AI, enabling dialogue-level communication through natural language. We now witness the emergence of "Act II" (2024-present), where models are transitioning from knowledge-retrieval systems (in latent space) to thought-construction engines through test-time scaling techniques. This new paradigm establishes a mind-level connection with AI through language-based thoughts. In this paper, we clarify the conceptual foundations of cognition engineering and explain why this moment is critical for its development. We systematically break down these advanced approaches through comprehensive tutorials and optimized implementations, democratizing access to cognition engineering and enabling every practitioner to participate in AI's second act. We provide a regularly updated collection of papers on test-time scaling in the GitHub repository.
The three scaling phases illustrated as a progression of knowledge representation. Pre-training scaling (blue) forms isolated knowledge islands with fundamental physics concepts connected by limited innate associations. Post-training scaling (green) densifies these islands with more sophisticated learned connections between related concepts. Test-time scaling (red) enables dynamic reasoning pathway formation between previously disconnected concepts through extended computation, facilitating multi-hop inference across the entire knowledge space. Test-time scaling builds bridges between knowledge islands, connecting distant nodes that remain isolated during pre-training and conventional post-training.
Workflow for applying test-time scaling in a specific domain. For more details, please refer to the main paper.
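One of the simplest test-time scaling techniques is best-of-N sampling: draw several candidate responses at inference time and keep the one ranked highest by a verifier or reward model. The sketch below is a minimal, self-contained illustration of that idea; `generate_candidates` and `score` are hypothetical stand-ins for an LLM sampler and a verifier, not part of any real API.

```python
import random


def generate_candidates(prompt: str, n: int, seed: int = 0) -> list[str]:
    """Stand-in for sampling n responses from an LLM (hypothetical toy generator)."""
    rng = random.Random(seed)
    return [f"{prompt}-candidate-{rng.randint(0, 100)}" for _ in range(n)]


def score(candidate: str) -> int:
    """Stand-in for a verifier or reward model; here, a trivial deterministic heuristic."""
    return sum(ord(c) for c in candidate) % 97


def best_of_n(prompt: str, n: int = 8) -> str:
    """Best-of-N test-time scaling: spend more inference compute (larger n)
    and keep the highest-scoring sample."""
    candidates = generate_candidates(prompt, n)
    return max(candidates, key=score)
```

Increasing `n` trades extra inference compute for a better chance that at least one candidate survives the verifier, which is the basic mechanism the workflow above scales.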
If you have any questions regarding the paper, feel free to submit a GitHub issue directly.
@misc{xia2025generativeaiactii,
title={Generative AI Act II: Test Time Scaling Drives Cognition Engineering},
author={Shijie Xia and Yiwei Qin and Xuefeng Li and Yan Ma and Run-Ze Fan and Steffi Chern and Haoyang Zou and Fan Zhou and Xiangkun Hu and Jiahe Jin and Yanheng He and Yixin Ye and Yixiu Liu and Pengfei Liu},
year={2025},
eprint={2504.13828},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2504.13828},
}