Exploration Journey

📄️ How does Long Thought Work?

Although we are still at the hypothesis stage and lack sufficient empirical evidence, we believe the success of O1's long-thought approach stems from journey learning, as discussed earlier. Unlike shortcut learning, journey learning lets the model explore the entire decision-making process, much like human problem-solving. O1 can consider multiple solution paths, learn from mistakes, and develop a deeper understanding of the problem: not just finding the correct answer, but understanding why and how to reach it.

📄️ How to Evaluate our Trials?

In addition to measuring accuracy with standard evaluation metrics on benchmarks, manually reviewing actual cases is a crucial step in evaluating data and models. To provide a more intuitive way to assess the model's performance on specific problems, we built a visual data analysis platform using Streamlit. Specifically, the platform visualizes the synthetic trees, their corresponding long thoughts, and the outputs of the trained model. When browsing results, it supports detailed conditional filtering, such as selecting only correctly or incorrectly answered questions, or only outputs that contain keywords indicating reflection or hesitation (e.g., "wait"). It also supports side-by-side comparison across different iterations of synthetic data and model outputs, making it easy to validate whether a new round of data or a newly trained model is effective.
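
To make the filtering idea concrete, here is a minimal Streamlit sketch of this kind of review tool. It is not the actual platform: the record schema (`question`, `model_output`, `is_correct`), the file name `model_outputs.json`, and the keyword list are all illustrative assumptions.

```python
# Minimal sketch of a Streamlit review tool with conditional filtering.
# Assumed schema: each record has "question", "model_output", "is_correct".
import json

import streamlit as st

# Assumed list of keywords signalling reflection or hesitation.
REFLECTION_KEYWORDS = ["wait", "hmm", "let me reconsider"]


@st.cache_data
def load_records(path: str) -> list[dict]:
    """Load model outputs from a JSON file (hypothetical format)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


records = load_records("model_outputs.json")

# Sidebar filters analogous to the ones described above.
correctness = st.sidebar.selectbox("Answer correctness", ["all", "correct", "incorrect"])
require_reflection = st.sidebar.checkbox("Only outputs with reflection keywords (e.g. 'wait')")

filtered = []
for rec in records:
    if correctness == "correct" and not rec["is_correct"]:
        continue
    if correctness == "incorrect" and rec["is_correct"]:
        continue
    text = rec["model_output"].lower()
    if require_reflection and not any(kw in text for kw in REFLECTION_KEYWORDS):
        continue
    filtered.append(rec)

st.write(f"{len(filtered)} / {len(records)} records match the current filters")

# Render each matching case for manual inspection.
for rec in filtered:
    with st.expander(rec["question"][:80]):
        st.markdown(rec["model_output"])
        st.caption("correct" if rec["is_correct"] else "incorrect")
```

Comparing iterations could follow the same pattern, for example by loading two such JSON files and rendering the matching records in two `st.columns` side by side.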