How does Long Thought Work?
While we are still in the hypothesis stage without sufficient empirical evidence, we believe the success of O1’s long-thought approach is due to journey learning, as discussed earlier. Unlike shortcut learning, journey learning allows the model to explore the entire decision-making process, much like human problem-solving. O1 can consider multiple solution paths, learn from mistakes, and develop a deeper understanding of the problem—not just finding the correct answer but understanding why and how to reach it.
By navigating both correct and incorrect paths, O1 improves its error-handling and adaptability to new challenges. This trial-and-error process, combined with reflection and adjustment, mirrors human cognitive processes and enhances the model’s explainability. O1 can not only provide the correct solution but also explain the reasoning behind it, including how it recovers from errors. This thorough exploration is why O1 excels at handling complex problems and offering reliable, interpretable answers across a range of tasks.