Key Recent Works
Our team's research centers on core areas of AI and reasoning. We have worked extensively on evaluation methodologies, including approaches that assess reasoning beyond traditional accuracy metrics. Our research also covers AI alignment, particularly fostering honesty and ethical behavior in models. We develop datasets and benchmarks that challenge AI’s cognitive and reasoning abilities across multiple disciplines. We have also contributed to generative AI for complex domains such as mathematics, with an emphasis on large-scale pretraining and reasoning. This prior work gives us the expertise to tackle ambitious AI challenges and to keep innovating in the field.
The most relevant recent projects are listed below, ordered by date of publication on arXiv:
Project | Focus | GitHub | Website | Publication | arXiv Date |
---|---|---|---|---|---|
Generative AI for Math: Abel | Reasoning models | Code | Homepage | arXiv | 2023.09 |
Alignment for Honesty | Honesty-aligned models | Code | Homepage | NeurIPS 2024 | 2023.12 |
MathPile: A Billion-Token-Scale Pretraining Corpus for Math | Math pretraining data | Code | Homepage | NeurIPS 2024 | 2023.12 |
Reformatted Alignment | Alignment methods | Code | Homepage | EMNLP 2024 | 2024.02 |
Evaluating Mathematical Reasoning Beyond Accuracy | Reasoning evaluation | Code | - | arXiv | 2024.04 |
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI | Reasoning evaluation | Code | Homepage | NeurIPS 2024 | 2024.06 |
Progress or Regress? Self-Improvement Reversal in Post-training | Evaluation | Code | - | ICML 2024 workshop | 2024.07 |
Weak-to-Strong Reasoning | Reasoning | Code | - | EMNLP 2024 | 2024.07 |
OpenResearcher: Unleashing AI for Accelerated Scientific Research | AI for scientific research | Code | - | EMNLP 2024 demo | 2024.08 |