We release benchmarking results on various safety-related datasets as a leaderboard.
For model accuracy evaluation, we test models on five datasets: BeaverTails, DiaSafety, Jade, Flames, and WildSafety. The metric is the accuracy rate on each dataset, together with an overall average (Avg) across all datasets. We report results for a range of models, including general-purpose large language models and specialized safety models. The "Generative" column indicates whether a model can generate text (✔️) or is a non-generative model used primarily for classification (❌).
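As a rough illustration of how the reported numbers can be derived, the sketch below computes per-dataset accuracy and the overall Avg. The dataset names match the leaderboard, but the prediction/label data is hypothetical, and we assume Avg is a simple unweighted mean over the five datasets.

```python
from statistics import mean

# Hypothetical per-dataset results: lists of (prediction, gold_label) pairs.
# Dataset names match the leaderboard; the data itself is illustrative.
results = {
    "BeaverTails": [("unsafe", "unsafe"), ("safe", "safe"), ("safe", "unsafe")],
    "DiaSafety":   [("safe", "safe"), ("unsafe", "unsafe")],
    "Jade":        [("unsafe", "safe"), ("unsafe", "unsafe")],
    "Flames":      [("safe", "safe")],
    "WildSafety":  [("unsafe", "unsafe"), ("safe", "safe")],
}

def accuracy(pairs):
    """Fraction of examples where the prediction matches the gold label."""
    return sum(pred == gold for pred, gold in pairs) / len(pairs)

# Per-dataset accuracy, then the unweighted average (Avg) across datasets.
per_dataset = {name: accuracy(pairs) for name, pairs in results.items()}
per_dataset["Avg"] = mean(per_dataset.values())

for name, acc in per_dataset.items():
    print(f"{name}: {acc:.2%}")
```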