According to OpenAI, o1 was trained using a new optimization algorithm and a dataset specifically tailored to it, with reinforcement learning incorporated into its training. o1 spends additional time thinking (generating a chain of thought) before generating an answer, which makes it better suited to complex reasoning tasks, particularly in science and
mathematics. According to
Mira Murati, this ability to think before responding represents a new, additional paradigm: it improves model outputs by spending more computing power when generating the answer, whereas the model scaling paradigm improves outputs by increasing the model size, training data, and training compute. OpenAI's test results suggest a correlation between accuracy and the logarithm of the amount of compute spent thinking before answering. o1-mini is faster and 80% cheaper than o1-preview. It is particularly suitable for programming and
STEM-related tasks, but does not have the same "broad world knowledge" as o1-preview. OpenAI noted that o1's reasoning capabilities make it better at adhering to safety rules provided in the prompt's context window. OpenAI reported that during a test, one instance of o1-preview exploited a misconfiguration to succeed at a task that should have been infeasible due to a bug. OpenAI also granted early access to the UK and US
AI Safety Institutes for research, evaluation, and testing. According to OpenAI's assessments, o1-preview and o1-mini crossed into "medium risk" in CBRN (biological, chemical, radiological, and nuclear) weapons.
Dan Hendrycks wrote that "The model already outperforms PhD scientists most of the time on answering questions related to
bioweapons." He suggested that these concerning capabilities would continue to increase.

== Limitations ==