Mathematical problem-solving benchmark score.
MATH-500 is a 500-problem held-out subset of the MATH benchmark test split used in OpenAI's "Let's Verify Step by Step" work. It is widely used to test final-answer mathematical problem solving.
Test type: Competition math word problems, usually scored by normalized final-answer matching (see the sketch below).
201 models report this metric.
Current leader: GPT-5 (high)
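Scoring by normalized final-answer matching can be made concrete with a short sketch. The `normalize_answer` and `is_correct` helpers below are hypothetical and intentionally simplified; production graders for MATH-style benchmarks typically layer LaTeX parsing and numeric-equivalence checks on top of rules like these.

```python
import re

def normalize_answer(ans: str) -> str:
    """Canonicalize a final answer before exact-match comparison (illustrative rules only)."""
    ans = ans.strip()
    # Unwrap a \boxed{...} wrapper, common in MATH-style model outputs.
    m = re.fullmatch(r"\\boxed\{(.*)\}", ans)
    if m:
        ans = m.group(1)
    # Drop math delimiters, thousands separators, and trailing periods.
    ans = ans.strip("$ ").replace(",", "").rstrip(".")
    # Collapse whitespace so "1 / 2" and "1/2" compare equal.
    ans = re.sub(r"\s+", "", ans)
    return ans.lower()

def is_correct(predicted: str, reference: str) -> bool:
    return normalize_answer(predicted) == normalize_answer(reference)

assert is_correct(r"\boxed{1/2}", "1/2")
assert not is_correct("3.14", "22/7")
```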
This app ranks the MATH-500 score exposed by the Artificial Analysis snapshot.
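As a rough illustration of the ranking step, the sketch below sorts models by their MATH-500 score. The `ModelRow` fields and the sample entries are assumptions for illustration, not the actual Artificial Analysis snapshot schema.

```python
from dataclasses import dataclass

@dataclass
class ModelRow:
    name: str
    creator: str
    math_500: float | None             # fraction correct, e.g. 0.994 for 99.4%
    tokens_per_sec: float | None       # output speed
    blended_price_per_m: float | None  # blended USD price per million tokens

def rank_by_math500(rows: list[ModelRow]) -> list[ModelRow]:
    """Keep models that report MATH-500 and sort them best-first."""
    scored = [r for r in rows if r.math_500 is not None]
    return sorted(scored, key=lambda r: r.math_500, reverse=True)

# Hypothetical snapshot rows; the last entry is dropped because it lacks a score.
snapshot = [
    ModelRow("GPT-5 (high)", "OpenAI", 0.994, 84.2, 3.44),
    ModelRow("o3", "OpenAI", 0.992, 72.7, 3.50),
    ModelRow("Example model", "Example Lab", None, None, None),
]

for i, row in enumerate(rank_by_math500(snapshot), start=1):
    print(f"#{i} {row.name} ({row.creator}): {row.math_500:.1%}")
```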
Top models ranked by MATH-500.
| Rank | Model | Creator | MATH-500 | Speed | Blended Price |
|---|---|---|---|---|---|
| #1 | GPT-5 (high) | OpenAI | 99.4% | 84.2 tok/s | $3.44/M |
| #2 | | xAI | 99.2% | 215.5 tok/s | $0.350/M |
| #3 | o3 | OpenAI | 99.2% | 72.7 tok/s | $3.50/M |
| #4 | Claude 4 Sonnet (Reasoning) | Anthropic | 99.1% | 50.3 tok/s | $6.00/M |
| #5 | GPT-5 (medium) | OpenAI | 99.1% | 82.3 tok/s | $3.44/M |
| #6 | Grok 4 | xAI | 99.0% | 50.3 tok/s | $6.00/M |
| #7 | o4-mini (high) | OpenAI | 98.9% | 124.5 tok/s | $1.93/M |
| #8 | GPT-5 (low) | OpenAI | 98.7% | 65.8 tok/s | $3.44/M |
| #9 | Gemini 2.5 Pro Preview (May '25) | Google | 98.6% | n/a | $3.44/M |
| #10 | o3-mini (high) | OpenAI | 98.5% | 140 tok/s | $1.93/M |
| #11 | Qwen3 235B A22B 2507 (Reasoning) | Alibaba | 98.4% | 56 tok/s | $2.63/M |
| #12 | DeepSeek R1 0528 (May '25) | DeepSeek | 98.3% | n/a | $2.36/M |
| #13 | Llama Nemotron Super 49B v1.5 (Reasoning) | NVIDIA | 98.3% | 50.8 tok/s | $0.175/M |
| #14 | Claude 4 Opus (Reasoning) | Anthropic | 98.2% | 36.8 tok/s | $30.00/M |
| #15 | Gemini 2.5 Flash (Reasoning) | Google | 98.1% | 199.6 tok/s | $0.850/M |
| #16 | Gemini 2.5 Flash Preview (Reasoning) | Google | 98.1% | n/a | - |
| #17 | Gemini 2.5 Pro Preview (Mar '25) | Google | 98.0% | n/a | - |
| #18 | MiniMax M1 80k | MiniMax | 98.0% | n/a | $0.963/M |
| #19 | Qwen3 235B A22B 2507 Instruct | Alibaba | 98.0% | 64.7 tok/s | $1.23/M |
| #20 | GLM-4.5 (Reasoning) | Z AI | 97.9% | 46.4 tok/s | $1.00/M |
| #21 | EXAONE 4.0 32B (Reasoning) | LG AI Research | 97.7% | n/a | - |
| #22 | Qwen3 30B A3B 2507 (Reasoning) | Alibaba | 97.6% | 143.2 tok/s | $0.750/M |
| #23 | Qwen3 30B A3B 2507 Instruct | Alibaba | 97.5% | 97.9 tok/s | $0.350/M |
| #24 | o3-mini | OpenAI | 97.3% | 140.1 tok/s | $1.93/M |
| #25 | MiniMax M1 40k | MiniMax | 97.2% | n/a | - |
| #26 | Kimi K2 | Kimi | 97.1% | 33 tok/s | $1.04/M |
| #27 | o1 | OpenAI | 97.0% | 103.3 tok/s | $26.25/M |
| #28 | Gemini 2.5 Flash-Lite (Reasoning) | Google | 96.9% | 243.6 tok/s | $0.175/M |
| #29 | Gemini 2.5 Pro | Google | 96.7% | 120.2 tok/s | $3.44/M |
| #30 | Solar Pro 2 (Reasoning) | Upstage | 96.7% | n/a | - |
| #31 | DeepSeek R1 (Jan '25) | DeepSeek | 96.6% | n/a | $2.36/M |
| #32 | GLM-4.5-Air | Z AI | 96.5% | 72.9 tok/s | $0.372/M |
| #33 | Magistral Small 1 | Mistral | 96.3% | n/a | - |
| #34 | Qwen3 14B (Reasoning) | Alibaba | 96.1% | 63 tok/s | $1.31/M |
| #35 | Qwen3 32B (Reasoning) | Alibaba | 96.1% | 91.4 tok/s | $2.63/M |
| #36 | Llama 3.3 Nemotron Super 49B v1 (Reasoning) | NVIDIA | 95.9% | n/a | - |
| #37 | Qwen3 30B A3B (Reasoning) | Alibaba | 95.9% | 77.8 tok/s | $0.750/M |
| #38 | QwQ 32B | Alibaba | 95.7% | 30.4 tok/s | $0.745/M |
| #39 | Sonar Reasoning Pro | Perplexity | 95.7% | n/a | - |
| #40 | R1 1776 | Perplexity | 95.4% | n/a | - |
| #41 | Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) | NVIDIA | 95.2% | 41 tok/s | $0.900/M |
| #42 | DeepSeek R1 Distill Qwen 14B | DeepSeek | 94.9% | n/a | - |
| #43 | Claude 3.7 Sonnet (Reasoning) | Anthropic | 94.7% | n/a | $6.00/M |
| #44 | Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning) | NVIDIA | 94.7% | n/a | - |
| #45 | Gemini 2.0 Flash Thinking Experimental (Jan '25) | Google | 94.4% | n/a | - |
| #46 | o1-mini | OpenAI | 94.4% | n/a | - |
| #47 | DeepSeek V3 0324 | DeepSeek | 94.2% | n/a | $1.25/M |
| #48 | Qwen3 Coder 480B A35B Instruct | Alibaba | 94.2% | 66.1 tok/s | $3.00/M |
| #49 | Claude 4 Opus (Non-reasoning) | Anthropic | 94.1% | 36.6 tok/s | $30.00/M |
| #50 | DeepSeek R1 Distill Qwen 32B | DeepSeek | 94.1% | n/a | - |
| #51 | EXAONE 4.0 32B (Non-reasoning) | LG AI Research | 93.9% | n/a | - |
| #52 | DeepSeek R1 Distill Llama 70B | DeepSeek | 93.5% | 44 tok/s | $0.875/M |
| #53 | Claude 4 Sonnet (Non-reasoning) | Anthropic | 93.4% | 47.6 tok/s | $6.00/M |
| #54 | Qwen3 4B (Reasoning) | Alibaba | 93.3% | 101.8 tok/s | $0.398/M |
| #55 | DeepSeek R1 0528 Qwen3 8B | DeepSeek | 93.2% | n/a | - |
| #56 | Gemini 2.5 Flash (Non-reasoning) | Google | 93.2% | 189.1 tok/s | $0.850/M |
| #57 | ERNIE 4.5 300B A47B | Baidu | 93.1% | 22.7 tok/s | $0.485/M |
| #58 | Gemini 2.0 Flash (Feb '25) | Google | 93.0% | n/a | $0.263/M |
| #59 | Qwen3 235B A22B (Reasoning) | Alibaba | 93.0% | 61.4 tok/s | $2.63/M |
| #60 | Gemini 2.5 Flash Preview (Non-reasoning) | Google | 92.6% | n/a | - |
| #61 | Gemini 2.5 Flash-Lite (Non-reasoning) | Google | 92.6% | 239.9 tok/s | $0.175/M |
| #62 | GPT-4.1 mini | OpenAI | 92.5% | 78.4 tok/s | $0.700/M |
| #63 | o1-preview | OpenAI | 92.4% | n/a | $28.88/M |
| #64 | Gemini 2.0 Pro Experimental (Feb '25) | Google | 92.3% | n/a | - |
| #65 | Sonar Reasoning | Perplexity | 92.1% | n/a | - |
| #66 | Magistral Medium 1 | Mistral | 91.7% | n/a | - |
| #67 | GPT-4.1 | OpenAI | 91.3% | 86.4 tok/s | $3.50/M |
| #68 | Gemini 2.0 Flash (Experimental) | Google | 91.1% | n/a | - |
| #69 | QwQ 32B-Preview | Alibaba | 91.0% | n/a | - |
| #70 | Mistral Medium 3 | Mistral | 90.7% | 56.8 tok/s | $0.800/M |
| #71 | Qwen3 8B (Reasoning) | Alibaba | 90.4% | 87.9 tok/s | $0.660/M |
| #72 | Qwen3 235B A22B (Non-reasoning) | Alibaba | 90.2% | 61.1 tok/s | $1.23/M |
| #73 | Solar Pro 2 (Preview) (Reasoning) | Upstage | 90.0% | n/a | - |
| #74 | Qwen3 1.7B (Reasoning) | Alibaba | 89.4% | 136.7 tok/s | $0.398/M |
| #75 | GPT-4o (March 2025, chatgpt-4o-latest) | OpenAI | 89.3% | n/a | - |
| #76 | Qwen3 Coder 30B A3B Instruct | Alibaba | 89.3% | 110.3 tok/s | $0.900/M |
| #77 | Reka Flash 3 | Reka AI | 89.3% | 90.6 tok/s | $0.350/M |
| #78 | Llama 4 Maverick | Meta | 88.9% | 115.2 tok/s | $0.475/M |
| #79 | Solar Pro 2 (Non-reasoning) | Upstage | 88.9% | n/a | - |
| #80 | DeepSeek V3 (Dec '24) | DeepSeek | 88.7% | n/a | $0.625/M |