"Microsoft's Orca-Math model outperforms larger AI models on standardized math tests"

A small team of AI researchers at Microsoft reports that its Orca-Math model has surpassed larger models on standardized math tests. In a paper published on the arXiv preprint server, the team describes testing Orca-Math on the Grade School Math 8K (GSM8K) benchmark and comparing its performance to that of well-known large language models (LLMs).

While popular LLMs such as ChatGPT are known for their impressive conversational abilities, it is less widely appreciated that they can also attempt math word problems. To measure this ability, researchers run models against GSM8K, a benchmark of 8,500 grade-school math word problems that require multi-step reasoning to solve, each paired with its correct answer.
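To make the evaluation protocol concrete, here is a minimal sketch of how a model might be scored on GSM8K using the Hugging Face `datasets` library. The `answer_fn` callable is a hypothetical stand-in for whatever model is under test; scoring by exact match on the final number (the text after "####" in each reference solution) follows the benchmark's usual convention.

```python
from datasets import load_dataset  # Hugging Face "datasets" library


def final_number(solution: str) -> str:
    # GSM8K reference solutions end with a line like "#### 72".
    return solution.split("####")[-1].strip().replace(",", "")


def gsm8k_accuracy(answer_fn, limit: int = 100) -> float:
    # Score a model on the GSM8K test split by exact match of the final number.
    test_split = load_dataset("gsm8k", "main", split="test")
    correct = 0
    for example in test_split.select(range(limit)):
        gold = final_number(example["answer"])
        prediction = answer_fn(example["question"])  # hypothetical model call
        correct += int(prediction.strip() == gold)
    return 100.0 * correct / limit
```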

In their study, the Microsoft team tested Orca-Math, an AI application developed specifically to tackle math word problems, and compared its results to those of larger AI models. As highlighted in a post on Microsoft's Research Blog, there is an important distinction between popular LLMs and Orca-Math: the former are large language models, while the latter is a small language model (SLM), built with far fewer parameters, typically on the order of millions to a few billion.

Unlike general-purpose LLMs, Orca-Math is designed specifically for solving math problems and cannot be used for open-ended conversation or answering arbitrary questions. With 7 billion parameters, it is large for a small language model but still much smaller than most well-known LLMs. Even so, it scored an impressive 86.81% on GSM8K, approaching the 97.0% achieved by GPT-4-0613; by contrast, other models such as Llama-2 scored as low as 14.6%.

According to Microsoft, Orca-Math's high score can be attributed to higher-quality training data and an iterative learning process developed by its AI team, in which the model's results are continually improved by incorporating feedback from a teacher model. The team concludes that, when developed under such specialized conditions, small language models can perform as well as large language models on specific applications.
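The blog post describes this teacher-in-the-loop training only at a high level; the sketch below is an illustrative outline under that description, not Microsoft's actual code. Here `student`, `teacher_grade`, and `finetune` are hypothetical stand-ins for the 7-billion-parameter model, the teacher-feedback step, and the preference-learning update, and sampling four attempts per problem is an arbitrary choice for illustration.

```python
def iterative_learning(student, teacher_grade, finetune, problems, rounds=3):
    # Sketch of iterative learning: the student proposes solutions, a teacher
    # model judges them, and the graded attempts become preference data for
    # the next round of fine-tuning.
    for _ in range(rounds):
        preference_pairs = []
        for problem in problems:
            attempts = [student.solve(problem) for _ in range(4)]  # sample several solutions
            graded = [(a, teacher_grade(problem, a)) for a in attempts]
            good = [a for a, ok in graded if ok]
            bad = [a for a, ok in graded if not ok]
            if good and bad:  # keep contrasting pairs so the model learns preferences
                preference_pairs.append((problem, good[0], bad[0]))
        student = finetune(student, preference_pairs)  # e.g., a DPO/KTO-style update
    return student
```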

For more information, see the paper "Orca-Math: Unlocking the potential of SLMs in Grade School Math" by Arindam Mitra et al., available on arXiv (2024). The Orca-Math model can be accessed through Microsoft's website, and the team posts updates on X (formerly Twitter) at twitter.com/Arindam1408/status/1764761895473762738.

Ann Castro