OpenAI employees have publicly accused xAI of publishing misleading benchmark results for its latest AI model, Grok 3

Recently, an OpenAI employee publicly criticized Elon Musk's AI company xAI, saying that the benchmark results it published for its latest AI model, Grok 3, were misleading. In response, xAI co-founder Igor Babushkin insisted that the company had done nothing wrong. xAI's chart shows two versions of Grok 3, Grok 3 Reasoning Beta and Grok 3 mini Reasoning, outperforming OpenAI's best available model, o3-mini-high, on AIME 2025. However, OpenAI employees were quick to point out on X that the chart omits o3-mini-high's AIME 2025 score under "cons@64" conditions. cons@64 (short for "consensus@64") gives a model 64 attempts at each problem and takes the answer it generates most often as its final answer, which generally raises a model's score. Babushkin countered on X that OpenAI had published similarly misleading benchmark charts in the past, although those charts compared only the performance of OpenAI's own models.
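To show why the scoring condition matters, here is a minimal sketch of how a single-attempt ("@1") score differs from a consensus ("cons@k") score. It assumes @1 means grading only a model's first sampled answer and cons@k means grading the majority answer across k samples; the function names and the toy answers are hypothetical and made up for illustration. This is not xAI's or OpenAI's evaluation code, and it does not use real AIME 2025 results.

```python
from collections import Counter


def at1_accuracy(samples, answers):
    """Fraction of problems whose FIRST sampled answer matches the reference."""
    correct = sum(1 for s, a in zip(samples, answers) if s[0] == a)
    return correct / len(answers)


def cons_at_k_accuracy(samples, answers, k):
    """Fraction of problems whose majority answer over the first k samples is correct.

    Ties go to the answer that appeared first, because Counter.most_common
    orders equal counts by first insertion.
    """
    correct = 0
    for s, a in zip(samples, answers):
        majority, _ = Counter(s[:k]).most_common(1)[0]
        if majority == a:
            correct += 1
    return correct / len(answers)


# Hypothetical toy data: 3 problems, 4 sampled answers each (cons@64 would use 64).
samples = [
    ["12", "12", "7", "12"],  # first answer right, majority right
    ["5", "9", "9", "9"],     # first answer wrong, majority right
    ["3", "4", "3", "8"],     # first answer wrong, majority wrong
]
answers = ["12", "9", "4"]

print(f"@1:     {at1_accuracy(samples, answers):.2f}")           # @1:     0.33
print(f"cons@4: {cons_at_k_accuracy(samples, answers, 4):.2f}")  # cons@4: 0.67
```

In this toy case, majority voting recovers the second problem even though the first attempt was wrong. This is why a chart that compares one model's cons@64 score against another model's @1 score is not a like-for-like comparison.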