OpenAI Open-Sources BrowseComp, Redefining How Browsing Agents Are Evaluated

At 2 a.m. today, OpenAI open-sourced BrowseComp, a benchmark dedicated to evaluating the browsing capabilities of AI agents. The benchmark is very difficult: OpenAI's own GPT-4o and GPT-4.5 reach accuracy rates of only 0.6% and 0.9% respectively, close to zero, and even GPT-4o with browsing enabled reaches only 1.9%. By contrast, OpenAI's latest agent model, Deep Research, achieves an accuracy rate of 51.5% and performs well in autonomous search, information integration, and accuracy calibration. (AIGC Open Community)