Microsoft Open-Sources Magma, a Multimodal AI Agent Model

At 3 am today, Microsoft open-sourced Magma, a foundation model for multimodal AI agents, on its official website. Compared with traditional agents, Magma has multimodal capabilities that span the digital and physical worlds, and it can automatically process different types of data such as images, videos, and text. For example, you can use Magma to place e-commerce orders or check the weather automatically; you can also use it to operate physical robots, or to get help while playing chess on a real board. In addition, Magma has a built-in psychological prediction capability, which strengthens its understanding of the spatiotemporal dynamics of future video frames and lets it accurately predict the intentions and future behavior of people or objects in a video.