Microsoft has partnered with the University of California and other institutions to introduce MM-Navigator, a GPT-4V-based agent for zero-shot smartphone GUI navigation. The system interacts with smartphone screens much as a human user would, determining the next action from a given instruction and the current screen. The research shows that large multimodal models, GPT-4V in particular, perform strongly at screen interpretation, action reasoning, and precise action localization in this setting.
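To make the navigation loop concrete, here is a minimal, hypothetical sketch of how such an agent could be wired up: numbered screen elements are listed in a prompt, the model picks an element by its tag, and the reply is mapped back to tap coordinates. The element labels, prompt format, and model reply below are all illustrative assumptions, not MM-Navigator's actual implementation.

```python
import json
from dataclasses import dataclass

@dataclass
class UIElement:
    tag: int              # numeric mark shown on the annotated screenshot
    label: str            # short description of the element
    center: tuple         # (x, y) pixel coordinates of the element

def build_prompt(instruction, elements):
    """Compose a zero-shot prompt listing the numbered screen elements."""
    listing = "\n".join(f"[{e.tag}] {e.label}" for e in elements)
    return (
        f"Instruction: {instruction}\n"
        f"Screen elements:\n{listing}\n"
        'Reply with JSON: {"action": "tap", "tag": <element number>}'
    )

def parse_action(reply, elements):
    """Map the model's chosen tag back to concrete tap coordinates."""
    choice = json.loads(reply)
    target = next(e for e in elements if e.tag == choice["tag"])
    return {"action": choice["action"], "point": target.center}

elements = [
    UIElement(1, "Search bar", (540, 160)),
    UIElement(2, "Settings icon", (980, 90)),
]
prompt = build_prompt("Open the settings page", elements)

# A real system would send `prompt` plus the screenshot to GPT-4V;
# here a hypothetical model reply stands in for the API call.
reply = '{"action": "tap", "tag": 2}'
print(parse_action(reply, elements))  # {'action': 'tap', 'point': (980, 90)}
```

The key design point this sketch mirrors is indirection: rather than asking the model for raw pixel coordinates, the screen is annotated with numbered marks, and the model only has to name a mark, which the harness resolves to a location.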