Researchers at Huawei's Noah's Ark Lab in Paris argue that "embodied artificial intelligence" (E-AI) is the essential next step toward artificial general intelligence (AGI). Their proposed framework holds that AI agents need some form of embodiment to support perception, action, memory, and learning, with the aim of closing the gap to human-level AI.

A team of researchers from Huawei's Noah's Ark Lab in Paris recently published a preprint outlining a framework for "embodied artificial intelligence" (E-AI) as the next fundamental step in the pursuit of artificial general intelligence (AGI). AGI, also known as "human-level AI" or "strong AI," refers to an AI system capable of performing any task, given the necessary resources. There is still no clear scientific consensus on what would qualify an AI system as generally intelligent, but companies such as OpenAI have been established with the sole purpose of pursuing this milestone.


The research challenges the prevailing belief that simply scaling up large language models (LLMs), such as OpenAI's ChatGPT and Google's Gemini, with more data and more computational power could lead to AGI. Instead, the researchers propose that true understanding can only be achieved by E-AI agents that are embodied: agents that live within the real world and interact with it in order to learn from it.


The researchers contend that current large language models, despite their scale and capabilities, struggle to comprehend the real world because they do not actively exist within it. The proposed E-AI framework aims to equip AI agents with four abilities: perception, action, memory, and learning. In practice, this means an AI system must be able to obtain raw real-world data in real time, process and encode that data into a latent learning space, take independent actions, and observe the outcomes. The team emphasizes that this process is vital for AI to learn about the world through interaction and experience, as living creatures do.


Furthermore, the framework posits that AI agents cannot truly engage with the real world without some form of embodiment that supports these abilities. The researchers argue that allowing an AI to act on its own and to store the results of its actions as new memories could enable agents to learn about the world through trial and error, much as living creatures do.
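The perceive, act, remember, and learn cycle described above can be pictured as a simple loop. The following is only a minimal sketch under stated assumptions, not the researchers' implementation: `ToyWorld`, `EmbodiedAgent`, and `run` are hypothetical names, the "world" is a five-cell track rather than real sensor data, the encoder is an identity function standing in for a latent space, and tabular Q-learning is swapped in as a familiar stand-in for the learning step, which the article does not specify.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyWorld:
    """Hypothetical stand-in for the real world: a 1-D track with one goal cell."""
    size: int = 5
    goal: int = 4
    position: int = 0

    def observe(self) -> int:
        # Raw "sensor" reading: the agent's current cell.
        return self.position

    def step(self, action: int) -> float:
        # action is -1 (left) or +1 (right); the position is clamped to the track.
        self.position = max(0, min(self.size - 1, self.position + action))
        return 1.0 if self.position == self.goal else 0.0


@dataclass
class EmbodiedAgent:
    """Illustrative agent with perception, action, memory, and learning."""
    actions: tuple = (-1, 1)
    epsilon: float = 0.2   # chance of trying a random action
    lr: float = 0.5        # learning rate
    gamma: float = 0.9     # discount factor
    memory: list = field(default_factory=list)  # (state, action, reward, next_state)
    values: dict = field(default_factory=dict)  # learned estimate per (state, action)
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def encode(self, raw_observation: int) -> int:
        # Assumption: identity encoding in place of a learned latent space.
        return raw_observation

    def act(self, state: int) -> int:
        # Explore occasionally; otherwise pick a best-known action, ties broken at random.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        best = max(self.values.get((state, a), 0.0) for a in self.actions)
        candidates = [a for a in self.actions if self.values.get((state, a), 0.0) == best]
        return self.rng.choice(candidates)

    def learn(self, state: int, action: int, reward: float, next_state: int) -> None:
        # Store the outcome as a new memory, then update the estimate (Q-learning rule).
        self.memory.append((state, action, reward, next_state))
        best_next = max(self.values.get((next_state, a), 0.0) for a in self.actions)
        old = self.values.get((state, action), 0.0)
        target = reward + self.gamma * best_next
        self.values[(state, action)] = old + self.lr * (target - old)


def run(episodes: int = 50, max_steps: int = 100) -> EmbodiedAgent:
    agent = EmbodiedAgent()
    for _ in range(episodes):
        world = ToyWorld()
        for _ in range(max_steps):
            state = agent.encode(world.observe())       # perceive and encode
            action = agent.act(state)                   # act independently
            reward = world.step(action)                 # the world responds
            next_state = agent.encode(world.observe())  # observe the outcome
            agent.learn(state, action, reward, next_state)
            if reward > 0:
                break
    return agent


if __name__ == "__main__":
    trained = run()
    print(len(trained.memory), "experiences stored")
```

The point of the sketch is the shape of the loop: nothing is learned from a fixed dataset, and every update comes from an action the agent took and an outcome it observed.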


The researchers present a theoretical framework showing how an LLM, or another foundation model, could be embodied to achieve these goals in the future, but they acknowledge that numerous challenges remain. Notably, the most powerful LLMs currently run on massive cloud networks, which makes it difficult to embody AI agents with today's technology.


Taken together, the proposal places embodiment at the center of efforts to move AI systems toward human-level intelligence and understanding.


(TRISTAN GREENE, COINTELEGRAPH, 2024)