Researchers demonstrate that even top-performing large language models lack a genuine understanding of the world’s structure and rules, leading to unexpected failures when the task or environment changes slightly.
Large language models (LLMs) showcase impressive abilities—such as writing poetry or creating functional code—despite being designed to predict the next word in a text sequence.
These surprising skills might suggest that LLMs are implicitly learning general truths about the world. However, a recent study challenges this assumption. Researchers discovered that a popular generative AI model could provide highly accurate driving directions in New York City, yet lacked a true internal map of the city. When researchers introduced street closures and detours, the model’s performance significantly declined.
Further analysis revealed that the AI’s implicit map of the city included imaginary streets weaving through the grid and connecting distant intersections. This raises concerns for real-world AI applications: a model that performs well in one setting may break down if the environment shifts even slightly.
“There’s hope that, since LLMs excel at language tasks, we could use these same tools in other areas of science. But whether LLMs are learning coherent world models is a crucial question if we want to use them to make new discoveries,” explains Ashesh Rambachan, MIT assistant professor of economics and principal investigator at the Laboratory for Information and Decision Systems (LIDS).
Rambachan co-authored this research with Keyon Vafa, a Harvard postdoc; Justin Y. Chen, MIT EECS graduate student; Jon Kleinberg, Cornell professor of computer and information science; and Sendhil Mullainathan, MIT professor in EECS and economics, and LIDS member. The findings will be presented at the Conference on Neural Information Processing Systems.
Novel evaluation metrics
The researchers investigated transformers, the core architecture behind large language models (LLMs) like GPT-4, which are trained on vast amounts of text data to predict the next word in a sequence.
However, simply measuring how accurately a model predicts isn’t sufficient to confirm whether it has a genuine understanding of the world. For instance, they observed that a transformer could reliably predict valid moves in Connect 4 without understanding the game’s rules.
To address this, the team created two new metrics to evaluate a transformer’s “world model.” They focused on deterministic finite automata (DFAs): problems defined by a set of states and precise rules for moving between them, like navigating city streets or playing a structured board game.
According to lead researcher Keyon Vafa, “We needed controlled environments where we understand the exact world model, so we could rigorously analyze the model’s understanding.”
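To make the DFA framing concrete, here is a minimal Python sketch of a navigation problem expressed as states and rules; the class, intersections, and turns are illustrative inventions rather than the researchers’ actual setup.

```python
# Minimal DFA sketch: states are intersections, actions are turns, and the
# transition table encodes which moves the "city" actually allows.
# All names and transitions here are hypothetical, for illustration only.

class DFA:
    def __init__(self, transitions, start):
        self.transitions = transitions  # maps (state, action) -> next state
        self.state = start

    def step(self, action):
        key = (self.state, action)
        if key not in self.transitions:
            raise ValueError(f"invalid move {action!r} from {self.state!r}")
        self.state = self.transitions[key]
        return self.state

    def valid_actions(self):
        # The ground-truth world model: exactly the moves the rules permit here.
        return {a for (s, a) in self.transitions if s == self.state}

# A toy four-intersection grid (hypothetical).
city = DFA(
    transitions={
        ("A", "north"): "B", ("A", "east"): "C",
        ("B", "east"): "D", ("C", "north"): "D",
    },
    start="A",
)
print(city.step("north"))    # B
print(city.valid_actions())  # {'east'}
```

A transformer trained on move sequences from such a system never sees the transition table directly; the question the new metrics ask is whether it nonetheless recovers something equivalent to it.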
The first metric, sequence distinction, checks whether the model recognizes that two distinct states, such as two different Othello boards, are genuinely different and differ in specific ways. Because transformers generate outputs from ordered sequences of data, this metric tests whether the model’s internal representation keeps distinct states apart.
The second metric, sequence compression, tests whether the model recognizes that two identical states share the same set of possible next steps. For example, two identical Othello boards, however they were reached, should prompt the same predicted moves.
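Both checks can be summarized in a few lines. The sketch below is a simplified illustration of the idea rather than the paper’s exact procedure, and it assumes a hypothetical valid_next_moves(sequence) helper that reports which continuations the trained model would accept after a given sequence.

```python
# Simplified illustration of the two world-model checks (not the paper's
# exact implementation). `model.valid_next_moves(seq)` is a hypothetical
# helper returning the set of continuations the model accepts after `seq`.

def passes_distinction(model, seq_a, seq_b):
    # seq_a and seq_b end in genuinely different states (e.g., two different
    # Othello boards), so a coherent world model should allow different sets
    # of continuations for them.
    return model.valid_next_moves(seq_a) != model.valid_next_moves(seq_b)

def passes_compression(model, seq_a, seq_b):
    # seq_a and seq_b end in the same underlying state (e.g., identical boards
    # reached by different move orders), so the allowed continuations should
    # match exactly.
    return model.valid_next_moves(seq_a) == model.valid_next_moves(seq_b)
```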
Using these metrics, the researchers evaluated two types of transformers: one trained on randomly generated sequences and another on data that follows specific strategies. This approach offers a more rigorous assessment of a model’s understanding of structured environments and rules.
Incoherent world models
The researchers unexpectedly found that transformers making random choices developed more accurate world models, likely due to exposure to a broader set of possible moves during training.
“If you watch two random computers playing Othello instead of top-tier players, you’d theoretically see a complete range of moves, including those experts would avoid,” Vafa explains.
Although the transformers generated accurate Othello moves and driving directions, only one formed a coherent world model for Othello, and none formed a reliable model for city navigation. When the researchers added detours to New York City’s map, all of the navigation models failed.
“I was surprised by how quickly performance declined when we introduced detours. Closing just 1% of streets dropped accuracy from nearly 100% to 67%,” Vafa notes.
Examining the models’ generated maps showed distorted versions of New York, filled with imaginary streets and overpasses that didn’t follow real city layouts.
These findings illustrate that transformers can perform complex tasks without grasping underlying rules. The researchers believe that building LLMs capable of creating accurate world models will require different methods.
“We often assume models that perform well understand the world. I hope this research highlights the need for careful evaluation, beyond just intuition,” says Rambachan.
Looking forward, the team aims to tackle diverse problems, including scenarios with partially known rules, and apply their evaluation metrics to real scientific challenges.
This research received funding from the Harvard Data Science Initiative, NSF Graduate Research Fellowship, Vannevar Bush Faculty Fellowship, Simons Collaboration grant, and MacArthur Foundation.