Ordered graph descriptions consistently outperform the random baseline across all traditional graph tasks and prompt configurations. Simpler tasks such as connectivity and cycle detection exhibit low variance across description orders, indicating that LLM reasoning on them is robust to how the graph is serialized. More complex tasks (e.g., Hamilton path, topological sort, and shortest path) show higher variance, with shortest path affected the most because it relies on weighted graphs, where the description order strongly influences the difficulty of the task. Additionally, BFS ordering generally outperforms DFS ordering on cycle detection, connectivity detection, and shortest path, whereas DFS performs better on tasks requiring deeper exploration, such as topological sort and Hamilton path.
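To make the traversal-based orders concrete, the following is a minimal sketch (not the paper's implementation) of serializing a graph's edge list in BFS or DFS order before placing it in a prompt; the function name `ordered_edge_description` and the textual edge format are illustrative assumptions.

```python
import networkx as nx

def ordered_edge_description(G: nx.Graph, source, order: str = "bfs") -> str:
    """Serialize G's edges following a BFS or DFS discovery order (illustrative sketch)."""
    if order == "bfs":
        nodes = [source] + [v for _, v in nx.bfs_edges(G, source)]
    elif order == "dfs":
        nodes = list(nx.dfs_preorder_nodes(G, source))
    else:
        raise ValueError("order must be 'bfs' or 'dfs'")
    rank = {n: i for i, n in enumerate(nodes)}  # discovery position of each node
    # List every edge, sorted by when its endpoints were first discovered
    edges = sorted(G.edges(), key=lambda e: sorted((rank[e[0]], rank[e[1]])))
    return "Edges: " + ", ".join(f"({u}, {v})" for u, v in edges)

# Example on a small unweighted graph
G = nx.Graph([(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)])
print(ordered_edge_description(G, source=0, order="bfs"))
print(ordered_edge_description(G, source=0, order="dfs"))
```

Under this scheme a random baseline would simply shuffle the edge list, so the only difference between conditions is the order in which the same edges appear in the prompt.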
Similarly, ordered descriptions consistently outperform the baseline on node classification, although the improvements are generally less pronounced than those observed on the traditional graph tasks. For instance, on the Pubmed dataset with ego-graph sampling, the PR order boosts accuracy by 14.82%, the largest improvement among the node classification tasks. Notably, the PR order consistently outperforms the PPR order across all datasets, while PPR in turn generally performs better than the traversal-based orders.
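As an illustration of the centrality-based orders, the sketch below (assumed, not the paper's code) ranks the nodes of a sampled ego-graph by PageRank (PR) or Personalized PageRank (PPR) before serialization; the helper name `pr_ordered_nodes` and the 2-hop radius are illustrative assumptions.

```python
import networkx as nx

def pr_ordered_nodes(G: nx.Graph, target, order: str = "pr", radius: int = 2):
    """Sample the target node's ego-graph and rank its nodes by PR or PPR score."""
    ego = nx.ego_graph(G, target, radius=radius)  # ego-graph sampling around the target
    if order == "pr":
        scores = nx.pagerank(ego)                                  # global PageRank
    elif order == "ppr":
        scores = nx.pagerank(ego, personalization={target: 1.0})   # PPR seeded at the target
    else:
        raise ValueError("order must be 'pr' or 'ppr'")
    # Nodes with higher scores would be described earlier in the prompt
    return sorted(ego.nodes(), key=lambda n: scores[n], reverse=True)
```

The only difference between the two conditions is the personalization vector: PR ranks nodes by their global importance within the sampled subgraph, while PPR biases the ranking toward the target node's neighborhood.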
We hypothesize that the improved performance of LLMs with ordered descriptions stems from limitations in their positional encodings and attention mechanisms, which we refer to as attention bias. Positional encodings, intended to convey sequence information in transformer models, may fail to capture the structural complexity of graph data when the input sequence is unordered. Guided by these positional encodings, the attention mechanism can give undue priority to certain parts of the input simply because of where they appear in the sequence, leading to over-reliance on the order in which the graph is described.