Abstract:Reasoning ability is one of the most crucial capabilities of a foundation model, signifying its capacity to address complex reasoning tasks. Chain-of-Thought (CoT) technique is widely regarded as one of the effective methods for enhancing the reasoning ability of foundation models and has garnered significant attention. However, the reasoning process of CoT is linear, step-by-step, similar to personal logical reasoning, suitable for solving general and slightly complicated problems. On the contrary, the thinking pattern of an expert owns two prominent characteristics that cannot be handled appropriately in CoT, i.e., high-order multi-hop reasoning and multimodal comparative judgement. Therefore, the core motivation of this paper is transcending CoT to construct a reasoning paradigm that can think like an expert. The hyperedge of a hypergraph could connect various vertices, making it naturally suitable for modelling high-order relationships. Inspired by this, this paper innovatively proposes a multimodal Hypergraph-of-Thought (HoT) reasoning paradigm, which enables the foundation models to possess the expert-level ability of high-order multi-hop reasoning and multimodal comparative judgement. Specifically, a textual hypergraph-of-thought is constructed utilizing triple as the primary thought to model higher-order relationships, and a hyperedge-of-thought is generated through multi-hop walking paths to achieve multi-hop inference. Furthermore, we devise a visual hypergraph-of-thought to interact with the textual hypergraph-of-thought via Cross-modal Co-Attention Graph Learning for multimodal comparative verification. Experimentations on the ScienceQA benchmark demonstrate the proposed HoT-based T5 outperforms CoT-based GPT3.5 and chatGPT, which is on par with CoT-based GPT4 with a lower model size.
Abstract:Link Prediction, addressing the issue of completing KGs with missing facts, has been broadly studied. However, less light is shed on the ubiquitous hyper-relational KGs. Most existing hyper-relational KG embedding models still tear an n-ary fact into smaller tuples, neglecting the indecomposability of some n-ary facts. While other frameworks work for certain arity facts only or ignore the significance of primary triple. In this paper, we represent an n-ary fact as a whole, simultaneously keeping the integrity of n-ary fact and maintaining the vital role that the primary triple plays. In addition, we generalize hyperbolic Poincar\'e embedding from binary to arbitrary arity data, which has not been studied yet. To tackle the weak expressiveness and high complexity issue, we propose HYPER^2 which is qualified for capturing the interaction between entities within and beyond triple through information aggregation on the tangent space. Extensive experiments demonstrate HYPER^2 achieves superior performance to its translational and deep analogues, improving SOTA by up to 34.5\% with relatively few dimensions. Moreover, we study the side effect of literals and we theoretically and experimentally compare the computational complexity of HYPER^2 against several best performing baselines, HYPER^2 is 49-61 times quicker than its counterparts.