Title: PuzzleGPT: Emulating Human Puzzle-Solving Ability for Time and Location Prediction

Jan 24, 2025

Hammad Ayyubi, Xuande Feng, Junzhang Liu, Xudong Lin, Zhecan Wang, Shih-Fu Chang

Abstract: The task of predicting time and location from images is challenging and requires complex human-like puzzle-solving ability over different clues. In this work, we formalize this ability into core skills and implement them using different modules in an expert pipeline called PuzzleGPT. PuzzleGPT consists of a perceiver to identify visual clues, a reasoner to deduce prediction candidates, a combiner to combinatorially combine information from different clues, a web retriever to get external knowledge if the task can't be solved locally, and a noise filter for robustness. This results in a zero-shot, interpretable, and robust approach that records state-of-the-art performance on two datasets: TARA and WikiTilo. PuzzleGPT outperforms large VLMs such as BLIP-2, InstructBLIP, LLaVA, and even GPT-4V, as well as automatically generated reasoning pipelines like VisProg, by at least 32% and 38%, respectively. It even rivals or surpasses finetuned models.
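
The five modules named in the abstract can be pictured as a small control flow. The following is only an illustrative sketch, not the authors' implementation: every name (`Clue`, `perceive`, `noise_filter`, `reason`, `combine`, `predict`) is hypothetical, the "image" is a stand-in dictionary of already-extracted clues, the reasoner is a toy lookup table rather than a language model, and the order in which the stages run is an assumption.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Clue:
    kind: str   # hypothetical clue type, e.g. "landmark" or "text"
    value: str  # content of the clue


Knowledge = Dict[FrozenSet[str], str]


def perceive(image: Dict[str, str]) -> List[Clue]:
    # Stand-in perceiver: the "image" is a dict of pre-extracted clues.
    return [Clue(kind, value) for kind, value in image.items()]


def noise_filter(clues: List[Clue]) -> List[Clue]:
    # Toy noise filter: drop clues whose content is empty or whitespace.
    return [c for c in clues if c.value.strip()]


def reason(clues: Tuple[Clue, ...], knowledge: Knowledge) -> Optional[str]:
    # Toy reasoner: look up the set of clue values in local knowledge.
    return knowledge.get(frozenset(c.value for c in clues))


def combine(clues: List[Clue], knowledge: Knowledge) -> Optional[str]:
    # Try single clues first, then progressively larger combinations.
    for size in range(1, len(clues) + 1):
        for subset in combinations(clues, size):
            answer = reason(subset, knowledge)
            if answer is not None:
                return answer
    return None


def predict(
    image: Dict[str, str],
    knowledge: Knowledge,
    web_retrieve: Callable[[List[Clue]], Optional[str]],
) -> Optional[str]:
    clues = noise_filter(perceive(image))
    answer = combine(clues, knowledge)
    if answer is None:
        # Fall back to external knowledge only if local reasoning fails.
        answer = web_retrieve(clues)
    return answer


local = {frozenset({"Eiffel Tower"}): "Paris, France"}
image = {"landmark": "Eiffel Tower", "caption": "   "}
print(predict(image, local, lambda clues: None))  # prints "Paris, France"
```

The sketch tries single clues before combinations and consults the retriever last, mirroring the abstract's statement that external knowledge is used only when the task cannot be solved locally.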

* NAACL 2025 Findings
