Distortions in judged spatial relations in large language models

We present a benchmark for assessing the capability of Large Language Models (LLMs) to discern intercardinal directions between geographic locations and apply it to three prominent LLMs: GPT-3.5, GPT-4, and Llama-2. This benchmark specifically evaluates whether LLMs exhibit a hierarchical spatial bi...

Detailed description

Bibliographic details
Main authors:	Fulman, Nir (author), Memduhoğlu, Abdulkadir (author), Zipf, Alexander (author)
Document type:	Article (Journal) Chapter/Article
Language:	English
Published:	4 Jun 2024
Edition:	Version v2
In:	Arxiv Year: 2024, Pages: 1-18
DOI:	10.48550/arXiv.2401.04218
Online access:	Publisher, free of charge, full text: https://doi.org/10.48550/arXiv.2401.04218 Publisher, free of charge, full text: http://arxiv.org/abs/2401.04218
Statement of responsibility:	Nir Fulman, Abdulkadir Memduhoğlu, Alexander Zipf

MARC


LEADER	00000naa a2200000 c 4500
001	1928887368
003	DE-627
005	20250623130114.0
007	cr uuu---uuuuu
008	250623s2024 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.48550/arXiv.2401.04218 \|2 doi
035			\|a (DE-627)1928887368
035			\|a (DE-599)KXP1928887368
040			\|a DE-627 \|b ger \|c DE-627 \|e rda
041			\|a eng
084			\|a 61 \|2 sdnb
100	1		\|a Fulman, Nir \|e VerfasserIn \|0 (DE-588)1317779584 \|0 (DE-627)1879624907 \|4 aut
245	1	0	\|a Distortions in judged spatial relations in large language models \|c Nir Fulman, Abdulkadir Memduhoğlu, Alexander Zipf
250			\|a Version v2
264		1	\|c 4 Jun 2024
300			\|b Karte
300			\|a 18
336			\|a Text \|b txt \|2 rdacontent
337			\|a Computermedien \|b c \|2 rdamedia
338			\|a Online-Ressource \|b cr \|2 rdacarrier
500			\|a Gesehen am 23.06.2025
520			\|a We present a benchmark for assessing the capability of Large Language Models (LLMs) to discern intercardinal directions between geographic locations and apply it to three prominent LLMs: GPT-3.5, GPT-4, and Llama-2. This benchmark specifically evaluates whether LLMs exhibit a hierarchical spatial bias similar to humans, where judgments about individual locations' spatial relationships are influenced by the perceived relationships of the larger groups that contain them. To investigate this, we formulated 14 questions focusing on well-known American cities. Seven questions were designed to challenge the LLMs with scenarios potentially influenced by the orientation of larger geographical units, such as states or countries, while the remaining seven targeted locations were less susceptible to such hierarchical categorization. Among the tested models, GPT-4 exhibited superior performance with 55 percent accuracy, followed by GPT-3.5 at 47 percent, and Llama-2 at 45 percent. The models showed significantly reduced accuracy on tasks with suspected hierarchical bias. For example, GPT-4's accuracy dropped to 33 percent on these tasks, compared to 86 percent on others. However, the models identified the nearest cardinal direction in most cases, reflecting their associative learning mechanism, thereby embodying human-like misconceptions. We discuss avenues for improving the spatial reasoning capabilities of LLMs.
650		4	\|a Computer Science - Computation and Language
700	1		\|a Memduhoğlu, Abdulkadir \|e VerfasserIn \|0 (DE-588)1342530799 \|0 (DE-627)1903048877 \|4 aut
700	1		\|a Zipf, Alexander \|d 1971- \|e VerfasserIn \|0 (DE-588)123246369 \|0 (DE-627)082437076 \|0 (DE-576)175641056 \|4 aut
773	0	8	\|i Enthalten in \|t Arxiv \|d Ithaca, NY : Cornell University, 1991 \|g (2024) vom: Apr., Artikel-ID 2401.04218, Seite 1-18 \|h Online-Ressource \|w (DE-627)509006531 \|w (DE-600)2225896-6 \|w (DE-576)28130436X \|7 nnas \|a Distortions in judged spatial relations in large language models
773	1	8	\|g year:2024 \|g month:04 \|g elocationid:2401.04218 \|g pages:1-18 \|g extent:18 \|a Distortions in judged spatial relations in large language models
856	4	0	\|u https://doi.org/10.48550/arXiv.2401.04218 \|x Verlag \|x Resolving-System \|z kostenfrei \|3 Volltext
856	4	0	\|u http://arxiv.org/abs/2401.04218 \|x Verlag \|z kostenfrei \|3 Volltext
951			\|a AR
992			\|a 20250623
993			\|a Article
994			\|a 2025
998			\|g 123246369 \|a Zipf, Alexander \|m 123246369:Zipf, Alexander \|d 120000 \|d 120700 \|e 120000PZ123246369 \|e 120700PZ123246369 \|k 0/120000/ \|k 1/120000/120700/ \|p 3
998			\|g 1342530799 \|a Memduhoğlu, Abdulkadir \|m 1342530799:Memduhoğlu, Abdulkadir \|d 120000 \|d 120700 \|e 120000PM1342530799 \|e 120700PM1342530799 \|k 0/120000/ \|k 1/120000/120700/ \|p 2
998			\|g 1317779584 \|a Fulman, Nir \|m 1317779584:Fulman, Nir \|d 120000 \|d 120700 \|e 120000PF1317779584 \|e 120700PF1317779584 \|k 0/120000/ \|k 1/120000/120700/ \|p 1 \|x j
999			\|a KXP-PPN1928887368 \|e 473766894X
BIB			\|a Y
JSO			\|a {"recId":"1928887368","physDesc":[{"noteIll":"Karte","extent":"18 S."}],"relHost":[{"type":{"bibl":"edited-book","media":"Online-Ressource"},"titleAlt":[{"title":"Arxiv.org"},{"title":"Arxiv.org e-print archive"},{"title":"Arxiv e-print archive"},{"title":"De.arxiv.org"}],"note":["Gesehen am 28.05.2024"],"recId":"509006531","physDesc":[{"extent":"Online-Ressource"}],"disp":"Distortions in judged spatial relations in large language modelsArxiv","title":[{"title_sort":"Arxiv","title":"Arxiv"}],"id":{"eki":["509006531"],"zdb":["2225896-6"]},"part":{"pages":"1-18","extent":"18","text":"(2024) vom: Apr., Artikel-ID 2401.04218, Seite 1-18","year":"2024"},"origin":[{"publisherPlace":"Ithaca, NY ; [Erscheinungsort nicht ermittelbar]","dateIssuedKey":"1991","publisher":"Cornell University ; Arxiv.org","dateIssuedDisp":"1991-"}],"pubHistory":["1991 -"],"language":["eng"]}],"language":["eng"],"note":["Gesehen am 23.06.2025"],"person":[{"given":"Nir","display":"Fulman, Nir","role":"aut","family":"Fulman"},{"family":"Memduhoğlu","display":"Memduhoğlu, Abdulkadir","role":"aut","given":"Abdulkadir"},{"given":"Alexander","display":"Zipf, Alexander","role":"aut","family":"Zipf"}],"id":{"eki":["1928887368"],"doi":["10.48550/arXiv.2401.04218"]},"origin":[{"dateIssuedKey":"2024","edition":"Version v2","dateIssuedDisp":"4 Jun 2024"}],"title":[{"title_sort":"Distortions in judged spatial relations in large language models","title":"Distortions in judged spatial relations in large language models"}],"name":{"displayForm":["Nir Fulman, Abdulkadir Memduhoğlu, Alexander Zipf"]},"type":{"bibl":"chapter","media":"Online-Ressource"}}
SRT			\|a FULMANNIRMDISTORTION4202
