Readability and topics of the German Health Web: exploratory study and text analysis

Background The internet has become an increasingly important resource for health information, especially for lay people. However, the information found does not necessarily comply with the user’s health literacy level. Therefore, it is vital to (1) identify prominent information providers, (2) quant...

Bibliographic Details
Main Authors: Zowalla, Richard (Author) , Pfeifer, Daniel (Author) , Wetter, Thomas (Author)
Format: Article (Journal)
Language: English
Published: February 10, 2023
In: PLOS ONE
Year: 2023, Volume: 18, Issue: 2, Pages: 1-37
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0281582
Online Access: Publisher, license required, full text: https://doi.org/10.1371/journal.pone.0281582
Publisher, license required, full text: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0281582
Author Notes: Richard Zowalla, Daniel Pfeifer, Thomas Wetter

MARC

LEADER 00000caa a2200000 c 4500
001 184621467X
003 DE-627
005 20230706205307.0
007 cr uuu---uuuuu
008 230525s2023 xx |||||o 00| ||eng c
024 7 |a 10.1371/journal.pone.0281582  |2 doi 
035 |a (DE-627)184621467X 
035 |a (DE-599)KXP184621467X 
035 |a (OCoLC)1389529921 
040 |a DE-627  |b ger  |c DE-627  |e rda 
041 |a eng 
084 |a 33  |2 sdnb 
100 1 |a Zowalla, Richard  |d 1990-  |e VerfasserIn  |0 (DE-588)1222745283  |0 (DE-627)1742033598  |4 aut 
245 1 0 |a Readability and topics of the German Health Web  |b exploratory study and text analysis  |c Richard Zowalla, Daniel Pfeifer, Thomas Wetter 
264 1 |c February 10, 2023 
300 |a 37 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Gesehen am 25.05.2023 
520 |a Background The internet has become an increasingly important resource for health information, especially for lay people. However, the information found does not necessarily comply with the user’s health literacy level. Therefore, it is vital to (1) identify prominent information providers, (2) quantify the readability of written health information, and (3) to analyze how different types of information sources are suited for people with differing health literacy levels. Objective In previous work, we showed the use of a focused crawler to “capture” and describe a large sample of the “German Health Web”, which we call the “Sampled German Health Web” (sGHW). It includes health-related web content of the three mostly German speaking countries Germany, Austria, and Switzerland, i.e. country-code top-level domains (ccTLDs) “.de”, “.at” and “.ch”. Based on the crawled data, we now provide a fully automated readability and vocabulary analysis of a subsample of the sGHW, an analysis of the sGHW’s graph structure covering its size, its content providers and a ratio of public to private stakeholders. In addition, we apply Latent Dirichlet Allocation (LDA) to identify topics and themes within the sGHW. Methods Important web sites were identified by applying PageRank on the sGHW’s graph representation. LDA was used to discover topics within the top-ranked web sites. Next, a computer-based readability and vocabulary analysis was performed on each health-related web page. Flesch Reading Ease (FRE) and the 4th Vienna formula (WSTF) were used to assess the readability. Vocabulary was assessed by a specifically trained Support Vector Machine classifier. Results In total, n = 14,193,743 health-related web pages were collected during the study period of 370 days. The resulting host-aggregated web graph comprises 231,733 nodes connected via 429,530 edges (network diameter = 25; average path length = 6.804; average degree = 1.854; modularity = 0.723). 
Among 3000 top-ranked pages (1000 per ccTLD according to PageRank), 18.50% (555/3000) belong to web sites from governmental or public institutions, 18.03% (541/3000) from nonprofit organizations, 54.03% (1621/3000) from private organizations, 4.07% (122/3000) from news agencies, 3.87% (116/3000) from pharmaceutical companies, 0.90% (27/3000) from private bloggers, and 0.60% (18/3000) are from others. LDA identified 50 topics, which we grouped into 11 themes: “Research & Science”, “Illness & Injury”, “The State”, “Healthcare structures”, “Diet & Food”, “Medical Specialities”, “Economy”, “Food production”, “Health communication”, “Family” and “Other”. The most prevalent themes were “Research & Science” and “Illness & Injury” accounting for 21.04% and 17.92% of all topics across all ccTLDs and provider types, respectively. Our readability analysis reveals that the majority of the collected web sites is structurally difficult or very difficult to read: 84.63% (2539/3000) scored a WSTF ≥ 12, 89.70% (2691/3000) scored a FRE ≤ 49. Moreover, our vocabulary analysis shows that 44.00% (1320/3000) of web sites use vocabulary that is well suited for a lay audience. Conclusions We were able to identify major information hubs as well as topics and themes within the sGHW. Results indicate that the readability within the sGHW is low. As a consequence, patients may face barriers, even though the vocabulary used seems appropriate from a medical perspective. In future work, the authors intend to extend their analyses to identify trustworthy health information web sites. 
650 4 |a Algorithms 
650 4 |a Diagnostic radiology 
650 4 |a Health education and awareness 
650 4 |a Information retrieval 
650 4 |a Internet 
650 4 |a Mental health therapies 
650 4 |a Patients 
650 4 |a Vocabulary 
700 1 |a Pfeifer, Daniel  |e VerfasserIn  |4 aut 
700 1 |a Wetter, Thomas  |d 1953-  |e VerfasserIn  |0 (DE-588)141236124  |0 (DE-627)703920774  |0 (DE-576)322863252  |4 aut 
773 0 8 |i Enthalten in  |t PLOS ONE  |d San Francisco, California, US : PLOS, 2006  |g 18(2023), 2 vom: Feb., Artikel-ID e0281582, Seite 1-37  |h Online-Ressource  |w (DE-627)523574592  |w (DE-600)2267670-3  |w (DE-576)281331979  |x 1932-6203  |7 nnas  |a Readability and topics of the German Health Web exploratory study and text analysis 
773 1 8 |g volume:18  |g year:2023  |g number:2  |g month:02  |g elocationid:e0281582  |g pages:1-37  |g extent:37  |a Readability and topics of the German Health Web exploratory study and text analysis 
856 4 0 |u https://doi.org/10.1371/journal.pone.0281582  |x Verlag  |x Resolving-System  |z lizenzpflichtig  |3 Volltext 
856 4 0 |u https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0281582  |x Verlag  |z lizenzpflichtig  |3 Volltext 
951 |a AR 
992 |a 20230525 
993 |a Article 
994 |a 2023 
998 |g 141236124  |a Wetter, Thomas  |m 141236124:Wetter, Thomas  |d 910000  |d 911800  |e 910000PW141236124  |e 911800PW141236124  |k 0/910000/  |k 1/910000/911800/  |p 3  |y j 
998 |g 1222745283  |a Zowalla, Richard  |m 1222745283:Zowalla, Richard  |d 50000  |e 50000PZ1222745283  |k 0/50000/  |p 1  |x j 
999 |a KXP-PPN184621467X  |e 4325178996 
BIB |a Y 
SER |a journal 
JSO |a {"name":{"displayForm":["Richard Zowalla, Daniel Pfeifer, Thomas Wetter"]},"id":{"eki":["184621467X"],"doi":["10.1371/journal.pone.0281582"]},"origin":[{"dateIssuedKey":"2023","dateIssuedDisp":"February 10, 2023"}],"relHost":[{"title":[{"title_sort":"PLOS ONE","title":"PLOS ONE"}],"part":{"extent":"37","volume":"18","text":"18(2023), 2 vom: Feb., Artikel-ID e0281582, Seite 1-37","issue":"2","pages":"1-37","year":"2023"},"pubHistory":["1.2006 -"],"corporate":[{"display":"Public Library of Science","roleDisplay":"Herausgebendes Organ","role":"isb"}],"language":["eng"],"recId":"523574592","disp":"Readability and topics of the German Health Web exploratory study and text analysisPLOS ONE","type":{"media":"Online-Ressource","bibl":"periodical"},"note":["Schreibweise des Titels bis 2012: PLoS ONE","Gesehen am 20.03.19"],"id":{"issn":["1932-6203"],"eki":["523574592"],"zdb":["2267670-3"]},"origin":[{"publisherPlace":"San Francisco, California, US ; Lawrence, Kan.","dateIssuedKey":"2006","publisher":"PLOS ; PLoS","dateIssuedDisp":"2006-"}],"name":{"displayForm":["Public Library of Science"]},"physDesc":[{"extent":"Online-Ressource"}]}],"physDesc":[{"extent":"37 S."}],"person":[{"family":"Zowalla","given":"Richard","display":"Zowalla, Richard","roleDisplay":"VerfasserIn","role":"aut"},{"family":"Pfeifer","given":"Daniel","roleDisplay":"VerfasserIn","display":"Pfeifer, Daniel","role":"aut"},{"family":"Wetter","given":"Thomas","roleDisplay":"VerfasserIn","display":"Wetter, Thomas","role":"aut"}],"title":[{"title":"Readability and topics of the German Health Web","subtitle":"exploratory study and text analysis","title_sort":"Readability and topics of the German Health Web"}],"language":["eng"],"recId":"184621467X","note":["Gesehen am 25.05.2023"],"type":{"bibl":"article-journal","media":"Online-Ressource"}} 
SRT |a ZOWALLARICREADABILIT1020