TY - GEN
T1 - Comparing Methods for Finding Search Sessions on a Specified Topic
T2 - 25th International Conference on Theory and Practice of Digital Libraries, TPDL 2021
AU - Bogaard, Tessel
AU - Bilgin, Aysenur
AU - Wielemaker, Jan
AU - Hollink, Laura
AU - Ribbens, Kees
AU - van Ossenbruggen, Jacco
N1 - Acknowledgements:
We would like to thank the National Library of the Netherlands, and Lynda Hardman (Centrum Wiskunde & Informatica) for their support. The Wikipedia articles related to WWII were assessed by Kees Ribbens with the assistance of Caroline Schoofs and Koen Smilde. This research is partially supported by the VRE4EIC project, a project that has received funding from the European Union's Horizon 2020 research and innovation program under grant agreement No 676247.
Publisher Copyright:
© 2021, Springer Nature Switzerland AG.
PY - 2021/9/7
Y1 - 2021/9/7
AB - Users searching for different topics in a collection may show distinct search patterns. To analyze search behavior of users searching for a specific topic, we need to retrieve the sessions containing this topic. In this paper, we compare different topic representations and approaches to find topic-specific sessions. We conduct our research in a double case study of two topics, World War II and feminism, using search logs of a historical newspaper collection. We evaluate the results using manually created ground truths of over 600 sessions per topic. The two case studies show similar results: The query-based methods yield high precision, at the expense of recall. The document-based methods find more sessions, at the expense of precision. In both approaches, precision improves significantly by manually curating the topic representations. This study demonstrates how different methods to find sessions containing specific topics can be applied by digital humanities scholars and practitioners.
UR - http://www.scopus.com/inward/record.url?scp=85115322800&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-86324-1_23
DO - 10.1007/978-3-030-86324-1_23
M3 - Conference proceeding
AN - SCOPUS:85115322800
SN - 9783030863234
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 189
EP - 201
BT - Linking Theory and Practice of Digital Libraries - 25th International Conference on Theory and Practice of Digital Libraries, TPDL 2021, Proceedings
A2 - Berget, Gerd
A2 - Hall, Mark Michael
A2 - Brenn, Daniel
A2 - Kumpulainen, Sanna
PB - Springer Science+Business Media
Y2 - 13 September 2021 through 17 September 2021
ER -