This session will loosely follow Matthew Lavin’s Programming Historian lesson “Analyzing Documents with TF-IDF”, which “focuses on a foundational natural language processing and information retrieval method called Term Frequency - Inverse Document Frequency (tf-idf). This lesson explores the foundations of tf-idf, and will also introduce you to some of the questions and concepts of computationally oriented text analysis”. We will explore the key concepts, potential uses and limitations of frequency-based analyses of historical corpora, and compare tf-idf with other widely used approaches such as keyness analysis and topic modelling.
We will use the Python programming language within Google Colab to run the calculations and see how the choices we make affect the results. No prior programming experience is required, but you will need to bring a laptop and have a Google account to participate fully.
The session will be facilitated by Dr Chris Thomson, Senior Lecturer in Digital Humanities and Director of the UC Arts Digital Lab.