Part II
Advanced Data Analysis
In the first part of this book, “Data Analysis Essentials,” we have covered much ground already. The introductory chapter (chapter 1), which revolved around the case study of historical cookbooks in the United States, was meant to set the stage. A number of established libraries, such as NumPy and Pandas, were introduced, albeit at a relatively high level. The chapter’s aim was to illustrate, in very broad brushstrokes, the potential of Python and its ecosystem for quantitative data analysis in the humanities. In chapter 2, we took a step back and focused on Python as a practical instrument for data carpentry: we discussed a number of established file formats that allow Python to interface with the wealth of scholarly data that is nowadays digitally available, such as the various Shakespeariana that were at the heart of that chapter. Chapter 3, then, centered around a corpus of historical French plays and the question of how we can numerically represent such a corpus as a document-term matrix. Geometry was a focal point of this chapter, offering an intuitive framework to approach texts as vectors in a space and estimate the distances between them, for instance in terms of word usage. In the final chapter of part one (chapter 4), the Pandas library, which specifically caters to scholars working with such tabular data, was introduced at length. Using a deceptively simple dataset of historical baby names, it was shown how Pandas’s routines can assist scholars in highly complex diachronic analyses.

The second part of this book, “Advanced Data Analysis,” will build on the previously covered topics. Reading and parsing structured data, for instance, is a topic that returns at the start of each chapter. The vector space model is also a representation strategy that will be revisited more than once; the same goes for a number of ubiquitous libraries, such as NumPy and Pandas, which have become crucial tools in the world of scholarly Python. The chapters in part two each offer a more advanced introduction to an established application in quantitatively oriented scholarship in the humanities. We start by covering some statistics essentials that will immediately lay the basis for some of the more advanced chapters, including the one on probability theory (chapter 6). We then proceed with a chapter on drawing maps in Python and performing (historical) geospatial analysis. Finally, we end with two more specific yet well-known applications: stylometry (chapter 8), the quantitative study of writing style (especially in the context of authorship attribution), and topic modeling (chapter 9), a mixed-membership method that is able to model the semantics of large collections of documents.