tl;dr Book-length introductions to programming or analytic methods (math / statistics / etc.) aimed at or useful for humanists with limited coding experience.
What am I missing from this list of systematic (book-length) introductions to analytic programming in DH? Please add https://t.co/tWjtb54q4T
— Scott B. Weingart (@scott_bot) February 17, 2017
I’m collecting programming & methodological textbooks for humanists as part of a reflective study on DH, but figured it’d also be useful for those interested in teaching themselves to code, or teachers who need a textbook for their class. Though I haven’t read them all yet, I’ve organized them into very imperfect categories and provided (hopefully) some useful comments. Short coding exercises, books that assume some pre-existing knowledge of coding, and theoretical introductions are not listed here.
Thanks to @Literature_Geek, @ProgHist, @heatherfro, @electricarchaeo, @digitaldante, @kintopp, @dmimno, & @collinj for their contributions to the growing list. In the interest of maintaining scope, not all of their suggestions appear below.
- The Programming Historian, 1st edition (2007). William J. Turkel and Alan MacEachern.
- An open access introduction to programming in Python. Mostly web scraping and basic text analysis. Probably best to look to newer resources, due to the date. Although it’s aimed at historians, the methods are broadly useful to all text-based DH.
- The Programming Historian, 2nd edition (ongoing). Afanador-Llach, Maria José, Antonio Rojas Castro, Adam Crymble, Víctor Gayol, Fred Gibbs, Caleb McDaniel, Ian Milligan, Amanda Visconti, and Jeri Wieringa, eds.
- Constantly updating lessons, ostensibly aimed at historians, but useful to all of DH. Includes introductions to web development, text analysis, GIS, network analysis, etc. in multiple programming languages. Not a monograph, and no real order.
- Computational Historical Thinking with Applications in R (ongoing). Lincoln Mullen.
- A series of lessons in in R, still under development with quite a few chapters missing. Probably the only programming book aimed at historians that actually focuses on historical questions and approaches.
- The Rubyist Historian (2004). Jason Heppler.
- A short introduction to programming in Ruby. Again, ostensibly aimed at historians, but really just focused on the fundamentals of coding, and useful in that context.
- Natural Language Processing for Historical Texts (2012). Michael Piotrowski.
- About natural language processing, but not an introduction to coding. Instead, an introduction to the methodological approaches of natural language processing specific to historical texts (OCR, spelling normalization, choosing a corpus, part of speech tagging, etc.). Teaches a variety of tools and techniques.
- The Historian’s Macroscope (2015). Graham, Milligan, & Weingart.
- Okay I’m cheating a bit here! This isn’t teaching you to program, but Shawn, Ian, and I spent a while writing this intro to digital methods for historians, so I figured I’d sneak a link in.
Literary & Linguistic Analysis
- Text Analysis with R for Students of Literature (2014). Matthew Jockers.
- Step-by-step introduction to learning R, specifically focused on literary text analysis, both for close and distant reading, with primers on the statistical approaches being used. Includes approaches to, e.g., word frequency distribution, lexical variety, classification, and topic modeling.
- The Art of Literary Text Analysis (ongoing). Stéfan Sinclair & Geoffrey Rockwell.
- A growing, interactive textbook similar in scope to Jockers’ book (close & distant reading in literary analysis), but in Python rather than R. Heavily focused on the code itself, and includes such methods as topic modeling and sentiment analysis.
- Statistics for Corpus Linguistics (1998). Michael Oakes.
- Don’t know anything about this one, sorry!
General Digital Humanities
Many of the above books are focused on literary or historical analysis only in name, but are really useful for everyone in DH. The below are similar in scope, but don’t aim themselves at one particular group.
- Humanities Data in R (2015). Lauren Tilton & Taylor Arnold.
- General introduction to programming through R, and broadly focused on many approaches, including basic statistics, networks, maps, texts, and images. Teaches concepts and programmatic implementations.
- Digital Research Methods with Mathematica (2015). William J. Turkel.
- A Mathematica notebook (thus, not accessible unless you have an appropriate reader) teaching text, image, and geo-based analysis. Mathematica itself is an expensive piece of software without an institutional license, so this resource may be inaccessible to many learners. [NOTE: Arno Bosse wrote positive feedback on this textbook in a comment below.]
- Exploratory Programming for the Arts and Humanities (2016). Nick Montfort.
- An introduction to the fundamentals of programming specifically for arts and humanities, languages Python and Processing, that goes through statistics, text, sound, animation, images, and so forth. Much more expansive than many other options listed here, but not as focused on needs of text analysis (which is probably a good thing).
- An Introduction to Text Analysis: A Coursebook (2016). Brandon Walsh & Sarah Horowitz.
- A brief textbook with exercises and explanatory notes specific to text analysis for the study of literature and history. Not an introduction to programming, but covers some of the mathematical and methodological concepts used in these sorts of studies.
- Python Programming for Humanists (ongoing). Folgert Karsdorp and Maarten van Gompel.
- Interactive (Jupyter) notebooks teaching Python for statistical text analysis. Quite thorough, teaching methodological reasoning and examples, including quizzes and other lesson helpers, going from basic tokenization up through unsupervised learning, object-oriented programming, etc.
Statistical Methods & Machine Learning
- Statistics for the Humanities (2014). John Canning.
- Not an introduction to coding of any sort, but a solid intro to statistics geared at the sort of stats needed by humanists (archaeologists, literary theorists, philosophers, historians, etc.). Reading this should give you a solid foundation of statistical methods (sampling, confidence intervals, bias, etc.)
- Data Mining: Practical Machine Learning Tools and Techniques, 4th edition (2016). Witten, Frank, Hall, & Pal.
- A practical intro to machine learning in Weka, Java-based software for data mining and modeling. Not aimed at humanists, but legible to the dedicated amateur. It really gets into the weeds of how machine learning works.
- Text Mining with R (2017). Julia Silge and David Robinson.
- Introduction to text mining aimed at data scientists in the statistical programming language R. Some knowledge of R is expected; the authors suggest using R for Data Science (2016) by Grolemund & Wickham to get up to speed. This is for those interested in current data science coding best-practices, though it does not get as in-depth as some other texts focused on literary text analysis. Good as a solid base to learn from.
- The Curious Journalist’s Guide to Data (2016). Jonathan Stray.
- Not an intro to programming or math, but rather a good guide to quantitatively thinking through evidence and argument. Aimed at journalists, but of potential use to more empirically-minded humanists.
Data Visualization, Web Development, & Related
- D3.js in Action, 2nd edition (2017). Elijah Meeks.
- Drupal for Humanists (2016). Quinn Dombrowski.
- Full-length introduction to Drupal, a web platform that allows you to build “environments for gathering, annotating, arranging, and presenting their research and supporting materials” on the web. Useful for those interested in getting started with the creation of web-based projects but who don’t want to dive head-first into from-scratch web development.
- (Xe)LaTeX appliqué aux sciences humaines (2012). Maïeul Rouquette, Brendan Chabannes et Enimie Rouquette.
- French introduction to LaTeX for humanists. LaTeX is the primary means scientists use to prepare documents (instead of MS Word or similar software), which allows for more sustainable, robust, and easily typeset scholarly publications. If humanists wish to publish in natural (or some social) science journals, this is an important skill.