Teaching Yourself to Code in DH

tl;dr Book-length introductions to programming or analytic methods (math / statistics / etc.) aimed at or useful for humanists with limited coding experience.

I’m collecting programming & methodological textbooks for humanists as part of a reflective study on DH, but figured it’d also be useful for those interested in teaching themselves to code, or teachers who need a textbook for their class. Though I haven’t read them all yet, I’ve organized them into very imperfect categories and provided (hopefully) some useful comments. Short coding exercises, books that assume some pre-existing knowledge of coding, and theoretical introductions are not listed here.

Thanks to @Literature_Geek, @ProgHist, @heatherfro, @electricarchaeo, @digitaldante, @kintopp, @dmimno, & @collinj for their contributions to the growing list. In the interest of maintaining scope, not all of their suggestions appear below.

Historical Analysis

  • The Programming Historian, 1st edition (2007). William J. Turkel and Alan MacEachern.
    • An open access introduction to programming in Python. Mostly web scraping and basic text analysis. Probably best to look to newer resources, due to the date. Although it’s aimed at historians, the methods are broadly useful to all text-based DH.
  • The Programming Historian, 2nd edition (ongoing). Afanador-Llach, Maria José, Antonio Rojas Castro, Adam Crymble, Víctor Gayol, Fred Gibbs, Caleb McDaniel, Ian Milligan, Amanda Visconti, and Jeri Wieringa, eds.
    • Constantly updating lessons, ostensibly aimed at historians, but useful to all of DH. Includes introductions to web development, text analysis, GIS, network analysis, etc. in multiple programming languages. Not a monograph, and no real order.
  • Computational Historical Thinking with Applications in R (ongoing). Lincoln Mullen.
    • A series of lessons in in R, still under development with quite a few chapters missing. Probably the only programming book aimed at historians that actually focuses on historical questions and approaches.
  • The Rubyist Historian (2004). Jason Heppler.
    • A short introduction to programming in Ruby. Again, ostensibly aimed at historians, but really just focused on the fundamentals of coding, and useful in that context.
  • Natural Language Processing for Historical Texts (2012). Michael Piotrowski.
    • About natural language processing, but not an introduction to coding. Instead, an introduction to the methodological approaches of natural language processing specific to historical texts (OCR, spelling normalization, choosing a corpus, part of speech tagging, etc.). Teaches a variety of tools and techniques.
  • The Historian’s Macroscope (2015). Graham, Milligan, & Weingart.
    • Okay I’m cheating a bit here! This isn’t teaching you to program, but Shawn, Ian, and I spent a while writing this intro to digital methods for historians, so I figured I’d sneak a link in.

Literary & Linguistic Analysis

  • Text Analysis with R for Students of Literature (2014). Matthew Jockers.
    • Step-by-step introduction to learning R, specifically focused on literary text analysis, both for close and distant reading, with primers on the statistical approaches being used. Includes approaches to, e.g., word frequency distribution, lexical variety, classification, and topic modeling.
  • The Art of Literary Text Analysis (ongoing). Stéfan Sinclair & Geoffrey Rockwell.
    • A growing, interactive textbook similar in scope to Jockers’ book (close & distant reading in literary analysis), but in Python rather than R. Heavily focused on the code itself, and includes such methods as topic modeling and sentiment analysis.
  • Statistics for Corpus Linguistics (1998). Michael Oakes.
    • Don’t know anything about this one, sorry!

General Digital Humanities

Many of the above books are focused on literary or historical analysis only in name, but are really useful for everyone in DH. The below are similar in scope, but don’t aim themselves at one particular group.

  • Humanities Data in R (2015). Lauren Tilton & Taylor Arnold.
    • General introduction to programming through R, and broadly focused on many approaches, including basic statistics, networks, maps, texts, and images. Teaches concepts and programmatic implementations.
  • Digital Research Methods with Mathematica (2015). William J. Turkel.
    • A Mathematica notebook (thus, not accessible unless you have an appropriate reader) teaching text, image, and geo-based analysis. Mathematica itself is an expensive piece of software without an institutional license, so this resource may be inaccessible to many learners. [NOTE: Arno Bosse wrote positive feedback on this textbook in a comment below.]
  • Exploratory Programming for the Arts and Humanities (2016). Nick Montfort.
    • An introduction to the fundamentals of programming specifically for arts and humanities, languages Python and Processing, that goes through statistics, text, sound, animation, images, and so forth. Much more expansive than many other options listed here, but not as focused on needs of text analysis (which is probably a good thing).
  • An Introduction to Text Analysis: A Coursebook (2016). Brandon Walsh & Sarah Horowitz.
    • A brief textbook with exercises and explanatory notes specific to text analysis for the study of literature and history. Not an introduction to programming, but covers some of the mathematical and methodological concepts used in these sorts of studies.
  • Python Programming for Humanists (ongoing). Folgert Karsdorp and Maarten van Gompel.
    • Interactive (Jupyter) notebooks teaching Python for statistical text analysis. Quite thorough, teaching methodological reasoning and examples, including quizzes and other lesson helpers, going from basic tokenization up through unsupervised learning, object-oriented programming, etc.
  • Technical Foundations of Informatics (2017). Michael Freeman and Joel Ross.
    • Teaches the start-to-finish skills needed to write code to work with data, from command line to markdown to github to R and ggplot2. Not aimed at humanists, but aimed at those with no prior technical experience.

Statistical Methods & Machine Learning

  • Statistics for the Humanities (2014). John Canning.
    • Not an introduction to coding of any sort, but a solid intro to statistics geared at the sort of stats needed by humanists (archaeologists, literary theorists, philosophers, historians, etc.). Reading this should give you a solid foundation of statistical methods (sampling, confidence intervals, bias, etc.)
  • Data Mining: Practical Machine Learning Tools and Techniques, 4th edition (2016). Witten, Frank, Hall, & Pal.
    • A practical intro to machine learning in Weka, Java-based software for data mining and modeling. Not aimed at humanists, but legible to the dedicated amateur. It really gets into the weeds of how machine learning works.
  • Text Mining with R (2017). Julia Silge and David Robinson.
    • Introduction to text mining aimed at data scientists in the statistical programming language R. Some knowledge of R is expected; the authors suggest using R for Data Science (2016) by Grolemund & Wickham to get up to speed. This is for those interested in current data science coding best-practices, though it does not get as in-depth as some other texts focused on literary text analysis. Good as a solid base to learn from.
  • The Curious Journalist’s Guide to Data (2016). Jonathan Stray.
    • Not an intro to programming or math, but rather a good guide to quantitatively thinking through evidence and argument. Aimed at journalists, but of potential use to more empirically-minded humanists.
  • Six Septembers: Mathematics for the Humanist (2017). Patrick Juola & Stephen Ramsay.
    • Fantastic introduction to simple and advanced mathematics written by and for humanists. Approachable, prose-heavy, and grounded in humanities examples. Covers topics like algebra, calculus, statistics, differential equations. Definitely a foundations text, not an applications one.

Data Visualization, Web Development, & Related

  • Data Visualization for Social Science: A practical introduction with R and ggplot2 (2017). Kieran Healy
    • A “hands-on introduction to the principles and practice of looking at and presenting data using R and ggplot” that introduce readers “to both the ideas and the methods of data visualization in a comprehensible and reproducible way”. Incredibly thorough, painstakingly annotated, and though not aimed directly at humanists, is close enough in scope to be more valuable than a general introduction to data science.
  • Interactive Information Visualization (2017). Michael Freeman.
    • Introduction to the skills, tools, and setup required to create interactive web visualizations, briefly covering everything from HTML to D3.js. Not aimed at the humanities, but aimed at those with no prior experience with code.
  • D3.js in Action, 2nd edition (2017). Elijah Meeks.
    • Introduction to programmatic, online data visualization in javascript and the library D3.js. Not aimed at the humanities, but written by a digital humanist; easy to read and follow. The great thing about D3 is it’s a library for visualizing something in whatever fashion you might imagine, so this is a good book for those who want to design their own visualizations rather than using off-the-shelf tools.
  • Drupal for Humanists (2016). Quinn Dombrowski.
    • Full-length introduction to Drupal, a web platform that allows you to build “environments for gathering, annotating, arranging, and presenting their research and supporting materials” on the web. Useful for those interested in getting started with the creation of web-based projects but who don’t want to dive head-first into from-scratch web development.
  • (Xe)LaTeX appliqué aux sciences humaines (2012). Maïeul Rouquette, Brendan Chabannes et Enimie Rouquette.
    • French introduction to LaTeX for humanists. LaTeX is the primary means scientists use to prepare documents (instead of MS Word or similar software), which allows for more sustainable, robust, and easily typeset scholarly publications. If humanists wish to publish in natural (or some social) science journals, this is an important skill.

4 thoughts on “Teaching Yourself to Code in DH”

  1. This is an excellent list — thank you! I’d like to add two quick comments to your note on about Bill Turkel’s entry to flesh it out a bit. The commercial version of Mathematica is expensive. On the other hand, many colleges/universities already have institutional licenses which provides free use of the software. Student pricing w/o an institutional license is $150. ‘Digital Research Methods with Mathematica’ is a Mathematica notebook. But it’s also a c. 130 page equiv. textbook covering a great many DH methods, accompanied by exercies with answer keys, a full syllabus etc. I think it’s a fantastic resource which promotes the kind of interactive teaching platform we’re starting to see people in the DH community begin to explore with Jupytr notebooks.

Leave a Reply