Teaching Yourself to Code in DH

tl;dr Book-length introductions to programming or analytic methods (math / statistics / etc.) aimed at or useful for humanists with limited coding experience.


I’m collecting programming & methodological textbooks for humanists as part of a reflective study on DH, but figured it’d also be useful for those interested in teaching themselves to code, or teachers who need a textbook for their class. Though I haven’t read them all yet, I’ve organized them into very imperfect categories and provided (hopefully) some useful comments. Short coding exercises, books that assume some pre-existing knowledge of coding, and theoretical introductions are not listed here.

Thanks to @Literature_Geek, @ProgHist, @heatherfro, @electricarchaeo, @digitaldante, @kintopp, @dmimno, & @collinj for their contributions to the growing list. In the interest of maintaining scope, not all of their suggestions appear below.

Historical Analysis

  • The Programming Historian, 1st edition (2007). William J. Turkel and Alan MacEachern.
    • An open access introduction to programming in Python. Mostly web scraping and basic text analysis. Probably best to look to newer resources, due to the date. Although it’s aimed at historians, the methods are broadly useful to all text-based DH.
  • The Programming Historian, 2nd edition (ongoing). Afanador-Llach, Maria José, Antonio Rojas Castro, Adam Crymble, Víctor Gayol, Fred Gibbs, Caleb McDaniel, Ian Milligan, Amanda Visconti, and Jeri Wieringa, eds.
    • Constantly updating lessons, ostensibly aimed at historians, but useful to all of DH. Includes introductions to web development, text analysis, GIS, network analysis, etc. in multiple programming languages. Not a monograph, and no real order.
  • Computational Historical Thinking with Applications in R (ongoing). Lincoln Mullen.
    • A series of lessons in R, still under development with quite a few chapters missing. Probably the only programming book aimed at historians that actually focuses on historical questions and approaches.
  • The Rubyist Historian (2004). Jason Heppler.
    • A short introduction to programming in Ruby. Again, ostensibly aimed at historians, but really just focused on the fundamentals of coding, and useful in that context.
  • Natural Language Processing for Historical Texts (2012). Michael Piotrowski.
    • About natural language processing, but not an introduction to coding. Instead, an introduction to the methodological approaches of natural language processing specific to historical texts (OCR, spelling normalization, choosing a corpus, part of speech tagging, etc.). Teaches a variety of tools and techniques.
  • The Historian’s Macroscope (2015). Graham, Milligan, & Weingart.
    • Okay I’m cheating a bit here! This isn’t teaching you to program, but Shawn, Ian, and I spent a while writing this intro to digital methods for historians, so I figured I’d sneak a link in.

Literary & Linguistic Analysis

  • Text Analysis with R for Students of Literature (2014). Matthew Jockers.
    • Step-by-step introduction to learning R, specifically focused on literary text analysis, both for close and distant reading, with primers on the statistical approaches being used. Includes approaches to, e.g., word frequency distribution, lexical variety, classification, and topic modeling.
  • The Art of Literary Text Analysis (ongoing). Stéfan Sinclair & Geoffrey Rockwell.
    • A growing, interactive textbook similar in scope to Jockers’ book (close & distant reading in literary analysis), but in Python rather than R. Heavily focused on the code itself, and includes such methods as topic modeling and sentiment analysis.
  • Statistics for Corpus Linguistics (1998). Michael Oakes.
    • Don’t know anything about this one, sorry!

General Digital Humanities

Many of the above books are focused on literary or historical analysis only in name, but are really useful for everyone in DH. The below are similar in scope, but don’t aim themselves at one particular group.

  • Humanities Data in R (2015). Lauren Tilton & Taylor Arnold.
    • General introduction to programming through R, and broadly focused on many approaches, including basic statistics, networks, maps, texts, and images. Teaches concepts and programmatic implementations.
  • Digital Research Methods with Mathematica (2015). William J. Turkel.
    • A Mathematica notebook (thus, not accessible unless you have an appropriate reader) teaching text, image, and geo-based analysis. Mathematica itself is an expensive piece of software without an institutional license, so this resource may be inaccessible to many learners. [NOTE: Arno Bosse wrote positive feedback on this textbook in a comment below.]
  • Exploratory Programming for the Arts and Humanities (2016). Nick Montfort.
    • An introduction to the fundamentals of programming specifically for the arts and humanities, using Python and Processing, that goes through statistics, text, sound, animation, images, and so forth. Much more expansive than many other options listed here, but not as focused on the needs of text analysis (which is probably a good thing).
  • An Introduction to Text Analysis: A Coursebook (2016). Brandon Walsh & Sarah Horowitz.
    • A brief textbook with exercises and explanatory notes specific to text analysis for the study of literature and history. Not an introduction to programming, but covers some of the mathematical and methodological concepts used in these sorts of studies.
  • Python Programming for Humanists (ongoing). Folgert Karsdorp and Maarten van Gompel.
    • Interactive (Jupyter) notebooks teaching Python for statistical text analysis. Quite thorough, teaching methodological reasoning and examples, including quizzes and other lesson helpers, going from basic tokenization up through unsupervised learning, object-oriented programming, etc.

Statistical Methods & Machine Learning

  • Statistics for the Humanities (2014). John Canning.
    • Not an introduction to coding of any sort, but a solid intro to statistics geared at the sort of stats needed by humanists (archaeologists, literary theorists, philosophers, historians, etc.). Reading this should give you a solid foundation of statistical methods (sampling, confidence intervals, bias, etc.)
  • Data Mining: Practical Machine Learning Tools and Techniques, 4th edition (2016). Witten, Frank, Hall, & Pal.
    • A practical intro to machine learning in Weka, Java-based software for data mining and modeling. Not aimed at humanists, but legible to the dedicated amateur. It really gets into the weeds of how machine learning works.
  • Text Mining with R (2017). Julia Silge and David Robinson.
    • Introduction to text mining aimed at data scientists in the statistical programming language R. Some knowledge of R is expected; the authors suggest using R for Data Science (2016) by Grolemund & Wickham to get up to speed. This is for those interested in current data science coding best-practices, though it does not get as in-depth as some other texts focused on literary text analysis. Good as a solid base to learn from.
  • The Curious Journalist’s Guide to Data (2016). Jonathan Stray.
    • Not an intro to programming or math, but rather a good guide to quantitatively thinking through evidence and argument. Aimed at journalists, but of potential use to more empirically-minded humanists.
  • Six Septembers: Mathematics for the Humanist (2017). Patrick Juola & Stephen Ramsay.
    • Fantastic introduction to simple and advanced mathematics written by and for humanists. Approachable, prose-heavy, and grounded in humanities examples. Covers topics like algebra, calculus, statistics, differential equations. Definitely a foundations text, not an applications one.

Data Visualization, Web Development, & Related

  • D3.js in Action, 2nd edition (2017). Elijah Meeks.
    • Introduction to programmatic, online data visualization in javascript and the library D3.js. Not aimed at the humanities, but written by a digital humanist; easy to read and follow. The great thing about D3 is it’s a library for visualizing something in whatever fashion you might imagine, so this is a good book for those who want to design their own visualizations rather than using off-the-shelf tools.
  • Drupal for Humanists (2016). Quinn Dombrowski.
    • Full-length introduction to Drupal, a web platform that allows you to build “environments for gathering, annotating, arranging, and presenting their research and supporting materials” on the web. Useful for those interested in getting started with the creation of web-based projects but who don’t want to dive head-first into from-scratch web development.
  • (Xe)LaTeX appliqué aux sciences humaines (2012). Maïeul Rouquette, Brendan Chabannes et Enimie Rouquette.
    • French introduction to LaTeX for humanists. LaTeX is the primary means scientists use to prepare documents (instead of MS Word or similar software), which allows for more sustainable, robust, and easily typeset scholarly publications. If humanists wish to publish in natural (or some social) science journals, this is an important skill.

A Working Definition of Digital Humanities

Hah! I tricked you. I don’t intend to define digital humanities here; too much blood has already been spilled over that subject. I’m sure we all remember the terrible digital humanities / humanities computing wars of 2004, now commemorated yearly under a Big Tent in the U.S., Europe, or in 2015, Australia. Most of us still suffer from ACH or ALLC (edit: I’ve been reminded the more politically correct acronym these days is EADH).

Instead, I’m here to report the findings of an extremely informal survey, with a sample size of 5, inspired by Paige Morgan’s question of what courses an undergraduate interested in digital humanities should take:

The question inspired a long discussion, worth reading through if you’re interested in digital humanities curricula. I suggested that, were the undergrad interested in the heavily computational humanities (like Ted Underwood, Ben Schmidt, etc.), they might take linear algebra, statistics for social science, programming 1 & 2, web development, and a social science (like psych) research methods course, along with all their regular humanities courses. Others suggested removing some courses and adding others, and of course all of these are pipe dreams unless our mystery undergrad is in the six-year program.

The Pipe Dream Curriculum. [via]
The discussion got me thinking: how did the digital humanists we know and love get to where they are today? Given that the basic ethos of DH is that if you want to know something, you just have to ask, I decided to ask a few well-respected DHers how someone might go about reaching expertise in their subject matter. This isn’t a question of how to define digital humanities, but of the sorts of things the digital humanists we know and love learned to get where they are today. I asked:

Dear all,

Some of you may have seen this tweet by Paige Morgan this morning, asking about what classes an undergraduate student should take hoping to pursue DH. I’ve emailed you, a random and diverse smattering of highly recognizable names associated with DH, in the hopes of getting a broader answer than we were able to generate through twitter alone.

I know you’re all extremely busy, so please excuse my unsolicited semi-mass email and no worries if you don’t get around to replying.

If you do reply, however, I’d love to get a list of undergraduate courses (traditional humanities or otherwise) that you believe was or would be instrumental to the research you do. My list, for example, would include historical methods, philosophy of science, linear algebra, statistics, programming, and web development. I’ll take the list of lists and write up a short blog post about them, because I believe it would be beneficial for many new students who are interested in pursuing DH in all its guises. I’d also welcome suggestions for other people and “schools of DH” I’m sure to have missed.

Many thanks,
Scott

The Replies

And because the people in DH are awesome and forthcoming, I got many replies back. I’ll list them first here, and then attempt some preliminary synthesis below.

Ted Underwood

The first reply was from Ted Underwood, who was afraid my question skirted a bit too close to defining DH, saying:

No matter how heavily I hedge and qualify my response (“this is just a personal list relevant to the particular kind of research I do …”), people will tend to read lists like this as tacit/covert/latent efforts to DEFINE DH — an enterprise from which I never harvest anything but thorns.

Thankfully he came back to me a bit later, saying he’d worked up the nerve to reply to my survey because he’s “coming to the conclusion that this is a vital question we can’t afford to duck, even if it’s controversial [emphasis added]”. Ted continued:

So here goes, with three provisos:

  1. I’m talking only about my own field (literary text mining), and not about the larger entity called “DH,” which may be too deeply diverse to fit into a single curriculum.
  2. A lot of this is not stuff I actually took in the classroom.
  3. I really don’t have strong opinions about how much of this should be taken as an undergrad, and what can wait for grad school. In practice, no undergrad is going to prepare themselves specifically for literary text mining (at least, I hope not). They should be aiming at some broader target.

But at some point, as preparation for literary text-mining, I’d recommend

  • A lot of courses in literary history and critical theory (you probably need a major’s worth of courses in some aspect of literary studies).
  • At least one semester of experience programming. Two semesters is better. But existing CS courses may not be the most efficient delivery system. You probably don’t need big-O notation. You do need data structures. You may not need to sweat the fine points of encapsulation. You probably do need to know about version control. I think there’s room for a “Programming for Humanists” course here.
  • Maybe one semester of linguistics (I took historical linguistics, but corpus linguistics would also work).
  • Statistics — a methods course for social scientists would be great.
  • At least one course in data mining / machine learning. This may presuppose more math than one semester of statistics will provide, so
  • Your recommendation of linear algebra is probably also a good idea.

I doubt all of that will fit in anyone’s undergrad degree. So in practice, any undergrad with courses in literary history plus a semester or two of programming experience, and perhaps statistics, would be doing very well.

So Underwood’s reply was that focusing too much in undergrad is not necessarily ideal, but were an undergraduate interested in literary text mining, they wouldn’t go astray with literary history, critical theory, a programming for humanists course, linguistics, statistics, data mining, and potentially linear algebra.

Johanna Drucker

While Underwood is pretty well known for his computational literary history, Johanna Drucker is probably most well known in our circles for her work in DH criticism. Her reply was concise and helpful:

Look at http://dh101.humanities.ucla.edu

In the best of all possible worlds, this would be followed by specialized classes in database design, scripting for the humanities, GIS/mapping, virtual worlds design, metadata/classification/culture, XML/markup, and data mining (textual corpora, image data mining, network analysis), and complex systems modeling, as well as upper division courses in disciplines (close/distant reading for literary studies, historical methods and mapping etc.).

The site she points to is an online coursebook that provides a broad overview of DH concepts, along with exercises and tutorials, that would make a good basic course on the groundwork of DH. She then gives a familiar list of computer-related and humanities courses that might be useful.

Melissa Terras

The next reply came from Melissa Terras, the director of the DH center (I’m sorry, centre) at UCL. Her response was a bit more general:

My first response is that they must be interested in Humanities research – and make the transition to being taught about Humanities, to doing research in the Humanities, and get the bug for finding out new information about a Humanities topic. It doesn’t matter what the Humanities subject is – but they must understand Humanities research questions, and what it means to undertake new research in the Humanities proper. (Doesn’t matter if their research project has no computing component, it’s about a hunger for new knowledge in this area, rather than digesting prior knowledge).

Like Underwood and Drucker, Terras is stressing that students cannot forget the humanities for the digital.

Then they must become information literate, and IT literate. We have a variety of training courses at our institution, and there is also the “European Driving License in IT” which is basic IT skills. They must get the bug for learning more about computing too. They’ll know after some basic courses whether they are a natural fit to computing.

Without the bug to do research, and the bug to understand more about computing, they are sunk for pursuing DH. These are the two main prerequisites.

Interestingly (but not surprisingly, given general DH trends), Terras frames passion about computing as more important than any particular skill.

Once they get the bug, then taking whatever courses are on offer to them at their institution – either for credit modules, or pure training courses in various IT methods, would stand them in good stead. For example, you are not going to get a degree course in Photoshop, but attending 6 hours of training in that…. plus spreadsheets, plus databases, plus XML, plus web design, would prepare you for pursuing a variety of other courses. Even if the institution doesnt offer taught DH courses, chances are they offer training in IT. They need to get their hands dirty, and to love learning more about computing, and the information environment we inhabit.

Her stress on hyper-focused courses of a few hours each is also interesting, and very much in line with our “workshop and summer school”-focused training mindset in DH.

It’s at that stage I’d be looking for a master’s program in DH, to take the learning of both IT and the humanities to a different level. Your list excludes people who have done “pure” humanities as an undergrad to pursuing DH, and actually, I think DH needs people who are, ya know, obsessed with Byzantine Sculpture in the first instance, but aren’t afraid of learning new aspects of computing without having any undergrad credit courses in it.

I’d also say that there is plenty room for people who do it the other way around – undergrads in comp sci, who then learn and get the bug for humanities research.

Terras continued that taking everything as an undergraduate would equate more to liberal arts or information science than a pure humanities degree:

As with all of these things, it depends on the make up of the individual programs. In my undergrad, I did 6 courses in my final year. If I had taken all of the ones you suggest: (historical methods, philosophy of science, linear algebra, statistics, programming, and web development) then I wouldn’t have been able to take any humanities courses! which would mean I was doing liberal arts, or information science, rather than a pure humanities degree. This will be a problem for many – just sayin’. 🙂

But yes, I think the key thing really is the *interest* and the *passion*. If your institution doesnt allow that type of courses as part of a humanities degree, you haven’t shot yourself in the foot, you just need to learn computing some other way…

Self-teaching is something that I think most people reading this blog can get behind (or commiserate with). I’m glad Terras shifted my focus away from undergraduate courses and toward how to get a DH education.

John Walsh

John Walsh is known in the DH world for his work on TEI, XML, and other formal data models of humanities media. He replied:

I started undergrad as a fine arts major (graphic design) at Ohio University, before switching to English literary studies. As an art major, I was required during my freshman year to take “Comparative Arts I & II,” in which we studied mostly the formal aspects of literature, visual arts, music, and architecture. Each of the two classes occupied a ten-week “quarter” (fall winter spring summer), rather than a semester. At the time OU had a department of comparative arts, which has since become the School of Interdisciplinary Arts.

In any case, they were fascinating classes, and until you asked the question, I hadn’t really considered those courses in the context of DH, but they were definitely relevant and influential to my own work. I took these courses in the 80s, but I imagine an updated version that took into account digital media and digital representations of non-digital media would be especially useful. The study of the formal aspects of these different art forms and media and shared issues of composition and construction gave me a solid foundation for my own work constructing things to model and represent these formal characteristics and relationships.

Walsh was the first one to single out a specific humanities course as particularly beneficial to the DH agenda. It makes sense: the course appears to have crossed many boundaries, focusing particularly on formal similarities. I’d hazard that this approach is at the heart of many of the more computational and formal areas of digital humanities (but perhaps less so for those areas more aligned with new media or critical theory).

I agree web development should be in the mix somewhere, along with something like Ryan Cordell’s “Text Technologies” that would cover various representations of text/documents and a look at their production, digital and otherwise, as well as tools (text analysis, topic modeling, visualization) for doing interesting things with those texts/documents.

Otherwise, Walsh’s courses aligned with those of Underwood and Drucker.

Matt Jockers

Matt Jockers’ expertise, like Underwood’s, tends toward computational literary history and criticism. His reply was short and to the point:

The thing I see missing here are courses Linguistics and Machine Learning. Specifically courses in computational linguistics, corpus linguistics, and NLP. The later are sometimes found in the CS depts. and sometimes in linguistics, it depends. Likewise, courses in Machine Learning are sometimes found in Statistics (as at Stanford) and sometimes in CS (as at UNL).

Jockers, like Underwood, mentioned that I was missing linguistics. In the Twitter conversation, Heather Froehlich pointed out the same deficiency. He and Underwood also pointed out machine learning, which is particularly useful for the sort of research they both do.

Wrapping Up

I was initially surprised by how homogeneous the answers were, given the much-touted diversity of the digital humanities. I had also asked a few others, situated more closely in the new media, alt-ac, and library camps, who for various reasons couldn’t get back to me at the time, but even the similarity among those who replied was a bit surprising. Is it that DH is slowly canonizing around particular axes and methods, or are my selection criteria just woefully biased? I wouldn’t be too surprised if it were the latter.

In the end, it seems (at least according to life-paths of these particular digital humanists), the modern digital humanist should be a passionate generalist, well-versed in their particular field of humanistic inquiry, and decently-versed in a dizzying array of subjects and methods that are tied to computers in some way or another. The path is not necessarily one an undergraduate curriculum is well-suited for, but the self-motivated have many potential sources for education.

I was initially hoping to turn this short survey into a list of potential undergraduate curricula for different DH paths (much like my list of DH syllabi), but it seems we’re either not yet at that stage, or DH is particularly ill-suited to undergraduate-style curricula. I’m hoping some of you will leave comments on the areas of DH I’ve clearly missed, but from the view thus far, there seem to be more similarities than differences.