Argument Clinic

Zoe LeBlanc asked how basic statistics lead to a meaningful historical argument. A good discussion followed, worth reading, but since I couldn’t fit my response into tweets, I hoped to add a bit to the thread here on the irregular. I’m addressing only one tiny corner of her question, in a way that is peculiar to my own still-forming approach to computational history; I hope it will be of some use to those starting out.

In brief, I argue that one good approach to computational history cycles between data summaries and focused hypothesis exploration, driven by historiographic knowledge, in service to finding and supporting historically interesting agendas. There’s a lot of good computational history that doesn’t do this, and a lot of bad computational history that does, but this may be a helpful rubric to follow.

In the spirit of Monty Python, the below video has absolutely nothing to do with the discussion at hand.

Zoe’s question gets at the heart of one of the two most prominent failures of computational history in 2017 1: the inability to go beyond descriptive statistics into historical argument. 2 I’ve written before on one of the many reasons for this inability, but that’s not the subject of this post. This post covers some good practices in getting from statistics to arguments.

Describing the Past

Historians, for the most part, aren’t experimentalists. 3 Our goals vary, but they often include telling stories about the past that haven’t been told, by employing newly-discovered evidence, connecting events that seemed unrelated, or revisiting an old narrative with a fresh perspective.

Facts alone usually don’t cut it. We don’t care what Jane ate for breakfast without a so what. Maybe her breakfast choices say something interesting about her socioeconomic status, or about food culture, or about how her eating habits informed the way she lived. Alongside a fact, we want why or how it came to be, what it means, or its role in some larger or future trend. A sufficiently big and surprising fact may be worthy of note on its own (“Jane ate orphans for breakfast” or “The government did indeed collude with a foreign power”), but such surprising revelations are rare, not the only purpose for historians, and still beg for context.

Computational history has gotten away with a lot of context-free presentations of fact. 4 That’s great! It’s a sign there’s a lot we didn’t know that contemporary methods & data make easily visible. 5 Here’s an example of one of mine, showing that, despite evidence to the contrary, there is a thriving community at the intersection of history and philosophy of science:

My citation analysis showing a bridge between history & philosophy of science.

But, though we’re not running out of low-hanging fruit, the novelty of mere description is wearing thin. Knowing that a community exists between history & philosophy of science is not particularly interesting; knowing why it exists, what it changes, or whether it is less tenuous than any other disciplinary borderland are more interesting and historiographically recognizable questions.

Context is Key

So how to get from description to historical argument? Though there’s no right path, and the route depends on the type of claim, this post may offer some guidance. Before we get too far, though, a note:

Description has little meaning without context and comparison. The data may show that more people are eating apples for breakfast, but there’s a lot to unpack there before it can be meaningful, let alone relevant.

Line chart of # of people who eat apples over time.

It may be, for example, that the general population is growing just as quickly as the number of people who eat apples. If that’s the case, does it matter that apple-eaters themselves don’t seem to be making up any larger percent of the population?

Line chart of # of people who eat apples over time (left axis) compared to general population (right axis).

The answer for a historian is: of course it matters. If we were talking about casualties of war, or the number of cities in a country, rather than apples, a twofold increase in absolute value (rather than percentage of population) makes a huge difference. It’s more lives affected; it’s more infrastructure and resources for a growing nation.

But the nature of that difference changes when we know our subject of study matches population dynamics. If we’re looking at voting patterns across cities, and we notice population density correlates with party affiliation, we can use that as a launching point for so what. Perhaps sparser cities rely on fewer social services to run smoothly, leading the population to vote more conservative; perhaps past events pushed conservative families towards the outskirts; perhaps.

Without having a ground against which to contextualize our results, a base map like general population, the fact of which cities voted in which direction gives us little historical meat to chew on.

On the other hand, some surprising facts, when contextualized, leave us less surprised. A two-fold increase in apple eating across a decade is pretty surprising, until you realize it happened alongside a similar increase in population. The fact is suddenly less worthy of report by itself, though it may have implications for, say, the growth of the apple industry.

But Zoe asked about statistics, not counting, in finding meaning. I don’t want to divert this post into teaching stats, nor do I want to assume statistical knowledge, so I’ll opt for an incredibly simple metric: the ratio.

The illustration above shows an increase in both population and apple-eating, and eyeball estimates suggest the two are growing at roughly the same pace. If we divide the total population by the number of people eating apples, however, our story is complicated.

Line chart of # of people who eat apples over time (left axis) compared to general population (right axis). A thick blue line in the middle (left axis) shows the ratio between the two.

Though both population and apple-eating increase, in 1806 the population begins rising much more rapidly than the number of apple-eaters. 6 It is in this statistically-derived difference that the historian may find something worth exploring and explaining further.
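
Since the ratio is just one number per year, the computation itself is trivial. Here is a minimal Python sketch, using entirely hypothetical yearly counts chosen only to mimic the divergence described above:

```python
# Hypothetical yearly counts, invented purely for illustration.
years = list(range(1800, 1811))
apple_eaters = [100, 104, 108, 113, 117, 122, 125, 127, 130, 132, 134]
population = [1000, 1040, 1081, 1124, 1169, 1216, 1290, 1370, 1455, 1546, 1642]

# The ratio turns two similar-looking growth curves into one series whose
# sudden shifts are easy to spot by eye.
for year, eaters, pop in zip(years, apple_eaters, population):
    print(f"{year}: {pop / eaters:.1f} people per apple-eater")

# In these made-up numbers the ratio hovers near 10 until 1805, then climbs
# steadily from 1806 on: the kind of statistically-derived divergence worth
# exploring further.
```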

There are many ways to compare and contextualize data, of which this is one. They aren’t worth enumerating, but the importance of contextualization is relevant to what comes next.

Question- and Data-Driven History

Computational historians like to talk about question-driven analysis. Computational history is best, we say, when it is led by a specific question or angle. The alternative is dumping a bunch of data into a statistics engine, describing it, finding something weird, and saying “oh, this looks interesting.”

When push comes to shove, most would agree the above dichotomy is false. Historical questions don’t pop out of thin air, but from a continuously shifting relationship with the past. We read primary and secondary sources, do some data entry, do some analysis, do some more reading, and through it all build up a knowledge-base and a set of expectations about the past. We also by this point have a set of claims we don’t quite agree with, or some secondary sources with stories that feel wrong or incomplete.

This is where the computational history practice begins: with a firm grasp of the history and historiography of a period, and a set of assumptions, questions, and mild disagreements.

From here, if you’re reading this blog post, you’re likely in one of two camps:

  1. You have a big dataset and don’t know what to do with it, or
  2. You have a historiographic agenda (a point to prove, a question to answer, etc.) that you don’t know how to make computationally tractable.

We’ll begin with #1.

1. I have data. Now what?

Congratulations, you have data!

This is probably the thornier of the two positions, and the one more prone to results of mere description. You want to know how to turn your data into interesting history, but you may end up doing little more than enumerating the blades of grass on a field. To avoid that, you must work through a process sometimes called scalable reading, a special case of the hermeneutic circle.

You start, of course, with mere description. How many records are there? What are the values in each? Are there changes over time or place? Who is most central? Before you start quantifying the data, write down the answers you expect to these questions, with a bit of a causal explanation for each.

Now, barrage your dataset with visualizations and statistical tests to find out exactly what makes it up. See how the results align with the hypotheses you noted down. If you created the data yourself, one archival visit at a time, you won’t find a lot that surprises you. That’s alright. Be sure to take time to consider what’s missing from the dataset, due to archival lacunae, bias, etc.
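
For concreteness, here is what that first descriptive barrage might look like in Python with pandas, assuming a hypothetical letters.csv with sender, recipient, year, and city columns (the file name and column names are placeholders, not a prescribed schema):

```python
import pandas as pd

# Hypothetical dataset of correspondence records; adjust names to your own data.
letters = pd.read_csv("letters.csv")

print(len(letters))                             # how many records are there?
print(letters.describe(include="all"))          # what values does each field take?
print(letters.groupby("year").size())           # are there changes over time?
print(letters["sender"].value_counts().head())  # who appears most often?
print(letters.isna().sum())                     # where are the gaps and lacunae?
```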

If any results surprise you, dig into the data to try to understand why. If none do, think about claims from secondary sources–do any contradict the data? Align with it?

This is also a good point to bring in contextualization. If you’re looking at the number of people doing something over time, try to compare your dataset to population dynamics. If you’re looking at word usage, find a way to compare your data to base frequencies of that word in similar collections. If you’re looking at social networks, compare them to random networks to see if their average path length or degree distribution are surprising compared to networks of similar size. Every unexpected result is an opportunity for exploration.
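
As one example of such a comparison, here is a rough Python sketch using networkx, assuming a hypothetical edge list of correspondents; it checks only average path length, one of several properties you might compare:

```python
import networkx as nx

# Hypothetical network of correspondents; assumes the file exists and the
# resulting graph is connected (otherwise, take its largest component first).
observed = nx.read_edgelist("correspondents.edgelist")
n, m = observed.number_of_nodes(), observed.number_of_edges()
observed_apl = nx.average_shortest_path_length(observed)

# Baseline: the same statistic averaged over random graphs of equal size and density.
random_apls = []
for seed in range(100):
    g = nx.gnm_random_graph(n, m, seed=seed)
    if nx.is_connected(g):
        random_apls.append(nx.average_shortest_path_length(g))

baseline = sum(random_apls) / len(random_apls)
print(f"observed: {observed_apl:.2f}  random baseline: {baseline:.2f}")
# A large gap between the observed value and the baseline is an unexpected
# result, and therefore an opportunity for exploration.
```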

Internal comparisons may also yield interesting points to pursue further, especially if you think your data are biased. Given a limited dataset of actors, their genders, their roles, and play titles, for example, you may not be able to make broad claims about which plays are more popular, but you could see how different roles are distributed across genders within the group.

Internal comparisons could also be temporal. Given a dataset of occupations over time within a particular city, if you compare those numbers to population changes over time, you could find the moments where population and occupation dynamics part ways, and focus on those instances. Why, suddenly, are there more grocers?

The above boils down into two possible points of further research: deviations from expectation, or deviations from internal consistency.

Deviations from expectation–your own or that of some notable secondary source–can be particularly question-provoking. “Why didn’t this meet expectations” quickly becomes “what is wrong or incomplete about this common historical narrative?” From here, it’s useful to dig down into the points of data that exemplify such deviations, and see if you can figure out why and how they break from expectations.

Deviations from internal consistency–that is, when comparisons within the data wind up showing different trends–lead to positive rather than negative questions. Instead of “why is this theory wrong?”, you may ask, “why are these groups different?” or “why does this trend cease to keep pace with population during these decades?” Here you are asking specific questions that require new or shifted theories, whereas with deviations from expectations, you begin by seeing where existing narratives fail.

It’s worth reiterating that, in both scenarios, questions are drawn from deviations from some underlying theory.

In deviations from expectation, the underlying theory is what you bring to your data; you assume the data ought to look one way, but it doesn’t. You are coming with an internal, if not explicit, quantitative model of how the data ought to look.

In deviations from internal consistency, the data’s own descriptive statistics provide the underlying theory against which there may be deviations. Apple-eaters deviating in number from population growth is only interesting if, at most points, apple-eaters grow evenly alongside the population. That is, you assume general statistics should be the same between groups or over time, and if they are not, it is worthy of explanation.

This is an oversimplification, but a useful one. Undoubtedly, combinations of the two will arise: maybe you expect the differences between men and women in the roles they play to be large, but it turns out they are small. This provides a deviation of both kinds, but it is no less legitimate for it. In this case, your recourse may be looking for other theatrical datasets to see if the gender dynamics play out the same across them, or if your data are somehow special and worthy of explanation outside the context of larger gender dynamics.

Which brings us, inexorably, to the cyclic process of computational history. Scalable reading. The hermeneutic circle. Whatever.

The point is, you’ve reached a stage where some deviation or alignment seems worth explanation or exploration. You could stop here. You could present this trend, give a convincing causal just-so story of why it exists, and leave it at that. You will probably get published, since you’ve already gone farther than mere description, the trap of so much computational history.

But you shouldn’t stop here. You should take this opportunity to strengthen your story. Perhaps this is the point where you put your “traditional” historian’s cap back on, and go dust-diving for archival evidence to support your claims. I wouldn’t think less of you for it, but if you stop there, you’d only be reaping half the advantages of computational history.

In the example above, looking for other theatrical datasets to contextualize the gender results in your own hinted at the second half of the computational history research cycle: creating computationally tractable questions. Recall that this section described the first half: making sense of data. Although I presented the two as separate, they productively feed on one another.

Once you’ve gone through your data to find how it aligns with your or others’ preconceived notions of the past, or how by its own internal deviations it presents interesting dilemmas, you have found yourself in the second half of the cycle. You have questions or theories you want to ask of data, but you do not yet have the data or the statistics to explore them.

This seems counter-intuitive. Why not just use the data or statistics already gathered, sometimes painstakingly over several years? Because if you use the same data & stats to both generate and answer questions, your evidence is circular. Specifically, you risk making a scientistic claim of what could easily be a spurious trend. It may simply be that, by random chance, the breakfast record-keeper lost a bunch of records from 1806-1810, thus causing the decline seen in the population ratio.

To convincingly make arguments from a historical data description, you must back them up using triangulation–approaching the problem from many angles. That triangulation may be computational, archival, archaeological, or however else you’re used to historying, but we’ll focus here on the computational.

2. Computationally Tractable Questions

So you’ve got a historiographic agenda, and now you want to make it computationally tractable. Good luck! This is the hard part.

“Sparse areas relied less on social services.” “The infrastructure of science became less dependent on specific individuals over the course of the 17th century.” “T-Rex was a remarkable climber.” “Who benefited most from the power vacuum left by the assassination?” These hypotheses and questions do not, on their own, lend themselves to quantitative analysis.

Chief among the common difficulties of turning a historiographic agenda into a computationally tractable hypothesis is a lack of familiarity with computational methods. If you don’t know what a computer is good at, you can’t form an experiment to use one.

I said that history isn’t experimental, but I lied. Archival research can be an experiment if you go in with a hypothesis and a pre-conceived approach or set of criteria that would confirm it. Computational history, at this stage, is also experimental. It often works a little like this (but it may not): 7

  1. Set your agenda. Start with a hypothesis, historiographic framework, or question. For example, “The infrastructure of science became less dependent on specific individuals over the course of the 17th century.” (that question’s mine, don’t steal it.)
  2. Find testable hypotheses. Break it into many smaller statements that can be confirmed, denied, or quantitatively assessed. “If science depends less on specific individuals over the 17th century, the distribution of names mentioned in scholarly correspondence will flatten out. That is, in 1600 a few people will be mentioned frequently, whereas most will be mentioned infrequently; in 1700, the frequency of name mentions will be more evenly distributed across correspondence.” Or “If science depends less on specific individuals over the 17th century, when an important person died, it affected the scholarly network less in 1700 than in 1600.” (Notice in these two examples how finding evidence for the littler statements will corroborate the bigger hypothesis, and vice-versa.)
  3. Match hypotheses to approaches. Come up with methodological proxies, datasets, and/or statistical tests that could corroborate the littler statements. Be careful, thorough, and specific. For example, “In a network of 17th-century letter writers, if the removal of a central figure in 1700 changes the average path length of the network less than the removal of a central figure in 1600, central figures likely came to play less important structural roles over the century. This will be most convincing if the effect of node removal decreases smoothly across the century.” (This is the step in which you need to come to the table with knowledge of different computational methods and what they do.)
  4. Specify proxies. List specific analytic approaches needed for the promising tests, and the data required to do them. For example, you need a list of senders and recipients of scholarly letters, roughly evenly distributed across time between 1600 and 1700, and densely-packed enough to perform network analysis. There could be a few different analytic approaches, including removing highly-central nodes and re-calculating average path length; employing measurements of attack tolerance; etc. (One such approach is sketched just after this list.) It is probably worth testing them all and seeing whether each conforms to the pre-existing theory.
  5. Find data. Find pre-existing datasets that will fit your proxies, or estimate how long it will take to gather enough data yourself to reasonably approach your hypotheses. Opt for data that will work for as many approaches as possible. You may find some data that will suggest new hypotheses, and you’ll iterate back and forth between steps #3-#5 a few times.
  6. Collect data. Run experiments. Uh, yeah, just do those things. Easy as baking apple pie from scratch.
  7. Match experimental results to hypotheses. Here’s the fun part: you get to see how many of your predictions matched your results. Hopefully a bunch, but even if they didn’t, it’s an excuse to figure out why, and to start the process anew. You can also start exploring the additional datasets to help you develop new questions. The astute may have noticed that this step brings us back to the first half of computational historiography: exploring data and seeing what you can find. 8
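
To make step 4 concrete, here is a rough Python sketch of the node-removal proxy using networkx. The edge-list file names are hypothetical stand-ins, and this is only one way to operationalize the test, not the method itself:

```python
import networkx as nx


def disruption_from_removal(graph: nx.Graph) -> float:
    """Change in average path length after removing the best-connected node."""
    # Work on the largest connected component, since path lengths are only
    # defined within a connected graph.
    giant = graph.subgraph(max(nx.connected_components(graph), key=len)).copy()
    before = nx.average_shortest_path_length(giant)

    hub, _ = max(giant.degree(), key=lambda pair: pair[1])
    giant.remove_node(hub)

    # Removal may split the graph; measure what remains of the largest piece.
    remainder = giant.subgraph(max(nx.connected_components(giant), key=len))
    after = nx.average_shortest_path_length(remainder)
    return after - before


# Hypothetical edge lists of letter senders and recipients around 1600 and 1700.
for label, edge_file in [("c. 1600", "letters_1600.edgelist"),
                         ("c. 1700", "letters_1700.edgelist")]:
    network = nx.read_edgelist(edge_file)
    print(label, disruption_from_removal(network))

# If the disruption shrinks (ideally smoothly, when computed decade by decade),
# that corroborates the claim that scientific infrastructure came to depend
# less on specific individuals.
```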

From here, it may be worthwhile to cycle back to the data exploration stage, then back here to computationally tractable hypothesis exploration, and so on ad infinitum.

By now, making meaning out of data probably feels impossible. I’m sorry. The process is much more fluid and intertwined than is easily unpacked in a blog post. The back-and-forth can take hours, days, months, or years.

But the important thing is, after you’ve gone back-and-forth a few times, you should have a combination of quantitative, archival, theoretical, and secondary support for a solidly historical argument.

Contexts of Discovery and Justification

Early 20th-century philosophy of science cared a lot about the distinction between the contexts of discovery and justification. Violently shortened, the context of discovery is how you reached your conclusion, and the context of justification is how you argue your point, regardless of the process that got you there.

I bring this up as a reminder that the two can be distinct. By the 1990s, quantitative historians who wanted to remain legible to their non-quantitative colleagues often saved the data analysis for an appendix, and even there the focus was on the actual experiments, not the long process of coming up with tests, re-testing, collecting more data, and so on.

The result of this cyclical computational historiography need not be (and rarely is, and perhaps can never be) a description of the process that led you to the evidence supporting your argument. While it’s a good idea to be clear about where your methods led you astray, the most legible result to historians will necessarily involve a narrative reconfiguration.

Causality and Truth

Small final notes on two big topics.

First, Causality. This approach won’t get you there. It’s hard to disentangle causality from correlation, but more importantly in this context, it’s hard to choose between competing causal explanations. The above process can lead you to plausible and corroborated hypotheses, but it cannot prove anything.

Consider this: “My hypothesis about apples predicts these 10 testable claims.” You test each claim, and each test agrees with your predictions. It’s a success, but a soft one; you’ve shown your hypothesis to be plausible given the evidence, but not inevitable. A dozen other equally sensible hypotheses could have produced the same 10 testable claims. You did not prove those hypotheses wrong; you just chose one model that happened to work. 9

Even if no alternate hypothesis presents itself, and all of your tests agree with your hypothesis, you still do not have causal proof. It may be that the proxies you chose to test your claims are bad ones, or incomplete, or your method has unseen holes. Causality is tricky, and in the humanities, proof especially so.

Which leads us to the next point: Truth. Even if somehow you devise the perfect process to find proof of a causal hypothesis, the causal description does not constitute capital-T Truth. There are many truths, coming from many perspectives, about the past, and they don’t need to agree with each other. Historians care not just about what happened, but how and why, and those hows and whys are driven by people. Messy, inconsistent people who believe many conflicting things within the span of a moment. When it comes to questions of society, even the most scientistic of scholars must come to terms with uncertainty and conflict, which after all are more causally central to the story of history than most clever narratives we might tell.

Notes:

  1. Also called digital history, and related to quantitative history and cliometrics in ways we don’t often like to admit.
  2. The other most prominent failure in computational history is our tendency to group things into finite discrete categories; in this case, a two-part list of failures.
  3. With some notable exceptions. Some historians simulate the past, others perform experiments on rates of material decay, or on the chemical composition of inks. It’s a big world out there.
  4. When I say fact, assume I add all the relevant post-modernist caveats of the contingency of objectivity etc. etc. Really I mean “matters of history that the volume of available evidence make difficult to dispute.”
  5. Ted Underwood and I have both talked about the exciting promise of incredibly low-hanging fruit in new approaches.
  6. OK in retrospect I should have used a more historically relevant example – I wasn’t expecting to push this example so far.
  7. If this seems overly scientistic, worry not! Experimental science is often defined by its recourse to rote procedure, which means pretty much any procedural explanation of research will resemble experimental science. There are many ways one can go about scalable reading / triangulation of computational historiography, not just the procedural steps #1-#7 above, but this is one of the easier approaches to explain. Soft falsification and hypothesis testing are plausible angles into computational history, but not necessary ones.
  8. A brief addendum to steps #6-#7: although I’d argue Null-Hypothesis Significance Testing or population-based statistical inferences may not be relevant to historiography, especially when it’s based in triangulation, they may be useful in certain cases. Without delving too deeply into the weeds, they can help you figure out the extent to which the effect you see may just be noise, not indicative of any particular trend. Statistical effect sizes may also be of use, helping you see whether the magnitude of your finding is big enough to have any appreciable role in the historical narrative.
  9. Shawn Graham and I wrote about this in relation to archaeology and simulation here, on the subject of underdetermination and abduction.

Teaching Yourself to Code in DH

tl;dr Book-length introductions to programming or analytic methods (math / statistics / etc.) aimed at or useful for humanists with limited coding experience.


I’m collecting programming & methodological textbooks for humanists as part of a reflective study on DH, but figured it’d also be useful for those interested in teaching themselves to code, or teachers who need a textbook for their class. Though I haven’t read them all yet, I’ve organized them into very imperfect categories and provided (hopefully) some useful comments. Short coding exercises, books that assume some pre-existing knowledge of coding, and theoretical introductions are not listed here.

Thanks to @Literature_Geek, @ProgHist, @heatherfro, @electricarchaeo, @digitaldante, @kintopp, @dmimno, & @collinj for their contributions to the growing list. In the interest of maintaining scope, not all of their suggestions appear below.

Historical Analysis

  • The Programming Historian, 1st edition (2007). William J. Turkel and Alan MacEachern.
    • An open access introduction to programming in Python. Mostly web scraping and basic text analysis. Probably best to look to newer resources, due to the date. Although it’s aimed at historians, the methods are broadly useful to all text-based DH.
  • The Programming Historian, 2nd edition (ongoing). Afanador-Llach, Maria José, Antonio Rojas Castro, Adam Crymble, Víctor Gayol, Fred Gibbs, Caleb McDaniel, Ian Milligan, Amanda Visconti, and Jeri Wieringa, eds.
    • Constantly updating lessons, ostensibly aimed at historians, but useful to all of DH. Includes introductions to web development, text analysis, GIS, network analysis, etc. in multiple programming languages. Not a monograph, and no real order.
  • Computational Historical Thinking with Applications in R (ongoing). Lincoln Mullen.
    • A series of lessons in R, still under development with quite a few chapters missing. Probably the only programming book aimed at historians that actually focuses on historical questions and approaches.
  • The Rubyist Historian (2004). Jason Heppler.
    • A short introduction to programming in Ruby. Again, ostensibly aimed at historians, but really just focused on the fundamentals of coding, and useful in that context.
  • Natural Language Processing for Historical Texts (2012). Michael Piotrowski.
    • About natural language processing, but not an introduction to coding. Instead, an introduction to the methodological approaches of natural language processing specific to historical texts (OCR, spelling normalization, choosing a corpus, part of speech tagging, etc.). Teaches a variety of tools and techniques.
  • The Historian’s Macroscope (2015). Graham, Milligan, & Weingart.
    • Okay I’m cheating a bit here! This isn’t teaching you to program, but Shawn, Ian, and I spent a while writing this intro to digital methods for historians, so I figured I’d sneak a link in.

Literary & Linguistic Analysis

  • Text Analysis with R for Students of Literature (2014). Matthew Jockers.
    • Step-by-step introduction to learning R, specifically focused on literary text analysis, both for close and distant reading, with primers on the statistical approaches being used. Includes approaches to, e.g., word frequency distribution, lexical variety, classification, and topic modeling.
  • The Art of Literary Text Analysis (ongoing). Stéfan Sinclair & Geoffrey Rockwell.
    • A growing, interactive textbook similar in scope to Jockers’ book (close & distant reading in literary analysis), but in Python rather than R. Heavily focused on the code itself, and includes such methods as topic modeling and sentiment analysis.
  • Statistics for Corpus Linguistics (1998). Michael Oakes.
    • Don’t know anything about this one, sorry!

General Digital Humanities

Many of the above books are focused on literary or historical analysis only in name, but are really useful for everyone in DH. The below are similar in scope, but don’t aim themselves at one particular group.

  • Humanities Data in R (2015). Lauren Tilton & Taylor Arnold.
    • General introduction to programming through R, and broadly focused on many approaches, including basic statistics, networks, maps, texts, and images. Teaches concepts and programmatic implementations.
  • Digital Research Methods with Mathematica (2015). William J. Turkel.
    • A Mathematica notebook (thus, not accessible unless you have an appropriate reader) teaching text, image, and geo-based analysis. Mathematica itself is an expensive piece of software without an institutional license, so this resource may be inaccessible to many learners. [NOTE: Arno Bosse wrote positive feedback on this textbook in a comment below.]
  • Exploratory Programming for the Arts and Humanities (2016). Nick Montfort.
    • An introduction to the fundamentals of programming specifically for the arts and humanities, using the languages Python and Processing, that goes through statistics, text, sound, animation, images, and so forth. Much more expansive than many other options listed here, but not as focused on the needs of text analysis (which is probably a good thing).
  • An Introduction to Text Analysis: A Coursebook (2016). Brandon Walsh & Sarah Horowitz.
    • A brief textbook with exercises and explanatory notes specific to text analysis for the study of literature and history. Not an introduction to programming, but covers some of the mathematical and methodological concepts used in these sorts of studies.
  • Python Programming for Humanists (ongoing). Folgert Karsdorp and Maarten van Gompel.
    • Interactive (Jupyter) notebooks teaching Python for statistical text analysis. Quite thorough, teaching methodological reasoning and examples, including quizzes and other lesson helpers, going from basic tokenization up through unsupervised learning, object-oriented programming, etc.

Statistical Methods & Machine Learning

  • Statistics for the Humanities (2014). John Canning.
    • Not an introduction to coding of any sort, but a solid intro to statistics geared at the sort of stats needed by humanists (archaeologists, literary theorists, philosophers, historians, etc.). Reading this should give you a good foundation in statistical methods (sampling, confidence intervals, bias, etc.).
  • Data Mining: Practical Machine Learning Tools and Techniques, 4th edition (2016). Witten, Frank, Hall, & Pal.
    • A practical intro to machine learning in Weka, Java-based software for data mining and modeling. Not aimed at humanists, but legible to the dedicated amateur. It really gets into the weeds of how machine learning works.
  • Text Mining with R (2017). Julia Silge and David Robinson.
    • Introduction to text mining aimed at data scientists in the statistical programming language R. Some knowledge of R is expected; the authors suggest using R for Data Science (2016) by Grolemund & Wickham to get up to speed. This is for those interested in current data science coding best-practices, though it does not get as in-depth as some other texts focused on literary text analysis. Good as a solid base to learn from.
  • The Curious Journalist’s Guide to Data (2016). Jonathan Stray.
    • Not an intro to programming or math, but rather a good guide to quantitatively thinking through evidence and argument. Aimed at journalists, but of potential use to more empirically-minded humanists.
  • Six Septembers: Mathematics for the Humanist (2017). Patrick Juola & Stephen Ramsay.
    • Fantastic introduction to simple and advanced mathematics written by and for humanists. Approachable, prose-heavy, and grounded in humanities examples. Covers topics like algebra, calculus, statistics, differential equations. Definitely a foundations text, not an applications one.

Data Visualization, Web Development, & Related

  • D3.js in Action, 2nd edition (2017). Elijah Meeks.
    • Introduction to programmatic, online data visualization in javascript and the library D3.js. Not aimed at the humanities, but written by a digital humanist; easy to read and follow. The great thing about D3 is it’s a library for visualizing something in whatever fashion you might imagine, so this is a good book for those who want to design their own visualizations rather than using off-the-shelf tools.
  • Drupal for Humanists (2016). Quinn Dombrowski.
    • Full-length introduction to Drupal, a web platform that allows you to build “environments for gathering, annotating, arranging, and presenting their research and supporting materials” on the web. Useful for those interested in getting started with the creation of web-based projects but who don’t want to dive head-first into from-scratch web development.
  • (Xe)LaTeX appliqué aux sciences humaines (2012). Maïeul Rouquette, Brendan Chabannes et Enimie Rouquette.
    • French introduction to LaTeX for humanists. LaTeX is the primary means scientists use to prepare documents (instead of MS Word or similar software), which allows for more sustainable, robust, and easily typeset scholarly publications. If humanists wish to publish in natural (or some social) science journals, this is an important skill.

Lessons From Digital History’s Antecedents

The below is the transcript from my October 29 keynote presented to the Creativity and The City 1600-2000 conference in Amsterdam, titled “Punched-Card Humanities”. I survey historical approaches to quantitative history, how they relate to the nomothetic/idiographic divide, and discuss some lessons we can learn from past successes and failures. For ≈200 relevant references, see this Zotero folder.


Title Slide

I’m here to talk about Digital History, and what we can learn from its quantitative antecedents. If yesterday’s keynote was framing our mutual interest in the creative city, I hope mine will help frame our discussions around the bottom half of the poster; the eHumanities perspective.

Specifically, I’ve been delighted to see at this conference a rich interplay between familiar historiographic and cultural approaches and digital or eHumanities methods, all being brought to bear on the creative city. I want to take a moment to talk about where these two approaches meet.

Yesterday’s wonderful keynote brought up the complicated goal of using new digital methods to explore the creative city, without reducing the city to reductive indices. Are we living up to that goal? I hope a historical take on this question might help us move in this direction, that by learning from those historiographic moments when formal methods failed, we can do better this time.

Creativity Conference Theme

Digital History is different, we’re told. “New”. Many of us know historians who used computers in the 1960s, for things like demography or cliometrics, but what we do today is a different beast.

Commenting on these early punched-card historians in 1999, Ed Ayers wrote, quote, “the first computer revolution largely failed.” The failure, Ayers claimed, was in part due to their statistical machinery not being up to the task of representing the nuances of human experience.

We see this rhetoric of newness or novelty crop up all the time. It cropped up a lot in pioneering digital history essays by Roy Rosenzweig and Dan Cohen in the 90s and 2000s, and we even see a touch of it, though tempered, in this conference’s theme.

In yesterday’s final discussion on uncertainty, Dorit Raines reminded us that the difference between quantitative history in the 70s and today’s Digital History is that today’s approaches broaden our sources, whereas early approaches narrowed them.

Slide (r)evolution

To say “we’re at a unique historical moment” is something common to pretty much everyone, everywhere, forever. And it’s always a little bit true, right?

It’s true that every historical moment is unique. Unprecedented. Digital History, with its unique combination of public humanities, media-rich interests, sophisticated machinery, and quantitative approaches, is pretty novel.

But as the saying goes, history never repeats itself, but it rhymes. Each thread making up Digital History has a long past, and a lot of the arguments for or against it have been made many times before. Novelty is a convenient illusion that helps us get funding.

Not coincidentally, it’s this tension I’ll highlight today: between revolution and evolution, between breaks and continuities, and between the historians who care more about what makes a moment unique, and those who care more about what connects humanity together.

To be clear, I’m operating on two levels here: the narrative and the metanarrative. The narrative is that the history of digital history is one of continuities and fractures; the metanarrative is that this very tension between uniqueness and self-similarity is what swings the pendulum between quantitative and qualitative historians.

Now, my claim that debates over continuity and discontinuity are a primary driver of the quantitative/qualitative divide comes a bit out of left field — I know — so let me back up a few hundred years and explain.

Chronology

Francis Bacon wrote that knowledge would be better understood if it were collected into orderly tables. His plea extended, of course, to historical knowledge, and inspired renewed interest in a genre already over a thousand years old: tabular chronology.

These chronologies were world histories, aligning the pasts of several regions which each reckoned the passage of time differently.

Isaac Newton inherited this tradition, and dabbled throughout his life in establishing a more accurate universal chronology, aligning Biblical history with Greek legends and Egyptian pharaohs.

Newton brought to history the same mind he brought to everything else: one of stars and calculations. Like his peers, Newton relied on historical accounts of astronomical observations to align simultaneous events across thousands of miles. Kepler and Scaliger, among others, also partook in this “scientific history”.

Where Newton departed from his contemporaries, however, was in his use of statistics for sorting out history. In the late 1500s, the average or arithmetic mean was popularized by astronomers as a way of smoothing out noisy measurements. Newton co-opted this method to help him estimate the length of royal reigns, and thus the ages of various dynasties and kingdoms.

On average, Newton figured, a king’s reign lasted 18-20 years. If the history books record 5 kings, that means the dynasty lasted between 90 and 100 years.

Newton was among the first to apply averages to fill in chronologies, though not the first to apply them to human activities. By the late 1600s, demographic statistics of contemporary life — of births, burials and the like — were becoming common. They were ways of revealing divinely ordered regularities.

Incidentally, this is an early example of our illustrious tradition of uncritically appropriating methods from the natural sciences. See? We’ve all done it, even Newton!  

Joking aside, this is an important point: statistical averages represented divine regularities. Human statistics began as a means to uncover universal truths, and they continue to be employed in that manner. More on that later, though.

Musgrave Quote

Newton’s method didn’t quite pass muster, and skepticism grew rapidly on the whole prospect of mathematical history.

Criticizing Newton in 1782, for example, Samuel Musgrave argued, in part, that there are no discernible universal laws of history operating in parallel to the universal laws of nature. Nature can be mathematized; people cannot.

Not everyone agreed. Francesco Algarotti passionately argued that Newton’s calculation of average reigns, the application of math to history, was one of his greatest achievements. Even Voltaire tried Newton’s method, aligning a Chinese chronology with Western dates using average length of reigns.

Nomothetic / Idiographic

Which brings us to the earlier continuity/discontinuity point: quantitative history stirs debate in part because it draws together two activities Immanuel Kant sets in opposition: the tendency to generalize, and the tendency to specify.

The tendency to generalize, later dubbed Nomothetic, often describes the sciences: extrapolating general laws from individual observations. Examples include the laws of gravity, the theory of evolution by natural selection, and so forth.

The tendency to specify, later dubbed Idiographic, describes, mostly, the humanities: understanding specific, contingent events in their own context and with awareness of subjective experiences. This could manifest as a microhistory of one parish in the French Revolution, a critical reading of Frankenstein focused on gender dynamics, and so forth.  

These two approaches aren’t mutually exclusive, and they frequently come in contact around scholarship of the past. Paleontologists, for example, apply general laws of biology and geology to tell the specific story of prehistoric life on Earth. Astronomers, similarly, combine natural laws and specific observations to trace the origins of our universe.

Historians have, with cyclically recurring intensity, engaged in similar efforts. One recent nomothetic example is that of cliodynamics: the practitioners use data and simulations to discern generalities such as why nations fail or what causes war. Recent idiographic historians associate more with the cultural and theoretical turns in historiography, often focusing on microhistories or the subjective experiences of historical actors.

Both tend to meet around quantitative history, but the conversation began well before the urge to quantify. They often fruitfully align and improve one another when working in concert; for example when the historian cites a common historical pattern in order to highlight and contextualize an event which deviates from it.

But more often, nomothetic and idiographic historians find themselves at odds. Newton extrapolated “laws” for the length of kings, and was criticized for thinking mathematics had any place in the domain of the uniquely human. Newton’s contemporaries used human statistics to argue for divine regularities, and this was eventually criticized as encroaching on human agency, free will, and the uniqueness of subjective experience.

Bacon Taxonomy

I’ll highlight some moments in this debate, focusing on English-speaking historians, and will conclude with what we today might learn from foibles of the quantitative historians who came before.

Let me reiterate, though, that quantitative history is not the same as nomothetic history; but the two invite each other, so it would be ahistorical of me to divide them too strictly.

Take Henry Buckle, who in 1857 tried to bridge the two-culture divide posed by C.P. Snow a century later. He wanted to use statistics to find general laws of human progress, and apply those generalizations to the histories of specific nations.

Buckle was well-aware of historiography’s place between nomothetic and idiographic cultures, writing: “it is the business of the historian to mediate between these two parties, and reconcile their hostile pretensions by showing the point at which their respective studies ought to coalesce.”

In direct response, James Froude wrote that there can be no science of history. The whole idea of Science and History being related was nonsensical, like talking about the colour of sound. They simply do not connect.

This was a small exchange in a much larger Victorian debate pitting narrative history against a growing interest in scientific history. The latter rose on the coattails of growing popular interest in science, much like our debates today align with broader discussions around data science, computation, and the visible economic successes of startup culture.

This is, by the way, contemporaneous with something yesterday’s keynote highlighted: the 19th century drive to establish ‘urban laws’.

By now, we begin seeing historians leveraging public trust in scientific methods as a means for political control and pushing agendas. This happens in concert with the rise of punched cards and, eventually, computational history. Perhaps the best example of this historical moment comes from the American Census in the late 19th century.

19C Map

Briefly, a group of 19th-century American historians, journalists, and census chiefs used statistics, historical atlases, and the machinery of the census bureau to argue publicly that the U.S. Western Frontier had disintegrated by the late 19th century.

These moves were, in part, made to consolidate power in the American West and wrest control from the native populations who still lived there. They accomplished this, in part, by publishing popular atlases showing that the western frontier was so fractured that it was difficult to maintain and defend. 1

The argument, it turns out, was pretty compelling.

Hollerith Cards

Part of what drove the statistical power and scientific legitimacy of these arguments was the new method, in 1890, of entering census data on punched cards and processing them in tabulating machines. The mechanism itself was wildly successful, and the inventor’s company wound up merging with a few others to become IBM. As was true of punched-card humanities projects through the time of Father Roberto Busa, this work was largely driven by women.

It’s worth pausing to remember that the history of punch card computing is also a history of the consolidation of government power. Seeing like a computer was, for decades, seeing like a state. And how we see influences what we see, what we care about, how we think.  

Recall the Ed Ayers quote I mentioned at the beginning of this talk. He said the statistical machinery of early quantitative historians could not represent the nuance of historical experience. That doesn’t just mean the math they used; it means the actual machinery involved.

See, one of the truly groundbreaking punch card technologies at the turn of the century was the card sorter. Each card could represent a person, or household, or whatever else, which is sort of legible one-at-a-time, but unmanageable in giant stacks.

Now, this is still well before “computers”, but machines were being developed which could sort these cards into one of twelve pockets based on which holes were punched. So, for example, if you had cards punched for people’s age, you could sort the stacks into 10 different pockets to break them up by age groups: 0-9, 10-19, 20-29, and so forth.

This turned out to be amazing for eyeball estimates. If your 20-29 pocket was twice as full as your 10-19 pocket after all the cards were sorted, you had a pretty good idea of the age distribution.

Over the next 50 years, this convenience would shape the social sciences. Consider demographics or marketing. Both developed in the shadow of punch cards, and both relied heavily on what’s called “segmentation”, the breaking of society into discrete categories based on easily punched attributes. Age ranges, racial background, etc. These would be used to, among other things, determine who was interested in what products.

They’d eventually use statistics on these segments to inform marketing strategies.

But, if you look at the statistical tests that already existed at the time, these segmentations weren’t always the best way to break up the data. Age, for example, flows smoothly between 0 and 100; you could easily contrive a statistical test showing that, as a person ages, she becomes more or less likely to buy one product over another, treating that likelihood as a smooth function of age.

That’s not how it worked though. Age was, and often still is, chunked up into ten or so distinct ranges, and those segments were each analyzed individually, as though they were as distinct from one another as dogs and cats. That is, 0-9 is as related to 10-19 as it is to 80-89.

What we see here is the deep influence of technological affordances on scholarly practice, and it’s an issue we still face today, though in different form.

As historians began using punch cards and social statistics, they inherited, or appropriated, a structure developed for bureaucratic government processing, and were rightly soon criticized for its dehumanizing qualities.

Pearson Stats

Unsurprisingly, given this backdrop, historians in the first few decades of the 20th century often shied away from or rejected quantification.

The next wave of quantitative historians, who reached their height in the 1930s, approached the problem with more subtlety than the previous generations in the 1890s and 1860s.

Charles Beard’s famous Economic Interpretation of the Constitution of the United States used economic and demographic stats to argue that the US Constitution was economically motivated. Beard, however, did grasp the fundamental idiographic critique of quantitative history, claiming that history was, quote:

“beyond the reach of mathematics — which cannot assign meaningful values to the imponderables, immeasurables, and contingencies of history.”

The other frequent critique of quantitative history, still heard, is that it uncritically appropriates methods from stats and the sciences.

This also wasn’t entirely true. The slide behind me shows famed statistician Karl Pearson’s attempt to replicate the math of Isaac Newton that we saw earlier using more sophisticated techniques.

By the 1940s, Americans with graduate training in statistics like Ernest Rubin were actively engaging historians in their own journals, discussing how to carefully apply statistics to historical research.

On the other side of the channel, the French Annales historians were advocating longue durée history; a move away from biographies to prosopographies, from events to structures. In its own way, this was another historiography teetering on the edge between the nomothetic and idiographic, an approach that sought to uncover the rhymes of history.

Interest in quantitative approaches surged again in the late 1950s, led by a new wave of Annales historians like Fernand Braudel and American quantitative manifestos like those by Benson, Conrad, and Meyer.

William Aydelotte went so far as to point out that all historians implicitly quantify when they use words like “many”, “average”, “representative”, or “growing” – the question wasn’t whether there can be quantitative history, but when formal quantitative methods should be utilized.

By 1968, George Murphy, seeing the swell of interest, asked a very familiar question: why now? He asked why the 1960s were different from the 1860s or 1930s, why were they, in that historical moment, able to finally do it right? His answer was that it wasn’t just the new technologies, the huge datasets, the innovative methods: it was the zeitgeist. The 1960s was the right era for computational history, because it was the era of computation.

By the early 70s, there was a historian using a computer in every major history department. Quantitative history had finally grown into itself.

Popper Historicism

Of course, in retrospect, Murphy was wrong. Once the pendulum swung too far towards scientific history, theoretical objections began pushing it the other way.

In The Poverty of Historicism, Popper rejected scientific history, but mostly as a means to reject historicism outright. Popper’s arguments represent an attack from outside the historiographic tradition, but one that eventually had significant purchase even among historians, as an indication of the failure of nomothetic approaches to culture. It is, to an extent, a return to Musgrave’s critique of Isaac Newton.

At the same time, we see growing criticism from historians themselves. Arthur Schlesinger famously wrote that “important questions are important precisely because they are not susceptible to quantitative answers.”

There was a converging consensus among English-speaking historians, as in the early 20th century, that quantification erased the essence of the humanities, that it smoothed over the very inequalities and historical contingencies we needed to highlight.

Barzun’s Clio

Jacques Barzun summed it up well, if scathingly, saying history ought to free us from the bonds of the machine, not feed us into it.

The skeptics prevailed, and the pendulum swung the other way. The post-structural, cultural, and literary-critical turns in historiography pivoted away from quantification and computation. The final nail was probably Fogel and Engerman’s 1974 Time on the Cross, which reduced American slavery to economic figures, and didn’t exactly treat the subject with nuance and care.

The cliometricians, demographers, and quantitative historians didn’t disappear after the cultural turn, but their numbers shrank, and they tended to find themselves in social science departments, or fled here to Europe, where social and economic historians were faring better.

Which brings us, 40 years on, to the middle of a new wave of quantitative or “formal method” history. Ed Ayers, like George Murphy before him, wrote, essentially, this time it’s different.

And he’s right, to a point. Many here today draw their roots not to the cliometricians, but to the very cultural historians who rejected quantification in the first place. Ours is a digital history steeped in the values of the cultural turn, one that respects social justice and seeks to use our approaches to shine a light on the underrepresented and the historically contingent.

But that doesn’t stop a new wave of critiques that, if not repeating old arguments, certainly rhymes. Take Johanna Drucker’s recent call to rebrand data as capta, because when we treat observations as objective, as if they were the same as the phenomena observed, we collapse the critical distance between the world and our interpretation of it. And interpretation, Drucker contends, is the foundation on which humanistic knowledge is based.

Which is all to say, every swing of the pendulum between idiographic and nomothetic history was situated in its own historical moment. It’s not a clock’s pendulum, but Foucault’s pendulum, with each swing’s apex ending up slightly off from the last. The issues of chronology and astronomy are different from those of eugenics and manifest destiny, which are themselves different from the capitalist and dehumanizing tendencies of 1950s mainframes.

But they all rhyme. Quantitative history has failed many times, for many reasons, but there are a few threads that bind them which we can learn from — or, at least, a few recurring mistakes we can recognize in ourselves and try to avoid going forward.

We won’t, I suspect, stop the pendulum’s inevitable about-face, but at least we can continue our work with caution, respect, and care.

Which is to be Master?

The lesson I’d like to highlight may be summed up in one question, asked by Humpty Dumpty to Alice: which is to be master?

Over several hundred years of quantitative history, the advice of proponents and critics alike tends to align with this question. Indeed in 1956, R.G. Collingwood wrote specifically “statistical research is for the historian a good servant but a bad master,” referring to the fact that statistical historical patterns mean nothing without historical context.

Schlesinger, the guy I mentioned earlier who said historical questions are interesting precisely because they can’t be quantified, later acknowledged that while quantitative methods can be useful, they can also lead historians astray. Instead of tackling good questions, he said, historians will tackle easily quantifiable ones – and Schlesinger was uncomfortable with the tail wagging the dog.

Which is to be master – questions

I’ve found many ways in which historians have accidentally given over agency to their methods and machines over the years, but these five, I think, are the most relevant to our current moment.

Unfortunately, since we’re running out of time, you’ll just have to trust me that these are historically recurring.

Number 1 is the uncareful appropriation of statistical methods for historical uses. It controls us precisely because it offers us a black box whose output we don’t truly understand.

A common example I see these days is in network visualizations. People visualize nodes and edges using what are called force-directed layouts in Gephi, but they don’t exactly understand what those layouts mean. As these layouts were designed, the physical proximity of nodes is not meant to represent relatedness, yet I’ve seen historians interpret two neighboring nodes as being related because of their visual adjacency.

This is bad. It’s false. But because we don’t quite understand what’s happening, we get lured by the black box into nonsensical interpretations.

The second way methods drive us is in our reliance on methodological imports. That is, we take the time to open the black box, but we only use methods that we learn from statisticians or scientists. Even when we fully understand the methods we import, if we’re bound to other people’s analytic machinery, we’re bound to their questions and biases.

Take the example I mentioned earlier, with demographic segmentation, punch card sorters, and their influence on social scientific statistics. The very mechanical affordances of early computers influenced the sorts of questions people asked for decades: how do discrete groups of people react to the world in different ways, and how do they compare with one another?

The next thing to watch out for is naive scientism. Even if you know the assumptions of your methods, and you develop your own techniques for the problem at hand, you still can fall into the positivist trap that Johanna Drucker warns us about — collapsing the distance between what we observe and some underlying “truth”.

This is especially difficult when we’re dealing with “big data”. Once you’re working with so much material you couldn’t hope to read it all, it’s easy to be lured into forgetting the distance between operationalizations and what you actually intend to measure.

For instance, if I’m finding friendships in Early Modern Europe by looking for particular words written in correspondence, I will completely miss the existence of friends who were neighbors, and thus had no reason to write letters for us to eventually read.

A fourth way we can be misled by quantitative methods is the ease with which they lend an air of false precision or false certainty.

This is the problem Matthew Lincoln and the other panelists brought up yesterday, where missing or uncertain data, once quantified, falsely appears precise enough to make comparisons.

I see this mistake crop up in early and recent quantitative histories alike; we measure, say, the changing rate of transnational shipments over time, and notice a positive trend. The problem is that the positive difference is quite small and easily attributable to error, but because numbers always look precise, we feel more certain of the trend than we would after a qualitative assessment, even when that certainty is unwarranted.

The last thing to watch out for, and maybe the most worrisome, is the blinders quantitative analysis places on historians who don’t engage in other historiographic methods. This has been the downfall of many waves of quantitative history in the past; the inability to care about or even see that which can’t be counted.

This, in part, is what led Time on the Cross to become the excuse to drive historians from cliometrics. The measurable indicators of slavery were sufficient to show it as having some semblance of economic success for black populations; but it was precisely those aspects of slavery that could not be measured which were the most historically important.

So how do we regain mastery in light of these obstacles?

Which is to be master – answers

1. Uncareful Appropriation – Collaboration

Regarding the uncareful appropriation of methods, we can easily sidestep the issue of accidentally misusing a method by collaborating with someone who knows how the method works. This may require a translator; statisticians can as easily misunderstand historical problems as historians can misunderstand statistics.

Historians and statisticians can fruitfully collaborate, though, if they have someone in the middle trained to some extent in both — even if they’re not themselves experts. For what it’s worth, Dutch institutions seem to be ahead of the game in this respect, which is something that should be fostered.

2. Reliance on Imports – Statistical Training

Getting away from reliance on disciplinary imports may take some more work, because we ourselves must learn the approaches well enough to augment them, or create our own. Right now in DH this is often handled by summer institutes and workshop series, but I’d argue those are not sufficient here. We need to make room in our curricula for actual methods courses, or even degrees focused on methodology, in the same fashion as social scientists, if we want to start a robust practice of developing appropriate tools for our own research.

3. Naive Scientism – Humanities History

The spectre of naive scientism, I think, is one we need to be careful of, but we are also already well-equipped to deal with it. If we want to combat the uncareful use of proxies in digital history, we need only teach the history of the humanities: why the cultural turn happened, what’s gone wrong with positivistic approaches to history in the past, and so on.

Incidentally, I think this is something digital historians already guard well against, but it’s still worth keeping in mind and making sure we teach it. Particularly, digital historians need to remain aware of parallel approaches from the past, rather than tracing their background only to the textual work of people like Roberto Busa in Italy.

4. False Precision & Certainty – Simulation & Triangulation

False precision and false certainty have some shallow fixes, and some deep ones. In the short term, we need to be better about understanding things like confidence intervals and error bars, and use methods like what Matthew Lincoln highlighted yesterday.

In the long term, though, digital history would do well to adopt triangulation strategies to help mitigate against these issues. That means trying to reach the same conclusion using multiple different methods in parallel, and seeing if they all agree. If they do, you can be more certain your results are something you can trust, and not just an accident of the method you happened to use.

5. Quantitative Blinders – Rejecting Digital History

Avoiding quantitative blinders – that is, the tendency to only care about what’s easily countable – is an easy fix, but I’m afraid to say it, because it might put me out of a job. We can’t call what we do digital history, or quantitative history, or cliometrics, or whatever else. We are, simply, historians.

Some of us use more quantitative methods, and some don’t, but if we’re not ultimately contributing to the same body of work, both sides will do themselves a disservice by not bringing every approach to bear in the wide range of interests historians ought to pursue.

Qualitative and idiographic historians will be stuck unable to deal with the deluge of material that can paint us a broader picture of history, and quantitative or nomothetic historians will lose sight of the very human irregularities that make history worth studying in the first place. We must work together.

If we don’t come together, we’re destined to remain punched-card humanists – that is, we will always be constrained and led by our methods, not by history.

Creativity Theme Again

Of course, this divide is a false one. There are no purely quantitative or purely qualitative studies; close-reading historians will continue to say things like “representative” or “increasing”, and digital historians won’t start publishing graphs with no interpretation.

Still, silos exist, and some of us have trouble leaving the comfort of our digital humanities conferences or our “traditional” history conferences.

That’s why this conference, I think, is so refreshing. It offers a great mix of both worlds, and I’m privileged and thankful to have been able to attend. While there are a lot of lessons we can still learn from those before us, from my vantage point, I think we’re on the right track, and I look forward to seeing more of those fruitful combinations over the course of today.

Thank you.

Notes:

  1. This account is influenced by some talks by Ben Schmidt. Any mistakes are from my own faulty memory, and not from his careful arguments.

Culturomics 2: The Search for More Money

“God willing, we’ll all meet again in Spaceballs 2: The Search for More Money.” -Mel Brooks, Spaceballs, 1987

A long time ago in a galaxy far, far away (2012 CE, Indiana), I wrote a few blog posts explaining that, when writing history, it might be good to talk to historians (1,2,3). They were popular posts for the Irregular, and inspired by Mel Brooks’ recent interest in making Spaceballs 2,  I figured it was time for a sequel of my own. You know, for all the money this blog pulls in. 1


Two teams recently published very similar articles, attempting cultural comparison via a study of historical figures in different-language editions of Wikipedia. The first, by Gloor et al., is for a conference next week in Japan, and frames itself as cultural anthropology through the study of leadership networks. The second, by Eom et al. and just published in PLoS ONE, explores cross-cultural influence through historical figures who span different language editions of Wikipedia.

Before reading the reviews, keep in mind I’m not commenting on method or scientific contribution—just historical soundness. This often doesn’t align with the original authors’ intents, which is fine. My argument isn’t that these pieces fail at their goals (science is, after all, iterative), but that they would be markedly improved by adhering to the same standards of historical rigor as they adhere to in their home disciplines, which they could accomplish easily by collaborating with a historian.

The road goes both ways. If historians don’t want physicists and statisticians bulldozing through history, we ought to be open to collaborating with those who don’t have a firm grasp on modern historiography, but who nevertheless have passion, interest, and complementary skills. If the point is understanding people better, by whatever means relevant, we need to do it together.

Cultural Anthropology

“Cultural Anthropology Through the Lens of Wikipedia – A Comparison of Historical Leadership Networks in the English, Chinese, Japanese and German Wikipedia” by Gloor et al. analyzes “the historical networks of the World’s leaders since the beginning of written history, comparing them in the four different Wikipedias.”

Their method is simple (simple isn’t bad!): take each “people page” in Wikipedia, and create a network of people based on who else is linked within that page. For example, if Wikipedia’s article on Mozart links to Beethoven, a connection is drawn between them. Connections are only drawn between people whose lives overlap; for example, the Mozart (1756-1791) Wikipedia page also links to Chopin (1810-1849), but because they did not live concurrently, no connection is drawn.

Figure 1 from Gloor et al. (http://arxiv.org/ftp/arxiv/papers/1502/1502.05256.pdf)

A separate network is created for four different language editions of Wikipedia (English, Chinese, Japanese, German), because biographies in each edition are rarely exact translations, and often different people will be prominent within the same biography across all four languages. PageRank was calculated for all the people in the resulting networks, to get a sense of who the most central figures are according to the Wikipedia link structure.
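
To make that pipeline concrete, here is a minimal sketch of how one might reproduce it with networkx. Everything in it is an assumption for illustration: the names, dates, and links are toy stand-ins, not the authors’ data or code, which parses full Wikipedia dumps in four languages.

```python
# A toy version of the Gloor et al. pipeline: connect "people pages" that link
# to one another AND whose lifespans overlap, then rank by PageRank.
# All data below are invented for illustration.
import networkx as nx

people = {
    "Mozart":    {"born": 1756, "died": 1791, "links": ["Beethoven", "Chopin"]},
    "Beethoven": {"born": 1770, "died": 1827, "links": ["Mozart", "Chopin"]},
    "Chopin":    {"born": 1810, "died": 1849, "links": ["Beethoven"]},
}

def lives_overlap(a, b):
    """True if the two people were alive at the same time."""
    return (people[a]["born"] <= people[b]["died"]
            and people[b]["born"] <= people[a]["died"])

G = nx.Graph()
for person, info in people.items():
    for target in info["links"]:
        # Only draw a connection if the target is also a person and their
        # lives overlap, so Mozart's link to Chopin is dropped.
        if target in people and lives_overlap(person, target):
            G.add_edge(person, target)

# Rank everyone by PageRank over the resulting link structure.
for name, score in sorted(nx.pagerank(G).items(), key=lambda x: -x[1]):
    print(name, round(score, 3))
```

Nothing in the sketch is specific to the paper beyond the two rules it describes (lifetime overlap and PageRank); it is only meant to make the shape of the method visible.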

“Who are the most important people of all times?” the authors ask, to which their data provides them an answer. 2 In China and Japan, they show, only warriors and politicians make the cut, whereas religious leaders, artists, and scientists made more of a mark on Germany and the English-speaking world. Historians and biographers wind up central too, given how often their names appear on the pages of famous contemporaries on whom they wrote.

Diversity is also a marked difference: 80% of the “top 50” people for the English Wikipedia were themselves non-English, whereas only 4% of the top people from the Chinese Wikipedia are not Chinese. The authors conclude that “probing the historical perspective of many different language-specific Wikipedias gives an X-ray view deep into the historical foundations of cultural understanding of different countries.”

Figure 3 from Gloor et al.

Small quibbles aside (e.g. their data include the year 0 BC, which doesn’t exist), the big issue here is the ease with which they claim these are the “most important” actors in history, and that these datasets provide an “X-ray” into the language cultures that produced them. This betrays the same naïve assumptions that plague much of culturomics research: that you can uncritically analyze convenient datasets as a proxy for analyzing larger cultural trends.

You can in fact analyze convenient datasets as a proxy for larger cultural trends, you just need some cultural awareness and a critical perspective.

In this case, several layers of assumptions are open for questioning, including:

  • Is the PageRank algorithm a good proxy for historical importance? (The answer turns out to be yes in some situations, but probably not this one.)
  • Is the link structure in Wikipedia a good proxy for historical dependency? (No, although it’s probably a decent proxy for current cultural popularity of historical figures, which would have been a better framing for this article. Better yet, these data can be used to explore the many well-known and unknown biases that pervade Wikipedia.)
  • Can differences across language editions of Wikipedia be explained by any factors besides cultural differences? (Yes. For example, editors of the German-language Wikipedia may be less likely to write a German biography if one already exists in English, given that ≈64% of Germany speaks English.)

These and other questions, unexplored in the article, make it difficult to take at face value the claim that this study can reveal important historical actors or compare cultural norms of importance. Which is a shame, because simple datasets and approaches like this one can produce culturally and scientifically valid results that wind up being incredibly important. And the scholars working on the project are top-notch; they just don’t have all the necessary domain expertise to explore their data and questions.

Cultural Interactions

The great thing about PLoS is the quality control on its publications: there isn’t much. As long as primary research is presented, the methods are sound, the data are open, and the experiment is well-documented, you’re in.

It’s a great model: all reasonable work by reasonable people is published, and history decides whether an article is worthy of merit. Contrast this against the current model, where (let’s face it) everything gets published eventually anyway, it’s just a question of how many journal submissions and rounds of peer review you’re willing to sit through. Research sits for years waiting to be published, subject to the whims of random reviewers and editors who may hold long grudges, when it could be out there the minute it’s done, open to critique and improvement, and available to anyone to draw inspiration or to learn from someone’s mistakes.

“Interactions of Cultures and Top People of Wikipedia from Ranking of 24 Language Editions” by Eom et al. is a perfect example of this model. Do I consider it a paragon of cultural research? Obviously not, if I’m reviewing it here. Am I happy the authors published it, respectful of their attempt, and willing to use it to push forward our mutual goal of soundly-researched cultural understanding? Absolutely.

Eom et al.’s piece, similar to that of Gloor et al. above, uses links between Wikipedia people pages to rank historical figures and to make cultural comparisons. The article explores 24 different language editions of Wikipedia, and goes one step further, using the data to explore intercultural influence. Importantly, given that this is a journal-length article and not a paper from conference proceedings like Gloor et al.’s, extra space and thought was clearly put into the cultural biases of Wikipedia across languages. That said, neither of the articles reviewed here includes any authors who identify themselves as historians or cultural experts.

This study collected data a bit differently from the last. Instead of a network connecting only those people whose lives overlapped, this network connected all pages within a single-language edition of Wikipedia, based only on links between articles. 3 They then ranked pages using a number of metrics, including but not limited to PageRank, and only then automatically extracted people to find who was the most prominent in each dataset.
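
The order of operations might look something like the sketch below; again, the toy graph and the person labels are my own inventions, not the authors’ data, and PageRank stands in for the several rankings they actually compute.

```python
# A toy version of the order of operations in Eom et al.: rank *all* articles
# in a language edition first, then cull everything that isn't a person.
# The links and labels below are invented for illustration.
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("Honey bee", "Carl Linnaeus"),   # species articles pointing back to Linnaeus
    ("Sunflower", "Carl Linnaeus"),
    ("Carl Linnaeus", "Taxonomy"),
    ("Taxonomy", "Biology"),
    ("Biology", "Carl Linnaeus"),
])
is_person = {"Carl Linnaeus": True, "Honey bee": False, "Sunflower": False,
             "Taxonomy": False, "Biology": False}

ranks = nx.pagerank(G)                 # one of the several metrics they use
top_people = sorted((n for n in ranks if is_person[n]), key=lambda n: -ranks[n])
print(top_people)
```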

In short, every Wikipedia article is linked in a network and ranked, after which all articles are culled except those about people. The authors explain: “On the basis of this data set we analyze spatial, temporal, and gender skewness in Wikipedia by analyzing birth place, birth date, and gender of the top ranked historical figures in Wikipedia.” By birth place, they mean the country currently occupying the location where a historical figure was born, such that Aristophanes, born in Byzantium 2,300 years ago, is considered Turkish for the purpose of this dataset. The authors note this can lead to cultural misattributions ≈3.5% of the time (e.g. Kant is categorized as Russian, having been born in a city now in Russian territory). They do not, however, call attention to the mutability of culture over time.

Table 2 from Eom et al.

It is unsurprising, though comforting, to note that the fairly different approach to measuring prominence yields many of the same top-10 results as Gloor’s piece: Shakespeare, Napoleon, Bush, Jesus, etc.

Analysis of the dataset resulted in several worthy conclusions:

  • Many of the “top” figures across all language editions hail from Western Europe or the U.S.
  • Language editions favor local heroes (half of the top figures in Wikipedia English are from the U.S. and U.K.; half of those in Wikipedia Hindi are from India) and regional heroes (among the top figures in Wikipedia Korean, many are Chinese).
  • Top figures are distributed throughout time in a pattern you’d expect given global population growth, excepting periods representing foundations of modern cultures (religions, politics, and so forth).
  • The farther you go back in time, the less likely a top figure from a certain edition of Wikipedia is to have been born in that language’s region. That is, modern prominent figures in Wikipedia English are from the U.S. or the U.K., but the earlier you go, the less likely top figures are born in English-speaking regions. (I’d question this a bit, given cultural movement and mutability, but it’s still a result worth noting).
  • Women are consistently underrepresented in every measure and edition. More recent top people are more likely to be women than those from earlier years.
Figure 4 from Eom et al.

The article goes on to describe methods and results for tracking cultural influence, but this blog post is already tediously long, so I’ll leave that section out of this review.

There are many methodological limitations to their approach, but the authors are quick to notice and point them out. They mention that Linnaeus ranks so highly because “he laid the foundations for the modern biological naming scheme so that plenty of articles about animals, insects and plants point to the Wikipedia article about him.” This research was clearly approached with a critical eye toward methodology.

Eom et al. do not fare as well historically as methodologically; opportunities to frame claims more carefully, or to ask different sorts of questions, are overlooked. I mentioned earlier that the research assumes historical cultural consistency, but cultural currents intersect languages and geography at odd angles.

The fact that Wikipedia English draws significantly from other locations the earlier you look should come as no surprise. But, it’s unlikely English Wikipedians are simply looking to more historically diverse subjects; rather, the locus of some cultural current (Christianity, mathematics, political philosophy) has likely moved from one geographic region to another. This should be easy to test with their dataset by looking at geographic clustering and spread in any given year. It’d be nice to see them move in that direction next.

I do appreciate that they tried to validate their method by comparing their “top people” to lists other historians have put together. Unfortunately, the only non-Wikipedia-based comparison they make is to a book written by an astrophysicist and white separatist with no historical training: “To assess the alignment of our ranking with previous work by historians, we compare it with [Michael H.] Hart’s list of the top 100 people who, according to him, most influenced human history.”

Top People

Both articles claim that an algorithm analyzing Wikipedia networks can compare cultures and discover the most important historical actors, though neither defines what it means by “important.” The claim rests on the notion that Wikipedia’s grand scale and scope smooths out enough authorial bias that analyses of Wikipedia can inductively lead to discoveries about Culture and History.

And critically approached, that notion is more plausible than historians might admit. These two reviewed articles, however, don’t bring that critique to the table. 4 In truth, the dataset and analysis lets us look through a remarkably clear mirror into the cultures that created Wikipedia, the heroes they make, and the roots to which they feel most connected.

Usefully for historians, there is likely much overlap between history and the picture Wikipedia paints of it, but the nature of that overlap needs to be understood before we can use Wikipedia to aid our understanding of the past. Without that understanding, boldly inductive claims about History and Culture risk reinforcing the same systemic biases we’ve slowly been trying to fix. I’m absolutely certain the authors don’t believe that only 5% of history’s most important figures were women, but the framing of the articles does nothing to dispel readers of this notion.

Eom et al. themselves admit “[i]t is very difficult to describe history in an objective way,” which I imagine is a sentiment we can all get behind. They may find an easier path forward in the company of some historians.

Notes:

  1. net income: -$120/year.
  2. If you’re curious, the 10 most important people in the English-speaking world, in order, are George W. Bush, ol’ Willy Shakespeare, Sidney Lee, Jesus, Charles II, Aristotle, Napoleon, Muhammad, Charlemagne, and Plutarch.
  3. Download their data here.
  4. Actually the Eom et al. article does raise useful critiques, but mentioning them without addressing them doesn’t really help matters.

Networks Demystified 9: Bimodal Networks

What do you think, is a year long enough to wait between Networks Demystified posts? I don’t think so, which is why it’s been a year and a month. Welcome back! A recent twitter back-and-forth culminated in a request for a discussion of “bimodal networks”, and my Networks Demystified series seemed like a perfect place for just such a discussion.

What’s a bimodal network, you ask? (Go on, ask aloud at your desk. Nobody will look at you funny, this is the age of Siri!) A bimodal network is one which connects two varieties of things. It’s also called a bipartite, 2-partite, or 2-mode network. A network of authors connected to the papers they write is bimodal, as are networks of books to topics, and people to organizations they are affiliated with.

A bimodal network.

This is a bimodal network which connects people and the clubs they belong to. Alice is a member of the Network Club and the We Love History Society, Bob’s in the Network Club and the No Adults Allowed Club, and Carol’s in the No Adults Allowed Club.
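
If it helps to see that same toy network as data, here is one way to write it out in networkx, using the library’s convention of a bipartite node attribute to mark the two modes (the names are simply the ones from the example above).

```python
# The Alice/Bob/Carol affiliation network as a bipartite graph in networkx.
import networkx as nx

B = nx.Graph()
# The "bipartite" node attribute is networkx's convention for marking modes.
B.add_nodes_from(["Alice", "Bob", "Carol"], bipartite=0)
B.add_nodes_from(["Network Club", "We Love History Society",
                  "No Adults Allowed Club"], bipartite=1)
B.add_edges_from([
    ("Alice", "Network Club"),
    ("Alice", "We Love History Society"),
    ("Bob",   "Network Club"),
    ("Bob",   "No Adults Allowed Club"),
    ("Carol", "No Adults Allowed Club"),
])
```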

If this makes no sense, read my earlier Networks Demystified posts (the first two posts), or our Historian’s Macroscope chapter, for a primer on networks. If it does make sense, excellent! The rest of this post will hopefully take you out of your comfort zone, but remain understandable to someone who doesn’t speak math.

k-partite Networks & Projections

Bimodal networks are part of a larger class of k-partite networks. Unipartite/unimodal networks have only one type of node (remember, nodes are the stuff being connected by the edges), bipartite/bimodal networks have two types of nodes, tripartite/trimodal networks have three types of node, and so on to infinity.

The most common networks you’ll see being researched are unipartite. Who follows whom on Twitter? Who’s writing to whom in early modern Europe? What articles cite which other articles? All are examples of unipartite networks. It’s important to realize this isn’t necessarily determined by the dataset, but by the researcher doing the studying. For example, you can use the same organization affiliation dataset to create a unipartite network of who is in a club with whom, or a bipartite network of which person is affiliated with each organization.

The same dataset used to create a unipartite (left) and a bipartite (right) network.

The above illustration shows the same dataset used to create a unimodal and a bimodal network. The process of turning a pre-existing bimodal network into a unimodal network is called a bimodal projection. This process collapses one set of nodes into edges connecting the other set. In this case, because Alice and Bob are both members of the Network Club, the Network Club collapses into becoming an edge between those two people. The No Adults Allowed Club collapses into an edge between Bob and Carol. Because only Alice is a member of the We Love History Society, it does not collapse into an edge connecting any people.

You can also collapse the network in the opposite direction, connecting organizations who share people. No Adults Allowed and Network Club would share an edge (Bob), as would Network Club and We Love History Society (Alice).
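
Continuing the toy example, networkx’s bipartite module will compute both projections for you; the graph here is just the Alice/Bob/Carol network rebuilt from its edge list.

```python
# Both projections of the toy affiliation network from above.
import networkx as nx
from networkx.algorithms import bipartite

B = nx.Graph([("Alice", "Network Club"), ("Alice", "We Love History Society"),
              ("Bob", "Network Club"), ("Bob", "No Adults Allowed Club"),
              ("Carol", "No Adults Allowed Club")])

# Collapse clubs into edges between people.
person_net = bipartite.projected_graph(B, ["Alice", "Bob", "Carol"])
print(sorted(person_net.edges()))
# Alice and Bob share the Network Club; Bob and Carol share No Adults Allowed.

# Collapse people into edges between clubs.
club_net = bipartite.projected_graph(
    B, ["Network Club", "We Love History Society", "No Adults Allowed Club"])
print(sorted(club_net.edges()))
# Network Club and No Adults Allowed share Bob;
# Network Club and We Love History Society share Alice.
```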

Why Bimodal Networks?

If the same dataset can be described with unimodal networks, which are less complex, why go to bi-, tri-, or multimodal? The answer to that is in your research question: different network representations suit different questions better.

Collaboration is a hot topic in bibliometrics. Who collaborates with whom? Why? Do your collaborators affect your future collaborations? Co-authorship networks are well-suited to some of these questions, since they directly connect collaborators who author a piece together. This is a unimodal network: I wrote The Historian’s Macroscope with Shawn Graham and Ian Milligan, so we draw an edge connecting each of us together.

Some of the more focused questions of collaboration, however, require a more nuanced view of the data. Let’s say you want to know how individual instances of collaboration affect individual research patterns going forward. In this case, you want to know more than the fact that I’ve co-authored two pieces with Shawn and Ian, and they’ve co-authored three pieces together.

For this added nuance, we can draw an edge from each of us to The Historian’s Macroscope (rather than to each other), then another set of edges to the piece we co-authored in The Programming Historian, and a last set of edges going from Shawn and Ian to the piece they wrote in the Journal of Digital Humanities. That’s three people nodes and three publication nodes.

Scott, Ian, and Shawn’s co-authorship network

Why Not Bimodal Networks?

Humanities data are often a rich array of node types: people, places, things, ideas, all connected to each other via a complex network. The trade-off is, the more complex and multimodal your dataset, the less you can reasonably do with it. This is one of the fundamental tensions between computational and traditional humanities. More categories lead to a richer understanding of the diversity of human experience, but are incredibly unhelpful when you want to count things.

Consider two pie-charts showing the religious makeup of the United States. The first chart groups together religions that fall under a similar umbrella, and the second does not. That is, the first chart groups religions like Calvinists and Lutherans together into the same pie slice (Protestants), and the second splits them into separate slices. The second, more complex chart obviously presents a richer picture of religious diversity in the United States, but it’s also significantly more difficult to read. It might trick you into thinking there are more Catholics than Protestants in the country, due to how the pie is split.

The same is true in network analysis. By creating a dataset with a hundred varieties of nodes, you lose your ability to see a bigger picture through meaningful aggregations.

Surely, you’re thinking, bimodal networks, with only two categories, should be fine! Wellllll, yes and no. You don’t bump into the same aggregation problem you do with very multimodal networks; instead, you bump into technical and mathematical issues. These issues are why I often warn non-technical researchers away from bimodal networks in research. They’re not theoretically unsound, they’re just difficult to work with properly unless you know what changes when you’re working with these complex networks.

The following section will discuss a few network metrics you may be familiar with, and what they mean for bimodal networks.

Network Metrics and Bimodality

The easiest thing to measure in a network is a node’s degree centrality. You’ll recall this is a measurement of how many edges are attached to a node, which gives a rough proxy for this concept we’ve come to call network “centrality”. It means different things depending on your data and your question: the most important or well-connected person in your social network; the point in the U.S. electrical grid which is most vulnerable to attack; the book that shares the most concepts with other books (the encyclopedia?); the city that the most traders pass through to get to their destination. These are all highly “central” in the networks they occupy.

A network with each node labeled with its degree centrality, via Wikipedia.

Degree centrality is the easiest such proxy to compute: how many connections does a node have? The idea is that nodes that are more highly connected are more central. The assumption only goes so far, and it’s easy to come up with central nodes that do not have a high degree, as with the network below.

The blue node is highly central, but only has a degree centrality of 3. [via]
That’s the thing with these metrics: if you know how they work, you know which networks they apply well to, and which they do not. If what you mean by “centrality” is “has more friends”, and we’re talking about a Facebook network, then degree centrality is a perfect metric for the job.

If what you mean is “an important stop for river trade”, and we’re talking about 12th century Russia, then degree centrality sucks. The below is an illustration of such a network by Pitts (1978):

Russian river trade routes. Numbers/nodes are cities, and edges are rivers between them.

Moscow is number 35, and pretty clearly the most central according to the above criteria (you’ll likely pass through it to reach other destinations). But it only has a degree centrality of four! Node 9 also has a degree centrality of four, but clearly doesn’t play as important a structural role as Moscow in this network.

We already see that depending on your question, your definitions, and your dataset, specific metrics will either be useful or not. Metrics may change meanings entirely from one network to the next – for example, looking at bimodal rather than unimodal networks.

Consider what degree centrality means for the Alice, Bob, and Carol’s bimodal affiliation network above, where each is associated with a different set of clubs. Calculate the degree centralities in your head (hint: if you can’t, you haven’t learned what degree centrality means yet. Try again.).

Alice and Bob have a degree of 2, and Carol has a degree of 1. Is this saying anything about how central each is to the network? Not at all. Compare this to the unimodal projection, and you’ll see Bob is clearly the only structurally central actor in the network. In a bimodal network, degree centrality is nothing more than a count of affiliations with the other half of the network. It is much less likely to tell you something structurally useful than if you were looking at a unimodal network.
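
Here is the same comparison in code, on the same toy graph as before; the projection is the person-to-person network described earlier.

```python
# Degree in the bimodal network versus degree in its unimodal projection.
import networkx as nx
from networkx.algorithms import bipartite

B = nx.Graph([("Alice", "Network Club"), ("Alice", "We Love History Society"),
              ("Bob", "Network Club"), ("Bob", "No Adults Allowed Club"),
              ("Carol", "No Adults Allowed Club")])

print(dict(B.degree(["Alice", "Bob", "Carol"])))
# Alice 2, Bob 2, Carol 1 -- affiliation counts, not structural centrality

person_net = bipartite.projected_graph(B, ["Alice", "Bob", "Carol"])
print(dict(person_net.degree()))
# Alice 1, Bob 2, Carol 1 -- Bob is the only bridge between the other two
```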

Consider another common measurement: clustering coefficient. You’ll recall that a node’s local clustering coefficient is the extent to which its neighbors are neighbors to one another. If all my Facebook friends know each other, I have a high clustering coefficient; if none of them know each other, I have a low clustering coefficient. If all of a power plant’s neighbors directly connect to one another, it has a high clustering coefficient, and if they don’t, it has a low clustering coefficient.

Clustering coefficient, from largest to smallest. [via]
This measurement winds up being important for all sorts of reasons, but one way to interpret its meaning is as a proxy for the extent to which a node bridges diverse communities, the extent to which it is an important broker. In the 17th century, Henry Oldenburg was an important broker between disparate scholarly communities, in that he corresponded with people all across Europe, many of whom would never meet one another. The fact that they’d never meet is represented by the local clustering coefficient. It’s low, so we know his neighbors were unlikely to be neighbors of one another.

You can get creative (and network scientists often are) with what this metric means in the context of your own dataset. As long as you know how the algorithm works (taking the fraction of neighbors who are neighbors to one another), and the structural assumptions underlying your dataset, you can argue why clustering coefficient is a useful proxy for answering whatever question you’re asking.

Your argument may be pretty good, like if you say clustering coefficient is a decent (but not the best) proxy for revealing nodes that broker between disparate sections of a unimodal social network. Or your argument may be bad, like if you say clustering coefficient is a good proxy for organizational cohesion on the bimodal Alice, Bob, and Carol affiliation network above.

A thorough glance at the network, and a realization of our earlier definition of clustering coefficient (taking the fraction of neighbors who are neighbors to one another), should reveal why this is a bad justification. Alice’s clustering coefficient is zero. As is Bob’s. As is the Network Club’s. Every node has a clustering coefficient of zero, because no node’s neighbors connect to each other. That’s just the nature of bimodal networks: they connect across, rather than between, modes. Alice can never connect directly with Bob, and the Network Club can never connect directly with the We Love History Society.

Bob’s neighbors (the organizations) can never be neighbors with each other. There will never be a clustering coefficient as we defined it.

In short, the simplest definition of clustering coefficient doesn’t work on bimodal networks. It’s obvious if you know how your network works, and how clustering coefficient is calculated, but if you don’t think about it before you press the easy “clustering coefficient” button in Gephi, you’ll be led astray.
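
The same thing happens if you press the equivalent button in code rather than in Gephi, again on the toy graph from above:

```python
# The standard clustering coefficient on the bimodal toy network: every value
# is zero, because no node's neighbors can ever be neighbors of one another.
import networkx as nx

B = nx.Graph([("Alice", "Network Club"), ("Alice", "We Love History Society"),
              ("Bob", "Network Club"), ("Bob", "No Adults Allowed Club"),
              ("Carol", "No Adults Allowed Club")])

print(nx.clustering(B))  # 0.0 for every node
```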

Gephi doesn’t know if your network is bimodal or unimodal or ∞modal. Gephi doesn’t care. Gephi just does what you tell it to. You want Gephi to tell you the degree centralities in a bimodal network? Here ya go! You want it to give you the local clustering coefficients of nodes in a bimodal network? Voila! Everything still works as though these metrics would produce meaningful, sensible results.

But they won’t be meaningful on your network. You need to be your own network’s sanity check, and not rely on software to tell you something’s a bad idea. Think about your network, think about your algorithm, and try to work through what an algorithm means in the context of your data.

Using Bimodal Networks

This doesn’t mean you should stop using bimodal networks. Most of the easy network software out there comes with algorithms made for unimodal networks, but other algorithms exist and are available for more complex networks. Very occasionally, but by no means always, you can project your bimodal network to a unimodal network, as described above, and run your unimodal algorithms on that new network projection.

There are a number of times when this doesn’t work well. At 2,300 words, this tutorial is already too long, so I’ll leave thinking through why as an exercise for the reader. It’s less complicated than you’d expect, if you have a pen and paper and know how fractions work.

The better solution, usually, is to use an algorithm meant for bi- or multimodal networks. Tore Opsahl has put together a good primer on the subject with regard to clustering coefficient (slightly mathy, but you can get through it with ample use of Wikipedia). He argues that projection isn’t an optimal solution, but gives a simple algorithm for finding bimodal clustering coefficients, and directions to do so in R. Essentially the algorithm extends the visibility of the clustering coefficient, asking whether a node’s neighbors 2 hops away can reach the others via 2 hops as well. Put another way, I don’t want to know what clubs Bob belongs to, but rather whether Alice and Carol can also connect to one another through a club.

It’s a bit difficult to write without the use of formulae, but looking at the bimodal network and thinking about what clustering coefficient ought to mean should get you on the right track.
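
Opsahl’s own implementation lives in his tnet package for R. As far as I know networkx doesn’t include his measure, but it does ship a related pairwise bipartite clustering coefficient (due to Latapy and colleagues), which at least shows what a bipartite-aware alternative returns on the toy graph from above:

```python
# A bipartite-aware clustering measure (Latapy et al., not Opsahl's), which
# compares each node's neighborhood with those of its second-order neighbors.
import networkx as nx
from networkx.algorithms import bipartite

B = nx.Graph([("Alice", "Network Club"), ("Alice", "We Love History Society"),
              ("Bob", "Network Club"), ("Bob", "No Adults Allowed Club"),
              ("Carol", "No Adults Allowed Club")])

print(bipartite.clustering(B, mode="dot"))
```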

Bimodal networks aren’t an unsolved problem. If you search Google Scholar for bimodal, bipartite, and 2-mode networks, you’ll discover all sorts of clever methods for analyzing bimodal networks, including some great introductory texts by Borgatti and Everett.

The issue is there aren’t easy solutions through platforms like Gephi, and that’s probably on us as Digital Humanists.  I’ve found that DHers are much more likely to have bi- or multimodal datasets than most network researchers. If we want to be able to analyze them easily, we need to start developing our own plugins to Gephi, or our own tools, to do so. Push-button solutions are great if you know what’s happening when you push the button.

So let this be an addendum to my previous warnings against using bimodal networks: by all means, use them, but make sure you really think about the algorithms and your data, and what any given metric might imply when run on your network specifically. There are all sorts of free resources online you can find by googling your favorite algorithm. Use them.


For more information, read up on specific algorithms, methods, interpretations, etc. for two-mode networks from Tore Opsahl.

 

Digital History, Saturn’s Rings, and the Battle of Trafalgar

History and astronomy are a lot alike. When people claim history couldn’t possibly be scientific (how can you do science without direct experimentation?), astronomy should be the immediate counterexample.

Astronomers and historians both view their subjects from great distances; too far to send instruments for direct measurement and experimentation. Things have changed a bit in the last century for astronomy, of course, with the advent of machines sensitive enough to create earth-based astronomical experiments. We’ve also built ships to take us to the farthest reaches, for more direct observations.

Voyager 1 Spacecraft, on the cusp of interstellar space. [via]
It’s unlikely we’ll invent a time machine any time soon, though, so historians are still stuck looking at the past in the same way we looked at the stars for so many thousands of years: through a glass, darkly. Like astronomers, we face countless observational distortions, twisting the evidence that appears before us until we’re left with an echo of a shadow of the past. We recreate the past through narratives, combining what we know of human nature with the evidence we’ve gathered, eventually (hopefully) painting ever-clearer pictures of a time we could never touch with our fingers.

Some take our lack of direct access as a good excuse to shake away all trappings of “scientific” methods. This seems ill-advised. Retaining what we’ve learned over the past 50 years about how we construct the world we see is important, but it’s not the whole story, and it’s got enough parallels with 17th century astronomy that we might learn some lessons from that example.

Saturn’s Rings

In the summer of 1610, Galileo observed Saturn through a telescope for the first time. He wrote with surprise that

Galileo’s observation of Saturn through a telescope, 1610. [via]

the star of Saturn is not a single star, but is a composite of three, which almost touch each other, never change or move relative to each other, and are arranged in a row along the zodiac, the middle one being three times larger than the two lateral ones…

This curious observation would take half a century to resolve into what we today see as Saturn’s rings. Galileo wrote that others, using inferior telescopes, would report seeing Saturn as oblong, rather than as three distinct spheres. Lo and behold, within months, several observers reported an oblong Saturn.

Galileo’s Saturn in 1616.

What shocked Galileo even more, however, was an observation two years later when the two smaller bodies disappeared entirely. They appeared consistently, with every observation, and then one day poof they’re gone. And when they eventually did come back, they looked remarkably odd.

Saturn sometimes looked as though it had “handles”, one connected to either side, but the nature of those handles was unknown to Galileo, as was the reason why sometimes it looked like Saturn had handles, sometimes moons, and sometimes nothing at all.

Saturn was just really damn weird. Take a look at these observations from Gassendi a few decades later:

Gassendi’s Saturn [via]
What the heck was going on? Many unsatisfying theories were put forward, but there was no real consensus.

Enter Christiaan Huygens, who in the 1650s was fascinated by the Saturn problem. He believed a better telescope was needed to figure out what was going on, and eventually got some help from his brother to build one.

The idea was successful. In short order, Huygens developed the hypothesis that Saturn was encircled by a ring. This explanation, along with the various angles from which we on Earth view Saturn and its ring, accounted for the multitude of appearances Saturn could take. The figure below explains this:

Huygens’ Saturn [via]
The explanation, of course, was not universally accepted. An opposing explanation by an anti-Copernican Jesuit contested that Saturn had six moons, the configuration of which accounted for the many odd appearances of the planet. Huygens countered that the only way such a hypothesis could be sustained would be with inferior telescopes.

While the exact details of the dispute are irrelevant, the proposed solution was very clever, and speaks to contemporary methods in digital history. The Accademia del Cimento devised an experiment that would, in a way, test the opposing hypotheses. They built two physical models of Saturn, one with a ring, and one with six satellites configured just-so.

The Model of Huygens’ Saturn [via]
In 1660, the experimenters at the academy put the model of a ringed Saturn at the end of a 75-meter / 250-foot hallway. Four torches illuminated the model but were obscured from observers, so they wouldn’t be blinded by the torchlight. Then they had observers view the model through telescopes of varying quality from the other end of the hallway. The observers were essentially taken from the street, so they wouldn’t have preconceived notions of what they were looking at.

Depending on the distance and quality of the telescope, observers reported seeing an oblong shape, three small spheres, and other observations that were consistent with what astronomers had seen. When seen through a glass, darkly, a ringed Saturn does indeed form the most unusual shapes.

In short, the Accademia del Cimento devised an experiment, not to test the physical world, but to test whether an underlying reality could appear completely different through the various distortions that come along with how we observe it. If Saturn had rings, would it look to us as though it had two small satellites? Yes.

This did not prove Huygens’ theory, but it did prove it to be a viable candidate given the observational instruments at the time. Within a short time, the ring theory became generally accepted.

The Battle of Trafalgar

So what’s Saturn’s ring have to do with the price of tea in China? What about digital history?

The importance is in the experiment and the model. You do not need direct access to phenomena, whether they be historical or astronomical, to build models, conduct experiments, or generally apply scientific-style methods to test, elaborate, or explore a theory.

In October 1805, Lord Nelson led the British navy to a staggering victory against the French and Spanish during the Napoleonic Wars. The win is attributed to Nelson’s unusual and clever battle tactic of dividing his forces into columns perpendicular to the enemy’s single line of ships. Twenty-seven British ships defeated thirty-three Franco-Spanish ones. Nelson didn’t lose a single British ship, while the Franco-Spanish fleet lost twenty-two.

Horatio Nelson [via]
But let’s say the prevailing account is wrong. Let’s say, instead, due to the direction of the wind and the superior weaponry of the British navy, victory was inevitable: no brilliant naval tactician required.

This isn’t a question of counterfactual history; it’s simply a question of competing theories. But how can we support this new theory without venturing into counterfactual thinking and speculation? Obviously Nelson did lead the fleet, and obviously he did use novel tactics, and obviously a resounding victory ensued. These are indisputable historical facts.

It turns out we can use a similar trick to what the Accademia del Cimento devised in 1660: pretend as though things are different (Saturn has a ring; Nelson’s tactics did not win the battle), and see whether our observations would remain the same (Saturn looks like it is flanked by two smaller moons; the British still defeated the French and Spanish).

It turns out, further, that someone’s already done this. In 2003, two Italian physicists built a simulation of the Battle of Trafalgar, taking into account details of the ships, various strategies, wind direction, speed, and so forth. The simulation is a bit like a video game that runs itself: every ship has its own agency, with the ability to make decisions based on its environment, to attack and defend, and so forth.  It’s from a class of simulations called agent-based models.

When the authors directed the British ships to follow Lord Nelson’s strategy, of two columns, the fleet performed as expected: little loss of life on behalf of the British, major victory, and so forth. But when they ran the model without Nelson’s strategy, a combination of wind direction and superior British firepower still secured a British victory, even though the fleet was outnumbered.

…[it’s said] the English victory in Trafalgar is substantially due to the particular strategy adopted by Nelson, because a different plan would have led the outnumbered British fleet to lose for certain. On the contrary, our counterfactual simulations showed that English victory always occur unless the environmental variables (wind speed and direction) and the global strategies of the opposed factions are radically changed, which lead us to consider the British fleet victory substantially ineluctable.

Essentially, they tested assumptions of an alternative hypothesis, and found those assumptions would also lead to the observed results. A military historian might (and should) quibble with the details of their simplifying assumptions, but that’s all part of the process of improving our knowledge of the world. Experts disagree, replace simplistic assumptions with more informed ones, and then improve the model to see if the results still hold.

The Parable of the Polygons

This agent-based approach to testing theories about how society works is exemplified by the Schelling segregation model. This week the model shot to popularity through Vi Hart and Nicky Case’s Parable of the Polygons, a fabulous, interactive discussion of some potential causes of segregation. Go click on it, play through it, experience it. It’s worth it. I’ll wait.

Finished? Great! The model shows that, even if people only move homes if less than 1/3rd of their neighbors are the same color that they are, massive segregation will still occur. That doesn’t seem like too absurd a notion: everyone being happy with 2/3rds of their neighbors as another color, and 1/3rd as their own, should lead to happy, well-integrated communities, right?

Wrong, apparently. It turns out that people wanting 33% of their neighbors to be the same color as they are is sufficient to cause segregated communities. Take a look at the community created in Parable of the Polygons under those conditions:

Parable of the Polygons

This shows that very light assumptions of racism can still easily lead to divided communities. It’s not making claims about racism, or about society: what it’s doing is showing that this particular model, where people want a third of their neighbors to be like them, is sufficient to produce what we see in society today. Much like Saturn having rings is sufficient to produce the observation of two small adjacent satellites.
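
For the curious, here is a bare-bones sketch of a Schelling-style model in Python. It is not Hart and Case’s implementation; the grid size, empty fraction, and number of steps are arbitrary choices of mine, and the only rule doing the work is the one described above: an agent moves when fewer than a third of its occupied neighbors share its color.

```python
# A minimal Schelling-style segregation model on a wrapping grid.
# Parameters are arbitrary choices for illustration.
import random

SIZE, EMPTY_FRAC, THRESHOLD, STEPS = 30, 0.2, 1 / 3, 50

# Build the grid: None marks an empty cell, 0 and 1 are the two colors.
cells = [None] * int(SIZE * SIZE * EMPTY_FRAC)
cells += [i % 2 for i in range(SIZE * SIZE - len(cells))]
random.shuffle(cells)
grid = [cells[r * SIZE:(r + 1) * SIZE] for r in range(SIZE)]

def unhappy(r, c):
    """True if under a third of the occupied neighbors share this agent's color."""
    me = grid[r][c]
    neighbors = [grid[(r + dr) % SIZE][(c + dc) % SIZE]   # grid wraps at the edges
                 for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    occupied = [n for n in neighbors if n is not None]
    return bool(occupied) and sum(n == me for n in occupied) / len(occupied) < THRESHOLD

for _ in range(STEPS):
    empties = [(r, c) for r in range(SIZE) for c in range(SIZE) if grid[r][c] is None]
    movers = [(r, c) for r in range(SIZE) for c in range(SIZE)
              if grid[r][c] is not None and unhappy(r, c)]
    random.shuffle(movers)
    for r, c in movers:
        # Each agent judged unhappy at the start of the step jumps to a random empty cell.
        nr, nc = empties.pop(random.randrange(len(empties)))
        grid[nr][nc], grid[r][c] = grid[r][c], None
        empties.append((r, c))

settled = sum(grid[r][c] is not None and not unhappy(r, c)
              for r in range(SIZE) for c in range(SIZE))
print(f"{settled} agents are now happy with their neighborhood")
```

Run it a few times and the like-colored clumps form reliably, which is the whole point: the macro-level pattern doesn’t require strong individual preferences.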

More careful work is needed, then, to decide whether the model is an accurate representation of what’s going on, but establishing that base, that the model is a plausible description of reality, is essential before moving forward.

Digital History

Digital history is a ripe field for this sort of research. Like astronomers, we cannot (yet?) directly access what came before us, but we can still devise experiments to help support our research, in finding plausible narratives and explanations of the past. The NEH Office of Digital Humanities has already started funding workshops and projects along these lines, although they are most often geared toward philosophers and literary historians.

The person doing the most thoughtful theoretical work at the intersection of digital history and agent-based modeling is likely Marten Düring, who is definitely someone to keep an eye on if you’re interested in this area. An early innovator and strong practitioner in this field is Shawn Graham, who actively blogs about related issues.  This technique, however, is far from the only one available to historians for devising experiments with the past. There’s a lot we can still learn from 17th century astronomers.

Understanding Special Relativity through History and Triangles (pt. 1)

We interrupt this usually-DH blog because I got in a discussion about Special Relativity with a friend, and promised it was easily understood using only the math we use for triangles. But I’m a historian, so I can’t leave a good description alone without some background.

If you just want to learn how relativity works, skip ahead to the next post, Relativity Made Simple [Note! I haven’t written it yet, this is a two-part post. Stay-tuned for the next section]; if you hate science and don’t want to know how the universe functions, but love history, read only this post. If you have a month of time to kill, just skip this post entirely and read through my 122-item relativity bibliography on Zotero. Everyone else, disregard this paragraph.

An Oddly Selective History of Relativity

This is not a history of how Einstein came up with his Theory of Special Relativity as laid out in Zur Elektrodynamik bewegter Körper in 1905. It’s filled with big words like aberration and electrodynamics, and equations with occult symbols. We don’t need to know that stuff. This is a history of how others understood relativity. Eventually, you’re going to understand relativity, but first I’m going to tell you how other people, much smarter than you, did not.

There’s an infamous (potentially mythical) story about how difficult it is to understand relativity: Arthur Eddington, a prominent astronomer, was asked whether it was true that only three people in the world understood relativity. After pausing for a moment, Eddington replied, “I’m trying to think who the third person is!” This was about General Relativity, but it was also a joke: good scientists know relativity isn’t incredibly difficult to grasp, and even early on, lots of people could claim to understand it.

Good historians, however, know that’s not the whole story. It turns out a lot of people who thought they understood Einstein’s conceptions of relativity actually did not, including those who agreed with him. This, in part, is that story.

Relativity Before Einstein

Einstein’s special theory of relativity relied on two assumptions: (1) you can’t ever tell whether you’re standing still or moving at a constant velocity (or, in physics-speak, the laws of physics in any inertial reference frame are indistinguishable from one another), and (2) light always looks like it’s moving at the same speed (in physics-speak, the speed of light is always constant no matter the velocity of the emitting body nor that of the observer’s inertial reference frame). Let’s trace these concepts back.

Our story begins in the 14th century. William of Occam, famous for his razor, claimed motion was merely the location of a body and its successive positions over time; motion itself was in the mind. Because position was simply defined in terms of the bodies that surround it, this meant motion was relative. Occam’s student, Buridan, pushed that claim forward, saying “If anyone is moved in a ship and imagines that he is at rest, then, should he see another ship which is truly at rest, it will appear to him that the other ship is moved.”

Galileo’s relativity [via]. The site where this comes from is a little crazy, but the figure is still useful, so here it is.
The story moves forward at irregular speed (much like the speed of this blog, and the pacing of this post). Within a century, scholars introduced the concepts of an infinite universe without any center, or any other ‘absolute’ location. Copernicus cleverly latched onto this relativistic thinking by showing that the math works just as well, if not better, when the Earth orbits the Sun, rather than vice versa. Galileo claimed there was no way, on the basis of mechanical experiments, to tell whether you were standing still or moving at a uniform speed.

For his part, Descartes disagreed, but did say that the only way one could discuss movement was relative to other objects. Christiaan Huygens took Descartes a step further, showing that there are no ‘privileged’ motions or speeds (that is, there is no intrinsic meaning of a universal ‘at rest’ – only ‘at rest’ relative to other bodies). Isaac Newton knew that it was impossible to measure something’s absolute velocity (rather than velocity relative to an observer), but still, like Descartes, supported the idea that there was an absolute space and absolute velocity – we just couldn’t measure it.

Let’s skip ahead some centuries. The year is 1893; the U.S. Supreme Court declared the tomato was a vegetable, Gandhi campaigned against segregation in South Africa, and the U.S. railroad industry bubble had just popped, forcing the government to bail out AIG for $85 billion. Or something. Also, by this point, most scientists thought light traveled in waves. Given that in order for something to travel in a wave, something has to be waving, scientists posited there was this luminiferous ether that pervaded the universe, allowing light to travel between stars and candles and those fish with the crazy headlights. It makes perfect sense. In order for sound waves to travel, they need air to travel through; in order for light waves to travel, they need the ether.

Ernst Mach, a philosopher read by many contemporaries (including Einstein), said that Newton and Descartes were wrong: absolute space and absolute motion are meaningless. It’s all relative, and only relative motion has any meaning. It is both physically impossible to measure an object’s “real” velocity, and also philosophically nonsensical. The ether, however, was useful. According to Mach and others, we could still measure something kind of like absolute position and velocity by measuring things in relationship to that all-pervasive ether. Presumably, the ether was just sitting still, doing whatever ether does, so we could use its stillness as a reference point and measure how fast things were going relative to it.

Well, in theory. Earth is hurtling through space, orbiting the sun at about 70,000 miles per hour, right? And it’s spinning too, at about a thousand miles an hour. But the ether is staying still. And light, supposedly, always travels at the same speed through the ether no matter what. So in theory, light should look like it’s moving a bit faster if we’re moving toward its source relative to the ether, and a bit slower if we’re moving away from it. It’s just like if you’re in a train hurtling toward a baseball pitcher at 100 mph, and the pitcher throws a ball at you, also at 100 mph, in a futile attempt to stop the train. To you, the baseball will look like it’s going twice as fast, because you’re moving toward it.

The earth moving through the ether. [via]
It turns out measuring the speed of light in relation to the ether was really difficult. A bunch of very clever people made a bunch of very clever instruments which really should have measured the speed of earth moving through the ether, based on small observed differences of the speed of light going in different directions, but the experiments always showed light moving at the same speed. Scientists figured this must mean the earth was actually exerting a pull on the ether in its vicinity, dragging it along with it as the earth hurtled through space, explaining why light seemed to be constant in both directions when measured on earth. They devised even cleverer experiments that would account for such an ether drag, but even those seemed to come up blank. Their instruments, it was decided, simply were not yet fine-tuned enough to measure such small variations in the speed of light.

Not so fast! shouted Lorentz, except he shouted it in Dutch. Lorentz used the new electromagnetic theory to suggest that the null results of the ether experiments were due not to the earth dragging the ether along behind it, but to physical objects compressing when they moved against the ether. The experiments weren’t showing the difference in the speed of light they sought because the measuring instruments themselves contracted to just the right length to perfectly offset the difference in the velocity of light when measuring “into” the ether. The ether was literally squeezing the electrons in the meter stick together so it became a little shorter; short enough to inaccurately measure light’s speed. The set of equations used to describe this effect became known as the Lorentz transformations. One property of these transformations was that the physical contractions would, obviously, appear the same to any observer. No matter how fast you were going relative to your measuring device, if it were moving into the ether, you would see it contracting slightly to accommodate the measurement difference.
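For the curious, here is what those transformations look like in modern notation (the next post will actually derive them); v is the relative speed between the two frames of reference and c is the speed of light:

```latex
x' = \gamma\,(x - vt), \qquad
t' = \gamma\left(t - \frac{v x}{c^{2}}\right), \qquad
\gamma = \frac{1}{\sqrt{1 - v^{2}/c^{2}}}
```

Lorentz’s contraction falls out of that same factor γ: a meter stick moving at speed v relative to you is measured at only 1/γ of its rest length.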

Not so fast! shouted Poincaré, except he shouted it in French. This property of the transformations, always appearing the same relative to the ether, was actually a problem. Remember the 500 years of physics that said there is no way to mechanically determine your absolute speed or absolute location in space? Yeah, so did Poincaré. He said the only way you could measure velocity or location was matter-to-matter, not matter-to-ether, so the Lorentz transformations didn’t fly.

It’s worth taking a brief aside to talk about the underpinnings of both Lorentz’s and Poincaré’s theories. Both were grounded in experiment: their reasoning about contraction rested on apparent experimental evidence of that contraction, and their accounts of relativity rested on experimental evidence of motion being relative.

Einstein and Relativity

When Einstein hit the scene in 1905, he approached relativity a bit differently. Instead of trying to fit the apparent contraction of objects from the ether drift experiments to a particular theory, Einstein began with the assumption that light always appears to move at the same rate, regardless of the relative velocity of the observer. The other assumption he began with was that there is no privileged frame of reference; no absolute space or velocity, only the movement of matter relative to other matter. I’ll work out the math later, but, unsurprisingly, following these two assumptions to their conclusions led to exactly the same transformation equations Lorentz had arrived at experimentally.

The math was the same. The difference was in the interpretation of the math. Einstein’s theory required no ether, but what’s more, it did not require any physical explanations at all. Because Einstein’s special relativity was built on two postulates about measurement, its implications lay entirely in how we measure and observe the universe. Thus, under Einstein’s theory, objects described as “contracting” were not contracting at all; they merely appear to contract relative to the movement of the observer. Another result of these transformation equations is that, from the perspective of the observer, time appears to move slower or faster depending on the relative speed of what is being observed. Lorentz’s theory predicted the same time dilation effects, but he chalked them up to a weird result of the math that didn’t actually manifest itself. In Einstein’s theory, however, those weird temporal stretching effects were Actually What Was Going On.

To reiterate: the math of Lorentz, Einstein, and Poincaré was (at least at this early stage) essentially equivalent. The consequence was that no experiment could favor one theory over another; the observational predictions of each theory were exactly the same.

Relativity’s Supporters in America

I’m focusing on America here because it’s rarely focused on in the historiography, and it’s about time someone did. If I were being scholarly and citing my sources, this might actually be a novel contribution to historiography. Oh well, BLOG! All my primary sources are in that Zotero library I linked to earlier.

In 1910, Daniel Comstock wrote a popular account of the relativity of Lorentz and Einstein, to some extent conflating the two. He suggested that if Einstein’s postulates could be experimentally verified, his special theory of relativity would be true. “If either of these postulates be proved false in the future, then the structure erected can not be true in its present form. The question is, therefore, an experimental one.” Comstock’s statement betrays a misunderstanding of Einstein’s theory, though, because, at the time of that writing, there was no experimental difference between the two theories.

Gilbert Lewis and Richard Tolman presented a paper at the 1908 American Physical Society meeting in New York, where they described themselves as fully behind Einstein over Lorentz. Oddly, they considered Einstein’s theory to be correct, as opposed to Lorentz’s, because his postulates were “established on a pretty firm basis of experimental fact.” Which, to reiterate, couldn’t possibly have been a difference between Lorentz and Einstein. Odder still, they presented the theory not as one of physics or of measurement, but of psychology (a bit like 14th century Oresme). The two went on to separately write a few articles which supposedly experimentally confirmed the postulates of special relativity.

In fact, the few Americans who did seem to engage with the actual differences between Lorentz and Einstein did so primarily in critique. Louis More, a well-respected physicist from Cincinnati, labeled the difference as metaphysical and primarily useless. This American critique was fairly standard.

At the 1909 American Physical Society meeting in Boston, one physicist (Harold Wilson) claimed his experiments showed the difference between Einstein and Lorentz. One of the few truly theoretical American physicists, W.S. Franklin, was in attendance, and the lectures he saw inspired him to write a popular account of relativity in 1911; in it, he found no theoretical difference between Lorentz and Einstein. He tended to side theoretically with Einstein, but assumed Lorentz’s theory implied the same space and time dilation effects, which it did not.

Even this series of misunderstandings should be taken as a shining example in the context of an American approach to theoretical physics that was largely antagonistic, at times decrying theoretical differences entirely. At a symposium on Ether Theories at the 1911 APS meeting, the presidential address by William Magie was largely about the uselessness of relativity because, according to him, physics should be a functional activity based in utility and experimentation. Joining Magie’s “side” in the debate were Michelson, Morley, and Arthur Gordon Webster, the co-founder of the American Physical Society. Of those at the meeting supporting relativity, Lewis was still convinced Einstein differed experimentally from Lorentz, and Franklin and Comstock each felt there was no substantive difference between the two. In 1912, Indiana University’s R.D. Carmichael stated Einstein’s postulates were “a direct generalization from experiment.” In short, the Americans were really focused on experiment.

Of Einstein’s theory, Louis More wrote in 1912:

Professor Einstein’s theory of Relativity [… is] proclaimed somewhat noisily to be the greatest revolution in scientific method since the time of Newton. That [it is] revolutionary there can be no doubt, in so far as [it] substitutes mathematical symbols as the basis of science and denies that any concrete experience underlies these symbols, thus replacing an objective by a subjective universe. The question remains whether this is a step forward or backward […] if there is here any revolution in thought, it is in reality a return to the scholastic methods of the Middle Ages.

More goes on to say how the “Anglo-Saxons” demand practical results, not the unfathomable theories of “the German mind.” Really, that quote about sums it up. By this point, the only Americans who even talked about relativity were the ones who trained in Germany.

I’ll end here, where most histories of the reception of relativity begin: the first Solvay Conference. It’s where this beautiful picture was taken.

First Solvay Conference. [via]
To sum up: in the seven years following Einstein’s publication, the only Americans who agreed with Einstein were the ones who didn’t quite understand him. You, however, will understand it much better if you only read the next post [coming this week!].

Do historians need scientists?

[edit: I’m realizing I didn’t make it clear in this post that I’m aware many historians consider themselves scientists, and that there’s plenty of scientific historical archaeology and anthropology. That’s exactly what I’m advocating there be more of, and more varied.]

Short Answer: Yes.

Less Snarky Answer: Historians need to be flexible to fresh methods, fresh perspectives, and fresh blood. Maybe not that last one, I guess, as it might invite vampires. Okay, I suppose this answer wasn’t actually less snarky.

Long Answer

The long answer is that historians don’t necessarily need scientists, but that we do need fresh scientific methods. Perhaps as an accident of our association with the ill-defined “humanities”, or as a result of our being placed in an entirely different culture (see: C.P. Snow), most historians seem fairly content with methods rooted in thinking about text and other archival evidence. This isn’t true of all historians, of course – there are economic historians who use statistics, historians of science who recreate old scientific experiments, classical historians who augment their research with archaeological findings, archival historians who use advanced ink analysis,  and so forth. But it wouldn’t be stretching the truth to say that, for the most part, historiography is the practice of thinking cleverly about words to make more words.

I’ll argue here that our reliance on traditional methods (or maybe more accurately, our odd habit of rarely discussing method) is crippling historiography, and is making it increasingly likely that the most interesting and innovative historical work will come from non-historians. Sometimes these studies are ill-informed, especially when the authors decide not to collaborate with historians who know the subject, but to claim that a few ignorant claims about history negate the impact of these new insights is an exercise in pedantry.

In defending the humanities, we like to say that scientists and technologists with liberal arts backgrounds are more well-rounded, better citizens of the world, more able to contextualize their work. Non-humanists benefit from a liberal arts education in pretty much all the ways that are impossible to quantify (and thus, extremely difficult to defend against budget cuts). We argue this in the interest of rounding a person’s knowledge, to make them aware of their past, of their place in a society with staggering power imbalances and systemic biases.

Humanities departments should take a page from their own books. Sure, a few general ed requirements force some basic science and math… but I got an undergraduate history degree in a nice university, and I’m well aware how little STEM I actually needed to get through it. Our departments are just as guilty of narrowness as those of our STEM colleagues, and often because of it, we rely on applied mathematicians, statistical physicists, chemists, or computer scientists to do our innovative work for (or sometimes, thankfully, with) us.

Of course, there’s still lots of innovative work to be done from a textual perspective. I’m not downplaying that. Not everyone needs to use crazy physics/chemistry/computer science/etc. methods. But there’s a lot of low hanging fruit at the intersection of historiography and the natural sciences, and we’re not doing a great job of plucking it.

The story below is illustrative.

Gutenberg

Last night, Blaise Agüera y Arcas presented his research on Gutenberg to a packed house at our rare books library. He’s responsible for a lot of the cool things that have come out of Microsoft in the last few years, and just got a job at Google, where presumably he will continue to make cool things. Blaise has degrees in physics and applied mathematics. And, a decade ago, Blaise and historian/librarian Paul Needham sent ripples through the History of the Book community by showing that Gutenberg’s press did not work at all the way people expected.

It was generally assumed that Gutenberg employed a method called punchcutting in order to create a standard font. A letter carved into a metal rod (a “punch”) would be driven into a softer metal (a “matrix”) in order to create a mold. The mold would be filled with liquid metal which hardened to form a small block of a single letter (a “type”), which would then be loaded onto the press next to other letters, inked, and then impressed onto a page. Because the mold was metal, many duplicate “types” could be made of the same letter, thus allowing many uses of the same letter to appear identical on a single pressed page.

Punch matrix system. [via]
Type to be pressed. [via]
This process is what allowed all the duplicate letters to appear identical in Gutenberg’s published books. Except, of course, careful historians of early print noticed that letters weren’t, in fact, identical. In the 1980s, Paul Needham and a colleague attempted to produce an inventory of all the different versions of letters Gutenberg used, but they stopped after frequently finding 10 or more obviously distinct versions of the same letter.

Needham’s inventory of Gutenberg type. [via]
This was perplexing, but the subject was bracketed away for a while, until Blaise Agüera y Arcas came to Princeton and decided to work with Needham on the problem. Using extremely high-resolution imaging techniques, Blaise noted that there were in fact hundreds of versions of every letter. Not only that, there were actually variations and regularities in the smaller elements that made up letters. For example, an “n” was formed by two adjacent vertical lines, but occasionally the two vertical lines seemed to have flipped places entirely. The extremely basic letter “i” itself had many variations, but within those variations, many odd self-similarities.

Variations in the letter “i” in Gutenberg’s type. [via]
Historians had, until this analysis, assumed most letter variations were due to wear of the type blocks. This analysis blew that hypothesis out of the water. These “i”s were clearly not all made in the same mold; but then, how had they been made? To answer this, they looked even closer at the individual letters.

 

Close up of Gutenberg letters, with light shining through page. [via]
It’s difficult to see at first glance, but they found something a bit surprising. The letters appeared to be formed of overlapping smaller parts: a vertical line, a diagonal box, and so forth. The below figure shows a good example of this. The glyphs on the bottom have a stem dipping below the bottom horizontal line, while the glyphs at the top do not.

Abbreviation of ‘per’. [via]
The conclusion Needham and Agüera y Arcas drew, eventually, was that the punchcutting method must not have been used for Gutenberg’s early material. Instead, a set of carved “strokes” were pushed into hard sand or soft clay, configured such that the strokes would align to form various letters, not unlike the formation of cuneiform. This mold would then be used to cast letters, creating the blocks we recognize from movable type. The catch is that this soft clay could only cast letters a few times before it became unusable and would need to be recreated. As Gutenberg needed multiple instances of individual letters per page, many of those letters would be cast from slightly different soft molds.

Low-Hanging Fruit

At the end of his talk, Blaise made an offhand comment: how is it that historians/bibliographers/librarians have been looking at these Gutenbergs for so long, discussing the triumph of their identical characters, and not noticed that the characters are anything but uniform? Or, of those who had noticed it, why hadn’t they raised any red flags?

The insights they produced weren’t staggering feats of technology. He used a nice camera, a light shining through the pages of an old manuscript, and a few simple image recognition and clustering algorithms. The clustering part could even have been done by hand, and actually had been, by Paul Needham. And yes, it’s true, everything is obvious in hindsight, but there were a lot of eyes on these bibles, and odds are if some of them had been historians who were trained in these techniques, this insight could have come sooner. Every year students do final projects and theses and dissertations, but what percent of those use techniques from outside historiography?
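To give a sense of how modest the technical requirements are, here is a minimal sketch of the kind of glyph clustering involved. It is not Agüera y Arcas’s actual pipeline, just an illustration under simplifying assumptions: that you already have cropped, same-sized grayscale images of a single letter, and that the filenames and cluster count below are placeholders.

```python
# A minimal sketch of clustering letterform images, assuming a folder of
# pre-cropped, same-sized grayscale glyph images (e.g. every "i" on a page).
# This illustrates the general technique, not the original study's pipeline.
import glob

import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

# Load each glyph image and flatten it into a vector of pixel intensities.
paths = sorted(glob.glob("glyphs/i_*.png"))          # hypothetical filenames
vectors = np.array([
    np.asarray(Image.open(p).convert("L"), dtype=float).ravel()
    for p in paths
])

# Group the glyph images into a guessed number of shape clusters; eyeballing
# each cluster's members is the quick-and-dirty way to judge how many
# distinct forms of the letter are really present.
n_clusters = 10                                       # an assumption, tune by eye
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(vectors)

for label in range(n_clusters):
    members = [p for p, l in zip(paths, kmeans.labels_) if l == label]
    print(f"cluster {label}: {len(members)} glyphs, e.g. {members[:3]}")
```

In practice the hard work is in the imaging and in deciding what counts as the same shape, not in the clustering itself, which, as noted above, had already been done by hand.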

In short, there are a lot of very basic assumptions we make about the past that could probably be updated significantly if we had the right skillset, or knew how to collaborate with those who did. I think people like William Newman, who performs Newton’s alchemical experiments, are on the right track. As is Shawn Graham, who reanimates the trade networks of ancient Rome using agent-based simulations, or Devon Elliott, who creates computational and physical models of objects from the history of stage magic. Elliott’s models have shown that certain magic tricks couldn’t possibly have worked as they were described to.

The challenge is how to encourage this willingness to reach outside traditional historiographic methods to learn about the past. Changing curricula to be more flexible is one way, but that is a slow and institutionally difficult process. Perhaps faculty could assign group projects to students taking their gen-ed history courses, encouraging disciplinary mixes and non-traditional methods. It’s an open question, and not an easy one, but it’s one we need to tackle.

Appreciability & Experimental Digital Humanities

Operationalize: to express or define (something) in terms of the operations used to determine or prove it.

Precision deceives. Quantification projects an illusion of certainty and solidity no matter the provenance of the underlying data. It is a black box, through which uncertain estimations become sterile observations. The process involves several steps: a cookie cutter to make sure the data are all shaped the same way, an equation to aggregate the inherently unique, a visualization to display exact values from a process that was anything but.

In this post, I suggest that Moretti’s discussion of operationalization leaves out an integral discussion on precision, and I introduce a new term, appreciability, as a constraint on both accuracy and precision in the humanities. This conceptual constraint paves the way for an experimental digital humanities.

Operationalizing and the Natural Sciences

An operationalization is the use of definition and measurement to create meaningful data. It is an incredibly important aspect of quantitative research, and it has served the western world well for at least 400 years. Franco Moretti recently published a LitLab Pamphlet and a nearly identical article in the New Left Review about operationalization, focusing on how it can bridge theory and text in literary theory. Interestingly, his description blurs the line between the operationalization of his variables (what shape he makes the cookie cutters that he takes to his text) and the operationalization of his theories (how the variables interact to form a proxy for his theory).

Moretti’s account anchors the practice in its scientific origin, citing primarily physicists and historians of physics. This is a deft move, but an unexpected one in a recent DH environment which attempts to distance itself from a narrative of humanists just playing with scientists’ toys. Johanna Drucker, for example, commented on such practices:

[H]umanists have adopted many applications […] that were developed in other disciplines. But, I will argue, such […] tools are a kind of intellectual Trojan horse, a vehicle through which assumptions about what constitutes information swarm with potent force. These assumptions are cloaked in a rhetoric taken wholesale from the techniques of the empirical sciences that conceals their epistemological biases under a guise of familiarity.

[…]

Rendering observation (the act of creating a statistical, empirical, or subjective account or image) as if it were the same as the phenomena observed collapses the critical distance between the phenomenal world and its interpretation, undoing the basis of interpretation on which humanistic knowledge production is based.

But what Drucker does not acknowledge here is that this positivist account is a century-old caricature of the fundamental assumptions of the sciences. Moretti’s account of operationalization as it percolates through physics is evidence of this. The operational view very much agrees with Drucker’s thesis, where the phenomena observed take a back seat to a definition steeped in the nature of measurement itself. Indeed, Einstein’s introduction of relativity relied on an understanding that our physical laws and observations of them rely not on the things themselves, but on our ability to measure them in various circumstances. The prevailing theory of the universe on a large scale is a theory of measurement, not of matter. Moretti’s reliance on natural scientific roots, then, is not antithetical to his humanistic goals.

I’m a bit horrified to see myself typing this, but I believe Moretti doesn’t go far enough in appropriating natural scientific conceptual frameworks. When describing what formal operationalization brings to the table that was not there before, he lists precision as the primary addition. “It’s new because it’s precise,” Moretti claims, “Phaedra is allocated 29 percent of the word-space, not 25, or 39.” But he asks himself: is this precision useful? Sometimes, he concludes, “It adds detail, but it doesn’t change what we already knew.”

From Moretti, ‘Operationalizing’, New Left Review.
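As a concrete (if crude) version of this kind of operationalization, here is a minimal sketch of measuring each character’s share of a play’s word-space. It is not Moretti’s actual code, and it assumes a hypothetical plain-text edition in which each speech begins with the speaker’s name in capitals followed by a period.

```python
# A crude sketch of "word-space" per character: what share of the play's
# spoken words belongs to each speaker? Assumes a plain-text edition where
# each speech begins with "SPEAKER." on its own line (a hypothetical format;
# real editions will need their own parsing).
import re
from collections import Counter

speech_pattern = re.compile(r"^([A-ZÉ]+)\.\s*$")  # e.g. "PHAEDRA." or "OENONE."

def word_space(path):
    counts, current = Counter(), None
    with open(path, encoding="utf-8") as f:
        for line in f:
            match = speech_pattern.match(line.strip())
            if match:
                current = match.group(1)        # a new speaker takes over
            elif current:
                counts[current] += len(line.split())
    total = sum(counts.values())
    return {speaker: n / total for speaker, n in counts.items()}

for speaker, share in sorted(word_space("phaedra.txt").items(),
                             key=lambda kv: -kv[1]):
    print(f"{speaker:<12}{share:.1%}")
```

The point of the sketch is only that the measurement is a series of small decisions (what counts as a speech, what counts as a word), each of which is an operationalization in miniature.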

I believe Moretti is asking the wrong first question here, and he’s asking it because he does not steal enough from the natural sciences. The question, instead, should be: is this precision meaningful? Only after we’ve assessed the reliability of new-found precision can we understand its utility, and here we can take some inspiration from the scientists, in their notions of accuracy, precision, uncertainty, and significant figures.

Terminology

First some definitions. The accuracy of a measurement is how close it is to the true value you are trying to capture, whereas the precision of a measurement is how often a repeated measurement produces the same results. The number of significant figures is a measurement of how precise the measuring instrument can possibly be. False precision is the illusion that one’s measurement is more precise than is warranted given the significant figures. Propagation of uncertainty is the pesky habit of false precision to weasel its way into the conclusion of a study, suggesting conclusions that might be unwarranted.

Accuracy and Precision. [via]
Accuracy roughly corresponds to how well-suited your operationalization is to finding the answer you’re looking for. For example, if you’re interested in the importance of Gulliver in Gulliver’s Travels, and your measurement is based on how often the character name is mentioned (12 times, by the way), you can be reasonably certain your measurement is inaccurate for your purposes.

Precision roughly corresponds to how fine-tuned your operationalization is, and how likely it is that slight changes in measurement will affect the outcomes of the measurement. For example, if you’re attempting to produce a network of interacting characters from The Three Musketeers, and your measuring “instrument” is to increase the strength of connection between two characters every time they appear in the same 100-word block, then you might be subject to difficulties of precision. That is, your network might look different if you start your sliding 100-word window from the 1st word, the 15th word, or the 50th word. The amount of variation in the resulting network is the degree of imprecision of your operationalization.
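To make that variation visible, here is a minimal sketch: build the co-occurrence network several times with different starting offsets and see how much the resulting edge sets disagree. The character list and text file are placeholders, not a real extraction pipeline.

```python
# A sketch of measuring the (im)precision of a windowed co-occurrence
# network: build the network with different starting offsets and see how
# much the resulting edge sets disagree. The character names and text file
# are placeholders.
import itertools

CHARACTERS = {"athos", "porthos", "aramis", "dartagnan", "milady"}  # toy list

def edges(words, window=100, offset=0):
    """Pairs of characters co-occurring in non-overlapping 100-word blocks."""
    found = set()
    for start in range(offset, len(words), window):
        block = set(words[start:start + window]) & CHARACTERS
        found |= set(itertools.combinations(sorted(block), 2))
    return found

words = open("three_musketeers.txt", encoding="utf-8").read().lower().split()

baseline = edges(words, offset=0)
for offset in (15, 50):
    other = edges(words, offset=offset)
    disagreement = len(baseline ^ other) / max(len(baseline | other), 1)
    print(f"offset {offset:>2}: {disagreement:.0%} of edges differ from offset 0")
```

The percentage that comes out the end is, in effect, a crude measurement of the operationalization’s imprecision.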

Significant figures are a bit tricky to port to DH use. When you’re sitting at home, measuring some space for a new couch, you may find that your meter stick only has tick marks to the centimeter, but nothing smaller. This is your highest threshold for precision; if you eyeballed and guessed your space was actually 250.5cm, you’ll have reported a falsely precise number. Others looking at your measurement may have assumed your meter stick was more fine-grained than it was, and any calculations you make from that number will propagate that falsely precise number.

Significant Figures. [via]
Uncertainty propagation is especially tricky when you wind up combining two measurements, one more precise and the other less. The rule of thumb is that your results can only be as precise as the least precise measurement that made its way into your equation. The final reported number is then generally in the form of 250 (±1 cm). Thankfully, for our couch, the difference of a centimeter isn’t particularly appreciable. In DH research, I have rarely seen any form of precision calculated, and I believe some of those projects would have reported different results had they accurately represented their significant figures.
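For what it’s worth, the usual textbook version of that rule, for adding two independent measurements, combines the uncertainties in quadrature; with a hypothetical second measurement of 80 (±1 cm) added to the couch space:

```latex
\delta(a+b) = \sqrt{(\delta a)^2 + (\delta b)^2}
\qquad\Rightarrow\qquad
(250 \pm 1\ \mathrm{cm}) + (80 \pm 1\ \mathrm{cm}) \approx 330 \pm 1.4\ \mathrm{cm}
```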

Precision, Accuracy, and Appreciability in DH

Moretti’s discussion of the increase of precision granted by operationalization leaves out any discussion of the certainty of that precision. Let’s assume for a moment that his operationalization is accurate (that is, his measurement is a perfect conversion between data and theory). Are his measurements precise? In the case of Phaedra, the answer at first glance is yes, words-per-character in a play would be pretty robust against slight changes in the measurement process.

And yet, I imagine, that answer will probably not sit well with some humanists. They may ask themselves: Is Oenone’s 12%  appreciably different from Theseus’s 13% of the word-space of the play? In the eyes of the author? Of the actors? Of the audience? Does the difference make a difference?

The mechanisms by which people produce and consume literature are not precise. Surely Jean Racine did not sit down intending to give Theseus a fraction more words than Oenone. Perhaps in DH we need a measurement of precision, not of the measuring device, but of our ability to interact with the object we are studying. In a sense, I’m arguing, we are not limited to the precision of the ruler when measuring humanities objects, but to the precision of the human.

In the natural sciences, accuracy is constrained by precision: you can only have as accurate a measurement as your measuring device is precise. In the corners of the humanities where we study how people interact with each other and with cultural objects, we need a new measurement that constrains both precision and accuracy: appreciability. A humanities quantification can only be as precise as that precision is appreciable by the people who interact with the matter at hand. If two characters differ by a single percent of the word-space, and that difference is impossible to register on a conscious or subconscious level, what is the meaning of additional levels of precision (and, consequently, additional levels of accuracy)?

Experimental Digital Humanities

Which brings us to experimental DH. How does one evaluate the appreciability of an operationalization except by devising clever experiments to test the extent of granularity a person can register? Without such understanding, we will continue to create formulae and visualizations which portray a false sense of precision. Without visual cues to suggest uncertainty, graphs present a world that is exact and whose small differentiations appear meaningful or deliberate.

Experimental DH is not without precedent. In Reading Tea Leaves (Chang et al., 2009), for example, the authors evaluated the quality of various topic modeling tweaks by asking a large number of people to judge the coherence of the resulting topics. If this approach were to catch on, along with more careful acknowledgements of accuracy, precision, and appreciability, then those of us making claims to knowledge in DH could seriously bolster our cases.
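For a flavor of what such an experiment looks like in practice, here is a minimal sketch of the ‘word intrusion’ task used in Chang et al.: show a person a topic’s top words plus one word smuggled in from another topic, and score the topic by how reliably people spot the intruder. The topics below are invented placeholders, not real model output.

```python
# A sketch of the "word intrusion" task from Chang et al. (2009): present a
# topic's top words plus one intruder from another topic; a topic counts as
# coherent to the degree that people can pick the intruder out. The topics
# below are invented placeholders.
import random

topics = {
    "seafaring": ["ship", "sail", "captain", "harbor", "voyage"],
    "cooking":   ["flour", "oven", "butter", "recipe", "simmer"],
}

def intrusion_trial(topic_name, rng=random):
    other = rng.choice([t for t in topics if t != topic_name])
    intruder = rng.choice(topics[other])
    shown = topics[topic_name] + [intruder]
    rng.shuffle(shown)
    return shown, intruder

shown, intruder = intrusion_trial("seafaring")
print("Which word does not belong?", ", ".join(shown))
answer = input("> ").strip().lower()
print("correct!" if answer == intruder else f"the intruder was {intruder!r}")
```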

There are some who present the formal nature of DH as antithetical to the highly contingent and interpretative nature of the larger humanities. I believe appreciability and experimentation can go some way toward alleviating the tension between the two schools, building one into the other. On the way, it might build some trust among humanists who think we sacrifice experience for certainty, and among natural scientists who are skeptical of our abilities to apply quantitative methods.

Right now, DH seems to find its most fruitful collaborations in computer science or statistics departments. Experimental DH would open the doors to new types of collaborations, especially with psychologists and sociologists.

I’m at an extremely early stage in developing these ideas, and would welcome all comments (especially those along the lines of “You dolt! Appreciability already exists, we call it x.”) Let’s see where this goes.

Bridging Token and Type

There’s an oft-spoken and somewhat strawman tale of how the digital humanities is bridging C.P. Snow’s “Two Cultures” divide, between the sciences and the humanities. This story is sometimes true (it’s fun putting together Ocean’s Eleven-esque teams comprising every discipline needed to get the job done) and sometimes false (plenty of people on either side still view the other with skepticism), but as a historian of science, I don’t find the divide all that interesting. As Snow’s title suggests, this divide is first and foremost cultural. There’s another overlapping divide, a bit more epistemological, methodological, and ontological, which I’ll explore here. It’s the nomothetic (type) / idiographic (token) divide, and I’ll argue here that not only are its barriers falling, but also that the distinction itself is becoming less relevant.

Nomothetic (Greek for “establishing general laws”-ish) and Idiographic (Greek for “pertaining to the individual thing”-ish) approaches to knowledge have often split the sciences and the humanities. I’ll offload the hard work onto Wikipedia:

Nomothetic is based on what Kant described as a tendency to generalize, and is typical for the natural sciences. It describes the effort to derive laws that explain objective phenomena in general.

Idiographic is based on what Kant described as a tendency to specify, and is typical for the humanities. It describes the effort to understand the meaning of contingent, unique, and often subjective phenomena.

These words are long and annoying to keep retyping, and so in the longstanding humanistic tradition of using new words for words which already exist, henceforth I shall refer to nomothetic as type and idiographic as token. 1 I use these because a lot of my digital humanities readers will be familiar with their use in text mining. If you counted the number of unique words in a text, you’d be counting the number of types. If you counted the number of total words in a text, you’d be counting the number of tokens, because each token (word) is an individual instance of a type. You can think of a type as the platonic ideal of the word (notice the word typical?), floating out there in the ether, and every time it’s actually used, it’s one specific token of that general type.
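In code, the distinction is the difference of a single line; a tiny sketch:

```python
# Tokens vs. types in a text: every running word is a token,
# every distinct word is a type.
text = "the quick brown fox jumps over the lazy dog the end"
tokens = text.split()

print("tokens:", len(tokens))       # 11 -- every occurrence counts
print("types: ", len(set(tokens)))  # 9  -- "the" counts only once
```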

The Token/Type Distinction

Usually the natural and social sciences look for general principles or causal laws, of which the phenomena they observe are specific instances. A social scientist might note that every time a student buys a $500 textbook, they actively seek a publisher to punch, but when they purchase $20 textbooks, no such punching occurs. This leads to the discovery of a new law linking student violence with textbook prices. It’s worth noting that these laws can be, and often are, nuanced and carefully crafted, with an awareness that they are neither wholly deterministic nor ironclad.

[via]
The humanities (or at least history, which I’m more familiar with) are more interested in what happened than in what tends to happen. Without a doubt there are general theories involved, just as in the social sciences there are specific instances, but the intent is most often to flesh out details and create a particular internally consistent narrative. They look for tokens where the social scientists look for types. Another way to look at it is that the humanist wants to know what makes a thing unique, and the social scientist wants to know what makes a thing comparable.

It’s been noted these are fundamentally different goals. Indeed, how can you in the same research articulate the subjective contingency of an event while simultaneously using it to formulate some general law, applicable in all such cases? Rather than answer that question, it’s worth taking time to survey some recent research.

A recent digital humanities panel at MLA elicited responses by Ted Underwood and Haun Saussy, of which this post is in part itself a response. One of the papers at the panel, by Long and So, explored the extent to which haiku-esque poetry preceded what is commonly considered the beginning of haiku in America by about 20 years. They do this by teaching the computer the form of the haiku, and having it algorithmically explore earlier poetry looking for similarities. Saussy comments on this work:

[…] macroanalysis leads us to reconceive one of our founding distinctions, that between the individual work and the generality to which it belongs, the nation, context, period or movement. We differentiate ourselves from our social-science colleagues in that we are primarily interested in individual cases, not general trends. But given enough data, the individual appears as a correlation among multiple generalities.

One of the significant difficulties faced by digital humanists, and a driving force behind critics like Johanna Drucker, is the fundamental opposition between the traditional humanistic value of stressing subjectivity, uniqueness, and contingency, and the formal computational necessity of filling a database with hard decisions. A database, after all, requires you to make a series of binary choices in well-defined categories: is it or isn’t it an example of haiku? Is the author a man or a woman? Is there an author or isn’t there an author?

Underwood addresses this difficulty in his response:

Though we aspire to subtlety, in practice it’s hard to move from individual instances to groups without constructing something like the sovereign in the frontispiece for Hobbes’ Leviathan – a homogenous collection of instances composing a giant body with clear edges.

But he goes on to suggest that the initial constraint of the digital media may not be as difficult to overcome as it appears. Computers may even offer us a way to move beyond the categories we humanists use, like genre or period.

Aren’t computers all about “binary logic”? If I tell my computer that this poem both is and is not a haiku, won’t it probably start to sputter and emit smoke?

Well, maybe not. And actually I think this is a point that should be obvious but just happens to fall in a cultural blind spot right now. The whole point of quantification is to get beyond binary categories — to grapple with questions of degree that aren’t well-represented as yes-or-no questions. Classification algorithms, for instance, are actually very good at shades of gray; they can express predictions as degrees of probability and assign the same text different degrees of membership in as many overlapping categories as you like.
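Underwood’s point is easy to make concrete: a classifier returns degrees of membership rather than verdicts, so the same text can score in several overlapping categories at once. Here is a minimal sketch with scikit-learn, using a few famous lines as stand-in data; the labels and model choices are purely illustrative, not anyone’s actual pipeline.

```python
# A toy illustration of degrees of membership: each text gets a probability
# for every category rather than a yes/no verdict. The mini-corpus and
# labels are stand-ins, not real training data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

poems = [
    "an old silent pond a frog jumps into the pond splash silence again",
    "shall i compare thee to a summer's day thou art more lovely",
    "the light of a candle is transferred to another candle spring twilight",
    "my mistress' eyes are nothing like the sun coral is far more red",
]
labels = [{"haiku"}, {"sonnet"}, {"haiku"}, {"sonnet"}]

X = CountVectorizer().fit_transform(poems)
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)

clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)

# Each poem gets a probability for every category: shades of gray,
# not binary membership.
for poem, probs in zip(poems, clf.predict_proba(X)):
    print({genre: round(p, 2) for genre, p in zip(mlb.classes_, probs)},
          "--", poem[:30], "...")
```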

Here we begin to see how the questions asked of digital humanists (on the one side; computational social scientists are tackling these same problems) are forcing us to reconsider the divide between the general and the specific, as well as the meanings of categories and typologies we have traditionally taken for granted. However, this does not yet cut across the token/type divide: this has gotten us to the macro scale, but it does not address general principles or laws that might govern specific instances. Historical laws are a murky subject, prone to inducing fits of anti-deterministic rage. Complex Systems Science and the lessons we learn from Agent-Based Modeling, I think, offer us a way past that dilemma, but more on that later.

For now, let’s talk about influence. Or diffusion. Or intertextuality. 2 Matthew Jockers has been exploring these concepts, most recently in his book Macroanalysis. The undercurrent of his research (I think I’ve heard him call it his “dangerous idea”) is a thread of almost-determinism. It is the simple idea that an author’s environment influences her writing in profound and easy to measure ways. On its surface it seems fairly innocuous, but it’s tied into a decades-long argument about the role of choice, subjectivity, creativity, contingency, and determinism. One word that people have used to get around the debate is affordances, and it’s as good a word as any to invoke here. What Jockers has found is a set of environmental conditions which afford certain writing styles and subject matters to an author. It’s not that authors are predetermined to write certain things at certain times, but that a series of factors combine to make the conditions ripe for certain writing styles, genres, etc., and not for others. The history of science analog would be the idea that, had Einstein never existed, relativity and quantum physics would still have come about; perhaps not as quickly, and perhaps not from the same person or in the same form, but they were ideas whose time had come. The environment was primed for their eventual existence. 3

An example of shape affording certain actions by constraining possibilities and influencing people. [via]
It is here we see the digital humanities battling with the token/type distinction, and finding that distinction less relevant to its self-identification. It is no longer a question of whether one can impose or generalize laws on specific instances, because the axes of interest have changed. More and more, especially under the influence of new macroanalytic methodologies, we find that the specific and the general contextualize and augment each other.

The computational social sciences are converging on a similar shift. Jon Kleinberg likes to compare some old work by Stanley Milgram 4, where he had people draw maps of cities from memory, with digital city reconstruction projects which attempt to bridge the subjective and objective experiences of cities. The result in both cases is an attempt at something new: not quite objective, not quite subjective, and not quite intersubjective. It is a representation of collective individual experiences which in its whole has meaning, but also can be used to contextualize the specific. That these types of observations can often lead to shockingly accurate predictive “laws” isn’t really the point; they’re accidental results of an attempt to understand unique and contingent experiences at a grand scale. 5

Manhattan. Dots represent where people have taken pictures; blue dots are by locals, red by tourists, and yellow are uncertain. [via Eric Fischer]
It is no surprise that the token/type divide is woven into the subjective/objective divide. However, as Daston and Galison have pointed out, objectivity is not an ahistorical category. 6 It has a history, is only positively defined in relation to subjectivity, and neither were particularly useful concepts before the 19th century.

I would argue, as well, that the nomothetic and idiographic divide is one which is outliving its historical usefulness. Work from both the digital humanities and the computational social sciences is converging to a point where the objective and the subjective can peaceably coexist, where contingent experiences can be placed alongside general predictive principles without any cognitive dissonance, under a framework that allows both deterministic and creative elements. It is not that purely nomothetic or purely idiographic research will no longer exist, but that they no longer represent a binary category which can usefully differentiate research agendas. We still have Snow’s primary cultural distinctions, of course, and a bevy of disciplinary differences, but it will be interesting to see where this shift in axes takes us.

Notes:

  1. I am not the first to do this. Aviezer Tucker (2012) has a great chapter in The Oxford Handbook of Philosophy of Social Science, “Sciences of Historical Tokens and Theoretical Types: History and the Social Sciences” which introduces and historicizes the vocabulary nicely.
  2. Underwood’s post raises these points, as well.
  3. This has sometimes been referred to as environmental possibilism.
  4. Milgram, Stanley. 1976. “Psychological Maps of Paris.” In Environmental Psychology: People and Their Physical Settings, edited by Proshansky, Ittelson, and Rivlin, 104–124. New York.

    ———. 1982. “Cities as Social Representations.” In Social Representations, edited by R. Farr and S. Moscovici, 289–309.

  5. If you’re interested in more thoughts on this subject specifically, I wrote a bit about it in relation to single-authorship in the humanities here
  6. Daston, Lorraine, and Peter Galison. 2007. Objectivity. New York, NY: Zone Books.