How many people do you need? Is an artistic movement only a movement as a collective? Can one person alone carry the melody?
Over the course of 12 hours between October 29th and October 31st, a pop-up writing collective of artists, scholars, and algorithms uncovered a fragmentary history of the Center for Midnight, an imagined artistic movement of the late twentieth century.
We named ourselves the Midnight Society, though our membership was as difficult to enumerate as our goals. About thirty participants wandered in and out of the STUDIO for Creative Inquiry at Carnegie Mellon University over those three evenings, contributing words or technical expertise or editorial opinions or halloween candy, some for moments and others for hours. Members arrived from as far as the Atlantic and the Pacific, though the heart of our collective rested in Pittsburgh, the mind in Robin Sloan, and the words in a neural network taught to read by biographers and comedians.
The Center for Midnight began, as so many things do, with a blank page and a blinking cursor.
When we first invited bestselling author and technologist Robin Sloan to Pittsburgh, we knew we wanted him for an extended artist’s residency, but we didn’t have an end goal in sight. His first book, Mr. Penumbra’s 24-Hour Bookstore, has been called a love letter to digital humanities (by me, among others), so you can see why a digital humanities center like ours would be interested in bringing him to town.
Inspiration came from his most recent experiments on human/computer collaborative writing. Sloan is developing a sort of cyborg text editor, an algorithmic cure for writer’s block, a machine that reads what you’ve written so far and offers a few words that might come next. It does so by reaching into its model of language, a recurrent neural network trained on whatever collection of text seems appropriate, and trying to find sensible endings to the sentence you began.
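For a rough feel of the "offer a few words that might come next" idea, here is a toy sketch. It is not Sloan's editor: it swaps the recurrent neural network for a much simpler bigram count (which word most often followed the last word you typed), and the training text and function names are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count which word follows which in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following

def suggest(following, draft, k=3):
    """Offer up to k words that most often followed the draft's last word."""
    words = draft.lower().split()
    if not words:
        return []
    return [word for word, _ in following[words[-1]].most_common(k)]

# Invented stand-in for a corpus of artists' biographies.
corpus = (
    "the painter moved to the sea and the painter painted the sea "
    "the sea aged the painter and the painter aged the sea"
)
model = train_bigrams(corpus)
print(suggest(model, "She walked toward the"))
```

A neural model looks at far more than the last word, which is why its suggestions can feel like sensible endings rather than frequent neighbors; the sketch only shows the shape of the back-and-forth.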
Together with the Frank-Ratchye STUDIO for Creative Inquiry and the Department of English, Carnegie Mellon University’s dSHARP Center for Innovative Digital Initiatives decided to invite Sloan to lead a three-day experiment of generative fiction. We would assemble a multi-talented team of artists and scholars from around Pittsburgh and elsewhere, connect them with Robin Sloan’s generative text editor, and attempt to assemble a readable short story in the space of 12 hours. The results exceeded our high expectations.
Before the workshop, we established a few ground rules. Participants would be capped at a dozen (we failed at keeping to that rule, to our benefit), would need to commit to being available every day (also failed, and also worked out fine), and would need to come with diverse skills and backgrounds (thankfully, finally, a success). More than four hours of writing a day would be a slog, and most people had daytime commitments, so we settled on 4-8pm, Monday through Wednesday, with copious food provided.
Inspired by David Markson’s The Last Novel, Robin decided to assemble the short story in 1-3 sentence snippets, which would allow people to contribute as much or as little as they were able. The story would be about a yet-unnamed artistic movement, so Robin pre-trained his recurrent neural network on the biographies of artists.
When everyone arrived on the afternoon of October 29th, the house was surprisingly packed; well over the dozen people I’d hand-selected, with more trickling in as the night went on. I guess word had gotten out. Our temporary base, the STUDIO for Creative Inquiry, acts as a home to rotating students, artists, and other ne’er-do-wells; its residents filled out our rogues’ gallery.
We spent a good while introducing ourselves, which proved important. By the end of the workshop, though our numbers had thinned, we wound up leaning on each person’s interests and skills for the tasks required to finish the story.
Robin introduced the premise, that each of us would use the algorithmic collaboration tool to assemble snippets of text about some fictional artist or artistic theme, and dump our results into a collective google doc. We produced about 80 snippets in all, ranging from a handful of words to over 300, each appended with a brief process note from the author:
“I have no idea if Maabundas is a real word or if it was generated nonsense. Either way, it sounds cool.”
“Following up on my magical steer from a previous text chunk. This required a bit of guidance. I liked that it made me think of how I wanted certain bits to sound by generating text that I could respond to.”
Over the course of the night, we brainstormed other documents on which to train the neural network, and we settled on a bunch of biographies from the Harlem Renaissance, a corpus of stand-up comedy scripts, and the collected biographies of art collectors.
At the end of the evening, each of us picked a favorite line or phrase to share with the group, including:
“The institutionalized monks of Yann Hirsch”
“The Center for Midnight”
“The golden age of lithography”
“He wandered down to the beach, watching as the anti-capitalist, plant-themed novelist and short story writer wrestled with filmmaker Benjamin John O’Toole in a drunken bout of delirium.”
Stuffed with Mediterranean food and halloween candy, we went our separate ways, while Robin continued to work.
A 6×6 wall of seemingly blank post-its awaited us in the STUDIO. Each had on its sticky side a unique short instruction from Robin, defining a period, a subject, and a method: inception / artistic work / generate text; conflict / artist / mine text; development / relationship / generate text.
We also arrived to a one-page description, assembled by Robin from yesterday’s favorite lines:
THE CENTER FOR MIDNIGHT (1967-1978)
Methods: (primarily but not exclusively) lithography and embroidery
Obsessed with: the sea, aging, and time
Inception → Development → Conflict → Dissolution
Dramatis personae
Strongly consider mentioning one or more of these:
-The filmmaker Benjamin John O'Toole
-Territoria Migraine ← yep that's a name
-The institutionalized monks of Yann Hirsch
Today’s assignment was to uncover the Center for Midnight’s story, which began in 1967 (the average year from yesterday’s google doc) and ended in 1978 (the median year). We each took a sticky note in turn, read our instructions, and got to work writing about the artists, artworks, and relationships that circled the Center at every stage of its short life. When finished, we deposited the text in a new google doc, exchanged stickies, and started all over again.
The Midnight Society, as I started thinking of our team, wrote 4,300 words that day. We riffed off each other, taking narrative threads we saw being dropped in the google doc and weaving them through our own snippets of semi-generated prose.
While we wrote, we listened to a ghostly soundtrack of music generated by Robin, assembled from a neural network trained on an artist he wished had produced more music.
Today’s algorithmic collaboration felt a bit different, now that we’d expanded the corpus on which the model was trained to include art collectors, artists from the Harlem Renaissance, and stand-up comedians. It was a bizarre time.
The night wrapped up, again, with the eating of food and picking of favorites:
“She embroidered the ideas of Laura de Gioste on a seaside tree.”
“Many works found considerable readers in the airport, specifically the painting called Neue Big Chrome.”
“Minerva Black’s irreverent embroidery depicted classical Greek figures alongside high-tech imagery: Athena and her computer.”
“When he died, she is reported to have said, ‘He became a response to himself.’”
The evening of Halloween, and only the most dedicated remained. About ten of us arrived to a soundtrack Robin had generated just that morning. The music was not unlike the calls of a dying caribou, and about as distressing, which if nothing else fit the holiday.
A new google doc awaited us, assembled from the words we’d contributed yesterday, though significantly reduced. Robin put order to our words, replaced a few proper nouns to solidify the narrative thread, and gave us some time to read what we’d written (or be impressed by this master author’s ability to give meaning to madness).
To polish the draft off, we marked the passages that confused or displeased us, and then each spent a while fixing the problem sections: making the narrative flow, removing tangents, and tightening the prose. On the final readthrough, we vetoed changes that needed vetoing, revived a few beloved but cut lines, and generally marveled at the readability of the final piece.
Somehow, amidst the chaos of machine prose and a barely coordinated, rotating group of amateurs, we assembled a story with a narrative arc, delicious prose, and a coherent (if strange) plot.
We can answer the question that drove our experimental workshop: can a dozen artists, technologists, and scholars collaborate with each other and with machines to produce a readable, interesting story in under 12 hours?
Yes, if guided by a professional cyborg author like Robin Sloan.
While I can’t speak for the others, I found this to be the most refreshing writing crucible I’ve yet experienced.
I rarely get the opportunity to write fiction, but when I do, it’s a one-way street. I can send words to an empty screen, but the screen never sends words back. Over these three nights, a combination of algorithms and compatriots sat behind my blank page, and we lobbed words back and forth as though the blinking cursor were a tennis net.
Robin Sloan’s algorithmic writing companion works an awful lot like Gmail’s new predictive sentence completion, just turned upside-down. It expands rather than constrains a text’s possible futures. Whether this bodes a new era of writing, I cannot say. The experience rhymes with Oulipo, but reads more accessibly. If ease and mass distribution are the tailwind of 21st century change, perhaps the next decade will see the rise of a new sort of writing.
tl;dr Academics’ individual policing of disciplinary boundaries at the expense of intellectual merit does a disservice to our global research community, which is already structured to reinforce disciplinarity at every stage. We should work harder to encourage research misfits to offset this structural pull.
The academic game is stacked to reinforce old community practices. PhDs aren’t only about specialization, but about teaching you to think, act, write, and cite like the discipline you’ll soon join. Tenure is about proving to your peers you are like them. Publishing and winning grants are as much about goodness of fit as about quality of work.
This isn’t bad. One of science’s most important features is that it’s often cumulative or at least agglomerative, that scientists don’t start from scratch with every new project, but build on each other’s work to construct an edifice that often resembles progress. The scientific pipeline uses PhDs, tenure, journals, and grants as built-in funnels, ensuring everyone is squeezed snugly inside the pipes at every stage of their career. It’s a clever institutional trick to keep science cumulative.
But the funnels work too well. Or at least, there’s no equally entrenched clever institutional mechanism for building new pipes, for allowing the development of new academic communities that break the mold. Publishing in established journals that enforce their community boundaries is necessary for your career; most of the world’s scholarly grant programs are earmarked for and evaluated by specific academic communities. It’s easy to be disciplinary, and hard to be a misfit.
To be sure, this is a known problem. Patches abound. Universities set aside funds for “interdisciplinary research” or “underfunded areas”; postdoc positions, centers, and antidisciplinary journals exist to encourage exactly the sort of weird research I’m claiming has little place in today’s university. These solutions are insufficient.
University or even external grant programs fostering “interdisciplinarity” for its own sake become mostly useless because of the laws of Goodhart & Campbell. They’re usually designed to bring disciplines together rather than to sidestep disciplinarity altogether, which, while admirable, is a system that’s pretty easy to game, and often leads to awkward alliances of convenience.
Universities do a bit better in encouraging certain types of centers that, rather than being “interdisciplinary”, are focused on a specific goal, method, or topic that doesn’t align easily with the local department structure. A new pipe, to extend my earlier bad metaphor. The problems arise here because centers often lack the institutional benefits available to departments: they rely on soft money, don’t get kickback from grant overheads, don’t get money from cross-listed courses, and don’t get tenure lines. Antidisciplinary postdoc positions suffer a similar fate, allowing misfits to thrive for a year or so before having to go back on the job market to rinse & repeat.
In short, the overwhelming inertial force of academic institutions pulls towards disciplinarity despite frequent but half-assed or poorly-supported attempts to remedy the situation. Even when new disciplinary configurations break free of institutional inertia, presenting themselves as means to knowledge every bit as legitimate as traditional departments (chemistry, history, sociology, etc.), it can take decades for them to even be given the chance to fail.
It is perhaps unsurprising that the community which taught us about autopoiesis proved incapable of sustaining itself, though half a century on its influences are glaringly apparent and far-reaching across today’s research universities. I wonder if we reconfigured the organization of colleges and departments from scratch today, whether there would be more departments of environmental studies and fewer departments of [redacted] 1.
I bring this all up to raise awareness of the difficulty facing good work with no discernible home, and to advocate for some individual action which, though it won’t change the system overnight, will hopefully make the world a bit easier for those who deserve it.
It is this: relax the reflexive disciplinary boundary drawing, and foster programs or communities which celebrate misfits. I wrote a bit about this last year in the context of history and culturomics; historians clamored to show that culturomics was bad history, but culturomics never attempted to be good history; it attempted to be good culturomics. Though I’d argue it often failed at that as well, it should have been evaluated by its own criteria, not the criteria of some related but different discipline.
Some potential ways to move forward:
If you are reviewing for a journal or grant and the piece is great, but doesn’t quite fit, and you can’t think of a better home for it, push against the editor to let it in anyway.
If you’re a journal editor or grant program officer, be more flexible with submissions which don’t fit your mold but don’t have easy homes elsewhere.
If you control funds for research grants, earmark half your money for good work that lacks a home. Not “good work that lacks a home but still looks like the humanities”, or “good work that looks like economics but happens to involve a computer scientist and a biologist”, but truly homeless work. I realize this won’t happen, but if I’m advocating, I might as well advocate big!
If you are training graduate students, hiring faculty, or evaluating tenure cases, relax the boundary-drawing urge to say “her work is fascinating, but it’s not exactly our department.”
If you have administrative and financial power at a university, commit to supporting nondisciplinary centers and agendas with the creation of tenure lines, the allocation of course & indirect funds, and some of the security offered to departments.
Ultimately, we need clever systems to foster nondisciplinary thinking which are as robust as those systems that foster cumulative research. This problem is above my paygrade. In the meantime, though, we can at least avoid the urge to equate disciplinary fitness with intellectual quality.
You didn’t seriously expect me to name names, did you?
Zoe LeBlanc asked how basic statistics lead to a meaningful historical argument. A good discussion followed, worth reading, but since I couldn’t fit my response into tweets, I hoped to add a bit to the thread here on the irregular. I’m addressing only one tiny corner of her question, in a way that is peculiar to my own still-forming approach to computational history; I hope it will be of some use to those starting out.
In brief, I argue that one good approach to computational history cycles between data summaries and focused hypothesis exploration, driven by historiographic knowledge, in service to finding and supporting historically interesting agendas. There’s a lot of good computational history that doesn’t do this, and a lot of bad computational history that does, but this may be a helpful rubric to follow.
In the spirit of Monty Python, the below video has absolutely nothing to do with the discussion at hand.
Zoe’s question gets at the heart of one of the two most prominent failures of computational history in 2017 1: the inability to go beyond descriptive statistics into historical argument. 2 I’ve written before on one of the many reasons for this inability, but that’s not the subject of this post. This post covers some good practices in getting from statistics to arguments.
Describing the Past
Historians, for the most part, aren’t experimentalists. 3 Our goals vary, but they often include telling stories about the past that haven’t been told, by employing newly-discovered evidence, connecting events that seemed unrelated, or revisiting an old narrative with a fresh perspective.
Facts alone usually don’t cut it. We don’t care what Jane ate for breakfast without a so what. Maybe her breakfast choices say something interesting about her socioeconomic status, or about food culture, or about how her eating habits informed the way she lived. Alongside a fact, we want why or how it came to be, what it means, or its role in some larger or future trend. A sufficiently big and surprising fact may be worthy of note on its own (“Jane ate orphans for breakfast” or “The government did indeed collude with a foreign power”), but such surprising revelations are rare, not the only purpose for historians, and still beg for context.
Computational history has gotten away with a lot of context-free presentations of fact. 4 That’s great! It’s a sign there’s a lot we didn’t know that contemporary methods & data make easily visible. 5 Here’s an example of one of mine, showing that, despite evidence to the contrary, there is a thriving community at the intersection of history and philosophy of science:
But, though we’re not running out of low-hanging fruit, the novelty of mere description is wearing thin. Knowing that a community exists between history & philosophy of science is not particularly interesting; knowing why it exists, what it changes, or whether it is less tenuous than any other disciplinary borderland are more interesting and historiographically recognizable questions.
Context is Key
So how to get from description to historical argument? Though there’s no right path, and the route depends on the type of claim, this post may offer some guidance. Before we get too far, though, a note:
Description has little meaning without context and comparison. The data may show that more people are eating apples for breakfast, but there’s a lot to unpack there before it can be meaningful, let alone relevant.
It may be, for example, that the general population is growing just as quickly as the number of people who eat apples. If that’s the case, does it matter that apple-eaters themselves don’t seem to be making up any larger percent of the population?
The answer for a historian is: of course it matters. If we were talking about casualties of war, or number of cities in a country, rather than apples, a twofold increase in absolute value (rather than percentage of population) makes a huge difference. It’s more lives affected; it’s more infrastructure and resources for a growing nation.
But the nature of that difference changes when we know our subject of study matches population dynamics. If we’re looking at voting patterns across cities, and we notice population density correlates with party affiliation, we can use that as a launching point for so what. Perhaps sparser cities rely on fewer social services to run smoothly, leading the population to vote more conservative; perhaps past events pushed conservative families towards the outskirts; perhaps.
Without having a ground against which to contextualize our results, a base map like general population, the fact of which cities voted in which direction gives us little historical meat to chew on.
On the other hand, some surprising facts, when contextualized, leave us less surprised. A two-fold increase in apple eating across a decade is pretty surprising, until you realize it happened alongside a similar increase in population. The fact is suddenly less worthy of report by itself, though it may have implications for, say, the growth of the apple industry.
But Zoe asked about statistics, not counting, in finding meaning. I don’t want to divert this post into teaching stats, and nor do I want to assume statistical knowledge, so I’ll opt for an incredibly simple metric: ratio.
The illustration above shows an increase in both population and apple-eating, and eyeball estimates show them growing apace. If we divide the total population by the number of people eating apples, however, our story is complicated.
Though both population and apple-eating increase, in 1806 the population begins rising much more rapidly than the number of apple-eaters. 6 It is in this statistically-derived difference that the historian may find something worth exploring and explaining further.
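To make the ratio concrete, here is a minimal sketch of the arithmetic. The numbers are invented for illustration and are not the data behind the charts; the function name is mine.

```python
# Invented numbers for illustration only; not the data behind the post's charts.
years = [1802, 1804, 1806, 1808, 1810]
population = [1000, 1100, 1200, 1500, 1900]
apple_eaters = [100, 110, 120, 125, 130]

def people_per_apple_eater(population, apple_eaters):
    """Divide total population by the number of apple-eaters, year by year."""
    return [people / eaters for people, eaters in zip(population, apple_eaters)]

ratios = people_per_apple_eater(population, apple_eaters)
for year, ratio in zip(years, ratios):
    print(year, round(ratio, 1))
```

Both raw columns rise throughout, but in this made-up series the ratio sits flat at 10 people per apple-eater and only starts climbing after 1806, which is the kind of break worth chasing.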
There are many ways to compare and contextualize data, of which this is one. They aren’t worth enumerating, but the importance of contextualization is relevant to what comes next.
Question- and Data-Driven History
Computational historians like to talk about question-driven analysis. Computational history is best, we say, when it is led by a specific question or angle. The alternative is dumping a bunch of data into a statistics engine, describing it, and finding something weird, and saying “oh, this looks interesting.”
When push comes to shove, most would agree the above dichotomy is false. Historical questions don’t pop out of thin air, but from a continuously shifting relationship with the past. We read primary and secondary sources, do some data entry, do some analysis, do some more reading, and through it all build up a knowledge-base and a set of expectations about the past. We also by this point have a set of claims we don’t quite agree with, or some secondary sources with stories that feel wrong or incomplete.
This is where the computational history practice begins: with a firm grasp of the history and historiography of a period, and a set of assumptions, questions, and mild disagreements.
From here, if you’re reading this blog post, you’re likely in one of two camps:
You have a big dataset and don’t know what to do with it, or
You have a historiographic agenda (a point to prove, a question to answer, etc.) that you don’t know how to make computationally tractable.
We’ll begin with #1.
1. I have data. Now what?
Congratulations, you have data!
This is probably the thornier of the two positions, and the one more prone to results of mere description. You want to know how to turn your data into interesting history, but you may end up doing little more than enumerating the blades of grass on a field. To avoid that, you must begin down a process sometimes called scalable reading, or a special case of the hermeneutic circle.
You start, of course, with mere description. How many records are there? What are the values in each? Are there changes over time or place? Who is most central? Before you start quantifying the data, write down the answers you expect to these questions, with a bit of a causal explanation for each.
Now, barrage your dataset with visualizations and statistical tests to find out exactly what makes it up. See how the results align with the hypotheses you noted down. If you created the data yourself, one archival visit at a time, you won’t find a lot that surprises you. That’s alright. Be sure to take time to consider what’s missing from the dataset, due to archival lacunae, bias, etc.
If any results surprise you, dig into the data to try to understand why. If none do, think about claims from secondary sources–do any contradict the data? Align with it?
This is also a good point to bring in contextualization. If you’re looking at the number of people doing something over time, try to compare your dataset to population dynamics. If you’re looking at word usage, find a way to compare your data to base frequencies of that word in similar collections. If you’re looking at social networks, compare them to random networks to see if their average path length or degree distribution is surprising compared to networks of similar size. Every unexpected result is an opportunity for exploration.
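As one small example of this kind of contextualization, here is a sketch of the word-usage case: how often a word appears in your collection relative to a baseline collection. The two "collections" are invented one-liners, the function names are mine, and the tokenization is deliberately naive (lowercase, split on whitespace, no punctuation handling).

```python
def per_thousand(word, text):
    """Occurrences of `word` per 1,000 words of `text` (naive whitespace tokens)."""
    words = text.lower().split()
    if not words:
        return 0.0
    return 1000 * words.count(word.lower()) / len(words)

def relative_to_baseline(word, my_text, baseline_text):
    """How many times more (or less) frequent the word is than in the baseline."""
    base = per_thousand(word, baseline_text)
    if base == 0:
        return None  # the baseline never uses the word, so no ratio exists
    return per_thousand(word, my_text) / base

# Invented stand-ins for "your letters" and "similar collections".
my_letters = "the plague came and the plague stayed"
baseline = "the harvest came and the plague passed and the rain came again"

print(relative_to_baseline("plague", my_letters, baseline))
```

A ratio well above 1 says the word is overrepresented in your collection compared to the baseline, which is a prompt for a question rather than an answer to one.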
Internal comparisons may also yield interesting points to pursue further, especially if you think your data are biased. Given a limited dataset of actors, their genders, their roles, and play titles, for example, you may not be able to make broad claims about which plays are more popular, but you could see how different roles are distributed across genders within the group.
Internal comparisons could also be temporal. Given a dataset of occupations over time within a particular city, if you compare those numbers to population changes over time, you could find the moments where population and occupation dynamics part ways, and focus on those instances. Why, suddenly, are there more grocers?
The above boils down into two possible points of further research: deviations from expectation, or deviations from internal consistency.
Deviations from expectation–your own or that of some notable secondary source–can be particularly question-provoking. “Why didn’t this meet expectations” quickly becomes “what is wrong or incomplete about this common historical narrative?” From here, it’s useful to dig down into the points of data that exemplify such deviations, and see if you can figure out why and how they break from expectations.
Deviations from internal consistency–that is, when comparisons within the data wind up showing different trends–lead to positive rather than negative questions. Instead of “why is this theory wrong?”, you may ask, “why are these groups different?” or “why does this trend cease to keep pace with population during these decades?” Here you are asking specific questions that require new or shifted theories, whereas with deviations from expectations, you begin by seeing where existing narratives fail.
It’s worth reiterating that, in both scenarios, questions are drawn from deviations from some underlying theory.
In deviations from expectation, the underlying theory is what you bring to your data; you assume the data ought to look one way, but it doesn’t. You are coming with an internal, if not explicit, quantitative model of how the data ought to look.
In deviations from internal consistency, that data’s descriptive statistics provide the underlying theory against which there may be deviations. Apple-eaters deviating in number from population growth is only interesting if, at most points, apple-eaters grow evenly alongside population. That is, you assume general statistics should be the same between groups or over time, and if they are not, it is worthy of explanation.
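One way to go looking for deviations from internal consistency is to compare growth rates period by period and flag the stretches where a group stops keeping pace with its baseline. The sketch below does that with invented numbers; the 5% tolerance and the function names are arbitrary choices of mine, not a standard test.

```python
def growth_rates(values):
    """Fractional change between each consecutive pair of observations."""
    return [(later - earlier) / earlier for earlier, later in zip(values, values[1:])]

def diverging_periods(years, group, baseline, tolerance=0.05):
    """Periods where the group's growth differs from the baseline's by more than tolerance."""
    flagged = []
    periods = zip(years, years[1:], growth_rates(group), growth_rates(baseline))
    for start, end, group_rate, baseline_rate in periods:
        if abs(group_rate - baseline_rate) > tolerance:
            flagged.append((start, end))
    return flagged

# Invented numbers for illustration only.
years = [1802, 1804, 1806, 1808, 1810]
population = [1000, 1100, 1200, 1500, 1900]
apple_eaters = [100, 110, 120, 125, 130]

print(diverging_periods(years, apple_eaters, population))
```

The flagged periods are only interesting because the unflagged ones track each other so closely; if the two series never moved together, there would be no expectation to deviate from.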
This is an oversimplification, but a useful one. Undoubtedly, combinations of the two will arise: maybe you expect the differences between men and women in roles they play will be large, but it turns out they are small. This provides a deviation of both kinds, but no less legitimate for it. In this case, your recourse may be looking for other theatrical datasets to see if the gender dynamics play out the same across them, or if your data are somehow special and worthy of explanation outside the context of larger gender dynamics.
Which brings us, inexorably, to the cyclic process of computational history. Scalable reading. The hermeneutic circle. Whatever.
Point is, you’re at the point where some deviation or alignment seems worth explanation or exploration. You could stop here. You could present this trend, give a convincing causal just-so story of why it exists, and leave it at that. You will probably get published, since you’ve already gone farther than mere description, the trap of so much computational history.
But you shouldn’t stop here. You should take this opportunity to strengthen your story. Perhaps this is the point where you put your “traditional” historian’s cap back on, and go dust-diving for archival evidence to support your claims. I wouldn’t think less of you for it, but if you stop there, you’d only be reaping half the advantages of computational history.
The example above, looking for other theatrical datasets to contextualize the gender results in your own, hinted at the second half of the computational history research cycle: creating computationally tractable questions. Recall this section described the first half: making sense of data. Although I presented the two as separate, they productively feed on one another.
Once you’ve gone through your data to find how it aligns with your or others’ preconceived notions of the past, or how by its own internal deviations it presents interesting dilemmas, you have found yourself in the second half of the cycle. You have questions or theories you want to ask of data, but you do not yet have the data or the statistics to explore them.
This seems counter-intuitive. Why not just use the data or statistics already gathered, sometimes painstakingly over several years? Because if you use the same data & stats to both generate and answer questions, your evidence is circular. Specifically, you risk making a scientistic claim of what could easily be a spurious trend. It may simply be that, by random chance, the breakfast record-keeper lost a bunch of records from 1806-1810, thus causing the decline seen in the population ratio.
To convincingly make arguments from a historical data description, you must back it up using triangulation–approaching the problem from many angles. That triangulation may be computational, archival, archaeological, or however else you’re used to historying, but we’ll focus here on computational.
2. Computationally Tractable Questions
So you’ve got a historiographic agenda, and now you want to make it computationally tractable. Good luck! This is the hard part.
“Sparse areas relied less on social services.” “The infrastructure of science became less dependent on specific individuals over the course of the 17th century.” “T-Rex was a remarkable climber.” “Who benefited most from the power vacuum left by the assassination?” These hypotheses and questions do not, on their own, lend themselves to quantitative analysis.
Chief among the common difficulties of turning a historiographic agenda into a computationally tractable hypothesis is a lack of familiarity with computational methods. If you don’t know what a computer is good at, you can’t form an experiment to use one.
I said that history isn’t experimental, but I lied. Archival research can be an experiment if you go in with a hypothesis and a pre-conceived approach or set of criteria that would confirm it. Computational history, at this stage, is also experimental. It often works a little like this (but it may not): 7
Set your agenda. Start with a hypothesis, historiographic framework, or question. For example, “The infrastructure of science became less dependent on specific individuals over the course of the 17th century.” (that question’s mine, don’t steal it.)
Find testable hypotheses. Break it into many smaller statements that can be confirmed, denied, or quantitatively assessed. “If science depends less on specific individuals over the 17th century, the distribution of names mentioned in scholarly correspondence will flatten out. That is, in 1600 a few people will be mentioned frequently, whereas most will be mentioned infrequently; in 1700, the frequency of name mentions will be more evenly distributed across correspondence.” Or “If science depends less on specific individuals over the 17th century, when an important person died, it affected the scholarly network less in 1700 than in 1600.” (Notice in these two examples how finding evidence for the littler statements will corroborate the bigger hypothesis, and vice-versa.)
Match hypotheses to approaches. Come up with methodological proxies, datasets, and/or statistical tests that could corroborate the littler statements. Be careful, thorough, and specific. For example, “In a network of 17th-century letter writers, if the removal of a central figure in 1700 changes the average path length of the network less than the removal of a central figure in 1600, central figures likely played less important structural roles by the end of the century. This will be most convincing if the effect of node removal smoothly decreases across the century.” (This is the step in which you need to come to the table with knowledge of different computational methods and what they do.)
Specify proxies. List specific analytic approaches needed for the promising tests, and the data required to do them. For example, you need a list of senders and recipients of scholarly letters, roughly evenly distributed across time between 1600 and 1700, and densely-packed enough to perform network analysis. There could be a few different analytic approaches, including removing highly-central nodes and re-calculating average path length; employing measurements of attack tolerance; etc. Probably worth testing them all and seeing if each result conforms to the pre-existing theory.
Find data. Find pre-existing datasets that will fit your proxies, or estimate how long it will take to gather enough data yourself to reasonably approach your hypotheses. Opt for data that will work for as many approaches as possible. You may find some data that will suggest new hypotheses, and you’ll iterate back and forth between steps #3-#5 a few times.
Match experimental results to hypotheses. Here’s the fun part: you get to see how many of your predictions matched your results. Hopefully a bunch, but even if they didn’t, it’s an excuse to figure out why, and start the process anew. You can also start exploring the additional datasets to help you develop new questions. The astute may have noticed that this step brings us back to the first half of computational historiography: exploring data and seeing what you can find. 8
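To make the "flattening distribution" statement from the second step a little more concrete, here is a minimal sketch. It assumes the Gini coefficient as the proxy for how unevenly names are mentioned (one possible choice among many, not a prescribed method), and the mention lists are invented toy data, not real correspondence.

```python
from collections import Counter


def gini(counts):
    """Gini coefficient of non-negative counts: 0 means perfectly even,
    values closer to 1 mean a few items hold most of the total."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the sorted values (ranks start at 1).
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def mention_inequality(mentions):
    """Inequality of how often each name appears in a list of mentions."""
    return gini(Counter(mentions).values())


# Hypothetical name mentions pulled from letters in two periods (invented).
mentions_1600 = ["scholar_a"] * 8 + ["scholar_b", "scholar_c"]
mentions_1700 = ["scholar_a", "scholar_b", "scholar_c"] * 3 + ["scholar_a"]

inequality_1600 = mention_inequality(mentions_1600)
inequality_1700 = mention_inequality(mentions_1700)
print(inequality_1600, inequality_1700)
```

If the hypothesis holds, the later figure should be lower than the earlier one. A single pair of numbers proves little on its own; the point is that the statement now has a measurable form that can be checked decade by decade.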
From here, it may be worthwhile to cycle back to the data exploration stage, then back here to computationally tractable hypothesis exploration, and so on ad infinitum.
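The node-removal test from the steps above can be sketched in a few lines. This is an illustration under loud assumptions: the letter networks are invented, "central" is taken to mean highest degree (betweenness or another centrality measure might suit the question better), and average path length is computed only over pairs that remain connected, so a network that fragments after removal needs separate attention.

```python
from collections import deque


def build_graph(letters):
    """Undirected adjacency sets from (sender, recipient) pairs."""
    graph = {}
    for sender, recipient in letters:
        graph.setdefault(sender, set()).add(recipient)
        graph.setdefault(recipient, set()).add(sender)
    return graph


def average_path_length(graph):
    """Mean shortest-path length over all connected pairs (None if none)."""
    total, pairs = 0, 0
    for source in graph:
        # Breadth-first search from each node gives unweighted distances.
        distances = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in distances:
                    distances[neighbour] = distances[node] + 1
                    queue.append(neighbour)
        total += sum(distances.values())
        pairs += len(distances) - 1
    return total / pairs if pairs else None


def remove_node(graph, target):
    """Copy of the graph without the target node or its edges."""
    return {node: neighbours - {target}
            for node, neighbours in graph.items() if node != target}


def most_central(graph):
    """Highest-degree node; ties are broken alphabetically."""
    return max(sorted(graph), key=lambda node: len(graph[node]))


def removal_effect(letters):
    """Return (removed node, path length before, path length after)."""
    graph = build_graph(letters)
    hub = most_central(graph)
    before = average_path_length(graph)
    after = average_path_length(remove_node(graph, hub))
    return hub, before, after


# Hypothetical toy networks: a ring of six correspondents, plus either
# one dominant hub ("1600") or evenly spread extra ties ("1700").
ring = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f"), ("f", "a")]
letters_1600 = ring + [("hub", name) for name in "abcdef"]
letters_1700 = ring + [("a", "d"), ("b", "e"), ("c", "f")]

for label, letters in (("1600", letters_1600), ("1700", letters_1700)):
    hub, before, after = removal_effect(letters)
    print(label, hub, round(before, 3), round(after, 3))
```

In the hub-dominated toy network, removing the most connected figure stretches the paths between everyone else; in the evenly tied one, it barely matters. Real correspondence data will be far messier, which is why it is worth trying several measures and seeing whether they agree.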
By now, making meaning out of data probably feels impossible. I’m sorry. The process is much more fluid and intertwined than is easily unpacked in a blog post. The back-and-forth can take hours, days, months, or years.
But the important thing is, after you’ve gone back-and-forth a few times, you should have a combination of quantitative, archival, theoretical, and secondary support for a solidly historical argument.
Contexts of Discovery and Justification
Early 20th-century philosophy of science cared a lot about the distinction between the contexts of discovery and justification. Violently shortened, the context of discovery is how you reached your conclusion, and the context of justification is how you argue your point, regardless of the process that got you there.
I bring this up as a reminder that the two can be distinct. By the 1990s, quantitative historians who wanted to remain legible to their non-quantitative colleagues often saved the data analysis for an appendix, and even there the focus was on the actual experiments, not the long process of coming up with tests, re-testing, collecting more data, and so on.
The result of this cyclical computational historiography need not be (and rarely is, and perhaps can never be) a description of the process that led you to the evidence supporting your argument. While it’s a good idea to be clear about where your methods led you astray, the most legible result to historians will necessarily involve a narrative reconfiguration.
Causality and Truth
Small final notes on two big topics.
First, Causality. This approach won’t get you there. It’s hard to disentangle causality from correlation, but more importantly in this context, it’s hard to choose between competing causal explanations. The above process can lead you to plausible and corroborated hypotheses, but it cannot prove anything.
Consider this: “My hypothesis about apples predicts these 10 testable claims.” You test each claim, and each test agrees with your predictions. It’s a success, but a soft one; you’ve shown your hypothesis to be plausible given the evidence, but not inevitable. A dozen other equally sensible hypotheses could have produced the same 10 testable claims. You did not prove those hypotheses wrong, you just chose one model that happened to work. 9
Even if no alternate hypothesis presents itself, and all of your tests agree with your hypothesis, you still do not have causal proof. It may be that the proxies you chose to test your claims are bad ones, or incomplete, or your method has unseen holes. Causality is tricky, and in the humanities, proof especially so.
Which leads us to the next point: Truth. Even if somehow you devise the perfect process to find proof of a causal hypothesis, the causal description does not constitute capital-T Truth. There are many truths, coming from many perspectives, about the past, and they don’t need to agree with each other. Historians care not just about what happened, but how and why, and those hows and whys are driven by people. Messy, inconsistent people who believe many conflicting things within the span of a moment. When it comes to questions of society, even the most scientistic of scholars must come to terms with uncertainty and conflict, which after all are more causally central to the story of history than most clever narratives we might tell.
The other most prominent failure in computational history is our tendency to group things into finite discrete categories; in this case, a two-part list of failures. ↩
With some notable exceptions. Some historians simulate the past, others perform experiments on rates of material decay, or on the chemical composition of inks. It’s a big world out there. ↩
When I say fact, assume I add all the relevant post-modernist caveats of the contingency of objectivity etc. etc. Really I mean “matters of history that the volume of available evidence makes difficult to dispute.” ↩
Ted Underwood and I have both talked about the exciting promise of incredibly low-hanging fruit in new approaches. ↩
OK in retrospect I should have used a more historically relevant example – I wasn’t expecting to push this example so far. ↩
If this seems overly scientistic, worry not! Experimental science is often defined by its recourse to rote procedure, which means pretty much any procedural explanation of research will resemble experimental science. There are many ways one can go about scalable reading / triangulation of computational historiography, not just the procedural steps #1-#7 above, but this is one of the easier approaches to explain. Soft falsification and hypothesis testing are plausible angles into computational history, but not necessary ones. ↩
A brief addendum to steps #6-#7: although I’d argue Null-Hypothesis Significance Testing or population-based statistical inferences may not be relevant to historiography, especially when it’s based in triangulation, they may be useful in certain cases. Without delving too deeply into the weeds, they can help you figure out the extent to which the effect you see may just be noise, not indicative of any particular trend. Statistical effect sizes also may be of use, helping you see whether the magnitude of your finding is big enough to have any appreciable role in the historical narrative. ↩
Shawn Graham and I wrote about this in relation to archaeology and simulation here, on the subject of underdetermination and abduction ↩
I’m collecting programming & methodological textbooks for humanists as part of a reflective study on DH, but figured it’d also be useful for those interested in teaching themselves to code, or teachers who need a textbook for their class. Though I haven’t read them all yet, I’ve organized them into very imperfect categories and provided (hopefully) some useful comments. Short coding exercises, books that assume some pre-existing knowledge of coding, and theoretical introductions are not listed here.
An open access introduction to programming in Python. Mostly web scraping and basic text analysis. Probably best to look to newer resources, due to the date. Although it’s aimed at historians, the methods are broadly useful to all text-based DH.
The Programming Historian, 2nd edition (ongoing). Afanador-Llach, Maria José, Antonio Rojas Castro, Adam Crymble, Víctor Gayol, Fred Gibbs, Caleb McDaniel, Ian Milligan, Amanda Visconti, and Jeri Wieringa, eds.
Constantly updating lessons, ostensibly aimed at historians, but useful to all of DH. Includes introductions to web development, text analysis, GIS, network analysis, etc. in multiple programming languages. Not a monograph, and no real order.
A series of lessons in R, still under development with quite a few chapters missing. Probably the only programming book aimed at historians that actually focuses on historical questions and approaches.
About natural language processing, but not an introduction to coding. Instead, an introduction to the methodological approaches of natural language processing specific to historical texts (OCR, spelling normalization, choosing a corpus, part of speech tagging, etc.). Teaches a variety of tools and techniques.
Step-by-step introduction to learning R, specifically focused on literary text analysis, both for close and distant reading, with primers on the statistical approaches being used. Includes approaches to, e.g., word frequency distribution, lexical variety, classification, and topic modeling.
A growing, interactive textbook similar in scope to Jockers’ book (close & distant reading in literary analysis), but in Python rather than R. Heavily focused on the code itself, and includes such methods as topic modeling and sentiment analysis.
Many of the above books are focused on literary or historical analysis only in name, but are really useful for everyone in DH. The below are similar in scope, but don’t aim themselves at one particular group.
A Mathematica notebook (thus, not accessible unless you have an appropriate reader) teaching text, image, and geo-based analysis. Mathematica itself is an expensive piece of software without an institutional license, so this resource may be inaccessible to many learners. [NOTE: Arno Bosse wrote positive feedback on this textbook in a comment below.]
An introduction to the fundamentals of programming specifically for arts and humanities, taught in Python and Processing, that goes through statistics, text, sound, animation, images, and so forth. Much more expansive than many other options listed here, but not as focused on needs of text analysis (which is probably a good thing).
A brief textbook with exercises and explanatory notes specific to text analysis for the study of literature and history. Not an introduction to programming, but covers some of the mathematical and methodological concepts used in these sorts of studies.
Interactive (Jupyter) notebooks teaching Python for statistical text analysis. Quite thorough, teaching methodological reasoning and examples, including quizzes and other lesson helpers, going from basic tokenization up through unsupervised learning, object-oriented programming, etc.
Teaches the start-to-finish skills needed to write code to work with data, from command line to markdown to github to R and ggplot2. Not aimed at humanists, but aimed at those with no prior technical experience.
A series of videos and github code snippets/examples for Vierthaler’s DH class at Leiden University (see the syllabus). General introduction to coding principles, followed by specific needs of digital humanists, like text analysis and web scraping.
Not an introduction to coding of any sort, but a solid intro to statistics geared at the sort of stats needed by humanists (archaeologists, literary theorists, philosophers, historians, etc.). Reading this should give you a solid foundation of statistical methods (sampling, confidence intervals, bias, etc.)
A practical intro to machine learning in Weka, Java-based software for data mining and modeling. Not aimed at humanists, but legible to the dedicated amateur. It really gets into the weeds of how machine learning works.
Introduction to text mining aimed at data scientists in the statistical programming language R. Some knowledge of R is expected; the authors suggest using R for Data Science (2016) by Grolemund & Wickham to get up to speed. This is for those interested in current data science coding best-practices, though it does not get as in-depth as some other texts focused on literary text analysis. Good as a solid base to learn from.
Fantastic introduction to simple and advanced mathematics written by and for humanists. Approachable, prose-heavy, and grounded in humanities examples. Covers topics like algebra, calculus, statistics, differential equations. Definitely a foundations text, not an applications one.
A “hands-on introduction to the principles and practice of looking at and presenting data using R and ggplot” that introduces readers “to both the ideas and the methods of data visualization in a comprehensible and reproducible way”. Incredibly thorough, painstakingly annotated, and though not aimed directly at humanists, is close enough in scope to be more valuable than a general introduction to data science.
Introduction to the skills, tools, and setup required to create interactive web visualizations, briefly covering everything from HTML to D3.js. Not aimed at the humanities, but aimed at those with no prior experience with code.
Full-length introduction to Drupal, a web platform that allows you to build “environments for gathering, annotating, arranging, and presenting their research and supporting materials” on the web. Useful for those interested in getting started with the creation of web-based projects but who don’t want to dive head-first into from-scratch web development.
French introduction to LaTeX for humanists. LaTeX is the primary means scientists use to prepare documents (instead of MS Word or similar software), which allows for more sustainable, robust, and easily typeset scholarly publications. If humanists wish to publish in natural (or some social) science journals, this is an important skill.
This first post covers the basic landscape of submissions to next year’s conference: how many submissions there are, what they’re about, and so forth.
The analysis is opinionated and sprinkled with my own preliminary interpretations. If you disagree with something or want to see more, comment below, and I’ll try to address it in the inevitable follow-up. If you want the data, too bad—since it’s only available to reviewers, there’s an expectation of privacy. If you are sad for political or other reasons and live near me, I will bring you chocolate; if you are sad and do not live near me, you should move to Pittsburgh. We have chocolate.
Submission Numbers & Types
I’ll be honest, I was surprised by this year’s submission numbers. This will be the first ADHO conference held in North America since it was held in Nebraska in 2013, and I expected an influx of submissions from people who haven’t been able to travel off the continent for interim events. I expected the biggest submission pool yet.
What we see, instead, are fewer submissions than Kraków last year: 608 in all. The low number of submissions to Sydney was expected, given it was the first conference held outside Europe or North America, but this year’s numbers suggest the DH Hype Machine might be cooling somewhat, after five years of rapid growth.
We need some more years and some more DH-Hype-Machine Indicators to be sure, but I reckon things are slowing down.
The conference offers five submission tracks: Long Paper, Short Paper, Poster, Panel, and (new this year) Virtual Short Paper. The distribution is pretty consistent with previous years, with the only deviation being in Sydney in 2015. Apparently Australians don’t like short papers or posters?
I’ll be interested to see how the “Virtual Short Paper” works out. Since authors need to decide on this format before submitting, it doesn’t allow the flexibility of seeing if funding will become available over the course of the year. Still, it’s a step in the right direction, and I hope it succeeds.
More of the same! If nothing else, we get points for consistency.
Same as it ever was, nearly half of all submissions are by a single author. I don’t know if that’s because humanists need to justify their presentations to hiring and tenure committees who only respect single authorship, or if we’re just used to working alone. A full 80% of submissions have three or fewer authors, suggesting large teams are still not the norm, or that we’re not crediting all of the labor that goes into DH projects with co-authorships. [Post-publication note: See Adam Crymble’s comment, below, for important context]
Language, Topic, & Discipline
Authors choose from several possible submission languages. This year, 557 submissions were received in English, 40 in French, 7 in Spanish, 3 in Italian, and 1 in German. That’s the easy part.
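For anyone who prefers shares to raw counts, a quick sketch using only the language counts given above (the variable names are just for illustration):

```python
# Submission counts by language, as reported in the paragraph above.
submissions_by_language = {
    "English": 557,
    "French": 40,
    "Spanish": 7,
    "Italian": 3,
    "German": 1,
}

total = sum(submissions_by_language.values())
shares = {language: count / total
          for language, count in submissions_by_language.items()}

print(f"Total submissions: {total}")
for language, share in shares.items():
    # Format each share as a percentage with one decimal place.
    print(f"{language:<8} {share:6.1%}")
```

The counts add up to the 608 total mentioned earlier, which is a handy sanity check that no language got dropped.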
The Powers That Be decided to make my life harder by changing up the categories authors can choose from for 2017. Thanks, Diane, ADHO, or whoever decided this.
In previous years, authors chose any number of keywords from a controlled vocabulary of about 100 possible topics that applied to their submission. Among other purposes, it helped match authors with reviewers. The potential topic list was relatively static for many years, allowing me to analyze the change in interest in topics over time.
This year, they added, removed, and consolidated a bunch of topics, as well as divided the controlled vocabulary into “Topics” (like metadata, morphology, and machine translation) and “Disciplines” (like disability studies, archaeology, and law). This is ultimately good for the conference, but makes it difficult for me to compare this against earlier years, so I’m holding off on that until another post.
But I’m not bitter.
This year’s options are at the bottom of this post in the appendix. Words in red were added or modified this year, and the last list contains topics that used to exist, but no longer do.
So let’s take a look at this year’s breakdown by discipline.
Huh. “Computer science”—a topic which last year did not exist—represents nearly a third of submissions. I’m not sure how much this topic actually means anything. My guess is the majority of people using it are simply signifying the “digital” part of their “Digital Humanities” project, since the topic “Programming”—which existed in previous years but not this year—used to only connect to ~6% of submissions.
“Literary studies” represents 30% of all submissions, more than any previous year (usually around 20%), whereas “historical studies” has stayed stable with previous years, at around 20% of submissions. These two groups, however, can be pretty variable year-to-year, and I’m beginning to suspect that their use by authors is not consistent enough to take as meaningful. More on that in a later post.
That said, DH is clearly driven by lit, history, and library/information science. L/IS is a new and welcome category this year; I’ve always suspected that DHers are as much from L/IS as the humanities, and this lends evidence in that direction. Importantly, it also makes apparent a dearth in our disciplinary genealogies: when we trace the history of DH, we talk about the history of humanities computing, the history of the humanities, the history of computing, but rarely the history of L/IS.
I’ll have a more detailed breakdown later, but there were some surprises in my first impressions. “Film and Media Studies” is way up compared to previous years, as are other non-textual disciplines, which refreshingly shows (I hope) the rise of non-textual sources in DH. Finally. Gender studies and other identity- or intersectional-oriented submissions also seem to be on the rise (this may be an indication of US academic interests; we’ll need another few years to be sure).
If we now look at Topic choices (rather than Discipline choices, above), we see similar trends.
Again, these are just first impressions; there’ll be more soon. Text is still the bread and butter of DH, but we see more non-textual methods being used than ever. Some of the old favorites of DH, like authorship attribution, are staying pretty steady against previous years, whereas others, like XML and encoding, seem to be decreasing in interest year after year.
One last note on Topics and Disciplines. There’s a list of discontinued topics at the bottom of the appendix. Most of them have simply been consolidated into other categories; however, one set is conspicuously absent: meta-discussions of DH. There are no longer categories for DH’s history, theory, how it’s taught, or its institutional support. These were pretty popular categories in previous years, and I’m not certain why they no longer exist. Perusing the submissions, there are certainly several that fall into these categories.
For Part 2 of this analysis, look forward to more thoughts on the topical breakdown of conference submissions; preliminary geographic and gender analysis of authors; and comparisons with previous years. After that, who knows? I take requests in the comments, but anyone who requests “Free Bird” is banned for life.
Appendix: Controlled Vocabulary
Words in red were added or modified this year, and the last list contains topics that used to exist, but no longer do.
agent modeling and simulation
archives, repositories, sustainability and preservation
audio, video, multimedia
authorship attribution / authority
bibliographic methods / textual studies
concording and indexing
copyright, licensing, and Open Access
corpora and corpus activities
cultural and/or institutional infrastructure
data mining / text mining
data modeling and architecture including hypothesis-driven modeling
The below is the transcript from my October 29 keynote presented to the Creativity and The City 1600-2000 conference in Amsterdam, titled “Punched-Card Humanities”. I survey historical approaches to quantitative history, how they relate to the nomothetic/idiographic divide, and discuss some lessons we can learn from past successes and failures. For ≈200 relevant references, see this Zotero folder.
I’m here to talk about Digital History, and what we can learn from its quantitative antecedents. If yesterday’s keynote was framing our mutual interest in the creative city, I hope mine will help frame our discussions around the bottom half of the poster; the eHumanities perspective.
Specifically, I’ve been delighted to see at this conference, we have a rich interplay between familiar historiographic and cultural approaches, and digital or eHumanities methods, all being brought to bear on the creative city. I want to take a moment to talk about where these two approaches meet.
Yesterday’s wonderful keynote brought up the complicated goal of using new digital methods to explore the creative city, without reducing the city to reductive indices. Are we living up to that goal? I hope a historical take on this question might help us move in this direction, that by learning from those historiographic moments when formal methods failed, we can do better this time.
Digital History is different, we’re told. “New”. Many of us know historians who used computers in the 1960s, for things like demography or cliometrics, but what we do today is a different beast.
Commenting on these early punched-card historians, in 1999, Ed Ayers wrote, quote, “the first computer revolution largely failed.” The failure, Ayers claimed, was in part due to their statistical machinery not being up to the task of representing the nuances of human experience.
We see this rhetoric of newness or novelty crop up all the time. It cropped up a lot in pioneering digital history essays by Roy Rosenzweig and Dan Cohen in the 90s and 2000s, and we even see a touch of it, though tempered, in this conference’s theme.
In yesterday’s final discussion on uncertainty, Dorit Raines reminded us that the difference between quantitative history in the 70s and today’s Digital History is that today’s approaches broaden our sources, whereas early approaches narrowed them.
To say “we’re at a unique historical moment” is something common to pretty much everyone, everywhere, forever. And it’s always a little bit true, right?
It’s true that every historical moment is unique. Unprecedented. Digital History, with its unique combination of public humanities, media-rich interests, sophisticated machinery, and quantitative approaches, is pretty novel.
But as the saying goes, history never repeats itself, but it rhymes. Each thread making up Digital History has a long past, and a lot of the arguments for or against it have been made many times before. Novelty is a convenient illusion that helps us get funding.
Not coincidentally, it’s this tension I’ll highlight today: between revolution and evolution, between breaks and continuities, and between the historians who care more about what makes a moment unique, and those who care more about what connects humanity together.
To be clear, I’m operating on two levels here: the narrative and the metanarrative. The narrative is that the history of digital history is one of continuities and fractures; the metanarrative is that this very tension between uniqueness and self-similarity is what swings the pendulum between quantitative and qualitative historians.
Now, my claim that debates over continuity and discontinuity are a primary driver of the quantitative/qualitative divide comes a bit out of left field — I know — so let me back up a few hundred years and explain.
Francis Bacon wrote that knowledge would be better understood if it were collected into orderly tables. His plea extended, of course, to historical knowledge, and inspired renewed interest in a genre already over a thousand years old: tabular chronology.
These chronologies were world histories, aligning the pasts of several regions which each reckoned the passage of time differently.
Isaac Newton inherited this tradition, and dabbled throughout his life in establishing a more accurate universal chronology, aligning Biblical history with Greek legends and Egyptian pharaohs.
Newton brought to history the same mind he brought to everything else: one of stars and calculations. Like his peers, Newton relied on historical accounts of astronomical observations to align simultaneous events across thousands of miles. Kepler and Scaliger, among others, also partook in this “scientific history”.
Where Newton departed from his contemporaries, however, was in his use of statistics for sorting out history. In the late 1500s, the average or arithmetic mean was popularized by astronomers as a way of smoothing out noisy measurements. Newton co-opted this method to help him estimate the length of royal reigns, and thus the ages of various dynasties and kingdoms.
On average, Newton figured, a king’s reign lasted 18-20 years. If the history books record 5 kings, that means the dynasty lasted between 90 and 100 years.
Newton was among the first to apply averages to fill in chronologies, though not the first to apply them to human activities. By the late 1600s, demographic statistics of contemporary life — of births, burials and the like — were becoming common. They were ways of revealing divinely ordered regularities.
Incidentally, this is an early example of our illustrious tradition of uncritically appropriating methods from the natural sciences. See? We’ve all done it, even Newton!
Joking aside, this is an important point: statistical averages represented divine regularities. Human statistics began as a means to uncover universal truths, and they continue to be employed in that manner. More on that later, though.
Newton’s method didn’t quite pass muster, and skepticism grew rapidly on the whole prospect of mathematical history.
Criticizing Newton in 1782, for example, Samuel Musgrave argued, in part, that there are no discernible universal laws of history operating in parallel to the universal laws of nature. Nature can be mathematized; people cannot.
Not everyone agreed. Francesco Algarotti passionately argued that Newton’s calculation of average reigns, the application of math to history, was one of his greatest achievements. Even Voltaire tried Newton’s method, aligning a Chinese chronology with Western dates using average length of reigns.
Which brings us to the earlier continuity/discontinuity point: quantitative history stirs debate in part because it draws together two activities Immanuel Kant sets in opposition: the tendency to generalize, and the tendency to specify.
The tendency to generalize, later dubbed Nomothetic, often describes the sciences: extrapolating general laws from individual observations. Examples include the laws of gravity, the theory of evolution by natural selection, and so forth.
The tendency to specify, later dubbed Idiographic, describes, mostly, the humanities: understanding specific, contingent events in their own context and with awareness of subjective experiences. This could manifest as a microhistory of one parish in the French Revolution, a critical reading of Frankenstein focused on gender dynamics, and so forth.
These two approaches aren’t mutually exclusive, and they frequently come in contact around scholarship of the past. Paleontologists, for example, apply general laws of biology and geology to tell the specific story of prehistoric life on Earth. Astronomers, similarly, combine natural laws and specific observations to trace the origins of our universe.
Historians have, with cyclically recurring intensity, engaged in similar efforts. One recent nomothetic example is that of cliodynamics: the practitioners use data and simulations to discern generalities such as why nations fail or what causes war. Recent idiographic historians associate more with the cultural and theoretical turns in historiography, often focusing on microhistories or the subjective experiences of historical actors.
Both tend to meet around quantitative history, but the conversation began well before the urge to quantify. They often fruitfully align and improve one another when working in concert; for example when the historian cites a common historical pattern in order to highlight and contextualize an event which deviates from it.
But more often, nomothetic and idiographic historians find themselves at odds. Newton extrapolated “laws” for the length of kings’ reigns, and was criticized for thinking mathematics had any place in the domain of the uniquely human. Newton’s contemporaries used human statistics to argue for divine regularities, and this was eventually criticized as encroaching on human agency, free will, and the uniqueness of subjective experience.
I’ll highlight some moments in this debate, focusing on English-speaking historians, and will conclude with what we today might learn from foibles of the quantitative historians who came before.
Let me reiterate, though, that quantitative history is not the same as nomothetic history, but they invite each other, so I shouldn’t be ahistorical by dividing them.
Take Henry Buckle, who in 1857 tried to bridge the two-culture divide posed by C.P. Snow a century later. He wanted to use statistics to find general laws of human progress, and apply those generalizations to the histories of specific nations.
Buckle was well-aware of historiography’s place between nomothetic and idiographic cultures, writing: “it is the business of the historian to mediate between these two parties, and reconcile their hostile pretensions by showing the point at which their respective studies ought to coalesce.”
In direct response, James Froude wrote that there can be no science of history. The whole idea of Science and History being related was nonsensical, like talking about the colour of sound. They simply do not connect.
This was a small exchange in a much larger Victorian debate pitting narrative history against a growing interest in scientific history. The latter rose on the coattails of growing popular interest in science, much like our debates today align with broader discussions around data science, computation, and the visible economic successes of startup culture.
This is, by the way, contemporaneous with something yesterday’s keynote highlighted: the 19th century drive to establish ‘urban laws’.
By now, we begin seeing historians leveraging public trust in scientific methods as a means for political control and pushing agendas. This happens in concert with the rise of punched cards and, eventually, computational history. Perhaps the best example of this historical moment comes from the American Census in the late 19th century.
Briefly, a group of 19th century American historians, journalists, and census chiefs used statistics, historical atlases, and the machinery of the census bureau to publicly argue for the disintegration of the U.S. Western Frontier in the late 19th century.
These moves were, in part, made to consolidate power in the American West and wrest control from the native populations who still lived there. They accomplished this, in part, by publishing popular atlases showing that the western frontier was so fractured that it was difficult to maintain and defend. 1
The argument, it turns out, was pretty compelling.
Part of what drove the statistical power and scientific legitimacy of these arguments was the new method, in 1890, of entering census data on punched cards and processing them in tabulating machines. The mechanism itself was wildly successful, and the inventor’s company wound up merging with a few others to become IBM. As was true of punched-card humanities projects through the time of Father Roberto Busa, this work was largely driven by women.
It’s worth pausing to remember that the history of punch card computing is also a history of the consolidation of government power. Seeing like a computer was, for decades, seeing like a state. And how we see influences what we see, what we care about, how we think.
Recall the Ed Ayers quote I mentioned at the beginning of this talk. He said the statistical machinery of early quantitative historians could not represent the nuance of historical experience. That doesn’t just mean the math they used; it means the actual machinery involved.
See, one of the truly groundbreaking punch card technologies at the turn of the century was the card sorter. Each card could represent a person, or household, or whatever else, which is sort of legible one-at-a-time, but unmanageable in giant stacks.
Now, this is still well before “computers”, but machines were being developed which could sort these cards into one of twelve pockets based on which holes were punched. So, for example, if you had cards punched for people’s age, you could sort the stacks into 10 different pockets to break them up by age groups: 0-9, 10-19, 20-29, and so forth.
This turned out to be amazing for eyeball estimates. If your 20-29 pocket was twice as full as your 10-19 pocket after all the cards were sorted, you had a pretty good idea of the age distribution.
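To make the sorter's logic concrete, here is a minimal sketch in Python. It assumes one card per person and ten decade-wide pockets; the ages are invented for illustration and the function name `sort_into_pockets` is hypothetical, not a description of any real tabulating machine.

```python
from collections import Counter

def sort_into_pockets(ages, width=10, pockets=10):
    """Drop each card into a pocket by age range, roughly as a card sorter would."""
    counts = Counter()
    for age in ages:
        # Anything older than the last range lands in the final pocket.
        pocket = min(age // width, pockets - 1)
        counts[pocket] += 1
    return [counts[p] for p in range(pockets)]

# Hypothetical stack of cards, one age per card.
ages = [3, 12, 15, 21, 22, 25, 28, 34, 47, 95]
pocket_counts = sort_into_pockets(ages)

# The "eyeball estimate": one mark per card in each pocket.
for p, n in enumerate(pocket_counts):
    print(f"{p * 10:>2}-{p * 10 + 9:<2} {'#' * n}")
```

In this invented stack the 20-29 pocket holds twice as many cards as the 10-19 pocket, which is exactly the kind of comparison the pockets made easy to see.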
Over the next 50 years, this convenience would shape the social sciences. Consider demographics or marketing. Both developed in the shadow of punch cards, and both relied heavily on what’s called “segmentation”, the breaking of society into discrete categories based on easily punched attributes. Age ranges, racial background, etc. These would be used to, among other things, determine who was interested in what products.
They’d eventually use statistics on these segments to inform marketing strategies.
But, if you look at the statistical tests that already existed at the time, these segmentations weren’t always the best way to break up the data. For example, age flows smoothly between 0 and 100; you could easily contrive a statistical test to show that, as a person ages, she’s more likely to buy one product over another, over a set of smooth functions.
That’s not how it worked though. Age was, and often still is, chunked up into ten or so distinct ranges, and those segments were each analyzed individually, as though they were as distinct from one another as dogs and cats. That is, 0-9 is as related to 10-19 as it is to 80-89.
What we see here is the deep influence of technological affordances on scholarly practice, and it’s an issue we still face today, though in different form.
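The contrast between segmented and smooth treatments of age can be sketched roughly as follows. The records are invented, the function names are hypothetical, and the straight-line fit is just one simple stand-in for the "smooth functions" mentioned above, not a claim about what any particular analyst used.

```python
from statistics import mean

# Hypothetical (age, bought_product) records; not real data.
records = [(5, 0), (14, 0), (18, 0), (23, 0), (27, 1), (36, 0), (38, 1),
           (45, 1), (52, 0), (58, 1), (64, 1), (67, 1), (75, 1), (83, 1)]

def segment_rates(records, width=10):
    """Purchase rate per age segment. Each segment is only a label here:
    nothing in the result says that 0-9 sits next to 10-19."""
    groups = {}
    for age, bought in records:
        groups.setdefault(age // width, []).append(bought)
    return {segment: mean(values) for segment, values in groups.items()}

def age_slope(records):
    """Least-squares slope of purchase (0 or 1) on age, treating age as continuous."""
    xs = [age for age, _ in records]
    ys = [bought for _, bought in records]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rates = segment_rates(records)   # one number per pocket, order ignored
trend = age_slope(records)       # one number describing change with age
```

The segmented version hands back a separate rate for each pocket, while the slope summarizes how purchasing changes as age increases. Neither is automatically right, but only the second uses the fact that ages are ordered.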
As historians began using punch cards and social statistics, they inherited, or appropriated, a structure developed for bureaucratic government processing, and were rightly soon criticized for its dehumanizing qualities.
Unsurprisingly, given this backdrop, historians in the first few decades of the 20th century often shied away from or rejected quantification.
The next wave of quantitative historians, who reached their height in the 1930s, approached the problem with more subtlety than the previous generations in the 1890s and 1860s.
Charles Beard’s famous Economic Interpretation of the Constitution of the United States used economic and demographic stats to argue that the US Constitution was economically motivated. Beard, however, did grasp the fundamental idiographic critique of quantitative history, claiming that history was, quote:
“beyond the reach of mathematics — which cannot assign meaningful values to the imponderables, immeasurables, and contingencies of history.”
The other frequent critique of quantitative history, still heard, is that it uncritically appropriates methods from stats and the sciences.
This also wasn’t entirely true. The slide behind me shows famed statistician Karl Pearson’s attempt to replicate the math of Isaac Newton that we saw earlier using more sophisticated techniques.
By the 1940s, Americans with graduate training in statistics like Ernest Rubin were actively engaging historians in their own journals, discussing how to carefully apply statistics to historical research.
On the other side of the channel, the French Annales historians were advocating longue durée history; a move away from biographies to prosopographies, from events to structures. In its own way, this was another historiography teetering on the edge between the nomothetic and idiographic, an approach that sought to uncover the rhymes of history.
Interest in quantitative approaches surged again in the late 1950s, led by a new wave of Annales historians like Fernand Braudel and American quantitative manifestos like those by Benson, Conrad, and Meyer.
William Aydelotte went so far as to point out that all historians implicitly quantify, when they use words like “many”, “average”, “representative”, or “growing” – and the question wasn’t can there be quantitative history, but when should formal quantitative methods be utilized?
By 1968, George Murphy, seeing the swell of interest, asked a very familiar question: why now? He asked why the 1960s were different from the 1860s or 1930s, why were they, in that historical moment, able to finally do it right? His answer was that it wasn’t just the new technologies, the huge datasets, the innovative methods: it was the zeitgeist. The 1960s was the right era for computational history, because it was the era of computation.
By the early 70s, there was a historian using a computer in every major history department. Quantitative history had finally grown into itself.
Of course, in retrospect, Murphy was wrong. Once the pendulum swung too far towards scientific history, theoretical objections began pushing it the other way.
In Poverty of Historicism, Popper rejected scientific history, but mostly as a means to reject historicism outright. Popper’s arguments represent an attack from outside the historiographic tradition, but one that eventually had significant purchase even among historians, as an indication of the failure of nomothetic approaches to culture. It is, to an extent, a return to Musgrave’s critique of Isaac Newton.
At the same time, we see growing criticism from historians themselves. Arthur Schlesinger famously wrote that “important questions are important precisely because they are not susceptible to quantitative answers.”
There was a converging consensus among English-speaking historians, as in the early 20th century, that quantification erased the essence of the humanities, that it smoothed over the very inequalities and historical contingencies we needed to highlight.
Jacques Barzun summed it up well, if scathingly, saying history ought to free us from the bonds of the machine, not feed us into it.
The skeptics prevailed, and the pendulum swung the other way. The post-structural, cultural, and literary-critical turns in historiography pivoted away from quantification and computation. The final nail was probably Fogel and Engerman’s 1974 Time on the Cross, which reduced the Atlantic slave-trade to economic figures, and didn’t exactly treat the subject with nuance and care.
The cliometricians, demographers, and quantitative historians didn’t disappear after the cultural turn, but their numbers shrank, and they tended to find themselves in social science departments, or fled here to Europe, where social and economic historians were faring better.
Which brings us, 40 years on, to the middle of a new wave of quantitative or “formal method” history. Ed Ayers, like George Murphy before him, wrote, essentially, this time it’s different.
And he’s right, to a point. Many here today draw their roots not to the cliometricians, but to the very cultural historians who rejected quantification in the first place. Ours is a digital history steeped in the values of the cultural turn, that respects social justice and seeks to use our approaches to shine a light on the underrepresented and the historically contingent.
But that doesn’t stop a new wave of critiques that, if not repeating old arguments, certainly rhymes. Take Johanna Drucker’s recent call to rebrand data as capta, because when we treat observations objectively as if they were the same as the phenomena observed, we collapse the critical distance between the world and our interpretation of it. And interpretation, Drucker contends, is the foundation on which humanistic knowledge is based.
Which is all to say, every swing of the pendulum between idiographic and nomothetic history was situated in its own historical moment. It’s not a clock’s pendulum, but Foucault’s pendulum, with each swing’s apex ending up slightly off from the last. The issues of chronology and astronomy are different from those of eugenics and manifest destiny, which are themselves different from the capitalist and dehumanizing tendencies of 1950s mainframes.
But they all rhyme. Quantitative history has failed many times, for many reasons, but there are a few threads that bind them which we can learn from — or, at least, a few recurring mistakes we can recognize in ourselves and try to avoid going forward.
We won’t, I suspect, stop the pendulum’s inevitable about-face, but at least we can continue our work with caution, respect, and care.
The lesson I’d like to highlight may be summed up in one question, asked by Humpty Dumpty to Alice: which is to be master?
Over several hundred years of quantitative history, the advice of proponents and critics alike tends to align with this question. Indeed in 1956, R.G. Collingwood wrote specifically “statistical research is for the historian a good servant but a bad master,” referring to the fact that statistical historical patterns mean nothing without historical context.
Schlesinger, the guy who I mentioned earlier who said historical questions are interesting precisely because they can’t be quantified, later acknowledged that while quantitative methods can be useful, they’ll lead historians astray. Instead of tackling good questions, he said, historians will tackle easily quantifiable ones, and Schlesinger was uncomfortable with the tail wagging the dog.
I’ve found many ways in which historians have accidentally given over agency to their methods and machines over the years, but these five, I think, are the most relevant to our current moment.
Unfortunately, since we’re running out of time, you’ll just have to trust me that these are historically recurring.
Number 1 is the uncareful appropriation of statistical methods for historical uses. It controls us precisely because it offers us a black box whose output we don’t truly understand.
A common example I see these days is in network visualizations. People visualize nodes and edges using what are called force-directed layouts in Gephi, but they don’t exactly understand what those layouts mean. As these layouts were designed, physical proximity of nodes is not meant to represent relatedness, yet I’ve seen historians interpret two neighboring nodes as being related because of their visual adjacency.
This is bad. It’s false. But because we don’t quite understand what’s happening, we get lured by the black box into nonsensical interpretations.
The second way methods drive us is in our reliance on methodological imports. That is, we take the time to open the black box, but we only use methods that we learn from statisticians or scientists. Even when we fully understand the methods we import, if we’re bound to other people’s analytic machinery, we’re bound to their questions and biases.
Take the example I mentioned earlier, with demographic segmentation, punch card sorters, and their influence on social scientific statistics. The very mechanical affordances of early computers influenced the sort of questions people asked for decades: how do discrete groups of people react to the world in different ways, and how do they compare with one another?
The next thing to watch out for is naive scientism. Even if you know the assumptions of your methods, and you develop your own techniques for the problem at hand, you still can fall into the positivist trap that Johanna Drucker warns us about — collapsing the distance between what we observe and some underlying “truth”.
This is especially difficult when we’re dealing with “big data”. Once you’re working with so much material you couldn’t hope to read it all, it’s easy to be lured into forgetting the distance between operationalizations and what you actually intend to measure.
For instance, if I’m finding friendships in Early Modern Europe by looking for particular words being written in correspondences, I will completely miss the existence of friends who were neighbors, and thus had no reason to write letters for us to eventually read.
A fourth way we can be misled by quantitative methods is the ease with which they lend an air of false precision or false certainty.
This is the problem Matthew Lincoln and the other panelists brought up yesterday, where missing or uncertain data, once quantified, falsely appears precise enough to make comparisons.
I see this mistake crop up in early and recent quantitative histories alike; we measure, say, the changing rate of transnational shipments over time, and notice a positive trend. The problem is the positive difference is quite small, easily attributable to error, but because numbers are always precise, it still feels like we’re being more precise than doing a qualitative assessment. Even when it’s unwarranted.
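Here is a rough sketch of how a small positive trend can sit comfortably inside its own error. The shipment counts are invented, the function name is hypothetical, and the "two standard errors" band is only a rule of thumb; with this few points a proper interval would be wider still.

```python
from math import sqrt
from statistics import mean

def slope_with_error(xs, ys):
    """Least-squares slope and its standard error for a simple linear fit."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Sum of squared residuals around the fitted line.
    ssr = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    std_err = sqrt(ssr / (len(xs) - 2) / sxx)
    return slope, std_err

# Hypothetical shipments per year over six years; not real data.
years = [0, 1, 2, 3, 4, 5]
shipments = [100, 104, 98, 103, 99, 105]

slope, std_err = slope_with_error(years, shipments)
low, high = slope - 2 * std_err, slope + 2 * std_err  # rough band only

print(f"slope = {slope:.2f} per year, rough band = ({low:.2f}, {high:.2f})")
```

The slope is positive, but the band around it stretches well below zero, so the "trend" is not distinguishable from no trend at all. The number looks precise; the claim it supports is not.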
The last thing to watch out for, and maybe the most worrisome, is the blinders quantitative analysis places on historians who don’t engage in other historiographic methods. This has been the downfall of many waves of quantitative history in the past; the inability to care about or even see that which can’t be counted.
This was, in part, what led Time on the Cross to become the excuse to drive historians from cliometrics. The indicators of slavery that were measurable were sufficient to show it to have some semblance of economic success for black populations; but it was precisely those aspects of slavery they could not measure that were the most historically important.
So how do we regain mastery in light of these obstacles?
1. Uncareful Appropriation – Collaboration
Regarding the uncareful appropriation of methods, we can easily sidestep the issue of accidentally misusing a method by collaborating with someone who knows how the method works. This may require a translator; statisticians can as easily misunderstand historical problems as historians can misunderstand statistics.
Historians and statisticians can fruitfully collaborate, though, if they have someone in the middle trained to some extent in both — even if they’re not themselves experts. For what it’s worth, Dutch institutions seem to be ahead of the game in this respect, which is something that should be fostered.
2. Reliance on Imports – Statistical Training
Getting away from reliance on disciplinary imports may take some more work, because we ourselves must learn the approaches well enough to augment them, or create our own. Right now in DH this is often handled by summer institutes and workshop series, but I’d argue those are not sufficient here. We need to make room in our curricula for actual methods courses, or even degrees focused on methodology, in the same fashion as social scientists, if we want to start a robust practice of developing appropriate tools for our own research.
3. Naive Scientism – Humanities History
The spectre of naive scientism, I think, is one we need to be careful of, but we are also already well-equipped to deal with it. If we want to combat the uncareful use of proxies in digital history, we need only to teach the history of the humanities; why the cultural turn happened, what’s gone wrong with positivistic approaches to history in the past, etc.
Incidentally, I think this is something digital historians already guard well against, but it’s still worth keeping in mind and making sure we teach it. Particularly, digital historians need to remain aware of parallel approaches from the past, rather than tracing their background only to the textual work of people like Roberto Busa in Italy.
False precision and false certainty have some shallow fixes, and some deep ones. In the short term, we need to be better about understanding things like confidence intervals and error bars, and use methods like what Matthew Lincoln highlighted yesterday.
In the long term, though, digital history would do well to adopt triangulation strategies to help mitigate against these issues. That means trying to reach the same conclusion using multiple different methods in parallel, and seeing if they all agree. If they do, you can be more certain your results are something you can trust, and not just an accident of the method you happened to use.
5. Quantitative Blinders – Rejecting Digital History
Avoiding quantitative blinders – that is, the tendency to only care about what’s easily countable – is an easy fix, but I’m afraid to say it, because it might put me out of a job. We can’t call what we do digital history, or quantitative history, or cliometrics, or whatever else. We are, simply, historians.
Some of us use more quantitative methods, and some don’t, but if we’re not ultimately contributing to the same body of work, both sides will do themselves a disservice by not bringing every approach to bear in the wide range of interests historians ought to pursue.
Qualitative and idiographic historians will be stuck unable to deal with the deluge of material that can paint us a broader picture of history, and quantitative or nomothetic historians will lose sight of the very human irregularities that make history worth studying in the first place. We must work together.
If we don’t come together, we’re destined to remain punched-card humanists – that is, we will always be constrained and led by our methods, not by history.
Of course, this divide is a false one. There are no purely quantitative or purely qualitative studies; close-reading historians will continue to say things like “representative” or “increasing”, and digital historians won’t start publishing graphs with no interpretation.
Still, silos exist, and some of us have trouble leaving the comfort of our digital humanities conferences or our “traditional” history conferences.
That’s why this conference, I think, is so refreshing. It offers a great mix of both worlds, and I’m privileged and thankful to have been able to attend. While there are a lot of lessons we can still learn from those before us, from my vantage point, I think we’re on the right track, and I look forward to seeing more of those fruitful combinations over the course of today.
This account is influenced by some talks by Ben Schmidt. Any mistakes are from my own faulty memory, and not from his careful arguments.
Okay, maybe space whales aren’t behind every spreadsheet, but they’re behind this one, dated 1662, notable for the gigantic nail it hammered into the coffin of our belief that heaven above is perfect and unchanging. The following post is the first in my new series full-stack dev (f-s d), where I explore the secret life of data. 1
The Princess Bride teaches us a good story involves “fencing, fighting, torture, revenge, giants, monsters, chases, escapes, true love, miracles”. In this story, Cetus, three of those play a prominent role: (red) giants, (sea) monsters, and (cosmic) miracles. Also Greek myths, interstellar explosions, beer-brewing astronomers, meticulous archivists, and top-secret digitization facilities. All together, they reveal how technologies, people, and stars aligned to stick this 350-year-old spreadsheet in your browser today.
When Aethiopian queen Cassiopeia claimed herself more beautiful than all the sea nymphs, Poseidon was, let’s say, less than pleased. Mildly miffed. He maybe sent a sea monster named Cetus to destroy Aethiopia.
Because obviously the best way to stop a flood is to drown a princess, Queen Cassiopeia chained her daughter to the rocks as a sacrifice to Cetus. Thankfully the hero Perseus just happened to be passing through Aethiopia, returning home after beheading Medusa, that snake-haired woman whose eyes turned living creatures to stone. Perseus (depicted below as the world’s most boring 2-ball juggler) revealed Medusa’s severed head to Cetus, turning the sea monster to stone and saving the princess. And then they got married because traditional gender roles I guess?
Cetaceans, you may recall from grade school, are those giant carnivorous sea-mammals that Captain Ahab warned you about. Cetaceans, from Cetus. You may also remember we have a thing for naming star constellations and dividing the sky up into sections (see the Zodiac), and that we have a long history of comparing the sky to the ocean (see Carl Sagan or Star Trek IV).
It should come as no surprise, then, that we’ve designated a whole section of space as ‘The Sea’, home of Cetus (the whale), Aquarius (the God) and Eridanus (the water pouring from Aquarius’ vase, source of river floods), Pisces (two fish tied together by a rope, which makes total sense I promise), Delphinus (the dolphin), and Capricornus (the goat-fish. Listen, I didn’t make these up, okay?).
Ptolemy listed most of these constellations in his Almagest (ca. 150 A.D.), including Cetus, along with descriptions of over a thousand stars. Ptolemy’s model, with Earth at the center and the constellations just past Saturn, set the course of cosmology for over a thousand years.
In this cosmos, reigning in Western Europe for centuries past Copernicus’ death in 1543, the stars were fixed and motionless. There was no vacuum of space; every planet was embedded in a shell made of aether or quintessence (quint-essence, the fifth element), and each shell sat atop the next until reaching the celestial sphere. This last sphere held the stars, each one fixed to it as with a pushpin. Of course, all of it revolved around the earth.
The domain of heavenly spheres was assumed perfect in all sorts of ways. They slid across each other without friction, and the planets and stars were perfect spheres which could not change and were unmarred by inconsistencies. One reason it was so difficult for even “great thinkers” to believe the earth orbited the sun, rather than vice-versa, was because such a system would be at complete odds with how people knew physics to work. It would break gravity, break motion, and break the outer perfection of the cosmos, which was essential (…heh) 2 to our notions of, well, everything.
Which is why, when astronomers with their telescopes and their spreadsheets started systematically observing imperfections in planets and stars, lots of people didn’t believe them—even other astronomers. Over the course of centuries, though, these imperfections became impossible to ignore, and helped launch the earth in rotation ’round the sun.
This is the story of one such imperfection.
A Star is Born (and then dies)
Around 1296 A.D., over the course of half a year, a red giant star some 2 quadrillion miles away grew from 300 to 400 times the size of our sun. Over the next half year, the star shrank back down to its previous size. Light from the star took 300 years to reach earth, eventually striking the retina of German pastor David Fabricius. It was very early Tuesday morning on August 13, 1596, and Pastor Fabricius was looking for Jupiter. 3
At that time of year, Jupiter would have been near the constellation Cetus (remember our sea monster?), but Fabricius noticed a nearby bright star (labeled ‘Mira’ in the below figure) which he did not remember from Ptolemy or Tycho Brahe’s star charts.
Spotting an unrecognized star wasn’t unusual, but one so bright in so common a constellation was certainly worthy of note. He wrote down some observations of the star throughout September and October, after which it seemed to have disappeared as suddenly as it appeared. The disappearance prompted Fabricius to write a letter about it to famed astronomer Tycho Brahe, who had described a similar appearing-then-disappearing star between 1572 and 1574. Brahe jotted Fabricius’ observations down in his journal. This sort of behavior, after all, was a bit shocking for a supposedly fixed and unchanging celestial sphere.
More shocking, however, was what happened 13 years later, on February 15, 1609. Once again searching for Jupiter, pastor Fabricius spotted another new star in the same spot as the last one. Tycho Brahe having recently died, Fabricius wrote a letter to his astronomical successor, Johannes Kepler, describing the miracle. This was unprecedented. No star had ever vanished and returned, and nobody knew what to make of it.
Unfortunately for Fabricius, nobody did make anything of it. His observations were either ignored or, occasionally, dismissed as an error. To add injury to insult, a local goose thief killed Fabricius with a shovel blow, thus ending his place in this star’s story, among other stories.
Three decades passed. On the winter solstice, 1638, Johannes Phocylides Holwarda prepared to view a lunar eclipse. He reported with excitement the star’s appearance and, by August 1639, its disappearance. The new star, Holwarda claimed, should be considered of the same class as Brahe, Kepler, and Fabricius’ new stars. As much a surprise to him as Fabricius, Holwarda saw the star again on November 7, 1639. Although he was not aware of it, his new star was the same as the one Fabricius spotted 30 years prior.
Two more decades passed before the new star in the neck of Cetus would be systematically sought and observed, this time by Johannes Hevelius: local politician, astronomer, and brewer of fine beers. By that time many had seen the star, but it was difficult to know whether it was the same celestial body, or even what was going on.
Hevelius brought everything together. He found recorded observations from Holwarda, Fabricius, and others, from today’s Netherlands to Germany to Poland, and realized these disparate observations were of the same star. Befitting its puzzling and seemingly miraculous nature, Hevelius dubbed the star Mira (miraculous) Ceti. The image below, from Hevelius’ Firmamentum Sobiescianum sive Uranographia (1687), depicts Mira Ceti as the bright star in the sea monster’s neck.
Going further, from 1659 to 1683, Hevelius observed Mira Ceti in a more consistent fashion than any before. There were eleven recorded observations in the 65 years between Fabricius’ first sighting of the star and Hevelius’ undertaking; in the following three years, he recorded 75 more such observations. Oddly, while Hevelius was a remarkably meticulous observer, he insisted the star was inherently unpredictable, with no regularity in its reappearances or variable brightness.
Beginning shortly after Hevelius, the astronomer Ismaël Boulliau also undertook a thirty year search for Mira Ceti. He even published a prediction, that the star would go through its vanishing cycle every 332 days, which turned out to be incredibly accurate. As today’s astronomers note, Mira Ceti’s brightness increases and decreases by several orders of magnitude every 331 days, caused by an interplay between radiation pressure and gravity in the star’s gaseous exterior.
While of course Boulliau didn’t arrive at today’s explanation for Mira’s variability, his solution did require a rethinking of the fixity of stars, and eventually contributed to the notion that maybe the same physical laws that apply on Earth also rule the sun and stars.
But we’re not here to talk about Boulliau, or Mira Ceti. We’re here to talk about this spreadsheet:
This snippet represents Hevelius’ attempt to systematically collect prior observations of Mira Ceti. Unreasonably meticulous readers of this post may note an inconsistency: I wrote that Johannes Phocylides Holwarda observed Mira Ceti on November 7th, 1639, yet Hevelius here shows Holwarda observing the star on December 7th, 1639, an entire month later. The little notes on the side are basically the observers saying: “wtf this star keeps reappearing???”
This mistake was not a simple printer’s error. It reappeared in Hevelius’ printed books three times: 1662, 1668, and 1685. This is an early example of what Raymond Panko and others call spreadsheet errors, which appear in nearly 90% of 21st century spreadsheets. Hand-entry is difficult, and mistakes are bound to happen. In this case, a game of telephone also played a part: Hevelius may have pulled some observations not directly from the original astronomers, but from the notes of Tycho Brahe and Johannes Kepler, to which he had access.
Unfortunately, with so few observations, and many of the early ones so sloppy, mistakes compound themselves. It’s difficult to predict a variable star’s periodicity when you don’t have the right dates of observation, which may have contributed to Hevelius’ continued insistence that Mira Ceti kept no regular schedule. The other contributing factor, of course, is that Hevelius worked without a telescope and under cloudy skies, and stars are hard to measure under even the best circumstances.
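For a sense of scale, here is a back-of-the-envelope sketch of what a one-month slip does to a period estimate. It uses only the two Holwarda dates and the 332-day cycle quoted above; the helper `period_estimate_shift` is a hypothetical illustration of the arithmetic, not a reconstruction of how Hevelius or Boulliau actually worked.

```python
from datetime import date

PERIOD_DAYS = 332  # Boulliau's predicted cycle, as quoted in this post

holwarda_in_post = date(1639, 11, 7)   # the date given earlier in this post
holwarda_in_table = date(1639, 12, 7)  # the date printed in Hevelius' table

# Size of the transcription slip, in days and as a share of one cycle.
error_days = (holwarda_in_table - holwarda_in_post).days
fraction_of_cycle = error_days / PERIOD_DAYS

def period_estimate_shift(error_days, cycles_spanned):
    """If a period is estimated from two sightings some whole number of
    cycles apart, an error in one date shifts the estimate by this much."""
    return error_days / cycles_spanned

print(error_days, round(fraction_of_cycle, 2))
print(period_estimate_shift(error_days, cycles_spanned=1))
```

A slip of this size is roughly a tenth of the whole cycle. Spread across many cycles it would wash out, but with only a handful of sightings to work from, it is easily large enough to hide any regularity.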
To Be Continued
Here ends the first half of Cetus. The second half will cover how Hevelius’ book was preserved, the labor behind its digitization, and a bit about the technologies involved in creating the image you see.
Early modern astronomy is a particularly good pre-digital subject for full-stack dev (f-s d), since it required vast international correspondence networks and distributed labor in order to succeed. Hevelius could not have created this table, compiled from the observations of several others, without access to cutting-edge astronomical instruments and the contemporary scholarly network.
You may ask why I included that whole section on Greek myths and Ptolemy’s constellations. Would as many early modern astronomers have noticed Mira Ceti had it not sat in the center of a familiar constellation, I wonder?
I promised this series would be about the secret life of data, answering the question of what’s behind a spreadsheet. Cetus is only the first story (well, second, I guess), but the idea is to upturn the iceberg underlying seemingly mundane datasets to reveal the complicated stories of their creation and usage. Stay tuned for future installments.
In a similar spirit, for interested allies and struggling fellows, this post is about how my symptoms manifest in the academic world, and how I manage them. 2
Navigating the social world is tough—a fact that may surprise some of my friends and most of my colleagues. I do alright at conferences and in groups, when conversation is polite and skin-deep, but it requires careful concentration and a lot of smoke and mirrors. Inside, it feels like I’m translating from Turkish to Cantonese without knowing either language. Every time this is said, that is the appropriate reply, though I struggle to understand why. I just possess a translation book, and recite what is expected. Stimulus and response. This skill was only recently acquired.
Looking at the point between people’s eyes makes it appear as though I am making direct eye contact during conversations. Certain observations (“you look tired”) are apparently less well-received than others (“you look excited”), and I’ve mostly learned which are which.
After a long day keeping up this appearance, especially at conferences, I find a nice dark room and stay there. Sharing conference hotel rooms with fellow academics is never an option. Some strategies I figured out myself; others, like the eye contact trick, I built over extended discussions with an old girlfriend after she handed me a severely-highlighted copy of The Partner’s Guide to Asperger Syndrome.
ADHD and Autism Spectrum Disorder are highly co-morbid, and I have been diagnosed with either or both by several independent professionals in the last twenty years. Working is hard, and often takes at least twice as much time for me as it does for the peers with whom I have discussed this. When interested in something, I lose myself entirely in it for hours on end, but a single break in concentration will leave me scrambling. It may take hours or days to return to a task, if I do at all. My best work is done in marathon sessions, and work that takes longer than a few days may never get finished, or may drop in quality precipitously. Keeping the internet disconnected and my phone off during regular periods every day, locked in my windowless office, helps keep distractions at bay. But I have yet to discover a good strategy to manage long projects. A career in the book-driven humanities may have been a poor choice.
Paying bills on time, keeping schedules, and replying to emails are among the most stressful tasks in my life. When I don’t adequately handle all of these mundane tasks, it sets in motion a cycle of horror that paralyzes my ability to get anything done, until I eventually file for task bankruptcy and inevitably disappoint colleagues, friends, or creditors to whom action is owed. Poor time management and stress-cycles lead me to over-promise and under-deliver. On the bright side, I recently received help in strategies to improve that, and they work. Sometimes.
Friendships, surprisingly, are easy to maintain but difficult to nourish. My friends consider me trustworthy and willing to help (if not necessarily always dependable), but I lose track of friends or family who aren’t geographically close. Deeper emotional relationships are rare or, for swaths of my life, non-existent. I get no fits of anger or depression or elation or excitement. Indeed, my friends and family remark how impossible it is to see if I like a gift they’ve given me.
People occasionally describe my actions as offensive, rude, or short, and I get frustrated trying to understand exactly why what I’m doing fits into those categories. Apparently, early in grad school, I had a bit of a reputation for asking obnoxious questions in lectures. But I don’t like upsetting people, and actively (maybe successfully?) try to curb these traits when they are pointed out.
Thankfully, academic life allows me the freedom to lock myself in a room and focus on a task. Using work as a coping mechanism for social difficulties may be unhealthy, but hey, at least I found a career that rewards my peculiarities.
My life is pretty great. I have good friends, a loving family, and hobbies that challenge me. As long as I maintain the proper controlled environment, my fixations and obsessions are a perfect complement to an academic career, especially in a culture that (unfortunately) rewards workaholism. The same tenacity often compensates for difficulties in navigating romantic relationships, of which I’ve had a few incredibly fulfilling and valuable ones over my life thus far.
Unfortunately, my experience on the autism spectrum is not shared by all academics. Some have enough difficulty managing the social world that they end up alienating colleagues who are on their tenure committees, to disastrous effect. From private conversations, it seems autistic women suffer more from this than men, as they are expected to perform more service work and to be more social. Supportive administrators can be vital in these situations, and autism-spectrum academics may want to negotiate accommodations for themselves as part of their hiring process.
Despite some frustrations, I have found my atypical way of interacting with the world to be a feature, not a bug. My atypicality presents as what used to be called Asperger Syndrome, and it is easier for me to interact with the world, and easier for the world to interact with me, than it is for many other autistic individuals. That said, whether or not my friends and colleagues notice, I still struggle with many aspects common to those diagnosed on the autism spectrum: social-emotional difficulties, alexithymia, intensity of focus, hypersensitivity, system-oriented thinking, etc.
Relationships or friendships with someone on the spectrum can be tough, even with someone who doesn’t outwardly present common characteristics, like me. An old partner once vented her frustrations that she couldn’t turn to her friends for advice, because: “everyone just said Scott is so normal and I was thinking [no], he’s just very very good at passing [as socially aware].” Like many who grow up non-neurotypical, I learned a complex set of coping strategies to help me fit in and succeed in a neurotypical world. To concentrate on work, I create an office cave to shut out the world. I use a complicated set of journals, calendars, and apps to keep me on task and ensure I pay bills on time. To stay attentive, I sit at the front of a lecture hall—it even works, sometimes. Some ADHD symptoms are managed pharmacologically.
These strategies give me the 80% push I need to be a functioning member of society, to become someone who can sustain relationships, not get kicked out of his house for forgetting rent, and can almost finish a PhD. Almost. It’s not quite enough to prevent me from a dozen incompletes on my transcripts, but I make do. A host of unrealistically patient and caring friends, family, and colleagues helps. (If you’re someone to whom I still owe work, but am too scared to reply to because of how delinquent I am, thanks for understanding! waves and runs away). Caring allies help. A lot.
My life so far has been a series of successes and confusions. Not unlike anybody else’s life, I suppose. I occupy my own corner of weirdness, which is itself unique enough, but everyone has their own corner. I doubt my writing this will help anyone understand themselves any better, but hopefully it will help fellow academics feel a bit safer in their own weirdness. And if this essay helps our neurotypical colleagues be a bit more understanding of our struggles, and better-informed as allies, all the better.
The original article, Stigma, was written for the Conditionally Accepted column of Inside Higher Ed. Jeana Jorgensen, Eric Grollman and Sarah Bray provided invaluable feedback, and I wouldn’t have written it without them. They invited me to write this second article for Inside Higher Ed as well, which was my original intent. I wound up posting it on my blog instead because their posting schedule didn’t quite align with my writing schedule. This shouldn’t be counted as a negative reflection on the process of publishing with that fine establishment. ↩
Let me be clear: I know very little about autism, beyond that I have been diagnosed with it. I’m still learning a lot. This post is about me. Knowing other people face similar struggles has been profoundly helpful, regardless of what causes those struggles. ↩
If you claim computational approaches to history (“digital history”) let historians ask new types of questions, or that they offer new historical approaches to answering or exploring old questions, you are wrong. You’re not actually wrong, but you are institutionally wrong, which is maybe worse.
This is a problem, because rhetoric from practitioners (including me) is that we can bring some “new” to the table, and when we don’t, we’re called out for not doing so. The exchange might (but probably won’t) go like this:
Digital Historian: And this graph explains how velociraptors were of utmost importance to Victorian sensibilities.
Historian in Audience: But how is this telling us anything we haven’t already heard before? Didn’t John Hammond already make the same claim?
DH: That’s true, he did. One thing the graph shows, though, is that velociraptors in general tend to play much more unimportant roles across hundreds of years, which lends support to the Victorian thesis.
HiA: Yes, but the generalized argument doesn’t account for cultural differences across those times, so doesn’t meaningfully contribute to this (or any other) historical conversation.
History (like any discipline) is made of people, and those people have Ideas about what does or doesn’t count as history (well, historiography, but that’s a long word so let’s ignore it). If you ask a new type of question or use a new approach, that new thing probably doesn’t fit historians’ Ideas about proper history.
The age of peak celebrity has been consistent over time: about 75 years after birth. But the other parameters have been changing. Fame comes sooner and rises faster. Between the early 19th century and the mid-20th century, the age of initial celebrity declined from 43 to 29 years, and the doubling time fell from 8.1 to 3.3 years.
Historians saw those claims and asked “so what”? It’s not interesting or relevant according to the things historians usually consider interesting or relevant, and it’s problematic in ways historians find things problematic. For example, it ignores cultural differences, does not speak to actual human experiences, and has nothing of use to say about a particular historical moment.
It’s true. Culturomics-style questions do not fit well within a humanities paradigm (incommensurable, anyone?). By the standard measuring stick of what makes a good history project, culturomics does not measure up. A new type of question requires a new measuring stick; in this case, I think a good one for culturomics-style approaches is the extent to which they bridge individual experiences with large-scale social phenomena, or how well they are able to reconcile statistical social regularities with free or contingent choice.
The point, though, is a culturomics presentation would fit few of the boxes expected at a history conference, and so would be considered a failure. Rightly so, too—it’s a bad history presentation. But what culturomics is successfully doing is asking new types of questions, whether or not historians find them legitimate or interesting. Is it good culturomics?
To put too fine a point on it, since history is often a question-driven discipline, new types of questions that are too different from previous types are no longer legitimately within the discipline of history, even if they are intrinsically about human history and do not fit in any other discipline.
What’s more, new types of questions may appear simplistic by historians’ standards, because they fail at fulfilling even the most basic criteria usually measuring historical worth. It’s worth keeping in mind that, to most of the rest of the world, our historical work often fails at meeting their criteria for worth.
New approaches to old questions share a similar fate, but for different reasons. That is, if they are novel, they are not interesting, and if they are interesting, they are not novel.
Traditional historical questions are, let’s face it, not particularly new. Tautologically. Some old questions in my field are: what role did now-silent voices play in constructing knowledge-making instruments in 17th century astronomy? How did scholarship become institutionalized in the 18th century? Why was Isaac Newton so annoying?
My own research is an attempt to provide a broader view of those topics (at least, the first two) using computational means. Since my topical interest has a rich tradition among historians, it’s unlikely any of my historically-focused claims (for example, that scholarly institutions were built to replace the really complicated and precarious role people played in coordinating social networks) will be without precedent.
After decades, or even centuries, of historical work in this area, there will always be examples of historians already having made my claims. My contribution is the bolstering of a particular viewpoint, the expansion of its applicability, the reframing of a discussion. Ultimately, maybe, I convince the world that certain social network conditions play an important role in allowing scholarly activity to be much more successful at its intended goals. My contribution is not, however, a claim that is wholly without precedent.
But this is a problem, since DH rhetoric, even by practitioners, can understandably lead people to expect such novelty. Historians in particular are very good at fitting old patterns to new evidence. It’s what we’re trained to do.
Any historical claim (to an acceptable question within the historical paradigm) can easily be countered with “but we already knew that”. Either the question’s been around long enough that every plausible claim has been covered, or the new evidence or theory is similar enough to something pre-existing that it can be taken as precedent.
The most masterful recent discussion of this topic was Matthew Lincoln’s Confabulation in the humanities, where he shows how easy it is to make up evidence and get historians to agree that they already knew it was true.
To put too fine a point on it, new approaches to old historical questions are destined to produce results which conform to old approaches; or if they don’t, it’s easy enough to stretch the old & new theories together until they fit. New approaches to old questions will fail at producing completely surprising results; this is a bad standard for historical projects. If a novel methodology were to create truly unrecognizable results, it is unlikely those results would be recognized as “good history” within the current paradigm. That is, historians would struggle to care.
What Is This Beast?
What is this beast we call digital history? Boundary-drawing is a tried-and-true tradition in the humanities, digital or otherwise. It’s theoretically kind of stupid but practically incredibly important, since funding decisions, tenure cases, and similar career-altering forces are at play. If digital history is a type of history, it’s fundable as such, tenurable as such; if it isn’t, it ain’t. What’s more, if what culturomics researchers are doing is also history, their already-well-funded machine can start taking slices of the sad NEH pie.
So “what counts?” is unfortunately important to answer.
This discussion around what is “legitimate history research” is really important, but I’d like to table it for now, because it’s so often conflated with the discussion of what is “legitimate research” sans history. The former question easily overshadows the latter, since academics are mostly just schlubs trying to make a living.
For the last century or so, history and philosophy of science have been smooshed together in departments and conferences. It’s caused a lot of concern. Does history of science need philosophy of science? Does philosophy of science need history of science? What does it mean to combine the two? Is what comes out of the middle even useful?
Weirdly, the question sometimes comes down to “does history and philosophy of science even exist?”. It’s weird because people identify with that combined title, so I published a citation analysis in Erkenntnis a few years back that basically showed that, indeed, there is an area between the two communities, and indeed those people describe themselves as doing HPS, whatever that means to them.
I bring this up because digital history, as many of us practice it, leaves us floating somewhere between public engagement, social science, and history. Culturomics occupies a similar interstitial space, though inching closer to social physics and complex systems.
From this vantage point, we have a couple of options. We can say digital history is just history from a slightly different angle, and try to be evaluated by standard historical measuring sticks—which would make our work easily criticized as not particularly novel. Or we can say digital history is something new, occupying that in-between space—which could render the work unrecognizable to our usual communities.
The either/or proposition is, of course, ludicrous. The best work being done now skirts the line, offering something just novel enough to be surprising, but not so out of traditional historical bounds as to be grouped with culturomics. But I think we need to be more deliberate and organized in this practice, lest we want to be like History and Philosophy of Science, still dealing with basic questions of legitimacy fifty years down the line.
In the short term, this probably means trying not just to avoid the rhetoric of newness, but to actively curtail it. In the long term, it may mean allying with like-minded historians, social scientists, statistical physicists, and complexity scientists to build a new framework of legitimacy that recognizes the forms of knowledge we produce which don’t always align with historiographic standards. As Cassidy Sugimoto and I recently wrote, this often comes with journals, societies, and disciplinary realignment.
The least we can do is steer away from a novelty rhetoric, since what is novel often isn’t history, and what is history often isn’t novel.
Here’s a way of thinking that might get us past this muddle (and I think I agree with the authors that the hype around DH is a mistake): let’s stop branding our scholarship. We don’t need Next Big Things and we don’t need Academic Superstars, whether they are DH Superstars or Theory Superstars. What we do need is to find more democratic and inclusive ways of thinking about the value of scholarship and scholarly communities.
This is relevant here, and good, but tough to reconcile with the earlier post. In an ideal world, without disciplinary brandings, we can all try to be welcoming of works on their own merits, without relying on our preconceived disciplinary criteria. In the present condition, though, it’s tough to see such an environment forming. In that context, maybe a unified digital history “brand” is the best way to stay afloat. This would build barriers against whatever new thing comes along next, though, so it’s a tough question.
Below are some crazy, uninformed ramblings about the least-complex possible way to trick someone into thinking a computer is a human, for the purpose of history research. I’d love some genuine AI/Machine Intelligence researchers to point me to the actual discussions on the subject. These aren’t original thoughts; they spring from countless sci-fi novels and AI research from the ’70s-’90s. Humanists beware: this is super sci-fi speculative, but maybe an interesting thought experiment.
If someone’s chatting with a computer, but doesn’t realize her conversation partner isn’t human, that computer passes the Turing Test. Unrelatedly, if a robot or piece of art is just close enough to reality to be creepy, but not close enough to be convincingly real, it lies in the Uncanny Valley. I argue there is a useful concept in the simplest possible computer which is still convincingly human, and that computer will be at the Turing Point. 1
Forgive my twisting Turing Tests and Uncanny Valleys away from their normal use, for the sake of outlining the Turing Point concept:
A human simulacrum is a simulation of a human, or some aspect of a human, in some medium, which is designed to be as-close-as-possible to that which is being modeled, within the scope of that medium.
A Turing Test winner is any human simulacrum which humans consistently mistake for the real thing.
An occupant of the Uncanny Valley is any human simulacrum which humans consistently doubt as representing a “real” human.
Between the Uncanny Valley and Turing Test winners lies the Turing Point, occupied by the least-sophisticated human simulacrum that can still consistently pass as human in a given medium. The Turing Point is a hyperplane in a hypercube, such that there are many points of entry for the simulacrum to “phase-transition” from uncanny to convincing.
Extending the Turing Test
The classic Turing Test scenario is a text-only chatbot which must, in free conversation, be convincing enough for a human to think it is speaking with another human. A piece of software named Eugene Goostman sort-of passed this test in 2014, convincing a third of judges it was a 13-year-old Ukrainian boy.
There are many possible modes in which a computer can act convincingly human. It is easier to make a convincing simulacrum of a 13-year-old non-native English speaker who is confined to text messages than to make a convincing college professor, for example. Thus the former has a lower Turing Point than the latter.
Playing with the constraints of the medium will also affect the Turing Point threshold. The Turing Point for a flesh-covered robot is incredibly difficult to surpass, since so many little details (movement, design, voice quality, etc.) may place it into the Uncanny Valley. A piece of software posing as a Twitter user, however, would have a significantly easier time convincing fellow users it is human.
The Turing Point, then, is flexible to the medium in which the simulacrum intends to deceive, and the sort of human it simulates.
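To make the definition a little more concrete, here is a rough sketch in Python, not anything rigorous. It assumes we could score each simulacrum with a single "complexity" number and a "pass rate" (the share of judges who took it for human), and that "consistently passes" means clearing some threshold. All of those are simplifying assumptions of mine, and the candidates and numbers are made up purely for illustration. The Turing Point for a medium is then just the least complex candidate that still clears the bar.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Simulacrum:
    name: str
    medium: str        # where it tries to pass, e.g. "text chat" or "twitter"
    complexity: float  # hypothetical sophistication score (higher = more complex)
    pass_rate: float   # hypothetical fraction of judges who took it for human


def turing_point(candidates: Iterable[Simulacrum], medium: str,
                 threshold: float = 0.5) -> Optional[Simulacrum]:
    """Return the least complex simulacrum that still passes in this medium.

    `threshold` stands in for "consistently mistaken for human"; the value
    0.5 is an arbitrary assumption. Returns None if nothing passes.
    """
    passing = [s for s in candidates
               if s.medium == medium and s.pass_rate >= threshold]
    return min(passing, key=lambda s: s.complexity, default=None)


# Invented example data, for illustration only.
candidates = [
    Simulacrum("canned-reply bot", "text chat", 1.0, 0.05),
    Simulacrum("teenage-persona chatbot", "text chat", 3.0, 0.33),
    Simulacrum("elaborate chat system", "text chat", 8.0, 0.60),
    Simulacrum("markov tweet bot", "twitter", 1.5, 0.55),
    Simulacrum("styled tweet bot", "twitter", 4.0, 0.80),
]

chat_point = turing_point(candidates, "text chat")
twitter_point = turing_point(candidates, "twitter")
```

With these invented numbers, the Twitter medium has a lower Turing Point than free text chat, which is the sense in which the threshold depends on the medium and on the sort of human being simulated.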
From Type to Token
Convincing the world a simulacrum is any old human is different than convincing the world it is some specific human. This is the token/type distinction; convincingly simulating a specific person (token) is much more difficult than convincingly simulating any old person (type).
Simulations of specific people are all over the place, even if they don’t intend to deceive. Several Twitter-bots exist as simulacra of Donald Trump, reading his tweets and creating new ones in a similar style. Perhaps imitating Poe’s Law, certain people’s styles, or certain types of media (e.g. Twitter), may provide such a low Turing Point that it is genuinely difficult to distinguish humans from machines.
Put differently, the way some Turing Tests may be designed, humans could easily lose.
It’ll be useful to make up and define two terms here. I imagine the concepts already exist, but couldn’t find them, so please comment if they do so I can use less stupid words:
A type-bot is a machine designed to represent something at the type-level. For example, a bot that can be mistaken for some random human, but not some specific human.
A token-bot is a machine designed to represent something at the token-level. For example, a bot that can be mistaken for Donald Trump.
This all got me thinking, if we reach the Turing Point for some social media personalities (that is, it is difficult to distinguish between their social media presence, and a simulacrum of it), what’s to say we can’t reach it for an entire social media ecosystem? Can we take a snapshot of Twitter and project it several seconds/minutes/hours/days into the future, a bit like a meteorological model?
A few questions and obvious problems:
Much of Twitter’s dynamics are dependent upon exogenous forces: memes from other media, real world events, etc. Thus, no projection of Twitter alone would ever look like the real thing. One can, however, potentially use such a simulation to predict how certain types of events might affect the system.
This is way overkill, and impossibly computationally complex at this scale. You can simulate the dynamics of Twitter without simulating every individual user, because people on average act pretty systematically. That said, for the humanities-inclined, we may gain more insight from the ground-level of the system (individual agents) than macroscopic properties.
This is key. Would a set of plausibly-duplicate Twitter personalities on aggregate create a dynamic system that matches Twitter as an aggregate system? That is, just because the algorithms pass the Turing Test, because humans believe them to be humans, does that necessarily imply the algorithms have enough fidelity to accurately recreate the dynamics of a large scale social network? Or will small unnoticeable differences between the simulacrum and the original accrue atop each other, such that in aggregate they no longer act like a real social network?
The last point is, I think, a theoretically and methodologically fertile one for people working in DH, AI, and Cognitive Science: whether reducing human-appreciable differences between machines and people is sufficient to simulate aggregate social behavior, or whether human-appreciability (i.e., the Turing Test) is a strict enough criterion for making accurate predictions about societies.
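A toy calculation shows why the worry about accrued differences is plausible. This is my own illustrative model, not a claim about how Twitter actually works: assume every post is seen by a fixed number of followers, each of whom reshares it with some fixed probability, as in a simple branching process. The expected number of reshares in generation g is then m^g, where m is followers times reshare probability, and the expected cascade size is the sum over generations. The follower count and probabilities below are invented.

```python
def expected_cascade_size(followers: int, reshare_prob: float,
                          generations: int) -> float:
    """Expected total posts in a cascade under a simple branching model.

    Each post reaches `followers` people, each resharing independently with
    probability `reshare_prob`. The mean number of reshares per post is
    m = followers * reshare_prob, so generation g holds m**g posts on average.
    """
    m = followers * reshare_prob
    return sum(m ** g for g in range(generations + 1))


# Hypothetical "real" users versus bots that are almost, but not quite, the same.
real_users = expected_cascade_size(followers=9, reshare_prob=0.100, generations=50)
token_bots = expected_cascade_size(followers=9, reshare_prob=0.105, generations=50)

individual_gap = 0.105 / 0.100          # difference a judge would have to notice
aggregate_gap = token_bots / real_users  # difference in the system-level outcome
```

In this toy setup the bots differ from the people they imitate by only a few percent in one behavior, a gap a human judge would be unlikely to notice, yet the aggregate cascade size differs by far more than that. Whether real social systems amplify small errors this way is exactly the open question.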
These points aside, if we ever do manage to simulate specific people (even in a very limited scope) as token-bots based on the traces they leave, it opens up interesting pedagogical and research opportunities for historians. Scott Enderle tweeted a great metaphor for this:
@scott_bot In one of the thousand science fiction stories I’ll never write, “nachlass” is the name for an archived consciousness.
Imagine, as a student, being able to have a plausible discussion with Marie Curie, or sitting in an Enlightenment-era salon. 2 Or imagine, as a researcher (if individual Turing Point machines do aggregate well), being able to do well-grounded counterfactual history that works at the token level rather than at the type level.
Turing Point Simulations
Bringing this slightly back into the realm of the sane, the interesting thing here is the interplay between appreciability (a person’s ability to appreciate enough difference to notice something wrong with a simulacrum) and fidelity.
We can specifically design simulation conditions with incredibly low-threshold Turing Points, even for token-bots. That is to say, we can create a condition where the interactions are simple enough to make a bot that acts indistinguishably from the specific human it is simulating.
At the most extreme end, this is obviously pointless. If our system is one in which a person can only answer “yes” or “no” to pre-selected preference questions (“Do you like ice-cream?”), making a bot to simulate that person convincingly would be trivial.
Putting that aside (lest we get into questions of the Turing Point of a set of Turing Points), we can potentially design reasonably simplistic test scenarios that would allow for an easy-to-reach Turing Point while still being historiographically or sociologically useful. It’s sort of a minimization problem in topological optimizations. Such a goal would limit the burden of the simulation while maximizing the potential research benefit (but only if, as mentioned before, the difference between true fidelity and the ability to win a token-bot Turing Test is small enough to allow for generalization).
In short, the concept of a Turing Point can help us conceptualize and build token-simulacra that are useful for research or teaching. It helps us ask the question: what’s the least-complex-but-still-useful token-simulacrum? It’s also kind-of maybe sort-of like Kolmogorov complexity for human appreciability of other humans: that is, the simplest possible representation of a human that is convincing to other humans.
I’ll end by saying, once again, I realize how insane this sounds, and how far-off. And also how much an interloper I am to this space, having never so much as designed a bot. Still, as Bill Hart-Davidson wrote,
@scott_bot I agree, and that scale is more clearly plausible than it all seemed years ago reading Mona Lisa Overdrive
the possibility seems more plausible than ever, even if not soon-to-come. I’m not even sure why I posted this on the Irregular, but it seemed like it’d be relevant enough to some regular readers’ interests to be worth spilling some ink.
The name itself is maybe too on-the-nose, being a pun for turning point and thus connected to the rhetoric of singularity, but ¯\_(ツ)_/¯ ↩
Yes yes I know, this is SecondLife all over again, but hopefully much more useful. ↩