Connecting the Dots

This is the incredibly belated transcript of my HASTAC 2015 keynote. Many thanks to the organizers for inviting me, and to my fellow participants for all the wonderful discussions. The video and slides are also online. You can find citations to some of the historical illustrations and many of my intellectual inspirations here. What I said and what I wrote probably don’t align perfectly.

When you’re done reading this, you should read Roopika Risam’s closing keynote, which connects surprisingly well with this, though we did not plan it.


If you take a second to expand and disentangle “HASTAC”, you get a name of an organization that doubles as a fairly strong claim about the world: that Humanities, Arts, Science, and Technology are separate things, that they probably aren’t currently in alliance with one another, and that they ought to form an alliance.

This intention is reinforced in the theme of this year’s conference: “The Art and Science of Digital Humanities.” Here again we get the four pillars: humanities, arts, science, and technology. In fact, bear with me as I read from the CFP:

We welcome sessions that address, exemplify, and interrogate the interdisciplinary nature of DH work. HASTAC 2015 challenges participants to consider how the interplay of science, technology, social sciences, humanities, and arts are producing new forms of knowledge, disrupting older forms, challenging or reifying power relationships, among other possibilities.

Here again is that implicit message: disciplines are isolated, and their interplay can somehow influence power structures. As with a lot of digital humanities and cultural studies, there’s also a hint of activism: that building intentional bridges is a beneficial activity, and we’re organizing the community of HASTAC around this goal.


This is what I’ll be commenting on today. First, what does disciplinary isolation mean? I put this historically, and argue that we must frame disciplinary isolation in a rhetorical context.

This brings me to my second point about ontology. It turns out the way we talk about isolation is deeply related to the way we think about knowledge, the way we illustrate it, and ultimately the shape of knowledge itself. That’s ontology.

My third point brings us back to HASTAC: that we represent an intentional community, and this intent is to build bridges which positively affect the academy and the world.

I’ll connect these three strands by arguing that we need a map to build bridges, and we need to carefully think about the ontology of knowledge to draw that map. And once we have a map, we can use it to design a better territory.

In short, this plenary is a call-to-action. It’s my vocal support for an intentionally improved academy, my exploration of its historical and rhetorical underpinnings, and my suggestions for affecting positive change in the future.

Matt Might’s Illustrated Guide to the Ph.D.
Let’s begin at the beginning. With isolation.

Stop me if you’ve heard this one before:

Within this circle is the sum of all human knowledge. It’s nice, it’s enclosed, it’s bounded. It’s a comforting thought, that everything we’ve ever learned or created sits comfortably inside these boundaries.

This blue dot is you, when you’re born. It’s a beautiful baby picture. You’ve got the whole world ahead of you, an entire universe to learn, just waiting. You’re at the center because you have yet to reach your proverbial hand out in any direction and begin to learn.

Matt Might’s Illustrated Guide to the Ph.D.

But time passes and you grow. You go to high school, you take your liberal arts and sciences, and you slowly expand your circle into the great known. Rounding out your knowledge, as it were.

Then college happens! Oh, those heady days of youth. We all remember it, when the shape of our knowledge started leaning tumorously to one side. The ill-effects of specialization and declaring a major, I suspect.

As you complete a master’s degree, your specialty pulls your knowledge inexorably towards the edge of the circle of the known. You’re not a jack of all trades anymore. You’re an expert.

Matt Might’s Illustrated Guide to the Ph.D.

Then your PhD advisor yells at you to focus and get even smaller. So you complete your qualifying exams and reach the edge of what’s known. What lies beyond the circle? Let’s zoom in and see!

Matt Might’s Illustrated Guide to the Ph.D.

You’ve reached the edge. The end of the line. The sum of all human knowledge stops here. If you want to go further, you’ll need to come up with something new. So you start writing your dissertation.

That’s your PhD. Right there, at the end of the little arrow.

You did it. Congratulations!

You now know more about less than anybody else in the world. You made a dent in the circle, you pushed human knowledge out just a tiny bit further, and all it cost you was your mental health, thirty years of your life, and the promise of a certain future. …Yay?

Matt Might’s Illustrated Guide to the Ph.D.
So here’s the new world that you helped build, the new circle of knowledge. With everyone in this room, I bet we’ve managed to make a lot of dents. Maybe we’ve even managed to increase the circle’s radius a bit!

Now, what I just walked us all through is Matt Might’s illustrated guide to the Ph.D. It made its rounds on the internet a few years back, and it was pretty popular.

And, though I’m being snarky about it, it’s a pretty uplifting narrative. It provides that same dual feeling of insignificance and importance that you get when you stare at the Hubble Ultra Deep Field. You know the picture, right?

Hubble Ultra Deep Field

There are 10,000 galaxies on display here, each with a hundred billion stars. To think that we, humans, from our tiny vantage point on Earth, could see so far and so much because of the clever way we shape glass lenses? That’s really cool.

And saying that every pinprick of light we see is someone else’s PhD? Well, that’s a pretty fantastic metaphor. Makes getting the PhD seem worth it, right?

Dante and the Early Astronomers; M. A. Orr (Mrs. John Evershed), 1913

It kinda reminds me of the cosmological theories of some of our philosophical ancestors.

The cosmos (Greek for “Order”), consisted of concentric, perfectly layered spheres, with us at the very center.

The cosmos was bordered by celestial fire, the light from heaven, and stars were simply pin-pricks in a dark curtain which let the heavenly light shine through.

Flammarion

So, if we beat Matt Might’s PhD metaphor to death, each of our dissertations is poking holes in the cosmic curtain, letting the light of heaven shine through. And that’s a beautiful thought, right? Enough pinpricks, and we’ll all be bathed in light.

Expanding universe.

But I promised we’d talk about isolation, and even if we have to destroy this metaphor to get there, we’ll get there.

The universe is expanding. That circle of knowledge we’re pushing the boundaries of? It’s getting bigger too. And as it gets larger, things that were once close get further and further apart. You and I and Alpha Centauri were all neighbors at the big bang, but things have changed since then, and the star that was once our neighbor is now over four light-years away.

Atlas of Science, Katy Borner (2010).

In short, if we’re to take Matt Might’s PhD model as accurate, then the result of specialization is inexorable isolation. Let’s play this out.

Let’s say two thousand years ago, a white dude from Greece invented science. He wore a beard.

[Note for readers: the following narrative is intentionally awful. Read on and you’ll see why.]


He and his bearded friends created pretty much every discipline we’re familiar with at Western universities: biology, cosmology, linguistics, philosophy, administration, NCAA football, you name it.

Over time, as Ancient Greek beards finished their dissertations, the boundaries of science expanded in every direction. But the sum of human knowledge was still pretty small back then, so one beard could write many dissertations, and didn’t have to specialize in only one direction. Polymaths still roamed the earth.


Fast forward a thousand years or so. Human knowledge had expanded in the interim, and the first European universities branched into faculties: theology, law, medicine, arts.

Another few hundred years, and we’ve reached the first age of information overload. It’s barely possible to be a master of all things, and though we remember scholars and artists known for their amazing breadth, this breadth is becoming increasingly difficult to manage.

We begin to see the first published library catalogs, since the multitude of books required increasingly clever and systematic cataloging schemes. If you were to walk through Oxford in 1620, you’d see a set of newly-constructed doors with signs above them denoting their disciplinary uses: music, metaphysics, history, moral philosophy, and so on.

The encyclopedia of Diderot & D’alembert

Time goes on a bit further, the circle of knowledge expands, and specialization eventually leads to fracturing.

We’ve reached the age of these massive hierarchical disciplinary schemes, with learning branching in every direction. Our little circle has become unmanageable.

A few more centuries pass. Some German universities perfect the art of specialization, and they pass it along to everyone else, including the American university system.

Within another 50 years, C. P. Snow famously invoked the “Two Cultures” of humanities and sciences.

And suddenly here we are


On the edge of our circle, pushing outward, with every new dissertation expanding our radius, and increasing the distance to our neighbors.

Basically, the inevitable growth of knowledge results in an equally inevitable isolation. This is the culmination of super-specialization: a world where the gulf between disciplines is impossible to traverse, filled with language barriers, value differences, and intellectual incommensurabilities. You name it.


By this point, 99% of the room is probably horrified. Maybe it’s by the prospect of an increasingly isolated academy. More likely the horror’s at my racist, sexist, whiggish, Eurocentric account of the history of science, or at my absurdly reductivist and genealogical account of the growth of knowledge.

This was intentional, and I hope you’ll forgive me, because I did it to prove a point: the power of visual rhetoric in shaping our thoughts. We use the word “imagine” to describe every act of internal creation, whether or not it conforms to the root word of “image”. In classical and medieval philosophy, thought itself was a visual process, and complex concepts were often illustrated visually in order to help students understand and remember. Ars memoriae, it was called.

And in ars memoriae, concepts were not only given visual form, they were given order. This order wasn’t merely a clever memorization technique, it was a reflection on underlying truths about the relationship between concepts. In a sense, visual representations helped bridge human thought with divine structure.

This is our entrance into ontology. We’ve essentially been talking about interdisciplinarity for two thousand years, and always alongside a visual rhetoric about the shape, or ontology, of knowledge. Over the next 10 minutes, I’ll trace the interwoven histories of ontology, illustrations, and rhetoric of interdisciplinarity. This will help contextualize our current moment, and the intention behind meeting at a conference like this one. It should, I hope, also inform how we design our community going forward.

Let’s take a look at some alternatives to the Matt Might PhD model.

Diagrams of Knowledge

Countless cultural and religious traditions associate knowledge with trees; indeed, in the Bible, the fruit of one tree is knowledge itself.

During the Roman Empire and the Middle Ages, the sturdy metaphor of trees provided a sense of lineage and order to the world that matched perfectly with the neatly structured cosmos of the time. Common figures of speech we use today like “the root of the problem” or “branches of knowledge” betray the strength with which we connected these structures to one another. Visual representations of knowledge, obviously, were also tree-like.

See, it’s impossible to differentiate the visual from the essential here. The visualization wasn’t a metaphor, it was an instantiation of essence. There are three important concepts that link knowledge to trees, which at that time were inseparable.

One: putting knowledge on a tree implied a certain genealogy of ideas. What we discovered and explored first eventually branched into more precise subdisciplines, and the history of those branches is represented on the tree. This is much like any family tree you or I would put together with our parents and grandparents and so forth. The tree literally shows the historical evolution of concepts.

Two: putting knowledge on a tree implied a specific hierarchy that would, by the Enlightenment, become entwined with how we understood the universe. Philosophy separates into the theoretical and the practical; basic math into geometry and arithmetic. This branching hierarchy gave an importance to the root of the tree, be that root physics or God or philosophy or man, and that importance decreased as you reached the farther limbs. It also implied an order of necessity: the branches of math could not exist without the branch of philosophy they stemmed from. This is why people today still think of physics as the most important discipline.

Three: As these trees were represented, there was no difference between the concept of a branch of knowledge, the branch of knowledge itself, and the object of study of that branch of knowledge. The relationship of physics to chemistry isn’t just genealogical or foundational; it’s actually transcendent. The conceptual separation of genealogy, ontology, and transcendence would not come until much later.

It took some time for the use of the branching tree as a metaphor for knowledge to take hold, competing against other visual and metaphorical representations, but once it did, it ruled victorious for centuries. The trees spread and grew until they collapsed under their own weight by the late nineteenth century, leaving a vacuum to be filled by faceted classification systems and sprawling network visualizations. The loss of a single root as the source of knowledge signaled an epistemic shift in how knowledge is understood, the implications of which are still unfolding in present-day discussions of interdisciplinarity.

By visualizing knowledge itself as a tree, our ancestors reinforced both an epistemology and a phenomenology of knowledge, ensuring that we would think of concepts as part of hierarchies and genealogies for hundreds of years. As we slowly moved away from strictly tree-based representations of knowledge in the last century, we have also moved away from the sense that knowledge forms a strict hierarchy. Instead, we now believe it to be a diffuse system of occasionally interconnected parts.

Of course, the divisions of concepts and bodies of study have no natural kind. There are many axes against which we may compare biology to literature, but even the notion of an axis of comparison implies a commonality against which the two are related which may not actually exist. Still, we’ve found the division of knowledge into subjects, disciplines, and fields a useful practice since before Aristotle. The metaphors we use for these divisions influence our understanding of knowledge itself: structured or diffuse; overlapping or separate; rooted or free; fractals or divisions; these metaphors inform how we think about thinking, and they lend themselves to visual representations which construct and reinforce our notions of the order of knowledge.

Arbor Scientiae, late thirteenth century, Ramon Llull.
Given all this, it should come as no surprise that medieval knowledge was shaped like a tree – God sat at the root, and the great branching of knowledge provided a transcendental order of things. Physics, ethics, and biology branched further and further until tiny subdisciplines sat at every leaf. One important aspect of these illustrations was unity – they were whole and complete, and even more, they were all connected. This mirrors pretty closely that circle from Matt Might.

Christophe de Savigny’s Tableaux: Accomplis de tous les arts liberaux, 1587

Speaking of that circle I had up earlier, many of these branching diagrams had a similar feature. Notice the circle encompassing this illustration, especially the one on the left here: it’s a chain. The chain locks the illustration down: it says, there are no more branches to grow.

This and similar illustrations were also notable for their placement. This was an index to a book, an early encyclopedia of sorts – you use the branches to help you navigate through descriptions of the branches of knowledge. How else should you organize a book of knowledge than by its natural structure?

Bacon’s Advancement of Learning

We start seeing some visual, rhetorical, and ontological changes by the time of Francis Bacon, who wrote “the distributions and partitions of knowledge are […] like branches of a tree that meet in a stem, which hath a dimension and quantity of entireness and continuance, before it come to discontinue and break itself into arms and boughs.”

This highly influential book broke with the tradition in three ways:

  1. It broke the “one root” model of knowledge.
  2. It shifted the system from closed to open, capable of growth and change.
  3. It detached natural knowledge from divine wisdom.

Bacon’s uprooting of knowledge, dividing it into history, poesy, and philosophy, each with its own root, was an intentional rhetorical strategy. He used it to argue that natural philosophy should be explored at the expense of poesy and history. Philosophy, what we now call science, was now a different kind of knowledge, worthier than the other two.

And doesn’t that feel a lot like today?

Bacon’s system also existed without an encompassing chain, embodying the idea that learning could be advanced; that the whole of knowledge could not be represented as an already-grown tree. There was no complete order of knowledge, because knowledge changes.

And, by being an imperfect, incomplete entity, without union, knowledge was notably separated from divine wisdom.

Kircher’s Philosophical tree representing all branches of knowledge, from Ars Magna Sciendi (1669), p. 251.

Of course, divinity and transcendence weren’t wholly exorcised from these ontological illustrations: Athanasius Kircher put God on the highest branch, feeding the tree’s growth. (Remember, from my earlier circle metaphor, the importance of poking holes in the fabric of the cosmos to let the light of heaven shine through?) Descartes, too, continued to describe knowledge as a tree, whose roots were reliant on divine existence.

Chambers’ Cyclopædia

But even without the single trunk, without God, without unity, the metaphors were still ontologically essential, even into the 18th century. This early encyclopedia by Ephraim Chambers uses the tree as an index, and Chambers writes:

“the Origin and Derivation of the several Parts, and the relation in which [the disciplines] stand to their common Stock and to each other; will assist in restoring ‘em to their proper Places.”

Their proper places. This order is still truth with a capital T.

The encyclopedia of Diderot & D’alembert

It wasn’t until the mid-18th century, with Diderot and d’Alembert’s encyclopedia, that serious thinkers started actively disputing the idea that these trees were somehow indicative of the essence of knowledge. Even they couldn’t escape using trees, however, introducing their encyclopedia by saying “We have chosen a division which has appeared to us most nearly satisfactory for the encyclopedic arrangement of our knowledge and, at the same time, for its genealogical arrangement.”

Even if the tree wasn’t the essence of knowledge, it still represented a possible truth about the genealogy of ideas. It took until half a century later, with the Encyclopedia Britannica, for the editors to do away with tree illustrations entirely and write that the world was “perpetually blended in almost every branch of human knowledge”. (Notice they still use the word branch.) By now, a philosophical trend that began with Bacon was taking form through the impossibility of organizing giant libraries and encyclopedias: that there was no unity of knowledge, no implicit order, and no viable hierarchy.

Banyan tree
It took another century to find a visual metaphor to replace the branching tree. Herbert Spencer wrote that the branches of knowledge “now and again re-unite […], they severally send off and receive connecting growths; and the intercommunion is ever becoming more frequent, more intricate, more widely ramified.” Classification theorist S.R. Ranganathan compared knowledge to the Banyan tree of his home country of India, whose roots grow both from the bottom up and from the top down.

Otlet 1937

The 20th century saw a wealth of new shapes of knowledge. Paul Otlet conceived of a sort of universal network, connected through individuals’ thought processes. H.G. Wells shaped knowledge very similarly to Matt Might’s illustrated PhD from earlier: starting with a child’s experience of learning and branching out. These were both interesting developments, as they rhetorically placed the ontology of knowledge in the realm of the psychological or the social: driven by people rather than by some underlying objective reality about conceptual relationships.

Porter’s 1939 Map of Physics
Around this time there was a flourishing of visual metaphors to fill the vacuum left by the loss of the sturdy tree. There was, uncoincidentally, a flourishing of uses for these illustrations. Some, like this map, were educational and historical, teaching students how the history of physics split and recombined like water flowing through rivers and tributaries. Others, like the illustration to the right, showed how the conceptual relationships between knowledge domains differed from and overlapped with library classification schemes and literature finding aids.

Small & Garfield, 1985

By the 80s, we start seeing a slew of the illustrations we’re all familiar with: those sexy sexy network spaghetti-and-meatball graphs. We often use them to illustrate citation chains, and the relationship between academic disciplines. These graphs, so popular in the 21st century, go hand-in-hand with the ontological baggage we’re used to: that knowledge is complex, unrooted, interconnected, and co-constructed. This fits well with the current return to a concept we’d mostly left in the 19th century: that knowledge is a single, growing unit, that it’s consilient, that everyone is connected. It’s a return to the Republic of Letters from C. P. Snow’s split of the Two Cultures.

It also notably departs from genealogical, transcendental, and even conceptual discussions of knowledge. These networks, broadly construed, are social representations, and while those relationships may often align with conceptual ones, concepts are not what drive the connections.
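An aside for the programmers in the room: this ontological shift from tree to network shows up even at the level of data structures. A tree needs a root; a network is just a bag of edges, and “importance” becomes something you compute rather than a position you inherit. A minimal sketch in Python, with disciplines and edges invented purely for illustration:

```python
# A toy "spaghetti-and-meatballs" graph: nodes are disciplines, and an edge
# means "frequently cited together." The data is invented for illustration.
edges = [
    ("physics", "chemistry"),
    ("chemistry", "biology"),
    ("biology", "literature"),
    ("literature", "history"),
    ("history", "philosophy"),
    ("philosophy", "physics"),  # the loop closes: no root, no hierarchy
]

# Build an undirected adjacency list from the edge list.
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

# In a tree, importance is positional: the closer to the root, the more
# fundamental. In a network it has to be computed; degree (the number of
# neighbors) is the simplest such measure.
degree = {node: len(neighbors) for node, neighbors in graph.items()}
print(degree)  # here every node has degree 2: a ring with no privileged center
```

Notice that nothing in this structure singles out physics, or God, or philosophy: whatever centrality a node has is an emergent property of its connections, which is exactly the rhetorical shift these illustrations encode.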

Fürbringer’s Illustration of Bird Evolution, 1888

Interestingly, there is precedent for these sorts of illustrations in the history of evolutionary biology. In the late 19th century, illustrators and scientists began asking what it would look like if you took a slice from the evolutionary tree – or, what does the tree of life look like when you’re looking at it from the top down?

What you get is a visual structure very similar to the network diagrams we’re now used to. And often, if you probe those making the modern visualizations, they will weave a story about the history of these networks that is reminiscent of branching evolutionary trees.

There’s another set of epistemological baggage that comes along with these spaghetti-and-meatball-graphs. Ben Fry, a well-known researcher in information visualization, wrote:

“There is a tendency when using [networks] to become smitten with one’s own data. Even though a graph of a few hundred nodes quickly becomes unreadable, it is often satisfying for the creator because the resulting figure is elegant and complex and may be subjectively beautiful, and the notion that the creator’s data is ‘complex’ fits just fine with the creator’s own interpretation of it. Graphs have a tendency of making a data set look sophisticated and important, without having solved the problem of enlightening the viewer.”

Actually, were any of you here at last night’s Pink Floyd light show in the planetarium? They’re a lot like that. [Yes, readers, HASTAC put on a Pink Floyd light show.]

And this is where we are now.


Which brings us back to the outline, and HASTAC. Cathy Davidson has often described HASTAC as a social network, which is (at least on the web) always an intentionally-designed medium. Its design grants certain affordances to users: is it easier to communicate individually or in groups? What types of communities, events, or content are prioritized? These are design decisions that affect how the HASTAC community functions and interacts.

And the design decisions going into HASTAC are informed by its intent, so what is that intent? In their groundbreaking 2004 manifesto in the Chronicle, Cathy Davidson and David Goldberg wrote:

“We believe that a new configuration in the humanities must be championed to ensure their centrality to all intellectual enterprises in the university and, more generally, to understanding the human condition and thereby improving it; and that those intellectual changes must be supported by new institutional structures and values.”

This was a HASTAC rallying cry: how can the humanities constructively inform the world? Notice especially how they called for “New Institutional Structures.”

Remember earlier, how I talked about the problem of isolation? While my story about it was problematic, that doesn’t make disciplinary superspecialization any less real a problem. For all its talk of interdisciplinarity, academia is averse to synthesis on many fronts, superspecialization being just one of them. A dissertation based on synthesis, for example, is much less likely to get through a committee than a thorough single intellectual contribution to one specific field.

The academy is also weirdly averse to writing for public audiences. Popular books won’t get you tenure. But every discipline is a popular audience to most other disciplines: you wouldn’t talk to a chemist about history the same way you’d talk to a historian. Synthetic and semi-public work is exactly the sort of work that will help with HASTAC’s goal of a truly integrated and informed academy for social good, but the cards are stacked against it. Cathy and David hit the nail on the head when they target institutional structures as a critical point for improvement.

This is where design comes in.

Richmond, 1954

Recall again the theme this year: The Art and Science of Digital Humanities. I propose we take the next few days to think about how we can use art and science to make HASTAC even better at living up to its intent. That is, knowing what we do about collaboration, about visual rhetoric, about the academy, how can we design an intentional community to meet its goals? Perusing the program, it looks like most of us will already be discussing exactly this, but it’s useful to put a frame around it.

When we talk about structure and the social web, there are many great examples we may learn from. One such example is that of Tara McPherson and her colleagues, in designing the web publishing platform Scalar. As opposed to WordPress, its cousin in functionality, Scalar was designed with feminist and humanist principles in mind, allowing for more expressive, non-hierarchical “pathways” through content.

When talking of institutional, social, and web-based structures, we can also take lessons from history. In Early Modern Europe, the great network of information exchange known as the Republic of Letters was a shining example of the influence of media structures on innovation. Scholars would often communicate through “hubs”, personified in people nicknamed things like “the mailbox of Europe”, who helped distribute new research incredibly efficiently through their vast webs of social ties. These hubs were essential to what’s been called the scientific revolution, and without their structural role, it’s unlikely you’d see references to a scientific revolution in 17th-century Europe.

Similarly, at that time, the Atlantic slave trade was wreaking untold havoc on the world. For all the ills it caused, we at least can take some lessons from it in the intentional design of a scholarly network. There existed a rich exchange of medical knowledge between Africans and indigenous Americans that bypassed Europe entirely, taking an entirely different sort of route through early modern social networks.

If we take the present day, we see certain affordances of social networks similarly used to subvert or reconfigure power structures, as with the many revolutions in North Africa and the Middle East, or the current activist events taking place around police brutality and racism in the US. Similar tactics that piggy-back on network properties are used by governments to spread propaganda, ad agencies to spread viral videos, and so forth.

The question, then, is how we can intentionally design a community, using principles we learn from historical action as well as modern network science, to subvert institutional structures in the manner raised by Cathy and David.

Certainly we also ought to take into account the research going into collaboration, teamwork, and group science. We’ve learned, for example, that teams with diverse backgrounds often come up with more creative solutions to tricky problems. We’ve learned that many small, agile groups often outperform large groups with the same number of people, and that informal discussion outside the work-space contributes in interesting ways to productivity. Many great lessons can be found in Michael Nielsen’s book, Reinventing Discovery.

We can use these historical and lab-based examples to inform the design of social networks. HASTAC already works toward this goal through its scholars program, but there are more steps that may be taken, such as strategically seeking out scholars from underrepresented parts of the network.

So this covers the science, but what about the art?

Well, I spent the entire middle half of this talk discussing how visual rhetoric is linked to ontological metaphors of knowledge. The tree metaphor of knowledge, for example, was so strongly held that it fooled Descartes into breaking his claims of mind-body dualism.

So here is where the artists in the room can also fruitfully contribute to the same goal: by literally designing a better infrastructure. Visually. Illustrations can be remarkably powerful drivers of reconceptualization, and we have the opportunity here to affect changes in the academy more broadly.

One of the great gifts of the social web, at least when it’s designed well, is its ability to let nodes on the farthest limbs of the network still wield remarkable influence over the whole structure. This is why viral videos, kickstarter projects, and cats playing pianos can become popular without “industry backing”. And the decisions we make in creating illustrations, in fostering online interactions, in designing social interfaces, can profoundly affect the way those interactions reinforce, subvert, or sidestep power structures.

So this is my call to the room: let’s revisit the discussion about designing the community we want to live in.

 

Thanks very much.

Down the Rabbit Hole

WHEREIN I get angry at the internet and yell at it to get off my lawn.

You know what’s cool? Ryan Cordell and friends’ Viral Texts project. It tracks how 19th-century U.S. newspapers used to copy texts from each other, little snippets of news or information, and republish them in their own publications. A single snippet of text could wind its way all across the country, sometimes changing a bit like a game of telephone, rarely-if-ever naming the original author.

Which newspapers copied from one another, from the Viral Texts project.

Isn’t that a neat little slice of journalistic history? Different copyright laws, different technologies of text, different constraints of the medium, they all led to an interesting moment of textual virality in 19th-century America. If I weren’t a historian who knew better, I’d call it something like “quaint” or “charming”.

You know what isn’t quaint or charming? Living in the so-called “information age”, where everything is intertwingled, with hyperlinks and text costing pretty much zilch, and seeing the same gorram practices.

What follows is a rant. They say never to blog in anger. But seriously.

Inequality in Science

Tonight Alex Vespignani, notable network scientist, tweeted a link to an interesting-sounding study about inequality in scientific publishing. In Quartz! I like Quartz, it’s where Christopher Mims used to post awesome science things. Part of their mission statement reads:

In all that we do at Quartz, we embrace openness: open source code, an open newsroom, and open access to the data behind our journalism.

Pretty cool, right?

Anyway, here’s the tweet:

It links to this article on a “map of the world’s scientific research”. Because Vespignani tweeted it, I took it seriously (yes yes, I know rt≠endorsement), and read the article. It describes a cartogram of scientific research publications which shows how the U.S. and Western Europe (and a bit of China) dominate the research world, making the point that such a disparity is “disturbingly unequal”.

Map of scientific research, by how many published articles are produced in a country, pulled from qz.com

“What’s driving the inequality?” they ask. Money & tech play a big role. So does what counts as “high impact” in science. What’s worse, the journalist writes,

In the worst cases, the global south simply provides novel empirical sites and local academics may not become equal partners in these projects about their own contexts.

The author points out an issue with the data: it only covers journals, not monographs, grey literature, edited volumes, etc., which often excludes the humanities and social sciences. The author also raises the issue of journal paywalls and how they decrease access for researchers in countries without large research budgets. But we need to do better on “open dissemination”, the article claims.

Sources

Hey, that was a good read! I agree with everything the author said. What’s more, it speaks to my research, because I’ve done a fair deal of science mapping myself at the Cyberinfrastructure for Network Science Center under Katy Börner. Great, I think, let’s take a look at the data they’re using, given Quartz’s mission statement about how they always use open data.

I want to see the data because I know a lot of scientific publication indexing sites do a poor job of indexing international publications, and I want to see how it accounts for that bias. I look at the bottom of the page.

Crap.

This post originally appeared at The Conversation. Follow @US_conversation on Twitter. We welcome your comments at ideas@qz.com.

Alright, no biggie, time to look at the original article on The Conversation, a website whose slogan is “Academic rigor, journalistic flair”. Neat, academic rigor, I like the sound of that.

I scroll to the bottom, looking for the source.

A longer version of this article originally appeared on the London School of Economics’ Impact Blog.

Hey, the LSE Impact blog! They usually publish great stuff surrounding metrics and the like. Cool, I’ll click the link to read the longer version. The author writes something interesting right up front:

What would it take to redraw the knowledge production map to realise a vision of a more equitable and accurate world of knowledge?

A more accurate world of knowledge? Was this map inaccurate in a way the earlier articles didn’t report? I read on.

Well, this version of the article goes a little further, saying that people in the global south aren’t always publishing in “international” journals. That’s getting somewhere; maybe the map only shows “international journals”! (Though she never actually makes that claim.) Interestingly, the author writes of literature in the global south:

Even when published, this kind of research is often not attributed to its actual authors. It has the added problem of often being embargoed, with researchers even having to sign confidentiality agreements or “official secrets acts” when they are given grants. This is especially bizarre in an era where the mantra of publically funded research being made available to the public has become increasingly accepted.

Amen to that. Authorship information and openness all the way!

So who made this map?

Oh, the original article (though not the one in Quartz or The Conversation) has a link right up front to something called “The World of Science”. The link doesn’t actually take you to the map pictured; it just takes you to a website called worldmapper that’s filled with maps, letting you fend for yourself. That’s okay, my google-fu is strong.

www.worldmapper.org

I type “science” in the search bar.

Found it! Map #205, created by no-author-name-listed. The caption reads:

Territory size shows the proportion of all scientific papers published in 2001 written by authors living there.

Also, it only covers “physics, biology, chemistry, mathematics, clinical medicine, biomedical research, engineering, technology, and earth and space sciences.” I dunno about you, but I can name at least 2.3 other types of science, but that’s cool.

In tiny letters near the bottom of the page, there are a bunch of options, including the ability to see the poster or download the data in Excel.

SUCCESS. ish.

Map of Science Poster from worldmapper.org

Ahhhhh I found the source! I mean, it took a while, but here it is. You apparently had to click “Open PDF poster, designed for printing.” It takes you to a 2006 poster, which notes it was made by the SASI Group from Sheffield and Mark Newman, famous and awesome complex systems scientist from Michigan. An all-around well-respected dude.

To recap, that’s a 7/11/2015 tweet, pointing to a 7/11/2015 article on Quartz, pointing to a 7/8/2015 article on The Conversation, pointing to a 4/29/2013 article on the LSE Impact Blog, pointing to a website made Thor-knows-when, pointing to a poster made in 2006 with data from 2001. And only the poster cites the name of the creative team who originally made the map. Blood and bloody ashes.

Intermission

Please take a moment out of your valuable time to watch this video clip from the BBC’s television adaptation of Douglas Adams’s Hitchhiker’s Guide to the Galaxy. I’ll wait.

If you’re hard-of-hearing, read some of the transcript instead.

What I’m saying is, the author of this map was “on display at the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying beware of the leopard.”

The Saga Continues

Okay, at least I now can trust the creation process of the map itself, knowing Mark Newman had a hand in it. What about the data?

Helpfully, worldmapper.org has a link to the data as an Excel Spreadsheet. Let’s download and open it!

Frak. Frak frak frak frak frak.

My eyes.

Excel data for the science cartogram from worldmapper.org

Okay Scott. Deep breaths. You can brave the unicornfarts color scheme and find the actual source of the data. Be strong.

“See the technical notes” it says. Okay, I can do that. It reads:

Nearly two thirds of a million papers were published in enumerated science journals in 2001

Enumerated science journals? What does enumerated mean? Whatever, let’s read on.

The source of this data is the World Bank’s 2005 World Development Indicators, in the series on Scientific and technical journal articles (IP.JRN.ARTC.SC).

Okay, sweet, IP.JRN.ARTC.SC at the World Bank. I can Google that!

It brings me to the World Bank’s site on Scientific and technical journal articles. About the data it says:

Scientific and technical journal articles refer to the number of scientific and engineering articles published in the following fields: physics, biology, chemistry, mathematics, clinical medicine, biomedical research, engineering and technology, and earth and space sciences

Yep, knew that already, but it’s good to see the sources agreeing with each other.

I look for the data source to no avail, but eventually do see a small subtitle “National Science Foundation, Science and Engineering Indicators.”

Alright /me *rolls sleeves*, IRC-style.

Eventually, through the Googles, I find my way to what I assume is the original data source website, although at this point who the hell knows? NSF Science and Engineering Indicators 2006.

Want to know what I find? A 1,092-page report (honestly, see the pdfs, volumes 1 & 2) within which, presumably, I can find exactly what I need to know. In the 1,092-page report.

I start with Chapter 5: Academic Research and Development. Seems promising.

Three-quarters-of-the-way-down-the-page, I see it. It’s shimmering in blue and red and gold to my Excel-addled eyes.

S&E

Could this be it? Could this be the data source I was searching for, the Science Citation Index and the Social Sciences Citation Index? It sounds right! Remember the technical notes which states “Nearly two thirds of a million papers were published in enumerated science journals in 2001?” That fits with the number in the picture above! Let’s click on the link to the data.

There is no link to the data.

There is no reference to the data.

That’s OKAY. WE’RE ALRIGHT. THERE ARE DATA APPENDICES IT MUST BE THERE. EVEN THOUGH THIS IS A REAL WEBSITE WITH HYPERTEXT LINKS AND THEY DIDN’T LINK TO DATA IT’S PROBABLY IN THE APPENDICES RIGHT?

Do you think the data are in the section labeled “Tables” or “Appendix Tables“? Don’t you love life’s little mysteries?

(Hint: I checked. After looking at 14 potential tables in the “Tables” section, I decided it was in the “Appendix Tables” section.)

Success! The World Bank data is from Appendix Table 5-41, “S&E articles, by region and country/economy: 1988–2003”.

Wait a second, friends, this can’t be right. If this is from the Science Citation Index and the Social Science Citation Index, then we can’t really use these metrics as a good proxy for global scientific output, because the criteria for national inclusion in the index is apparently kind of weird and can skew the output results.

Also, and let me be very clear about this,

This dataset actually covers both science and social science. It is, you’ll recall, the Science Citation Index and the Social Sciences Citation Index. [edit: at least as far as I can tell. Maybe they used different data, but if they did, it’s World Bank’s fault for not making it clear. This is the best match I could find.]

In Short

Which brings us back to Do. The article on Quartz made (among other things) two claims: that the geographic inequality of scientific output is troubling, and that the map really ought to include social scientific output.

And I agree with both of these points! And all the nuanced discussion is respectable and much-needed.

But by looking at the data, I just learned that A) the data the map draws from is not really a great representation of global output, and B) social scientific output is actually included.

I leave you with the first gif I’ve ever posted on my blog:

source: http://s569.photobucket.com/user/SuperFlame64/media/kramer_screaming.gif.html
real source: Seinfeld. Seriously, people.

You know what’s cool? Ryan Cordell and friends’ Viral Texts project. It tracks how 19th-century U.S. newspapers used to copy texts from each other, little snippets of news or information, and republish them in their own publications. A single snippet of text could wind its way all across the country, sometimes changing a bit like a game of telephone, rarely-if-ever naming the original author.

—————————————————————————————————

(p.s. I don’t blame the people involved, doing the linking. It’s just the tumblr-world of 19th century newspapers we live in.)

[edit: I’m noticing some tweets are getting the wrong idea, so let me clarify: this post isn’t a negative reflection on the research therein, which is needed and done by good people. It’s frustration at the fact that we write in an environment that affords full references and rich hyperlinking, and yet we so often revert to context-free tumblr-like reblogging which separates text from context and data. We’re reverting to the affordances of 18th century letters, 19th century newspapers, 20th century academic articles, etc., and it’s frustrating.]

[edit 2: to further clarify, two recent tweets:

]

Acceptances to Digital Humanities 2015 (part 2)

Had enough yet? Too bad! Full-ahead into my analysis of DH2015, part of my 6,021-part series on DH conference submissions and acceptances. If you want more context, read the Acceptances to DH2015 part 1.

tl;dr

This post’s about the topical coverage of DH2015 in Australia. If you’re curious about how the landscape compares to previous years, see this post. You’ll see a lot of text, literature, and visualizations this year, as well as archives and digitisation projects. You won’t see a lot of presentations in other languages, or presentations focused on non-text sources. Gender studies is pretty much nonexistent. If you want to get accepted, submit pieces about visualization, text/data, literature, or archives. If you want to get rejected, submit pieces about pedagogy, games, knowledge representation, anthropology, or cultural studies.

Topical analysis

I’m sorry. This post is going to contain a lot of giant pictures, because I’m in the mountains of Australia and I’d much rather see beautiful vistas than create interactive visualizations in d3. Deal with it, dweebs. You’re just going to have to do a lot of scrolling down to see the next batch of text.

This year’s conference presents a mostly-unsurprising continuation of the status quo (see 2014’s and 2013’s topical landscapes). Figure 1, below, shows the top author-chosen topic words of DH2015, as a proportion of the total presentations at the conference. For example, an impressive quarter, 24%, of presentations at DH2015 are about “text analysis”. The authors were able to choose multiple topics for each presentation, which is why the percentages add up to way more than 100%.

Scroll down for the rest of the post.

Figure 1. Topical coverage of DH2015. Percent represents the % of presentations which authors have tagged with a certain topical keyword. Authors could tag multiple keywords per presentation.

Text analysis, visualization, literary studies, data mining, and archives take top billing. History’s a bit lower, but at least there’s more history than the abysmal showing at DH2013. Only a tenth of DH2015 presentations are about DH itself, which is maybe impressive given how much we talk about ourselves? (cf. this post)

As usual, gender studies representation is quite low (1%), as are foreign language presentations and presentations not centered around text. I won’t do a lot of interpretation this post, because it’d mostly be a repeat of earlier years. At any rate, acceptance rate is a bit more interesting than coverage this time around. Figure 2 shows acceptance rates of each topic, ordered by volume. Figure 3 shows the same, sorted by acceptance rate.

The topics that appear most frequently at the conference are on the far left, and the red line shows the percent of submitted articles that will be presented at DH2015. The horizontal black line is the overall acceptance rate to the conference, 72%, just to show which topics are above or below average.

Figure 2. Acceptance rates of topics to DH2015, sorted by volume. Click to enlarge.
Figure 3. Acceptance rates of topics to DH2015, sorted by acceptance rate. Click to enlarge.
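For the curious, the per-topic rates behind Figures 2 and 3 boil down to a simple tally. Here’s a minimal Python sketch of that arithmetic; the field names (`topics`, `accepted`) and the toy submissions are my own invention, not the conference’s actual data format:

```python
from collections import defaultdict

def acceptance_rates(submissions):
    """Per-topic acceptance rate: accepted tagged with a topic,
    divided by all submissions tagged with that topic."""
    submitted = defaultdict(int)
    accepted = defaultdict(int)
    for sub in submissions:
        for topic in sub["topics"]:
            submitted[topic] += 1
            if sub["accepted"]:
                accepted[topic] += 1
    return {t: accepted[t] / submitted[t] for t in submitted}

# toy data, just to show the shape of the computation
subs = [
    {"topics": ["text analysis", "visualization"], "accepted": True},
    {"topics": ["pedagogy"], "accepted": False},
    {"topics": ["text analysis"], "accepted": True},
    {"topics": ["pedagogy", "text analysis"], "accepted": True},
]
rates = acceptance_rates(subs)
# here "text analysis" is accepted 3/3 times, "pedagogy" 1/2
```

A submission counts once per topic it carries, which is why well-tagged topics can sit above or below the single conference-wide average line.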

Notice that all the most well-represented topics at DH2015 have a higher-than-average acceptance rate, possibly suggesting a bit of path-dependence on the part of peer reviewers or editors. Otherwise, it could mean that, since a majority of peer reviewers were also authors in the conference, and since (as I’ve shown) the majority of authors lean toward text, lit, and visualization, that’s also what they’re likely to rate highly in peer review.

The first dips we see under the average acceptance rate are “Interdisciplinary Studies” and “Historical Studies” (☹), but the dips aren’t all that low, and we ought not read too much into them without comparing them to earlier conferences. More significant are the low rates for “Cultural Studies”, and even more so the two categories on Teaching, Pedagogy, and Curriculum. Both categories’ acceptance rates are about 20% under the average, and although they’re obviously correlated with one another, the rates are similar to 2014 and 2013. In short, DH peer reviewers or editors are less likely to accept submissions on pedagogy than on most other topics, even though pedagogy sometimes represents a decent chunk of submissions.

Other low points worth pointing out are “Anthropology” (huh, no ideas there), “Games and Meaningful Play” (that one came as a surprise), and “Other” (can’t help you here). Beyond that, the submission counts are too low to read any meaningful interpretations into the data. The Game Studies dip is curious, and isn’t reflected in earlier conferences, so it could just be noise for 2015. The low acceptance rates in Anthropology are consistent 2013-2015, and it’d be worth looking more into that.

Topical Co-Occurrence, 2013-2015

Figure 4, below, shows how topics appear together on submissions to DH2013, DH2014, and DH2015. Technically this has nothing to do with acceptances, and little to do with this year specifically, but the visualization should provide a little context to the above analysis. Topics connect to one another if they appear on a submission together, and the line connecting them gets thicker the more connections two topics share.

Figure 4. Topical co-occurrence, 2013-2015. Click to enlarge.
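Building the network behind Figure 4 is just a matter of counting pairs. A rough Python sketch, assuming each submission has been reduced to its list of topic tags (the toy data is mine, not the real submissions):

```python
from itertools import combinations
from collections import Counter

def topic_cooccurrence(submissions):
    """Weighted edges: two topics are linked once for every
    submission that carries both, so the edge weight is the
    number of shared submissions."""
    edges = Counter()
    for topics in submissions:
        # sort so each unordered pair has one canonical key
        for a, b in combinations(sorted(set(topics)), 2):
            edges[(a, b)] += 1
    return edges

subs = [
    ["visualization", "text analysis"],
    ["visualization", "archives", "text analysis"],
]
edges = topic_cooccurrence(subs)
# ("text analysis", "visualization") co-occurs on both submissions
```

The thicker lines in the figure correspond to the higher edge weights produced by a count like this.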

Although the “Interdisciplinary Collaboration” topic has a low acceptance rate, it understandably ties the network together; other topics that play a similar role are “Visualization”, “Programming”, “Content Analysis”, “Archives”, and “Digitisation”. All unsurprising for a conference where people come together around method and material. In fact, this reinforces our “DH identity” along those lines, at least insofar as it is represented by the annual ADHO conference.

There’s a lot to unpack in this visualization, and I may go into more detail in the next post. For now, I’ve got a date with the Blue Mountains west of Sydney.

Submissions to Digital Humanities 2015 (pt. 2)

Do you like the digital humanities? Me too! You better like it, because this is the 700th or so in a series of posts about our annual conference, and I can’t imagine why else you’d be reading it.

My last post went into some summary statistics of submissions to DH2015, concluding in the end that this upcoming conference, the first outside the Northern Hemisphere, with the theme “Global Digital Humanities”, is surprisingly similar to the DH we’ve seen before. This post will compare this year’s submissions to those of the previous two conferences, in Switzerland and Nebraska. Part 3 will go into more detail on geography and globalizing trends.

I can only compare the sheer volume of submissions this year to 2013 and 2014, which is as far back as I’ve got hard data. As many pieces were submitted for DH2015 as were submitted for DH2013 in Nebraska – around 360. Submissions to DH2014 shot up to 589, and it’s not yet clear whether the subsequent dip is an accident of location (Australia being quite far away from most regular conference attendees), or whether this signifies the leveling out of what’s been fairly impressive growth in the DH world.

DH by volume, 1999-2014. This chart shows how many DHSI workshops occurred per year (right axis), alongside how many pieces were actually presented at the DH conference annually (left axis). This year is not included because we don’t yet know which submissions will be accepted.

This graph shows a pretty significant recent upward trend in DH by volume; if acceptance rates to DH2015 are comparable to recent years (60-65%), then DH2015 will represent a pretty significant drop in presentation volume. My gut intuition is this is because of the location, and not a downward trend in DH, but only time will tell.

Replying to my most recent post, Jordan T. T-H commented on his surprise at how many single-authored works were submitted to the conference. I suggested this was a result of our humanistic disciplinary roots, and that further analysis would likely reveal a trend of increasing co-authorship. My prediction was wrong: at least over the last three years, co-authorship numbers have been stagnant.

This chart shows that ~40% of submissions to DH conferences over the past three years have been single-authored.

Roughly 40% of submissions to DH conferences over the past three years have been single-authored; the trend has not significantly changed any further down the line, either. Nickoal Eichmann and I are looking into data from the past few decades, but it’s not ready yet at the time of this blog post. This result honestly surprised me; just from watching and attending conferences, I had the impression we’ve become more multi-authored over the past few years.

Topically, we are noticing some shifts. As a few people noted on Twitter, topics are not perfect proxies for what’s actually going on in a paper; every author makes different choices on how they tag their submissions. Still, it’s the best we’ve got, and I’d argue it’s good enough to run this sort of analysis on, especially as we start getting longitudinal data. This is an empirical question, and if we wanted to test my assumption, we’d gather a bunch of DHers in a room and see to what extent they all agree on submission topics. It’s an interesting question, but beyond the scope of this casual blog post.

Below is the list of submission topics in order of how much topical coverage has changed since 2013. For example, this year 21% of submissions were tagged as involving Text Analysis. By contrast, only 15% were tagged as Text Analysis in 2013, resulting in a growth of 6% over the last two years. Similarly, this year Internet and World Wide Web studies comprised 7% of submissions, whereas that number was 12% in 2013, showing coverage shrunk by 5%. My more detailed evaluation of the results is below the figure.

dh-topicalchange-2015
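The coverage-change numbers in the figure are simple differences of proportions between two years. A hedged Python sketch of that computation, using toy data rather than the real submissions:

```python
from collections import Counter

def coverage(year_subs):
    """Fraction of a year's submissions tagged with each topic."""
    tags = Counter(t for topics in year_subs for t in set(topics))
    return {t: n / len(year_subs) for t, n in tags.items()}

def coverage_change(old, new):
    """Percentage-point change in coverage between two years."""
    c_old, c_new = coverage(old), coverage(new)
    return {t: c_new.get(t, 0) - c_old.get(t, 0)
            for t in set(c_old) | set(c_new)}

# toy data: "text analysis" grows, "web" shrinks, "archives" holds
subs_2013 = [["text analysis"], ["web"], ["web"], ["archives"]]
subs_2015 = [["text analysis"], ["text analysis"], ["archives"], ["web"]]
change = coverage_change(subs_2013, subs_2015)
```

Note that because each year is normalized by its own submission count, a topic can gain coverage even in a year when raw submission numbers drop.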

We see, as I previously suggested, that Text Analysis (unsurprisingly) has gained a lot of ground. Given the location, it should be unsurprising as well that Asian Studies has grown in coverage, too. Some more surprising results are the re-uptake of Digitisation, which has been pretty low recently, and the growth of GLAM (Galleries, Libraries, Archives, Museums); I suspect that if we could look even further back, we’d spot a consistent upward trend there. I’d guess it’s due to the proliferation of DH Alt-Ac careers within the GLAM world.

Not all of the trends are consistent: Historical Studies rose significantly between 2013 and 2014, but dropped a bit in submissions this year to 15%. Still, it’s growing, and I’m happy about that. Literary Studies, on the other hand, has covered a fifth of all submissions in 2013, 2014, and 2015, remaining quite steady. And I don’t see it dropping any time soon.

Visualizations are clearly on the rise, year after year, which I’m going to count as a win. Even if we’re not branching outside of text as much as we ought, the fact that visualizations are increasingly important means DHers are willing to move beyond text as a medium for transmission, if not yet as a medium of analysis. The use of Networks is also growing pretty well.

As Jacqueline Wernimont just pointed out, representation of Gender Studies is incredibly low. And, as the above chart shows, it’s even lower this year than it was in both previous years. Perhaps this isn’t so surprising, given the gender ratio of authors at DH conferences recently.

Gender ratio of authors at DH conferences 2010-2013. Women consistently represent a bit under a third of all authors.

Some categories involving Maps and GIS are increasing, while others are decreasing, suggesting small fluctuations in labeling practices, but probably no significant upward or downward trend in their methodological use. Unfortunately, most non-text categories dropped over the past three years: Music, Film & Cinema Studies, Creative/Performing Arts, and Audio/Video/Multimedia all dropped. Image Studies grew, but only slightly, and it’s too soon to say if this represents a trend.

We see the biggest drops in XML, Encoding, Scholarly Editing, and Interface & UX Design. This won’t come as a surprise to anyone, but it does show how much the past generation’s giant (putting together, cleaning, and presenting scholarly collections) is making way for the new behemoth (analytics). Internet / World Wide Web is the other big coverage loss, but I’m not comfortable giving any causal explanation for that one.

This analysis offers the same conclusion as the earlier one: with the exception of the drop in submissions, nothing is incredibly surprising. Even the drop is pretty well-expected, given how far the conference is from the usual attendees. The fact that the status is pretty quo is worthy of note, because many were hoping that a global DH would seem more diverse, or appreciably different, in some way. In Part 3, I’ll start picking apart geographic and deeper topical data, and maybe there we’ll start to see the difference.

Submissions to Digital Humanities 2015 (pt. 1)

It’s that time of the year again! The 2015 Digital Humanities conference will take place next summer in Australia, and as per usual, I’m going to summarize what is being submitted to the conference and, eventually, which submissions get accepted. Each year reviewers get the chance to “bid” on conference submissions, and this lets us get a peek inside the general trends in DH research. This post (pt. 1) will focus solely on this year’s submissions, and the next post will compare them to previous years and locations.

It’s important to keep in mind that trends in the conference over the last three years may be temporal, geographic, or accidental. The 2013 conference took place in Nebraska, 2014 in Switzerland, 2015 in Australia, and 2016 is set to happen in Poland; it’s to be expected that regional differences will significantly inform who is submitting pieces and what topics will be discussed.

This year, 358 pieces were submitted to the conference (about as many as were submitted to Nebraska in 2013, but more on that in the follow-up post). As with previous years, authors could submit four varieties of works: long papers, short papers, posters, and panels / multi-paper sessions. Long papers comprised 54% of submissions, panels 4%, posters 15%, and short papers 30%.

In total, there were 859 named authors on submissions – this number counts authors more than once if they appear on multiple submissions. Of those, 719 authors are unique. Over half the submissions are multi-authored (58%), with 2.4 authors per submission on average, a median of 2 authors per submission, and a max of 10 authors on one submission. While the majority of submissions included multiple authors, the sheer number of single-authored papers still betrays the humanities roots of DH. The histogram is below.

A histogram of authors-per-submission.
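The summary numbers above (named vs. unique authors, mean and median authors per submission, share of multi-authored pieces) come from straightforward tallies. A toy Python sketch with invented author names, just to show the arithmetic:

```python
from statistics import mean, median

# hypothetical author lists, one list per submission
author_lists = [
    ["Ada"], ["Ada", "Ben"], ["Cai", "Dev", "Ada"],
    ["Ben"], ["Eve", "Fay"],
]

named = [a for authors in author_lists for a in authors]  # counts repeats
unique = set(named)                                       # each author once
counts = [len(authors) for authors in author_lists]       # authors per piece
multi = sum(c > 1 for c in counts) / len(counts)          # multi-authored share

print(len(named), len(unique))       # 9 named, 6 unique
print(mean(counts), median(counts))  # 1.8 mean, 2 median
print(multi)                         # 0.6 multi-authored
```

The real numbers (859 named, 719 unique, 2.4 mean, 2 median, 58% multi-authored) fall out of exactly these counts on the full submission list.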

As with previous years, authors may submit articles in any of a number of languages. The theme of this year’s conference is “Global Digital Humanities”, but if you expected a multi-lingual conference, you might be disappointed. Of the 358 submissions, 353 are in English. The rest are in French (2), Italian (2), and German (1).

Submitting authors could select from a controlled vocabulary to tag their submissions with topics. There were 95 topics to choose from, and their distribution is not especially surprising. Two submissions each were tagged with 25 topics, suggesting they are impressively far reaching, but for the most part submissions stuck to 5-10 topics. The breakdown of submissions by topic is below, where the percentage represents the percentage of submissions which are tagged by a specific topic. My interpretation is below that.

Percentage of submissions tagged with a specific topic.

A full 21% of submissions include some form of Text Analysis, and a similar number claim Text or Data Mining as a topic. Other popular methodological topics are Visualizations, Network Analysis, Corpus Analysis, and Natural Language Processing. The DH-o-sphere is still pretty text-heavy; Audio, Video, and Multimedia are pretty low on the list, GIS even lower, and Image Analysis (surprisingly) even lower still. Bibliographic methods, Linguistics, and other approaches more traditionally associated with the humanities appear pretty far down the list. Other tech-y methods, like Stylistics and Agent-Based Modeling, are near the bottom. If I had to guess, the former is on its way down, and the latter on its way up.

Unsurprisingly, regarding disciplinary affiliations, Literary Studies is at the top of the food chain (I’ll talk more about how this compares to previous years in the next post), with Archives and Repositories not far behind. History is near the top tier, but not quite there, which is pretty standard. I don’t recall the exact link, but Ben Schmidt argued pretty convincingly that this may be because there are simply fewer new people in History than in Literary Studies. Digitization seems to be regaining some ground it lost in previous years. The information science side (UX Design, Knowledge Representation, Information Retrieval, etc.) seems reasonably strong. Cultural Studies is pretty well-represented, and Media Studies, English Studies, Art History, Anthropology, and Classics are among the other DH-inflected communities out there.

Thankfully we’re not completely an echo chamber yet; only about a tenth of the submissions are about DH itself – not great, not terrible. We still seem to do a lot of talking about ourselves, and I’d like to see that number decrease over the next few years. Pedagogy-related submissions are also still a bit lower than I’d like, hovering around 10%. Submissions on the “World Wide Web” are decreasing, which is to be expected, and TEI isn’t far behind.

All in all, I don’t really see the trend toward “Global Digital Humanities” that the conference is themed to push, but perhaps a more complex content analysis will reveal a more global DH than we’ve seen in the past. The self-written Keyword tags (which, unlike the Topic tags, are not drawn from a controlled vocabulary) reveal a bit more internationalization, although I’ll leave that analysis for a future post.

It’s worth pointing out that there’s a statistical property at play that makes it difficult to see deviations from the norm. Shakespeare appears prominently because many still write about him, but even if Shakespearean research were outnumbered by work on more international playwrights, it’d be difficult to catch, because I have no category for “international playwright”; each one would be siphoned off into its own category. Thus, even if the less well-known long-tail topics significantly outweigh the more popular topics, that fact would be hard to spot.
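The arithmetic of that long tail is easy to demonstrate with made-up numbers (all counts below are invented purely for illustration):

```python
from collections import Counter

# Invented keyword counts: one canonical figure versus many individually
# rare international playwrights.
keywords = Counter({
    "Shakespeare": 30,
    "Tagore": 4, "Wole Soyinka": 3, "Cao Yu": 3, "Chikamatsu": 3,
    "Girish Karnad": 3, "Usman Awang": 3, "Mahesh Dattani": 3,
    "Tawfiq al-Hakim": 3, "Griselda Gambaro": 3, "Derek Walcott": 3,
})

head = keywords.most_common(1)[0]  # Shakespeare tops any per-topic chart
tail_total = sum(n for k, n in keywords.items() if k != "Shakespeare")
# The tail collectively outweighs the head (31 vs. 30 here), but no single
# tail topic would ever look prominent in a per-topic breakdown.
```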

All in all, it looks like DH2015 will be an interesting continuation of the DH tradition. Perhaps the most surprising aspect of my analysis was that nothing in it surprised me; half-way around the globe, and the trends over there are pretty identical to those in Europe and the Americas. It’ll take some more searching to see if this is a function of the submitting authors being the same as previous years (whether they’re all simply from the Western world), or whether it is actually indicative of a fairly homogeneous global digital humanities.

Stay tuned for Part 2, where I compare the analysis to previous years’ submissions, and maybe even divine future DH conference trends using tea leaves or goat entrails or predictive modeling (whichever seems the most convincing; jury’s still out).

Notes:

  1. As far as I can tell – I used all the text similarity methods I could think of to unify the nearly-duplicate names.

[Review] The Book of Trees, Manuel Lima

The first line on the first page of The Book of Trees is “This is the book I wish had been available when I was researching my previous book, Visual Complexity: Mapping Patterns of Information.” It’s funny, because this is also the book I wish had been available when I was researching my own project, Knowledge Uprooted. It took Alberto Cairo, reading over a draft of my article, to point out that Manuel Lima was working on a book-length version of a very similar project. If the book had come out a year ago, my own research might look very different. Lima’s book is beautifully designed, well-researched, and a delightful resource for anyone interested in visualizations of knowledge.

Tree of Consanguinity, ca. 1450-1510. Page 52.

Lima’s book is a history of hierarchical visualizations, most frequently as trees, and often representing branches of knowledge. He roots his narrative in trees themselves, describing how their symbolism has touched religions and cultures for millennia. The narrative weaves through Ancient Greece and Medieval Europe, makes a few stops outside of the West, and winds its way to the present day. Subsequent chapters are divided into types of tree visualizations: figurative, vertical, horizontal, multidirectional, radial, hyperbolic, rectangular, Voronoi, circular, sunbursts, and icicles. Each chapter presents a chronological set of beautiful examples embodying that type.

Biblical Genealogy, ca. 1060. Page 112.

Of course, any project with such a wide scope is bound to gloss over or inaccurately portray some of its historical content. I’d quibble, for example, with Lima’s suggestion that the use of these visual diagrams could be understood in the context of ars memorativa, a method for improving memory and understanding in the Middle Ages. Instead, I’d argue that the tradition stemmed from a more innate Aristotelian connection between thinking and seeing. Lima also argues that the scala naturae, a depiction of entities arranged in a natural order ascending to God, is an obvious reflection of contemporary feudal stratification. The story is a bit more complex than that, with feudal stratification itself being concomitant to the medieval worldview of a natural order. In discussing Ramon Llull, Lima oddly writes “the notion of a unified trunk of science has remained to this day,” a claim which Lima himself shows isn’t exactly true in his earlier book, Visual Complexity. But this isn’t a book written by or for historians, and that’s okay: it’s accurate enough to give a good sense of the progression of trees.

The Blog Tree, 2012. Page 77.

Where the book shines is in its clear, well-cited, contextualized illustrations, which comprise the majority of its contents. Over a hundred illustrations pack the book, each with at least a paragraph of description and, in many cases, translation. This is a book for people passionate about visualizations, and interested in their history. There is not yet a book-length treatment for historians interested in this subject, though Murdoch’s Album of Science (1984) comes close. For those who want to delve even deeper into this history, I’ve compiled a 100+ reference bibliography that is freely available here.

Acceptances to Digital Humanities 2014 (part 1)

It’s that time again! The annual Digital Humanities conference schedule has been released, and this time it’s in Switzerland. In an effort to console myself from not having the funding to make it this year, I’ve gone ahead and analyzed the nitty-gritty of acceptances and rejections to the conference. For those interested in this sort of analysis, you can find my take on submissions to DH2013, acceptances at DH2013, and submissions to DH2014. If you’re visiting this page from the future, you can find any future DH conference analyses at this tag link.

The overall acceptance rate to DH2014 was 59%, although that includes many papers and panels that were accepted as posters. There were 589 submissions this year (compared to 348 submissions last year), of which 345 were accepted. By submission medium, this is the breakdown:

  • Long papers: 62% acceptance rate (lower than last year)
  • Short papers: 52% acceptance rate (lower than last year)
  • Panels: 57% acceptance rate (higher than last year)
  • Posters: 64% acceptance rate (didn’t collect this data last year)
Figure 1: Acceptances to DH2014 by submission medium.

A surprising number of submitted papers switched from one medium to another when they were accepted. A number of panels became long papers, a bunch of short papers became long papers, and a bunch of long papers became short papers. Although a bunch of submissions became posters, no posters wound up “breaking out” to become some other medium. I was most surprised by the short papers which became long (13 in all), which leads me to believe some of them may have been converted for scheduling reasons. This is idle speculation on my part – the organizers may reply otherwise. [Edit: the organizers did reply, and assured us this was not the case. I see no reason to doubt that, so congratulations to those 13 short papers that became long papers!]

Figure 2: Medium switches in DH2014 between submission and acceptance.

It’s worth keeping in mind, in all the analyses listed here, that I do not have access to any withdrawals; accepted papers were definitely accepted, but papers marked as not accepted may have been withdrawn rather than rejected.

Figures 3 and 4 both present the same data, but shed slightly different lights on digital humanities. Each shows the acceptance rate by various topics, but they’re ordered slightly differently. All submitting authors needed to select from a limited list of topics to label their submissions, in order to aid with categorization and the selection of peer reviewers.

Figure 3 sorts topics by the total amount that were accepted to DH2014. This is at odds with Figure 2 from my post on DH2014 submissions, which sorts by total number of topics submitted. The figure from my previous post gives a sense of what digital humanists are doing and submitting, whereas Figure 3 from this post gives a sense of what the visitor to DH2014 will encounter.

Figure 3: Topical acceptance to DH2014 sorted by total number of accepted papers tagged with a particular topic. (click to enlarge)

The visitor to DH2014 won’t see a hugely different topical landscape than the visitor to DH2013 (see analysis here). Literary studies, text analysis, and text mining still reign supreme, with archives and repositories not far behind. Visitors will see quite a bit fewer studies dedicated to the internet and the world wide web, and quite a bit more dedicated to historical and corpus-based research. More details can be seen by comparing the individual figures.

Figure 4, instead, sorts the topics by their acceptance rate. The most frequently accepted topics appear at the left, and the least frequently accepted at the right. A lighter red line is used to show acceptance rates of the same topics for 2013. This graph shows what peers consider to be more credit-worthy, and how this has changed since 2013.

Figure 4: Topical acceptance to DH2014 sorted by percentage of acceptance for each topic. (click to enlarge)

It’s worth pointing out that the highest and lowest acceptance rates shouldn’t be taken very seriously; with so few submitted articles, the rates are as likely random as indicative of any particularly interesting trend. Also, for comparisons with 2013, keep in mind the North American and European traditions of digital humanities may be driving the differences.

There are a few acceptance ratios worthy of note. English studies and GLAM (Galleries, Libraries, Archives, Museums) both have acceptance rates well above average, and also quite a bit higher than their acceptance rates from the previous year. Studies of XML are accepted slightly above the average acceptance rate, and also accepted proportionally more frequently than they were in 2013. Acceptance rates for both literary and historical studies papers are about average, and haven’t changed much since 2013 (even though there were quite a few more historical submissions than the previous year).

Along with an increase in GLAM acceptance rates, there was a big increase in rates for studies involving archives and repositories. It may be they are coming back in style, or it may be indicative of a big difference between European and North American styles. There was a pretty big drop in acceptance rates for ontology and semantic web research, as well as in pedagogy research across the board. Pedagogy had a weak foothold in DH2013, and has an even weaker foothold in 2014, with both fewer submitted articles, and a lower rate of acceptance on those submitted articles.

In the next blog post, I plan on drilling a bit into author-supplied keywords, the role of gender on acceptance rates, and the geography of submissions. As always, I’m happy to share data, but in this case I will only share sufficiently aggregated/anonymized data, because submitting authors who did not get accepted have an expectation of privacy that I intend to keep.

Appreciability & Experimental Digital Humanities

Operationalize: to express or define (something) in terms of the operations used to determine or prove it.

Precision deceives. Quantification projects an illusion of certainty and solidity no matter the provenance of the underlying data. It is a black box, through which uncertain estimations become sterile observations. The process involves several steps: a cookie cutter to make sure the data are all shaped the same way, an equation to aggregate the inherently unique, a visualization to display exact values from a process that was anything but.

In this post, I suggest that Moretti’s discussion of operationalization leaves out an essential discussion of precision, and I introduce a new term, appreciability, as a constraint on both accuracy and precision in the humanities. This conceptual constraint paves the way for an experimental digital humanities.

Operationalizing and the Natural Sciences

An operationalization is the use of definition and measurement to create meaningful data. It is an incredibly important aspect of quantitative research, and it has served the western world well for at least 400 years. Franco Moretti recently published a LitLab Pamphlet and a nearly identical article in the New Left Review about operationalization, focusing on how it can bridge theory and text in literary theory. Interestingly, his description blurs the line between the operationalization of his variables (what shape he makes the cookie cutters that he takes to his text) and the operationalization of his theories (how the variables interact to form a proxy for his theory).

Moretti’s account anchors the practice in its scientific origin, citing primarily physicists and historians of physics. This is a deft move, but an unexpected one in a recent DH environment which attempts to distance itself from a narrative of humanists just playing with scientists’ toys. Johanna Drucker, for example, commented on such practices:

[H]umanists have adopted many applications […] that were developed in other disciplines. But, I will argue, such […] tools are a kind of intellectual Trojan horse, a vehicle through which assumptions about what constitutes information swarm with potent force. These assumptions are cloaked in a rhetoric taken wholesale from the techniques of the empirical sciences that conceals their epistemological biases under a guise of familiarity.

[…]

Rendering observation (the act of creating a statistical, empirical, or subjective account or image) as if it were the same as the phenomena observed collapses the critical distance between the phenomenal world and its interpretation, undoing the basis of interpretation on which humanistic knowledge production is based.

But what Drucker does not acknowledge here is that this positivist account is a century-old caricature of the fundamental assumptions of the sciences. Moretti’s account of operationalization as it percolates through physics is evidence of this. The operational view very much agrees with Drucker’s thesis, where the phenomena observed take a back seat to a definition steeped in the nature of measurement itself. Indeed, Einstein’s introduction of relativity relied on an understanding that our physical laws and observations of them rely not on the things themselves, but on our ability to measure them in various circumstances. The prevailing theory of the universe on a large scale is a theory of measurement, not of matter. Moretti’s reliance on natural scientific roots, then, is not antithetical to his humanistic goals.

I’m a bit horrified to see myself typing this, but I believe Moretti doesn’t go far enough in appropriating natural scientific conceptual frameworks. When describing what formal operationalization brings to the table that was not there before, he lists precision as the primary addition. “It’s new because it’s precise,” Moretti claims, “Phaedra is allocated 29 percent of the word-space, not 25, or 39.” But he asks himself: is this precision useful? Sometimes, he concludes, “It adds detail, but it doesn’t change what we already knew.”

From Moretti, 'Operationalizing', New Left Review.

I believe Moretti is asking the wrong first question here, and he’s asking it because he does not steal enough from the natural sciences. The question, instead, should be: is this precision meaningful? Only after we’ve assessed the reliability of new-found precision can we understand its utility, and here we can take some inspiration from the scientists, in their notions of accuracy, precision, uncertainty, and significant figures.

Terminology

First some definitions. The accuracy of a measurement is how close it is to the true value you are trying to capture, whereas the precision of a measurement is how often a repeated measurement produces the same results. The number of significant figures is a measurement of how precise the measuring instrument can possibly be. False precision is the illusion that one’s measurement is more precise than is warranted given the significant figures. Propagation of uncertainty is the pesky habit of false precision to weasel its way into the conclusion of a study, suggesting conclusions that might be unwarranted.

Accuracy and Precision. [via]
Accuracy roughly corresponds to how well-suited your operationalization is to finding the answer you’re looking for. For example, if you’re interested in the importance of Gulliver in Gulliver’s Travels, and your measurement is based on how often the character name is mentioned (12 times, by the way), you can be reasonably certain your measurement is inaccurate for your purposes.

Precision roughly corresponds to how fine-tuned your operationalization is, and how likely it is that slight changes in measurement will affect the outcomes of the measurement. For example, if you’re attempting to produce a network of interacting characters from The Three Musketeers, and your measuring “instrument” is to increase the strength of connection between two characters every time they appear in the same 100-word block, then you might be subject to difficulties of precision. That is, your network might look different if you start your sliding 100-word window from the 1st word, the 15th word, or the 50th word. The amount of variation in the resulting network is the degree of imprecision of your operationalization.
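That sliding-window imprecision is easy to see in code. Here is a rough sketch: the character names and the toy “text” are invented, and a fuller implementation would also handle the leading partial window before the offset.

```python
import itertools
from collections import Counter

def cooccurrence_edges(words, characters, window=100, offset=0):
    """Tally character co-occurrences within fixed-size word blocks,
    starting the first block at `offset` (words before it are skipped)."""
    edges = Counter()
    for start in range(offset, len(words), window):
        present = set(words[start:start + window]) & characters
        for pair in itertools.combinations(sorted(present), 2):
            edges[pair] += 1
    return edges

# A toy text: three character mentions spaced out among filler words.
characters = {"athos", "porthos", "aramis"}
text = (["la"] * 40 + ["athos"] + ["la"] * 40 + ["porthos"]
        + ["la"] * 40 + ["aramis"] + ["la"] * 80)

net_a = cooccurrence_edges(text, characters, offset=0)
net_b = cooccurrence_edges(text, characters, offset=50)
# The two runs disagree about which pairs co-occur; that disagreement
# is the imprecision of the operationalization.
```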

Significant figures are a bit tricky to port to DH use. When you’re sitting at home, measuring some space for a new couch, you may find that your meter stick only has tick marks to the centimeter, but nothing smaller. This is your highest threshold for precision; if you eyeballed and guessed your space was actually 250.5cm, you’ll have reported a falsely precise number. Others looking at your measurement may have assumed your meter stick was more fine-grained than it was, and any calculations you make from that number will propagate that falsely precise number.

Significant Figures. [via]
Uncertainty propagation is especially tricky when you wind up combining two measurements, one more precise and the other less so. The rule of thumb is that your results can only be as precise as the least precise measurement that makes its way into your equation. The final reported number is then generally in the form of 250 (±1 cm). Thankfully, for our couch, the difference of a centimeter isn’t particularly appreciable. In DH research, I have rarely seen any form of precision calculated, and I believe some of those projects would have reported different results had they accurately represented their significant figures.
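A minimal worked example of that rule of thumb, using the worst-case convention where absolute uncertainties simply add (the couch numbers are invented):

```python
def add_measurements(a, da, b, db):
    """Add two measurements, propagating worst-case absolute uncertainty."""
    return a + b, da + db

# A wall measured to the nearest centimeter (250 +/- 1 cm), plus a
# 30.25 cm (+/- 0.05 cm) gap measured with a much finer ruler.
total, uncertainty = add_measurements(250, 1, 30.25, 0.05)

# total == 280.25, but reporting "280.25 cm" would be falsely precise:
# the sum is only good to about +/- 1.05 cm, so report roughly 280 (+/- 1) cm.
```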

Precision, Accuracy, and Appreciability in DH

Moretti’s discussion of the increase of precision granted by operationalization leaves out any discussion of the certainty of that precision. Let’s assume for a moment that his operationalization is accurate (that is, his measurement is a perfect conversion between data and theory). Are his measurements precise? In the case of Phaedra, the answer at first glance is yes, words-per-character in a play would be pretty robust against slight changes in the measurement process.

And yet, I imagine, that answer will probably not sit well with some humanists. They may ask themselves: Is Oenone’s 12% appreciably different from Theseus’s 13% of the word-space of the play? In the eyes of the author? Of the actors? Of the audience? Does the difference make a difference?

The mechanisms by which people produce and consume literature are not precise. Surely Jean Racine did not sit down intending to give Theseus a fraction more words than Oenone. Perhaps in DH we need a measurement of precision, not of the measuring device, but of our ability to interact with the object we are studying. In a sense, I’m arguing, we are not limited to the precision of the ruler when measuring humanities objects, but to the precision of the human.

In the natural sciences, accuracy is constrained by precision: you can only have as accurate a measurement as your measuring device is precise. In the corners of the humanities where we study how people interact with each other and with cultural objects, we need a new measurement that constrains both precision and accuracy: appreciability. A humanities quantification can only be as precise as that precision is appreciable by the people who interact with the matter at hand. If two characters differ by a single percent of the wordspace, and that difference is impossible to register on a conscious or subconscious level, what is the meaning of additional levels of precision (and, consequently, additional levels of accuracy)?

Experimental Digital Humanities

Which brings us to experimental DH. How does one evaluate the appreciability of an operationalization except by devising clever experiments to test the extent of granularity a person can register? Without such understanding, we will continue to create formulae and visualizations which portray a false sense of precision. Without visual cues to suggest uncertainty, graphs present a world that is exact and whose small differentiations appear meaningful or deliberate.

Experimental DH is not without precedent. In Reading Tea Leaves (Chang et al., 2009), for example, the authors assessed the quality of certain topic modeling tweaks based on how a large number of people assessed the coherence of certain topics. If this approach were to catch on, along with more careful acknowledgements of accuracy, precision, and appreciability, then those of us who are making claims to knowledge in DH could seriously bolster our cases.

There are some who present the formal nature of DH as antithetical to the highly contingent and interpretative nature of the larger humanities. I believe appreciability and experimentation can go some way toward alleviating the tension between the two schools, building one into the other. Along the way, it might build some trust among humanists who think we sacrifice experience for certainty, and among natural scientists who are skeptical of our ability to apply quantitative methods.

Right now, DH seems to find its most fruitful collaborations in computer science or statistics departments. Experimental DH would open the doors to new types of collaborations, especially with psychologists and sociologists.

I’m at an extremely early stage in developing these ideas, and would welcome all comments (especially those along the lines of “You dolt! Appreciability already exists; we call it x.”). Let’s see where this goes.

Networks Demystified 8: When Networks are Inappropriate

A few hundred years ago, I promised to talk about when not to use networks, or when networks are used improperly. With The Historian’s Macroscope in the works, I’ve decided to finally start answering that question, and this Networks Demystified is my first attempt at doing so. If you’re new here, this is part of an annoyingly long series (1 network basics, 2 degree, 3 power laws, 4 co-citation analysis, 5 communities and PageRank, 6 this space left intentionally blank, 7 co-citation analysis II). I’ve issued a lot of vague words of caution without doing a great job of explaining them, so here is the first substantive part of that explanation.

Networks are great. They allow you to do things like understand the role of postal routes in the circulation of knowledge in early modern Europe, or of the spread of the black death in the middle ages, or the diminishing importance of family ties in later Chinese governments. They’re versatile, useful, and pretty easy in today’s software environment. And they’re sexy, to boot. I mean, have you seen this visualization of curved lines connecting U.S. cities? I don’t even know what it’s supposed to represent, but it sure looks pretty enough to fund!

A really pretty network visualization. [via]

So what could possibly dissuade you from using a specific network, or the concept of networks in general? A lot of things, it turns out, and even a big subset of things that belong only to historians. I won’t cover all of them here, but I will mention a few big ones.

An Issue of Memory Loss

Okay, I lied about not knowing what the above network visualization represents. It turns out it’s a network of U.S. air travel pathways; if a plane goes from one city to another, an edge connects the two cities together. Pretty straightforward. And pretty useful, too, if you want to model something like the spread of an epidemic. You can easily see how someone with the newest designer virus flying into Texas might infect half-a-dozen people at the airport, who would in turn travel to other airports, and quickly infect most parts of the country with major airports. Transportation networks like this are often used by the CDC for just such a purpose, to determine what areas might need assistance/quarantine/etc.

The problem is that, although such a network might be useful for epidemiology, it’s not terribly useful for other seemingly intuitive questions. Take migration patterns: you want to know how people travel. I’ll give you another flight map that’s a bit easier to read.

Flight patterns over U.S. [via]
The first thing people tend to do when getting their hands on a new juicy network dataset is to throw it into their favorite software suite (say, Gephi) and run a bunch of analyses on it. Of those, people really like things like PageRank or Betweenness Centrality, which can give the researcher a sense of important nodes in the network based on how central they are; in this case, how many flights have to go through a particular city in order to get where they eventually intend to go.

Let’s look at Las Vegas. By anyone’s estimation it’s pretty important; well-connected to cities both near and far, and pretty central in the southwest. If I want to go from Denver to Los Angeles and a direct flight isn’t possible, Las Vegas seems to be the way to go. If we also had road networks, train networks, cell-phone networks, email networks, and so forth all overlaid on top of this one, looking at how cities interact with each other, we might be able to begin to extrapolate other information like how rumors spread, or where important trade hubs are.

Here’s the problem: network structures are deceitful. They come with a few basic assumptions that are very helpful in certain areas, but extremely dangerous in others, and they are the reason why you shouldn’t analyze a network without thinking through what you’re implying by fitting your data to the standard network model. In this case, the assumption to watch out for is what’s known as a lack of memory.

The basic networks you learn about, with nodes and edges and maybe some attributes, embed no information on how those networks are generally traversed. They have no memories. For the purposes of disease tracking, this is just fine: all epidemiologists generally need to know is whether two people might accidentally happen to find themselves in the same place at the same time, and where they individually go from there. The structure of the network is enough to track the spread of a disease.

For tracking how people move, or how information spreads, or where goods travel, structure alone is rarely enough. It turns out that Las Vegas is basically a sink, not a hub, in the world of airline travel. People who travel there tend to stay for a few days before traveling back home. The fact that it happens to sit between Colorado and California is meaningless, because people tend not to go through Vegas to get from one to another, even though individually, people from both states travel there with some frequency.

If the network had a memory to it, if it somehow knew not just that a lot of flights tended to go between Colorado and Vegas and between LA and Vegas, but also that the people who went to Vegas returned to where they came from, then you’d be able to see that Vegas isn’t the same sort of hub that, say, Atlanta is. Travel involving Vegas tends to be to or from, rather than through. In truth, all cities have their own unique profiles, and some may be extremely central to the network without necessarily being centrally important in questions about that network (like human travel patterns).
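One way to see the difference is to compare what the bare edge list says with what trip-level data (the network’s “memory”) says. A toy sketch with invented itineraries:

```python
from collections import Counter

# Invented itineraries: the ordered sequence of cities for each trip.
trips = [
    ["Denver", "Vegas", "Denver"],    # out-and-back: Vegas is a destination
    ["LA", "Vegas", "LA"],
    ["Denver", "Atlanta", "Boston"],  # Atlanta is genuinely passed through
    ["LA", "Atlanta", "Boston"],
]

through = Counter()     # trip continues onward past this city
turnaround = Counter()  # trip doubles back here: a sink, not a hub
for trip in trips:
    for i in range(1, len(trip) - 1):
        if trip[i - 1] != trip[i + 1]:
            through[trip[i]] += 1
        else:
            turnaround[trip[i]] += 1

# The undirected edge list alone would make Vegas and Atlanta look similar;
# the itineraries separate the sink from the true hub.
```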

The same might be true of letter-writing networks in early modern Europe, my research of choice. We often find people cropping up as extremely central, connecting very important figures whom we did not previously realize were connected, only to find out later that, well, it’s not exactly what we thought. This new central figure, we’ll call him John Smith, happened to be the cousin of an important statesman, the neighbor of a famous philosopher, and the one-time business partner of some lawyer. None of the three ever communicated with John about any of the others, and though he was structurally central on the network, he was no one of any historical note. Because the network lacks the memory that information didn’t flow through John, only to or from him, my centrality measurements can often be far from the mark.

It turns out that in letter-writing networks, people have separate spheres: they tend to write about family with family members, their governmental posts with other officials, and their philosophies with other philosophers. The overarching structure we see obscures partitions between communities that seem otherwise closely-knit. When researching with networks, especially going from the visualization to the analysis phase, it’s important to keep in mind what the algorithms you use do, and what assumptions they and your network structure embed in the evidence they provide.

Sometimes, the only network you have might be the wrong network for the job. A lot of my peers (myself included) try to understand the intellectual landscape of early modern Europe using correspondence networks, but these are a poor proxy indeed for what we are trying to understand. Because of spurious structural connections, like those of our illustrious John Smith, early modern networks give us a sense of unity that might not have been present at the time.

And because we’re only looking on one axis (letters), we get an inflated sense of the importance of spatial distance in early modern intellectual networks. Best friends never wrote to each other; they lived in the same city and drank in the same pubs; they could just meet on a sunny afternoon if they had anything important to say. Distant letters were important, but our networks obscure the equally important local scholarly communities.

If there’s a moral to the story, it’s that there are many networks that can connect the same group of nodes, and many questions that can be asked of any given network, but before trying to use networks to study history, you should be careful to make sure the questions match the network.

Multimodality

As humanists asking humanistic questions, our networks tend to be more complex than the sort originally explored in network science. We don’t just have people connected to people or websites to websites, we’ve got people connected to institutions to authored works to ideas to whatever else, and we want to know how they all fit together. Cue the multimodal network, or a network that includes several types of nodes (people, books, places, etc.).

I’m going to pick on Elijah Meeks’ map of the DH2011 conference, because I know he didn’t actually use it to commit the sins I’m going to discuss. His network connected participants in the conference with their institutional affiliations and the submissions they worked on together.

Part of Elijah Meeks' map of DH2011. [via]
From a humanistic perspective, and especially from a Latourian one, these multimodal networks make a lot of sense. There are obviously complex relationships between many varieties of entities, and the promise of networks is to help us understand these relationships. The issue here, however, is that many of the most common metrics you’ll find in tools like Gephi were not created for multimodal networks, and many of the basic assumptions of network research need to be re-aligned in light of this type of use.

Let’s take the local clustering coefficient as an example. It’s a measurement often used to see if a particular node spans several communities, and it’s calculated by seeing how many of a node’s connections are connected to each other. More concretely, if all of my friends were friends with one another, I would have a high local clustering coefficient; if, however, my friends tended not to be friends with one another, and I was the only person in common between them, my local clustering coefficient would be quite low. I’d be the bridge holding the disparate communities together.

If you study the DH2011 network, the problem should become clear: local clustering coefficient is meaningless in multimodal networks. If people are connected to institutions and conference submissions, but not to one another, then everyone must have the same local clustering coefficient: zero. Nobody’s immediate connections are connected to each other, by definition in this type of network.
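You can see this breakdown for yourself in a few lines of networkx. The node names below are made up for illustration, but the structure mirrors the DH2011 map: people connect only to institutions and submissions, never directly to one another.

```python
import networkx as nx

# A toy multimodal (bipartite) network: people attach only to
# institutions and conference submissions, never to each other.
G = nx.Graph()
G.add_edges_from([
    ("alice", "Stanford"), ("alice", "Paper 1"),
    ("bob", "Stanford"),   ("bob", "Paper 1"),
    ("carol", "UCLA"),     ("carol", "Paper 1"),
])

# Local clustering counts triangles among a node's neighbors.
# A bipartite network contains no triangles, so every node scores 0.
print(nx.clustering(G))
# every value is 0.0
```

Swap in any two-mode network you like and the result is the same: the metric is structurally incapable of distinguishing anyone.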

Local clustering coefficient is an extreme example, but many of the common metrics break down or mean something different when multiple node-types are introduced to the network. People are coming up with ways to handle these networks, but the methods haven’t yet made their way into popular software. This is yet another reason researchers should have a sense of how the algorithms work and how they might interact with their own data.
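One common workaround, and only a workaround, is to project the multimodal network onto a single node type before running single-mode metrics. A minimal sketch with networkx's bipartite module (again with made-up names):

```python
import networkx as nx
from networkx.algorithms import bipartite

# Toy two-mode network: people on one side, papers on the other.
B = nx.Graph()
people = {"alice", "bob", "carol"}
B.add_edges_from([
    ("alice", "Paper 1"), ("bob", "Paper 1"),
    ("bob", "Paper 2"),   ("carol", "Paper 2"),
])

# Project onto the people: two people are linked if they share a
# paper, with edge weights counting how many papers they share.
P = bipartite.weighted_projected_graph(B, people)
print(sorted(P.edges(data=True)))
# alice-bob and bob-carol each share one paper; alice-carol share none
```

The projection makes single-mode metrics meaningful again, but note what it throws away: you lose the papers themselves, and heavily co-authored papers inflate apparent ties, so the projection embeds its own assumptions into the evidence.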

No Network Zone

The previous examples pointed out when networks might be used inappropriately, but there are also times when there is no appropriate use for a network. This isn’t so much based on data (most data can become a network if you torture them enough), but on research questions. Networks seem to occupy a similar place in the humanities as power laws do in computational social sciences: they tend to crop up everywhere regardless of whether they actually add anything informative. I’m not in the business of calling out poor uses of networks, but a good rule of thumb on whether you should include a network in your poster or paper is to ask yourself whether its inclusion adds anything that your narrative doesn’t.

Conversely, it’s also not uncommon to see over-explanations of networks, especially network visualizations. A narrative description isn’t always the best tool for conveying information to an audience; just as you wouldn’t want to see a table of temperatures over time when a simple line chart would do, you don’t want a two-page description of communities in a network when a simple visualization would do.

This post is a bit less concise and purposeful than the others in this series, but stay tuned for a revamped (and hopefully better) version to show up in The Historian’s Macroscope. In the meantime, as always, comments are welcome and loved and will confer good luck on all those who write them.

Submissions to Digital Humanities 2014

Submissions for the 2014 Digital Humanities conference just closed. It’ll be in Switzerland this time around, which unfortunately means I won’t be able to make it, but I’ll be eagerly following along from afar. Like last year, reviewers are allowed to preview the submitted abstracts. Also like last year, I’m going to be a reviewer, which means I’ll have the opportunity to revisit the submissions to DH2013 to see how the submissions differed this time around. No doubt when the reviews are in and the accepted articles are revealed, I’ll also revisit my analysis of DH conference acceptances.

To start with, the conference organizers received a record number of submissions this year: 589. Last year’s Nebraska conference only received 348 submissions. The general scope of the submissions hasn’t changed much; authors were still supposed to tag their submissions using a controlled vocabulary of 95 topics, and were also allowed to submit keywords of their own making. Like last year, authors could submit long papers, short papers, panels, or posters, but unlike last year, multilingual submissions were encouraged (English, French, German, Italian, or Spanish). [edit: Bethany Nowviskie, patient awesome person that she is, has noticed yet another mistake I’ve made in this series of posts. Apparently last year they also welcomed multilingual submissions, and it is standard practice.]

Digital Humanities is known for its collaborative nature, and not much has changed in that respect between 2013 and 2014 (Figure 1). Submissions had, on average, between two and three authors, with 60% of submissions in both years having at least two authors. This year, a few fewer papers have single authors, and a few more have two authors, but the difference is too small to be attributable to anything but noise.

Figure 1. Number of authors per paper.

The distribution of topics being written about has changed mildly, though rarely in extreme ways. Any changes visible should also be taken with a grain of salt, because a trend over a single year is hardly statistically robust to small changes, say, in the location of the event.

The grey bars in Figure 2 show what percentage of DH2014 submissions are tagged with a certain topic, and the red dotted outlines show what the percentages were in 2013. The upward trends to note this year are text analysis, historical studies, cultural studies, semantic analysis, and corpora and corpus activities. Text analysis was tagged to 15% of submissions in 2013 and is now tagged to 20% of submissions, or one out of every five. Corpus analysis similarly bumped from 9% to 13%. Clearly this is an important pillar of modern DH.

Figure 2. Topics from DH2014 ordered by the percent of submissions which fall in that category. The red dotted outlines represent the percentages from DH2013.

I’ve pointed out before that History is secondary compared to Literary Studies in DH (although Ted Underwood has convincingly argued, using Ben Schmidt’s data, that the numbers may merely be due to fewer people studying history). This year, however, historical studies nearly doubled in presence, from 10% to 17%. I haven’t yet collected enough years of DH conference data to see if this is a trend in the discipline at large, or more of a difference between European and North American DH. Semantic analysis jumped from 1% to 7% of the submissions, cultural studies went from 10% to 14%, and literary studies stayed roughly equivalent. Visualization, one of the hottest topics of DH2013, has become even hotter in 2014 (14% to 16%).

The most visible drops in coverage came in pedagogy, scholarly editions, user interfaces, and research involving social media and the web. At DH2013, submissions on pedagogy had a surprisingly low acceptance rate, which, combined with the drop in pedagogy submissions this year (11% to 8% in “Digital Humanities – Pedagogy and Curriculum” and 7% to 4% in “Teaching and Pedagogy”), might suggest a general decline of interest in pedagogy in the DH world. “Scholarly Editing” went from 11% to 7% of the submissions, and “Interface and User Experience Design” from 13% to 8%, which is yet more evidence for the lack of research going into the creation of scholarly editions compared to several years ago. The most surprising drops for me were those in “Internet / World Wide Web” (12% to 8%) and “Social Media” (8.5% to 5%), which I would have guessed would be growing rather than shrinking.

The last thing I’ll cover in this post is the author-chosen keywords. While authors needed to tag their submissions from a list of 95 controlled vocabulary words, they were also encouraged to tag their entries with keywords they could choose themselves. In all they chose nearly 1,700 keywords to describe their 589 submissions. In last year’s analysis of these keywords, I showed that visualization seemed to be the glue that held the DH world together; whether discussing TEI, history, network analysis, or archiving, all the disparate communities seemed to share visualization as a primary method. The 2014 keyword map (Figure 3) reveals the same trend: visualization is squarely in the middle. In this graph, two keywords are linked if they appear together on the same submission, thus creating a network of keywords as they co-occur with one another. Words appear bigger when they span communities.

Figure 3. Co-occurrence of DH2014 author-submitted keywords.
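For the curious, the construction behind a map like this is straightforward. The keyword lists below are invented stand-ins for the real author-chosen keywords, and sizing words by betweenness centrality is just one plausible way to make community-spanning words appear bigger:

```python
import itertools

import networkx as nx

# Hypothetical author-chosen keyword lists, one per submission.
submissions = [
    ["visualization", "TEI", "XML"],
    ["visualization", "network analysis"],
    ["network analysis", "history"],
]

# Link two keywords whenever they appear on the same submission,
# counting co-occurrences as edge weights.
G = nx.Graph()
for keywords in submissions:
    for a, b in itertools.combinations(sorted(set(keywords)), 2):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
        else:
            G.add_edge(a, b, weight=1)

# Betweenness centrality rewards words that bridge communities.
central = nx.betweenness_centrality(G)
print(max(central, key=central.get))
# "visualization" bridges the TEI/XML and network-analysis clusters
```

Even in this toy version, visualization ends up as the glue, because it is the only word the TEI/XML cluster and the network-analysis cluster have in common.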

Despite the multilingual conference, the large component of the graph is still English. We can see some fairly predictable patterns: TEI is coupled quite closely with XML; collaboration is another keyword that binds the community together, as is (obviously) “Digital Humanities.” Linguistics and literature are tightly coupled, much more so than, say, linguistics and history. It appears the distant reading of poetry is becoming popular, which I’d guess is a relatively new phenomenon, although I haven’t gone back and checked.

This work has been supported by an ACH microgrant to analyze DH conferences and the trends of DH through them, so keep an eye out for more of these posts forthcoming that look through the last 15 years. Though I usually share all my data, I’ll be keeping these to myself, as the submitters to the conference did so under an expectation of privacy if their proposals were not accepted.

[edit: there was some interest on twitter last night for a raw frequency of keywords. Because keywords are author-chosen and I’m trying to maintain some privacy on the data, I’m only going to list those keywords used at least twice. Here you go (Figure 4)!]

Figure 4. Keywords used in DH2014 submissions ordered by frequency.