Early Modern Letters Online

Early modern history! Science! Letters! Data! Four of my favoritest things have been combined in this brand new beta release of Early Modern Letters Online from Oxford University.



EMLO (what an adorable acronym, I kind of want to tickle it) is Oxford’s answer to a metadata database (metadatabase?) of, you guessed it, early modern letters. This is pretty much a gold standard metadata project. It’s still in beta, so there are some interface kinks and desirable features not yet implemented, but it has all the right ingredients for a great project:

  • Information is free and open; I’m even told it will be downloadable at some point.
  • Developed by a combination of historians (via Cultures of Knowledge) and librarians (via the Bodleian Library) working in tandem.
  • The interface is fast, easy, and includes faceted browsing.
  • Has a fantastic interface for adding your own data.
  • Actually includes citation guidelines thank you so much.
  • Visualizations for at-a-glance understanding of data.
  • Links to full transcripts, abstracts, and hard-copies where available.
  • Lots of other fantastic things.

Sorry if I go on about how fantastic this catalog is – like I said, I love letters so much. The index itself includes roughly 12,000 people, 4,000 locations, 60,000 letters, 9,000 images, and 26,000 additional comments. It is without a doubt the largest public letters database currently available. Between the data being compiled by this group and that of the CKCC in the Netherlands, the Electronic Enlightenment Project at Oxford, Stanford’s Mapping the Republic of Letters project, and R.A. Hatch’s research collection, there will soon be hundreds of thousands of letters which can be tracked, read, and analyzed with absolute ease. The mind boggles.

Bodleian Card Catalogue Summaries

Without a doubt, the coolest and most unique feature this project brings to the table is the digitization of the Bodleian Card Catalogue, a fifty-two drawer index-card cabinet filled with summaries of nearly 50,000 letters held in the library, all compiled by the Bodleian staff many years ago. In lieu of full transcriptions, digitizations, or translations, these summary cards are an amazing resource by themselves. Many of the letters in the EMLO collection include these summaries as full-text abstracts.

One of the Bodleian summaries showing Heinsius looking far and wide for primary sources, much like we’re doing right now…

The collection also includes the correspondences of John Aubrey (1,037 letters), Comenius (526), Hartlib (4,589, many including transcripts), Edward Lhwyd (2,139, many including transcripts), Martin Lister (1,141), John Selden (355), and John Wallis (2,002). The advanced search allows you to look for only letters with full transcripts or abstracts available. As someone who’s worked with a lot of letters catalogs of varying quality, it is refreshing to see this one being upfront about unknown/uncertain values. It would, however, be nice if they included the editor’s best guess of dates and locations, or perhaps inferred locations/dates from the other information available. (For example, if birth and death dates are known, it is likely a letter was not written by someone before or after those dates.)


In the interest of full disclosure, I should note that, much like with the CKCC letters interface, I spent some time working with the Cultures of Knowledge team on visualizations for EMLO. Their group was absolutely fantastic to work with, with impressive resources and outstanding expertise. The result of the collaboration was the integration of visualizations in metadata summaries, the first of which is a simple bar chart showing the number of letters written by, received by, and mentioning any given individual in the catalog per year. Besides being useful for getting an at-a-glance idea of the data, these charts actually proved really useful for data cleaning.
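
For the curious, the tallying behind a chart like that can be sketched in a few lines of Python. This is only my own illustration, not the EMLO team's code: the record layout, the placeholder names "A", "B", "C", and the function `yearly_counts` are all assumptions made up for the example.

```python
from collections import Counter

# Hypothetical letter records: (year, author, recipient, people mentioned).
# The names and years are invented; this is not EMLO's schema or data.
letters = [
    (1630, "A", "B", []),
    (1630, "B", "A", ["C"]),
    (1631, "B", "C", ["A"]),
    (1631, "A", "B", []),
]

def yearly_counts(letters, person):
    """Count letters written by, received by, and mentioning `person`, per year."""
    counts = {"written": Counter(), "received": Counter(), "mentioned": Counter()}
    for year, author, recipient, mentioned in letters:
        if author == person:
            counts["written"][year] += 1
        if recipient == person:
            counts["received"][year] += 1
        if person in mentioned:
            counts["mentioned"][year] += 1
    return counts

counts_a = yearly_counts(letters, "A")

# One row per year that appears in any of the three series: the bars of the chart.
for year in sorted(set().union(*counts_a.values())):
    print(year, counts_a["written"][year], counts_a["received"][year],
          counts_a["mentioned"][year])
```

Each printed row corresponds to one group of bars; feeding those rows into any charting library would give a chart of roughly the kind described above.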

Sir Robert Crane (1604-1643)

In the above screenshot from previous versions of the data, Robert Crane is shown as the addressee of letters in the mid-1650s, several years after his reported death. While these could also have been spotted automatically, there are many instances where a few letters are dated very close to a birth or death date, and they often turn out to be misreported. Visualizations can be great tools for data cleaning as a form of sanity test. This is the new, corrected version of Robert Crane’s page. They are using d3.js, a fantastic JavaScript library for building visualizations.
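
The automatic version of that sanity test might look something like the sketch below. It is a minimal illustration under my own assumptions: the dictionaries, the field names, the two-year `margin`, and the function `flag_letters` are all hypothetical, and the sample data is invented (person "A" simply borrows the 1604-1643 lifespan from the example above).

```python
# Hypothetical data: lifespans as (birth year, death year), and a few letters.
# Field names and values are invented for illustration, not EMLO's schema.
people = {"A": (1604, 1643), "B": (1600, 1662)}
letters = [
    {"id": 1, "year": 1640, "author": "B", "recipient": "A"},
    {"id": 2, "year": 1655, "author": "B", "recipient": "A"},  # after A's death
    {"id": 3, "year": 1642, "author": "A", "recipient": "B"},  # close to A's death
]

def flag_letters(letters, people, margin=2):
    """Flag letters dated outside, or within `margin` years of the ends of, a lifespan."""
    flags = []
    for letter in letters:
        for role in ("author", "recipient"):
            span = people.get(letter[role])
            if span is None or letter["year"] is None:
                continue  # unknown person or undated letter: nothing to check
            born, died = span
            year = letter["year"]
            if year < born or year > died:
                flags.append((letter["id"], role, "outside lifespan"))
            elif year - born <= margin or died - year <= margin:
                flags.append((letter["id"], role, "near birth or death"))
    return flags

flags = flag_letters(letters, people)
for flag in flags:
    print(flag)
```

The "near birth or death" flag is the interesting one: those letters are not impossible, just suspicious enough that a human should take a second look, which is exactly what the bar charts make easy.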

Because I can’t do anything with letters without looking at them as a network, I decided to put together some visualizations using Sci2 and Gephi. In both cases, the Sci2 tool was used for data preparation and analysis, and the final network was visualized in GUESS and Gephi, respectively. The first graph shows the network in detail with edges, and names visible for the most “central” correspondents. The second visualization is without edges, with each correspondent clustered according to their place in the overall network, with the most prominent figures in each cluster visible.

Built with Sci2/Guess
Built with Sci2/Gephi

The graphs show us that this is not a fully connected network. There are many islands of one or two letters or a small handful of letters. These can be indicative of a prestige bias in the data. That is, the collection contains many letters from the most prestigious correspondents, and increasingly fewer as the prestige of the correspondent decreases. Put another way, there are many letters from a few, and few letters from many. This is a characteristic shared with power law and other “long tail” distributions. The jumbled community structure at the center of the second graph is especially interesting, and it would be worth comparing these communities against institutions and informal societies at the time. Knowledge of large-scale patterns in a network can help determine what sort of analyses are best for the data at hand. More on this in particular will be coming in the next few weeks.
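
Both observations (the islands and the "many letters from a few") are easy to check without any special tooling. Here is a rough sketch in plain Python; it is not what Sci2 or Gephi do internally, and the sender/recipient pairs are placeholders I invented.

```python
from collections import Counter, defaultdict

# Hypothetical (sender, recipient) pairs, one per letter; invented for illustration.
letters = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "B"),
           ("B", "C"), ("E", "F"), ("G", "H")]

def build_graph(letters):
    """Undirected adjacency sets: two people are linked if a letter passed between them."""
    graph = defaultdict(set)
    for sender, recipient in letters:
        graph[sender].add(recipient)
        graph[recipient].add(sender)
    return graph

def connected_components(graph):
    """Return the 'islands' of the network, largest first (depth-first search)."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return sorted(components, key=len, reverse=True)

graph = build_graph(letters)
components = connected_components(graph)
letters_sent = Counter(sender for sender, _ in letters)

print([len(c) for c in components])   # one big component plus small islands
print(letters_sent.most_common())     # a few prolific senders, then a long tail
```

On a real catalog, the list of component sizes and the sorted counts of letters per sender are the two things I would look at first before choosing an analysis.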

It’s also worth pointing out these visualizations as another tool for data-checking. You may notice, on the bottom left-hand corner of the first network visualization, two separate Edward Lhwyds with virtually the same networks of correspondence. This meant there were two distinct entities in their database referring to the same individual – a problem which has since been corrected.
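
A duplicate like that can also be hunted down numerically: two records whose sets of correspondents overlap heavily are worth a second look. The sketch below uses Jaccard similarity, which is my own choice of measure and not necessarily how the EMLO team found or fixed the problem; the record IDs, the neighbor sets, and the 0.5 threshold are all invented.

```python
from itertools import combinations

# Hypothetical records mapped to the set of people each one corresponds with.
neighbors = {
    "record_17": {"P", "Q", "R", "S"},
    "record_42": {"P", "Q", "R", "T"},   # suspiciously similar to record_17
    "record_99": {"Q", "X"},
}

def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def likely_duplicates(neighbors, threshold=0.5):
    """Pairs of records whose correspondent sets overlap at least `threshold`."""
    pairs = []
    for first, second in combinations(sorted(neighbors), 2):
        score = jaccard(neighbors[first], neighbors[second])
        if score >= threshold:
            pairs.append((first, second, score))
    return pairs

candidates = likely_duplicates(neighbors)
print(candidates)
```

Anything this flags is only a candidate: two different people can share a circle of correspondents, so a human still has to confirm whether the records refer to the same individual.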

More Letters!

Notice that the EMLO site makes it very clear that they are open to contributions. There are many letters datasets out there, some digitized, some still languishing idly on dead trees, and until they are all combined, we will be limited in the scope of the research possible. We can always use more. If you are in any way responsible for an early-modern letters collection, meta-data or full-text, please help by opening that collection up and making it integrable with the other sets out there. It will do the scholarly world a great service, and get us that much closer to understanding the processes underlying scholarly communication in general. The folks at Oxford are providing a great example, and I look forward to watching this project as it grows and improves.

Zotpress is so cool.

So, you may have noticed this site has been overhauled over the past few days. The old WP theme really wasn’t doing it for me, so I decided to switch to the Great and Glorious Suffusion theme, which is more customizable than a barrel of monkeys. The switch to the new theme opened up all sorts of real-estate for new content, and a brief look around the #DH blogosphere landed me on Zotpress.

Do you guys use Zotero? You should use Zotero. It’s a fantastic citation management program that snuggles up nice and close to your browser and turns it into a super research machine.

Dear Zotero, I ♥ you.

Anyway, Zotpress is a WordPress plugin that allows you to put the power of Zotero into your blog. Want to reference stuff? Easy! Want to make a list of most recently read items? Cake! (See the right side of this blog for that particular feature.) This is one of those plugins that I never thought I needed, but now that I have it I cannot imagine blogging efficiently without it.

For your reading pleasure, below is a list of some super cool articles, courtesy of Zotpress:

[zotpress item="34ABEHCE,D9SRGW5H,H52588XW,EF3KZ27G,HK6XQ3CI,42WF9AT7,SH7RT4P5"]

Alchemy, Text Analysis, and Networks! Oh my!

“Newton wrote and transcribed about a million words on the subject of alchemy.” —chymistry.org


Besides bringing us things like calculus, universal gravitation, and perhaps the inspiration for certain Pink Floyd albums, Isaac Newton spent many years researching what was then known as “chymistry,” a multifaceted precursor to, among other things, what we now call chemistry, pharmacology, and alchemy.

Pink Floyd and the Occult: Discuss.

Researchers at Indiana University, notably William R. Newman, John A. Walsh, Dot Porter, and Wallace Hooper, have spent the last several years developing The Chymistry of Isaac Newton, an absolutely wonderful history of science resource which, as of this past month, has digitized all 59 of Newton’s alchemical manuscripts assembled by John Keynes in 1936. Among the site’s features are heavily annotated transcriptions, manuscript images, often scholarly synopses, and examples of alchemical experiments. That you can try at home. That’s right, you can do alchemy with this website. They also managed to introduce alchemical symbols into Unicode (U+1F700 – U+1F77F), which is just indescribably cool.
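
If you want to poke at that Unicode block yourself, Python's standard `unicodedata` module will tell you what it knows about each code point. A small sketch follows; note that not every code point in the range is necessarily assigned, and which names you see depends on the Unicode version bundled with your Python, so the `'<unassigned>'` fallback is there on purpose.

```python
import unicodedata

# The range quoted above for the alchemical symbols.
start, end = 0x1F700, 0x1F77F
block_size = end - start + 1

# Print the first few code points with their official character names.
for code_point in range(start, start + 8):
    character = chr(code_point)
    name = unicodedata.name(character, "<unassigned>")
    print(f"U+{code_point:04X} {character} {name}")

# Count how many code points in the range this Python build has names for.
named = sum(1 for cp in range(start, end + 1)
            if unicodedata.name(chr(cp), None) is not None)
print(f"{named} of {block_size} code points are named in this Unicode version")
```

Whether the glyphs actually render is a separate question that depends on the fonts installed on your machine.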

Alchemical experiments at home! http://webapp1.dlib.indiana.edu/newton/reference/mineral.do

What I really want to highlight, though, is a brand new feature introduced by Wallace Hooper: automated Latent Semantic Analysis (LSA) of the entire corpus. For those who are not familiar with it, LSA is somewhat similar to LDA, the algorithm driving the increasingly popular Topic Models used in Digital Humanities. They both have their strengths and weaknesses, but essentially what they do is show how documents and terms relate to one another.

Newton Project LSA

In this case, the entire corpus of Newton’s alchemical texts is fed into the LSA implementation (try it for yourself), and then based on the user’s preferences, the algorithm spits out a network of terms, documents, or both together. That is, if the user chooses document-document correlations, a list is produced of the documents that are most similar to one another based on similar word use within them. That list includes weights – how similar are they to one another? – and those weights can be used to create a network of document similarity.
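
To give a feel for what happens under the hood, here is a toy, dependency-free sketch of document-document similarity in the spirit of LSA. It is emphatically not the Chymistry project's implementation, which I have not seen: the four tiny "documents" are invented, raw word counts stand in for whatever term weighting the real tool uses, and the truncated SVD is approximated by power iteration on the small document-by-document Gram matrix (a shortcut that assumes the leading eigenvalues are distinct).

```python
import math

# Toy corpus, invented for illustration (not Newton's actual text).
docs = [
    "mercury sulphur salt",
    "mercury sulphur antimony",
    "prophecy temple daniel",
    "prophecy daniel temple revelation",
]

def term_counts(docs):
    """Build a documents x terms matrix of raw word counts."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    column = {word: i for i, word in enumerate(vocab)}
    matrix = [[0.0] * len(vocab) for _ in docs]
    for row, doc in zip(matrix, docs):
        for word in doc.split():
            row[column[word]] += 1.0
    return matrix

def top_eigenpairs(sym, k, iterations=200):
    """Leading k eigenpairs of a small symmetric matrix (power iteration + deflation)."""
    n = len(sym)
    work = [row[:] for row in sym]
    pairs = []
    for _ in range(k):
        vec = [1.0 + 0.1 * i for i in range(n)]  # fixed start keeps runs repeatable
        for _ in range(iterations):
            nxt = [sum(work[r][c] * vec[c] for c in range(n)) for r in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm == 0.0:
                break
            vec = [x / norm for x in nxt]
        value = sum(vec[r] * sum(work[r][c] * vec[c] for c in range(n))
                    for r in range(n))
        pairs.append((value, vec))
        for r in range(n):            # deflate: remove the component just found
            for c in range(n):
                work[r][c] -= value * vec[r] * vec[c]
    return pairs

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

def lsa_similarity(docs, k=2):
    """Cosine similarity between documents in a k-dimensional latent space."""
    counts = term_counts(docs)
    n = len(docs)
    # Gram matrix A A^T: its eigenvectors are the left singular vectors of A,
    # and its eigenvalues are the squared singular values.
    gram = [[sum(a * b for a, b in zip(counts[i], counts[j])) for j in range(n)]
            for i in range(n)]
    pairs = top_eigenpairs(gram, k)
    coords = [[math.sqrt(max(value, 0.0)) * vec[d] for value, vec in pairs]
              for d in range(n)]
    return [[cosine(coords[i], coords[j]) for j in range(n)] for i in range(n)]

similarity = lsa_similarity(docs)
for row in similarity:
    print([round(score, 2) for score in row])
```

With two latent dimensions, the two "mercury" documents should come out nearly identical to each other and nearly unrelated to the two "prophecy" documents. Those pairwise scores are the weights that become the edges of the document network.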

Similar Documents using LSA

One of the really cool features of this new service is that it can export the network either as CSV for the technical among us, or as an nwb file to be loaded into the Network Workbench or the Sci² Tool. From there, you can analyze or visualize the alchemical networks, or you can export the files into a network format of your choice.
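
If you ever need to produce such an edge list yourself, the CSV side is simple enough to sketch with Python's standard library. The column names, the document IDs, the weights, and the 0.4 cutoff below are all my own assumptions, not the format the Chymistry site actually emits, and I have left the nwb format out rather than guess at its details.

```python
import csv
import io

# Hypothetical (document, document, similarity weight) triples; values are invented.
similarities = [
    ("doc_a", "doc_b", 0.82),
    ("doc_a", "doc_c", 0.15),
    ("doc_b", "doc_c", 0.47),
]

def to_edge_csv(similarities, threshold=0.4):
    """Write pairs at or above `threshold` as a weighted edge list in CSV form."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source", "target", "weight"])
    for source, target, weight in similarities:
        if weight >= threshold:
            writer.writerow([source, target, weight])
    return buffer.getvalue()

edge_csv = to_edge_csv(similarities)
print(edge_csv)
```

Thresholding is a judgment call: keep every pair and the network turns into a hairball, cut too aggressively and the interesting weak ties disappear.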

Network of how Newton’s alchemical documents relate to one another visualized using NWB.

It’s great to see more sophisticated textual analyses being automated and actually used. Amber Welch recently posted on Moving Beyond the Word Cloud using the wonderful TAPoR, and Michael Widner just posted a thought-provoking article on using Voyeur Tools for the process of paper revision. With tools this easy to use, it won’t be long now before the first thing a humanist does when approaching a text (or a million texts) is to glance at all the high-level semantic features and various document visualizations before digging in for the close read.

Rippling o’er the Wave

The inimitable Elijah Meeks recently shared his reasoning behind joining Google+ over Twitter or Facebook. “G+ seems to be self-consciously a network graph that happens to let one connect and keep in touch.” For those who haven’t made the jump, Google+ feels like a contact list on steroids; it lets you add contacts, organize them into different (often overlapping) “circles,” and ultimately you can share materials based on those circles, video chat, send messages, and so forth. By linking your pre-existing public Google profile (and rolling in old features like Buzz and Google Reader), Google has essentially socialized web presences rather than “web presencifying” the social space.

It’s a wishy-washy distinction, and not entirely true, but it feels true enough that many who never worried about social networking sites are going to Google+. This is also one of the big distinctions between Google+ and the loved-but-lost Google Wave, which was ultrasocial but also ultraprivate; it was not an extended Twitter, but an extended AIM or Gmail, really some Frankenstein of the two. It wasn’t about presences and extending contacts, but about chatting alone.

True to Google form, they’ve already realized the potential of sharing in this semi-public space. If Twitter weren’t so minimalistic, they too would have caught on early. Yesterday, via G+ itself, Ripples rippled through the social space. Google+ Ripples describes itself as “a way to visualize the impact of any public post.” This link [1] shows the “ripples” of Ripples itself [2], or the propagation of news of Ripples through the G+ space.

They do a great job invoking the very circles used to organize contacts. Nested circles show subsequent generations of the shared post, and in most cases nested circles also represent followers of the most recent root node. Below the graph, G+ displays the posting frequency over time and allows the user to rewind the clock, seeing how the network grew. Hidden at the bottom of the page, you can find the people with the most public reshares (“influencers”), basic network statistics (average path length, not terribly meaningful in this situation; longest chain; and shares-per-hour), and languages of reshared posts. You can also read the reshares themselves on the right side of the screen, which immediately moved this from my mental “toy” box to the “research tool” box.
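
Since Ripples offers no data export, here is a rough sketch of how a couple of those statistics could be computed if you had the reshare data in hand. Everything in it is hypothetical: the user names, the (resharer, shared_from) pair format, and the simplifying assumption that each person reshares the post only once, so the data forms a tree. It is not Google's method.

```python
from collections import Counter

# Hypothetical reshare log: (who reshared, whose post they reshared it from).
reshares = [("b", "a"), ("c", "a"), ("d", "b"), ("e", "d"), ("f", "b"), ("g", "b")]
root = "a"  # author of the original public post

def chain_lengths(reshares, root):
    """Number of reshare hops separating each user from the original post."""
    parent = dict(reshares)          # assumes one reshare per user (a tree)
    depth = {root: 0}

    def depth_of(user):
        if user not in depth:
            depth[user] = depth_of(parent[user]) + 1
        return depth[user]

    for user in parent:
        depth_of(user)
    return depth

depth = chain_lengths(reshares, root)
longest_chain = max(depth.values())
generations = Counter(depth.values())                       # sizes of the nested circles
influencers = Counter(source for _, source in reshares)     # direct reshares per person

print("longest chain:", longest_chain)
print("people per generation:", sorted(generations.items()))
print("top influencers:", influencers.most_common(2))
```

Each generation corresponds to one ring of nested circles in the visualization, and the influencer count is simply the number of direct reshares each person received.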

Make no mistake, this is a research tool. Barring the lack of permanent links or the ability to export the data into some manipulable file [3], this is a perfect example of information propagation done well. When doing similar research on Twitter, one often requires API-programming prowess to get even this far; in G+, it’s as simple as copying a link. By making information-propagating-across-a-network something sexy, interesting, and easily accessible to everyone, Google is making diffusion processes part of the common vernacular. For this, I give Google +1.




  1. One feature I would like would be the ability to freeze Ripples links. The linked content will change as more people share the initial post – this is potentially problematic.
  2. Anything you can do I can do meta.
  3. which will be necessary for this to go from “research tool” to “actually used research tool”

Psychology of Science as a New Subdiscipline in Psychology

Feist, G. J. 2011. “Psychology of Science as a New Subdiscipline in Psychology.” Current Directions in Psychological Science 20 (October 5): 330-334. doi:10.1177/0963721411418471.

Gregory Feist, a psychologist from San Jose State University, recently wrote a review of the past decade of findings in the psychology of science. He sets the discipline apart from history, philosophy, anthropology, and sociology of science, defining the psychology of science as “the scientific study of scientific thought and behavior,” both implicit and explicit, in children and adults.

Some interesting results covered in the paper:

  • “People pay more attention to evidence when it concerns plausible theories than when it concerns implausible ones.”
  • “Babies as young as 8 months of age understand probability… children as young as 4 years old can correctly draw causal inferences from bar graphs.” (I’m not sure how much I believe that last one – can grown scientists correctly draw causal inferences from bar graphs?)
  • “children, adolescents, and nonscientist adults use different criteria when evaluating explanations and evidence, they are not very good at separating belief from fact (theory and evidence), and they persistently give their beliefs as evidence for their beliefs.”
  • “one reason for the inability to distinguish theory from evidence is the belief that knowledge is certain and absolute—that is, either right or wrong”
  • “scientists use anomalies and unexpected findings as sources for new theories and experiments and that analogy is very important in generating hypotheses and interpreting results”
  • “the personality traits that make scientific interest more likely are high conscientiousness and low openness, whereas the traits that make scientific creativity more likely are high openness, low conscientiousness, and high confidence.”
  • “scientists are less prone to mental health difficulties than are other creative people,” although “It may be that science tends to weed out those with mental health problems in a way that art, music, and poetry do not.”

It is somewhat surprising that Feist doesn’t mention the old use of “psychology of science,” which largely surrounded Reichenbach’s (1938) context distinctions, as echoed by the Vienna Circle and many others. The context of discovery (rather than the context of justification) deals with the question, as Salmon (1963) put it, “When a statement has been made, … how did it come to be thought of?” Barry F. Singer (1971) wrote “Toward a Psychology of Science,” where he quoted S.S. Stevens (1936, 1939) on the subject of a scientific psychology of science.

It is exciting that the psychology of science is picking up again as an interesting object of study, although it would have been nice for Feist to cite someone earlier than 1996 when discussing this “new subdiscipline in psychology.”

From Wired Magazine