Science Systems Engineering

Warning: This post is potentially evil, and definitely normative. While I am unsure whether what I describe below should be done, I’m becoming increasingly certain that it could be. Read with caution.

Complex Adaptive Systems

Science is a complex adaptive system. It is a constantly evolving network of people and ideas and artifacts which interact with and feed back on each other to produce this amorphous socio-intellectual entity we call science. Science is also a bunch of nested complex adaptive systems, some overlapping, and is itself part of many other systems besides.

The study of complex interactions is enjoying a boom period due to the facilitating power of the “information age.” Because any complex system, whether it be a social group or a pool of chemicals, can exist in almost innumerable states while comprising the same constituent parts, it requires massive computational power to comprehend all the many states a system might find itself in. From the other side, it takes a massive amount of data observation and collection to figure out what states systems eventually do find themselves in, and that knowledge of how complex systems play out in the real world relies on collective and automated data gathering. From seeing how complex systems work in reality, we can infer properties of their underlying mechanisms; by modeling those mechanisms and computing the many possibilities they might allow, we can learn more about ourselves and our place in the larger multisystem. 1

One of the surprising results of complexity theory is that seemingly isolated changes can produce rippling, massive effects throughout a system. Only a decade after the removal of big herbivores like giraffes and elephants from an African savanna, a generally positive relationship between bugs and plants turned into an antagonistic one. Because the herbivores no longer grazed on certain trees, those trees began producing less nectar and fewer thorns, which in turn caused cascading repercussions throughout the ecosystem. Ultimately, the trees’ mortality rate doubled, and a variety of species were worse off than they had been. 2 Similarly, the introduction of an invasive species can cause untold damage to an ecosystem, as has become abundantly clear in Florida 3 and around the world (the extinction of flightless birds in New Zealand springs to mind).

http://www.flickr.com/photos/arnolouise/3202569865/

Both evolutionary and complexity theories show that self-organizing systems evolve in such a way that they are self-sustaining and self-perpetuating. Often, within a given context or environment, the systems which are most resistant to attack, or the most adaptable to change, are the most likely to persist and grow. Because the entire environment evolves concurrently, small changes in one subsystem tend to propagate as small changes in many others. However, when the constraints of the environment change rapidly (like with the introduction of an asteroid and a cloud of sun-cloaking dust), when a new and sufficiently foreign system is introduced (land predators to New Zealand), or when an important subsystem is changed or removed (the loss of megafauna in Africa), devastating changes ripple outward.

An environmental ecosystem is one in which many smaller overlapping systems exist, and changes in the parts may change the whole; society can be described similarly. Students of history know that the effects of one event (a sinking ship, an assassination, a terrorist attack) can propagate through society for years or centuries to come. However, a system is not merely a slave to these single occurrences which cause Big Changes. The structure and history of a system imply certain stable, low energy states. We often anthropomorphize the tendency of systems to come to a stable mean, for example “nature abhors a vacuum.” This is just a manifestation of the second law of thermodynamics: entropy always increases, and systems naturally tend toward low energy states.

The systems of society are historically structured and constrained in such a way that certain changes would require very little energy (an assassination leading to war in a world already on the brink), whereas others would require quite a great deal (say, an attempt to cause war between Canada and the U.S.). It is a combination of the current structural state of a system and the interactions of the constituent parts that leads that system in one direction or another. Put simply, a society, its people, and its environment are responsible for its future. Not terribly surprising, I know, but the formal framework of complexity theory is a useful one for what is described below.

metastability

The above picture, from the Wikipedia article on metastability, provides an example of what’s described above. The ball is resting in a valley, a low energy state, and a small change may temporarily excite the system, but the ball eventually finds its way into the same, or another, low energy state. When the environment is stable, its subsystems tend to find comfortably stable niches as well. Of course, I’m not sure anyone would call society wholly stable…
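For readers who like to poke at such things, here is a minimal sketch of the ball-in-a-valley picture. The landscape is a made-up double well with valleys at -1 and +1 (my assumption, not the curve in the Wikipedia figure), and the function names are purely illustrative. A small nudge lets the ball roll back to where it started; a large one carries it over the hill into the other valley.

```python
# Hypothetical landscape: energy(x) = (x^2 - 1)^2 has two valleys, at x = -1
# and x = +1, separated by a hill at x = 0. This is an illustrative choice,
# not the curve from the Wikipedia figure.
def energy(x):
    return (x**2 - 1) ** 2


def slope(x):
    # Derivative of energy(x); the ball always rolls against the slope.
    return 4 * x * (x**2 - 1)


def settle(x, step=0.01, iterations=5000):
    """Let the ball roll downhill from x until it comes to rest."""
    for _ in range(iterations):
        x -= step * slope(x)
    return x


# The ball starts at rest in the left valley (x = -1) and gets kicked.
small_kick = settle(-1.0 + 0.5)  # nudged, but still left of the hill
large_kick = settle(-1.0 + 1.5)  # pushed past the hill at x = 0

print(round(small_kick, 3))  # back in the left valley
print(round(large_kick, 3))  # resting in the right valley
```

Either way the ball ends up in a low energy state; the size of the disturbance only decides which one.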

Science as a System

Science (by which I mean wissenschaft, any systematic research) is part of society, and itself includes many constituent and overlapping parts. I recently argued, not without precedent, that the correspondence network between early modern Europeans facilitated the rapid growth of knowledge we like to call the Scientific Revolution. Further, that network was an inevitable outcome of socio/political/technological factors, including shrinking transportation costs, increasing political unrest leading to scholarly displacement, and, very simply, an increased interest in communicating once communication proved so fruitful. The state of the system affected the parts, the parts in turn affected the system, and a growing feedback loop led to the co-causal development of a massive communication network and a period of massively fruitful scholarly work.

Scientific Correspondence Network

Today and in the past, science is embedded in, and occasionally embodied by, the various organizational and communicative hierarchies its practitioners find themselves in. The people, ideas, and products of science feed back on one another. Scientists are perhaps more affected by their labs, by the process of publication, by the realities of funding, than they might admit. In return, the knowledge and ideas produced by science, the message, shape and constrain the medium in which they are propagated. I’ve often heard and read two opposing views: that knowledge is True and Right and unaffected by the various social goings-on of those who produce it, and that knowledge is Constructed and Meaningless outside of the social and linguistic system it resides in. The truth, I’m sure, is a complex tangle somewhere between the two, and affected by both.

In either case, science does not take place in a vacuum. We do our work through various media and with various funds, in departments and networks and (sometimes) lab-coats, using a slew of carefully designed tools and a language that was not, in general, made for this purpose. In short, we and our work exist in a  complex system.

Engineering the Academy

That system is changing. Michael Nielsen’s recent book 4 talks about the rise of citizen science, augmented intelligence, and collaborative systems not merely as ways to do what we’ve already done faster, but as new methods of discovery. The ability to coordinate on such a scale, and in such new ways, changes the game of science. It changes the system.

While many of these changes are happening automatically, in a self-organized sort of way, Nielsen suggests that we can learn from our past and learn from other successful collective ventures in order to make a “design science of collaboration.” That is, using what we know of how people work together best, of what spurs on the most inspired research and the most interesting results, we can design systems to facilitate collaboration and scientific research. In Nielsen’s case, he’s talking mostly about computer systems; how can we design a website or an algorithm or a technological artifact that will aid in scientific discovery, using the massive distributed power of the information age? One way Nielsen points out is “designed serendipity,” creating an environment where scientists are more likely to experience serendipitous occurrences, and thus more likely to come up with innovative and unexpected ideas.

Can we engineer science? http://www.flickr.com/photos/seattlemunicipalarchives/4818952324

In complexity terms, this idea is restructuring the system in such a way that the constituent parts or subsystems will be or do “better,” however we feel like defining better in this situation. It’s definitely not the first time an idea like this has been used. For example, science policy makers, government agencies, and funding bodies have long known that science will often go where the money is. If there is a lot of money available to research some particular problem, then that problem will tend to get researched. If a major funder requires the research it funds to become open access, by and large that will happen (NIH’s PubMed requirements).

There are innumerable ways to affect the system in a top-down way in order to shape its future. Terrence Deacon writes about how it is the constraints on a system which tend it toward some equilibrium state 5; by shaping the structure of the scientific system, we can predictably shape its direction. That is, we can artificially create a low energy state (say, open access due to policy and funding changes), and let the constituent parts find their way into that low energy state eventually, reaching equilibrium. I talked a bit more about this idea of constraints leading a system in a recent post.

As may be recalled from the discussion above, however, this is not the only way to affect a complex system. External structural changes are only a small part of the story of how a system grows or shifts. Because of the series of interconnected feedback loops that embody a system’s complexity, small changes can (and often do) propagate up and change the system as a whole. Liu, Slotine, and Barabási recently began writing about the “controllability of complex networks 6,” suggesting ways in which changing or controlling constituent parts of a complex system can reliably and predictably change the entire system, perhaps leading it toward a new preferred low energy state. In this case, they were talking about the importance of well-connected hubs in a network; adding or removing them in certain areas can deeply affect the evolution of that network, no matter the constraints. Watts recounts a great example of how a small power outage rippled into a national disaster because just the right connections were overloaded and removed 7. The strategic introduction or removal of certain specific links in the scientific system may go far toward changing the system itself.
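To make the point about hubs concrete, here is a toy sketch. It is not the controllability analysis from the paper cited above, just an illustration on an invented nine-node network (every node name and edge is hypothetical) of how removing one well-connected node can shatter a network that barely notices the loss of a peripheral one.

```python
from collections import deque


def largest_component(edges, removed=frozenset()):
    """Size of the largest connected group once `removed` nodes are taken out."""
    neighbors = {}
    for a, b in edges:
        if a in removed or b in removed:
            continue
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Nodes left with no edges still count as components of size one.
    nodes = {n for edge in edges for n in edge} - set(removed)
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search from each unvisited node.
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nxt in neighbors.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        best = max(best, size)
    return best


# Invented toy network: one hub with six spokes, plus a short chain off "a".
edges = [("hub", s) for s in "abcdef"] + [("a", "g"), ("g", "h")]

intact = largest_component(edges)
without_leaf = largest_component(edges, removed={"f"})
without_hub = largest_component(edges, removed={"hub"})

print(intact, without_leaf, without_hub)  # 9 8 3
```

Losing the peripheral node "f" costs the network one member; losing the hub leaves a largest piece of only three nodes, with everyone else stranded.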

Not only is science a complex adaptive system, it is a system which is becoming increasingly well-understood. A century of various science studies combined with the recent appearance of giant swaths of data about science and scientists themselves is beginning to allow us to learn the structure and mechanisms of the scientific system. We do not, and will never, know the most intricate details of that system; however, in many cases and for many changes, we only need to know general properties of a system in order to change it in predictable ways. If society feels a certain state of science is better than others, either for the purpose of improved productivity or simply more control, we are beginning to see which levers we need to pull in order to enact those changes.

This is dangerous. We may be able to predict first order changes, but as they feed back onto second order, third order, and further-down-the-line changes, the system becomes more unpredictable. Changing one thing positively may affect other aspects in massively negative (and massively unpredictable) ways.

However, generally if humans can do something, we will. I predict the coming years will bring a more formal Science Systems Engineering, a specialty apart from science policy which will attempt to engineer the direction of scientific research from whatever angle possible. My first post on this blog concerned a concept I dubbed scientonomy, which was just yet another attempt at unifying everybody who studies science in a meta sort of way. In that vocabulary, then, this science systems engineering would be an applied scientonomy. We have countless experts in all aspects of how science works on a day-to-day basis from every angle; that expertise may soon become much more prominent in application.

It is my hope and belief that a more formalized way of discussing and engineering scientific endeavors, either on the large scale or the small, can lead to benefits to humankind in the long run. I share the optimism of Michael Nielsen in thinking that we can design ways to help the academy run more smoothly and to lead it toward a more thorough, nuanced, and interesting understanding of whatever it is being studied. However, I’m also aware of the dangers of this sort of approach, first and foremost being disagreement on what is “better” for science or society.

At this point, I’m just putting this idea out there to hear the thoughts of my readers. In my meatspace day-to-day interactions, I tend to be around experimental scientists and quantitative social scientists who in general love the above ideas,  but at my heart and on my blog I feel like a humanist, and these ideas worry me for all the obvious reasons (and even some of the more obscure ones). I’d love to get some input, especially from those who are terrified that somebody could even think this is possible.

Notes:

  1. I’m coining the term “multisystem” because ecosystem is insufficient, and I don’t know something better. By multisystem, I mean any system of systems; specifically here, the universe and how it evolves. If you’ve got a better term that invokes that concept, I’m all for using it. Cosmos comes to mind, but it no longer represents “order,” a series of interlocking systems, in the way it once did.
  2. Palmer, Todd M, Maureen L Stanton, Truman P Young, Jacob R Goheen, Robert M Pringle, and Richard Karban. 2008. “Breakdown of an Ant-Plant Mutualism Follows the Loss of Large Herbivores from an African Savanna.” Science 319 (5860) (January 11): 192–195. doi:10.1126/science.1151579.
  3. Gordon, Doria R. 1998. “Effects of Invasive, Non-Indigenous Plant Species on Ecosystem Processes: Lessons From Florida.” Ecological Applications 8 (4): 975–989. doi:10.1890/1051-0761(1998)008[0975:EOINIP]2.0.CO;2.
  4. Nielsen, Michael. Reinventing Discovery: The New Era of Networked Science. Princeton University Press, 2011.
  5. Deacon, Terrence W. “Emergence: The Hole at the Wheel’s Hub.” In The Re-Emergence of Emergence: The Emergentist Hypothesis from Science to Religion, edited by Philip Clayton and Paul Davies. Oxford University Press, USA, 2006.
  6. Liu, Yang-Yu, Jean-Jacques Slotine, and Albert-László Barabási. “Controllability of Complex Networks.” Nature 473, no. 7346 (May 12, 2011): 167–173.
  7. Watts, Duncan J. Six Degrees: The Science of a Connected Age. 1st ed. W. W. Norton & Company, 2003.

The Internet Listens

The public science blogosphere has recently been buzzing about an online edited book review called Download The Universe. The twist is that the editors only review online-only science books, and their definition of “book” is broadly construed:

[W]e define ebooks broadly. They may be self-published pdf manuscripts. They may be Kindle Singles about science. They can even be apps that have games embedded in them. We hope that we will eventually review new kinds of ebooks that we can’t even imagine yet. And we hope that you will find Download the Universe a useful doorway into that future.

The site aims to fill the publicity gap that prevents interesting and good science ebooks from finding their way into the hands of receptive readers. Traditional reviews and blogs tend not to cover this new media, the editors say. In the spirit of the fast-paced nature of the internet, the entire project was conceived last month at Science Online (#scio12) and already features 8 posts and an editing staff of 16.

Download The Universe

My initial excitement about this project was tempered somewhat when I found that their news feed offered exceptionally tiny snippets of their ebook reviews. That’s no good! I’m subscribed to 361 feeds in Google Reader, with nearly 500 posts a day, and if I don’t have a few paragraphs to see whether an article is interesting, it is unlikely that I’d ever click through to the actual page to investigate further. (By the way, if you’re interested in the best of what I read, you can subscribe to my favorite feed items here, where I read through 361 blogs so you don’t have to.) Unfortunately, snippet news feeds are becoming increasingly frequent, as blogs and sites attempt to entice you to their pages where they can get usage statistics and ad-views in ways they could not through a simple RSS feed.

Apparently, when you talk, the internet listens. My disappointment was such that I sent an email to the coordinating editor, science writer Carl Zimmer, explaining my problem. He immediately sent a reply telling me he would look into the FeedBurner settings, and in short order, the RSS became a full, no-snippet news feed. Whoa! A big (and public) thank you to Carl Zimmer, and the entire crew at Download The Universe, for putting together a wonderful and important new site and for being so receptive to their readers. Bravo!

The Networked Structure of Scientific Growth

Well, it looks like Digital Humanities Now scooped me on posting my own article. As some of you may have read, I recently did not submit a paper on the Republic of Letters, opting instead to hold off until I could submit it to a journal which allowed authorial preprint distribution. Preprints are a vital part of rapid knowledge exchange in our ever-quickening world, and while some disciplines have embraced the preprint culture, many others have yet to. I’d love the humanities to embrace that practice, and in the spirit of being the change you want to see in the world, I’ve decided to post a preprint of my Republic of Letters paper, which I will be submitting to another journal in the near future. You can read the full first draft here.

The paper, briefly, is an attempt to contextualize the Republic of Letters and the Scientific Revolution using modern computational methodologies. It draws from secondary sources on the Republic of Letters itself, especially from my old mentor R.A. Hatch, some network analysis from sociology and statistical physics, modeling, human dynamics, and complexity theory. All of this is combined through datasets graciously donated by the Dutch Circulation of Knowledge group and Oxford’s Cultures of Knowledge project, totaling about 100,000 letters’ worth of metadata. Because it favors large scale quantitative analysis over an equally important close and qualitative analysis, the paper is a contribution to historiographic methodology rather than historical narrative; that is, it doesn’t say anything particularly novel about history, but it does offer a (fairly) new way of looking at and contextualizing it.

A visualization of the Dutch Republic of Letters using Sci2 & Gephi

At its core, the paper suggests that by looking at how scholarly networks naturally grow and connect, we as historians can have new ways to tease out what was contingent upon the period and situation. It turns out that social networks of a certain topology are basins of attraction similar to those I discussed in Flow and Empty Space. With enough time and any of a variety of facilitating social conditions and technologies, a network similar in shape and influence to the Republic of Letters will almost inevitably form. Armed with this knowledge, we as historians can move back to the microhistories and individuated primary materials to find exactly what those facilitating factors were, who played the key roles in the network, how the network may differ from what was expected, and so forth. Essentially, this method is one base map we can use to navigate and situate historical narrative.

Of course, I make no claims of this being the right way to look at history, or the only quantitative base map we can use. The important point is that it raises new kinds of questions and is one mechanism to facilitate the re-integration of the individual and the longue durée, the close and the distant reading.

The project casts a necessarily wide net. I do not yet, and probably could not ever, have mastery over each and every disciplinary pool I draw from. With that in mind, I welcome comments, suggestions, and criticisms from historians, network analysts, modelers, sociologists, and whoever else cares to weigh in. Whoever helps will get a gracious acknowledgement in the final version, good scholarly karma, and a cookie if we ever meet in person. The draft will be edited and submitted in the coming months, and if you have ideas, please post them in the comment section below. Also, if you use ideas from the paper, please cite it as an unpublished manuscript or, if it gets published, cite that version instead.

On Keeping Pledges

A few months back, I posted a series of pledges about being a good scholarly citizen. Among other things, I pledged to keep my data and code open whenever possible, and to fight to retain the right to distribute materials pending and following their publication. I also signed the Open Access Pledge. Since then, a petition boycotting Elsevier cropped up with very similar goals, and as of this writing has nearly 7,000 signatures.

As a young scholar with as-yet no single-authored publications (although one is pending in the forward-thinking Journal of Digital Humanities, which you should all go and peer review), I had to think very carefully in making these pledges. It’s a dangerous world out there for people who aren’t free to publish in whatever journal they like; reducing my publication options is not likely to win me anything but good karma.

With that in mind, I actually was careful never to pledge explicitly that I would not publish in closed access venues; rather, I pledged to “Freely distribute all published material for which I have the right, and to fight to retain those rights in situations where that is not the case.” The pressure of the eventual job market prevented me from saying anything stronger.

Today, my resolve was tested. A recent CFP solicited papers about “Shaping the Republic of Letters: Communication, Correspondence and Networks in Early Modern Europe.” This is, essentially, the exact topic that I’ve been studying and analyzing for the past several years, and I recently finished a draft of a paper on this topic precisely. The paper utilizes methodologies not-yet prevalent in the humanities, and I’d like the opportunity to spread the technique as quickly and widely as possible, in the hopes that some might find it useful or at least interesting. I also feel strongly that the early and open dissemination of scholarly production is paramount to a healthy research community.

I e-mailed the editor asking about access rights, and he sent a very kind reply, saying that, unfortunately, any article in the journal must be unpublished (even on the internet), and cannot be republished for two years following its publication. The journal itself is part of a small press, and as such is probably trying to get itself established and sold to libraries, so their reticence is (perhaps) understandable. However, I was faced with a dilemma: submit my article to them, going against the spirit – though not the letter – of my pledge, or risk losing a golden opportunity to submit my first single-authored article to a journal where it would actually fit.

In the end, it was actually the object of my study itself – the Republic of Letters – that convinced me to make a stand and not submit my article. The Republic, a self-titled community of 17th century scholars communicating widely by post, was embodied by the ideal of universal citizenship and the free flow of knowledge. While they did not live up to this ideal, in large part because of the technologies of the time, we now are closer to being able to do so. I need to do my part in bringing about this ideal by taking a stand on the issues of open access and dissemination.

The below was my e-mail to the editor:

Many thanks for your fast reply.

Unfortunately, I cannot submit my article unless those conditions are changed. I fear they represent a policy at odds with the past ideals and present realities of scholarly dissemination. The ideals of the Republic of Letters, regarding the free flow of information and universal citizenship, are finally becoming attainable (at least in some parts of the world) with nigh-ubiquitous web access. In a world as rapidly changing as our own, immediate access to the materials of scholarly production is becoming an essential element not just of science, in the English sense of the word, but wissenschaft at large. Numerous studies have shown that the open availability of electronic prints for an article increases readership and citations (both to the author and to the journal), reduces the time to the adoption of new ideas, and facilitates a more rapidly innovating and evolving literature in the scholarly world. While I empathize that you represent a fairly small press and may be worried that the availability of pre-prints would affect 1 sales, I have seen no studies showing this to be the case, although I would of course be open to reading such research if you know of some. In either case, it has been shown that pre-prints at worst do not affect scholarly use and dissemination in the least, and at best increase readership, citation, and impact by up to 250%.

Good luck with your journal, and I look forward to reading the upcoming issue when it becomes available.

It’s a frightening world out there. I considered not posting about this interaction, for fear of the possibility of angering or being blacklisted by the editorial or advisory board of the press, some of whom are respected names in my intended field of study. However, fear is the enemy of change, and the support of Bethany Nowviskie and a host of tweeters convinced me that this was the right thing to do.

With that in mind, I herewith post a draft of my article analyzing the Republic of Letters, currently titled The Networked Structure of Scientific Growth. Please feel free to share it for non-commercial use, citing it if you use it (but making sure to cite the published version if it eventually becomes so), and I’d love your comments if you have any. I’ll dedicate a separate post to this release later, but I figured you all deserved this after reading the whole post.

Notes:

  1. Big thanks to Andrew Simpson for pointing out the error of my ways!

Citing ODH’s Summer Institutes

While I generally like to reserve posts for a wider audience, this is the second time I’ve come across this particular issue, and I’d like help from the masses. Every summer, the NEH’s Office for Digital Humanities funds a series of Institutes for Advanced Topics in the Digital Humanities. I’ve had the great fortune of attending one on computer simulations in the humanities, and teaching at one on network analysis for the humanities. I often find myself wishing I could cite one, as a whole, because of all the valuable experience and knowledge I received there. Unfortunately I have found no standard format to cite whole conferences, workshops, or summer institutes.

Our Great and Glorious Funders

I asked Brett Bobley, the ODH director, if he had any suggestions, but unfortunately he was at as much a loss as I. His reply: “Good question! I’d cite the URL (ex: http://is.gd/QnFs11 ). But we don’t have a format. Want to choose one & we’ll anoint it?” I’m not terribly familiar with citation styles, but I figured I’d try one out and see if the DH Hive Mind had any better ideas. If so, please post in the comments. Ideally, the citation should include the URL of the grant, the PI(s), the date, the location, and the grant number (this is very important for tracking the impact of these summer institutes). While the PI is important, though, as the cited ideas do not come from the PI but rather the entire institute, I have chosen to place the institute name first.

“Network Analysis for the Humanities.” August 15-27, 2010. ODH Institute for Advanced Topics in the Digital Humanities: HT-50016-09. Tim Tangherlini, PI. https://securegrants.neh.gov/PublicQuery/main.aspx?f=1&gn=HT-50016-09.

“Computer Simulations in the Humanities.” June 1-17, 2011. ODH Institute for Advanced Topics in the Digital Humanities: HT-50030-10. Marvin J. Croy, PI. https://securegrants.neh.gov/PublicQuery/main.aspx?f=1&gn=HT-50030-10

What thoughts?

Early Modern Letters Online

Early modern history! Science! Letters! Data! Four of my favoritest things have been combined in this brand new beta release of Early Modern Letters Online from Oxford University.

EMLO Logo

Summary

EMLO (what an adorable acronym, I kind of want to tickle it) is Oxford’s answer to a metadata database (metadatabase?) of, you guessed it, early modern letters. This is pretty much a gold standard metadata project. It’s still in beta, so there are some interface kinks and desirable features not-yet-implemented, but it has all the right ingredients for a great project:

  • Information is free and open; I’m even told it will be downloadable at some point.
  • Developed by a combination of historians (via Cultures of Knowledge) and librarians (via the Bodleian Library) working in tandem.
  • The interface is fast, easy, and includes faceted browsing.
  • Has a fantastic interface for adding your own data.
  • Actually includes citation guidelines (thank you so much).
  • Visualizations for at-a-glance understanding of data.
  • Links to full transcripts, abstracts, and hard-copies where available.
  • Lots of other fantastic things.

Sorry if I go on about how fantastic this catalog is – like I said, I love letters so much. The index itself includes roughly 12,000 people, 4,000 locations, 60,000 letters, 9,000 images, and 26,000 additional comments. It is without a doubt the largest public letters database currently available. Between the data being compiled by this group, along with that of the CKCC in the Netherlands, the Electronic Enlightenment Project at Oxford, Stanford’s Mapping the Republic of Letters project, and R.A. Hatch’s research collection, there will without a doubt soon be hundreds of thousands of letters which can be tracked, read, and analyzed with absolute ease. The mind boggles.

Bodleian Card Catalogue Summaries

Without a doubt, the coolest and most unique feature this project brings to the table is the digitization of the Bodleian Card Catalogue, a fifty-two drawer index-card cabinet filled with summaries of nearly 50,000 letters held in the library, all compiled by the Bodleian staff many years ago. In lieu of full transcriptions, digitizations, or translations, these summary cards are an amazing resource by themselves. Many of the letters in the EMLO collection include these summaries as full-text abstracts.

One of the Bodleian summaries showing Heinsius looking far and wide for primary sources, much like we’re doing right now…

The collection also includes the correspondences of John Aubrey (1,037 letters), Comenius (526), Hartlib (4,589, many including transcripts), Edward Lhwyd (2,139, many including transcripts), Martin Lister (1,141), John Selden (355), and John Wallis (2,002). The advanced search allows you to look for only letters with full transcripts or abstracts available. As someone who’s worked with a lot of letters catalogs of varying qualities, it is refreshing to see this one being upfront about unknown/uncertain values. It would, however, be nice if they included the editor’s best guess of dates and locations, or perhaps inferred locations/dates from the other information available. (For example, if birth and death dates are known, it is likely a letter was not written by someone before or after those dates.)

Visualizations

In the interest of full disclosure, I should note that, much like with the CKCC letters interface, I spent some time working with the Cultures of Knowledge team on visualizations for EMLO. Their group was absolutely fantastic to work with, with impressive resources and outstanding expertise. The result of the collaboration was the integration of visualizations in metadata summaries, the first of which is a simple bar chart showing, for any given individual in the catalog, the number of letters they wrote, received, and were mentioned in per year. Besides being useful for getting an at-a-glance idea of the data, these charts proved really useful for data cleaning.

Sir Robert Crane (1604-1643)

In the above screenshot from a previous version of the data, Robert Crane is shown to have been addressed letters in the mid 1650s, several years after his reported death. While these could also have been spotted automatically, there are many instances where a few letters are dated very close to a birth or death date, and they often turn out to be misreported. Visualizations can be great tools for data cleaning as a form of sanity test. This is the new, corrected version of Robert Crane’s page. They are using d3.js, a fantastic JavaScript library for building visualizations.
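For the cases that can be spotted automatically, a check along the following lines would do. This is a hedged sketch rather than anything the EMLO team is known to run: the `Person` class, the `suspicious_years` function, and the `margin` parameter are hypothetical names, and the letter years are invented for illustration (only Crane's 1604-1643 lifespan comes from the page above).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    name: str
    birth: Optional[int]  # None if unknown
    death: Optional[int]  # None if unknown


def suspicious_years(person: Person, letter_years: List[int], margin: int = 0) -> List[int]:
    """Return the letter years that fall outside the person's lifespan.

    With margin > 0, years within `margin` years of birth or death are
    flagged too, since dates that close to the boundary deserve a second look.
    """
    flagged = []
    for year in letter_years:
        too_early = person.birth is not None and year < person.birth + margin
        too_late = person.death is not None and year > person.death - margin
        if too_early or too_late:
            flagged.append(year)
    return flagged


crane = Person("Sir Robert Crane", 1604, 1643)
years = [1630, 1642, 1655]  # invented years, not taken from the catalog

print(suspicious_years(crane, years))            # [1655]
print(suspicious_years(crane, years, margin=2))  # [1642, 1655]
```

A flag here only means "look again": the bar chart remains the quicker way to see whether a flagged letter is a lone outlier or part of a larger dating problem.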

Because I can’t do anything with letters without looking at them as a network, I decided to put together some visualizations using Sci2 and Gephi. In both cases, the Sci2 tool was used for data preparation and analysis, and the final network was visualized in GUESS and Gephi, respectively. The first graph shows the network in detail with edges, and with names visible for the most “central” correspondents. The second visualization is without edges, with each correspondent clustered according to their place in the overall network, and with the most prominent figures in each cluster visible.

Built with Sci2/Guess
Built with Sci2/Gephi

The graphs show us that this is not a fully connected network. There are many islands of one or two letters or a small handful of letters. These can be indicative of a prestige bias in the data. That is, the collection contains many letters from the most prestigious correspondents, and increasingly fewer as the prestige of the correspondent decreases. Put another way, there are many letters from a few, and few letters from many. This is a characteristic shared with power law and other “long tail” distributions. The jumbled community structure at the center of the second graph is especially interesting, and it would be worth comparing these communities against institutions and informal societies at the time. Knowledge of large-scale patterns in a network can help determine what sort of analyses are best for the data at hand. More on this in particular will be coming in the next few weeks.
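Both observations, the islands and the "many letters from a few," can be checked without drawing anything. The sketch below is not the Sci2 workflow used for the graphs above; it is a plain-Python illustration, assuming the letters are available as (sender, recipient) pairs, with hypothetical function names and toy data.

```python
from collections import Counter, defaultdict
from typing import Iterable, List, Set, Tuple

Letter = Tuple[str, str]  # (sender, recipient)


def connected_components(letters: Iterable[Letter]) -> List[Set[str]]:
    """Group correspondents into islands, ignoring letter direction; largest first."""
    neighbors = defaultdict(set)
    for sender, recipient in letters:
        neighbors[sender].add(recipient)
        neighbors[recipient].add(sender)

    seen: Set[str] = set()
    components = []
    for start in neighbors:
        if start in seen:
            continue
        # Depth-first walk collecting everyone reachable from `start`.
        component: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        components.append(component)
    return sorted(components, key=len, reverse=True)


def letters_per_sender(letters: Iterable[Letter]) -> Counter:
    """Count how many letters each correspondent sent."""
    return Counter(sender for sender, _ in letters)


# Toy data: one well-connected correspondent and two small islands.
letters = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("E", "F"), ("G", "H")]

print([len(c) for c in connected_components(letters)])  # [4, 2, 2]
print(letters_per_sender(letters).most_common(1))       # [('A', 3)]
```

On real data, a list of component sizes dominated by one large value followed by a long run of 2s and 3s, together with a heavily skewed sender count, would be the numerical counterpart of what the two graphs show.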

It’s also worth pointing out these visualizations as another tool for data-checking. You may notice, on the bottom left-hand corner of the first network visualization, two separate Edward Lhwyds with virtually the same networks of correspondence. This meant there were two distinct entities in their database referring to the same individual – a problem which has since been corrected.
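Duplicates like the two Lhwyds can also be hunted for directly, by looking for pairs of records whose sets of correspondents overlap almost completely. The following is a rough sketch under stated assumptions, not the method used to find or fix the Lhwyd records: the function name, the Jaccard threshold, and the `min_shared` cutoff are all hypothetical choices, and the data is invented.

```python
from collections import defaultdict
from itertools import combinations
from typing import Iterable, List, Tuple


def likely_duplicates(
    letters: Iterable[Tuple[str, str]],
    threshold: float = 0.8,
    min_shared: int = 3,
) -> List[Tuple[str, str, float]]:
    """Return record pairs whose correspondent sets have Jaccard similarity >= threshold.

    `min_shared` filters out trivial matches, such as two people who each
    wrote a single letter to the same famous correspondent.
    """
    neighbors = defaultdict(set)
    for sender, recipient in letters:
        neighbors[sender].add(recipient)
        neighbors[recipient].add(sender)

    pairs = []
    # Compares every pair of records, so this is quadratic in the number of people.
    for a, b in combinations(sorted(neighbors), 2):
        shared = (neighbors[a] & neighbors[b]) - {a, b}
        union = (neighbors[a] | neighbors[b]) - {a, b}
        if len(shared) < min_shared:
            continue
        score = len(shared) / len(union)
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


# Invented data: two records for the same person, writing to the same three people.
letters = [
    ("Lhwyd (record 1)", "A"), ("Lhwyd (record 1)", "B"), ("Lhwyd (record 1)", "C"),
    ("Lhwyd (record 2)", "A"), ("Lhwyd (record 2)", "B"), ("Lhwyd (record 2)", "C"),
    ("A", "D"),
]

print(likely_duplicates(letters))
# [('Lhwyd (record 1)', 'Lhwyd (record 2)', 1.0)]
```

Anything this turns up is a candidate for a human to review, not a merge instruction; two genuinely different people in the same circle can share most of their correspondents.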

More Letters!

Notice that the EMLO site makes it very clear that they are open to contributions. There are many letters datasets out there, some digitized, some still languishing idly on dead trees, and until they are all combined, we will be limited in the scope of the research possible. We can always use more. If you are in any way responsible for an early-modern letters collection, meta-data or full-text, please help by opening that collection up and making it integrable with the other sets out there. It will do the scholarly world a great service, and get us that much closer to understanding the processes underlying scholarly communication in general. The folks at Oxford are providing a great example, and I look forward to watching this project as it grows and improves.

Pledges

I know I’m a little late to the game, but open access is important year-round, and I only just recently got the chance to write these up. Below are my pledges to open access, which can also be found on the navigation tab above.

The system of pay-to-subscribe journals that spent so many centuries helping the scholarly landscape coordinate and collaborate is now obsolete, a vestigial organ in the body of science.

These days, most universities offer free web access and web hosting. These two elements are necessary, though not sufficient, for a free knowledge economy. We also need peer review (or some other, better form of quality control), improved reputation management (citations++), and some assurance that data/information will last. These come at a cost, but those costs can be paid by the entire scholarly market, and the fruits enjoyed within and without.

If you think open access is important, you should also consider pledging to support open access. Publishing companies have a lot of money invested in keeping things as they are, and only a concerted effort on behalf of the scholars feeding and using the system will be able to change it.

Scholarship is no longer local, and it’s about time our distribution system followed suit.

—-

I pledge to be a good scholarly citizen. This includes:

  • Opening all data generated by me for the purpose of a publication at the time of publication. 1
  • Opening all code generated by me for the purpose of a publication at the time of publication.
  • Freely distributing all published material for which I have the right, and fighting to retain those rights in situations where that is not the case.
  • Fighting for open access of all materials worked on as a co-author, participant in a grant, or consultant on a project.
I pledge to support open access by:
  • Only reviewing for journals which plan to release their publications openly.
  • Donating to free open source software initiatives where I would otherwise have paid for proprietary software.
  • Citing open publications if there is a choice between two otherwise equivalent sources.
I pledge never to let work get in the way of play.
I pledge to give people chocolate occasionally if I think they’re awesome.

Notes:

  1. unless there are human subjects and privacy concerns

Scientonomy

or Yet Another New Name.

Scientonomy. n.
1. The scientific study of science and scientists, especially their interactions, creative activities, and specific objects of research.
2. A system of knowledge or beliefs about science, broadly construed.

I hope science will be taken in its broader sense, like the German Wissenschaft, described by Wikipedia as “any study or science that involves systematic research and teaching.” This extends scientonomy to the study of most subjects taught in academia, and many that exist well outside of it. Also, it’s worth noting that “the scientific study of…” should also be taken as Wissenschaft; that is, using more than just natural science methodologies to study science. This includes methods from the humanities.

Science comes from a Latin word meaning “to know,” and it is knowledge and its creation and assorted practices I wish to explore. The suffix -nomy is ancient Greek, meaning law, custom, arrangement, or system of rules. They come from two different languages; deal with it. I would use episteme rather than scientia; however, its connotations are too loaded, and it is too separate from its brother techne, to be useful for my purposes.

It is important that I use the root science, as this project does not seek to understand knowledge in a vacuum, or various possibilities of how knowledge and knowledge creation may work, but rather how humanity has actually practiced scientific creation and distribution, and the associations and repercussions those practices have had on (and gleaned from) the world at large.

The suffix -onomy is the natural choice for two reasons. First, scientonomy could be an unobtrusive measurement in the same way astronomy is. That is, scientonomic data can be collected and analyzed from a distance, using the traces scientists leave behind, in a way that does not intrude on the science and scientists themselves, much like the way astronomers view their subjects from a distance without direct experimentation. This in no way means scientonomy would make no mark on science; indeed, much like astronomy helped pave the way for the space program and allowed us to put footprints on the moon, scientonomy has the power to greatly affect the objects of its study.

Boyack, Klavans, and others

Like scientometrics, from which springs the dreaded h-index and other terrifying ways of measuring scientific output, scientonomy wields a dangerous weapon: the power to positively or negatively affect the scientific process. Scientonomy should be cautious, but not lame: we should work to improve the rate and process of scientific discovery and dissemination, but we need to be extremely careful about it.

The second reason for -onomy is a bit sillier, and possibly somewhat self-serving. All the other good names were taken, and already mean slightly different things. We already have Science of Science (Burnet, 1774; Fichte, 1808; Ossowska & Ossowski, 1935; Goldsmith, 1966), which is actually pretty close to what I’m doing, but not a terribly catchy name; Scientometrics (Price, 1963), which focuses a bit too much on communicative traces at the expense of, say, philosophical accounts; Scientosophy (Goldsmith, 1966; Konner, 2007), which sounds too much like science as philosophy; Scientography (Goldsmith, 1966; Vladutz, before 1977; Garfield, 1986), which deals mostly with maps; Scientopography (Schubert & Braun, 1996), which focuses on geographic/scientific relations; as well as meta-scientific catch-alls like STS, HPS, Sociology of Science, etc., which all have their own associated practices, all of which have a place in scientonomy. There’s also Scientology, which I won’t even bother getting into here, and (hopefully) has no place in scientonomy.