Pledges

I know I’m a little late to the game, but open access is important year-round, and I only just recently got the chance to write these up. Below are my pledges to open access, which can also be found on the navigation tab above.

The system of pay-to-subscribe journals that spent so many centuries helping the scholarly landscape coordinate and collaborate is now obsolete; a vestigial organ in the body of science.

These days, most universities offer free web access and web hosting. These two elements are necessary, though not sufficient, for a free knowledge economy. We also need peer review (or some other, better form of quality control), improved reputation management (citations++), and some assurance that data/information will last. These come at a cost, but those costs can be paid by the entire scholarly market, and the fruits enjoyed within and without.

If you think open access is important, you should also consider pledging to support open access. Publishing companies have a lot of money invested in keeping things as they are, and only a concerted effort on behalf of the scholars feeding and using the system will be able to change it.

Scholarship is no longer local, and it’s about time our distribution system followed suit.

——–

I pledge to be a good scholarly citizen. This includes:

  • Opening all data generated by me for the purpose of a publication at the time of publication. [1]
  • Opening all code generated by me for the purpose of a publication at the time of publication.
  • Freely distributing all published material for which I have the right, and fighting to retain those rights in situations where that is not the case.
  • Fighting for open access of all materials worked on as a co-author, participant in a grant, or consultant on a project.
I pledge to support open access by:
  • Only reviewing for journals which plan to release their publications openly.
  • Donating to free open source software initiatives where I would otherwise have paid for proprietary software.
  • Citing open publications if there is a choice between two otherwise equivalent sources.
I pledge never to let work get in the way of play.
I pledge to give people chocolate occasionally if I think they’re awesome.

Notes:

  1. Unless there are human subjects and privacy concerns.

Are we bad social scientists?

There has been a recent slew of fantastic posts about critical theory and discourse in the digital humanities. To sum up: hacking, yacking, we need more of it, we already have enough of it thank you very much, just deal with the French names already, openness, data, Hope! The unabridged version is available for free at an Internet near you. At this point in the conversation, it seems the majority involved agree that the digital needs more humanity, the humans need more digital, and the two aren’t necessarily as distinct as they seem.

The conversation reminds me of a theme that came up at the NEH Institute on Computer Simulations in the Humanities this past summer. At the beginning of the workshop, Tony Beavers introduced himself as a Bad Humanist. What is a bad humanist? We tossed the phrase around a lot during those three weeks — we even made ourselves a shirt — but there was never much real discussion of what it meant. We just had the general sense that we were all relatively bad humanists.

One participant was from “The Centre for Exact Humanities” (what is it, then, that everyone else is doing?) in Hyderabad, India; many participants had backgrounds in programming, mathematics, or economics. All of our projects were heavily computational, some were economic or arguably positivist, and absolutely none of them felt like anything I’d ever read in a humanities journal. Are these sorts of computational humanistic projects Bad Humanities? Of course the question is absurd. These are not Bad Humanities projects; they’re simply new types of research. They are created by people with humanities training, who are studying things about humans and doing so in legitimate (if as-yet-untested) ways.

[Image: Stephen Crowley printed this wonderful t-shirt for the workshop participants.]

Fast forward to this October at the bounceback for NEH’s Network Analysis in the Humanities summer institute. The same guy who called himself a bad humanist, Tony Beavers, brought up the question of whether we were just adopting old social science methods without bothering to become familiar with the theory behind the social science. As he put it, “are we just bad social scientists?” There is a real danger in adopting tools and methods developed outside of our field for our own uses, especially if we lack the training to know their limitations.

In my mind, however, both the idea of a bad humanist (lacking the appropriate yack) and that of a bad social scientist (lacking the appropriate hack) fundamentally miss the point. The discourse and theory discussion has touched on the changing notions of disciplinarity, as did I the other day. A lot of us are writing and working on projects that don’t fit well within traditional disciplinary structures; their subjects and methods draw liberally from history, linguistics, computer science, sociology, complexity theory, and whatever else seems necessary at the time.

As long as we remain aware of and well-grounded in whatever we’re drawing from, it doesn’t really matter what we call what we do — so long as it’s done well. People studying humans would do well not to ignore the last half-century of humanities research. People using, for example, network analysis should become very familiar with its theoretical and methodological limitations. By and large, though, the computational humanities projects I’ve come across are thoughtful, well-informed, and ultimately good research. Whether that work still counts as good humanities, good social science, or good anything else doesn’t feel terribly relevant.

Who am I?

As this blog is still quite new, and I’m still nigh-unknown, now would probably be a good time to mark my scholarly territory. Instead of writing a long description that nobody would read, I figured I’d take a cue from my own data-oriented research and analyze everything I’ve read over the last year. The pictures below give a pretty accurate representation of my research interests.

I’ll post a long tutorial on exactly how to replicate this later, but the process was fairly straightforward and required no programming or complicated data manipulation. First, I exported all my Zotero references since last October in BibTeX format, a common bibliographic standard. I imported that file into the Sci² Tool, a data analysis and visualization tool developed at the center I work in, and normalized all the words in the titles and abstracts. That is, “applied,” “applies,” and “apply” were all merged into one entity. I got a raw count of word use and stuck it in everybody’s favorite word cloud tool, Wordle, and the result of that is the first image below. [Post-publication note: Angela does not approve of my word cloud. I can’t say I blame her. Word clouds are almost wholly useless, but at least it’s still pretty.]
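For anyone who would rather script that step than point-and-click, here is a minimal sketch of the normalization and counting in Python. The filename, the crude field-extraction regex, and the choice of NLTK’s Porter stemmer are all my own assumptions; Sci² does its own normalization, so this will only approximate it.

    # A rough stand-in for the Sci2 normalization step, done by hand.
    # "references.bib" and the field regex are assumptions on my part; the
    # Porter stemmer will not match Sci2's normalization exactly.
    import re
    from collections import Counter
    from nltk.stem import PorterStemmer

    stemmer = PorterStemmer()
    bibtex = open("references.bib", encoding="utf-8").read()

    # Grab the contents of every single-line title and abstract field.
    fields = re.findall(r'(?:title|abstract)\s*=\s*[{"](.+?)[}"],?\s*$',
                        bibtex, flags=re.IGNORECASE | re.MULTILINE)

    # Lowercase, strip punctuation, stem, and count.
    counts = Counter(stemmer.stem(word)
                     for field in fields
                     for word in re.findall(r'[a-z]+', field.lower()))

    print(counts.most_common(25))  # feed these to Wordle, or skip the cloud entirely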

I then used Sci² to extract a word co-occurrence network, connecting two words if they appeared together within the title+abstract of a paper or book I’d read. If they appeared together once, their connection was given a weight of 1; if they appeared together twice, 2; and so on. I then re-weighted the connections by exclusivity; that is, if two words appeared exclusively with one another, they scored higher. “Republ” appeared 32 times, “Letter” appeared 47 times, and 31 of those times they appeared together, so their connection is quite strong. On the other hand, “Scienc” appeared 175 times, “Concept” 120 times, but they only appeared together 32 times, so their connection is much weaker. “Republ” and “Letter” appeared with one another nearly as frequently as “Scienc” and “Concept,” but because “Scienc” and “Concept” were so much more widely used, their connection score is lower.
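Sci² handles the exclusivity re-weighting internally, and I don’t know its exact formula; the Jaccard coefficient (co-occurrences divided by the number of documents containing either word) behaves the same way on the examples above, so here is a hypothetical reconstruction using it. It assumes docs is a list of normalized word sets, one per title+abstract, such as the sketch above might produce.

    # Hypothetical reconstruction of the co-occurrence network, weighted by a
    # Jaccard-style exclusivity score. Not Sci2's actual procedure; same idea.
    from collections import Counter
    from itertools import combinations
    import networkx as nx

    def cooccurrence_network(docs):
        """docs: iterable of sets of normalized words, one set per document."""
        word_counts = Counter(w for doc in docs for w in doc)
        pair_counts = Counter(pair for doc in docs
                              for pair in combinations(sorted(doc), 2))
        G = nx.Graph()
        for (a, b), together in pair_counts.items():
            # Word pairs that rarely appear apart score close to 1.
            weight = together / (word_counts[a] + word_counts[b] - together)
            G.add_edge(a, b, weight=weight, cooccurrences=together)
        return G

    # Checking against the numbers above:
    #   "republ"/"letter":  31 / (32 + 47 - 31)   = 0.65  (strong)
    #   "scienc"/"concept": 32 / (175 + 120 - 32) = 0.12  (weak)
    # nx.write_gexf(G, "words.gexf") saves the network in a format Gephi can open.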

Once the general network was created, I loaded the data into Gephi, a great new network visualization tool. Gephi clustered the network based on which words co-occurred frequently, and colored the words and their connections based on that clustering. The results are below.

These images sum up my research interests fairly well, and a look at the network certainly splits my research into the various fields and subfields I often draw from. Neither of these graphics is particularly sophisticated, but they do give a good at-a-glance notion of the scholarly landscape from my perspective. In the coming weeks, I will post tutorials for creating these and similar data visualizations and analyses with off-the-shelf tools, so stay tuned.

#humnets paper/review

UCLA’s Networks and Network Analysis for the Humanities this past weekend did not fail to impress. Tim Tangherlini and his mathemagical imps returned in true form, organizing a really impressively realized (and predictably jam-packed) conference that left the participants excited, exhausted, enlightened, and unanimously shouting for more next year (and the year after, and the year after that, and the year after that…) I cannot thank the ODH enough for facilitating this and similar events.

Some particular highlights included Graham Sack’s exceptionally robust comparative analysis of a few hundred early English novels (watch out for him, he’s going to be a Heavy Hitter), Sarah Horowitz’s really convincing use of epistolary network analysis to demonstrate the importance of women (specifically salonnières) in holding together the fabric of French high society, Rob Nelson’s further work on the always impressive Mining the Dispatch, Peter Leonard’s thoughtful and important discussion on combining text and network analysis (hint: visuals are the way to go), Jon Kleinberg’s super fantastic wonderful keynote lecture, Glen Worthey’s inspiring talk about not needing All Of It, Russell Horton’s rhymes, Song Chen’s rigorous analysis of early Asian family ties, and, well, everyone else’s everything else.

Especially interesting were the discussions, raised most pointedly by Kleinberg and Hoyt Long, about what exactly we were looking at when we constructed these networks. The union of so many subjective experiences surely is not objective truth, but neither is it a proxy for objective truth; what, then, is it? I’m inclined to say that this Big Data, aggregated from individual experiences, gives us a baseline subjective reality with local basins of attraction; that is, the trends we see measure how likely a given person is to experience the world in a certain way when situated in whatever part of the network/world they reside in. More thought and research must go into what this Big Data means both globally and locally, and that work will surely reveal very interesting results.


My talk on bias also seemed to stir some discussion. I gave up counting how many participants looked at me during their presentations and said something like “and of course the data are biased, but this is preliminary; here is what I came up with, and here is what justifies that conclusion.” Of course the issues I raised were not new; everybody in attendance was already aware of them. What I hoped my presentation would inspire, and it seems to have succeeded, was the open discussion of data biases, and of the constraints they place on conclusions, within the presentation of those conclusions.

Some of us were joking that the issue of bias means “you don’t know, you can’t ever know what you don’t know, and you should just give up now.” This is exactly opposite to the point. As long as we’re open and honest about what we do not or cannot know, we can make claims around those gaps, inferring and guessing where we need to, and let the reader decide whether our careful analysis and historical inferences are sufficient to support the conclusions we draw. Honesty is more important than completeness or unshakable proof; indeed, neither of those is yet possible in most of what we study.


There was some twittertalk surrounding my presentation, so I’ve posted my draft/notes for anyone interested.


#humnets preview

Last year, Tim Tangherlini and his magical crew of folkloric imps and applied mathematicians put together a most fantastic and exhausting workshop on networks and network analysis in the humanities. We called it #humnets for short. The workshop (one of the oh-so-fantastic ODH Summer Institutes) spanned two weeks, bringing together forward-thinking humanists and Big Deals in network science and computer science. Now, a year and a half later, we’re all reuniting (bouncing back?) at UCLA to show off all the fantastic network-y humanist-y projects we’ve come up with in the interim.

As of a few weeks ago, I was all set to present my findings from analyzing and modeling the correspondence networks of early-modern scholars. Unfortunately (for me, but perhaps fortunately for everyone else), some new data came in that Changed Everything and invalidated many of my conclusions. I was faced with a dilemma: present my research as it was before I learned about the new data (after all, it was still a good example of using networks in the humanities), or retool everything to fit the new data.

Unfortunately, there was no time to do the latter, and doing the former felt icky and dishonest. In keeping with Tony Beavers’s statement at UCLA last year (“Everything you can do I can do meta”), I ultimately decided to present a paper on precisely the problem that foiled my presentation: systematic bias. Bias need not be an issue of methodology; you can do everything right methodologically, you can design a perfect experiment, and a systematic bias can still thwart the accuracy of a project. The bias can be due to the available observable data itself (external selection bias), it may be due to how we as researchers decide to collect that data (sample selection bias), or it may be due to how we decide to use the data we’ve collected (confirmation bias).

There is a small but growing body of literature on the effects of bias on network analysis. I’ll refer to it briefly in my talk at UCLA, but below is a list of the best references I’ve found on the matter. Most of them deal with sample selection bias, and none of them deal with the humanities.
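To make the flavor of those results concrete, here is a toy simulation of my own (not reproduced from any of the papers below): generate a scale-free network, observe only a random half of its nodes, and check how well the degree-based ranking of the observed nodes tracks their true ranking. The graph model and sample size are arbitrary choices.

    # Toy illustration of sample selection bias in network analysis; my own
    # example, not taken from the bibliography below.
    import random
    import networkx as nx
    from scipy.stats import spearmanr

    random.seed(0)
    G = nx.barabasi_albert_graph(n=2000, m=3)      # stand-in for the "true" network

    # The "observed" network: only half the nodes survive into our sources.
    observed = G.subgraph(random.sample(list(G.nodes()), 1000))

    true_rank = nx.degree_centrality(G)
    observed_rank = nx.degree_centrality(observed)

    # How well does the observed ranking of sampled nodes track the true ranking?
    rho, _ = spearmanr([true_rank[n] for n in observed],
                       [observed_rank[n] for n in observed])
    print(f"rank correlation, true vs. observed degree: {rho:.2f}")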

For those of you who’ve read this far, congratulations! Here’s a preview of my Friday presentation (I’ll post the notes on Friday).


——–

Effects of bias on network analysis (condensed bibliography):

  • Achlioptas, Dimitris, Aaron Clauset, David Kempe, and Cristopher Moore. 2005. On the bias of traceroute sampling. In Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, 694. ACM Press. doi:10.1145/1060590.1060693. http://dl.acm.org/citation.cfm?id=1060693.
  • ———. 2009. “On the bias of traceroute sampling.” Journal of the ACM 56 (June 1): 1-28. doi:10.1145/1538902.1538905.
  • Costenbader, Elizabeth, and Thomas W Valente. 2003. “The stability of centrality measures when networks are sampled.” Social Networks 25 (4) (October): 283-307. doi:10.1016/S0378-8733(03)00012-1.
  • Gjoka, M., M. Kurant, C. T. Butts, and A. Markopoulou. 2010. Walking in Facebook: A Case Study of Unbiased Sampling of OSNs. In 2010 Proceedings IEEE INFOCOM, 1-9. IEEE, March 14. doi:10.1109/INFCOM.2010.5462078.
  • Gjoka, Minas, Maciej Kurant, Carter T Butts, and Athina Markopoulou. 2011. “Practical Recommendations on Crawling Online Social Networks.” IEEE Journal on Selected Areas in Communications 29 (9) (October): 1872-1892. doi:10.1109/JSAC.2011.111011.
  • Golub, B., and M. O. Jackson. 2010. “From the Cover: Using selection bias to explain the observed structure of Internet diffusions.” Proceedings of the National Academy of Sciences 107 (June 3): 10833-10836. doi:10.1073/pnas.1000814107.
  • Henzinger, Monika R., Allan Heydon, Michael Mitzenmacher, and Marc Najork. 2000. “On near-uniform URL sampling.” Computer Networks 33 (1-6) (June): 295-308. doi:10.1016/S1389-1286(00)00055-4.
  • Kim, P.-J., and H. Jeong. 2007. “Reliability of rank order in sampled networks.” The European Physical Journal B 55 (February 7): 109-114. doi:10.1140/epjb/e2007-00033-7.
  • Kurant, Maciej, Athina Markopoulou, and P. Thiran. 2010. On the bias of BFS (Breadth First Search). In Teletraffic Congress (ITC), 2010 22nd International, 1-8. IEEE, September 7. doi:10.1109/ITC.2010.5608727.
  • Lakhina, Anukool, John W. Byers, Mark Crovella, and Peng Xie. 2003. Sampling biases in IP topology measurements. In INFOCOM 2003. Twenty-Second Annual Joint Conference of the IEEE Computer and Communications Societies, 1:332-341. IEEE, April 30. doi:10.1109/INFCOM.2003.1208685.
  • Latapy, Matthieu, and Clemence Magnien. 2008. Complex Network Measurements: Estimating the Relevance of Observed Properties. In IEEE INFOCOM 2008. The 27th Conference on Computer Communications, 1660-1668. IEEE, April 13. doi:10.1109/INFOCOM.2008.227.
  • Maiya, Arun S. 2011. Sampling and Inference in Complex Networks. Chicago: University of Illinois at Chicago, April. http://arun.maiya.net/papers/asmthesis.pdf.
  • Pedarsani, Pedram, Daniel R. Figueiredo, and Matthias Grossglauser. 2008. Densification arising from sampling fixed graphs. In Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, 205. ACM Press. doi:10.1145/1375457.1375481. http://portal.acm.org/citation.cfm?doid=1375457.1375481.
  • Stumpf, Michael P. H., Carsten Wiuf, and Robert M. May. 2005. “Subnets of scale-free networks are not scale-free: Sampling properties of networks.” Proceedings of the National Academy of Sciences of the United States of America 102 (12) (March 22): 4221-4224. doi:10.1073/pnas.0501179102.
  • Stutzbach, Daniel, Reza Rejaie, Nick Duffield, Subhabrata Sen, and Walter Willinger. 2009. “On Unbiased Sampling for Unstructured Peer-to-Peer Networks.” IEEE/ACM Transactions on Networking 17 (2) (April): 377-390. doi:10.1109/TNET.2008.2001730.

——–

Effects of selection bias on historical/sociological research (condensed bibliography):

  • Berk, Richard A. 1983. “An Introduction to Sample Selection Bias in Sociological Data.” American Sociological Review 48 (3) (June 1): 386-398. doi:10.2307/2095230.
  • Bryant, Joseph M. 1994. “Evidence and Explanation in History and Sociology: Critical Reflections on Goldthorpe’s Critique of Historical Sociology.” The British Journal of Sociology 45 (1) (March 1): 3-19. doi:10.2307/591521.
  • ———. 2000. “On sources and narratives in historical social science: a realist critique of positivist and postmodernist epistemologies.” The British Journal of Sociology 51 (3) (September 1): 489-523. doi:10.1111/j.1468-4446.2000.00489.x.
  • Duncan Baretta, Silvio R., John Markoff, and Gilbert Shapiro. 1987. “The Selective Transmission of Historical Documents: The Case of the Parish Cahiers of 1789.” Histoire & Mesure 2: 115-172. doi:10.3406/hism.1987.1328.
  • Goldthorpe, John H. 1991. “The Uses of History in Sociology: Reflections on Some Recent Tendencies.” The British Journal of Sociology 42 (2) (June 1): 211-230. doi:10.2307/590368.
  • ———. 1994. “The Uses of History in Sociology: A Reply.” The British Journal of Sociology 45 (1) (March 1): 55-77. doi:10.2307/591525.
  • Jensen, Richard. 1984. “Review: Ethnometrics.” Journal of American Ethnic History 3 (2) (April 1): 67-73.
  • Kosso, Peter. 2009. Philosophy of Historiography. In A Companion to the Philosophy of History and Historiography, 7-25. http://onlinelibrary.wiley.com/doi/10.1002/9781444304916.ch2/summary.
  • Kreuzer, Marcus. 2010. “Historical Knowledge and Quantitative Analysis: The Case of the Origins of Proportional Representation.” American Political Science Review 104 (02): 369-392. doi:10.1017/S0003055410000122.
  • Lang, Gladys Engel, and Kurt Lang. 1988. “Recognition and Renown: The Survival of Artistic Reputation.” American Journal of Sociology 94 (1) (July 1): 79-109.
  • Lustick, Ian S. 1996. “History, Historiography, and Political Science: Multiple Historical Records and the Problem of Selection Bias.” The American Political Science Review 90 (3): 605-618. doi:10.2307/2082612.
  • Mariampolski, Hyman, and Dana C. Hughes. 1978. “The Use of Personal Documents in Historical Sociology.” The American Sociologist 13 (2) (May 1): 104-113.
  • Murphey, Murray G. 1973. Our Knowledge of the Historical Past. Macmillan Pub Co, January.
  • Murphey, Murray G. 1994. Philosophical foundations of historical knowledge. State Univ of New York Pr, July.
  • Rubin, Ernest. 1943. “The Place of Statistical Methods in Modern Historiography.” American Journal of Economics and Sociology 2 (2) (January 1): 193-210.
  • Schatzki, Theodore. 2006. “On Studying the Past Scientifically.” Inquiry 49 (4) (August): 380-399. doi:10.1080/00201740600831505.
  • Wellman, Barry, and Charles Wetherell. 1996. “Social network analysis of historical communities: Some questions from the present for the past.” The History of the Family 1 (1): 97-121. doi:10.1016/S1081-602X(96)90022-6.

Scientonomy

or Yet Another New Name.

Scientonomy. n.
1. The scientific study of science and scientists, especially their interactions, creative activities, and specific objects of research.
2. A system of knowledge or beliefs about science, broadly construed.

I hope science will be taken here in its broader sense, like the German Wissenschaft, described by Wikipedia as “any study or science that involves systematic research and teaching.” This extends scientonomy to the study of most subjects taught in academia, and many that exist well outside of it. It is also worth noting that “the scientific study of…” should likewise be taken in the sense of Wissenschaft; that is, using more than just natural-science methodologies to study science. This includes methods from the humanities.

Science comes from a Latin word meaning “to know,” and it is knowledge, its creation, and its assorted practices that I wish to explore. The suffix -nomy comes from ancient Greek, meaning law, custom, arrangement, or system of rules. The two roots come from different languages; deal with it. I would use episteme rather than scientia, but its connotations are too loaded, and it is too separate from its brother techne, to be useful for my purposes.

It is important that I use the root science, as this project does not seek to understand knowledge in a vacuum, or the various ways knowledge and knowledge creation might work, but rather how humanity has actually practiced scientific creation and distribution, and the associations and repercussions those practices have had on (and gleaned from) the world at large.

The suffix -onomy is the natural choice for two reasons. First, scientonomy could be an unobtrusive form of measurement in the same way astronomy is: collecting and analyzing scientonomic data in a way that does not intrude on the science and scientists themselves, working from a distance and using their traces, much as astronomers observe their subjects from afar without direct experimentation. This in no way means scientonomy would make no mark on science; indeed, much as astronomy helped pave the way for the space program and allowed us to put footprints on the moon, scientonomy has the power to greatly affect the objects of its study.

[Image credit: Boyack, Klavans, and others]

Like scientometrics, from which spring the dreaded h-index and other terrifying ways of measuring scientific output, scientonomy wields a dangerous weapon: the power to positively or negatively affect the scientific process. Scientonomy should be cautious, but not lame; we should work to improve the rate and process of scientific discovery and dissemination, we just need to be extremely careful about it.

The second reason for -onomy is a bit sillier, and possibly somewhat self-serving: all the other good names were taken, and already mean slightly different things. We already have Science of Science (Burnet, 1774; Fichte, 1808; Ossowska & Ossowski, 1935; Goldsmith, 1966), which is actually pretty close to what I’m doing, but not a terribly catchy name; Scientometrics (Price, 1963), which focuses a bit too much on communicative traces at the expense of, say, philosophical accounts; Scientosophy (Goldsmith, 1966; Konner, 2007), which sounds too much like science as philosophy; Scientography (Goldsmith, 1966; Vladutz, before 1977; Garfield, 1986), which deals mostly with maps; Scientopography (Schubert & Braun, 1996), which focuses on geographic/scientific relations; as well as meta-scientific catch-alls like STS, HPS, Sociology of Science, etc., which all have their own associated practices, all of which have a place in scientonomy. There’s also Scientology, which I won’t even bother getting into here, and which (hopefully) has no place in scientonomy.