Are we bad social scientists?

There has been a recent slew of fantastic posts about critical theory and discourse in the digital humanities. To sum up: hacking, yacking, we need more of it, we already have enough of it thank you very much, just deal with the French names already, openness, data, Hope! The unabridged version is available for free at an Internet near you. At this point in the conversation, it seems the majority involved agree that the digital needs more humanity, the humans need more digital, and the two aren’t necessarily as distinct as they seem.

The conversation reminds me of a theme that came up at the NEH Institute on Computer Simulations in the Humanities this past summer. At the beginning of the workshop, Tony Beavers introduced himself as a Bad Humanist. What is a bad humanist? We tossed the phrase around a lot during those three weeks — we even made ourselves a shirt — but there was never much real discussion of what it meant. We just had the general sense that we were all relatively bad humanists.

One participant was from “The Centre for Exact Humanities” (what is it that everyone else is doing?) in Hyderabad, India; many participants had backgrounds in programming or mathematics or economics. All of our projects were heavily computational, some were economic or arguably positivist, and absolutely none of them felt like anything I’d ever read in a humanities journal. Are these sorts of computational humanistic projects Bad Humanities? Of course the question is absurd. These are not Bad Humanities projects; they’re simply new types of research. They are created by people with humanities training, who are studying things about humans and doing so in legitimate (if as-yet-untested) ways.

Stephen Crowley printed this wonderful t-shirt for the workshop participants.

Fast forward to this October at the bounceback for NEH’s Network Analysis in the Humanities summer institute. The same guy who called himself a bad humanist, Tony Beavers, brought up the question of whether we were just adopting old social science methods without bothering to become familiar with the theory behind them. As he put it, “are we just bad social scientists?” There is a real danger in adopting tools and methods developed outside of our field for our own uses, especially if we lack the training to know their limitations.

In my mind, however, both the idea of a bad humanist (lacking the appropriate yack) and that of a bad social scientist (lacking the appropriate hack) fundamentally miss the point. The discourse and theory discussion has touched on the changing notions of disciplinarity, as I did the other day. A lot of us are writing and working on projects that don’t fit well within traditional disciplinary structures; their subjects and methods draw liberally from history, linguistics, computer science, sociology, complexity theory, and whatever else seems necessary at the time.

As long as we remain aware of and well-grounded in whatever we’re drawing from, it doesn’t really matter what we call what we do — so long as it’s done well. People studying humans would do well not to ignore the last half-century of humanities research. People using, for example, network analysis should become very familiar with its theoretical and methodological limitations. By and large, though, the computational humanities projects I’ve come across are thoughtful, well-informed, and ultimately good research. Whether that work still counts as good humanities, good social science, or good anything else doesn’t feel terribly relevant.

#humnets paper/review

UCLA’s Networks and Network Analysis for the Humanities this past weekend did not fail to impress. Tim Tangherlini and his mathemagical imps returned in true form, organizing an impressively realized (and predictably jam-packed) conference that left the participants excited, exhausted, enlightened, and unanimously shouting for more next year (and the year after, and the year after that, and the year after that…). I cannot thank the ODH enough for facilitating this and similar events.

Some particular highlights included Graham Sack’s exceptionally robust comparative analysis of a few hundred early English novels (watch out for him, he’s going to be a Heavy Hitter), Sarah Horowitz’s convincing use of epistolary network analysis to demonstrate the importance of women (specifically salonnières) in holding together the fabric of French high society, Rob Nelson’s further work on the always impressive Mining the Dispatch, Peter Leonard’s thoughtful and important discussion on combining text and network analysis (hint: visuals are the way to go), Jon Kleinberg’s super fantastic wonderful keynote lecture, Glen Worthey’s inspiring talk about not needing All Of It, Russell Horton’s rhymes, Song Chen’s rigorous analysis of early Asian family ties, and, well, everyone else’s everything else.

Especially interesting were the discussions, raised most pointedly by Kleinberg and Hoyt Long, about what exactly we are looking at when we construct these networks. The union of so many subjective experiences surely is not the objective truth, but neither is it a proxy of objective truth – what, then, is it? I’m inclined to say that this Big Data aggregated from individual experiences gives us a baseline subjective reality with local basins of attraction; that is, the trends we see measure how likely a person is to experience the world in a certain way given where in the network/world they sit. More thought and research must go into the global and local meaning of this Big Data, and it will surely reveal very interesting results.

 

My talk on bias also seemed to stir some discussion. I gave up counting how many participants looked at me during their presentations and said “and of course the data is biased, but this is preliminary, and this is what I came up with and what justifies that conclusion.” And of course the issues I raised were not new; further, everybody in attendance was already aware of them. What I hoped my presentation would inspire, and it seems to have succeeded, was the open discussion of data biases, and the constraints they put on conclusions, alongside the presentation of those conclusions.

Some of us were joking that the issue of bias means “you don’t know, you can’t ever know what you don’t know, and you should just give up now.” This is exactly the opposite of the point. As long as we’re open and honest about what we do not or cannot know, we can make claims around those gaps, inferring and guessing where we need to, and let the reader decide whether our careful analysis and historical inferences are sufficient to support the conclusions we draw. Honesty is more important than completeness or unshakable proof; indeed, neither of those is yet possible in most of what we study.

 

There was some twittertalk surrounding my presentation, so I’ve posted my draft notes for anyone interested.


#humnets preview

Last year, Tim Tangherlini and his magical crew of folkloric imps and applied mathematicians put together a most fantastic and exhausting workshop on networks and network analysis in the humanities. We called it #humnets for short. The workshop (one of the oh-so-fantastic ODH Summer Institutes) spanned two weeks, bringing together forward-thinking humanists and Big Deals in network science and computer science. Now, a year and a half later, we’re all reuniting (bouncing back?) at UCLA to show off all the fantastic network-y humanist-y projects we’ve come up with in the interim.

As of a few weeks ago, I was all set to present my findings from analyzing and modeling the correspondence networks of early-modern scholars. Unfortunately (for me, but perhaps fortunately for everyone else), some new data came in that Changed Everything and invalidated many of my conclusions. I was faced with a dilemma: present my research as it was before I learned about the new data (after all, it was still a good example of using networks in the humanities), or retool everything to fit the new data.

Unfortunately, there was no time to do the latter, and doing the former felt icky and dishonest. In keeping with Tony Beavers’s statement at UCLA last year (“Everything you can do I can do meta”), I ultimately decided to present a paper on precisely the problem that foiled my presentation: systematic bias. Bias need not be an issue of methodology; you can do everything right methodologically, you can design a perfect experiment, and a systematic bias can still thwart the accuracy of a project. The bias can be due to the available observable data itself (external selection bias), due to how we as researchers decide to collect that data (sample selection bias), or due to how we decide to use the data we’ve collected (confirmation bias).
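To make the sample-selection problem concrete, here is a minimal sketch of one way to probe it: simulate a “true” network, observe only a biased sample of it, and see how far a measure you care about drifts. This is my own illustration rather than anything drawn from the literature below; it assumes Python with the networkx library, and the graph size, the snowball-sampling scheme, and the 200-node cutoff are arbitrary choices.

```python
# A minimal sketch (illustration only), assuming Python 3 and networkx.
# Build a "true" scale-free network, observe only a snowball (BFS) sample of
# it -- roughly how surviving correspondence clusters around a few
# well-documented figures -- and check how well degree-centrality rankings
# from the sample agree with the full graph.
import random
import networkx as nx

random.seed(42)

# The full network we never actually get to observe.
full = nx.barabasi_albert_graph(n=2000, m=3, seed=42)

# Snowball-sample outward from a handful of seeds until 200 nodes are "archived".
seeds = random.sample(list(full.nodes()), 5)
visited, frontier = set(seeds), list(seeds)
while frontier and len(visited) < 200:
    node = frontier.pop(0)
    for neighbor in full.neighbors(node):
        if neighbor not in visited and len(visited) < 200:
            visited.add(neighbor)
            frontier.append(neighbor)
observed = full.subgraph(visited)

def top_k_by_degree(graph, k=10):
    """Return the k nodes with the highest degree centrality."""
    centrality = nx.degree_centrality(graph)
    ranked = sorted(centrality.items(), key=lambda kv: kv[1], reverse=True)
    return {node for node, _ in ranked[:k]}

overlap = top_k_by_degree(full) & top_k_by_degree(observed)
print(f"Mean degree: full = {2 * full.number_of_edges() / full.number_of_nodes():.2f}, "
      f"sample = {2 * observed.number_of_edges() / observed.number_of_nodes():.2f}")
print(f"Top-10 degree-centrality overlap between full and sampled networks: {len(overlap)}")
```

Swapping in other sampling schemes (random nodes, random edges, traceroute-style paths) and other measures (betweenness, clustering, degree distributions) is essentially how several of the papers below quantify how far a given bias can push a result.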

There is a small-but-growing body of literature on the effects of bias on network analysis. I’ll refer to it briefly in my talk at UCLA, but below is a list of the best references I’ve found on the matter. Most of them deal with sample selection bias, and none of them deal with the humanities.

For those of you who’ve read this far, congratulations! Here’s a preview of my Friday presentation (I’ll post the notes on Friday).

 

——–

Effects of bias on network analysis condensed bibliography:

  • Achlioptas, Dimitris, Aaron Clauset, David Kempe, and Cristopher Moore. 2005. On the bias of traceroute sampling. In Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, 694. ACM Press. doi:10.1145/1060590.1060693. http://dl.acm.org/citation.cfm?id=1060693.
  • ———. 2009. “On the bias of traceroute sampling.” Journal of the ACM 56 (June 1): 1-28. doi:10.1145/1538902.1538905.
  • Costenbader, Elizabeth, and Thomas W Valente. 2003. “The stability of centrality measures when networks are sampled.” Social Networks 25 (4) (October): 283-307. doi:10.1016/S0378-8733(03)00012-1.
  • Gjoka, M., M. Kurant, C. T. Butts, and A. Markopoulou. 2010. Walking in Facebook: A Case Study of Unbiased Sampling of OSNs. In 2010 Proceedings IEEE INFOCOM, 1-9. IEEE, March 14. doi:10.1109/INFCOM.2010.5462078.
  • Gjoka, Minas, Maciej Kurant, Carter T Butts, and Athina Markopoulou. 2011. “Practical Recommendations on Crawling Online Social Networks.” IEEE Journal on Selected Areas in Communications 29 (9) (October): 1872-1892. doi:10.1109/JSAC.2011.111011.
  • Golub, B., and M. O. Jackson. 2010. “From the Cover: Using selection bias to explain the observed structure of Internet diffusions.” Proceedings of the National Academy of Sciences 107 (June 3): 10833-10836. doi:10.1073/pnas.1000814107.
  • Henzinger, Monika R., Allan Heydon, Michael Mitzenmacher, and Marc Najork. 2000. “On near-uniform URL sampling.” Computer Networks 33 (1-6) (June): 295-308. doi:10.1016/S1389-1286(00)00055-4.
  • Kim, P.-J., and H. Jeong. 2007. “Reliability of rank order in sampled networks.” The European Physical Journal B 55 (February 7): 109-114. doi:10.1140/epjb/e2007-00033-7.
  • Kurant, Maciej, Athina Markopoulou, and P. Thiran. 2010. On the bias of BFS (Breadth First Search). In Teletraffic Congress (ITC), 2010 22nd International, 1-8. IEEE, September 7. doi:10.1109/ITC.2010.5608727.
  • Lakhina, Anukool, John W. Byers, Mark Crovella, and Peng Xie. 2003. Sampling biases in IP topology measurements. In INFOCOM 2003. Twenty-Second Annual Joint Conference of the IEEE Computer and Communications. IEEE Societies, 1:332-341, vol. 1. IEEE, April 30. doi:10.1109/INFCOM.2003.1208685.
  • Latapy, Matthieu, and Clemence Magnien. 2008. Complex Network Measurements: Estimating the Relevance of Observed Properties. In IEEE INFOCOM 2008. The 27th Conference on Computer Communications, 1660-1668. IEEE, April 13. doi:10.1109/INFOCOM.2008.227.
  • Maiya, Arun S. 2011. Sampling and Inference in Complex Networks. Chicago: University of Illinois at Chicago, April. http://arun.maiya.net/papers/asmthesis.pdf.
  • Pedarsani, Pedram, Daniel R. Figueiredo, and Matthias Grossglauser. 2008. Densification arising from sampling fixed graphs. In Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, 205. ACM Press. doi:10.1145/1375457.1375481. http://portal.acm.org/citation.cfm?doid=1375457.1375481.
  • Stumpf, Michael P. H., Carsten Wiuf, and Robert M. May. 2005. “Subnets of scale-free networks are not scale-free: Sampling properties of networks.” Proceedings of the National Academy of Sciences of the United States of America 102 (12) (March 22): 4221-4224. doi:10.1073/pnas.0501179102.
  • Stutzbach, Daniel, Reza Rejaie, Nick Duffield, Subhabrata Sen, and Walter Willinger. 2009. “On Unbiased Sampling for Unstructured Peer-to-Peer Networks.” IEEE/ACM Transactions on Networking 17 (2) (April): 377-390. doi:10.1109/TNET.2008.2001730.

——–

Effects of selection bias on historical/sociological research condensed bibliography:

  • Berk, Richard A. 1983. “An Introduction to Sample Selection Bias in Sociological Data.” American Sociological Review 48 (3) (June 1): 386-398. doi:10.2307/2095230.
  • Bryant, Joseph M. 1994. “Evidence and Explanation in History and Sociology: Critical Reflections on Goldthorpe’s Critique of Historical Sociology.” The British Journal of Sociology 45 (1) (March 1): 3-19. doi:10.2307/591521.
  • ———. 2000. “On sources and narratives in historical social science: a realist critique of positivist and postmodernist epistemologies.” The British Journal of Sociology 51 (3) (September 1): 489-523. doi:10.1111/j.1468-4446.2000.00489.x.
  • Duncan Baretta, Silvio R., John Markoff, and Gilbert Shapiro. 1987. “The Selective Transmission of Historical Documents: The Case of the Parish Cahiers of 1789.” Histoire & Mesure 2: 115-172. doi:10.3406/hism.1987.1328.
  • Goldthorpe, John H. 1991. “The Uses of History in Sociology: Reflections on Some Recent Tendencies.” The British Journal of Sociology 42 (2) (June 1): 211-230. doi:10.2307/590368.
  • ———. 1994. “The Uses of History in Sociology: A Reply.” The British Journal of Sociology 45 (1) (March 1): 55-77. doi:10.2307/591525.
  • Jensen, Richard. 1984. “Review: Ethnometrics.” Journal of American Ethnic History 3 (2) (April 1): 67-73.
  • Kosso, Peter. 2009. Philosophy of Historiography. In A Companion to the Philosophy of History and Historiography, 7-25. http://onlinelibrary.wiley.com/doi/10.1002/9781444304916.ch2/summary.
  • Kreuzer, Marcus. 2010. “Historical Knowledge and Quantitative Analysis: The Case of the Origins of Proportional Representation.” American Political Science Review 104 (02): 369-392. doi:10.1017/S0003055410000122.
  • Lang, Gladys Engel, and Kurt Lang. 1988. “Recognition and Renown: The Survival of Artistic Reputation.” American Journal of Sociology 94 (1) (July 1): 79-109.
  • Lustick, Ian S. 1996. “History, Historiography, and Political Science: Multiple Historical Records and the Problem of Selection Bias.” The American Political Science Review 90 (3): 605-618. doi:10.2307/2082612.
  • Mariampolski, Hyman, and Dana C. Hughes. 1978. “The Use of Personal Documents in Historical Sociology.” The American Sociologist 13 (2) (May 1): 104-113.
  • Murphey, Murray G. 1973. Our Knowledge of the Historical Past. Macmillan, January.
  • ———. 1994. Philosophical Foundations of Historical Knowledge. State University of New York Press, July.
  • Rubin, Ernest. 1943. “The Place of Statistical Methods in Modern Historiography.” American Journal of Economics and Sociology 2 (2) (January 1): 193-210.
  • Schatzki, Theodore. 2006. “On Studying the Past Scientifically.” Inquiry 49 (4) (August): 380-399. doi:10.1080/00201740600831505.
  • Wellman, Barry, and Charles Wetherell. 1996. “Social network analysis of historical communities: Some questions from the present for the past.” The History of the Family 1 (1): 97-121. doi:10.1016/S1081-602X(96)90022-6.