Last year, Tim Tangherlini and his magical crew of folkloric imps and applied mathematicians put together a most fantastic and exhausting workshop on networks and network analysis in the humanities. We called it #humnets for short. The workshop (one of the oh-so-fantastic ODH Summer Institutes) spanned two weeks, bringing together forward-thinking humanists and Big Deals in network science and computer science. Now, a year and a half later, we’re all reuniting (bouncing back?) at UCLA to show off all the fantastic network-y humanist-y projects we’ve come up with in the interim.
As of a few weeks ago, I was all set to present my findings from analyzing and modeling the correspondence networks of early-modern scholars. Unfortunately (for me, but perhaps fortunately for everyone else), some new data came in that Changed Everything and invalidated many of my conclusions. I was faced with a dilemma; present my research as it was before I learned about the new data (after all, it was still a good example of using networks in the humanities), or retool everything to fit the new data.
Unfortunately, there was no time to do the latter, and doing the former felt icky and dishonest. In keeping with Tony Beaver’s statement at UCLA last year (“Everything you can do I can do meta,”) I ultimately decided to present a paper on precisely the problem that foiled my presentation: systematic bias. Biases need not be an issue of methodology; you can do everything right methodologically, you can design a perfect experiment, and a systematic bias can still thwart the accuracy of a project. The bias can be due to the available observable data itself (external selection bias), it may be due to how we as researchers decide to collect that data (sample selection bias), or it may be how we decide to use the data we’ve collected (confirmation bias).
There is a small-but-growing precedent of literature on the effects of bias on network analysis. I’ll refer to it briefly in my talk at UCLA, but below is a list of the best references I’ve found on the matter. Most of them deal with sample selection bias, and none of them deal with the humanities.
For those of you who’ve read this far, congratulations! Here’s a preview of my Friday presentation (I’ll post the notes on Friday).
Effects of bias on network analysis condensed bibliography:
- Achlioptas, Dimitris, Aaron Clauset, David Kempe, and Cristopher Moore. 2005. On the bias of traceroute sampling. In Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, 694. ACM Press. doi:10.1145/1060590.1060693. http://dl.acm.org/citation.cfm?id=1060693.
- ———. 2009. “On the bias of traceroute sampling.” Journal of the ACM 56 (June 1): 1-28. doi:10.1145/1538902.1538905.
- Costenbader, Elizabeth, and Thomas W Valente. 2003. “The stability of centrality measures when networks are sampled.” Social Networks 25 (4) (October): 283-307. doi:10.1016/S0378-8733(03)00012-1.
- Gjoka, M., M. Kurant, C. T Butts, and A. Markopoulou. 2010. Walking in Facebook: A Case Study of Unbiased Sampling of OSNs. In 2010 Proceedings IEEE INFOCOM, 1-9. IEEE, March 14. doi:10.1109/INFCOM.2010.5462078.
- Gjoka, Minas, Maciej Kurant, Carter T Butts, and Athina Markopoulou. 2011. “Practical Recommendations on Crawling Online Social Networks.” IEEE Journal on Selected Areas in Communications 29 (9) (October): 1872-1892. doi:10.1109/JSAC.2011.111011.
- Golub, B., and M. O. Jackson. 2010. “From the Cover: Using selection bias to explain the observed structure of Internet diffusions.” Proceedings of the National Academy of Sciences 107 (June 3): 10833-10836. doi:10.1073/pnas.1000814107.
- Henzinger, Monika R., Allan Heydon, Michael Mitzenmacher, and Marc Najork. 2000. “On near-uniform URL sampling.” Computer Networks 33 (1-6) (June): 295-308. doi:10.1016/S1389-1286(00)00055-4.
- Kim, P.-J., and H. Jeong. 2007. “Reliability of rank order in sampled networks.” The European Physical Journal B 55 (February 7): 109-114. doi:10.1140/epjb/e2007-00033-7.
- Kurant, Maciej, Athina Markopoulou, and P. Thiran. 2010. On the bias of BFS (Breadth First Search). In Teletraffic Congress (ITC), 2010 22nd International, 1-8. IEEE, September 7. doi:10.1109/ITC.2010.5608727.
- Lakhina, Anukool, John W. Byers, Mark Crovella, and Peng Xie. 2003. Sampling biases in IP topology measurements. In INFOCOM 2003. Twenty-Second Annual Joint Conference of the IEEE Computer and Communications. IEEE Societies, 1:332- 341 vol.1. IEEE, April 30. doi:10.1109/INFCOM.2003.1208685.
- Latapy, Matthieu, and Clemence Magnien. 2008. Complex Network Measurements: Estimating the Relevance of Observed Properties. In IEEE INFOCOM 2008. The 27th Conference on Computer Communications, 1660-1668. IEEE, April 13. doi:10.1109/INFOCOM.2008.227.
- Maiya, Arun S. 2011. Sampling and Inference in Complex Networks. Chicago: University of Illinois at Chicago, April. http://arun.maiya.net/papers/asmthesis.pdf.
- Pedarsani, Pedram, Daniel R. Figueiredo, and Matthias Grossglauser. 2008. Densification arising from sampling fixed graphs. In Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, 205. ACM Press. doi:10.1145/1375457.1375481. http://portal.acm.org/citation.cfm?doid=1375457.1375481.
- Stumpf, Michael P. H., Carsten Wiuf, and Robert M. May. 2005. “Subnets of scale-free networks are not scale-free: Sampling properties of networks.” Proceedings of the National Academy of Sciences of the United States of America 102 (12) (March 22): 4221 -4224. doi:10.1073/pnas.0501179102.
- Stutzbach, Daniel, Reza Rejaie, Nick Duffield, Subhabrata Sen, and Walter Willinger. 2009. “On Unbiased Sampling for Unstructured Peer-to-Peer Networks.” IEEE/ACM Transactions on Networking 17 (2) (April): 377-390. doi:10.1109/TNET.2008.2001730.
Effects of selection bias on historical/sociological research condensed bibliography:
- Berk, Richard A. 1983. “An Introduction to Sample Selection Bias in Sociological Data.” American Sociological Review 48 (3) (June 1): 386-398. doi:10.2307/2095230.
- Bryant, Joseph M. 1994. “Evidence and Explanation in History and Sociology: Critical Reflections on Goldthorpe’s Critique of Historical Sociology.” The British Journal of Sociology 45 (1) (March 1): 3-19. doi:10.2307/591521.
- ———. 2000. “On sources and narratives in historical social science: a realist critique of positivist and postmodernist epistemologies.” The British Journal of Sociology 51 (3) (September 1): 489-523. doi:10.1111/j.1468-4446.2000.00489.x.
- Duncan Baretta, Silvio R., John Markoff, and Gilbert Shapiro. 1987. “The selective Transmission of Historical Documents: The Case of the Parish Cahiers of 1789.” Histoire & Mesure 2: 115-172. doi:10.3406/hism.1987.1328.
- Goldthorpe, John H. 1991. “The Uses of History in Sociology: Reflections on Some Recent Tendencies.” The British Journal of Sociology 42 (2) (June 1): 211-230. doi:10.2307/590368.
- ———. 1994. “The Uses of History in Sociology: A Reply.” The British Journal of Sociology 45 (1) (March 1): 55-77. doi:10.2307/591525.
- Jensen, Richard. 1984. “Review: Ethnometrics.” Journal of American Ethnic History 3 (2) (April 1): 67-73.
- Kosso, Peter. 2009. Philosophy of Historiography. In A Companion to the Philosophy of History and Historiography, 7-25. http://onlinelibrary.wiley.com/doi/10.1002/9781444304916.ch2/summary.
- Kreuzer, Marcus. 2010. “Historical Knowledge and Quantitative Analysis: The Case of the Origins of Proportional Representation.” American Political Science Review 104 (02): 369-392. doi:10.1017/S0003055410000122.
- Lang, Gladys Engel, and Kurt Lang. 1988. “Recognition and Renown: The Survival of Artistic Reputation.” American Journal of Sociology 94 (1) (July 1): 79-109.
- Lustick, Ian S. 1996. “History, Historiography, and Political Science: Multiple Historical Records and the Problem of Selection Bias.” The American Political Science Review 90 (3): 605-618. doi:10.2307/2082612.
- Mariampolski, Hyman, and Dana C. Hughes. 1978. “The Use of Personal Documents in Historical Sociology.” The American Sociologist 13 (2) (May 1): 104-113.
- Murphey, Murray G. 1973. Our Knowledge of the Historical Past. Macmillan Pub Co, January.
- Murphey, Murray G. 1994. Philosophical foundations of historical knowledge. State Univ of New York Pr, July.
- Rubin, Ernest. 1943. “The Place of Statistical Methods in Modern Historiography.” American Journal of Economics and Sociology 2 (2) (January 1): 193-210.
- Schatzki, Theodore. 2006. “On Studying the Past Scientifically∗.” Inquiry 49 (4) (August): 380-399. doi:10.1080/00201740600831505.
- Wellman, Barry, and Charles Wetherell. 1996. “Social network analysis of historical communities: Some questions from the present for the past.” The History of the Family 1 (1): 97-121. doi:10.1016/S1081-602X(96)90022-6.