So apparently yesterday was a big day for hypothesis testing and discovery. Stanley Fish’s third post on Digital Humanities also brought up the issue of fishing for correlations, although his post was… slightly more polemical. Rather than going over it on this blog, I’ll let Ted Underwood describe it. Anybody who read my post on Avoiding Traps should also read Underwood’s post; it highlights the role of discovery in the humanities as a continuous process of appraisal and re-appraisal, on both the quantitative and the qualitative side.
…the significance of any single test is reduced when it’s run as part of a large battery.
That’s a valid observation, but it’s also a problem that people who do data mining are quite self-conscious about. It’s why I never stop linking to this xkcd comic about “significance.” And it’s why Matt Wilkens (targeted by Fish as an emblem of this interpretive sin) goes through a deliberately iterative process of first framing hypotheses about nineteenth-century geographical imagination and then testing them more stringently. (For instance, after noticing that coastal states initially seem more prominent in American fiction than the Midwest, he tests whether this remains true after you compensate for differences in population size, and then proposes a hypothesis that he suggests will need to be confirmed by additional “test cases.”)
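To put a rough number on the “large battery” point, here is a small Python sketch of my own (not Underwood’s). It assumes the tests are independent and that there is nothing real to find in any of them; the function names and the 0.05 threshold are just illustrative choices.

```python
def family_wise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n_tests independent
    tests, assuming every null hypothesis is actually true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the overall false-positive chance
    at or below alpha (the Bonferroni correction, a conservative fix)."""
    return alpha / n_tests


if __name__ == "__main__":
    for n in (1, 20, 100):
        print(
            f"{n:>3} tests: chance of a spurious hit = "
            f"{family_wise_error_rate(0.05, n):.3f}, "
            f"Bonferroni per-test threshold = {bonferroni_threshold(0.05, n):.5f}"
        )
```

Under those assumptions, a battery of twenty tests at the usual 0.05 level turns up at least one “significant” result by chance roughly 64% of the time, which is why a single hit from a large battery deserves a second, more stringent look.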
It’s important to keep in mind that Reichenbach’s old distinction between discovery and justification is not as clear-cut as it was originally conceived to be. How we generate our hypotheses, and how we support them to ourselves and the world at large, is part of the ongoing process of research. In my last post, I suggested people keep clear ideas of what they plan on testing before they begin testing; let me qualify that slightly. One of the amazing benefits of Big Data has been the ability to spot trends we were not looking for; an unexpected trend in the data can lead us to a new hypothesis, one which might be fruitful and interesting. The task, then, is to be clever enough to devise further tests that confirm the hypothesis without circularity, that is, without leaning on the same evidence that led you toward it in the first place.
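One simple way to avoid that circularity, sketched here in Python under my own assumptions, is to split the data in two: go fishing in one half, then test whatever you caught on the half you never looked at. The function names and the data layout (a list of outcomes plus a dictionary of named feature columns) are hypothetical, and Pearson correlation stands in for whatever measure you actually care about.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def explore_then_confirm(outcome, features, seed=0):
    """Pick the feature most correlated with the outcome on one random
    half of the rows, then re-measure that feature on the other half."""
    rng = random.Random(seed)
    rows = list(range(len(outcome)))
    rng.shuffle(rows)
    half = len(rows) // 2
    explore_rows, confirm_rows = rows[:half], rows[half:]

    def corr(name, subset):
        return pearson([features[name][i] for i in subset],
                       [outcome[i] for i in subset])

    # Discovery: fish freely, but only in the exploration half.
    best = max(features, key=lambda name: abs(corr(name, explore_rows)))
    # Justification: a single test on rows the discovery step never saw.
    return best, corr(best, explore_rows), corr(best, confirm_rows)


if __name__ == "__main__":
    # Pure noise: no feature has any real relationship to the outcome.
    rng = random.Random(42)
    outcome = [rng.gauss(0, 1) for _ in range(200)]
    noise = {f"feature_{k}": [rng.gauss(0, 1) for _ in range(200)]
             for k in range(50)}
    name, r_explore, r_confirm = explore_then_confirm(outcome, noise)
    print(name, round(r_explore, 3), round(r_confirm, 3))
```

On data that is nothing but noise, the winning feature will tend to look respectable in the exploration half simply because it was the best of fifty, and that correlation will usually shrink toward zero in the confirmation half. A trend that survives the second half has earned a little more trust, though it is still one test case among the several you would want.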
… I like books with pictures. When I started this blog, I promised myself I’d have a picture in every post. I can’t think of one that’s relevant, so here’s an angry cupcake: