Who am I?

As this blog is still quite new, and I’m still nigh-unknown, now would probably be a good time to mark my scholarly territory. Instead of writing a long description that nobody would read, I figured I’d take a cue from my own data-oriented research and analyze everything I’ve read over the last year. The pictures below give a pretty accurate representation of my research interests.

I’ll post a long tutorial on exactly how to replicate this later, but the process was fairly straightforward and required no programming or complicated data manipulation. First, I exported all my Zotero references since last October in BibTeX format, a common bibliographic standard. I imported that file into the Sci² Tool, a data analysis and visualization tool developed at the center I work in, and normalized all the words in the titles and abstracts. That is, “applied,” “applies” and “apply” were all merged into one entity. I got a raw count of word use and stuck it in everybody’s favorite word cloud tool, Wordle; the result is the first image below. [Post-publication note: Angela does not approve of my word-cloud. I can’t say I blame her. Word clouds are almost wholly useless, but at least it’s still pretty.]
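For the curious, here’s roughly what that normalization-and-counting step does under the hood. This is not the stemmer Sci² actually uses (its algorithm is more sophisticated); the crude suffix-stripping below is just a stand-in to show how “applied,” “applies” and “apply” can collapse into a single entity before counting:

```python
from collections import Counter
import re

def crude_stem(word):
    """Very crude suffix stripping -- a stand-in for a real stemmer.
    Merges "applied", "applies", and "apply" into the stem "appli"."""
    word = word.lower()
    for suffix, replacement in (("ied", "i"), ("ies", "i"),
                                ("ed", ""), ("es", ""), ("s", ""), ("y", "i")):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + replacement
    return word

# Hypothetical titles standing in for the exported BibTeX records
titles = [
    "Applied network analysis",
    "How science applies concepts",
    "Apply letters to the republic",
]

# Raw count of word use over the stemmed vocabulary
counts = Counter(
    crude_stem(w)
    for title in titles
    for w in re.findall(r"[a-z]+", title.lower())
)
```

The resulting `counts` table (stem → frequency) is exactly the kind of list a word-cloud tool like Wordle takes as input.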

I then used Sci² to extract a word co-occurrence network, connecting two words if they appeared together within the title and abstract of a paper or book I’d read. If they appeared together once, their connection was given a score of 1; if they appeared together twice, 2; and so on. I then re-weighted the connections by exclusivity; that is, if two words appeared exclusively with one another, they scored higher. “Republ” appeared 32 times, “Letter” appeared 47 times, and 31 of those times they appeared together, so their connection is quite strong. On the other hand, “Scienc” appeared 175 times and “Concept” 120 times, but they only appeared together 32 times, so their connection is much weaker. “Republ” and “Letter” appeared with one another just about as frequently as “Scienc” and “Concept,” but because “Scienc” and “Concept” were so much more widely used, their connection score is lower.
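One simple way to express that re-weighting (I won’t vouch that it’s the exact formula Sci² uses, but it captures the idea) is a Jaccard-style score: co-occurrences divided by total occurrences of either word. Plugging in the numbers above reproduces the intuition:

```python
def exclusivity(count_a, count_b, count_ab):
    """Jaccard-style exclusivity: shared appearances over total appearances.
    Ranges from 0 (never together) to 1 (always and only together)."""
    return count_ab / (count_a + count_b - count_ab)

# "Republ" (32) and "Letter" (47) co-occurred 31 times
strong = exclusivity(32, 47, 31)    # roughly 0.65

# "Scienc" (175) and "Concept" (120) co-occurred 32 times
weak = exclusivity(175, 120, 32)    # roughly 0.12
```

Even though the raw co-occurrence counts (31 vs. 32) are nearly identical, the widely used words end up with a far weaker connection score.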

Once the general network was created, I loaded the data into Gephi, a great new network visualization tool. Gephi clustered the network based on which words co-occurred frequently, and colored the words and their connections based on that clustering. The results are below (click the image to enlarge it).
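Gephi’s clustering is proper community detection and considerably smarter than anything I’d sketch by hand, but the basic idea can be shown with a much cruder stand-in: drop the weak connections and group whatever words remain linked. Everything here, the words and weights included, is a toy example, not my actual network:

```python
# Toy co-occurrence edges: (word_a, word_b, exclusivity-weighted score)
edges = [
    ("republ", "letter", 31), ("republ", "scholar", 10), ("letter", "scholar", 8),
    ("scienc", "concept", 32), ("scienc", "histor", 12), ("concept", "histor", 9),
    ("scholar", "histor", 1),   # weak bridge between the two themes
]

def threshold_clusters(edges, min_weight=2):
    """Group words connected by edges at or above min_weight,
    via a depth-first walk over the thresholded adjacency list."""
    adjacency = {}
    for a, b, w in edges:
        if w >= min_weight:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    seen, groups = set(), []
    for node in adjacency:
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:
            n = stack.pop()
            if n in component:
                continue
            component.add(n)
            seen.add(n)
            stack.extend(adjacency[n] - component)
        groups.append(component)
    return groups

groups = threshold_clusters(edges)
```

Dropping the lone weight-1 bridge splits the toy network into two thematic groups, which is (very loosely) what Gephi’s coloring reflects at a much larger scale.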

These images sum up my research interests fairly well, and a look at the network certainly splits my research into the various fields and subfields I often draw from. Neither of these graphics is particularly sophisticated, but they do give a good at-a-glance notion of the scholarly landscape from my perspective. In the coming weeks, I will post tutorials on creating these and similar data visualizations and analyses with off-the-shelf tools, so stay tuned.