Thursday, March 8, 2012

So Much Copy Paste...(My Life In Data Collection {Or, how I stopped worrying and learned to love the alt-tab})

I've never been one for writing long, analytical papers about OTHER papers that I've read, as I tend to get one page in and say "Yeah, I've pretty much written down all the key points." Therefore, for my papers for this class, I'm experiencing the wonders of data collection and analysis.

For my first paper, I wrote about connections between various members of the furry fandom by analyzing their posts on the furry news site flayrah.com. I managed to get to around 360 posts before I just couldn't take it anymore. The Alt-Tabbing was simply frying my brain, as the only way to record the information was to copy it into Excel.

Unfortunately, as I started my project in Gephi, I began to realize what a paltry number 360 was. I could clearly tell where my data was weak, as I saw some trends which I felt didn't do much to explain the overall character of the network. Because I only used 4 articles, one that was particularly harmful to one member's relationships skewed the picture, and the other articles didn't manage to balance it out enough.

For my more recent project, I'm creating a map of the My Little Pony Music Fandom (as artists in this particular music fandom tend to cite each other very frequently through remixes and the like) and have already mapped out over 600 nodes and around 1000 edges. While that seems like a lot, I still realize that I have a long way to go.

It's incredibly hard to tell how much data you need to create an accurate picture until after you've created your network in Gephi and can visualize where the project is going. With over 4000 songs created in 2011 alone, I realize I'm going to have to make a cut-off point somewhere. But how do I determine the point where collecting more data stops being important? While you can't have too much data, I have other homework that I need to do, so I can't simply spend ALL my time doing this. I will need to find shortcuts if I hope to complete this project in a reasonable period of time.

(On that note, does anyone know of any tools that harvest YouTube data? I have all of the links to the pages in my Excel sheet, but I need to collect "views", "comments", and "likes". If there's some faster way to do this than me being alt-tab poisoned, I will be eternally grateful.)
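
One possible shortcut, sketched under some assumptions: the YouTube Data API v3 (which is newer than this post) has a videos endpoint that returns statistics for up to 50 video IDs per request, provided you have an API key. The rough idea below pulls the video ID out of each link copied from the spreadsheet and builds the batched request URLs. The function names and the "MY_API_KEY" placeholder are made up for illustration, and this is not a finished harvesting tool.

```python
from urllib.parse import urlparse, parse_qs, urlencode

# Assumption: YouTube Data API v3 "videos" endpoint, which needs an API key.
API_ENDPOINT = "https://www.googleapis.com/youtube/v3/videos"
BATCH_SIZE = 50  # the endpoint accepts up to 50 comma-separated IDs per call


def extract_video_id(url):
    """Return the video ID from a watch or youtu.be link, or None."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links keep the ID in the path: youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        # Watch links keep the ID in the "v" query parameter
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None


def build_request_urls(links, api_key):
    """Group the video IDs into batches and build one request URL per batch."""
    ids = [vid for vid in map(extract_video_id, links) if vid]
    urls = []
    for start in range(0, len(ids), BATCH_SIZE):
        batch = ids[start:start + BATCH_SIZE]
        query = urlencode({
            "part": "statistics",
            "id": ",".join(batch),
            "key": api_key,
        })
        urls.append(f"{API_ENDPOINT}?{query}")
    return urls


if __name__ == "__main__":
    # Hypothetical links, standing in for a column copied out of Excel
    links = [
        "https://www.youtube.com/watch?v=abc123&feature=related",
        "http://youtu.be/xyz789",
    ]
    for url in build_request_urls(links, "MY_API_KEY"):
        print(url)
```

Fetching each of those URLs should return JSON in which every item carries a "statistics" object with fields such as viewCount, likeCount, and commentCount, which could then be written back to a CSV and opened in Excel. Some counts may be missing when a video has them hidden or disabled, so it's worth treating every field as optional.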
