Web Ecology Project Twitter Scraper and Database Schema

Rewrote the Web Ecology Project’s Ruby Twitter scraper to be more efficient, and designed and implemented a normalized database schema for storing tweets on the project’s server.

Revision History

The original Twitter scraper prototype was written in Perl by Ethan Zuckerman (mentioned in a blog post on Apr 13, 2009). The script used Twitter’s URL-based API (since changed) to scrape tweets matching a simple search query on a particular term, such as a hashtag. The script was ported to Ruby by Web Ecology Project member Dave Fisher, who also set up the initial database.

I rewrote Dave’s code to make the scraper more efficient, both in how it handled the initial scraping of tweets and in how it wrote them to the database. I also designed a database schema that organized the metadata connected to each tweet into specific tables and columns, which could then be indexed for faster and easier queries across Web Ecology Project’s growing dataset.
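To illustrate the general idea of a normalized, indexed tweet store, here is a minimal sketch. It is not the project’s actual schema or code: the original scraper was written in Ruby, while this example uses Python with SQLite so that it is self-contained. Every table, column, index, and function name below (users, tweets, hashtags, tweet_hashtags, store_tweet, tweets_for_hashtag) is a hypothetical choice made for illustration.

```python
import re
import sqlite3

# Hypothetical schema: all table, column, and index names are assumptions,
# not the Web Ecology Project's real design.
SCHEMA = """
CREATE TABLE users (
    id          INTEGER PRIMARY KEY,
    screen_name TEXT NOT NULL UNIQUE
);
CREATE TABLE tweets (
    id         INTEGER PRIMARY KEY,            -- tweet ID from the API
    user_id    INTEGER NOT NULL REFERENCES users(id),
    text       TEXT NOT NULL,
    created_at TEXT NOT NULL                   -- ISO 8601 timestamp
);
CREATE TABLE hashtags (
    id  INTEGER PRIMARY KEY,
    tag TEXT NOT NULL UNIQUE
);
CREATE TABLE tweet_hashtags (
    tweet_id   INTEGER NOT NULL REFERENCES tweets(id),
    hashtag_id INTEGER NOT NULL REFERENCES hashtags(id),
    PRIMARY KEY (tweet_id, hashtag_id)
);
-- Indexes on the columns that queries are expected to filter or join on.
CREATE INDEX idx_tweets_user_id ON tweets(user_id);
CREATE INDEX idx_tweets_created_at ON tweets(created_at);
CREATE INDEX idx_tweet_hashtags_hashtag ON tweet_hashtags(hashtag_id);
"""


def open_db(path=":memory:"):
    """Open a database and create the illustrative schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store_tweet(conn, tweet_id, screen_name, text, created_at):
    """Store one tweet and its metadata; return False if it was already stored."""
    seen = conn.execute("SELECT 1 FROM tweets WHERE id = ?", (tweet_id,)).fetchone()
    if seen is not None:
        return False  # re-scraping the same search result does not duplicate rows

    # Each user is stored once and referenced by ID from the tweets table.
    conn.execute("INSERT OR IGNORE INTO users (screen_name) VALUES (?)", (screen_name,))
    user_id = conn.execute(
        "SELECT id FROM users WHERE screen_name = ?", (screen_name,)
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO tweets (id, user_id, text, created_at) VALUES (?, ?, ?, ?)",
        (tweet_id, user_id, text, created_at),
    )

    # Hashtags get their own table, linked to tweets through a join table.
    for tag in {t.lower() for t in re.findall(r"#(\w+)", text)}:
        conn.execute("INSERT OR IGNORE INTO hashtags (tag) VALUES (?)", (tag,))
        hashtag_id = conn.execute(
            "SELECT id FROM hashtags WHERE tag = ?", (tag,)
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO tweet_hashtags (tweet_id, hashtag_id) VALUES (?, ?)",
            (tweet_id, hashtag_id),
        )
    conn.commit()
    return True


def tweets_for_hashtag(conn, tag):
    """Return (tweet_id, screen_name, text) rows for a hashtag, oldest first."""
    return conn.execute(
        """
        SELECT t.id, u.screen_name, t.text
        FROM tweets t
        JOIN users u ON u.id = t.user_id
        JOIN tweet_hashtags th ON th.tweet_id = t.id
        JOIN hashtags h ON h.id = th.hashtag_id
        WHERE h.tag = ?
        ORDER BY t.created_at
        """,
        (tag.lower(),),
    ).fetchall()
```

In this sketch, a call such as store_tweet(conn, 1, "alice", "Voting today #election", "2009-08-20T10:00:00Z") writes one row each to users, tweets, hashtags, and tweet_hashtags, and tweets_for_hashtag(conn, "election") then retrieves it through the indexed join rather than by scanning tweet text.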

Use

My code was used to collect the tweets used in two studies I co-authored, Detecting Sadness in 140 Characters and Afghanistan and its Election on Twitter, and in another study authored by my Web Ecology Project colleagues, The Influentials.

Reimagining Internet Studies

Link

http://www.webecologyproject.org/2009/08/reimagining-internet-studies/

Excerpt

“Our field poses two simple questions to researchers:

  • ‘Where have studies about the web failed?’ and,
  • ‘How can we do better?’

“The emerging field of Web Ecology is an attempt to unify contemporary research and practice under a common focus, set of principles, and general approach to promote new insights and more fruitful forms of exchange in this space. We believe that these lay the groundwork for a more vibrant, more dynamic, and more useful field of research and community of researchers.”

Web Ecology Project

Helped found the Web Ecology Project, an experimental community of social media researchers, in June 2009.

Website

webecologyproject.org

Details of Work

  • Member of core strategy team
  • Led business development trip in September 2009, presenting our research to a government agency and contractor in Washington, DC, and to a marketing agency in New York City
  • Co-organized Web Ecology mailing list, transitioning organization from a for-profit venture to a distributed network of like-minded researchers
  • Participated in all Web Ecology Camps: October 2009 (Boston), February 2010 (NYC), May 2010 (Boston), February 2011 (Boston & SF)
  • Co-authored four studies using Twitter data
  • Contributed to core Twitter scraping scripts and tweet database design
  • Co-developed Web Ecology Project website
  • Designed Adobe Illustrator template for all research reports and Apple Keynote template for all presentations

Related Publications and Presentations