Web Ecology Project Twitter Scraper and Database Schema

Rewrote the Web Ecology Project’s Ruby Twitter scraper to be more efficient, and designed and implemented a normalized database schema for storing tweets on the project’s server.

Revision History

The original Twitter scraper prototype was written in Perl by Ethan Zuckerman (mentioned in a blog post on Apr 13, 2009). The script used Twitter’s URL-based API (since changed) to scrape tweets matching a simple search query on a particular term, such as a hashtag. The script was ported to Ruby by Web Ecology Project member Dave Fisher, who also set up the initial database.

I rewrote Dave’s code to make the scraper more efficient, both in how it handled the initial scraping of tweets and in how it wrote them to the database. I also designed a database schema that organized the metadata connected to each tweet into specific tables and columns, which could then be indexed for faster and easier queries across Web Ecology Project’s growing dataset.
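As a rough illustration of what a normalized, indexed tweet schema can look like, here is a minimal sketch. It is not the Web Ecology Project's actual schema: every table, column, index, and function name below is hypothetical, the sample data is made up, and it uses Python with SQLite for portability rather than the Ruby and server database the project used.

```python
import sqlite3

# Hypothetical schema: all table, column, and index names are illustrative
# assumptions, not the Web Ecology Project's real design.
SCHEMA = """
CREATE TABLE users (
    user_id     INTEGER PRIMARY KEY,
    screen_name TEXT NOT NULL UNIQUE
);
CREATE TABLE tweets (
    tweet_id   INTEGER PRIMARY KEY,  -- the ID assigned by Twitter
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    created_at TEXT NOT NULL,        -- ISO 8601 timestamp
    text       TEXT NOT NULL
);
CREATE TABLE hashtags (
    hashtag_id INTEGER PRIMARY KEY,
    tag        TEXT NOT NULL UNIQUE
);
CREATE TABLE tweet_hashtags (
    tweet_id   INTEGER NOT NULL REFERENCES tweets(tweet_id),
    hashtag_id INTEGER NOT NULL REFERENCES hashtags(hashtag_id),
    PRIMARY KEY (tweet_id, hashtag_id)
);
CREATE INDEX idx_tweets_user    ON tweets(user_id);
CREATE INDEX idx_tweets_created ON tweets(created_at);
CREATE INDEX idx_tweet_hashtags ON tweet_hashtags(hashtag_id);
"""


def open_db(path=":memory:"):
    """Open a SQLite database and create the sketch schema in it."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store_tweet(conn, tweet_id, user_id, screen_name, created_at, text, tags):
    """Insert one tweet; rows that already exist are skipped."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT OR IGNORE INTO users VALUES (?, ?)",
                     (user_id, screen_name))
        conn.execute("INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?)",
                     (tweet_id, user_id, created_at, text))
        for tag in tags:
            tag = tag.lower()  # store each hashtag once, case-insensitively
            conn.execute("INSERT OR IGNORE INTO hashtags (tag) VALUES (?)",
                         (tag,))
            (hashtag_id,) = conn.execute(
                "SELECT hashtag_id FROM hashtags WHERE tag = ?", (tag,)
            ).fetchone()
            conn.execute("INSERT OR IGNORE INTO tweet_hashtags VALUES (?, ?)",
                         (tweet_id, hashtag_id))


def tweets_for_tag(conn, tag):
    """Return (tweet_id, screen_name, text) rows for a hashtag, oldest first."""
    return conn.execute(
        """SELECT t.tweet_id, u.screen_name, t.text
           FROM tweets t
           JOIN users u           ON u.user_id = t.user_id
           JOIN tweet_hashtags th ON th.tweet_id = t.tweet_id
           JOIN hashtags h        ON h.hashtag_id = th.hashtag_id
           WHERE h.tag = ?
           ORDER BY t.created_at""",
        (tag.lower(),),
    ).fetchall()
```

Because users and hashtags live in their own tables, each is stored once no matter how many tweets refer to it, and a scraper that fetches the same tweet twice does not create a duplicate row.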


My code was used to collect the tweets used in two studies I co-authored, Detecting Sadness in 140 Characters and Afghanistan and its Election on Twitter, and in another study authored by my Web Ecology Project colleagues, The Influentials.

Reimagining Internet Studies



“Our field poses two simple questions to researchers:

  • ‘Where have studies about the web failed?’ and,
  • ‘How can we do better?’

“The emerging field of Web Ecology is an attempt to unify contemporary research and practice under a common focus, set of principles, and general approach to promote new insights and more fruitful forms of exchange in this space. We believe that these lay the groundwork for a more vibrant, more dynamic, and more useful field of research and community of researchers.”

The Awesome Foundation

Founding Trustee of The Awesome Foundation, started in July 2009 in Boston.

Details of Work

  • One of 10 people who responded to Tim Hwang’s initial call for trustees
  • Designed Awesome Foundation drop cards
  • Represented Awesome Foundation at many events and in local media
  • Pushed for expansion of the Boston chapter in October 2011 to double the number of trustees and try offering two $1000 grants per month

Ongoing

  • Participate in monthly deliberation meetings to choose grant winners
  • Contribute $100 to a $1000 grant given by our chapter each month
  • Help orient new chapters and trustees


Web Ecology Project

Helped found the Web Ecology Project, an experimental community of social media researchers, in June 2009.


Details of Work

  • Member of core strategy team
  • Led business development trip in September 2009, presenting our research to a government agency and contractor in Washington, DC, and to a marketing agency in New York City
  • Co-organized Web Ecology mailing list, transitioning organization from a for-profit venture to a distributed network of like-minded researchers
  • Participated in all Web Ecology Camps: October 2009 (Boston), February 2010 (NYC), May 2010 (Boston), February 2011 (Boston & SF)
  • Co-authored four studies using Twitter data
  • Contributed to core Twitter scraping scripts and tweet database design
  • Co-developed Web Ecology Project website
  • Designed Adobe Illustrator template for all research reports and Apple Keynote template for all presentations

Related Publications and Presentations


BetterGrads

Co-founded BetterGrads, a nonprofit online mentoring organization for college-bound high school students, with Kevin Adler in May 2009.


Details of Work

  • Co-authored the mission, vision, and values statement of the organization
  • Co-authored the by-laws and business plan
  • Co-wrote applications to numerous funders and startup competitions, and even edited a video emphasizing the geographic distance between my co-founder and me (see below)
  • Co-developed multiple surveys for prospective mentees and mentors, and deployed them online: first using hand-coded HTML, then using Google Forms
  • Developed and manage the website (WordPress), re-designed twice with modifications to the HTML, CSS, and JavaScript in the base templates
  • Authored most static content on the website and several blog entries, and edited all imagery
  • Developed social media strategy and oversaw BetterGrads’ social media team, curating 200+ blog posts including special series
  • Recruited mentors from across the country for our pilot program
  • Moderated dozens of national conference calls with mentors, mentees, and social media team members
  • Co-developed mentoring program curriculum to prepare high school students for the college experience

Ongoing

  • Co-directing the nonprofit, which is based in the San Francisco Bay Area where my co-founder lives, while living in Boston
  • Personally mentoring a high school senior at Granada High School in Livermore, CA
  • Advising for-profit spin-off founded by my co-founder

Industrial Cooperation Project

Research Assistant on the Ford Foundation-funded Cooperation Project at the Berkman Center for Internet & Society at Harvard studying the educational materials industry, March 2009 – December 2009.


Industrial Case Studies at

Details of Work

  • Performed market research on the textbook industry using business databases and market reports
  • Conducted phone interviews with industry experts in open educational resources (OER)
  • Mapped educational materials industry according to positions on copyright and business practices
  • Co-authored an essay on US public policy regarding OER, and contributed a research summary to the Cooperation Project’s annual report to the Ford Foundation
  • Documented all research on a public wiki

Related Publications