One of our goals for the Social Feed Manager software we’re developing at GW Libraries is for it to be useful to other cultural heritage organizations that want to collect social media data. To help us understand these use cases and get feedback on our prototype software, we brought together a group of interested people from libraries, archives, and funding organizations on December 11 and 12, 2013. At this meeting, generously supported by an IMLS Sparks Innovation Grant (LG-46-13-0257), the attendees each shared their experiences working with social media data at their institutions, described their needs for the future, and helped us identify areas and priorities for further development. We also spent the next morning helping interested attendees get Social Feed Manager up and running.
We kicked off the day with a round of talks by selected participants who have been working with social media data. We heard about current open source software projects at a number of institutions.
- Cory Lown at North Carolina State University (NCSU) demonstrated lentil, software for collecting, displaying, and managing Instagram photos of NCSU’s new Hunt Library, and reported that it has been a useful platform for engaging with students.
- Patrick Murray-John at the Roy Rosenzweig Center for History and New Media has been working on integrating social media into the Omeka exhibit software using several social media APIs, such as Twitter and Flickr.
- Ed Summers showed us twarc, a command-line tool he wrote for archiving Twitter search results as JSON.
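Tools like twarc write out tweets as line-oriented JSON, which makes downstream analysis possible with a few lines of scripting. As a rough sketch (the records below are illustrative, but the `entities.hashtags` field is part of Twitter's standard tweet JSON), counting the hashtags in a harvested file might look like:

```python
import json
from collections import Counter

def top_hashtags(lines, n=5):
    """Count hashtags in line-oriented JSON tweets (one tweet per line)."""
    counts = Counter()
    for line in lines:
        tweet = json.loads(line)
        # Twitter's API places hashtags under entities.hashtags
        for tag in tweet.get("entities", {}).get("hashtags", []):
            counts[tag["text"].lower()] += 1
    return counts.most_common(n)

# Two illustrative tweet records, trimmed to the fields used above:
sample = [
    '{"id": 1, "text": "opening day #HuntLibrary",'
    ' "entities": {"hashtags": [{"text": "HuntLibrary"}]}}',
    '{"id": 2, "text": "study spot #huntlibrary #ncsu",'
    ' "entities": {"hashtags": [{"text": "huntlibrary"}, {"text": "ncsu"}]}}',
]
print(top_hashtags(sample))  # [('huntlibrary', 2), ('ncsu', 1)]
```

The same pattern scales to a full archive file by iterating over `open("tweets.json")` instead of an in-memory list.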
In addition, we heard about how libraries and archives are actively collecting social media for building their collections and supporting researchers:
- NYU’s Tamiment Library has archived social media and websites from the Occupy Wall Street movement, including Tumblr posts, videos, and images, using web crawling tools. Chela Scott Weber described their need to identify and capture web content that becomes significant in a community or provokes a movement, particularly content that is vulnerable to being removed later, and to archive conversations around hashtags, mentions, and particular individuals or voices.
- Manuscripts and Archives, in the Yale University Library, worked with the Office of the President to capture social media documenting the departure of the previous president and inauguration of Yale’s twenty-third President, Peter Salovey (http://inauguration.yale.edu/). They investigated open source and commercial providers of social media archiving services to capture different platforms.
- Ivey Glendon from the University of Virginia showed the digital archive she and her library created around the UVA presidential crisis in June 2012. Their digital archivist gathered blogs, news articles, and tweets using TweetArchivist and other tools, capturing approximately 80,000 tweets, along with images from Twitter and blog posts. Some of the collection was submitted directly by users, with Omeka managing the user self-deposit process.
- University of North Texas’s Mark Phillips described web content they’ve captured as part of their long-standing web archiving program. They have content related to U.S. Presidential term transitions, sites in the federal domain, and election and candidate websites, and would like to extend that to social media accounts and events. UNT is also seeing increased demand from researchers for social media datasets, especially concerning events in Texas and prominent local figures.
- Declan Fleming gave an overview of UC San Diego’s BigData@UCSD workshop, in which the library and IT departments participated. The event showcased several research data pilots conducted using the campus’s research cyberinfrastructure, including a project using Twitter data to study infectious disease prediction. The library is exploring options for collecting Twitter data for academic researchers.
Several GW faculty also gave brief presentations about their research involving social media and their experience using Social Feed Manager. We’ll cover those in more depth in a future blog post.
As we discussed Social Feed Manager’s development path and further activities that each of our institutions needed SFM to support, two broad use cases emerged: (1) capturing social media as part of archival collections and (2) collecting social media datasets for researchers.
Among the specific needs we discussed were the following:
- Capturing the context of the tweet. How might SFM help to capture both sides of the conversation when harvesting a user’s tweets or tweets on a particular topic? Archives, in particular, have a mission to provide this context for understanding collections. How might collecting content from referenced users be accomplished in a scalable way?
- Archiving the look of Twitter in addition to the data. Current web archiving approaches capture how the site looked. Should a tool that captures tweets also allow the data to be “replayed” as it appeared on Twitter at time of capture?
- Legal issues around collecting data from Twitter. The SFM application requires those who use it, either for collecting or viewing data, to agree to Twitter’s Rules of the Road, which describe its terms of service. Developing local policies for collecting and archiving social media which are mindful of terms of service is a priority.
- Harvesting data from social media platforms beyond Twitter and tying together identities across platforms. The IMLS Sparks grant that is supporting the current development focuses on Twitter, so that remains the immediate focus of SFM development. Yet there is also interest in collecting from Instagram, Facebook, and Tumblr, among other platforms, in the future. How might we connect voices across these platforms and take into consideration the metadata necessary to create and maintain those matchpoints?
- Organizational support for social media collection, from both a technical and staffing perspective. Collecting social media data for researchers and archives is a new area of activity for most libraries. Cultural heritage institutions need help communicating the needs for this activity and what requirements, both in staffing and technical infrastructure, are needed for locally supporting SFM. To address this need, we’ve embarked on a documentation push, to be released later this summer.
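The context-capture need above can be made concrete: a harvested tweet already names the accounts on the other side of a conversation in its reply and mention fields, and those names could seed follow-up harvests. A minimal sketch, using the standard `in_reply_to_screen_name` and `entities.user_mentions` fields from Twitter's tweet JSON (the tweet record and account names below are made up for illustration):

```python
import json

def conversation_seeds(tweet):
    """From one tweet's JSON, collect the screen names a follow-up harvest
    would need in order to capture the other side of the conversation:
    the replied-to user plus any mentioned users."""
    seeds = set()
    reply_to = tweet.get("in_reply_to_screen_name")
    if reply_to:
        seeds.add(reply_to)
    for mention in tweet.get("entities", {}).get("user_mentions", []):
        seeds.add(mention["screen_name"])
    return seeds

# Illustrative record shaped like a Twitter API tweet object:
tweet = json.loads('''{
    "text": "@gelmanlibrary thanks for the dataset! cc @socialfeedmgr",
    "in_reply_to_screen_name": "gelmanlibrary",
    "entities": {"user_mentions": [
        {"screen_name": "gelmanlibrary"},
        {"screen_name": "socialfeedmgr"}
    ]}
}''')
print(sorted(conversation_seeds(tweet)))  # ['gelmanlibrary', 'socialfeedmgr']
```

Scalability is the hard part: each seed account is another API-rate-limited harvest, so a real implementation would need to bound how far out from the original collection it follows these links.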
As we considered these needs around Social Feed Manager, we didn’t reach a clear point of divergence between supporting the two use cases, creating archival collections and research datasets. In fact, the distinction between the use cases may not be meaningful. Web archives are themselves datasets and increasingly treated as such by researchers using computational methodologies. With that in mind, we’ll continue to develop the software to support the needs articulated above and work toward increasing SFM’s overall reliability and ease of use. Looking forward to seeing where this takes us.