WebSci'16 Hackathon

Exploring the Past of the Web: Alexandria & Archive-It Hackathon

Hackathon Chairs

The Web has pervaded all walks of life and has become an important corpus for research in the humanities and social sciences, as well as for computer scientists and other disciplines. Web archives collect, preserve, and provide ongoing access to ephemeral Web pages, and hence encode traces of human thought, activity, and history. This makes them a valuable resource for analysis and study. However, there have been only a few concerted efforts to bring together the tools, platforms, storage, processing frameworks, and existing collections needed for mining and analyzing Web archives.

We present the Alexandria & Archive-It Hackathon @ WebSci’16 as a forum for scientists, engineers, practitioners, and enthusiasts to work with Web archive collections at scale, and to use and help build tools that can realize the largely untapped potential of Web archives for their research and work. The goal of the Hackathon is to bring together a small, focused group of participants to work collaboratively with Web archive collections using open-source tools and platforms, and to discuss new ideas for exploring and analyzing these collections.

We will provide access to focused, subject-specific Web archive collections from a diverse set of institutions and topics. The data consists of collections from Archive-It, the Internet Archive’s web archiving service. It is housed on a commercial data cluster (generously provided by www.altiscale.com) for processing and analysis, and can also be browsed on the Web through the collection pages at https://archive-it.org/. The topics range from web pages collected around events (like the U.S. Occupy Movement) and interest groups (politics, art, et cetera) to home pages (museums, universities) and more. All collections were archived over an extended period of time and can support multiple analytical approaches and tools.

A range of collections will be available for use in the hackathon. Some examples of the types of collections to be included:

To lower the barrier to entry for accessing and analyzing this data, we will provide a short hands-on session on Day 1 using existing open-source tools, and we will offer some coaching during the Hackathon to groups not yet fully fluent in working with large data clusters.

We want to ensure that participation is truly cross-disciplinary, in the hope of fostering cross-fertilization of ideas among users and researchers from multiple disciplines, including the social and political sciences, the humanities, and computer science. We will end the Hackathon on Day 4 with presentations of team accomplishments, as well as discussions and exchange of ideas for future projects and collaborations.

The Hackathon will run in parallel to the WebSci’16 conference, so that participants can register for and attend the conference, and will finish one day after the conference ends. Participants will receive promotional materials from the event hosts, the Internet Archive, and Archive-It. The research team with the most accomplished plan, project, or future work will receive a complimentary Archive-It account that can be used to build their own web archive collection for future research. Alexandria and Archive-It also plan to convene additional hackathons and web archive data mining challenges in conjunction with future conferences and events.


Registration for the Hackathon is free for WebSci'16 participants, and we also waive the fees for those participating only in the Hackathon.

Registering for "Hackathon only": people who want to attend only the Hackathon can register at http://websci16.org/registration by first selecting "Dinner only" and then, on the next page below their personal details, selecting "Hackathon only".

Feel free to contact us if you have any questions: websci-hackathon@l3s.de.
