No Time To Wait! 2


Kieran O’Leary of Irish Film Archive did a perfect job summarizing the conference here, so I really recommend reading that one and only reading mine if you want a sub-par sequel. Or don’t even listen to either of us, just go watch the talks! Here you can find the schedule.


Some of the major themes this year:

  • Open source funding models: So many of the talks went into the struggles of open source funding, from maintainers to support-contract models to working within institutions to support open source based projects instead of using them for free or paying a closed-source vendor.
  • Labor, from open source and archival perspectives: Maybe this is just constantly on my mind, but I felt this was addressed frequently, coinciding with the discussions of funding models for open source.
  • Format normalisation: I was maybe a little surprised to hear so many conversations around file format normalisation, but do think it’s an important topic we should be discussing more. I’m maybe also surprised Archivematica didn’t come up more frequently in talks, so I want to mention that if a user chooses to normalize video before creating an AIP in Archivematica, they normalise to FFV1/MKV. :) They will also have MediaConch integrated into their next release at the beginning of 2018, so keep an eye out if you are an Archivematica user!

General highlights

I was really into Dave’s super accessible explanation of the significance of the granular fixity available in FFV1/MKV – comparing it to how if you pull a fire alarm, it notifies the fire department of your exact address and they know exactly where the fire is. But, with fixity only at the file level, it is like pulling a fire alarm to let the fire department know that there is a fire “somewhere in Vienna.” Along with this, I was excited Ethan Gates was able to attend the conference and speak on the work he is doing “indoctrinating” emerging a/v archivists with open source (although I worry a few people didn’t realize he was using that word as a joke!).
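To make the fire-alarm analogy a bit more concrete, here is a minimal sketch of per-frame checksums. It uses a generic CRC-32 purely for illustration; it is not the exact checksum layout that FFV1 or Matroska write, and the helper names (`checksumFrames`, `findDamagedFrames`) and the tiny "frames" are made up.

```javascript
// Build the standard CRC-32 lookup table (reflected polynomial 0xEDB88320).
const CRC_TABLE = new Uint32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) {
    c = (c & 1) ? (0xEDB88320 ^ (c >>> 1)) : (c >>> 1);
  }
  CRC_TABLE[n] = c >>> 0;
}

// CRC-32 of a byte array, returned as an unsigned 32-bit integer.
function crc32(bytes) {
  let crc = 0xFFFFFFFF;
  for (const byte of bytes) {
    crc = CRC_TABLE[(crc ^ byte) & 0xFF] ^ (crc >>> 8);
  }
  return (crc ^ 0xFFFFFFFF) >>> 0;
}

// Hypothetical helper: one checksum per frame, stored when the file is made.
function checksumFrames(frames) {
  return frames.map((frame) => crc32(frame));
}

// Hypothetical helper: indexes of the frames whose checksum no longer matches.
function findDamagedFrames(frames, storedChecksums) {
  const damaged = [];
  frames.forEach((frame, index) => {
    if (crc32(frame) !== storedChecksums[index]) damaged.push(index);
  });
  return damaged;
}

// Three made-up "frames" of four bytes each.
const frames = [
  Uint8Array.of(1, 2, 3, 4),
  Uint8Array.of(5, 6, 7, 8),
  Uint8Array.of(9, 10, 11, 12),
];
const stored = checksumFrames(frames);

// Flip one bit in the second frame to simulate bit rot.
frames[1][0] ^= 0x01;

console.log(findDamagedFrames(frames, stored)); // [ 1 ]
```

A single whole-file checksum would only say that something changed. The per-frame list gives the exact address.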

Check out the MediaConch-style Star Trek logo!

I got so much out of Martin Below’s talk and look forward to going back to it. In summary, he was explaining the ways in which he uses the Menu/Chapters aspects of Matroska files and did some in-depth demonstrations using a digitized Star Trek boxed set. He uses it to refine down to skipping the intros, showing a list of his favorite episodes, or cutting out the credits, and how that affected the overall time. I’ve been working with the Matroska specification for three years and although I knew of these features, I hadn’t been able to appreciate them and see them in use like this. It’ll help me work with him to refine the specification further.
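As a rough model of the "how that affected the overall time" part, here is a small sketch that adds up the running time of whichever chapters are kept. The chapter names, timings, and the `editionDuration` helper are all invented for illustration; this is not Matroska's actual element structure or any tool's API.

```javascript
// Made-up chapter list for one episode, with times in seconds.
const chapters = [
  { name: 'Cold open', start: 0, end: 120 },
  { name: 'Intro', start: 120, end: 210 },
  { name: 'Episode', start: 210, end: 2900 },
  { name: 'Credits', start: 2900, end: 2960 },
];

// Total running time of the chapters that are not skipped.
function editionDuration(allChapters, skippedNames) {
  return allChapters
    .filter((chapter) => !skippedNames.includes(chapter.name))
    .reduce((total, chapter) => total + (chapter.end - chapter.start), 0);
}

const full = editionDuration(chapters, []);
const trimmed = editionDuration(chapters, ['Intro', 'Credits']);

console.log(full, trimmed, full - trimmed); // 2960 2810 150
```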

Agathe Jarczyk’s talk, “Dreaming of an Ideal Software Player for Video”, was a favorite among many, myself included. She did an excellent job at laying out the concerns shared by time-based media conservators when diagnosing work. I’m always a bit jealous of conservators that get to spend lots of time on a small amount of strange objects, rather than the shovel-it-all-through/automate-automate-automate mentality I hold as a combo archivist/developer. I was also happy to hear from Ana Ribeiro from Tate during the Format Implementation Panel and would have liked to hear a whole talk about the formatting and normalisation issues at Tate as they have such a focus on presentation, but was glad to get to talk with her afterwards.

Wow, have never seen a chart like this! – From Agathe’s slides

Reto Kromer also spoke of his wishes in a gentle way for the Matroska and FFV1 standards, and was pleased at Steve Lhomme’s presentation updating us on the Matroska specification efforts, because it sounds like he will be getting many of his wishes. It was also fortunate that Steve works on VLC and could let Agathe know that the upcoming release (available now in beta) of VLC should be able to fulfill some of her wishes too.

These kinds of productive conversations between the preservation-practicing tool-users and tool-makers are what make this conference such a delight, and were largely the motivation for having it exist. It was unfortunate to hear from a presenter dead-set on making sure we knew she didn’t give a fuck about uncompressed files, and who did not seem to understand that her research is dependent on preservationists getting it right for her benefit, particularly when it comes to digitized film assets that require framerate and playback speed expertise up-front. Maybe this means we need to do a better job at educating the general public as to why our profession is valuable, or maybe this person was just both very rude and very uneducated?

Speaking of film, the final batch of talks, being film-related, made me surprisingly nostalgic for when I used to work with film materials. I enjoyed the depth of technical knowledge across a wide spectrum of film-based issues and the final panel, which was able to successfully hit the tired “film vs. digital” debate with new perspectives.

I felt I didn’t have much to contribute during the panel on open source and dealing with philosophical challenges within institutions, but thoroughly appreciated Alessandra Luciano’s direction, input, and her perspective that cultural heritage institutions have a moral imperative to choose open source. I also thoroughly appreciated Steven Villereal’s emphasis on acknowledging the need for an understanding of who holds the power within institutions more largely, and his note on how archival institutions often share very few of the values of SMPTE members, so why do we end up relying on SMPTE-approved standards? Is IETF not trusted enough by old-school broadcasters as a valid standardizing body? (If not, too bad, cuz IETF is the best!). In general, I hope more institutions allow for closer collaboration between their IT teams and archiving teams, and I hope more skill-sharing can happen between these teams, because it is essential to successful preservation of materials.


Amazing to see the conference grow from a group of ~50 to a group of nearly 100 this year, and to see it mature in many ways. I missed the breakout sessions we had last year but I know it would have been difficult given the space and size, and I think a lot of the panels made up for it. I find that breakout sessions make it more likely that timid audience members will share their experiences, though. Even so, there were always many questions and comments after the talks, and this was the first time I’ve been at a conference where I’ve heard “This is more of a comment than a question…” (and this happened frequently) but then the comments would be actually very valuable, thoughtful, and considerate. As always, I look to Code4lib’s conference model, which has been able to scale itself and maintain a close-knit community feeling. As Kieran mentioned, it was great to have a core ffmpeg developer come and very actively participate, and he seemed to leave thinking fondly of his two days spent with us and of archivists in general. I like to think that we converted Steve Lhomme over to Team Archives last year, and this year we were able to convert Carl Eugen Hoyos.

Very excited to re-watch many of the talks and review the slides – there was so much information crammed into these two days that I just wasn’t able to take it all in! I also have to go back and watch Jimi’s talk and Kieran’s talk, because they both immediately preceded times when I spoke so my mind went completely blank due to presentation-nervousness. I owe it to them to re-watch in a more calm state!

Finally – Thanks so much to the conference organizers, volunteers, and sponsoring organizations!

Funny to see my Terminal style while someone else presents!

ffmprovisr gets a redesign


ffmprovisr before

If you haven’t been to ffmprovisr in a while but check it out right now, you’ll notice it recently got a makeover (go check, we’ll wait)! ffmprovisr has looked the same since its inception over three years ago, as I recently noticed while looking through old images. The most noticeable difference, at first, is going to be some of the visual changes, but ffmprovisr actually got a full, comprehensive redesign: an information architecture overhaul, a lightened codebase, better handling of different screen sizes, and improved design/animation for an overall better user experience.

ffmprovisr after

So, what changed, and how?

Here is Katherine Nagels to introduce some of the initial changes…


About a month ago, I realised that ffmprovisr had grown so much that its navigability was now a bit lacking. At the beginning of October, we had almost 80 (!) recipes, grouped under 10 different headings. Some of these categories, such as Change codec (transcode) were clear and accurate, but others were less useful: for example, the Other section of miscellany had grown to 20 recipes, whereas we had a solitary entry under Repair files. Would someone looking for a way to synchronise audio find the latter recipe? Likewise, some groupings didn’t seem all that conceptually tight to me: the Make derivative variations category included recipes for making animated GIFs, ISO creation, and trimming video.

I set out to reorganise the page by creating new headings, renaming others, and moving around the recipes accordingly. To give several examples: all the commands to do with trimming, joining, or excerpting a video now became grouped together under a heading of that name; Work with interlaced video was another new section. Change formats, a name which I found quite vague, became Change video properties, as that section groups recipes with which one alters things like a video’s aspect ratio or colourspace.

So far, so good. Or was it? Actually, a lot of these decisions weren’t as trivial as they seem. Classification and taxonomy are big concerns in the library and archival world, and they proved to be sometimes tricky even on a small-scale project like this. For example, did the recipe Images to GIF really belong in its original home, the Change codec section? (We decided it did not). Should all the audio-related commands be grouped together in one section, or should we separately retain the Normalize/equalize audio section? (We currently have combined them under the heading Change or view audio properties).

These changes were a process rather than absolute actions; for example, I split out recipes for creating thumbnails and recipes for creating GIFs into two separate sections before more sensibly bringing them back together under the umbrella of Create thumbnails or GIFs. Conversely, we added the entry on filtergraphs in a section called FFmpeg concepts before realising that we were presenting a pretty advanced topic as something of an entry point - not very beginner-friendly. (Thus the FFmpeg basics and Advanced FFmpeg concepts sections were born). This is also a good example of how important the review and feedback cycle was to these changes - it’s easy to get lost in one’s own viewpoint.

The main idea I tried to keep sight of during this reorganisation was simple: what would make ffmprovisr a better resource for beginners? Not that it’s not useful for more experienced people too, but as I emailed Ashley recently, I love the idea that people, even from outside archives, could find ffmprovisr and learn how to use ffmpeg from it. Applying this concept to page structure meant that steps like adding a Table of Contents were obvious. But it also provided a good opportunity to fill in certain blanks, like adding an entry describing the basic structure of an ffmpeg command, and a generic rewrap command. I know from experience that unfamiliar and/or technical things can be intimidating, so I’m all about lowering the barrier to entry for such a useful and extremely learnable tool as ffmpeg.

Now at 30th October, we have 18 categories and, if I count correctly, 84 recipes - including just 7 in the Other category. ;-) There’s always a tradeoff to be made re: granularity v. efficiency, but I think the current balance is pretty decent. There is always room for improvement, of course, so feedback and contributions are welcomed!

Of course, usability is about much more than just the structure of information - visual design and user experience are even bigger pieces of the puzzle. Over to Ashley to describe how she refactored ffmprovisr visually, as well as cleaned up the codebase!


ffmprovisr was built on Bootstrap and not optimally sized for smaller screens other than what Bootstrap inherently delivers. It was also built relying on Bootstrap’s Modal feature and used some of Bootstrap JS to perform some magic associated with that.

I’m really obsessed with removing Bootstrap from projects (for some reason) and even more obsessed with replacing it with CSS Grid Layout. I like CSS Grid because it’s the “hot new thing”, but it’s the hot new reliable, well-supported, built-into-the-CSS-specification thing, so it’ll soon grow to become the “stable new thing” and stay that way. It feels really great to remove a large framework library and replace it with just a handful of lines of code.

I also really love extracting jQuery out of projects but I ended up not doing that with ffmprovisr, even though there is very little JavaScript used on the page. It is used just for ensuring anchor tag reliability and updating the site – some more work needs to be done there, and you can take on that task if you want to contribute! There is some reliable but inelegant JavaScript currently keeping it in place.

The first thing to go in this redesign was the Modal view. I replaced it with an inline collapsing open/close functionality instead. I initially and temporarily did this using Bootstrap while I got feedback from the community/fellow maintainers and then replaced it with pure CSS. This way, people can browse through the rest of the site and open multiple scripts at the same time, which wasn’t possible before. Also, modals just aren’t very good, so I was happy to be rid of them.


All of the above changes inevitably caused the site to be changed visually. The Table of Contents section was new, for instance.

After adopting Grid Layout, we were able to make portions of the website resize themselves based on size of screen. For small windows, like on a phone, everything will appear in one long column. For windows with more space, the Table of Contents will appear on the left. For very big screens, there is some space on the right and left so the content isn’t stretched too far across the screen, which would make it hard to read.
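The three layouts described above come down to a decision on window width. Here is that decision as a tiny function, only as a sketch: on the site this is done in CSS with Grid and media queries, and the breakpoint numbers and layout names below are invented.

```javascript
// Hypothetical breakpoints in CSS pixels; the real stylesheet may differ.
const SIDEBAR_MIN_WIDTH = 600;
const MARGINS_MIN_WIDTH = 1400;

// Name the layout a window of the given width would get.
function layoutFor(widthPx) {
  if (widthPx < SIDEBAR_MIN_WIDTH) {
    return 'single-column'; // phones: everything in one long column
  }
  if (widthPx < MARGINS_MIN_WIDTH) {
    return 'toc-left'; // more space: Table of Contents on the left
  }
  return 'toc-left-with-margins'; // very big screens: breathing room on both sides
}

console.log(layoutFor(375)); // single-column
console.log(layoutFor(1024)); // toc-left
console.log(layoutFor(1920)); // toc-left-with-margins
```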

The font size increased a little bit and we switched from using pixel sizes (which do not change) to using em sizes (which scale with the font size of the surrounding element). The main header, where it says ffmprovisr with some swirly unicode, is using the vw size. If you resize your browser window, you’ll see that the header automatically shrinks at every step. This is how vw works: it is a size calculated based on the “viewport width” (and the viewport is the visible area of your browser window).
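If the units feel abstract, the arithmetic is small. This sketch shows how vw and em values resolve to pixels; the function names and the sample numbers are just for illustration, not values from the ffmprovisr stylesheet.

```javascript
// 1vw is 1% of the viewport width.
function vwToPx(vw, viewportWidthPx) {
  return (vw * viewportWidthPx) / 100;
}

// For font sizes, 1em is the font size inherited from the parent element.
function emToPx(em, parentFontSizePx) {
  return em * parentFontSizePx;
}

console.log(vwToPx(10, 1200)); // 120
console.log(vwToPx(10, 400)); // 40
console.log(emToPx(1.25, 16)); // 20
```

The same 10vw header is 120 pixels in a wide window and 40 pixels on a narrow phone, which is why it shrinks as you drag the window edge.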

Next, ffmprovisr used to have buttons that would open up modals. After modals were removed, as mentioned above, it was visually less appealing to click through. The content would appear immediately under the button, with other buttons dangling around “in the air.” The buttons were replaced with rows that light up as green when hovering, and the grow-the-icon-slightly-bigger animation feature was maintained but re-written.

Since these big changes, Katherine has come back around to fix some things that needed improvement, especially related to media queries and some CSS sizing. Thank you Katherine! It’s great to have such good teammates to collaborate with on these kinds of projects.

ffmprovisr after


Those are our improvements! We hope that all these changes make ffmprovisr easy to use, which in turn will make ffmpeg easier to use and understand, not just for archivists but anyone wanting to improve their skills around this powerful and valuable open source tool. There are a few more small improvements that can be made, and if you want to learn to submit your first open source pull request, please get in touch with us and we can help you!

Thanks always to fellow maintainers Kieran O’Leary and Reto Kromer, and everyone who has submitted contributions to this project.

CSS Grid and New Order

Hi! 👋 I gave my website a makeover. I gave it many makeovers! Here’s how that went for me last Sunday, bored and unwilling to do actually-productive and meaningful things. This is what I did and a love letter to CSS Grid Layout.


old site

First, here is what I was starting with, my previous website. Fine, minimal, Josefin Sans and Open Sans courtesy of Google Fonts, with Skeleton for a CSS framework, normalize.css on top of that, and some light customization on top of that (link colors, et cetera). I didn’t need that framework though, even though it is pretty and the filesize is petite.

So I stripped the site down to essentials, plain HTML. I fixed some structural issues, but overall things were already looking fine because the site is just, like, a series of paragraphs and lists with one image at the top starring me at the Pop Century Resort, 90s Section. I then added a bit of CSS to get my own basic framework in place, based on CSS Grid.

If you don’t know about Grid Layout, it is the best. I wanted to rejoice in Grid and other CSS3 features that are now functional across browser platforms, because laying out pages and designing for the web has FINALLY caught up with the way my brain thinks about things, which is from pen-on-paper and my traditional print-design education, obsession with Adobe Illustrator, structure, honoring grids while breaking them, whitespace, typography, et cetera, whatever. Here is a guide put out by Mozilla for learning more about it. This Complete Guide to Grid is good, too.

Anyway, I’ve been listening to a lot of New Order, and it also sort of felt like New Order songs were chasing me around every bar in the New York metro area for two months. Inescapable. When the season started to change at the end of summer, every year for years I would only want to listen to Boston twee pop band Pants Yell!. This time, it’s New Order, and I like to think that signifies some sort of deep emotional maturity in me, like when I was told in college that I had to be at least 30 to appreciate 10cc songs (this turned out to be true) but I don’t think it signifies anything at all. The point of this paragraph is New Order was in my head, so New Order it is. Honestly, I didn’t put a lot of thought into it because I was ready to make it happen.


new old site

Next, I piled on the design constraints, which is the easiest way to get myself to chill out and actually focus on the task at hand.

  • CONTENT: I dedicated myself to tackling the first six New Order albums, so I could eliminate the stress of thinking about how much I liked some of the others but also saved me (because I counted Substance) from having to look at the Republic cover ever again.
  • SPACE: I can’t modify the HTML structure just because it’s convenient. I also can’t add images because it helps.
  • TIME: I want to start and finish on Sunday and not exceed Sunday and also do other things on this day.


So I got to work! I spent a lot of time on Movement, because it is beautiful already, and not a lot of time on Brotherhood, because I thought it was ugly no matter what I did to it. In a lot of ways, I relied on what happens “above the fold” in the introduction portion of the website and I focused a lot on trying to assess which font that I could easily (lazily) integrate for free would match the liner notes of each album. For the image-heavy album covers, especially Power, Corruption and Lies, I had to focus on the minimal elements of the design itself, since I constrained myself to not adding anything else, including sneaking images in through CSS. This also made Brotherhood difficult because I had to figure out how to do a metallic effect without adding any texture, and instead had to rely on CSS gradients happening all across the page.

new order
new order
new order
new order
new order
new order

After feeling accomplished (enough) with the layouts, I then had to make them work with clicking actions. I added a super tiny bit of javascript for this:

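// Swap the active stylesheet when any .makeover button is clicked.
for (const btn of document.querySelectorAll('.makeover')) {
  btn.addEventListener('click', () => {
    // Assumption: the expression lost here named the album; btn.id stands in for it.
    document.getElementById('neworder').href = 'css/' + btn.id + '.css';
  }, false);
}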

And it was all good! I spent way more time than anticipated configuring the taskbar that I set at the top of my webpage into actually looking not like garbage, and unfortunately it still kind of looks like garbage (both at the front and underneath in the codebase). I was getting annoyed and impatient, because it was at the end of the day and the fun stuff had all been done already, otherwise it might look better. Maybe it will look better one day.

And then

At the end of all of this, I posted it on twitter because that’s what I do when I have completed a minimally viable product and want people to see it and maybe make it better, because I never do anything that isn’t open source unless I am forced under capitalism to do so (cats gotta eat). One friend of mine, Travis, jokingly asked where the Unknown Pleasures easter egg was going to be, and I went into a witch-cackle laughing frenzy until it came into existence, which took only a few minutes. I won’t tell you where it is, but it’s also pretty obvious. I also had forsaken using viewports to trick the Movement stylesheet into working well, but I ended up making it come back so things resized better on phones, and then had to patch the kinda-shady methods with which I was causing things to come together, but it was imperfect. Another pal, Aidan, who happens to be a designer-who-misses-CSS, seemed to be bored enough at work to fix some problems for me, including the aforementioned one, which is ideal. It’s nice to collaborate and iterate!

old site

What’s next?

Yeah, like, this could get way more slick, I just didn’t want to keep spending time on it. Like, soft animations can go a long way in making the site more polished and more welcoming, so I’d like to add some of those. I also still want to add the U.K. edition of Ceremony which I think is really visually appealing in a way I’d like to make work on the web? I also tested this on three browsers, desktop and mobile and mobile-simulation, but I’m sure things fail in “edge” cases (Hey, is that why Microsoft rebranded IE in that way? Genius.).

Thank you for reading!

Accessibility and Archivability

Okay, so this is the follow-up post to this (or, rather, that was the aperitif post to this one). I was at the Joint Meeting of New York City & Mid-Atlantic Archive-It partner groups at METRO a little while ago, and a question came up about how to best guide creators towards good practices in the website development stage, to better support future preservation efforts. There are some guides in place for this (jump to the bottom of this page for additional resources). But, like, I’m a developer. I threw out the idea that a good guideline to use was that if a website is accessible, it should also be archivable, that these guidelines can go hand-in-hand. And people building websites should already be concerned with accessibility and aware of related best practices (and if not, they are bad at their jobs and you should replace them with good people). That sounds good, but am I right? Am I wrong? I decided to make sure I wasn’t totally full of shit, and here are the results.

The point of this post is to be able to communicate better with people who build websites, so their websites are more likely to be archivable in the future. (And I’m intentionally holding myself back from using technical jargon here.)


What does it mean for a site to be accessible?

There are plenty of great resources to explain the basics, but fundamentally a website should be able to be accessed and understood by anybody, regardless of their level of ability. WebAIM summarizes the four major categories as visual, hearing, motor, and cognitive. Some quick ways to assess a site without any additional programs are:

  • Does the structure of the website look OK and make semantic sense?
  • Can you use your website using only your keyboard?
  • Can you size up to 200% and your site won’t break?

There are SO MANY other things to consider. That’s just the tip-top level for testing without installing tools for that purpose or having knowledge about how websites are built.

What does it mean for a site to be archivable?

Obviously there’s a lot less written on this, but a website should be able to be saved as a WARC file by some sort of web archiving process (like Wayback Machine or Webrecorder) and look and function the same as it did when it was “live.” Can everything be “played back” in the future just as well as when it was captured for posterity?


And how do these work together (or do they)?

Web crawlers are little robots that save your things and play them back again. Thanks, lil dudes. And one of the core principles of accessibility practice is a website’s ability to work using assistive technology (like screen readers), and those are also like little robots that read things. For accessibility in general, the content of the webpage should always be clear regardless of the way in which it is being accessed. It makes sense that if one works well, the other should work well too.

Below are a few examples that hopefully highlight how care towards accessibility helps archivability.

Example 1: Internet Girlfriend Club / Asynchronous loading

This site was loading CSS asynchronously, after everything else had loaded on the page, and it used JavaScript to do that. I don’t really know why, I guess that has a use case somewhere, but it wasn’t necessary here. CSS is not necessary for accessibility here (which is not to say that it isn’t important) – IGC passes these tests – but if something more significant, like links, was loading in a delayed way with JavaScript, the Wayback Machine wouldn’t know how to save it, because it takes things at initial-load face value. Webrecorder doesn’t have this problem, although the issue was still evident from the flash of white before each page would load. Bonus note: this pattern is known as Flash of Unstyled Content (FOUC)! Fortunately, this is my website (meta-hyping here), so I was able to fix it right away. Archive yer heart out! JavaScript doesn’t have to be avoided, but it should be used carefully and correctly (and not flippantly, which I was guilty of!)
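Here is a toy model of that "initial-load face value" problem. It is not how Heritrix or the Wayback Machine actually parse pages; the `extractLinks` helper and the sample markup are made up, and a regular expression stands in for a real HTML parser.

```javascript
// Collect href values from anchor tags in a string of HTML.
function extractLinks(html) {
  const pattern = /<a\s[^>]*href="([^"]*)"/g;
  return [...html.matchAll(pattern)].map((match) => match[1]);
}

// Made-up page: one link is in the markup, one only appears after a script runs.
const initialHtml = `
  <a href="/about">About</a>
  <script>/* adds a link to the zines page after load */</script>
`;
const afterScriptsRan = initialHtml + '<a href="/zines">Zines</a>';

console.log(extractLinks(initialHtml)); // [ '/about' ]
console.log(extractLinks(afterScriptsRan)); // [ '/about', '/zines' ]
```

A crawler that only reads the first version never learns that the second link exists. A capture tool that drives a real browser sees the page after the script has run.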

Example 2: Karlie Kloss / Wix

Karlie Kloss’s site is built in Wix, which is not very good for accessibility or archivability (this seems to be the general consensus). Accessibility-wise, trying to use this site without a mouse is impossible – it highlights something, but what is being highlighted is unclear. Fumbling, I accidentally opened a link I couldn’t see (because tabbing took me to the bottom of the page – what?), and it was this article about learning how to code. I found it in an auto-rotating carousel, above a gif. And the top nav, which is only a lil “hamburger” (the widely-hated hamburger menu), is unclear. This website is all-around bogus for a lot of other reasons and I hope Karlie calls me to fix it.

The latest version of her site shows nothing in the Wayback Machine and acts weird when making an attempt in Webrecorder, so I don’t have a lot of faith in its archivability. Karlie is just not going to make it into the archive.

awkward karlie

Example 3: Larry Polansky / Clear navigation

Lorena sent me this webpage as an example of a site that looks simple, but was difficult for her to archive. Not necessarily for technical reasons, but “it’s a rabbit hole of confusion because there is no site map and you can’t tell if you captured all the pages cause you don’t know what there is.”

Likewise, this site looks totally accessible, right? No carousel, no modals, no Flash… but it’s very important to remember the website’s structure when designing/developing for accessibility and archivability. I actually laughed out loud when I looked at the HTML:

small.. small.. small

WHAAATTTT?? Hahaha. Imagine trying to get to the link on this page without being able to click on it. A screen reader might have to go through small … small .. small .. small .. until it gets to the content, and even then it is confusing.

Okay, this is a contrived example because I thiiiink a screen reader will skip past these tags because it can identify them as insignificant as it relates to the content, but this is just an example of a messy, confusing site structure. This is a good time to mention that ARIA attributes (landmark roles, for example) should be present on websites for navigational support, and those are not present, either.

Clear navigation for the entire website is crucial to archiving, and clear navigation on each page is crucial to accessibility.
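Lorena’s “you don’t know what there is” problem can be sketched as a graph walk. The site map below is invented (it is not the real site’s structure), and `reachableFrom` is a hypothetical helper, but it shows why a page that nothing links to is invisible to a crawl that starts at the home page.

```javascript
// Made-up site: each page lists the pages it links to.
const site = {
  '/': ['/music', '/writing'],
  '/music': ['/music/scores'],
  '/music/scores': [],
  '/writing': ['/'],
  '/old-news': ['/'], // nothing links here, so a crawl from "/" misses it
};

// Breadth-first walk from a start page, following links.
function reachableFrom(pages, start) {
  const seen = new Set([start]);
  const queue = [start];
  while (queue.length > 0) {
    const page = queue.shift();
    for (const target of pages[page] || []) {
      if (!seen.has(target)) {
        seen.add(target);
        queue.push(target);
      }
    }
  }
  return seen;
}

const captured = reachableFrom(site, '/');
const missed = Object.keys(site).filter((page) => !captured.has(page));

console.log(missed); // [ '/old-news' ]
```

In this sketch the full page list is known up front, which is exactly what a published site map would give an archivist to compare a capture against.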

Et cetera

That’s all for now, thank you! I know it’s not that much and I am missing a LOT of granularity. I didn’t even talk about Flash or other uses of JavaScript, well-known enemies to both of these concepts. Even though Webrecorder now makes a lot of this possible, it doesn’t mean it is preferable. (Also I wasn’t willing to install Flash on my computer to test it out. 😘 ) So much more work can be done here (and I’d like to see that), with more research and better examples, but I am just one little human with a finite amount of time to spend on this!


Resources, Accessibility

Resources, Archivability

Thank you again Karl. Thanks to Samantha Abrams and Lorena Ramirez-Lopez for sample links. I learned a lot about web accessibility in my previous role at the New York Public Library when I worked with Willa Armstrong, an expert in this area of digital ideation. Willa is the best!

How do web archiving frameworks work?


“If you wish to make apple pie from scratch, you must first create the universe.”

If you wish to explain how web archiving works from a technical standpoint, you must first understand the ecosystem. I was anticipating (jk - it’s live now, here!!) writing a blog post about website archivability from a development perspective (“How can I make my website more archivable?”) but realized I needed to provide an overview of web archiving in general (even if just for my own comprehension). I don’t feel like this information is especially clear and available on the web – but if I am just missing out on a solid resource, let me know. Likewise, if I’m wrong about something (which I probably am), let me know. BIG THANKS to Karl-Rainer Blumenthal for speaking at the Joint Meeting of New York City & Mid-Atlantic Archive-It partner groups at METRO last week, which gave me the inspo, and for clarifying my questions on the above information (and correcting my incorrect assumptions).



Some champion frameworks (not counting top-dawg Wayback Machine) in the web-archiving ecosystem funnel are Archive-It and Webrecorder. There are other methods (good ol’ wget, e.g.), but these get a lot of hype. And I feel like they are frequently pitted against each other, even though they serve different purposes, so they shouldn’t be thought of in this way. Archive-It and Webrecorder are platforms that allow people to easily use the underlying web archiving technology. When unhappy with one or the other, it’s probably mostly comments on the provided design experience rather than the backend frameworks.

Archive-It is built for institutional-level integration and scale. On top of archiving, it is able to perform scheduling and also exists as a hosting service that stores this data for institutions. Webrecorder can do this too, but isn’t built with this in mind as a business model. It expects the users to manage their own digital preservation storage, although you can still save and share “recorded” websites. Webrecorder’s tagline is “Web Archiving For All!” – it really does democratize and make-more-open what Internet Archive has been doing, which makes me like it a lot. But they are different!

So, technically…


This section should really emphasize the power of open source tools. Are you ready?

First! All these tools produce the same output, the WARC (Web ARChive) file format. Standards are great. This is ostensibly an open format, although unfortunately standardized through ISO, so it’s not free to view. (But that is a personal digression…) The IIPC (International Internet Preservation Consortium) has put together an open warc-specifications page for more information about this, specifically here, which seems to be very close to what is probably in the official ISO that costs a hundred bucks. Anyway, WARC is essentially a wrapper for web-related materials, and it aggregates that information along with relevant metadata for future context. Like a zip file, but specifically made for websites.
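To show what “a wrapper plus metadata” looks like, here is a minimal sketch that assembles one WARC-style record as text. The header field names follow my reading of the public WARC 1.0 draft; the `buildWarcRecord` helper, the record ID, and the sample values are made up, and real tools write more fields (and usually compress the result), so treat this as an illustration rather than a valid archive writer.

```javascript
// Assemble one record: named header fields, a blank line, then the payload.
function buildWarcRecord({ recordId, targetUri, date, contentType, payload }) {
  // Content-Length counts bytes, not characters.
  const contentLength = new TextEncoder().encode(payload).length;
  const headers = [
    'WARC/1.0',
    'WARC-Type: resource',
    `WARC-Record-ID: <${recordId}>`,
    `WARC-Date: ${date}`,
    `WARC-Target-URI: ${targetUri}`,
    `Content-Type: ${contentType}`,
    `Content-Length: ${contentLength}`,
  ];
  return headers.join('\r\n') + '\r\n\r\n' + payload + '\r\n\r\n';
}

// Placeholder values throughout; the ID would normally be a fresh UUID.
const record = buildWarcRecord({
  recordId: 'urn:uuid:00000000-0000-0000-0000-000000000000',
  targetUri: 'https://example.com/',
  date: '2017-11-01T00:00:00Z',
  contentType: 'text/html',
  payload: '<p>héllo</p>',
});

console.log(record);
```

The captured page sits at the bottom, and everything above it is the context (what it was, where it came from, when it was saved) that a replay tool reads later.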

Second! Here things get more complicated.

Internet Archive’s Wayback Machine is built on Heritrix (heritrix3), web-crawling software. Heritrix is built in Java. It does the heavy work of turning websites into web archives. The Wayback Machine used to use a Java-based framework for “replaying” archived websites, but it was re-written in Python, so now it uses that. It also depends on umbra, a queue-based browser automation tool, to grab up those sites and dependencies.

Archive-It’s stack looks like the Wayback Machine, for the most part. However! Archive-It is transitioning to using brozzler and warcprox. Brozzler is built on top of Chromium, among other things, and is able to view and “replay” websites. warcprox is built on pymiproxy and helps with turning the websites into WARC files. Combined, and built on the tools below, they are able to do the work of grabbing (brozzler) and saving (warcprox) websites. For audiovisual material, it uses youtube-dl, an open source tool for downloading media (and not just for YouTube, despite the name).

Webrecorder, on the other hand, depends on pywb, a Python implementation of the Wayback Machine, for the “replaying” of the generated web archives. For capture, warcprox, I think, but not necessarily. Webrecorder is then developed on top of pywb (conveniently both primarily created by web archiving developer superstar Ilya Kreymer!).

Nota bene Open source is great! We can do so many great things because people have shared their code and collaborated together, and have been able to build things on top of other things.



For more web archiving resources, see…

OK, stay tuned while I put together the blog post I originally intended to write, about web accessibility and website archivability, in which I argue with myself about both. Update: here.