File types in the Icelandic online archive


    The National Library of Iceland – University Library has been systematically collecting Icelandic websites since the fall of 2004. For Icelandic content, our web archive is much more comprehensive than the comparable collection of the Internet Archive, which collects websites from all over the world. The web archive is enormous and currently contains almost two billion items (collected URLs, excluding duplicates). At vefsafn.is, people can view websites as they appeared in the past (for example, the websites of political parties before the last election). Beyond that, this vast collection of Icelandic websites offers all kinds of possibilities for research into the digital contemporary history of Iceland and for digital archaeology.

    For example, it is interesting to look at the share of different file types in the collection. It turns out that the vast majority of digital objects are web pages (text, HTML, CSS, XML, and so on), which make up almost 70% of the collected URLs. Next come images of various kinds, at 23% of the collected URLs. PDF documents are close to 1%, while other file types, such as videos and audio files, account for far less than that.

    If we look at the amount of data instead, however, we see that web pages/text make up almost 60% of the collection, images about 14%, PDFs about 6%, videos about 10% and audio files about 4%. For example, the collection contains approximately 750 thousand different audio files, or about 2.5 terabytes of audio material, with an average file size of 3.5 megabytes. It is also noteworthy that there are many more videos than audio files in the collection: 1.7 million videos with an average size of 4.2 megabytes. In total, there are almost seven terabytes of videos in the web archive.

    Outside of text and images, PDFs are the largest category by number of files. The web archive holds about 8.2 million different PDFs, or a total of 3.7 terabytes of data. However, the average size of a PDF in the archive is only 477 kilobytes.
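
The totals quoted above follow from multiplying the number of files by the average file size. The sketch below repeats that arithmetic for audio, video and PDF files. It assumes binary units (1 terabyte = 1024 × 1024 megabytes, 1 megabyte = 1024 kilobytes), which the text does not state; the names `reported` and `total_terabytes` are made up for this illustration.

```python
# Figures quoted in the text: (number of files, average size in megabytes).
# ASSUMPTION: binary units are used (1 MB = 1024 KB, 1 TB = 1024 * 1024 MB).
# The article does not say which convention its figures follow.
KB_PER_MB = 1024
MB_PER_TB = 1024 * 1024

reported = {
    "audio": (750_000, 3.5),
    "video": (1_700_000, 4.2),
    "pdf": (8_200_000, 477 / KB_PER_MB),  # 477 kilobytes expressed in megabytes
}


def total_terabytes(count, avg_mb):
    """Return the total volume in terabytes for `count` files of `avg_mb` each."""
    return count * avg_mb / MB_PER_TB


totals = {
    kind: total_terabytes(count, avg_mb)
    for kind, (count, avg_mb) in reported.items()
}

for kind, tb in totals.items():
    print(f"{kind}: roughly {tb:.1f} TB")
```

Under this assumption the results come out at roughly 2.5 TB of audio, a little under 7 TB of video and about 3.6 TB of PDFs, in line with the stated totals. With decimal units the same figures give somewhat higher totals, so small differences are most likely down to rounding of the averages and the choice of unit.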