[Netarchivesuite-curator] Netarchive NAS update for July

Pérez Morillo, Mar mar.perez at bne.es
Tue Jul 17 12:41:03 CEST 2018


Dear all,

This is the update from the National Library of Spain:


·        We are about to finish the "at-risk collection" of Wikispaces, which closes at the end of July. This collection has almost 400 seeds from Spanish Wikispaces sites.

·        The regional collections are growing quickly. The growth of the Navarra web collection is particularly remarkable: the last harvest archived 1.5 TB of data. For these big crawls, our IT team used what they call "high-storage crawlers", with a capacity of 1 TB.

·        Both web archiving and IT teams are in a period of transition:

o   Our IT team has a new boss, and several of its positions are vacant because staff have either changed jobs or retired. So they are especially busy.

o   As for the web archiving team, as you know, we received strong support in both human and infrastructure resources through the agreement we signed with Red.es. We engaged three web curators, one librarian to support the non-print legal deposit, and an IT developer. However, one of them has already left, a second one is leaving in one week, and the rest will stay only until the end of the year. So for the last year and a half we have been 8 people, and we are going to be 3. We are therefore working on administrative processes to renew the agreement with Red.es on the one hand, and to sign short-term contracts to cover some needs temporarily on the other.

o   We've just welcomed two grant holders who will be in our team for one year. So we'll have some additional support, but for now they are still in their training period.

·        Our IT developer is working partly on managing the data that publishers send when depositing e-publications, and partly on an interface for our web archive that offers users more flexible search than URL-only lookup (by collection, subject, title... and, as a last step, full-text search).

Best,

Mar Pérez Morillo
Head of the Online Publications Deposit Management Area
Digital Processes and Services Division
Tel.: 91 516 89 92
Biblioteca Nacional de España


From: Sabine Schostag [mailto:sas at kb.dk]
Sent: Tuesday, July 17, 2018 10:04
To: Pérez Morillo, Mar; peter.stirling at bnf.fr; netarchivesuite-curator at ml.sbforge.org
Subject: RE: [Netarchivesuite-curator] Netarchive NAS update for July

Dear all.

Here is an update from KB DK:


*         We upgraded from the old Wayback to OpenWayback. Many images are still "lost", and HTTPS is only partly supported (the problem may be the different use of the dns-secure/dirty setup in Copenhagen and Aarhus). HTTPS-based social media are still invisible.
The loss of images is almost certainly a browser problem: we use IE, which is an old technology, and with the Edge browser all images become visible. Integrating Edge into our Wayback setup requires an update of our Citrix platform.

*         We started testing SOLRWayback in our production environment, and the results look good: our HTTPS-based Twitter and Facebook crawls are visible.
The great challenge is the proxy browser setup. A Firefox-based setup will not be supported on the Citrix platform by the national IT service, which will take charge of supporting almost all our IT platforms, devices, software, etc. (a political decision to centralize all IT support for national institutions).

*         Our second broad crawl for 2018 is halfway through step 2 (with a limit of 14 GB per domain). We have problems with jobs hanging for long periods, and they need "manual help".

*         We deployed a new version of H3 (supporting "srcset" responsive design tags) in our production environment. Images using these responsive design tags were not harvested from 2014 to June 2018. We still lack support for data-srcset tags.

*         We upgraded our Blacklight search front end to the newest version, with support for the new SOLR index, but there are still problems with the graphic design.
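
To illustrate the srcset gap mentioned above: a responsive image lists several candidate URLs in a single srcset attribute, so a crawler that only follows src misses them. The following is a minimal Python sketch, not Heritrix 3 code, of how such URLs can be collected with the standard library's HTMLParser. The names parse_srcset and ImageURLCollector are hypothetical, and splitting on commas is a simplification of the full HTML srcset parsing rules. data-srcset is assumed here to hold the same syntax as srcset; it is a convention of JavaScript lazy-loading libraries, not a standard HTML attribute.

```python
from html.parser import HTMLParser

# Attributes assumed to carry srcset-style candidate lists.
# data-srcset is a lazy-loading convention, not standard HTML.
SRCSET_ATTRS = ("srcset", "data-srcset")


def parse_srcset(value):
    """Return the URLs from a value such as 'a.jpg 1x, b.jpg 2x'.

    Simplified sketch: splits on commas, so URLs that themselves
    contain commas are not handled correctly.
    """
    urls = []
    for candidate in value.split(","):
        parts = candidate.strip().split()
        if parts:
            # The first token is the URL; the rest is a width/density descriptor.
            urls.append(parts[0])
    return urls


class ImageURLCollector(HTMLParser):
    """Hypothetical collector of image URLs from <img> and <source> tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("img", "source"):
            return
        for name, value in attrs:
            if value is None:
                continue
            if name == "src":
                self.urls.append(value)
            elif name in SRCSET_ATTRS:
                self.urls.extend(parse_srcset(value))


html = ('<img src="small.jpg" srcset="medium.jpg 800w, large.jpg 1600w" '
        'data-srcset="lazy.jpg 2x">')
collector = ImageURLCollector()
collector.feed(html)
print(collector.urls)  # prints ['small.jpg', 'medium.jpg', 'large.jpg', 'lazy.jpg']
```

A crawler that only read src would have queued small.jpg alone, which matches the kind of loss described for the 2014 to June 2018 harvests.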

On behalf of the Netarchive curators

Best,
Sabine

PS. Peter, we would like to see the documentation of your study, too.

From: Netarchivesuite-curator <netarchivesuite-curator-bounces at ml.sbforge.org> On Behalf Of Pérez Morillo, Mar
Sent: Monday, July 16, 2018 11:13 AM
To: peter.stirling at bnf.fr; netarchivesuite-curator at ml.sbforge.org
Subject: Re: [Netarchivesuite-curator] BnF NAS update for July

Hi Peter,

Could you share the documentation of this study with us?

Best,

Mar Pérez Morillo
Head of the Online Publications Deposit Management Area
Digital Processes and Services Division
Tel.: 91 516 89 92
Biblioteca Nacional de España


From: Netarchivesuite-curator [mailto:netarchivesuite-curator-bounces at ml.sbforge.org] On Behalf Of peter.stirling at bnf.fr
Sent: Friday, July 13, 2018 16:33
To: netarchivesuite-curator at ml.sbforge.org
Subject: [Netarchivesuite-curator] BnF NAS update for July

Hello all,

Every year since 2011, the BnF has published a summary and statistics on national publishing output based on the collections built through legal deposit, including the web archives. In addition to the general analysis, each year has a focus, and this year it was politics. We compared the web collections relating to three elections: 2007, 2012 and 2017. The three selections were based on the same categories and covered many regions. The analysis confirms our hypothesis: since 2007, videos have become ubiquitous, and the use of social networks, from blogs to Twitter and Facebook, has exploded.

We also measured the percentage of websites still online. After 10 years, 26% of election sites are still online, 19% redirect to other websites and 55% have disappeared. After 5 years, 44% are still online, 22% redirect and 33% have disappeared. After 1 year, 81% are still online, 10% redirect and 6% have disappeared. The lifespan of websites varies a lot, but the results show that the other collections (outside the electoral collections) are complementary.

During May, a placement student studied the scope of the broad and focused crawls in relation to the legal definition of the French domain. With 4.5 million sites, the BnF covers less than 60% of the French web. To be more representative, the BnF must extend its contacts to other registrars, especially those handling generic TLDs. We hope to contact the company Gandi to obtain their list of sites and improve the coverage. For the focused crawls, the study suggests that selection should reflect how the web is actually used, not only the traditional collections, and that new selection methods could help, such as closer cooperation with librarians, researchers, etc. In addition, more legal clarification is needed on harvesting social networks: for the moment, the BnF only crawls the accounts and some hashtags of public figures or organisations. Finally, the study proposes documenting the dynamics of sites through filmed tutorials, especially when they pose technical difficulties for the crawler.

Best regards,
The BnF digital legal deposit team
________________________________

Exhibition "Picasso et la danse" (http://www.bnf.fr/fr/evenements_et_culture/anx_expositions/f.picasso_et_danse.html) - 19 June to 16 September 2018 - BnF - Bibliothèque-musée de l'Opéra

Please consider the environment before printing.

________________________________
The information in this e-mail and any attachments is confidential and is intended for the addressee only. If you have received this e-mail in error, you are notified that any revision, amendment, printing, copying, disclosure, distribution or use of the contents is unauthorized. Carrying out any of the above actions is expressly prohibited by the rules governing this matter. Hence, if you are not the intended recipient, we request that you notify the sender by replying to this e-mail and delete the message and any attachments. The National Library of Spain reserves the right to take the appropriate legal action if the above is infringed.
________________________________