[Netarchivesuite-curator] Brief update form KB DK

Pérez Morillo, Mar mar.perez at bne.es
Tue Sep 11 11:51:14 CEST 2018


Dear all,

This is the brief update from the BNE.

Our selective crawls are running as usual. We are considering launching a couple of new selective crawls on “Gastronomy” (which is an important topic in Spain) and “Folklore and popular traditions”.
We have a big gap in our collections in important fields such as Language and Literature, History, Social Science, Biology and Medicine, and Science and Technology, as we don’t have specialist departments in charge of these kinds of collections. We have signed an agreement with the University Libraries Network in Spain (REBIUN) for them to cooperate with us in selecting and managing web collections in these fields. In the meantime, we are building a base with a small set of seeds per subject.
We are still working on a new search interface that can provide access by collection and subject.
We are also considering opening online access to our web archive, excluding content from the previous year, following the example of the Portuguese Web Archive. We still have to consult our legal staff to make sure that we avoid claims or complaints from web content providers.

Best,

Mar Pérez Morillo
Head of the Online Publications Deposit Management Area
Digital Processes and Services Division
Tel.: 91 516 89 92
Biblioteca Nacional de España


From: Netarchivesuite-curator [mailto:netarchivesuite-curator-bounces at ml.sbforge.org] On behalf of Sabine Schostag
Sent: Tuesday, 11 September 2018 11:25
To: netarchivesuite-curator at ml.sbforge.org
Subject: [Netarchivesuite-curator] Brief update form KB DK

Dear all.

Here is what we are grappling with just now:

Broad crawl: the first step of our third broad crawl for 2018 started on August 25 and is still ongoing.

Selective crawl: September 5 is the official commemoration day for Danish soldiers who have been deployed in war or conflict zones. Together with partners from the Danish National Archives, we are running an event crawl on this commemoration. We used BCWeb to nominate the URLs. Everything went fine at first: Steven hardcoded the required schedules, as we have no schedules with integrated hops with Heritrix 5. However, after the fourth crawl, all crawls failed even though nothing had been changed. We ran test crawls with the BCWeb schedules, and they worked fine. We still have not solved the problem, so we created a “replacement” event harvest definition without using BCWeb.

OpenWayback: We are now able to display pages using HTTPS, but by no means all of them. For instance, we are not able to display social media pages.

Blacklight (full-text search): the facets for refining a search do not work.
SolrWayback: we ran some tests in our production environment. The results are promising: we are able to display pages from Twitter and Facebook crawls made after these sites started using HTTPS. The most important task now is to resolve problems with the proxy browser setup.
On behalf of the Netarchive Team

Best,
Sabine
