[Netarchivesuite-curator] BnF NAS update for March

peter.stirling at bnf.fr
Mon Mar 12 17:18:30 CET 2012

Dear all,

Here is our update for March, focusing on our election crawl.

Harvest for the 2012 Presidential and Parliamentary Elections
We have already started harvesting websites for the elections. As the 
candidates make heavy use of social networks, we carried out specific 
analyses of Facebook and Twitter. We had no problems with Facebook, but 
Twitter was a real nightmare because of redirections and the # in its 
URLs. Fortunately we have managed to harvest it, four times a day, thanks 
to a special profile (one whose user agent does not mention Mozilla). 
However, we have not yet resolved the problem of accessing it in the 
Wayback Machine.
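
For those less familiar with why a # in the URL is awkward for a 
crawler, here is a minimal Python sketch. It is only an illustration, not 
our harvest profile: the account name "example" and both helper functions 
are hypothetical. It shows that everything after the # (the fragment) is 
never sent to the server, so a plain request for such a URL only reaches 
the site's root page.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path and query that an HTTP client sends to the server.

    The fragment (everything after '#') is handled on the client side
    only, so it is deliberately left out here.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


def hashbang_path(url: str):
    """Return the page path hidden in a '#!' fragment, or None if absent."""
    fragment = urlsplit(url).fragment
    if fragment.startswith("!"):
        return fragment[1:]
    return None


# Hypothetical account URL, used for illustration only.
url = "http://twitter.com/#!/example"
print(request_target(url))  # what the server is asked for
print(hashbang_path(url))   # what the browser's JavaScript would load
```

The two values differ, which is why a crawler needs special handling 
(or a different version of the page) to reach the actual content.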

We have also focused on videos, especially the Dailymotion platform. 
With a Beanshell script, we succeeded in crawling more than 17,000 videos 
in two days. We will use the same solution for our big Dailymotion harvest 
at the end of the month.

If you would like any more details, please don't hesitate to ask.

Best regards,

The BnF digital legal deposit team