[Netarchivesuite-devel] NAS and HTTP redirections

sara.aubry at bnf.fr sara.aubry at bnf.fr
Mon Jan 25 16:37:48 CET 2010

Hi all,

We're moving forward and getting close to our broad crawl.
We spent a while analyzing stats and stop reasons linked to specific
domains within a job, and found that HTTP redirects (e.g. bikealot.fr
redirects to bikealot.eu), DNS no-reply (cesar-et-ses-cartes.fr) and
HTTP errors (criminologic.fr) are all given "Domain Completed" as stop
reason.

That makes sense for DNS no-reply and HTTP errors, but HTTP redirects are
a different case: we want to collect them beyond the first step, using
the steps system ("Harvest only domains that were not completely harvest
in a previous harvest:").
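
To make the three cases concrete, here is a rough sketch in Python of how
the first fetch of a seed could be sorted into separate categories for
stats purposes. It is only an illustration: the function names and the
category labels are hypothetical and are not part of NetarchiveSuite or
Heritrix, and comparing hosts (rather than registered domains) is a
simplification.

```python
from urllib.parse import urljoin, urlparse


def host_of(url):
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def classify_seed(seed_url, dns_ok, status=None, location=None):
    """Classify the first fetch of a seed.

    seed_url -- the seed as scheduled
    dns_ok   -- False when the DNS lookup got no reply
    status   -- HTTP status code of the first response, if any
    location -- value of the Location header, if any
    The returned labels are hypothetical, chosen for this sketch only.
    """
    if not dns_ok:
        return "dns-no-reply"
    if status is not None and 300 <= status < 400 and location:
        # Resolve relative Location values against the seed URL first.
        target = urljoin(seed_url, location)
        if host_of(target) != host_of(seed_url):
            return "redirect-to-other-domain"
        return "redirect-same-domain"
    if status is not None and status >= 400:
        return "http-error"
    return "ok"
```

With something like this, seeds in the "redirect-to-other-domain" bucket
could be listed separately from the ones that really are finished.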

How do you manage crawls for these specific domains?
How do you gather stats on these domains?

Thanks for your help!

