[Netarchivesuite-users] NAS and HTTP redirections

sara.aubry at bnf.fr sara.aubry at bnf.fr
Mon Jan 25 16:37:48 CET 2010


Hi all,

We're moving forward and getting close to our broad crawl.
We spent a while analyzing stats and stop reasons linked to specific
domains within a job and found that HTTP redirects (e.g. bikealot.fr
redirecting to bikealot.eu), DNS no-replies (cesar-et-ses-cartes.fr)
and HTTP errors (criminologic.fr) are all given "Domain Completed" as
their stop reason.

That makes sense for DNS no-replies and HTTP errors, but not for HTTP
redirects, which we want to collect beyond the first step, using the
steps system (the "Harvest only domains that were not completely harvest
in a previous harvest:" checkbox).

How do you manage crawls for these specific domains?
How do you gather stats on these domains?

Thanks for your help!

Sara




