[Netarchivesuite-devel] NAS and HTTP redirections
sara.aubry at bnf.fr
Mon Jan 25 16:37:48 CET 2010
Hi all,
We're moving forward and getting close to our broad crawl.
We spent a while analyzing the stats and stop reasons for specific
domains within a job, and found that three different cases are all
given "Domain Completed" as their stop reason: HTTP redirects (e.g.
bikealot.fr redirecting to bikealot.eu), DNS no-reply
(cesar-et-ses-cartes.fr) and HTTP errors (criminologic.fr).
That makes sense for DNS no-reply and HTTP errors, but HTTP redirects
are a different matter: we want to keep collecting those domains beyond
the first step, using the steps system (the "Harvest only domains that
were not completely harvested in a previous harvest:" checkbox).
How do you manage crawls for these specific domains?
How do you gather stats on these domains?
Thanks for your help!
Sara