[Netarchivesuite-users] RE Harvesting aborted

Nicchiarelli Eleonora eleonora.nicchiarelli at onb.ac.at
Thu Feb 18 16:55:13 CET 2010


Hi again Sara and others, 

which tools have you used to inspect the metadata ARC file? Ours vary in size from a few hundred KB to almost 2 GB...

Eleonora 

Eleonora Nicchiarelli Bettelli
Digital Preservation
Austrian National Library
Josefsplatz 1, 1015 Wien

Tel:  +43 1 53 410 686
Fax: +43 1 53 410 610
Web: http://www.onb.ac.at/
Mail: eleonora.nicchiarelli at onb.ac.at


> -----Original Message-----
> From: sara.aubry at bnf.fr [mailto:netarchivesuite-users-
> bounces at lists.gforge.statsbiblioteket.dk] On Behalf Of sara.aubry at bnf.fr
> Sent: Wednesday, 17 February 2010 17:34
> To: netarchivesuite-users at lists.gforge.statsbiblioteket.dk
> Subject: Re: [Netarchivesuite-users] RE Harvesting aborted
> 
> Hi Eleonora,
> 
> GUIApplication0.log.0 remains on the Admin machine and
> HarvestControllerApplication_low0.log stays on each crawler.
> heritrix_out.log and progress-statistics.log should be available in the
> metadata ARC file of your job.
> 
> Sara
> 
> Message from: Nicchiarelli Eleonora <eleonora.nicchiarelli at onb.ac.at>
> Date: 17/02/2010 17:25
> Sent by: <netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk>
> Please reply to: <netarchivesuite-users at lists.gforge.statsbiblioteket.dk>
> To: <netarchivesuite-users at lists.gforge.statsbiblioteket.dk>
> Cc:
> Subject: Re: [Netarchivesuite-users] RE  Harvesting aborted
> 
> Hi Sara and all,
> 
> thank you again. But as far as I understand, most of those logs are
> available only for jobs that are currently running, is that right? What
> kind of diagnostics are available for a "done" job that terminated,
> possibly a couple of days ago?
> 
> Eleonora
> 
> Eleonora Nicchiarelli Bettelli
> Digital Preservation
> Austrian National Library
> Josefsplatz 1, 1015 Wien
> 
> Tel:  +43 1 53 410 686
> Fax: +43 1 53 410 610
> Web: http://www.onb.ac.at/
> Mail: eleonora.nicchiarelli at onb.ac.at
> 
> 
> > -----Original Message-----
> > From: sara.aubry at bnf.fr [mailto:netarchivesuite-users-
> > bounces at lists.gforge.statsbiblioteket.dk] On Behalf Of sara.aubry at bnf.fr
> > Sent: Tuesday, 16 February 2010 16:19
> > To: netarchivesuite-users at lists.gforge.statsbiblioteket.dk
> > Subject: Re: [Netarchivesuite-users] RE Harvesting aborted
> >
> > > Is there anything that can be done about it now?
> > If your crawl is running, you cannot change this value. You would need to
> > redeploy NS.
> >
> > > Is there a quick way to see if it was an inactivity or a noresponse
> > timeout? (I will search in the logs of course)
> > We looked at the following logs:
> > - GUIApplication0.log.0
> > - HarvestControllerApplication_low0.log
> > - heritrix_out.log
> > - progress-statistics.log
> >
> > Sara
> >
> > Message from: Nicchiarelli Eleonora <eleonora.nicchiarelli at onb.ac.at>
> > Date: 16/02/2010 16:08
> > Sent by: <netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk>
> > Please reply to: <netarchivesuite-users at lists.gforge.statsbiblioteket.dk>
> > To: <netarchivesuite-users at lists.gforge.statsbiblioteket.dk>
> > Cc:
> > Subject: Re: [Netarchivesuite-users] RE  Harvesting aborted
> >
> > Hi Sara,
> >
> > thanks a lot. I see now that Andreas had the same problem, only I had
> > not realised that it had, at least partially, the same cause. We had set
> > both these timeouts to 10800 seconds (3 hours), and we thought this was
> > sufficient, but it clearly isn't.
> >
> > Is there anything that can be done about it now?
> >
> > Is there a quick way to see if it was an inactivity or a noresponse
> > timeout? (I will search in the logs of course)
> >
> > Many thanks again,
> >
> > Eleonora
> >
> > Eleonora Nicchiarelli Bettelli
> > Digital Preservation
> > Austrian National Library
> > Josefsplatz 1, 1015 Wien
> >
> > Tel:  +43 1 53 410 686
> > Fax: +43 1 53 410 610
> > Web: http://www.onb.ac.at/
> > Mail: eleonora.nicchiarelli at onb.ac.at
> >
> >
> > > -----Original Message-----
> > > From: sara.aubry at bnf.fr [mailto:netarchivesuite-users-
> > > bounces at lists.gforge.statsbiblioteket.dk] On Behalf Of sara.aubry at bnf.fr
> > > Sent: Tuesday, 16 February 2010 15:27
> > > To: netarchivesuite-users at lists.gforge.statsbiblioteket.dk
> > > Subject: [Netarchivesuite-users] RE Harvesting aborted
> > >
> > > Hi Eleonora,
> > >
> > > We had to face the same problem at BnF for several jobs.
> > >
> > > NS runs activity checks (see
> > > https://lists.gforge.statsbiblioteket.dk/pipermail/netarchivesuite-users/2010-February/000342.html
> > > for the kinds of checks), and if it finds there has been no activity for a
> > > configurable period of time (inactivityTimeout and noResponseTimeout), NS
> > > terminates the job. The "Stopped due to" field for many domains is then
> > > marked as "Harvesting aborted".
> > >
> > > We spent quite a bit of time analysing the problem with Soren's help and
> > > found no solution other than deactivating these checks by raising
> > > inactivityTimeout and noResponseTimeout to very high values.
> > >
> > > Sara
> > >
> > > Message from: Nicchiarelli Eleonora <eleonora.nicchiarelli at onb.ac.at>
> > > Date: 16/02/2010 14:52
> > > Sent by: <netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk>
> > > Please reply to: <netarchivesuite-users at lists.gforge.statsbiblioteket.dk>
> > > To: <netarchivesuite-users at lists.gforge.statsbiblioteket.dk>
> > > Cc:
> > > Subject: [Netarchivesuite-users] Harvesting aborted
> > >
> > > Dear all,
> > >
> > > thank you very much for your support so far. I have another question
> > > regarding our domain crawl: we have a job in which, for many seeds, the
> > > "Stopped due to" field says "Harvesting aborted". I know that this happens
> > > when a job has been terminated through the Heritrix interface, but I can't
> > > recall having done that recently. Under which other conditions, if any,
> > > does this happen?
> > >
> > > Many thanks in advance,
> > >
> > > Eleonora
> > >
> > > Eleonora Nicchiarelli Bettelli
> > > Digital Preservation
> > > Austrian National Library
> > > Josefsplatz 1, 1015 Wien
> > >
> > > Tel:  +43 1 53 410 686
> > > Fax: +43 1 53 410 610
> > > Web: http://www.onb.ac.at/
> > > Mail: eleonora.nicchiarelli at onb.ac.at
> > >
> > >
> > > _______________________________________________
> > > NetarchiveSuite-users mailing list
> > > NetarchiveSuite-users at lists.gforge.statsbiblioteket.dk
> > > https://lists.gforge.statsbiblioteket.dk/mailman/listinfo/netarchivesuite-users
> > >
> > > Consider the environment before printing this mail.





