[Netarchivesuite-users] Error Uploading metadata-File

Bjarne Andersen bja at statsbiblioteket.dk
Thu Jan 15 12:56:32 CET 2009


OK. So the reason that the manual upload afterwards does not work either could then possibly be, as Søren describes, an already existing metadata file.

best
Bjarne

> -----Original Message-----
> From: netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk
> [mailto:netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk]
> On Behalf Of Søren Vejrup Carlsen
> Sent: Thursday, January 15, 2009 11:54 AM
> To: netarchivesuite-users at lists.gforge.statsbiblioteket.dk
> Subject: Re: [Netarchivesuite-users] Error Uploading metadata-File
>
> Hi Bjarne.
> The parser actually does continue. This exception is just written
> to the log for informational purposes at level FINE.
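The skip-and-log behaviour described here can be sketched roughly as follows. This is a hypothetical illustration, not the NetarchiveSuite code: the class CrawlLogSketch and its methods hostOfLine and hosts are made-up names, and the sketch assumes that crawl.log fields are separated by whitespace with the URI in field 4. A line whose URI has no host is logged at level FINE and skipped, so a single bad line does not abort the report.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.logging.Logger;

/** Hypothetical sketch of tolerant crawl.log parsing; not the NetarchiveSuite implementation. */
final class CrawlLogSketch {
    private static final Logger LOG = Logger.getLogger(CrawlLogSketch.class.getName());

    /** Returns the host of the URI in field 4 (1-based), or empty if none can be determined. */
    static Optional<String> hostOfLine(String line) {
        // Assumption: fields are separated by runs of whitespace.
        String[] fields = line.trim().split("\\s+");
        if (fields.length < 4) {
            return Optional.empty();
        }
        try {
            // getHost() returns null for URIs without a host, such as 'invalid:https:/'.
            return Optional.ofNullable(new URI(fields[3]).getHost());
        } catch (URISyntaxException e) {
            return Optional.empty();
        }
    }

    /** Collects the hosts of all lines; unparsable lines are logged at FINE and skipped. */
    static List<String> hosts(List<String> lines) {
        List<String> result = new ArrayList<>();
        int lineNo = 0;
        for (String line : lines) {
            lineNo++;
            final int currentLine = lineNo;
            Optional<String> host = hostOfLine(line);
            if (host.isPresent()) {
                result.add(host.get());
            } else {
                // Informational only: parsing continues with the next line.
                LOG.fine(() -> "Invalid line " + currentLine + ": '" + line + "'. Ignoring.");
            }
        }
        return result;
    }
}
```

With such a structure, a line like the one reported in this thread (URI 'invalid:https:/') would only produce a FINE log entry, while the remaining lines are still counted.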
>
>
> Regards
> Søren
>
> -----Original Message-----
> From: netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk
> [mailto:netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk]
> On Behalf Of Bjarne Andersen
> Sent: 14 January 2009 19:57
> To: netarchivesuite-users at lists.gforge.statsbiblioteket.dk
> Subject: Re: [Netarchivesuite-users] Error Uploading metadata-File
>
> This looks like a bug in the parsing of the crawl.log:
> dk.netarkivet.common.exceptions.IOFailure: Unparsable URI in field 4 of crawl.log: 'invalid:https:/'.
>         at dk.netarkivet.harvester.harvesting.HeritrixDomainHarvestReport.processHarvestLine(HeritrixDomainHarvestReport.java:158)
>
> The parser looks for the top-level domain of the URI
> 'invalid:https:/', which has none.
>
> I think the parser should just continue when it hits such errors.
>
> best
> Bjarne Andersen
> ________________________________________
> From: netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk
> [netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk]
> On Behalf Of aponb at gmx.at [aponb at gmx.at]
> Sent: 14 January 2009 16:55
> To: netarchivesuite-users at lists.gforge.statsbiblioteket.dk
> Subject: [Netarchivesuite-users] Error Uploading metadata-File
>
> Hi!
>
> A full harvest with a small number of domains (~1300) ends on my
> system with one file that failed to upload. I tested it more than
> once, and it is always the metadata file that cannot be uploaded.
> I also tried the upload tool, but it gave up as well.
> Is it possible that the "Unparsable URI in field 4 of crawl.log"
> message is the reason for it? Do you have any idea why this happens?
> I am using NetarchiveSuite version 3.6.1 for this.
>
> Thanks for your time
> Regards
> a.
>
> See the enclosed log of the HarvestController:
>
> INFO: Uploading file '1-metadata-1.arc' to arcrepository.
> Jan 14, 2009 4:41:17 PM dk.netarkivet.archive.arcrepository.distribute.JMSArcRepositoryClient store
> FINE: Sending a StoreMessage with file '/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc'
> Jan 14, 2009 4:41:17 PM dk.netarkivet.common.distribute.HTTPRemoteFileRegistry registerFile
> FINE: Registered file '/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc' with URL 'http://webcrawler06.onb.ac.at:8306/6198a797'
> Jan 14, 2009 4:41:18 PM dk.netarkivet.common.distribute.Synchronizer sendAndWaitForOneReply
> FINE: Received reply for message: ID:466-127.0.0.1(99:e3:ee:cf:65:a4)-49304-1231947678246: To ONB_COMMON_THE_ARCREPOS ReplyTo ONB_COMMON_THIS_HACO_127_0_0_1_7050
> OK
> Arcfile: 1-metadata-1.arc
> Jan 14, 2009 4:41:18 PM dk.netarkivet.archive.arcrepository.distribute.JMSArcRepositoryClient store
> WARNING: The returned message 'ID:466-127.0.0.1(99:e3:ee:cf:65:a4)-49304-1231947678246: To ONB_COMMON_THE_ARCREPOS ReplyTo ONB_COMMON_THIS_HACO_127_0_0_1_7050
> Error: Failure while trying to store ARC file: 1-metadata-1.arc
> Arcfile: 1-metadata-1.arc' was not ok while waiting for reply on store of file '/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc' on attempt number 1 of 3. Error message was 'Failure while trying to store ARC file: 1-metadata-1.arc'
> Jan 14, 2009 4:41:18 PM dk.netarkivet.archive.arcrepository.distribute.JMSArcRepositoryClient store
> FINE: Sending a StoreMessage with file '/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc'
> Jan 14, 2009 4:41:18 PM dk.netarkivet.common.distribute.HTTPRemoteFileRegistry registerFile
> FINE: Registered file '/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc' with URL 'http://webcrawler06.onb.ac.at:8306/ccf2ef88'
> Jan 14, 2009 4:41:18 PM dk.netarkivet.common.distribute.Synchronizer sendAndWaitForOneReply
> FINE: Received reply for message: ID:469-127.0.0.1(99:e3:ee:cf:65:a4)-49304-1231947678704: To ONB_COMMON_THE_ARCREPOS ReplyTo ONB_COMMON_THIS_HACO_127_0_0_1_7050
> OK
> Arcfile: 1-metadata-1.arc
> Jan 14, 2009 4:41:18 PM dk.netarkivet.archive.arcrepository.distribute.JMSArcRepositoryClient store
> WARNING: The returned message 'ID:469-127.0.0.1(99:e3:ee:cf:65:a4)-49304-1231947678704: To ONB_COMMON_THE_ARCREPOS ReplyTo ONB_COMMON_THIS_HACO_127_0_0_1_7050
> Error: Failure while trying to store ARC file: 1-metadata-1.arc
> Arcfile: 1-metadata-1.arc' was not ok while waiting for reply on store of file '/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc' on attempt number 2 of 3. Error message was 'Failure while trying to store ARC file: 1-metadata-1.arc'
> Jan 14, 2009 4:41:18 PM dk.netarkivet.archive.arcrepository.distribute.JMSArcRepositoryClient store
> FINE: Sending a StoreMessage with file '/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc'
> Jan 14, 2009 4:41:18 PM dk.netarkivet.common.distribute.HTTPRemoteFileRegistry registerFile
> FINE: Registered file '/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc' with URL 'http://webcrawler06.onb.ac.at:8306/8f7305eb'
> Jan 14, 2009 4:41:19 PM dk.netarkivet.common.distribute.Synchronizer sendAndWaitForOneReply
> FINE: Received reply for message: ID:472-127.0.0.1(99:e3:ee:cf:65:a4)-49304-1231947679161: To ONB_COMMON_THE_ARCREPOS ReplyTo ONB_COMMON_THIS_HACO_127_0_0_1_7050
> OK
> Arcfile: 1-metadata-1.arc
> Jan 14, 2009 4:41:19 PM dk.netarkivet.archive.arcrepository.distribute.JMSArcRepositoryClient store
> WARNING: The returned message 'ID:472-127.0.0.1(99:e3:ee:cf:65:a4)-49304-1231947679161: To ONB_COMMON_THE_ARCREPOS ReplyTo ONB_COMMON_THIS_HACO_127_0_0_1_7050
> Error: Failure while trying to store ARC file: 1-metadata-1.arc
> Arcfile: 1-metadata-1.arc' was not ok while waiting for reply on store of file '/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc' on attempt number 3 of 3. Error message was 'Failure while trying to store ARC file: 1-metadata-1.arc'
> Jan 14, 2009 4:41:19 PM dk.netarkivet.archive.arcrepository.distribute.JMSArcRepositoryClient store
> WARNING: Could not store '/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc' after 3 attempts. Giving up.
> The returned message 'ID:466-127.0.0.1(99:e3:ee:cf:65:a4)-49304-1231947678246: To ONB_COMMON_THE_ARCREPOS ReplyTo ONB_COMMON_THIS_HACO_127_0_0_1_7050
> Error: Failure while trying to store ARC file: 1-metadata-1.arc
> Arcfile: 1-metadata-1.arc' was not ok while waiting for reply on store of file '/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc' on attempt number 1 of 3. Error message was 'Failure while trying to store ARC file: 1-metadata-1.arc'
> The returned message 'ID:469-127.0.0.1(99:e3:ee:cf:65:a4)-49304-1231947678704: To ONB_COMMON_THE_ARCREPOS ReplyTo ONB_COMMON_THIS_HACO_127_0_0_1_7050
> Error: Failure while trying to store ARC file: 1-metadata-1.arc
> Arcfile: 1-metadata-1.arc' was not ok while waiting for reply on store of file '/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc' on attempt number 2 of 3. Error message was 'Failure while trying to store ARC file: 1-metadata-1.arc'
> The returned message 'ID:472-127.0.0.1(99:e3:ee:cf:65:a4)-49304-1231947679161: To ONB_COMMON_THE_ARCREPOS ReplyTo ONB_COMMON_THIS_HACO_127_0_0_1_7050
> Error: Failure while trying to store ARC file: 1-metadata-1.arc
> Arcfile: 1-metadata-1.arc' was not ok while waiting for reply on store of file '/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc' on attempt number 3 of 3. Error message was 'Failure while trying to store ARC file: 1-metadata-1.arc'
>
> Jan 14, 2009 4:41:19 PM dk.netarkivet.harvester.harvesting.HarvestController uploadFiles
> WARNING: Error uploading arcfile '/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc' Will be moved to '/home/netarchive/apps/netarchivesuite/ONB/oldjobs'
> dk.netarkivet.common.exceptions.IOFailure: Could not store '/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc' after 3 attempts. Giving up.
> The returned message 'ID:466-127.0.0.1(99:e3:ee:cf:65:a4)-49304-1231947678246: To ONB_COMMON_THE_ARCREPOS ReplyTo ONB_COMMON_THIS_HACO_127_0_0_1_7050
> Error: Failure while trying to store ARC file: 1-metadata-1.arc
> Arcfile: 1-metadata-1.arc' was not ok while waiting for reply on store of file '/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc' on attempt number 1 of 3. Error message was 'Failure while trying to store ARC file: 1-metadata-1.arc'
> The returned message 'ID:469-127.0.0.1(99:e3:ee:cf:65:a4)-49304-1231947678704: To ONB_COMMON_THE_ARCREPOS ReplyTo ONB_COMMON_THIS_HACO_127_0_0_1_7050
> Error: Failure while trying to store ARC file: 1-metadata-1.arc
> Arcfile: 1-metadata-1.arc' was not ok while waiting for reply on store of file '/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc' on attempt number 2 of 3. Error message was 'Failure while trying to store ARC file: 1-metadata-1.arc'
> The returned message 'ID:472-127.0.0.1(99:e3:ee:cf:65:a4)-49304-1231947679161: To ONB_COMMON_THE_ARCREPOS ReplyTo ONB_COMMON_THIS_HACO_127_0_0_1_7050
> Error: Failure while trying to store ARC file: 1-metadata-1.arc
> Arcfile: 1-metadata-1.arc' was not ok while waiting for reply on store of file '/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc' on attempt number 3 of 3. Error message was 'Failure while trying to store ARC file: 1-metadata-1.arc'
>
>         at dk.netarkivet.archive.arcrepository.distribute.JMSArcRepositoryClient.store(JMSArcRepositoryClient.java:286)
>         at dk.netarkivet.harvester.harvesting.HarvestController.uploadFiles(HarvestController.java:320)
>         at dk.netarkivet.harvester.harvesting.HarvestController.storeFiles(HarvestController.java:266)
>         at dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer.processHarvestInfoFile(HarvestControllerServer.java:550)
>         at dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer.access$300(HarvestControllerServer.java:83)
>         at dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer$HarvesterThread.run(HarvestControllerServer.java:647)
> Jan 14, 2009 4:41:23 PM dk.netarkivet.harvester.harvesting.HeritrixDomainHarvestReport parseCrawlLog
> FINE: Invalid line in '/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/logs/crawl.log' line 89748: '2009-01-14T14:19:57.101Z    -7          - invalid:https:/ EX http://www.bg-bab.ac.at/menu.files/dmenu.js no-type #022 - - - -'. Ignoring.
> dk.netarkivet.common.exceptions.IOFailure: Unparsable URI in field 4 of crawl.log: 'invalid:https:/'.
>         at dk.netarkivet.harvester.harvesting.HeritrixDomainHarvestReport.processHarvestLine(HeritrixDomainHarvestReport.java:158)
>         at dk.netarkivet.harvester.harvesting.HeritrixDomainHarvestReport.parseCrawlLog(HeritrixDomainHarvestReport.java:107)
>         at dk.netarkivet.harvester.harvesting.HeritrixDomainHarvestReport.<init>(HeritrixDomainHarvestReport.java:86)
>         at dk.netarkivet.harvester.harvesting.HarvestController.generateHeritrixDomainHarvestReport(HarvestController.java:294)
>         at dk.netarkivet.harvester.harvesting.HarvestController.storeFiles(HarvestController.java:268)
>         at dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer.processHarvestInfoFile(HarvestControllerServer.java:550)
>         at dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer.access$300(HarvestControllerServer.java:83)
>         at dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer$HarvesterThread.run(HarvestControllerServer.java:647)
> Jan 14, 2009 4:41:28 PM dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer processHarvestInfoFile
> INFO: Done post-processing files for job 1 in dir: '/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806'
> Jan 14, 2009 4:41:28 PM dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer$HarvesterThread run
> INFO: Ending crawl of job : 1
>
>
>
>
>
> _______________________________________________
> NetarchiveSuite-users mailing list
> NetarchiveSuite-users at lists.gforge.statsbiblioteket.dk
> https://lists.gforge.statsbiblioteket.dk/mailman/listinfo/netarchivesuite-users
>



