[Netarchivesuite-users] Error Uploading metadata-File

Søren Vejrup Carlsen svc at kb.dk
Thu Jan 15 11:53:42 CET 2009


Hi Bjarne.
The parser actually does continue. This exception is just written to the log for informational purposes at level FINE. 
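
To illustrate the behaviour described above, here is a minimal sketch of a log-and-continue parse loop. It is not the NetarchiveSuite code: the class `CrawlLogSketch`, the methods `hostOfLine` and `parse`, and the first sample line are hypothetical, and `java.net.URI` stands in for whatever URI handling the real parser uses. It assumes only what the log shows, namely that the URI sits in field 4 of a whitespace-separated crawl.log line.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch; not the NetarchiveSuite HeritrixDomainHarvestReport class.
final class CrawlLogSketch {
    private static final Logger LOG = Logger.getLogger(CrawlLogSketch.class.getName());

    /** Returns the host of the URI in field 4, or throws if the line cannot be used. */
    static String hostOfLine(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length < 4) {
            throw new IllegalArgumentException("Too few fields in crawl.log line");
        }
        String uri = fields[3]; // field 4 when counting from 1
        String host;
        try {
            host = new URI(uri).getHost();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Unparsable URI in field 4: '" + uri + "'", e);
        }
        if (host == null) {
            // 'invalid:https:/' parses as an opaque URI, so it has no host component.
            throw new IllegalArgumentException("Unparsable URI in field 4: '" + uri + "'");
        }
        return host;
    }

    /** Collects the hosts of all usable lines; bad lines are logged at FINE and skipped. */
    static List<String> parse(List<String> lines) {
        List<String> hosts = new ArrayList<>();
        int lineNumber = 0;
        for (String line : lines) {
            lineNumber++;
            try {
                hosts.add(hostOfLine(line));
            } catch (IllegalArgumentException e) {
                // Informational only: the loop carries on with the next line.
                LOG.log(Level.FINE, "Invalid line " + lineNumber + ": '" + line + "'. Ignoring.", e);
            }
        }
        return hosts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            // Hypothetical well-formed line.
            "2009-01-14T14:19:50.000Z 200 1234 http://www.example.org/ - - text/html #001 - - - -",
            // The line reported in the log of this thread.
            "2009-01-14T14:19:57.101Z    -7          - invalid:https:/ EX http://www.bg-bab.ac.at/menu.files/dmenu.js no-type #022 - - - -");
        System.out.println(parse(lines));
    }
}
```

With a loop of this shape, a stack trace in the log at level FINE does not mean that parsing stopped, so the failed upload of the metadata file would need a separate explanation.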


Regards
Søren

-----Original message-----
From: netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk [mailto:netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk] On behalf of Bjarne Andersen
Sent: 14 January 2009 19:57
To: netarchivesuite-users at lists.gforge.statsbiblioteket.dk
Subject: Re: [Netarchivesuite-users] Error Uploading metadata-File

This looks like a bug in the parsing of the crawl.log:
dk.netarkivet.common.exceptions.IOFailure: Unparsable URI in field 4 of crawl.log: 'invalid:https:/'.
        at dk.netarkivet.harvester.harvesting.HeritrixDomainHarvestReport.processHarvestLine(HeritrixDomainHarvestReport.java:158)

The parser tries to extract the top-level domain from the URI 'invalid:https:/', which is not possible because that URI has no host part.

I think the parser should just continue upon such errors.

best
Bjarne Andersen
________________________________________
From: netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk [netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk] On behalf of aponb at gmx.at [aponb at gmx.at]
Sent: 14 January 2009 16:55
To: netarchivesuite-users at lists.gforge.statsbiblioteket.dk
Subject: [Netarchivesuite-users] Error Uploading metadata-File

Hi!

A full harvest with a small number of domains (~1300) ends on my system
with one file that fails to upload. I have tested this more than once, and
it is always the metadata file that cannot be uploaded. I also tried the
upload tool, but it gave up on the upload as well.
Could the "Unparsable URI in field 4 of crawl.log" message be the reason
for it? Do you have any idea why this happens?
I am using NetarchiveSuite version 3.6.1.

Thanks for your time
Regards
a.

The log of the HarvestController is enclosed below:

INFO: Uploading file '1-metadata-1.arc' to arcrepository.
Jan 14, 2009 4:41:17 PM
dk.netarkivet.archive.arcrepository.distribute.JMSArcRepositoryClient store
FINE: Sending a StoreMessage with file
'/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc'
Jan 14, 2009 4:41:17 PM
dk.netarkivet.common.distribute.HTTPRemoteFileRegistry registerFile
FINE: Registered file
'/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc'
with URL 'http://webcrawler06.onb.ac.at:8306/6198a797'
Jan 14, 2009 4:41:18 PM dk.netarkivet.common.distribute.Synchronizer
sendAndWaitForOneReply
FINE: Received reply for message:
ID:466-127.0.0.1(99:e3:ee:cf:65:a4)-49304-1231947678246: To
ONB_COMMON_THE_ARCREPOS ReplyTo ONB_COMMON_THIS_HACO_127_0_0_1_7050 OK
Arcfile: 1-metadata-1.arc
Jan 14, 2009 4:41:18 PM
dk.netarkivet.archive.arcrepository.distribute.JMSArcRepositoryClient store
WARNING: The returned message
'ID:466-127.0.0.1(99:e3:ee:cf:65:a4)-49304-1231947678246: To
ONB_COMMON_THE_ARCREPOS ReplyTo ONB_COMMON_THIS_HACO_127_0_0_1_7050
Error: Failure while trying to store ARC file: 1-metadata-1.arc Arcfile:
1-metadata-1.arc' was not ok while waiting for reply on store of file
'/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc'
on attempt number 1 of 3. Error message was 'Failure while trying to
store ARC file: 1-metadata-1.arc'
Jan 14, 2009 4:41:18 PM
dk.netarkivet.archive.arcrepository.distribute.JMSArcRepositoryClient store
FINE: Sending a StoreMessage with file
'/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc'
Jan 14, 2009 4:41:18 PM
dk.netarkivet.common.distribute.HTTPRemoteFileRegistry registerFile
FINE: Registered file
'/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc'
with URL 'http://webcrawler06.onb.ac.at:8306/ccf2ef88'
Jan 14, 2009 4:41:18 PM dk.netarkivet.common.distribute.Synchronizer
sendAndWaitForOneReply
FINE: Received reply for message:
ID:469-127.0.0.1(99:e3:ee:cf:65:a4)-49304-1231947678704: To
ONB_COMMON_THE_ARCREPOS ReplyTo ONB_COMMON_THIS_HACO_127_0_0_1_7050 OK
Arcfile: 1-metadata-1.arc
Jan 14, 2009 4:41:18 PM
dk.netarkivet.archive.arcrepository.distribute.JMSArcRepositoryClient store
WARNING: The returned message
'ID:469-127.0.0.1(99:e3:ee:cf:65:a4)-49304-1231947678704: To
ONB_COMMON_THE_ARCREPOS ReplyTo ONB_COMMON_THIS_HACO_127_0_0_1_7050
Error: Failure while trying to store ARC file: 1-metadata-1.arc Arcfile:
1-metadata-1.arc' was not ok while waiting for reply on store of file
'/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc'
on attempt number 2 of 3. Error message was 'Failure while trying to
store ARC file: 1-metadata-1.arc'
Jan 14, 2009 4:41:18 PM
dk.netarkivet.archive.arcrepository.distribute.JMSArcRepositoryClient store
FINE: Sending a StoreMessage with file
'/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc'
Jan 14, 2009 4:41:18 PM
dk.netarkivet.common.distribute.HTTPRemoteFileRegistry registerFile
FINE: Registered file
'/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc'
with URL 'http://webcrawler06.onb.ac.at:8306/8f7305eb'
Jan 14, 2009 4:41:19 PM dk.netarkivet.common.distribute.Synchronizer
sendAndWaitForOneReply
FINE: Received reply for message:
ID:472-127.0.0.1(99:e3:ee:cf:65:a4)-49304-1231947679161: To
ONB_COMMON_THE_ARCREPOS ReplyTo ONB_COMMON_THIS_HACO_127_0_0_1_7050 OK
Arcfile: 1-metadata-1.arc
Jan 14, 2009 4:41:19 PM
dk.netarkivet.archive.arcrepository.distribute.JMSArcRepositoryClient store
WARNING: The returned message
'ID:472-127.0.0.1(99:e3:ee:cf:65:a4)-49304-1231947679161: To
ONB_COMMON_THE_ARCREPOS ReplyTo ONB_COMMON_THIS_HACO_127_0_0_1_7050
Error: Failure while trying to store ARC file: 1-metadata-1.arc Arcfile:
1-metadata-1.arc' was not ok while waiting for reply on store of file
'/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc'
on attempt number 3 of 3. Error message was 'Failure while trying to
store ARC file: 1-metadata-1.arc'
Jan 14, 2009 4:41:19 PM
dk.netarkivet.archive.arcrepository.distribute.JMSArcRepositoryClient store
WARNING: Could not store
'/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc'
after 3 attempts. Giving up.
The returned message
'ID:466-127.0.0.1(99:e3:ee:cf:65:a4)-49304-1231947678246: To
ONB_COMMON_THE_ARCREPOS ReplyTo ONB_COMMON_THIS_HACO_127_0_0_1_7050
Error: Failure while trying to store ARC file: 1-metadata-1.arc Arcfile:
1-metadata-1.arc' was not ok while waiting for reply on store of file
'/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc'
on attempt number 1 of 3. Error message was 'Failure while trying to
store ARC file: 1-metadata-1.arc'
The returned message
'ID:469-127.0.0.1(99:e3:ee:cf:65:a4)-49304-1231947678704: To
ONB_COMMON_THE_ARCREPOS ReplyTo ONB_COMMON_THIS_HACO_127_0_0_1_7050
Error: Failure while trying to store ARC file: 1-metadata-1.arc Arcfile:
1-metadata-1.arc' was not ok while waiting for reply on store of file
'/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc'
on attempt number 2 of 3. Error message was 'Failure while trying to
store ARC file: 1-metadata-1.arc'
The returned message
'ID:472-127.0.0.1(99:e3:ee:cf:65:a4)-49304-1231947679161: To
ONB_COMMON_THE_ARCREPOS ReplyTo ONB_COMMON_THIS_HACO_127_0_0_1_7050
Error: Failure while trying to store ARC file: 1-metadata-1.arc Arcfile:
1-metadata-1.arc' was not ok while waiting for reply on store of file
'/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc'
on attempt number 3 of 3. Error message was 'Failure while trying to
store ARC file: 1-metadata-1.arc'

Jan 14, 2009 4:41:19 PM
dk.netarkivet.harvester.harvesting.HarvestController uploadFiles
WARNING: Error uploading arcfile
'/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc'
Will be moved to '/home/netarchive/apps/netarchivesuite/ONB/oldjobs'
dk.netarkivet.common.exceptions.IOFailure: Could not store
'/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc'
after 3 attempts. Giving up.
The returned message
'ID:466-127.0.0.1(99:e3:ee:cf:65:a4)-49304-1231947678246: To
ONB_COMMON_THE_ARCREPOS ReplyTo ONB_COMMON_THIS_HACO_127_0_0_1_7050
Error: Failure while trying to store ARC file: 1-metadata-1.arc Arcfile:
1-metadata-1.arc' was not ok while waiting for reply on store of file
'/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc'
on attempt number 1 of 3. Error message was 'Failure while trying to
store ARC file: 1-metadata-1.arc'
The returned message
'ID:469-127.0.0.1(99:e3:ee:cf:65:a4)-49304-1231947678704: To
ONB_COMMON_THE_ARCREPOS ReplyTo ONB_COMMON_THIS_HACO_127_0_0_1_7050
Error: Failure while trying to store ARC file: 1-metadata-1.arc Arcfile:
1-metadata-1.arc' was not ok while waiting for reply on store of file
'/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc'
on attempt number 2 of 3. Error message was 'Failure while trying to
store ARC file: 1-metadata-1.arc'
The returned message
'ID:472-127.0.0.1(99:e3:ee:cf:65:a4)-49304-1231947679161: To
ONB_COMMON_THE_ARCREPOS ReplyTo ONB_COMMON_THIS_HACO_127_0_0_1_7050
Error: Failure while trying to store ARC file: 1-metadata-1.arc Arcfile:
1-metadata-1.arc' was not ok while waiting for reply on store of file
'/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/metadata/1-metadata-1.arc'
on attempt number 3 of 3. Error message was 'Failure while trying to
store ARC file: 1-metadata-1.arc'

        at dk.netarkivet.archive.arcrepository.distribute.JMSArcRepositoryClient.store(JMSArcRepositoryClient.java:286)
        at dk.netarkivet.harvester.harvesting.HarvestController.uploadFiles(HarvestController.java:320)
        at dk.netarkivet.harvester.harvesting.HarvestController.storeFiles(HarvestController.java:266)
        at dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer.processHarvestInfoFile(HarvestControllerServer.java:550)
        at dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer.access$300(HarvestControllerServer.java:83)
        at dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer$HarvesterThread.run(HarvestControllerServer.java:647)
Jan 14, 2009 4:41:23 PM dk.netarkivet.harvester.harvesting.HeritrixDomainHarvestReport parseCrawlLog
FINE: Invalid line in '/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806/logs/crawl.log' line 89748: '2009-01-14T14:19:57.101Z    -7          - invalid:https:/ EX http://www.bg-bab.ac.at/menu.files/dmenu.js no-type #022 - - - -'. Ignoring.
dk.netarkivet.common.exceptions.IOFailure: Unparsable URI in field 4 of crawl.log: 'invalid:https:/'.
        at dk.netarkivet.harvester.harvesting.HeritrixDomainHarvestReport.processHarvestLine(HeritrixDomainHarvestReport.java:158)
        at dk.netarkivet.harvester.harvesting.HeritrixDomainHarvestReport.parseCrawlLog(HeritrixDomainHarvestReport.java:107)
        at dk.netarkivet.harvester.harvesting.HeritrixDomainHarvestReport.<init>(HeritrixDomainHarvestReport.java:86)
        at dk.netarkivet.harvester.harvesting.HarvestController.generateHeritrixDomainHarvestReport(HarvestController.java:294)
        at dk.netarkivet.harvester.harvesting.HarvestController.storeFiles(HarvestController.java:268)
        at dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer.processHarvestInfoFile(HarvestControllerServer.java:550)
        at dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer.access$300(HarvestControllerServer.java:83)
        at dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer$HarvesterThread.run(HarvestControllerServer.java:647)
Jan 14, 2009 4:41:28 PM
dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer
processHarvestInfoFile
INFO: Done post-processing files for job 1 in dir:
'/home/netarchive/apps/netarchivesuite/ONB/harvester_7050/1_1231942064806'
Jan 14, 2009 4:41:28 PM
dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer$HarvesterThread
run
INFO: Ending crawl of job : 1





_______________________________________________
NetarchiveSuite-users mailing list
NetarchiveSuite-users at lists.gforge.statsbiblioteket.dk
https://lists.gforge.statsbiblioteket.dk/mailman/listinfo/netarchivesuite-users
