[Netarchivesuite-users] Upload error

Søren Vejrup Carlsen svc at kb.dk
Fri Mar 20 16:09:33 CET 2009


Hi A.
Just for the record: are you using the NetarchiveSuite 3.6.0 release, or have you built it yourself?
For now, I assume you are using 3.6.0.

About the I/O error: I also think it is a disk error. Your filesystem may be in need of repair, or it could be a remote filesystem that is no longer accessible.
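
One way to confirm this independently of NetarchiveSuite is to read the file from start to end yourself. The stack trace in your message shows the failure happening while the file is being read to compute its MD5 checksum, so a small standalone program that does the same thing should hit the same "Input/output error" if the disk is at fault. The following is only a minimal sketch: the class name ReadCheck and the method md5Hex are made up for this example and are not part of NetarchiveSuite.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of NetarchiveSuite.
public class ReadCheck {

    /**
     * Reads the whole file and returns its MD5 checksum as lowercase hex.
     * An unreadable block on disk surfaces as an IOException from read().
     */
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java ReadCheck <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        try {
            System.out.println("Readable, MD5 = " + md5Hex(file));
        } catch (IOException e) {
            // Same class of failure as in the stack trace: the file cannot be read.
            System.out.println("Read failed: " + e);
        }
    }
}
```

If this also fails on the .arc.gz file in question but succeeds on the other files in the same arcs directory, the problem is most likely in the storage underneath that one file and not in the upload code.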

You could check whether Heritrix wrote any errors to its logs during harvesting.
All the Heritrix error logs are also written to the 5-metadata-1.arc file (for job 5).

About the job: it is not resubmitted automatically. I don't even think you can resubmit it. I believe that is only possible if the harvesting is terminated prematurely.

>Will the uploaded files belonging to this failed job be used for
>deduplication (I assume yes)?
That is correct.
The job will return a HarvestReport even if the upload of some files fails.

/Søren

-----Original message-----
From: netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk [mailto:netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk] On behalf of aponb at gmx.at
Sent: 20 March 2009 14:26
To: netarchivesuite-users at lists.gforge.statsbiblioteket.dk
Subject: [Netarchivesuite-users] Upload error

I just got an upload error for a single file (out of some 100), with the
following message:

Error uploading arcfile '/home/netarchive/apps/netarchivesuite/ONB/harvester_7051/5_1237407101891/arcs/5-3-20090319002508-00023-webcrawler06.onb.ac.at.arc.gz' Will be moved to '/home/netarchive/apps/netarchivesuite/ONB/oldjobs'
dk.netarkivet.common.exceptions.IOFailure: Could not store '/home/netarchive/apps/netarchivesuite/ONB/harvester_7051/5_1237407101891/arcs/5-3-20090319002508-00023-webcrawler06.onb.ac.at.arc.gz' after 3 attempts. Giving up.
Client-side exception occurred while storing '/home/netarchive/apps/netarchivesuite/ONB/harvester_7051/5_1237407101891/arcs/5-3-20090319002508-00023-webcrawler06.onb.ac.at.arc.gz' on attempt number 1 of 3.
dk.netarkivet.common.exceptions.ArgumentNotValid: Error creating singleton of class 'dk.netarkivet.common.distribute.HTTPRemoteFile': 
	at dk.netarkivet.common.utils.SettingsFactory.getInstance(SettingsFactory.java:102)
	at dk.netarkivet.common.distribute.RemoteFileFactory.getInstance(RemoteFileFactory.java:51)
	at dk.netarkivet.common.distribute.RemoteFileFactory.getDistributefileInstance(RemoteFileFactory.java:74)
	at dk.netarkivet.archive.arcrepository.distribute.StoreMessage.<init>(StoreMessage.java:55)
	at dk.netarkivet.archive.arcrepository.distribute.JMSArcRepositoryClient.store(JMSArcRepositoryClient.java:240)
	at dk.netarkivet.harvester.harvesting.HarvestController.uploadFiles(HarvestController.java:320)
	at dk.netarkivet.harvester.harvesting.HarvestController.storeFiles(HarvestController.java:263)
	at dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer.processHarvestInfoFile(HarvestControllerServer.java:550)
	at dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer.access$300(HarvestControllerServer.java:83)
	at dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer$HarvesterThread.run(HarvestControllerServer.java:647)
Caused by: dk.netarkivet.common.exceptions.IOFailure: Unable to checksum file '/home/netarchive/apps/netarchivesuite/ONB/harvester_7051/5_1237407101891/arcs/5-3-20090319002508-00023-webcrawler06.onb.ac.at.arc.gz'
	at dk.netarkivet.common.distribute.HTTPRemoteFile.<init>(HTTPRemoteFile.java:88)
	at dk.netarkivet.common.distribute.HTTPRemoteFile.getInstance(HTTPRemoteFile.java:114)
	at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at dk.netarkivet.common.utils.SettingsFactory.getInstance(SettingsFactory.java:100)
	... 9 more
Caused by: java.io.IOException: Input/output error
	at java.io.FileInputStream.readBytes(Native Method)
	at java.io.FileInputStream.read(FileInputStream.java:199)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
	at java.io.DataInputStream.read(DataInputStream.java:83)
	at dk.netarkivet.common.utils.MD5.generateMD5onFile(MD5.java:91)
	at dk.netarkivet.common.distribute.HTTPRemoteFile.<init>(HTTPRemoteFile.java:86)
	... 14 more


I also tried it with the Upload Tool, but it failed with the same error.
The message obviously means that there was an error in writing the file
on the local harvest machine (maybe a hard disk problem). Do you agree
with that?
Is there any way to get that file into the bitarchive? Probably not,
as I am not able to unzip the file.

And another question. This happened with a job that included 153
configurations in a full harvest. What does this failed status now mean
for the next step in the full harvest? Will the whole job be repeated now?
Will the uploaded files belonging to this failed job be used for
deduplication (I assume yes)?

Thanks again in advance for your answers
Regards
a.