[Netarchivesuite-users] Database capacity a possible bottleneck?
Peter Svanberg
Peter.Svanberg at kb.se
Fri Dec 13 18:34:44 CET 2019
In a test with 100,000 domains (500 kByte) in our test environment today, harvesting and job management worked fine. But again the GUI was not kept up to date: the "all jobs" page showed correct information, but the "all running jobs" page did not. It seems they fetch their data from different tables in the database.
And again the database processes were very busy and the queues were not emptied in time, so we now suspect that the database handling is the bottleneck in our setup. For example, post-processing of the harvest report for 10,000 domains took around 35 minutes. Is that a database-heavy task?
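In case it is useful to anyone looking at the same thing on their side: a plain pg_stat_activity query like the sketch below is one way we check which statements are keeping the backend busy while post-processing runs. This is standard PostgreSQL, nothing NetarchiveSuite-specific; the 120-character truncation of the query text is just for readability.

    -- Show non-idle sessions, longest-running current statement first.
    SELECT pid,
           now() - query_start AS runtime,
           state,
           wait_event_type,
           left(query, 120)    AS query
    FROM   pg_stat_activity
    WHERE  state <> 'idle'
    ORDER  BY runtime DESC;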
We are using PostgreSQL; are there others using that?
And then there is the frontier reporting, which is done every frontierReportWaitTime seconds (default 600; we have set 120, but it seems to run even more often). What do we lose if we raise that value, and what could we gain? Is the amount of data involved in the post-processing affected by how often frontier reports have been generated?
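For reference, this is roughly how we have overridden the value in our settings file. The nesting below (settings.harvester.harvesting.frontier.frontierReportWaitTime) is how we read the configuration reference, so please correct me if the exact path is different:

    <settings>
      <harvester>
        <harvesting>
          <frontier>
            <!-- Seconds between frontier reports; default 600, we use 120 -->
            <frontierReportWaitTime>120</frontierReportWaitTime>
          </frontier>
        </harvesting>
      </harvester>
    </settings>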
Any hints on how to reduce the load on our database are appreciated!
Regards,
Peter Svanberg, Sweden