[Netarchivesuite-users] How to get better performance for the History list per harvestid in the GUI
Tue Hejlskov Larsen
tlr at kb.dk
Tue May 17 08:37:55 CEST 2022
I found out that the current indexes is not used by our code in HarvestDefinitionDBDAO in the method getHarvestRunInfo().
If somebody add "and jobs.harvest_id = historyinfo.harvest_id" to each where clause, you get immediate response for the selects instead of 30 secs or more!
Here are the 2 different queries:
s = c.prepareStatement("SELECT name, harvest_num, status, MIN(startdate), MAX(enddate), COUNT(job_id)"
+ " FROM jobs, harvestdefinitions"
+ " WHERE harvestdefinitions.harvest_id = ? AND jobs.harvest_id = harvestdefinitions.harvest_id"
+ " GROUP BY name, harvest_num, status ORDER BY harvest_num DESC");
and
s = c.prepareStatement("SELECT jobs.harvest_num, SUM(historyinfo.bytecount), "
+ "SUM(historyinfo.objectcount)," + "COUNT(jobs.status)" + " FROM jobs, historyinfo "
+ " WHERE jobs.harvest_id = ? AND historyinfo.job_id = jobs.job_id" + " GROUP BY jobs.harvest_num"
+ " ORDER BY jobs.harvest_num");
I have verified the where clause changes by following example in our jobs database:
prod_harvestdb=# SELECT jobs.harvest_num, SUM(historyinfo.bytecount), SUM(historyinfo.objectcount),COUNT(jobs.status) FROM jobs, historyinfo WHERE jobs.harvest_id = 370 AND historyinfo.job_id = jobs.job_id GROUP BY jobs.harvest_num ORDER BY jobs.harvest_num;
prod_harvestdb=# SELECT jobs.harvest_num, SUM(historyinfo.bytecount), SUM(historyinfo.objectcount),COUNT(jobs.status) FROM jobs, historyinfo WHERE jobs.harvest_id = 370 AND historyinfo.job_id = jobs.job_id and jobs.harvest_id = historyinfo.harvest_id GROUP BY jobs.harvest_num ORDER BY jobs.harvest_num;
Hopefully somebody can add that change to the code....
Best regards
Tue
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://ml.sbforge.org/pipermail/netarchivesuite-users/attachments/20220517/c1edcc47/attachment.html>
More information about the NetarchiveSuite-users
mailing list