[Netarchivesuite-devel] BnF ready to commit changes

nicolas.giraud at bnf.fr
Thu Apr 15 17:34:45 CEST 2010


Hi Søren,

The changes have been committed. Here's a quick breakdown of the new 
features:

- PostgreSQL connectivity (using version 8.4 of the PostgreSQL JDBC 4 
driver)

- New sort criteria and results pagination in the Harvest status main page

- New "Running jobs" page that monitors jobs being crawled and lets you 
search for the active jobs that are crawling a specific domain (in case 
the webmaster is complaining ;) )

- HeritrixLauncher is now abstract and instantiated through a factory 
method, which makes it possible to offer different implementations of the 
crawl control loop.
DefaultHeritrixLauncher is the default legacy implementation, and I have 
introduced a slightly different implementation for BnF. It comes with an 
optimized Heritrix JMX controller (BnFHeritrixController) that solves the 
connection loss issue. I didn't want to affect the legacy implementation, 
so I made all of this pluggable, leaving you to judge whether the BnF 
implementation is of interest to you too.

- A configurable wait period after a crawl ends and before the shutdown 
command is sent to Heritrix, so that report generation can complete.

- An updated French translation
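
To illustrate the pluggable launcher idea from the list above, here is a minimal sketch of an abstract launcher chosen through a factory method. All class names, the "default"/"bnf" keys and the returned strings are hypothetical stand-ins for illustration; this is not the committed HeritrixLauncher code, and a real setup would presumably read the implementation name from the settings rather than a hard-coded key.

```java
// Hypothetical stand-in for the abstract launcher (not the NetarchiveSuite class).
abstract class LauncherSketch {
    /** Runs the crawl control loop; each implementation supplies its own. */
    abstract String doCrawl();
}

// Stand-in for the legacy behaviour.
class DefaultLauncherSketch extends LauncherSketch {
    @Override
    String doCrawl() {
        return "legacy control loop";
    }
}

// Stand-in for an alternative, site-specific behaviour.
class BnfLauncherSketch extends LauncherSketch {
    @Override
    String doCrawl() {
        return "BnF control loop";
    }
}

final class LauncherFactorySketch {
    private LauncherFactorySketch() {
    }

    /** Factory method: callers never name a concrete launcher class. */
    static LauncherSketch getInstance(String implementation) {
        switch (implementation) {
            case "default":
                return new DefaultLauncherSketch();
            case "bnf":
                return new BnfLauncherSketch();
            default:
                throw new IllegalArgumentException(
                        "Unknown launcher implementation: " + implementation);
        }
    }
}
```

Calling code would then do something like `LauncherFactorySketch.getInstance("bnf").doCrawl()`, so switching the control loop only means changing the key and the legacy implementation stays untouched.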

And things to fix/enhance:

- There are new strings to translate in the harvester 
translation.properties files; I've only translated them into French.
- The pagination mechanism in the Harvest status main page relies on the 
LIMIT and OFFSET syntax. Though it is not standard SQL, this syntax is 
supported by many DB systems, MySQL and PostgreSQL in particular. 
Unfortunately Derby does not support it (cf. 
http://db.apache.org/derby/faq.html#limit), so this feature is currently 
broken if the installation uses a Derby DB.
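
As a possible direction for the Derby issue, here is a rough sketch of how the paging clause could be chosen per database instead of being hard-coded. The class, the enum and the example query are hypothetical and not taken from the committed code. The second form uses the SQL:2008 OFFSET ... FETCH clause; to my knowledge Derby accepts it from release 10.5 on, but that is an assumption to check against the Derby version actually deployed.

```java
// Hypothetical helper, not part of the committed code.
final class PagingSketch {
    /** The two paging syntaxes discussed above. */
    enum Dialect { LIMIT_OFFSET, SQL2008 }

    private PagingSketch() {
    }

    /**
     * Appends a paging clause to a query that already has an ORDER BY.
     * pageIndex is zero-based; pageSize is the number of rows per page.
     */
    static String paginate(String baseQuery, int pageIndex, int pageSize,
                           Dialect dialect) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException(
                    "pageIndex must be >= 0 and pageSize must be > 0");
        }
        // Use long arithmetic so large page numbers cannot overflow an int.
        long offset = (long) pageIndex * pageSize;
        switch (dialect) {
            case LIMIT_OFFSET:
                // Form understood by MySQL and PostgreSQL.
                return baseQuery + " LIMIT " + pageSize + " OFFSET " + offset;
            case SQL2008:
                // Standard form; assumed to work on Derby 10.5 and later.
                return baseQuery + " OFFSET " + offset
                        + " ROWS FETCH NEXT " + pageSize + " ROWS ONLY";
            default:
                throw new AssertionError(dialect);
        }
    }
}
```

For example, page 2 (zero-based) with 25 rows per page gives an offset of 50 in both dialects. Only the clause text differs, so the rest of the page logic would not need to know which database is in use.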

We should probably do a series of reviews when I come back from paternity 
leave on Monday, May 24th.

Best,

Nicolas Giraud



