[Netarchivesuite-devel] Status for upgrading to 3.18 in production

Tue Feb 28 10:33:50 CET 2012

Hi All

Here are more details and fixes/workarounds for the new 3.18 release in production at our site:

We are on track again. The indexing for the broad crawl is now parallelized, and the total startup time
for the broad crawl, including creation of an 80 GB deduplication index, was only about 24 hours
without any manual intervention (in 3.16 it took 4-5 days).

Be aware that the new index creation method places a heavy load on the folder tmpdircommon during sorting.

We had 24 broad crawl harvesters and 33 selective harvesters active during startup (no single low prio
harvester).

What were the main problems during startup:

1) Every low prio harvester died with a Java out-of-heap-space error after it got the index.
   It seems that the new parallelized broad crawl index demands more memory in the Heritrix processes.
   Fix: increased memory to about 3 GB per Heritrix instance in the local settings.xml file
        <harvester>
            <harvesting>
                <heritrix>
                    <heapSize>2936M</heapSize>
                </heritrix>
            </harvesting>
        </harvester>
   on each 64-bit server, and shut down all four 32-bit harvesters.

2) Harvesters continuously started, ran and failed, with log spam about trying to generate or find a new
   index, even though the index was in place and ready. This went on until no more jobs were in the queue.
   Fix: The newly requested index name was created as a link to the already created index in each harvester's cache/DEDUP_CRAWL_LOG/ folder, e.g.
   ln -s 127261-127262-127263-127264-571d26967e066b5a3ccbf384c937d74d-cache 127268-127269-127270-127271-27f47726643b267f48a0368d21f7a0fe-cache
   (the first folder actually contains the generated Lucene index; the last name is the newly requested one, created as the link)
   and all jobs were resubmitted.

3) Selective harvests wait for their index until the broad crawl index is finished.
   Fix: none currently.

4) The Running jobs GUI was out of sync with the jobs actually running.
   Fix: used SVC's ad hoc Java tool to delete zombie "Running jobs" entries.
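
A minimal sketch of the link workaround from 2), assuming short hypothetical folder names in a temporary directory instead of the real index names and the real cache/DEDUP_CRAWL_LOG/ folder. It only illustrates the argument order of ln -s (existing target first, new link name last) and a quick check that the requested name resolves to the existing index folder:

```shell
#!/bin/sh
# Sketch only: all directory and index names below are hypothetical placeholders.
set -eu

# Stand-in for a harvester's cache/DEDUP_CRAWL_LOG/ folder.
cache_dir=$(mktemp -d)
existing="1-2-3-4-aaaa-cache"     # folder that already holds the generated index
requested="5-6-7-8-bbbb-cache"    # name the harvester keeps asking for

mkdir "$cache_dir/$existing"
touch "$cache_dir/$existing/placeholder.txt"   # dummy file standing in for the index files

# Create the requested name as a symbolic link to the existing index folder.
# Argument order is: ln -s TARGET LINK_NAME
( cd "$cache_dir" && ln -s "$existing" "$requested" )

# Check that the requested name is a link and resolves to the existing content.
if [ -L "$cache_dir/$requested" ] && [ -e "$cache_dir/$requested/placeholder.txt" ]; then
    result="linked"
else
    result="broken"
fi
echo "$result"
```

A relative target is used so that the link stays valid if the whole cache folder is moved; whether that matters depends on the local setup.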

What were our main problems during the upgrade to 3.18 in production:

1) Corrupt indexes in the Derby admin database.
   Fix: recreated the indexes.

2) Very slow new lookup table in the admin database.
   Fix: reconfigured the lookup table to one with only 1 record.

All the best

Tue

Daily manager, Netarkivet.dk

From: Tue Larsen
Sent: 27 February 2012 08:44
To: netarchivesuite-devel at ml.sbforge.org
Cc: webarkivering-teknik at statsbiblioteket.dk; Bjarne Andersen; Christen Hedegaard; netarkivet-kuratorer-forward
Subject: RE: Status for upgrading to 3.18 in production

Sorry for the delay

All the best
Tue

________________________________
From: Tue Larsen [tlr at kb.dk]
Sent: 24 February 2012 09:30
To: netarchivesuite-devel at lists.gforge.statsbiblioteket.dk
Cc: webarkivering-teknik at statsbiblioteket.dk; Bjarne Andersen; Christen Hedegaard; netarkivet-kuratorer-forward
Subject: Status for upgrading to 3.18 in production

Hi All

We have upgraded our production system to 3.18 and started the first broad crawl.
Unfortunately, we have a number of serious issues and need to find workarounds or fixes for them.

For now, I recommend that you postpone your upgrade to 3.18 in production until we have an overview.
I will come back to you next week with more details.

All the best

Tue Larsen

Daily manager, Netarkivet.dk
