[Netarchivesuite-devel] [archive-crawler] Heritrix 3.0 alpha preview available

Søren Vejrup Carlsen svc at kb.dk
Thu Jun 11 12:16:53 CEST 2009


FYI

-----Original Message-----
From: archive-crawler at yahoogroups.com [mailto:archive-crawler at yahoogroups.com] On Behalf Of Gordon Mohr
Sent: 11 June 2009 11:08
To: archive-crawler at yahoogroups.com
Subject: [archive-crawler] Heritrix 3.0 alpha preview available

A preview/alpha testing version of Heritrix 3.0 is now available.

We encourage expert Heritrix users curious about upcoming changes to 
review this alpha and share their feedback.

Information on obtaining and running this release is available on the 
project wiki:

http://webarchive.jira.com/wiki/display/Heritrix/Heritrix3

Heritrix 3 has a new, Spring-based system for configuring and
instantiating/launching crawls. Crawls are now also described in the
XML configuration metadata format that originated with Spring.

The web-based user interface in Heritrix 3 has been streamlined and
updated to have consistent URLs and simple forms for most actions,
including viewing and editing job files or running arbitrary script code
within the context of a job. Programmatic operations against the web
interface have replaced JMX as the preferred way to control Heritrix
remotely.
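
As a rough illustration of what a programmatic operation against a
form-based web interface could look like, here is a minimal Python sketch
that builds (but does not send) an HTTP form POST for a job action. The
host, port, URL layout (/engine/job/<name>), the "action" field name, and
the helper build_job_action_request are all assumptions made for this
example and are not taken from this announcement; the project wiki is the
place to check the actual URLs and form fields.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_job_action_request(base_url: str, job_name: str, action: str) -> Request:
    """Build, without sending, a form POST that asks for an action on a job.

    Hypothetical helper: the path layout and the 'action' form field are
    assumptions for illustration, not a documented interface.
    """
    # Assumed URL layout: <base>/engine/job/<job name>
    url = f"{base_url.rstrip('/')}/engine/job/{job_name}"
    # Encode the form body the same way a browser form submission would.
    body = urlencode({"action": action}).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Example values only; nothing is sent over the network here.
req = build_job_action_request("https://localhost:8443", "myjob", "launch")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would actually send it;
authentication and TLS settings are left out of the sketch because they
depend on how the crawler instance is deployed.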

Also, Heritrix 3 moves to a model where a single job, in a single job
directory, may be relaunched in place many times (instead of creating
a new job directory before each launch).

As a prerelease test version, there are still known gaps in
functionality, interface, and documentation; we're working towards an
official 3.0 release in July. The current prioritized roster of issues
to be addressed is viewable in the project issue tracker.

Distribution packages (.tar.gz or .zip) may be downloaded directly from 
our Maven2 repository:

http://builds.archive.org:8080/maven2/org/archive/heritrix/heritrix/3.0.0-alpha/

As always, problem reports, ideas, fix/feature contributions, and other 
feedback are all welcome here on the list and on the project wiki and 
JIRA issue tracker:

Heritrix Wiki: http://webarchive.jira.com/wiki/display/Heritrix
Heritrix JIRA: http://webarchive.jira.com/browse/HER

Thanks!

- Gordon @ IA

