[Netarchivesuite-users] question

Kåre Fiedler Christiansen kfc at statsbiblioteket.dk
Mon Aug 25 13:15:07 CEST 2008


On Fri, 2008-08-22 at 15:13 +0300, Marius Papickas wrote:
> 
> Hello. 
> 
> Lithuanian National Library is working for archiving Lithuanian
> internet archiving project. Our company help them with this project.
> But we have got a problem. Library want to archive all top-level .lt
> domain. Is it possible with NetarchiveSuite 3.4.1 archive all
> top-level .lt domain? And how can we do this? 

Dear Marius Papickas

NetarchiveSuite is designed to be able to control harvests over entire
national domains. So the short answer is "Yes, it is possible".

However, it builds on the assumption that you have access to a list of
domains for that national top level. In Denmark we have a national
domain registry, who have provided the list. You will need a similar
agreement with the domain registry in your country.

Once you have that list, you can import it, using "Definitions->Create
Domain(s)", and uploading the file with domain information. The format
should be UTF-8 text with one domain per line. The ingest of a full
national domain list will take some time.
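
As a minimal sketch of preparing such a file (not part of
NetarchiveSuite), the Python below cleans a raw registry list into UTF-8
text with one domain per line. The function names, the lowercasing and
de-duplication, and the filter on ".lt" are my own assumptions; the only
requirement stated above is UTF-8 text with one domain per line.

```python
def normalize_domains(lines, tld="lt"):
    """Return sorted, de-duplicated domain names that end in the given TLD.

    Hypothetical helper: trims whitespace, lowercases, drops a trailing
    dot, and skips blank lines and names outside the TLD.
    """
    suffix = "." + tld
    seen = set()
    for line in lines:
        name = line.strip().lower().rstrip(".")
        if not name:
            continue
        # Require at least one label in front of the TLD itself.
        if name.endswith(suffix) and len(name) > len(suffix):
            seen.add(name)
    return sorted(seen)


def write_domain_list(domains, path):
    """Write one domain per line as UTF-8 text."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for domain in domains:
            handle.write(domain + "\n")
```

The resulting file would then be the one to upload. Whether the ingest
tolerates duplicates or mixed case is something I have not assumed here,
which is why the sketch removes both.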

After that, the "Snapshot Harvest" functionality will harvest the
entire .lt national domain.


I would suggest you upgrade your NetarchiveSuite software from 3.4.1 to
3.6.0 at your convenience; some bugs have been fixed, and a few features
added.


If you cannot get access to a list of domains in .lt, NetarchiveSuite
currently isn't well suited to your needs, since it builds on the
assumption that the domains are known beforehand. It might be worthwhile
to investigate extending the software to such a scenario. If you are
interested in cooperating on such a project, please let us know. 

An alternative solution is to use Heritrix directly with a scope of all
domains in .lt and a reasonable list of starting points. However,
running a harvest of the entire .lt domain in a single Heritrix job may
strain the scalability of a single Heritrix instance.
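
To illustrate what "a scope of all domains" under one top-level domain
means in practice, here is a small Python sketch of the rule. It is not
Heritrix configuration and does not use any Heritrix API; the function
name and the default of "lt" are assumptions for the example only.

```python
from urllib.parse import urlsplit


def in_tld_scope(url, tld="lt"):
    """Return True if the URL's host lies under the given top-level domain."""
    host = urlsplit(url).hostname  # lowercased by urlsplit; None if absent
    if host is None:
        return False
    # Match on a label boundary so "notlt.com" is not mistaken for ".lt".
    return host.rstrip(".").endswith("." + tld)
```

A crawler scoped this way follows a discovered link only when the check
passes, so the starting points determine how much of the domain is
actually reached.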


For more detailed information on web archiving and the advanced
possibilities of NetarchiveSuite, I suggest you attend the two
upcoming tutorials in Denmark, September 14, during the ECDL conference:


Preserving Websites on a National Scale, with Demonstration in
NetarchiveSuite - morning session 

        Preservation of the World Wide Web is of increasing importance
for digital libraries. Requirements ranging from preservation of online
publication by agreement to Legal Deposit laws require libraries to have
a strategy for collecting, preserving and giving access to material
published on the web. NetarchiveSuite provides a librarian-friendly
interface for setting up and managing scheduled harvests of well-defined
parts of the Internet, as well as preserving and giving access to the
harvested material. This tutorial will cover collection strategies of
websites, quality assurance of your harvested material, and end-user
access, using NetarchiveSuite as the tool for managing the harvests.
        
        
and 


Installing, Maintaining and Running the NetarchiveSuite software -
afternoon session

        Experience shows that, after evaluating and identifying your
needs for web harvesting, getting software up and running at your
institution is a major task requiring planning, knowledge and skill.
This tutorial will aim at presenting an overview of what is needed for
an institution to implement and maintain the NetarchiveSuite software in
a setup relevant to the institution. It will cover needed hardware and
software setup and investments, needed training for system maintainers
and web curators, and how to maintain the system once it is running.


The best of luck on your web archiving project!

Kåre Fiedler Christiansen
NetarchiveSuite developer



