[Netarchivesuite-users] can't add new web address

Bjarne Andersen bja at statsbiblioteket.dk
Wed Nov 11 16:50:53 CET 2009


You can use the QA tool (viewerproxy) to pick up the URLs of the missing objects. After browsing around, list the missing URLs and copy/paste them into the seed list. The next time the same harvest definition is run, the result should be better.
Sometimes it takes a couple of rounds of this before everything is perfect. At other times the Flash content is simply too dynamic and complex to crawl with the current Heritrix, or at least requires a very specialized setup of the web crawler.
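
The copy/paste step can also be done with a small script. The following is only a sketch: it assumes the seed list is plain text with one URL per line, and the function name and the example .swf URL are made up for illustration (they are not part of NetarchiveSuite or of the harvested site).

```python
def merge_seeds(seed_list, missing_urls):
    """Append URLs that are not yet in the seed list.

    seed_list    -- current seed list as text, one URL per line (assumption)
    missing_urls -- URLs reported as missing while browsing via the QA tool
    Returns the new seed list as text, original order first, no duplicates.
    """
    # Keep existing seeds in order, ignoring blank lines.
    seeds = [line.strip() for line in seed_list.splitlines() if line.strip()]
    seen = set(seeds)
    for url in missing_urls:
        url = url.strip()
        if url and url not in seen:
            seeds.append(url)
            seen.add(url)
    return "\n".join(seeds)


if __name__ == "__main__":
    current = "http://www.osf.bg\n"
    # Hypothetical missing object, for illustration only.
    missing = ["http://www.osf.bg/flash/menu.swf", "http://www.osf.bg"]
    print(merge_seeds(current, missing))
```

The result is pasted back into the seed list of the domain before the harvest definition is run again.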
_
Bjarne Andersen

Sent from my HTC Touch Pro

________________________________
From: Branislav Kovacevic <Kovacev at ceu.hu>
Sent: 11 November 2009 16:40
To: netarchivesuite-users at lists.gforge.statsbiblioteket.dk <netarchivesuite-users at lists.gforge.statsbiblioteket.dk>
Subject: Re: [Netarchivesuite-users] can't add new web address

Dear Eleonora and Bjarne

Yes, the problem was the "www" in the domain name.
I have now created the osf.bg domain and added www.osf.bg<http://www.osf.bg> as a seed.
I also started the job and harvested around 300 MB, according to the QA Job Selection window.
What an achievement. I feel proud of myself :)

Then I set the browser to use localhost and started reviewing the site.
Unfortunately, I do not see the parts that are in Flash, like the menu and the logo on the main page.
Can anything be done about that?

Many thanks
Branko


>>> Bjarne Andersen <bja at statsbiblioteket.dk> 11/11/2009 12:55 PM >>>
The configuration allows you to define the level of domains you can add as domains in the system. By default only ordinary TLDs are allowed, so without changing any settings you should add the domains as:
osf.bg
org.ba
org.me

Then you can of course add the complete seeds at the seed-list level under the domains.
You can also configure org.ba and org.me to be valid TLDs themselves; that would allow you to handle soros.org.ba as a domain in its own right, if that is what makes sense to you.
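
To illustrate the rule, here is a rough sketch of how a host name maps to a domain for a given set of TLDs. It is not NetarchiveSuite's actual code; the function names and the way the TLD set is passed in are assumptions made for the example.

```python
def domain_of(host, tlds):
    """Return the domain for host: the first TLD suffix found (longest first)
    plus the one label in front of it. Returns None if no TLD matches."""
    labels = host.lower().rstrip(".").split(".")
    # i = 1 is the longest candidate suffix, so "org.ba" wins over "ba"
    # when both are configured as TLDs.
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in tlds:
            return ".".join(labels[i - 1:])
    return None


def is_valid_domain(name, tlds):
    """A name can be added as a domain only if it is exactly TLD plus one label."""
    return domain_of(name, tlds) == name.lower().rstrip(".")


if __name__ == "__main__":
    default_tlds = {"bg", "ba", "me"}            # ordinary TLDs only
    extended_tlds = default_tlds | {"org.ba", "org.me"}

    print(is_valid_domain("www.osf.bg", default_tlds))     # False: has an extra "www" label
    print(domain_of("www.osf.bg", default_tlds))           # osf.bg
    print(domain_of("www.soros.org.ba", default_tlds))     # org.ba
    print(domain_of("www.soros.org.ba", extended_tlds))    # soros.org.ba
```

With only the ordinary TLDs, www.soros.org.ba ends up under the domain org.ba; once org.ba is treated as a TLD, soros.org.ba becomes a domain of its own.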
Best
Bjarne Andersen

Sent from my HTC Touch Pro

________________________________
From: Branislav Kovacevic <Kovacev at ceu.hu>
Sent: 11 November 2009 12:30
To: netarchivesuite-users at lists.gforge.statsbiblioteket.dk <netarchivesuite-users at lists.gforge.statsbiblioteket.dk>
Subject: Re: [Netarchivesuite-users] can't add new web address

Hi Eleonora
Many thanks for the swift reply.

The addresses I'm trying to add as harvesting domains are:
www.osf.bg/<http://www.osf.bg/>
www.soros.org.ba<http://www.soros.org.ba>
www.osim.org.me<http://www.osim.org.me>

These are existing websites of our network, of course...



>>> Nicchiarelli Eleonora <eleonora.nicchiarelli at onb.ac.at> 11/11/2009 12:07 PM >>>
Hi Branko,
I am sure the Netarchive team can answer this in more detail, but I just wanted to say that this happened to me as well at the beginning, and that in this context “illegal” just means “of invalid format” and the like. What form did the domains that you wanted to add have?
Regards,
Eleonora

Eleonora Nicchiarelli Bettelli
Digital Preservation
Austrian National Library
Josefsplatz 1, 1015 Wien

Tel: +43 1 53 410 686
Fax: +43 1 53 410 610
Web: http://www.onb.ac.at/
Mail: eleonora.nicchiarelli at onb.ac.at

________________________________
From: Kovacev at ceu.hu [mailto:netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk] On behalf of Branislav Kovacevic
Sent: Wednesday, 11 November 2009 11:49
To: netarchivesuite-users at lists.gforge.statsbiblioteket.dk
Subject: [Netarchivesuite-users] can't add new web address

Dear All

We have just started experimenting with NetarchiveSuite and unfortunately got stuck at the very beginning.
When trying to create a selective harvest by adding the web domain, we get this message:
"The following domains are illegal and cannot be added".

Could anybody please explain what this message means? Why is adding the domain of one of our institutions illegal?

Many thanks
Branko






Branislav Kovacevic
Senior Records Officer
Open Society Archives
Arany Janos u. 32
1051 Budapest, Hungary
phone: (36-1) 327-3266
e-mail: kovacev at ceu.hu
website: www.osa.ceu.hu
++++++++++++++++++++++++++++