[Netarchivesuite-users] can't add new web address
Bjarne Andersen
bja at statsbiblioteket.dk
Wed Nov 11 16:50:53 CET 2009
You can use the QA tool (viewerproxy) to pick up the URLs of the missing objects. After browsing around, list the missing URLs and copy/paste them into the seed list. The next time the same harvest definition is run, the result should be better.
Sometimes it takes a couple of rounds of this before everything is perfect. At other times the Flash content is simply too dynamic and complex to crawl with the current Heritrix, or at least requires a very specialized setup of the web crawler.
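The copy/paste step can be sketched in a few lines of Python, assuming the existing seed list and the missing URLs reported by the QA tool are both available as plain text with one URL per line. The function name `merge_seeds` and the example URLs are hypothetical and are not part of NetarchiveSuite; this is only an illustration of adding the missing URLs without creating duplicates.

```python
def merge_seeds(seed_list_text, missing_urls_text):
    """Append missing URLs to a seed list, skipping blanks and duplicates.

    Both arguments are plain text with one URL per line (an assumption
    about how the lists were copied out of the web interface).
    """
    # Keep the existing seeds in their original order.
    seeds = [line.strip() for line in seed_list_text.splitlines() if line.strip()]
    seen = set(seeds)
    for line in missing_urls_text.splitlines():
        url = line.strip()
        # Only add URLs that are not already in the seed list.
        if url and url not in seen:
            seeds.append(url)
            seen.add(url)
    return "\n".join(seeds)


# Hypothetical example: one existing seed, one missing Flash object.
print(merge_seeds("http://www.osf.bg/",
                  "http://www.osf.bg/flash/menu.swf\nhttp://www.osf.bg/"))
```

The merged text can then be pasted back into the seed list before the harvest definition is run again.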
Bjarne Andersen
Sent from my HTC Touch Pro
________________________________
From: Branislav Kovacevic <Kovacev at ceu.hu>
Sent: 11 November 2009 16:40
To: netarchivesuite-users at lists.gforge.statsbiblioteket.dk <netarchivesuite-users at lists.gforge.statsbiblioteket.dk>
Subject: Re: [Netarchivesuite-users] can't add new web address
Dear Eleonora and Bjarne,
Yes, the problem was the "www" in the domain name.
I have now created the osf.bg domain and added www.osf.bg as a seed.
I also started the job and harvested around 300 MB according to the QA Job Selection window.
What an achievement. I feel proud of myself :)
Then I set the browser to use localhost and started reviewing the site.
Unfortunately, I do not see the parts that are in Flash, such as the menu and logo on the main page.
Can anything be done about that?
Many thanks
Branko
>>> Bjarne Andersen <bja at statsbiblioteket.dk> 11/11/2009 12:55 PM >>>
The configuration lets you define at which level names can be added as domains in the system. By default only ordinary TLDs are allowed, so without changing any settings you should add the domains as:
osf.bg
org.ba
org.me
Then you can of course add the complete seeds at the seed-list level under the domains.
You can also configure org.ba and org.me to be valid TLDs themselves. That would allow you to handle soros.org.ba as a domain in its own right, if that is what makes sense to you.
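The rule above can be illustrated with a minimal Python sketch. It is not NetarchiveSuite's actual code: the function `domain_for_seed` and its `extra_tlds` parameter are hypothetical and only model the behaviour described, where the last label counts as the TLD unless a longer suffix such as org.ba has been configured as a valid TLD.

```python
from urllib.parse import urlparse


def domain_for_seed(seed, extra_tlds=()):
    """Return the domain name a seed would be filed under.

    extra_tlds is a hypothetical stand-in for suffixes (e.g. "org.ba")
    that have been configured as valid TLDs.
    """
    # Accept seeds written with or without a scheme.
    host = urlparse(seed if "://" in seed else "http://" + seed).hostname
    labels = host.lower().split(".")
    # Try the longest configured suffix first.
    for suffix in sorted(extra_tlds, key=lambda s: s.count("."), reverse=True):
        suffix_labels = suffix.lower().split(".")
        n = len(suffix_labels)
        if len(labels) > n and labels[-n:] == suffix_labels:
            # Domain = one label plus the configured suffix.
            return ".".join(labels[-(n + 1):])
    # Default: one label plus the ordinary single-label TLD.
    return ".".join(labels[-2:])


print(domain_for_seed("www.osf.bg/"))                              # osf.bg
print(domain_for_seed("www.soros.org.ba"))                         # org.ba
print(domain_for_seed("www.soros.org.ba", extra_tlds=["org.ba"]))  # soros.org.ba
```

With the default behaviour, www.soros.org.ba ends up as a seed under the domain org.ba; once org.ba is treated as a TLD, the same seed belongs to the domain soros.org.ba.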
Best
Bjarne Andersen
Sent from my HTC Touch Pro
________________________________
From: Branislav Kovacevic <Kovacev at ceu.hu>
Sent: 11 November 2009 12:30
To: netarchivesuite-users at lists.gforge.statsbiblioteket.dk <netarchivesuite-users at lists.gforge.statsbiblioteket.dk>
Subject: Re: [Netarchivesuite-users] can't add new web address
Hi Eleonora
Many thanks for the swift reply.
The addresses I'm trying to add as harvesting domains are:
www.osf.bg/
www.soros.org.ba
www.osim.org.me
These are existing websites of our network, of course...
>>> Nicchiarelli Eleonora <eleonora.nicchiarelli at onb.ac.at> 11/11/2009 12:07 PM >>>
Hi Branko,
I am sure the Netarchive team can answer this in more detail, but I just wanted to say that this happened to me as well at the beginning, and that in this context “illegal” just means “of invalid format” and the like. What form did the domains that you wanted to add have?
Regards,
Eleonora
Eleonora Nicchiarelli Bettelli
Digital Preservation
Austrian National Library
Josefsplatz 1, 1015 Wien
Tel: +43 1 53 410 686
Fax: +43 1 53 410 610
Web: http://www.onb.ac.at/
Mail: eleonora.nicchiarelli at onb.ac.at
________________________________
From: Kovacev at ceu.hu [mailto:netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk] On behalf of Branislav Kovacevic
Sent: Wednesday, 11 November 2009 11:49
To: netarchivesuite-users at lists.gforge.statsbiblioteket.dk
Subject: [Netarchivesuite-users] can't add new web address
Dear All,
We have just started experimenting with NetarchiveSuite and unfortunately got stuck at the very beginning.
When trying to create a selective harvest by adding the web domain, we get the message:
"The following domains are illegal and cannot be added".
Could anybody please explain what this message means? Why is adding the domain of one of our institutions illegal?
Many thanks
Branko
Branislav Kovacevic
Senior Records Officer
Open Society Archives
Arany Janos u. 32
1051 Budapest, Hungary
phone: (36-1) 327-3266
e-mail: kovacev at ceu.hu
website: www.osa.ceu.hu
++++++++++++++++++++++++++++