Cross Navigation Deep is used in web extraction to perform ‘cross navigation’
extraction. For example, if you set a value of 3, URL Extractor starts from the
indicated page and navigates ALL the linked web pages, following links 3 levels
deep. Starting from a single page, you can therefore navigate and extract from a
large number of pages if each of them contains many links.
The list can contain web pages that are not used for extraction. Use the
'Use Url' check box to toggle a page between used and not used.
If 'Use' is unselected, the page is not used for extraction and is not used
for cross navigation either (extracting links and using the linked pages to continue).
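The depth-limited navigation described above can be sketched roughly as follows. This is only an illustration, not URL Extractor's actual code: the function name cross_navigate, the in-memory LINKS table (standing in for real web pages) and the unused set (standing in for pages whose 'Use' box is unselected) are all assumptions made for the example. It also assumes that the starting page counts as level 0.

```python
from collections import deque

def cross_navigate(start_url, links, depth, unused=frozenset()):
    """Return the pages visited breadth-first, following links up to `depth` levels.

    `links` maps a page to the pages it links to (a hypothetical stand-in
    for downloading the page and extracting its links).
    """
    if start_url in unused:
        return []                      # an unselected page is neither extracted nor followed
    visited = [start_url]
    seen = {start_url}
    queue = deque([(start_url, 0)])    # (page, level); the start page is level 0
    while queue:
        url, level = queue.popleft()
        if level >= depth:
            continue                   # depth limit reached: do not follow this page's links
        for linked in links.get(url, []):
            if linked in seen or linked in unused:
                continue
            seen.add(linked)
            visited.append(linked)
            queue.append((linked, level + 1))
    return visited

# Hypothetical link graph: page "a" links to "b" and "c", and so on.
LINKS = {"a": ["b", "c"], "b": ["d"], "c": [], "d": ["e"]}

print(cross_navigate("a", LINKS, 1))
print(cross_navigate("a", LINKS, 3))
```

With a value of 1 only the pages linked directly from the start page are reached; each extra level can multiply the number of pages, which is why a high value can produce a very large extraction.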
Single domain extraction
URL Extractor can navigate only inside the site it starts from (the site of the inserted url).
The default is to also jump outside, to linked sites. If you want to extract only from
the starting site without jumping to other sites, select ‘Single Domain Extraction’.
You will get far fewer results in that case.
Otherwise, when ‘Single Domain Extraction’ is disabled, the url is a
‘seed’ from which URL Extractor starts to navigate, and with
a high Cross Navigation Deep value the navigation can be practically unlimited.
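As a rough illustration of what a single-domain restriction means, the sketch below keeps only the links whose host matches the host of the starting url. The function names same_domain and filter_links are hypothetical, and comparing exact host names is an assumption: the guide does not say how URL Extractor itself decides that two urls belong to the same domain (for example, how it treats subdomains).

```python
from urllib.parse import urlsplit

def same_domain(seed_url, candidate_url):
    """True when both urls have the same host name (assumed rule, exact match)."""
    return urlsplit(seed_url).hostname == urlsplit(candidate_url).hostname

def filter_links(seed_url, found_links, single_domain):
    """Keep every link, or only those on the seed's host when single_domain is on."""
    if not single_domain:
        return list(found_links)       # default: also jump to linked sites
    return [u for u in found_links if same_domain(seed_url, u)]

found = ["http://example.com/about", "http://other.example.org/page"]
print(filter_links("http://example.com/", found, single_domain=True))
print(filter_links("http://example.com/", found, single_domain=False))
```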
To extract from PDF files as well (they are recognized by their extension and read using
the OS X native engine for pdf files), select the ‘Extract also from PDF’
checkbox.
Url Extractor allows you to specify when to stop web extraction.
Because cross link navigation can extract a great amount of data when starting from a
page with many links, a stop condition is needed to limit both the amount of
data the program downloads from the web beyond what you asked for and the total time
of the extraction process.
You can specify the following stop extraction events:
No Limits: The program stops extracting when it runs out of pages to extract or
when the user presses the 'Stop Extraction' button.
URL Extractor User Guide © 2006-2016 Tension Software