ISYS Spider 1.5
ISYS Spider is an add-on to the well-known ISYS indexing engine. It’s designed to index Web content that isn’t available to the vanilla ISYS engine, such as remote Web sites, remote intranet servers or Lotus Domino servers. It works like the spiders used by big commercial search providers such as AltaVista and HotBot: an agent follows links from Web sites and indexes the pages that it passes through. In the case of ISYS Spider, you determine which domains you want to index (for example, http://apcmag.com) and it will go through and create an index of all the pages accessible from the site’s main page. If desired, it can even follow links off-site.
Getting Spider up and running is as easy as entering the starting URL, setting up options for static or dynamic pages and specifying the depth to which the Spider should index. There’s also an option for Spider to ignore the ROBOTS.TXT file, which sites use under the de facto Robots Exclusion Protocol standard to tell crawlers which areas to stay out of.
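To make those options concrete, here is a minimal Python sketch of a depth-limited crawl. It is not ISYS Spider's own code or configuration format: the function crawl, its parameters (max_depth, robots_txt, follow_offsite) and the fetch callable are all hypothetical names chosen for illustration, and the same robots.txt text is applied to every host for simplicity.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_depth=2, robots_txt=None, follow_offsite=False):
    """Breadth-first crawl from start_url; returns {url: html}.

    fetch is a hypothetical callable mapping a URL to its HTML (or None).
    robots_txt is the text of a robots.txt file; None means "ignore it".
    """
    robots = None
    if robots_txt is not None:
        robots = RobotFileParser()
        robots.parse(robots_txt.splitlines())

    home = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    pages = {}

    while queue:
        url, depth = queue.popleft()
        if robots is not None and not robots.can_fetch("*", url):
            continue  # excluded by the robots rules
        html = fetch(url)
        if html is None:
            continue  # page could not be retrieved
        pages[url] = html
        if depth >= max_depth:
            continue  # depth limit reached: index the page, skip its links
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href).split("#")[0]  # absolute, no fragment
            if link in seen:
                continue
            if not follow_offsite and urlparse(link).netloc != home:
                continue  # stay on the starting site
            seen.add(link)
            queue.append((link, depth + 1))
    return pages
```

Passing a real downloader as fetch would turn this into a working crawler. Passing a dictionary's get method, as in crawl("http://example.test/", site.get), lets you try the depth and exclusion settings against a made-up site without touching the network.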
Setting the Spider to pull in multiple threads speeds up the crawl. Even so, indexing is a bandwidth- and server-intensive process (think of having to download every single page on some of the larger sites!). Sites using a low-end or marginal server, or those with bandwidth constraints, should perform the crawl during low-demand periods or out of hours. Fortunately, Spider can be set up to perform automatic crawls at pre-specified periods according to preset criteria. Still, it is best suited to intranet use.
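The reason extra threads help is that each download spends most of its time waiting on the network, so several can overlap. A rough Python sketch of the idea follows; fetch_all and its threads parameter are hypothetical names, not Spider settings, and fetch is again any callable that returns a page for a URL.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, threads=4):
    """Fetch every URL using a pool of worker threads; returns {url: result}."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() runs fetch concurrently but yields results in input order
        results = pool.map(fetch, urls)
        return dict(zip(urls, results))
```

More threads also mean more simultaneous requests hitting the target server, which is exactly why the load warning above applies.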
Once the crawl has been completed, the index can be rolled into a traditional ISYS index and accessed by the ISYS Query tool. The alternative method, and the one most suited to Spider, is accessing the index using ISYS Web, an HTTP server and Web front end to ISYS Query. Either method generally works well, but note that Lotus Domino servers are particularly troublesome to index with ISYS Spider.
Work also needs to be done on Spider’s user interface. Its Visual Basic/Windows 3.1 ancestry is particularly noticeable at certain points, and although it is usable, a commercial product’s interface should display a higher degree of integration. Online documentation and help are also lacking.
Overall, Spider is great for raw HTML and some types of dynamic sites. Webmasters of such sites looking for an easy indexing capability should look no further. Its level of configurability is commendable. Work, however, needs to be done to make it easier to use for dynamic and database-driven sites.