Swish-e open xml indexing

http://swish-e.org/
Swish-e is an open source indexer/search engine. It excels at indexing
(X)HTML files, but indexes plain text and XML files almost as easily.
It comes with C, PHP, and Perl APIs, and it runs on Unix as well as
Windows operating systems.
I am using, and will continue to use, swish-e as the underlying indexer
for searches against TEI documents. Specifically, I have been marking up
sets of literature in TEI. I then convert the sets into a number of
formats such as plain text, XHTML, PDF, various Palm flavors, etc. I then
use swish-e to index the XHTML because swish-e makes it easy to pull
out the meta tags of HTML head elements and make them field searchable,
while the body of the text remains free-text searchable. I could
have almost as easily indexed the raw TEI files, but then I would have to
deal with transforming the XML before it gets to the browser. (“I know.
There are many ways to do that.”) See:
http://infomotions.com/alex2/
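
To make the difference between field searching and free-text searching concrete, here is a minimal Python sketch. It is not swish-e and does not reflect how swish-e is implemented; it only illustrates the idea of pulling the meta tags out of an XHTML head as named fields while treating the body as a bag of words. The class name, the function name, and the sample page are all made up for illustration.

```python
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Hypothetical helper: collect meta name/content pairs and body words."""

    def __init__(self):
        super().__init__()
        self.fields = {}       # meta name -> content (the "field searchable" part)
        self.body_words = []   # lowercased words from <body> (the "free-text" part)
        self._in_body = False

    def handle_starttag(self, tag, attrs):
        # Self-closing XHTML tags such as <meta ... /> also arrive here.
        attrs = dict(attrs)
        name, content = attrs.get("name"), attrs.get("content")
        if tag == "meta" and name and content:
            self.fields[name.lower()] = content
        elif tag == "body":
            self._in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False

    def handle_data(self, data):
        # Only text inside <body> counts as free text.
        if self._in_body:
            self.body_words.extend(data.lower().split())


def extract(xhtml):
    """Return (fields, body_words) for one XHTML document given as a string."""
    parser = FieldExtractor()
    parser.feed(xhtml)
    parser.close()
    return parser.fields, parser.body_words


# Made-up sample page standing in for one converted TEI document.
SAMPLE = """<html><head>
<title>Walden</title>
<meta name="author" content="Henry David Thoreau" />
<meta name="title" content="Walden" />
</head><body>
<p>I went to the woods because I wished to live deliberately.</p>
</body></html>"""

fields, words = extract(SAMPLE)
print(fields)            # named fields taken from the head
print("woods" in words)  # membership test against the body text
```

A real indexer would go on to build an inverted index from these two pieces. The point here is only that the head supplies structured fields and the body supplies unstructured text, which is why indexing the XHTML is convenient.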
I have also been fiddling with Plucene, a Perl port of Lucene, a
Java-based indexer/search engine library:
http://search.cpan.org/dist/Plucene/
Unlike swish-e, Lucene and Plucene are only libraries. Swish-e is an
indexer/search engine binary as well as a library.
