From Fedora Project Wiki
Latest revision as of 20:35, 9 February 2012
Sphinx
- The MediaWiki SphinxSearch extension page (http://www.mediawiki.org/wiki/Extension:SphinxSearch) is helpful; I used its config and modified it to our needs.
- The Sphinx indexer simply runs from cron, so that part is simple.
- As for the front end, we are going to look at packaging the MW extension linked above.
  - The extension depends on sphinxapi.php, which is in the libsphinxclient package, at /usr/share/doc/libsphinxclient-0.9.9/sphinxapi.php.
  - The extension does not seem to work with MW 1.16, but we want to upgrade eventually anyway.
- Sphinx does not crawl; it only indexes databases, which kind of defeats the purpose for us.
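
As a rough illustration of the cron-driven reindex mentioned above, here is a minimal sketch that builds the indexer command line and a matching crontab entry. The config path /etc/sphinx/sphinx.conf and the hourly schedule are assumptions made for the example, not values taken from our setup.

```python
import shlex


def indexer_command(config_path="/etc/sphinx/sphinx.conf", rotate=True):
    """Build the argv for a full Sphinx reindex.

    config_path is an assumed location; adjust it to the real config.
    """
    argv = ["indexer", "--config", config_path, "--all"]
    if rotate:
        # --rotate lets a running searchd pick up the new index files.
        argv.append("--rotate")
    return argv


def crontab_line(schedule, argv):
    """Format a crontab entry from a schedule string and an argv list."""
    return schedule + " " + " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    # The hourly schedule is only an example value.
    print(crontab_line("0 * * * *", indexer_command()))
```

The printed line can be dropped into a crontab; how often to reindex depends on how quickly the wiki changes.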
Xapian
- Doesn't have a crawler built in.
- Most stuff is done via Omega; Xapian just backs it.
- Hacky way to crawl sites: crawl with htdig, then convert the output into a format Omega understands and can index.
- htdig is unsupported and OLD.
- htdig seems to segfault on https sites in my testing.
- Omega's default UI is ugly but that is changeable.
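
The "convert into a format Omega understands" step could look roughly like the sketch below. It assumes the target is the plain-text dump format read by Omega's scriptindex tool (name=value fields, records separated by a blank line, continuation lines starting with "="). The field names url, title and text are hypothetical and would have to match whatever index script is used; extracting pages from the htdig crawl is not shown.

```python
def to_scriptindex_record(fields):
    """Render one document as a scriptindex-style record.

    Each field becomes a name=value line. A newline inside a value is
    continued by starting the following line with '='.
    """
    lines = []
    for name, value in fields.items():
        # Normalise line endings before escaping embedded newlines.
        value = value.replace("\r\n", "\n").replace("\r", "\n")
        lines.append(name + "=" + value.replace("\n", "\n="))
    return "\n".join(lines)


def to_scriptindex_dump(documents):
    """Join records with a blank line between them."""
    return "\n\n".join(to_scriptindex_record(doc) for doc in documents) + "\n"


if __name__ == "__main__":
    # Hypothetical crawled pages; in practice these would come from the crawler.
    pages = [
        {"url": "https://example.org/a", "title": "Page A",
         "text": "First line\nSecond line"},
        {"url": "https://example.org/b", "title": "Page B",
         "text": "Only line"},
    ]
    print(to_scriptindex_dump(pages), end="")
```

The resulting file would then be fed to scriptindex together with an index script that declares what to do with each field.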
Mnogosearch
- Link: http://www.mnogosearch.org/
- Looks nice. Has a somewhat nice UI, and is customizable.
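- Built-in crawler, with a default config file of about 1000 lines (including comments).
- CGI barfs when there are results: bug 19129 (http://mnogosearch.org/bugs/index.php?id=19129) and bug 19141 (http://mnogosearch.org/bugs/index.php?id=19141) upstream.
  - Being able to view results might be important, in a search engine. :)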
Others to try
- Apache Lucene (with Apache Nutch to crawl).
  - Relies heavily on Java, so probably out of the question (Lucene is Java, Nutch is a Tomcat servlet. Nuff said.)
- Datapark Search (http://www.dataparksearch.org/)
  - Fork of Mnogosearch?
  - Written in C.
- ASPseek
  - Written in C++.
  - Last copyright year on their site (http://www.aspseek.org/) is 2003. Is it unmaintained?