Revision as of 00:24, 12 October 2009
Project Sponsor
Name: Mike McGrath
Fedora Account Name: mmcgrath
Group: Infrastructure
Infrastructure Sponsor: mmcgrath
Secondary Contact info
Name: Huzaifa Sidhpurwala
Fedora Account Name: huzaifas
Group: Infrastructure
Name: Allen Kistler
Fedora Account Name: akistler
Group: Infrastructure
Name: Keiran Smith
Fedora Account Name: affix
Group: Infrastructure
Project Info
Project Name: Search Engine Enhancement
Target Audience: All users of Fedora web sites
Expiration/Delivery Date (required): F13
Description/Summary: Fedora needs a search engine[1]
Requirements:
- Crawl the web sites (wiki and non-wiki)
- Search the web sites (wiki and non-wiki)
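Because the crawl has to cover wiki and non-wiki pages alike, a crawler needs to follow links over HTTP rather than read a wiki database. The following is a minimal sketch of the link-extraction step only, using the Python standard library. The names `LinkCollector` and `same_site_links` and the example.com URLs are assumptions made for illustration; they do not come from any of the candidate packages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(base_url, html):
    """Return absolute links that stay on base_url's host.

    Fragments are dropped, order is preserved, duplicates are skipped.
    """
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(base_url).netloc
    links = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href).split("#", 1)[0]
        if urlparse(absolute).netloc == host and absolute not in links:
            links.append(absolute)
    return links
```

A real crawler would also need fetching, politeness delays, and robots.txt handling, all of which this sketch leaves out.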
Preferences:
- Python-based (no Java)
- Programmable keywords, to control which pages are displayed for particular search terms
- XML or library interface so other applications can use it
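To make the last two preferences concrete, here is a rough sketch of how programmable keywords and an XML interface might fit together. It assumes the search engine returns a plain list of URLs. The `PINNED` table, the function names, the XML element names, and the example.org URLs are all hypothetical and not taken from any candidate package.

```python
import xml.etree.ElementTree as ET

# Hypothetical keyword table: keyword -> pages that should be shown first.
PINNED = {
    "download": ["https://example.org/get-fedora"],
}


def apply_pinned(query, results, pinned=PINNED):
    """Place pinned pages for any keyword in the query ahead of the engine's results."""
    ordered = []
    for word in query.lower().split():
        for url in pinned.get(word, []):
            if url not in ordered:
                ordered.append(url)
    for url in results:
        if url not in ordered:
            ordered.append(url)
    return ordered


def results_to_xml(query, urls):
    """Serialize a result list as a small XML document (element names are made up)."""
    root = ET.Element("results", {"query": query})
    for rank, url in enumerate(urls, start=1):
        hit = ET.SubElement(root, "hit", {"rank": str(rank)})
        hit.text = url
    return ET.tostring(root, encoding="unicode")
```

Keeping the keyword table outside the engine means it could be applied to whichever candidate is selected, provided that candidate exposes its results to Python.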
Project plan (Detailed):
- Investigate and evaluate existing open source search engines
- Select candidate software
- Create public test instance of candidate software
- Test for functionality, performance, and impact (re-evaluate, if necessary)
- Create capacity and deployment plans
- Deploy
Specific resources needed
- Public Test for testing candidate software
- Permanent home(s) for deployment
- Web server(s)
- Database server(s)
Software Investigation and Evaluation
- HtdigSearch [2]
  - Huzaifa (in progress)
- SphinxSearch [3]
  - Huzaifa (in progress)
- MWSearch [4]
  - Not suitable
  - MWSearch requires EzMwLucene to be running on the servers to be searched. EzMwLucene is written in Java, which conflicts with the no-Java preference.
  - MWSearch is a client to EzMwLucene, which is wiki-only, so MWSearch is wiki-only too.
- RigorousSearch [5]
  - Not suitable
  - RigorousSearch crawls the MediaWiki database, not the web site, so it cannot index non-MediaWiki web sites, including any non-wiki web site.
- DataparkSearch
  - written in C
- Egothor
  - written in Java
- Gonzui (specializes in source code search)
  - ? written in Ruby
  - ? not actively maintained
- Grub
  - ? written in C#
- ht://Dig
  - written in C++
  - not actively maintained
- Isearch
  - written in C++
- Lucene
  - originally in Java, ported to other languages
  - Perl ports are Plucene and KinoSearch
  - Ruby port is Ferret
  - see http://wiki.apache.org/lucene-java/LuceneImplementations
- Lemur Toolkit & Indri Search Engine
  - written in C/C++
  - not really a search engine, more like a toolkit
- mnoGoSearch
  - written in C
- Namazu
  - written in Perl
- Nutch
  - written in Java
  - based on Lucene
- OpenFTS
  - written in Perl or Tcl on top of PostgreSQL
  - Python interface available
  - not actively maintained
- Sciencenet (for scientific knowledge, based on YaCy technology)
  - written in Java
- Sphinx
  - written in C++
- SWISH-E
  - written in C
- Terrier Search Engine
  - written in Java
- Wikia Search
  - shut down
- Xapian
  - written in C++
- YaCy
  - written in Java
- Zettair
  - written in C
Public Testing
<tbd>
Deployment Plan
<tbd>
References
- [1] "Fedora Search Engine". Infrastructure/Tickets. https://fedorahosted.org/fedora-infrastructure/ticket/1055.
- [2] "HtdigSearch Extension". MediaWiki. https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Extension:HtdigSearch.
- [3] "SphinxSearch Extension". MediaWiki. https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Extension:SphinxSearch.
- [4] "MWSearch Extension". MediaWiki. https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Extension:MWSearch.
- [5] "RigorousSearch Extension". MediaWiki. https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Extension:RigorousSearch.