m (→Software Investigation and Evaluation: Remove test references tag)
(Reorganize headers, including breaking Software Investigation and Evaluation into In Progress and Not Suitable)
Revision as of 22:50, 12 October 2009
Points of Contact
Project Sponsor
Name: Mike McGrath
Fedora Account Name: mmcgrath
Group: Infrastructure
Infrastructure Sponsor: mmcgrath
Secondary Contact info
Name: Huzaifa Sidhpurwala
Fedora Account Name: huzaifas
Group: Infrastructure
Name: Allen Kistler
Fedora Account Name: akistler
Group: Infrastructure
Name: Keiran Smith
Fedora Account Name: affix
Group: Infrastructure
Project Info
Project Name: Search Engine Enhancement
Target Audience: All users of Fedora web sites
Expiration/Delivery Date (required): F13
Description/Summary
Fedora needs a search engine.[1]
Requirements
- Crawl the web sites (wiki and non-wiki)
- Search the web sites (wiki and non-wiki)
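The two requirements above can be illustrated with a minimal sketch. It is not any of the candidate packages evaluated on this page: `PageParser`, `crawl`, and `search` are hypothetical names, and the `pages` dict (URL to HTML) stands in for real HTTP fetches of wiki and non-wiki pages.

```python
from collections import defaultdict
from html.parser import HTMLParser
import re


class PageParser(HTMLParser):
    """Collect outgoing links and text words from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.words.extend(re.findall(r"[a-z0-9]+", data.lower()))


def crawl(pages, start):
    """Breadth-first crawl from `start`; returns an inverted index word -> URLs.

    `pages` is an assumed stand-in for the network: a dict of URL -> HTML.
    """
    index = defaultdict(set)
    queue, seen = [start], {start}
    while queue:
        url = queue.pop(0)
        parser = PageParser()
        parser.feed(pages[url])
        for word in parser.words:
            index[word].add(url)
        for link in parser.links:
            # Only follow links we can "fetch" and have not visited yet.
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)
    return index


def search(index, query):
    """Return the sorted URLs that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)
```

Because the crawler only sees HTML, the same code path handles wiki and non-wiki pages, which is the point of both requirements. A real deployment would add fetching, robots.txt handling, and ranking.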
Preferences
- Python-based (no Java)
- Programmable keywords, to control which pages are displayed for specific keywords
- XML or library interface so other applications can use it
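As a rough illustration of the last two preferences, the sketch below promotes administrator-chosen pages for certain keywords and serializes the result as XML for other applications. Everything here is an assumption made for the example: the `PINNED` table, its sample URL, the function names, and the `<results>`/`<hit>` element names are hypothetical and not part of any evaluated package.

```python
import xml.etree.ElementTree as ET

# Hypothetical keyword table: keyword -> pages that should be listed first.
PINNED = {"install": ["/wiki/Installation_Guide"]}


def apply_keywords(query, results, pinned=PINNED):
    """Move pinned pages for any keyword in the query to the front of the results."""
    promoted = []
    for word in query.lower().split():
        for url in pinned.get(word, []):
            if url not in promoted:
                promoted.append(url)
    # Keep the remaining results in their original order, without duplicates.
    return promoted + [u for u in results if u not in promoted]


def to_xml(query, urls):
    """Serialize a result list as a small XML document (assumed schema)."""
    root = ET.Element("results", {"query": query})
    for url in urls:
        ET.SubElement(root, "hit").text = url
    return ET.tostring(root, encoding="unicode")
```

For example, `apply_keywords("Install Fedora", ["/b", "/wiki/Installation_Guide", "/c"])` lists the pinned installation page first, and `to_xml` wraps the reordered list so that a non-Python client can consume it.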
Project Plan
1. Investigate and evaluate existing open source search engines
2. Select candidate software
3. Create public test instance of candidate software
4. Test for functionality, performance, and impact (re-evaluate, if necessary)
5. Create capacity and deployment plans
6. Deploy
Resources Needed
- Public Test for testing candidate software
- Permanent home(s) for deployment
  - Web server(s)
  - Database server(s)
Software Investigation and Evaluation
In Progress
- HtdigSearch [2]
  - Huzaifa (in progress)
- SphinxSearch [3]
  - Huzaifa (in progress)
- Ferret
  - Ruby port of Lucene
- Gonzui [4] (specializes in source code search)
  - written in Ruby
  - not actively maintained
- KinoSearch
  - Perl port of Lucene
- Namazu [5]
  - written in Perl
- OpenFTS [6]
  - Not suitable
  - written in Perl or TCL on top of PostgreSQL
  - Python interface available
  - not actively maintained
- Plucene
  - Perl port of Lucene
Not Suitable
- DataparkSearch [7]
  - written in C
- Egothor [8]
  - written in Java
- Grub [9]
  - written in C#
- ht://dig [10]
  - written in C++
  - not actively maintained
- Indri [11]
  - written in C/C++
- Isearch [12]
  - written in C++
- Lucene [13]
  - originally in Java, ported to others
  - Perl ports are Plucene and KinoSearch; Ruby port is Ferret
  - see Lucene Implementations [14]
- mnoGoSearch [15]
  - written in C
- MWSearch [16]
  - Requires EzMwLucene (Java, not desirable) to be running on the servers to be searched
  - EzMwLucene is wiki-only, therefore MWSearch is wiki-only
- Nutch [17]
  - written in Java
  - based on Lucene
- RigorousSearch [18]
  - Crawls the MediaWiki database, not the web site, so it does not work for any non-MediaWiki site, including all non-wiki web sites.
- Sphinx [19]
  - written in C++
- Swish-e [20]
  - written in C
  - Swish++ is a rewrite in C++
- Terrier (TERabyte RetrIEveR) [21]
  - written in Java
- Xapian [22]
  - written in C++
- YaCy [23]
  - written in Java
- Zettair [24]
  - written in C
Public Testing
<tbd>
Deployment Plan
<tbd>
References
1. "Fedora Search Engine". Infrastructure/Tickets. https://fedorahosted.org/fedora-infrastructure/ticket/1055.
2. "HtdigSearch Extension". MediaWiki. https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Extension:HtdigSearch.
3. "SphinxSearch Extension". MediaWiki. https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Extension:SphinxSearch.
4. "Gonzui". SourceForge. http://gonzui.sourceforge.net/.
5. "Namazu". Namazu Project. http://www.namazu.org/.
6. "OpenFTS". SourceForge. http://openfts.sourceforge.net/.
7. "DataparkSearch". DataparkSearch. http://www.dataparksearch.org/.
8. "Egothor". Egothor. http://www.egothor.org/.
9. "Grub". Wikia, Inc. http://grub.org/.
10. "ht://Dig". The ht://Dig Group. http://www.htdig.org/.
11. "Indri". The Lemur Project. http://www.lemurproject.org/indri/.
12. "Isearch". Isite. http://isite.awcubed.com/.
13. "Lucene". Apache Software Foundation. http://lucene.apache.org/.
14. "Lucene Implementations". Apache Software Foundation. http://wiki.apache.org/lucene-java/LuceneImplementations.
15. "mnoGoSearch". LavTech. http://www.mnogosearch.org/.
16. "MWSearch Extension". MediaWiki. https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Extension:MWSearch.
17. "Nutch". Apache Software Foundation. http://lucene.apache.org/nutch/.
18. "RigorousSearch Extension". MediaWiki. https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Extension:RigorousSearch.
19. "Sphinx". Sphinx Technologies. http://sphinxsearch.com/.
20. "Swish-e". Swish-e. http://swish-e.org/.
21. "Terrier". Terrier Project. http://ir.dcs.gla.ac.uk/terrier/.
22. "Xapian". Xapian Project. http://xapian.org/.
23. "YaCy". Karlsruhe Institute of Technology. http://yacy.net/.
24. "Zettair". Search Engine Group, Royal Melbourne Institute of Technology. http://www.seg.rmit.edu.au/zettair/.