Revision as of 21:38, 14 October 2009
Points of Contact
Project Sponsor
Name: Mike McGrath
Fedora Account Name: mmcgrath
Group: Infrastructure
Infrastructure Sponsor: mmcgrath
Secondary Contact info
Name: Huzaifa Sidhpurwala
Fedora Account Name: huzaifas
Group: Infrastructure
Name: Allen Kistler
Fedora Account Name: akistler
Group: Infrastructure
Name: Keiran Smith
Fedora Account Name: affix
Group: Infrastructure
Project Info
Project Name: Search Engine Enhancement
Target Audience: All users of Fedora web sites
Expiration/Delivery Date (required): F13
Description/Summary
Fedora needs a search engine for its web sites.[1]
Requirements
- Crawl the web sites (wiki and non-wiki)
- Search the web sites (wiki and non-wiki)
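As a rough illustration of the crawl requirement, the sketch below follows links breadth-first and stays within a set of allowed hosts, so wiki and non-wiki pages are handled the same way. It is a minimal sketch, not the crawler of any candidate engine: the `crawl` function, the `fetch` callable, and the `LinkExtractor` class are hypothetical names, and a real crawler would also need robots.txt handling, rate limiting, and error handling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags in one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, allowed_hosts, max_pages=100):
    """Breadth-first crawl; returns {url: html} for pages on allowed hosts.

    fetch is a hypothetical callable that takes a URL and returns HTML text,
    or None if the page cannot be retrieved.
    """
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).hostname in allowed_hosts and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Passing `fetch` in as a parameter keeps the sketch independent of any HTTP library; an in-memory dictionary's `get` method is enough to try it out.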
Preferences
- Python-based (no Java)
- Programmable keywords, to control which pages are displayed for particular search terms
- XML or library interface so other applications can use it
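The last two preferences can be pictured with a small sketch: a table of programmable keywords that places chosen pages ahead of the ordinary results, and an XML rendering that other applications could parse. Everything here is an assumption for illustration: `PINNED`, `apply_keywords`, `results_to_xml`, the element names, and the example URLs are hypothetical and are not taken from any candidate engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical keyword table: query term -> pages that should be shown first.
PINNED = {
    "download": ["https://example.org/get-fedora"],
}


def apply_keywords(query, results, pinned=PINNED):
    """Put pinned pages for any keyword in the query ahead of the results."""
    promoted = []
    for word in query.lower().split():
        for url in pinned.get(word, []):
            if url not in promoted:
                promoted.append(url)
    # Keep the engine's own ordering for everything that was not promoted.
    return promoted + [url for url in results if url not in promoted]


def results_to_xml(query, urls):
    """Serialize a result list as a small XML document (made-up schema)."""
    root = ET.Element("results", {"query": query})
    for rank, url in enumerate(urls, start=1):
        hit = ET.SubElement(root, "hit", {"rank": str(rank)})
        hit.text = url
    return ET.tostring(root, encoding="unicode")
```

Keeping the keyword table separate from the engine's ranking means administrators could edit it without reindexing; whether a given candidate supports something similar is one of the things the evaluation would need to check.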
Project Plan
- Investigate and evaluate existing open source search engines
- Select candidate software
- Create public test instance of candidate software
- Test for functionality, performance, and impact (re-evaluate, if necessary)
- Create capacity and deployment plans
- Deploy
Resources Needed
- Public test instance for evaluating candidate software
- Permanent home(s) for deployment
- Web server(s)
- Database server(s)
Software Investigation and Evaluation
In Progress
- KinoSearch [2] - Allen investigating
- Perl port of Lucene
- Namazu [3] - Huzaifa investigating
- written in Perl
- OpenFTS [4] - Huzaifa investigating
- written in Perl or Tcl on top of PostgreSQL
- Python interface available
- not actively maintained
- Plucene [5] - Allen investigating
- Perl port of Lucene
- not actively maintained
Not Suitable
- DataparkSearch [6]
- written in C
- Egothor [7]
- written in Java
- Grub [8]
- written in C#
- Heritrix [9]
- written in Java
- archives content rather than simply indexing it
- ht://Dig [10]
- written in C++
- not actively maintained
- HtdigSearch [11]
- MediaWiki plugin only; not suitable for searching non-wiki sites
- Indri [12]
- written in C/C++
- Isearch [13]
- written in C++
- Lucene [14]
- written in Java, but ported to others [15]
- mnoGoSearch [16]
- written in C
- MWSearch [17]
- Requires EzMwLucene (Java, not desirable) to be running on the servers to be searched
- EzMwLucene is wiki-only, therefore MWSearch is wiki-only
- Nutch [18]
- written in Java
- based on Lucene
- RigorousSearch [19]
- Crawls the MediaWiki database, not the web site
- Does not work for any site that is not running MediaWiki, which rules out all non-wiki sites
- Sphinx [20]
- written in C++
- SphinxSearch [21]
- written in C++
- Swish-e [22]
- written in C
- Swish++ is a rewrite in C++
- Terrier (TERabyte RetrIEveR) [23]
- written in Java
- Xapian [24]
- written in C++
- YaCy [25]
- written in Java
- Zettair [26]
- written in C
Public Testing
<tbd>
Deployment Plan
<tbd>
References
- [1] "Fedora Search Engine". Infrastructure/Tickets. https://fedorahosted.org/fedora-infrastructure/ticket/1055.
- [2] "KinoSearch". Rectangular Research. http://www.rectangular.com/kinosearch/.
- [3] "Namazu". Namazu Project. http://www.namazu.org/.
- [4] "OpenFTS". SourceForge. http://openfts.sourceforge.net/.
- [5] "Plucene". CPAN. http://search.cpan.org/~tmtm/Plucene-1.25.
- [6] "DataparkSearch". DataparkSearch. http://www.dataparksearch.org/.
- [7] "Egothor". Egothor. http://www.egothor.org/.
- [8] "Grub". Wikia, Inc. http://grub.org/.
- [9] "Heritrix". Internet Archive. http://crawler.archive.org/.
- [10] "ht://Dig". The ht://Dig Group. http://www.htdig.org/.
- [11] "HtdigSearch Extension". MediaWiki. https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Extension:HtdigSearch.
- [12] "Indri". The Lemur Project. http://www.lemurproject.org/indri/.
- [13] "Isearch". Isite. http://isite.awcubed.com/.
- [14] "Lucene". Apache Software Foundation. http://lucene.apache.org/.
- [15] "Lucene Implementations". Apache Software Foundation. http://wiki.apache.org/lucene-java/LuceneImplementations.
- [16] "mnoGoSearch". LavTech. http://www.mnogosearch.org/.
- [17] "MWSearch Extension". MediaWiki. https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Extension:MWSearch.
- [18] "Nutch". Apache Software Foundation. http://lucene.apache.org/nutch/.
- [19] "RigorousSearch Extension". MediaWiki. https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Extension:RigorousSearch.
- [20] "Sphinx". Sphinx Technologies. http://sphinxsearch.com/.
- [21] "SphinxSearch Extension". MediaWiki. https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Extension:SphinxSearch.
- [22] "Swish-e". Swish-e. http://swish-e.org/.
- [23] "Terrier". Terrier Project. http://ir.dcs.gla.ac.uk/terrier/.
- [24] "Xapian". Xapian Project. http://xapian.org/.
- [25] "YaCy". Karlsruhe Institute of Technology. http://yacy.net/.
- [26] "Zettair". Search Engine Group, Royal Melbourne Institute of Technology. http://www.seg.rmit.edu.au/zettair/.