From Fedora Project Wiki

Revision as of 22:40, 11 October 2009 by Akistler

Project Sponsor

Name: Mike McGrath
Fedora Account Name: mmcgrath
Group: Infrastructure
Infrastructure Sponsor: mmcgrath

Secondary Contact info

Name: Huzaifa Sidhpurwala
Fedora Account Name: huzaifas
Group: Infrastructure

Name: Allen Kistler
Fedora Account Name: akistler
Group: Infrastructure

Name: Keiran Smith
Fedora Account Name: affix
Group: Infrastructure

Project Info

Project Name: Search Engine Enhancement
Target Audience: All users of Fedora web sites
Expiration/Delivery Date (required): F13

Description/Summary: Fedora needs a search engine for its web sites [1].

Requirements:

  • Crawl the web sites (wiki and non-wiki)
  • Search the web sites (wiki and non-wiki)
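To illustrate what the two requirements involve, the following minimal Python sketch (Python being the preferred language) crawls pages on a single host and builds an inverted index that answers AND queries. It is a sketch under stated assumptions, not one of the candidate engines: the names `crawl`, `build_index`, `search`, `fetch_url`, and `PageParser` are hypothetical, the crawl is assumed to stay on the start URL's host, and it does no ranking or robots.txt handling.

```python
import re
from collections import defaultdict, deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collect character data and <a href> targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text.append(data)


def fetch_url(url):
    """Default fetcher (hypothetical helper): download a page as UTF-8 text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def crawl(start_url, fetch=fetch_url, max_pages=100):
    """Breadth-first crawl that stays on the start URL's host (assumption).

    Returns a dict mapping each fetched URL to its extracted text.
    """
    host = urlparse(start_url).netloc
    queue, seen, pages = deque([start_url]), {start_url}, {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:
            continue  # unreachable page: skip it and keep crawling
        parser = PageParser()
        parser.feed(html)
        pages[url] = " ".join(parser.text)
        for href in parser.links:
            link = urljoin(url, href).split("#")[0]  # drop #fragments
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def build_index(pages):
    """Inverted index: term -> set of URLs whose text contains the term."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """Return, sorted, the URLs that contain every term of the query."""
    terms = tokenize(query)
    if not terms:
        return []
    return sorted(set.intersection(*(index.get(t, set()) for t in terms)))
```

Because the fetcher is a parameter, the crawler can be exercised against an in-memory set of pages before it is pointed at live sites.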

Preferences:

  • Python-based (no Java)
  • Programmable keywords, to control which pages are displayed for specific search terms
  • XML or library interface so other applications can use it
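One possible reading of "programmable keywords" is a hand-maintained table of pinned results that editors attach to a keyword, combined with an XML serialization of result lists that other applications can consume. The sketch below assumes exactly that reading; the functions `apply_pins` and `results_to_xml`, the layout of the `pins` table, and the `<results>`/`<result>` element names are hypothetical, not the API of any existing engine.

```python
import xml.etree.ElementTree as ET


def apply_pins(query, results, pins):
    """Move pages pinned to any query keyword to the front of the results.

    pins maps a lowercase keyword to an ordered list of URLs (hypothetical
    format). Pinned URLs come first even if the engine did not return them;
    the remaining results keep their original order, without duplicates.
    """
    pinned = []
    for word in query.lower().split():
        for url in pins.get(word, []):
            if url not in pinned:
                pinned.append(url)
    return pinned + [url for url in results if url not in pinned]


def results_to_xml(query, results):
    """Serialize a ranked result list as a small XML document (hypothetical schema)."""
    root = ET.Element("results", query=query, count=str(len(results)))
    for rank, url in enumerate(results, start=1):
        ET.SubElement(root, "result", rank=str(rank), url=url)
    return ET.tostring(root, encoding="unicode")
```

In this sketch the pins live outside the search engine, so editors could change which pages appear for a keyword without reindexing, whichever engine is eventually selected.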

Project plan (Detailed):

  1. Investigate and evaluate existing open source search engines
  2. Select candidate software
  3. Create public test instance of candidate software
  4. Test for functionality, performance, and impact (re-evaluate, if necessary)
  5. Create capacity and deployment plans
  6. Deploy

Specific resources needed

  • Public Test for testing candidate software
  • Permanent home(s) for deployment
    • Web server(s)
    • Database server(s)

Software Investigation and Evaluation

  • HtdigSearch [2]
    Huzaifa (in progress)
  • SphinxSearch [3]
    Huzaifa (in progress)
  • MWSearch [4]
    Not suitable
    • MWSearch requires EzMwLucene to be running on the servers being searched. EzMwLucene is written in Java, so it does not meet the "no Java" preference.
    • MWSearch is a client of EzMwLucene, which is wiki-only, so MWSearch is also wiki-only.
  • RigorousSearch [5]
    Not suitable
    RigorousSearch crawls the MediaWiki database, not the web site, so it cannot search any site that is not built on MediaWiki, which rules out every non-wiki web site.

Public Testing

<tbd>

Deployment Plan

<tbd>

References