From Fedora Project Wiki

Revision as of 10:36, 18 September 2016 by Jibecfed (real number is 9887 bad links for second category)

The Fedora Wiki has a lot of content: it has existed since 16:24, 24 May 2008!

When you link from one wiki page to another, you should use wiki links rather than full URLs (see MediaWiki's link documentation).

For example, https://fedoraproject.org/wiki/Lohit_fonts should be written as [[Lohit fonts]].
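The rewrite in that example is mechanical: drop the URL prefix, decode any percent-escapes, and turn underscores back into spaces (MediaWiki treats the two as the same in page titles). Here is a minimal Python sketch of that mapping; the function name is made up for illustration, and URLs of the /w/index.php?title=... form are deliberately not handled:

```python
from urllib.parse import unquote

# Prefix of Fedora wiki page URLs, as in the example above.
WIKI_PREFIX = "https://fedoraproject.org/wiki/"


def url_to_wikilink(url):
    """Turn a full Fedora wiki page URL into an internal [[...]] link.

    Hypothetical helper for illustration; query-string URLs are rejected.
    """
    if not url.startswith(WIKI_PREFIX):
        raise ValueError("not a Fedora wiki page URL: " + url)
    # Decode %XX escapes (UTF-8), e.g. %C3%A9 becomes an accented e.
    title = unquote(url[len(WIKI_PREFIX):])
    # MediaWiki treats underscores in titles as spaces.
    return "[[" + title.replace("_", " ") + "]]"


print(url_to_wikilink("https://fedoraproject.org/wiki/Lohit_fonts"))
# prints [[Lohit fonts]]
```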

Why is it important? To be able to use these features:

What have I done?

I converted almost all external links pointing to the wiki into internal links.

How?

I extracted the full list:
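The same kind of list can also be fetched from the MediaWiki API, whose list=exturlusage module is the API counterpart of Special:LinkSearch. This is only a sketch of one possible way, not necessarily how the list was built here: the API address is an assumption (the standard /w/api.php next to /w/index.php), the helper names are hypothetical, continuation past the first batch of results is not handled, and the sample reply is made up. As far as I know, the -file: option of Pywikibot accepts one page title per line.

```python
from urllib.parse import urlencode

# Assumption: the standard MediaWiki API sits next to /w/index.php.
API_URL = "https://fedoraproject.org/w/api.php"


def linksearch_query(target, limit=500):
    """Build a list=exturlusage API URL (the API side of Special:LinkSearch).

    `target` is the URL without its protocol, e.g. "fedoraproject.org/wiki".
    Continuation (more than `limit` results) is not handled in this sketch.
    """
    params = {
        "action": "query",
        "format": "json",
        "list": "exturlusage",
        "euprotocol": "https",
        "euquery": target,
        "euprop": "title",
        "eulimit": limit,
    }
    return API_URL + "?" + urlencode(params)


def page_titles(response):
    """Return unique page titles, in first-seen order, from a decoded JSON reply."""
    titles = []
    for hit in response.get("query", {}).get("exturlusage", []):
        if hit["title"] not in titles:
            titles.append(hit["title"])
    return titles


# Hypothetical reply: one page can contain the same kind of link many times.
reply = {"query": {"exturlusage": [
    {"ns": 0, "title": "Lohit fonts"},
    {"ns": 0, "title": "Lohit fonts"},
    {"ns": 0, "title": "L10N/Teams/LowGerman"},
]}}

print(linksearch_query("fedoraproject.org/wiki"))
# One title per line, ready to be saved as pages.txt.
print("\n".join(page_titles(reply)))
```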

I talked with Wikipedia administrators (thank you, Saper), who suggested that I use this Python bot script: https://www.mediawiki.org/wiki/Manual:Pywikibot/replace.py

Configuration

1. Get the latest git version of Pywikibot: https://github.com/wikimedia/pywikibot-core/

2. Create fedora_family.py in the pywikibot/families/ folder with this content:

# -*- coding: utf-8  -*-
"""Family module for Fedora wiki."""
#
# (C) Pywikibot team, 2016
#
# Distributed under the terms of the MIT license.
#

from pywikibot import family


# The project wiki of the Fedora Project
class Family(family.SingleSiteFamily):

    """Family class for Fedora Project wiki."""

    name = 'fedora'
    domain = 'fedoraproject.org'
    code = 'en'

    def protocol(self, code):
        """Return https as the protocol for this family."""
        return "https"

3. Go to the scripts folder and add the i18n content:

git clone https://gerrit.wikimedia.org/r/pywikibot/i18n.git

Command lines / regex

The command line is simple (I use a text file with the list of pages). "PART ONE" is the regex to search for and "PART TWO" is its replacement:

python pwb.py replace.py -file:pages.txt -regex -summary:"internal link cleaning" "PART ONE" "PART TWO"

Step 0: clean up badly written links that mix internal and external syntax, such as [[http://fedoraproject.org/wiki/My_page text]]

"\[\[http(s|)://fed([^\]]+)\]\]" "[https://fed\2]"
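Pywikibot is written in Python, so its -regex patterns follow the syntax of Python's re module, and a pair can be tried locally before running the bot on hundreds of pages. A minimal check of the Step 0 pair on a made-up line of wikitext:

```python
import re

# Step 0 pair, copied from above ("PART ONE", "PART TWO").
PATTERN = r"\[\[http(s|)://fed([^\]]+)\]\]"
REPLACEMENT = r"[https://fed\2]"

# Hypothetical wikitext containing a malformed link.
sample = "See [[http://fedoraproject.org/wiki/My_page text]] for details."

fixed = re.sub(PATTERN, REPLACEMENT, sample)
print(fixed)
# prints: See [https://fedoraproject.org/wiki/My_page text] for details.
```

Note that the replacement also upgrades http:// to https://, which is what lets the https-only patterns of the next steps pick these links up.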

Step 1: convert external links that have link text (example: [https://fedoraproject.org/wiki/L10N/Teams/LowGerman Low German])

"\[https://fedoraproject\.org/wiki/([^] ]*?) ([^]]+)\]" "[[\1|\2]]"
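The same local check for Step 1, using the example link above. The first group stops at the first space (the end of the URL) and the second group takes the link text up to the closing bracket:

```python
import re

# Step 1 pair: [URL text] becomes [[Page|text]].
PATTERN = r"\[https://fedoraproject\.org/wiki/([^] ]*?) ([^]]+)\]"
REPLACEMENT = r"[[\1|\2]]"

sample = "Team page: [https://fedoraproject.org/wiki/L10N/Teams/LowGerman Low German]"

result = re.sub(PATTERN, REPLACEMENT, sample)
print(result)
# prints: Team page: [[L10N/Teams/LowGerman|Low German]]
```

Underscores stay in the link target (for example [[Lohit_fonts|Lohit]]), which is harmless because MediaWiki reads them as spaces.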

Step 2: convert plain bracketed external links without link text (example: [https://fedoraproject.org/wiki/L10N/Teams/LowGerman])

"\[https://fedoraproject\.org/wiki/([^] ]*?)\]" "[[\1]]"
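Because the captured part cannot contain a space and must be followed directly by "]", the Step 2 pair leaves links that still carry link text alone, so it does not clash with Step 1. A quick local check on made-up wikitext:

```python
import re

# Step 2 pair: [URL] with no link text becomes [[Page]].
PATTERN = r"\[https://fedoraproject\.org/wiki/([^] ]*?)\]"
REPLACEMENT = r"[[\1]]"

sample = (
    "[https://fedoraproject.org/wiki/L10N/Teams/LowGerman] and "
    "[https://fedoraproject.org/wiki/Lohit_fonts Lohit]"
)

result = re.sub(PATTERN, REPLACEMENT, sample)
print(result)
# prints: [[L10N/Teams/LowGerman]] and [https://fedoraproject.org/wiki/Lohit_fonts Lohit]
```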

Step 3: convert bare URLs (example: https://fedoraproject.org/wiki/L10N/Teams/LowGerman)

"https://fedoraproject\.org/wiki/([^ \r\n]+)" "[[\1]]"
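The step numbers matter: the Step 3 pattern also matches URLs that are still inside brackets, so it has to run last. The sketch below chains the four pairs in order, the way successive bot runs over the same page list would (the helper name is hypothetical), and shows what goes wrong when Step 3 runs on its own. It also shows that [^ \r\n]+ keeps trailing punctuation, so a bare URL at the end of a sentence is worth reviewing by hand.

```python
import re

# The pairs from Steps 0 to 3, in the order they are applied.
RULES = [
    (r"\[\[http(s|)://fed([^\]]+)\]\]", r"[https://fed\2]"),
    (r"\[https://fedoraproject\.org/wiki/([^] ]*?) ([^]]+)\]", r"[[\1|\2]]"),
    (r"\[https://fedoraproject\.org/wiki/([^] ]*?)\]", r"[[\1]]"),
    (r"https://fedoraproject\.org/wiki/([^ \r\n]+)", r"[[\1]]"),
]


def apply_rules(text, rules):
    """Apply each (pattern, replacement) pair in turn, like successive bot runs."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


link = "[https://fedoraproject.org/wiki/Lohit_fonts Lohit]"

print(apply_rules(link, RULES))      # all steps, Step 3 last
# prints: [[Lohit_fonts|Lohit]]
print(apply_rules(link, RULES[3:]))  # Step 3 alone breaks the link
# prints: [[[Lohit_fonts]] Lohit]
print(apply_rules("See https://fedoraproject.org/wiki/Lohit_fonts.", RULES))
# prints: See [[Lohit_fonts.]]
```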

Obvious use case: migration to Pagure

For i18n alone, 488 links still use the old Trac instance: https://fedoraproject.org/w/index.php?title=Special:LinkSearch&limit=500&offset=0&target=https%3A%2F%2Ffedorahosted.org%2Fi18n

To clean them up, we can use this regex:

"https://fedorahosted\.org/i18n/ticket/([^ \r\n\.:]+)" "https://pagure.io/i18n/issue/\1"
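A local check of this pair on a made-up sentence (the ticket number is hypothetical). The pair carries the ticket number over unchanged, so it assumes issue numbers were kept during the move to Pagure. Unlike Step 3, "." and ":" are excluded from the captured part, so sentence punctuation stays outside the new URL:

```python
import re

# Trac ticket URL becomes a Pagure issue URL with the same number.
PATTERN = r"https://fedorahosted\.org/i18n/ticket/([^ \r\n\.:]+)"
REPLACEMENT = r"https://pagure.io/i18n/issue/\1"

sample = "Tracked in https://fedorahosted.org/i18n/ticket/42."

result = re.sub(PATTERN, REPLACEMENT, sample)
print(result)
# prints: Tracked in https://pagure.io/i18n/issue/42.
```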