From Fedora Project Wiki
Latest revision as of 17:15, 19 September 2016

What have I done?

The Fedora Wiki has a lot of content: it has existed since 16:24, 24 May 2008. However, much of that content is old, and there is an overall lack of structure and gardening.

My main goal was to clean up the L10N pages, but badly written links made it difficult to understand what links where.

I decided to convert almost all external links that point to the wiki into internal links.

I talked with Wikipedia administrators (thank you Saper), who suggested that I use this Python bot: https://www.mediawiki.org/wiki/Manual:Pywikibot/replace.py

Pywikibot configuration

1. Get the latest git version of Pywikibot: https://github.com/wikimedia/pywikibot-core/

2. Create fedora_family.py in the pywikibot/families/ folder with this content:

# -*- coding: utf-8  -*-
"""Family module for Fedora wiki."""
#
# (C) Pywikibot team, 2016
#
# Distributed under the terms of the MIT license.
#

from pywikibot import family


# The project wiki of the Fedora Project
class Family(family.SingleSiteFamily):

    """Family class for Fedora Project wiki."""

    name = 'fedora'
    domain = 'fedoraproject.org'
    code = 'en'

    def protocol(self, code):
        """Return https as the protocol for this family."""
        return "https"

3. Go to the scripts folder and add the i18n content:

git clone https://gerrit.wikimedia.org/r/pywikibot/i18n.git

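Replace external links with internal links

When you link from one wiki page to another, use wiki links rather than full URLs. See MediaWiki's link documentation: https://meta.wikimedia.org/wiki/Help:Link

For example, https://fedoraproject.org/wiki/Lohit_fonts should be written [[Lohit fonts]].

Why is this important? Because internal links enable these features:

  • Related changes (https://meta.wikimedia.org/wiki/Help:Related_changes): shows every change made to the pages linked from the current page.
  • What links here (https://www.mediawiki.org/wiki/Help:What_links_here): lists every page that links to the current page.
  • Linksearch (https://www.mediawiki.org/wiki/Help:Linksearch): searches external links; here it mostly helps to clean up fake external links.
  • Broken internal links for pages, categories, and templates: see the "most wanted" entries in [[Special:SpecialPages]].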

Detect errors

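Extract the full list:

  • https://fedoraproject.org/w/index.php?title=Special:LinkSearch&limit=5000&offset=0&target=https%3A%2F%2Ffedoraproject.org%2Fwiki%2F (about 2000 links)
  • https://fedoraproject.org/w/index.php?title=Special:LinkSearch&limit=5000&offset=5000&target=http%3A%2F%2Ffedoraproject.org%2Fwiki%2F (about 10000 links)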

Then build a text file listing each page name, written as [[page name]] (I used LibreOffice to do that).
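The same list can also be built with a few lines of Python instead of a spreadsheet. This is only a minimal sketch: the function name format_page_list is hypothetical, and it assumes you already have the page titles as plain strings (for example, copied from the LinkSearch results).

```python
def format_page_list(titles):
    """Return one [[title]] line per unique page title, in first-seen order."""
    seen = set()
    lines = []
    for title in titles:
        title = title.strip()
        # Skip blank entries and titles that were already listed.
        if title and title not in seen:
            seen.add(title)
            lines.append("[[" + title + "]]")
    if not lines:
        return ""
    return "\n".join(lines) + "\n"
```

Writing the returned string to pages.txt gives a file in the format described above.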

Command lines / regex

The command line is fairly simple (I use a text file containing the list of pages):

python pwb.py replace.py -file:pages.txt -regex -summary:"internal link cleaning" "PART ONE" "PART TWO"

Step 0: clean up badly written links that mix internal and external syntax (example: [[http://fedoraproject.org/wiki/My_page text]])

"\[\[http(s|)://fed([^\]]+)\]\]" "[https://fed\2]"

Step 1: convert external links that have alternative text (example: [https://fedoraproject.org/wiki/L10N/Teams/LowGerman Low German])

"\[https://fedoraproject\.org/wiki/([^] ]*?) ([^]]+)\]" "[[\1|\2]]"

Step 2: convert normal external links (example: [https://fedoraproject.org/wiki/L10N/Teams/LowGerman])

"\[https://fedoraproject\.org/wiki/([^] ]*?)\]" "[[\1]]"

Step 3: convert bare external links (example: https://fedoraproject.org/wiki/L10N/Teams/LowGerman)

"https://fedoraproject\.org/wiki/([^ \r\n]+)" "[[\1]]"
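Before running the bot on real pages, the four patterns can be tried locally. The sketch below applies them in order with Python's re module; the function name convert_links is hypothetical, and it assumes the bot treats the patterns the way re.sub does.

```python
import re

# (pattern, replacement) pairs copied from Steps 0 to 3, in order.
STEPS = [
    (r"\[\[http(s|)://fed([^\]]+)\]\]", r"[https://fed\2]"),
    (r"\[https://fedoraproject\.org/wiki/([^] ]*?) ([^]]+)\]", r"[[\1|\2]]"),
    (r"\[https://fedoraproject\.org/wiki/([^] ]*?)\]", r"[[\1]]"),
    (r"https://fedoraproject\.org/wiki/([^ \r\n]+)", r"[[\1]]"),
]


def convert_links(text):
    """Apply each step to the text, feeding the result into the next step."""
    for pattern, replacement in STEPS:
        text = re.sub(pattern, replacement, text)
    return text


if __name__ == "__main__":
    print(convert_links("[https://fedoraproject.org/wiki/L10N/Teams/LowGerman Low German]"))
```

The order matters: Step 0 first normalizes mixed links into external syntax so that Steps 1 and 2 can match them, and Step 3 runs last so that it only sees URLs that had no brackets at all.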

Obvious use case: migration to Pagure

For i18n alone, 488 links still point to the old Trac instance: https://fedoraproject.org/w/index.php?title=Special:LinkSearch&limit=500&offset=0&target=https%3A%2F%2Ffedorahosted.org%2Fi18n

To clean them up, we can use this regex:

"https://fedorahosted\.org/i18n/ticket/([^ \r\n\.:]+)" "https://pagure.io/i18n/issue/\1"
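The effect of this regex can be checked locally with Python's re module. This is a minimal sketch with a hypothetical function name, migrate_ticket_links. Like the regex itself, it assumes that a Trac ticket number maps to the same Pagure issue number.

```python
import re

# Pattern and replacement copied from the regex above.
TRAC_TICKET = r"https://fedorahosted\.org/i18n/ticket/([^ \r\n\.:]+)"
PAGURE_ISSUE = r"https://pagure.io/i18n/issue/\1"


def migrate_ticket_links(text):
    """Rewrite old Trac ticket URLs to the matching Pagure issue URLs."""
    return re.sub(TRAC_TICKET, PAGURE_ISSUE, text)
```

Because the capture group excludes dots and colons, punctuation that follows a URL in running text is left outside the rewritten link.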

Language template cleanup

Correct usage :

  • Page L10N is the main page; it has the template {{autolang|base=yes}}. This creates a template LANG/L10N with an automatically generated list of translated languages.
    • Page L10N/fr is the French translation; it has {{autolang}}

Detect errors

$ python3 pwb.py listpages.py -search -namespace:template -titleregex:"(/zh_TW|/zh_HK|/zh_CN|/yo|/wba|/ur|/uk|/tw|/tr|/tg|/te|/ta|/sv|/sr/sr@latin|/sq|/sk|/sl|/si|/ru|/ro|/pt_br|/pt|/pl|/pa|/or|/no/nb/nn|/nl|/ne|/nds|/ms|/mr|/mn|/ml|/mk|/mai|/lt|/lv|/ky|/kw/kw-GB/kw-uccor/kw-ucrcor/kw-kkcor|/ko|/kn|/kk|/km|/ka|/ja|/it|/is|/id|/ia|/hu|/hi|/he|/gu|/gn|/gl|/ga|/fr|/fi|/fa|/eu|/et|/es|/eo|/en_GB|/el|/de|/da|/cy|/cs|/ca|/bs|/brx|/br|/bo|/bn_IN|/bn|/bg|/bal|/ast|/as|/ar)+$"
What do you want to search for? Lang
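To see which titles such a pattern selects, it can be tested locally. The sketch below uses a deliberately short, hypothetical list of language codes instead of the full list from the command, and it assumes the pattern is applied like re.search; how listpages.py applies -titleregex internally is not verified here.

```python
import re

# Hypothetical short list; the real command uses a much longer one.
LANGUAGE_CODES = ["fr", "de", "ja", "pt_br", "pt", "zh_CN"]

# Build "(/fr|/de|...)+$" so that only titles ending in a language code match.
SUFFIX = re.compile(
    "(" + "|".join("/" + re.escape(code) for code in LANGUAGE_CODES) + ")+$"
)


def is_translated_title(title):
    """Return True when the title ends with one of the language-code suffixes."""
    return SUFFIX.search(title) is not None
```

With this list, Lang/L10N/fr is selected while Lang/L10N is not.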

Solve errors

python pwb.py replace.py -file:templates.txt -regex -summary:"lang template cleanup" "{{autolang\|base=yes}}" "{{autolang}}"

Then, to clean up useless templates:

Search in Special:UnusedTemplates for templates whose names start with Lang and end with /LANGUAGE_CODE