Plan of attack
Steps 1 through 4 are the ones I'll focus on. Steps 5 and 6 are for the case where I finish everything else in record time, have no bugs left to fix and have documented everything: a common scenario in the IT business.
Throughout this project I'm planning on working as closely as possible with the MoinMoin developers, to ensure my contributions get accepted into MoinMoin proper.
All development is done against the development branch of MoinMoin. This means that the Fedora wiki will need to be upgraded when that version comes out. Backporting is probably not sensible, since a lot of small issues around the code have likely been fixed to improve compatibility with different formatters.
Step 1: Generate valid DocBook
Currently there are a few problems, a.k.a. bugs, with the generated DocBook. The first step is to make sure that any page that doesn't contain macros is formatted into a valid DocBook file. This means fixing the bugs in the current implementation, and refactoring and cleaning up the code that generates tables. For details see ["MoinDocBookProject/Bugs"].
A lot of work will also go into writing unit tests for the different parts of the formatter, covering the most common cases (a sketch of what such a test could look like follows the list):
- Empty page
- Paragraphs
- Text styles like emphasis, underline, etc.
- Lists
- ...
- The page SyntaxReference
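As an illustration only, here is a minimal sketch of what such a test case could look like. The render_page() helper is hypothetical: it would run the wiki parser with the DocBook formatter over the given markup and return the serialized XML. The real tests will of course use whatever test infrastructure MoinMoin already provides.
{{{
# A sketch only: render_page() is a hypothetical helper that runs the wiki
# parser with the DocBook formatter over the markup and returns the XML text.
import unittest
from xml.dom import minidom

from docbook_test_utils import render_page  # hypothetical helper module


class DocBookFormatterTests(unittest.TestCase):

    def assert_well_formed(self, markup):
        """Render the markup and check that the output parses as XML."""
        xml_text = render_page(markup)
        doc = minidom.parseString(xml_text)
        # Each page is currently formatted as a DocBook article.
        self.assertEqual(doc.documentElement.tagName, "article")
        return doc

    def test_empty_page(self):
        self.assert_well_formed("")

    def test_paragraphs(self):
        self.assert_well_formed("First paragraph.\n\nSecond paragraph.\n")

    def test_text_styles(self):
        doc = self.assert_well_formed("''emphasized'' and __underlined__ text\n")
        self.assertTrue(doc.getElementsByTagName("emphasis"))


if __name__ == "__main__":
    unittest.main()
}}}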
Step 2: Aggregate multiple pages into one book
Currently each page is transformed into a DocBook article. The goal of this step is to make it possible to generate a book by specifying which articles will be sections of the book. This will probably be accomplished either by using the Include macro directly, or by writing a simplifying wrapper around it that hides the unsupported aspects of the macro.
There is some rudimentary support for the Include macro in the current DocBook formatter, but that partial support relies on breaking the DOM handling and causes other bugs. I will approach this problem from a very different angle, and the Include macro support as it stands now will be ripped out completely before I re-implement the functionality.
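To make the intent concrete, here is a rough sketch of the aggregation idea, not the eventual formatter code: it assumes every included page has already been rendered to a well-formed DocBook article, and simply collects those articles under one book element (DocBook 4 allows articles as direct children of a book, if I remember the content model correctly). The function name and interface are made up for the example.
{{{
# A sketch only: assumes each included page is already rendered to a
# well-formed DocBook <article>; build_book() is not part of any real API.
from xml.dom import minidom


def build_book(title, article_sources):
    """Combine already-rendered DocBook articles into a single <book>."""
    impl = minidom.getDOMImplementation()
    book_doc = impl.createDocument(None, "book", None)
    book = book_doc.documentElement

    title_elem = book_doc.createElement("title")
    title_elem.appendChild(book_doc.createTextNode(title))
    book.appendChild(title_elem)

    for source in article_sources:
        article = minidom.parseString(source).documentElement
        # importNode copies the whole article tree into the book document.
        book.appendChild(book_doc.importNode(article, True))

    return book_doc.toxml()
}}}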
An effort will be made to make sure the generated DocBook is valid, but as there is currently no way to validate a DOM against a DTD, the result will probably not be watertight.
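If pulling in an extra dependency for the test suite turned out to be acceptable, something along these lines could at least validate the serialized output against the DocBook DTD. This is only an option to investigate, not part of the plan, and it assumes lxml plus a local copy of the DTD.
{{{
# An option to investigate only: assumes lxml and a local copy of the
# DocBook 4.x DTD, neither of which MoinMoin itself depends on.
from lxml import etree


def is_valid_docbook(xml_text, dtd_path="docbookx.dtd"):
    """Validate serialized formatter output against the DocBook DTD."""
    dtd = etree.DTD(dtd_path)
    root = etree.fromstring(xml_text)
    return dtd.validate(root)
}}}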
It currently seems that each macro needs special-cased handling inside the DocBook formatter, so fixing the Include macro will not mean that other macros start working.
Step 3: Generate wiki markup from DocBook
The goal of this step is to make it possible to import DocBook articles, and possibly books, into the wiki. The converter will parse a DocBook document and generate wiki-syntax ASCII text as output.
There is no generic infrastructure for this in MoinMoin, but there is one other plugin that does a similar conversion. A recent version of MoinMoin adds support for a WYSIWYG editor. This JavaScript-based editor actually generates HTML, but that (subset of) HTML is then converted to wiki syntax.
I will research whether it is feasible to leverage this existing work, or to somehow build a generic conversion class for generating wiki syntax.
One issue will certainly be the sheer size of the DocBook specification. Even though the most common elements can be mapped straight to wiki syntax, the number of available elements is staggering (417 different elements, which can also have attributes). I will use the docs in the Fedora CVS to harvest information about which element types to prioritize.
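To give a feel for what the straight mappings could look like, here is a small sketch based on a naive recursive DOM walk. The handful of elements and the exact wiki markup produced are illustrative only; the real converter interface will be shaped by the research above.
{{{
# A sketch only: a naive recursive walk over a DocBook DOM, mapping a few
# of the most common elements to wiki syntax and passing everything else
# through as plain text for now.
from xml.dom import minidom

# Inline DocBook elements that map straight onto wiki markup.
INLINE_MARKUP = {
    "emphasis": ("''", "''"),
    "literal": ("`", "`"),
}


def convert_node(node, depth=1):
    """Turn one DocBook node (and its children) into wiki text."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    if node.nodeName in ("section", "sect1", "sect2", "sect3"):
        # Nested sections push their titles one heading level deeper.
        return "".join(convert_node(c, depth + 1) for c in node.childNodes)
    children = "".join(convert_node(c, depth) for c in node.childNodes)
    if node.nodeName in INLINE_MARKUP:
        start, end = INLINE_MARKUP[node.nodeName]
        return start + children + end
    if node.nodeName == "title":
        bar = "=" * depth
        return "%s %s %s\n" % (bar, children.strip(), bar)
    if node.nodeName == "para":
        return children.strip() + "\n\n"
    # Unmapped elements just contribute their text content for now.
    return children


def docbook_to_wiki(xml_text):
    return convert_node(minidom.parseString(xml_text).documentElement)
}}}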
Step 4: Split one DocBook document into multiple pages
Once converting one article/book into a single wiki page is working, I will work on converting a single book into multiple pages. The splitting will create one page for the book itself, which will simply use the aggregation method from Step 2 to include the actual text from the subpages. I will take ideas from DocsProject/WritingUsingTheWiki for how the book should be split up.
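A sketch of how that could fit together, under the assumptions above: the chapters of the book become subpages, and the book page itself contains only the Include lines that put them back together. The subpage naming and the plain Include macro calls are illustrative only; the actual layout will follow DocsProject/WritingUsingTheWiki.
{{{
# A sketch only: subpage naming and the plain Include calls are illustrative;
# convert_chapter is whatever single-node converter comes out of Step 3.
from xml.dom import minidom


def split_book(xml_text, book_page_name, convert_chapter):
    """Return a {page_name: wiki_text} mapping for a book and its chapters."""
    book = minidom.parseString(xml_text).documentElement
    pages = {}
    includes = []
    number = 0
    for chapter in book.getElementsByTagName("chapter"):
        number += 1
        subpage = "%s/%02d" % (book_page_name, number)
        pages[subpage] = convert_chapter(chapter)
        includes.append("[[Include(%s)]]" % subpage)
    # The book page itself only aggregates the subpages (see Step 2).
    pages[book_page_name] = "\n".join(includes) + "\n"
    return pages
}}}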
Step 5: Create a way to embed XML in the wiki syntax
To make it possible to represent information from the original DocBook in the wiki syntax, I have been asked to come up with some way to embed DocBook tags in the wiki markup.
This is a seriously complex issue. I will fill in the information here as it gets clearer. The following is more brainstorming than an official plan; please feel free to comment on it either here or on the wiki.
Update: I don't like this approach anymore, and I've outlined a similar but cleaner way in ../PassThroughBlocks. What you see below is outdated, left here because I don't have the time to clean it up right now.
It has been suggested that a special processing instruction would enable this feature, which would be disabled by default. This seems like a very good idea for minimizing the risk of breaking existing pages.
I really want to use a real XML parser to parse the XML tags in the wiki syntax. My current idea is to first replace every & that isn't already part of an entity reference like &amp;, &lt; or &gt; with &amp;, and then look for "< " and " >" and replace those with "&lt; " and " &gt;" (notice the location of the spaces inside the quotation marks), which should take care of the most common cases. If this isn't enough, all XML tags that don't use one of the predefined namespace identifiers would get their < and > converted into &lt; and &gt;. This way a page could still contain XML-looking text, as long as its namespace identifier isn't one of the predefined ones.
Then I would wrap the whole raw text in "fake" XML tags, prepend an "<?xml?>" declaration, and shove it into an XML parser. Every text node of the root element would get parsed by the usual wiki-syntax parser one line at a time, while the element nodes would get the special XML handling.
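Spelled out in code, the brainstormed escaping step could look roughly like this (keeping in mind the whole approach is outdated, see ../PassThroughBlocks). The entity list, the regular expression and the fake root element name are only illustrative.
{{{
# Only the (outdated) brainstorm above in code form; the entity list, the
# regular expression and the fake root element name are illustrative.
import re
from xml.dom import minidom

# An ampersand that is not already the start of an entity reference.
BARE_AMP = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);)")


def parse_mixed_page(raw_text):
    """Escape the obviously-not-markup characters, wrap, and parse."""
    text = BARE_AMP.sub("&amp;", raw_text)
    # "< " and " >" (note the spaces) are taken to be plain text, not tags.
    text = text.replace("< ", "&lt; ").replace(" >", " &gt;")
    wrapped = '<?xml version="1.0"?><wikipage>%s</wikipage>' % text
    return minidom.parseString(wrapped)
}}}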
These element nodes would get walked through, and each node would cause a call to a formatter function. The formatter API for this could be something similar to the current formatter API. It would first check whether the formatter has support for the namespace in question. If it does, it would collect the arguments (attributes) of that element into a tuple, take the first text node inside the element if one exists, and pass these as variables to the formatter, accompanied by the traditional on=1 variable.
It would then advance to the first sub-element of the element node and do the same to that. When the element node has no more text nodes or element nodes, it would call the formatter's method for this element node with on=0, and back out to the parent element.
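Again, this is just the brainstorm in code form: supports_namespace() and handle_element() stand in for whatever the eventual formatter API would actually be called.
{{{
# Only the brainstorm above in code form; supports_namespace() and
# handle_element() are stand-ins for a formatter API that doesn't exist yet.
def walk_element(node, formatter):
    """Call the formatter around an embedded element and recurse into it."""
    if not formatter.supports_namespace(node.namespaceURI):
        return
    attrs = tuple(node.attributes.items())
    first_text = ""
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            first_text = child.data
            break
    formatter.handle_element(node.localName, attrs, first_text, on=1)
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            walk_element(child, formatter)
    formatter.handle_element(node.localName, attrs, first_text, on=0)
}}}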
Step 6: Make the two-way MoinMoin/DocBook conversion lossless
In this step I'll add support for embedded tags to the DocBook-to-wiki-syntax converter, and add support for handling them to the DocBook formatter.
The issue here is that even with the possibility to embed XML, I'm not sure how one would go about adding the information that an image on the wiki is in fact a screenshot. If the image is put inside the XML, it will not show up on other backends (like regular browsing). Also, there is a huge number of tags in the DocBook specification, and writing handlers for each on both sides will probably be beyond the scope of this project. I will implement handling for the most common DocBook elements (as found in the fedora-docs CVS), so that extending it later will be simple.
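Purely as an illustration of the "extend later" idea: the converter could fall back to embedding the raw XML for any element it has no handler for, using whatever embedding syntax falls out of Step 5 / ../PassThroughBlocks. The embed_block() helper below is a placeholder, not a decision.
{{{
# Purely illustrative; embed_block() stands in for whatever embedding
# syntax Step 5 / PassThroughBlocks ends up defining.
def convert_or_embed(node, handlers, embed_block):
    """Use a real handler where one exists, otherwise embed the raw XML."""
    handler = handlers.get(node.nodeName)
    if handler is not None:
        return handler(node)
    return embed_block(node.toxml())
}}}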
Timetable
You can find out about the actual situation on MoinDocBookProject/ProgressReports. As you can read at the start of this page, only goals 1, 2, 3 and 4 are primary goals, so they are the ones included in this timetable.
- 26.5 Start
- Mon 12.6. Step 1 done
- Mon 26.6. Step 2 done
- Mon 17.7. Step 3 done
- Mon 7.8. Step 4 done
- Mon 21.8. All documentation and tests done. Project done.