Welcome to this extra OpenWeek session from the Documentation Team.
I'm Phil Bull, and I'm going to be talking about writing documentation in an XML format.
The topic is "Mallard from scratch: Writing XML documentation for the GNOME desktop".
This session is aimed at people who have never written XML before, but would like to get involved with writing system documentation.
See https://wiki.ubuntu.com/DocumentationTeam for more information on contributing.
We'll be concentrating on Mallard. This is a new format which has specifically been developed for the GNOME Documentation project.
We're hoping that it will be pretty useful for other open source projects too!
I'm going to go through the material pretty quickly. I'll stop a couple of times during the session for questions, and I'll make the transcript available online.
There is also a tarball containing example files and a few exercises available from here:
OK, let's begin!
2. XML basics
2.1 What is XML, why do we use it?
XML stands for "eXtensible Markup Language".
Markup languages are often used to give information on how parts of a text document should be displayed.
HTML is an example of a markup language. With HTML, you can mark up the text to tell the computer to display it in bold, or to change its colour.
So, markup languages let us add formatting and structure to a text document.
We want our documentation to have formatting and structure, so that's why we're interested in using a markup language.
But which markup language? For documentation, we're usually more interested in the structure than the formatting. The important thing is the content; a designer can format the document afterwards.
We also want to organise and link topics together easily (and automatically, if possible). Translation and outputting the document in multiple formats (such as HTML and PDF) are also desirable.
HTML isn't great for these things, so most technical writers use XML formats like DocBook and DITA. We're using Mallard, which is an XML markup language.
It provides a set of XML "elements" which let us define the structure and formatting of some text.
Once we've added these elements to the text, we can open the file in a viewer which understands the Mallard XML format. The end result is a help file, which your users can view and navigate on their computer screen.
See here for more information on what XML is:
2.2. What is an element?
Elements are the building blocks of XML documents.
Here is an example of an element:
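As a minimal sketch (the text inside is a placeholder of my own, not taken from the example files), a paragraph element looks like this:

```xml
<p>This is a paragraph of text.</p>
```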
The "<p>" is the start tag, and denotes the start of an element. The bit between the angle brackets (a "p" in this example) tells you what *type* of content the element contains.
Here, <p> means that the element contains a paragraph of text. Another example, <link>, means that the element contains a link. There are lots of different elements that you can use.
The "</p>" is the end tag, and tells you when the content has finished. It is identical to the start tag, except for the forward slash.
The bit in between the start and end tags is the "content". It can be all sorts of things, such as text, a link or a filename to an image.
Whitespace (like spaces and new-lines) doesn't matter much in XML. You can have your start and end tags on different lines from the content, and the element will still be valid. For example, this:
is exactly equivalent to the example I gave above.
Whitespace, no matter how much of it you have, is usually displayed as just a single space. This:
should be displayed as:
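To sketch both whitespace points with a made-up paragraph: the two elements below differ only in their line breaks and spacing, so a viewer should normally display them in the same way, as one line of text with single spaces between the words.

```xml
<!-- Tags on the same line as the content -->
<p>Some text with extra spaces.</p>

<!-- Tags on their own lines, with runs of spaces inside the text -->
<p>
  Some    text   with
  extra      spaces.
</p>
```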
Some elements don't have any content. These elements don't have a start and an end tag; instead, they have a combined tag, which looks like this:
Note the forward slash at the end of the tag here.
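Here is a small sketch of a combined tag. The xref value is a made-up page ID. In XML, the combined form means the same thing as a start tag followed immediately by an end tag.

```xml
<!-- Combined ("empty") tag: no content, so no separate end tag -->
<link xref="introduction"/>

<!-- The same element written with separate start and end tags -->
<link xref="introduction"></link>
```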
We'll see lots of examples of different elements soon.
2.3. Elements can contain other elements
Some elements can contain other elements!
This allows us to *nest* content, which is very important for structuring your document.
An obvious example is that you might have a paragraph, <p>, inside a section, <section>.
There are lots of other reasons why you might want to put one element inside another, though; for example, if you have some bold text in a paragraph, or if your section has a title.
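A short sketch of nesting, using elements that come up later in the session (the id attribute is explained in section 2.5, and the IDs and text here are made up): the <title> and <p> are nested inside the <section>, and the <em> is nested inside the <p>.

```xml
<section id="introduction">
  <title>Introduction</title>
  <p>This paragraph is inside the section, and this <em>emphasised</em> phrase is inside the paragraph.</p>
</section>
```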
2.4. Only certain elements are allowed in other elements
XML is a very strict markup. HTML tends to be very forgiving if you put the wrong element in the wrong place, or miss an end tag.
Not so with Mallard. There are rigid rules for what is allowed where, and how it must be used.
This is actually a feature! By insisting that everyone follows a strict standard, you ensure consistency and compatibility.
This means that you will sometimes need to refer to the Mallard specification to see what is allowed. It's not difficult once you're used to it, though.
An important example of the strictness of the standard is that only certain elements are allowed in a given element. This makes sure that the structure of the document makes sense.
For example, the <table> element cannot contain a <page>.
Some elements have compulsory elements too.
For example, the <section> element *must* contain a <title> element.
2.5. Elements can have attributes
As well as content (the stuff in between the start and end tags), elements can have attributes.
Attributes give extra information about an element.
They are used for lots of things, such as identifying specific elements, defining the URLs of links and providing alternative text.
Here is an example of an attribute:
The name of the attribute is "id" and the value of it is "introduction".
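For instance, assuming the attribute sits on a <section> element (the choice of element and the title are my own guess for illustration), it would be written inside the start tag like this:

```xml
<!-- The attribute goes inside the start tag: name, equals sign, value in quotes -->
<section id="introduction">
  <title>Introduction</title>
</section>
```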
Each element can have several attributes, but it can't have more than one attribute with the same name.
Some attributes are compulsory, or expect a value with a certain format. We'll see more examples of attributes soon.
Open the file Example-bad.page and see if you can identify what is wrong with the document.
Once you've had a go at correcting it, compare it with Example-good.page to see how you've done.
You can see a screenshot of how that page looks here:
We'll take a break now for questions.
3. A simple Mallard document
Let's write a simple Mallard document for ourselves.
Go to the "first" directory in the "mallard" folder and open the file index.page in a text editor.
3.1. The page element
At the top of the file you will see a page element with three attributes.
The xmlns attribute gives the XML namespace, which tells the computer what type of XML document we are using (Mallard, in this case).
The type gives the type of page, which can be either "guide" or "topic". We'll discuss these more later.
The id is required, and is a unique identifier for this page in the document (more on that later, too).
For now, leave it as "index".
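As a rough guide, the top of the file should look something like the sketch below. I'm assuming the page type is "guide" here, since index pages are normally guides; the file in the tarball may be laid out differently.

```xml
<page xmlns="http://projectmallard.org/1.0/"
      type="guide"
      id="index">
  <!-- The info section, title and content of the page go here -->
</page>
```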
3.2. The info section
The info section provides information about the document, such as its version and who wrote it.
You can put lots of different types of information in this section, but we're normally only interested in a couple of things.
So, let's make this document our own. Change the revision date to today's date, and bump the version up to "1.1".
Change the name and email address of the author too.
You hold the copyright on this document, so let's add some copyright information.
Somewhere in between the <info> start and end tags, add a <copyright> element, with two elements inside it, <year> and <name>. Add appropriate content.
Don't forget to add an end tag for <copyright> too!
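Putting those changes together, the info section might end up looking something like this sketch. The date, name and email address are placeholders, and the <revision> and <credit> markup is my assumption about what the example file contains. The <copyright> element follows the description above; check the specification for the version of Mallard that you are using.

```xml
<info>
  <!-- Version and date of this revision (placeholder date) -->
  <revision version="1.1" date="2009-11-05"/>

  <!-- Who wrote the page (placeholder details) -->
  <credit type="author">
    <name>Your Name</name>
    <email>you@example.com</email>
  </credit>

  <!-- Copyright information, with its own end tag -->
  <copyright>
    <year>2009</year>
    <name>Your Name</name>
  </copyright>
</info>
```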
3.3. The title and description
After the info section has been completed, we can start writing the document itself.
Every document needs a title, so let's add a <title> element after the info end tag "</info>".
Titles should be short but descriptive. Avoid unnecessary words!
In Mallard, we also give slightly longer descriptions of pages to supplement the title. The description should elaborate on the title, but not repeat it.
It's also a good place to use synonyms of the key words in the title, to help people searching for certain terms to find the right article.
Here's an example:
Title: Change the color of highlighted text
Description: Change the background color of text which has been selected with the mouse.
Use the <desc> element to add a description. This element belongs inside the <info> section, not outside it like the <title> element.
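Using the title and description from the example above, the two elements would be placed roughly as in this sketch. The page ID and type are made up for illustration, and the rest of the info section is left out to keep it short.

```xml
<page xmlns="http://projectmallard.org/1.0/" type="topic" id="highlight-color">
  <info>
    <!-- The description goes inside the info section -->
    <desc>Change the background color of text which has been selected with the mouse.</desc>
  </info>

  <!-- The title goes after the info end tag -->
  <title>Change the color of highlighted text</title>
</page>
```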
3.4. A section
Sections are used to structure longer pages.
Mallard is designed for "topic-based" help; each page in a Mallard document should cover one topic and nothing more.
It's not like writing a book, where sections flow from one to the next. Each page should stand on its own, although of course you can link to other topics.
As such, you shouldn't need to use sections all of the time, since your topics should be reasonably concise.
I've found that many of the topics that I've written aren't long enough to be split into sections.
Anyway, you'll need them for longer pages.
You can add a section just by using the <section> element, anywhere below <title>. You can add as many sections as you like to a document.
Remember to give each section its own <title>, which must be in between the start and end tags of the <section> element.
3.5. A section with some paragraphs
There's not much point having an empty section, so let's add some paragraphs of text.
This is as easy as adding <p> elements into the section.
You can add as many <p> elements as you like.
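A section with a title and a couple of paragraphs might look like this sketch (the ID and the text are placeholders):

```xml
<section id="first-section">
  <title>A section title</title>
  <p>The first paragraph of the section.</p>
  <p>A second paragraph. You can keep adding more in the same way.</p>
</section>
```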
3.6. A paragraph with italic text
All of the elements that we've used so far have been "block" elements or non-displaying elements.
Block elements, such as tables and lists, occupy a paragraph of their own. They can be thought of as the building blocks of a page.
Non-displaying elements, like most of the elements in the <info> section, aren't displayed on the page itself. They might be displayed on an "About this document" page, though.
Another type of element is an "inline" element. Inline elements are used to mark up parts of a string of text, but not necessarily the whole string.
They are used for adding links, making text bold and so on.
Here's an example of making text italic:
<p>Some of the text here is <em>italic</em>, but none is bold.</p>
The <em> (emphasis) element is used to make text italic.
You can't make text bold at the moment, because Shaun McCance (the author of Mallard) hates bold fonts.
3.7. The end of the document
The end of the document is simply the page end tag, </page>. To summarise, we have:
The page start tag, <page>
The <info> section.
The contents of the page, which might be <section>s, <p>s, or other block elements.
Maybe some inline elements inside the block elements.
The page end tag, </page>.
Compare your document with index.page in the "example1" folder. Are all of the tags in the right places?
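If you don't have the example files to hand, here is a minimal sketch of how the pieces fit together. The title, IDs and text are placeholders of my own, so this is not a copy of the file in the "example1" folder.

```xml
<page xmlns="http://projectmallard.org/1.0/" type="guide" id="index">
  <!-- Non-displaying information about the page -->
  <info>
    <desc>A short description of the page.</desc>
  </info>

  <!-- The page title comes after the info section -->
  <title>My first Mallard page</title>

  <!-- Block elements, some of which contain inline elements -->
  <section id="about">
    <title>About this page</title>
    <p>Some of the text here is <em>italic</em>, but none is bold.</p>
  </section>
</page>
```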
3.8. Displaying the document in Yelp
What good is the document if we can't display it?
At the moment, the only help viewer which has been modified to work properly with Mallard documents is Yelp, the GNOME help viewer.
You can convert Mallard docs to HTML, though.
To view Mallard docs, you need the versions of gnome-doc-utils and yelp which are included with Ubuntu 9.10. If you don't have 9.10, you can compile them from source yourself.
That's left as an exercise for the reader, though...
To open your Mallard document in Yelp:
Open a Terminal and change to the directory that your index.page is in, e.g.
Then, type the following and hit return:
Yelp looks for a .page document in the current directory which has the ID "index".
If your document is valid Mallard XML, you should now see something like this:
If not, look at any messages which might have appeared in the terminal. These might give you some idea of what is wrong with the document.
4. Other Mallard elements
There are lots of useful elements that you can use in Mallard. I'm going to briefly go through a few of the more common ones.
You can find some examples in the "elements" folder.
4.1. Lists
There are several different types of list. For a full list of them, see: http://www.gnome.org/~shaunm/mallard/mal_block.html#lists
The most important one is probably <steps>. This is used for procedural instructions, like the steps in a how-to.
You start the list with the <steps> start tag, and then put in as many steps as you like using the <item> element.
Each <item> element *must* contain one or more block elements, usually a single paragraph, <p>.
You can put other block elements in there, like an image or even another list!
It's best to keep lists simple, though. Complicated lists are difficult to read.
I've put lots of examples in the "elements" folder.
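Here is a small sketch of a <steps> list, with made-up instructions. Each <item> contains at least one block element, and the last item shows that you can put more than one in.

```xml
<steps>
  <item><p>Open the document that you want to print.</p></item>
  <item><p>Choose the printer that you want to use.</p></item>
  <item>
    <p>Start printing.</p>
    <p>An item can hold more than one block element, like this second paragraph.</p>
  </item>
</steps>
```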
One last thing to note is that lots of people forget that you need to put a block element in the <item> element. This is right:
<item><p>A list item.</p></item>
This is wrong:
<item>A list item.</item>
4.2. Links
Mallard has some interesting linking features, but we're going to concentrate on simple inline linking for now.
These links are displayed like hyperlinks on webpages; the content of the <link> element is clickable.
The <link> must have an "xref" or "href" attribute. These give the address which is opened when the link is clicked.
If you want to link to a webpage, use the href attribute. For example:
<p>Visit the <link href="http://www.ubuntu.com">Ubuntu</link> website for more info.</p>
You can also link to a different section on a page, or a different page in a document. Use the xref attribute for this:
<p>See the <link xref="intro#overview">introduction</link> for a brief overview.</p>
The "intro" part is the ID of the page that you want to link to (remember the ID attribute of the <page> element?)
The "overview" part is the ID of a section in the intro page. The "#" separates the two.
If you want to link to a section on the same page as the link, just leave out the bit before the "#":
<p>The final section of this introduction is an <link xref="#overview">overview</link>.</p>
In this case, you would have a section start tag that looks like:
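Going by the xref in the link above, the section being linked to would need id="overview". A sketch (the title and text are placeholders):

```xml
<!-- The id here matches the part after the "#" in xref="#overview" -->
<section id="overview">
  <title>Overview</title>
  <p>A brief overview goes here.</p>
</section>
```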
4.3. GUI elements
When you refer to part of a graphical user interface in some instructions, use the <gui> element.
This is most commonly used for buttons and labels.
Here's an example:
<p>Click <gui>Save</gui> and then close the document</p>
4.4. GUI menus
When you refer to a sequence of items that the user has to click through, like the items in a menu, mark up each one with <gui> and wrap them in a <guiseq> element to group them together.
In Yelp, this results in the buttons being displayed with nice arrows in between them. For example, this:
<p>Click <guiseq><gui>System</gui><gui>Help and Support</gui></guiseq> to get more help.</p>
will be displayed something like this:
Click System -> Help and Support to get more help.
4.5. Summary of others
There are lots of other useful elements.
Take a look at the spec for more information:
5. Multi-page documents
One of the strengths of Mallard is the ease with which you can organise information.
This is done by putting each topic on a separate page, and then linking the topics together into a document.
5.1. Structuring a document
The main aim of structuring a document is to put information where your readers are expecting to find it.
This means that they can get the information that they want quickly, and solve their problem with the minimum of fuss. The best help files are often the ones that users spend the least time reading.
Normally, this means that you will want to split information into self-contained topics. Each topic is displayed on a page of its own.
You also need some way of navigating between these pages, or grouping them together when they are about similar things.
To help you with this, there are three kinds of page to think about in Mallard: the index page, guide pages and topic pages.
5.2. The "index" page
The index page is the "first" page in the document. When a user clicks Help -> Contents, this is probably what they will see.
The purpose of this page is to provide a starting point for people to go looking for the information that they need.
As such, the index page will normally just be full of links.
You won't have to put these links in by yourself, though! Mallard pages can automatically link themselves in to the index page.
The <page> element of the index page has the attributes id="index" and type="guide".
When you try to open a document in Yelp (see section 3.8), it will look for a .page file with the ID "index".
5.3. Guide pages
The index page is a "guide" page. You can have other guide pages too, though.
A guide page is one which collects together links to similar topics, sort of like displaying the contents of just one chapter of a book (rather than the whole table of contents).
As with the index page, other pages (including other guides) can automatically link themselves into a guide page.
Guide page <page> elements can have almost any ID that you like, but must have type="guide".
5.4. Topic pages
Topic pages are where you provide instructions or explain concepts. They are where the user reads the information they are looking for.
Most of your pages will be topic pages.
Topic pages can link themselves into guide pages, as mentioned before.
Topic pages have <page> with type="topic".
5.5. What are reciprocal links?
Reciprocal links are a pretty cool feature of Mallard.
If you link to topic A from topic B, then topic A will automatically have a link to B added at the bottom of the page.
This is useful because it helps the reader to navigate between topics, and makes it easy to extend a document by simply plugging pages into it, without needing to modify any of the other pages.
5.6. Linking a page into a guide
To link a topic page into a guide, you need to add a <link> element into the <info> section of that topic page.
You *don't* need to put a link element into the guide page.
This <link> element is similar to the inline <link> element. This time, however, it can go anywhere inside the <info> element.
Now, it's a single "combined" tag, rather than an element with a start and end tag and some content:
<link type="guide" xref="index#behavior"/>
The "type" tells Mallard how to link the topic into the guide ("guide" just means to include it as a normal link).
The "xref" is similar to the xref in section 4.2, but it tells you where the link will *appear*, not where it is linking to.
In this example, the link will appear in a section with id="behavior" in the index page.
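To make that concrete, here is a sketch of the two files involved. The topic's ID, the titles and the text are made up for illustration. First, the topic page, which carries the link in its info section:

```xml
<page xmlns="http://projectmallard.org/1.0/" type="topic" id="scrolling">
  <info>
    <!-- Ask for a link to this topic to appear in the "behavior" section of the index page -->
    <link type="guide" xref="index#behavior"/>
    <desc>Change how the window scrolls.</desc>
  </info>
  <title>Change scrolling behavior</title>
  <p>The instructions for the topic go here.</p>
</page>
```

Second, the index page. It only needs a section with the matching ID and no <link> element of its own:

```xml
<page xmlns="http://projectmallard.org/1.0/" type="guide" id="index">
  <title>Example Help</title>
  <section id="behavior">
    <title>Behavior</title>
    <!-- Links to topics that point at index#behavior are added here automatically -->
  </section>
</page>
```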
5.7. Linking guides to other guides
Guides are linked into other guides in just the same way as topics are linked into guides.
Just use the <link> element in the <info> section, as above.
6. Where next?
APPENDIX. Reading the Mallard specification
This bit might seem a little complicated, but bear with me. It will all seem clearer when we look at a few examples.
Let's look at what the Mallard specification has to say about the <section> element:
We're looking at the grey box at the top of the page. Ignore the first few lines (starting with "attribute") and skip to the bit that says:
The "mal_info" means that elements of the type mal_info are allowed.
There are several elements of this type, which you can see by clicking on the link. The options include descriptions and links.
The "?" at the end of the line means that this element is optional – you can choose to have a mal_info element inside your <section> or not.
The next line is:
This is the title for the section, given by <title>.
Note that the line just ends with a comma. The lack of an extra character, like "*" or "?", means that the <title> element is compulsory, and there can only be one.
The next line is:
This means that any element that is of the type "mal_block" can go inside the section.
A "block" element is any element that occupies a paragraph of its own, such as a table, a list or a paragraph of text.
The asterisk "*" means that the section can contain any number of block elements, including zero.
For example, a section might contain three paragraphs and a list, or it might just contain a table, or any other mixture of block elements.
The next line is:
This means that a section can contain any number of other sections! Each of those sections has the same rules as a normal section.
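As a sketch of how those four rules play out in practice, here is a made-up section that uses each of them once. The IDs and text are placeholders, and I'm assuming that the optional info part is written as an <info> element containing a description.

```xml
<section id="printing">
  <!-- Optional: information about the section -->
  <info>
    <desc>How to print a document.</desc>
  </info>

  <!-- Compulsory: exactly one title -->
  <title>Printing</title>

  <!-- Any number of block elements, including none -->
  <p>You can print any open document.</p>
  <p>Check that the printer is switched on first.</p>

  <!-- Any number of nested sections, each following the same rules -->
  <section id="printing-both-sides">
    <title>Printing on both sides</title>
    <p>Some printers can print on both sides of the paper.</p>
  </section>
</section>
```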
What would happen if we tried to put an element which isn't explicitly allowed in the specification into a <section>?
Let's look at <file>, which is an inline element:
Inline elements are used to mark up parts of a string of text which is inside another element. For example:
<p>Open <file>example.txt</file> to see an example.</p>
According to the Mallard spec, inline elements (those of the type mal_inline) aren't allowed in <section> elements, since "mal_inline" isn't in the list of allowed types.
A section like the first one in Example1.page isn't valid, because inline elements like <file> aren't allowed in a section on their own.
However, the second section in Example1.page *is* allowed, because the <file> element is inside a <p>.
The <p> element is allowed in a <section> because it's a block element, and the <file> element is allowed in <p> because <p> can contain inline elements.
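The contents of Example1.page aren't reproduced here, so this is a sketch of the two cases with made-up IDs and titles. Both sections are well-formed XML, but only the second follows the Mallard rules described above.

```xml
<!-- Not valid Mallard: the inline <file> element sits directly inside <section> -->
<section id="not-allowed">
  <title>An invalid section</title>
  <file>example.txt</file>
</section>

<!-- Valid: <file> is inside a <p>, and <p> is a block element -->
<section id="allowed">
  <title>A valid section</title>
  <p>Open <file>example.txt</file> to see an example.</p>
</section>
```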