Gentler: A tool for systematic web authoring<1>

Harold Thimbleby

Middlesex University
LONDON, N11 2NQ

http: www.cs.mdx.ac.uk/harold
mail: harold@mdx.ac.uk

Abstract

We argue, with theoretical justification, that authoring hypertext and World Wide Web documents requires tool support if it is to be done well. Tools are essential for good design; without them iterative design and user testing are impractical to follow through because of the prohibitive costs of making even small changes reliably.

Gentler is one such authoring tool. It uses a database of pages and a page layout language, providing reliable design features including hypertext linkage and navigation. With Gentler as a concrete example, we introduce an important principle for design: dual requirements. Features that hypertext document readers find beneficial are beneficial for document authors, and vice versa.

Note. There are minor stylistic differences between this Web version of the paper and the text appearing in International Journal of Human-Computer Studies. This is because part of the process of publishing for the Journal involved paper. The paper was originally written in Gentler (discussed below), converted to Microsoft Word when it became desirable to take advantage of spell checking and other 'publishing' features, then submitted to the Journal electronically. The Journal printed the article on paper and made various changes (for example, preferring "e.g." to my "for example"). At the same time I converted the Word to HTML to put on the Web. Therefore the texts diverge in so far as Academic Press's editors modified my original text. There are no substantive differences; except for this paragraph and a note on new developments.

1. Introduction

The World Wide Web increases the numbers of users of hypertext systems enormously. Any phenomena that effects usability of hypertext has its effects magnified by factors of millions. The World Wide Web, like any new technology, poses design problems that have not yet been solved. There is a tension between how we think we would like to use it, what the technology is capable of, and what technical support is actually provided. Many see the limitations of the Web as primarily something to do with time delays in delivering information to users (e.g., Kellogg and Richards, 1995); this may be so, or it may be a symptom of an inappropriate use of the technology. In contrast, we do not see it as a usability problem that we cannot sensibly read books over the telephone - but we would if that was how we tried to use phones.

Brown (1996), discussing the development of his hypermedia system Guide, argues for a balance between giving users what they want, and for computing science researchers to create what they think users ought to have. This is a contentious issue within HCI; but Brown points out that concern for users' needs has driven the technologists to provide facilities for creating documents, not for maintaining them. Indeed, the more easily an author can create a large document, the sooner they meet the difficulties of its quality control.

This paper emphasises hypertext authoring, and identifies its difficulty as contributing to the difficulty of delivering quality hypertext documents to their readers. We describe a prototype tool that shows how many of the difficulties facing authors can be overcome.

The Web allows for a great variety of documents, and the tasks that are intended to be supported by those documents varies enormously. Included are artistic experiments in the use of the new medium, sometimes to celebrate its potential anarchy. This paper explores issues around document design for comparatively staid purposes. We will be particularly interested in Web documents that contain many pages and are creatively authored, rather than being generated automatically from databases. This paper examines the difficulties of writing such documents, and argues that special tools are required to write good hypertext documents. We discuss a particular tool intended to improve the flexibility and quality assurance of hypermedia documents. This tool, called Gentler, illustrates the concept of dual requirements in design, leading to features that both authors and readers of documents can use in similar ways. Dual requirements are important because they help design more complete user interfaces for browsing and authoring tools, and they also reduce the total design effort as two otherwise disparate areas are unified.

This paper refers to all hypermedia and World Wide Web documents as documents, and we refer to each unit of information as a page, following Web terminology. (Other systems use words such as node, screen or card instead of page.) A document usually contains many pages, and has one or more unifying themes. On the Web the boundaries of a document may be hard for a user of the document to discern, but for our purposes the boundaries are defined by the author's control over design and content. A document, in our terms, may reside on a single server, or may be distributed across several servers: on a distributed system it does not matter where the document 'is' (at least, whilst it is working reliably!) As a matter of style, we shall use the singular for the author of a document, and refer specifically to the end user of a document as a reader. Readers use browsers to view documents.

2. Cognitive challenges facing web authors

For the readers of hypertext, there is the problem of getting lost in hyperspace (Maurer, 1996; Theng, Jones & Thimbleby, 1996). Getting lost arises because a hypertext document provides the reader with many choices. In a conventional print document, the reader is directed and makes obvious progress (e.g., by reading and turning pages). In a hypertext document, each step gives the user more choices, and a reader soon has too many uncompleted options to keep track of. Worse, some apparent choices may take the user back to an earlier point in their reading with the result that the reader goes around in circles. Just like going round in circles when lost in the physical world, by the time a reader notices they are repeating themselves, 'turning round' means backtracking over an unknown number of steps, and creates further problems in itself. Getting lost in a document is an experience that can occur in a single session, or it can occur over months as reading a document is interleaved with other activities. Of course the Web is so large and varied that many users 'surf' and enjoy serendipity, but some users start with specific tasks, and getting lost for them is more serious and interferes with their tasks. Surfing for them is a euphemism for being lost.

For the authors, the management of a large document raises analogous problems: instead of knowing what to read next, authors need to know what to write next. As a reader can get lost in what to read next, an author can get lost in the threads of their own writing. Although an author may try to be more organised than a reader (authors may plan their writing), authoring takes longer than reading. In hypertext, readers typically only experience a very small part of the authors' total work. If users of hypertext get lost in their briefer and smaller excursions into hyperspace, it would not be surprising if authors did. Moreover, if authors get lost then the documents they create are likely to be low quality and ones that readers find particularly confusing.

Although a reader only follows a particular choice of links as they read a hypertext document, the author has at least two threads to write for each choice they introduce. After only ten choices for the reader, the author has up to a thousand places to continue writing. Thimbleby (1996) suggested that the incompleteness of Web documents is a wave front of expansion, never fully filled in, that is increasing more rapidly than authors can cope with. As we shall argue below, the problems of authoring are also apparent when an author tries to maintain (e.g., proof read, update, spell check) an existing document, not just when a new document is being authored. It is roughly what Green (1989) means by 'viscosity.' Viscosity is resistance to change; but when writing large documents, the organisation is not just difficult to change, it is difficult to create reliably, and it is difficult to check. It takes so long to create a substantial document, that the viscosity imposes itself well before the document is complete. Indeed, it is impractical to tell when a reasonably-sized document is complete.

On the Web, the authors' problem is so prevalent the consequences have become an accepted, if irritating, fact of life. Many Web pages link to non-existent pages, or they link to pages that are under construction, or they link to pages that have changed and are no longer appropriate. A common icon of the Web is the black and yellow stripes imitating a road block barrier, the wasp-coloured warning sign of an unfinished 'under construction' page.

2.1. Links and associations

All documents have many sorts of semantic associations that need to be managed by the author. The conventional hypertext link that takes the reader from one place to another "at the click of a mouse" is only one sort of association between different parts of a document. All associations require the author (or a tool the author is using) to keep two or more items of information related. An author can only do one thing at a time, so every association introduced to a document creates an item that the author may have to return to later to finish.

Print documents typically have few associations that cannot all be seen at once; for example, pictures associated with text are usually on the same page as the text referring to them. A print document can be read "from start to finish" by simply reading each page in sequence, and therefore it is straight forward to locate all associations, and take appropriate action on each one (e.g., to ensure it is consistent).

In contrast, hypertext and Web pages have many associated fragments of information. It is impractical to view all of them simultaneously - there is no paper page spread with a natural limit on its size. In fact, each link goes to further units, and so on indefinitely: associated material may be out of mind, not just out of sight. There is no systematic way for the author to read a hypertext document and systematically consider each association - unless the author has access to appropriate tools, for instance, of the sort we discuss in this paper.

2.2. Best/worst case analysis

We wish to make comparisons of the usability of various aspects of hypertext and conventional document authoring tasks. Yet the difficulty of performing any task depends on many factors. We have to decide what knowledge and skill the users, in this case the authors, have. Tasks and individuals vary: we have to make assumptions about statistical distributions. We have to consider error rates, fatigue, and so forth. Unless one is careful, the analysis of the difficulty of doing a task becomes so circumscribed by particulars that no general insight is revealed.

We shall find it more convenient to compare task complexity rather than measure usability. We shall make three assumptions for best/worse case analysis. First, that the user employs the best possible general method for performing the task without error; secondly, that their task is the worst possible; thirdly, that task complexity for a given user interface can be expressed as a function of some natural measure of task size. In other words, nobody, however skilled, could do better under the circumstances. This approach we shall call "best/worst case analysis." The aim is to obtain a functional measure of complexity so that we can compare the usability of various tasks performed under various user interfaces. See Appendix 1 for more discussion.

Suppose a document has n units of information (sections, chapters, pages, or even smaller units such as dictionary entries). As discussed in Appendix 1, it makes no difference to our analysis what difficulty writing the specific units of information, chapters pages or whatever, presents to the author.

We consider the task of an author introducing a new unit of information to a document.

Conventional document
If the author decides to write another unit (or to split an existing unit), they have to choose one out of n+1 places to position it in the sequence of units. A conventional document only has 'before' and 'after' choices for placing a new unit; if there are n existing units, there are n+1 available positions for a new unit. Thus, the complexity of adding another unit is O(n). This is the best/worst case; in the best/best case, the user might decide that the new unit comes after the current unit and there would be no 'choice' - except in the decision itself!

Note that a conventional document has such a simple structure that renumbering the units (e.g., the pages) can be done automatically. Following the choice of unit position, the author has little extra work to perform.
Hypertext document
There is no "place" with previous and next units, with no obvious (e.g., page or chapter) numbering, but rather a structure of links. Each link can go from any unit to any other unit, in either direction - four choices in all.<2> If the document has n units, there are 4ⁿ ways of linking the new unit with the existing units. The best/worst case complexity of adding another unit is O(4ⁿ).

Note that 2ⁿ⁺¹-1 of these ways can be rejected by a simple automatic check, assuming the author would like the document to be strongly connected (i.e., to ensure it has no 'dead end' pages and no unreachable pages).<3> Unfortunately, this only rejects a relatively tiny proportion of possible designs; worse, conventional tool support for detecting such cases is very tedious. It would be far better to use an authoring tool that avoids creating badly linked documents, rather than try to fix them up later!

Under our assumptions, then, extending a conventional document is a linear complexity task, whereas extending a hypertext document is an exponential task. The same simplifying assumptions were applied to both linear and hypertext documents: the comparisons are robust, with hypertext being astronomically harder to author than linear documents. Small hypertext documents are harder to extend than large linear documents. It makes a huge difference! In a sense, if you have "enough" time to write n units of information for a linear document, you have enough time to connect them together; but for writing a hypertext document, a some stage you cannot have enough time to add another unit to the ones already organised - because for some n the exponential complexity exceeds whatever you might have thought "enough" time allowed for.

But worse than this! Hypertext, and especially Web publishing, is not "one off" as print. A Web document is not first designed, then published. It is a dynamic document, that is expected to evolve and remain timely. Like developing any good interactive system, iterative design should be used (Nielsen, 1993). The continual re-design, whether motivated by timeliness, a commitment to improved usability, or to user-responsiveness as good marketing practice, means that the Web author is continually struggling with the exponential design complexity, unlike the conventional author who just once (or once per edition, which might be annual) works with the comparatively trivial linear complexity.

If we allow a Web document to be repeatedly re-structured as each of its n pages are introduced, then the complexity leaps from O(4ⁿ) to O(2^n(n-1)) compared to a continually revised linear document, which has a complexity of O(n!). (A factorial increases slower than nⁿ, which in turn is slower than 2^n(n-1)). If we allowed for deleting pages, the comparisons would be more extreme. Moreover, ifwe allowed that a linear document can have a structure (e.g., sections are inside chapters), we only achieve a worst case complexity of O(n^n-1) when we allow for revised nesting as well. This is still easier.

The World Wide Web is even harder to design for than conventional hypertext (as on CDroms) because there is little scope for author control over the user interface, as provided by the browser. Also, a Web document rarely stands in isolation like a CD can: it is part of the World Wide Web. The author of a Web hypertext wants to relate their document to the rest of the world. This obviously exacerbates the authoring complexity. The rest of the world is not only huge, but is a moving target. In contrast, the conventional print document - so far as the author is concerned - effortlessly slides into its correct place on the library shelf and into bibliographic indexing systems.

However we look at it, the hypertext authoring complexities seem prohibitively large. In fact, they are not realistically possible: if we conservatively allow a second per decision, there are some existing hypertext documents for which there seemingly has not been enough time to write! Indeed, revising a conventional document also looks suspiciously difficult. Yet people do write and revise quite substantial documents, even hypertext documents. The physical times of the unit tasks must be very brief, or our best/worst case analysis overlooked some factor. The former explanation is unlikely.

We conclude that authors must use strategies to control the complexity of document production. The author of a 200 page book does not consider all of the possible 200! page orderings, because they simply don't live long enough to make that number of decisions.

Centuries of development have provided us with useful organisational devices (e.g., alphabetical ordering, temporal ordering, and various naturalistic methods such as transcribed speech) that make the conventional author's job much easier. These conventions are also structures that are familiar to the readers of printed documents, and hence make reading easier.<4> Conventional documents that are naturally based on more complex structures (e.g., geographical information with spatial structure) contain linearised structures such as gazetteers, and the spatial structure is thereby typically mapped onto a linear page numbering (and a subsidiary co-ordinate system to aid visual search on a given page). Given such a structure, the space of choices the conventional author has to choose from is drastically reduced. For example, given a heading, a dictionary entry has only 1 place to go, not n+1.

Of course, conventional documents and hypertext documents should normally be written differently (e.g., Maurer, 1996), not just have different high-level designs; our comparison of print/hypertext authoring does not include the author's stylistic experimentation that is still required, and is still very much an open research issue. The hypertext author does not have recourse to helpful resources like the Chicago Manual of Style (Grossman, 1993) where most of the trade-offs have already been analysed, in this case for almost a century. There are few obviously useful structures for hypertext - other than mimicking linear documents or the simple cross-referencing of highly structured documents such as encyclopaedias or concept graphs (Gaines & Shaw, 1995).

If authors do not use such strategies for making planning hypertext feasible, then we must conclude that they do not choose optimal hypertext structures (there are simply too many to examine), or, perhaps, that there are no worthwhile usability gains in doing so: it may be (though I think very unlikely) that almost all hypertext structures are indistinguishable in performance for the user's tasks. In Simon's sense (1969), hypertext authors satisfice - with considerable losses to style, ease of use, quality. The question is, and which we address below, is whether authoring tools can improve their performance.

From the foregoing arguments, we think it plausible that readers "get lost in hyperspace" at least in part because hypertext authors do too.<5> Indeed, the authors' task is harder than the readers', since authors have to design a complete document with all alternatives anticipated, whereas a reader need only keep track of their personal course in their exploration of the hypertext. Changing requirements and iterative design further exacerbate the author's problems.

In summary, we conclude that structural tools are required for authors to create good Web documents of any size in reasonable time.

2.3. Quality control

Authoring is not just about creating documents, which we analysed above, but also requires quality control: checking and evaluating them. Any complex system requires iterative design, involving user testing and evaluation; hypertext is no exception. A document is designed for a purpose, and the author must check that the document is suitable for that purpose.

A document, especially a Web document, has evolving content, at least in part chosen to suit the readers' requirements (or to suit marketing criteria for those readers). These considerations suggest that authors should test their document designs. As with creating documents, testing documents is considerably harder for hypertext than for conventional documents.

Suppose, as part of the quality control process, we require a document to be proof read. A conventional linear document can be proof read easily: the proof reader starts at the beginning and reads sequentially to the end. If they need to pause, they can insert a single bookmark, so that they can resume proof reading from the same point later. This proof reading algorithm is very simple, O(n), so simple that it hardly takes any instruction to perform adequately. In contrast, there is no easy way to proof read a hypertext document - very few hypertext systems even provide any feature as easy to use as bookmarks. (Hypertext bookmarks are usually implemented to help return to pages, not to help make constant progress reading through a hypertext; see Thimbleby, 1992, for a fuller discussion.)

For concreteness, consider the Royal Society of Art's (RSA's) prototype Web site. Alternate incarnations of it are illustrated in Figures 1 and 2. In early 1994 it had only 26 pages. That document represented an evaluation task, had it been done manually, of at least 621 mouse clicks<6> to check its main navigation links plus the (conventional) overhead of proof reading the text of each individual page (and the time to check the several hundred index links). Six hundred checks is too high to be done reliably by hand.

Almost every document must be changed at a rate where manual checking cannot keep up. Manual checking would have been crazy! In summary, though not argued in detail, we conclude that structural tools are required for authors to manage good Web documents of any size in reasonable time.

Figure 1. Early attempt at an RSA page design. For illustrative purposes only, the page chosen has unusually little body text. The icons and the text that is underlined represent hypertext links.

Figure 2. Revised RSA page design. Compare with Figure 1.

What's new

The Society's lecture programme held at the House normally runs from October to May. Contact the Lecture Programme Office for bookings (direct line: 0171 930 9286).

RSA lectures 4:21 pm Thursday, May 2, 1996
"A Dream Fulfilled" - Lecture on the Globe Theatre, 12 June.

Start Right 11:22 am Thursday, March 14, 1996
Start Right and its agenda for action.

Figure 3. Main text from the What's New page; the text below the second rule was inserted by the ·news macro. Note that in this page, the dates shown are inappropriately the date and times the pages were edited, not the future dates of the items being reported.

3. Gentler

Gentler is a Web authoring tool that manages a collection of Web pages. Its design was driven by the aim of improving the usability of generated Web pages. In particular, it was designed to conveniently handle changing design requirements, as might arise through the feedback from user testing of page content or site design. In turn this led to features for authors; interestingly, most author and reader facilities supported by Gentler are duals of each other - as we discuss below.

Gentler is not a WYSIWYG editor, and in fact for page layout it has little to recommend it against commercial authoring tools. But page layout is only a small part of document production. Within Gentler, a page is just a "skeleton" of HTML text; running Gentler builds full Web pages from each skeleton, with consistent design and navigation links, as well as the appropriate HTML protocols (title, body codes etc.). Gentler includes a page editor, but its main advantage is the consistency and the ease with which design details such as structure and page layout can be changed throughout a whole document of many pages. It also has features to help manage a Web document being authored over a long period of time.

3.1. Gentler's user interface

Gentler is a prototype tool (see Appendix 2) and its user interface was not designed for general use. Fortunately few features require elaborate user interface support, beyond buttons, menus and editable text fields. Gentler provides a basic text-oriented HTML page editor, requiring a separate previewer, typically a commercial Web browser.

Gentler has a simple outliner, which allows the author to rearrange the document structure. Direct manipulation is used to move pages around the structure, much as in commercial word processors with outliners. Howsoever the structure is modified, Gentler ensures the navigation and other links in the Web document are correctly modified.

Gentler can insert HTML links directly for any page it has in its database (see Appendix 3). Other links, such as email URLs or references to documents elsewhere on the Web, can be typed by hand (though the editor provides a simple macro template system than can reduce the keystroke count). Gentler can follow HTML links provided they point to pages elsewhere in the database, and hence allows the author to navigate within a document much like a user working with a browser would.

3.2. Gentler's abstract document model

Gentler separates page content, page design, document structure, as well as several quality control and authoring issues. Conceivably these components of document design might be handled by different people, with different skills. Regardless, since these components are separated, the viscosity of authoring is greatly reduced.

A document constructed by Gentler can be imagined as a generalisation of a conventional linear document. A conventional document has sections and subsections, contents, index, running headers and footers, and so forth. A Gentler document is based on a sequence of skeletons, which may be classified as sections, subsections, and so on. These skeletons map onto individual Web pages, which Gentler links with navigation menubars and other automatically generated links as defined by the document style. The author's user interface to Gentler works entirely with the ordered tree that represents the nested linear document. A tree is more constrained than a general graph, so Gentler reduces the complexity of authoring. Gentler is not intended for authoring completely arbitrary documents, though almost any sensible document would embed a spanning tree that Gentler could author. And there is still plenty that can be done with trees (Thimbleby, 1994b).

Within pages, Gentler supports any number of style specifications, to convert the skeletons to Web pages in various styles. Thus Gentler can be used to do convenient experiments on document design, because a document's design can be changed easily and independently of its contents. Gentler can create documents with styles suitable for special purposes: draft Web pages with diagnostic information added, or linearised documents for convenient printing on paper.<7>

The style specifications can insert navigation menu bars, fish eye tables of contents, background textures, trademarks, frames and so on. Navigation links can be generated by Gentler so they require no specific work from the author; conversely, they are correct, maintained automatically, and provide uniformity for readers of complex documents. If an author changes the icon or thumbnail associated with a page, changes its title, moves it elsewhere in the document structure or even deletes it, Gentler takes care of the consequent navigational changes and updates the icons everywhere throughout the Web document.

Pages within a Gentler document can be references to external files (using URLs) and so can reside anywhere on the Web. So far as Gentler is concerned these external files are treated and linked to like all other pages, except that they are not created by Gentler. In particular, external files may themselves be Gentler documents, and can contain their own local contents and indexing information, as well as bookmarks, reminders and news (see below). External files are a convenient way of managing large, distributed documents that have been divided into more convenient chunks for authoring.

Gentler could have been a conventional batch compiler. It is, after all, just a means of compiling a set of pages, a design specification combined with a structure specification. However, Gentler is an integrated database, and can provide a further range of useful features. Gentler knows when any page is edited, and, significantly, it knows this automatically without imposing any overhead on the author. This knowledge can be used by the author to find old pages; equally, Gentler can construct "What's New" pages automatically, with annotations.

Gentler knows when a link is introduced (it provides tools to insert links conveniently), and it can insert reminders elsewhere in the document so that the author need not lose track of their current stream of thought. However, if an author chooses to write raw HTML, for instance to insert a link character-by character, Gentler does not parse it and provides no automatic support as it is typed (a summary can be provided at a later stage; see below). Any link, however created, to another page in a Gentler database can be followed, much as in the user interface to a standard Web browser.

3.3. Links and associations

The links and associations managed by Gentler are distinguished as follows:

Navigation links: help the reader of the document find themselves around. In Gentler, the author does not really need to be concerned with these links, since they are maintained automatically (though the author can change their design and where they are used on pages).
Indexing links: create index (or tables of contents) entries for the document. In Gentler, the author merely flags a page, word or phrase as indexable, and needs to do nothing else, since Gentler keeps track of the index and index entries.
Implicit associations: are links, usually between different sorts of media, that are not made formally explicit. In conventional print media, implicit links are indicated by juxtaposition and other associations. The HTML tag img inserts an image (picture) and the author should ensure that the image fits the context of its use. Gentler can associate titles and icons with pages.
External links: are hypertext links going outside the document, typically to remote Web sites over which the author has no control. Amongst external links, it is convenient to classify mail, CGI links and all other non-document links.
Local links: are links within the document. (Gentler 'knows about' the document and therefore provides several sorts of short cut to create local links. It is possible to enter a local link to a page specified by file name or by page title.)
Component associations: are links to various parts of the same page for the author, but which a reader may see in widely separate places. An example in Gentler is the "What's New" text. This obviously has to be linked to the new page, but its value to the reader of a document is that it is summarised elsewhere.

All associations raise management problems: as a document is edited, or as its structure is changed, they must be kept valid. Gentler has no problem ensuring the first three sorts of association (navigation, indexing, icons) are correct. For arbitrary associations, however, there is the problem of knowing the author's intended semantics: how is Gentler (or any other tool) supposed to interpret what the author wanted? It would be possible - as an extreme case - to use a bad link as an example to help write a hypertext style guide! Gentler's approach is to gather all links together into a single summary, giving the context and details of each link, thus converting intended semantic links into physical proximities. Whether these links make sense is left to the judgement of the author. The primary gain is that the associations are all in one place, and can be reviewed by the author in linear time, without having to also navigate around the document and keep track of which pages' links have been checked.

The summary is a good example of linearising a hypertext to systematise quality control. Also, the author can check that anchor texts 'make sense' - something that no automatic process can reliably do. An extract from the summary table for the RSA site's links is shown below; the summary is itself a HTML page, and the author can follow the summary links because they are actual hypertext links (indicated here by underlining). Note the use of text between dollars ($), which Gentler uses to create index entries.

Link and title of linked file where known

Hot text

Used in file(s) with title(s)

External link:

http://www.delphi.co.uk/delphi/stories/9508/16.Globe/intro.html

details of its reconstruction

lectures.html

RSA lectures

External link:

http://www.delphi.co.uk/delphi/stories/9508/16.Globe/walk_intro.html

'virtual tour'

lectures.html

RSA lectures

External link:

http://www.epsrc.ac.uk/epsrcp2.htm

$EPSRC$Engineering and Physical Sciences Research Council$

projects.html

Some current RSA projects

http://www.transcend.co.uk/llis/cfl

The Campaign for Learning

Details of the Campaign are now available

projects.html

Some current RSA projects

journal.html

The RSA Journal

RSA Journal

fellows.html

The Fellowship

RSA Journal

rsa23.html

Access for fellows

lectures.html

RSA lectures

"A Dream Fulfilled" - lecture on the Globe Theatre

rsa3.html

Highlights of the RSA Website

External link:

mailto:rsa@ftech.co.uk

rsa@rsa.ftech.co.uk

manage.html

RSA governance and management

External link:

mailto:rsa@rsa.ftech.co.uk

Lesley James

rsa35.html

Start Right Annex

From the link summary extract, it is easy to see some potential problems with this document. The author has italicised one link to the RSA Journal, and not the other. One of the external links to email addresses is apparently wrong, for the hot text (what the user sees) is a different email address from the one that is actually used! The table allows the author to see this easily (rather more easily than when the links are embedded in various pages throughout the document). Authors can click on the hot texts in the table to try out the actual link, to see what works. The right hand column provides links to the pages that contain the links themselves, so that the author can easily get to any problematic pages.

3.4. Page design language

Gentler's page design language is basic: it grew as the easiest way of specifying the page design for various sorts of pages in various sorts of document. And bare in mind that I did not want a language so onerous to implement that I never reached the time to put it to use!

To generate a Web document, Gentler calls a chosen macro once for every page in the document database. An author might provide several such macros if the document is intended to be presented (or tested) in various alternative forms.

A simple page layout macro might say

·pagebody
·menubar

in which case each page of the output would be the unadorned skeleton page text followed by the standard navigation menubar. This text would be placed in each page's associated Web file, and the navigation menubar would correctly link to the pages in whatever order and structure the Gentler document specified. Gentler ensures that generated pages have suitable <head>, <title> and <body> codes inserted automatically; the author does not specify these in a page design.

Typically a page design would be more sophisticated than this example suggests, and would take advantage of various properties of the page that Gentler knows about, such as whether it has news annotations, whether it has children in the document hierarchy, and so on.

Conditionals can be used to create both flexible and consistent page designs. The page layout which resulted in Figure 2 is, in part, as shown below, with added explanatory comments in italics:

·homepage? ·pagebody The home page has its own special design, so its body is just copied
·!homepage? Any other page has menubars and all sorts...
[ ·menubar
  <p>
  ·hastext? Only include the RSA logo for pages with text
  [ <table><tr>
    <td valign=top>·macro? [RSA edgewise logo]</td>
    <td valign=top>
      ·section? <h1>·title </h1>
      ·subsection? <h2>·title </h2>
      <hr>
      ·pagebody
    </td>
    </tr></table>
  ]
]
... Now put out the corporate image and table of contents...

The language currently does not have general expressions, such as "home page and not hypertext." Instead the author has to write complex conditions out in full; this condition could be written as "·homepage? ·!hypertext? [text to process]" The language is sufficiently expressive, but it is not easy to use.

The author can create new macros. A typical use is to define the corporate logo as text or images in HTML, as done above. By having a single macro with conditions the author can keep all uses of it consistent easily. Macros are just text that can be edited, copied and pasted as normal, so it is possible for authors to share their style macros and hence spread good practice.

3.5. Reminders and annotations

Each time an author decides to refer to another page, they give themselves an authoring commitment they probably need to return to at a later time. Sometimes, the author will merely want to refer to some existing text, but more often the act of referring also entails the referred-to text to be modified, or even to be created. When an author creates a link (or any other sort of association they will need to manage) in Gentler, they can create a reminder.

A reminder is a piece of text describing what the author wants to do. Like HTML comments, Gentler's reminders do not appear in the readers' view of the document, but, unlike comments, Gentler monitors them. For example, if a document contains any reminders, Gentler checks with the author before it creates a Web version of the document from its database. It may be that the author has some tasks to complete before Gentler should proceed, and Gentler makes it easy for the author to get to any pages with reminders. Since an author may spend many weeks writing a large document, the reminders are descriptive and are automatically dated. The author can add reminders for any purpose, and they can be edited in any way at any time.

Entering reminders could be tedious, but as Gentler's user interface enables authors to insert hypertext links within a document very easily, Gentler knows about both the source and destination of the links as they are inserted. So when the author refers to another page, Gentler asks if it can insert a reminder in that page, or if it can take the author to the referred-to page and put a reminder in the original page. Thus, in many cases, the author gets the benefits of reminders for no effort.

Gentler provides various user interface tools for reminders, so that they can be created as well as located easily within a large document. Gentler can add reminders to the current page, any page, or to pages recently visited (as in: the author has jumped to a new page, but wants to put a reminder on the page they came from). In the special case that the author creates a link with a reference (e.g., file.html#reference), then Gentler automatically creates a dated reminder text that the reference should be added to the target page.

3.6. Overviews

Users - whether authors or readers - may like overviews of documents. Gentler provides full tables of contents as well as several forms of fish eye contents (contents centred around the current page, giving more detail locally, less detail for pages 'further' away). Similarly the author wants to find their way around a document. Gentler's user interface provides an outliner and browser bookmark files. The bookmark files can be loaded into browsers and allow a user (reader or author) to move around a document very easily using hierarchical menus. Gentler can generate bookmark files that include the document's reminders (for authors) or the news (for users). Annotated bookmark entries are flagged, and the flagging carries on up the menu hierarchy so that it is easy to locate annotated pages.

Bookmark files have been found useful when Gentler has been used to generate presentations for live audiences; they make it very easy for the lecturer to navigate around the talk, or to go to out-of-sequence pages to help answer audience questions.<8>

Of course, very large webs may need special treatment. For a sufficiently large web the whole table of contents might be a navigational problem in its own right! At present Gentler does not provide breakdowns of the table of contents.

3.7. Supporting usability analysis

Gentler can generate Mathematica specifications of document linkage. Mathematica is a flexible, symbolic mathematics package (Wolfram, 1991); used with Gentler it is possible to visualise link connectivity as graphs, work out network statistics - such as average number of links between pages - and so on. Gentler's use of Mathematica specifically is not important; the point to be made is that a tool such as Gentler can support mathematical insight into hypertext document design and quality control.

How easy is it to get lost in the RSA pages? As yet we do not know, and anyway such answers would have to be compared with figures for other documents and appropriate user tasks. However we can ask analytic questions that relate to getting lost and other usability problems.

Suppose a user accidentally clicks on a navigation button, thus going to a new page. This scenario is likely in the RSA pages because of the design of the navigation bar. As can be seen in Figures 1 or 2, the navigation bar has an icon "more detail," but this only appears when the page has a subsection giving more detail. If a user keeps clicking in the place of "more detail," on a page without a subsection they will accidentally click on the "next topic" icon, which would appear in the same place (except on the very last page, which has no next topic). Should we redesign the navigation bar, so that the "more detail" is always present but sometimes dimmed or crossed-out, and navigation icons have a consistent physical position on all pages? To explore alternative designs, we can start by asking questions whether the cost of the error is significant. For example, "If this page is not where the user wishes to be, how many mouse clicks does it take to get back (without using the browser's "go back")?"

For the RSA pages, the answer is 1.09 (it cannot be less than 1.0). In other words, if page x is linked to page y, usually page y is linked back to page x. Interestingly this near-symmetry was not a deliberate design decision, but emerged as a consequence of the overall design of the RSA pages. The number would be exactly 1.0 if there was a link back on every page; for the RSA pages there isn't because subsections are not connected to following sections in both directions.

There are a wide range of mathematical properties that relate directly to usability and to quality control procedures. In Mathematica it is easy to confirm that the network of links is strongly connected, a property that requires that there is a linkage route from any page to any other. If a document was not strongly connected, a page might be inaccessible to a user, or a page might be a 'dead end' with no continuation (other than resorting to a browser's back button). Strong connectivity is a simple property (and many commercial web authoring tools only check for a rather restricted connectivity<9>), but other properties are more complex and are not supported on any commercial tools. We mention just a few, which are trivial to determine with Gentler and Mathematica:

Travelling salesman tour
A proof reader wants the shortest 'recipe' to check the contents of every page. This recipe is the travelling salesman tour. In special cases, it is possible to visit each page just once: in which case the document is said to be Hamiltonian. The RSA site is Hamiltonian.
Chinese postman tour
Gentler can summarise all links of a document in a table, but this lifts them out of context: a proof reader might further want to see the links and read their surrounding material in the order a user of the document would follow. The proof reader will want the shortest 'recipe' to check that every link makes sense. This is the Chinese postman tour. In special cases, it is possible to test each link just once: in which case the document is said to be Eulerian. The RSA site is not Eulerian.
Spanning tree
A linearised version of the hypertext document may be required. Linear documents, in fact, have nesting (section, subsections, etc.), and can be represented as ordered trees. Given a hypertext document, the author may wish to find a spanning tree that can be presented to the user as a conventional print document.

Most documents will have many spanning trees. Of all spanning trees, an author may want to choose a tree rooted on the home page and optimised, say, to have minimal depth (to create a linear document with minimal section nesting). Gentler is based (on might say rooted) on spanning trees.
Domination number
Gentler's navigation bars can refer to specific pages (such as 'home'), as well as to adjacent pages (such as 'next topic'). A document design might call for the smallest navigation bar that gives the best access to the document. A navigation bar based on minimal dominating sets is one way to do this (depending on what "best access" is taken to mean). Since navigation bars should provide a conceptual overview, it is arguable that the domination number (i.e., the cardinality of the minimal dominating set) of the document is an indicator of the cognitive complexity of the document regardless of the details of the actual navigation features chosen.
Median vertex, central vertex, cut-vertex
Web document pages are vertices in the document graph. Graph theory identifies many interesting classes of vertex, such as median, central and cut-vertex (or hinge).

If we measure the distance from a page to each of the other pages, and sum this distance, then the pages with minimum total distance are the median pages.

Each page has an eccentricity, defined as its maximum distance from other pages. We define the radius of a document as the smallest eccentricity. The central pages of a document are then those pages whose eccentricity is the radius of the document. In other words, the central pages can be used to reach all pages in the document as directly as possible.

A cut-vertex is one whose removal would disconnect the document. In other words, parts of the document can only be reached via the cut-vertex.

All these classes of pages would be likely to be prominent pages, since they are closest to the rest of the document, or on critical paths for exploring it. Just how prominent they are in practice would depend on the users' tasks. For different sorts of task, different sorts of vertex might be of interest to the document author or reader; in many cases, the identification of the interesting pages is then a routine application of mathematical analysis. One might automatically construct a 'home page' that links directly to these pages.

User interface features of browsers change the structural nature of the document as presented to the user. If a browser has a 'back' button then the properties identified above are changed. It is interesting to wonder whether features like 'back' have been introduced to, for example, reduce eccentricity, because authors find it difficult to provide symmetric links, or because the properties of documents optimised for reading and writing are different. Authors may be confused by symmetry, but readers may find it beneficial? Or perhaps both authors' and readers' apparent requirements are too easily affected by the features provided by their respective tools? Certainly all users adopt strategies to make satisfactory use of their tools, and this may be causing a divergence in the document structures they find most convenient to read or write, respectively. We shall propose below that for system designers it is productive to consider user interface features that are equivalent for authors and readers. (Quite deliberately, Gentler provides a 'back' feature for authors.)

Mathematical analysis need not be dryly formal. It can be used to visualise information in more effective ways, and hence stimulate designer's intuitive processes. In addition to mathematical analysis, Gentler and Mathematica can produce movies based on Web site access logs or any other data. This allows authors to visualise how readers navigate around documents. Visualisation allows large sites to be better understood by their authors. Movies of site access would be an aid for informed iterative design.

3.8. Comparison with other tools

A confusion in human computer interaction today is the distinction between research and commerce. Many research tools can be unfavourably compared with superficially superior commercial tools, or with tools that are superior in limited ways. Commercial marketing together with thousands of programmers results in very attractive packages, but ones that may yet have specific problems that can be useful topics of research. Gentler does a useful job, parts of which might be better done, or done more colourfully by commercial tools, but its aim was to be a research tool to explore authoring issues.

Drakos's LaTeX2HTML is a system that converts Latex documents into HTML Web documents. Drakos argues (1993) that many linear documents are linear only because of the constraints of paper and that, in fact, documents have intrinsic structure compatible with hypertext. This is a similar view to that underlying Gentler: that a document embeds an ordered tree of nested pages, and that this tree can be converted to a basic hypertext immediately. Given that LaT_EX runs in T_EX, a powerful programmable typesetting program, LaTeX2HTML has a much richer macro language and related capabilities than Gentler, but LaT_EX's power is only available in the linear document forms, as LaTeX2HTML itself cannot interpret arbitrary LaT_EX commands. Thus it is considerably more restricted in its Web design options.

LaTeX2HTML starts with LaT_EX source text files and generates linked pages of HTML, which it does in a relatively fixed way. Gentler has two advantages here: its interactive database means it knows more about the components of the document, such as their dates, reminders and so forth; secondly, the Web design (e.g., the use of the navigation bars) is entirely under the author's control.

LaTeX2HTML was developed for the practical purpose of moving legacy LaT_EX onto the Web, and (so far as I can tell) is not part of a larger research programme.

Creech's (1996) proposed CLT/WW approach is interesting due to its different philosophy. Authors are assumed to manage documents entirely, and this system attempts to monitor the quality of the resulting documents. Of course this raises problems like the author renames a file that was previously linked to, or has changed its title, or has deleted it altogether. These problems do not arise in Gentler because, first almost all of the links are generated by Gentler (so they are correct and don't need checking), secondly, those links and files that are created or edited by the author are monitored and any problems are fixed as they occur. There is no need for a Web walk to find page changes. On the other hand, as presently implemented, Gentler does little to help manage links to external files around the world (though it can summarise them conveniently for its draft document design macros).

One might instead invert the approach, and be 'user centred' rather than 'author centred.' The Atlas approach proposed by Pitkow and Jones (1996) creates a database of the entire Web. Atlas could help identify and fix a document's links with the rest of the world, though it would not help manage links within a document. A major problem with the Atlas approach is scaling it to be both timely and of useful scope for managing documents.

Mea et al. (1996) describe a system broadly similar to Gentler, for generating Web documents for use in anatomic pathology. Like Gentler their system, HistMaker, is implemented in HyperCard. Unlike Gentler it provides a standard, but comprehensive, document structure by generating additional HTML around skeletons (!) written by authors. It seems that the document design is fixed; on the other hand, HistMaker is much more domain-oriented, for example in requiring the author to supply certain components of a document such as a patient's clinical history and pathology.

Hyper-G (Maurer, 1996) is a sophisticated system that is an alternative to the World Wide Web, though it can create Web documents. Its hypertext management and separation of linkage from content can be compared with Gentler's approach. Hyper-G requires much more support, commensurate with the features it offers. My understanding of the project's philosophy is that it has a solid foundation that simplifies many document management problems that are also simplified by Gentler, but it is now having to provide many features and to support many new standards to stay credible. In contrast, Gentler is very simple, does only hypertext authoring, and can leave all other advances in Web technology to the market leaders - it is not an alternative to the Web but part of it.

4. Author/reader dual requirements

Dual design requirements arise because readers and authors do semantically similar things. Both navigate documents over periods of time, so many of their problems can have similar, or dual, user interface features to support their activities.

"What's New" information (e.g., dated lists of news updates, typically hypertext links to further information) is a feature intended for readers to help locate new material. An author may want to find pages that are obsolete; this, then, is a dual design requirement. Dual features are reminiscent of the programming language design Principle of Correspondence (Tennent, 1981), which points out to designers possible inconsistencies or deficiencies in a language design. Features with corresponding semantics could correspond more closely than they typically do.<10> When comparing author and reader requirements, these inconsistencies are only visible to the designer of tools, since rarely will either reader or author, unlike a programming language user, be in a position to see both at once and experience the deficiency.

Gentler helps the author create "What's New" information for the reader by associating four pieces of information with every page: the time it was edited by the author, the time the author would like a reader to think it was edited, a descriptive text of how or why the page is new, and a flag that says whether the page is new. The flag is required so that an author can leave the descriptive text around for future use, but 'out of sight' of a reader.

Gentler collects the active descriptions together and inserts them into pages using the ·news macro; that is, the macro provides a summary of any pages that have news items. Gentler ensures that the dates, texts, time ordering, and cross-referencing of new pages is consistent throughout a document. For example, every table of contents flags new pages with icons to draw a reader's attention to them. Little 'NEW' flags can be seen in Figures 1 and 2. Figure 3 illustrates the text of the document's news page.

In Gentler, the dual of the reader's "What's New" feature is the author's reminders. Unlike "What's New," reminders are active features for the author: Gentler reminds authors of what they had planned to do next (they were discussed more fully in §3.5 above). Of course, readers might also like more active help to be available to help them manage their reading a hypertext document over a period of time. A more proactive approach might record when they have choices so that later they can come back to them to review earlier reading decisions. Or, an author might like a Web site to know what is new for them, and to direct them to the updated pages automatically - for this is what Gentler is already doing for the document's authors.

Thus, considering the dual of any feature can be profitable and reciprocally:

Support automatic generation of "What's New" - a conventional requirement so that readers can find what is new, including what they have not yet read.
The dual is to support authors, so that they can find what is new or even not yet written.
But in the dual domain (here, the authors') there are differences. Gentler is interactive, and the dual of "What's New" could be made interactive. This, then, suggests interactive (or other extended) variations of the original concept.
Hence taking a dual requirement from the one domain to the other or vice versa, new design ideas may emerge. This process can iterate until each required feature is completed in each domain.

If what readers and authors did was exactly the same then they might as well have the same user interfaces. The interest of duality is that authors and readers do not do exactly the same things, but that they are sufficiently similar for cross-fertilisation to be profitable. Duals raise creative design possibilities. We do not require duals to be precise; it may be that some dual requirements are impractical or lead to unnecessary duplication, or perhaps for some independent reason (such as security) duals cannot be used. Inevitably, duality is inexact, and there may be several correspondences for any given requirement.

Another example. Navigation underlies all non-trivial tasks the reader undertakes. A reader navigates, following their train of thought, and makes choices which route through the hypertext to take. They leave behind them a stack of open paths that they did not take. Depending on their task, which could be anywhere in the spectrum from surfing to systematic exploration, they may need to return to review their earlier choices. A history list can help do this. Dually, any non-trivial authoring of hypertext will involve following links - in fact, generating new links. As an author follows links, they will edit text, and like the reader, they will leave behind them the paths they did not take. The management of this history stack leads to dual requirements. A history list for the author could help the author keep track of their various writings in the document. Quite likely the history list for the author might be more detailed than for the reader, since the system that builds the history list has more to know about the links (e.g., whether the author has named the links or not) than in the reader's case. Indeed Gentler provides a history list for the author, and provides a history, chronologically ordered, list of (the author's choice of) new pages for the reader. Tauscher and Greenberg (1997, this issue) consider history lists in much greater detail.

Appendix 2 (on the implementation of Gentler) mentions another important dual.

5. A sample document

The Royal Society of Arts pages are organised so that they can be read as a conventional document, which has prelims, a table of contents, and then a sequence of sections and subsections, then indexical material such as "What's New" and the main index. This entails several advantages:

The document has a familiar structure;
The document can be read sequentially and has been written so that there is a major 'trail' making sense of the RSA's site.
The document can be printed on paper and related easily to the Web version (this has been found important for proof reading);

Of course, the linear sequence is only one way of viewing the document on the Web and making sense of it. A user can jump around as if it was a full hypertext system, which indeed it is. In particular, each page has a RSA logo, email contact point and a fish eye table of contents showing major sections of the rest of the document. This allows users who land on any page (e.g., after using a search engine) to know where they are and where they can go to easily. For those readers who continue browsing the RSA pages, following some route, each page has a navigation menubar that can be used to get more detail of the current topic, move to 'previous' and 'next' topics, or move to major landmarks (such as the "What's New" page) in the document.

5.1. Alternative designs for the RSA pages

Figure 1 shows an early version of a page from the RSA site; for the purposes of illustration it is a very brief page with little content. Gentler constructed the RSA site easily, and I was proud enough that the system worked at all, let alone generated pages that looked well designed! User surveys, however, revealed infelicities. For example:

The top navigation menubar had a left-to-right model, yet the home page's icon was to its right, giving the impression that it was 'further' on than the index. It should be to the left of the menubar.
The page layout was unimaginative.
The page's table of contents was really obscure. My intention had been to show pages earlier than the current one to the left, the current one in the middle, and later pages to the right. But HTML restricted my design choices, and I was blind to the obscurity of the layout originally chosen - this was in 1994, and HTML standards were more restricted then. It gives users an idea that sections are nested. We need a clear table of contents (if at all) and one that does not need an explanation on how to use it every page!
The section's "Start Right continues"-headed table of following subsections would be clearer as a part of the page's table of contents (seen lower down on the page), thus making an integrated "fish eye" table of contents.
The table of contents had a vertical organisation (top is early in the document), whereas the menubar was horizontal (left is early in the document).
The corporate logo at the bottom used the wrong font (as chosen by the browser, not the author). It should be a scanned image.
The lower menubar duplicated everything in the table of contents - as well as the top menubar. Why have all three? (The intention was to try to reduce users scrolling.)
Overall, the page looked cluttered and had lots of hidden conventions.

In Figure 2, we see how Gentler has been used to improve the page layout. Note that all pages follow this design, though we only show one in this paper. The improvement was achieved through very minor changes to the style macros, and these changes were localised in one place, and were easy to manage. It is, of course, still possible to improve this design. In future research it is planned to take advantage of Gentler's ability to create many stylistic representations of the same document (cf. Nielsen, 1995), so that focused questions of style might be empirically addressed.

The original page design made the subsection's table of contents look awkward. A consequence of this was that I could see no easy way of generalising it, and allowing Gentler to handle documents with sections, subsections and sub-subsections to any depth. The new page design can clearly handle sub-subsections to any depth, and can do so uniformly. It is pleasing that a graphical design choice that looks better is also semantically easier to generalise.

6. The future was yesterday the day before

Gentler is a step towards a larger vision. To continue the evolutionary trend whereby Gentler becomes easier to use, more powerful, and simpler - particularly in a world of competing products - its design, and the needs of its users and authors needs to be continually re-appraised.

6.1. Distributed design

Gentler is a centralised authoring tool. Ironic, then, that it was designed to support the generation of quality documents on the most distributed multimedia medium the world has ever seen! The most important development of Gentler will be to allow it to support distributed document authoring. Of course there has been much work in CSCW (computer supported co-operative work) we can build on.

Gentler separates page design from their contents, but it does not separate annotations nor document structure. Also, though it provides a page layout viewer (by invoking a commercial Web browser), it does not provide a WYSIWYG page editor. Given the huge commercial availability of WYSIWYG editors, the further development of Gentler will not compete, but will build on the separation of content from structure that already underlies its philosophy. The current separation is 'small' - the information has to reside in the same database. When developed further, the page design and structure will be specified anywhere on the Web. In turn this suggests various mechanisms for inheritance and overriding of design elements to solve potential conflicts among the distributed document specifications.

6.2. Design specification

The current design language is nasty. Future versions of Gentler will improve it. In particular, the page design mechanism requires inheritance. For example the index page could specify its design, overriding a more general page design for the document; but if it specified no design it would inherit the default page design for that class of page in the document. To do this, each design element can be either present or absent at any design level. Designs would be applied inside out, with 'parent' designs successively supplying missing design elements from 'children' designs. This approach also permits design specifications to be distributed across the Web.

Yang & Kaiser (1996) describe an object oriented database integrated with the Web, and show that it is possible to have local and dynamic views of the Web. The proposal here is that these views might further be associated with specific document designs, and typically that the referenced objects would not have significant internal design (as conventional Web pages do).

6.3. Page model

The page model can be generalised. A Web tool might be built around a simple, general concept, like T_EX's boxes (Knuth, 1992), allowing some boxes to correspond to pages, some to parts of pages.

At present indexes are special boiler plate, but they could be generated just like full documents with their own styles. Consider if the index was generated as a series of 'pages' called aindex, bindex, cindex...zindex. They could be formatted by an index macro, which - depending on design choices - would create a single linear index (as now) or create an interesting Web index across many Web files. Even the menubar is a concatenation of design elements (text or icons representing the main linked pages to the current page). It could have its own style specification, working in a similar way.

6.4. Structure specification

Gentler keeps all pages (or references to them) in a database, so document structure is easy to specify explicitly. At present it is done by a semi-direct manipulation outliner. Pages can be moved to other positions or nestings in the document. A more flexible mechanism is required which would permit document structure to be specified for distributed document pages. To achieve this, any page can refer to any other page (anywhere with a URL). Gentler could collect all pages referring to each other, and subject to other constraints (e.g., only those containing a specific organisation's signature). Topological information from each page would then be used to construct the document's structure. A page may merely say that it contains another page, or comes after another. Standard topological sorting algorithms can be used to create a network from order relations (inconsistent or under-specified orders are easy to deal with). This approach allows document structures to be developed on many scales, and for documents to include subsidiary (or supersidiary) documents without problem. It also permits pages to exist in different ways in different documents, or to occur in several positions within a single document.

6.5. WYSIWYG

Gentler currently provides WYSIWYG previewing but it does not support WYSIWYG editing. This will be achieved when page skeletons are separated from the database, as already planned. When that is done, it won't matter how authors choose to edit pages.

Should the design language be compliant with SGML or even look remotely like HTML, which it certainly does not at present? I believe not. To allow authors the greatest flexibility in their choice of editing environments, relying on any HTML extensions being handled 'properly' would be unwise. A better approach is to use a syntax orthogonal to HTML which WYSIWYG editors can handle as if it was ordinary text. At present, this is achieved by using the bullet symbol to signify a Gentler macro.

6.6. Dynamic documents

Gentler is an authoring tool that creates static documents. We plan to make the design language permit distributed generation of documents, but still the model is of compiling 'source' documents into 'object' Web documents. Instead, the object pages could be generated on the fly when requested by HTTP. Then users, rather than authors, would be specifying the structure and design of the documents they browse.

A special case of a dynamic document is one constructed by a search engine as a result of a user's query. Since Gentler knows the structure of a document, it is possible to present the structure with query results. Users would be given structural information as part of the context of the results; knowing where a hit resides in a document's structure would help the user decide what sort of information they have found (e.g., that it occurs in pages nested within the introduction of a document, or in pages nested in glossaries). As a special case, Gentler need never return a document's index page, even though it matches the search terms, since Gentler knows any such page is a reference to what the reader really wants.

6.7. Maintenance and transformation

Writing Web documents is hard. Where possible an author might want to use one document, perhaps carefully developed, as a template for another. For example, a Web document on Paris might be transformed into a Web document for London: they might share structure (e.g., about transport), but details would be different (e.g., the metro is called the tube in London). It is trivial to duplicate a document, but to be useful the author's tool would have to keep track of which pages (as well as images and other media) had been updated. Simple ways to facilitate document transformation, such as maintaining flags with each page, are not difficult to envisage. Techniques that help transform documents would also be of advantage in maintaining individual documents. A technique that supports systematic transformation of a document from French to English (e.g., just flagging pages that have not been translated yet) could also be used for helping keep a document up to date: the translation is then not from French to English, but from day to day (or month to month, etc.). Thimbleby and Ladkin (1995) discuss how to track documents, in their case a formal specification document and a user manual.

6.8. Mathematical analysis

For general usability purposes, most theoretical properties fall into one of two categories: essential but basically trivial (e.g., strong connectivity), or fascinating but apparently irrelevant (e.g., chromatic number). There is potential for considerable research in this rich area. The best/worst case analysis approach provides many insights into usability and task complexity, but obviously more research would make it a sharper tool for usability analysis.

7. Conclusions

This paper showed that developing hypertext documents requires support, far more support than for conventional print documents. We argued for tool support, but alternatives include the development of conventions to make hypertext authoring much easier. Such conventions would have to be appropriate for users' tasks, and in turn they might be internalised in tool support.

Gentler is an example authoring tool that reduces the cognitive load of authors, particularly by separating content and design, and by supporting quality control. By easing quality control, authors are freed to use their skills more creatively and consistently, arguably resulting in better documents and greater opportunities for users to contribute effectively to iterative design.

As a research tool, Gentler is notable because it provides flexible support for mathematical analysis of hypertext structure, and may help to lead to insights into formal descriptions of hypertext properties relating to cognitive issues.

We noted the semantic correspondence of the author's and the reader's tasks. This leads to the concept of dual requirements. What tool support may be required for an author may correspond to useful features for readers, and conversely. Dual requirements may not be exact, and this leads to creative design.

In short, we argued abstractly that tools are necessary to author hypertext documents well; we showed concretely that useful tools can be used successfully; and we showed that there are principles to design both author's and reader's tools systematically.

The purpose of the Web is empowering users. As the future plans of Gentler are developed, Gentler or its successors will support distributed document editing that can be combined into integrated or interrelated Web documents. This would allow many authors to contribute to documents, and stop one author becoming a bottleneck for document development. Technologies such as Java will allow the implementation of Gentler itself to be distributed, and allow individuals to publish 'inside' larger documents on the Web. This direction of Gentler is compatible with the aims of the Royal Society of Arts and many similar organisations with members around the world, which require tools to co-ordinate their contributions to collaborative documents.

References

Brown, P. J. (1996). "Building novel software: the researcher and the marketplace," in Computing Tomorrow, edited by Wand, I. C. & Milner, R., 21-32, Cambridge University Press.

Creech, M. L. (1996). "Author-oriented link management," Proceedings of the Fifth International World Wide Web Conference, Paris, Computer Networks and ISDN Systems, 28(7-11), 1015-1025.

Drakos, N. (1993). "Text to hypertext conversion with LaTeX2HTML," Baskerville, 3(2), 12-15.

Gaines B. R. & Shaw, M. L. G. (1995). "Concept maps as hypermedia components," International Journal of Human-Computer Studies, 43(3), 323-361.

Green, T. R. G. (1989). "Cognitive dimensions of notations," in Proceedings of the Fifth Conference of the British Computer Society Human Computer Interaction Specialist Group, People and Computers, V, edited by Sutcliffe, A. & Macaulay, L., 443-460, Cambridge University Press.

Grossman, J., managing editor (1993).Chicago Manual of Style, 14th. edition.

Kellogg W. A. & Richards, J. T. (1995). "The human factors of information on the Internet," in Advances in Human-Computer Interaction, 5, Nielsen, J., editor, 1-36, Ablex.

Knuth, D. E. (1968). The Art of Computer Programming, 1, Fundamental Algorithms, Addison-Wesley.

Knuth, D. E. (1992). The TEXbook, Addison-Wesley.

Mea, V. D., Beltrami, C. A. Roberto, V. & Brunato, D. (1996). "HTML generation and semantic markup for telepathology," Proceedings of the Fifth International World Wide Web Conference, Paris, Computer Networks and ISDN Systems, 28(7-11), 1085-1094.

Maurer, H. (1996). HYPER-G now HYPERWAVE™, Addison-Wesley.

Nielsen, J. (1993). Usability Engineering, Academic Press.

Nielsen, J. (May 1995). "A home-page overhaul using other Web sites," IEEE Software, 75-78.

Pitkow J. E. & Jones, R. K. (1996). "Supporting the Web: A distributed hyperlink database system," Proceedings of the Fifth International World Wide Web Conference, Paris, Computer Networks and ISDN Systems, 28(7-11), 981-991.

Royal Society of Arts World Wide Web Pages, [http://www.cs.mdx.ac.uk/rsa/]

Simon, H. A. (1969). The Sciences of The Artificial, MIT Press.

Tauscher, L. & Greenberg, S. (1997). "How People Revisit Web Pages: Empirical Findings and Implications for the Design of History Systems," International Journal of Human-Computer Studies, [this issue].

Tennent, R. D. (1981). Principles of Programming Languages, Prentice-Hall International.

Theng, Y. L., Jones, M. & Thimbleby, H. W. (1996). "Lost in hyperspace: Psychological problem or bad design?" Proceedings First Asia Pacific Conference on Computer Human Interaction, APCHI'96, 387-396.

Thimbleby, H. W. (1990). User Interface Design, Addison-Wesley.

Thimbleby, H. W. (1992). "Heuristics for cognitive tools," in NATO ASI Series F, Proceedings NATO Advanced Research Workshop on Mindtools and Cognitive Modelling, Cognitive Tools for Learning, Kommers, P. A. M., Jonassen, D. H. & Mayes, J. T., editors, 161-168, Springer Verlag.

Thimbleby, H. W. (1994a). "Formulating usability," ACM SIGCHI Bulletin, 26(2), 59-64.

Thimbleby, H. W. (1994b). "Designing user interfaces for problem solving, with application to hypertext and creative writing," AI & Society, 8, 29-44.

Thimbleby, H. W. (1995). ""Users as computers": An approach to VR design and conceptual evaluation," Proceedings Interface to Real and Virtual Worlds, IV, 305-313.

Thimbleby, H. W. (1996). "Internet, discourse and interaction potential," Proceedings of the First Asia Pacific Conference on Computer Human Interaction, APCHI'96, 3-18.

Thimbleby, H. W. & Ladkin, P. B. (1995). "A proper explanation when you need one," in Kirby, M. A. R., Dix, A. J. & Finlay, J. E., editors, BCS Conference HCI'95, People and Computers, X, 107-118, Cambridge University Press.

Wolfram, S. (1991). Mathematica (2nd. ed.), Addison-Wesley.

Yang, J. J. & Kaiser, G. E. (1996). "An architecture for integrating OODBs with WWW," Proceedings of the Fifth International World Wide Web Conference, Paris, Computer Networks and ISDN Systems, 28(7-11), 1243-1254.

Additional links

This section does not appear in the Journal version.

As anticipated in §7, Gentler has now been superceded by a Java program. The main difference is that Gentler's page database is replaced with HTML pages, which can therefore be anywhere. The new system replaces Gentler's database fields with variable names (such as 'icon') that can be chosen by the author for any purpose. An example site gives references to further details of the new approach, and provides an interesting perspective on the life and times of Benjamin Franklin, the American scientist and Founding Father.

Acknowledgements

This project is (at the time of writing, about to be) funded by EPSRC under grant number GR/K79376. Simon Buckingham Shum, Matthew Jones, Gil Marsden, Tamara Sumner, Yin Leng Theng and Ian Witten gave invaluable advice on this paper and its presentation. The author is grateful for extensive and helpful comments from the editors of this special issue and from the anonymous referees.

Appendix 1: Best/worse case assumptions

As stated in the body of the paper: we make three assumptions for best/worse case usability analysis. First, that the user employs the best possible general method (deterministic algorithm) for performing the task without error; secondly, that their task is the worst possible; thirdly, that task complexity for a given user interface can be expressed as a function of some natural measure of task size. In other words, nobody, however skilled, could do better under the circumstances. This approach we called 'best/worst case analysis.'

Why best in best/worst case? If the user is using anything but the best general method for performing their task, then the reasons for not performing optimally may have no specific cause, and there may be no design feature that could improve performance. It is possible, too, that some tasks might take some users forever because they do not understand how to make progress. If we wanted some sort of 'standard user' rather than a 'best' user, we would need to consider cognitive modelling and the interactions of the user's developing cognitive models with the performance of the tasks: analysis would get very difficult if not intractable. We can always make allowances after the analysis from a complexity measure that has been optimistic, for instance by allowing for a particular user's behaviour and performance. In other words, the best case provides useful information with least assumptions about user performance.

Why worst in best/worst case? Tasks are not of uniform difficulty; some are easier than others, perhaps due to a fortuitous combination of circumstances. The best case is often trivial and gives little information about the general task. Therefore we do not wish to find best case times for tasks, because they can be misleadingly optimistic. Unfortunately, considering an 'average task' begs questions about distributions of tasks, and even where the distributions are known, the average case is often intractable to analyse. We therefore consider the worst case of task, accepting that sometimes it may be possible to find better.

We have to decide what dimensions to measure complexity. In a predominantly physical task, energy might be appropriate; in a predominantly cognitive task, time might be appropriate. We may decide to treat different dimensions of a task as comparable, such as converting financial cost to units of time. A standard approach is to ask for the total time of completing a task, assuming each basic step of a task takes unit time. This approach allows us to compare difficulties of tasks without comparing speeds of individuals. It measures the difficulty of doing a task in arbitrary time units that can be compared between tasks whose complexities are calculated in the same way. As a first approximation, we do not need to consider error rates or fatigue separately, since we can assume these increase with basic time. Thus, the timings we obtain do not translate to 'seconds' or 'hours' but they can be compared reliably between themselves. A small time complexity means a task is easier than a larger time; if a specific user performing the task has a higher error rate or lower skill, this might mean the faster task is much easier, but it is still easier, which is what we wish to know.

For a detailed complexity analysis, time alone is inadequate. The user's memory requirements may be significant to utilise some algorithms, and for some tasks if the time taken is long enough then the user's memory becomes a crucial factor in whether the algorithm can be reliably employed. In this paper, we choose to ignore memory space complexity. We shall also assume that the user's perceptual discrimination is perfect, though in reality the user may find two different document pages sufficiently similar that they are confused, even if the user interface attempts to disambiguate them by, for instance, showing their unique URLs.

Next we have the problem that the time taken for the basic steps of the various tasks we are comparing might differ in ways that are hard to quantify. For example, writing a sentence in a conventional word processor might be faster than writing a sentence in an editor designed for the World Wide Web, because the latter might present the user with far more choices at every step. How do we compare different tasks using complexities measured the same way? One solution is not to calculate a time to perform a specific task, but to determine how the time varies with the size of the task. We choose some natural measure of the size of the task, such as the number of pages to be written. If we double the size of a task (say, from n to 2ⁿ) and find that two ways of doing the tasks each double in time, then we know they are equally difficult relative to the difficulties of the basic steps of which each is composed. On the other hand, if one task doubled in time but the other increased by a factor of n we could be certain that that task would always be harder for some sufficiently large n, however much faster its steps could be performed than the other's.There is a mathematical notation that allows us to be precise, despite these apparent uncertainties. Instead of writing, say, 4ⁿ for the complexity and spelling out the qualifications, we simply write O(4ⁿ), where the "big O" notation means we are interested (in a way we need not define precisely here) in the trends.

This is a brief introduction to best/worse usability complexity analysis. See Knuth (1968) for a conventional discussion of algorithm complexity analysis, or Thimbleby (1994a & 1995) for application to user interfaces.

Appendix 2: Implementation of Gentler

Gentler is implemented in HyperCard (version 2.3) and is a fully working system as described in this paper, though not all its features are defined here. Unlike most HyperCard applications, the database is external (saved as a text file) to allow the same program to be used to develop many Web documents from different databases. In fact, Gentler works more like ordinary word processors with New, Open and Save commands for creating, opening and saving multi-page Web databases. Gentler is unusual in HyperCard terms in having a large script (over 110k), so large it required programming tricks to overcome HyperCard's limitations.

Gentler started as a database of HTML pages, which were copied from existing RSA printed paper brochures. Special purpose programming then expanded the pages into complete pages, with navigation links to other pages. As the RSA Web site developed, it became obvious that Gentler was capable of managing other sorts of document, so the special purpose programming was parameterised. Initially, just selected strings were turned into database fields; later a general purpose macro language was developed.

As Gentler was developed new features have been integrated into it, rather than just added. The current version is more powerful than its predecessors, yet is simpler and easier to use. Often, as Gentler's iterative design progressed, two or more concepts were combined into a single, general feature. Conditional macros proliferated, with names like ·parent? and ·nonparent?. But by adding ! as a negate operator, the number of conditional names was halved, made completely consistent (somehow I originally had ·nothomepage?, using not rather than non; now it is consistently just ·!homepage?), and some gaps were automatically filled where previously I had not provided a matching pair of commands. Moreover the program to implement macros became shorter.

Another example was the move from the document being 'driven' by pages to being 'driven' by macros. Originally the page-centric view drove the page design. Now, the author selects a macro (e.g., the 'make a linear document design' macro) and this macro combines the pages into an appropriate design.

Gentler's user manual is a document written using Gentler. Writing the manual for Gentler suggested further improvements to Gentler's design because it provided another context of use for what it could do - another aspect of dual requirements. The semantics of programmer/program correspond to manual writer/manual, and this leads to dual requirements. When the manual writer is the program author (as I was!) the duality can be exploited to improve both manual and program, and one obtains a synergistic boost by so doing. There is no need for the user manual to be a reactive document, becoming incomprehensible, rather, one should fix design infelicities instead of explaining them!

Finally, Gentler is a complex program, and authors can "get lost" using it, not just in what they are writing! Gentler therefore provides various forms of help. It can explain all its objects, all its design language codes, and so on. There is scope for the user interface of Gentler to be treated as another hypertext, and so on indefinitely.

Appendix 3: Information in the Gentler database

Each Gentler document specifies:

a name for the database
the number of the version of Gentler that last saved the database
a sequence of pages
a default file name for a linear document (if any)
a set of named macros (if a macro contains a call to ·pagebody it can be used to create hypertext documents)

Each page within a document specifies:

several texts: page contents, page title, reminder text (if any), an associated file name or URL (normally supplied by Gentler, but can be overridden); news text
dates: 'official' news date, actual last change date
flags: whether to ignore the news text; whether to mention this page in all menubars, whether 'root' or default page for the document, whether an external file
an optional icon or thumbnail, alternate text and size
a nesting depth

Interactively, when a new set of pages is to be compiled from the database, the following are requested from the author: a target directory to compile to; and, if required, a bookmark file and a diagnostics file.

Note that there are no explicit links in the database except those written in HTML by the author in any text fields. Before a document is generated, many consistency checks are performed (checking HTML syntax, macro use, and that the menubar for no page has mixed text and thumbnails, etc.).

Notes

<1>To appear in: "Web Usability," Special Issue of International Journal of Human-Computer Studies, S. Buckingham Shum and C. McKnight, eds. (1997).

<2>Not counting multiple links or self links, there are four ways the author can consider linking two units a and b: (1) link from a to b; (2) link from b to a; (3) link both from a to b and from b to a; (4) no links between a and b.

<3>Assume the original n nodes are strongly connected. The new node may be unconnected, linked from the n nodes in 2ⁿ-1 ways, or linked to the n nodes in 2ⁿ-1 ways. If it is linked both to and from any two nodes apart from itself, then the graph is strongly connected. Therefore there are 2ⁿ⁺¹-1 choices that can be eliminated automatically on the grounds of preserving strong connectivity. Note that 4ⁿ2ⁿ⁺¹+1 choices still gives a task complexity of O(4ⁿ).

<4>Best/worse case analysis assumes the 'best' algorithm. Clearly for authoring, the best author's algorithm is tied to the best reader's algorithm, and as we see in print technology the evolution of effective tied algorithms is a cultural issue that takes centuries.

<5>Best/worse case analysis of readers' tasks formally supports this claim, but that is not a concern for this paper to develop.

<6>This is the length of the shortest Chinese postman tour of the RSA network. (See later in the paper for a definition of the Chinese postman tour.) Of course, a user is highly unlikely to know a Chinese postman tour (there is no elementary algorithm for it), and would use a much less effective algorithm.

<7>To generate a standard multi-page Web document, each page skeleton is successively processed by one of the style specifications, generating a set of complete pages on a Web server. To generate a single linear file (typically for printing on paper), each page skeleton is successively processed by one of the style specifications, but this time appending the text to a single file instead of to separate files.

<8>At one presentation, the display equipment could not project from my portable Macintosh, so I copied the presentation files and images onto a PC that worked with the projector. Unfortunately DOS garbled all the file names, so nothing worked. Fortunately a few moments in Gentler allowed me to regenerate the entire talk and all its links with new DOS-compliant file names.

<9>Many do not identify disconnected pages they have not been told about. A strong connectivity test should not require to be told the very pages that it should be able to identify for itself.

<10>Thimbleby (1990) gives examples in user interface design.

Gentler: A tool for systematic web authoring<1>

Harold Thimbleby

Middlesex UniversityLONDON, N11 2NQ

Abstract

1. Introduction

2. Cognitive challenges facing web authors

2.1. Links and associations

2.2. Best/worst case analysis

2.3. Quality control

3. Gentler

3.1. Gentler's user interface

3.2. Gentler's abstract document model

3.3. Links and associations

3.4. Page design language

3.5. Reminders and annotations

3.6. Overviews

3.7. Supporting usability analysis

3.8. Comparison with other tools

4. Author/reader dual requirements

5. A sample document

5.1. Alternative designs for the RSA pages

6. The future was yesterday the day before

6.1. Distributed design

6.2. Design specification

6.3. Page model

6.4. Structure specification

6.5. WYSIWYG

6.6. Dynamic documents

6.7. Maintenance and transformation

6.8. Mathematical analysis

7. Conclusions

References

Additional links

Acknowledgements

Appendix 1: Best/worse case assumptions

Appendix 2: Implementation of Gentler

Appendix 3: Information in the Gentler database

Notes

Middlesex University
LONDON, N11 2NQ