Summary

The objective of a Web Content Management System (CMS) is to simplify and streamline the process of creating high-quality, accessible web sites. ThinCMS is a research project which demonstrates the application of current Internet technologies and standards to the task of web site creation and maintenance. ThinCMS is an outgrowth of the author's “Web Services” course at the Heinz School of Carnegie Mellon in Pittsburgh, PA, and of consulting work over the years on web sites and web site creation tools.

The impetus for ThinCMS came from a frustration with the standard industry practices in mapping web design onto implementation, which typically follow an ad-hoc “bottom-up” creation process using server-side includes (SSI). SSI was an important stepping-stone for most developers - the author has built many web sites that way. But a recent project became an opportunity to address a nagging feeling that a better approach was possible. The decision was made to abandon the SSI approach and to build a top-down framework of web page composition. This framework is based upon an XML vocabulary called the Web Page Composition Markup Language (WPCML). ThinCMS uses document “generators” to transform a site's WPCML content resource files into web pages. The simple markup structure of WPCML allowed for the development of a browser-based resource editor, which communicates via SOAP and WebDAV with the workflow engine and resource repositories. The results to date of this project are presented.

A web page composition markup language

Frustrations with bottom-up page composition

Page composition is the process of building web pages from logical representations of page structure, and using tools to generate HTML from that structure. For most web site developers, this typically involves looking for repeated design elements, and extracting these out into server-side includes (SSIs). While SSI was a major improvement over the duplication of repeated elements, it is typically pursued in a “bottom-up” fashion. The page creator will typically build a new page by copying an existing page and then changing the “content”. This content page (bottom) includes constructs from higher levels (up) in a logical content/layout inheritance tree.

The problem with this type of bottom-up composition is that one inevitably finds, after creating dozens of pages, that one wants to make some small but significant change to something which has been replicated into each page instance. Since this replication was done at edit/design-time instead of at build/run-time, the site developer will need to edit (by hand or with a script) all of these files to make the change. Granted, there are other approaches for reuse involving parameterized server-side includes, but these usually put one down the road of using a server-side scripting engine and/or web application server to programmatically deliver pages. It is the author's opinion that content creation should, when possible, be decoupled from content delivery.

Envisioning top-down composition

There are many high-quality commercial and open-source Web Content Management System (Web CMS) applications available today which aim to streamline web site creation and maintenance. But the author has chosen to pursue his own vision of a light-weight Web CMS, and to focus specifically on creating a framework for top-down page composition using XSLT and other modern web standards and technologies.

Since the decision was made to use XSLT, the process started with a review of existing samples of web site creation using XSLT. Such examples are still scarce. An important finding came from a re-reading of Khun Yee Fung's XSLT book, which gave renewed inspiration and insight. Chapter 12 provides a case study in HTML generation which demonstrated top-down composition. As an exercise, the author recreated this sample web site using an XSLT processor. This exercise demonstrated that the XSLT top-down page composition design pattern was at least possible.

The seed of a generic, multi-level, nested template approach to web development could be seen in the page generator which came out of this exercise. But since all XSLT and XML pages were hand-coded in this exercise, the effort appeared to roll back any advantages gained from this top-down, standards-based approach. What was needed was a more generic representation of web page content and web page composition, and a framework for managing these resources.

Why WPCML?

Evolving this hand-coded sample site into a generic site generation framework necessitated the emergence of conventions and grammars for the resources which feed the page generator. These conventions and grammars are what are referred to by WPCML. The need for a framework was largely driven by the goal of building a browser-based editor, which meant ad-hoc, one-off document structures would not suffice. WPCML also represented a recognition that a general-purpose, schema-driven XML editor, while ideal, was beyond the scope and resources of this project.

WPCML uses a reduced and normalized XML structure to represent document composition using a small number of tags. Why not use XHTML? XHTML is too layout-focused - what the author needed was a semantic content focus and a pre-XHTML representation. By changing the problem from one of schema-driven XML editing to WPCML editing, the complexity of the user interface creation task was reduced to one which was manageable.

ThinCMS Framework

As with most applications, ThinCMS combines a platform with conventions and application logic to create a framework. The current ThinCMS platform is .NET 1.1 and Windows 2003 Server. The conventions involve the layout of a WebDAV repository and the WPCML schema for XML content. The application logic implements the transformation of resources into pages. In ThinCMS, this application logic is implemented as a SOAP web service. The major components which comprise the ThinCMS framework are described below.

Page Template Model

ThinCMS models pages as having three levels of nested content/layout page parts or "Level" resources. ThinCMS also supports included resources through “Element” resources, which are content/layout components which can be shared and reused within a site. Element resources can be included in any of the three types of level resources and also into other element resources. Level and Element resource types may be referred to elsewhere as Nested and Referenced resources respectively.

Level Resources

Level resources are used to define a three-level page structure inheritance tree. The named levels are “Site”, “Section”, and “Page”. It was found in studying many site designs that three levels of nested composition were sufficient to model most sites. Of the sites which did not fit this model, many would probably have benefited from this design constraint. The Site resource provides the outer composition which is shared by all pages in the site. The site level will typically contain branding elements (logos, graphics, etc.), site-wide navigation controls (top menu, search form, etc.) and footer text (copyright info, contact info, etc.).

Section resources contain design and content elements which are not global to the site but which are shared by a collection of pages. The section level eliminates the need to have such content duplicated within these pages. This level might be used, for example, in an Intranet to allow each department to have its own look. In a public web site, they may be used to give the corporate section a different look from the product section of the web site. And for almost all sites, it is used to give the home page a different layout from other pages.

Finally, the Page level resources contain content specific to an individual web page. Page resources will focus on content, but may also contain Element Parts. Every HTML document in the site instantiates a triplet of templates from the Site, Section and Page levels. Another way to visualize the three-level page design scheme is to draw it as a nested template hierarchy. Figure 1 graphically depicts content and template nesting. Referring to this diagram, each content resource on the left gets rendered by the template resource pointed to on the right.

Figure 1. Content and template nesting
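
The nesting can be illustrated with a minimal Python sketch. This is not how ThinCMS is implemented (it uses XSLT templates); the three template strings and the compose() function are hypothetical stand-ins that only show how a Site, Section and Page triplet combines into one document.

```python
from string import Template

# Hypothetical stand-ins for the Site, Section and Page XSLT templates.
# Each outer template has a single slot for the level nested inside it.
SITE = Template("<html><body><div id='site'>$body</div></body></html>")
SECTION = Template("<div id='section'>$body</div>")
PAGE = Template("<div id='page'>$content</div>")


def compose(page_content: str) -> str:
    """Combine the triplet: the page fills the section slot, the section fills the site slot."""
    page_html = PAGE.substitute(content=page_content)
    section_html = SECTION.substitute(body=page_html)
    return SITE.substitute(body=section_html)


print(compose("Hello"))
```

Changing the SITE string alone changes every composed page, which is the property that edit-time copying of page fragments cannot offer.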

Element Resources

An Element Resource is a web part which gets embedded into a Site, Section, or Page part. WPCML supports three types of referenced resources: Static elements, Navigation elements, and Dynamic elements. A Static element is merely a resource following the WPCML schema which can be included by reference into level resources (or into another Element resource). Static elements provide the same type of reuse semantics as do server-side includes, but with the WPCML benefit of the content being separated from the XHTML presentation.

A good example of included content is a vertical news callout placed outside the flow of copy on the right side of selected pages. An included element allows this same content to be inserted into other pages, including pages which have different layouts. Contrast this with level resources. If the news callout were placed in the Section level, then this element would be present only in pages which inherited that layout, and it would be present in all such pages. Element resources provide finer-grained control of content reuse.
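
Inclusion by reference, including elements nested inside other elements, can be sketched as follows. The <include> tag, its ref attribute, and the sample resources are assumptions made for this illustration; the actual WPCML syntax for references is not shown in this paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical element resources, keyed by name. The <include> tag and its
# "ref" attribute are assumptions made for this sketch, not WPCML syntax.
ELEMENTS = {
    "news": "<callout><item>Release 1.0 shipped</item></callout>",
    "footer": "<footer><include ref='news'/></footer>",
}


def resolve_includes(node: ET.Element, seen: tuple = ()) -> None:
    """Replace each <include ref='x'/> child with the parsed element resource x."""
    for index, child in enumerate(list(node)):
        if child.tag == "include":
            ref = child.get("ref")
            if ref in seen:
                raise ValueError(f"circular include: {ref}")
            included = ET.fromstring(ELEMENTS[ref])
            # Elements may include other elements, so resolve recursively.
            resolve_includes(included, seen + (ref,))
            included.tail = child.tail
            node[index] = included
        else:
            resolve_includes(child, seen)


page = ET.fromstring("<page><copy>Welcome</copy><include ref='footer'/></page>")
resolve_includes(page)
result = ET.tostring(page, encoding="unicode")
print(result)
```

The seen tuple guards against an element that includes itself, directly or indirectly, which any include-by-reference scheme has to handle.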

Navigation elements are used to represent any navigation controls in pages. Navigation elements receive content from a site-wide sitemap definition. Many generic XHTML navigation elements have been built in WPCML including breadcrumbs, horizontal and simple menu bars, and dynamic hierarchical menus.
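
As an illustration of how a navigation element can be derived from a site-wide sitemap, the sketch below computes a breadcrumb trail. The sitemap structure and page identifiers are hypothetical; the real sitemap definition is an XML resource whose format is not given here.

```python
# Hypothetical sitemap: each entry maps a page id to (title, parent id).
SITEMAP = {
    "home": ("Home", None),
    "products": ("Products", "home"),
    "thincms": ("ThinCMS", "products"),
}


def breadcrumbs(page_id: str) -> list[str]:
    """Walk parent links up to the root and return titles in root-first order."""
    trail = []
    current = page_id
    while current is not None:
        title, parent = SITEMAP[current]
        trail.append(title)
        current = parent
    return list(reversed(trail))


print(" > ".join(breadcrumbs("thincms")))
```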

Dynamic elements are XML data sources resolved at run-time. They may use static URLs or parameterized (HTTP GET) URLs. Resource names and properties may be used as run-time parameters to this URL. Existing dynamic syndicated content feeds which implement XML web services can be brought into ThinCMS as dynamic elements. SOAP web services using GET semantics can be invoked by Dynamic elements, and the results obtained become content to be transformed and incorporated into the document response. Dynamic elements require that page generation be targeted to a web server which supports a run-time script interpreter with XML and XSLT libraries. Recent examples of dynamic elements created by the author include:

  • A SOAP proxy to Exchange which gets events from a public events folder.
  • An XSL extension function which converts market quotes from Yahoo’s CSV interface to XML
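
The second example, turning a CSV feed into XML that a stylesheet can then transform, might look roughly like the following. The author's version is an XSL extension function; this Python sketch only mirrors the idea, and the column layout (symbol, price, change), the output element names, and the sample data are all assumptions.

```python
import csv
import io
import xml.etree.ElementTree as ET


def quotes_csv_to_xml(csv_text: str) -> str:
    """Convert rows of symbol,price,change into a <quotes> XML document."""
    root = ET.Element("quotes")
    for symbol, price, change in csv.reader(io.StringIO(csv_text)):
        quote = ET.SubElement(root, "quote", symbol=symbol)
        ET.SubElement(quote, "price").text = price
        ET.SubElement(quote, "change").text = change
    return ET.tostring(root, encoding="unicode")


# Made-up sample data in the assumed three-column layout.
sample = '"MSFT",27.50,+0.12\n"IBM",91.20,-0.35\n'
xml_text = quotes_csv_to_xml(sample)
print(xml_text)
```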

Resource XML grammar

Each level and element resource follows a simple XML grammar involving nested entity tags. Entities can have properties and content. This simple grammar took inspiration from Joe Slovinski's “Advanced UI Design” article series. This simple schema is important as WPCML is the bridge between the UI which manages part resources and the generators which transform part resources into previews and into published pages.
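
To make the idea of nested entities with properties and content concrete, here is a small sketch that parses and outlines such a document. The tag and attribute names (entity, property, content, name) are invented for this example and should not be read as the actual WPCML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical resource: the tag and attribute names are invented for this
# sketch and are not taken from the WPCML schema.
RESOURCE = """
<entity name="page">
  <property name="title">About us</property>
  <entity name="paragraph">
    <property name="style">lead</property>
    <content>We build web sites.</content>
  </entity>
</entity>
"""


def outline(entity: ET.Element, depth: int = 0) -> list[str]:
    """Return one indented line per entity, listing its properties and content."""
    props = {p.get("name"): p.text for p in entity.findall("property")}
    content = entity.findtext("content", default="")
    lines = [f"{'  ' * depth}{entity.get('name')} {props} {content!r}"]
    for child in entity.findall("entity"):
        lines.extend(outline(child, depth + 1))
    return lines


for line in outline(ET.fromstring(RESOURCE)):
    print(line)
```

Because every resource has the same small, regular shape, a generic editor can walk it in this way without knowing anything about a particular site's templates.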

WebDAV Site Resource Repository

ThinCMS maintains all site resources in one or more WebDAV repositories. Resources are organized into company folders and then into site folders. Each customer also has a templates folder and an assets folder. The templates folder contains folders for each level and element resource template. In the level folders are XSL files which transform the WPCML for a part into XHTML. The assets folder contains all file resources referenced by the generated HTML documents (images, CSS, Flash, etc.).
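
The folder conventions above can be summarized as a few path-building helpers. The templates and assets folder names come from the description; the customer, site and file names, and the .xml/.xsl extension convention for resources, are assumptions made for this sketch.

```python
from posixpath import join

# Folder names follow the layout described in the text; the customer, site
# and file names are hypothetical, as is the file-extension convention.


def content_path(customer: str, site: str, resource: str) -> str:
    """Path of a WPCML content resource inside a site folder."""
    return join("/", customer, site, resource + ".xml")


def template_path(customer: str, kind: str, template: str) -> str:
    """Path of the XSL file for a level or element template."""
    return join("/", customer, "templates", kind, template + ".xsl")


def asset_path(customer: str, asset: str) -> str:
    """Path of an image, stylesheet or other file referenced by generated pages."""
    return join("/", customer, "assets", asset)


print(content_path("acme", "www", "about"))
print(template_path("acme", "page", "article"))
print(asset_path("acme", "logo.png"))
```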

The plan for ThinCMS is to migrate to DeltaV-based repositories. DeltaV is an emerging IETF specification for an HTTP-based versioned resource store. DeltaV adds versioning extensions to WebDAV. Using such a server removes the burden of version management from the application developer. Subversion is the dominant DeltaV framework.

XHTML Generation

Three different types of XHTML generation are implemented in ThinCMS. All use an XSLT processor to transform WPCML resource files into XHTML files.

Page Generator

The page generator is at the heart of ThinCMS. It is responsible for such tasks as a) generating a composite web page document, b) setting metadata values, and c) delivering the generated document to one or more web servers. These three processing steps occur in sequence during each page generation operation.

Composite page generation is accomplished by modifying an XHTML meta-page-template using the XML DOM. An XSL processor is then created from this template. To generate a page, the three level parts are passed in as XSL parameters and the processor is run against a dummy/empty document. Pages can be generated in preview mode or in publish mode. In preview mode, the links to other managed pages are created as links back to the page generator. In publish mode, the generated page has link URLs resolved to actual files (unless the page is marked for run-time generation). If the page is rendered in publish mode, then a destination stream is passed into the render method. The stream can be opened from File, FTP, or WebDAV URIs.
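
The difference between preview-mode and publish-mode link resolution can be sketched as below. The generator URL, its query parameter names, and the page table are hypothetical. The sketch also assumes that links to pages marked for run-time generation are routed back through the generator in publish mode, which the description above does not specify.

```python
from urllib.parse import urlencode

# Hypothetical generator endpoint and page table for this sketch.
GENERATOR_URL = "/generator/page"
PAGES = {
    "about": {"target": "about.html"},
    "quotes": {"target": None},  # no publish target: generated at run-time
}


def resolve_link(page_id: str, mode: str) -> str:
    """Return the href for a managed page in 'preview' or 'publish' mode."""
    target = PAGES[page_id]["target"]
    if mode == "publish" and target is not None:
        # Published pages link directly to the generated file.
        return target
    # Preview links (and, by assumption, links to run-time pages) go back
    # through the page generator.
    return GENERATOR_URL + "?" + urlencode({"page": page_id, "mode": mode})


print(resolve_link("about", "preview"))
print(resolve_link("about", "publish"))
```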

Element Generator

The Element Generator provides an element preview. This preview allows the user to “browse” the elements which have been created for a site, before perhaps adding the element into another resource. The element generator uses a Site Level resource stub to build the outer HTML container for the element, since the element itself is insufficient to create a complete HTML document.

Template Generator

The Template Generator provides a template preview. Each level and element content resource refers to an XSLT template which renders the content. These templates can be previewed during site development to allow the user to see how content resources rendered with them will appear. Each XSL template has a corresponding XML content template which serves both the template preview and the creation of an instance of that part template. The element generator and template generator are only used within the user interface for performing previews.

ThinCMS Workflow

The workflow model implemented by ThinCMS is very simple. There are several "times" involved: Design-time, Edit-time, Publish-time, and Run-time. Design-time is when the XSLT stylesheets which implement a site's design are chosen, created, or modified. This is a task best done by someone skilled in XHTML and XSLT who is willing to learn the WPCML schemas and conventions. A growing palette of templates now exists from web sites built with ThinCMS. These templates are an excellent starting point or learning tool for a designer building a new site.

Edit-time refers to the point in time when one is creating or editing web pages or the sitemap. This task can be accomplished either a) by editing XML files by hand, or b) by using the ThinCMS browser-based user interface.

Publish-time refers to the time when one instructs ThinCMS to generate pages and publish these pages to a web server. Pages are published to the target server(s) specified in the customer’s configuration file. The target file name is one of the properties specified in a page level resource file.

Run-time refers to the time when a visitor is viewing the generated web site. For pages using Dynamic elements, additional XML content acquisition and transformation can take place at this time. Pages with no specified publish target are treated as dynamic pages.

The idea of clearly separating a site's layouts and styles from its content is one which still does not have universal support within the web development community. For ThinCMS, it was clearly the right decision, both from an architectural and a business perspective. The author’s view is that successful web site development requires the intersection of three different skills and capabilities. One is design, one is content, and the third is technology. ThinCMS gives clear responsibility to each of these three roles. The designer and/or information architect has responsibility for design-time template creation. The content owner/author places verbiage and images into these templates, and creates the navigation flow. The technologist provides software and services to the designer and author.

ThinCMS application tiers

Resource catalog

ThinCMS resources are stored in a WebDAV repository which has been described earlier. Because WebDAV follows open standards, any conformant WebDAV server can act as the resource repository tier for ThinCMS.

Web Service

The application server tier of ThinCMS is a SOAP web service. Application services exposed as XML web services are accessible by any modern hardware/software platform. The application tier implements the Edit-time and Build-time workflow services. The ThinCMS web service tier is implemented in C# on the Microsoft .NET framework. There are no strong dependencies on the .NET platform and so a Java implementation/port could be easily accomplished.

Edit-time services are provided for creating, retrieving and updating the various WPCML resource types. There are about two dozen SOAP methods in this category. These methods are permutations of resource types and resource actions. Resource types include Templates, Parts, Sitemaps, Customers, etc. Resource actions include Add, Update, Delete, Query, etc. For example, the function PageUpdate() is used to update a Page resource instance in the resource catalog.
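
The "permutations" structure of the edit-time API can be illustrated by enumerating type and action pairs. Only PageUpdate is named in this paper; the other method names generated below are extrapolated from that one example and may not match the actual service, and the lists of types and actions are partial.

```python
from itertools import product

# Partial, illustrative lists. Only "PageUpdate" is confirmed by the text;
# the naming pattern <Type><Action> is inferred from that single example.
RESOURCE_TYPES = ["Template", "Page", "Sitemap", "Customer"]
ACTIONS = ["Add", "Update", "Delete", "Query"]


def method_names() -> list[str]:
    """Return one hypothetical SOAP method name per (type, action) pair."""
    return [rtype + action for rtype, action in product(RESOURCE_TYPES, ACTIONS)]


names = method_names()
print(len(names), names[:4])
```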

The application logic determines what published pages are affected by operations performed against the repository using the web service. Affected pages can be published immediately or marked for future scheduled publication. Build-time services are provided for publishing pages to one or more web servers. Page generation is accomplished by invoking the generators previously described.
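
One simple way to determine which pages are affected by a change is to record, for each published page, the resources it is built from. The sketch below assumes such a dependency table exists; the table, the resource naming scheme and the function name are hypothetical, and the actual ThinCMS logic is not described in this paper.

```python
# Hypothetical dependency table: each page lists the resources it is built from.
PAGE_DEPENDENCIES = {
    "home.html": {"site:main", "section:home", "page:home", "element:news"},
    "about.html": {"site:main", "section:corporate", "page:about"},
    "team.html": {"site:main", "section:corporate", "page:team", "element:news"},
}


def affected_pages(changed_resource: str) -> list[str]:
    """Return the published pages that must be regenerated after a change."""
    return sorted(
        page
        for page, dependencies in PAGE_DEPENDENCIES.items()
        if changed_resource in dependencies
    )


print(affected_pages("section:corporate"))
print(affected_pages("element:news"))
```

A change to the Site resource touches every page, a Section change touches only its pages, and an Element change touches exactly the pages that reference it.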

User Interface

The ThinCMS presentation tier is implemented as a browser "application". The current implementation is for Internet Explorer. The browser tier makes direct SOAP calls to the web service tier and makes extensive use of XML, XSLT, and DHTML. The user agent tier is written in JavaScript. Since the application tier is a published web service, different user interfaces could be created. The SOAP tier could also act as a bridge to other parts of a larger content management system.

The advantage of a SOAP API approach over a page request/response approach for the presentation tier is two-fold. First, the user's experience is more like a rich desktop application because UI components are visually stable (they don’t get redrawn on every action due to page reloads). Second, because all of the presentation tier logic runs on the user's workstation, server load is reduced and interactivity is enhanced.

Page Generator

The generators can be considered a fourth tier of the application since they operate independently of the application tier and have their own interface semantics, which are URL-based.

Summary

ThinCMS and WPCML are evolving into a strong platform for meeting our own and our clients' needs to rapidly create and then maintain web sites of medium complexity. However, ThinCMS does not currently have the advanced workflow and scheduled publication features needed to support larger web sites administered by dozens of people.

By using newer technologies such as XML, XHTML, XSLT, WebDAV, and SOAP, ThinCMS is able to achieve significant sophistication and capability without being a heavy-weight and complex application. Because all resources are stored as file-based or WebDAV documents, these resources are inherently sharable with other applications. The ThinCMS tiered architecture allows other types of integrations and customizations to be done. For example, a completely different user interface (perhaps non browser-based) could be created which calls the existing SOAP web service tier.