Simplifying dynamic websites with mod_perl and The GIMP


The permanent home of this document is:
http://www.aceldama.com/~tomr/papers/2001/web-gimp/

"The makers may make and the users may use,
but the fixers must fix with but minimal clues"
-- I wish I knew who said that.


Abstract

The GIMP (GNU Image Manipulation Program) is one of free software's killer apps. There is a strong developer and user community around it, but the main website at http://www.gimp.org/ has suffered from years of neglect, for various reasons. I will outline these problems, present my solutions, and introduce a new templating system (yes, another one) which simplifies the process of adding and editing content and separates programming from content enough to keep people off each other's toes.

I will also show how to run the GIMP and the Gimp-Perl server in the background on a headless server and then use Gimp-Perl to dynamically generate graphics for the website.


Introduction

This paper has turned out to be more about my work with mod_perl and GIMP-Perl than about the GIMP website, as very little progress has been made on the latter.

I have been building template-driven websites for over five years now, using all sorts of tools, from Apache's SSI (Server-Side Includes) to batch-mode page generation to CGI to integrated dynamic template systems. I've worked with sites built with PHP, Tcl (AOLserver), and of course lots of Perl.

I've worked with other programmers and with teams of designers and content writers. Each of these groups has different needs and skill levels, and it is hard to balance them all on the technical side. I've never been quite satisfied with any of the systems I've worked with. Here are a couple of my key gripes:

Code mixed with content

Mixing code with content is awkward for programmers and content editors. I have found that editing the content of HTML pages will inevitably break code in it through cut-and-paste or other errors. This also makes it impractical to edit content with WYSIWYG editors -- which is, in my opinion, a good thing! Certainly there are ways to manage this; all content can go through a verifier before being accepted into the site.

When it comes down to it, though, I have never had really good experiences with code pretending to be markup. It seems to limit what the code can do. The code should be outside of the content altogether -- that is, the writer should not even be aware of the code and it should never, ever, be possible to write a program in the content of a page. I prefer to keep my code in its native environment - Perl modules - and have it manipulate the content from outside.

Content mixed with style

Another thing that should be kept out of the content is style. HTML was originally a markup language. I remember when the fanciest layout was a horizontal rule. Things are different now, but it's still important to keep actual appearance out of the page content. At some point it becomes impossible to go back and modify all your old pages to fit the new look of your site. CSS is useful but not backward-compatible. Maybe XSL will solve this problem more nicely.

The way I have constructed things, the templates used for static pages are the same as those used by code-generated pages. I've found that this keeps things simple and makes it easy to make site-wide changes that affect all sorts of pages.

So, over the past year I've tried to come up with something that I will dislike less than any other system I've used. My stuff is built on top of mod_perl, and can do nifty things like using The GIMP to render pretty graphical titles.


mod_perl

Apache's mod_perl (http://perl.apache.org/) embeds a Perl interpreter into the Apache HTTP daemon and exposes the full Apache API to Perl programmers. The system described here is built from a few utility classes which further reduce the effort required to make a dynamic website with mod_perl.

Time-stamped hashes

The template system (see below) needs to read HTML blocks quickly and simply, but template editors need to be able to see their updates immediately. As a compromise, I developed a time-stamped hash; these hashes store the last-modified date of the value for (e.g.) key 'foo' in key '_TS_foo'.

The Tie::TSHashDir module retrieves files from a directory, and returns the last-modified date of the file when the corresponding '_TS_' entry is retrieved.
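The following is a minimal sketch of the idea, not the actual Tie::TSHashDir code. The package name Sketch::TSHashDir, the refusal of keys containing a slash, and the choice to return undef for missing files are all assumptions made for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical stand-in for Tie::TSHashDir: keys are file names in one
# directory, and '_TS_name' yields the modification time of file 'name'.
{
    package Sketch::TSHashDir;

    sub TIEHASH {
        my ($class, $dir) = @_;
        return bless { dir => $dir }, $class;
    }

    sub FETCH {
        my ($self, $key) = @_;
        return undef if $key =~ m{/};          # assumption: no subdirectories
        if ($key =~ /^_TS_(.+)$/) {
            my @st = stat "$self->{dir}/$1";
            return @st ? $st[9] : undef;       # element 9 of stat() is the mtime
        }
        open my $fh, '<', "$self->{dir}/$key" or return undef;
        local $/;                              # slurp the whole file
        my $content = <$fh>;
        close $fh;
        return $content;
    }
}

# Usage: write one block to a scratch directory and read it back.
my $dir = tempdir(CLEANUP => 1);
open my $out, '>', "$dir/_header" or die "cannot write: $!";
print {$out} "<html><body>\n";
close $out;

tie my %blocks, 'Sketch::TSHashDir', $dir;
print $blocks{_header};
print "last modified: $blocks{_TS__header}\n";
```

Note that the timestamp key for the block named _header is '_TS__header': the prefix is simply prepended to the file name.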

The Tie::TSHashMirror module mirrors another time-stamped hash in a faster hash. For example, this can be used to cache the contents of a directory (using TSHashDir) in a Berkeley DB file; only a stat() call is required to verify that the file being read hasn't changed.
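The mirroring logic can be sketched as follows. This is an illustration under stated assumptions rather than the real Tie::TSHashMirror: the package name and the 'reads' counter are hypothetical, and a plain in-memory hash stands in for the Berkeley DB file so that the example stays self-contained.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Tie::TSHashMirror. $source is any hash that
# follows the '_TS_' convention; $cache is the faster hash.
{
    package Sketch::TSHashMirror;

    sub TIEHASH {
        my ($class, $source, $cache) = @_;
        return bless { source => $source, cache => $cache, reads => 0 }, $class;
    }

    sub FETCH {
        my ($self, $key) = @_;
        my ($src, $cache) = @{$self}{qw(source cache)};
        return $src->{$key} if $key =~ /^_TS_/;   # timestamps always come from the source

        my $stamp = $src->{"_TS_$key"};
        return undef unless defined $stamp;       # no such entry in the source
        my $cached = $cache->{"_TS_$key"};
        unless (defined $cached && $cached == $stamp) {
            # Stale or missing: copy value and timestamp into the cache.
            $cache->{$key}       = $src->{$key};
            $cache->{"_TS_$key"} = $stamp;
            $self->{reads}++;                     # counts expensive reads
        }
        return $cache->{$key};
    }
}

# Usage: a plain hash plays the role of the slow, time-stamped source.
my %source = (foo => 'first version', _TS_foo => 100);
my %cache;
my $mirror = tie my %fast, 'Sketch::TSHashMirror', \%source, \%cache;

print "$fast{foo}\n";    # copied from the source
print "$fast{foo}\n";    # served from the cache
@source{qw(foo _TS_foo)} = ('second version', 200);
print "$fast{foo}\n";    # timestamp changed, so the value is re-read
```

With a TSHashDir-style source, fetching the '_TS_' key costs one stat() call, which is the only work done on a cache hit.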

Whether this actually speeds anything up is of course debatable.

BlockBin

BlockBin is the core template system. It completely encapsulates the process of creating a document from template components. The BlockBin constructor requires only two parameters: a reference to a hash that supplies all of the template blocks, and a reference to a print subroutine which prints out its arguments.

In the context of this web system, BlockBin can be fed a Tie::TSHashMirror hash to read files from a directory cached in a db file as described above. The print subroutine can output through the Apache request object and, if the file is a candidate for caching, to a file on disk as well, so that the next time the file is requested the completed page can be sent out directly with sendfile().
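A print subroutine that writes to two places at once might look like the sketch below. The name make_tee_printer is hypothetical, and two in-memory buffers stand in for the Apache request and the cache file; under mod_perl the first sink would presumably be something like sub { $r->print(@_) }.

```perl
use strict;
use warnings;

# Build a print subroutine that copies its arguments to every sink.
# Each sink is a code reference that accepts a list of strings.
sub make_tee_printer {
    my @sinks = @_;
    return sub {
        my @chunks = @_;
        $_->(@chunks) for @sinks;
        return 1;
    };
}

# Stand-ins for the request object and the on-disk cache file.
my ($to_client, $to_cache) = ('', '');
open my $cache_fh, '>', \$to_cache or die "cannot open buffer: $!";

my $printer = make_tee_printer(
    sub { $to_client .= join '', @_ },   # "send to the browser"
    sub { print {$cache_fh} @_ },        # "write the cache file"
);

$printer->("<h1>", "Hello", "</h1>\n");
close $cache_fh;
print $to_client;
```

Because the template code only ever sees one code reference, it does not need to know whether a page is being cached.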

BlockBin also manages the printing of page headers and footers to force a consistent look on a site: the _header block from the hash is always printed first (using the header method) and the _footer block is always printed last (using the footer method).

The BlockBin object is typically called $page, and all output is sent through it just as it would be sent through the Apache request. The BlockBin->print() method passes its parameters through to the user-supplied print subroutine. However, $page->block('name') is the usual way to print something to the page; neither content nor style ever appears directly in the code.
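To make the interface concrete, here is a miniature of it. Only the constructor arguments and the print, block, header and footer methods come from the description above; the package name Sketch::BlockBin and the HTML comment emitted for a missing block are assumptions.

```perl
use strict;
use warnings;

# Hypothetical miniature of the BlockBin interface.
{
    package Sketch::BlockBin;

    # $blocks:  reference to a hash of template blocks
    # $printer: reference to a subroutine that prints its arguments
    sub new {
        my ($class, $blocks, $printer) = @_;
        return bless { blocks => $blocks, printer => $printer }, $class;
    }

    # Pass everything straight through to the user-supplied subroutine.
    sub print { my $self = shift; return $self->{printer}->(@_) }

    # Look up a named block and print it.
    sub block {
        my ($self, $name) = @_;
        my $text = $self->{blocks}{$name};
        $text = "<!-- missing block: $name -->" unless defined $text;
        return $self->print($text);
    }

    sub header { $_[0]->block('_header') }
    sub footer { $_[0]->block('_footer') }
}

# Usage: a plain hash of blocks and a printer that appends to a string.
my %blocks = (
    _header => "<html><body>\n",
    _footer => "</body></html>\n",
    welcome => "<p>Welcome to the site.</p>\n",
);
my $output = '';
my $page = Sketch::BlockBin->new(\%blocks, sub { $output .= join '', @_ });

$page->header;
$page->block('welcome');
$page->footer;
print $output;
```

The handler code names blocks but never contains HTML itself, which is the separation the design is after.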

BlockBin template blocks can also include magical incantations which do various server-side-include-style things. In particular, variables can be interpolated into content. These variables are stored in another hash in the BlockBin object; there is a global hash which can be added to with the global method. When an individual block is printed, additional settings can be specified. This was designed specifically to interact well with the DBI fetchrow_hashref function.
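A sketch of regex-based interpolation with global and per-block variables follows. It assumes that the simplest incantation has the form {|name|}, which is inferred from the examples in this paper; the function name interpolate and the rule that per-block settings override globals are also assumptions. The rows are shaped like the hash references that DBI's fetchrow_hashref returns, so no database is needed to run it.

```perl
use strict;
use warnings;

# Replace every {|name|} in $text, looking first in the per-block
# settings and then in the globals. Unknown names become ''.
sub interpolate {
    my ($text, $globals, $locals) = @_;
    $locals ||= {};
    my $lookup = sub {
        my ($name) = @_;
        return $locals->{$name}  if defined $locals->{$name};
        return $globals->{$name} if defined $globals->{$name};
        return '';
    };
    $text =~ s/\{\|(\w+)\|\}/$lookup->($1)/ge;
    return $text;
}

my %globals = (Site => 'gimp.org');
my $block   = qq(<li><a href="{|url|}">{|title|}</a> ({|Site|})</li>\n);

# Rows shaped like the results of fetchrow_hashref.
my @rows = (
    { title => 'Documentation', url => '/docs/' },
    { title => 'Downloads',     url => '/download/' },
);
my $html = join '', map { interpolate($block, \%globals, $_) } @rows;
print $html;
```

Passing each row straight in as the per-block settings is what makes the column names of a query usable as template variables.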

Because I'm not very good at parsers, all of the block interpolation is done with regular expressions.

The magical incantations are, at the moment:

Because there is no proper parser, nested {| |} pairs are differentiated by inserting a character just inside the { }. For example: {|{x|var|x}|?|true|:|false|} ... crude, but mostly effective.
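The marker trick can be illustrated with a few passes of substitution. The semantics assumed here are inferred from the single example above and are not taken from the BlockBin source: a marked pair such as {x|var|x} is treated as a variable lookup that is resolved first, and {|value|?|a|:|b|} yields a when value is true in Perl's sense and b otherwise. The function name expand is hypothetical.

```perl
use strict;
use warnings;

sub expand {
    my ($text, $vars) = @_;
    my $value = sub { defined $vars->{$_[0]} ? $vars->{$_[0]} : '' };

    # Pass 1: marked inner pairs such as {x|var|x}. The same marker
    # character must appear just inside both braces.
    $text =~ s/\{([^|{}])\|(\w+)\|\1\}/$value->($2)/ge;

    # Pass 2: unmarked conditionals of the form {|value|?|yes|:|no|}.
    $text =~ s/\{\|([^|{}]*)\|\?\|([^|{}]*)\|:\|([^|{}]*)\|\}/$1 ? $2 : $3/ge;

    # Pass 3: unmarked plain variables of the form {|name|}.
    $text =~ s/\{\|(\w+)\|\}/$value->($1)/ge;

    return $text;
}

my $template = '{|{x|var|x}|?|true|:|false|}';
print expand($template, { var => 1 }), "\n";
print expand($template, {}),           "\n";
```

Each regular expression refuses to match across a brace or a pipe, so an outer pair only becomes matchable once its inner pair has been replaced. The same property is why values containing those characters would confuse it.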

CompoSite

CompoSite connects TSHash and BlockBin to Apache. Simply feed it a list of handlers, a set of default global variables for every page, the template directory, and a hash in which to cache the templates, and it does the rest.

The handlers list is a hash that maps short strings to subroutines. This would probably be better done with Perl's object syntax, but it is not right now. Each handler subroutine gets access to the BlockBin $page object, the input filehandle, and the arguments received from the user's browser. Handlers are switched by simple lines reading --USE-- handler in the files on disk.
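Dispatching on --USE-- lines could be sketched like this. It is a stricter variant than the system described here: instead of passing the real input filehandle to each handler, it buffers each section and hands the handler an in-memory filehandle for that section alone. The function name dispatch, the explicit default handler argument, and the use of a bare print subroutine in place of the $page object are all assumptions.

```perl
use strict;
use warnings;

# Split the input at '--USE-- name' lines and pass each section to the
# named handler as its own in-memory filehandle.
sub dispatch {
    my ($handlers, $page, $fh, $args, $default) = @_;
    my ($current, $buffer) = ($default, '');

    my $flush = sub {
        return unless length $buffer;
        my $handler = $handlers->{$current}
            or die "no handler named '$current'\n";
        open my $section, '<', \$buffer or die "cannot open buffer: $!";
        $handler->($page, $section, $args);
        close $section;
        $buffer = '';
    };

    while (my $line = <$fh>) {
        if ($line =~ /^--USE--\s+(\w+)\s*$/) {
            my $next = $1;
            $flush->();          # finish the section collected so far
            $current = $next;
        }
        else {
            $buffer .= $line;
        }
    }
    $flush->();
    return;
}

# Usage: two toy handlers and a print subroutine standing in for $page.
my @out;
my $page = sub { push @out, @_ };
my %handlers = (
    html  => sub { my ($p, $in) = @_; while (my $l = <$in>) { $p->($l) } },
    upper => sub { my ($p, $in) = @_; while (my $l = <$in>) { $p->(uc $l) } },
);

my $source = "<p>plain</p>\n--USE-- upper\n<p>shout</p>\n--USE-- html\n<p>done</p>\n";
open my $in, '<', \$source or die "cannot open source: $!";
dispatch(\%handlers, $page, $in, {}, 'html');
print @out;
```

Buffering costs memory for large files, but it means a handler cannot read past the end of its own section.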

The Site Handler

The Site Handler (our sample is called GimpSite) does all the dirty site-specific stuff; it defines the handlers that are passed to CompoSite. This reduces the amount of site-specific code immensely, while keeping it out of the content. Of course, one could make an 'evaluate-this-perl-please' handler quite easily -- but that would be totally defeating the purpose of this system.

There are a few interesting handlers, including one that displays a directory of links stored in an SQL database -- but you can discover that in the code.


Integrating The GIMP

The far more interesting pieces of code here are used to run The GIMP in the background to generate titles for the site. Basically, we use Xvfb to provide an X server, then run GIMP in it and run the GIMP-Perl server to allow us to communicate with it from the web server.

Running The GIMP in Xvfb

Xvfb is an X server with only a virtual frame buffer -- no physical display. It starts fine for me like this:

    nohup Xvfb :0 -screen 0 640x480x24 -fp /usr/X11R6/lib/X11/fonts/misc/ -nolisten tcp -ac < /dev/null > Xvfb.out 2> Xvfb.err &

Then I start The GIMP in that display with this command:

    DISPLAY=:0.0 nohup /usr/local/bin/gimp --verbose --no-splash --no-splash-image --enable-stack-trace never --console-messages -i -b '(extension-perl-server 1 0 0) (gimp-quit 0)' < /dev/null > Gimp.out 2> Gimp.err &

Note that Apache, Xvfb, and The GIMP are all running as the same user.

Rendering Page Titles

Starting The GIMP as above will cause the GIMP-Perl server to create a socket, and you can communicate with the server by opening a connection to that socket with Gimp::init(). Gimp::init() accepts several connection methods; to connect to the local UNIX socket on wilber.gimp.org, I used:

    unix/tmp/gimp-perl-serv-uid-1043/gimp-perl-serv
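A small sketch of building that connection string and attempting the connection follows. The helper names gimp_socket_spec and connect_to_gimp are hypothetical, and the idea that the number in the path is the numeric UID of the user running The GIMP is an assumption read off the path above. The sketch treats only an exception from Gimp::init() as failure, and it does not connect when run as shown.

```perl
use strict;
use warnings;

# Build a connection string of the shape shown above.
# Assumption: the directory name embeds the server user's numeric UID.
sub gimp_socket_spec {
    my ($uid) = @_;
    return sprintf 'unix/tmp/gimp-perl-serv-uid-%d/gimp-perl-serv', $uid;
}

# Try to connect; return 1 on success and 0 if Gimp-Perl is not
# installed or Gimp::init() throws.
sub connect_to_gimp {
    my ($spec) = @_;
    my $ok = eval {
        require Gimp;
        Gimp::init($spec);
        1;
    };
    return $ok ? 1 : 0;
}

my $spec = gimp_socket_spec(1043);
print "$spec\n";

# In the web server one might then write:
#   connect_to_gimp(gimp_socket_spec($<)) or warn "GIMP not reachable\n";
```

Using $< (the real UID of the current process) only works because Apache and The GIMP run as the same user, as noted earlier.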

To render page titles, the gen_text coderef variable calls GimpSite::Text::Render. In the _header block, we see: {|{-|Title|-}:&gen_text|}. This passes the current value of the Title variable to the text rendering function and returns the appropriate HTML to insert the image in the page. The page rendering stops while the image is being rendered, but the image file is written to disk so it needn't be rendered twice.

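    # Called for {|...:&gen_text|}: render $text as a title image with
    # The GIMP and return the <img> tag that embeds it.
    gen_text => sub {
        require GimpSite::Text;    # loaded on first use
        my ($page, $text) = @_;
        # my ($style, $text) = split /\//, $arg, 2;    # disabled: style/text split
        # Render() returns the image URI and its pixel dimensions.
        my ($uri, $w, $h) = GimpSite::Text::Render($text);
        return qq(<img src="$uri" alt="$text" width="$w" height="$h" border="0">);
    },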

The messy internals of GimpSite::Text::Render output silly-putty-like text using the FreeType plug-in.
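The render-once behaviour can be sketched independently of The GIMP. This is not the code in GimpSite::Text: the function name cached_title, the MD5-based file name and the .png extension are assumptions, and a stub that writes a text file stands in for the actual rendering.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Temp qw(tempdir);

# Return the path of the image for $text, calling $render->($text, $path)
# only when no file exists yet.
sub cached_title {
    my ($dir, $text, $render) = @_;
    my $path = "$dir/" . md5_hex($text) . ".png";
    $render->($text, $path) unless -e $path;
    return $path;
}

my $dir   = tempdir(CLEANUP => 1);
my $calls = 0;
my $fake_render = sub {               # stands in for the round trip to The GIMP
    my ($text, $path) = @_;
    $calls++;
    open my $fh, '>', $path or die "cannot write $path: $!";
    print {$fh} "image for $text";
    close $fh;
};

cached_title($dir, 'Downloads', $fake_render) for 1 .. 3;
print "rendered $calls time(s)\n";
```

Deriving the file name from the title text means that changing a title produces a new image automatically, at the cost of leaving the old file behind.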


Future Work

Obviously all of this is a work in progress, and something that reflects my personal preferences very strongly. So, it may or may not fit with the GIMP website, but I am certainly using it for some commercial websites over which I have total control.

The regular expressions could be replaced by a proper parser, but I have found that the limitations of the regex system stop me from doing things that I might regret later.

The title generator is currently hard-coded to generate graphical titles of a certain size and shape for the GIMP website. This should be replaced by a generic title style system so that various simple GIMP effects can be condensed into a simple description of how to make a particular style of title.

The --USE-- separator is not enforced. A handler can read the entire file and ignore intended handler changes. In fact this is what the 'dump' handler does. A magic variant of IO::Handle could prevent this.

It might be interesting to see how this stuff would fit in with the new stacked handlers and filters in Apache 2.0.


© Copyright June 2001 by Tom Rathborne - tomr@aceldama.com
This document, the ideas it reveals, and its associated images are in the public domain.
The code is licensed under the GPL.