Embedding Perl in HTML with Mason
Dave Rolsky
Ken Williams

Chapter 10: Scalable Design

So now that you know how to do things with Mason, it's time to start thinking about how to do things cleanly, scalably, and maintainably. Mason is a good tool, but it is not magic, and you still need to think about design when you use it.

Modules Versus Components

This book was written before tools such as Catalyst, MasonX::WebApp, and MasonX::Interp::WithCallbacks were available or widely used. The latter two modules were written to make it easier to move application logic out of components and into modules.

Catalyst is a full-featured framework that makes it easy to build a cleanly architected MVC web application. Mason works well as the templating piece (the view) in a Catalyst application, and we encourage you to check it out at http://catalyst.perl.org.

Mason is a powerful tool for generating content. Its easy templating syntax, powerful component structures, and features like autohandlers, dhandlers, and component inheritance combine to make it much like Perl itself: it makes easy things easy and difficult things possible.

However, exactly like Perl itself, the facilities it provides can make it all too tempting to do things the easy way, and Mason makes no attempt to enforce any sort of discipline in your design. Instead, this is your responsibility as a programmer and application designer. This is where the responsibility always lies, no matter what language or tool you are using.

Though Mason is at its core a text templating tool, it provides much more than templating. In particular, individual components behave almost exactly like subroutines. They can be called anywhere in your processing and they can, in turn, call other components, generate output, and/or return values to the caller. And, like Perl's subroutines, variables defined inside a component are lexically scoped to that component.

It is this similarity between components and subroutines that can lead to design trouble. As long-time Mason users, we have come to believe that Mason components should be used almost exclusively for generating output. For data processing, we believe that Perl modules are the better solution. In our experience, this division of labor leads to long-term benefits in maintainability and clarity of design.

When we say "generating output," we mean generating binary or text output of any sort (HTML, XML, plain text, images, etc.) to be sent somewhere (STDOUT, a web client, etc.). In a web environment, this includes things like sending redirect headers or custom error responses as well as HTML. When we say "data processing," we mean the work of retrieving data from an external data source such as a database, processing data and constructing useful objects or data structures, doing calculations, implementing business logic, or munging data.

Our exception to this rule is when the data processing is entirely part of the UI that Mason is generating. For example, in a web context, it may be necessary to do some munging of POSTed data or to translate data from the manner in which it is presented in the UI to a format suitable for your backend.

But Mason is not the right tool for all jobs, and it should not form the entire infrastructure of any project.

The rest of this discussion will assume a web environment, as that is Mason's primary domain, though this discussion can apply to any environment in which Mason could be used.

Another important goal is to minimize duplication of code. You will never eliminate duplication entirely, but you should always work to reduce it. Duplicated code leads to bugs when one copy changes and the other doesn't, makes the entire code base harder to understand, and increases implementation time for bug fixes and changes.

Obviously, the line between generating output and data processing is extremely blurry. Given that fact, perhaps the best goal is to reduce the data processing in Mason components to the minimal amount necessary to properly generate output. All other application logic should be placed in Perl modules and called from your components.

The line that needs to be drawn is one that makes the code flow in both your modules and your components as natural as possible. We don't want to go into impossible contortions in order to eliminate four lines of processing from a component, nor do we want to put knowledge about Mason or our components into our modules. As with all design tasks, there is as much art as skill involved.

For example, as mentioned before, we consider it entirely appropriate for Mason components to handle incoming request argument processing. A component could use these arguments to determine what library function to call or what object to instantiate. It might also use these arguments to change the way it generates output, for example, if a parameter indicated that no images should be included on a page.

There is little reason to handle this particular processing task with a module. Indeed, doing so would create exactly the kind of dependency we believe makes using Mason for application logic so problematic. Your modules should be generically useful; if they depend on being called by Mason components, they are useless outside of the Mason environment.

What exactly is the danger of blurring these lines? Well, Mason is a fine system for generating HTML or other forms of output. However, let's assume that you plan to also provide your data via an email interface. A user may write an email to you with a specific body such as "fetch file 1," and your application will respond with the contents of file 1.

In a case such as this, you just want to execute some application logic to fetch a file and then spit it out to your mailer. It is unlikely that any of Mason's powerful features would be necessary in order to perform this task; in fact, Mason would probably get in the way.
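To make this concrete, here is a minimal sketch of such an email handler in plain Perl. The command syntax, the fetch_file() helper, and the in-memory %FILES store are all hypothetical stand-ins for real application code; the point is only that none of it needs a templating engine.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the real data store.
my %FILES = (
    1 => "Contents of file 1\n",
    2 => "Contents of file 2\n",
);

# Hypothetical application-logic function: no Mason involved.
sub fetch_file {
    my ($id) = @_;
    return $FILES{$id};
}

# Turn the body of an incoming email into the body of the reply.
sub handle_mail_body {
    my ($body) = @_;
    if ( my ($id) = $body =~ /^\s*fetch\s+file\s+(\d+)\s*$/i ) {
        my $contents = fetch_file($id);
        return defined $contents ? $contents : "No such file: $id\n";
    }
    return "Unrecognized command\n";
}

print handle_mail_body("fetch file 1");    # prints "Contents of file 1"
```

Because fetch_file() is ordinary Perl, a Mason component serving the web version of the same feature could call it too.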

Another example can illustrate this issue further. Suppose we want to build an application to serve as the backend for a new web site focused on news about Hong Kong movies, and we sensibly decide to make a single component that generates a story box. A story box has a headline, an author, and the first 500 characters of the story. If the story is longer than that, the box also includes a link to read the whole thing.

Here's the HTML-making portion of the component:

  <h1><% $story{headline} | h %></h1>
  
  <p>
  written by <b><% $story{author} | h %></b>
  </p>
  
  <p>
  <% substr( $story{body}, 0, 500 ) | h %>
  </p>
  % if ( length $story{body} > 500 ) {
  <p>
  <a href="full_story.html?story_id=<% $story{story_id} | u %>">
  Read the full story
  </a>
  </p>
  % }

Pretty simple, no? The component contains some application logic, of course. It checks the length of the story's body and changes the output depending on it. But the real question is where the %story hash comes from. Let's assume that we call another component to get it. So then we have this:

  <%init>
   my %story = $m->comp('get_newest_story.mas');
  </%init>

So what's the problem? Well, there is none as long as the only time you want to get the newest story is in a Mason environment. But what if you wanted to send out the top story any time someone sent an email to you at newest_story@hkmovienews.example.com?

Hmm, let's write a quick program to do that:

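  #!/usr/bin/perl -w
  
  use strict;
  use HTML::Mason::Interp;
  
  my $outbuf;
  my $interp = HTML::Mason::Interp->new(
      comp_root  => '/path/to/comp_root',
      out_method => \$outbuf,
  );
  
  # The component path is relative to comp_root
  my %story = $interp->exec('/get_newest_story.mas');
  send_story_mail(%story);
  
  # imagine send_story_mail() is defined and the mail is sent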

Not so bad, we suppose. Here are some issues to consider:

You just loaded a couple of thousand lines of Perl code in order to do a simple database fetch and then send an email. And because this email interface has become quite popular, it's happening a few times every minute. Your sysadmin is looking for you and she's carrying a big spiked club!
The return value of $interp->exec() may not be what you expect. If the component you called did an $m->abort('something') internally, the return value will be 'something'. This works fine when using Mason's ApacheHandler code, but it isn't what you want in this situation.
If any component you call (or that it calls) references $r (the Apache request object), it will fail spectacularly. It's nice to feel free to access $r in your components, but if you were trying to make a multipurpose Mason system you'd have to be sure not to use $r in any component that might be used outside of a web context, and you would feel fettered and stifled.

Now imagine that you multiply this by 40 more data processing and application logic components. Then remember that if you try to do 'perldoc get_newest_story' from the command line, it won't do anything! And remember that you have 40 separate files, one per API call. Now imagine that you take advantage of Mason's inheritance and other fancy features in your data processing code. Now imagine trying to debug this later.

If, however, you put the 'get_newest_story' functionality into a module, you could call this module from both your component and your email sending program, looking something like this:

  #!/usr/bin/perl -w
  
  use MyApplication;
  
  my %story = MyApplication->get_newest_story( );
  MyApplication->send_story_mail(%story);
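The MyApplication module itself might look something like the following sketch. The story data is hard-coded here as an assumption, standing in for the database query a real application would perform, and send_story_mail() returns the message text instead of sending it so the example runs anywhere. Both method bodies are hypothetical; what matters is the shape: plain class methods, no $m or $r, and POD that perldoc can read.

```perl
package MyApplication;

=head1 NAME

MyApplication - application logic shared by Mason components and scripts

=head1 METHODS

=head2 get_newest_story()

Returns the newest story as a list of key/value pairs.

=head2 send_story_mail(%story)

Formats a story as an email message and returns the text.

=cut

use strict;
use warnings;

# Hypothetical in-memory data; a real module would query a database.
my @STORIES = (
    { story_id => 1, headline => 'Old news',   author => 'Ann', body => 'First.' },
    { story_id => 2, headline => 'Fresh news', author => 'Bob', body => 'Second.' },
);

# Return the story with the highest story_id as a flat key/value list,
# so callers can write: my %story = MyApplication->get_newest_story();
sub get_newest_story {
    my ($class) = @_;
    my ($newest) = sort { $b->{story_id} <=> $a->{story_id} } @STORIES;
    return %$newest;
}

# Build the mail text; a real version would hand it to a mailer.
sub send_story_mail {
    my ( $class, %story ) = @_;
    return "Subject: $story{headline}\n\nBy $story{author}\n\n$story{body}\n";
}

1;
```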

The advantages include:

You can easily preload your shared library code in the main Apache server at startup, saving memory, since the child processes forked from the main server can share the preloaded code rather than each compiling its own copy.
In terms of performance, calling a subroutine in a module is much more lightweight than calling a Mason component. A Mason component call involves calling a subroutine and also performing a bunch of overhead tasks like checking the age of the component file, checking required arguments and types, and so on.
Perl modules have well-known mechanisms for documentation and regression testing. Psychologically, we feel that an API is more stable when we have a documented module that instantiates it. A tree of components feels more mutable, and we hate feeling as if we've built a shaky house of logic that we don't necessarily understand in the end.
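As an illustration of the testing point, here is a hedged sketch of a regression test written with Test::More, which ships with Perl. It assumes the story box's 500-character rule has been moved out of the component into a hypothetical MyApplication::Story->summarize() method; the name and signature are invented for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical module method: the truncation rule from the story box,
# moved out of the component so it can be documented and tested.
package MyApplication::Story;

sub summarize {
    my ( $class, $body, $max ) = @_;
    $max = 500 unless defined $max;
    my $has_more = length($body) > $max ? 1 : 0;
    return ( substr( $body, 0, $max ), $has_more );
}

package main;

use Test::More tests => 4;

my ( $text, $more ) = MyApplication::Story->summarize( 'x' x 600 );
is( length $text, 500, 'summary is cut at 500 characters' );
ok( $more, 'long story has more to read' );

( $text, $more ) = MyApplication::Story->summarize( 'short', 10 );
is( $text, 'short', 'short story is returned whole' );
ok( !$more, 'short story has nothing more' );
```

Running a file like this with prove or perl gives standard TAP output, which is far easier to automate than exercising the same rule through a rendered component.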

The Other Side

However, that's not to say you don't lose anything. Here's a summary of a number of arguments we've heard on the possible advantages of using Mason components for data processing, along with our responses.

Data processing in Mason components provides developers with a unified way of writing both display and processing code. This is especially appreciated by less experienced developers not accustomed to writing modules.
Perl modules are one of the fundamental tools for writing reusable code and creating maintainable applications. It may be convenient to use Mason for data processing in the short term, but in the long term you'll be better served by moving to a more formalized approach involving separate mechanisms for processing and display.
For rapid development environments, it's hands-down faster to create a new component, and you are less likely to have a merge conflict with another person's work.
Once a module is created, adding a new function or method to it is fairly trivial, but the initial process does require some thought. And yes, merge conflicts are more likely when using version control because you will have fewer files, though in our experience this is not terribly common.
Mason has support for private versions of processing code. One person said that where they work everyone has a version of the site checked out from version control and views his version through TransHandler magic via <name>.dev.example.com. Developers can change their own version of the processing components and preview the changes. If the processing code were in modules, every developer would need his own Perl interpreter, thus a separate server.
It is possible, though not completely trivial, to provide every developer with a unique copy of the modules in his own server. This can be more of a maintenance hassle, particularly when adding new developers, though some automation can eliminate the hassle. Again, this is a case of investing time up front to save time in the future. This issue is discussed in Chapter 11.

For example, it is relatively easy to give each developer his own Apache daemon running on a unique high-numbered port. Each developer's server can then use the developer's local copies of the code, modules, and components, so the developer can work in isolation and feel free to break things without slowing anyone else down.

Or, just as easily, each developer can run a daemon locally on his own computer, perhaps connecting to a central test database or even running an RDBMS locally.¹ Most importantly, nothing can replace solid coding guidelines, development practices, and testing, coupled with tools like version control.
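Setting up per-developer daemons can be partly automated. The following is a hypothetical sketch that prints an Apache configuration fragment for each developer: the /home/<name>/site checkout layout, the base port of 8001, and the developer names are all assumptions. MasonCompRoot and MasonDataDir are the PerlSetVar settings read by Mason 1.x's ApacheHandler, but each fragment would still need the rest of a working mod_perl and Mason setup (handler directives, module loading, and so on).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical layout: each checkout lives under /home/<name>/site,
# and ports are assigned sequentially from a high base port.
my $base_port  = 8001;
my @developers = qw(alice bob carol);

# Return the config fragment for one developer's private daemon.
sub dev_config {
    my ( $name, $port ) = @_;
    my $root = "/home/$name/site";
    return <<"EOF";
# httpd config fragment for $name
# (also add $root/lib to Perl's module search path in a startup file)
Listen $port
DocumentRoot $root/htdocs
PerlSetVar MasonCompRoot $root/htdocs
PerlSetVar MasonDataDir $root/mason-data
EOF
}

for my $i ( 0 .. $#developers ) {
    print dev_config( $developers[$i], $base_port + $i ), "\n";
}
```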
Components give you many fringe benefits over Perl subroutines: named argument passing and checking, result caching, a lightweight hierarchical naming structure, component logging, and so on.

We can't really argue with this. It's true. However, we have yet to find ourselves really wishing for this functionality when developing application logic. Named arguments are nice, but CPAN provides several good solutions for validating them, including Params::Validate, which Mason uses internally.
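For comparison, here is a sketch of named-argument checking in a plain Perl function. To keep the example self-contained, it checks the arguments by hand using only core Perl instead of Params::Validate, which lets you state the same rules declaratively. The get_story() function and its parameters are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical module function taking named arguments, with the kind of
# checking Mason's <%args> section would otherwise provide.
sub get_story {
    my %p = @_;

    # Reject parameters we don't know about.
    my %allowed = map { $_ => 1 } qw(story_id include_body);
    for my $key ( keys %p ) {
        croak "Unknown parameter '$key'" unless $allowed{$key};
    }

    # Required parameter with a simple type check.
    croak "Missing required parameter 'story_id'"
        unless defined $p{story_id};
    croak "story_id must be a positive integer"
        unless $p{story_id} =~ /^[1-9]\d*$/;

    # Optional parameter with a default value.
    $p{include_body} = 1 unless exists $p{include_body};

    return { story_id => $p{story_id}, include_body => $p{include_body} };
}

my $args = get_story( story_id => 42 );
print "$args->{story_id} $args->{include_body}\n";    # prints "42 1"
```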

There have been times when shoving data processing into a Mason component was exactly what we needed. The code sits there right next to the code that calls it, not off in site_perl/, which should usually have some tight controls over what gets put in it. In a matter of seconds you can try things out without worrying about module naming, namespace collisions, server restarts, and so forth. Then when you've had a chance to think about what a good interface should look like, you can migrate the code to a module. It's all well and good to extol the virtues of good planning, but the creative process is seldom very plannable unless you've done a similar task before.

On yet another hand, you can always maintain your own module directories and add them to Perl's search path via a quick use lib.
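A minimal sketch of that approach for a standalone script, assuming the private modules live in a lib directory next to the script. FindBin and lib both ship with Perl. Under mod_perl, where a persistent interpreter makes FindBin less dependable, an absolute path in a startup file is probably the safer choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Locate the directory containing this script, then put its lib/
# subdirectory at the front of Perl's module search path.
use FindBin qw($Bin);
use lib "$Bin/lib";

# A module such as a hypothetical lib/MyApplication.pm would now be
# found by a plain "use MyApplication;".
print "$INC[0]\n";
```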

We are certainly not advocates of the "design everything and make sure it's perfect before coding" school of design. Our points are more about the end product than the development process itself. Your process should lead to the creation of clean, maintainable code. If you make a mess while writing it, we certainly won't criticize as long as it gets cleaned up in the end.

Our summary is simple. Writing your application logic and data processing as Mason components is a shortcut that can bite you later. Like many design trade-offs, it speeds up initial time to release while guaranteeing maintenance pain in the future.

Components as Independent Units

Like subroutines in any language, Mason components are vulnerable to the disease known as "jumboitis." Symptoms of this disease include monstrous chains of if-elsif clauses as well as a general excess of code. This disease, untreated, can lead to developer confusion, application fragility, and an apathetic mindset toward fixing bugs because "it doesn't matter, the code will still suck."

It is never a good idea to pack all your decisions into a single component. Even if you're not planning to reuse a particular piece of Mason code, it doesn't hurt to turn it into a separate component in order to demarcate pieces of code as having different functions. In some cases, you may prefer to use subcomponents instead of actually creating a separate file.

One practice that frequently leads to jumboitis is common in the CGI world. Quite often, a single CGI program starts off with a big chain of if-elsif clauses that basically try to figure out what the program is supposed to do. First it displays a form, then it processes the form output, and then it might show the form again with errors marked, or it might show another page, then update the database, then show an index page, and then...

OK, we're out of breath and our brains are throbbing. This sort of code is scary, though we've all probably written something just like it in the past.

It would be easy to do this with Mason, but there's no need. In a cleaner design, you'd have one component display the form. Then it would post to a component that would handle the form input, which can call another component or a module to do data validation. If the data has errors, it redirects back to the form component. Otherwise it might redirect to a component that shows a preview of the data. Then a Submit button could post the data to yet another component that updates the database (after doing data validation again, no doubt). As long as you've got smooth pathways for sharing data among components, you'll be able to design a component tree that makes sense and isn't a nightmare to maintain.
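Since the form handler and the database-updating component both validate the same data, the validation rules are a natural candidate for a module. The sketch below uses hypothetical field names (headline, author, body) and rules. It returns a hash of error messages, so a form-handling component only needs to check whether the hash is empty before deciding whether to redirect back to the form.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical validation module shared by the form-handling component and
# the component that finally updates the database.
package MyApplication::StoryForm;

sub validate {
    my ( $class, %input ) = @_;
    my %errors;

    $errors{headline} = 'Headline is required'
        unless defined $input{headline} && $input{headline} =~ /\S/;
    $errors{author} = 'Author is required'
        unless defined $input{author} && $input{author} =~ /\S/;
    $errors{body} = 'Body must be at least 20 characters'
        unless defined $input{body} && length $input{body} >= 20;

    return \%errors;    # an empty hash means the input is valid
}

package main;

my $errors = MyApplication::StoryForm->validate( headline => 'Hi', author => '' );
print join( ', ', sort keys %$errors ), "\n";    # prints "author, body"
```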

And don't forget about autohandlers and dhandlers, which can go a long way to reducing code duplication with very little effort. We saw this sort of pattern in Chapter 8 when we looked at the user and project editing components.

Component Layout

If you are working with other people on a Mason project, you should probably standardize the layout of code within your components, if for no other reason than consistency. You may find that putting all of the text generation at the top of the component, followed by other sections like <%args> and <%init>, is a good layout. This means that when HTML folks have to look at your components, they won't be overwhelmed by many lines of what is, to them, gibberish. Here is another possible component layout standard:

A <%doc> section describing what the component does. You can omit this section if its purpose is obvious -- for some suitably strict definition of "obvious."
The <%args> section.
The <%flags> section.
The <%attr> section.
The text generation portion of the component, along with whatever embedded code it contains.
The <%once> section.
The <%shared> section.
The <%init> section.
The <%cleanup> section.
The <%filter> section.
All of the component's <%def> sections.
All of the component's <%method> sections.

In turn, each subcomponent or method should follow the same ordering of sections as the main component.

The general aesthetic is that we first put sections that define the component's interface (<%doc>, <%args>, <%flags>, <%attr>), then the main body of the component, then any sections written in Perl. This tends to balance the needs of Perl developers, HTML developers, and code administrators.

Alternatively, you could place the <%args>, <%flags>, and <%attr> sections after the main body. You might do this if Mason components may be edited by nonprogramming web designers; they will probably prefer to see the text portions of the component first without being distracted by code sections that they may not understand anyway. This is the style we adopted for our sample site, in Chapter 8.

Of course, use a layout that makes sense in your specific situation. For instance, if a subcomponent is tiny enough, you might just put it near the code that calls it. Choose a layout that gives you inner peace.

File Naming and Directory Layout

As with your component layout, the most important aspect of naming is consistency. Give your components consistent names and file extensions. For example, components intended to be called by a client might end in .html, while components intended only for use by other components might end in .mas. Consistently naming your files will simplify your web server setup and many maintenance tasks, as well as slightly lowering the barrier to entry for new developers.
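Consistent extensions also make such maintenance tasks easy to script. As a hedged example, this sketch uses the core File::Find module to list files under a component root that break the naming convention. The convention it checks (.html for client-callable components, .mas for internal ones, and extensionless autohandler and dhandler files) is an assumption; adjust it to whatever your team has agreed on.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Assumed convention: .html for client-callable components, .mas for
# internal ones; autohandler and dhandler files have no extension.
my %expected = map { $_ => 1 } qw(html mas);
my %special  = map { $_ => 1 } qw(autohandler dhandler);

# Return the sorted paths of files under $comp_root that break the convention.
sub nonconforming {
    my ($comp_root) = @_;
    my @bad;
    find(
        sub {
            return unless -f $_;           # skip directories
            return if $special{$_};        # autohandler, dhandler
            my ($ext) = /\.([^.]+)\z/;
            push @bad, $File::Find::name
                unless defined $ext && $expected{$ext};
        },
        $comp_root,
    );
    return sort @bad;
}

# Usage: perl check_names.pl /path/to/comp_root
if (@ARGV) {
    print "$_\n" for nonconforming( $ARGV[0] );
}
```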

One other consideration when determining your directory and file layout is how you plan to use autohandlers. One symptom of a bad layout is finding yourself frequently using the inherit flag to override a component's default inheritance. Even worse is when you have many components in a single directory, all with different inherit flags. That is a strong sign that you should consider grouping files together based on inheritance.

Of course, you can always use the inherit flag to change a component's parent, but if you can avoid it and simply use the closest autohandler file, that's one less complication to deal with and one less source of potential bugs.

Random Advice

Finally, we want to say a few things that don't justify their own section:

Always put whitespace around the contents of a substitution tag. This looks nicer.
Don't output content and return a value from the same component. This makes for a confusing API.
Put as much code as possible in external modules, and try to stick as much of the rest into <%init>, <%once>, and <%shared> blocks. Don't litter your components with <%perl> blocks and Perl lines. Interspersing HTML (or other text) with lots of code makes for hard-to-read components.
Use a period (.) as the first character of subcomponent names, so that calls to them are easy to tell apart from calls to file-based components.

None of these rules are etched in stone, but they provide some good guidelines to your coding that may make your life, and your fellow coders' lives, a little more pleasant.

Footnotes

1. Though a local RDBMS may be more trouble than it's worth with a high-maintenance RDBMS like Oracle.
