Small cover
Embedding Perl in HTML with Mason
Dave Rolsky
Ken Williams

Table of Contents | Foreword | Preface
Chapters: 1 2 3 4 5 6 7 8 9 10 11 12
Appendices: A B C D
Glossary | Colophon | Copyright


Chapter 12: Custom Mason Subclasses

Something that we have tried very hard to do beginning with the 1.10 release of Mason is to make it easier to customize Mason's behavior. Jon Swartz was already on this track even back with the release of 0.80, which saw the first appearance of the HTML::Mason::Resolver classes, but 1.10 tries to bring this to new levels.

Starting with 1.10 it has become possible to subclass almost every core class that comes with Mason. Some obvious candidates for subclassing include the Lexer, Compiler, and Resolver. This chapter will demonstrate how you might go about implementing subclasses of various Mason objects.

Class::Container as a Superclass

A number of modules in Mason are subclasses of Class::Container. This is a class that was created to encapsulate some common behaviors for Mason objects. Originally, it was called HTML::Mason::Container, but Ken Williams decided to package this class separately and release it to CPAN, as it solves some fundamental problems of a large object-oriented system. Any Mason object that takes parameters to its constructor must inherit from this module. Of course, since all of the classes that you might consider subclassing inherit from Class::Container already, you shouldn't need to inherit from it directly. However, you may need to use some of its methods. We will briefly cover a few of them here, but see the Class::Container documentation for more details.

The modules in the Mason core distribution that are Class::Container subclasses are HTML::Mason::ApacheHandler, HTML::Mason::CGIHandler, HTML::Mason::Interp, HTML::Mason::Compiler, HTML::Mason::Lexer, HTML::Mason::Resolver, and HTML::Mason::Request.

The most important methods that Class::Container provides are valid_params() and contained_objects(), both of which are class methods.

The first, valid_params(), is called in order to register the valid parameters for a class's new() constructor. The second method, contained_objects(), is used to register the objects, if any, that a given class contains.

The contained_objects() method is not something you will have to use for all of your subclasses, since most of the time you won't be altering the structure of Mason's framework, you'll just be plugging your own classes into it. This method is called with a hash that contains as its keys parameter names that the class's constructor accepts and as its values the default name of the contained class.

For example, HTML::Mason::Compiler contains the following code:

  __PACKAGE__->contained_objects( lexer => 'HTML::Mason::Lexer' );

This says that the HTML::Mason::Compiler->new() method will accept a lexer parameter and that, if no such parameter is given, then an object of the HTML::Mason::Lexer class will be constructed.

Class::Container also implements a bit of magic here, so that if HTML::Mason::Compiler->new() is called with a lexer_class parameter, it will load the class, instantiate a new object of that class, and use that for the lexer. In fact, it's even smart enough to notice if parameters given to HTML::Mason::Compiler->new() are really intended for this subclass, and it will make sure that they get passed along.

The valid_params() method is a bit more complex. It also takes a list of key/value pairs as arguments. The keys are the names of parameters accepted by the new() method, while the values are hash references defining a validation specification for the parameter. This specification is largely the same as that used by the Params::Validate module, with a few additions (but no subtractions).

One addition is that each parameter, excluding those that represent contained objects, may also define a value for parse. This tells Mason how to parse this parameter if it is defined as part of an Apache configuration file. If no parse parameter is provided, a sensible default will be guessed from the value of the Params::Validatetype argument.

The upshot of this is that your subclasses can define their own constructor parameters and Mason will then check for these parameters in an Apache configuration file.

As an example, HTML::Mason::Compiler contains the following:

  __PACKAGE__->valid_params
      (
       allow_globals =>
       { parse => 'list',   type => ARRAYREF, default => [ ],
         descr => "An array of names of Perl variables that are" .
                  " allowed globally within components" },
  
       default_escape_flags =>
            { parse => 'string', type => SCALAR,   default => '',
              descr => "Escape flags that will apply by default to" .
                  " all Mason tag output" },
  
       lexer =>
         { isa => 'HTML::Mason::Lexer',
           descr => "A Lexer object that will scan component" .
                  " text during compilation" },
  
       preprocess =>
         { parse => 'code',   type => CODEREF,  optional => 1,
           descr => "A subroutine through which all component text" . 
                  " will be sent during compilation" },
  
       postprocess_perl =>
         { parse => 'code',   type => CODEREF,  optional => 1,
           descr => "A subroutine through which all Perl code" .
                  " will be sent during compilation" },
  
       postprocess_text =>
         { parse => 'code',   type => CODEREF,  optional => 1,
           descr => "A subroutine through which all plain text will" .
                  " be sent during compilation" },
      );
  
  __PACKAGE__->contained_objects( lexer => 'HTML::Mason::Lexer' );

The type , default, and optional parameters are part of the validation specification used by Params::Validate. The various constants used, ARRAYREF , SCALAR, and so on, are all exported by Params::Validate. The parameters passed to valid_params() correspond to the MasonAllowGlobals, MasonDefaultEscapeFlags, MasonLexerClass, MasonPreprocess, MasonPostprocessPerl, and MasonPostprocessTexthttpd.conf configuration variables. Yes, Class is added automatically to the lexer param because lexer was also given to the contained_objects() method.

The descr parameter is used when we generate the HTML::Mason::Params documentation and is probably not something you'd need to use.

For more details, see both the Class::Container and Params::Validate documentation.

Syntax: Your Very Own Lexer

A request heard every so often on the Mason users list is for some way to create an XML-based markup language that can be used with Mason and that can be compiled to a Mason component object.

Despite the panic the thought of such a thing inspires in us, in the interests of good documentation, we will show the beginnings of such a lexer.

This lexer object will make use of several modules from CPAN, including XML::SAX::ParserFactory and XML::SAX::Base . The former is what it sounds like, a factory for SAX parsers (SAX2 parsers, actually). The latter is what any SAX2 handler should use as a base class. It implements a default no-op method for all the possible SAX2 methods, allowing you to simply implement those that you need. Our lexer will be a SAX2 handler, so we will inherit from XML::SAX::Base.

A quick side note on SAX (Simple API for XML): SAX is an event-based API for parsing XML. As the parser finds XML constructs, such as tags or character data, it calls appropriate methods in a SAX handler, such as start_element() or characters(). The parser is an event producer and the handler, like our Lexer, is an event consumer. In our case, the Lexer will also be generating events for the Compiler, though these will not be SAX events.

For more information on Perl's implementation of SAX2, see the perl-xml project on Sourceforge at http://perl-xml.sourceforge.net/.

For the purposes of our example, let's assume that any element that is not in the mason XML namespace will be output verbatim, as will any text. For tags, we'll just implement <mason:args>, <mason:init>, <mason:perl>, and <mason:output> in this example.1 The <mason:init> tag will contain XML-escaped Perl code, while the <mason:args> tag will contain zero or more <mason:arg> tags. Each <mason:arg> tag will have the attributes name and default , with name being required.

We will also implement a <mason:component> tag in order to provide a single top-level containing tag for the component, which is an XML requirement.

This is only a subset of the Mason syntax set, but it's enough to show you how to customize a fairly important part of the system.

Using these tags, we might have some XML like this:

  <?xml version="1.0"?>
  <mason:component xmlns:mason="http://www.masonbook.com/">
   This is plain text.
   <b>This is text in an HTML tag</b>
   <mason:perl>
    my $x;
    if ($y &gt; 10) {
        $x = 10;
    } else {
        $x = 100;
    }
   </mason:perl>
   $x is <mason:output>$x</mason:output>
   $y is <mason:output>$y</mason:output>
  
   <mason:args>
    <mason:arg name="$y" />
    <mason:arg name="@z" default="(2,3)" />
   </mason:args>
   <mason:init>
    $y *= $_ foreach @z;
   </mason:init>
  </mason:component>

OK, that looks just beautiful!

Let's start with the preliminaries.

  package HTML::Mason::Lexer::XML;
  $VERSION = '0.01';
  
  use strict;
  
  use HTML::Mason::Exceptions( abbr => [ qw( param_error syntax_error error ) ] );
  
  use HTML::Mason::Lexer;
  use Params::Validate qw(:all);
  use XML::SAX::Base;
  use XML::SAX::ParserFactory;
  use base qw(HTML::Mason::Lexer XML::SAX::Base);  # Lexer comes first

As mentioned before, XML::SAX::Base provides default no-op methods for all of the possible SAX2 events, of which there are many. Since we're not interested in most of them, it's nice to have them safely ignored. We inherit from HTML::Mason::Lexer because it provides a few methods that the compiler class needs, such as object_id().

Because we're staunch generalists, we won't insist that the XML namespace of our tags needs to be ' mason' . We'll let the user override this with a parameter if desired:

  __PACKAGE__->valid_params
    (
     xml_namespace => { parse => 'string', type => SCALAR, default => 'mason',
                        descr => "Prefix of XML tags indicating Mason sections" },
    );

We don't need to make a separate new() method in our module, since we can just inherit the one provided by our base Lexer class. The main action will happen in the lex() method:

  sub lex {
      my ($self, %p) = @_;
  
      local $self->{name} = $p{name};
      local $self->{compiler} = $p{compiler};

We need a convenient place to keep these, so we stick them into $self for the duration of lexing. Perl's local() function makes sure these entries expire at the end of the lex() method:

      $self->{state} = [ ];

We'll need to keep a stack of what tags we've seen so we can check that tags aren't improperly nested and in order to handle characters() events correctly:

      my $parser = XML::SAX::ParserFactory->parser( Handler => $self );

We could have created the parser object in our new() method, but to store it we would have had to save it in the lexer object's structure, which would have created a circular reference. Doing it this way guarantees that the reference to the parser will go out of scope when we're finished using it.

      $parser->parse_string( $p{comp_source} );
  }

The last bit tells the parser to parse the component text we were given. That will cause the parser to in turn call methods for each SAX event that occurs while parsing the string.

Now we'll take a look at our event-handling methods. The first is start_element() , which will be called whenever an XML tag is first encountered:

  sub start_element {
      my $self = shift;
      my $elt  = shift;
  
      if ( ! defined $elt->{Prefix} ||
           $elt->{Prefix} ne $self->{xml_namespace} ) {
          $self->_verbatim_start_element($elt);
          return;
      }

If we got something that isn't in our designated namespace we'll just pass it through to the compiler as text to be output:

      if ( $elt->{LocalName} eq 'component' ) {
          $self->{compiler}->start_component;
      }

When the component starts, we notify the compiler so it can do any initialization that it needs to do:

      foreach my $block ( qw( init perl args ) ) {
          if ( $elt->{LocalName} eq $block ) {
              $self->_start_block($block);
            last;
          }
      }

      if ( $elt->{LocalName} eq 'output' ) {
          $self->_start_output;
      }

      if ( $elt->{LocalName} eq 'arg' ) {
          $self->_handle_argument($elt);
      }
  }

The rest of this method is basically a switch statement. Depending on what type of element we receive, we call the appropriate internal method to handle that element.

Let's look at some of the individual methods that are called:

  sub _verbatim_start_element {
      my $self = shift;
      my $elt  = shift;
      my $xml = '<' . $elt->{Name};
  
      my @att;
      foreach my $att ( values %{ $elt->{Attributes} } ) {
          push @att, qq|$att->{Name}="$att->{Value}"|;
      }
  
      if (@att) {
          $xml .= ' ';
          $xml .= join ' ', @att;
      }
  
      $xml .= '>';
  
      $self->{compiler}->text( text => $xml );
  }

Basically, this method goes through some contortions to regenerate the original XML element and then passes it on to the compiler as plain text. It should be noted that this implementation will end up converting tags like <foo/> into tag pairs like <foo></foo>. This is certainly valid XML but it may be a bit confusing to users. Unfortunately, there is no easy way to retrieve the exact text of the source document to determine how a tag was originally written, and with XML you're not supposed to care anyway.

Back to our subclass. The next method to implement is our _start_block() method. This will handle the beginning of a number of blocks in a simple generic fashion:

  sub _start_block {
      my $self  = shift;
      my $block = shift;
  
      if ( $self->{state}[-1] &&
           $self->{state}[-1] ne 'def' &&
           $self->{state}[-1] ne 'method' ) {
          syntax_error "Cannot nest a $block tag inside a $self->{state}[-1] tag";
      }

What we are doing here is making it impossible to do something like nest a <mason:init> tag inside a <mason:perl> block. In fact, the only tags that can contain other tags are method and subcomponent definition tags, which are unimplemented in this example.

We notify the compiler that a new block has started and then push the block name onto our internal stack so we have access to it later:

      $self->{compiler}->start_block( block_type => $block );

      push @{ $self->{state} }, $block;
  }

Again, we check for basic logical errors:

  sub _start_output {
      my $self = shift;
  
      if ( $self->{state}[-1] &&
           $self->{state}[-1] ne 'def' &&
           $self->{state}[-1] ne 'method' ) {
          syntax_error "Cannot nest an output tag inside a $self->{state}[-1] tag";
      }

Again, we push this onto the stack so we know that this was the last tag we saw:

      push @{ $self->{state} }, 'output';
  }

The variable name and default are expressed as attributes of the element. The weird '{}name' syntax is intentional. Read the Perl SAX2 spec mentioned earlier for more details on what this means.

  sub _handle_argument {
      my $self = shift;
      my $elt  = shift;
  
      my $var = $elt->{Attributes}{'{}name'}{Value};
      my $default = $elt->{Attributes}{'{}default'}{Value};

We want to check that the variable name is a valid Perl variable name:

      unless ( $var =~ /^[\$\@%][^\W\d]\w*/ ) {
          syntax_error "Invalid variable name: $var";
      }

Then we tell the compiler that we just found a variable declaration.

      $self->{compiler}->variable_declaration( block_type => 'args',
                                               type => substr( $var, 0, 1 ),
                                               name => substr( $var, 1 ),
                                               default => $default );
  }

That wraps up all the methods that start_element() calls. Now let's move on to handling a characters() SAX event. This happens whenever the SAX parser encounters data outside of an XML tag.

  sub characters {
      my $self  = shift;
      my $chars = shift;
  
      if ( ! $self->{state}[-1] ||
           $self->{state}[-1] eq 'def' ||
           $self->{state}[-1] eq 'method' ) {
          $self->{compiler}->text( text => $chars->{Data} );
          return;
      }

If we're in the main body of a component, subcomponent, or method, we simply pass the character data on as text:

      if ( $self->{state}[-1] eq 'init' ||
           $self->{state}[-1] eq 'perl' ) {
          $self->{compiler}->raw_block( block_type => $self->{state}[-1],
                                        block => $chars->{Data} );
          return;
      }

Character data in a <mason:init> or <mason:perl> section is passed to the compiler as the contents of that block. The compiler knows what type of tag is currently being processed and handles it appropriately.

      if ( $self->{state}[-1] eq 'output' ) {
          $self->{compiler}->substitution( substitution => $chars->{Data} );
      }
  }

If we are in a substitution tag, we call a different compiler method instead. Otherwise, we'll simply end up discarding the contents.

Since we may be dealing with text where whitespace is significant (as opposed to HTML), we'll want to pass on whitespace as if it were character data:

  sub ignorable_whitespace { $_[0]->characters($_[1]->{Data}) }

This method may be called if the XML parser finds a chunk of "ignorable whitespace." Frankly, we can never ignore whitespace, because it is just so cool, and without it our code would be unreadable. But apparently XML parsers can ignore it.2 The last thing we need to handle is an end_element() event:

  sub end_element {
      my $self = shift;
      my $elt  = shift;
  
      if ( ! defined $elt->{Prefix} ||
           $elt->{Prefix} ne $self->{xml_namespace} ) {
          $self->_verbatim_end_element($elt);
          return;
      }

Again, XML elements not in our designated namespace are passed on verbatim to the compiler:

      if ( $elt->{LocalName} eq 'component' ) {
          $self->{compiler}->end_component;
          return;
      }

If we have reached the end tag of the component, we inform the compiler that the component is complete and we return:

      return if $elt->{LocalName} eq 'arg';

We don't need to do anything to end argument declarations. The work needed to handle this element happened when we called _handle_argument() from our start_element() method.

      if ( $self->{state}[-1] ne $elt->{LocalName} ) {
          syntax_error "Something very weird happened.  " .
                       "We encountered an ending tag for a $elt->{LocalName} tag " .
                       "before ending our current tag ($self->{state}[-1]).";
      }

Actually, this should just never happen: XML does not allow tag overlap and, if the parser finds overlapping tags, it should die rather than passing them to us. But we believe in being paranoid. If there is an error in the logic of this lexer code, this might help us in catching it.

      if ( $elt->{LocalName} eq 'output' ) {
          pop @{ $self->{state} };
          return;
      }

Any output that needed to be sent has already been dealt with via the characters() method so we simply need to change our state if the end tag was </mason:output>:

      $self->{compiler}->end_block( block_type => $elt->{LocalName} );

      pop @{ $self->{state} };
  }

The only remaining possibilities at this point are either <mason:perl>, <mason:init>, or <mason:args>. For these we simply tell the compiler that the block is over, change our state, and finish.

The last method we need to write is _verbatim_end_element() to pass through tag endings for non-Mason tags:

  sub _verbatim_end_element {
      my $self = shift;
      my $elt  = shift;
  
      $self->{compiler}->text( text => "</$elt->{Name}>" );
  }

This concludes our sample lexer subclass. Note that there are a couple of things missing here. First of all, there is no handling of subcomponents or methods. This wouldn't be too terribly hard as it's mostly an issue of calling the right methods on the compiler.

We also would want to handle line numbers. The default Mason lexer keeps track of line numbers in the source file so that the compiler can output appropriate #line directives in the object file, meaning that errors are reported relative to the source file. This feature isn't required but can be very nice to have.

Some of the unhandled potential tags like <mason:text> would be extremely trivial to implement. The <mason:flags> and <mason:attr> tags could be modeled on the code for handling <mason:args>. And of course, we need to handle component calls too. This is the point in this example where we say, "finishing this is left as an exercise to the reader."

To use this new lexer class, we would either place the following in the httpd.conf file:

  PerlSetVar MasonLexerClass HTML::Mason::Lexer::XML

or, when creating the ApacheHandler object, we would simply pass in 'HTML::Mason::Lexer::XML' as the value of the lexer_class parameter.

Output: Compiling to a Different Output

So you've decided that you really hate Mason and you want to use Embperl instead. But you have a number of Mason components you've already written that you'd like to save. Well, you can create your own compiler to generate Embperl code from Mason. In this case, we'll use the lexer as is and rewrite the compiler from scratch. There isn't really a one-to-one match between Mason and Embperl's features so this example will, like the lexer example, be limited in scope. Finding an intelligent way to convert Mason's methods and subcomponents to Embperl is beyond the scope of this book.

In case you are unfamiliar with Embperl, it uses the following syntax: [+ +] tags contain code whose results should be sent to the browser, like Mason's substitution tag (<% %>). The [* *] tags contain Perl code that is not intended to generate output. This is equivalent to Mason's % -lines and <%perl> blocks. Finally, Embperl also has a [! !] tag similar to Mason's <%once> block.

There are other Embperl tags but, once again, this is a simplified example.

Embperl does have a feature similar to Mason's inheritance system called EmbperlObject, but translating between the two is nontrivial.

So let's make our new compiler:

  package HTML::Mason::Compiler::ToEmbperl;
  
  $VERSION = '0.01';
  
  use strict;
  
  use HTML::Mason::Lexer;
  use HTML::Mason::Exceptions ( abbr => [qw(syntax_error)] );
  use HTML::Mason::Compiler;
  use base qw(HTML::Mason::Compiler);

This pulls in the basic packages we'll need. Even though we really aren't inheriting much from HTML::Mason::Compiler , we still subclass it as anything expecting a compiler will check that what it is given is a subclass of HTML::Mason::Compiler.

Of course, in our case, we won't be using this compiler with the HTML::Mason::Interp class, so the point is moot but important to mention.

  sub compile {
      my ($self, %p) = @_;
  
      $self->lexer->lex( comp_source => $p{comp_source}, 
                         name => 'Embperl', 
                         compiler => $self );
  
      return $self->component_as_embperl;
  }

The only parameter we expect is

comp_source. We tell the lexer the name of the component is 'Embperl' since we don't really care what the name is in this context. Presumably we are being called by some sort of script that is simply going to take the Embperl-ized component and write it to disk somewhere. The name is used for reporting syntax errors when a component is run, but that won't be an issue in this case.

  sub start_component {
      my $self = shift;
  
      $self->{once_header} = '';
      $self->{header} = '';
      $self->{body}   = '';
      $self->{footer} = '';
  
      $self->{current_block} = '';
  }

This method is called to give the compiler a chance to reset its state, so that's what we do.

We will be storing blocks of code in each of the first four attributes. When we encounter a <%once> block, it will go in the once_header attribute. For <%init> blocks, we can put then in the header attribute. % -lines, <%perl> blocks, <%text> blocks, substitution tags, and text will be placed immediately into the body attribute. Finally, any <%cleanup> blocks will go into the footer attribute.

The current_block() attribute will be used to keep track of what type of block we are in after a call to our start_block() method.

This example will ignore other Mason syntax such as component calls, subcomponents, methods, and <%shared>. Again, this will be left as an exercise for the reader.

  sub start_block {
      my ($self, %p) = @_;
  
      syntax_error "Cannot nest a $p{block_type} inside a $self->{in_block} block"
          if $self->{in_block};

This is to make sure that the component is following the syntax rules we expect.

      $self->{in_block} = $p{block_type};
  }

Then we record what kind of block we are starting, which will be something like init or perl .

The next method, raw_block() , is called for all of the blocks that we handle except the <%text> block:

  sub raw_block {
      my ($self, %p) = @_;
  
      for ($self->{in_block}) {
          /^once$/     and   $self->{once_header} .= $p{block};
          /^init$/     and   $self->{header}      .= $p{block};
          /^perl$/     and   $self->{body}        .= "[* $p{block} *]";
          /^cleanup$/  and   $self->{footer}      .= $p{block};
      }
  }

This switchlike statement stores the code given to us in the appropriate attribute. If it is a <%perl%> block, we wrap it in the relevant Embperl tag; otherwise, we simply store it as is in the appropriate slot.

  sub text_block {
      my ($self, %p) = @_;
      $self->{body} .= $p{block};
  }
  
  sub text {
      my ($self, %p) = @_;
      $self->{body} .= $p{text};
  }

The first method is called when the lexer finds a<%text> block. The second is called for regular text. Both of these get placed into the body attribute for later use.

  sub substitution {
      my ($self, %p) = @_;
      $self->{body} .= "[+ $p{substitution} +]";
  }

This method handles substitution tags (<% %>) though it ignores the fact that this method can also be given an escape

parameter. This could be handled via Embperl's $escmode variable (again, left as an exercise for the reader).

  sub perl_line {
      my ($self, %p) = @_;
      $self->{body} .= "[* $p{line} *]";
  }

This method is called for % -lines.

Then we need to implement the end_block() method:

  sub end_block {
      my ($self, %p) = @_;
  
      syntax_error "end of $p{block_type} encountered while in $self->{in_block} block"
          unless $self->{in_block} eq $p{block_type};

Another sanity check is in the start_block() method. It's always a good thing to make sure that the lexer is giving us the kind of input that we would expect.

      $self->{in_block} = undef;
  }

And we reset our in_block attribute so that the next call to start_block() succeeds.

The last method to implement is the component_as_embperl() method, which simply will return a big block of text, our new Embperl page:

  sub component_as_embperl {
      my $self = shift;
  
      my $page = '';
  
      if ( length $self->{once_header} ) {
          $page .= "[! $self->{once_header} !]\n";
      }
  
      if ( length $self->{header} ) {
          $page .= "[* $self->{header} *]\n";
      }
  
      if ( length $self->{body} ) {
          $page .= "$self->{body}\n";
      }
  
      if ( length $self->{footer} ) {
          $page .= "[* $self->{footer} *]\n";
      }
  
      return $page;
  }

And there you have it -- a perfectly good Mason component brutally butchered and turned into an Embperl page. I hope you're happy with yourself!

Storage: Replacing the Resolver

Occasionally, people on the Mason users list wonder if they can store their component source in an RDBMS. The way to achieve this is to create your own HTML::Mason::Resolver subclass.

The resolver's job is take a component path and figure out where the corresponding component is.

We will show an example that connects to a MySQL server containing the following table:

  MasonComponent
  ----------------------------------------
  path           VARCHAR(255)  PRIMARY KEY
  component      TEXT          NOT NULL
  last_modified  DATETIME      NOT NULL

Our code starts as follows:

  package HTML::Mason::Resolver::MySQL;
  $VERSION = '0.01';
  
  use strict;
  
  use DBI;
  use Params::Validate qw(:all);
  
  use HTML::Mason::ComponentSource;
  use HTML::Mason::Resolver;
  use base qw(HTML::Mason::Resolver);
  
  __PACKAGE__->valid_params
      (
       db_name  => { parse => 'string', type => SCALAR },
       user     => { parse => 'string', type => SCALAR, optional => 1 },
       password => { parse => 'string', type => SCALAR, optional => 1 },
      );

These parameters will be used to connect to the MySQL server containing our components. Readers familiar with the Perl DBI will realize that there are a number of other parameters that we could take.

Our constructor method, new(), needs to do a bit of initialization to set up the database connection, so we override our base class's method:

  sub new {
      my $class = shift;
      my $self = $class->SUPER::new(@_);

We invoke the new() method provided by our superclass, which validates the parameters in @_ and makes sure they get sent to the right contained objects. The latter concern doesn't seem so important in this case since we don't have any contained objects, but the point is that if somebody subclasses our HTML::Mason::Resolver::MySQL class and adds contained objects, our new() method will still do the right thing with its parameters.

Now we connect to the database in preparation for retrieving components later:

      $self->{dbh} =
          DBI->connect
              ( "dbi:mysql:$self->{db_name}",
                $self->{user}, $self->{password}, { RaiseError => 1 } );
      
      return $self;
  }

A resolver needs to implement two methods left unimplemented in the parent HTML::Mason::Resolver class. These are get_info() and glob_path(). The first is used to retrieve information about the component matching a particular component path. The second takes a glob pattern like /path/* or /path/*/foo/* and returns the component paths of all the components that match that wildcard path.

Additionally, if we want this resolver to be usable with the ApacheHandler module, we need to implement a method called apache_request_to_comp_path() , which takes an Apache object and translates it into a component path.

Given a path, we want to get the time when this component was last modified, in the form of a Unix timestamp, which is what Mason expects:

  sub get_info {
      my ($self, $path) = @_;
  
      my ($last_mod) =
          $self->{dbh}->selectrow_array
              ( 'SELECT UNIX_TIMESTAMP(last_modified) 
                 FROM MasonComponent WHERE path = ?',
                {}, $path );

      return unless $last_mod;

If there was no entry in the database for the given path, we simply return, which lets Mason know that no matching component was found:

      return
          HTML::Mason::ComponentSource->new
              ( comp_path => $path,
                friendly_name => $path,
                last_modified => $last_mod,
                comp_id => $path,
                source_callback => sub { $self->_get_source($path) },
              );
  }

The get_info() method returns its information in the form of a HTML::Mason::ComponentSource object. This is a very simple class that holds information about a component.

Its constructor accepts the following parameters:

  • comp_path

    This is the component path as given to the resolver.

  • friendly_name

    The string given for this parameter will be used to identify the component in error messages. For our resolver, the component path works for this parameter as well because it is the primary key for the MasonComponent table in the database, allowing us to uniquely identify a component.

    For other resolvers, this might differ from the component path. For example, the filesystem resolver that comes with Mason uses the component's absolute path on the filesystem.

  • last_modified

    This is the last modification time for the component, as seconds since the epoch.

  • comp_id

    This should be a completely unique identifier for the component. Again, since the component path is our primary key in the database, it works well here.

  • source_callback

    This is a subroutine reference that, when called, returns the source text of the component.

    Mason could have had you simply create an HTML::Mason::ComponentSource subclass that implemented a source() method for your resolver, but we thought that rather than requiring you to write such a do-nothing subclass, it would be easier to simply use a callback instead.

    Our _get_source() method is trivially simple:

      sub _get_source {
          my $self = shift;
          my $path = shift;
      
          return 
              $self->{dbh}->selectrow_array
                  ( 'SELECT component FROM MasonComponent WHERE path = ?', {}, $path );
      }
  • comp_class

    This is the component class into which this particular component should be blessed when it is created. This must be a subclass of HTML::Mason::Component. The default is HTML::Mason::Component.

  • extra

    This optional parameter should be a hash reference. It is used to pass information from the resolver to the component class.

    This is needed since an HTML::Mason::Resolver subclass and an HTML::Mason::Component subclass can be rather tightly coupled, but they must communicate with each other through the interpreter (this may change in the future).

Next is our glob_path() method:

  sub glob_path {
      my $self = shift;
      my $pattern = shift;
  
      $pattern =~~ s/*/%/g;

The pattern given will be something that could be passed to Perl's glob() function. We simply replace this with the SQL equivalent for a LIKE search:

      return
          $self->{dbh}->selectcol_array
              ( 'SELECT path FROM MasonComponent WHERE path LIKE ?', {}, $pattern );
  }

Then we return all the matching paths in the database.

Since we may want to use this resolver with ApacheHandler, we will also implement the apache_request_to_comp_path() method:

  sub apache_request_to_comp_path {
      my $self = shift;
      my $r = shift;
  
      my $path = $r->uri;
  
      return $path
          if $self->{dbh}->selectrow_array
              ( 'SELECT 1 FROM MasonComponent WHERE path = ?', {}, $path );
  
      return undef unless $r->path_info;
  
      $path .= $r->path_info;
  
      return $path
          if $self->{dbh}->selectrow_array
              ( 'SELECT 1 FROM MasonComponent WHERE path = ?', {}, $path );
  
      return undef;
  }

We generate a component path by taking the requested URI and looking for that in the database. If it doesn't exist, we will try appending the path info if possible or just give up. Finally, we try the altered path and, if that doesn't exist either, we just give up and return undef, which will cause the ApacheHandler module to return a NOT FOUND status for this request.

That's it, all done. And nothing left as an exercise for the reader this time.

As with the lexer, this can be used either via a httpd.conf directive:

  PerlSetVar  MasonResolverClass  HTML::Mason::Resolver::MySQL

or by passing the resolver_class parameter to the new() method for HTML::Mason::Interp.

Request: A Request Object with a Built-in Session

Wouldn't it be cool to have a request object with a built-in session? "Yes, it would," you answer. "Child's play," we say.

When a request is made using this object, it should either find an old session or create a new one. Then in our components we will simply call $m->session() to get back a hash reference that will persist between requests.

For simplicity's sake, we won't make this class configurable as to what type of session to use, though it could be done.3

  package HTML::Mason::Request::WithSession;
  $VERSION = '0.01';
  
  use strict;
  
  # Import a subroutine error( ) which throws an HTML::Mason::Exception
  # object
  use HTML::Mason::Exceptions ( abbr => [ 'error' ] );
  
  use HTML::Mason::ApacheHandler;
  use base qw(HTML::Mason::Request);

One problem unique to subclassing to the Request object is that Mason already comes with two of its own Request subclasses. These are HTML::Mason::Request::ApacheHandler and HTML::Mason::Request::CGIHandler, which are used by the ApacheHandler and CGIHandler, respectively.

In order to cooperate with the ApacheHandler and CGIHandler modules, we want to subclass the appropriate class. However, we can't know which one to subclass when we are loaded, because it is possible that we will be loaded before the ApacheHandler or CGIHandler module. We'll take care of this in our new() method, which will be discussed momentarily.

Our session will be implemented using cookies and Cache::FileCache for storage, just as we saw in Chapter 11:

  use Apache::Cookie;
  use Cache::FileCache;
  use Digest::SHA1;

We solve our subclassing problem with the following code. There is nothing wrong with changing a class's inheritance dynamically in Perl, so that's what we do. The alter_superclass() method is provided by the HTML::Mason::Request base class, and does the right thing even given multiple inheritance. It also cooperates with Class:Container to make sure that it sees any changes made to the inheritance hierarchy:

  sub new {
      my $class = shift;
  
          $class->alter_superclass( $HTML::Mason::ApacheHandler::VERSION ?
                                'HTML::Mason::Request::ApacheHandler' :
                                $HTML::Mason::CGIHandler::VERSION ?
                                'HTML::Mason::Request::CGI' :
                                'HTML::Mason::Request' );
  
      return $class->SUPER::new(@_);
  }

We make a session, call exec() in our parent class, taking care to preserve the caller's scalar/list context, and then save the session. If an exception is thrown, we simply rethrow it:

  sub exec {
      my $self = shift;
  
      $self->_make_session;
  
      my @result;
      if (wantarray) {
          @result = eval { $self->SUPER::exec(@_) };
      } elsif (defined wantarray) {
          $result[0] = eval { $self->SUPER::exec(@_) };
      } else {
          eval { $self->SUPER::exec(@_) };
      }
  
      # copy this in case _save_session overwrites $@
      my $e = $@;
  
      $self->_save_session;
  
      die $e if $e;
  
      return wantarray ? @result : defined wantarray ? $result[0] : undef;
  }

Making a new session for subrequests is probably incorrect behavior, so we simply reuse our parent's session object if a subrequest is exec()'d:

  sub _make_session {
      my $self = shift;
  
      if ( $self->is_subrequest ) {
          $self->{session} = $self->parent_request->session;
          return;
      }

This code is pulled almost verbatim from Chapter 11:

      my %c = Apache::Cookie->fetch;
      my $session_id =
          exists $c{masonbook_session} ? $c{masonbook_session}->value : undef;

      $self->{session_cache} =
          Cache::FileCache->new( { cache_root => '/tmp',
                                   namespace  => 'Mason-Book-Session',
                                   default_expires_in  => 60 * 60 * 24, # 1 day
                                   auto_purge_interval => 60 * 60 * 24, # 1 day
                                   auto_purge_on_set => 1 } );

      my $session;
      if ($session_id) {
          $session = $self->{session_cache}->get($session_id);
      }

      unless ($session) {
          $session = { _session_id => Digest::SHA1::sha1_hex( time, rand, $$ ) };
      }

      Apache::Cookie->new( $self->apache_req,
                           name => 'masonbook_session',
                           value => $session->{_session_id},
                           path => '/',
                           expires => '+1d',
                         )->bake;

      $self->{session} = $session;
  }

Also just like Chapter 11:

  sub _save_session {
      my $self = shift;
  
      $self->{session_cache}->set
          ( $self->{session}{_session_id} => $self->{session} );
  }

And to finish it off, a simple accessor method:

  sub session { $_[0]->{session} }

Wow, nice and simple. Of course, this would need to be customized for your environment, or you can use the previously mentioned HTML::Mason::Request::WithApacheSession module available from CPAN.

Once again, you have two options to use this new subclass. If you are configuring Mason via your httpd.conf file, do this:

  PerlSetVar  MasonRequestClass  HTML::Mason::Request::WithSession

or in your handler.pl you can load the module and then pass a request_class parameter to the HTML::Mason::ApacheHandler class's constructor.

Argument Munging: ApacheHandler

One of the main reasons that you might consider creating your own ApacheHandler class is to change the way arguments are processed. For example, we might want to create objects based on certain objects.

Our subclass starts like many others:

  package HTML::Mason::ApacheHandler::AddObjects;
  $VERSION = '0.01';
  
  use strict;
  
  use HTML::Mason::ApacheHandler;
  use base qw(HTML::Mason::ApacheHandler);

This should look pretty familiar. Now we'll load a few more classes, which we'll be using to create objects:

  use Date::ICal;  # date object
  use MyApp::User; # user object

And now we override the argument-processing subroutine, request_args():

  sub request_args {
      my $self = shift;
  
      my ($args, $r, $cgi_object) = $self->SUPER::request_args(@_);

ApacheHandler's request_args() method returns three items. The first is a hash reference containing the arguments that will be passed to the component. The second is the Apache or Apache::Request object for the current request, and the third is a CGI.pm object. The CGI.pm object is created only when the ApacheHandler's args_method attribute is set to CGI .

      if ( exists $args->{epoch} ) {
          $args->{date} = Date::ICal->new( epoch => $args->{epoch} );
      }

      if ( exists $args->{user_id} ) {
          $args->{user} = MyApp::User->new( user_id => $args->{user_id} );
      }

This bit of code simply creates some useful objects if certain incoming arguments exist. Finally we return the munged $args hash reference, along with the other return values from our superclass's request_args() method:

      return ($args, $r, $cgi_object);
  }

Now, whenever the client submits an argument called epoch, there will be an additional argument, date, a Date::ICal object, available in components. Similarly, a request with a user_id parameter means that there will be an argument user containing a MyApp::User object passed to components.

Using this class is a little different from what we've seen previously. In our httpd.conf file, we'd have something like this:

  <Location /mason>
   SetHandler perl-script
   PerlHandler HTML::Mason::ApacheHandler::AddObjects
  </Location>

Similarly, in our handler.pl file, we'd simply create this object instead of an HTML::Mason::ApacheHandler object.

More Reader Exercises

Consider the following possibilities for Mason subclasses:

  • A resolver subclass that allows you to have two files per component. One file could be primarily HTML and the other would be code. The resolver subclass would simply concatenate the two together.
  • A lexer subclass enabling ASP-style syntax.
  • An ApacheHandler subclass that munges incoming strings into Unicode, using the Encode module from CPAN.
  • A CGIHandler subclass that performs the same argument-to-object transformation seen in our example ApacheHandler subclass.

Footnotes

1. The equivalent of <% %> in the sane world where people don't use XML for everything! -- Return.

2. See Section 2.10 of the W3C XML 1.0 Recommendation for the definition of "ignorable whitespace." -- Return.

3. This is left as an exercise... Actually, this was left to the one of the authors. Dave Rolsky recently created MasonX::Request::WithApacheSession, which is a highly configurable module that expands on the example shown in this section. This module is available from a CPAN mirror near you. -- Return.


Table of Contents | Foreword | Preface
Chapters: 1 2 3 4 5 6 7 8 9 10 11 12
Appendices: A B C D
Glossary | Colophon | Copyright

These HTML pages were created by running this script against the pseudo-POD source.