I have used template engines for a long time and finally have some time to find out how a template engine works.
Briefly, a template engine is a tool for programming tasks that involve a lot of textual data. The most common use is HTML generation in web applications. In Python, we have a few options to choose from, such as Jinja2 or Mako. Here we are going to find out how a template engine works by digging into the template module of the tornado web framework. It is a simple system, so we can focus on the basic ideas of the process.
Before we go into the implementation detail, let's look at the simple API usage first:
from tornado import template
PAGE_HTML = """
<html>
Hello, {{ username }}!
<ul>
{% for job in job_list %}
<li>{{ job }}</li>
{% end %}
</ul>
</html>
"""
t = template.Template(PAGE_HTML)
print(t.generate(username='John', job_list=['engineer']))
Here the user's name is dynamic in the page HTML, and so is the list of jobs. You can install tornado and run the code to see the output.
If we look at PAGE_HTML closely, we can see that a template string has two parts: the static literal text and the dynamic part. We use special notation to mark the dynamic part. On the whole, the template engine should take the template string, output the static part as it is, and evaluate the dynamic pieces against the given context to produce the right result string.
So basically a template engine is just one Python function:
def template_engine(template_string, **context):
    # process here
    return result_string
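To make the idea concrete, here is a deliberately naive sketch of such a function. It is not how tornado works: it assumes only {{ ... }} expressions are supported and evaluates each one with eval against the context. The regular expression and the helper name _substitute are illustrative choices for this sketch.

```python
import re

# Matches {{ expression }} non-greedily; a simplification for this sketch.
_EXPRESSION = re.compile(r"\{\{(.*?)\}\}")


def template_engine(template_string, **context):
    def _substitute(match):
        # Dynamic part: evaluate the expression with the context as locals.
        return str(eval(match.group(1).strip(), {}, context))

    # Static text between matches is copied through unchanged by re.sub.
    return _EXPRESSION.sub(_substitute, template_string)


print(template_engine("Hello, {{ username }}!", username="John"))
# prints: Hello, John!
```

This version re-scans the template on every call and cannot handle blocks such as for or if, which is exactly what the two-phase design described next addresses.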
While processing a template, the engine goes through two phases: parsing and rendering.
The parsing stage takes the template string and produces something that can be rendered. If we think of the template string as source code, the parsing tool can be either an interpreter or a compiler for that language. If the tool is an interpreter, parsing produces a data structure, and the rendering tool walks through that structure to produce the result text. The Django template engine works this way. Otherwise, parsing produces executable code, and the rendering tool does nothing but execute that code to produce the result. Jinja2, Mako and the tornado template module all use a compiler as their parsing tool.
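The difference between the two approaches can be sketched in a few lines. The node format below (a list of kind/value pairs) and the function names are made up for this illustration; they are not what Django or tornado use internally.

```python
# A parsed template as a list of (kind, value) nodes; hypothetical format.
nodes = [("text", "Hello, "), ("expr", "username"), ("text", "!")]


def render_interpreted(nodes, context):
    # Interpreter style: walk the data structure on every render.
    parts = []
    for kind, value in nodes:
        if kind == "text":
            parts.append(value)
        else:
            parts.append(str(eval(value, {}, context)))
    return "".join(parts)


def compile_to_source(nodes):
    # Compiler style: turn the nodes into Python source code once.
    lines = ["def _execute():", "    _buffer = []"]
    for kind, value in nodes:
        if kind == "text":
            lines.append("    _buffer.append(%r)" % value)
        else:
            lines.append("    _buffer.append(str(%s))" % value)
    lines.append("    return ''.join(_buffer)")
    return "\n".join(lines)


def render_compiled(source, context):
    # Rendering only executes the generated code.
    namespace = dict(context)
    exec(source, namespace)
    return namespace["_execute"]()


context = {"username": "John"}
print(render_interpreted(nodes, context))                   # prints: Hello, John!
print(render_compiled(compile_to_source(nodes), context))   # prints: Hello, John!
```

Both paths give the same text; the compiled path simply moves the per-node decisions out of the render step and into the generated code.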
As said above, we first need to parse the template string, and the parsing tool in the tornado template module compiles templates to Python code. Our parsing tool is simply one Python function that generates Python code:
def parse_template(template_string):
    # compilation
    return python_source_code
Before we get to the implementation of parse_template, let's look at the code it produces. Here is an example template source string:
<html>
Hello, {{ username }}!
<ul>
{% for job in jobs %}
<li>{{ job.name }}</li>
{% end %}
</ul>
</html>
Our parse_template function will compile this template to Python code, which is just one function. A simplified version is:
def _execute():
    _buffer = []
    _buffer.append('\n<html>\n Hello, ')
    _tmp = username
    _buffer.append(str(_tmp))
    _buffer.append('!\n <ul>\n ')
    for job in jobs:
        _buffer.append('\n <li>')
        _tmp = job.name
        # simplified, several checks should be done on _tmp here
        _buffer.append(str(_tmp))
        _buffer.append('</li>\n ')
    _buffer.append('\n </ul>\n</html>\n')
    return ''.join(_buffer)
Now our template is parsed into a function called _execute. This function accesses all context variables from the global namespace. It creates a list of strings and joins them together as the result string. The value of username is stored in the local name _tmp, because looking up a local name is much faster than looking up a global. There are other optimizations that can be done here, like:
_buffer.append('hello')
_append_buffer = _buffer.append
# faster for repeated use
_append_buffer('hello')
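If you want to check the effect of this alias yourself, a small timing sketch like the following can help. The function names and loop sizes are arbitrary choices for the illustration, and the measured gap depends on the interpreter; on recent CPython versions it may be small.

```python
import timeit


def with_attribute_lookup(n=1000):
    _buffer = []
    for _ in range(n):
        _buffer.append('hello')  # looks up .append on every iteration
    return ''.join(_buffer)


def with_bound_method(n=1000):
    _buffer = []
    _append_buffer = _buffer.append  # look the method up once
    for _ in range(n):
        _append_buffer('hello')
    return ''.join(_buffer)


# Both variants build the same string; only the lookup cost differs.
assert with_attribute_lookup() == with_bound_method()

print(timeit.timeit(with_attribute_lookup, number=200))
print(timeit.timeit(with_bound_method, number=200))
```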
Expressions in {{ ... }} are evaluated and appended to the string buffer list. In the tornado template module, there are no restrictions on the expressions you can include in your statements; if and for blocks get translated exactly into Python.
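As an illustration of that one-to-one translation, here is roughly what a template with an if block could compile to. The template text, the generated source and the render helper are hand-written for this sketch; they are not actual tornado output.

```python
# Hypothetical template:
#   {% if jobs %}{{ len(jobs) }} jobs{% end %}
# The if statement is copied verbatim and the body is indented under it.
generated_source = """
def _execute():
    _buffer = []
    if jobs:
        _tmp = len(jobs)
        _buffer.append(str(_tmp))
        _buffer.append(' jobs')
    return ''.join(_buffer)
"""


def render(source, **context):
    # Context variables become globals of the generated function.
    namespace = dict(context)
    exec(source, namespace)
    return namespace["_execute"]()


print(render(generated_source, jobs=["engineer", "writer"]))  # prints: 2 jobs
print(repr(render(generated_source, jobs=[])))                # prints: ''
```

Because the expression text is pasted into the generated code as is, any valid Python expression works, including calls to builtins such as len.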
Let's see the real implementation now. The core interface that we are using is the Template class. When we create a Template object, we compile the template string, and later we can use it to render a given context. We only need to compile once, and the template object can be cached anywhere. A simplified version of the constructor:
class Template(object):
    def __init__(self, template_string):
        self.code = parse_template(template_string)
        self.compiled = compile(self.code, '<string>', 'exec')
The compile function compiles the source into a code object, which we can execute later with exec. Now let's build the parse_template function. First we need to parse our template string into a list of nodes that know how to generate Python code. For that we need a function called _parse, which we will see later. We need some helpers first. To help with reading through the template file, we have the _TemplateReader class, which handles the reading for us as we consume the template file. We need to start from the beginning and keep moving forward to find the special notations, so the _TemplateReader keeps the current position and gives us ways to do that:
class _TemplateReader(object):
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def find(self, needle, start=0, end=None):
        pos = self.pos
        start += pos
        if end is None:
            index = self.text.find(needle, start)
        else:
            end += pos
            index = self.text.find(needle, start, end)
        if index != -1:
            index -= pos
        return index

    def consume(self, count=None):
        if count is None:
            count = len(self.text) - self.pos
        newpos = self.pos + count
        s = self.text[self.pos:newpos]
        self.pos = newpos
        return s

    def remaining(self):
        return len(self.text) - self.pos

    def __len__(self):
        return self.remaining()

    def __getitem__(self, key):
        if key < 0:
            return self.text[key]
        else:
            return self.text[self.pos + key]

    def __str__(self):
        return self.text[self.pos:]
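A short usage sketch may make the relative offsets easier to follow. The class below is a condensed copy of the reader (only find, consume and remaining) so that the snippet runs on its own; the sample template string is made up.

```python
class _TemplateReader(object):
    # Condensed copy of the reader above so that this snippet is standalone.
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def find(self, needle, start=0, end=None):
        # Offsets going in and coming out are relative to self.pos.
        start += self.pos
        if end is None:
            index = self.text.find(needle, start)
        else:
            index = self.text.find(needle, start, end + self.pos)
        return index if index == -1 else index - self.pos

    def consume(self, count=None):
        if count is None:
            count = len(self.text) - self.pos
        s = self.text[self.pos:self.pos + count]
        self.pos += count
        return s

    def remaining(self):
        return len(self.text) - self.pos


reader = _TemplateReader("Hello, {{ username }}!")
offset = reader.find("{{")                 # 7, relative to the current position
static = reader.consume(offset)            # 'Hello, '
brace = reader.consume(2)                  # '{{'
end = reader.find("}}")                    # 10, relative to the new position
expression = reader.consume(end).strip()   # 'username'
reader.consume(2)                          # skip the closing '}}'
rest = reader.consume()                    # '!'
print([static, brace, expression, rest])
```

Note that the second find returns 10 rather than 19: every offset is measured from the current position, which is what lets the parser pass the result straight to consume.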
To help with generating the Python code, we need the _CodeWriter class. This class writes lines of code and manages indentation, and it is also a Python context manager:
import io


class _CodeWriter(object):
    def __init__(self):
        # io.StringIO is the Python 3 counterpart of cStringIO.StringIO
        self.buffer = io.StringIO()
        self._indent = 0

    def indent(self):
        return self

    def indent_size(self):
        return self._indent

    def __enter__(self):
        self._indent += 1
        return self

    def __exit__(self, *args):
        self._indent -= 1

    def write_line(self, line, indent=None):
        if indent is None:
            indent = self._indent
        for i in range(indent):
            self.buffer.write("    ")
        self.buffer.write(line + "\n")

    def __str__(self):
        return self.buffer.getvalue()
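Here is a small sketch of how the context manager drives indentation. The class is a compact Python 3 rendition of the writer (four spaces per level is an assumption of this sketch), and the lines being written are just sample output.

```python
import io


class _CodeWriter(object):
    # Compact Python 3 rendition of the writer, for this sketch only.
    def __init__(self):
        self.buffer = io.StringIO()
        self._indent = 0

    def indent(self):
        return self

    def __enter__(self):
        self._indent += 1
        return self

    def __exit__(self, *args):
        self._indent -= 1

    def write_line(self, line):
        self.buffer.write("    " * self._indent + line + "\n")

    def __str__(self):
        return self.buffer.getvalue()


writer = _CodeWriter()
writer.write_line("def _execute():")
with writer.indent():              # everything inside is one level deeper
    writer.write_line("_buffer = []")
    writer.write_line("for job in jobs:")
    with writer.indent():          # nested blocks simply nest the with
        writer.write_line("_buffer.append(str(job))")
    writer.write_line("return ''.join(_buffer)")
print(writer)
```

Each with block raises the indentation level on entry and lowers it on exit, so the nesting of the generating code mirrors the nesting of the generated code.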
At the beginning of parse_template, we create a template reader:
def parse_template(template_string):
    reader = _TemplateReader(template_string)
    file_node = _File(_parse(reader))
    writer = _CodeWriter()
    file_node.generate(writer)
    return str(writer)
Then we pass the reader to the _parse function, which produces a list of nodes. All of these nodes are the child nodes of the template file node. We create a _CodeWriter object, the file node writes Python code into it, and we return the generated Python code. The _Node class handles the Python code generation for each specific case; we will see it later. Now let's go back to our _parse function:
def _parse(reader, in_block=None):
    body = _ChunkList([])
    while True:
        # Find next template directive
        curly = 0
        while True:
            curly = reader.find("{", curly)
            if curly == -1 or curly + 1 == reader.remaining():
                # EOF
                if in_block:
                    raise ParseError("Missing {%% end %%} block for %s" %
                                     in_block)
                body.chunks.append(_Text(reader.consume()))
                return body
            # If the first curly brace is not the start of a special token,
            # start searching from the character after it
            if reader[curly + 1] not in ("{", "%"):
                curly += 1
                continue
            # When there are more than 2 curlies in a row, use the
            # innermost ones. This is useful when generating languages
            # like latex where curlies are also meaningful
            if (curly + 2 < reader.remaining() and
                    reader[curly + 1] == '{' and reader[curly + 2] == '{'):
                curly += 1
                continue
            break
We loop until we find a template directive in the remaining text. If we reach the end of the file, we append the remaining text as a text node and return; otherwise, we have found a template directive.
        # Append any text before the special token
        if curly > 0:
            body.chunks.append(_Text(reader.consume(curly)))
Before we handle the special token, we append a text node for any static text in front of it.
        start_brace = reader.consume(2)
Next we consume the start brace, which should be '{{' or '{%'.
        # Expression
        if start_brace == "{{":
            end = reader.find("}}")
            if end == -1 or reader.find("\n", 0, end) != -1:
                raise ParseError("Missing end expression }}")
            contents = reader.consume(end).strip()
            reader.consume(2)
            if not contents:
                raise ParseError("Empty expression")
            body.chunks.append(_Expression(contents))
            continue
If the start brace is '{{', we have an expression, so we just take the contents of the expression and append an _Expression node.
        # Block
        assert start_brace == "{%", start_brace
        end = reader.find("%}")
        if end == -1 or reader.find("\n", 0, end) != -1:
            raise ParseError("Missing end block %}")
        contents = reader.consume(end).strip()
        reader.consume(2)
        if not contents:
            raise ParseError("Empty block tag ({% %})")
        operator, space, suffix = contents.partition(" ")
        # End tag
        if operator == "end":
            if not in_block:
                raise ParseError("Extra {% end %} block")
            return body
        elif operator in ("try", "if", "for", "while"):
            # parse inner body recursively
            block_body = _parse(reader, operator)
            block = _ControlBlock(contents, block_body)
            body.chunks.append(block)
            continue
        else:
            raise ParseError("unknown operator: %r" % operator)
Otherwise we have a block. Normally we parse the block body recursively and append a _ControlBlock node; the block body is itself a list of nodes. If we encounter an {% end %}, the block ends and we return from the function.
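The directive search at the top of _parse is the subtlest part, so here it is isolated as a standalone sketch. The function name find_directive is hypothetical, and it works on a plain string instead of a reader, but it follows the same three rules: stop at the end, skip ordinary braces, and prefer the innermost pair when three braces appear in a row.

```python
def find_directive(text):
    """Return the index of the next '{{' or '{%' in text, or -1 if none."""
    curly = 0
    while True:
        curly = text.find("{", curly)
        if curly == -1 or curly + 1 == len(text):
            return -1  # no brace left, or a lone brace at the very end
        if text[curly + 1] not in ("{", "%"):
            curly += 1  # an ordinary brace, keep searching after it
            continue
        if text[curly + 1:curly + 3] == "{{":
            curly += 1  # three braces in a row: prefer the innermost pair
            continue
        return curly


print(find_directive("Hello, {{ name }}"))     # prints: 7
print(find_directive(r"\textbf{{{ name }}}"))  # prints: 8
print(find_directive("{% if x %}"))            # prints: 0
print(find_directive("a { b } c"))             # prints: -1
```

In the LaTeX-style example the first brace at index 7 stays literal text, and the expression starts at index 8.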
It is time to find out the secrets of the _Node class. It is quite simple:
class _Node(object):
    def generate(self, writer):
        raise NotImplementedError()


class _ChunkList(_Node):
    def __init__(self, chunks):
        self.chunks = chunks

    def generate(self, writer):
        for chunk in self.chunks:
            chunk.generate(writer)
A _ChunkList is just a list of nodes.
class _File(_Node):
    def __init__(self, body):
        self.body = body

    def generate(self, writer):
        writer.write_line("def _execute():")
        with writer.indent():
            writer.write_line("_buffer = []")
            self.body.generate(writer)
            writer.write_line("return ''.join(_buffer)")
A _File node writes the _execute function to the _CodeWriter.
class _Expression(_Node):
    def __init__(self, expression):
        self.expression = expression

    def generate(self, writer):
        writer.write_line("_tmp = %s" % self.expression)
        writer.write_line("_buffer.append(str(_tmp))")


class _Text(_Node):
    def __init__(self, value):
        self.value = value

    def generate(self, writer):
        value = self.value
        if value:
            writer.write_line('_buffer.append(%r)' % value)
The _Text and _Expression nodes are also really simple: they just append what they get from the template source.
class _ControlBlock(_Node):
    def __init__(self, statement, body=None):
        self.statement = statement
        self.body = body

    def generate(self, writer):
        writer.write_line("%s:" % self.statement)
        with writer.indent():
            self.body.generate(writer)
For a _ControlBlock node, we write the statement, then indent and write our child node list with that indentation.
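To see the node classes cooperate, the following sketch builds a small tree by hand (skipping the parser) and prints the code it generates. The classes are trimmed copies of the ones above without the _Node base class, the writer is a minimal Python 3 stand-in, and the example template in the comment is made up.

```python
import io


class _CodeWriter(object):
    # Minimal Python 3 stand-in for the writer shown earlier.
    def __init__(self):
        self.buffer = io.StringIO()
        self._indent = 0

    def indent(self):
        return self

    def __enter__(self):
        self._indent += 1
        return self

    def __exit__(self, *args):
        self._indent -= 1

    def write_line(self, line):
        self.buffer.write("    " * self._indent + line + "\n")


class _ChunkList(object):
    def __init__(self, chunks):
        self.chunks = chunks

    def generate(self, writer):
        for chunk in self.chunks:
            chunk.generate(writer)


class _Text(object):
    def __init__(self, value):
        self.value = value

    def generate(self, writer):
        writer.write_line("_buffer.append(%r)" % self.value)


class _Expression(object):
    def __init__(self, expression):
        self.expression = expression

    def generate(self, writer):
        writer.write_line("_tmp = %s" % self.expression)
        writer.write_line("_buffer.append(str(_tmp))")


class _ControlBlock(object):
    def __init__(self, statement, body):
        self.statement = statement
        self.body = body

    def generate(self, writer):
        writer.write_line("%s:" % self.statement)
        with writer.indent():
            self.body.generate(writer)


# Hand-built tree for: <ul>{% for job in jobs %}<li>{{ job }}</li>{% end %}</ul>
tree = _ChunkList([
    _Text("<ul>"),
    _ControlBlock("for job in jobs", _ChunkList([
        _Text("<li>"),
        _Expression("job"),
        _Text("</li>"),
    ])),
    _Text("</ul>"),
])

writer = _CodeWriter()
tree.generate(writer)
generated = writer.buffer.getvalue()
print(generated)
```

The printed code has the loop body indented one level under the for statement, and the final append back at the outer level.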
Now let's get back to the rendering part. We render a context by using the generate method of the Template object; generate just calls the compiled Python code:
def generate(self, **kwargs):
    namespace = {}
    namespace.update(kwargs)
    # exec(code, globals) is valid in both Python 2 and Python 3
    exec(self.compiled, namespace)
    execute = namespace["_execute"]
    return execute()
The exec function executes the compiled code object in the given global namespace, then we grab our _execute function from that namespace and call it.
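The compile-once, execute-many pattern can be tried in isolation. In this sketch the source string is written by hand to stand in for the parser's output, and generate is a free function rather than a method.

```python
source = """
def _execute():
    _buffer = []
    _buffer.append('Hello, ')
    _buffer.append(str(username))
    _buffer.append('!')
    return ''.join(_buffer)
"""

# Compile once; the code object can be reused for every render.
compiled = compile(source, "<string>", "exec")


def generate(**kwargs):
    namespace = dict(kwargs)
    # Running the code object defines _execute with namespace as its globals.
    exec(compiled, namespace)
    return namespace["_execute"]()


print(generate(username="John"))   # prints: Hello, John!
print(generate(username="Alice"))  # prints: Hello, Alice!
```

Each call gets a fresh namespace, so context variables from one render never leak into the next.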
So that's all: compile the template to a Python function and execute it to get the result. The tornado template module has more features than we've discussed here, but we now understand the basic idea. You can find out more if you are interested.