Make is great for simple applications, but sometimes I want more:
What I think I need is a memoizing function wrapper that not only takes into account the objects passed in to the functions, but also acknowledges that a function might read from its environment. The automatic and implicit rules of makefiles are not covered here -- that's an orthogonal task.
Assume we want to implement recursive building of static web sites from source pages:
@make.makable
def render_all_pages(maker, directory):
    for page in maker.list_files(directory):
        maker.make(render_page(page, 'dest/' + page))

@make.makable
def render_page(maker, input_filename, output_filename):
    in_data = maker.read_file(input_filename)
    result = convert_formatting(in_data)
    print("Causing an undeclared side effect")
    with open(output_filename, "w") as out:
        out.write(result)
>>> maker = make.Maker()
>>> maker.debug(read_file('some_file')) # note that maker.read_file(...) is a shortcut for maker.make(make_module.read_file(...))
<NativeMakable read_file('some_file')>: current hash 'nonexisting', can be waited on.
>>> maker.debug(render_all_pages('.'))
<PythonMakable render_all_pages('.')>: not yet evaluated, might be waitable on.
>>> maker.make(render_all_pages('.'))
Causing an undeclared side effect
Causing an undeclared side effect
>>> maker.make(render_all_pages('.'))
>>> maker.debug(render_all_pages('.'))
<PythonMakable render_all_pages('.')>: current hash 0x5d84, is waitable on
* <NativeMakable list_files('.')>: current hash 0x9375, can be waited on.
* <PythonMakable render_page('file_a', 'dest/file_a')>: current hash 0xfd87, is waitable on
* <NativeMakable read_file('file_a')>: current hash 0xa398, can be waited on.
* <PythonMakable render_page('file_b', 'dest/file_b')>: current hash , is waitable on
* <NativeMakable read_file('file_b')>: current hash 0x9f32a, can be waited on.
This is not backed by real code, but only a mockup. Both names and mechanisms are subject to change; especially, having an implicit (thread-local / task-local) maker object might be worth considering -- but the explicit maker at least makes the intention clearer here.
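To illustrate the implicit variant: a task-local maker could be carried in a contextvars.ContextVar. This is just a sketch under the assumption of the mockup API above; current_maker and active_maker are made-up names, and in older Pythons a threading.local would serve for the thread-local case.

import contextvars

# Hypothetical: a task-local slot holding the implicitly active Maker.
_current_maker = contextvars.ContextVar("current_maker")

def current_maker():
    """Return the Maker implicitly active in this thread/task."""
    return _current_maker.get()

class active_maker:
    """Install a Maker as the implicit one for the duration of a with-block."""
    def __init__(self, maker):
        self.maker = maker
    def __enter__(self):
        self._token = _current_maker.set(self.maker)
        return self.maker
    def __exit__(self, *exc):
        _current_maker.reset(self._token)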
So far, we don't declare side effects. This is largely OK, as the actual effect can be detected again (for example, if we overwrote the input file, read_file's hash would change); but if the output file is removed (eg. by clearing caches), no input will have changed, so no output is regenerated.
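One conceivable remedy -- purely a sketch, declare_output is a made-up name -- would be to let a makable declare its side effects, so the maker can notice when a declared output has gone missing:

@make.makable
def render_page(maker, input_filename, output_filename):
    in_data = maker.read_file(input_filename)
    result = convert_formatting(in_data)
    # Hypothetical: register the side effect so the maker can re-run
    # this makable if the output vanishes or is tampered with.
    maker.declare_output(output_filename)
    with open(output_filename, "w") as out:
        out.write(result)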
Is it a good idea for the makable to be callable? I've seen too many forgotten yield from in asyncio, and the issues would be similar here. An alternative would be maker.make(render_all_pages, '.'), but that means that all functions dealing with makables would need to use all their *args for the makable, makables could only be passed around as tuples, and they would not have a repr of their own, as a makable would be just a name for a tuple passed in to *args.
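For illustration, a minimal version of the callable-makable decorator could look like this -- calling the decorated function merely builds a description of the work, which is what gives makables a repr of their own (all names are sketches):

class PythonMakable:
    """Result of calling a decorated function: a description of
    work to do, not the work itself."""
    def __init__(self, func, args):
        self.func = func
        self.args = args
    def __repr__(self):
        arglist = ", ".join(repr(a) for a in self.args)
        return "<PythonMakable %s(%s)>" % (self.func.__name__, arglist)

def makable(func):
    """Calling the decorated function builds a PythonMakable
    instead of doing the work right away."""
    def wrapper(*args):
        return PythonMakable(func, args)
    return wrapper

Passing (render_all_pages, '.') around as a bare tuple would lose exactly that repr.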
Should .make() run a single iteration or run until things converge? (Note that in all scenarios that can be handled by makefiles that don't break the loop, things have converged after a single run, as no overwriting of inputs is allowed.) For now, we'll assume it runs a single iteration.
When the above example is executed with a new Maker instance (eg. when running the program again in a new Python interpreter), the information learned from making render_all_pages is lost. It would need to be persisted, eg. by the Maker pickling the hashes to an appropriate place. Note that this storage mechanism should also include the source code files' hashes. They are not part of the makable tree at runtime, because we don't reload the Python modules when the source changes before we run the function again either. (Although we might be able to do that here ;-) .)
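A sketch of what such persistence could look like (pickle-based, with the module source hashed so code changes invalidate the cache; PersistentMaker and its layout are invented names):

import hashlib
import inspect
import pathlib
import pickle

def source_hash(func):
    """Hash the defining module's source, so cached hashes are
    invalidated when the code that produced them changes."""
    src = pathlib.Path(inspect.getsourcefile(func)).read_bytes()
    return hashlib.sha256(src).hexdigest()

class PersistentMaker:
    """Hypothetical Maker variant that pickles its learned hashes."""
    def __init__(self, cache_file):
        self.cache_file = cache_file
        try:
            with open(cache_file, "rb") as f:
                self.hashes = pickle.load(f)
        except (FileNotFoundError, pickle.UnpicklingError):
            self.hashes = {}

    def save(self):
        with open(self.cache_file, "wb") as f:
            pickle.dump(self.hashes, f)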
It is expected that a practical implementation of all this would provide tools to make this relatively transparent.
In most applications, running a single iteration is idempotent: No input data has changed, so the next run (even without the whole make framework) will have exactly the same result as the first run. In some applications, this is not the case: Droste images are the obvious example, LaTeX files (which store page numbers of references in a dedicated file which they use in a second run) are the more practical one.
Well-conditioned tasks converge after some number of iterations, meaning that they reach a state where the make run is idempotent again. With Droste images, this happens when the smallest (eg. 1x1px) image reaches the image's average color (typically after up to ten iterations); with LaTeX files, this is often the case after the second iteration (how many texts grow an extra page just because they have actual page numbers instead of '?' characters?).
Of course, unstable inputs can be constructed; practical runners will want to have an iteration limit.
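A convergence-oriented runner with such a limit could look roughly like this (a sketch; input_hashes is a hypothetical accessor for whatever hash state the maker tracks):

def make_until_converged(maker, makable, limit=10):
    """Re-run make until a pass changes no input hashes; give up
    after `limit` iterations, since unstable inputs can exist."""
    for _ in range(limit):
        before = maker.input_hashes(makable)  # hypothetical accessor
        maker.make(makable)
        if maker.input_hashes(makable) == before:
            return  # idempotent: the run changed nothing it reads
    raise RuntimeError("no convergence within %d iterations" % limit)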
Some native makables could declare that they can be used in an event driven operation, eg. in a daemon that starts rebuilding things whenever the source files change. The NativeMakable examples above (list_files, read_file) could be implemented using inotify. Others (eg. database queries) might not have that option.
def make_continuously(makable):
    m = make.Maker()
    while True:
        m.make(makable)
        # this will raise Unwaitable if an input is used that is not waitable
        yield from make.wait_until_parameter_changes(makable)
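Where inotify is not wired up, the waiting part could even be stubbed with polling -- a rough stand-in only, not how a real NativeMakable should do it:

import asyncio
import os

@asyncio.coroutine  # 2015-era coroutine syntax, matching the yield from above
def wait_until_file_changes(path, interval=1.0):
    """Polling stand-in for an inotify-backed waiter; a real
    NativeMakable would subscribe to inotify events instead."""
    initial = os.stat(path).st_mtime_ns
    while os.stat(path).st_mtime_ns == initial:
        yield from asyncio.sleep(interval)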
(FIXME: this paragraph is not completely to-the-point, but left in here as I've frequently come back to the issue before.) Note that even though this is event driven, the sequence of events in a single run is still obtained from an imperatively programmed makable, which needs to take the right sequencing into account. Take a situation where file A is built into A.out, and B and A.out are built into AB.out; further, assume that the makables don't really know about their output data (see the question on declaring side effects above). If A and B are changed simultaneously, a good makable would first build A.out and then AB.out. Even if we had a bad makable, though, which first tried to update AB.out from the outdated A.out and B, and then updated A.out from A, the hashes of A.out would not match the hashed expectations of the makable any more by the time it completes, so the make process would be run again. It would be slower and wasteful, but still arrive at a correct result. A maker based on more declarative rules might of course avoid such a situation, but that can run atop of all this (though it might then need the information on side effects).
In some cases, it might be costly to determine whether an input actually changed. For example, the content of a file could have remained the same even though its stats have changed; still, we don't want to read the whole file when having an unmodified stat is a sufficient condition for knowing that the file did not change. (We'll work on this assumption here; whether it is actually applicable depends on the situation and thus on configuration.)
A native makable might therefore implement a series of hashing functions, which are only evaluated on demand (ie. if previous hashes don't match). It is possible that such a mechanism is not needed at all, though, if the native makables are sufficiently low level and higher level functions (like read_file) are actually implemented as PythonMakable, which would then provide just the desired functionality anyway.
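A sketch of such hash tiers for a file input (names invented; the cheap stat tier is consulted first, the content tier only when the stat tier differs):

import hashlib
import os

class FileInput:
    """A file-backed input with two hash tiers: a cheap stat-based
    one and an expensive content-based one."""
    def __init__(self, path):
        self.path = path

    def stat_hash(self):
        st = os.stat(self.path)
        return (st.st_mtime_ns, st.st_size, st.st_ino)

    def content_hash(self):
        with open(self.path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

def file_changed(inp, known_stat, known_content):
    """Only fall back to reading the file if the stat tier differs."""
    if inp.stat_hash() == known_stat:
        return False  # unmodified stat suffices, per the assumption above
    return inp.content_hash() != known_content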
Angular's $watch mechanism has similarities to what is described as maker.make here. As opposed to Angular, this approach allows nested watches in regular programming structures, and explicit notification about individual changed data sources and subsequent evaluation of only the affected routines.

This is pretty rough, and chances are that a framework that can do this already exists (and I just haven't found it yet).
--chrysn 2015-04-29