Motivation

Make is great for simple applications, but sometimes I want more:

What I think I need is a memoizing function wrapper that can take into account not only the objects passed in to the functions, but also the fact that a function might read from its environment. The automatic and implicit rules of makefiles are not covered here -- that's an orthogonal task.

How it could work

An example of how that could look in practice

Assume we want to implement recursive building of static web sites from source pages:

@make.makable
def render_all_pages(maker, directory):
    for page in maker.list_files(directory):
        maker.make(render_page(page, 'dest/' + page))

@make.makable
def render_page(maker, input_filename, output_filename):
    in_data = maker.read_file(input_filename)

    result = convert_formatting(in_data)

    print("Causing an undeclared side effect")

    with open(output_filename, 'w') as out:
        out.write(result)

>>> maker = make.Maker()
>>> maker.debug(read_file('some_file')) # note that maker.read_file(...) is a shortcut for maker.make(make_module.read_file(...))
<NativeMakable read_file('some_file')>: current hash 'nonexisting', can be waited on.
>>> maker.debug(render_all_pages('.'))
<PythonMakable render_all_pages('.')>: not yet evaluated, might be waitable on.
>>> maker.make(render_all_pages('.'))
Causing an undeclared side effect
Causing an undeclared side effect
>>> maker.make(render_all_pages('.'))
>>> maker.debug(render_all_pages('.'))
<PythonMakable render_all_pages('.')>: current hash 0x5d84, is waitable on
* <NativeMakable list_files('.')>: current hash 0x9375, can be waited on.
* <PythonMakable render_page('file_a', 'dest/file_a')>: current hash 0xfd87, is waitable on
  * <NativeMakable read_file('file_a')>: current hash 0xa398, can be waited on.
* <PythonMakable render_page('file_b', 'dest/file_b')>: current hash , is waitable on
  * <NativeMakable read_file('file_b')>: current hash 0x9f32a, can be waited on.

This is not backed by real code, but only a mockup. Both names and mechanisms are subject to change; especially, having an implicit (thread-local / task-local) maker object might be worth considering -- but the explicit maker at least makes the intention clearer here.
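To make the intended semantics a bit more concrete, here is a minimal toy sketch of such a memoizing wrapper. The names (Makable, makable, Maker, read_file) are taken from the mockup above; the hashing scheme, the key construction and the lack of nested-make handling are simplifications for illustration, not a real design:

```python
import hashlib

class Makable:
    """A deferred call: a function plus its arguments."""
    def __init__(self, func, args):
        self.func = func
        self.args = args
        self.key = (func.__name__,) + args

def makable(func):
    """Decorator turning a function into a factory of Makable objects
    (standing in for the @make.makable of the mockup)."""
    def factory(*args):
        return Makable(func, args)
    return factory

class Maker:
    """Toy memoizer: a makable is re-run only if the hashes of the
    files it read during its last run have changed.  Nested make()
    calls and non-file environment reads are not handled here."""
    def __init__(self):
        self._input_hashes = {}  # makable key -> {filename: content hash}
        self._reads = None

    def read_file(self, filename):
        """Environment read: return file contents, recording their hash."""
        with open(filename, 'rb') as f:
            data = f.read()
        if self._reads is not None:
            self._reads[filename] = hashlib.sha256(data).hexdigest()
        return data

    def make(self, mkbl):
        recorded = self._input_hashes.get(mkbl.key)
        if recorded is not None and all(
                self._hash(fn) == h for fn, h in recorded.items()):
            return                   # all recorded inputs unchanged: skip
        self._reads = {}
        mkbl.func(self, *mkbl.args)  # run, tracking read_file() calls
        self._input_hashes[mkbl.key] = self._reads
        self._reads = None

    def _hash(self, filename):
        try:
            with open(filename, 'rb') as f:
                return hashlib.sha256(f.read()).hexdigest()
        except FileNotFoundError:
            return 'nonexisting'
```

With this sketch, calling make() twice on the same makable runs the function once; touching one of the files it read makes the next make() run it again.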

Open questions so far

Persistence

The above example, when executed with a new Maker instance (eg. when running the program again in a new Python process), would lose the information learned from making render_all_pages. It would need to be persisted, eg. by the Maker pickling the hashes to an appropriate place. Note that this storage mechanism should also include the hashes of the source code files themselves: they are not part of the makable tree at runtime, because we don't reload the Python modules when the source changes and the function is run again either. (Although we might be able to do that here ;-) .)

It is expected that a practical implementation of all this would provide tools to make this relatively transparent.
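A sketch of how such persistence could look, assuming the Maker keeps its learned input hashes in a plain dict (the function names and the pickle-based format are made up for illustration; a stale cache is invalidated wholesale when any source file changed):

```python
import hashlib
import pickle

def save_state(maker_hashes, source_files, path):
    """Persist the learned input hashes together with hashes of the
    source code files, so that a stale cache can be detected when
    the code itself changes."""
    source_hashes = {}
    for fn in source_files:
        with open(fn, 'rb') as f:
            source_hashes[fn] = hashlib.sha256(f.read()).hexdigest()
    with open(path, 'wb') as f:
        pickle.dump({'inputs': maker_hashes, 'sources': source_hashes}, f)

def load_state(source_files, path):
    """Return the stored input hashes, or an empty dict (forcing a
    full rebuild) if the state is missing or any source file changed
    since the state was written."""
    try:
        with open(path, 'rb') as f:
            state = pickle.load(f)
    except FileNotFoundError:
        return {}
    for fn in source_files:
        with open(fn, 'rb') as f:
            current = hashlib.sha256(f.read()).hexdigest()
        if state['sources'].get(fn) != current:
            return {}
    return state['inputs']
```

Discarding everything on a source change is crude but safe; a finer-grained scheme could hash individual functions instead of whole files.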

Convergence

In most applications, running a single iteration is idempotent: No input data has changed, so the next run (even without the whole make framework) will have exactly the same result as the first run. In some applications, this is not the case: Droste images are the obvious example, LaTeX files (which store page numbers of references in a dedicated file which they use in a second run) are the more practical one.

Well-conditioned tasks converge after some number of iterations, meaning that they reach a state where the make run is idempotent again. With Droste images, this happens when the smallest (eg. 1x1px) image reaches the image's average color (typically after up to ten iterations); with LaTeX files, this is often the case after the second iteration (how many texts grow an extra page just because they have actual page numbers instead of '?' characters?).

Of course, unstable inputs can be constructed; practical runners will want to have an iteration limit.
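Such a runner could be as simple as the following sketch (the function and its parameters are hypothetical; it only assumes some way of hashing the observable state of the build, eg. the output files):

```python
def make_until_converged(build_once, state_hash, limit=10):
    """Re-run a build until a hash of its observable state stops
    changing, ie. until the run has become idempotent; give up after
    `limit` iterations to guard against unstable inputs."""
    previous = state_hash()
    for iteration in range(1, limit + 1):
        build_once()
        current = state_hash()
        if current == previous:
            return iteration    # this run changed nothing: converged
        previous = current
    raise RuntimeError("no convergence within %d iterations" % limit)
```

In the LaTeX case, the first run writes the reference file, the second produces the final output, and a third run merely confirms that nothing changes any more.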

Event driven operation

Some native makables could declare that they can be used in an event driven operation, eg. in a daemon that starts rebuilding things whenever the source files change. The NativeMakable examples above (list_files, read_file) could be implemented using inotify. Others (eg. database queries) might not have that option.

def make_continuously(makable):
    m = make.Maker()
    while True:
        m.make(makable)
        # this will raise Unwaitable if an input is used that is not waitable
        yield from make.wait_until_parameter_changes(makable)
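What waiting on a file-backed makable could look like underneath: the following sketch polls os.stat as a portable stand-in (the function name and parameters are made up; a real NativeMakable on Linux would register an inotify watch instead of polling and thus be genuinely event driven):

```python
import os
import time

def wait_until_file_changes(filename, poll_interval=0.05, timeout=None):
    """Block until the file's stat data changes (or the file appears
    or disappears).  Polling stand-in for an inotify-based waitable."""
    def snapshot():
        try:
            st = os.stat(filename)
            return (st.st_mtime_ns, st.st_size, st.st_ino)
        except FileNotFoundError:
            return None
    initial = snapshot()
    deadline = None if timeout is None else time.monotonic() + timeout
    while snapshot() == initial:
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(filename)
        time.sleep(poll_interval)
```

A makable that has no such mechanism for any of its inputs (eg. a database query) would be the one raising Unwaitable in the loop above.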

(FIXME: this paragraph is not completely to-the-point, but left in here as I've frequently come back to the issue before) Note that even though this is event driven, the sequence of events in a single run is still obtained from an imperatively programmed makable, which needs to take the right sequencing into account. Take a situation where file A is built into A.out, and B and A.out are built into AB.out; further, assume that the makables don't really know about their output data (see question on declaring side effects above). If A and B are changed simultaneously, a good makable would first build A.out and then AB.out. Even if we had a bad makable, though, which first tried to update AB.out from the outdated A.out and B, and then updated A.out from A, the hashes of A.out would not match the hashed expectations of the makable any more by the time it completes, so the make process would be run again. It would be slower and wasteful, but still arrive at a correct result. A maker based on more declarative rules might of course avoid such a situation, but that can run atop of all this (but might then need the information on side effects).

Series of hashes

In some cases, it might be costly to determine whether an input actually changed. For example, the content of a file could have remained the same even though its stats have changed, but we don't want to read the whole file when having an unmodified stat is a sufficient condition for knowing that the file did not change. (We'll work on this assumption here; whether it is actually applicable depends on the situation and thus on configuration.)

A native makable might therefore implement a series of hashing functions, which are only evaluated on demand (ie. only if the cheaper hashes earlier in the series don't match). It is possible that such a mechanism is not needed at all, though, if the native makables are sufficiently low level and higher level functions (like read_file) are actually implemented as PythonMakable, which would then provide just the desired functionality anyway.
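A sketch of such a hash series for files (the generator-based interface and function names are invented for illustration): the cheap stat-based pseudo-hash is tried first, and the file content is only read when it mismatches.

```python
import hashlib
import os

def file_hash_series(filename):
    """Yield (level, hash) pairs for a file, cheapest first: a
    stat-based pseudo-hash, then a real content hash (which is only
    computed if the caller actually advances the generator that far)."""
    st = os.stat(filename)
    yield ('stat', (st.st_mtime_ns, st.st_size))
    with open(filename, 'rb') as f:
        yield ('content', hashlib.sha256(f.read()).hexdigest())

def record(filename):
    """Evaluate and store the whole series, eg. after a build."""
    return dict(file_hash_series(filename))

def unchanged(filename, recorded):
    """True if the file can be considered unchanged.  The expensive
    content hash is only computed when the stat hash mismatches."""
    for level, value in file_hash_series(filename):
        if recorded[level] == value:
            return True     # a match at any level is sufficient
    return False            # even the content hash differs
```

A touched-but-identical file thus costs one full read to confirm, while the common unmodified case is settled by the stat alone.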

Similar approaches

Incubator status

This is pretty rough, and chances are that a framework that can do this already exists (and I just haven't found it yet).

--chrysn 2015-04-29


This page is part of chrysn's public personal idea incubator; go up for its other entries, or read about the idea of having an idea incubator for more information on what this is.