Motivation

Make is great for simple applications, but sometimes I want more:

file names with whitespace
depending on things other than files (eg. depending on results of database queries)
configurability on whether dependencies are handled based on time stamps, file hashes or stat hashes
a monitoring mode for event-based reaction
using that for a single function in a different programming language
building rules programmatically
iterating converging runs (eg. building a droste effect image)
building rules from domain specific languages

What I think I need is a memoizing function wrapper that takes into account not only the objects passed in to a function, but also the fact that the function might read from its environment. The automatic and implicit rules of makefiles are not covered here -- that's an orthogonal task.
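A minimal sketch of such a wrapper, to make the idea concrete. Everything in it is an assumption made for illustration: the names Tracker, read_file and call are hypothetical, files are the only kind of environment it knows about, and content hashes are the only change detection it does.

```python
import hashlib


class Tracker:
    """Hypothetical memoizer that re-runs a call when a file it read changed."""

    def __init__(self):
        self._cache = {}    # (function name, args) -> (result, {path: hash})
        self._reads = None  # files read by the call currently being recorded

    @staticmethod
    def _hash_file(path):
        try:
            with open(path, 'rb') as f:
                return hashlib.sha256(f.read()).hexdigest()
        except FileNotFoundError:
            return 'nonexisting'

    def read_file(self, path):
        """Read a text file and record it as an input of the running call."""
        with open(path, 'rb') as f:
            data = f.read()
        if self._reads is not None:
            self._reads[path] = hashlib.sha256(data).hexdigest()
        return data.decode()

    def call(self, func, *args):
        """Run func(self, *args), or reuse the old result if no input changed."""
        key = (func.__qualname__, args)
        cached = self._cache.get(key)
        if cached is not None:
            result, reads = cached
            if all(self._hash_file(p) == h for p, h in reads.items()):
                if self._reads is not None:
                    self._reads.update(reads)  # the caller depends on them too
                return result
        outer, self._reads = self._reads, {}
        try:
            result = func(self, *args)
            reads = self._reads
        finally:
            self._reads = outer
        self._cache[key] = (result, reads)
        if outer is not None:
            outer.update(reads)  # nested reads count for the outer call as well
        return result
```

A function called through tracker.call(f, path) runs once, is skipped as long as the files it read via tracker.read_file keep their hashes, and runs again after one of them changes. Reads that bypass the tracker (a plain open()) stay invisible to it.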

How it could work

An example of how that could look in practice

Assume we want to implement recursive building of static web sites from source pages:

@make.makable
def render_all_pages(maker, directory):
    destdir = 'dest/'  # output prefix, as seen in the debug output
    for page in maker.list_files(directory):
        maker.make(render_page(page, destdir + page))

@make.makable
def render_page(maker, input_filename, output_filename):
    in_data = maker.read_file(input_filename)

    result = convert_formatting(in_data)

    print("Causing an undeclared side effect")

    # open for writing; the default mode would be read-only
    with open(output_filename, 'w') as out:
        out.write(result)

>>> maker = make.Maker()
>>> maker.debug(read_file('some_file')) # note that maker.read_file(...) is a shortcut for maker.make(make_module.read_file(...))
<NativeMakable read_file('some_file')>: current hash 'nonexisting', can be waited on.
>>> maker.debug(render_all_pages('.'))
<PythonMakable render_all_pages('.')>: not yet evaluated, might be waitable on.
>>> maker.make(render_all_pages('.'))
Causing an undeclared side effect
Causing an undeclared side effect
>>> maker.make(render_all_pages('.'))
>>> maker.debug(render_all_pages('.'))
<PythonMakable render_all_pages('.')>: current hash 0x5d84, is waitable on
* <NativeMakable list_files('.')>: current hash 0x9375, can be waited on.
* <PythonMakable render_page('file_a', 'dest/file_a')>: current hash 0xfd87, is waitable on
  * <NativeMakable read_file('file_a')>: current hash 0xa398, can be waited on.
* <PythonMakable render_page('file_b', 'dest/file_b')>: current hash , is waitable on
  * <NativeMakable read_file('file_b')>: current hash 0x9f32a, can be waited on.

This is not backed by real code, but only a mockup. Both names and mechanisms are subject to change; especially, having an implicit (thread-local / task-local) maker object might be worth considering -- but the explicit maker at least makes the intention clearer here.

Open questions so far

So far, we don't declare side effects. This is largely ok, as the actual effect can be detected again (for example, if we overwrote the input file, read_file's hash would change). But if the output file is removed (eg. by clearing caches), no input has changed, so the output would not be rebuilt.
Is it a good idea for the makable to be callable? I've seen too many forgotten yield from statements in asyncio, and the issues would be similar here. An alternative would be maker.make(render_all_pages, '.'), but then all functions dealing with makables would need to use all their *args for the makable, makables could only be passed around as tuples, and they would not have a repr of their own, as a makable would be just a name for a tuple passed in to *args.
Should .make() run a single iteration or run until things converge? (Note that in all scenarios that can be handled by makefiles that don't break the loop, things have converged after a run, as no overwriting of input is allowed.) For now, we'll assume it runs a single iteration.
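To illustrate the callable variant from the second question: the following sketch shows how calling a decorated function could return a deferred object with its own repr instead of running anything. The classes are stand-ins written for this illustration, not an existing make module, and the Maker here does no memoizing at all.

```python
import functools


class Makable:
    """A deferred call: remembers function and arguments without running them."""

    def __init__(self, func, args):
        self.func = func
        self.args = args

    def __repr__(self):
        arglist = ', '.join(repr(a) for a in self.args)
        return '<PythonMakable %s(%s)>' % (self.func.__name__, arglist)


def makable(func):
    """Decorator: calling the function only builds a Makable."""
    @functools.wraps(func)
    def deferred(*args):
        return Makable(func, args)
    return deferred


class Maker:
    def make(self, m):
        # Only here is the wrapped function executed; the maker is passed in
        # as the first argument, as in the mockup above.
        return m.func(self, *m.args)


@makable
def greet(maker, name):
    return 'hello ' + name


pending = greet('world')           # nothing has run yet
print(repr(pending))               # <PythonMakable greet('world')>
print(Maker().make(pending))       # hello world
```

The downside mentioned above is visible here too: a bare greet('world') without maker.make() silently does nothing.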

Persistence

When the above example is executed with a new Maker instance (eg. when running the program again in a new Python instance), the information learned from making render_all_pages is lost. It would need to be persisted, eg. by the Maker pickling the hashes to an appropriate place. Note that this storage mechanism should also include the source code files' hashes. They are not part of the makable tree at runtime, because we don't reload the Python modules when the source has changed and we run the function again either. (Although we might be able to do that here ;-) .)
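A rough sketch of what such a store could look like, assuming a single pickle file and one source file per makable. HashStore, remember and is_current are hypothetical names, and the keys and hash values are whatever the Maker chooses to hand in.

```python
import hashlib
import pickle


def file_hash(path):
    """Content hash of a file, or 'nonexisting' if it is missing."""
    try:
        with open(path, 'rb') as f:
            return hashlib.sha256(f.read()).hexdigest()
    except FileNotFoundError:
        return 'nonexisting'


class HashStore:
    """Hypothetical on-disk memory of input hashes, keyed by makable."""

    def __init__(self, path):
        self.path = path
        try:
            with open(path, 'rb') as f:
                self.entries = pickle.load(f)
        except FileNotFoundError:
            self.entries = {}

    def remember(self, key, input_hashes, source_file):
        # The hash of the source file is stored next to the input hashes, so
        # that an edited function invalidates its results in a later process.
        self.entries[key] = {
            'inputs': dict(input_hashes),
            'source': (source_file, file_hash(source_file)),
        }
        with open(self.path, 'wb') as f:
            pickle.dump(self.entries, f)

    def is_current(self, key, input_hashes):
        entry = self.entries.get(key)
        if entry is None:
            return False
        source_file, source_hash = entry['source']
        return (entry['inputs'] == dict(input_hashes)
                and file_hash(source_file) == source_hash)
```

A new process would open the same file with HashStore(path) and ask is_current() before re-running a makable. Pickle is used here only because the text suggests it; any serialization would do.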

It is expected that a practical implementation of all this would provide tools to make this relatively transparent.

Convergence

In most applications, running a single iteration is idempotent: No input data has changed, so the next run (even without the whole make framework) will have exactly the same result as the first run. In some applications, this is not the case: Droste images are the obvious example, LaTeX files (which store page numbers of references in a dedicated file which they use in a second run) are the more practical one.

Well-conditioned tasks converge after some number of iterations, meaning that they reach a state where the make run is idempotent again. With droste images, this happens when the smallest (eg. 1x1px) image reaches the image's average color (typically after up to ten iterations); with LaTeX files, this is often the case after the second iteration (how many texts grow an extra page just because they have actual page numbers instead of '?' characters?).

Of course, unstable inputs can be constructed; practical runners will want to have an iteration limit.
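Such a runner could be as simple as the following sketch. The function name, the two callables and the default limit of ten are assumptions for illustration; step stands for one make run and state_hash for whatever summarizes the observed inputs.

```python
def make_until_converged(step, state_hash, limit=10):
    """Run step() until a run leaves state_hash() unchanged.

    Returns the number of runs, including the final idempotent one.
    """
    previous = state_hash()
    for iteration in range(1, limit + 1):
        step()
        current = state_hash()
        if current == previous:
            return iteration
        previous = current
    raise RuntimeError('no convergence after %d iterations' % limit)


# Toy stand-in for a converging build: halving a number until it stays at 0.
state = {'value': 8}

def halve():
    state['value'] //= 2

runs = make_until_converged(halve, lambda: state['value'])
```

An input that flips between two states on every run never satisfies the check and ends in the RuntimeError once the limit is reached.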

Event driven operation

Some native makables could declare that they can be used in an event driven operation, eg. in a daemon that starts rebuilding things whenever the source files change. The NativeMakable examples above (list_files, read_file) could be implemented using inotify. Others (eg. database queries) might not have that option.

def make_continuously(makable):
    m = make.Maker()
    while True:
        m.make(makable)
        # this will raise Unwaitable if an input is used that is not waitable
        yield from make.wait_until_parameter_changes(makable)

(FIXME: this paragraph is not completely to-the-point, but left in here as I've frequently come back to the issue before) Note that even though this is event driven, the sequence of events in a single run is still obtained from an imperatively programmed makable, which needs to take the right sequencing into account. Take a situation where file A is built into A.out, and B and A.out are built into AB.out; further, assume that the makables don't really know about their output data (see the question on declaring side effects above). If A and B are changed simultaneously, a good makable would first build A.out and then AB.out. Even if we had a bad makable, though, which first tried to update AB.out from the outdated A.out and B, and then updated A.out from A, the hashes of A.out would not match the hashed expectations of the makable any more by the time it completes, so the make process would be run again. It would be slower and wasteful, but still arrive at a correct result. A maker based on more declarative rules might of course avoid such a situation, but that can run atop of all this (but might then need the information on side effects).

Series of hashes

In some cases, it might be costly to determine whether an input actually changed. For example, the content of a file could have remained the same even though its stats have changed, but we don't want to read the whole file when having an unmodified stat is a sufficient condition for knowing that the file did not change. (We'll work on this assumption here; whether it is actually applicable depends on the situation and thus on configuration.)

A native observable might therefore implement a series of hashing functions, which are only evaluated on demand (ie. if previous hashes don't match). It is possible that such a mechanism is not needed at all, though, if the native makables are sufficiently low level and higher level functions (like read_file) are actually implemented as PythonMakable, which would then provide just the desired functionality anyway.
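As a sketch of the on-demand evaluation, assuming a file as the input and a two-step series (stat first, content second): the function names and the choice of stat fields are illustrative only. A match at any step is taken as proof that nothing changed, which for the stat step is exactly the assumption stated above.

```python
import hashlib
import os


def stat_hash(path):
    """Cheap: modification time and size only, no file content is read."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


def content_hash(path):
    """Expensive: reads the whole file."""
    with open(path, 'rb') as f:
        return hashlib.sha256(f.read()).hexdigest()


HASH_SERIES = [stat_hash, content_hash]  # cheapest first


def snapshot(path):
    """Record all hashes once, right after the input was used."""
    return [hash_function(path) for hash_function in HASH_SERIES]


def unchanged(path, recorded):
    """Walk the series; later hashes are only computed if earlier ones differ."""
    for hash_function, old_value in zip(HASH_SERIES, recorded):
        if hash_function(path) == old_value:
            return True
    return False
```

A file that was merely touched fails the stat comparison but passes the content comparison, so it still counts as unchanged; only a file whose content really differs fails both.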

Similar approaches

make: Limited to files. Only supports time stamp based operation. Great set of default rules. Hard to use with nested directories.
angular.js: While not a building system, their $watch mechanism has similarities to what is described as maker.make here. As opposed to angular, this approach allows nested watches in regular programming structures, and explicit notification about individual changed data sources and subsequent evaluation of only the affected routines.
The ikiwiki RDF backend idea: An ancestor of this idea. If the problems described above are solved, an RDF based static wiki's core should be doable in a few dozen lines of code plus plugins.

Incubator status

This is pretty rough, and chances are that a framework that can do this already exists (and I just haven't found it yet).

--chrysn 2015-04-29

This page is part of chrysn's public personal idea incubator; go up for its other entries, or read about the idea of having an idea incubator for more information on what this is.