How Jardeps works


Jardeps is a library for GNU Make for compiling Java.

Whenever a tree is compiled, it is effectively a local clean build, as all the roots are passed to javac.

The compiler has been modified so that it remembers lists of classes for which it created classfiles, and which source files it used to do that. The first list is used to build a jar from the generated class files, and the second list is used to build rules to detect when a source file changes such that it is necessary to recompile the tree.

The compiler also re-analyzes the generated class files to generate a public profile of that group of classes, which is a canonicalized list of each class's public and protected members, including their containing class and type signature. For constants, the value is also used. A copy of the profile for tree foo is kept in tmp/tree-foo.api, but its timestamp is only updated if its content has changed since the previous build.

The old way, with order-only dependencies

When a tree foo is stated to depend on another tree bar, two rules are generated:

tmp/tree-foo.compiled: | tmp/tree-bar.compiled
tmp/tree-foo.compiled: tmp/tree-bar.api

tree-foo.compiled is the target for building tree foo. It is only touched after javac and profile creation complete successfully.

The first rule is ‘order-only’. It states that tmp/tree-bar.compiled must have been completed before foo is compiled. However, it does not require that foo be compiled just because bar has changed. In other words, the timestamps of the two .compiled files are not compared.

The second rule is a normal one stating that foo must be recompiled if the public profile of bar has changed. There is no rule that targets tmp/tree-bar.api, but by the time the first rule has been satisfied, that file will exist, so the second rule raises no error. Timestamps are compared, so any changes to the public profile of bar will cause tmp/tree-bar.api to change, and force recompilation of foo.

So, when targeting the compilation of foo, there are four cases:

  • bar has not been compiled, and the first rule ensures that it will be before starting on foo.

  • bar was already compiled, and did not need recompilation. foo will not be recompiled.

  • bar was already compiled, and needed recompilation, but its profile did not change. The timestamp of the profile file will be unchanged, so foo will not be recompiled.

  • bar was already compiled, needed recompilation, and its profile changed. The timestamp of the profile file changed to reflect this, so foo will be recompiled.

The new way, which is compatible with a parallel build

If tree foo is stated to depend on the profile of tree bar, the following straight-forward rule is created automatically:

tmp/tree-foo.compiled: tmp/tree-bar.api

A generic rule shows how to make tree-bar.api, having just compiled tree bar:

tmp/tree-%.api: tmp/tree-%.compiled
	cmp -s tmp/tree-$*.api-tmp tmp/tree-$*.api || \
	  cp tmp/tree-$*.api-tmp tmp/tree-$*.api

Note that it meets its target by simply copying tree-bar.api-tmp (a by-product of compilation) to tree-bar.api, but only if they differ.

The advantage of doing this, instead of using order-only rules, is that this method is safe for parallel builds.

A disadvantage is that, if a tree is compiled, but fails to change its profile, make will never report that there is nothing to be done. Instead, it will repeatedly try to update the profile, and never achieve it. The result is that something is done, but has no effect, and is not reported, and that could be disconcerting.

You also need this for each tree, to ensure that it is compiled before attempting to check the profile timestamp:

tmp/tree-bar.api: | tmp/tree-bar.compiled

In summary, if you have a target dependant, which depends on byproduct, a by-product of mainproduct, you use the following rules:

mainproduct:
	create mainproduct and byproduct-tmp

byproduct: | mainproduct
	cmp -s byproduct-tmp byproduct || \
	  cp byproduct-tmp byproduct

dependant: byproduct

In summary:

  • The smallest dependency unit is the (source) tree — Abstractly, a tree can depend on the public profile, package-private profile, or the implementation of another tree.

  • Rules for make can be generated automatically from abstract expressions of dependencymake works for languages like C because a module's profile (its header file) is already separate from its implementation (source file). For Java, we can extract the profile automatically after each compilation. We obtain the necessary semantics by only replacing the generated profile file when its content changes, and by using order-only rules to relate the builds of different trees.

  • Inter-tree dependencies are specified manually — It is reasonable for the programmer to be concious of dependencies between his own trees, as they will be part of the design he is following. What this system provides is a per-tree incremental build, by-passing the potential circularity of per-class dependencies — there should never be any circularity at the per-jar or per-tree levels.

  • The full set of files from which a jar is built need not be known before compilation, only after — A tree implicitly depends on the set of source files for the classes it contains. A subset of root classes must be specified beforehand, but it is reasonable for the programmer to know this set, as it is his goal to make a tree containing those classes. From the root set, the full set can be automatically derived after each compilation.

  • The Java compiler is best placed to generate the information that this system needs — Early attempts to use Awk to parse the verbose compiler and (later) disassembler outputs are simply not future-proof. Get the compiler to keep track of what it's doing, and you obviate both the parsing and the post-processing.

Profile format

Although the exact format of the profile file is not important, it must have certain characteristics. Here's an example extract from a real profile:

uk/ac/lancs/scc/one/edl/Edit.channels method public ()Ljava/util/Map<Ljava/lang/String;Luk/ac/lancs/scc/one/edl/Channel;>;
uk/ac/lancs/scc/one/edl/Edit iface public Ljava/lang/Object;
uk/ac/lancs/scc/one/edl/Edits class final public Ljava/lang/Object;
uk/ac/lancs/scc/one/edl/Edits.EDL_NS_URI field public static final Ljava/lang/String;="http://scc.lancs.ac.uk/ns/edl/2012"
uk/ac/lancs/scc/one/edl/Edit.segments method public ()Ljava/util/List<Luk/ac/lancs/scc/one/edl/Segment;>;

Each line describes a program element, starting with its fully qualified name. The rest of the line must include the element kind, its attributes and its full generic type. Final static fields also include their value, and methods include their exceptions. Such a set of lines can be canonicalized simply by sorting, which prevents instability in the ordering of elements from triggering a rebuild of dependent trees.

(I wonder, should annotations be present? And their values? Maybe source-level annotations can be excluded.)

Order-only dependency

This section exists to explain a technique that is no longer used by Jardeps. It might be re-adopted in future, if its problem with parallelism can be fixed.

What is order-only dependency? In make, when you write a rule such as:

foo: bar baz

…you indicate that the construction of foo depends on the prior construction of bar and baz. What does this mean procedurally?

make foo triggers the following actions:

  1. make bar baz
  2. If foo does not exist, or foo is older than bar or baz, run the commands associated with foo.

Those commands normally create foo with an up-to-date timestamp, so the timestamp comparison in the second action will fail on an immediately subsequent run.

If we change the relationship between foo and baz, by using an order-only rule as follows:

foo: bar | baz

…we no longer perform the timestamp comparison with baz, so the actions now look like this:

  1. make bar baz
  2. If foo does not exist, or foo is older than bar, run the commands associated with foo.

It is this change which allows Jardeps to do something difficult with Java that is trivial with C: allow module/tree foo to depend on the public characteristics of another module/tree bar, ignoring its internal details. In C, the public characteristics are expressed in a manually separate header file bar.h, and the rule relating foo to it is simple:

foo.o: bar.h

In Java, the easiest way to extract the header is to compile the tree, and examine the generated files. Then we can express the dependency with:

foo.compile: bar.header

But if we make the header target do the compilation:

bar.header:
	compile bar
	extract bar.header from classes of bar

…the header's timestamp will change even if its contents do not. If instead we only update the header if its content changes:

bar.header:
	compile bar
	extract bar.header-tmp from classes of bar
	copy bar.header-tmp to bar.header if different

Make will continue to think that the header is out-of-date as long as it doesn't change. An order-only rule means we don't have to have a rule for building bar.header, because it is a side-effect of building bar.compile:

foo.compile: | bar.compile
foo.compile: bar.header

bar.compile:
	compile bar
	extract bar.header-tmp from classes of bar
	copy bar.header-tmp to bar.header if different
	touch bar.compile

The behaviour of make foo.compile is now:

  1. make bar.compile bar.header
  2. If foo.compile does not exist, or foo.compile is older than bar.header, run the commands associated with foo.compile.

The first action will complete the construction of bar.compile before testing whether bar.header exists, and will ensure that it does exist before that test occurs, without necessarily timestamping it. The second action will test that timestamp, and may find construction of foo.compile unnecessary.

Here's another use for order-only rules, this time with a C program that makes use of some form of IDL. We wish to generate dependencies for each module after it is compiled, but that only works if there are no generated header files. If we make all our modules depend on the target that causes the IDL file to be compiled, our entire program will have to be recompiled on even the slightest inconsequential change to the IDL file. Instead, we make the dependency order-only, and ensure that IDL compilation does not touch files whose contents do not change:

# for GCC
%.o: %.c
	$(COMPILE.c) -o "$@" -MMD -MT "$@" -MF "deps-$*.mk" -MP "$<"

# for some fictitious IDL compiler
%.done: %.idl
	idl "$<" -th "$*_types.h-tmp" \
	         -ts "$*_typesimpl.h-tmp" \
	         -ch "$*_client.h-tmp" \
		 -cs "$*_clientimpl.h-tmp" \
	         -sh "$*_server.h-tmp" \
		 -ss "$*_serverimpl.h-tmp"
	cmp -s "$*_types.h-tmp" "$*_types.h" || \
	  mv "$*_types.h-tmp" "$*_types.h"
	cmp -s "$*_typesimpl.h-tmp" "$*_typesimpl.h" || \
	  mv "$*_typesimpl.h-tmp" "$*_typesimpl.h"
	cmp -s "$*_client.h-tmp" "$*_client.h" || \
	  mv "$*_client.h-tmp" "$*_client.h"
	cmp -s "$*_clientimpl.h-tmp" "$*_clientimpl.h" || \
	  mv "$*_clientimpl.h-tmp" "$*_clientimpl.h"
	cmp -s "$*_server.h-tmp" "$*_server.h" || \
	  mv "$*_server.h-tmp" "$*_server.h"
	cmp -s "$*_serverimpl.h-tmp" "$*_serverimpl.h" || \
	  mv "$*_serverimpl.h-tmp" "$*_serverimpl.h"
	touch "$@"

.PRECIOUS: bar_types.c bar_client.c bar_server.c

%_types.c:
	printf '#include "%s_typesimpl.h"\n' "$*" > "$@"

%_client.c:
	printf '#include "%s_clientimpl.h"\n' "$*" > "$@"

%_server.c:
	printf '#include "%s_serverimpl.h"\n' "$*" > "$@"


foo_obj=foo baz quux bar_client bar_types
foo_lib=-lidl

foo: $(foo_obj:%=%.o)
	$(LINK.c) -o "$@" $(foo_obj:%=%.o) $(foo_lib)

# Order of these two lines is important.
$(foo_obj:%=%.o): | bar.done
-include $(foo_obj:%=deps-%.mk)

Now suppose that baz.c uses bar_client.h, which in turn uses bar_types.h, and that baz.h uses bar_types.h directly. Also, most other modules use baz.h, except quux.c. Only foo.c and quux.c use quux.h.

On the first run of make foo, there are no extra deps-mod.mk files. We see that foo depends on foo.o, and the first rule matching that is:

foo.o bar_client.o bar_types.o baz.o quux.o: | bar.done

Since bar.done does not exist, its rule is invoked, and this creates the various files derived from the IDL file. Now all modules can be safely compiled, knowing that the hierarchy of header files already includes those that need to be generated. Furthermore, we can see the correctly generated header dependencies in the files deps-mod.mk:

$ cat deps-*.mk | egrep -v '^$' | sort -u
bar_client.h:
bar_clientimpl.h:
bar_client.o: bar_client.c bar_clientimpl.h bar_client.h bar_types.h
bar_types.h:
bar_typesimpl.h:
bar_types.o: bar_types.c bar_typesimpl.h bar_types.h
baz.h:
baz.o: baz.c baz.h bar_types.h
foo.o: foo.c baz.h bar_types.h quux.h
quux.h:
quux.o: quux.c quux.h

So, from necessarily explicit knowledge that foo uses bar.idl, we learn implicitly that one of its modules, quux.o, does not. Also, changes to the IDL file that do not result in changes to its generated files do not trigger recompilation of the modules of foo.

Parallelism

Obviously, order-only rules are useful for some tasks, but it seems impossible to use them as above with parallel jobs. This example demonstrates:

all:: foo.done

%.done: %.src
	@echo Starting "$*"
	@sleep 3
	@tr -d ' \n' < "$<" > "$*.api-tmp"
	@cmp -s "$*.api" "$*.api-tmp" || \
	  ( cp "$*.api-tmp" "$*.api" ; \
	    echo "$*" API changed )
	@touch "$@"
	@echo Completed "$*"

foo.done: | bar.done baz.done
foo.done: bar.api baz.api

clean::
	$(RM) *~
	$(RM) *.done
	$(RM) *.api
	$(RM) *.api-tmp

It simulates a compiler-like process that generates a profile xxx.api from the source xxx.src with all spaces and new lines stripped out, while preserving the timestamp if the profile hasn't changed. The ‘module’ foo depends on the profiles of bar and baz.

When building from scratch, it should be possible for bar and baz to be compiled in parallel. However, what actually happens is that Make simultaneously recognizes that foo.done depends on all of {bar,baz}.{done,api}, and then thinks that there's no way to meet {bar,baz}.api, not realising that it has to wait for {bar,baz}.done to complete.

Attempts to express a dependency between a profile and the completion file, such as:

bar.api: bar.done
baz.api: baz.done

…or:

bar.api: | bar.done
baz.api: | baz.done

…or:

bar.api: bar.src
baz.api: baz.src

…or even:

bar.api: | bar.src
baz.api: | baz.src

…all produce an undesirable effect whereby, if baz.api changes, foo.done isn't rebuilt until a second invocation of Make.

Essentially, there seems to be no way to express that bar.api is a potentially non-updating by-product of building bar.done. Here are some relevent discussions in the GNU Make archives:

Extensions to javac

Jardeps originally used Awk scripts to parse javap output in order to generate profiles and lists of classes involved in a tree. This was invoked as soon as javac completed. As javap output wasn't stable between versions, I replaced the Awk scripts with a stand-alone Java program to parse the bytecode.

Both techniques involved duplicating some of the work that the prior invocation of javac has already just done, so I built an alternative javac using the compiler API in javax.tools, which should be faster and more correct. This is just a regular Java program that invokes the normal compiler programmatically. Additionally, it hooks into the compiler to keep track of source files used, class files generated, packages provided, packages used, and the tree's profile. This information is then output using various new switches shown below.

List used source files

-list:sources:N cmd arg1 arg2  argN

After compilaton, run the command cmd with the specified arguments, appending as additional arguments the internal names of top-level classes whose source files were used in the compilation, e.g., com/example/TopLevel. This should not include classes generated by annotation processing. The command is used to generate the make rule that triggers recompilation of a tree when any of its source files change.

List generated classes

-list:classes:N cmd arg1 arg2  argN

After compilaton, run the command cmd with the specified arguments, appending as additional arguments the internal names of all classes generated by the compilation, e.g., com/example/TopLevel$Inner. This should include classes generated by annotation processing. The command is used to generate the list of classfiles that constitute a jar.

Generate profiles

-profile:public filename
-profile:default filename

Generate the canonicalized public/default profile, and write it unconditionally to filename.

List packages provided

-packages:provided filename

Write a list of packages of classes generated by the compilation to filename.

List packages required at runtime

-packages:used filename

Write a list of packages of classes needed at runtime to filename.