How Jardeps works

Profile format

Although the exact format of the profile file is not important, it must have certain characteristics. Here's an example extract from a real profile:

uk/ac/lancs/scc/one/edl/Edit.channels method public ()Ljava/util/Map<Ljava/lang/String;Luk/ac/lancs/scc/one/edl/Channel;>;
uk/ac/lancs/scc/one/edl/Edit iface public Ljava/lang/Object;
uk/ac/lancs/scc/one/edl/Edits class final public Ljava/lang/Object;
uk/ac/lancs/scc/one/edl/Edits.EDL_NS_URI field public static final Ljava/lang/String;="http://scc.lancs.ac.uk/ns/edl/2012"
uk/ac/lancs/scc/one/edl/Edit.segments method public ()Ljava/util/List<Luk/ac/lancs/scc/one/edl/Segment;>;

Each line describes a program element, starting with its fully qualified name. The rest of the line must include the element kind, its attributes and its full generic type. Final static fields also include their value, and methods include their exceptions. Such a set of lines can be canonicalized simply by sorting, which prevents instability in the ordering of elements from triggering a rebuild of dependent trees.

(I wonder, should annotations be present? And their values? Maybe source-level annotations can be excluded.)

Order-only dependency

This section exists to explain a technique that is no longer used by Jardeps. It might be re-adopted in future, if its problem with parallelism can be fixed.

What is order-only dependency? In make, when you write a rule such as:

foo: bar baz

…you indicate that the construction of foo depends on the prior construction of bar and baz. What does this mean procedurally?

make foo triggers the following actions:

make bar baz
If foo does not exist, or foo is older than bar or baz, run the commands associated with foo.

Those commands normally create foo with an up-to-date timestamp, so the timestamp comparison in the second action will fail on an immediately subsequent run.

If we change the relationship between foo and baz, by using an order-only rule as follows:

foo: bar | baz

…we no longer perform the timestamp comparison with baz, so the actions now look like this:

make bar baz
If foo does not exist, or foo is older than bar, run the commands associated with foo.

It is this change which allows Jardeps to do something difficult with Java that is trivial with C: allow module/tree foo to depend on the public characteristics of another module/tree bar, ignoring its internal details. In C, the public characteristics are expressed in a manually separate header file bar.h, and the rule relating foo to it is simple:

foo.o: bar.h

In Java, the easiest way to extract the header is to compile the tree, and examine the generated files. Then we can express the dependency with:

foo.compile: bar.header

But if we make the header target do the compilation:

bar.header:
	compile bar
	extract bar.header from classes of bar

…the header's timestamp will change even if its contents do not. If instead we only update the header if its content changes:

bar.header:
	compile bar
	extract bar.header-tmp from classes of bar
	copy bar.header-tmp to bar.header if different

…Make will continue to think that the header is out-of-date as long as it doesn't change. An order-only rule means we don't have to have a rule for building bar.header, because it is a side-effect of building bar.compile:

foo.compile: | bar.compile
foo.compile: bar.header

bar.compile:
	compile bar
	extract bar.header-tmp from classes of bar
	copy bar.header-tmp to bar.header if different
	touch bar.compile

The behaviour of make foo.compile is now:

make bar.compile bar.header
If foo.compile does not exist, or foo.compile is older than bar.header, run the commands associated with foo.compile.

The first action will complete the construction of bar.compile before testing whether bar.header exists, and will ensure that it does exist before that test occurs, without necessarily timestamping it. The second action will test that timestamp, and may find construction of foo.compile unnecessary.

Here's another use for order-only rules, this time with a C program that makes use of some form of IDL. We wish to generate dependencies for each module after it is compiled, but that only works if there are no generated header files. If we make all our modules depend on the target that causes the IDL file to be compiled, our entire program will have to be recompiled on even the slightest inconsequential change to the IDL file. Instead, we make the dependency order-only, and ensure that IDL compilation does not touch files whose contents do not change:

# for GCC
%.o: %.c
	$(COMPILE.c) -o "$@" -MMD -MT "$@" -MF "deps-$*.mk" -MP "$<"

# for some fictitious IDL compiler
%.done: %.idl
	idl "$<" -th "$*_types.h-tmp" \
	         -ts "$*_typesimpl.h-tmp" \
	         -ch "$*_client.h-tmp" \
		 -cs "$*_clientimpl.h-tmp" \
	         -sh "$*_server.h-tmp" \
		 -ss "$*_serverimpl.h-tmp"
	cmp -s "$*_types.h-tmp" "$*_types.h" || \
	  mv "$*_types.h-tmp" "$*_types.h"
	cmp -s "$*_typesimpl.h-tmp" "$*_typesimpl.h" || \
	  mv "$*_typesimpl.h-tmp" "$*_typesimpl.h"
	cmp -s "$*_client.h-tmp" "$*_client.h" || \
	  mv "$*_client.h-tmp" "$*_client.h"
	cmp -s "$*_clientimpl.h-tmp" "$*_clientimpl.h" || \
	  mv "$*_clientimpl.h-tmp" "$*_clientimpl.h"
	cmp -s "$*_server.h-tmp" "$*_server.h" || \
	  mv "$*_server.h-tmp" "$*_server.h"
	cmp -s "$*_serverimpl.h-tmp" "$*_serverimpl.h" || \
	  mv "$*_serverimpl.h-tmp" "$*_serverimpl.h"
	touch "$@"

.PRECIOUS: bar_types.c bar_client.c bar_server.c

%_types.c:
	printf '#include "%s_typesimpl.h"\n' "$*" > "$@"

%_client.c:
	printf '#include "%s_clientimpl.h"\n' "$*" > "$@"

%_server.c:
	printf '#include "%s_serverimpl.h"\n' "$*" > "$@"


foo_obj=foo baz quux bar_client bar_types
foo_lib=-lidl

foo: $(foo_obj:%=%.o)
	$(LINK.c) -o "$@" $(foo_obj:%=%.o) $(foo_lib)

# Order of these two lines is important.
$(foo_obj:%=%.o): | bar.done
-include $(foo_obj:%=deps-%.mk)

Now suppose that baz.c uses bar_client.h, which in turn uses bar_types.h, and that baz.h uses bar_types.h directly. Also, most other modules use baz.h, except quux.c. Only foo.c and quux.c use quux.h.

On the first run of make foo, there are no extra deps-mod.mk files. We see that foo depends on foo.o, and the first rule matching that is:

foo.o bar_client.o bar_types.o baz.o quux.o: | bar.done

Since bar.done does not exist, its rule is invoked, and this creates the various files derived from the IDL file. Now all modules can be safely compiled, knowing that the hierarchy of header files already includes those that need to be generated. Furthermore, we can see the correctly generated header dependencies in the files deps-mod.mk:

$ cat deps-*.mk | egrep -v '^$' | sort -u
bar_client.h:
bar_clientimpl.h:
bar_client.o: bar_client.c bar_clientimpl.h bar_client.h bar_types.h
bar_types.h:
bar_typesimpl.h:
bar_types.o: bar_types.c bar_typesimpl.h bar_types.h
baz.h:
baz.o: baz.c baz.h bar_types.h
foo.o: foo.c baz.h bar_types.h quux.h
quux.h:
quux.o: quux.c quux.h

So, from necessarily explicit knowledge that foo uses bar.idl, we learn implicitly that one of its modules, quux.o, does not. Also, changes to the IDL file that do not result in changes to its generated files do not trigger recompilation of the modules of foo.

Parallelism

Obviously, order-only rules are useful for some tasks, but it seems impossible to use them as above with parallel jobs. This example demonstrates:

all:: foo.done

%.done: %.src
	@echo Starting "$*"
	@sleep 3
	@tr -d ' \n' < "$<" > "$*.api-tmp"
	@cmp -s "$*.api" "$*.api-tmp" || \
	  ( cp "$*.api-tmp" "$*.api" ; \
	    echo "$*" API changed )
	@touch "$@"
	@echo Completed "$*"

foo.done: | bar.done baz.done
foo.done: bar.api baz.api

clean::
	$(RM) *~
	$(RM) *.done
	$(RM) *.api
	$(RM) *.api-tmp

It simulates a compiler-like process that generates a profile xxx.api from the source xxx.src with all spaces and new lines stripped out, while preserving the timestamp if the profile hasn't changed. The ‘module’ foo depends on the profiles of bar and baz.

When building from scratch, it should be possible for bar and baz to be compiled in parallel. However, what actually happens is that Make simultaneously recognizes that foo.done depends on all of {bar,baz}.{done,api}, and then thinks that there's no way to meet {bar,baz}.api, not realising that it has to wait for {bar,baz}.done to complete.

Attempts to express a dependency between a profile and the completion file, such as:

bar.api: bar.done
baz.api: baz.done

…or:

bar.api: | bar.done
baz.api: | baz.done

…or:

bar.api: bar.src
baz.api: baz.src

…or even:

bar.api: | bar.src
baz.api: | baz.src

…all produce an undesirable effect whereby, if baz.api changes, foo.done isn't rebuilt until a second invocation of Make.

Essentially, there seems to be no way to express that bar.api is a potentially non-updating by-product of building bar.done. Here are some relevent discussions in the GNU Make archives:

How to restrict parallelism in make? [PATCH] (Alexey Neyman, 2005-04-29)
Parallel jobs with order-only prerequisites (Started by me, 2014-10-12)
Re: a syntax for multitarget rules (My response, 2014-11-14)

Extensions to javac

Jardeps originally used Awk scripts to parse javap output in order to generate profiles and lists of classes involved in a tree. This was invoked as soon as javac completed. As javap output wasn't stable between versions, I replaced the Awk scripts with a stand-alone Java program to parse the bytecode.

Both techniques involved duplicating some of the work that the prior invocation of javac has already just done, so I built an alternative javac using the compiler API in javax.tools, which should be faster and more correct. This is just a regular Java program that invokes the normal compiler programmatically. Additionally, it hooks into the compiler to keep track of source files used, class files generated, packages provided, packages used, and the tree's profile. This information is then output using various new switches shown below.

List used source files

-list:sources:N cmd arg1 arg2 … argN

After compilaton, run the command cmd with the specified arguments, appending as additional arguments the internal names of top-level classes whose source files were used in the compilation, e.g., com/example/TopLevel. This should not include classes generated by annotation processing. The command is used to generate the make rule that triggers recompilation of a tree when any of its source files change.

List generated classes

-list:classes:N cmd arg1 arg2 … argN

After compilaton, run the command cmd with the specified arguments, appending as additional arguments the internal names of all classes generated by the compilation, e.g., com/example/TopLevel$Inner. This should include classes generated by annotation processing. The command is used to generate the list of classfiles that constitute a jar.

Generate profiles

-profile:public filename

-profile:default filename

Generate the canonicalized public/default profile, and write it unconditionally to filename.

List packages provided

-packages:provided filename

Write a list of packages of classes generated by the compilation to filename.

List packages required at runtime

-packages:used filename

Write a list of packages of classes needed at runtime to filename.