Content from tabular data in Mokvino Web


Mokvino Web is a collection of scripts for GNU Awk, GNU Make and Mokvino which can be used to build and maintain websites.

You can define a simple text-based table of data in src/tables/foo.tab, and a script in src/tables/foo.awk to generate content from it. First, you need to indicate that the table exists, and that potentially all documents depend on it:

TABLES += foo
TABLE_USERS_foo += $(DOCS)

Let's suppose that the table contains the following:

john|23|John Smith
fred|42|Fred Flintstone
wilma|19|Wilma Cargo

You can write a script to process this:

# In src/tables/foo.awk
BEGIN {
    FS = "|";
}

{
    skip_blanks();
    start_fields();
    id = next_field();
    age = next_field();
    name = next_field();

    outfile = open_quantum("info-" id ".m5");
    printf "define(`AGE/%s', `%d')\\\n", id, age | outfile;
    printf "define(`NAME/%s', ``%s'')\\\n", id, name | outfile;
    close(outfile);

    outfile = open_quantum("link-" id ".m5");
    printf "define(`ADDR/%s', `person/%s')\\\n", id, id | outfile;
    close(outfile);
}
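
To make the generated text concrete, here is a standalone sketch in plain AWK of the lines the script writes for one record. It does not use the Mokvino Web helpers: info_lines is a hypothetical function introduced only for illustration, and the sorting that open_quantum applies is not modelled.

```awk
# Hypothetical helper (not part of Mokvino Web): build the text that the
# script above writes to info-<id>.m5 for one record.
function info_lines(id, age, name,    text) {
    text = sprintf("define(`AGE/%s', `%d')\\\n", id, age)
    text = text sprintf("define(`NAME/%s', ``%s'')\\\n", id, name)
    return text
}

BEGIN {
    # Split one table row on "|", as FS = "|" would.
    n = split("john|23|John Smith", field, "|")
    if (n == 3)
        printf "%s", info_lines(field[1], field[2], field[3])
}
```

Each printed line is a complete define(...) call ending in a backslash, which is the form the later examples load with loadtable.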

From a Mokvino file, you can load any of the files named by open_quantum calls, e.g.:

loadtable(`foo/link-john')\

indir(`ADDR/john')

By using loadtable, Mokvino Web will remember that this file depends on the one generated by the GNU Awk script. If you change the source table such that the data in foo/link-john.m5 changes, the referencing file will be rebuilt. If you change only a different entry, the table will be reprocessed, but the referencing file won't be rebuilt. Use loadtablesys if you don't want to record the dependency.

Note that files created with open_quantum are sorted line by line, so make sure that no unit written to them is split over multiple lines.
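
If a field might contain embedded newlines, one way to keep each unit on a single line is to collapse them before printing. The following is a minimal sketch in plain AWK. one_line is a hypothetical helper, not something Mokvino Web provides.

```awk
# Hypothetical helper (not part of Mokvino Web): collapse a multi-line
# value into one line so that a line-based sort keeps the unit intact.
function one_line(text) {
    # Replace each newline, plus any blanks around it, with one space.
    gsub(/[ \t]*\n[ \t]*/, " ", text)
    return text
}

BEGIN {
    name = "John\n    Smith"
    printf "define(`NAME/%s', ``%s'')\\\n", "john", one_line(name)
}
```

The define call is then emitted as one physical line, so sorting cannot separate its parts.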

The argument to open_quantum can include forward slashes as directory separators, so content written to open_quantum("brown/bread.m5"), for example, can be loaded with loadtable(`foo/brown/bread').

Using close(outfile) isn't strictly necessary, but it is recommended to avoid the Awk script running out of file descriptors.

Separate table scripts and data

Table scripts are in src/tables by default, the same as the tables themselves. You might prefer to keep the scripts separate, e.g., in etc/tables:

TABLE_SCRIPTDIR=etc/tables

CSV

If you prefer your tabular data to be in CSV format, use:

BEGIN {
    csv();
}

You can specify an alternative separator:

BEGIN {
    csv("|");
}

You can specify an alternative quote character:

BEGIN {
    csv("", "'");
}

If you want to skip comments and blank lines, and process the remainder as CSV:

BEGIN {
    csv();
    CSV_Q = -1;
}

{
    skip_blanks();
    CSV_Q = 0;
    csv_record();
    CSV_Q = -1;
}

{
    ## Your normal processing...
}

Generating pages from tabular data

A table can specify sets of documents to make, instead of you having to list them separately in Makefile. Let's extend our example:

# In src/tables/foo.awk
BEGIN {
    FS = "|";
}

{
    skip_blanks();
    start_fields();
    id = next_field();
    age = next_field();
    name = next_field();

    outfile = open_make();
    printf "FOO_PEOPLE += %s\n", id | outfile;

    outfile = open_quantum("info-" id ".m5");
    printf "define(`AGE/%s', `%d')\\\n", id, age | outfile;
    printf "define(`NAME/%s', ``%s'')\\\n", id, name | outfile;
    close(outfile);

    outfile = open_quantum("link-" id ".m5");
    printf "define(`ADDR/%s', `person/%s')\\\n", id, id | outfile;
    close(outfile);
}

The open_make call and the following printf will create a file containing:

FOO_PEOPLE += fred
FOO_PEOPLE += john
FOO_PEOPLE += wilma

(Note that this file is sorted by default, so don't split any unit over multiple lines.)

Back in Makefile, that file is automatically included, so you can use these to identify pages for each person:

DOCS += $(FOO_PEOPLE:%=person/%)

When writing the page person/fred.m5, etc., you could define a specialization of your site pages:

define(`PERSON_PAGE', `loadtable(`foo/info-$person')\
indir(`PAGE'$!?@$?@, `title'=``Personal details for 'indir(`NAME/$person')')',

`body'=``

'p(`indir(`NAME/$person')` is 'indir(`AGE/$person')` years old.'')`

'')

…and then use it explicitly:

# In src/www/person/fred.m5

PERSON_PAGE(`langs'=``en'', `fmts'=``HTML'', `person'=`fred')\

But you could also get it to work out which person was being dealt with automatically:

define(`PERSON_PAGE', `PERSON_PAGE_1(`person'=regexp(VNAME, `^person/(.*)$', ``&1'')$!?@$?@)')

define(`PERSON_PAGE_1',
`loadtable(`foo/info-$person')\
indir(`PAGE'$!?@$?@, `title'=``Personal details for 'indir(`NAME/$person')')',

`body'=``

'p(`indir(`NAME/$person')` is 'indir(`AGE/$person')` years old.'')`

'')

Then you write:

# In src/www/person/fred.m5

PERSON_PAGE(`langs'=``en'', `fmts'=``HTML'')\

Now, if you have no special data to include, you could put that content in (say) src/www/person/template.m5, and generate it for as many entries as you define in the table:

$(FOO_PEOPLE:%=var/www/person/%.m5): src/www/person/template.m5
var/www/person/%.m5:
        $(CP) src/www/person/template.m5 '$@'

Autoloading tabular data

There is another macro, autoloadtable, whose first argument is a table-generated file, and whose remaining arguments are the names of macros defined by that file. If the file hasn't been loaded yet, this macro will define temporary macros that load the file. For example:

autoloadtable(`foo/link-john', `ADDR/john')\

The script that generates the files could also generate such expressions:

# In src/tables/foo.awk
BEGIN {
    FS = "|";
}

{
    skip_blanks();
    start_fields();
    id = next_field();
    age = next_field();
    name = next_field();

    outfile = open_quantum("info-" id ".m5");
    printf "define(`AGE/%s', `%d')\\\n", id, age | outfile;
    printf "define(`NAME/%s', ``%s'')\\\n", id, name | outfile;
    close(outfile);

    outfile = open_quantum("link-" id ".m5");
    printf "define(`ADDR/%s', `person/%s')\\\n", id, id | outfile;
    close(outfile);

    outfile = open_quantum("autoload.m5");
    printf "autoloadtable(`foo/link-%s', `ADDR/%s')\\\n",
      id, id | outfile;
    printf "autoloadtable(`foo/info-%s', `AGE/%s', `NAME/%s')\\\n",
      id, id, id | outfile;
}

foo/autoload.m5 now contains:

autoloadtable(`foo/info-fred', `AGE/fred', `NAME/fred')\
autoloadtable(`foo/info-john', `AGE/john', `NAME/john')\
autoloadtable(`foo/info-wilma', `AGE/wilma', `NAME/wilma')\
autoloadtable(`foo/link-fred', `ADDR/fred')\
autoloadtable(`foo/link-john', `ADDR/john')\
autoloadtable(`foo/link-wilma', `ADDR/wilma')\

If you now loadsys(`foo/autoload.m5') from a module, you can then write indir(`ADDR/john') without having to load the file explicitly.

If you normally access these macros less directly, e.g., person(`fred'), you probably don't need to use autoloadtable, as you'll express autoload(`person', `person'), and define person in etc/person.m5, and get it to load the appropriate files. autoloadtable is more likely to be useful if foo.awk defines macros called fred, john and wilma. However, even then, it might be simpler just to define wilma as `person(`wilma'$!?@$?@)', and avoid auto-loading table files.

AWK libraries for table scripts

If you need to use an AWK function from several table scripts, put it in etc/lib.awk, say, and indicate which tables need it:

TABLES += foo
TABLE_LIBS_foo += lib

TABLES += bar
TABLE_LIBS_bar += lib

The file will automatically be loaded when those tables' scripts are executed. Changes to the file will trigger reprocessing of those tables.

Inter-table dependencies

You might want to use data from one table in the script of another. Let's export our people's names and ages:

# In src/tables/foo.awk
BEGIN {
    FS = "|";
}

{
    skip_blanks();
    start_fields();
    id = next_field();
    age = next_field();
    name = next_field();

    PERSON[id] = name;
    AGE[id] = age;

    outfile = open_quantum("info-" id ".m5");
    printf "define(`AGE/%s', `%d')\\\n", id, age | outfile;
    printf "define(`NAME/%s', ``%s'')\\\n", id, name | outfile;
    close(outfile);

    outfile = open_quantum("link-" id ".m5");
    printf "define(`ADDR/%s', `person/%s')\\\n", id, id | outfile;
    close(outfile);
}

END {
    export_array("PERSON_NAME", PERSON, "SS");
    export_array("PERSON_AGE", AGE, "NS");
}

This will create an AWK file var/m5web/tables/foo.awk with the following contents:

{
PERSON_AGE["fred"] = 42;
PERSON_AGE["john"] = 23;
PERSON_AGE["wilma"] = 19;
PERSON_NAME["fred"] = "Fred Flintstone";
PERSON_NAME["john"] = "John Smith";
PERSON_NAME["wilma"] = "Wilma Cargo";
}

If you have another table, called other, you can make this data available to it like this:

TABLES += other
TABLE_USERS_other += $(DOCS)
TABLE_DEPS_other += foo

The third argument to export_array specifies how the data are to be formatted. The first character controls the value: use S for a string, N for a number, or * for no value, like this:

## format for "*S"
PERSON_AGE["fred"];

The remaining characters format each part of a potentially multi-dimensional key. Use N for a number, and S for a string. For example:

COMPLEX[10, 15, "special"] = "indeed";
...
export_array("COMPLEX", COMPLEX, "SNNS");
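
The way a format string maps onto one exported entry can be sketched in plain AWK as follows. This is only an illustration of the rules just described, not the export_array implementation. format_part and format_entry are hypothetical names, and the sketch assumes that strings contain no double quotes or backslashes.

```awk
# Hypothetical helper: render one key part or value as AWK source text.
# Assumes strings contain no double quotes or backslashes.
function format_part(kind, text) {
    if (kind == "N")
        return (text + 0) ""
    return "\"" text "\""
}

# Hypothetical helper: render one array entry according to a format such
# as "NS" (numeric value, one string key) or "*S" (no value).
function format_entry(name, key, value, fmt,    parts, n, i, out) {
    n = split(key, parts, SUBSEP)
    out = name "["
    for (i = 1; i <= n; i++) {
        if (i > 1)
            out = out ", "
        # Character i + 1 of the format describes key part i.
        out = out format_part(substr(fmt, i + 1, 1), parts[i])
    }
    out = out "]"
    # The first character of the format describes the value.
    if (substr(fmt, 1, 1) != "*")
        out = out " = " format_part(substr(fmt, 1, 1), value)
    return out ";"
}

BEGIN {
    print format_entry("PERSON_AGE", "fred", 42, "NS")
    print format_entry("PERSON_AGE", "fred", 42, "*S")
    print format_entry("COMPLEX", 10 SUBSEP 15 SUBSEP "special", "indeed", "SNNS")
}
```

The multi-dimensional key is split on SUBSEP, which is how AWK itself joins the parts of a subscript such as COMPLEX[10, 15, "special"].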

You can fragment the exports into several files (all with the extension .awk):

outfile = open_quantum("names.awk");
export_array_to(outfile, "PERSON_NAME", PERSON, "SS");
close(outfile);
outfile = open_quantum("ages.awk");
export_array_to(outfile, "PERSON_AGE", AGE, "NS");
close(outfile);

This allows different tables to use different sets of exports:

TABLES += other1
TABLE_USERS_other1 += $(DOCS)
TABLE_DEPS_other1 += foo/names

TABLES += other2
TABLE_USERS_other2 += $(DOCS)
TABLE_DEPS_other2 += foo/ages

You'll find these data in the files var/m5web/tables/foo/names.awk and var/m5web/tables/foo/ages.awk.

Note that export_array(...) is really just export_array_to(open_export(), ...). open_export() merely opens var/m5web/tables/foo.awk.

Progress bars

Some table processing can be time-consuming. To reassure yourself that progress is being made, you can set up a progress bar:

progress_init(length(SOMETAB), "Doing stuff")
for (key in SOMETAB) {
   ## Do something time-consuming.

   ## Record progress.
   progress_adv(1)
}
progress_term()

This will print a gradually advancing ASCII-art progress bar prefixed with the message:

Doing stuff ================/

Future versions might include an ETA.
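
For readers curious how such a bar can be drawn, here is a minimal sketch in plain AWK that computes the filled part of a fixed-width bar. It is not the Mokvino Web implementation. bar_text and the width of 16 are assumptions made for illustration.

```awk
# Hypothetical helper (not the Mokvino Web implementation): render the
# filled part of a bar of the given width for done out of total steps.
function bar_text(done, total, width,    n, i, text) {
    n = (total > 0) ? int(done * width / total) : width
    if (n > width)
        n = width
    text = ""
    for (i = 0; i < n; i++)
        text = text "="
    return text
}

BEGIN {
    total = 4
    for (step = 1; step <= total; step++)
        printf "Doing stuff %s\n", bar_text(step, total, 16)
}
```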

Assertions

It's easy for a mistake in an AWK program to go unnoticed, and lead to subsequent errors that are difficult to trace. Detect them early using assert:

assert(key in VALID_KEYS, "invalid key " key)
COUNT[key] = length(x) * n

If VALID_KEYS does not contain an entry indexed by the value of key, the message is printed to /dev/stderr, and the processing fails.
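
The behaviour described above can be approximated in plain AWK as follows. This is a sketch, not the Mokvino Web definition of assert. check is a hypothetical stand-in, and the exit status of 1 is an assumption.

```awk
# Hypothetical stand-in (not the Mokvino Web implementation of assert):
# report a failed condition on standard error and stop with an error status.
function check(cond, msg) {
    if (!cond) {
        print "assertion failed: " msg > "/dev/stderr"
        exit 1
    }
}

BEGIN {
    VALID_KEYS["fred"]      # referencing an element creates the entry
    key = "fred"
    check((key in VALID_KEYS), "invalid key " key)
    print "key accepted: " key
}
```

Stopping at the first failed check keeps the error message close to its cause, instead of letting a bad key propagate into later computations.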

Dry runs

If you've made some disruptive changes to a table, and you want to see what it would change, add this:

BEGIN {
    dry_run()
}

You'll still get a list of all new, changed and deleted files, but no effective files will be altered. However, the contents of the new and changed files will be found in var/m5web/table/table/quantum.cand, where table and quantum stand for the table name and the quantum file name.