State of varargs

April 2019: (v. 0.1)

Maurizio Cimadamore

Panama supports variadic native calls by mapping them using the Java varargs feature. While this mapping is handy and natural (from a Java perspective), it is also problematic, especially in the context of upcalls. In this document we will propose an alternate, more raw, way to support variadic calls that overcomes the limitations of the current approach.

Variadic calls and Panama

When the Panama ABI implementation needs to generate a method handle targeting a variadic native function, it creates a trampoline method handle. The trampoline intercepts the dynamic arguments passed to the vararg parameter, creates a specialized method handle with the correct (inferred) layouts and carriers, and dispatches the native call by delegating to that specialized handle.

As mentioned above, this step relies on some inference; for instance, the ABI implementation makes two key assumptions: (i) that the number of the additional variadic arguments is always known; and (ii) that the layout of the variadic arguments can be inferred from their dynamic types.

Unfortunately, these assumptions are not generally valid. For instance, layouts can only be inferred from the arguments' dynamic types if the interop framework has a rich enough set of carrier types. In Panama it is always possible to go, e.g., from a Struct instance (assuming it is not null) to its underlying layout. But in the case of more basic carrier mappings, such as those described here, such inference is not generally possible.
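To make assumption (ii) more concrete, here is a minimal sketch of what inferring a layout from an argument's dynamic type might look like. The `Layout` enum and the `infer` method are hypothetical names introduced only for illustration; they are not part of Panama. The sketch shows that boxed primitives map naturally to a layout, while a null reference, or any carrier that does not identify its layout, leaves the inference with nothing to go on.

```java
import java.util.Optional;

final class LayoutInferenceSketch {
    // Hypothetical layout tags; a real ABI implementation would use richer layout objects.
    enum Layout { I32, I64, F64 }

    // Try to infer a layout from the dynamic type of one variadic argument.
    // Returns empty when the dynamic type carries no usable layout information.
    static Optional<Layout> infer(Object arg) {
        if (arg instanceof Integer) return Optional.of(Layout.I32);
        if (arg instanceof Long) return Optional.of(Layout.I64);
        if (arg instanceof Double) return Optional.of(Layout.F64);
        // null, or a carrier that does not identify its layout
        return Optional.empty();
    }
}
```

In this sketch, `infer(42)` yields `I32`, whereas `infer(null)` yields an empty result: the caller would have to supply the layout through some other channel.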

Even worse, in the case of an upcall (e.g. from native code to Java), there's little we can do to infer the size of the additional variadic arguments, or their respective layouts. After all, in the case of an upcall the arguments are coming from the guts of some C/C++ code, and are catapulted into Java-land. The Java side needs to know how many arguments there are and where to fetch them from (register, stack), but this information is just not available for variadic arguments passed by the C code, and there’s no inference process that can rescue us.

In other words, it seems that Panama variadic support is not primitive enough; while emulating C variadic calls using the Java varargs feature works to some extent, the approach completely breaks down at the upcall level. This means we need a more raw way to support variadic downcalls/upcalls, one which does not make any assumption about the availability of side-channels that can be used to infer variadic size/layouts.

C varargs: enter va_list

C supports variadic calls through a special type called va_list, whose specific implementation is ABI-dependent. Variadic argument lists in C are handled using the va_start (to initialize a list), va_arg (to fetch the next argument), and va_end (to close the list) macros, which are defined in the <stdarg.h> header. That is, the client is in charge of specifying the number as well as the layouts of the arguments to be pulled out of the list, as shown in this example:

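#include <stdarg.h>

double sum(int num, ...) {
    va_list arguments;
    va_start(arguments, num);               /* start scanning after the last named parameter */
    double total = 0.0;
    for (int x = 0; x < num; x++) {
        total += va_arg(arguments, double); /* the caller must have passed doubles */
    }
    va_end(arguments);
    return total;
}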

Internally, a va_list is a structure that allows the callee to recover the arguments passed to it. In ABIs that do not use registers to pass arguments, a va_list is typically implemented as a stack pointer; in more nuanced ABIs (such as SystemV x86-64), a struct is used to capture both the register state and the stack state, as follows:

typedef struct {
    unsigned int gp_offset;
    unsigned int fp_offset;
    void *overflow_arg_area;
    void *reg_save_area;
} va_list[1];

As arguments are fetched using va_arg, the contents of the va_list data structure are updated - for instance, gp_offset is moved to the offset of the next argument available in the general register area, and so forth.

Variadic lists can also be copied, using the va_copy macro (standardized in C99, so it may be missing from older toolchains); this takes a snapshot of the current va_list, therefore allowing a client to perform e.g. multiple scans of the same va_list, or even pass the va_list as an argument to another function.

Finally, it's important to note that the va_list type is completely unsafe; while on some platforms it might be possible to create pointers to it, store it in global variables, etc., in general this is not supported. As can be seen in the above definition, va_list contains pointers to the caller stack, meaning that it will generally not be safe to access a va_list once the original caller has gone.

Modeling va_list in Panama

Instead of using Java varargs to model variadic calls, we could have a Java Valist interface, as follows:

interface Valist {
   
   //getters
   
   int getInt();
   long getLong();
   ... //all primitives
   
   MemoryAddress get(Address address); 
   MemoryAddress get(Group group); 
   
   
   interface Builder {
       void setInt(int value);
       void setLong(long value);
       ... //all primitives
    
       void set(Address address, MemoryAddress value); 
       void set(Group group, MemoryAddress value);

       Valist build();
   }

   static Builder builder() { ... }
}

Then, at the low level, a variadic native function can be modelled as a Java method taking a trailing Valist argument, which the client can fill up, as required. For instance, a variadic call such as printf can be represented in Java with the following API point:

int printf(MemoryAddress format, Valist args);

The client can then create a Valist instance using the builder and pass it down to the native function. This also scales to upcalls: since the upcall logic can materialize a Valist instance using the so-called UpcallContext (which already contains pointers to registers, etc.), the Java upcall would be able to pull variadic arguments in just the same way native code can.
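The builder-based flow described above can be sketched with a purely in-memory stand-in. `MiniValist` is a hypothetical, simplified subset of the Valist interface (int and long only); it keeps the arguments in a queue rather than in native memory, so it only illustrates the calling pattern: the producer sets values in order, and the consumer must pull them back in the same order and with the same types.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified, hypothetical subset of the Valist interface (int and long only).
interface MiniValist {
    int getInt();
    long getLong();

    interface Builder {
        void setInt(int value);
        void setLong(long value);
        MiniValist build();
    }

    static Builder builder() {
        Deque<Object> pending = new ArrayDeque<>();
        return new Builder() {
            public void setInt(int value) { pending.addLast(value); }
            public void setLong(long value) { pending.addLast(value); }
            public MiniValist build() {
                // Snapshot the arguments so later builder calls do not affect this list.
                Deque<Object> args = new ArrayDeque<>(pending);
                return new MiniValist() {
                    // As with va_arg, the consumer names the type it expects next;
                    // a mismatch fails here with a ClassCastException.
                    public int getInt() { return (Integer) args.removeFirst(); }
                    public long getLong() { return (Long) args.removeFirst(); }
                };
            }
        };
    }
}
```

A client would call `setInt(42)` and `setLong(7L)` on the builder, call `build()`, and hand the result to the low-level API point; the receiving side would then call `getInt()` followed by `getLong()`.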

At the higher level, if desired, variadic API points can be civilized using method handle combinators; for instance we can easily turn the above API into:

int printf(Pointer format, Object... args);

This can be done by using a collector MethodHandle combinator, which maps between the Valist and an array of actual arguments.
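One way such a civilization step could be assembled with the standard java.lang.invoke combinators is sketched below. Everything other than the `MethodHandle`/`MethodHandles` calls is a stand-in chosen for illustration: `ArgList` plays the role of Valist, `lowLevel` plays the role of the low-level binding (it formats in Java instead of calling native printf), and the format is a plain `String` rather than a pointer.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class CivilizeSketch {
    // Hypothetical stand-in for Valist: an ordered list of boxed arguments.
    static final class ArgList {
        final Object[] values;
        ArgList(Object[] values) { this.values = values.clone(); }
    }

    // Stand-in for the low-level API point taking a trailing argument list.
    static int lowLevel(String format, ArgList args) {
        return String.format(format, args.values).length();
    }

    // Packs an array of actual arguments into the list type.
    static ArgList pack(Object[] args) { return new ArgList(args); }

    // Builds a handle of type (String, Object...) -> int on top of lowLevel.
    static MethodHandle civilized() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle low = lookup.findStatic(CivilizeSketch.class, "lowLevel",
                MethodType.methodType(int.class, String.class, ArgList.class));
        MethodHandle packer = lookup.findStatic(CivilizeSketch.class, "pack",
                MethodType.methodType(ArgList.class, Object[].class));
        // (String, ArgList) -> int becomes (String, Object[]) -> int
        MethodHandle arrayTaking = MethodHandles.filterArguments(low, 1, packer);
        // Trailing arguments are now collected into the Object[] at call time.
        return arrayTaking.asVarargsCollector(Object[].class);
    }

    // Example call: two trailing arguments are gathered and packed.
    static int demo() {
        try {
            return (int) civilized().invoke("%d-%d", 1, 2);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Because the higher-level handle is derived from the lower-level one, the raw Valist-based API point remains available to clients that cannot rely on inference.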

Valist vs. va_list

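The attentive reader might have noted that, in reality, there are multiple sources for a Java Valist instance: it can be constructed programmatically in Java using the builder, it can be materialized by the upcall logic from the UpcallContext, or it can wrap a native va_list received from native code.

In the following, we will refer to these sources using the notation Valistjava, Valistupcall and Valistnative, respectively. Let's now examine all possible interactions between the different kinds of Valist and the native va_list.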

Downcalls

There are two types of downcalls: the first is a plain downcall to a variadic function (e.g. snprintf), the second is a downcall to a non-variadic function taking a va_list (e.g. vsnprintf).

Now, let's first consider the simpler case, that of a downcall to a variadic function (e.g. snprintf). In this case, the client will have a Valist instance on its hands (whether constructed programmatically, or obtained otherwise) and it will pass it down to the ABI. The ABI will detect the trailing Valist, and will need to pass all the extra arguments in the Valist according to the calling convention. Herein lies the first problem: in general, a Valist doesn't know how many arguments there are; that is, it is not possible to spread a generic Valist instance - although this is possible in the case of a Valistjava instance. This seems to imply that, in order to perform a downcall targeting a variadic function a Valistjava must be supplied.
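The asymmetry described above can be illustrated with a small sketch. `ArgSource` and `JavaArgs` are hypothetical stand-ins (restricted to long arguments for brevity): the general view only lets a consumer pull one argument at a time, while the Java-built variant also knows its own contents and can therefore be spread.

```java
import java.util.Arrays;

final class SpreadSketch {
    // Hypothetical common view: arguments can only be pulled one at a time,
    // and nothing says how many are left.
    interface ArgSource { long nextLong(); }

    // Hypothetical Java-built list: its contents, and therefore its arity, are known.
    static final class JavaArgs implements ArgSource {
        private final long[] values;
        private int next;
        JavaArgs(long... values) { this.values = values.clone(); }
        public long nextLong() { return values[next++]; }
        long[] remaining() { return Arrays.copyOfRange(values, next, values.length); }
    }

    // Spreading is only possible when the number of arguments is known.
    static long[] spread(ArgSource source) {
        if (source instanceof JavaArgs) {
            return ((JavaArgs) source).remaining();
        }
        throw new UnsupportedOperationException("argument count unknown");
    }
}
```

In this sketch, any `ArgSource` that is not a `JavaArgs` is rejected, which mirrors the conclusion that a downcall to a variadic function needs a Valistjava.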

Let's now turn to the second case, a non-variadic function with a trailing va_list parameter (e.g. vsnprintf). In this case we need to turn the Valist instance into an actual native va_list. Here we have three subcases. In the case of a Valistnative, that's easy, as the instance is just a wrapper for a native va_list to begin with. Similarly, the case of a Valistupcall is also easy, given that the UpcallContext contains information about register state and such, so that it should be possible to create a native va_list from there. In the case of a Valistjava, things are harder; in this case we need to buffer the Valistjava arguments into some temporary area, and then create a native va_list which points to this area.
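The buffering step for a Valistjava might look roughly like the sketch below, which copies the arguments into a contiguous off-heap area using java.nio. The one-slot-per-argument layout with 8-byte slots is an assumption modelled on how integer and double arguments are laid out on the stack under SysV x86-64; other ABIs and other argument types would need different rules. Creating the native va_list that points at this area is ABI-specific and is not shown.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class BufferSketch {
    // Assumed slot size: one 8-byte slot per argument.
    static final int SLOT_BYTES = 8;

    // Copy the arguments of a Java-built list into a contiguous off-heap area.
    static ByteBuffer buffer(Object... args) {
        ByteBuffer area = ByteBuffer.allocateDirect(args.length * SLOT_BYTES)
                                    .order(ByteOrder.nativeOrder());
        for (Object arg : args) {
            if (arg instanceof Integer) {
                area.putLong((Integer) arg);   // widen to a full slot
            } else if (arg instanceof Long) {
                area.putLong((Long) arg);
            } else if (arg instanceof Double) {
                area.putDouble((Double) arg);
            } else {
                throw new IllegalArgumentException("unsupported argument: " + arg);
            }
        }
        area.flip();                           // make the written slots readable
        return area;
    }
}
```

The temporary area has to stay alive for the whole duration of the downcall, since the native va_list would only hold a pointer into it.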

The interactions between Valist and native va_list in the case of downcalls are summarized in the following table:

Downcall adaptations

Source \ Target    ... (variadic)    va_list
Valistjava         Spread            Buffer
Valistupcall       N/A               Adapt
Valistnative       N/A               Unbox

Upcalls

Again, we have two cases: the target of the upcall could be a Java method modelling a variadic function (e.g. snprintf) or a method modelling a non-variadic function taking a va_list (e.g. vsnprintf). In the first case, we can easily synthesize a Valistupcall instance using the UpcallContext information. In the second case we can, equally easily, create a Valistnative from the actual va_list being passed from the native code.

The interactions between Valist and native va_list in the case of upcalls are summarized in the following table:

Upcall adaptations

Source \ Target    Valist
... (variadic)     Valistupcall
va_list            Valistnative

Layout representation

In order to model downcalls and upcalls targeting a native signature featuring a va_list, we need to have a Layout representation for it. Since the layout of a native va_list is completely platform specific, it seems like a good solution would be to use the unresolved layout feature - e.g. represent va_list symbolically, with an unresolved layout of the kind ${va_list}, and let the ABI define the concrete layout (and carrier, if needed) associated with that symbol.

Conclusion

In this document we have shown how it's possible to model variadic calls (as well as calls accepting explicit va_list parameters) at a lower level than what is currently done in the Panama implementation. This more primitive approach allows us to model variadic calls in environments in which layout inference is not feasible, as well as to complete the variadic support for upcalls too. In cases where such inference is possible, higher-level APIs can recover some of the ease of use of Java varargs notation through a civilization step which spreads the Valist into a concrete sequence of arguments.