State of Panama Pointers

February 2019: (v. 0.2)

Maurizio Cimadamore

Panama support for foreign function interfaces relies heavily upon the notion of pointers. Pointers are used primarily for two reasons: (i) to serve as a target carrier type for a native pointer type (e.g. int*), and (ii) together with the Scope abstraction, as a way to reference regions of memory (which might be located off-heap) in an efficient and safe way (for more details, please refer to this document).

In this document we will focus on the first use case - that is, the problem of mapping native pointers into suitable Java carrier types; we will explain the design center of the Panama pointer API and see how it fits in the C language specification.

Quick overview of the C type system

In the C programming language, types are partitioned into object types (types that describe objects, such as primitives or structs) and function types (types that describe functions). Object types can be complete or incomplete depending on whether or not there is enough information to determine their size. The most notable example of an incomplete object type is the type void.

It follows that there are two different kinds of pointers: object pointers and function pointers. The former category covers pointers to primitives (e.g. int*), structs (e.g. struct Foo*), and pointers (e.g. int**). In the latter bucket we find pointers to functions (e.g. void(*)(int, long)).

The language allows programmers to convert from one pointer type to another, provided that the type kind is the same. That is, programmers can convert between object pointer types, and between function pointer types, but not, for example, from an object pointer type to a function pointer type. As is often the case in C, the semantics of such conversions might be unspecified if the source and target types have incompatible characteristics - e.g. different alignment in the case of object types, or different signatures in the case of function types.

The pointer to incomplete object type void* is often used as an escape hatch to allow conversion between any two given pointer types; this usage is not part of the language standard, although in practice it is allowed by most compilers. What the language does say is that an object pointer type of any kind can be converted to and from void*.

In C, pointers often overlap with arrays; the language says that an expression that has type e.g. int[] is implicitly turned (modulo a few exceptions, such as sizeof) into an expression of type int*, where the pointer points to the initial element of the array. This effectively allows programmers to pass arrays where pointers are expected. Indexed access notation [] also applies to both pointers and arrays, further blurring the distinction between the two.

That said, the semantics associated with array and pointer types are quite different; on the one hand, an array type describes "a contiguously allocated nonempty set of objects with a particular member object type", whereas a pointer describes "an object whose value provides a reference to an entity of the referenced type". This clear semantic distinction will come in useful in later sections.

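So let's summarize the main points: C has two kinds of pointers, object pointers and function pointers; conversions are allowed within each kind but not across kinds (void* serves as a universal object pointer and, on most compilers, as a wider escape hatch); and arrays, although they frequently decay into pointers, are semantically distinct from them.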

Object pointers

In Panama, object pointers are modeled by the Pointer interface, a snippet of which is shown below:

interface Pointer<X> {
    long addr();
    LayoutType<X> type();
    Pointer<X> offset(long nElements);
    <Y> Pointer<Y> cast(LayoutType<Y> type);
    X get();
    void set(X x);
}

The API is quite self-explanatory, and closely maps the operations one would expect from a C object pointer. A pointer has an address and a type (an instance of LayoutType). The type information is used in three ways: first, it allows conversion operations (see Pointer::cast); second, it allows the pointer to know the size of the pointed object, so that Pointer::offset can be implemented accordingly; lastly, it provides a pair of getter/setter method handles which are used by the dereference operations Pointer::get and Pointer::set.
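To make the three roles of the type information concrete, here is a minimal, self-contained sketch. It is not the Panama implementation: PointerSketch, Layout, Ptr, INT and BYTE are hypothetical names, a ByteBuffer stands in for native memory, and the "address" is simply a byte offset into that buffer.

```java
import java.nio.ByteBuffer;

final class PointerSketch {

    /** Hypothetical stand-in for LayoutType: element size plus a getter/setter pair. */
    interface Layout<X> {
        long size();
        X read(ByteBuffer memory, int byteOffset);
        void write(ByteBuffer memory, int byteOffset, X value);
    }

    static final Layout<Integer> INT = new Layout<Integer>() {
        public long size() { return Integer.BYTES; }
        public Integer read(ByteBuffer memory, int byteOffset) { return memory.getInt(byteOffset); }
        public void write(ByteBuffer memory, int byteOffset, Integer value) { memory.putInt(byteOffset, value); }
    };

    static final Layout<Byte> BYTE = new Layout<Byte>() {
        public long size() { return Byte.BYTES; }
        public Byte read(ByteBuffer memory, int byteOffset) { return memory.get(byteOffset); }
        public void write(ByteBuffer memory, int byteOffset, Byte value) { memory.put(byteOffset, value); }
    };

    /** Simplified pointer: an address (here, a byte offset into a buffer) plus a layout. */
    static final class Ptr<X> {
        private final ByteBuffer memory;
        private final long addr;
        private final Layout<X> type;

        Ptr(ByteBuffer memory, long addr, Layout<X> type) {
            this.memory = memory;
            this.addr = addr;
            this.type = type;
        }

        long addr() { return addr; }
        Layout<X> type() { return type; }

        // Pointer arithmetic scales by the size of the pointee, as in C.
        Ptr<X> offset(long nElements) {
            return new Ptr<>(memory, addr + nElements * type.size(), type);
        }

        // A cast keeps the address and swaps the layout.
        <Y> Ptr<Y> cast(Layout<Y> newType) {
            return new Ptr<>(memory, addr, newType);
        }

        // Dereference operations delegate to the layout's getter/setter.
        X get() { return type.read(memory, Math.toIntExact(addr)); }
        void set(X value) { type.write(memory, Math.toIntExact(addr), value); }
    }

    public static void main(String[] args) {
        ByteBuffer memory = ByteBuffer.allocate(16);       // big-endian by default
        Ptr<Integer> p = new Ptr<>(memory, 0, INT);
        p.offset(2).set(0x01020304);                       // like p[2] = ... in C
        System.out.println(p.offset(2).addr());            // 8
        System.out.println(p.offset(2).cast(BYTE).get());  // 1 (most significant byte comes first)
    }
}
```

In this sketch the layout alone decides how far offset moves and what get and set do, which is why a cast can be expressed as nothing more than a change of layout.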

Void pointers

Void pointers in Panama are just a degenerate case of object pointers; there is a special LayoutType for modeling the incomplete void object type (see NativeTypes.ofVoid) - a Panama void pointer is just a pointer whose type is the void LayoutType. This degenerate void pointer type has an important property: its getter/setter method handle pairs always throw an exception - meaning that it is impossible to dereference a Panama pointer whose type is the void LayoutType.
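The idea of a getter that always throws can be sketched with the standard java.lang.invoke API. The class VoidLayoutSketch and its members are hypothetical and only illustrate the behavior described above; the real void LayoutType may well be implemented differently.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class VoidLayoutSketch {

    // Getter handle of a hypothetical void layout, with type (long)Object.
    static final MethodHandle VOID_GETTER;
    static {
        try {
            VOID_GETTER = MethodHandles.lookup().findStatic(
                    VoidLayoutSketch.class, "refuse",
                    MethodType.methodType(Object.class, long.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Target of the handle: void has no size and no value, so always throw.
    static Object refuse(long addr) {
        throw new UnsupportedOperationException(
                "cannot dereference void pointer at 0x" + Long.toHexString(addr));
    }

    // Attempts a dereference through the handle and reports what happened.
    static String tryDereference(long addr) {
        try {
            Object value = (Object) VOID_GETTER.invokeExact(addr);
            return "read " + value;
        } catch (UnsupportedOperationException e) {
            return "rejected: " + e.getMessage();
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(tryDereference(0x10L));
    }
}
```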

Bulk operations

If a pointer points to a primitive value (e.g. an int), dereference operations such as Pointer::get and Pointer::set are effectively O(1). But if a pointer points to an aggregate, such operations (especially Pointer::set) might result in bulk copying.

An alternative would be for Pointer::set to never allow bulk copying, and to rely instead on explicit API points (which already exist, such as Struct::assign), although this would increase the irregularity of the extracted C API.

We believe that, as long as there's a clear difference in the carrier type associated with a given Pointer instance (e.g. Pointer<Integer> vs. Pointer<Time>), there's enough information to infer the performance model of the dereference operations associated with such pointer.
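The difference in cost can be sketched as follows. This is only an illustration: BulkSetSketch, setInt and setTime are hypothetical helpers, a ByteBuffer stands in for native memory, and the three-int layout assumed for Time is invented for the example.

```java
import java.nio.ByteBuffer;

final class BulkSetSketch {

    // Assumed size of a hypothetical C struct "Time" holding three ints.
    static final int TIME_SIZE = 3 * Integer.BYTES;

    // What set amounts to for a Pointer<Integer>: a single 4-byte store.
    static void setInt(ByteBuffer memory, int addr, int value) {
        memory.putInt(addr, value);
    }

    // What set amounts to for a Pointer<Time>: a copy of the whole struct,
    // whose cost grows with the size of the aggregate.
    static void setTime(ByteBuffer memory, int addr, ByteBuffer source, int sourceAddr) {
        for (int i = 0; i < TIME_SIZE; i++) {
            memory.put(addr + i, source.get(sourceAddr + i));
        }
    }

    public static void main(String[] args) {
        ByteBuffer source = ByteBuffer.allocate(TIME_SIZE);
        source.putInt(0, 10).putInt(4, 20).putInt(8, 30);
        ByteBuffer memory = ByteBuffer.allocate(2 * TIME_SIZE);
        setTime(memory, TIME_SIZE, source, 0);
        System.out.println(memory.getInt(TIME_SIZE + 8)); // 30
    }
}
```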

Specialized accessors

The Panama pointer API is a homogeneous API - that is, it models all pointer instances using the same interface. While this is great for uniformity, it also poses challenges, because of the lack of specialized generics. That means that the API might introduce unnecessary boxing conversions (this problem will eventually be tackled in full with Valhalla specialized generics).

For this reason, Panama pointers provide low level access to getter/setter method handle pairs which feature exact, non-boxed types, and which can be used to dereference a pointer in a box-free manner. It might perhaps be useful to consider adding specialized dereference methods to the Pointer interface, as follows:

interface Pointer<X> {
    ...
    X get();
    void set(X x);

    int getAsInt();
    void setAsInt(int value);
    ...
}

Here, the specialized accessors perform an invokeExact on the underlying getter/setter method handle. These methods are trivial to add, and they could be marked as deprecated once specialized generics are supported.
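A box-free accessor of this kind might look like the sketch below. It uses only the standard java.lang.invoke API; the class SpecializedAccessSketch is hypothetical, and method handles on ByteBuffer::getInt and ByteBuffer::putInt stand in for the getter/setter handles that a Panama LayoutType would supply.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.nio.ByteBuffer;

final class SpecializedAccessSketch {

    // Exact, non-boxed handles: (ByteBuffer,int)int and (ByteBuffer,int,int)ByteBuffer.
    private static final MethodHandle GET_INT;
    private static final MethodHandle PUT_INT;
    static {
        try {
            MethodHandles.Lookup lookup = MethodHandles.publicLookup();
            GET_INT = lookup.findVirtual(ByteBuffer.class, "getInt",
                    MethodType.methodType(int.class, int.class));
            PUT_INT = lookup.findVirtual(ByteBuffer.class, "putInt",
                    MethodType.methodType(ByteBuffer.class, int.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private final ByteBuffer memory;
    private final int byteOffset;

    SpecializedAccessSketch(ByteBuffer memory, int byteOffset) {
        this.memory = memory;
        this.byteOffset = byteOffset;
    }

    // Specialized accessor: the call site type matches the handle exactly, so no boxing occurs.
    int getAsInt() {
        try {
            return (int) GET_INT.invokeExact(memory, byteOffset);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    void setAsInt(int value) {
        try {
            ByteBuffer ignored = (ByteBuffer) PUT_INT.invokeExact(memory, byteOffset, value);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // Generic-style accessors: same memory access, but the value is boxed/unboxed.
    Integer get() { return getAsInt(); }
    void set(Integer value) { setAsInt(value); }

    public static void main(String[] args) {
        SpecializedAccessSketch p = new SpecializedAccessSketch(ByteBuffer.allocate(8), 4);
        p.setAsInt(42);
        System.out.println(p.getAsInt()); // 42
    }
}
```

Note that invokeExact requires the static types at the call site, including the cast on the return value, to match the handle's type exactly; this strictness is what guarantees that no conversion is inserted.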

Indexed access

In C, pointers can, like arrays, make use of the indexed access notation []. Panama does not provide this functionality - which can be emulated by a combination of Pointer::offset and Pointer::get/set. However, if usability becomes a concern, indexed access can easily be added to the above API.

Null pointer

Panama models null pointers as a degenerate case of an object pointer, similarly to the void* case. Such a pointer can be obtained through some static factory method. Again, the LayoutType associated with a null pointer will prevent all dereference operations by throwing a suitable exception.

Function pointers

Panama does not model function pointers directly; in Panama, function pointers are second-class abstractions, called callbacks. A snippet of the Callback interface is shown below:

interface Callback<T> {
    Pointer<?> entryPoint();
    T asFunction();
}

A Panama callback is essentially a pointer (which is used to encode the callback's entry point) coupled with a functional interface carrier type. The latter can be used to obtain an instance of a functional interface from the callback object, so that clients can perform an invocation. Given that Java lacks first class function types, using functional interfaces as carrier types here is the next best abstraction, and it provides interoperability with lambda expressions and method references, which can both be used to express the body of a callback (see Scope::allocateCallback).
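The pairing of an entry point with a functional interface can be sketched as follows. Everything here is a hypothetical model: CallbackSketch and allocate are invented names, entry points are made-up numbers rather than real native addresses (and a plain long rather than a Pointer), and a map stands in for the native stubs that the real runtime would generate.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntBinaryOperator;

final class CallbackSketch<T> {

    // Stand-in for native stubs: maps made-up entry-point addresses to Java bodies.
    private static final Map<Long, Object> STUBS = new HashMap<>();
    private static long nextAddress = 0x1000L;

    private final long entryPoint;
    private final Class<T> carrier;

    private CallbackSketch(long entryPoint, Class<T> carrier) {
        this.entryPoint = entryPoint;
        this.carrier = carrier;
    }

    // Loosely plays the role of Scope::allocateCallback: registers a body and hands back a callback.
    static <T> CallbackSketch<T> allocate(Class<T> carrier, T body) {
        long address = nextAddress;
        nextAddress += 16;
        STUBS.put(address, body);
        return new CallbackSketch<>(address, carrier);
    }

    long entryPoint() { return entryPoint; }

    // Converts the callback back into an instance of its functional interface.
    T asFunction() { return carrier.cast(STUBS.get(entryPoint)); }

    public static void main(String[] args) {
        CallbackSketch<IntBinaryOperator> add = allocate(IntBinaryOperator.class, (a, b) -> a + b);
        System.out.println(add.asFunction().applyAsInt(2, 3)); // 5
    }
}
```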

Relationship between object pointers and function pointers

In Panama there's no subtyping relationship between object pointers and function pointers, unlike what we might find in other Java APIs providing interoperability with native code. Instead, function pointers are expressed in terms of object pointers - more specifically, an object pointer is used to encode the entry point of the function. In a sense, Panama callbacks are views over pointers, exposing additional capabilities (such as functional interface conversion).

This is a pragmatic choice: on the one hand, managing complex class hierarchies is harder to do with the binder pattern. That is, the binder works well with the leaves of a given class hierarchy, but it doesn't know what to do if the carrier type associated e.g. with a function return type is an abstract class.

On the other hand, this choice doesn't seem to give up much in terms of fidelity with the C language. Since object pointers and function pointers are distinct entities in C (the special void* acts as a root only within the boundaries of certain compiler implementations), there is no real use case for, say, passing either a function pointer or an object pointer to the same API point.

Similarly, since the C language only really supports conversion between object pointers (conversion between function pointers has unspecified semantics unless the signatures match), again the choice of providing a conversion operation on Pointer but not on Callback seems a sound one. And, if this became an issue later on, it is indeed easy to extend the API to allow conversion between different callbacks or, even, between object pointers and callbacks.

Null pointers and callbacks

Since, in Panama, null pointers are a special case of object pointers, and Panama Callback is not an object pointer, it follows that it is not possible to pass a null pointer where a callback is expected. Instead, the Callback API will provide some static factory to create a null callback - that is, a callback object whose entry point is the null pointer itself. Such a degenerate callback will throw an exception when attempting to call its Callback::asFunction method.

Arrays

Panama models native arrays with the Array interface, shown below:

interface Array<X> {
    Pointer<X> elementPointer();
    long length();
    LayoutType<X> elementType();
    <Y> Array<Y> cast(LayoutType<Y> elementType);
    <Y> Array<Y> cast(LayoutType<Y> elementType, long size);
    X get(long index);
    void set(long index, X x);
}

The Array interface is, in spirit, very similar to the Pointer interface shown above; this is to be expected, given that in the C language arrays and pointers provide a similar set of operations. Most notably, Panama arrays are built on top of pointers - a Panama array is a pointer (which represents the address of the first element of the array) coupled with size information. The API also provides operations to convert an existing array into an array with a new element type and/or size, as well as indexed access operations which allow clients to get/set specific array elements.
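The "pointer plus size" structure can be sketched for the simple case of an int array. ArrayViewSketch is a hypothetical class, not the Panama Array implementation: a ByteBuffer stands in for native memory, the base offset plays the role of the element pointer, and only a size-changing cast is modeled.

```java
import java.nio.ByteBuffer;

final class ArrayViewSketch {

    private final ByteBuffer memory;
    private final int baseOffset;  // plays the role of elementPointer()
    private final long length;     // the size information that a bare pointer lacks

    ArrayViewSketch(ByteBuffer memory, int baseOffset, long length) {
        this.memory = memory;
        this.baseOffset = baseOffset;
        this.length = length;
    }

    long elementPointer() { return baseOffset; }
    long length() { return length; }

    // Indexed access is bounds-checked, which is only possible because the length is known.
    int get(long index) { return memory.getInt(offsetOf(index)); }
    void set(long index, int value) { memory.putInt(offsetOf(index), value); }

    // Re-views the same memory with a different size, without copying any element.
    ArrayViewSketch cast(long newLength) {
        return new ArrayViewSketch(memory, baseOffset, newLength);
    }

    private int offsetOf(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
        return Math.toIntExact(baseOffset + index * Integer.BYTES);
    }

    public static void main(String[] args) {
        ArrayViewSketch array = new ArrayViewSketch(ByteBuffer.allocate(32), 8, 4);
        array.set(3, 7);
        System.out.println(array.get(3)); // 7
    }
}
```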

Relationship between object pointers and arrays

In Panama, arrays and object pointers are separate entities; while arrays are modeled in terms of pointers, they are not pointers themselves. While this might be surprising from a use-site point of view (after all, the provided API is very similar), there are strong reasons which pushed the design in this direction.

From a semantics perspective, arrays are not pointers. As explained in the section above, arrays denote a contiguous region of memory filled with a number of elements of the same type; on the other hand, a pointer denotes an indirection, a value (typically an address) which provides a reference to some other object.

This distinction is neither artificial nor abstract - let's assume that Panama modeled both pointers and arrays with a single interface Pointer. While this is technically possible, we would soon run into challenges: what should Pointer::get and Pointer::set do? Ultimately, the semantics of these operations depend on whether the Pointer instance models a native pointer or an array. While Pointer::get can be tweaked to accommodate this distinction, Pointer::set is much more problematic - as in one case set will be an atomic operation, whereas in the other case it will correspond to a bulk-copy operation!

Ultimately, these differences in behavior are so important that we felt it necessary to use different carrier types to capture the different behaviors: by using different carriers to model the two abstractions we therefore make a deliberate choice of being explicit with respect to the performance model underpinning the operations exposed by the Panama API.

Concluding, Panama arrays are, again, views on pointers - they are built on top of a pointer (which provides the address of the first array element), which is decorated with some size information, so that indexed access/iteration can be implemented safely.

What about Java arrays?

One might wonder why, to model arrays, it has been necessary to add a brand new carrier type, when in fact Java already has the notion of array. While this is technically possible, it is important to note that this approach would suffer from severe performance issues: when obtaining an array from a native API, the binder would in fact need to eagerly convert all the native array elements into the corresponding Java counterparts. This cost seems rather prohibitive, especially in cases where the client is not interested in reading any array element - consider the case where a native array is obtained from some native function call and then passed on to another function.
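The cost difference between the two strategies can be illustrated with a small experiment. ArrayConversionSketch is a hypothetical class: a ByteBuffer stands in for a native array of ints, and a counter records how many native elements are actually read.

```java
import java.nio.ByteBuffer;
import java.util.function.IntUnaryOperator;

final class ArrayConversionSketch {

    // Counts element reads so the two strategies can be compared.
    static int reads = 0;

    static int readElement(ByteBuffer nativeMemory, int index) {
        reads++;
        return nativeMemory.getInt(index * Integer.BYTES);
    }

    // Eager strategy: convert every native element into a Java int[] up front.
    static int[] toJavaArray(ByteBuffer nativeMemory, int length) {
        int[] copy = new int[length];
        for (int i = 0; i < length; i++) {
            copy[i] = readElement(nativeMemory, i);
        }
        return copy;
    }

    // Lazy strategy: return a view that reads an element only when asked for it.
    static IntUnaryOperator lazyView(ByteBuffer nativeMemory, int length) {
        return index -> {
            if (index < 0 || index >= length) {
                throw new IndexOutOfBoundsException("index " + index + ", length " + length);
            }
            return readElement(nativeMemory, index);
        };
    }

    public static void main(String[] args) {
        ByteBuffer nativeMemory = ByteBuffer.allocate(1000 * Integer.BYTES);
        toJavaArray(nativeMemory, 1000);
        System.out.println(reads);  // 1000: every element was converted
        reads = 0;
        lazyView(nativeMemory, 1000).applyAsInt(3);
        System.out.println(reads);  // 1: only the requested element was read
    }
}
```

A client that merely forwards the array to another native function would pay for all the reads with the eager strategy and for none with the lazy view.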

It is possible that some language features we have in the pipeline, most notably Valhalla, might offer some relief here by generalizing Java arrays using interfaces, which would then allow Panama to provide lazy views over native arrays.

Conclusions

It should be clear by now that the Pointer interface plays a pivotal role in the Panama pointer API. Not only does it act as a suitable carrier type for native pointers, but it also conveniently allows other abstractions such as arrays and callbacks (and also structs, not discussed here) to be built on top of it. This seems like a stable point in the design space: there's one central abstraction - namely Pointer - and many ancillary abstractions which act as pointer views and provide additional capabilities. Instead of relying on a model based on inheritance (e.g. making either Array or Callback a subtype of Pointer), we opted for providing ways to convert one view into another. This way, all conversions are explicit in the code, which increases readability and keeps a lid on the binder complexity.