State of varargs

To unsafe or not to unsafe?

May 2020: (v. 0.1)

Maurizio Cimadamore

Project Panama has an ambitious goal: instead of having developers jump through hoops to access the native functionality they need, it makes that functionality directly available in the form of powerful (albeit low-level) Java APIs. More specifically, the Panama API allows Java programs to manipulate off-heap memory regions through an abstraction called MemorySegment (introduced as part of JEP 370); moreover, Panama allows developers to create native method handles (that is, method handles which target native library functions) through an abstraction called ForeignLinker.

Adding such capabilities comes at a price: it is now possible for developers to introduce bugs that can crash the JVM or, worse, silently corrupt its state (meaning such corruption will probably manifest itself only later in program execution). This is admittedly problematic behavior for the JDK, which usually tries (very hard) to encapsulate any underlying unsafety so that it is not exposed to end users. What might be less well known, as we will show in this document, is that the ability to break away from the usual safety belts is a use case that has always been silently acknowledged, and catered for, via JNI. In this document we describe the nature of the unsafe operations that idiomatic Panama code will attempt to perform, draw parallels to what is already available in the platform, and sketch a path towards a better (and safer) platform where all unsafe operations (whether originating from Panama or not) are treated in a consistent fashion that minimizes surprises for the unsuspecting end user.

What can go wrong

There are fundamentally two ways in which users of the Panama APIs can break the safety model of the JVM: (i) by creating an unchecked memory segment, which is based off an address obtained through some other means and (ii) by obtaining a native method handle --- that is, a method handle that targets some underlying native library function. In the subsequent sections we will discuss both cases, and draw parallels with similar unsafe operations that are offered by the JDK.

Memory segments

A MemorySegment is an abstraction which captures several orthogonal aspects of an off-heap memory block. First, a segment captures the size of the associated memory block, so as to prevent out-of-bounds access. Secondly, a segment captures the lifetime associated with the memory block: a MemorySegment implements the AutoCloseable interface, and can therefore be explicitly closed by an application that wishes to dispose of the memory block associated with that segment. Finally, memory segments support thread ownership: a segment is created by a given thread (the owner thread), and only that thread is allowed to access the memory segment's contents, or to close the memory segment altogether. Together, these guarantees are what make accessing memory segments safe; that is, it is never possible for a program to abuse a memory segment in such a way that a hard VM crash is triggered.
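The three guarantees above can be illustrated with a toy model. This is a minimal sketch, not the Panama implementation: the class CheckedSegmentSketch and all of its members are hypothetical names, and a direct ByteBuffer stands in for the off-heap block. It only shows where the size, lifetime and ownership checks would sit relative to each access.

```java
import java.nio.ByteBuffer;

/** Toy model of the three segment checks; hypothetical, not the Panama API. */
final class CheckedSegmentSketch implements AutoCloseable {
    private final ByteBuffer memory;                      // stands in for the off-heap block
    private final Thread owner = Thread.currentThread();  // the creating thread owns the segment
    private boolean alive = true;                         // flipped by close()

    CheckedSegmentSketch(int byteSize) {
        this.memory = ByteBuffer.allocateDirect(byteSize);
    }

    byte get(long offset) {
        checkAccess(offset);
        return memory.get((int) offset);
    }

    void set(long offset, byte value) {
        checkAccess(offset);
        memory.put((int) offset, value);
    }

    @Override
    public void close() {
        checkOwnerAndLiveness();
        alive = false; // a real implementation would also release the memory here
    }

    private void checkAccess(long offset) {
        checkOwnerAndLiveness();
        // Size check: the offset must fall inside the block.
        if (offset < 0 || offset >= memory.capacity()) {
            throw new IndexOutOfBoundsException("offset " + offset + ", size " + memory.capacity());
        }
    }

    private void checkOwnerAndLiveness() {
        // Ownership check: only the creating thread may touch the segment.
        if (Thread.currentThread() != owner) {
            throw new IllegalStateException("not the owner thread");
        }
        // Lifetime check: no access once the segment has been closed.
        if (!alive) {
            throw new IllegalStateException("segment is closed");
        }
    }

    public static void main(String[] args) {
        CheckedSegmentSketch segment = new CheckedSegmentSketch(16);
        segment.set(0, (byte) 42);
        System.out.println("in-bounds read: " + segment.get(0));
        try {
            segment.get(16);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("out-of-bounds read rejected");
        }
        segment.close();
        try {
            segment.get(0);
        } catch (IllegalStateException e) {
            System.out.println("read after close rejected");
        }
    }
}
```

Every failure mode in this sketch surfaces as an ordinary Java exception, which is the property the paragraph above describes: misuse is reported, rather than turning into a crash.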

Unchecked segments

Now, the above ingredients are always known when memory segments are created from Java code, using one of the provided memory segment factories; but, unfortunately, not all memory segments come from Java code. For instance, consider the following C library function:

struct tm *gmtime(const time_t *timep);

This call takes a pointer to a time_t value (a time expressed relative to the epoch) and returns a pointer to a struct tm, which contains several fields to access e.g. the day of the month, etc. Since the result of this function is a pointer, Panama would like to model that as a MemoryAddress value; typically a MemoryAddress is associated with a MemorySegment, but in this case, since the boundaries of the pointer are unknown, the address is said to be unchecked and has no associated segment.

Unchecked addresses can be used in Panama - primarily by passing them to other functions (which is useful if a library uses opaque pointers) - but they cannot be dereferenced. That's because dereferencing an address that is not backed by any segment is, fundamentally, an unsafe operation - there is no way for the runtime to check that (i) the access is within bounds and (ii) that the memory the address refers to is still present.

When a Panama client wants to explicitly dereference an unchecked address, it has to perform an unsafe rebase operation - that is, it has to create a synthetic segment (with desired size and lifecycle properties) which models the memory region obtained through the native call. This is possible by using the MemorySegment::ofNativeRestricted factory. This factory takes several parameters - among which (i) a base MemoryAddress, (ii) a segment size and (iii) an optional cleanup function.

It can be seen how this is, at its core, an unsafe operation: the runtime must trust the base address provided by the user, as well as the fact that the region associated with that address has a size compatible with that of the specified parameter; and, if a cleanup action is provided, the runtime must also trust that such a cleanup action will effectively deallocate the memory region wrapped by such a segment.

While it seems that unchecked segments only come up in the context of native calls, in reality they also come up whenever framework developers want to expose a custom memory source through the memory segment API. Consider the case where a framework maintains an alternate, more scalable native allocator than the one provided by the JDK; presumably, at some point the framework will want to create a memory segment from a memory chunk obtained through this custom allocator.

In other cases, it might be desirable to create segments out of other kinds of foreign memory sources, such as GPU memory; for instance, the CUDA library offers primitives to allocate and deallocate GPU memory, so it seems fair to ask whether a segment could be created out of such primitives.

Unchecked buffers

A similar situation arises with the java.nio.ByteBuffer; while off-heap buffers created with the supported API (ByteBuffer::allocateDirect) are safe, there are ways for users to create unchecked buffers, so to speak. One of the main avenues by which this is done is by using the supported JNI function NewDirectByteBuffer:

jobject NewDirectByteBuffer(JNIEnv* env, void* address, jlong capacity);

As the documentation of the function clarifies, buffers created this way are fundamentally unsafe:

Native code that calls this function and returns the resulting byte-buffer object to Java-level code should ensure that the buffer refers to a valid region of memory that is accessible for reading and, if appropriate, writing. An attempt to access an invalid memory location from Java code will either return an arbitrary value, have no visible effect, or cause an unspecified exception to be thrown.

But JNI is not the only way: for instance, the popular Java gaming library LWJGL achieves a similar effect by just relying on sun.misc.Unsafe.

The net effect is the same --- regardless of how it is achieved --- a potentially unsafe ByteBuffer instance is now leaked into unsuspecting client code. By interacting with the ByteBuffer API, such a client might trigger hard VM crashes (e.g. if a bug is present in the unsafe logic which materialized the unchecked buffer).
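For contrast, the safe half of this picture can be sketched in plain Java. The example below only uses the supported ByteBuffer API; the class and method names (DirectBufferBoundsSketch, isRejected) are hypothetical. It shows that a buffer obtained from ByteBuffer::allocateDirect checks every index against its own capacity. The same check runs on an unchecked buffer too, but there the capacity is simply whatever the native code claimed, so a passing check says nothing about whether the underlying memory is valid.

```java
import java.nio.ByteBuffer;

final class DirectBufferBoundsSketch {
    /** Returns true if an absolute read at the given index is rejected by the buffer's bounds check. */
    static boolean isRejected(ByteBuffer buffer, int index) {
        try {
            buffer.get(index);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // The JDK allocates the memory itself, so the capacity is trustworthy.
        ByteBuffer buffer = ByteBuffer.allocateDirect(8);
        System.out.println("direct: " + buffer.isDirect());
        System.out.println("index 7 rejected: " + isRejected(buffer, 7));
        System.out.println("index 8 rejected: " + isRejected(buffer, 8));
    }
}
```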

Other languages

Java is, understandably, not the only language where these kinds of issues arise. For instance, the Python ctypes library provides a pointer abstraction which can be used to manipulate off-heap memory. From Python, a programmer can alter the contents of a pointer so that it points to whatever address they want.

Swift also supports off-heap access through its UnsafePointer abstractions; most notably, Swift also supports the concept of raw pointers, which can be created from any numeric value.

Finally, Kotlin provides an abstraction --- namely CPointer --- to model off-heap pointers; not surprisingly, Kotlin defines an extension method on Long to turn an arbitrary long value into a C pointer.

In other words, in various forms, other languages also provide ways to wrap an arbitrary address into the abstraction of choice to model off-heap access. It is also interesting to note that, with the exception of Swift, such operations are not even called out as unsafe. And, even in Swift, the unsafety is captured in the abstraction name, but no other safety checks are enforced by the language or platform, which means programmers can still freely call such unsafe API points.

Native method handles

Another source of unsafety comes from the ability to create method handles which point to an underlying native library function. Consider the following code, which creates a method handle for the standard library function strcat:

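// Obtain the linker for the platform ABI.
ForeignLinker abi = ForeignLinker.getInstance();
// Look up symbols in the libraries already loaded with the VM.
LibraryLookup stdLookup = LibraryLookup.ofDefault();
// Java-side type: (MemoryAddress, MemoryAddress)MemoryAddress.
// Native-side descriptor: pointer return value, followed by two pointer arguments.
MethodHandle strcat = abi.downcallHandle(stdLookup.lookup("strcat"),
                        MethodType.methodType(MemoryAddress.class, MemoryAddress.class, MemoryAddress.class),
                        FunctionDescriptor.of(C_POINTER, C_POINTER, C_POINTER));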

Here, we create a default library lookup (a lookup which contains symbols for C functions already loaded with the VM) and then we attempt to create a native method handle for the strcat function; this function is specified in C as follows:

char *strcat(char *dest, const char *src);

So we need to tell the native method handle machinery that there are two arguments (pointers) and a return value (also a pointer), and that the Java signature we'd like to use to access the method handle is something like (MemoryAddress, MemoryAddress)MemoryAddress.

The attentive reader might have already spotted several issues with the above code. First, it is not guaranteed that the strcat function will be found at all, although this is not too much of a concern: if the function is not found, the code will fail fast at runtime. More worrying, the shared library containing the strcat symbol only carries minimal information (such as the fact that strcat is a function); no information about the function signature is included in the shared library object. This means that the Panama runtime has no way to validate the signature information provided by the user. The user is free to reinterpret the signature of the strcat handle as (byte, float)void and there would be no way for the runtime to detect the mismatch, which could result in undefined behavior (either a VM crash, or subtle memory/register corruption) when the native method handle is later executed. In other words, the act of creating a native method handle is intrinsically unsafe.

JNI and native methods

Java code can call into native code using JNI. Calling native code via JNI typically requires the developer to jump through many hoops, which help minimize the chances that a given native call might fail in unpredictable ways. That is, a class containing a native method must be preprocessed so that a corresponding C header file can be generated and implemented (by the user). The C signatures of the functions included in the generated header will match those of the Java native methods.

Now, while generating JNI stubs using a tool enhances safety, the resulting solution is far from bulletproof. Consider the case in which the underlying native library is updated in a way that requires a signature change: the developer updates the necessary C implementation and headers by hand, but forgets to update the corresponding native method declarations in the Java code. We are then back in a situation where undefined behavior can occur.
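The Java half of such a binding can be sketched as follows. This is a minimal illustration under stated assumptions: the class NativeTimeSketch and its native method are hypothetical, and no native library is loaded, so the only behavior shown is the benign failure mode in which the symbol is missing and the call fails fast with an UnsatisfiedLinkError. The dangerous case discussed above, where a library is present but its C signature no longer matches the Java declaration, cannot be demonstrated in pure Java, precisely because nothing on the Java side can observe the mismatch.

```java
final class NativeTimeSketch {
    // Hypothetical native method. For a class in the default package, JNI would bind it
    // by name to a C function called Java_NativeTimeSketch_currentEpochSeconds.
    // Only the name is checked at link time; the C parameter and return types are not.
    static native long currentEpochSeconds();

    /** Calls the native method and reports whether linking succeeded. */
    static String tryCall() {
        try {
            return "returned " + currentEpochSeconds();
        } catch (UnsatisfiedLinkError e) {
            // No library providing the symbol has been loaded: a clean, early failure.
            return "link failure: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCall());
    }
}
```

The matching C header for a class like this can be generated with javac -h; keeping that header, the C implementation and the Java declaration in sync remains the developer's responsibility.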

To minimize the chances of these accidents, the HotSpot VM provides a useful option, namely -Xcheck:jni, which enables a variety of additional checks whenever a native call takes place. While these checks are designed to improve the robustness of native call support, it is important to note that they are a best-effort attempt to detect common mistakes and mismatches that can occur when calling native code from Java; as such, mileage can vary, depending on the shape of the called function, as well as the platform on which the checks are performed.

Other languages

Again, Java is not the only language facing the problem of allowing potentially unsafe calls to native library functions. For instance, Python's popular ctypes library allows developers to dynamically create objects which model a native library; such objects then expose methods which can be used to call into library functions. The ctypes documentation specifically points out that calling such methods can result in issues if e.g. the caller provides the wrong number of actual arguments to a native function; the runtime will try to determine whether the call is well-formed but, as in the case of -Xcheck:jni, such checks are not guaranteed to always kick in.

Other languages, such as Swift, Kotlin and Rust went down the tooling path; that is, for these languages to be able to use native functions, the header files containing such functions must be pre-processed by some tool first (see Rust's binding generator, or Kotlin's C interop tool). As with JNI, tool-based solutions are still prone to issues, when the generated artifacts go out of sync with the underlying native library.

What we've seen so far

Before moving forward, it is worth pausing to look back at the things we have seen in the previous sections. All languages considered here have some way to create a managed data structure (e.g. a ByteBuffer or a CPointer) from an arbitrary raw address (e.g. a long value). Moreover, all languages providing some form of native interop similarly have a way to call into native functions through what looks like an ordinary method call in that specific language. Both operations are fundamentally unsafe: creating a buffer or a pointer out of a raw address might lead to crashes, e.g. if the buffer is not correctly sized. Similarly, calling a native function through an ordinary method call might hide subtle issues, e.g. when the number of arguments of the method modeling the native function differs from the number of arguments the native function itself expects.

We have also seen that no particular action is taken by languages (including Java!) when it comes to dealing with these unsafe operations; while some languages (e.g. Swift) resort to a naming scheme to call out unsafe API points, no restriction is enforced on the clients of such unsafe API points. This means that it is possible for a developer to distribute a library that can, if buggy, lead to unrecoverable errors and that, conversely, clients of these libraries (whether direct or indirect) have no way to protect themselves against this class of errors.