Nullable Value Types in L-World

John Rose, Brian Goetz, Valhalla Working Group

DRAFT DRAFT DRAFT
THIS IS A ROUGH DESIGN FOR DISCUSSION
NOT A SPECIFICATION

Background

This design sketch gives an option for making value classes interoperate slightly better with one aspect of object classes, by introducing a limited notion of nullability of selected value classes. Early experiments in the Valhalla project indicate this may be a desirable option to take, to solve some very specific problems with inner classes and migration of legacy value-based classes. This particular option does not impose a nullability feature on all value classes; we still believe values must “code like a class and work like an int”; since Java int types do not interoperate with null, we believe typical value types will not exercise the option of nullability.

The thesis of this proposal is that, if we are forced to introduce nullable value classes, let’s keep them limited in impact, by requiring value class authors to opt into the feature. To further limit the impact, this proposal avoids introducing any new null values, instead recycling the existing null reference as a permissible “bottom” value for some value classes as well as all object classes. Adding one or many new null work-alikes, just to keep values and objects more separate, would replicate a feature which already introduces taxes on the ecosystem. Hence our platform here is “Read our lips: no new nulls”. When the movie comes out, the slogan will doubtless be “Bottom-lander: There can be only one”.

Basic Premises

Null is the default reference: In Java, all reference types have a common default value, the null reference. This reference appears in uninitialized array elements and field values of that type. The null reference is distinct from any reference that is produced by a new expression and/or a constructor call.

A problem with legacy value classes: In order for today’s value-based classes to migrate to proper value classes, they must retain the property that their default value is the null reference, since the null reference may appear in user code that uses such classes.

Nullable value classes: This implies that some value types require the ability to represent a true null reference, as one of the many points in their set of possible values. (This does not imply that any other value of such a type must also be a reference: All other values of the type can and should be proper values.)

Nullability is rare: Most value types, such as complex numbers or vectors, must not represent the null reference. In particular, arithmetic value types of size N bits may need to assign all 2**N code points to regular non-null values. For example, a type that emulates byte could not give up one of its 256 encodings to encode null, and adding an extra hidden bit to all value types would have very large costs. In addition, most value types “work like an int”, and require a default value which can accept method calls without throwing NullPointerExceptions.

Not the default: Nullability must be explicitly selected by the designer of a value type; it is expected to be a rarely used feature, because it is likely to incur extra costs, in space and time, for encoding and decoding the null reference to and from the flattened form of a value class.

Tweaking the key slogan: In summary, value types “code like a class and work like an int”. But there are a few value types that also want to “work like an Integer”, in that their default value is the null reference, rather than an appropriate pattern of zero bits.

User Model

New keyword: A value class can be declared with a pseudo-modifier __Nullable (TBD, may be just nullable), which must be accompanied by the value pseudo-modifier. Such a value class is called a nullable value class. Other value classes are called regular value classes.

Two variable kinds: Variables come in two kinds, heap and stack. A heap variable is a class field (static or non-static) or an array element. A stack variable is a method parameter or local variable. (Local variables include specially declared names such as a catch variable.) Heap variables exhibit type-specific default values. Stack variables do not require a default value convention, because they are subject to definite assignment rules, which require explicitly assigned values.

Default values: For a regular value class RV, the expression RV.default evaluates to a non-null value of RV all of whose fields are their respective default values (typically zero, false, or null). The uninitialized value of a heap variable of type RV or RV.val is RV.default. The uninitialized value of a heap variable of type RV.box is null, and RV.box.default evaluates to null.

Nullable defaults to null: For a nullable value class NV, the expression NV.default evaluates to null (the null reference). The uninitialized value of a heap variable of type NV, NV.val, or NV.box is null.
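
The two default conventions have a rough analogue in today's Java, where int and Integer heap variables already default differently. The following is only an analogy in existing Java, not Valhalla syntax: int stands in for a regular value class RV and Integer for a nullable value class NV, and the class name DefaultsDemo is hypothetical.

```java
// Analogy only: int plays the role of a regular value class,
// Integer the role of a nullable one. No Valhalla syntax is used.
class DefaultsDemo {
    static int regularLike;      // heap variable (static field), defaults to 0
    static Integer nullableLike; // heap variable (static field), defaults to null

    // An uninitialized array element of a primitive type is zero.
    static int defaultIntElement() {
        int[] a = new int[1];
        return a[0];
    }

    // An uninitialized array element of a reference type is null.
    static Integer defaultIntegerElement() {
        Integer[] a = new Integer[1];
        return a[0];
    }
}
```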

Regular values never null: Regular value classes can never represent null values in their normal unboxed form. (In this, they “work like an int”.) Casting a null to a regular value class will throw a NullPointerException, just like casting a null Integer to int. Reflectively storing an untyped null reference to a heap variable of a regular value type will also throw a NullPointerException. Loading a heap variable of a regular, non-boxed value class will never produce a null.
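
The Integer-to-int comparison above can be checked in existing Java. This small sketch (the class and method names are hypothetical) shows only the current unboxing behavior that the proposed rule for regular value classes is modeled on.

```java
// Existing Java behavior: converting a null Integer to int throws.
class NullCastDemo {
    // Returns true if converting the given Integer to int throws
    // NullPointerException, false if the conversion succeeds.
    static boolean unboxThrows(Integer boxed) {
        try {
            int unboxed = boxed; // implicit unboxing conversion
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```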

Constructors are null-checked: In order to keep a clear distinction between the null default value and constructed values, nullable value class constructors have a null check on exit. This means that if constructor code accidentally assigns zero or default values to all the instance fields, the constructor will throw NullPointerException rather than return the null instance value. No such check is done for regular value classes, since null values are impossible for them.
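
The exit check can be emulated in plain Java. In this sketch, NvPoint is a hypothetical stand-in for a nullable value class with two int fields, and the helper normalize is an assumption that models the identification of the all-default state with the null reference; it is not how a real JVM would do it.

```java
import java.util.Objects;

// Plain-Java emulation of a nullable value class with two int fields.
final class NvPoint {
    final int x, y;

    private NvPoint(int x, int y) { this.x = x; this.y = y; }

    // Models the identification of "all fields default" with null.
    private static NvPoint normalize(int x, int y) {
        return (x == 0 && y == 0) ? null : new NvPoint(x, y);
    }

    // Models a constructor with a null check on exit: an all-default
    // result is rejected instead of being returned as null.
    static NvPoint construct(int x, int y) {
        return Objects.requireNonNull(normalize(x, y),
                "constructor produced the all-default (null) value");
    }
}
```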

All values are flattenable: In heap variables, instances of both regular and nullable value classes behave as if they were flattened, and are in fact routinely flattened. This is likely to affect the performance and footprint of programs which use such variables. In stack variables, flattening may or may not happen, depending on how the interpreter or the JIT is directing execution.

All values are boxable: All value classes support a “boxed” view which interoperates with null and with erased generics. The expression V.box.default evaluates to null for all value types, both regular and nullable. Heap variables of type V.box are never flattened, but as a consolation prize they can receive nulls even for regular V. For generics, note that List<V.box> is always legal, but List<V.val> is currently illegal and reserved for future use, when specialized generics are available. Note that the unadorned value type name V usually denotes the same type as V.val, but we reserve the right to have some occurrences of V for certain types to denote V.box instead. (This is TBD; perhaps it is part of the migration package for nullable classes.)

Observations and Fine Print

Null stores as vull: When storing a null to a heap variable of a nullable value class, the JVM will reset all (non-static) fields of that variable to their default values. On the heap, a logically null flat value is called a flattened value null, or “vull” for short.

Vull loads as null: When loading a value from a heap variable of a nullable value class, the JVM will detect “vull” and convert it to a proper null reference.

Vull is a ghost: Thus, for a nullable value class NV, it is impossible to create or observe on stack a non-null instance of NV for which all fields of the instance are default. This means that “vulls” are confined to the heap. The JVM enforces this as a low-level invariant, by dynamically transcoding between on-heap “vulls” and on-stack nulls.
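
The store and load transcoding can be pictured with a small emulation in ordinary Java. Everything here is an assumption made for illustration: Stamp is a hypothetical nullable value class with two int fields, and FlatSlot models one flattened heap variable as a two-element int array.

```java
// Hypothetical nullable value class; construction of the all-default
// state is rejected, matching the constructor null check.
final class Stamp {
    final int year, day;

    Stamp(int year, int day) {
        if (year == 0 && day == 0)
            throw new NullPointerException("all-default instance");
        this.year = year;
        this.day = day;
    }
}

// Models one flattened heap variable of type Stamp.
final class FlatSlot {
    private final int[] bits = new int[2]; // starts out as "vull" (all zero)

    // Store: null is written as "vull", i.e. every field reset to default.
    void store(Stamp s) {
        bits[0] = (s == null) ? 0 : s.year;
        bits[1] = (s == null) ? 0 : s.day;
    }

    // Load: "vull" is detected and surfaced as a proper null reference.
    Stamp load() {
        boolean vull = bits[0] == 0 && bits[1] == 0;
        return vull ? null : new Stamp(bits[0], bits[1]);
    }
}
```

In this model the all-zero bit pattern never escapes load, which is the sense in which “vulls” stay on the heap.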

Pivot fields: As a “pro move”, a nullable value class can mark one or more of its non-static fields with the __NullablePivot keyword (TBD). When detecting “vulls”, the JVM consults only such marked fields for their default value, not all fields of the instance. This may make “vull” detection faster for legacy classes like LocalDate. Such a specially marked field (or fields) may be called a pivot field, since the task of “vull” detection “pivots” around that field. By default, if no fields are marked as pivot fields, then in effect all of them serve as pivot fields.
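
A sketch of pivot-based detection, again as a plain-Java emulation. PivotDateSlot and its three-field layout are hypothetical, loosely inspired by a date with year, month, and day. The month is assumed to be the pivot because a constructed date never has month zero; the rule that a stored value must have a non-default pivot is likewise an assumption of this sketch.

```java
// Models a flattened heap variable whose "vull" check looks at one field.
final class PivotDateSlot {
    private int year;  // may legitimately be 0 in a constructed value
    private int month; // pivot field: assumed non-zero in every constructed value
    private int day;

    // ymd == null models storing the null reference.
    void store(int[] ymd) {
        if (ymd == null) {
            year = 0; month = 0; day = 0; // write "vull"
            return;
        }
        if (ymd[1] == 0)
            throw new IllegalArgumentException("pivot field must be non-default");
        year = ymd[0]; month = ymd[1]; day = ymd[2];
    }

    // Only the pivot is consulted; year and day are not inspected.
    int[] load() {
        return (month == 0) ? null : new int[] { year, month, day };
    }
}
```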

Null stops bad calls: It is arguable that the most legitimate job of null is to avoid executing a method call on a receiver which has not yet been specified. After all, objects do not always have reasonable default values, and so Java (and the JVM) assigns a “default default” value of null to object variables that are not otherwise initialized. The null value ensures that if buggy code tries to call a method on an uninitialized variable, an exception will be thrown immediately, rather than executing a method body on an unexpected input. (In this view, field gets are the same as method calls. Other uses of nulls, such as API sentinel values, were created by creative programmers, who given a hammer will always find more nails.) For a value type without a reasonable default value, programmers have a right to a similar sentinel value which prevents method execution on uninitialized variables. But for a value with a reasonable default value, such machinery would be pure annoyance. Only the designer of the value class knows which case is true.

Inner value classes: Any non-static nested (“inner”) value class C.IV must also be declared to be nullable. The reason for this is that every properly constructed instance of C.IV must specify a non-null outer instance of type C. But if C.IV were regular and a method were called on the default value C.IV.default, then that method would not be able to observe a definite non-null value C.this. Such a method call would be inescapably broken. Thus, such method calls must be prevented. The existing language achieves this result by throwing NullPointerException when invoking methods on the default value of C.IObj, for an inner object class C.IObj. To preserve this behavior, an inner value class C.IV must also present a null default value. This restriction does not apply to static nested value classes C.NV.
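
The existing behavior for inner object classes, which the rule above preserves, can be seen in today's Java. The class names in this sketch are hypothetical.

```java
// Existing Java: an inner object class always carries an outer instance,
// and its default heap value is null, so calls on the default are stopped.
class Outer {
    final String name;

    Outer(String name) { this.name = name; }

    class Inner {
        // Relies on a definite, non-null Outer.this.
        String outerName() { return Outer.this.name; }
    }

    final Inner[] slots = new Inner[1]; // element defaults to null

    // Returns true if invoking a method on the default element throws.
    static boolean callOnDefaultThrows() {
        Outer o = new Outer("c");
        try {
            o.slots[0].outerName();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```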

A slogan: The slogan for this user model of nullable value types is “no new nulls”. It refuses to introduce new “work-alikes” for the null reference. There is no new NullValueException which pairs with NullPointerException. There is no Nullable interface. There is not an isNull method, certainly not a user-definable one. (An isNull method could never return a false result, could it? It would have to throw NullPointerException instead!) There are no directly observable “vull” values to compete for the throne of null; “vulls” are only indirectly observable in the flattening of certain heap variables. In short, the heavy cost of nulls (arguably a “billion dollar mistake”) is not multiplied by a new set of null-like values. And the historic cost of nulls is not pushed forward to new value types that don’t request it, such as arithmetic types.

Another slogan: Alternatively, the slogan from “Highlander” applies: There is only ever one null. Any would-be “vull” value is dissected from the value space and conjoined to null just as soon as it tries to enter the stack.

Implementation

Affected bytecodes: At the bytecode level, the instructions getfield, putfield, getstatic, putstatic, withfield, aaload, and aastore must transcode between “vull” and proper null. The defaultvalue bytecode must not produce “vull”.

Null containers rejected: If a getfield instruction is asked for an instance field NV.f of a value class NV, and if the on-stack value is null, then NullPointerException is thrown. There is no conversion of the containing instance to a “vull”. This is true regardless of the type of the field NV.f. If the on-stack value is non-null, and NV.f is a “vull”, then transcoding occurs as usual. This means that the sequence defaultvalue V; getfield V.f:T is equivalent to defaultvalue T only if V is a regular value type. (It must also possess a non-static field f of type T.) If V is nullable, then the getfield instruction will throw.

Withfield transcodes twice: The withfield instruction must transcode on both input and output. It must convert a null input value to a temporary “vull”, one of whose fields is then updated. It must then detect whether the resulting value is a “vull” and convert that (and only that) back to a null. Unlike getfield and putfield, withfield does not reject a null container value. For example, it will produce a null result value if asked to store a default value to a field in an instance where all of the other fields are already set to default values.
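
The contrast between getfield and withfield on a null container can be emulated in plain Java. In this sketch an on-stack value of a hypothetical nullable value class with two int fields is modeled as an int[2], with Java null standing for the null reference; the class FieldOps and its method signatures are assumptions, not JVM APIs.

```java
// Emulation of the container rules for a two-field nullable value class.
final class FieldOps {
    // getfield: a null container is rejected, never converted to "vull".
    static int getfield(int[] container, int index) {
        if (container == null)
            throw new NullPointerException("null container");
        return container[index];
    }

    // withfield: transcode null to a temporary "vull", update one field,
    // then transcode an all-default result (and only that) back to null.
    static int[] withfield(int[] container, int index, int value) {
        int[] tmp = (container == null) ? new int[2] : container.clone();
        tmp[index] = value;
        return (tmp[0] == 0 && tmp[1] == 0) ? null : tmp;
    }
}
```

Under these assumptions, withfield(null, 0, 7) yields a non-null value, while storing a default into the last non-default field yields null.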

Unaffected bytecodes: Instructions which operate only on stacked or local values do not need further modification to detect “vulls”, since “vulls” are never on stack. Receiver null checks for invokevirtual and its siblings are unchanged; these instructions will never encounter vull values. The acmp, checkcast, and instanceof instructions (and the aastore store check) already have special semantics for null references which are unchanged.

Transcoding in field instructions: For the field instructions, transcoding is a reasonable cost to add, since these instructions resolve their field and therefore know the specific field type; thus the cost of transcoding between “vull” and null is incremental, paid only for fields whose types require this extra step.

Transcoding in array instructions: Array element access instructions first check the layout of the target array element and then use the proper sequence of steps to convert from a flattened array element (if present) to a regular on-stack reference. As part of this sequence of steps, if the element type is a nullable, flattened value type, “vull” must be detected on load and produced on store, corresponding to an on-stack null reference.

Reflection, etc.: Access to fields and array elements via reflection, method handles, or JNI is defined in terms of the behaviors of bytecodes, as usual. Thus, reflectively loading or storing a value instance must include a transcoding step exactly when transcoding is required by the corresponding bytecode instruction.

Variable declaration: The bytecode-level descriptor for a flattenable variable of a value type has the form of a Q-descriptor, which begins with the letter “Q” instead of the letter “L” normally used with class-based types. Variables which hold a boxed value are introduced with L-descriptors beginning with “L”, like any other reference type. In the setting of the JVM type system, L-descriptors and Q-descriptors denote L-types and Q-types. Q-types and L-types roughly correspond to user-level V.val and V.box types, respectively. Again, in the setting of JVM types (only) we say a value of a Q-type or L-type is a Q-value or L-value.

Layout includes nullability: The nullability of a value class V is logically a part of V’s overall layout, its size and the format of its fields. This is because layout is the information that dictates the JVM’s exact steps when loading or storing a flattened value. Since these steps necessarily include a “vull” check when the value class is nullable, layout includes nullability.

Q-values in the heap: Q-types are introduced in the heap as part of a class declaration, or when an array type derived from the Q-type is mentioned. The JVM consults the layout of a Q-type Q-V when it lays out an instance field of type Q-V, or prepares a static field of type Q-V, or computes the layout of an array whose element type is Q-V. In all cases, the class declaration of V is loaded if necessary, and the layout of V is consulted, to determine the steps needed to load or store the Q-value.

Q-values on the stack: Q-types are introduced on the stack as part of method, field, or array type descriptors, or as checkcast targets. For nullable value types, both Q-types and L-types can carry null values, while for regular value types, Q-types cannot carry null. Thus, a checkcast to a “Q-type” will throw NullPointerException if presented at runtime with a null reference on the stack, but only if the referenced type is regular (not nullable).
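
The checkcast rule can be sketched as a helper in ordinary Java. The method checkcastQ and its boolean flag are hypothetical; the flag stands in for whatever part of the class layout records nullability.

```java
// Emulation of checkcast to a Q-type.
final class QCast {
    static <T> T checkcastQ(Object ref, Class<T> type, boolean nullable) {
        // A null reference is rejected only when the target is regular.
        if (ref == null && !nullable)
            throw new NullPointerException(
                    "null cast to regular value type " + type.getName());
        return type.cast(ref); // Class.cast passes null through unchanged
    }
}
```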

Verifier rules: The verifier tracks the distinction between Q-types and L-types, and specifically ensures that an on-stack L-value is never consumed by an instruction which expects a Q-value, if the L-type accepts null but the Q-type does not. If the Q-type is nullable, implicit conversions are logically permissible, and the verifier should allow them. Given the nullable and regular value types NV and RV, it follows that Q-NV is a proper subtype of L-NV, but Q-RV and L-RV can be treated as the same type, since they have the same set of on-stack values.

Verifier rules for supers: If C is a super (class or interface) of NV or (respectively) RV, then L-C is a proper supertype of Q-NV and L-NV, or (respectively) Q-RV and L-RV. Note that supertypes of value types are always nullable. Thus, there is no need to distinguish between Q-types and L-types when converting to supers; if there is a null it will be welcome in the supertype.

Optimized calling sequences: The JIT may elect to use vull values for non-receiver parameters or return values, as an alternative to buffering via a nullable indirection. Such calling sequences must be made invisible to the end-user by ensuring that “vull” parameters are transcoded (detected and converted to nulls) as needed, and vice versa on return. Such transcoding operations seem to be reorderable and trackable much like null detection is at present. Thus, “vull” transcoding is thought to be optimizable as a straightforward extension to today’s JITs.

Constructor translation: A value class constructor starts with the default value of its class, and builds up the value by assigning to its fields. The rules of the Java language (for final fields and value instance fields) ensure that each field is assigned once and only once. The tracking of assignment along all paths uses a pair of conditions called “definite assignment” and “definite unassignment”. On every normal exit from a constructor, each field must be definitely assigned and not definitely unassigned. The JVM has no such rules for tracking assignment at bytecode boundaries. Instead, constructors are allowed to write to final fields any number of times. For values, the corresponding rule is that withfield is allowed to assign to an uninitialized value any number of times. In order to prevent null values from escaping from a value class constructor, the compiler must precede each return instruction by a null check. (This can be done with a call to Objects.requireNonNull or Object.getClass.) Optionally (TBD) the JVM could perform this check automatically.

Non-throwing getter: Optionally (and TBD), the JVM may choose to define getfield on a Q-type container to transcode the container to “vull”. Most uses of getfield would use the regular L-type container, but this variation would give a “hook” for translation strategies that need to operate on the fields of a possibly uninitialized value. This could happen, for example, inside a constructor. The class component of the CONSTANT_Fieldref of such an instruction would be a Q-descriptor, rather than the name of a class. Note that, in the JVM, unadorned class names usually denote L-types, not Q-types, so directing a getfield instruction to a Q-type container is an unusual step.