Vectors for Java

Summary

This work seeks to enhance the Java core libraries by exposing data-parallel vectorization operations directly to the programmer.

Goals

We seek to provide developers with precise tools to access vectorization (also known as SIMD) operations on native architectures hosting the JVM. We expose this functionality through an idiomatic Java API that abstracts the notion of a vector (i.e. a SIMD register) parameterized by element and size types.

Vector<E,S> v = Vector.factory(data);
Vector<E,S> w = Vector.factory(data2);
Vector<E,S> x = v.add(w);
x.intoArray(outputData);

The API must be abstract enough to encapsulate cross-platform features as well as multiple versions of the same platform. This entails facilities for predictable, graceful degradation in the absence of SIMD features on the local architecture.

Success Criteria

Qualitative Metrics

Quantitative Metrics

Motivation

Most general-purpose computing platforms provide facilities for vectorized operations, also known as Single-Instruction Multiple-Data (SIMD) operations. On x86 there are SSE and AVX/AVX2/AVX-512; ARM has NEON; IBM POWER has AltiVec; SPARC has VIS. These architecture extensions exhibit a considerable amount of functional overlap.

A number of compiler optimizations exist to exploit the presence of SIMD support. These optimizations are quite successful at accelerating existing workloads that fit their patterns, though there is always room for improvement along several dimensions. Compiler optimizations are subject to heuristics and pattern-matching restrictions (among other constraints) that preclude them from being a general-purpose vector-programming solution. Additionally, a user looking to guarantee some performance characteristic across platforms may be unpleasantly surprised by the variation between native platforms, since support for auto-vectorization differs with compiler support and platform SIMD availability. All of this is to say nothing of the oddity of writing code that is semantically different from the code one expects the compiler to emit.

The Vector API shares some utility with JSR-84. That JSR proposed a pragma to relax the evaluation order of floating-point operations to enable more liberal compiler optimizations, namely optimizations that assume the presence of algebraic identities in IEEE-754 float and double types. The JSR has been withdrawn, but we note that this JEP enables the same type of optimizations that would become available in a relaxed-ordering environment for floating-point ops. The primary difference is that the onus for optimization and evaluation-order selection is on the user and not the compiler. This has the added benefit of co-existing with the existing FP semantics in the Java Language Specification without requiring any modifications to the specification itself. A similar maneuver is performed by DoubleStream with its sum() terminal operation. The stream can be evaluated out of order (in parallel, even), and thus the evaluation rules aren't necessarily kept true to scalar Java FP arithmetic. The library does, however, apply compensation (Kahan summation or similar) to help correct errors. We note that applying error compensation to floating point operations can also yield a different, and even more accurate, result than one might attain with standard FP association in Java. In the case of DoubleStream, as in the case of Vector, the divergence from standard ordering is hidden behind an API.

Additionally, the Vector API still allows the user to order evaluation explicitly. This work seeks to bring to Java and the JVM a library that is the "Java" version of the intrinsics libraries commonly seen elsewhere. The key difference between the Vector API and an intrinsics library is, predictably, that the Vector API abstracts over SIMD operations. This work seeks to establish a happy medium of functional abstraction over the widest number of SIMD operations across platforms.

Description

The Vector API is a programming model centered on an immutable Vector data type exposing functionality that we wish to support in a cross-architecture fashion. The API's immutability is reflected in the return type of all Vector-level operations: no in-Vector side effects are intended in this model. This approach aligns our implementation with the register scheme commonly seen in vector/SIMD architecture extensions. Specifically, it makes the Vector API similar to SIMD architectures that use three-register (destination, operand1, operand2), non-side-effecting (with respect to operands, i.e. non-destructive) operations.

Operations

interface Vector<E,S extends Shape<Vector<?,S>>> {
    // Arithmetic and logical operations
    Vector<E,S> add(Vector<E,S> v2);
    Vector<E,S> mul(Vector<E,S> v2);
    Vector<E,S> neg();
    ...
    Vector<E,S> and(Vector<E,S> v2);
    ...
    // add/sub/mul/div/and/or/xor/min/max/cmp/...
}

The Vector interface supports a standard set of arithmetic and logical methods commonly seen across platforms. This set of operations is the most basic one, intended for developers to use as accelerated stand-ins for the primitive Java operators.
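
As a rough illustration, a lane-wise multiply-add over int data might look like the following. This is a hedged sketch: the factory and store methods (Vector.fromIntArray, intoIntArray) and the shape name Size.S256Bit are borrowed from examples later in this document, and the exact signatures are assumptions over the straw man.

int[] a = new int[8], b = new int[8], c = new int[8];
// Hypothetical factories; each operation below returns a new Vector.
Vector<Integer, Size.S256Bit> va = Vector.fromIntArray(a, 0);
Vector<Integer, Size.S256Bit> vb = Vector.fromIntArray(b, 0);
Vector<Integer, Size.S256Bit> vc = Vector.fromIntArray(c, 0);
vc = va.mul(vb).add(vc);   // lane-wise c = a * b + c
vc.intoIntArray(c, 0);     // write the eight lanes back to the array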

interface Vector<E,S extends Shape<Vector<?,S>>> {
  // Getters and setters
  E getElement(int i);
  Vector<E,S> putElement(int i, E elem);
  ...
  // Nominal horizontal reductions
  E sumAll();
  // sum/min/max/and/or/xor, etc, etc, etc...
  ...
  // [De]serialization
  E[] toArray();
  Vector<E,S> fromArray(E[] ary, int offset);
}

A minimally-complete Vector API specification includes facilities for mapping to and from scalar types. These types include all of the primitive Java data types with the densest feature coverage likely centered in int, float, and double types. Mapping between vector and scalar types comes in the form of scalar indexing into vectors (element-wise loading and putting). Setting a scalar value (denoted by put, not set) entails the creation of a new Vector object. This is in line with the immutable model for Vector types. We include a set of nominal horizontal reductions based on common binary arithmetic and logical operations. These are specialized versions of a more general, higher-order reduce operation. They serve as another map back to scalar primitive types from Vector types. The last bridge between Vectors and scalars is via reading to and from memory with arrays. In the example listed above, the arrays are parameterized by the element type E. In the absence of value types, these methods will be specialized to a set of primitive to/from methods for each primitive type we support.
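
A hedged sketch of these bridges between vectors and scalars follows; the factory and shape names (Vector.fromIntArray, Size.S256Bit) are borrowed from the testing example later in this document, and the exact signatures are assumptions over the straw man.

int[] data = {1, 2, 3, 4, 5, 6, 7, 8};
Vector<Integer, Size.S256Bit> v = Vector.fromIntArray(data, 0);   // hypothetical factory

Integer third = v.getElement(2);                        // element-wise read: 3
Vector<Integer, Size.S256Bit> w = v.putElement(2, 42);  // returns a new Vector; v is unchanged

Integer total = v.sumAll();                             // nominal horizontal reduction: 36
Integer[] boxed = v.toArray();                          // back out to an array of the element type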

interface Vector<E,S extends Shape<Vector<?,S>>> {
    // General horizontal reductions:
    E reduce(BinaryOperator<E> op);
    E reduceWhere(Mask<S> mask, E id, BinaryOperator<E> op);

    Vector<E, S>       map(BinaryOperator<E> op, Vector<E,S> this2);
    <F,T> Vector<T, S> map(BiFunction<E,F,T> op, Vector<F,S> this2);
    <F> Mask<S>        test(BiPredicate<E,F> op, Vector<F, S> this2);
    Vector<E, S>       mapWhere(Mask<S> mask, BinaryOperator<E> op,
                                Vector<E,S> this2);
    <F> Vector<E, S>   mapWhere(Mask<S> mask, BiFunction<E,F,E> op,
                                Vector<F,S> this2);
    <F,T> Vector<T, S> mapOrZero(Mask<S> mask, BiFunction<E,F,T> op,
                                 Vector<F,S> this2);
}

The final set of methods, part of the Vector API straw man draft, includes higher order functionality. This suite of methods is the maximally expressive component of the Vector API. We note that some of these methods likely will not exist in the final version in this form due to the inbound function objects being parameterized by a lane type without introspective capabilities. In this form, we lack a robust way to "crack" the lambda and understand its meaning with regards to vector types. A potential enhancement to subsume this functionality is discussed in the alternatives section. An orthogonal piece of functionality is introduced by Mask. The Mask interface allows the user to prevent operations from acting on a particular lane of a Vector. In the straw-man draft, these appear on higher order functions. The final version will likely have a masked version of every basic Vector operation as well. This may double the number of basic operations, but masking allows the user to specify operations over data that may not be aligned to the width of a vector. This prevents extra stepping between scalar and vectorized versions of the same operation.
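
As a hedged sketch of how the higher-order methods and Mask compose (again borrowing fromIntArray and Size.S256Bit from later examples, and building a zero vector from an all-zero array), clamping negative lanes to zero might look like this.

int[] data = {-3, 1, -4, 1, -5, 9, -2, 6};
Vector<Integer, Size.S256Bit> v    = Vector.fromIntArray(data, 0);
Vector<Integer, Size.S256Bit> zero = Vector.fromIntArray(new int[8], 0);

// Build a mask of the lanes where v is negative, then replace only those lanes.
Mask<Size.S256Bit> negative = v.test((x, z) -> x < z, zero);
Vector<Integer, Size.S256Bit> clamped = v.mapWhere(negative, (x, z) -> z, zero);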

Data

The Vector API will provide facilities for instantiating vector objects from simple scalar types, but interesting problems generally start from data that lives in structures such as primitive arrays or nio Buffers. The API draft implementation parameterizes array operations by element type. In the absence of value-based types and the necessary related class specialization, parameterized arrays give us boxed-element arrays, which are raggedly arranged in memory. We propose supporting specialized loads and stores to all primitive array types where appropriate. One method of doing this is to introduce specialized subtypes of Vector that carry the corresponding array marshalling and unmarshalling methods.

interface FloatVector<S extends Shape<Vector<?,S>>> extends Vector<Float,S> {
    void toArray(float[] ary, int index);
    FloatVector<S> fromArray(float[] ary, int index);
}

ByteBuffer provides an interesting alternative to data represented as primitive arrays. ByteBuffer provides accessors and mutators for different primitive types that give us multiple views onto the underlying data. The Vector API could motivate extensions to the ByteBuffer interface to support wider views onto the data. Additionally, ByteBuffer sees enhancements in JDK 9 that provide for alignment-sensitive slicing of direct buffers. These features allow the user to align Vector loads and stores from memory for greater efficiency. Early tests with aligned ByteBuffers have shown a speedup in operations that spend a considerable amount of time loading values from memory.
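
A short sketch of the JDK 9 alignment facilities mentioned above follows (classes are from java.nio); how such a buffer would then be handed to a Vector load is left open here.

// A direct buffer, sliced so that its position and limit are 32-byte aligned,
// making it suitable as a source for 256-bit vector loads.
ByteBuffer raw = ByteBuffer.allocateDirect(4096).order(ByteOrder.nativeOrder());
ByteBuffer aligned = raw.alignedSlice(32);
FloatBuffer floats = aligned.asFloatBuffer();   // a wider primitive view onto the bytes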

Masking Operations

Vector (or SIMD) operations can benefit from the added expressiveness of masking. An additional argument to an operation is a mask that specifies which elements the operation will apply to. These operations can be useful for a variety of reasons. Masked loads and stores allow vectorized loop bodies to operate on data that does not align to the width of the vector operation. General operation masking allows vector operations to encode control flow without branching instructions. This is an appropriate use case when branches are shallow (e.g. a few operations) rather than significantly deep.

interface Mask<S extends Shape<Vector<?, S>>> {
    int length();
    long toLong();
    boolean[] toArray();
    <E> Vector<E, S> toVector(Class<E> type);
    boolean getElement(int i);
}

The above Mask interface is an abstracted notion of a mask register that contains a series of packed bits. Assuming a maximum bit width of 64 bits, this interface supports a maximum of 64 elements. If one assumes bytes to be the smallest element type, this implies a 512-bit vector. Such vector widths do exist in the wild and, as time progresses, they will only get wider. If one requires masking for more than 64 lanes, the interface can accommodate such use via getElement(int i).

In the absence of a proper masked operation on the local architecture we can degrade by "splitting" operations into two different registers and blending the results back together according to the Vector API mask. This is an alternative to a simple branching operation.
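
A scalar sketch of that split-and-blend degradation, using only the straw-man methods shown above (add, getElement, putElement, and Mask.getElement), might look like the following; a real implementation would use native blend instructions where available.

Vector<Integer, Size.S256Bit> blend(Vector<Integer, Size.S256Bit> a,
                                    Vector<Integer, Size.S256Bit> b,
                                    Mask<Size.S256Bit> m) {
    Vector<Integer, Size.S256Bit> full = a.add(b); // unmasked result
    Vector<Integer, Size.S256Bit> out  = a;        // lanes left untouched keep a's values
    for (int i = 0; i < m.length(); i++) {
        if (m.getElement(i)) {
            out = out.putElement(i, full.getElement(i)); // blend masked lanes back in
        }
    }
    return out;
}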

Implementation

The Vector API outlined in this document stands as an abstraction layer over an underlying implementation. We utilize factories to encapsulate the generation of Vector objects. A draft implementation of the Vector API currently exists in Project Panama where the Vector API encapsulates an enhancement to Hotspot and the JDK called Machine Code Snippets. Machine Code Snippets ("Code Snippets") are an addition to the JVM that allow developers to specify machine-code level intrinsics in the JDK instead of adding them in Hotspot (specified in C++). The Vector API is currently implemented on top of Code Snippets, but we could substitute this layer of the implementation out for another compiler-interface with a similar set of functions. This may be the case with future iterations of Hotspot or with facilities provided by the Graal compiler.

Data Types

Code Snippets introduces three new value-based classes: Long2, Long4, and Long8. These act as data types for 128-bit, 256-bit, and 512-bit vectors, respectively. These Code Snippets value classes are intended to be identity-less (akin to value types) to maximize their susceptibility to JIT compiler optimizations, namely escape analysis. These types are not intended to be exposed to the public. The implementation currently binds a Code Snippet value class to a Vector API factory object as a field in that object. We note that this approach isn't totally satisfactory yet: a one-to-one pairing of a value class with a class that has an identity creates a de facto identity for the identity-less class. One proposed solution to this problem is to make Vector API factory objects identity-less as well. Introducing an identity-less class to the user may also introduce additional adoption pain. Identity-less (value-based) classes do already appear in the JDK.

The documentation strongly recommends that users do not rely on referential equality with instances of these classes because they are slated to become future value types. This is a maneuver that we can run with Vector if need be. If Vector API objects were to be made identity-less in this way, the documentation would have to clearly detail the presence of this characteristic as well as provide a means to give back the missing functionality as it might be needed for interoperability with existing libraries (though at this point it's not clear what the use case would be that would demand this kind of equality for Vectors). This requires further investigation, but may be a potential application for heisenboxes.

Specifically, the border at which the Vector objects must become identity-full instead of identity-less should be explored more than it currently has been. There may be potential automatic solutions in this realm or automatic solutions that work with a modest amount of developer intervention (annotation-driven).

Code Snippets

Machine Code Snippets is an experimental feature added to the Panama Project to support the introduction of native machine code at the Java level. Previously, intrinsic operations were only introducible by way of extending the JVM runtime. This requires a working knowledge of JVM internals, which can be a substantial, if orthogonal, body of knowledge to acquire in order to add a feature to the JDK that is otherwise unrelated to compilers. Code Snippets provides a way for a JDK developer with machine-level knowledge to add machine code for his or her supported architecture. An example from the Panama Project follows.

static final MethodHandle mov256MH = MachineCodeSnippet.make("move256",
        MethodType.methodType(void.class,                  // return type
                Object.class /*rdi*/, long.class /*rsi*/,  // src
                Object.class /*rdx*/, long.class /*rcx*/), // dst
        effects(READ_MEMORY, WRITE_MEMORY), // RW
        requires(AVX),
        0xC4, 0xE1, 0x7E, 0x6F, 0x04, 0x37,  // vmovdqu ymm0,[rsi+rdi]
        0xC4, 0xE1, 0x7E, 0x7F, 0x04, 0x0A); // vmovdqu [rdx+rcx],ymm0

/*
 # {method} {0x115c2f880} 'move256' '(Ljava/lang/Object;JLjava/lang/Object;J)V'
 # parm0:    rsi:rsi   = 'java/lang/Object'
 # parm1:    rdx:rdx   = long
 # parm2:    rcx:rcx   = 'java/lang/Object'
 # parm3:    r8:r8     = long
 #           [sp+0x20]  (sp of caller)

 0x1051bd560: mov    %eax,-0x16000(%rsp)
 0x1051bd567: push   %rbp
 0x1051bd568: sub    $0x10,%rsp

 0x1051bd56c: mov    %rsi,%rdi
 0x1051bd56f: mov    %rdx,%rsi
 0x1051bd572: mov    %rcx,%rdx
 0x1051bd575: mov    %r8,%rcx

 0x1051bd578: vmovdqu (%rdi,%rsi,1),%ymm0
 0x1051bd57e: vmovdqu %ymm0,(%rdx,%rcx,1)

 0x1051bd584: add    $0x10,%rsp
 0x1051bd588: pop    %rbp
 0x1051bd589: test   %eax,-0x4d3d58f(%rip)

 0x1051bd58f: retq
 */

The Code Snippets library provides facilities to add and bind new snippets: it accepts the characteristics of a snippet and its code and returns a MethodHandle bound to that snippet. The user describes the code snippet by its Java type, its effects on memory, and a predicate that checks for the required native support before the snippet is bound (preventing execution of the code should the predicate be false). An additional form of the make() method provides facilities for patching an existing piece of code with registers provided by the register allocator at run time. An example follows.

static final MethodType MT_L2_BINARY =
    MethodType.methodType(Long2.class, Long2.class, Long2.class);

private static final MethodHandle MHm128_vpadd_epi32 = MachineCodeSnippet.make(
        "m128_add_epi32", MT_L2_BINARY, requires(AVX),
        new Register[][]{xmmRegistersSSE, xmmRegistersSSE, xmmRegistersSSE},
        (Register[] regs) -> {
            // VEX.NDS.128.66.0F.WIG FE /r VPADDD xmm1, xmm2, xmm3/m128
            Register out = regs[0];
            Register in1 = regs[1];
            Register in2 = regs[2];

            int[] vex = vex2(1, in2.encoding(), 0, 1);
            return new int[]{
                    vex[0], vex[1],
                    0xFE,
                    modRM(out, in1)};
        });

In the above, the user provides an array of arrays of Register that can be of length 1 to n, where a length of 1 pins a single register and n provides a set of acceptable registers for that position. The order of the register arrays corresponds to the order of parameters in the MethodType factory instantiation. The first entry is the register in which the return value will reside; the following entries hold the parameters in their order of appearance. This approach allows the JIT to dynamically reconfigure the code in a way that minimizes register pressure.

Part of what makes Code Snippets appealing for Hotspot implementers is that it has a low-touch approach to interfacing with code generation. Code Snippets does not expect any compiler transformations to be performed aside from runtime register allocation. Essentially, "what you call is what you generate."

The Importance of Being Encapsulated

Code Snippets is an unsafe library. No bounds checking or (pre/post) loop alignment checking is performed. The purpose of this library is similar to that of Unsafe: it is a JDK implementer's tool. Bounds checking and related safety features must be introduced at the JDK library level. Module systems like the one introduced in Project Jigsaw must be employed to limit access to these tools. Arbitrary code execution is a textbook security vulnerability, and this library would provide a low-cost attack vector if simply made public. Therefore, its features should be constrained at the library level and hidden from public use.

Alternatives

There exist a number of direct and indirect or complementary approaches to introducing vector primitives to Java. These include both related solutions (derivable from the work in this JEP) and out-of-band solutions (each requiring its own JEP).

MethodHandles-first

The Vector API proposes a traditional, factory-based approach to delivering low-cost vectors to the user. This approach builds upon, and assumes the presence of, Code Snippets. One direct alternative to the Vector API would be to expose Code Snippets directly in some limited form. Currently, Code Snippets brings in additional data types that are required for supporting vectors. These types (called Long2, Long4, Long8) are aligned with common vector lengths across architectures (128, 256, 512 bits, respectively) and are intended to be identity-less and unboxed by the compiler. If the Vector API is to be realized in the form described in this JEP, all of the Code Snippets infrastructure is assumed to be in place. An alternative to the full API would be to simply expose the primitives implemented in Code Snippets along with the data types that it introduces. For example, we could standardize the exposed functionality as a dictionary of operations mapped to MethodHandle objects that bind to supported Code Snippets.

EnumMap<VectorOp,MethodHandle> methods512 = get512Methods();
MethodHandle add = methods512.get(VectorOp.ADD);
Long8 result = (Long8) add.invokeExact(someOp1, someOp2);

Such a design approach would require a way for a user to ascertain at runtime which vector operations are available. This could entail the use of a dictionary where the operations are encapsulated in an Optional. A user could then determine their own path of degradation instead of having one imposed upon them by a higher-level API. The drawback to this approach is that it requires the developer to understand MethodHandles and related combinators as a prerequisite to using vector operations in Java. While the MethodHandles API is a very powerful tool, this requirement seems out of band for using vector operations. Devolving the Vector API to one based on MethodHandles also erodes the static type checks that we would be able to bake into the full-fledged Vector API, and reducing the power of static checks to find bugs detracts from the utility of the API.
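
One hedged shape such a capability query could take follows; VectorOp, get512Methods(), and the scalar fallback handle are hypothetical, as above.

// Query the dictionary; absence means the operation is unsupported on this platform.
Optional<MethodHandle> maybeAdd =
        Optional.ofNullable(get512Methods().get(VectorOp.ADD));

// The user chooses the degradation path: use the snippet-backed handle if present,
// otherwise fall back to a scalar MethodHandle of the same type (hypothetical).
MethodHandle add = maybeAdd.orElse(scalarAdd512);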

MethodHandles and Abbreviated Value Types

Recent developments in Project Valhalla have led to a proposal for an abbreviated implementation of value types (referred to as Q-types in the accompanying proposal). These Q-types would be value-based and thus would not be susceptible to the boxing overhead that the reference (L-type) types suffer from in this space. The early release of Q-types would likely not be supported directly in Java syntax. Instead, the proposal outlines a factory model that produces MethodHandles to provide functionality that is normally taken for granted (new, field getters and setters, substitutable equality, and clone). Given that this value types proposal is shaped to provide a faster "time-to-market" for value types, it would stand to reason that the Vector API should do the same. The Vector API could expose available vector operations behind a factory that produces MethodHandles for the canonical vector operations described in this document. Vector types would be observed on MethodHandles that encapsulate vector operations, but these types would be opaque and would not be directly operable.

Additional support for this approach would focus on combinators for kernels that produce and consume the opaque types that user-defined kernels operate on. An expression-based domain-specific language may also be appropriate to help accelerate kernel construction, as it could be consumed by a builder that composes MethodHandles automatically. Combinators that produce and consume results from a kernel would look something like reductions from many data sources (primitive arrays, ByteBuffers) to a single sink (array, ByteBuffer) or a single scalar value. Ideally, these combinators would accept a kernel to produce a closed, vectorized, custom operation over data sources and sinks. These combinators could be presented in a factory pattern similar to that of the vector-op MethodHandles if segmenting them by size is deemed necessary. An example follows.

float[] a,b,c,res;
...
// Sources data from N-arrays for an N-argument MethodHandle.
// First array is the result array.
// Assumes a simple traversal of same-sized arrays.
MethodHandle reduceFloatArrays(MethodHandle reducer){...}

EnumMap<VectorOp,MethodHandle> methods512 = get512FloatMethods();

// (QLong8,QLong8)QLong8
MethodHandle add = methods512.get(VectorOp.ADD);

// (QLong8,QLong8)QLong8
MethodHandle mul = methods512.get(VectorOp.MUL);

// (QLong8,QLong8,QLong8)QLong8
MethodHandle fma = MethodHandles.collectArguments(add,1,mul);

// The Specialized Loop, which accepts float[] arguments
// to match the shape of the kernel.
MethodHandle specLoop = reduceFloatArrays(fma);

specLoop.invokeExact(res,a,b,c);

Better Auto-vectorization

Another avenue to pursue would be enhancements to the auto-vectorizer in the JIT compiler. The specifics of what this would entail are outside the scope of this JEP. Suffice it to say that academic work in this area has progressed since the introduction of superword optimization. The outlook for the upside from such enhancements is, at the time of this writing, unclear.

Expression Trees

Part of the straw-man Vector API proposal includes provisions for higher order functionality in the Vector API. This part of the proposal significantly expands the expressiveness of the API, but comes with some important caveats. Lambdas specified against the functional interfaces defined in java.util.function have no introspective features. That is, an inbound BinaryOperator isn't "crackable". Ordinarily this would not pose a problem for an API, but in the case of the Vector API, the sensible approach to higher order functionality calls for lambdas to be defined over the element type of a vector, not the vector type itself. Without introspection into the definition of the lambda, we are unable to determine its semantics for vectorization.

An alternative, and perhaps complementary, proposal that addresses this shortcoming is one that makes Java expressions (at a minimum; this could also include statement-level constructs) explicitly encodable for reinterpretation at runtime. Talks on the Vector API at JVMLS and JavaOne have touched on this proposal. In essence, Expression Trees seek to make the body of a loop (presented in a lambda) explicit so it can be customized by a library and compiler at runtime. This overlaps with a problem described by Cliff Click called The Inlining Problem. Expression tree libraries and expression tree reification at runtime are functionality observed in other managed languages, namely C#. It is not clear that the problem expression trees solve with regard to the Vector API warrants the introduction of such a feature to the JDK by itself. Alternatively, we could introduce a limited object-based embedded expression language for programmers to explicitly encode expressions and include it in a Vector API release, though this seems like it could introduce a future redundancy. Another possibility is providing specialized higher order functions from an existing library and throwing out the ability to define custom lambdas altogether. This approach can cover a lot of ground, but it creates technical liability for maintainers and could result in unsatisfying shortcomings in feature coverage for developers.
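
For concreteness, a purely illustrative sketch of what such an object-based expression encoding might look like follows; none of these classes exist in the draft API.

abstract class Expr { }
final class Lane  extends Expr { }               // stands for the current lane's value
final class Const extends Expr { final double v; Const(double v) { this.v = v; } }
final class Add   extends Expr { final Expr l, r; Add(Expr l, Expr r) { this.l = l; this.r = r; } }
final class Mul   extends Expr { final Expr l, r; Mul(Expr l, Expr r) { this.l = l; this.r = r; } }

// 2.0 * lane + 1.0, encoded as data a library could inspect and vectorize,
// unlike an opaque lambda body.
Expr kernel = new Add(new Mul(new Const(2.0), new Lane()), new Const(1.0));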

Expression trees as an alternative to the Vector API would require a method to explicitly reify expressions and traverse them for recompilation by a user-defined expression visitor at runtime. Users would provide a loop kernel in the form of a lambda whose functional interface would be provided a priori by the library. At present, some of this functionality can be accomplished with ASM and serialized lambdas. This approach requires visitor implementations to provide significant coverage of the JVM bytecode specification in order to function well. This seems like another unnecessary knowledge burden simply for better vectorization and loop customization. Moreover, there is a significant amount of a lambda body that the Vector API shouldn't be expected to support, including deep, diverging control-flow structures and loops. Trying to reconstruct the semantics of a loop body from bytecode, which can contain so many undesired constructs, seems like an unfruitful general approach.

As of the writing of this JEP, expression trees are being held as a separate item for additional study and possible proposal on a future JEP.

Testing

The basic sanity check of a Vector API operation is that it is semantically equivalent to a scalar, loop-based construction based on the same operation.

// v.length == 8
int offset = <some offset>;
Vector<Integer,Size.S256Bit> v   = Vector.fromIntArray(data, offset);
Vector<Integer,Size.S256Bit> v2  = Vector.fromIntArray(data2, offset);
Vector<Integer,Size.S256Bit> res = v.add(v2);
res.intoIntArray(output,offset);

It's equivalent to, in effect:

for (int i = 0; i < 8; i++) {
    output[offset + i] = data[offset + i] + data2[offset + i];
}

While the Vector API objects clearly affect the state of vector registers on a given machine, they otherwise have little interaction with the existing JVM environment save for methods that read and write to on-heap locations. In the above example of a test case showing the equivalence of an iterative version against a Vector API implementation, one can see the emergence of the "loop body specification" occurring in the Vector API and how it relates to a more traditional loop. These are the semantics of the API that we use to test each operator for basic correctness.
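
Concretely, a correctness test for add could compare the two paths lane for lane; a hedged sketch (using the same assumed names as the example above, plus java.util.Arrays) follows.

static void checkAddEquivalence(int[] data, int[] data2, int offset) {
    int[] vectorOut = new int[data.length];
    int[] scalarOut = new int[data.length];

    // Vector path
    Vector<Integer, Size.S256Bit> v  = Vector.fromIntArray(data, offset);
    Vector<Integer, Size.S256Bit> v2 = Vector.fromIntArray(data2, offset);
    v.add(v2).intoIntArray(vectorOut, offset);

    // Scalar reference path
    for (int i = 0; i < 8; i++) {
        scalarOut[offset + i] = data[offset + i] + data2[offset + i];
    }

    assert Arrays.equals(vectorOut, scalarOut);
}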

Scalar implementations

As a baseline for testing and as a pure fallback, we must provide a set of Vector factory objects that are implemented in a classic, scalar fashion without any dependence upon Code Snippets or any other underlying implementation. These implementations would serve dual purposes. First, the classes would be our baseline implementation for correctness: any native-accelerated Vector object should be tested against these baseline classes for functional correctness. Second, these baseline classes serve as a fallback in the event that the Vector API classes are not otherwise supported by a native-accelerated implementation. The baseline classes should be structured in a way that makes them amenable to the auto-vectorization that already exists in JDK implementations.
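
A hedged sketch of what such a baseline class could look like follows; the class name and lane count are illustrative only.

final class ScalarIntVector256 {
    private final int[] lanes = new int[8];   // 8 x 32-bit lanes = 256 bits

    ScalarIntVector256 add(ScalarIntVector256 other) {
        ScalarIntVector256 result = new ScalarIntVector256();
        for (int i = 0; i < lanes.length; i++) {
            // A simple counted loop over primitive arrays: exactly the shape
            // the existing auto-vectorizer already recognizes.
            result.lanes[i] = this.lanes[i] + other.lanes[i];
        }
        return result;
    }
}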

Risks and Assumptions

The current specification of the Vector API implies some heavy lifting under the hood for it to be efficient. Vector instantiation is hidden behind various factory methods, but regular operations still imply the creation of objects. We assume the introduction of a mechanism, or a set of mechanisms, to ameliorate this implied overhead. These could include an enhanced form of escape analysis on Vector objects, the introduction of value types, and/or the introduction of identity-less classes (heisenboxes) that give the compiler more leeway to dispense with objects of that class. Without such enhancements to the VM, this overhead will make the Vector API prohibitively slow to use.

The Vector API assumes the introduction of a mechanism for specifying machine code snippets at the Java level. As an alternative to the existing facilities for introducing intrinsics, the Code Snippets framework is a tool for implementers to add intrinsics at the JDK level instead of placing additional pressure on the compiler (i.e. Hotspot) codebase itself. This comes with some specific advantages. Lifting intrinsics to the Java level heads off technical debt that we might otherwise incur as we move forward to Graal: future JVM implementations need only support the Code Snippets implementation instead of supporting N vector operations multiplied by M vector types (an NxM problem). Current prototypes of this framework in x86 Hotspot have shown promising characteristics for performance and code quality (inlining, etc.). However, this does incur the cost of an additional VM feature to support across platforms. This approach also introduces a paradigm by which we execute literal data encoded into a Java class, which would seem to have security implications. We counter these concerns by noting that this framework will be used as an implementers-only tool and will arrive well after the introduction of Project Jigsaw and the module system in the JDK. We will make use of the module system to wall in this framework so that it will not be available for general consumption.