Instruction execution

One distinguishing feature of Jolt among zkVM architectures is how it handles instruction execution, i.e. proving the correct input/output behavior of every RISC-V instruction in the trace. This is primarily achieved through the Shout lookup argument.

Large Lookup Tables and Prefix-Suffix Sumcheck

The Shout instance for instruction execution must query a massive lookup table -- effectively of size $2^{64}$, since the lookup query is constructed from two 32-bit operands. This $K \gg T$ parameter regime is discussed in the Twist and Shout paper (Section 7), which proposes the use of sparse-dense sumcheck. However, upon implementation it became clear that sparse-dense sumcheck did not generalize well to the full RISC-V instruction set.

Instead, Jolt introduces a new algorithm: prefix-suffix sumcheck, described in the appendix of Proving CPU Executions in Small Space. Like the sparse-dense algorithm, it requires some structure in the lookup table:

The lookup table must have an MLE that is efficiently evaluable by the verifier. The JoltLookupTable trait encapsulates this MLE.
The lookup index can be split into a prefix and suffix, such that MLEs can be evaluated independently on the two parts and then recombined to obtain the desired lookup entry.
Every prefix/suffix MLE is efficiently evaluable (constant time) on Boolean inputs.

The prefix-suffix sumcheck algorithm can be conceptualized as a careful application of the distributive law to reduce the number of multiplications required for a sumcheck that would otherwise be intractably large.

To unpack this, consider the read-checking sumcheck for Shout, as presented in the paper.

$rv(r_{cycle}) = \sum_{k = (k_{1}, \dots, k_{d}) \in (\{0, 1\}^{\log(K)/d})^{d}, \; j \in \{0, 1\}^{\log(T)}} eq(r_{cycle}, j) \cdot \left( \prod_{i = 1}^{d} ra_{i}(k_{i}, j) \right) \cdot Val(k)$

Naively, this would require $\Theta(dKT)$ multiplications, which is far too large given $d = 8$ and $K = 2^{64}$. But suppose $Val$ has prefix-suffix structure. The key intuition of prefix-suffix structure is captured by the following equation:

$Val(k_{prefix}, k_{suffix}) = \sum_{(prefix, suffix) \in decompose(Val)} prefix(k_{prefix}) \cdot suffix(k_{suffix})$

You can think of $k_{prefix}$ and $k_{suffix}$ as the high-order and low-order "bits" of $k$, respectively, obtained by splitting $k$ at some partition index. The PrefixSuffixDecomposition trait specifies which prefix/suffix MLEs to evaluate and how to combine them.

We will split $k$ four times at four different indices, and these will induce the four phases of the prefix-suffix sumcheck. Each of the four phases encompasses 16 rounds of sumcheck (i.e. 16 of the address variables $k$), so together they comprise the first $\log K = 64$ rounds of the read-checking sumcheck.
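The bookkeeping behind the four splits can be sketched on Boolean indices as follows. This is an illustrative Python sketch, not Jolt's Rust code: the helper name `split_for_phase` and the most-significant-bit-first ordering are assumptions made for the example, and in the actual sumcheck the already-bound part holds field elements rather than bits.

```python
LOG_K = 64                        # number of address variables
PHASE_BITS = 16                   # variables bound per phase
NUM_PHASES = LOG_K // PHASE_BITS  # four phases

def split_for_phase(k, phase):
    """Split a 64-bit index k for the given phase (0-based).

    Returns (bound, prefix, suffix):
      bound  - high-order bits consumed by earlier phases,
      prefix - the 16 bits handled during this phase,
      suffix - the remaining low-order bits.
    """
    suffix_len = LOG_K - PHASE_BITS * (phase + 1)
    suffix = k & ((1 << suffix_len) - 1)
    prefix = (k >> suffix_len) & ((1 << PHASE_BITS) - 1)
    bound = k >> (suffix_len + PHASE_BITS)
    return bound, prefix, suffix

k = 0x0123_4567_89AB_CDEF
for phase in range(NUM_PHASES):
    bound, prefix, suffix = split_for_phase(k, phase)
    suffix_len = LOG_K - PHASE_BITS * (phase + 1)
    print(f"phase {phase + 1}: prefix={prefix:#06x}, suffix has {suffix_len} bits")
```

The suffix shrinks by 16 bits per phase (48, 32, 16, 0), which matches the exponents that appear in the sumcheck expressions for each phase.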

Given our prefix-suffix decomposition of $Val$ , we can rewrite our read-checking sumcheck as follows:

$rv(r_{cycle}) = \sum_{k_{prefix} \in \{0, 1\}^{16}, \; k_{suffix} \in \{0, 1\}^{48}, \; j \in \{0, 1\}^{\log(T)}} eq(r_{cycle}, j) \cdot ra(k_{prefix}, k_{suffix}, j) \cdot \sum_{(prefix, suffix)} prefix(k_{prefix}) \cdot suffix(k_{suffix})$

Note that we have replaced $\prod_{i = 1}^{d} ra_{i}(k_{i}, j)$ with just $ra$. Since $\prod_{i = 1}^{d} ra_{i}(k_{i}, j)$ is degree 1 in each $k$ variable, we will treat it as a single multilinear polynomial while we're binding those variables (the first $\log K$ rounds). The equation as written above depicts the first phase, where $k_{prefix}$ is the first 16 variables of $k$, and $k_{suffix}$ is the last 48 variables of $k$.

Rearranging the terms in the sum, we have:

$rv(r_{cycle}) = \sum_{k_{prefix} \in \{0, 1\}^{16}} \sum_{(prefix, suffix)} prefix(k_{prefix}) \cdot \left( \sum_{k_{suffix} \in \{0, 1\}^{48}, \; j \in \{0, 1\}^{\log T}} ra(k_{prefix}, k_{suffix}, j) \cdot eq(r_{cycle}, j) \cdot suffix(k_{suffix}) \right)$

Note that the summand is degree 2 in $k_{prefix}$, the variables being bound in the first phase:

$k_{prefix}$ appears in $prefix(k_{prefix})$
$k_{prefix}$ also appears in the parenthesized expression, in $ra(k_{prefix}, k_{suffix}, j)$

Written in this way, it becomes clear that we can treat the first 16 rounds as a mini-sumcheck over just the 16 $k_{prefix}$ variables, and with just multilinear terms. If we can efficiently compute the $2^{16}$ coefficients of each multilinear term, the rest of this mini-sumcheck is efficient. Each evaluation of $prefix(k_{prefix})$ can be computed in constant time, so that leaves the parenthesized term:

$\sum_{k_{suffix} \in \{0, 1\}^{48}, \; j \in \{0, 1\}^{\log T}} ra(k_{prefix}, k_{suffix}, j) \cdot eq(r_{cycle}, j) \cdot suffix(k_{suffix})$

This can be computed in $\Theta(T)$: since $ra$ is one-hot, we can do a single iteration over $j \in \{0, 1\}^{\log T}$ and only compute the terms of the sum where $ra(k_{prefix}, k_{suffix}, j) = 1$. We compute a table of $eq(r_{cycle}, j)$ evaluations a priori, and $suffix(k_{suffix})$ can be evaluated in constant time on Boolean inputs.
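To make the single-pass computation concrete, here is a minimal Python sketch on toy parameters (8 address bits split 4/4, four cycles). It is not Jolt's Rust implementation: the field modulus, the toy table $Val(k) = k$, its two-term decomposition, and all function names are assumptions chosen for illustration. It builds the $eq$ table, accumulates the parenthesized term in one pass over the cycles, and checks that the rearranged sum matches the direct evaluation; it does not run the sumcheck rounds themselves.

```python
P = 2**61 - 1  # toy prime modulus (assumption; not the field Jolt uses)

def eq_table(r):
    """Evaluations of eq(r, j) for all j in {0,1}^len(r), first variable = high bit of j."""
    table = [1]
    for r_i in r:
        table = [t * f % P for t in table for f in ((1 - r_i) % P, r_i)]
    return table

PREFIX_BITS, SUFFIX_BITS = 4, 4

# Toy table Val(k) = k with the decomposition Val(kp, ks) = (16 * kp) * 1 + 1 * ks.
DECOMPOSITION = [
    (lambda kp: kp << SUFFIX_BITS, lambda ks: 1),
    (lambda kp: 1, lambda ks: ks),
]

def suffix_sums(lookups, eq_evals):
    """One pass over the cycles. For each (prefix, suffix) pair s:
    q[s][kp] = sum of eq(r, j) * suffix_s(ks_j) over cycles j whose index has prefix kp."""
    q = [[0] * (1 << PREFIX_BITS) for _ in DECOMPOSITION]
    for j, k in enumerate(lookups):
        kp, ks = k >> SUFFIX_BITS, k & ((1 << SUFFIX_BITS) - 1)
        for s, (_, suffix) in enumerate(DECOMPOSITION):
            q[s][kp] = (q[s][kp] + eq_evals[j] * suffix(ks)) % P
    return q

def rv_via_prefix_suffix(lookups, r_cycle):
    """Outer sum over k_prefix of prefix(k_prefix) times the precomputed inner sums."""
    q = suffix_sums(lookups, eq_table(r_cycle))
    total = 0
    for kp in range(1 << PREFIX_BITS):
        for s, (prefix, _) in enumerate(DECOMPOSITION):
            total = (total + prefix(kp) * q[s][kp]) % P
    return total

def rv_direct(lookups, r_cycle):
    """Reference value: sum_j eq(r, j) * Val(k_j), using that ra is one-hot."""
    return sum(e * k for e, k in zip(eq_table(r_cycle), lookups)) % P

lookups = [0x3A, 0x07, 0xF0, 0x3A]   # lookup index per cycle (T = 4)
r_cycle = [123456789, 987654321]     # stand-in for verifier randomness
assert rv_via_prefix_suffix(lookups, r_cycle) == rv_direct(lookups, r_cycle)
```

The pass in `suffix_sums` touches each cycle once per decomposition term, so its cost scales with $T$ rather than with the table size; only the outer loop depends on the number of prefixes.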

After the first phase, the high-order 16 variables of $k$ will have been bound. We will need to use a new sumcheck expression for the next phase:

$rv(r_{cycle}) = \sum_{k_{prefix} \in \{0, 1\}^{16}} \sum_{(prefix, suffix)} prefix(r^{(1)}, k_{prefix}) \cdot \left( \sum_{k_{suffix} \in \{0, 1\}^{32}, \; j \in \{0, 1\}^{\log T}} ra(r^{(1)}, k_{prefix}, k_{suffix}, j) \cdot eq(r_{cycle}, j) \cdot suffix(k_{suffix}) \right)$

Now $r^{(1)} \in \mathbb{F}^{16}$ are the random values that the first 16 variables were bound to, and $k_{prefix}$ are the next 16 variables of $k$. Meanwhile, $k_{suffix}$ now represents the last 32 variables of $k$.

This complicates things slightly, but the algorithm follows the same blueprint as in phase 1. This is a sumcheck over the 16 $k_{prefix}$ variables, and there are two multilinear terms. We can still compute each evaluation of $prefix(r^{(1)}, k_{prefix})$ in constant time, and we can still compute the parenthesized term in $\Theta(T)$ time (observe that there is exactly one non-zero coefficient of $ra(r^{(1)}, k_{prefix}, k_{suffix}, j)$ per cycle $j$).

After the first $\log K$ rounds of sumcheck, we are left with:

$\sum_{j \in \{0, 1\}^{\log(T)}} eq(r_{cycle}, j) \cdot ra(r_{address}, j) \cdot Val(r_{address})$

which we prove using the standard linear-time sumcheck algorithm. Note that $ra (r_{address}, j)$ here is a virtual polynomial.

Prefix and Suffix Implementations

Jolt modularizes prefix/suffix decomposition using two traits:

SparseDensePrefix (under prefixes/)
SparseDenseSuffix (under suffixes/)

Each prefix/suffix used in a lookup table implements these traits.

Multiplexing Between Instructions

An execution trace contains many different RISC-V instructions. Note that there is a many-to-one relationship between instructions and lookup tables -- multiple instructions may share a lookup table (e.g., XOR and XORI). To manage this, Jolt uses the InstructionLookupTable trait, whose lookup_table method returns an instruction's associated lookup table, if it has one (some instructions do not require a lookup).

Boolean lookup table flags indicate which table is active on a given cycle. At most one flag is set per cycle. These flags allow us to "multiplex" between all of the lookup tables:

$\sum_{\ell} flag_{\ell}(j) \cdot Val_{\ell}(k)$

where the sum is over all lookup tables $\ell$.

At most one table's flag $flag_{\ell}(j)$ is 1 at any given cycle $j$, so only that table's $Val_{\ell}(k)$ contributes to the sum.

This term becomes a drop-in replacement for $Val$ as it appears in the Shout read-checking sumcheck:

$rv(r_{cycle}) = \sum_{k = (k_{1}, \dots, k_{d}) \in (\{0, 1\}^{\log(K)/d})^{d}, \; j \in \{0, 1\}^{\log(T)}} eq(r_{cycle}, j) \cdot \left( \prod_{i = 1}^{d} ra_{i}(k_{i}, j) \right) \cdot \left( \sum_{\ell} flag_{\ell}(j) \cdot Val_{\ell}(k) \right)$

Note that each $Val_{ℓ}$ here has prefix-suffix structure.
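The multiplexed value for a single cycle can be sketched as follows. This is a toy Python illustration: the table names, the dictionary of tables, and the helper functions are hypothetical and do not mirror Jolt's InstructionLookupTable trait; tables are indexed directly by the operand pair for readability.

```python
# Hypothetical toy tables, each taking the two operands of a cycle.
TABLES = {
    "xor": lambda x, y: x ^ y,
    "and": lambda x, y: x & y,
    "or":  lambda x, y: x | y,
}

def flags_for(table_name):
    """Boolean flags for one cycle; pass None for a cycle with no lookup."""
    return {name: int(name == table_name) for name in TABLES}

def multiplexed_val(flags, x, y):
    """sum over tables l of flag_l(j) * Val_l(k), for a single cycle j."""
    assert sum(flags.values()) <= 1, "at most one table is active per cycle"
    return sum(flags[name] * table(x, y) for name, table in TABLES.items())

print(multiplexed_val(flags_for("xor"), 0b1100, 0b1010))  # XOR table selected
print(multiplexed_val(flags_for(None), 0b1100, 0b1010))   # no flag set, sum is 0
```

Because every flag is 0 or 1 and at most one is set, the sum selects a single table's entry without any branching, which is what lets it stand in for $Val$ inside the sumcheck.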

raf Evaluation

The $raf$-evaluation sumcheck in Jolt deviates from the description in the Twist/Shout paper. This is mostly an artifact of the prefix-suffix sumcheck, which imposes some required structure on the lookup index (i.e. the "address" variables $k$).

Case 1: Interleaved operands

Consider, for example, the lookup table for XOR x y. Intuitively, the lookup index must be crafted from the bits of x and y. A first attempt might be to simply concatenate the bits of x and y, i.e.:

$(k_{1}, k_{2}, \dots, k_{64}) = (x_{1}, x_{2}, \dots, x_{32}, y_{1}, y_{2}, \dots, y_{32})$

Unfortunately, there is no apparent way for this formulation to satisfy prefix-suffix structure. Instead we will interleave the bits of x and y, i.e.

$(k_{1}, k_{2}, \dots, k_{64}) = (x_{1}, y_{1}, x_{2}, y_{2}, \dots, x_{32}, y_{32})$

With this formulation, the prefix-suffix structure is easily apparent. Suppose the prefix-suffix split index is 16, so:

$k_{prefix} = (x_{1}, y_{1}, x_{2}, y_{2}, \dots, x_{8}, y_{8})$

$k_{suffix} = (x_{9}, y_{9}, x_{10}, y_{10}, \dots, x_{32}, y_{32})$

Then XOR x y has the following prefix-suffix decomposition:

$Val_{XOR}(k_{prefix}, k_{suffix}) = prefix_{XOR}(k_{prefix}) + suffix_{XOR}(k_{suffix})$

$prefix_{XOR}(k_{prefix}) = 2^{31} \cdot (x_{1} + y_{1} - 2 x_{1} y_{1}) + 2^{30} \cdot (x_{2} + y_{2} - 2 x_{2} y_{2}) + \dots + 2^{24} \cdot (x_{8} + y_{8} - 2 x_{8} y_{8})$

$suffix_{XOR}(k_{suffix}) = 2^{23} \cdot (x_{9} + y_{9} - 2 x_{9} y_{9}) + 2^{22} \cdot (x_{10} + y_{10} - 2 x_{10} y_{10}) + \dots + 2^{0} \cdot (x_{32} + y_{32} - 2 x_{32} y_{32})$

By inspection, $prefix_{XOR} (k_{prefix})$ effectively computes the 8-bit XOR of the high-order bits of x and y, while $suffix_{XOR}$ computes the 24-bit XOR of the low-order bits of x and y. Then the full result XOR x y is obtained by concatenating (adding) the two results.
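The interleaving and the XOR decomposition can be checked on concrete operands with a short Python sketch. The function names and the integer encoding (the pair $(x_{1}, y_{1})$ stored in the two highest bits of the index) are assumptions made for this example, not Jolt's code.

```python
W = 32  # operand width in bits

def interleave(x, y):
    """Build the 64-bit index (x_1, y_1, x_2, y_2, ..., x_32, y_32), most significant pair first."""
    k = 0
    for i in reversed(range(W)):  # i = 31 is the most significant operand bit
        k = (k << 2) | (((x >> i) & 1) << 1) | ((y >> i) & 1)
    return k

def xor_of_interleaved(bits, n_pairs):
    """XOR of two operands packed as n_pairs interleaved (x, y) bit pairs."""
    out = 0
    for i in reversed(range(n_pairs)):
        x_bit = (bits >> (2 * i + 1)) & 1
        y_bit = (bits >> (2 * i)) & 1
        out = (out << 1) | (x_bit + y_bit - 2 * x_bit * y_bit)  # XOR as a polynomial
    return out

def val_xor(k, split=16):
    """Val_XOR(k) computed as prefix_XOR(k_prefix) + suffix_XOR(k_suffix)."""
    suffix_len = 2 * W - split  # for split=16: 48 index bits, i.e. 24 bit pairs
    k_prefix, k_suffix = k >> suffix_len, k & ((1 << suffix_len) - 1)
    prefix = xor_of_interleaved(k_prefix, split // 2) << (suffix_len // 2)
    suffix = xor_of_interleaved(k_suffix, suffix_len // 2)
    return prefix + suffix

x, y = 0xDEADBEEF, 0x12345678
assert val_xor(interleave(x, y)) == x ^ y
```

The same check passes for the other split points (32 and 48), since each split keeps every $(x_{i}, y_{i})$ pair on one side of the partition. With concatenated operands, a prefix would contain bits of x only, and no per-bit XOR could be computed from it.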

Now that we've confirmed that we have something with prefix-suffix structure, we can write down the $raf$-evaluation sumcheck expression. Instead of a single $raf$ polynomial, here we have two lookup operands x and y, which are called LeftLookupOperand and RightLookupOperand in code. These are the values that appear in Jolt's R1CS constraints. The point of the $raf$-evaluation sumcheck is to relate these (non-one-hot) polynomials to their one-hot counterparts. In the context of instruction execution Shout, this means the $ra$ polynomial.

Since we have two $raf$-like polynomials, we have two sumcheck instances:

$LeftLookupOperand(r) = \sum_{k, j} eq(r, j) \cdot ra(k, j) \cdot \sum_{\ell = 0}^{\log(K)/2 - 1} 2^{\ell} \cdot k_{2\ell}$

$RightLookupOperand(r) = \sum_{k, j} eq(r, j) \cdot ra(k, j) \cdot \sum_{\ell = 0}^{\log(K)/2 - 1} 2^{\ell} \cdot k_{2\ell + 1}$

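This captures the interleaving behavior we described above: LeftLookupOperand is the concatenation of the "odd bits" of $k$, while RightLookupOperand is the concatenation of the "even bits" of $k$. Here "odd" and "even" refer to the 1-indexed positions $(k_{1}, \dots, k_{64})$ used above; the sums index $k$ from 0, so the same bits appear there as $k_{2\ell}$ and $k_{2\ell + 1}$.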
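On a single cycle, the inner sums simply read the two operands back out of the one-hot lookup index. The following Python sketch shows this on a toy 6-bit index so that the sum over all $k$ can be enumerated. The function names, the `side` parameter, and the bit ordering (first pair in the highest bits) are assumptions for illustration only.

```python
def operand(k, log_k, side):
    """Concatenate every other bit of the log_k-bit index k, most significant first.

    side = 0 picks the first bit of each pair (the x bits, LeftLookupOperand);
    side = 1 picks the second bit of each pair (the y bits, RightLookupOperand).
    """
    out = 0
    for pair in reversed(range(log_k // 2)):
        out = (out << 1) | ((k >> (2 * pair + 1 - side)) & 1)
    return out

def operand_via_one_hot(lookup_index, log_k, side):
    """sum_k ra(k, j) * operand(k) for one cycle j, with ra(., j) one-hot at lookup_index."""
    return sum(
        int(k == lookup_index) * operand(k, log_k, side)
        for k in range(1 << log_k)
    )

LOG_K = 6          # toy size so that all 2^6 indices can be enumerated
k_j = 0b100111     # interleaved pairs (x, y): (1,0), (0,1), (1,1)
print(operand_via_one_hot(k_j, LOG_K, side=0))  # left operand, x = 0b101
print(operand_via_one_hot(k_j, LOG_K, side=1))  # right operand, y = 0b011
```

Only the term with $k$ equal to the cycle's lookup index survives, so the sumcheck ties the packed operand values to the one-hot $ra$ polynomial.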

Case 2: Single operand

Many RISC-V instructions are similar to XOR, in that interleaving the operand bits lends itself to prefix-suffix structure. However, there are some instructions where this is not the case. The execution of some arithmetic instructions, for example, is handled by computing the corresponding operation in the field, and applying a range-check lookup to the result to truncate potential overflow bits.

In this case, the lookup index corresponds to the (single) value y being range-checked, so we do not interleave its bits with another operand. Instead, we just have:

$(k_{1}, k_{2}, \dots, k_{64}) = (0, 0, \dots, 0, y_{1}, y_{2}, \dots, y_{32})$

In this case, we have the following sumchecks:

$LeftLookupOperand(r) = 0$

$RightLookupOperand(r) = \sum_{k, j} eq(r, j) \cdot ra(k, j) \cdot \sum_{\ell = 0}^{\log(K) - 1} 2^{\ell} \cdot k_{\ell}$

LeftLookupOperand is 0 and RightLookupOperand will be the concatenation of all the bits of $k$.

In order to handle Cases 1 and 2 simultaneously, we can use the same "multiplexing" technique as in the read-checking sumcheck. We use a flag polynomial to indicate which case we're in:

$InterleaveOperands (j) = 1 - AddOperands (j) - SubtractOperands (j) - MultiplyOperands (j)$

where $AddOperands$, $SubtractOperands$, and $MultiplyOperands$ are circuit flags.
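One way to picture how the flag selects between the two cases on a single cycle is the following Python sketch. The combination rule (Case 1 values weighted by the flag, Case 2 values by its complement) is an assumption made for illustration, as are the function names and the bit ordering; the text above only states that a flag polynomial distinguishes the cases.

```python
W = 32

def deinterleave(k):
    """Split an interleaved 2W-bit index into (x, y), with the first pair in the highest bits."""
    x = y = 0
    for i in reversed(range(W)):
        x = (x << 1) | ((k >> (2 * i + 1)) & 1)
        y = (y << 1) | ((k >> (2 * i)) & 1)
    return x, y

def interleave_operands_flag(add=0, sub=0, mul=0):
    """InterleaveOperands(j) = 1 - AddOperands(j) - SubtractOperands(j) - MultiplyOperands(j)."""
    return 1 - add - sub - mul

def lookup_operands(k, add=0, sub=0, mul=0):
    """(LeftLookupOperand, RightLookupOperand) for one cycle with lookup index k.

    Assumed combination rule: Case 1 is weighted by the flag, Case 2 by (1 - flag).
    """
    flag = interleave_operands_flag(add, sub, mul)
    x, y = deinterleave(k)
    left = flag * x                      # Case 2 contributes 0
    right = flag * y + (1 - flag) * k    # Case 2 contributes the whole index
    return left, right

print(lookup_operands(0b1101))               # interleaved case: x = 0b10, y = 0b11
print(lookup_operands(0xFFFFFFFF + 1, add=1))  # single-operand case: (0, k)
```

In the second call the index is the field sum of two operands, which needs 33 bits; the range-check lookup described above is what would truncate it back to 32 bits.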

Prefix-suffix structure

These sumchecks have similar structure to the read-checking sumcheck. A side-by-side comparison of the read-checking sumcheck with the sumchecks for Case 1:

$rv(r) = \sum_{k, j} eq(r, j) \cdot ra(k, j) \cdot Val(k)$

$LeftLookupOperand(r) = \sum_{k, j} eq(r, j) \cdot ra(k, j) \cdot \sum_{\ell = 0}^{\log(K)/2 - 1} 2^{\ell} \cdot k_{2\ell}$

$RightLookupOperand(r) = \sum_{k, j} eq(r, j) \cdot ra(k, j) \cdot \sum_{\ell = 0}^{\log(K)/2 - 1} 2^{\ell} \cdot k_{2\ell + 1}$

As it turns out, $\sum_{\ell = 0}^{\log(K)/2 - 1} 2^{\ell} \cdot k_{2\ell}$ and $\sum_{\ell = 0}^{\log(K)/2 - 1} 2^{\ell} \cdot k_{2\ell + 1}$ have prefix-suffix structure, so we can also use the prefix-suffix algorithm for these sumchecks. Due to their similarities, we batch the read-checking and these $raf$-evaluation sumchecks together in a bespoke fashion.
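The prefix-suffix structure of the operand polynomials can be seen by writing the left operand as a sum of two prefix-times-suffix products. The Python sketch below does this on Boolean inputs; the helper names, the two-term form, and the bit ordering are illustrative assumptions rather than Jolt's actual decomposition.

```python
def left_bits(bits, n_pairs):
    """Concatenate the first bit of each of n_pairs interleaved pairs, most significant first."""
    out = 0
    for i in reversed(range(n_pairs)):
        out = (out << 1) | ((bits >> (2 * i + 1)) & 1)
    return out

def left_operand_split(k, split=16, log_k=64):
    """Left operand of k as a sum of prefix(k_prefix) * suffix(k_suffix) terms."""
    suffix_len = log_k - split
    k_prefix, k_suffix = k >> suffix_len, k & ((1 << suffix_len) - 1)
    terms = [
        # (prefix value, suffix value)
        (left_bits(k_prefix, split // 2) << (suffix_len // 2), 1),
        (1, left_bits(k_suffix, suffix_len // 2)),
    ]
    return sum(p * s for p, s in terms)

k = 0x9E3779B97F4A7C15  # arbitrary 64-bit index
for split in (16, 32, 48, 64):
    assert left_operand_split(k, split) == left_bits(k, 32)
```

The right operand decomposes the same way with the second bit of each pair, which is why these sumchecks can share the prefix-suffix machinery with read-checking.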

Other sumchecks

ra virtualization

The lookup tables used for instruction execution are of size $K = 2^{64}$, so we set $d = 8$ as our decomposition parameter such that $K^{1/d} = 2^{8}$.

Similar to the RAM Twist instance, we opt to virtualize the $ra$ polynomial. In other words, Jolt simply carries out the read checking and $raf$ evaluation sumchecks as if $d = 1$.

At the conclusion of these sumchecks, we are left with claims about the virtual $ra$ polynomial. Since the polynomial is uncommitted, Jolt invokes a separate sumcheck that expresses an evaluation of $ra$ in terms of the constituent $ra_{i}$ polynomials. The $ra_{i}$ polynomials are, by definition, a tensor decomposition of $ra$, so the "$ra$ virtualization" sumcheck is the following:

$ra(r, r') = \sum_{j \in \{0, 1\}^{\log(T)}} eq(r', j) \cdot \left( \prod_{i = 1}^{d} ra_{i}(r_{i}, j) \right)$

Since the degree of this sumcheck is $d + 1 = 9$ , using the standard linear-time sumcheck prover algorithm would make this sumcheck relatively slow. However, we employ optimizations specific to high-degree sumchecks, adapting techniques from the Karatsuba and Toom-Cook multiplication algorithms.
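On Boolean inputs, the tensor decomposition behind this sumcheck amounts to splitting the 64-bit lookup index into $d = 8$ chunks of 8 bits and one-hot encoding each chunk. The Python sketch below illustrates this for a single cycle; the function names and the most-significant-chunk-first ordering are assumptions for the example.

```python
D = 8            # decomposition parameter d
CHUNK_BITS = 8   # log(K) / d, so each chunk ranges over K^(1/d) = 2^8 values

def chunks(k):
    """Split a 64-bit index into d = 8 chunks of 8 bits, most significant first."""
    return [(k >> (CHUNK_BITS * (D - 1 - i))) & 0xFF for i in range(D)]

def one_hot(index, size):
    v = [0] * size
    v[index] = 1
    return v

def ra_from_chunks(ra_chunks, k):
    """prod_i ra_i(k_i, j) on a Boolean index k, for a single cycle j."""
    result = 1
    for ra_i, k_i in zip(ra_chunks, chunks(k)):
        result *= ra_i[k_i]
    return result

lookup = 0x0123456789ABCDEF
ra_chunks = [one_hot(c, 1 << CHUNK_BITS) for c in chunks(lookup)]
assert ra_from_chunks(ra_chunks, lookup) == 1       # product is 1 exactly at the lookup index
assert ra_from_chunks(ra_chunks, lookup ^ 1) == 0   # and 0 at a neighbouring index
print(sum(len(v) for v in ra_chunks))               # 8 * 256 entries per cycle
```

Each cycle is described by 8 vectors of length 256 rather than one vector of length $2^{64}$, at the price of a product of $d$ terms in the virtualization sumcheck.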

One-hot checks

Jolt enforces that the $ra_{i}$ polynomials used for instruction execution are one-hot, using a Booleanity and Hamming weight sumcheck as described in the paper. These implementations follow the Twist and Shout paper closely, with no notable deviations.
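The two properties being enforced can be stated directly on a vector of field elements, as in the Python sketch below. It checks the properties entry by entry rather than via sumcheck, and the modulus and function names are assumptions for illustration.

```python
P = 2**61 - 1  # toy prime modulus (assumption; not the field Jolt uses)

def is_boolean(vec):
    """Booleanity: every entry x satisfies x * (x - 1) = 0 in the field."""
    return all(x * (x - 1) % P == 0 for x in vec)

def hamming_weight(vec):
    """Sum of the entries in the field; equals the number of ones for a Boolean vector."""
    return sum(vec) % P

def is_one_hot(vec):
    return is_boolean(vec) and hamming_weight(vec) == 1

print(is_one_hot([0, 0, 1, 0]))   # Boolean with weight 1
print(is_one_hot([1, 1, 0, 0]))   # Boolean, but weight 2
print(is_one_hot([2, P - 1, 0]))  # entries sum to 1 mod P, but are not Boolean
```

The last example shows why both checks are needed: a vector whose entries sum to 1 in the field is not necessarily one-hot unless its entries are also constrained to be 0 or 1.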
