Bytecode

At each cycle of the RISC-V virtual machine, the current instruction (as indicated by the program counter) is "fetched" from the bytecode and decoded. In Jolt, this is proven by treating the bytecode as a lookup table, and fetches as lookups. To prove the correctness of these lookups, we use the Shout lookup argument.

One distinguishing feature of the bytecode Shout instance is that we have multiple instances of the read-checking and $raf$ -evaluation sumchecks. Intuitively, the bytecode serves as the "ground truth" of what's being executed, so one would expect many virtual polynomial claims to eventually lead back to the bytecode. And this holds in practice –– in the Jolt sumcheck DAG diagram, we see that there are three colors of edges (corresponding to stages 1, 2, and 3,) pointing to the bytecode read-checking node, and two colors of edges (corresponding to stages 1 and 3) pointing to the bytecode $raf$ -evaluation node.

Each stage has its a unique opening point, so having in-edges of different colors implies that multiple instances of that sumcheck must be run in parallel to prove the different claims.

Read-checking

Another distinguishing feature of the bytecode Shout instance is that we treat each entry of lookup table as containing a tuple of values, rather than a single value. Intuitively, this is because each instruction in the bytecode encodes multiple pieces of information: opcode, operands, etc.

The figure below loosely depicts the relationship between bytecode and $Val$ polynomial.

bytecode

We start from some ELF file, compiled from the guest program. For each instruction in the ELF (raw bytes), we decode/preprocess the instruction into a structured format containing the individual witness values used in Jolt:

The instruction operands rs1, rs2, rd, imm
Circuit and lookup table flags
The instruction address

Then, we compute a Reed-Solomon fingerprint of some subset of the values in the tuple, depending on what $rv$ claims are being proven. These fingerprints serve as the coefficients of the $Val$ polynomial for that read-checking instance.

The figure above depicts the $Val$ polynomial we would use to prove the read-checking instance for the Stage 2 claims, labeled as "ra/wa claims" in the DAG diagram. As explained below, the $ra$ and $wa$ polynomials correspond to the registers (rs1, rs2, rd) in the bytecode. Each coefficient in $Val$ is $rd + γ \cdot rs1 + γ^{2} \cdot rs2$ , for the registers of one particular instruction.

We can write the read-checking sumcheck expression for the Stage 2 claims as the following:

$rv_{rd} (r) + γ \cdot rv_{rs1} (r) + γ^{2} \cdot rv_{rs2} (r) = k, j \sum eq (r, j) \cdot ra (k, j) \cdot (Val_{rd} (k) + γ \cdot Val_{rs1} (k) + γ^{2} \cdot Val_{rs2} (k))$

where we treat

$Val (k) = Val_{rd} (k) + γ \cdot Val_{rs1} (k) + γ^{2} \cdot Val_{rs2} (k)$

as a single multilinear polynomial.

Observe that this sumcheck satisfies our needs for the Stage 2 claims: $rv_{rd} (r)$ , $rv_{rs1} (r)$ , $rv_{rs2} (r)$ are the claims output by the registers read/write-checking sumcheck, and we can simply compute Reed-Solomon fingerprint of these claims to obtain the left-hand side (i.e. input claim) of sumcheck expression above.

Registers

Technically, the explanation above glosses over some details for the sake of exposition. To be precise, the claims output by registers read/write-checking are actually:

$wa_{rd} (r_{address}, r_{cycle}) ra_{rs1} (r_{address}, r_{cycle}) ra_{rs2} (r_{address}, r_{cycle})$

where $r_{address} \in F^{l o g (# registers)}$ .

So we define:

$Val (k) = rd (k, r_{address}) + γ \cdot rs1 (k, r_{address}) + γ^{2} \cdot rs2 (k, r_{address})$

where $\forall k \in {0, 1}^{l o g (bytecode size)}, k^{'} \in {0, 1}^{l o g (# registers)}$ :

$rd (k, k^{'}) = 1$ if the $k$ -th instruction in the bytecode has $r d = k^{'}$
$rd (k, k^{'}) = 0$ otherwise

and similarly for $rs1$ and $rs2$ .

Equivalently,

$Val (k) = eq (rd [k], r_{address}) + γ \cdot eq (rs1 [k], r_{address}) + γ^{2} \cdot eq (rs2 [k], r_{address})$

where $rd [k]$ (respectively, $rs1 [k]$ and $rs2 [k]$ ) is the bitvector denoting the rd (respectively, rs1 and rs2) for the $k$ -th instruction in the bytecode.

Instruction address

Each instruction in the bytecode has two associated "addresses":

its index in the expanded bytecode. "Expanded" bytecode refers to the preprocessed bytecode, after instructions are expanded to their virtual sequences
its memory address as given by ELF. All the instructions in a virtual sequence are assigned the address of the "real" instruction they were expanded from.

The former is used for the $raf$ -evaluation sumcheck, described below. The latter is used to enforce program counter updates in the [R1CS constraints] (./r1cs_constraints.md), and is treated as a part of the tuple of values in the preprocessed bytecode.

The "outer" and shift sumchecks in Spartan output claims about the virtual UnexpandedPC polynomial, which corresponds to the latter. These claims are proven using bytecode read-checking.

Flags

There are two types of Boolean flags used in Jolt:

Circuit flags, used in R1CS constraints
Lookup table flags, used in the instruction execution Shout

The associated flags for a given instruction in the bytecode can be computed a priori (i.e. in preprocessing), so any claims about these flags arising output by Spartan or instruction execution Shout are also proven using bytecode read-checking.

raf-evaluation

There are two $raf$ -evaluation instances for bytecode, corresponding to $raf$ claims arising from stages 1 and 3.

The $raf$ polynomial, in the context of bytecode, is the program counter (PC). The two sumchecks that output claims about the PC are the "outer" and "shift" sumchecks used to prove Jolt's R1CS constraints.

One-hot checks

Jolt enforces that the $ra_{i}$ polynomials used for bytecode Shout are one-hot, using a Booleanity and Hamming weight sumcheck as described in the paper. These implementations follow the Twist and Shout paper closely, with no notable deviations.

JoltBook