When using gnark as a base zksnarks lib for scroll up system, people will face memory issues.
We detect there are two parts using huge memory/storage size.
- R1CS size. 2. Pk size.
Here we mainly focus on the R1CS size.
R1CS is the description of an arithmetic circuit using by groth16 (a popular snark proving system). R1CS is composed of many R1C, every single R1C will contain L, R, O LinearExpression, LinearExpression means variable or constants set. The R1C describe that L * R = O, and in solving R1C will be extracted and calculate the right value. F.g. if L,R,O is all solved, then the solving system will constraint them to satisfy the multiplication equation. If any two of them is solved, then the solving system will calculate the unsolved LinearExpressions. if less then two of them is solved, the solving system will not work.
There’re many efforts to reduce memory usage by Consensys team, in the gnark latest release v0.8.0, like in this gnark PR #418 the R1CS compilation will replace the long LinearExpression with a new one, and then using the new one for other R1C in the afterwards compilation. And in the gnark PR #412 the updates for the R1CS core system also mainly focus on R1CS memory usage, re structure the Linear Experssion and Term(item in Linear Expression) fields. These effort has improved, however with the explosion of zk application more complex systems(like zkevm or recursive proving) makes the circuit even more larger. We will need to specifically deal with the large circuit for reducing memory.
Noted that the large circuit always contains loops in R1CS generation, here we call them duplicate subcircuit. For example, the merkle tree in circuit will reuse hash functions for many times, and the hash functions like MiMC will reuse permutation for many times. Thus we come up with an idea to highly compress the duplicate subcircuit constraints to one simple circuit. Here’s how we make it.
- record repeatable constraints structure (during this process, we will record the static constraints StaticConstraints, and it will be stored in map by a key defined by user, for looking up the constraint need to be recovered)
- record repeatable constraints’s inputs, and pick only the input constrains to store. (GeneralLazyInputs)
- lazify and remove all the repeatable r1cs structure, with only lazy inputs left. rebuild levels.
- recover the r1c when solving. the input constraints can be easily extracted from store, for others we have a shift recorded the distance from the static constraints, and add the offset shift to the variables to recover.
let’s imagine a giant circuit with inputs i1,i2
, assume that v_
is the internal variables generated, c_
is the constant variables used, the circuit which will generate constraints below,
_circuit 1 generated constraints as below_
1. (i1 + c1) * (i1 + c1) = v1
2. (i2 + c2) * (i2 + c2) = v2
3. (v1 + c3) * (v1 + c3) = v3
4. v3 * v3 = v4
n. ...constraints
if later we got the same circuit, with inputs i3,i4
, assume that before this time this circuit process, thee existed variables index has been 1000, then the constraints will be
_circuit 2 generated constraints as below_
1. (i3 + c1) * (i3 + c1) = v1001
2. (i4 + c2) * (i4 + c2) = v1002
3. (v1000 + c3) * (v1000 + c3) = v1003
4. v1003 * v1003 = v1004
n. ...constraints
as you can see, there will be many internal variables generated in duplicate circuit, the variables’ born and order are the same, except that they got a shift like the internal variables in the mapped constraint always differ 1000.
so we save the structure static constraints(meanwhile circuit 1 generated constraints) for the first time
and we recover the other by 1. retrieve the input constraints, which contains the input’s term. this is exactly the same with normal constraints 2. retrieve contraints and add shift to the variables. in above case, we have shift 1000, so in constraint 4
we can recover (v3 * v3 = v4 in static constraints)
to (v1003 * v1003 = v1004)
Results
mimc-inputs-4 (save 70% r1cs memory usage)
mimc | 1 | 10 | 100 | 1000 | 10000 |
---|---|---|---|---|---|
dump_r1cs_w/o_origin (kb) | 127 | 1,225 | 12,939 | 135,828 | 1,364,706 |
dump_r1cs_w/o_lazify (kb) | 244 | 1,342 | 13,056 | 135,946 | 1,364,833 |
dump_r1cs_w/_lazify (kb) | 158 | 479 | 4,055 | 43,072 | 435,292 |
as you can see, for such circuit we save nearly 70% of memory, and both lazified and non-lazified grow linearly with the mimc hashes num, that’s because we use input constraints to recover true constraints. If the inputs constraints contained is very little, then the memory of r1cs used actually will be very little.
and test for poseidon 12 inputs, which surely the input constraints contained in its static constrains are less.
poseidon-inputs-12 (save 95% r1cs memory usage)
poseidon | 1 | 10 | 100 | 1000 | 10000 |
---|---|---|---|---|---|
dump_r1cs_w/o_origin (kb) | 313 | 2,520 | 24,602 | 273,890 | N/A |
dump_r1cs_w/o_lazify (kb) | 546 | 2,761 | 24,915 | 275,049 | N/A |
dump_r1cs_w/_lazify (kb) | 325 | 450 | 1,745 | 16,962 | N/A |
this time it saves nearly 95% of memory used in r1cs.