Using lazy loading for duplicate subcircuit constraints to cut 95% memory usage for gnark R1CS

ZhAnGeek · April 6, 2023, 8:58am

When using gnark as a base zksnarks lib for scroll up system, people will face memory issues.

We detect there are two parts using huge memory/storage size.

R1CS size. 2. Pk size.

Here we mainly focus on the R1CS size.

R1CS is the description of an arithmetic circuit using by groth16 (a popular snark proving system). R1CS is composed of many R1C, every single R1C will contain L, R, O LinearExpression, LinearExpression means variable or constants set. The R1C describe that L * R = O, and in solving R1C will be extracted and calculate the right value. F.g. if L,R,O is all solved, then the solving system will constraint them to satisfy the multiplication equation. If any two of them is solved, then the solving system will calculate the unsolved LinearExpressions. if less then two of them is solved, the solving system will not work.

There’re many efforts to reduce memory usage by Consensys team, in the gnark latest release v0.8.0, like in this gnark PR #418 the R1CS compilation will replace the long LinearExpression with a new one, and then using the new one for other R1C in the afterwards compilation. And in the gnark PR #412 the updates for the R1CS core system also mainly focus on R1CS memory usage, re structure the Linear Experssion and Term(item in Linear Expression) fields. These effort has improved, however with the explosion of zk application more complex systems(like zkevm or recursive proving) makes the circuit even more larger. We will need to specifically deal with the large circuit for reducing memory.

Noted that the large circuit always contains loops in R1CS generation, here we call them duplicate subcircuit. For example, the merkle tree in circuit will reuse hash functions for many times, and the hash functions like MiMC will reuse permutation for many times. Thus we come up with an idea to highly compress the duplicate subcircuit constraints to one simple circuit. Here’s how we make it.

record repeatable constraints structure (during this process, we will record the static constraints StaticConstraints, and it will be stored in map by a key defined by user, for looking up the constraint need to be recovered)
record repeatable constraints’s inputs, and pick only the input constrains to store. (GeneralLazyInputs)
lazify and remove all the repeatable r1cs structure, with only lazy inputs left. rebuild levels.
recover the r1c when solving. the input constraints can be easily extracted from store, for others we have a shift recorded the distance from the static constraints, and add the offset shift to the variables to recover.

let’s imagine a giant circuit with inputs i1,i2, assume that v_ is the internal variables generated, c_ is the constant variables used, the circuit which will generate constraints below,

_circuit 1 generated constraints as below_
1. (i1 + c1) * (i1 + c1) = v1
2. (i2 + c2) * (i2 + c2) = v2
3. (v1 + c3) * (v1 + c3) = v3
4. v3 * v3 = v4

n. ...constraints

if later we got the same circuit, with inputs i3,i4, assume that before this time this circuit process, thee existed variables index has been 1000, then the constraints will be

_circuit 2 generated constraints as below_

1. (i3 + c1) * (i3 + c1) = v1001
2. (i4 + c2) * (i4 + c2) = v1002
3. (v1000 + c3) * (v1000 + c3) = v1003
4. v1003 * v1003 = v1004

n. ...constraints

as you can see, there will be many internal variables generated in duplicate circuit, the variables’ born and order are the same, except that they got a shift like the internal variables in the mapped constraint always differ 1000.

so we save the structure static constraints(meanwhile circuit 1 generated constraints) for the first time

and we recover the other by 1. retrieve the input constraints, which contains the input’s term. this is exactly the same with normal constraints 2. retrieve contraints and add shift to the variables. in above case, we have shift 1000, so in constraint 4 we can recover (v3 * v3 = v4 in static constraints) to (v1003 * v1003 = v1004)

Results

mimc-inputs-4 (save 70% r1cs memory usage)

mimc	1	10	100	1000	10000
dump_r1cs_w/o_origin (kb)	127	1,225	12,939	135,828	1,364,706
dump_r1cs_w/o_lazify (kb)	244	1,342	13,056	135,946	1,364,833
dump_r1cs_w/_lazify (kb)	158	479	4,055	43,072	435,292

as you can see, for such circuit we save nearly 70% of memory, and both lazified and non-lazified grow linearly with the mimc hashes num, that’s because we use input constraints to recover true constraints. If the inputs constraints contained is very little, then the memory of r1cs used actually will be very little.

and test for poseidon 12 inputs, which surely the input constraints contained in its static constrains are less.

poseidon-inputs-12 (save 95% r1cs memory usage)

poseidon	1	10	100	1000	10000
dump_r1cs_w/o_origin (kb)	313	2,520	24,602	273,890	N/A
dump_r1cs_w/o_lazify (kb)	546	2,761	24,915	275,049	N/A
dump_r1cs_w/_lazify (kb)	325	450	1,745	16,962	N/A

this time it saves nearly 95% of memory used in r1cs.

vgg · May 1, 2023, 10:34am

Looks very promising. Is this applicable only to R1CS or also can be extended to Plonk?

Topic		Replies	Views
[Call For Discussion] Proposal: Design of Greenfield Executable BNB Greenfield	5	2004	April 30, 2023
BNB Greenfield EC configuration adjustment proposal BEP storage , bep	0	455	September 4, 2023
BNB Greenfield validator & SP application: Materia Logic BNB Greenfield	0	603	August 9, 2023
Delegation request for Greenfield validator BNB Greenfield	0	473	May 3, 2023
Greenfield Testnet Community participation request for a Validator slot	0	332	April 28, 2023

Using lazy loading for duplicate subcircuit constraints to cut 95% memory usage for gnark R1CS

mimc-inputs-4 (save 70% r1cs memory usage)

poseidon-inputs-12 (save 95% r1cs memory usage)

Related Topics