BEP-323: Bundle Format for Greenfield

1. Summary

This BEP proposes a solution for better on-chain storage efficiency and user experience in Greenfield by introducing
a method to bundle small files together before uploading. The new system will reduce storage space and costs caused
by small files while increasing the capacity of the entire network.

2. Motivation

Storing small files in Greenfield is inefficient. The metadata stored on the blockchain can be larger than the files
themselves, leading to higher costs for users. In addition, the Greenfield blockchain can process only a limited
number of files at the same time.

To tackle this problem, we aim to design a bundle format specifically for Greenfield. A specialized format lets
developers build something that fits exactly what Greenfield and its users need, without any unnecessary features.
With a bundle format, Greenfield can provide a better service to its users and stay efficient and easy to use
as it grows.

3. Specification

3.1 Bundle Format

The bundle format specifies the structure and organization of the bundle that users create when packing files.
This format is designed to pack flat files; hierarchical directory structures, or folders, are not supported.

When dealing with a folder, users can flatten its structure into a series of individual files.
As part of this process, each file is renamed to include its folder path. For example, a file originally named
file.txt inside the nested folders dirA and dirB would be renamed to dirA/dirB/file.txt.
This approach preserves the organization of the folder while conforming to the flat-file requirement of the bundle.
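
The renaming step can be sketched in Python as follows. This is only an illustration, not part of the specification: the helper names `flatten_folder` and `demo` are hypothetical, and the sketch assumes forward slashes as the separator, matching the dirA/dirB/file.txt example above.

```python
import tempfile
from pathlib import Path


def flatten_folder(root):
    """Map every file under root to a flat name that embeds its folder path."""
    flat = {}
    root_path = Path(root)
    for path in sorted(root_path.rglob("*")):
        if path.is_file():
            # as_posix() yields "dirA/dirB/file.txt" regardless of platform.
            flat[path.relative_to(root_path).as_posix()] = path.read_bytes()
    return flat


def demo():
    """Build a small nested folder and return the flattened names."""
    with tempfile.TemporaryDirectory() as root:
        nested = Path(root) / "dirA" / "dirB"
        nested.mkdir(parents=True)
        (nested / "file.txt").write_bytes(b"hello")
        (Path(root) / "top.txt").write_bytes(b"top")
        return list(flatten_folder(root))
```

Calling `demo()` returns the flat names `dirA/dirB/file.txt` and `top.txt`; the folders themselves never appear as entries.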

The bundle format has one further constraint: the names of the files in a bundle should be unique so that
files can be indexed by name.
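
One way a packing tool might enforce this is to build the name index up front and reject repeats. The function name `index_by_name` is made up for this sketch; the BEP does not prescribe how duplicates are detected or reported.

```python
def index_by_name(names):
    """Return {name: position}, rejecting any name that appears twice."""
    index = {}
    for position, name in enumerate(names):
        if name in index:
            raise ValueError(f"duplicate object name in bundle: {name!r}")
        index[name] = position
    return index
```

Note that a.txt and dirA/a.txt are different names under this rule, because the folder path is part of the name.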

The bundle format is structured into several key components as follows:

  • Version: This indicates the version number of the bundle protocol being used.
  • Meta Size: This specifies the size of the bundle’s metadata, allowing the construction of the bundle structure without the need to read the entire bundle.
  • Metadata: This section contains information about the files within the bundle. It enables random access, which means you can jump directly to any file within the bundle without going through all the files.
  • Data: This portion holds the actual content and consists of the bytes of all the files.

The Meta structure is designed to include essential attributes for each file, outlined as follows:

  • Object Name: This is the name of the file within the bundle.
  • Offset: This attribute marks the starting point of the file’s data within the bundle.
  • Size: This details the total length, in bytes, of the file.
  • Hash Algo: This specifies the algorithm used for the file’s hash calculation.
  • Hash: This is the cryptographic hash result of the file’s content. It serves as a tool for verifying the file’s integrity.
  • Content Type: This denotes the MIME type of the file, describing the file’s nature and format.
  • Attributes: This is a map that holds various additional properties of the file, such as its owner.
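
As a rough sketch of how these fields work together, the Python below builds one entry per file and then reads a single file back by name. It rests on assumptions the text does not settle: files are packed back to back, offsets are counted from the start of the data section, and a plain dataclass stands in for the real metadata encoding. The names `FileEntry`, `build_entries`, and `read_file` are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    object_name: str
    offset: int  # assumption: counted from the start of the data section
    size: int
    hash_algo: str
    hash: bytes
    content_type: str = "application/octet-stream"
    attributes: dict = field(default_factory=dict)


def build_entries(files):
    """Lay files out back to back; return (entries, data_section)."""
    entries, chunks, offset = [], [], 0
    for name, content in files.items():
        entries.append(FileEntry(
            object_name=name,
            offset=offset,
            size=len(content),
            hash_algo="sha256",
            hash=hashlib.sha256(content).digest(),
        ))
        chunks.append(content)
        offset += len(content)
    return entries, b"".join(chunks)


def read_file(entries, data, name):
    """Jump straight to one file by name and verify its integrity."""
    entry = next(e for e in entries if e.object_name == name)
    content = data[entry.offset:entry.offset + entry.size]
    if hashlib.sha256(content).digest() != entry.hash:
        raise ValueError(f"hash mismatch for {name!r}")
    return content
```

Because each entry carries its own offset and size, `read_file` slices out exactly one file without touching the others, and the stored hash lets the reader detect corruption.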

3.1.1 Encoding

The bundle’s encoding format is structured as follows:

  • Version: Serialized as an unsigned 64-bit integer, occupying 8 bytes.
  • Meta Size: Also an unsigned 64-bit integer, represented using 8 bytes, indicating the size of the metadata section.
  • Metadata: Encoded in bytes, this section utilizes Protocol Buffers (protobuf) for serialization.
  • Data: This consists of the actual file contents, represented as a sequence of bytes.
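
The framing described above can be sketched with Python's `struct` module. Two things here are assumptions rather than part of the text: the byte order (big-endian is used below, since the encoding list does not state one) and the treatment of the metadata as opaque bytes standing in for the protobuf-serialized Meta message. The names `pack_bundle` and `unpack_bundle` are hypothetical.

```python
import struct

# Assumption: big-endian byte order; the encoding list does not specify one.
HEADER = struct.Struct(">QQ")  # version and meta size, two unsigned 64-bit ints


def pack_bundle(version, meta_bytes, data):
    """Concatenate version, meta size, metadata, and data into one bundle."""
    return HEADER.pack(version, len(meta_bytes)) + meta_bytes + data


def unpack_bundle(bundle):
    """Split a bundle into (version, metadata bytes, data bytes)."""
    version, meta_size = HEADER.unpack_from(bundle, 0)
    meta_end = HEADER.size + meta_size
    if meta_end > len(bundle):
        raise ValueError("meta size exceeds bundle length")
    return version, bundle[HEADER.size:meta_end], bundle[meta_end:]
```

A reader only needs the first 16 bytes to learn how long the metadata is, which is what allows the bundle structure to be reconstructed without reading the whole object.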

The Meta structure will be serialized with protobuf:

enum HashAlgo {
  Sha256 = 0;
}

message Meta {
  repeated FileMeta meta = 1;
}

message FileMeta {
  string object_name = 1;
  uint64 offset = 2;
  uint64 size = 3;
  HashAlgo hash_algo = 4;
  bytes hash = 5;
  string content_type = 6;
  map<string, string> attributes = 7;
}

4. License

The content is licensed under CC0.


It seems that this is a serialization protocol that combines a lot of small files into one big Greenfield object. I’m wondering whether we should consider more for this protocol:

  1. Maybe we could define a bundle service that helps users upload small files and takes care of the gas fee?
  2. Define an indexer service that helps users retrieve and download small files directly?
  3. If there’s an indexer, how would it handle the Greenfield object ACL?
    etc…

For the serialization protocol itself, what if we add a “meta hash” right after the “Meta Size”? Also, how about putting a “MAGIC String” at the start of the bundle file?

Thank you for your feedback. This BEP defines a standard for aggregating small objects on Greenfield. Each community member can implement the bundle/indexer service to upload small files, even before bundling them on Greenfield.

Implementing another ACL system like Greenfield’s is too complex for the indexer service. Therefore, I suggest that the bundle/indexer service prioritize serving object aggregation for the public bucket. However, the ACL information is stored on the Greenfield chain. Therefore, it is possible for the indexer service to implement ACL while indexing private objects.

  1. How does the SP handle a bundle: as a normal object, or should it parse the bundle?
  2. Can we update a small object defined in a bundle (by putting the same object_name)?
  1. The Greenfield chain and SPs treat the bundled object as a regular object. They do not parse the bundle. The responsibility of bundle aggregation and parsing lies with the bundle services that are built on top of Greenfield.

  2. The ability to support updates in the bundle depends on the bundle service’s functionality. However, once the bundled object is submitted to Greenfield, it becomes difficult to update a small object within the bundle. This is because you would need to delete the entire bundle and submit a new one with the updated small object.
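
To illustrate why an in-place update is not practical, here is a small Python sketch. It relies on assumptions that are not part of the BEP: files are packed back to back and offsets are counted from the start of the data section. The helper names `layout` and `replace_object` are hypothetical.

```python
def layout(files):
    """Return {name: (offset, size)} for files packed back to back."""
    table, offset = {}, 0
    for name, content in files.items():
        table[name] = (offset, len(content))
        offset += len(content)
    return table


def replace_object(files, object_name, new_content):
    """Return the file map for a replacement bundle; the input is untouched."""
    if object_name not in files:
        raise KeyError(object_name)
    updated = dict(files)
    updated[object_name] = new_content
    return updated
```

Replacing a 2-byte a.txt with a 9-byte version moves every later file's offset and changes the metadata, so the whole bundle has to be re-encoded and submitted as a new object.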


It would be lovely to see a multi-object upload feature.

Good to see this BEP,

Some questions:

  1. Is there a specification for getting the attributes of objects?
  2. Will there be an attributes option for putObject (when not using the bundle format)?

There should be an indexing layer where we can query not only by object name but also by attributes.


Thanks Keefe. Some more questions:

  1. Will there be support for the signing of “small files”? In this case we can allow delegation of payment for a “small file” to a 3rd party, while maintaining the identity and signature of the person who created the “small file”.

  2. Are we planning to support Nested Bundles?

Reference:


The current solution does not seem friendly to updating small objects.

In our scenario, there will be lots of updates to the small objects within the code repo. Do you have any suggestions?


The bundle feature allows you to upload multiple objects to the bundle service, which then aggregates and stores them on Greenfield.

Great suggestion! The community is currently developing support for tags on buckets, objects, and groups on the blockchain. Once this feature is implemented, creators will be able to add tags to their buckets, objects, and groups. Additionally, the community can assist in building the indexing layer to cater to users who need to index buckets, objects, and groups based on tags.

As for the objects within a bundle, their attributes will be similar to tags. With the introduction of the tags feature on Greenfield, the bundled object will also have its own set of attributes.

Thank you for your question, here is my understanding:

  1. The bundle object is created under a bucket, so the bucket owner owns all the objects and needs to pay for the bucket. However, the owner can delegate to a bundle service to aggregate these objects on their behalf, and the bundle service should verify the signature when putting objects into the bundle. We will share some examples of the bundle service design later; you can see the details there.

  2. For the first version, there is no plan for nested bundles, but they may be included in a future version.


Great question! In fact, this feature is not released to address the object update issue. However, Greenfield will support object update in the future as it is part of our roadmap.


Thanks for your reply. Could you explain the difference between Tags and Attributes?


Actually, they are the same thing. We will change “attributes” to “tags”.

Updated bundle layout and renamed “attributes” to “tags”.

BEP-323: Bundle Format for Greenfield

1. Summary

This BEP proposes a solution for better on-chain storage efficiency and user experience in Greenfield by introducing
a method to bundle small files together before uploading. The new system will reduce storage space and costs caused
by small files while increasing the capacity of the entire network.

2. Motivation

Storing small files in Greenfield is inefficient. The metadata stored on the blockchain can be larger than the files
themselves, leading to higher costs for users. In addition, the Greenfield blockchain can process only a limited
number of files at the same time.

To tackle this problem, we aim to design a bundle format specifically for Greenfield. A specialized format lets
developers build something that fits exactly what Greenfield and its users need, without any unnecessary features.
With a bundle format, Greenfield can provide a better service to its users and stay efficient and easy to use
as it grows.

3. Specification

3.1 Bundle Format

The bundle format specifies the structure and organization of the bundle that users create when packing files.
This format is designed to pack flat files; hierarchical directory structures, or folders, are not supported.

When dealing with a folder, users can flatten its structure into a series of individual files.
As part of this process, each file is renamed to include its folder path. For example, a file originally named
file.txt inside the nested folders dirA and dirB would be renamed to dirA/dirB/file.txt.
This approach preserves the organization of the folder while conforming to the flat-file requirement of the bundle.

The bundle format has one further constraint: the names of the files in a bundle should be unique so that
files can be indexed by name.

The bundle format is structured into several key components as follows:

  • Version: This indicates the version number of the bundle protocol being used.
  • Meta Size: This specifies the size of the bundle’s metadata, allowing the construction of the bundle structure without the need to read the entire bundle.
  • Metadata: This section contains information about the files within the bundle. It enables random access, which means you can jump directly to any file within the bundle without going through all the files.
  • Data: This portion holds the actual content and consists of the bytes of all the files.

The Meta structure is designed to include essential attributes for each file, outlined as follows:

  • Object Name: This is the name of the file within the bundle.
  • Offset: This attribute marks the starting point of the file’s data within the bundle.
  • Size: This details the total length, in bytes, of the file.
  • Hash Algo: This specifies the algorithm used for the file’s hash calculation.
  • Hash: This is the cryptographic hash result of the file’s content. It serves as a tool for verifying the file’s integrity.
  • Content Type: This denotes the MIME type of the file, describing the file’s nature and format.
  • Tags: This is a map that holds various additional properties of the file, such as its owner.

3.1.1 Encoding

[Figure: bundle_encoding]

The bundle’s encoding format is structured as follows:

  • Version: Serialized as an unsigned 64-bit integer, occupying 8 bytes.
  • Meta Size: Also an unsigned 64-bit integer, represented using 8 bytes, indicating the size of the metadata section.
  • Metadata: Encoded in bytes, this section utilizes Protocol Buffers (protobuf) for serialization.
  • Data: This consists of the actual file contents, represented as a sequence of bytes.

The Meta structure will be serialized with protobuf:

enum HashAlgo {
  Sha256 = 0;
}

message Meta {
  repeated FileMeta meta = 1;
}

message FileMeta {
  string object_name = 1;
  uint64 offset = 2;
  uint64 size = 3;
  HashAlgo hash_algo = 4;
  bytes hash = 5;
  string content_type = 6;
  map<string, string> tags = 7;
}

4. License

The content is licensed under CC0.
