Streamlining The Graph: An Introduction to Substreams in Web3

Athsrueas.eth | Thomas Freestone

Web3 is an exciting and rapidly evolving space, and one of the most important pieces of the puzzle is The Graph. This platform allows developers to efficiently access and query data from decentralized applications (dApps) across various blockchains. One of the latest developments within The Graph ecosystem is Substreams, a new feature that promises to streamline the querying process and reduce the burden on nodes. In this post, we'll dive into the history of The Graph and StreamingFast, explore the challenges Substreams address, and look at how they work and what benefits they bring. We'll also cover the latest updates in Substreams development and provide some useful resources for anyone interested in learning more about The Graph and web3.

StreamingFast

The Graph is an indexing solution for web3, and it is due for a huge upgrade. Indexing tasks used to take weeks to sync, but with Substreams from StreamingFast, a powerful parallelizable engine for processing blockchain data, these same tasks can be completed in a matter of hours! StreamingFast is a team of developers focused on improving indexing in The Graph ecosystem. Their technology enables quick syncing of data by parallelizing processing tasks across multiple threads and processors. Optimizing this process makes indexing faster, which ultimately leads to quicker and more accurate query results. The team has made a significant impact in the space and is constantly working to make indexing more efficient. StreamingFast, formerly dfuse, joined The Graph as a core developer in June 2021. Their first major contribution was the Firehose, which greatly improved the extraction component of The Graph's indexing engine. Substreams will speed up indexing dramatically by improving the Transform layer.


Extract, Transform, Load, Query

This is The Graph's model.

Firehose and Substreams

With the addition of Firehose and Substreams, the Extract and Transform layers, respectively, have been massively improved.

If you want to use data from an Ethereum network, you need to index it, and there are meaningful challenges here. Traditionally, getting security and verifiability meant reading the chain linearly, block by block, through API calls to Ethereum nodes. StreamingFast solved these problems with Firehose-powered Substreams. Using a streaming-first approach and the flat-file structure from Firehose, we have rich protobuf models, stream cursors, and the ability to use parallelization to improve speeds without sacrificing verifiability. The result is a more efficient and scalable approach to indexing data. Substreams also provide a way to ensure data verifiability and security; in fact, you could rebuild the entire node from the flat files. Everything is cacheable and hashable. Instead of writing the traditional handlers in AssemblyScript, developers can use Rust to write Substreams modules, which can run in parallel, boosting performance.
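To make the idea of a stream cursor more concrete, here is a minimal sketch in plain Rust. It is not the Substreams or Firehose API: `Cursor`, `stream_from`, and `Consumer` are hypothetical names invented for illustration, and the cursor is modeled as nothing more than the last block number seen. The point is only that a consumer which saves the cursor after every block can reconnect and continue without skipping or reprocessing anything.

```rust
/// Hypothetical cursor: an opaque position in the stream.
/// Assumption for this sketch: it simply wraps the last block number seen.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Cursor(pub u64);

/// Simulated provider side: yields (block number, cursor) pairs for every
/// block after the given cursor, up to and including `head`.
pub fn stream_from(cursor: Option<Cursor>, head: u64) -> impl Iterator<Item = (u64, Cursor)> {
    let start = cursor.map_or(1, |c| c.0 + 1);
    (start..=head).map(|n| (n, Cursor(n)))
}

/// Simulated consumer that remembers where it stopped.
pub struct Consumer {
    pub cursor: Option<Cursor>,
    pub seen: Vec<u64>,
}

impl Consumer {
    /// Read at most `max_blocks` blocks, saving the cursor after each one.
    /// A small `max_blocks` stands in for a dropped connection.
    pub fn run(&mut self, head: u64, max_blocks: usize) {
        for (number, cursor) in stream_from(self.cursor, head).take(max_blocks) {
            self.seen.push(number);
            self.cursor = Some(cursor);
        }
    }
}
```

Calling `run(10, 4)` and then `run(10, 100)` on the same `Consumer` processes blocks 1 through 10 exactly once, because the second call resumes from the saved cursor rather than from the start.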


How do they work?

Substreams take the requested data and a query and break the work down into small chunks that can be processed in parallel. I encourage you to watch this video from StreamingFast to get all the best details.
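The chunking idea can be sketched with nothing but the Rust standard library. This is an illustration under stated assumptions, not how the Substreams engine is actually implemented: `Block`, `map_block`, and `count_transfers_parallel` are hypothetical names, and a "block" here is just a number plus a transfer count. The per-block function is pure, so each chunk of the block range can be handled on its own thread and the partial results combined afterwards.

```rust
use std::thread;

/// Hypothetical, simplified stand-in for a decoded block.
/// This is not the Firehose protobuf model.
#[derive(Debug, Clone)]
pub struct Block {
    pub number: u64,
    pub transfer_count: u64,
}

/// Per-block step: a pure function, so it can run on any chunk in any order.
fn map_block(block: &Block) -> u64 {
    block.transfer_count
}

/// Split the blocks into chunks, process each chunk on its own thread,
/// then add the partial results together.
pub fn count_transfers_parallel(blocks: &[Block], chunk_size: usize) -> u64 {
    assert!(chunk_size > 0, "chunk_size must be at least 1");
    thread::scope(|scope| {
        // Spawn one worker per chunk before joining any of them.
        let handles: Vec<_> = blocks
            .chunks(chunk_size)
            .map(|chunk| scope.spawn(move || chunk.iter().map(map_block).sum::<u64>()))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .sum::<u64>()
    })
}

/// Linear baseline: one pass over every block, in order.
pub fn count_transfers_linear(blocks: &[Block]) -> u64 {
    blocks.iter().map(map_block).sum()
}
```

Both functions return the same total for the same input; the parallel version just spreads the per-block work across threads. That only works because `map_block` depends on nothing but the block it is given.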

Key Takeaway: Get up to 100x speed improvements with Firehose and Substreams on Ethereum clients. The stack is fast, responsive, and modular, making The Graph the fastest and most efficient way to get data from blockchains. The initial implementation is built and has been tested for over a year. On March 16, StreamingFast announced that Substreams had reached general availability! On April 27, at Core Dev Call 20, StreamingFast showed off the new UI tool, which is available to use now.

More resources can be found below