SQL is coming to The Graph

Anyone who has been developing on The Graph knows that we use GraphQL for queries. The Guild maintains an excellent GraphQL library and there are many benefits. During the last core dev call StreamingFast announced that SQL is coming the The Graph via an upgrade to their data processing suite that already includes superstars like the FireHose and Substreams. During the call they gave a demo of what it would look like to run these new features on your own setup which is exciting as it continues our work towards a truly decentralized future. StreamingFast encouraged devs to contact them on their discord to get help setting up the beta and also to share any specific needs so that they can add features that the community needs.

The call covered other topics as well but in this blog I want to focus on SQL. StreamingFast sepparated their presentation into 3 sepparate parts:

11:30 - Bringing SQL Queries to The Graph | Semiotic Labs

22:20 - How DBT (Data Build Tool) Takes SQL to the Next Level

30:30 - The Backend: Infrastructure for SQL and DBT Integration | feat. StreamingFast

*from the video discription. Video linked below.

I will introduce each part in this blog and link to some additional resources. My intention is not to teach about this topic but rather to give exposure to the new feature, cover what is in the video without you needing to watch it all, as well as to connect you to information that will be helpful.

Core Dev 26 Video Show Notes

Written by Pedro

Perfect if what you want is a summary of the Call

Why SQL?

StreamingFast and others would like to expand the capability of The Graph to cover not only the needs of developers making dapps but also the data science community. GraphQL is great for dapp developers because they don't need to maintain an API backend and because dapps often use fixed queries which can be well defined in advance. SQL offers speed and flexibility as well as brand new applications for the decentralized data marketplace we are building. This is also good for the economy of the market, since more applications means more possible customers. 

This also comes off the back of recent advancements in the tools for SQL. The team at StreamingFast compared the development of tools for GraphQL and other popular web languages to the development of tools for SQL by comparing the teams that typically use these languages. Web focused languages like GraphQL are designed by developers for developers who love to make use and maintain IDE's and all sorts of flashy easy to use tools. Conversely SQL is mostly use by "number crunchers" finance and marketing professionals or data scientists. Because of this difference SQL has been lagging in attractive and useful tools. Data Build Tool (DBT) has emerged as a popular solution to fill this gap in recent years. The team has chosen to also go with clickhouse as a choice of SQL integration for their products but know that it is possible and desirable to include other tools in the future.


Timeline

The timeline as shown in the dev call is as follows

SQL Data Service Timeline

-from The Graph Forum

DBT and Substreams

In many ways this SQL announcement from StreamingFast is an upgrade announcement for Substreams. Integration of DBT  into substreams is ongoing but a beta is available along with a tutorial linked below.  DBT's philosophy is comparable to the philosophy behind substreams. From the Forum Post and Video: "This modularization facilitates data lineage visualization, showing the interconnectivity between different data models." 

New StreamingFast Product and Backend Discussion

There was a brief mention of competetors to the graph in this section of the discussion. It was asserted that there is currently no "great real-time solution" to many of these problems from The Graph or from any competitor such as Dune. There is a need for solutions to the data processing needs of the community:


The proposed solution is something they are calling "deployable units" which they hope will simplify the deployment of both the front end and back end needs while minimizing scaling and hosting complexities. The unit is a stack: Firehose and Substreams perfom the data extraction and initial processing, then SQL sync is used to transfer data into databases like Postgress or ClickHouse. DBT is involved in data transformation as an alternative to Substreams for data manipulation. 

"This setup allows direct querying of the data via Graph QL, REST API, or direct SQL access, facilitating easier data consumption for analytics and front-end applications."  - Pedro, Summarizing StreamingFast

Live Demo of Deployable Units

StreamingFast gave a live demo, showing a deployable unit through Substreams with SQL directly integrated. Key features include:

Two different examples were given showing off the versatility of the units. One example used Postgress to process Bored Ape data while the other used ClickHouse to show CryptoPunks. In both cases they showed off the SQL queries in these deployed systems and how they allow for "direct, efficient data access and manipulation".

Q&A

Viewers asked several questions to the developers which were answered live during the call. Pablo also shared the answers on the forum post

Links and Resources

StreamingFast discord  for discussion and help from the team

ping me on twitter or discord if you have any feedback @athsrueas