
GraphQL is a very cool technology that is becoming a new standard for communication between the front end and the back end. We started using GraphQL in OpenShift Console 4.6. In this blog post, I will show some real-world examples and describe how we can leverage this technology to speed up the console's loading times.

Most of you have probably heard of GraphQL and possibly have some hands-on experience with it. But for those who are unfamiliar with it or need a refresher, GraphQL is a query language and server-side runtime for APIs. Similar to REST, it allows you to communicate with your back end, but in a much more efficient way. GraphQL has a schema that describes the data format and types. Clients query the API and ask only for the data they need. They can also request multiple resources with one query. When using a REST HTTP API, this typically requires multiple calls, as every resource has its own endpoint. See the official GraphQL web page for a more thorough introduction.
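As a small sketch of what that looks like in practice (the query and field names below are hypothetical, not the actual console schema), a single GraphQL request can ask for exactly the fields it needs across several resource types:

```typescript
// Hypothetical query: field names are illustrative, not the real
// OpenShift Console schema. One request covers two resource types and
// asks only for the fields the UI needs.
const query = `
  query inventory {
    pods { status { phase } }
    nodes { status { conditions { type status } } }
  }
`;

// A GraphQL request over HTTP is just a POST whose JSON body carries the
// query text; the server answers with JSON shaped like the query.
const requestBody = JSON.stringify({ query });
```

The server's response mirrors the shape of the query, so the client never receives fields it didn't ask for.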

Improving UI Loading Times

Typical REST APIs exhibit a few issues that we can solve with GraphQL. One of the most prominent is over-fetching, which occurs when a client fetches more data from the server than it needs.

Let's take a look at the Overview page in OpenShift Console:

[Screenshot: the Overview page in OpenShift Console]

This page shows all kinds of information gathered from various sources, and it also does plenty of over-fetching. Zoom in on the Cluster Inventory card, which shows resources in the cluster and their status. Usually, it contains counts of nodes, pods, storage classes, and PVCs. To show the count and status of pods, we have to fetch all pods from the k8s REST endpoint, parse the response, and examine a few fields within each pod to determine its status. We don't care about all the other data we received in the response, such as each pod's name, namespace, labels, and annotations, but we had to fetch it all because we didn't have another option. We over-fetched.

If we implement a GraphQL endpoint, we can create a query that avoids the over-fetching problem, since a GraphQL query specifies only the fields we want. A minimal running pod on my cluster was approximately 200 lines in YAML format, and via the REST API we fetch them all. Using GraphQL, we can fetch only the fields necessary to determine the pod's status (which in this case is about five lines!). It differs from case to case, but you can almost always make the response significantly smaller.
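To make the "about five lines" concrete, here is a minimal sketch of deriving a pod's display status from just a handful of fields. The helper and its rules are my simplification, not the console's real status logic:

```typescript
// Sketch only: the real console status logic handles many more edge cases.
// Field names follow the Kubernetes Pod API; everything else the REST
// response contains is irrelevant for this computation.
interface PodStatusFields {
  phase: string;                // status.phase: Pending | Running | Succeeded | Failed
  reason?: string;              // status.reason, e.g. "Evicted"
  deletionTimestamp?: string;   // metadata.deletionTimestamp, set while terminating
}

function podDisplayStatus(pod: PodStatusFields): string {
  if (pod.deletionTimestamp) return "Terminating";
  if (pod.reason) return pod.reason;
  return pod.phase;
}
```

Everything this function needs fits in a few lines of a GraphQL response; the remaining ~200 lines of the pod's YAML are dead weight for this card.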

One WebSocket to Rule Them All

Now that we know how to deal with the over-fetching problem and its relationship to response size, we can explore further. When OpenShift Console fetches a lot of data, it leverages chunked responses, introduced in k8s 1.9. Fetching is split into separate HTTP requests, which improves the responsiveness of the UI by showing results incrementally. The disadvantage of this approach is that we have to make many HTTP requests, each carrying its own overhead. That overhead adds up because the requests run in series: each incremental request needs the continue token returned by the previous response.
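The serial nature of chunked fetching can be sketched as a loop. The fetchChunk helper below is hypothetical, standing in for a call to a k8s list endpoint with limit and continue parameters:

```typescript
// Each chunk is a separate HTTP request, and the continue token from one
// response is required before the next request can be issued -- the chunks
// cannot be fetched in parallel.
interface Chunk {
  items: string[];          // simplified: real items are full resource objects
  continueToken?: string;   // metadata.continue from the list response
}

async function fetchAll(
  fetchChunk: (continueToken?: string) => Promise<Chunk>,
): Promise<string[]> {
  const items: string[] = [];
  let token: string | undefined;
  do {
    const chunk = await fetchChunk(token); // one HTTP round trip per chunk
    items.push(...chunk.items);
    token = chunk.continueToken;
  } while (token);
  return items;
}
```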

GraphQL also communicates via HTTP by default. But there are some cool projects that let you switch all communication to a single WebSocket, and that is exactly what we need.

We can open a WebSocket between the GraphQL client and the server as soon as OpenShift Console starts and use it for all communication. We can still load data incrementally in chunks, but every incremental request is just a message sent over the WebSocket, so there's no additional HTTP overhead.
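To sketch what "just a message" means: here is roughly how an operation is framed on the wire. The message shape follows the graphql-ws transport protocol; the id handling and query are illustrative:

```typescript
// Over a shared WebSocket, each GraphQL operation is one small JSON message
// tagged with an id, instead of a new HTTP request with its own headers,
// queueing, and connection bookkeeping.
function subscribeMessage(id: string, query: string): string {
  return JSON.stringify({ id, type: "subscribe", payload: { query } });
}

// The server streams results back as "next" messages and finishes with a
// "complete" message carrying the same id.
```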

Let's take a look at some comparison data to see what we can gain by switching our communication from the REST API to GraphQL over HTTP and GraphQL over WebSocket:

[Chart: time to fetch pods via the REST API, GraphQL over HTTP, and GraphQL over WebSocket]

Now that is quite a difference, especially for larger amounts of data. Let me explain how I measured this. Remember the Inventory card on the Overview page from when I described over-fetching earlier? I modified that component a bit to fetch only pods and no other resources, and isolated it from all other components. I then added some performance-measuring code and measured how long it took to fetch everything it needed via the different methods.

The numbers in the chart are very promising. In case you're wondering how big the responses are, see the following table:

Number of pods | REST response size (compressed/raw) | GraphQL response size (compressed/raw)
-------------- | ----------------------------------- | --------------------------------------
500            | 201 KB (3.3 MB)                     | 20.2 KB (290 KB)
1000           | 304 KB (5.7 MB)                     | 38.7 KB (555 KB)
5000           | 773 KB (19.3 MB)                    | 184 KB (2.0 MB)

That clearly demonstrates how much we over-fetch with the REST API in this particular case. Fetching only the required fields for 5000 pods via GraphQL results in a smaller data set than fetching 500 pods via the REST API. Yes, the response size varies depending on which and how many fields you need, but it's still clear that GraphQL can be a great help. We can also see that we gain a lot by running our communication through a WebSocket instead of separate HTTP requests.

In the chart, I also included GraphQL via WebSocket with compression enabled. HTTP enables compression by default, but WebSocket does not, and compression makes the response significantly smaller. That may not matter much if you have a fast, reliable connection, but that's not always the case, right? So I wanted to test compression of WebSocket messages too. On a fast connection we don't really see a difference, so I limited my network to 3 Mbit/s, and the difference becomes obvious:

[Chart: GraphQL via WebSocket, with and without compression, on a 3 Mbit/s connection]
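As a rough, self-contained illustration (with synthetic data, not the console's real responses) of why these payloads compress so well:

```typescript
import { deflateSync } from "node:zlib";

// Synthetic payload: 500 pod-like objects. The field names repeat for every
// item, which is exactly what makes JSON list responses so compressible.
const payload = JSON.stringify(
  Array.from({ length: 500 }, (_, i) => ({
    name: `pod-${i}`,
    namespace: "default",
    phase: "Running",
  })),
);

const compressed = deflateSync(Buffer.from(payload));
// compressed.byteLength ends up a small fraction of payload.length.
```

The ratio for real list responses will differ, but the repetitive structure of k8s resources means deflate almost always wins by a wide margin.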

We can solve another problem with GraphQL via WebSocket. Web browsers limit the number of active HTTP requests to the same host (usually six); all other requests have to wait. That can be an issue for pages that make a lot of simultaneous requests, like my favorite Overview page. We can also see the limitation during OpenShift Console startup, when the console fetches all the models available on the cluster. On my clean cluster, there are about 140 models, and that number rises as soon as you start installing Operators that bring new CRDs. The requests are tiny, just simple GET requests, but there are so many of them, and only a few (depending on the browser, usually six) can run at a time. If we do this via WebSocket, we are no longer talking about 140 separate HTTP requests but 140 messages sent over one WebSocket. None of them is blocked by the web browser, and as expected, this results in another boost in performance:

[Chart: Console startup model fetching via separate HTTP requests vs. WebSocket messages]
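A back-of-envelope model shows why the connection limit hurts. This is my own simplification, not a measurement:

```typescript
// Assume every tiny request costs about one round trip and the browser
// allows `maxParallel` concurrent requests to the same host. 140 model
// fetches then take roughly ceil(140 / 6) = 24 round-trip "waves", while
// 140 WebSocket messages all go out on the one already-open connection.
function httpRequestWaves(requests: number, maxParallel = 6): number {
  return Math.ceil(requests / maxParallel);
}
```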

Hold Your Horses!

Not everything I showed in this blog is implemented in Console yet. I have prototyped everything, but it is not yet ready for production. We added a GraphQL back end for Console in OpenShift 4.6. We use it to fetch k8s models on Console's startup, and we also run some other, smaller requests through it. We don't yet use it for fetching resources like pods, but the performance measurements are very promising.

Feel free to check out our GraphQL back-end and front-end code.