What’s there to think about?  

This post is a "checklist" of items that you should consider before you start to integrate your application into OpenShift Service Mesh (OSSM).  This is the checklist I wish I had had before I first embarked on my journey with “the mesh.”

If you have not yet started to explore Service Mesh technology, I suggest you take a look at Red Hat’s product itself and read "Istio on OpenShift."  

If you have started to explore Istio, you will have come across the ubiquitous bookinfo application that is used for many of the provided examples on the Internet.  Or, you might have explored other examples like this more complex Travel Agency application.  Using these examples and others, you will have already gained a better understanding of the service mesh.  Your next step might be to get your own application working inside the mesh.

First things first

The official OpenShift Service Mesh 2.0 (OSSM) documentation is one obvious place to start and has some very useful content, including:

As soon as you start to integrate your application into the mesh you will need to go deeper and consult the upstream documentation as well.  Ensure you make note of the relevant component versions included in Red Hat OpenShift Service Mesh.

If you have not already done so, test your service mesh with the Bookinfo application.  Once this works properly, you can start adding your own application into the mesh.

To add your application into the service mesh, the first thing you need to do is to ensure the Envoy proxy sidecars are properly injected into your application pods.  This is relatively simple to do with OSSM and is described well in Enabling automatic sidecar injection.

Use what you have learned configuring the ingress gateway for the bookinfo application in the example, then apply that configuration to your application to ensure you can access it from outside your OpenShift cluster.

Protocol Selection

It is important to understand how Istio detects the protocols your application uses, amongst other things.  Read up on “Protocol Selection” and “app and version labels” on the "Pods and Services" page.

Protocol Selection caused us to trip up initially.  After injecting the sidecar proxies and loading our application with some test traffic, we were presented with a Kiali Graph as shown below.  The graph looked nothing like what we were expecting!  This was because Kiali & Istio could not properly detect the protocols our application was using and showed the connections between the services as "TCP" and not "HTTP".

Kiali Graph, showing only TCP connections

Knowing the protocol in this way is a requirement of Istio.  If the protocol cannot automatically be determined, traffic will be treated as plain TCP.  If necessary, protocols can be specified manually in the Kubernetes Service definition of your application.

It took a fair bit of searching to figure out how to fix this.  In the end, we found the Istio documentation about "Protocol Selection".

To specify the protocol manually, your Kubernetes Service objects need to be configured appropriately.  In our case, the value for "spec -> ports -> name" was missing in our pre-existing Kubernetes Service definitions.  After adding "name: http" to the port entries in the Kubernetes Service definitions for services A, B & C, the graph improved greatly and “HTTP” connections were displayed properly.
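The port-naming rule can be illustrated with a small sketch.  It assumes the "<protocol>[-<suffix>]" naming convention described in the Istio "Protocol Selection" documentation; the class name, method name and the list of protocol prefixes are illustrative assumptions, and this is not Istio's own implementation:

```java
import java.util.List;
import java.util.Optional;

/** Sketch of the "<protocol>[-<suffix>]" port-naming convention; not Istio's own code. */
final class PortNameProtocol {

    // Assumed list of protocol prefixes; check the Istio documentation for your version.
    // "grpc-web" is listed before "grpc" so that the longer prefix wins.
    private static final List<String> KNOWN_PROTOCOLS = List.of(
            "grpc-web", "grpc", "http2", "https", "http",
            "mongo", "mysql", "redis", "tcp", "tls", "udp");

    /** Returns the protocol declared by a Service port name, or empty if none is declared. */
    static Optional<String> declaredProtocol(String portName) {
        if (portName == null) {
            return Optional.empty();
        }
        for (String protocol : KNOWN_PROTOCOLS) {
            // A name declares a protocol if it is the protocol itself or "<protocol>-<suffix>".
            if (portName.equals(protocol) || portName.startsWith(protocol + "-")) {
                return Optional.of(protocol);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(declaredProtocol("http"));      // declares HTTP
        System.out.println(declaredProtocol("http-web"));  // declares HTTP, with a suffix
        System.out.println(declaredProtocol("web"));       // declares nothing
    }
}
```

A port named "web" declares nothing, which is the situation we were in: without a declared protocol the connections showed up as plain TCP in our graph.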

Kiali

Kiali is a great tool to get started with OpenShift Service Mesh.  I would go so far as to say this is the tool you should focus on, especially in the initial phases of your exploration.

Kiali visualizes the metrics generated by Istio, so it can help you truly know what is happening in your mesh.  I recommend reading the Kiali documentation sooner rather than later!  One of the first mistakes we made when starting out was to skip it!

Kiali is not just a good tool to visualise your application but it is also useful to create, validate & manage your mesh configurations.  Learning the Istio configurations can be challenging at first, so let Kiali guide you.  

There are a lot of hidden gems in Kiali just waiting to be found!  I recommend reading the Kiali Feature list and the FAQ.  Below is a brief selection of features:

Another important thing to know is how to label your application services.  Istio, and therefore also Kiali, requires your application to be labelled in a particular way and this is not very obvious at first, especially after coming from the bookinfo application which is properly labelled and works perfectly out-of-the-box.

Setting your Deployments with “app” and “version” labels is important because it adds contextual information to the metrics and telemetry Istio collects and which Kiali and Jaeger use.

Istio can shed light on connections between your internal and external services that you didn't even know existed!  This is useful but it can also be annoying when viewing the graph.  You can remove these “Unknown nodes” from the graph.

Another very useful feature of Kiali is that it can validate your mesh configuration.  This is especially useful if you created your configuration yourself.

At first the Kiali Graphs can be a little unnerving.  It is a very good idea to understand the different types of Kiali graphs starting with “Generating a service graph” and “Observability Features”.

Jaeger sampling

When testing your application in the mesh for the first time, you will want to ensure the tracing sample rate is set to something higher than 50%, preferably 100% so that all test requests that pass through your application contribute to the observability data.  This means Jaeger and Kiali will display more information and you won't have to wait for updates.  

When testing your application in the mesh, set the sample rate to 100%.  The sampling value is expressed in steps of 0.01%, so 10,000 = 100%.

Edit the ServiceMeshControlPlane object  (usually called "basic-install") in your Control Plane project (usually "istio-system") and add/change the following value:

spec:
  tracing:
    sampling: 10000 # 100%

Note that once you start to run your application in production you don't want to have every single request sampled, so you should consider reducing the value back to about 5% or below, to a value you are comfortable with.  
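To make the scale concrete, here is a minimal sketch that converts a percentage into the sampling value, assuming the linear scale on which 10000 means 100%.  The class and method names are made up for illustration:

```java
/** Sketch: converts a percentage into the sampling value, where 10000 means 100%. */
final class TracingSampling {

    /** Returns the sampling value for a percentage between 0 and 100. */
    static int fromPercent(double percent) {
        if (percent < 0.0 || percent > 100.0) {
            throw new IllegalArgumentException("percent must be between 0 and 100");
        }
        // One unit of the sampling value corresponds to 0.01%.
        return (int) Math.round(percent * 100.0);
    }

    public static void main(String[] args) {
        System.out.println(fromPercent(100.0)); // value to use while testing
        System.out.println(fromPercent(5.0));   // a possible production value
    }
}
```

On this scale, a 5% sample rate for production corresponds to a sampling value of 500.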

Trace context header propagation

Jaeger provides you with one less headache for your microservices journey, so ensure all your services in your application propagate trace headers properly.

It is very useful to view how requests pass through the (possibly many) services in your mesh.  OSSM helps by collecting trace data in the form of "spans" and "traces".  Viewing traces is invaluable in understanding serialisation, parallelism and sources of latency in your application.  Briefly, a span represents a logical unit of work (e.g. a client-server call).  A trace is the path that a request takes as it traverses the mesh, or in other words, as it hops from one service to another in your application.  Spans and traces are explained further in the OSSM documentation, which is well worth a read.

An important point to note about OSSM is that spans (the units of work) are automatically generated by Istio but, on the other hand, traces are not!  To gain the full visibility of "distributed traces", the developer must change the source code to ensure that any existing trace headers are properly copied from one service to the next.  Note that there is no need to generate the trace headers yourself.  If they are not already present, trace headers are automatically generated and added by the first Envoy proxy (usually at the ingress gateway).

Here is a list of the headers that need to be propagated:

  • x-request-id
  • x-b3-traceid
  • x-b3-spanid
  • x-b3-parentspanid
  • x-b3-sampled
  • x-b3-flags
  • x-ot-span-context

Propagation can be done "manually" or it can be done using Jaeger client libraries which implement the OpenTracing API.

Here is a simple example of manual trace context propagation in Java:

// Copy the incoming (downstream) trace header onto the outgoing (upstream) request.
HttpHeaders upstreamHttpHeaders = new HttpHeaders();
String requestId = downstreamHttpHeaders.getHeader("x-request-id");
if (requestId != null) {
    upstreamHttpHeaders.set("x-request-id", requestId);
}

Note: Repeat the same for all of the headers listed above.  
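Rather than repeating the check once per header, the copy can be done in a loop.  The following is a framework-neutral sketch that uses plain maps in place of your HTTP library's header types.  The class and method names are hypothetical, and it assumes the incoming header names are already lower-cased (HTTP header names are case-insensitive, so a real implementation should use the header lookup of its HTTP library):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: copies the trace headers listed above from an incoming to an outgoing request. */
final class TraceHeaderPropagation {

    // The headers that need to be propagated, as listed above.
    static final List<String> TRACE_HEADERS = List.of(
            "x-request-id",
            "x-b3-traceid",
            "x-b3-spanid",
            "x-b3-parentspanid",
            "x-b3-sampled",
            "x-b3-flags",
            "x-ot-span-context");

    /** Returns only the trace headers that are present on the incoming request. */
    static Map<String, String> copyTraceHeaders(Map<String, String> incoming) {
        Map<String, String> outgoing = new LinkedHashMap<>();
        for (String name : TRACE_HEADERS) {
            String value = incoming.get(name);
            if (value != null) {
                // Copy the value unchanged; never generate trace headers yourself.
                outgoing.put(name, value);
            }
        }
        return outgoing;
    }

    public static void main(String[] args) {
        Map<String, String> incoming = Map.of(
                "x-request-id", "example-request-id",
                "accept", "application/json");
        System.out.println(copyTraceHeaders(incoming)); // only the trace header is copied
    }
}
```

Add the returned headers to every outgoing call your service makes while handling the incoming request.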

Kiali wizard and YAML editor

Validations

From experience, it is quite tedious setting up YAML manifests manually, especially if the values in one YAML resource must match up with values in another and the indentation must be strictly correct.  Although I only found this out after using Kiali for a while, there is a great feature in Kiali to help you create and verify these configurations.

Create the Istio resources using Kiali wizard

Most of the configurations you will need to get started can be created with the Kiali wizard.   The wizard was not the easiest thing to find - because we were so focused on the graph feature we never noticed the wizard - until we read the Kiali FAQ and went hunting for it.  You will find the wizards under the "Services" menu item.    

Use YAML editor:

Kiali provides a YAML editor for viewing and editing Istio configuration resources directly. The YAML editor will also show you if it detects incorrect configurations.  Very useful!

Often, the Kiali graph will uncover some hitherto unknown communication paths that your application uses (and that developers were not aware of!).  This can be unnerving, but it is also refreshing to know that Kiali will help you uncover and detect all the communication paths that exist as you test your application.  Some of the nodes in the graph might be marked as "unknown".  If you are not interested in including these nodes on the Kiali graph, they can be removed by entering "node=unknown" into the input box above the graph.

Remove encryption from your code

If you already secure the connections between your services (possibly) and/or use TLS for external connections (very likely), you must turn these off and switch to "http" only.   The Envoy proxy will take care of encryption for you.

If you use TLS from your services to external services, Istio cannot inspect the traffic and Kiali will display the connection only as “TCP”.  

Remove any existing HTTPS security between services and/or to any external services.

  1. Use HTTP not HTTPS

Also, remember to make external services known to the mesh (see “Configure External Services” below).

Simplify your code

Now take the chance to see if you can remove any bloat from your application code!

One of the benefits of the Service Mesh is that many aspects of your service design can be moved from your code into the platform layer so the developer can focus on the all-important business logic.  The developer now has the chance to simplify the code.  For example, consider the following:

  • As above, remove HTTPS encryption.
  • Remove all timeout and retry logic.
  • Remove any other unneeded libraries.
  • Be aware of the added connection pools.  When you introduce two proxies between your services, you are actually adding 2 more "connection pools", so a single call now passes through three:
  1. from your service to the local Envoy sidecar (within the same pod),
  2. from the Envoy sidecar to the upstream Envoy and
  3. from the upstream Envoy to the upstream service (within the same pod).
    It might be worth your while checking if any configuration changes need to be made to simplify the connection pool in your downstream (client) service.  Here is a great blog that goes into this in more detail.

Another positive thing about the simplification of your code is that the footprint of your services can be reduced and performance hopefully increased.  A lot of that “ability” is now implemented in the Envoy proxy, so why have these things duplicated in both the proxy and your service?!

Service Objects

Ensure all your application services use Kubernetes Service Object names and not OpenShift Routes to communicate with each other!  

It can happen, for whatever reason, that developers use OpenShift Routes (cluster ingress endpoints) for service to service communication within the same cluster.  If those services need to be included into the same mesh, then the developers need to make changes to their application configuration/manifests, so the Kubernetes Service Object names are used instead of the OpenShift Route endpoints.  
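As a rough sketch of what the change looks like in client code, the helper below builds an in-cluster URL from a Service name and namespace instead of a Route hostname.  It assumes the default cluster domain "cluster.local"; the class name, method name and the example service "orders" in namespace "shop" are hypothetical:

```java
/** Sketch: builds an in-cluster URL from a Kubernetes Service name instead of a Route. */
final class ServiceAddress {

    /** Returns "http://<service>.<namespace>.svc.cluster.local:<port><path>". */
    static String inClusterUrl(String service, String namespace, int port, String path) {
        // Assumes the default cluster domain "cluster.local".
        return "http://" + service + "." + namespace + ".svc.cluster.local:" + port + path;
    }

    public static void main(String[] args) {
        // Hypothetical service "orders" in namespace "shop".
        System.out.println(inClusterUrl("orders", "shop", 8080, "/api/orders"));
    }
}
```

Within the same namespace, the short Service name on its own also resolves.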

Fallback Functions

Ensure you implement appropriate fallback functions within your code.  

There are other code changes you might need to consider.  Even though the Envoy proxy takes care of timeouts and retries to make your upstream connections more reliable, sometimes services do fail completely and your code will need to handle these failures.  For this, you will need a "fallback" function in your code which can handle the case when the attempted request really does fail and there’s nothing more Istio’s proxy, Envoy, can do.
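A fallback can be as simple as catching the failure and returning a sensible default.  The following is a minimal sketch with hypothetical names; it assumes your HTTP client reports a failed call by throwing a runtime exception:

```java
import java.util.function.Supplier;

/** Sketch: runs a call and switches to a fallback when the call fails. */
final class Fallback {

    /** Returns the primary result, or the fallback result if the primary call throws. */
    static <T> T callWithFallback(Supplier<T> primary, Supplier<T> fallback) {
        try {
            return primary.get();
        } catch (RuntimeException e) {
            // The mesh has already applied its timeouts and retries; degrade gracefully here.
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        String result = Fallback.<String>callWithFallback(
                () -> { throw new IllegalStateException("upstream unavailable"); },
                () -> "default recommendations");
        System.out.println(result);
    }
}
```

What the fallback returns (a cached value, an empty list, a friendly error) depends on your business logic, which is why it belongs in your code and not in the mesh.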

Configure External Services

Get more out of your Service Mesh by registering external services so that Envoy proxies can send traffic to the services as if they were part of your service mesh.

Most likely, your application communicates with services outside of your Service Mesh.  You might think that because they are external to the mesh that the mesh can't provide much value.  If so, you are wrong!  You can introduce these external services to your mesh which means you can leverage some of the mesh features with these external services as well.

The OSSM documentation explains this in more detail and gives examples.  Here is a longer exercise showing you how to visualise Istio external traffic with Kiali.  Or use TLS origination for your encrypted egress traffic.  

Some of the Istio features you can use with external services are for example:

  1. Encryption (both simple and mutual TLS)
  2. Timeouts and retries
  3. Circuit breakers and
  4. Traffic routing

Conclusion

OpenShift Service Mesh helps you gain visibility into your mesh of services, which in turn helps you increase your overall microservices sophistication.  As a bonus, you can implement more features/functionality in the OpenShift platform itself instead of in each application and ease the strain on your developers.  You might be able to implement features that were not previously possible, for example, canary deployments, A/B testing and more.  You will gain a consistent way to manage your microservice applications within all of your OpenShift clusters, which is good for people & process continuity.  Eventually, this can help you transition from a monolithic design towards a distributed microservice architecture and move more from code to configuration!

That’s the end of the checklist!  I’m sure many other items could be added.  Can you think of any useful items to add to the checklist?  Now you know about some of the most important items to consider before you implement your application in OpenShift Service Mesh.  I hope it’s useful!