In this two-part blog, we will provide essential technical information about SAS Institute's latest analytic platform, SAS Viya, as well as a reference architecture for deploying SAS Viya on Red Hat OpenShift Container Platform (OCP). Also, make sure to take a look at the second installment of this blog, where we will discuss security, machine management and storage considerations. Let’s start with a few introductory words before we get into the technical details.
Since the launch of SAS Viya in 2020, SAS has offered a fully containerized analytic platform based on a cloud-native architecture. Due to the scale of the platform, SAS Viya requires Kubernetes as an underlying runtime environment and takes full advantage of the native benefits of this technology.
SAS supports numerous Kubernetes distributions in both the public and the private cloud. In fact, many SAS customers prefer to run their application infrastructure in a private cloud environment - partly because regulatory requirements for specific use cases do not allow otherwise, but also for strategic reasons.
In these cases, Red Hat OpenShift provides an optimal foundation for the SAS software stack. OpenShift not only offers a hardened Kubernetes distribution with many highly valued enterprise features as an execution platform, but also comes with an extensive ecosystem with a particular focus on supporting DevSecOps capabilities.
SAS and Red Hat have enjoyed a productive partnership for more than a decade - while Red Hat Enterprise Linux was the preferred operating system in earlier SAS releases, today SAS has based their container images on the Red Hat Universal Base Image.
Moreover, for Red Hat OpenShift deployments, SAS takes advantage of the OpenShift Ingress Operator, the cert-utils operator, and (optionally) OpenShift GitOps, and integrates with OpenShift’s security model, which is based on Security Context Constraints (SCCs).
SAS Viya on OpenShift Reference Architecture
SAS Viya is an integrated platform that covers the entire AI and Analytics lifecycle. Thus, it is not just a single application, but a suite of integrated applications. One of the fundamental differences here is the nature of the workload that SAS Viya brings to the OpenShift platform. This affects the need for resources (CPU, memory, storage) and entails special security-specific requirements.
Moving SAS Viya to OpenShift gives Viya a level of scalability that was unavailable in previous SAS releases. SAS takes advantage of this scalability by breaking Viya down into different workload types and recommends assigning each workload to a dedicated class of nodes, i.e., to machine pools. This ensures that the proper resources are available to specific workloads. Figure 1 shows the separation of workloads into pools.
Note that the setup of pools is not mandatory and there might be reasons to ignore the recommendation if the existing cluster infrastructure is not suitable for such a split. Applying a workload placement strategy by using node pools provides a lot of benefits, as it allows you to tailor the cluster topology to workload requirements; you could, for example, choose different hardware configurations (nodes with additional storage, with GPU cards etc.). The placement of SAS workload classes can be enabled by applying predefined Kubernetes node labels and node taints.
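As an illustration, a worker node dedicated to the CAS node pool would typically carry a workload class label and a matching taint. The label and taint keys below follow SAS's documented workload classes, but verify them against the SAS documentation for your Viya version; the node name is hypothetical:

```yaml
# Hypothetical node in the CAS node pool: the workload.sas.com/class label
# steers CAS pods here, while the matching taint keeps other workloads away.
apiVersion: v1
kind: Node
metadata:
  name: ocp-worker-cas-1          # hypothetical node name
  labels:
    workload.sas.com/class: cas   # SAS workload class label (check your Viya version)
spec:
  taints:
  - key: workload.sas.com/class
    value: cas
    effect: NoSchedule
```

The other pools (compute, stateless, stateful) follow the same pattern with their respective class values.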
Please refer to the second installment of this blog for an in-depth discussion around machine management on OpenShift, which explains how this can be simplified and automated by capabilities uniquely available in OpenShift.
To aid understanding, let’s briefly explain the workload classes mentioned in this diagram.
SAS Cloud Analytic Server (CAS) (CAS NODE POOL)
The core component at the heart of SAS Viya is the Cloud Analytic Server (CAS), an in-memory analytics engine. Data is loaded from the required data source into in-memory tables; clients then connect to CAS to run analytical routines against that data. The data loaded into memory can also be flushed to disk. This is why the CAS server usually has the highest resource requirements of all SAS Viya components: it is CPU- and memory-intensive and requires persistent storage accessible by all nodes hosting CAS pods.
CAS can be deployed in one of two modes: SMP (Symmetric Multi-Processing) mode as a single-instance server, or MPP (Massively Parallel Processing) mode as a distributed server. In SMP mode, only a single CAS pod is deployed; in MPP mode, multiple CAS pods are used, with one pod taking the role of a controller while the other pods run the computations.
In a default configuration, i.e., when a CAS node pool is being used, each CAS pod runs on a separate worker node, claiming more than 90% of the available CPU and memory resources out-of-the-box. If there is no node pool available for CAS, transformer patches applied during the deployment limit the resource usage of CAS to the desired amount or allow co-existence of CAS with other workloads on the same node.
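SAS ships ready-made example transformers for this purpose in its deployment assets (sas-bases). Purely to illustrate the mechanism, a kustomize patch that caps the CAS resources could look roughly like the sketch below; the target name and the JSON paths are assumptions and must be taken from the SAS-provided examples:

```yaml
# Rough sketch only: a kustomize PatchTransformer that caps CAS resources.
# The real patch files are provided by SAS under sas-bases; the target name
# and field paths below are illustrative assumptions.
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: cas-limit-resources
patch: |-
  - op: replace
    path: /spec/controllerTemplate/spec/containers/0/resources/limits/cpu
    value: "8"
  - op: replace
    path: /spec/controllerTemplate/spec/containers/0/resources/limits/memory
    value: 32Gi
target:
  kind: CASDeployment
  name: default
```

Such a transformer is simply referenced from the kustomization.yaml used to build the deployment manifest.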
SAS Compute Services (COMPUTE NODE POOL)
SAS Compute services represent the traditional SAS processing capabilities found in all previous releases of SAS. A SAS session is launched either interactively by a user from a web application or as a Kubernetes batch job that executes submitted SAS code to transform or analyze data. Because each session runs in its own pod or job, SAS sessions are highly parallelizable. The number of sessions (or Kubernetes jobs) running in parallel is limited only by the available hardware resources.
The compute node pool is a good candidate for using the cluster autoscaler, if possible. Customers often have typical usage patterns that would directly benefit from this - for example, absorbing usage peaks (scaling out for nightly batch workloads, scaling in over the weekend, etc.). Please refer to the section on autoscaling in the second installment of this blog for more OpenShift-specific details.
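As a brief illustration of what this looks like on OpenShift, the compute node pool could be scaled through a MachineAutoscaler that references the MachineSet backing the pool; the MachineSet name and replica bounds below are hypothetical:

```yaml
# Hypothetical MachineAutoscaler for the compute node pool; it lets the
# cluster autoscaler grow and shrink the referenced MachineSet.
apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: viya-compute-autoscaler
  namespace: openshift-machine-api
spec:
  minReplicas: 1
  maxReplicas: 6                      # sized for nightly batch peaks, for example
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: viya-compute-machineset     # hypothetical MachineSet name
```

Note that a cluster-wide ClusterAutoscaler resource also needs to be deployed for this to take effect.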
SAS Microservices and Web Applications (STATELESS NODE POOL)
Most services in any SAS Viya deployment are designed as microservices, also known as 12-factor apps. They are responsible for providing central services like auditing, authentication, etc. Grouped with these services is a set of stateless web applications that provide the user interfaces exposed to end users, for example SAS Visual Analytics, SAS Model Manager, and SAS Data Explorer.
Infrastructure Services (STATEFUL NODE POOL)
These infrastructure services are essentially the metadata management and storage services. They are built from several open-source technologies, such as the internal SAS PostgreSQL database, Consul for configuration, and RabbitMQ for messaging. This is where the critical operational data is stored. These services are rather I/O-intensive and require persistent storage.
Core Platform
In the current iteration of SAS Viya on OpenShift, SAS only supports VMware vSphere 7.0 Update 1 or later as the deployment platform. The details of VMware configuration are not covered in this blog, as VMware vSphere is assumed to be well known in most environments.
At the time of writing, bare metal is on the roadmap as an alternative for on-premises deployments. When deployed on a different infrastructure provider, such as Azure, AWS, Google Cloud, or bare metal, SAS Viya runs under the SAS Support for Alternative Kubernetes Distributions policy.
At the time this blog was written, Red Hat OpenShift versions 4.10 - 4.12 are supported for SAS Viya. SAS works to align their SAS Viya Kubernetes support levels with Red Hat OpenShift and typically adds support for the latest OpenShift version updates within 1-2 months of a given OpenShift version release. Additional details about some of the specific OpenShift components that support the SAS Viya deployment are provided in Part 2 of this blog, so we only want to provide a high-level overview here:
- OpenShift Ingress Operator: SAS has specific requirements for forwarding cookies during transaction execution and relies on specific HAProxy capabilities to meet them. In this iteration, only the OpenShift Ingress Operator is therefore supported.
- OpenShift Routes: SAS prefers to use features native to the target environment, so SAS Viya takes advantage of OpenShift Routes.
- cert-utils-operator: SAS requires this operator to manage certificates for TLS support and to create keystores.
- cert-manager: SAS Viya supports two different certificate generators, which are used for enforcing full-stack TLS. The default generator uses OpenSSL and is supplied out-of-the-box by SAS. Alternatively, you can deploy and use cert-manager to generate the certificates used to encrypt pod-to-pod communication.
- Security Context Constraints: Security Context Constraints (SCCs) grant permissions to pods and are required for them to run. SAS requires several custom SCCs to support the SAS Viya services on OpenShift. The SAS documentation describes the required SCCs to help you understand their use in your environment and to address any security concerns; further details about the required custom SCCs are provided in Part 2 of this blog, and a generic sketch follows below.
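The actual SCC definitions come from the SAS deployment assets and documentation. Purely to illustrate their general shape, a custom SCC is an ordinary cluster-scoped resource along the following lines; the name and all settings here are placeholders, not the SAS-provided values:

```yaml
# Illustrative placeholder only: the general shape of a custom SCC.
# The real SCCs, their names, and their settings come from the SAS
# deployment assets and documentation.
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: sas-example-scc           # placeholder name
allowPrivilegedContainer: false
allowHostDirVolumePlugin: false
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: MustRunAs
fsGroup:
  type: RunAsAny
supplementalGroups:
  type: RunAsAny
volumes:
- configMap
- emptyDir
- persistentVolumeClaim
- secret
```

A cluster administrator creates such SCCs and then binds them to the service accounts of the corresponding SAS pods.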
Deployment Options – Red Hat OpenShift
There are various methods for installing Red Hat OpenShift Container Platform on VMware vSphere, including:
- Installer-provisioned infrastructure (IPI) installation, which allows the installation program to pre-configure and automate the provisioning of the required resources.
- The Assisted Installer, which provides a user-friendly installation solution offered directly from the Red Hat Hybrid Cloud Console.
- User-provisioned infrastructure (UPI) installation, which provides a manual type of installation with the most control over the installation and configuration process.
An IPI installation results in an OCP cluster that already contains the vSphere cloud provider configuration from the installation, which enables additional automation and dynamic provisioning after installation:
- Persistent storage using the VMware CSI or in-tree driver operator.
- Node management for node pools using MachineSets.
- Autoscaling of nodes using the MachineAutoscaler and ClusterAutoscaler resources.
Information about configuring these machine management and storage integration capabilities with OpenShift is supplied within Part 2 of this blog.
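As a small example of what this integration enables, with the vSphere CSI driver in place a StorageClass for dynamic provisioning can be defined roughly as follows; the StorageClass name and the storage policy are assumptions that depend on your vSphere setup:

```yaml
# Hypothetical StorageClass backed by the vSphere CSI driver; SAS Viya
# persistent volume claims can then be provisioned dynamically against it.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vsphere-csi-thin          # hypothetical name
provisioner: csi.vsphere.vmware.com
parameters:
  storagepolicyname: "vSAN Default Storage Policy"   # assumption: adjust to your environment
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
```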
An OpenShift installation on VMware vSphere using the UPI or Assisted Installer methods can also be set up post-installation with the vSphere cloud provider configuration, to provide similar automation and dynamic provisioning capability as an IPI installation. This is out of the scope of this blog.
Deployment Options – SAS Viya
There are several approaches for deploying SAS Viya on Red Hat OpenShift, which are described in the SAS Operations Guide:
- Manually, by running kubectl commands
- Using the SAS Deployment Operator
- Using the sas-orchestration command-line utility
1. Manual Deployment
After purchasing a SAS Viya license, customers receive a set of deployment templates (known as the deployment assets tarball) in YAML format, which they need to modify to create the final deployment manifest (usually called site.yaml). SAS uses the kustomize tool for modifying the templates. Common customizations include the definition of a mirror repository, configuring TLS, high availability, storage, and other site-specific settings. The final deployment manifest can then be submitted to Kubernetes using multiple kubectl commands.
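To give an idea of the structure, a heavily simplified kustomization.yaml for such a deployment might look like the sketch below. The exact resource, component, and transformer entries depend on your order and customizations and must be taken from the SAS deployment assets and their README files; the paths marked as assumptions are illustrative only:

```yaml
# Heavily simplified sketch of a kustomization.yaml for SAS Viya.
# The real entries come from the SAS deployment assets (sas-bases) and
# the README files shipped with them.
namespace: sas-viya                          # hypothetical target namespace
resources:
- sas-bases/base
- site-config                                # site-specific overlays, e.g. mirror registry, TLS
components:
- sas-bases/components/security/core/base/full-stack-tls   # path is an assumption
transformers:
- site-config/cas-limit-resources.yaml       # e.g. the CAS patch sketched earlier
configurations:
- sas-bases/overlays/required/kustomizeconfig.yaml          # path is an assumption
```

Running kustomize build against this file produces the site.yaml manifest, which is then applied with kubectl.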
NOTE: You must have cluster-admin privileges to manage deployments of SAS Viya.
Note that the final manifest contains objects which require elevated privileges for deployment, for example Custom Resource Definitions (CRDs), PodTemplates, ClusterRoleBindings, etc., which means that in most cases the SAS project team will need support from the OpenShift administration team to carry out the deployment. SAS has tagged all resources that need to be deployed according to the required permissions. This enables task sharing between the project team (with namespace-admin permissions) and the administration team (with cluster-admin permissions). However, keep in mind that this dependency will come up again later, for example when the deployment is updated.
2. SAS Deployment Operator
For that reason, using the SAS Deployment Operator might provide a better solution. SAS provides an operator for deploying and updating SAS Viya. The SAS Deployment Operator is not (yet) a certified operator, so it will not be found in the OperatorHub or in the Red Hat Marketplace.
The SAS Viya Deployment Operator provides an automated method for deploying and updating SAS Viya environments. It runs in the OpenShift cluster and watches for declarative representations of SAS Viya deployments in the form of Custom Resources (CRs) of the type SASDeployment. When a new SASDeployment CR is created or an existing CR is updated, the Deployment Operator performs an initial deployment or updates the existing deployment to match the state described in the CR. A single instance of the operator can manage all SAS Viya deployments in the cluster.
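To give a feel for this declarative model, a SASDeployment CR is a comparatively small document. The sketch below shows its general shape only; the field values are placeholders, and the exact schema should be taken from the SAS documentation for your cadence and version:

```yaml
# Placeholder sketch of a SASDeployment custom resource; consult the SAS
# documentation for the exact schema of your cadence and version.
apiVersion: orchestration.sas.com/v1alpha1
kind: SASDeployment
metadata:
  name: sas-viya
  namespace: sas-viya                # hypothetical target namespace
spec:
  cadenceName: lts                   # assumption: LTS cadence
  cadenceVersion: "2023.03"          # hypothetical version
  userContent:
    url: https://git.example.com/acme/viya-site-config.git   # hypothetical repo with the kustomize content
```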
NOTE: The Deployment Operator must have cluster-admin privileges to manage deployments of SAS Viya.
As part of a DevOps pipeline, the operator can largely automate deployments and deployment updates, reducing dependency on the OpenShift administration team. For example, the SAS Deployment Operator integrates nicely with OpenShift GitOps, a component of the Red Hat OpenShift Container Platform (OCP) that provides a turnkey automation solution for continuous integration (CI) and continuous delivery (CD) tasks. OpenShift GitOps can provide additional automation for a SAS Viya deployment by monitoring a Git repository for changes to the SAS CR manifest and automatically syncing its contents to the cluster. Pushing the CR manifest to the Git repository triggers a sync with OpenShift GitOps; the CR is deployed to Kubernetes, which in turn triggers the operator and starts the deployment. Figure 2 illustrates this workflow:
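As an illustration of this GitOps flow, an OpenShift GitOps (Argo CD) Application could watch the repository path that holds the SASDeployment CR; the repository URL, path, and namespaces below are hypothetical:

```yaml
# Hypothetical Argo CD Application that syncs the SASDeployment CR from Git;
# the SAS Deployment Operator then reacts to the synced CR.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: sas-viya-cr
  namespace: openshift-gitops
spec:
  project: default
  source:
    repoURL: https://git.example.com/acme/viya-deployment.git   # hypothetical repo
    targetRevision: main
    path: sasdeployment                                          # folder containing the CR manifest
  destination:
    server: https://kubernetes.default.svc
    namespace: sas-viya
  syncPolicy:
    automated: {}                                                # auto-sync on Git changes
```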
For additional information, see the SAS blog Deploying SAS Viya using Red Hat OpenShift GitOps.
3. sas-orchestration Utility
The sas-orchestration command-line utility offers the flexibility of both worlds: as a container image, it can be launched manually from a Linux shell to create and submit the final deployment manifest (in other words, it combines the kustomize and kubectl actions into one step), or it can be used as a step in a CI/CD pipeline, for example as a task in OpenShift Pipelines, Jenkins, GitHub Actions, etc.
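As a sketch of the pipeline usage, an OpenShift Pipelines (Tekton) task could wrap the sas-orchestration image. The image location and the deploy invocation below are placeholders; the actual arguments must be taken from the SAS documentation:

```yaml
# Placeholder sketch of a Tekton Task wrapping the sas-orchestration image.
# The image reference and the deploy invocation are illustrative only; see
# the SAS Operations Guide for the real arguments.
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: sas-viya-deploy
spec:
  params:
  - name: namespace
    type: string
    default: sas-viya
  steps:
  - name: orchestrate
    image: registry.example.com/sas/sas-orchestration:latest   # hypothetical mirror path
    script: |
      # Placeholder: the actual sas-orchestration deploy flags are documented by SAS.
      sas-orchestration deploy --namespace $(params.namespace)
```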
NOTE: You must have cluster-admin privileges to perform a SAS Viya deployment using the sas-orchestration utility.
For more information about the sas-orchestration utility, see the SAS blog New SAS Viya Deployment Methods.
SAS Viya Deployment From A Process Perspective
It has probably become clear by now that deploying SAS Viya is a "team sport" due to the size and complexity of the software stack. Typically, project teams on OpenShift are granted namespace-local, but not cluster-wide, permissions by the OCP admin team (admin vs. cluster-admin role). We’ll provide more details on the security requirements in Part 2 of this blog, but in short it means that the SAS project team will lack the necessary authorizations to carry out a deployment independently.
Based on our experiences with previous deployments at customer sites, we found the following process approach to be helpful. For the sake of this blog, we’re describing the process for a manual deployment in Figure 3:
The deployment process can be segmented into three phases (planning, preparing, and performing) with separate tasks for the two main actors, the SAS project team (e.g. local SAS administrators and system engineers from SAS Institute) and the OpenShift administration team.
- Planning: It’s usually a good idea to start with a joint workshop where the SAS team gives the OCP administrators a technical overview of SAS Viya. This is where topics such as security, sizing, storage, and networking need to be discussed.
- Preparation: The second task is typically in the hands of the OCP administrators: they need to review the requirements and prepare the project setup. The SAS team can then start preparing the deployment manifest (basically a YAML file), which contains site-specific information (such as the DNS name or the supplemental-groups value, which is specific to the project/namespace).
- Performing: Once the deployment manifests are ready, both teams need to collaborate to submit them. While the OpenShift administrators focus on resources that require elevated permissions (such as custom SCCs or CRDs), the SAS project team handles all resources with namespace scope. The main deployment manifest contains predefined selectors that make it possible to distinguish between these two groups of resources.
Conclusion
With that, we’d like to conclude the first part of our blog. We hope it has given you the basic know-how you’ll need to support your project team in deploying SAS Viya on OpenShift. Stay tuned for the second installment, where we will discuss security, machine management, and storage considerations.