Using Red Hat Advanced Cluster Security for Kubernetes to protect your containers
Now that we’ve covered security for containers, and specifically for a container orchestration platform such as Kubernetes, it’s time to see how Red Hat® Advanced Cluster Security for Kubernetes (RHACS) can support all of these elements in one solution.
RHACS protects containerized Kubernetes workloads in all major clouds and across hybrid platforms. It is included with Red Hat OpenShift® Platform Plus, which is a complete tool set to secure, protect, and manage your organization’s applications. Watch the video posted below for a deep dive into how RHACS can be applied to support Kubernetes container security.
What will you learn?
- How Red Hat Advanced Cluster Security for Kubernetes can be implemented through the OpenShift console
What you need before starting:
- A cluster running Red Hat OpenShift
RHACS in action
Chris Porter (00:09):
Hello viewers. This is Chris Porter here with another demonstration of the Red Hat Advanced Cluster Security product. ACS, as we call it, is a Kubernetes-native container security product designed to bring security into the world of shift left and DevOps, and to reinforce the goals that organizations have in adopting an agile development model with DevOps.
On the left-hand side, you'll see I'm logged in to the main dashboard here. The product covers six major security use cases. We cover vulnerability management, which is understanding what known vulnerabilities are present in container images. We'll dig into this in a little bit. It covers network segmentation, which is understanding the network relationships between your pods, external entities, and other pods in the environment, and building network policy rules to help isolate applications and data from each other. We're going to look at compliance, which is a suite of controls that represent industry and regulatory benchmarks for workloads and for the infrastructure.
Chris Porter (01:11):
You'll see some features here that integrate with Red Hat's Compliance Operator to provide CIS benchmarks for the OpenShift cluster that we're running in as well. We're going to look at configuration management, which is a basket of controls that define how an application interacts with Kubernetes. When we look at an application in a containerized environment, we tend to look at the container itself as being the end-all-be-all of an application, but there are a lot of specifications in the deployment that give us information: how it's interacting with Kubernetes itself, what service account privileges it has, how it uses the network, or how it might interact with storage or other services. These configurations have risk associated with them, and in many cases they're as important as, or more important than, the vulnerabilities that might be present in an image. We're also going to look at runtime attack detection and incident response.
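For context, the specifications being described are ordinary fields in the Kubernetes deployment manifest. Here is a minimal sketch of the kinds of attributes being evaluated beyond the image itself; all names and values are illustrative rather than taken from the demo:

```yaml
# Illustrative Deployment excerpt (hypothetical names): the kinds of
# specifications, beyond the container image, that describe how an
# application interacts with Kubernetes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api
spec:
  selector:
    matchLabels:
      app: payments-api
  template:
    metadata:
      labels:
        app: payments-api
    spec:
      serviceAccountName: payments-api       # which Kubernetes identity (and privileges) the pod runs with
      containers:
      - name: payments-api
        image: registry.example.com/payments-api:1.4.2
        ports:
        - containerPort: 8443                # how the workload listens on the network
        securityContext:
          privileged: false                  # privilege level relative to the host
        volumeMounts:
        - name: data
          mountPath: /var/lib/payments       # how it interacts with storage
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: payments-data
```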
Chris Porter (02:06):
As applications are running, whether it's their interaction with Kubernetes or activity related to, let's say, a potential breach through a vulnerability in a containerized application, we're going to monitor that runtime life cycle and provide remediation. Speaking of remediation, before we dive into this, I want to talk about the idea of shifting left and DevSecOps.
A lot of what we're going to surface here in the dashboard is meant to help us understand the risk, decide what is and isn't acceptable risk, and put a plan in place for remediation. But that doesn't mean we're going to address the remediation directly in the running environment; to keep with a DevOps approach, we want to go back to the source code. We want to fix the configuration, or the image, that brought the problem in the first place so that we get a permanent solution, and then we want to rebuild and redeploy.
Chris Porter (02:59):
In a DevOps world, we have an automated pipeline that is kicked off by a manual process, or it could be automated based on a merged pull request, but something has changed that triggers a rebuild and a redeploy, and that pipeline is an ideal place for us to inject security controls. Here, for example, the most common use of this would be to understand what vulnerabilities are present in the images and in the dependencies, files, and applications that I have there. Developers are going to be responsible for what they bring to the environment. We're going to help them understand here that they have serious and fixable vulnerabilities, meaning that upstream, a vendor like Red Hat has provided a fix for these vulnerabilities and they need to move to a version of the component that carries the fix for these noted vulnerabilities. Essentially, we're setting a guardrail.
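As a rough sketch of what that injection point can look like, assuming a GitLab-style CI file and the roxctl CLI (the job image, endpoint, token, and image variables are placeholders, and flags may vary by version):

```yaml
# Illustrative CI job: fail the pipeline when the newly built image violates
# build-time policies such as "fixable critical vulnerabilities".
check-image:
  stage: security
  image: registry.example.com/tools/roxctl:latest   # hypothetical job image that bundles the roxctl CLI
  script:
    # ROX_API_TOKEN and ROX_CENTRAL_ENDPOINT are assumed to be injected as CI secrets.
    - roxctl image check --endpoint "$ROX_CENTRAL_ENDPOINT" --image "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"
```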
Chris Porter (03:50):
The security team here is deciding that we don't accept you advancing in the pipeline if you're bringing in a serious vulnerability that has a known fix. We're not asking anyone to go out and fix BusyBox or Alpine. We are asking that teams take advantage of fixes that are out there. We're also looking at, and this is a preview of the policy engine, other policy output here that helps teams understand how they can improve the security of their applications. And the goal is to change the application source code, change that configuration in a Kubernetes-native way, with controls that are available in any Kubernetes cluster, to permanently improve the risk posture of your applications. We look at things like how service accounts are being used. This is unfortunately one of those defaults in a Kubernetes deployment that's not so good for security.
Chris Porter (04:42):
We don't want applications to have access to the Kubernetes API at all if they don't need it. Certainly something like a privileged container, which has access to the host, should be rare or non-existent in your environment. Some of these policies are going to be driven by compliance or industry benchmarks like the Docker CIS benchmark. This policy covers a pretty standard security principle: running as the least-privileged user account that we can.
And again, unfortunately, this is an example of a default that is useful for productivity, because it helps people get started easily, but it's not the best way to run in production. You're going to see this theme throughout the demo again and again: as we surface a problem, as the security team investigates an incident, it points to a solution that lets you harden your applications at the source code level, make those changes, rebuild, and redeploy. So let's use the dashboard here to surface some problems in our environment, understand the risk level of those problems, and be able to craft a policy, or use a built-in policy, to remediate them.
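For reference, the source-level hardening the demo keeps pointing back to is a few lines of standard Kubernetes configuration in the workload manifest; a minimal sketch, with illustrative values:

```yaml
# Illustrative container-level securityContext: run as a non-root,
# least-privileged user instead of relying on the image's defaults.
securityContext:
  runAsNonRoot: true
  runAsUser: 1001                  # hypothetical non-root UID
  allowPrivilegeEscalation: false
  privileged: false
  readOnlyRootFilesystem: true
  capabilities:
    drop:
    - ALL
```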
Chris Porter (05:46):
I started with vulnerability management earlier, and this is a great place to get started. It is a foundational control; every organization needs to have vulnerability management. This includes an awareness of where images are coming from, what the developers are building into them, and what dependencies they have. We could dig down here to the level of the source code that was used to produce this and the files that were copied in, but the focus here is on what we call fixable serious vulnerabilities.
These are vulnerabilities that could result in a remote code execution or a privilege escalation, and they have a fix published, right? We're not asking anyone to go out and fix LZ4 or curl or the Linux kernel. We are asking that they take advantage of fixes that are out there. And so a vulnerability like this is an easy gate to set, right? If there's a fix available and you're using this component in your environment, then you need to adopt those fixes.
Chris Porter (06:44):
In many ways, just establishing a good DevOps process where we rebuild and redeploy regularly is going to prevent this kind of thing from getting promoted into the environment in the first place. That's why I like to say that the best thing you can do starting out in Kubernetes-native security is to make sure that you've got that established automation process. Now, this vulnerability here impacts more than just this one image that I'm looking at.
In fact, it's present in four different deployments, and this is where the larger picture of risk comes in: I have four different deployments impacted by the same vulnerability, and let's say it's a remote code execution vulnerability. Well, within the four applications, there's a different risk assigned to each automatically. This is a customizable risk model that ACS uses to determine how likely it is that an exploit is going to be available, and how much power an attacker will have once they land that exploit.
Chris Porter (07:44):
So, in here, number one is my visa processor, and the reason that this is more likely to be exploited is that it's exposed on the network. It has other vulnerabilities, right? In fact, if we look at the risk page, and I move to a new page in the UI, you'll see the overall risk ranking for all of my applications. And the number one is a combination of bad configurations, bad vulnerabilities, bad privilege levels, and activity in the containers after they started, which tells us that there's potentially an attacker in our midst. Now, these policy violations, which we're going to dig into, are customizable both in what they detect, the criteria they use, and their severity, but you can see that this one here is a bad combination of things. I have some very serious vulnerabilities called out. I have very high levels of privilege.
Chris Porter (08:35):
Down a little bit lower you can see what we call the service reachability. Really simply: applications that are listening on the network, particularly those that are exposed outside of the cluster, have a higher likelihood of being attacked. This is a front-end service. It has networks that are wide open. It's more likely to be attacked. And in the prioritization here, this is the first application that should be addressed. As you're dealing with vulnerability problems or other configuration issues, it's not always feasible to solve every problem all at once, and so we want to inform the prioritization that may be going on in your environment.
Now, there's a lot that we can do to improve the security of this particular application. As we go down the list, you'll see that the list of policy violations is a little bit shorter. As we go down further, you'll see that we are seeing some activity and some suggestions here, but there are fewer of those critical vulnerabilities and fewer of those critical configuration issues.
Chris Porter (09:32):
We're going to look at what we want to do about it, but there's also opportunity here. We see something like a privileged container running, and we see things like a package manager being executed. Down here at the bottom, we're seeing a service account token being used with cluster-admin-level privileges, right? These are all bad things, right? They allow an attacker to make use of the environment to good effect, to impact other containers and other pods in the environment, or the cluster itself.
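The long-term fix for that last finding is, again, a small change in the workload's source manifest. A sketch, assuming the application never needs to call the Kubernetes API (names are illustrative):

```yaml
# Illustrative pod spec excerpt: use a dedicated, unprivileged service account
# and stop mounting its API token into the container entirely.
spec:
  serviceAccountName: payments-api        # a dedicated account, not 'default' and not bound to cluster-admin
  automountServiceAccountToken: false     # the application never talks to the Kubernetes API
```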
But by taking the recommendations that we have here, we can actually use those same kinds of controls and the same kind of approach to harden our applications by reducing the surface area of attack. By reducing the privilege levels, we can make an attack on an application like this much less effective. Speaking of which, here in the process discovery tab, we are looking at what actually occurred after this container started.
Chris Porter (10:29):
So much of what we discuss here is configuration. It's static. But this is the dynamic part: the ACS product is monitoring every pod running in my environment and every container within those pods, and I can look at the activity here in a list. I can see that a shell command was spawned.
This is a classic pattern for the exploit of a Java application. We see the Java runtime spawning a shell command here, in this case to install software. We're seeing the follow-on from that, so we see this chain of attack. And if I look at this over time, you can see another pattern: the behavior changed at some point. So we're looking at a variety of conditions. We're looking at the fact that we don't expect to see things like a shell or a package manager running at all in a container context, and here we're also showing that we don't expect a change in behavior.
Chris Porter (11:20):
We're leveraging the power of this constrained runtime environment with containers to understand that anything that is out of the norm is potentially a bad thing, right? Containers do not live interesting lives. They're not virtual machines, they're not general purpose. They generally follow this pattern of a flurry of initialization activity when they're first instantiated and then they settle down and get kind of boring. And we can leverage that here to understand that anything that's not boring is potentially suspect, and we're identifying that automatically with what we call baselines.
In my case here, we've got the startup from a Java application, and we see the Java arguments here. That baseline can be modified. I can also lock it, so that we tell the system, effectively, that yes, we agree that this is the baseline and this is what should be running. Now, one of the rules here is about a package manager, right? And I'm picking on package managers in this demo because this is legitimate software, right? Even if it doesn't have any vulnerabilities in it right now, this is legitimate software that can be used for malicious purposes, and there are a lot of things like that.
Chris Porter (12:28):
Shells, compilers, and other developer or debug tools, even tools like curl and Wget that are really useful troubleshooting tools, also have value for attackers, and so there's a whole category of policies designed to help teams understand that these are useful to an attacker, and that removing them will create less surface area for attack. All of the risk indicators here are driven by the policy engine, whether it's runtime or configuration, and that policy engine is a single policy engine.
We're looking here at one set of rules that covers different life cycle stages. I can combine rules across that life cycle, so, for example, I can look at vulnerability data as well as deployment and namespace information and labels. I can look at privilege levels, so I can go in and create a rule that says I don't want anybody mounting sensitive host directories in specific pods versus others, or that I want to be able to identify vulnerabilities in privileged containers.
Chris Porter (13:31):
Those attributes actually belong to different life cycle stages. They're different parts of the environment, but with a single policy engine here, we can compose a single set of policies that goes across all of those different silos. Now, I mentioned package managers, right? One of my favorite punching bags in security. This one is specifically around execution at runtime. There's a similar policy for the package manager being present in the image, whether it's Alpine or Red Hat or Ubuntu. These are useful utilities that can be used in malicious ways.
And so again, we want to identify that these are in the image and available to an attacker, which allows us to help the developers understand that they can pretty simply remove those things and, in a small way, improve the application's security. We also, of course, look at this at runtime: somebody actually executed this. And that means that someone is either really confused about containers, installing or maintaining software, maybe a developer or ops person who's new to this model and doesn't understand that these changes are going to be lost, or it could be malicious. And we're using that runtime information to indicate that there's a long-term solution to this, which is again the same:
Chris Porter (14:48):
the idea of reducing the surface area in the image build. And this is where the maturity of that DevOps life cycle comes into play: if we can quickly make a change, rebuild, and redeploy, we can very quickly respond to this and permanently resolve this particular security limitation. Now, my policy here at runtime is aligned to the MITRE ATT&CK framework. If you're using the MITRE ATT&CK framework to inform your security efforts, we've mapped prebuilt policies to the corresponding attack techniques, to show that you're either detecting or preventing the technique that's outlined in that particular section of the framework.
If we edit this one, we're also able to do things like adjust the scope. There's a lot more to this policy engine, but what's important about it is the rationale and the guidance, right? We want to be able to tell someone specifically, ‘Hey, there's something you can do about this.’ And so all the policies are written that way, where they provide guidance to the development teams, or to whoever is operating and building this application, that there's something they can do to resolve it.
Chris Porter (15:54):
When we talk about enforcing this as well, it's not just about feel-good, helpful hints. The security team also needs to be able to put enforcement rules in place for build-time and deploy-time attributes, like privilege levels, opening up a privileged network port, or the contents of images. We want to catch this early. We want to enforce early. And so, going back to my DevOps environment and my pipeline runs, I can see that I've actually caused some of those pipeline runs to fail.
Once the security team decides that the risk reaches an unacceptable level, we can cause that build to fail. And the development team that's responsible for this is going to see this message and understand they have to do something in order to get past this quality gate, or this security gate. Now, that's not enough, unfortunately. Application developers could bypass the build pipeline.
Chris Porter (16:48):
Maybe somebody has gained access illicitly to the clusters and they're now going to deploy their own workloads. We see this with some of the crypto miner attacks. We want to protect the clusters directly, and that's where the deploy-time enforcement comes in. We don't invoke our controls here at the container engine level or at the process level; we do this with the Kubernetes API. The Kubernetes-native approach means that we're using things like the admission controller or, in the case of runtime enforcement, the pod ‘delete’ action, killing the pod, to enforce these actions. These are well-understood mechanisms.
It doesn't create a situation where your security tool is now in conflict with the orchestrator because with ACS, they're absolutely aligned. The deploy time enforcement provides the same message to the deployer as we saw in the build pipeline: you're going to be rejected for this reason.
Chris Porter (17:40):
Now, these policies, of course, are totally customizable. I can create a new policy anytime, and I can use the built-in criteria. One of the most convenient ways to use the policy engine is actually through search filters. So if I were to search for, let's say, a given CVE here, and I want to go look for the newish Spring Boot vulnerability, I can go out and find it. I can also look for other information like exposure levels, and once I'm happy with my search filter, I can create a policy from it. This allows a natural flow from investigating an incident and building scenarios to creating a policy that would identify that going forward.
I want to jump here to another aspect of threat detection and policy building, which is the network graph. If you're familiar with networking tools, you may have seen diagrams like this that attempt to track all the networking activity.
Chris Porter (18:41):
The OpenShift infrastructure that we're running on gives us a convenient place to probe this information. The goal here, of course, is to understand which applications are communicating with each other and who's accessing this database, but more importantly, to use that information to restrict access. Again, one of those unfortunate defaults is that in a Kubernetes cluster, I have rules that are available but not in use by default.
In other words, any pod that's running a service can be accessed by any other, and if I switch to my allowed view here in the network graph, you'll see exactly that. My front end is wide open, my operations environment is wide open, and these dotted lines and the red mean bad. And so we have the potential for an attacker who gets into a front-end environment, because somebody's running an old WordPress instance, to then move laterally across the environment and get to other pods.
Chris Porter (19:36):
We're talking essentially about firewalling, or segmentation, but instead of VLANs and IP address ranges, we're talking about pod and deployment names, because the notion of an IP address or a host name in this context just doesn't make any sense. Now, what we want to do about it is important, and this is where ACS really differs from other solutions out there, in that we've always taken a Kubernetes-native approach, right?
We could build a firewall. We could build something that would act at the pod level or at the Linux kernel level to investigate, apply rules, and enforce those rules at the network layer. But there's a better option, and what we've chosen to do is to support the Kubernetes-native model by using network policies. This illustrates, I think, most clearly what we mean by Kubernetes-native.
Chris Porter (20:27):
We're surfacing issues here with the runtime to understand where a threat might happen, to understand which of my applications are at risk of being exploited and how lateral movement could impact that, and then we're suggesting rules that would go into OpenShift to restrict access to those pods. This is really important in that we're not providing a firewall here.
ACS does not have a separate networking component that creates risk in your environment, where something like that could fail and cause either a fail-open situation, where networking is allowed, or a fail-closed situation, where your applications are now offline. We're using the network policy capabilities that are already built into your environment to enforce those rules. We really want developers to consider this part of the code. It is a configuration that impacts their application, and by building it into the code, we make for much more secure applications overall.
Chris Porter (21:27):
I can copy and paste these into the OpenShift interface, but really, a better way to do this is to make sure, again, that they're part of the application source code. This kind of specification impacts the performance of the applications and how they function in different environments just as much as the Docker image, the deployment data, or the behavior at runtime. We think it's just part and parcel of how these applications are going to be defined.
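For reference, the rules being generated here are standard Kubernetes NetworkPolicy objects. A minimal sketch of the kind of restriction described, allowing only one front-end deployment to reach a database on its service port; all names and the port are illustrative:

```yaml
# Illustrative NetworkPolicy: only pods labeled app=payments-api in the same
# namespace may connect to the database pods on port 5432; all other ingress
# to those pods is denied.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-db-ingress
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payments-db
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: payments-api
    ports:
    - protocol: TCP
      port: 5432
```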
The last thing I'm going to jump into is compliance. There's a lot more to see here, obviously, in the product, but we're going to keep this short and talk about this last major functional area. And I've saved it for the end because, in many ways, you've already seen all of these features. Okay? Compliance is really rolling up a lot of the security policy we have. It puts in a concrete way the principles of container-native security: build pipelines being used to examine and enforce rules around vulnerability scanning, around embedded cleartext secrets, and around the configuration of the environment here. And so the controls here have a corresponding policy in the policy engine, so the ACS product is effectively measuring:
Chris Porter (22:42):
Are you using the cleartext secrets policy? Are you using the vulnerability management policy? The policy is what connects the dots between a high-level control, like NIST 411 here, and the individual fixes communicated to the individual teams that they need in order to be in compliance. We want to avoid the problem where we tell somebody, ‘Hey, you have to be PCI compliant!’ and leave them to interpret what that means. What we've done here is take a pretty aggressive stance in interpreting the required technical controls into guidance that's available as a Kubernetes-native control. I mentioned that many of these controls are things you've seen already. When it comes to PCI and the isolation of payment card data, we talk about Kubernetes network policies instead of internet connections, DMZs, and firewalls. We're looking at configuration standards here. You'll see that PCI compliance here is not covered entirely.
Chris Porter (23:42):
There are gaps because many of the controls are either not technical or not applicable to this environment, and so this is a part of a larger PCI compliance effort. You can see vulnerability scanning here. Again, we're looking at the control provided by the PCI standard and providing you with guidance. Another way to look at this is by namespace. When we're looking at the compliance standards that apply to namespaces, like vulnerability rules and the use of secrets and privileges, it's, I think, more convenient to look at this down at the namespace level. I can see where the standards are applicable and where they're not. I can understand within my organization which teams need help, right? Which teams are achieving the goals, and which teams, frankly, have work to do in order to get to that level. One of the other things we're seeing in compliance is configuration benchmarks.
Chris Porter (24:38):
In this case, something like the CIS benchmarks for OpenShift 4. This is a recommendation produced by CIS and by Red Hat, and you can see I'm pretty far along, pretty compliant, with this particular configuration standard. These results are read from the OpenSCAP rules that are part of the Compliance Operator; this just provides a convenient place for security teams to review them. I often like to say, though, that you're only going to be as compliant as you can prove. The graphs and the charts in this UI are very useful for determining where the work is to be done, but we're also going to have to supply this as evidence.
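For context, those CIS results come from OpenSCAP scans run by the Compliance Operator. A sketch of the kind of binding that enables those profiles, assuming the operator's documented resource and profile names, which may differ by version:

```yaml
# Illustrative ScanSettingBinding: ask the Compliance Operator to run the
# OpenShift CIS profiles with its default scan settings.
apiVersion: compliance.openshift.io/v1alpha1
kind: ScanSettingBinding
metadata:
  name: cis-scans
  namespace: openshift-compliance
profiles:
- apiGroup: compliance.openshift.io/v1alpha1
  kind: Profile
  name: ocp4-cis
- apiGroup: compliance.openshift.io/v1alpha1
  kind: Profile
  name: ocp4-cis-node
settingsRef:
  apiGroup: compliance.openshift.io/v1alpha1
  kind: ScanSetting
  name: default
```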
And if you are seeking regulatory compliance, internally or externally with an auditor, having that evidence as a spreadsheet is going to be what you need in order to show that you have the controls in place for every single workload out there, and that evidence file is something that's easily exported. So that's it for the functional areas. There's a lot more to look at. We haven't looked at any of the administration, but the product supports single sign-on, role-based access control, integration with Docker registries and container repositories, and notifiers for integration with external bug tracking, ChatOps, and SIEM tools. There's built-in support for administration through API tokens. It's very easy to deploy and roll out, but that's it for the demo today. I look forward to seeing you on the next one. Thanks.