Limiting pod creation based on their security attributes
A key function of Kubernetes is that it lets users run their custom workloads on a set of servers, while the platform maintains these workloads and reports their current state back to the user.
Allowing anyone to run anything on a Kubernetes cluster would sooner or later result in the cluster nodes getting compromised. Without control over what workloads may access, anyone with access to the cluster can run something that gives direct access to the cluster nodes, whether intentionally or unintentionally.
Pod Security Policies
For several years, cluster administrators had the option to turn on the PodSecurityPolicy admission controller. This admission controller worked by checking a set of cluster objects, so-called Pod Security Policies, which could be configured to validate the securityContext field of Pod objects. It then decided whether a pod could be created based on the Pod Security Policies that the ServiceAccount running the pod was allowed to use.
If a PodSecurityPolicy object's configuration matched the securityContext field of a pod that was about to be created, the PodSecurityPolicy admission controller admitted the pod. In addition, the admission controller could modify some unset subfields of the pod’s securityContext to meet the expectations of the given PodSecurityPolicy.
However, based on the experiences of both the developers and users of the PodSecurityPolicy admission, the Kubernetes sig-auth group decided to discontinue the controller, as it turned out to be rather counterintuitive to maintain and its behavior was sometimes confusing. The mutating behavior was especially problematic: new PodSecurityPolicies might appear in a cluster at any time and would then mutate pods differently on new Deployment rollouts.
To learn more about the reasons to move away from the PodSecurityPolicy admission, read the Motivation section of the Pod Security Admission enhancement.
The PodSecurityPolicy admission reaches its end of life in Kubernetes 1.25; its API has been deprecated since Kubernetes 1.21.
Pod Security Admission
With lessons learned from the PodSecurityPolicy admission, the Kubernetes sig-auth group came up with a new admission system, Pod Security Admission, as designed in the official enhancement.
The most notable changes compared to the previous system are:
- The admission only validates; it no longer mutates pods to match the given restriction requirements.
- There are only three predefined security levels: “restricted”, “baseline” and “privileged”. These levels are defined by the Pod Security Standards, and each can be pinned to the version of the standards from a specific Kubernetes release.
- It is now possible to keep up with evolving security standards within the admission without fear that current workloads break, as the API allows versioned checks per namespace.
- The admission works on a per-namespace basis, rather than through RBAC privileges on PodSecurityPolicies granted to groups or service accounts.
Based on the above, the user is now fully responsible for configuring their pods’ securityContext so that it matches a given pod security standards profile. This lets users make conscious choices about security-relevant settings of workload manifests.
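As an illustration of what this means for a workload author, the following is a minimal sketch of a Pod whose securityContext is intended to satisfy the “restricted” profile. The pod name, container name and image are hypothetical placeholders; the fields shown reflect the restricted-level requirements of the upstream Pod Security Standards as commonly documented, so check them against the version your cluster enforces.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: restricted-example            # hypothetical name
spec:
  securityContext:
    runAsNonRoot: true                # containers must not run as UID 0
    seccompProfile:
      type: RuntimeDefault            # use the container runtime's default seccomp profile
  containers:
  - name: app                         # hypothetical container
    image: registry.example.com/app:latest   # placeholder image
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]                 # no capabilities unless explicitly re-added
```

Because the admission no longer mutates pods, every one of these fields has to be present in the manifest itself (or be set by another admission mechanism) for the pod to pass a “restricted” check.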
This new system can also be turned on by default because its admission rules are predefined. With the PodSecurityPolicy admission, by contrast, users had to define their own policies before they could even create pods.
A cluster administrator can start by turning on only client-facing warnings, then audit event logging, and finally cluster-wide enforcement of a specific pod security level. This allows an iterative approach to introducing Pod Security Admission to an existing environment.
If unconfigured, Kubernetes does not enforce a specific pod security level by default and treats all workloads as privileged. However, all end-to-end tests within the Kubernetes and OpenShift source code base are configured to enforce the restricted pod security level by default.
Pod security in OpenShift
Security Context Constraints and Pod Security Admission
In OpenShift, there is an OpenShift-specific dedicated pod admission system called Security Context Constraints. This system resembles the now deprecated PodSecurityPolicy admission, even though there have been many changes throughout the years of its existence. Our aim is to keep the Security Context Constraints pod admission system while also allowing users to have access to the Kubernetes Pod Security Admission. The following text describes what we did in order to make it possible in 4.11, and what we plan to do next in 4.12.
Pod Security Admission, OpenShift
With OpenShift 4.11, we are turning on the Pod Security Admission with global “privileged” enforcement. Additionally we set the “restricted” profile for warnings and audit.
This configuration lets users opt their namespaces in to Pod Security Admission enforcement using per-namespace labels.
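A namespace opt-in of this kind could be sketched as follows. The namespace name and the pinned version are hypothetical; the label keys are the upstream Pod Security Admission namespace labels.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-team-apps                  # hypothetical namespace
  labels:
    # Reject pods that do not meet the "restricted" profile in this namespace.
    pod-security.kubernetes.io/enforce: restricted
    # Optionally pin the checks to the standards of a given Kubernetes release
    # (assumed value; use "latest" or the release that suits your cluster).
    pod-security.kubernetes.io/enforce-version: v1.24
```

Pinning the version keeps the checks stable across cluster upgrades; leaving it unset falls back to the cluster-wide default.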
The current global configuration we ship OpenShift 4.11 with:
# The build controller creates pods that are likely to be privileged
# based on BuildConfig objects. Access to these build pods is however
# still limited by the SCC exec admission and so we can safely add the
# build-controller SA here.
# This configuration should never be exposed to cluster users as no
# such guarantees are made for any other OpenShift SA/user.
See the Kubernetes documentation on Pod Security Admission to understand the above snippet better.
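For orientation, a PodSecurityConfiguration consistent with the defaults described in this section (global “privileged” enforcement, “restricted” for warnings and audit, and an exemption for the build controller) might look roughly like the sketch below. This is not the shipped file: the apiVersion, the “latest” version pins and the exact service account username are assumptions made for illustration.

```yaml
# Sketch only; values marked "assumed" are not taken from the shipped configuration.
apiVersion: pod-security.admission.config.k8s.io/v1beta1   # assumed API version
kind: PodSecurityConfiguration
defaults:
  enforce: "privileged"
  enforce-version: "latest"           # assumed
  audit: "restricted"
  audit-version: "latest"             # assumed
  warn: "restricted"
  warn-version: "latest"              # assumed
exemptions:
  usernames:
  # The build-controller service account discussed in the comments above.
  # The namespace in this username is an assumption.
  - system:serviceaccount:openshift-infra:build-controller
```

The defaults apply to every namespace that does not carry its own Pod Security Admission labels, which is what makes the per-namespace opt-in possible.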
Aside from this global configuration, we introduced a new mechanism that automatically synchronizes the Pod Security Admission “warn” and “audit” labels.
Automatically synchronizing Pod Security Admission Labels
Aside from the global Pod Security Admission configuration, the OpenShift team also developed a controller that attempts to synchronize the pod security profiles in the “warn” and “audit” labels for every namespace.
The controller introspects the service account permissions to “use” Security Context Constraints in each namespace and maps each Security Context Constraint to a pod security profile based on its field values. It then sets the namespace’s Pod Security Admission “warn” and “audit” labels to the most privileged pod security profile it found, so that pods do not trigger warnings or audit logging as they get created in the namespace.
The one case this controller cannot handle is direct pod creation (without an intermediate pod controller such as a Deployment), because such pods might use the SCC privileges of the user creating them rather than those of a service account. Directly created pods are therefore the most likely trigger of such an alert. However, running pods directly is not advised, so this should not be an issue.
The significance of this controller becomes clearer in light of further plans for OpenShift 4.12, where we intend to switch the auto-labeling from the logging “warn” and “audit” labels to synchronizing the “enforce” labels.
It is possible to turn off label synchronization for a namespace completely, which makes the owners of the namespace fully responsible for setting the Pod Security Admission labels and thus for controlling the admission in the namespace. Turning off label synchronization per namespace is possible by setting the security.openshift.io/scc.podSecurityLabelSync namespace label to false.
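As a sketch, a namespace that opts out of synchronization and manages its own labels might look like this. The namespace name and the chosen “baseline” level are hypothetical, and the value "false" is assumed as the counterpart of the "true" value used to opt a namespace in.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: self-managed-apps             # hypothetical namespace
  labels:
    # Opt out of automatic label synchronization (assumed value "false").
    security.openshift.io/scc.podSecurityLabelSync: "false"
    # The namespace owners now choose the levels themselves.
    pod-security.kubernetes.io/warn: baseline
    pod-security.kubernetes.io/audit: baseline
```

Label values are strings, so the boolean-looking value is quoted.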
Note that “openshift-” prefixed namespaces do not get synchronized by default as these are considered system namespaces. You should not create “openshift-” prefixed namespaces. However, if you did and need automatic Pod Security label synchronization for such a namespace, you can set the security.openshift.io/scc.podSecurityLabelSync label to true there. Note that the namespaces that come within the default installation payload are managed by the respective OpenShift teams and will not get labels synchronized; instead, the developers managing these namespace components decide on the proper labels for them.
Changes to Security Context Constraints admission, new SCCs
To meet the new Kubernetes requirements regarding pod security standards, we created a set of new Security Context Constraints objects (SCCs) that restrict pods further than the previous policies did. Note that all the SCCs you knew from previous OpenShift versions are still available, although there are changes to who can “use” the “restricted” SCC; more on that later in this section.
Specifically, the new SCCs are:
- restricted-v2
- hostnetwork-v2
- nonroot-v2
These new SCCs enhance their previous “non-v2” versions by:
- dropping ALL capabilities from containers (they still allow explicitly adding the NET_BIND_SERVICE capability)
- defaulting seccompProfile to “runtime/default”
- requiring allowPrivilegeEscalation to be unset or set to false in security contexts
For new 4.11 clusters, the “restricted-v2” replaces the “restricted” SCC as an SCC that is available to be used by any authenticated user. In clusters upgraded from previous versions, any authenticated user can still use both the “restricted” and “restricted-v2” SCC.
We also modified the SCC admission to explicitly set the container runAsNonRoot field to true in case the container is supposed to run as a non-zero user.
All these changes were made so that the SCC admission can match the “restricted” pod security profile when restricting pod creation.
Plans for next releases
In a subsequent release, we intend to move the global configuration to enforce the “restricted” pod security profile globally. With this change, the label synchronization mechanism will also switch into a mode where it synchronizes the “enforce” Pod Security Admission label rather than the “audit” and “warn”.
To ensure an upgrade is safe, we are also introducing an information-level alert on pod security violations in 4.11.
Alerting on Pod Security Admission violations
The “PodSecurityViolation” alert is triggered when the Kubernetes API server reports a pod denial at the audit level of the Pod Security Admission.
Once the alert fires, it is advisable to check what triggered it; such a workload is likely to fail admission in the release where global enforcement is set to the “restricted” pod security level.
The label synchronization mechanism of OpenShift should prevent these events from occurring.
In OpenShift, we strive to keep your clusters up to date with the current security standards. We believe that Pod Security Admission is a great next logical step forward in this direction and we hope that our users will be just as excited about our recent changes to bring this new upstream feature to them.