How do SREs manage ROSA?
Ryan Niksch (AWS) and Shaozhen Ding (Red Hat) discuss how Red Hat SRE teams use the OpenShift API to manage public Red Hat OpenShift Service on AWS (ROSA) clusters and PrivateLink to manage private clusters.
To learn more about applying Red Hat applications for your business, please visit our Learning Hub.
Ryan Niksch (00:00):
Greetings. My name is Ryan Niksch. I am a Principal Solutions Architect with Amazon Web Services. Joining me here today is Shaozhen from Red Hat. Shaozhen, say hi.
Shaozhen (00:09):
Hi. I'm a Red Hat Managed Service Black Belt.
Ryan Niksch (00:14):
Shaozhen, working with customers on OpenShift, I'm seeing a lot of customers move towards managed OpenShift. So there's a lot of adoption around the Red Hat OpenShift Service on AWS, ROSA, which is jointly supported, and managed by Red Hat. So there's real benefit there. I want to spend a little bit of time looking at two of the implementations. What does it look like, specifically, for Red Hat SREs to manage that ROSA environment when customers deploy ROSA into a public-facing architecture, where the entire OpenShift cluster is public facing? And then what does that look like when we shift to PrivateLink, where everything is private? How do the SREs come in from that perspective?
Ryan Niksch (01:06):
Shall we spend a moment or two and look at the public-facing implementation first: what it looks like, what its building blocks are, and then how the SREs come into that environment?
Public implementation
Shaozhen (01:17):
Yeah, sure. So if you look here, we've basically drawn the ROSA cluster in a private subnet. Inside this VPC you have a private subnet and a public subnet, and on the public subnet you have an Elastic Load Balancer. And this subnet is attached to an IGW, an internet gateway, and a NAT gateway.
Ryan Niksch (01:40):
So this is a very traditional-
Shaozhen (01:43):
Yeah.
Ryan Niksch (01:43):
-AWS VPC. You've got an internet gateway in that public subnet, and the NAT gateway is there to cater for communication from the public to the private and vice versa.
Shaozhen (01:53):
Yep.
Ryan Niksch (01:53):
This is an internet-facing AWS load balancer?
Shaozhen (01:57):
Yes. That's an internet-facing Elastic Load Balancer.
Ryan Niksch (02:00):
The OpenShift cluster itself, the control plane, the infrastructure nodes, the worker nodes, they're all here inside the private subnet. There's nothing in those public subnets.
Shaozhen (02:11):
Exactly.
Ryan Niksch (02:12):
Except for the load balancer itself.
Shaozhen (02:13):
Yeah. Yeah. The load balancer basically sits in front of the ROSA API, right, the cluster API. And that's the ingress point where the SREs actually get access to the ROSA cluster.
Ryan Niksch (02:28):
And in this case it's hitting the cluster. This is the same OpenShift API that I would talk to if I were using CLI tools like oc commands or the rosa CLI itself.
Shaozhen (02:41):
Yes.
Ryan Niksch (02:42):
Where are the SREs? They're sitting inside Red Hat inside a separate AWS account, is that correct?
Shaozhen (02:51):
Yeah, SREs basically have their own AWS account, and it's a separate AWS account.
Ryan Niksch (02:58):
So they're coming, in this case, they're coming in over the internet.
Shaozhen (03:04):
Yeah.
Ryan Niksch (03:04):
Is there anything, from a security standpoint, that filters traffic so that only SREs can come in? Something like an allowlist?
Shaozhen (03:13):
Yeah, there's an IP allowlist. It only allows the SRE account from Red Hat to manage this ROSA cluster.
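The IP allowlist described here can be illustrated with a minimal Python sketch. It is not Red Hat's actual mechanism or address list: the CIDR ranges below are reserved documentation ranges used as placeholders, and the names `SRE_ALLOWLIST` and `is_allowed` are hypothetical.

```python
import ipaddress

# Hypothetical allowlist. These are reserved documentation ranges,
# not real Red Hat SRE source addresses.
SRE_ALLOWLIST = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.16/28"),
]


def is_allowed(source_ip, allowlist=SRE_ALLOWLIST):
    """Return True if source_ip falls inside any allowlisted CIDR range."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in network for network in allowlist)


if __name__ == "__main__":
    for ip in ("203.0.113.7", "198.51.100.40"):
        print(ip, "allowed" if is_allowed(ip) else "blocked")
```

Any request whose source address falls outside the listed ranges is rejected before it reaches the cluster API, which is the filtering behaviour the speakers describe.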
Ryan Niksch (03:28):
And they're coming through into this load balancer, which in turn gets to the OpenShift control plane. This API, that's on the master nodes?
Shaozhen (03:38):
Yes.
Ryan Niksch (03:38):
On the OpenShift control plane.
Shaozhen (03:39):
Yeah. The load balancer forwards the traffic to the master nodes of the ROSA cluster.
Private implementation
Ryan Niksch (03:42):
Now with ROSA public implementations, the API endpoint and the OpenShift console are accessible to the outside world. Most customers I work with want a much more private implementation, and they gravitate towards ROSA PrivateLink. But this makes everything private. It's literally exposed only to the VPC that it's deployed in. So if I have a customer who is on-prem, they're probably going to have something like AWS Direct Connect, or they're going to have a transit gateway, to carry the communication from their internal organization to this cluster. That doesn't help the SRE team.
Shaozhen (04:33):
Yeah.
Ryan Niksch (04:34):
How does the SRE team get into this if there is nothing connecting them to that public space?
Shaozhen (04:40):
Yes. So there's a feature called Interface VPC Endpoint.
Ryan Niksch (04:55):
This is essentially a PrivateLink endpoint.
Shaozhen (04:59):
Yeah.
Ryan Niksch (04:59):
But we're not linking to another AWS service here like we traditionally would with PrivateLink. I'm assuming we're going to have the SRE team here have their own AWS account, and this is a link to that AWS account.
Shaozhen (05:23):
So once the Interface VPC Endpoint is created, the customer has to approve this access for certain AWS accounts. That's just a permission, which ends up being for the SRE team's AWS account. And then that account is actually the only account able to route traffic to the Interface VPC Endpoint. And all of that traffic is private: it stays inside the AWS infrastructure and does not go through the internet.
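The approval step described here, where only an explicitly permitted AWS account can use the private connection, can be modelled with a short Python sketch. This is a toy model under stated assumptions, not the AWS API or the ROSA installer's logic: the class `PrivateEndpoint`, its methods, and the account IDs are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class PrivateEndpoint:
    """Toy model of a private endpoint with an account-level permission list."""

    allowed_principals: Set[str] = field(default_factory=set)

    def approve(self, principal_arn: str) -> None:
        """Grant one AWS principal permission to connect."""
        self.allowed_principals.add(principal_arn)

    def can_connect(self, principal_arn: str) -> bool:
        """Only explicitly approved principals may route traffic here."""
        return principal_arn in self.allowed_principals


# Placeholder account IDs, not real Red Hat or customer accounts.
SRE_ACCOUNT = "arn:aws:iam::111122223333:root"
OTHER_ACCOUNT = "arn:aws:iam::444455556666:root"

endpoint = PrivateEndpoint()
endpoint.approve(SRE_ACCOUNT)

if __name__ == "__main__":
    print(endpoint.can_connect(SRE_ACCOUNT))    # approved account
    print(endpoint.can_connect(OTHER_ACCOUNT))  # never approved
```

The point of the model is that access is granted per account rather than per network path, so an account that was never approved has no way to reach the endpoint.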
Ryan Niksch (05:57):
This is a private connection. Everything here is still private. What stops an SRE member from being able to come across here into this ROSA AWS account and then taking the next step to come all the way on premises?
Shaozhen (06:16):
Yeah, yeah.
Shaozhen (06:17):
So this is more like ingress traffic, right? And you know, you can definitely put your security on the transit gateway, or use a security group or network ACL, to block the traffic. So this is only about SRE access to the ELB, and then the ELB passes the traffic to the ROSA cluster.
Ryan Niksch (06:17):
Okay.
Shaozhen (06:17):
Yeah.
Ryan Niksch (06:36):
So we've got a security group here that defines that only these address spaces can come in. There's nothing that allows that traffic to move further through the environment. The clusters are private, but there's nothing stopping a customer from taking a private ROSA cluster and then presenting it to the outside world through another security device, like a Palo Alto or a WAF implementation.
Shaozhen (06:36):
No.
Ryan Niksch (07:06):
So we do see customers using this PrivateLink implementation to have internal workloads, but also to provide a centralized, secure mechanism to expose that externally.
Shaozhen (07:20):
Yeah.
Ryan Niksch (07:21):
Of these two, which are you seeing as the most common implementation?
Shaozhen (07:27):
Yeah, enterprise customers are not comfortable exposing their cluster API to the internet, of course, even though we have an IP allowlist. So PrivateLink is definitely the most popular way to implement your ROSA cluster.
Ryan Niksch (07:41):
So where are we seeing public? Are these smaller implementations where there is a need to expose it publicly, where I want something simple, or where it's potentially a non-production context?
Shaozhen (07:53):
Yeah. I would say a non-production context, or an application that is not security-critical. And with PrivateLink, you can still expose your application to the public; you just don't expose the API to the public.
Ryan Niksch (08:09):
Okay. And this PrivateLink is only to the Red Hat SRE teams? It's not general to Red Hat in itself. It's literally just the teams that are mandated to manage this environment.
Shaozhen (08:20):
Exactly.
Ryan Niksch (08:20):
And the monitoring, the logging, the telemetry from the OpenShift environment, is that crossing over this as well?
Shaozhen (08:27):
No, this is more like ingress.
Ryan Niksch (08:27):
Okay.
Shaozhen (08:27):
Yeah.
Ryan Niksch (08:30):
As you said, it's ingress. So the telemetry is really the OpenShift clusters communicating back out to things like OCM, for example. And if the SREs need to take action, they're going to come in on this path.
Shaozhen (08:40):
Yeah.
Ryan Niksch (08:41):
There is an ordered process and change control when SRE teams need to utilize this path.
Shaozhen (08:49):
Yeah. There are different kinds of escalation paths, right? So the SREs, internally, have to raise internal tickets to escalate before they can access this Interface VPC Endpoint. Yep.
Ryan Niksch (09:03):
All right. Shaozhen, thank you very much. It's a pleasure having you here.
Shaozhen (09:07):
Thank you.
Ryan Niksch (09:08):
And thank you for joining us.