On the storage, are we ok if a node goes down? i.e., does it spread it
over all the storage nodes/RAID? Or is it just in one place and you are
dead if that node dies?
For storage, we maintain 3 replicas of the data, spread across 3 nodes.
However, not all nodes are equally resourced: we have 2 large nodes and 1
much smaller one, so more of the replicas end up on the 2 larger nodes. We
can likely afford to lose one of the large physical nodes while still
maintaining data integrity.
Is there any way to back up volumes?
There are ways to clone, extend, take snapshots etc. of these volumes.
We've never done it, so it'll be a learning process for us all ;). We
should sync up to get a better handle on the requirements for backups. In
CentOS CI we've set up backups to S3; we can certainly reuse some of that,
e.g. the backup of etcd, but backing up the volumes managed by OCS may
need further investigation. We'll need to do some research here.
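For volume snapshots specifically, I'd expect it to look roughly like the
standard CSI snapshot flow. A minimal sketch, assuming an OCS/Ceph RBD
snapshot class; the PVC name, namespace, and snapshot class name here are
placeholders, not anything we've actually set up:

```yaml
# Hypothetical example: a CSI VolumeSnapshot of a PVC managed by OCS.
# The PVC name, namespace and volumeSnapshotClassName are assumptions
# for illustration; the real class name would come from
# `oc get volumesnapshotclass` on the cluster.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: registry-data-snap
  namespace: example-app
spec:
  volumeSnapshotClassName: ocs-storagecluster-rbdplugin-snapclass
  source:
    persistentVolumeClaimName: registry-data
```

Getting the resulting snapshots off-cluster (e.g. to S3, like the etcd
backups) is the part that needs the actual research.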
Should we make a playbooks/manual/ocp.yml playbook for things like:
- the list of cluster admins
- the list of cluster-monitoring members
- anything else we want to manage post-install?
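Something along these lines could work — a rough sketch of what such a
playbook might contain, assuming the kubernetes.core collection is
available on os-control01; the group and user names are placeholders:

```yaml
# Hypothetical sketch of playbooks/manual/ocp.yml: manage the cluster
# admins as an OpenShift Group bound to the cluster-admin role.
# Host, group and user names are illustrative assumptions.
- name: Manage post-install OCP cluster configuration
  hosts: os-control01
  tasks:
    - name: Ensure the cluster-admins group has the expected members
      kubernetes.core.k8s:
        state: present
        definition:
          apiVersion: user.openshift.io/v1
          kind: Group
          metadata:
            name: cluster-admins
          users:
            - someadmin
            - anotheradmin

    - name: Bind the cluster-admin role to that group
      kubernetes.core.k8s:
        state: present
        definition:
          apiVersion: rbac.authorization.k8s.io/v1
          kind: ClusterRoleBinding
          metadata:
            name: cluster-admins
          roleRef:
            apiGroup: rbac.authorization.k8s.io
            kind: ClusterRole
            name: cluster-admin
          subjects:
            - apiGroup: rbac.authorization.k8s.io
              kind: Group
              name: cluster-admins
```

Keeping the membership lists in the playbook would give us the audit
trail in git that we'd want for this kind of thing.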
Sure, yep. As we're finishing up soonish, I'd imagine that over the next
few weeks we'll all be back focused on the Infra/Releng tasks, tying up
loose ends like this and starting the migration of apps.
Have we tried an upgrade of the clusters yet? Did everything go ok? Do we
need any docs on upgrades?
Yes, we've already completed a number of upgrades; the latest was to
4.8.11. We have SOPs for upgrades which we can copy over from the CentOS
CI infra, and we'll make any updates required in the process.
Since the control plane are VMs, I assume we need to drain them one at a
time to reboot the virthosts they are on?
If we are rebooting a single vmhost/control-plane VM at a time, yes, that
should be fine. If we are doing more than one at the same time, we should
do a full graceful cluster shutdown and then a graceful cluster startup.
We have SOPs for this in CentOS CI as well; we'll get those added here and
make any content updates needed.
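The one-at-a-time case could eventually be automated along these lines — a
sketch only, assuming a hypothetical "virthosts" inventory group and a
"control_plane_vm" hostvar mapping each virthost to the control-plane VM
running on it (neither exists yet in our inventory):

```yaml
# Hypothetical SOP-as-playbook: reboot virthosts one at a time, draining
# the control-plane node on each first. Inventory group, hostvar and the
# os-control01 delegation are assumptions for illustration.
- name: Reboot virthosts serially, draining the control-plane VM first
  hosts: virthosts
  serial: 1
  tasks:
    - name: Cordon and drain the control-plane node on this virthost
      ansible.builtin.command: >
        oc adm drain {{ control_plane_vm }}
        --ignore-daemonsets --delete-emptydir-data
      delegate_to: os-control01

    - name: Reboot the virthost (the control-plane VM comes back with it)
      ansible.builtin.reboot:

    - name: Mark the node schedulable again
      ansible.builtin.command: "oc adm uncordon {{ control_plane_vm }}"
      delegate_to: os-control01
```

`serial: 1` is what enforces the one-host-at-a-time constraint; etcd
quorum survives losing a single control-plane member.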
* Should we now delete the kubeadmin user? In 3.x I know they advise
doing that after auth is set up.
We can delete it, as we have system:admin available from the os-control01
node, and best practice might suggest we do. We could also give the
cluster-admin role to all users in the sysadmin-main and sysadmin-openshift
groups. I'm in two minds about deleting it, though: I was hoping to wait
until we have a solution that syncs IPA groups/users to OpenShift. There
is an officially supported solution for syncing LDAP (think that will
work?).
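The LDAP route would mean running `oc adm groups sync` against IPA with a
sync config. A rough sketch of what that might look like — the server URL,
base DNs and attribute mappings below are placeholder assumptions, not our
actual IPA layout:

```yaml
# Hypothetical LDAPSyncConfig for syncing IPA groups into OpenShift,
# applied with something like:
#   oc adm groups sync --sync-config=ldap-sync.yaml --confirm
# URL, base DNs and filters are illustrative; IPA exposes an
# rfc2307-style schema.
kind: LDAPSyncConfig
apiVersion: v1
url: ldaps://ipa.example.org
rfc2307:
  groupsQuery:
    baseDN: "cn=groups,cn=accounts,dc=example,dc=org"
    scope: sub
    derefAliases: never
    filter: "(objectClass=groupOfNames)"
  groupUIDAttribute: dn
  groupNameAttributes: [ cn ]
  groupMembershipAttributes: [ member ]
  usersQuery:
    baseDN: "cn=users,cn=accounts,dc=example,dc=org"
    scope: sub
    derefAliases: never
  userUIDAttribute: dn
  userNameAttributes: [ uid ]
```

That would let group membership in IPA drive the OpenShift groups we bind
roles to, rather than maintaining the lists by hand.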
* Right now the API is only internal. Is it worth getting a forward set
up to allow folks to use oc locally on their machines? It would expose
that API to the world, but of course it would still need auth.
We'd love to expose it, but all interaction with the clusters up to this
point has also been done only via Ansible, so if it turns out we can't
expose the API like this, we're OK with that. With minor changes to the
playbook we should be able to at least replicate the current 3.11
experience.
> That's what we decided to do for the CentOS CI ocp setup, and so CI
> tenants can use oc from their laptop/infra. As long as cert exposed for
> default ingress has it added in the SAN, it works fine :
>
>     X509v3 Subject Alternative Name:
>         DNS:*.apps.ocp.ci.centos.org, DNS:api.ocp.ci.centos.org,
>         DNS:apps.ocp.ci.centos.org
Yeah, that's all fine, but to make it work for our setup, I would need to
get RHIT to NAT in port 6443 to proxy01/10 from the internet. At least I
think that's the case. OpenShift 3 could just use https, but alas, I fear
OCP4 needs that 6443 port.
Yep, I think you're right on that.
Do we want to try and enable http/2 ingress?
https://docs.openshift.com/container-platform/4.5/networking/ingress-oper...
We can take a look and see if we can figure it out!
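From a first read of that doc, it appears to be driven by an annotation on
the default IngressController (or cluster-wide on the ingress config) —
something like the following sketch; worth verifying on staging first:

```yaml
# Sketch of enabling HTTP/2 on the default IngressController via the
# annotation described in the linked docs; could equally be applied with
# `oc -n openshift-ingress-operator annotate ingresscontrollers/default
#  ingress.operator.openshift.io/default-enable-http2=true`.
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
  annotations:
    ingress.operator.openshift.io/default-enable-http2: "true"
```

The main caveat in the docs is around re-encrypt/passthrough routes and
certificate requirements, so staging is the place to try it.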
We will want to enable kubevirt/whatever it's called...
We definitely want to make this available, but we will have to set quotas
on usage. We should enable it on staging, but should we enable it on
production?
On the CentOS CI OCP4 cluster, we have Openshift Virtualization / kubevirt
installed, but I don't think anyone is actually using *it*. We have
several tenants with elevated permissions who are accessing KVM directly
to bring up VMs on the Openshift nodes; this is something we want to
avoid, as we can't effectively set quotas on that type of usage.
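If tenants go through kubevirt proper, ordinary ResourceQuotas should
apply. A sketch of a per-tenant quota, counting VirtualMachine objects via
the generic count/&lt;resource&gt;.&lt;group&gt; quota syntax — the namespace and the
numbers are placeholders, not agreed values:

```yaml
# Hypothetical per-tenant quota capping kubevirt VM count plus overall
# CPU/memory requests in the tenant's namespace. Namespace name and all
# limits are illustrative assumptions.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: virtualization-quota
  namespace: tenant-example
spec:
  hard:
    count/virtualmachines.kubevirt.io: "2"
    requests.cpu: "8"
    requests.memory: 16Gi
```

That's exactly the control we lose when tenants drive KVM on the nodes
directly, which is the argument for keeping everything inside kubevirt.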
On Fri, 24 Sept 2021 at 07:24, Kevin Fenzi <kevin(a)scrye.com> wrote:
> On Thu, Sep 23, 2021 at 07:44:49AM +0200, Fabian Arrotin wrote:
> > On 23/09/2021 02:55, Neal Gompa wrote:
> > > On Wed, Sep 22, 2021 at 7:12 PM Kevin Fenzi <kevin(a)scrye.com> wrote:
> > <snip>
> >
> > >>
> > >> * Since the control plane are vm's I assume we need to drain them
> > >> one at a time to reboot the virthosts they are on?
> >
> > Correct
> >
> > >>
> > >> * Should we now delete the kubeadmin user? In 3.x I know they
> > >> advise to do that after auth is setup.
> > >>
> > >
> > > I'm not sure that's a good idea. I'm not even certain that was a good
> > > idea in the OCP 3.x days, because eliminating the kubeadmin user means
> > > you lose your failsafe login if all else fails.
> >
> > +1 here : the reason why we decided to still keep kubeadmin on the other
> > OCP clusters used for CentOS CI and Stream is exactly for that reason :
> > still be able to login, if there is a problem with the oauth setup, and
> > troubleshoot issues if (for example) ipsilon or IPA have troubles ... :-)
>
> We can keep it if folks like. I'd really prefer we don't use it except
> for emergency though. Having people do things as their user will make it
> way easier to see who did what. ;)
>
> > >> * Right now the api is only internal. Is it worth getting a forward
> > >> setup to allow folks to use oc locally on their machines? It would
> > >> expose that api to the world, but of course it would still need auth.
> >
> > That's what we decided to do for the CentOS CI ocp setup, and so CI
> > tenants can use oc from their laptop/infra. As long as cert exposed for
> > default ingress has it added in the SAN, it works fine :
> >
> >     X509v3 Subject Alternative Name:
> >         DNS:*.apps.ocp.ci.centos.org, DNS:api.ocp.ci.centos.org,
> >         DNS:apps.ocp.ci.centos.org
>
>
> kevin
> _______________________________________________
> infrastructure mailing list -- infrastructure(a)lists.fedoraproject.org
> To unsubscribe send an email to infrastructure-leave(a)lists.fedoraproject.org
> Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives: https://lists.fedoraproject.org/archives/list/infrastructure@lists.fedora...
> Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure
--
David Kirwan
Software Engineer
Community Platform Engineering @ Red Hat
T: +(353) 86-8624108 IM: @dkirwan