We have new hardware in to replace some of our 4+ year old IBM x3650s, and we need to do the swap in the next month or so to make sure we have a good list of hardware to go onto extended warranty this fall.
I would like to come up with a plan of attack for getting the virtual machines below all moved to virthost19 through virthost22 by September 1st.
Hardware Server                      Virtual Machine
virthost05.phx2.fedoraproject.org    bastion02.phx2.fedoraproject.org
virthost05.phx2.fedoraproject.org    db-fas01.phx2.fedoraproject.org
virthost05.phx2.fedoraproject.org    proxy01.phx2.fedoraproject.org
virthost06.phx2.fedoraproject.org    ask01.phx2.fedoraproject.org
virthost06.phx2.fedoraproject.org    notifs-web02.phx2.fedoraproject.org
virthost07.phx2.fedoraproject.org    datagrepper02.phx2.fedoraproject.org
virthost07.phx2.fedoraproject.org    elections01.phx2.fedoraproject.org
virthost07.phx2.fedoraproject.org    hotness01.phx2.fedoraproject.org
virthost07.phx2.fedoraproject.org    nuancier01.phx2.fedoraproject.org
virthost08.phx2.fedoraproject.org    darkserver01.phx2.fedoraproject.org
virthost08.phx2.fedoraproject.org    ns03.phx2.fedoraproject.org
virthost09.phx2.fedoraproject.org    busgateway01.phx2.fedoraproject.org
virthost09.phx2.fedoraproject.org    fedocal01.phx2.fedoraproject.org
virthost09.phx2.fedoraproject.org    notifs-backend01.phx2.fedoraproject.org
virthost09.phx2.fedoraproject.org    nuancier02.phx2.fedoraproject.org
virthost10.phx2.fedoraproject.org    bodhi01.stg.phx2.fedoraproject.org
virthost10.phx2.fedoraproject.org    bodhi02.stg.phx2.fedoraproject.org
virthost10.phx2.fedoraproject.org    fas01.stg.phx2.fedoraproject.org
virthost10.phx2.fedoraproject.org    koji01.stg.phx2.fedoraproject.org
virthost10.phx2.fedoraproject.org    notifs-web02.stg.phx2.fedoraproject.org
virthost10.phx2.fedoraproject.org    summershum01.stg.phx2.fedoraproject.org
There are a couple of ways we do these transitions.
1) Spin up a new virtual machine with an incremented hostname. Example:
   a) Check to see which ask systems exist (ask01, ask02).
   b) Create a new virtual machine with an incremented number: ask03.
   c) Ansible the system to be a clone of ask01.
   d) Either turn off ask01 and rename ask03 to be ask01, OR configure other servers to point to ask03 instead of ask01.
   e) Fix problems as needed.
   f) Shut down and remove ask01.

2) Move the virtual machine to another server.
   a) Schedule a downtime.
   b) Shut down the server.
   c) Network dd the LVM image to the other server.
   d) Copy the /etc/libvirt/qemu/___.xml file over to the other server.
   e) Spin up the server.
   f) Fix problems as needed.
   g) Remove the files from the old server.

3) If the image is on an iSCSI share versus local disks...
   a) Shut down the image on server A.
   b) Copy the xml files over to server B.
   c) Get libvirt to see them.
   d) Start the image on server B.
   e) Remove the xml files from server A.
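Step b) of option 1 depends on picking the next free number for a service. A minimal Python sketch of that bookkeeping follows; the function name `next_hostname` and the two-digit padding are assumptions made for illustration, not an existing tool of ours.

```python
import re


def next_hostname(existing, prefix):
    """Return prefix plus the next unused two-digit number, e.g. ask03."""
    pattern = re.compile(r"^" + re.escape(prefix) + r"(\d+)$")
    numbers = []
    for name in existing:
        # Compare only the short name, so fully qualified names work too.
        match = pattern.match(name.split(".")[0])
        if match:
            numbers.append(int(match.group(1)))
    # With no existing hosts, numbering starts at 01.
    return "{}{:02d}".format(prefix, max(numbers, default=0) + 1)


print(next_hostname(["ask01", "ask02"], "ask"))
```

The list of existing names would still have to come from wherever the inventory is kept; the sketch only covers the numbering.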
It looks like none of the servers in question are on the iSCSI share, so we won't be able to do 3. [Unless one or two of them are good candidates to be on the iSCSI share... then a variant of 2 would be used.]
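For reference, steps b) through e) of option 2 can be sketched as a list of commands. This Python sketch only builds the command strings and runs nothing. The logical volume path `/dev/vg_guests/ask01` and the destination `virthost19` in the usage lines are assumptions for illustration, and the sketch assumes a logical volume of the same size has already been created on the destination.

```python
import shlex


def move_commands(vm, lv_path, dest_host):
    """Build (where, command) pairs for option 2; nothing is executed."""
    q = shlex.quote
    xml = "/etc/libvirt/qemu/{}.xml".format(vm)
    # The remote half of the network dd, passed to ssh as one argument.
    remote_dd = "dd of={} bs=1M".format(q(lv_path))
    return [
        # b) virsh shutdown is asynchronous: wait until "virsh domstate"
        #    reports the guest as shut off before copying the disk.
        ("source", "virsh shutdown {}".format(q(vm))),
        # c) network dd of the LVM image to the other server
        ("source", "dd if={} bs=1M | ssh {} {}".format(
            q(lv_path), q(dest_host), q(remote_dd))),
        # d) copy the libvirt domain definition over
        ("source", "scp {} {}:{}".format(q(xml), q(dest_host), q(xml))),
        # e) make libvirt aware of the guest and start it
        ("dest", "virsh define {}".format(q(xml))),
        ("dest", "virsh start {}".format(q(vm))),
    ]


# Hypothetical values, for illustration only.
for where, command in move_commands("ask01", "/dev/vg_guests/ask01", "virthost19"):
    print("[{}] {}".format(where, command))
```

Step g), removing the files from the old server, is left out on purpose so that nothing destructive is suggested before the moved guest has been checked.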
Downtimes, except for the fas system, will be in the 20 minute range. The fas database might take 1-3 hours because of the 100 GB image that has to be copied over and the usual 'what, we have to reboot that because this was down? WHY?' problems we end up with.
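As a rough cross-check of the copy time for the 100 GB image, here is a small Python sketch. The throughput figures are assumptions, not measurements from our network: 125 MB/s is the theoretical ceiling of a 1 Gbit/s link, and the lower rates stand in for what a dd over ssh might actually sustain.

```python
def transfer_minutes(size_gb, throughput_mb_s):
    """Minutes needed to copy size_gb (decimal GB) at throughput_mb_s MB/s."""
    return size_gb * 1000 / throughput_mb_s / 60


# Assumed sustained rates in MB/s; replace with measured values.
for rate in (125, 60, 30):
    print("{:>4} MB/s: {:.0f} min".format(rate, transfer_minutes(100, rate)))
```

At an assumed 30 MB/s the copy alone comes close to an hour, which fits the lower end of the 1-3 hour estimate once shutdown, startup and troubleshooting are added.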
Other plans and ideas can be replied to here.
Number 1 is an interesting option, but I think number 2 is more feasible: it takes less effort and causes less confusion during the process.
Best regards.
2015-08-06 14:49 GMT-05:00 Stephen John Smoogen smooge@gmail.com:
-- Stephen J Smoogen. _______________________________________________ infrastructure mailing list infrastructure@lists.fedoraproject.org https://admin.fedoraproject.org/mailman/listinfo/infrastructure
Number 2 seems to be the cleanest and least problematic solution.
I can see all sorts of breakage that can happen if we use number 1.
Best regards
On Thu, 6 Aug 2015 at 21:52, Abdel G. Martínez L. abdel.g.martinez.l@gmail.com wrote:
-- Abdel G. Martínez L.
On Thu, 6 Aug 2015 13:49:41 -0600 Stephen John Smoogen smooge@gmail.com wrote:
...snip...
There's a bit of a hybrid plan between 1 and 2 we could also use.
All the virthost10 ones could just be moved anytime since they are staging.
All of these:
virthost05.phx2.fedoraproject.org    bastion02.phx2.fedoraproject.org
virthost05.phx2.fedoraproject.org    proxy01.phx2.fedoraproject.org
virthost06.phx2.fedoraproject.org    ask01.phx2.fedoraproject.org
virthost06.phx2.fedoraproject.org    notifs-web02.phx2.fedoraproject.org
virthost07.phx2.fedoraproject.org    datagrepper02.phx2.fedoraproject.org
virthost07.phx2.fedoraproject.org    elections01.phx2.fedoraproject.org
virthost07.phx2.fedoraproject.org    nuancier01.phx2.fedoraproject.org
virthost08.phx2.fedoraproject.org    ns03.phx2.fedoraproject.org
virthost09.phx2.fedoraproject.org    fedocal01.phx2.fedoraproject.org
virthost09.phx2.fedoraproject.org    nuancier02.phx2.fedoraproject.org

Have other active instances, so they could be stopped on those hosts and new versions of them created by ansible. That should just be a matter of changing the info in their host_vars to build on a new host and updating ssh host keys.
All of these:
virthost07.phx2.fedoraproject.org    hotness01.phx2.fedoraproject.org
virthost08.phx2.fedoraproject.org    darkserver01.phx2.fedoraproject.org
virthost09.phx2.fedoraproject.org    busgateway01.phx2.fedoraproject.org

Don't have other active instances, but I think a short outage while we shut them down and rebuild on another host would probably be ok.
This one:
virthost05.phx2.fedoraproject.org db-fas01.phx2.fedoraproject.org
however, I think we should make a db-fas02 and sync data and cut over to it, so as to keep downtime low.
kevin
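The hybrid plan above boils down to a small decision rule, sketched here in Python. The function name, the strategy strings, and the detection of staging guests by a `.stg` name component are assumptions for illustration. Whether a guest has another active instance has to be supplied by the caller, since nothing in the hostnames encodes it.

```python
def migration_strategy(vm, has_other_instance):
    """Map a guest to the approach suggested for it in the hybrid plan."""
    short = vm.split(".phx2.")[0]
    if short.endswith(".stg"):
        # Staging guests (the virthost10 ones) can move at any time.
        return "move anytime (staging)"
    if short == "db-fas01":
        # Special case: keep downtime low by building a second database.
        return "build db-fas02, sync data, cut over"
    if has_other_instance:
        return "stop and rebuild on new host with ansible"
    return "short outage, rebuild on new host"


print(migration_strategy("proxy01.phx2.fedoraproject.org", True))
print(migration_strategy("hotness01.phx2.fedoraproject.org", False))
```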
On Fri, Aug 07, 2015 at 10:49:39AM -0600, Kevin Fenzi wrote:
...snip...
Just beware with nuancier: both of its instances (nuancier01 and nuancier02) are on the hosts being moved, so don't take them both down at the same time :)
Pierre