Greetings.
After a bunch of playing around, I managed to get bidirectional database replication working for koji.stg.fedoraproject.org.
Basic outline:
1. db-koji01.stg and db-koji02.stg both have postgresql-9.4-bdr installed.
2. Init a koji db on both.
3. Create a koji user and set its password on both.
4. Restore the prod db dump on one of them.
5. Enable replication of the koji db.
6. Wait for a while for them to sync up (it took about 4.5 hours to sync ~170GB of database).
7. Run the staging sql script on either node.
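The mail does not say which commands were used to enable replication, so the following is only a sketch of what that step might look like with BDR's SQL-level node management functions: `bdr.bdr_group_create` on the node holding the restored dump and `bdr.bdr_group_join` on the empty one. The node names, the DSN strings, and the helper functions are assumptions for illustration, not the statements actually run on the staging hosts.

```python
"""Sketch: build the SQL one might run to pair two BDR nodes.

Assumptions (not from the original mail): the node names, the DSN strings,
and that bdr_group_create / bdr_group_join were the mechanism used.
"""


def quote_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def bdr_setup_sql(first_node: str, first_dsn: str,
                  second_node: str, second_dsn: str) -> dict:
    """Return the statements to run in the koji db on each node.

    first_node is the one that already holds the restored dump;
    second_node starts empty and copies its data when it joins.
    """
    extensions = [
        "CREATE EXTENSION IF NOT EXISTS btree_gist;",
        "CREATE EXTENSION IF NOT EXISTS bdr;",
    ]
    wait = "SELECT bdr.bdr_node_join_wait_for_ready();"
    create = (
        "SELECT bdr.bdr_group_create("
        f"local_node_name := {quote_literal(first_node)}, "
        f"node_external_dsn := {quote_literal(first_dsn)});"
    )
    join = (
        "SELECT bdr.bdr_group_join("
        f"local_node_name := {quote_literal(second_node)}, "
        f"node_external_dsn := {quote_literal(second_dsn)}, "
        f"join_using_dsn := {quote_literal(first_dsn)});"
    )
    return {
        first_node: extensions + [create, wait],
        second_node: extensions + [join, wait],
    }


if __name__ == "__main__":
    plan = bdr_setup_sql(
        "db-koji01", "host=db-koji01.stg dbname=koji",   # hypothetical DSN
        "db-koji02", "host=db-koji02.stg dbname=koji",   # hypothetical DSN
    )
    for node, statements in plan.items():
        print(f"-- on {node}")
        print("\n".join(statements))
```

The script only prints statements; nothing here connects to a database, so it can be reviewed before anything is run by hand.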
I then added keepalived to keep an 'application ip' between the two of them (favoring 01).
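As a rough illustration of "favoring 01", here is a small sketch that renders a keepalived `vrrp_instance` stanza for each node, giving db-koji01.stg the higher VRRP priority so it holds the address whenever it is up. The interface name, virtual_router_id, priority values, instance name, and the address (taken from the 192.0.2.0/24 documentation range) are all placeholders, not the real staging configuration.

```python
"""Sketch: render per-node keepalived stanzas for a shared application IP.

All concrete values below are assumptions for illustration only.
"""

# Higher priority wins the VRRP election, so 01 is preferred.
NODE_PRIORITIES = {"db-koji01.stg": 150, "db-koji02.stg": 100}


def render_vrrp_instance(priority: int, virtual_ip: str,
                         interface: str = "eth0",
                         router_id: int = 51) -> str:
    """Return a keepalived vrrp_instance block as text."""
    lines = [
        "vrrp_instance koji_db {",
        "    state BACKUP",
        f"    interface {interface}",
        f"    virtual_router_id {router_id}",
        f"    priority {priority}",
        "    advert_int 1",
        "    virtual_ipaddress {",
        f"        {virtual_ip}",
        "    }",
        "}",
    ]
    return "\n".join(lines) + "\n"


def render_pair(virtual_ip: str) -> dict:
    """Render one stanza per node, keyed by node name."""
    return {
        node: render_vrrp_instance(priority, virtual_ip)
        for node, priority in NODE_PRIORITIES.items()
    }


if __name__ == "__main__":
    for node, config in render_pair("192.0.2.10").items():
        print(f"# {node}")
        print(config)
```

Both nodes start in state BACKUP and let the priority decide who becomes master, which keeps the two files identical apart from one number.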
Then I pointed koji01.stg at the application ip (well, I really just removed its /etc/hosts entry for db-koji).
I tested switching the ip back and forth. I tested rebooting one node or the other. I tested disabling keepalived on one and rebooting (so the other one became and stayed primary).
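For anyone who wants to put a number on these failover tests, a probe loop along the following lines could be left running while the ip is moved or a node is rebooted. It is a sketch only: it checks that a TCP connection to the application ip succeeds, not that PostgreSQL answers queries, and the host, port, and function names are assumptions.

```python
"""Sketch: measure how long a TCP endpoint is unreachable during failover."""
import socket
import time


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watch(host: str, port: int, duration: float, interval: float = 0.5) -> list:
    """Probe repeatedly for `duration` seconds; return (timestamp, ok) samples."""
    samples = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        samples.append((time.monotonic(), port_open(host, port)))
        time.sleep(interval)
    return samples


def longest_outage(samples: list) -> float:
    """Longest span, in seconds, from a first failed probe to the next success.

    An outage still in progress at the end is measured up to the last sample.
    """
    longest = 0.0
    down_since = None
    for stamp, ok in samples:
        if not ok and down_since is None:
            down_since = stamp
        elif ok and down_since is not None:
            longest = max(longest, stamp - down_since)
            down_since = None
    if down_since is not None:
        longest = max(longest, samples[-1][0] - down_since)
    return longest


if __name__ == "__main__":
    # Hypothetical application ip and the default PostgreSQL port.
    results = watch("192.0.2.10", 5432, duration=60.0)
    print(f"longest outage: {longest_outage(results):.1f}s")
```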
Everything seemed to work. ;)
So, next steps:
1. Pound on koji.stg and see if anything breaks. ;) I tried to enable koschei, but I think we currently have it set to not do staging builds. If we could enable it, that might be good. If anyone notices anything broken on koji.stg, please let me know. If you can think of some common cases we should test, please let me know that too. ;)
2. I am going to look at setting up another pair, get this all in ansible, and then see about migrating other staging services. I'm hopeful that if koji works, all our less database-heavy apps will work ok too.
3. If everything keeps looking good, move to production.
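As a rough planning aid for step 2, the figures reported above (about 4.5 hours for ~170GB) can be turned into an average initial-sync rate and scaled to other database sizes. This is a back-of-the-envelope sketch that assumes sync time scales linearly with size, which real databases on different hardware may not follow; the 20GB example size is made up.

```python
"""Sketch: scale the reported koji sync time to other database sizes."""

REFERENCE_SIZE_GB = 170.0   # reported size of the koji database
REFERENCE_HOURS = 4.5       # reported time for the initial sync


def sync_rate_mb_per_s(size_gb: float = REFERENCE_SIZE_GB,
                       hours: float = REFERENCE_HOURS) -> float:
    """Average sync rate in MB/s, taking 1 GB as 1000 MB."""
    return size_gb * 1000.0 / (hours * 3600.0)


def estimated_sync_hours(size_gb: float) -> float:
    """Linear estimate of initial sync time for a database of size_gb."""
    return size_gb * REFERENCE_HOURS / REFERENCE_SIZE_GB


if __name__ == "__main__":
    print(f"average rate: {sync_rate_mb_per_s():.1f} MB/s")
    # 20 GB is a hypothetical size for a smaller staging application.
    print(f"20 GB database: about {estimated_sync_hours(20.0) * 60:.0f} minutes")
```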
Longer term, what does this mean?
It means we can do updates/reboot cycles with pretty close to no downtime. We may have to be clever about the openvpn hub, but if we can reboot database servers as desired we may be able to avoid maint windows entirely, or at least reduce them a great deal.
kevin
On 10/11/2016 09:07 PM, Kevin Fenzi wrote:
[.. entire email from Kevin ..]
infrastructure mailing list -- infrastructure@lists.fedoraproject.org To unsubscribe send an email to infrastructure-leave@lists.fedoraproject.org
Kevin,
Any tests in particular an apprentice could pound on it with to check? I'm honestly not sure I have access to koji.stg, but I'd be game for testing common and edge cases. fas:linuxmodder
On Tue, 11 Oct 2016 22:06:42 +0000 Corey Sheldon sheldon.corey@openmailbox.org wrote:
Kevin,
Any tests in particular an apprentice could pound on it to check ? Not honestly sure I have access to koji.stg but I'd be game for testing common and edge cases fas:linuxmodder
Sure. Just go to https://koji.stg.fedoraproject.org/ and click around, make sure pages load reasonably fast, that links work, etc.
If you want you could do some scratch builds https://fedoraproject.org/wiki/Using_the_Koji_build_system#Scratch_Builds
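The "click around and make sure pages load reasonably fast" check could also be scripted roughly as follows. This is a sketch, not an official test: the list of paths and the function names are assumptions, and the fetch function is passed in as a parameter so the timing logic can be tried without touching the network.

```python
"""Sketch: time a handful of koji.stg web pages."""
import time
import urllib.request


def fetch_status(url: str, timeout: float = 30.0) -> int:
    """Download a page fully and return its HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
        return response.status


def time_pages(base_url: str, paths: list, fetch=fetch_status) -> list:
    """Return (url, status, seconds) per path; status is None on failure."""
    results = []
    for path in paths:
        url = base_url.rstrip("/") + "/" + path.lstrip("/")
        started = time.monotonic()
        try:
            status = fetch(url)
        except OSError:
            # urllib's URLError/HTTPError and socket timeouts are OSErrors.
            status = None
        results.append((url, status, time.monotonic() - started))
    return results


if __name__ == "__main__":
    # Paths are guesses at commonly visited koji web pages.
    pages = ["/koji/", "/koji/builds", "/koji/packages", "/koji/tasks"]
    for url, status, seconds in time_pages("https://koji.stg.fedoraproject.org", pages):
        print(f"{status} {seconds:6.2f}s {url}")
```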
Thanks!
kevin
On Tue, Oct 11, 2016 at 10:06:42PM +0000, Corey Sheldon wrote:
On 10/11/2016 09:07 PM, Kevin Fenzi wrote:
[.. entire email from Kevin ..]
Kevin,
Any tests in particular an apprentice could pound on it to check ? Not honestly sure I have access to koji.stg but I'd be game for testing common and edge cases fas:linuxmodder
Dear Corey,
Could you please trim down the email you are replying to? Your habit of replying below the footer set by the list is really quite confusing. Most of the time, I look for your answer in the text and cannot find it until I check who sent the email and then go look at the very end. It would really be appreciated if you could make that effort.
Disclaimer: ALL Correspondence shall be deemed as private and confidential, re-distribution is discouraged unless requested in the correspondence in question.
Such a disclaimer on an email sent to a public list is... surprising :)
Thanks, Pierre