And the document you've all been waiting for... The meeting log for the
Infrastructure Team held on 15th May 2008 at 2000UTC!
07:56 < sebastian^> first one :>
07:59 -!- mmcgrath changed the topic of #fedora-meeting to:
Infrastructure -- Who's here?
07:59 < smooge> I am
07:59 < abadger1999> smooge == God?
07:59 < sebastian^> me too
07:59 < abadger1999> ;-)
07:59 * G gives mmcgrath a cookie
07:59 * iWolf is here
08:00 * ricky
08:00 * smooge jumps out of the burning bush going "it burns it burns..
oh god it burns..."
08:00 * jcollie is here
08:00 * skvidal is
08:00 < jbrothers> first meeting for me
08:00 < ricky> jbrothers: Welcome!
08:01 < G> I've got logs if needed
08:01 < jbrothers> thanks, ricky
08:01 < ricky> G: Mind emailing them/updating the wiki page today? I
haven't gotten email 100% setup yet
08:01 < G> k
08:01 * geoffs says hello
08:01 < ricky> Thanks
08:01 < mmcgrath> Just an FYI before we get started, fedorapeople got
borked for a bit but it should be fine now. I'll send an outage
notification out after the meeting to let everyone know what happened.
08:01 * nirik is sitting back in the bleachers.
08:01 * couf lurks
08:01 * dgilmore is here
08:02 < mmcgrath> So lets get to the tickets.
08:02 < mmcgrath> oh crap, can't get to the tickets.
08:02 < mmcgrath> .ticket 300
08:02 < zodbot> mmcgrath: #300 (Can't log into wiki) - Fedora
Infrastructure - Trac -
08:02 < mmcgrath> heh, well some people can.
08:02 < mmcgrath> will someone here that can get to
https://fedorahosted.org/fedora-infrastructure/ go to the meeting link
at the bottom and pm me what tickets are listed there?
08:03 < mmcgrath> we're having some routing issues at serverbeach. Not
sure from what exactly yet
08:03 < G> mmcgrath: lets see
08:04 * mdomsch is only a little lage
08:04 < mdomsch> late
08:04 < mmcgrath> mdomsch: no worries, I'm hoping someone can get me the
ticket list, we're having some routing issues to serverbeach.
08:04 < mmcgrath> is anyone able to access that page?
08:04 < mmcgrath> https://fedorahosted.org/fedora-infrastructure/
08:05 < G> mmcgrath: 395 398 446 547
08:05 < skvidal> yes
08:05 < skvidal> I am
08:05 < mmcgrath> nm, I got them
08:05 < mmcgrath> thanks
08:05 < mmcgrath> so first one
08:05 < mmcgrath> .ticket 395
08:05 < zodbot> mmcgrath: #395 (Audio Streaming of Fedora Board
Conference Calls) - Fedora Infrastructure - Trac -
08:05 < mdomsch> it's timing out for me
08:06 < mmcgrath> jcollie: I'd assume there's no news there but I wanted
to tell you I am planning on having asterisk deployed and working before
08:06 < mmcgrath> the non board call part of all of that
08:06 < mmcgrath> jcollie: anything new on that front?
08:06 < mmcgrath> will you have time coming up to work on it?
08:06 < jcollie> yeah, i was going to start some work on that, but keep
08:06 < jcollie> no irc bots and such yet
08:06 * ivazquez apologizes for being late
08:06 < mmcgrath> ivazquez: no worries.
08:07 < mmcgrath> jcollie: is there a web interface for it or anything?
08:07 * mmcgrath assumes no, thought he'd ask
08:07 < skvidal> mmcgrath: ticket submitted to sb
08:07 < jcollie> asterisk or flumotion?
08:07 < mmcgrath> flumotion
08:07 < mmcgrath> well either as it relates to conference calls I guess.
08:07 < jcollie> no web client but there is a python gui
08:07 < mmcgrath> skvidal: thanks
08:07 < mmcgrath> ah, k
08:08 < mmcgrath> jcollie: anything else on that front? if not we'll
08:08 < jcollie> basically as a first step i want to get the board
connected over SIP with audio feeds to the web
08:08 < mmcgrath> <nod>
08:08 < jcollie> public will pass Qs to board over irc channel
08:09 < jcollie> control of audio feed will need to be done by someone
08:09 < mmcgrath> skvidal: dgilmore: btw, where are the public board
meetings advertised? I didn't even know about the last one.
08:09 < dgilmore> mmcgrath: fab
08:09 < skvidal> I'm pretty sure we also emailed -announce
08:09 * skvidal looks for stickster
08:09 * stickster here
08:10 < ricky> YES!!!!!
08:10 < ricky> MY FILES!!! AAHHHHHH!H!!
08:10 < ricky> (sorry, I had to do that)
08:10 < mmcgrath> ricky: :)
08:10 * ricky does a dance!
08:11 < dgilmore> stickster: we need to announce board meetings loader
08:11 < dgilmore> louder
08:12 < stickster> dgilmore: mmcgrath: The last one was posted to
fedora-advisory-board, fedora-announce-list, and fedora-marketing-list.
08:12 < mmcgrath> stickster: how far in advance?
08:12 < stickster> mmcgrath: One week
08:12 * mmcgrath wonders if it went to all 3 at once and got filed
08:12 < mmcgrath> stickster: I suggest you use zodbot to announce the
meetings 5 minutes before they start as well :)
08:12 < mmcgrath> I can give you or someone access to do that
08:13 < stickster>
08:13 < mmcgrath> ok, anywho, we should move on to the next ticket :)
08:13 < mmcgrath> .ticket 398
08:13 < zodbot> mmcgrath: #398 (elfutils `monotone' (mtn) error) -
Fedora Infrastructure - Trac -
08:13 < mmcgrath> jcollie: abadger1999: either of you guys know anything
thats going on with this?
08:13 < jcollie> i haven't heard anything
08:13 < mmcgrath> rmcgrath: we're talking about
08:13 < mmcgrath> .ticket 398
08:13 < zodbot> mmcgrath: #398 (elfutils `monotone' (mtn) error) -
Fedora Infrastructure - Trac -
08:14 < abadger1999> Nope. rmcgrath said usher wouldn't help us.
08:14 < mmcgrath> that whole anonymous monotone thing.
08:14 < abadger1999> But he was looking at hacking something together
that would give fas members anonymous access.
08:14 < rmcgrath> yeah i left that sitting while busy with other work
plus not wanting to worry about the change freeze
08:14 < abadger1999> and later trying to extend it.
08:14 < mmcgrath> well, if there's no news, there's no news.
08:14 < mmcgrath> rmcgrath: <nod>
08:15 < mmcgrath> alrighty then, next ticket!
08:15 < rmcgrath> i have a hack that is probably sufficient for fas
users to get r/o access, just not tested
08:15 < mmcgrath> rmcgrath: solid, well the freeze is over now so we can
get it in if you're ready.
08:15 < ricky> Wait, so does the trac source tab now show anything right
08:15 < rmcgrath> still needs testing, but i think i have what i need
when i get the time
08:16 < mmcgrath> ricky: It does but you can't just check it out that
way and get a working project.
08:16 < ricky> Ah, yeah - that's a problem.
08:16 < mmcgrath> anywho, we'll wait to see how the hack works :)
08:16 < mmcgrath> .ticket 446
08:16 < zodbot> mmcgrath: #446 (Possibility to add external links on
spins page) - Fedora Infrastructure - Trac -
08:16 < mmcgrath> dgilmore: any news on that?
08:16 -!- rmcgrath [n=roland@c-76-102-158-52.hsd1.ca.comcast.net] has
left #fedora-meeting
08:17 < dgilmore> i started and keep forgeting to go back to it
08:17 -!- wolfy [n=lonewolf@fedora/wolfy] has joined #fedora-meeting
08:17 < mmcgrath> heh, so nothing new on that front then :) ?
08:18 < dgilmore> Im going to lock myself in a room and catch up on my
08:18 < dgilmore> not today
08:18 < mmcgrath> k
08:18 < mmcgrath> and
08:18 < mmcgrath> .ticket 547
08:18 < zodbot> mmcgrath: #547 (Koji DB Server as postgres 8.3) - Fedora
Infrastructure - Trac -
08:18 < mmcgrath> ahhh this is a good one.
08:18 < mmcgrath> abadger1999: want to give a quick round up on why we
want to do this?
08:19 < dgilmore> so ive been using 8.3 on the sparc buildsys for awhile now
08:19 < skvidal> hey hey
08:19 < skvidal> http://forums.serverbeach.com/showthread.php?t=7472
08:19 < abadger1999> we've been running into all sorts of annoyances
with postgres 8.1
08:19 < ricky> Aha.
08:19 < abadger1999> About half of them have to do with vacuuming.
08:19 < mmcgrath> skvidal: whew, always nice to know its not "just us" :)
08:19 < abadger1999> -- having to vacuum the whole db frequently, not
being able to use autovacuum.
08:19 < skvidal> no, but later in that thread they specifically blame you :)
08:20 < mmcgrath> skvidal: figures
08:20 < abadger1999> The other half are performance -- large queries
taking days to complete and such.
08:20 < skvidal> nod
08:20 < abadger1999> 8.3 has several enhancements that mitigate the issues
08:20 < dgilmore> abadger1999: if we need some benchmarking i can do that
08:20 < dgilmore> though it wont be apples to apples
08:21 < mmcgrath> abadger1999: so here's my only concern.... What are
we going to be doing as far as building and maintaining our own package
08:21 < mmcgrath> ?
08:21 < mmcgrath> I just suspect thats not a small job, we should ping
the package maintainer and see what he thinks.
08:21 < mmcgrath> is there any chance of us getting 8.3 into RHEL5.2 or 5.3?
08:21 < dgilmore> mmcgrath: no chance of getting 8.3 into RHEL5
08:22 < mmcgrath> dgilmore: why is that? not backwards compatible?
08:22 -!- nim-nim [n=nim-nim@fedora/nim-nim] has quit [Connection timed out]
08:22 < dgilmore> mmcgrath: its stuck at 8.1 for the life of RHEL5
08:22 * mmcgrath has no idea what that decision process is like.
08:22 < mdomsch> the whole dump/restore process on upgrade would be
problematic for users
08:22 < abadger1999> yep.
08:22 < mmcgrath> k
08:22 < jcollie> hop in the fedora time machine and bring back a copy of
RHEL6 from 1 year from now
08:22 < mmcgrath> so we're on our own... how much additional work are we
talking about there?
08:22 < dgilmore> mmcgrath: ive been building 8.3 from F8 on a FC-6
sparc tree and using that
08:23 < mmcgrath> is anyone here volunteering to maintain that package
in the infrastructure repo?
08:23 < dgilmore> i think that maintenance wont be too bad. until
Fedora moves beyond 8.3
08:24 < mdomsch> I hear a volunteer
08:24 < G> if it comes to it, I'll do it, but I'd prefer not to (I don't
even have RHEL5 available to me, sounds like a job for a redhatter :)
08:24 * jcollie whispers CentOS in G's ear
08:24 < skvidal> G: you can't access centos5?
08:25 < jcollie> centos rocks if you can't afford rhel
08:25 < mmcgrath> ok, lets forget about volunteering to build and
maintain the package...
08:25 < mmcgrath> Is anyone here against moving to 8.3 for any reason?
08:25 < jcollie> +1 from me
08:25 < mmcgrath> It is my understanding (though I've not used 8.3) that
there are a lot of benefits to it.
08:25 < abadger1999> so how long are we looking at staying with 8.3 on
08:25 < G> I can access CentOS but there is a matter of bandwidth I want
to avoid :)
08:26 < mdomsch> abadger1999, until we want to upgrade the box to RHEL6
08:26 < jcollie> abadger1999: probably until rhel6 i suppose
08:26 < mmcgrath> abadger1999: I'd say until such a time comes that we
can't use 8.3 or until it gets supported on another platform.
08:26 < mmcgrath> like RHEL6
08:26 < G> To me 8.3 on a new server just for Koji sounds good
08:26 < mdomsch> which I won't speculate as to its release date in this
08:26 < abadger1999> So will upgrading to RHEL6 for the db boxes be a
08:26 < mmcgrath> abadger1999: we'll have to consider that when the time
08:27 < abadger1999> k.
08:27 < mmcgrath> I mean, if our RHEL5 box just keeps on ticking with
the postgres83 we've got on it, probably no hurry.
08:27 < mmcgrath> as long as there aren't security issues or something.
08:27 < abadger1999> I can build 8.3 for infrastructure.
08:27 < mmcgrath> abadger1999: k, I'll comaintain it with you then
08:27 < abadger1999> But i'm not going to be able to investigate the
depths of the source code.
08:27 < abadger1999> sounds good.
08:27 < mmcgrath> skvidal: I just got a recovery notice.
08:27 < skvidal> me too
08:28 < mmcgrath> abadger1999: do we want to put a time frame on this or
just wait until db3 is installed and ready?
08:28 < G> ah ha, I can access fh.org from home now, better than links :)
08:28 < abadger1999> Doing it with db3 makes sense.
08:29 < mmcgrath> k, so thats settled then. Anyone have anything else?
if not we'll move on.
08:29 < abadger1999> it'll take a dump and reload to migrate the data
from 8.1 to 8.3
08:29 < mmcgrath> <nod>
08:29 < mmcgrath> alrighty then. Thats the end of the tickets
08:29 -!- mmcgrath changed the topic of #fedora-meeting to:
Infrastructure -- Lessons Learned
08:29 < mmcgrath> .tiny
08:29 < zodbot> mmcgrath: http://tinyurl.com/3omv7h
08:30 < mmcgrath> mdomsch: want to take the floor on this or should I
just go through it?
08:30 < mdomsch> go for it
08:30 < mmcgrath> allllrighty
08:30 < mdomsch> a lot of it is stuff I noticed for MM/mirrors
08:30 < mmcgrath> So as some of you probably heard we had a release on
08:30 < mdomsch> red hat IS didn't hear
08:30 < G> mmcgrath: we did? :P
08:30 < mmcgrath> and, to date, its been the smoothest release I've ever
seen as far as the infrastructure side.
08:31 < mmcgrath> mdomsch: ?
08:31 < mdomsch> amen
08:31 < ricky> :-)
08:31 < mdomsch> no melted switches, datacenters offline
08:31 < mmcgrath> mdomsch: you just sayin they didn't get much traffic
or did they tell you they didn't know about it? (I only ask because...
I created a ticket :)
08:31 < mmcgrath> ahh, yes!
08:31 < mmcgrath> so down the list!
08:31 < mmcgrath> So the biggest thing we saw was the leak.
08:31 < smooge> congrats to mdomsch and the rest for making that sooo smooth
08:32 < mmcgrath> and it was really only a problem because it got fixed
multiple times, and still started leaking but I think we have a handle
on that so it won't happen next time.
08:32 < mmcgrath> it'd be good to review this list.....
08:32 < mmcgrath> mdomsch: hey, would you mind adding this page to the
SOP/Release page as something to review?
08:32 < mmcgrath> the next thing was jigdo, I actually have no idea what
happened there or even that there was a problem.
08:32 < mdomsch> personally, I'm OK with a few leaks - let the fanbois
get their early taste of freedom
08:32 < mmcgrath> mdomsch: what happened there?
08:33 < mdomsch> mmcgrath, pyjigdo wasn't done in time; jesse had to
manually tweak the created template files
08:33 < mdomsch> and those are tricky to manually tweak
08:33 < mdomsch> so they were wrong a couple times
08:33 < mmcgrath> <nod>
08:33 < mdomsch> on Thursday before release, after bits had been sent to
the mirrors, rel-eng pushed the panic button
08:34 < mdomsch> and wound up respinning a bunch of ISOs
08:34 < mdomsch> but not quite all of them
08:34 < mmcgrath> updates/ is certainly something we could do easier
08:34 < mmcgrath> We could even have updates available ahead of time...
I suppose thats up to releng though.
08:34 < mdomsch> yeah, I see no reason for updates/ not to be ready even
before the rest of the bits; after rawhide is frozen
08:34 < mdomsch> at least updates/testing/
08:35 < mmcgrath> mdomsch: regarding mm db. Did those changes get in
08:35 < mdomsch> though we did have 200MB (per arch) of 0-day updates
08:35 < mmcgrath> <nod>
08:35 < G> I was kinda told that it would be ready, I was quite
surprised to the see the release-3hours (I think) push
08:35 < mdomsch> mmcgrath, yes, mm changes are active now
08:35 < mmcgrath> solid
08:35 < mdomsch> abadger1999, re the mm hosts table
08:35 < mmcgrath> lmacken: f13: you two around?
08:36 < mdomsch> would it help if I split the timestamp recording for
that out into its own table?
08:36 < mdomsch> that's really the field being updated often
08:36 < abadger1999> mdomsch: IIRC the table doesn't have much data?
08:37 < mdomsch> no, not much
08:37 < abadger1999> So it probably won't make a big difference.
08:37 < mdomsch> ok, good :-)
08:37 < abadger1999> host_category_dir made a huge difference since it
was both big and frequent updates.
08:37 < mdomsch> moving on
08:37 < mmcgrath> :)
08:38 < mmcgrath> so luke's not around but I'm sure he's aware of the
bodhi push issues.
08:38 < mdomsch> we can come back to bodhi when they're here
08:38 < mmcgrath> yeah
08:38 < mmcgrath> mdomsch: do you want to talk about mirror pushing?
08:38 < mdomsch> this is both a good idea, and scary
08:38 < mmcgrath> its something debian does now IIRC
08:38 < mdomsch> yes
08:38 < mmcgrath> what do the mirrors think about it?
08:39 < mdomsch> short story is, we would need an account and limited
ssh key on each mirror, and would ssh into each to start a sync job
08:39 < mdomsch> I haven't asked in a long time; would need to.
08:39 < mdomsch> a few would go for it
08:39 < skvidal> they may be more receptive now
08:39 < mdomsch> those that carry debian probably...
08:39 < mdomsch> can't hurt to ask
08:40 < mmcgrath> <nod>
08:40 * Jeff_S has no problems with something like that
08:40 < mmcgrath> mdomsch: alternatively (this might be a bad idea) we
could have them run an hourly cron job or something that checks in to
mirrormanager that says "yes, update now!"
08:40 < mmcgrath> that the mirrors themselves run.
08:40 * mdomsch hates polling
08:40 < mdomsch> but yes
08:40 < mmcgrath> heh
08:41 < mmcgrath> well, I guess all we can do is ask.
08:41 < mdomsch> there's another catch; rel-eng knows when they've
written the bits to the master mirror
08:41 < mmcgrath> mhmm
08:41 < mdomsch> oh, and yeah, we now know when the other netapps are in
08:41 < mdomsch> so, no catch :-)
08:41 < mmcgrath> :)
08:41 < mmcgrath> mdomsch: just curious.... how many is "several mirrors"
08:42 < skvidal> you could probably get the tier1s to do it
08:42 < mdomsch> ?
08:42 < skvidal> the top mirrors - to allow us to tell them when to sync
08:42 < mmcgrath> mdomsch: in your list it says several mirrors didn't
catch the bitflip for a while. how was it.
08:42 < mmcgrath> err how many was it
08:43 < mdomsch> oh, yeah; not sure exactly, I just saw reports of 403s
for the first few hours
08:43 < mdomsch> I didn't ping them each myself :-(
08:43 < mdomsch> mm needs a leak detector
08:43 < mmcgrath> <nod>
08:43 < mmcgrath> well thats something for the future.
08:43 < mdomsch> which doubles as a 'not flipped yet' detector
08:43 < mmcgrath> mdomsch: anything else on that topic?
08:43 < mdomsch> nope
08:44 < mdomsch> redirects to failing mirrors... either 403 or 427...
08:44 < mmcgrath> mdomsch: so this HTTP 300 redirect... I'm not familiar
08:44 < mdomsch> I have no good way to handle it
08:44 < mmcgrath> how's this work.
08:44 < G> (or 421 for FTP)
08:44 < mdomsch> mmcgrath, no one is... nearly no one uses it
08:44 < mdomsch> I haven't a clue
08:44 < mmcgrath> well then I say lets do it :)
08:45 < mdomsch> and I don't think it would work anyhow; they'd still
get an error document from the remote server, which would erase any 300
+ HTML page we send
08:45 < mmcgrath> <nod>
08:45 < mdomsch> unless we iframe'd it...
08:45 < mmcgrath> well we can look to see what our options there are.
08:45 < mmcgrath> or new page it or something
08:46 < mmcgrath> mdomsch: lets setup some proof of concepts on the test
servers to see what the 300 actually does.
08:47 < mdomsch> the other thing that would help the 427s (overloaded),
would be a python random.shuffle() that took weights
08:47 < mdomsch> mmcgrath, sure
08:47 < mdomsch> then I could weight responses based on the host's
08:47 < mdomsch> but still somewhat random
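
[Editor's sketch] The "random.shuffle() that took weights" idea mentioned above could look roughly like the following. This is only an illustration, not MirrorManager code: the function name weighted_shuffle, the mirror names, and the weights are all hypothetical, and the key trick used (raising a uniform random number to the power 1/weight and sorting) is one common way to do a weighted shuffle, not necessarily what mdomsch had in mind.

```python
import random


def weighted_shuffle(items, weights, rng=random):
    """Return items in random order, biased so heavier items tend to come first.

    Hypothetical helper: each item gets the key u ** (1 / weight) with u
    drawn uniformly from [0, 1); sorting by key in descending order gives a
    weighted random permutation.
    """
    keyed = []
    for item, weight in zip(items, weights):
        if weight <= 0:
            raise ValueError("weights must be positive")
        # A larger weight pushes the key toward 1, so the item sorts earlier.
        key = rng.random() ** (1.0 / weight)
        keyed.append((key, item))
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in keyed]


if __name__ == "__main__":
    # Hypothetical mirrors and weights, purely for illustration.
    mirrors = ["mirror-a.example.org", "mirror-b.example.org", "mirror-c.example.org"]
    print(weighted_shuffle(mirrors, [10, 3, 1]))
```

Every mirror still has a chance of landing first, so the result stays "somewhat random" while favouring the heavier entries.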
08:47 < mmcgrath> mdomsch: so yeah, stats from mirrors I'd love to
have. I wonder if we could setup an ftp site or something for them to
just send logs our way? or did you have something else in mind.
08:48 < mdomsch> mmcgrath, that'd be fine. MM report_mirror could
collect them too, but that's re-implementing file-copy
08:48 < mdomsch> which I've sworn never to do again
08:48 < mmcgrath> heheh, lesson learned ehh?
08:49 < mdomsch> they've got FAS accounts...
08:49 < mdomsch> but the mirror admins aren't in a FAS group
08:49 < mmcgrath> mdomsch: well, there's options there, I guess its just
a matter of us picking one and seeing what the mirrors think.
08:49 < mmcgrath> mdomsch: mind opening up a ticket?
08:49 < mdomsch> ok
08:50 < mdomsch> last...
08:51 < mdomsch> announcement email went out a few minutes before fp.o
front was updated
08:51 < mmcgrath> solid
08:51 < mdomsch> just need to get fp.o out first
08:52 < mmcgrath> <nod>
08:52 < mdomsch> anxious trigger fingers
08:52 < mmcgrath> no doubt :)
08:52 < mmcgrath> we did that same thing during the beta, hopefully
08:52 < mmcgrath> Anyone have anything else they learned during the lessons?
08:52 < mmcgrath> err during the release.
08:53 < mmcgrath> I'll say I learned one thing. the wiki is a virus
infecting every machine we had it on.
08:53 < mmcgrath> 1) last release the wiki failed. Conclusion? Put it
on more servers now that we can to help spread the load.
08:53 < mmcgrath> result? all other apps had problems because our wiki
is so bad.
08:53 < mmcgrath> but once I turned it off on some boxes and changed the
load ratio quite a bit... problem solved.
08:54 < ricky> :-)
08:54 < mmcgrath> anyone have anything else on the release?
08:54 < mmcgrath> mdomsch: really, thanks for putting that together both
this release and last release. Much appreciated
08:54 < mdomsch> np
08:55 < G> yeah, good work!
08:55 < mmcgrath> Ok, if thats that then..
08:55 -!- mmcgrath changed the topic of #fedora-meeting to:
Infrastructure -- Open Floor
08:55 < mmcgrath> Anyone have anything they'd like to discuss?
08:56 < G> Yeah, two quick things
08:56 * ricky noticed
08:56 < mdomsch> ah, one more...
08:56 < mdomsch> TG caching...
08:56 -!- knurd is now known as knurd_afk
08:56 < f13> here, sorry, was at a company thing.
08:56 < mdomsch> after marking 9-Preview as not to be displayed
08:56 < mdomsch> restart start-mirrors on app4
08:56 < mdomsch> so it blows away its cache
08:57 < G> 1) kojipkgs seems to be working spot on (good job guys), koji
appeared a bit more responsive for me when a few builds were happening,
so thats a start.
08:57 * lmacken rolls in
08:58 < mmcgrath> G: yeah. So here's something I want to do for that box
08:58 < mmcgrath> I'd like to enable caching on that box but I want to
get a baseline of how nfs1 is looking.
08:58 < mmcgrath> I recently added a disk monitor to that box (just sar
with the -d option enabled)
08:59 < mmcgrath> once I have that, I'll enable the caching stuff and
see if it actually makes an impact on how much that array is getting used.
08:59 < G> 2) I noticed that NTP was a bit behind the game, a few
e-mails that were sent out when the 9-updates push happened, I've added
a 'fact' to puppet, called physical which should be true on xen hosts
and non virtualised hosts
08:59 -!- stickster is now known as stickster_afk
08:59 < G> .ticket 541
08:59 < zodbot> G: #541 (Re-enable NTP on Fedora Machines) - Fedora
Infrastructure - Trac -
09:00 < mmcgrath> G: I'm ready for that to go out whenever you are :)
09:00 < G> (for reference) I'll try and get that done sometime today
09:00 < mmcgrath> oh, hey we're actually out of time for the meeting!
09:00 < mmcgrath> anyone have anything pressing? If not we'll close the
meeting in 30.
09:00 < mmcgrath> G: sorry bout that :)
09:00 < mmcgrath> 15
09:00 < G> mmcgrath: np, just finished anyway
09:00 < mmcgrath> 5
09:01 -!- mmcgrath changed the topic of #fedora-meeting to:
Infrastructure -- Meeting Closed!!!
09:01 < mmcgrath> Thanks for coming everyone
09:01 -!- mmcgrath changed the topic of #fedora-meeting to: Channel is
used by various Fedora groups and committees for their regular meetings
| Note that meetings often get logged | For questions about using Fedora
please ask in #fedora | See
09:01 < G> oh except, don't forget to visit
http://fedoraproject.org/wiki/Infrastructure/SOP/Nagios if you want to
be able to send acknowledgements etc
09:02 < mmcgrath> G: ;-)
Fedora 9 Sulphur has just been released. I hope you have already downloaded
and installed it, and that you love it. Tell us about your experience. How is
Sulphur? What do you think about it?
GPG key: 0xC4639705
As you well know, here in India we have low-bandwidth mirrors.
To keep up the QoS, we need to restrict our number of connections.
I myself operate on ftp and usually, the limit is set to 45 concurrent
Now as mirrormanager automatically redirects the clicks at the
the requests from India usually get routed to these two mirrors.
As they always operate at their limit, the user eventually gets a
response "error 421: too many connected users".
This usually happens over and over.
So, what can we do to redirect the requests to the next server in the
queue _if_ we are full?
In order to reduce the load on the koji hub, we've recently brought up a
new http host that serves out the /mnt/koji/packages/ content.
Currently when koji builders build repodata they hard code the baseurl
of 'http://koji.fedoraproject.org/packages'. This means that every
builder and every static-repo user will hit the hub to download
packages. Instead we'd like them to hit the new host, kojipkgs. A
simple change to kojid.conf files on the builders will make the new
baseurl be 'http://kojipkgs.fedoraproject.org/packages'. Any repodata
made after that change (and builder restart) will have the new url. The
old url will continue to work for the old repodata, but the number of
systems hitting it should reduce over time.
I'm ready to commit the change to puppet, and once we're sure a puppet
run has gone through and updated the files I can start a rolling restart
of the builders. The process would look like this:
1) koji disable-host <all the hosts>
2) as each host finishes its current task, log in and restart the kojid
3) koji enable-host <each host after restart>
The final step would be to watch for a newRepo task and verify that the
generated repodata has the correct url, and that said repodata is usable
by builders and by consumers of static-repos. If there is a failure the
rollback plan would be much like above, only including a rollback to the
previous URL listed in kojid configs.
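
The three steps above can be sketched as a small plan generator. This is a rough illustration only: the builder host names are made up, and the restart command ("service kojid restart" run over ssh) is an assumption about how kojid would be restarted, not something stated above. The script only prints the commands; waiting for each host to finish its current task is still a manual check.

```python
def rolling_restart_plan(hosts, restart_cmd="service kojid restart"):
    """Build the ordered list of shell commands for a rolling kojid restart.

    restart_cmd is an assumed command; adjust it to however kojid is
    actually managed on the builders.
    """
    plan = []
    # Step 1: disable every builder so it stops taking new tasks.
    for host in hosts:
        plan.append(f"koji disable-host {host}")
    # Steps 2 and 3: once a host is idle (checked by hand), restart kojid
    # on it and then re-enable it.
    for host in hosts:
        plan.append(f"ssh {host} {restart_cmd}")
        plan.append(f"koji enable-host {host}")
    return plan


if __name__ == "__main__":
    # Hypothetical builder names, purely for illustration.
    for command in rolling_restart_plan(["builder1.example.org", "builder2.example.org"]):
        print(command)
```

The rollback would reuse the same plan after reverting the URL in the kojid configs.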
Is there any objection to me committing the puppet change, making it
live and starting on the rolling process? There should be no overall
outage to send mail about, service will remain uninterrupted.
Fedora -- Freedom² is a feature!
I just noticed that fas was having issues and found that fas was using
over a GB of memory on fas2.fedora.phx.redhat.com. No problem, I
restarted fas and expected everything to be fine. 14 minutes later,
memory was over a GB again. Another restart took care of the problem
but I looked at the logs to see what's going on.
Apparently FAS is busy enough now that it's running into the database
connection limit we impose via SQLAlchemy and then requests are backing
up, causing the memory explosion. Since FAS is far and away our busiest
TG app I'm bumping the limits up so that we don't keep losing FAS service:
sqlalchemy.connection_pool 5 => 10
sqlalchemy.max_overflow 21 => 25
The connection_pool is the number of db connections each fas server will
hold open. Since it's busy, it makes sense to hold open more
connections. connection_pool + max_overflow = the total number of
connections that can be open when there's a lot of requests.
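
As a quick check of the arithmetic, here is a minimal sketch using the values listed above. The function name is made up for illustration, and the totals are per FAS server process.

```python
def max_db_connections(connection_pool, max_overflow):
    """Upper bound on simultaneously open connections for one fas server."""
    return connection_pool + max_overflow


# Values from the change described above.
before = max_db_connections(connection_pool=5, max_overflow=21)
after = max_db_connections(connection_pool=10, max_overflow=25)

print(f"before: {before} connections, after: {after} connections")
```

So each server goes from at most 26 to at most 35 open connections, with 10 instead of 5 held open permanently.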
I've made these changes and pushed them live since it's causing FAS to
timeout and throw errors (which affects other services which auth
through fas as well.)
I seem to have deleted the nagios notification that I want to mention in
particular, but as I noted in SOP/Nagios the %ages noted in the e-mails
are what is left, so inode=99% doesn't mean that 99% of the inodes are
used, it means 99% are still free.
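
To make the "percentage left" convention concrete, here is a small sketch that converts a field such as inode=99% into a used percentage. The field format is taken from the example above; the helper name and the parsing are assumptions for illustration, not part of nagios itself.

```python
import re

# Matches fields like "inode=99%", where the number is the percentage FREE.
_FIELD = re.compile(r"^(?P<name>\w+)=(?P<free>\d+)%$")


def used_percent(field):
    """Return (name, percent used) for a 'name=NN%' field that reports free space."""
    match = _FIELD.match(field.strip())
    if match is None:
        raise ValueError(f"unrecognised field: {field!r}")
    free = int(match.group("free"))
    if free > 100:
        raise ValueError(f"percentage out of range: {field!r}")
    return match.group("name"), 100 - free


print(used_percent("inode=99%"))
```

For inode=99% this reports 1% used, which matches the reading described above.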
Anyway, this matters because nagios has been complaining about
cvs-int recently, in particular that /git has reached WARNING.
After a bit of hunting around, I found /repo/pkgs using 168GiB of the
192GiB available, which is understandable, Fedora has got huge.
The problem here is that there are a LOT of old tarballs in that folder,
which leaves me wondering if we should do a spring clean ~1 mo after
Let's take banshee for example, a package which I adopted....
$ ls -l /repo/pkgs/banshee/
drwxrwsr-x 3 apache repoextras 4096 May 3 2006 banshee-0.10.10.tar.gz
drwxrwsr-x 3 apache repoextras 4096 Aug 7 2006 banshee-0.10.11.tar.gz
drwxrwsr-x 3 apache repoextras 4096 Aug 24 2006 banshee-0.10.12.tar.gz
drwxrwsr-x 3 apache repoextras 4096 Mar 4 2006 banshee-0.10.6.tar.gz
drwxrwsr-x 3 apache repoextras 4096 Mar 7 2006 banshee-0.10.7.tar.gz
drwxrwsr-x 3 apache repoextras 4096 Mar 14 2006 banshee-0.10.8.tar.gz
drwxrwsr-x 3 apache repoextras 4096 Apr 16 2006 banshee-0.10.9.tar.gz
drwxrwsr-x 3 apache repoextras 4096 Feb 2 2007 banshee-0.11.5.tar.bz2
drwxrwsr-x 3 apache repoextras 4096 Mar 7 2007 banshee-0.12.0.tar.bz2
drwxrwsr-x 3 apache repoextras 4096 Apr 5 2007 banshee-0.12.1.tar.bz2
drwxrwsr-x 3 apache repoextras 4096 Aug 7 2007 banshee-0.13.0.tar.bz2
drwxrwsr-x 3 apache repoextras 4096 Aug 31 2007 banshee-0.13.1.tar.bz2
drwxrwsr-x 3 apache repoextras 4096 Jan 14 15:58 banshee-0.13.2.tar.bz2
drwxrwsr-x 3 apache repoextras 4096 Apr 13 01:51 banshee-1-0.98.3.tar.bz2
drwxrwsr-x 3 apache repoextras 4096 May 10 03:41 banshee-1-0.99.1.tar.bz2
$ du -sh /repo/pkgs/banshee/
At the most there should only be 4 tarballs there (R-2, R-1, R,
Rawhide), with R-2 only valid for one month after R has released.
Another couple of examples:
$ du -sh /repo/pkgs/kernel/
(900ish tarballs dating back to 2004)
$ du -sh /repo/pkgs/kdeedu
(48 tarballs from KDE-3.0)
With plans such as Hans' plan of creating a 500M vegastrike data SRPM
and the size growth and update schedules of some packages we are going
to have these nightmares more and more frequently.
Two solutions I can think of:
Create a script that goes through ALL packages without a dead.package,
grabs the tarball name from sources.list and basically builds a database
of what we are using, then scans through /repo/pkgs and either:
* Move old tarballs to some archiving system (another use for archive.fp.o?)
* Delete old tarballs
Or throw more diskspace at cvs-int
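
A minimal sketch of the script idea, under stated assumptions: each sources file is assumed to list one entry per line with the tarball name as the last field, and /repo/pkgs/<package>/ is assumed to hold one entry per tarball name, as in the banshee listing above. All function names are hypothetical. It only reports candidates; moving or deleting is left to a human.

```python
import os


def referenced_tarballs(sources_text):
    """Collect tarball names from the text of one sources file.

    Assumes one entry per line, with the file name as the last
    whitespace-separated field.
    """
    names = set()
    for line in sources_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[-1])
    return names


def stale_tarballs(present, in_use):
    """Return the names that are present but not referenced, sorted."""
    return sorted(set(present) - set(in_use))


def scan_package(pkg_dir, in_use):
    """List stale entries under one package directory, e.g. /repo/pkgs/banshee."""
    return stale_tarballs(os.listdir(pkg_dir), in_use)
```

In use, in_use would be the union of referenced_tarballs() over every live branch of a package, so a tarball still needed by any supported release is never reported.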
Even if we were to only remove 15% of the tarballs there (this is a very
cautious estimate of the number of stale tarballs) we could potentially
reach 72% diskspace available on that mount (down from 82%) - note this
is very simplistic math, in essence, we could be no better off if we
only removed the small stale tarballs :).
Diskspace isn't cheap, so I like deleting old tarballs. I also like this
option because it's not like they disappear completely, they should be
in the src.rpm's already on archive.fp.o and if we accidentally delete 1
or 2 that are still needed, well grab it from src.rpm...
This leads on to my second item...
xenbuilder2 has run out of diskspace in /, it's down to 32M, thankfully
koji has disabled it so it's safe for now, but wouldn't it be nice to
throw say a 50GB partition dedicated solely for /var/lib/mock &
/mnt/build? Yes, yes, I know money, but once again, builds are getting
bigger so 'it'd be nice'.
Just thought I'd throw these two thoughts into the open, let the
I'm working on a script to collect blog/name/hackergotchi entries
from users homedirs on fedorapeople.org and assemble them into a config
file for planet to use. All the bits about grabbing the files are
clearly simple. My only question is this:
- Should I just have the user put one (or more) planet config stanzas
in a file in their homedir or have them list just the blog feed url, the
hackergotchi url and their name and try to parse that out?
I'm inclined to the former. So a user could just have a .planet file
name = Seth Vidal
face = http://skvidal.fedorapeople.org/skvidal.png
which I could read in using configparser, throw out errors about any
duplicates and also balk if someone tries to override [Planet] or something
so, I wanted some feedback on what people thought would be the better
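
A rough sketch of the first option (users write planet config stanzas themselves), assuming each stanza is an INI section named after the feed URL with name and face keys. The function name, the reserved-section check, and the example feed URL are hypothetical; this uses Python's standard configparser module.

```python
import configparser

# Section names users may not define; "planet" is matched case-insensitively.
RESERVED = {"planet"}


def load_user_stanzas(texts):
    """Merge per-user .planet texts into one dict of section -> options.

    texts maps a user name to the contents of that user's .planet file.
    Raises ValueError for reserved sections or feeds listed by two users.
    """
    merged = {}
    for owner, text in texts.items():
        # interpolation=None keeps '%' characters in URLs from being expanded.
        parser = configparser.ConfigParser(interpolation=None)
        parser.read_string(text, source=owner)
        for section in parser.sections():
            if section.lower() in RESERVED:
                raise ValueError(f"{owner}: section [{section}] is reserved")
            if section in merged:
                raise ValueError(f"{owner}: duplicate feed {section}")
            merged[section] = dict(parser[section])
    return merged


if __name__ == "__main__":
    example = (
        "[http://example.org/feed.rss]\n"  # hypothetical feed URL
        "name = Seth Vidal\n"
        "face = http://skvidal.fedorapeople.org/skvidal.png\n"
    )
    print(load_user_stanzas({"skvidal": example}))
```

A section repeated within a single file is already rejected by configparser itself, which raises DuplicateSectionError by default.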
I only speak for me.
Anyone else seeing db2 be pretty slow? Like >20 minutes for the MM
publiclist pages to generate (per page!) and the inability to log into
the MM admin web interface it's taking so long... This has caused the
publiclist and mirrorlists to not be regenerated in about 24 hours.
db2 load average is ~9 right now, which doesn't seem nuts...
Linux Technology Strategist, Dell Office of the CTO
linux.dell.com & www.dell.com/linux