Change freeze tomorrow!
by Mike McGrath
Just a reminder... The real change freeze starts tomorrow. If you've got
something to do... DO IT! :)
-Mike
Introduction - Pavel Khardikov
by Pavel Khardikov
Hello everybody,
I'm Pavel Khardikov from Kursk State University, Russia. I'm 22 and a
first-year postgraduate student.
I graduated from the Faculty of Computer Science and Computer Engineering.
Speciality: software and information systems administration.
My application has been accepted into Google Summer of Code 2008. I'm
going to work on Pretty Web 2.0 Interfaces for Smolt.
This is my first GSoC and I'm really happy about it. My mentor is Yaakov Nemoy.
I want to help the Fedora Project and open source, as well as improve my
skills and experience.
For more than 5 years, I worked as a system administrator for an ISP
(Internet Service Provider). I set up and supported servers and services
(apache, nginx, tomcat, squid, exim, bind, postgresql, mysql, jabberd, pptpd, etc.)
and administered RHEL, CentOS and Fedora. I have some experience with OpenVZ
and Xen infrastructure.
I use Fedora 8 on my laptop. My production servers mostly run RHEL 5 or
CentOS 5.
I also like programming.
I know Perl, Python and web programming (XML, HTML/XHTML, CSS,
cross-browser page layout).
I like to design user interfaces.
--
Best regards. Pavel Khardikov
Fedora Services API
by Toshio Kuratomi
Hey guys, I'm talking especially to the web app developers here, but I
also hope that other people will chime in with useful thoughts.
As part of python-fedora, I've started documenting some standards for
Fedora Services. These will go hand in hand with documents on how to
use BaseClient that I am in the process of writing. These documents
will describe how Fedora Services should be written rather than how they
are currently implemented. As such, I want to make sure everyone:
1) has a chance to review the document and make comments/changes before
the changes are integrated into BaseClient and people have to port to
things that are backwards incompatible.
2) can bring up other areas that need to be addressed.
For instance, skvidal recently wrote a short client for the pkgdb that
can generate email aliases for packages (so package@packages.fp.o could
send mail to all the committers to a package). He had the following
comments to make:
1) It would be nice to have an introspection API to look at what methods
are available on a TG app and how to use them.
2) JSON data that is deeply nested sucks because objects are turned into
dictionaries so you have to write things like:
package['listings']['F-8']['people']['toshio']['acls']['commit']['status']
to get the status of a person's acls on a package.
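To make the second point concrete, here is a minimal Python sketch of a helper that walks nested, JSON-decoded dictionaries one key at a time and returns a default instead of raising KeyError when a level is missing. The sample data, the "Approved" status value and the helper names `get_path` and `committers` are hypothetical, chosen only to mirror the key path quoted above; they are not part of python-fedora or the pkgdb API.

```python
from functools import reduce

def get_path(data, *keys, default=None):
    """Follow `keys` into nested dicts/lists; return `default` if any step is missing."""
    try:
        return reduce(lambda node, key: node[key], keys, data)
    except (KeyError, IndexError, TypeError):
        return default

# Hypothetical sample shaped like the pkgdb JSON described above.
package = {
    "listings": {
        "F-8": {
            "people": {
                "toshio": {"acls": {"commit": {"status": "Approved"}}},
                "skvidal": {"acls": {"watchbugzilla": {"status": "Approved"}}},
            }
        }
    }
}

def committers(package, branch):
    """Sorted names of people whose commit ACL on `branch` has status 'Approved'."""
    people = get_path(package, "listings", branch, "people", default={})
    return sorted(
        name for name in people
        if get_path(people, name, "acls", "commit", "status") == "Approved"
    )

print(get_path(package, "listings", "F-8", "people", "toshio",
               "acls", "commit", "status"))          # Approved
print(committers(package, "F-8"))                    # ['toshio']
print(get_path(package, "listings", "F-9", default={}))  # {}
```

This only shortens the lookups and softens missing keys; the caller still has to know the full key path.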
I'd like to get other people's input on how to work on these and any
other issues before completing the next python-fedora so I can be sure
that I've got most of the incompatibilities in. Talking here and on IRC
is welcome.
Here's the current version of the Fedora Services Documentation. What's
there should be complete. Addressing skvidal's concerns hasn't even
been stubbed out yet::
http://tinyurl.com/52xmjw
The BaseClient Documentation is still in heavy flux. It's here::
http://tinyurl.com/4ww6v7
-Toshio
Change Request: setup restart-memhogs on fas1/2
by Toshio Kuratomi
We have a script to restart TG apps on the app servers when their memory
exceeds a customizable limit. We have this run on the app servers on
alternate hours so that we don't take down all the load balanced copies
of an app at the same time. This has proved useful for a few of our
apps which have a tendency to grow until the box starts swapping. I'd
like to set this up for fas1 and fas2 before we get to the hard change
freeze.
As an initial limit, I want to start with 1 GB of RSS before an app
restarts. The present usage (in KB) is:

                                        fas1     fas2
Memory on the box:                   4096000  2048000
Memory usage after 1 day 21 hours:    728820  1005260
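The restart-memhogs script itself isn't shown in this thread, so the following is only a hypothetical Python sketch of the threshold check it implies. It assumes RSS is read from Linux's /proc/<pid>/status (the VmRSS line, in kB) and that "1 GB" means 1024 x 1024 KB; finding the TG app's PID, scheduling on alternate hours and actually restarting the app are left out.

```python
import os

# Initial limit from the proposal above, assuming 1 GB = 1024 * 1024 KB.
RSS_LIMIT_KB = 1024 * 1024


def parse_vmrss_kb(status_text):
    """Return VmRSS (in kB) from the contents of /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # e.g. "VmRSS:\t  728820 kB" -> ["VmRSS:", "728820", "kB"]
            return int(line.split()[1])
    return None  # e.g. kernel threads have no VmRSS line


def rss_of_pid(pid):
    """Read the resident set size of a running process on Linux."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_vmrss_kb(status_file.read())


def needs_restart(rss_kb, limit_kb=RSS_LIMIT_KB):
    """True when a process's RSS is known and above the limit."""
    return rss_kb is not None and rss_kb > limit_kb


if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    # Harmless demo: check this Python process itself.
    rss = rss_of_pid(os.getpid())
    print(f"RSS {rss} kB, restart needed: {needs_restart(rss)}")
```

Keeping the parsing separate from the /proc read makes the threshold logic easy to test with the figures from the table.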
-Toshio
Re: How much downtime do we afford for nagios?
by Nigel Jones
> Hi,
Hi,
>
> For a few days, false notifications from Nagios decreased, but they have
> increased again.
You sure?
>
> Looking at /configs/system/nagios/services/template.cfg reveals
> that it is configured with
> max_check_attempts = 4 and retry_check_interval = 1 for hosts
> and
> max_check_attempts = 3 and retry_check_interval = 1 for services.
>
> So if a service or host is unreachable for 3 or 4 minutes, we get a
> notification. (However, in most cases it is a false positive, due to
> congestion or other causes.)
Looking through my email, from what I can recall there are no false
positives. xen6 had to be power-cycled which caused all the other
collateral notifications.
Just to put it into perspective...
1st notification: 0212UTC - Accounts down on .120-phx
...
5th notification: 0216UTC - UNKNOWN status on xen6 (NRPE: Unable to read
output)
...
11/12th notifications: 0228UTC - Host Down - xen6/db2
& Starting 0233UTC - Host/service UP/Okay notifications
According to my IRC logs xen6 went a bit haywire and had to be rebooted,
so TBH I don't see what is false here.
Yes, congestion can cause some problems, but isn't that also a sign that
stuff may need to be balanced better or given more processing/networking
capacity?
The current window is long enough not to alert on every single VPN blip,
but still short enough to give an early idea of problems.
>
> How about working out a delay we can afford when a
> service or host is really down? How about 10 minutes (5 attempts x 2
> minutes)?
IMO this is too long. Also, it doesn't take long for someone to SSH in
and have a quick look. I don't speak for everyone, but I don't mind
spending 2-5 minutes to check.
>
> We could also list which services/hosts are critical and which are not.
> That would help define different notification periods for
> different hosts/services.
>
> I thought I would do it after the freeze, but it's becoming too annoying.
Personally, I don't think anything should be done at the moment.
- Nigel
introduce myself
by Luca Foppiano
Hi all,
I'm Luca Foppiano, I come from Italy (near Milan).
I've been on this mailing list for two months, but I don't know yet which
area of fedora-infrastructure I prefer.
I know Python and web programming (but I prefer Python :P), and I have some
experience with Apache, lighttpd, Fedora Directory Server and Xen
infrastructure.
I'm here because I want to help the Fedora Project, but also to improve my
skills and experience.
My IRC nick is lfoppiano (or sometimes whitenoise).
Best regards
Luca
--
Today is Sweetmorn, the 43rd day of Discord in the YOLD 3174
Re: How much downtime do we afford for nagios?
by Nigel Jones
>> > So if a service or host is unreachable for 3 or 4 mins, we get a
>> > notification. (However most of the cases it is false positive, due to
>> > congestion or others).
>> Looking through my email, from what I can recall there are no false
>> positives. xen6 had to be power-cycled which caused all the other
>> collateral notifications.
>
>
> How long was it down? Why should a normal reboot send 23 mails?
> A reboot is not an exceptional thing, is it?
> An alert should be sent only when it's absolutely necessary...
> It should report only when xen6 comes up but a service does not come up.
> What do you think?
> Thanks.
Remembering that unresponsive and down are different things, it looks like
it went unresponsive at ~0210 UTC (2-3 minutes before the first email). I
*think* this might have just been the domUs at that point; from IRC logs it
looks like the dom0 was rebooted sometime around 0228 (possibly
beforehand, I do not know).
It's one email per checked item, for both down and up, and I guess in
perspective it was quite big...
IMO these reports are 'absolutely necessary', and I personally like to
check them every now and then, especially after an outage like this, to see
if everything is back up (the service/host overview in the Nagios web
interface is handy for this).
- Nigel
How much downtime do we afford for nagios?
by susmit shannigrahi
Hi,
For a few days, false notifications from Nagios decreased, but they have increased again.
Looking at /configs/system/nagios/services/template.cfg reveals
that it is configured with
max_check_attempts = 4 and retry_check_interval = 1 for hosts
and
max_check_attempts = 3 and retry_check_interval = 1 for services.
So if a service or host is unreachable for 3 or 4 minutes, we get a
notification. (However, in most cases it is a false positive, due to
congestion or other causes.)
How about working out a delay we can afford when a
service or host is really down? How about 10 minutes (5 attempts x 2
minutes)?
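As a rough way to compare the current settings with the proposal, the sketch below estimates how long after the first failed check a host or service reaches a HARD state, which is normally when Nagios sends the notification. It assumes the standard Nagios soft/hard state behaviour (the state becomes HARD on the `max_check_attempts`-th consecutive failed check, with retries every `retry_check_interval` minutes) and ignores notification delays and escalations; the function name is hypothetical.

```python
def minutes_to_hard_state(max_check_attempts, retry_interval_min, check_interval_min=0):
    """Rough minutes from the start of an outage until Nagios enters a HARD state.

    Assumes standard Nagios soft/hard semantics: the first failed check gives
    SOFT state attempt 1, Nagios retries every `retry_interval_min` minutes, and
    the state turns HARD on attempt `max_check_attempts`. `check_interval_min`
    is an optional allowance for the wait before the first failed check runs.
    """
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be at least 1")
    return check_interval_min + (max_check_attempts - 1) * retry_interval_min


settings = {
    "hosts (current)": (4, 1),     # max_check_attempts = 4, retry_check_interval = 1
    "services (current)": (3, 1),  # max_check_attempts = 3, retry_check_interval = 1
    "proposal": (5, 2),            # 5 attempts x 2 minutes
}

for label, (attempts, retry) in settings.items():
    minutes = minutes_to_hard_state(attempts, retry)
    print(f"{label}: ~{minutes} min after the first failed check")
# hosts (current): ~3 min after the first failed check
# services (current): ~2 min after the first failed check
# proposal: ~8 min after the first failed check
```

Under these assumptions the total delay also depends on how soon the first failed check runs, which is why `check_interval_min` is a separate, optional term.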
We could also list which services/hosts are critical and which are not.
That would help define different notification periods for
different hosts/services.
I thought I would do it after the freeze, but it's becoming too annoying.
Thanks
--
Regards,
Susmit.
=============================================
ssh
0x86DD170A
http://www.fedoraproject.org/wiki/SusmitShannigrahi
=============================================
Re: ** ACKNOWLEDGEMENT alert - xenbuilder1.fedora.phx.redhat.com/Swap is CRITICAL **
by Nigel Jones
I acknowledged this in Nagios in the hope that I'd find someone to
kill the Eclipse build and fix this all up before I went to
sleep; alas, no one has been around.
What has happened is that Java has thrown an exception but hasn't exited
properly, so it needs killing...
So if someone can kill build ID 583255 (and resubmit it), I'd imagine that
would solve the problem.
Dunka
Nigel's Bed :)
> ***** Nagios *****
>
> Notification Type: ACKNOWLEDGEMENT
>
> Service: Swap
> Host: xenbuilder1.fedora.phx.redhat.com
> Address: xenbuilder1.fedora.phx.redhat.com
> State: CRITICAL
>
> Date/Time: Sat Apr 26 11:55:19 UTC 2008
>
> Additional Info:
>
> SWAP CRITICAL - 10% free (293 MB out of 3071 MB)
>
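The 10% figure in the alert can be checked against the raw numbers. This sketch assumes the plugin simply divides free swap by total swap and rounds to a whole percent; the actual plugin and its warning/critical thresholds aren't shown in the alert.

```python
def percent_free(free_mb, total_mb):
    """Percentage of swap space that is still free."""
    return 100.0 * free_mb / total_mb


pct = percent_free(293, 3071)  # values from the alert above
print(f"{pct:.2f}% free, which rounds to {pct:.0f}%")  # 9.54% free, which rounds to 10%
```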
Introduction - Aioanei Rares
by Rares Aioanei
Hi, my name is Aioanei Rares and I'm from Romania. I'm 29, I work as a shop
manager at one of the biggest computer hardware retailers
around here (http://www.depozituldecalculatoare.ro/), and I like programming
in my (scarcely available) spare time. My knowledge
consists of some C, C# (Mono), Python and databases (mainly Postgres), and of
course I always want to learn more. I would like to help
the Fedora Project in any way I can, maybe also with translations (I'm an
English major), so... drop me a line for more info.
Have a great weekend.