Re: changing a few things in our host mgmt tools

Wednesday, 23 March 2011

On Fri, 2011-03-18 at 11:04 -0400, seth vidal wrote:
...
 Hi folks,
  some thoughts have been slowly coalescing in my head about how we're
 managing our boxes/services and I have some suggestions I've passed by
 various folks but I wanted to check them out with everyone:

 1. puppetd sucks..... memory. Right now we have puppetd running on every
 box and it wakes up every half hour and runs itself. This is fine but in
 the time where it is not doing anything it just eats memory for no good
 reason. I'd like to suggest we move to a cron-driven model instead of
 puppetd. I'd write a simple cron job that runs every half hour to run
 puppetd, if a lock file is not found. Pretty straightforward, of
 course.  
This is done.
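
For anyone who wants the shape of it, here is a minimal sketch of a cron-driven wrapper of the kind described above. It is not the job that was actually deployed. The lock file path, the function name run_with_lock and the exact puppetd flags are assumptions for illustration; adjust them to whatever the boxes really use.

```python
import os
import subprocess
import sys

# Hypothetical lock path and command line; both are assumptions, not
# taken from the deployed cron job.
LOCK_PATH = "/var/lock/puppet-cron.lock"
PUPPET_CMD = ["puppetd", "--onetime", "--no-daemonize"]


def run_with_lock(cmd, lock_path):
    """Run cmd unless lock_path already exists.

    Returns the command's exit code, or None if the run was skipped
    because the lock file was found.
    """
    try:
        # O_EXCL makes creation fail if the file is already there, so the
        # existence check and the creation are a single atomic step.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return None
    try:
        with os.fdopen(fd, "w") as lock:
            lock.write(str(os.getpid()))
        proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT)
        if proc.returncode != 0:
            # cron mails whatever the job prints, so only print on failure.
            sys.stderr.write(proc.stdout.decode(errors="replace"))
        return proc.returncode
    finally:
        os.unlink(lock_path)


def main():
    rc = run_with_lock(PUPPET_CMD, LOCK_PATH)
    return 0 if rc is None else rc
```

A crontab entry running every half hour would call main() and exit with its return value. Note that this sketch does not clean up a stale lock left behind by a killed run; a real job would want to check the pid stored in the lock file.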

...

 2. monitoring if puppetd has run properly:
    two things we want to know about puppet runs:
    a. when they last happened per-box
    b. if they fell over in a horrible way.

     (a) can be known by looking at the $nodename.yaml file which lives
 on the puppetmaster. I've written a script to check if that file is
 older than 1 hour and report the nodename if it is.
     (b) can be done via the cron job - ie: taking error output from the
 puppet run and mailing to people until we fix it! :) 

I've written this and it can now submit issues via nsca (via func). One
problem: it appears our puppet node names do not match our nagios host
names, A LOT. So we'll need to get some aliases in place so they work.
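
To make the moving parts concrete, here is a rough sketch of the stale-run check plus the alias lookup. It is an illustration under assumptions, not the script mentioned above: the function names, the alias table and the service name "puppet_run" are all hypothetical, and the directory holding the $nodename.yaml files has to be passed in. The only format detail relied on is that send_nsca reads tab-separated host, service, return code and output, one result per line.

```python
import os
import time


def stale_nodes(yaml_dir, max_age_seconds=3600, now=None):
    """Return node names whose $nodename.yaml is older than max_age_seconds."""
    now = time.time() if now is None else now
    stale = []
    for name in sorted(os.listdir(yaml_dir)):
        if not name.endswith(".yaml"):
            continue
        path = os.path.join(yaml_dir, name)
        if now - os.path.getmtime(path) > max_age_seconds:
            stale.append(name[:-len(".yaml")])
    return stale


def nagios_host(node, aliases):
    """Map a puppet node name to its nagios host name.

    aliases is a hypothetical dict of puppet name -> nagios name; nodes
    without an entry are passed through unchanged.
    """
    return aliases.get(node, node)


def nsca_line(host, service, return_code, output):
    """Format one passive check result in the layout send_nsca reads on stdin."""
    return "%s\t%s\t%d\t%s\n" % (host, service, return_code, output)


def report(yaml_dir, aliases, service="puppet_run"):
    """Build the send_nsca input for every stale node (2 = CRITICAL)."""
    return "".join(
        nsca_line(nagios_host(node, aliases), service, 2,
                  "no puppet run in the last hour")
        for node in stale_nodes(yaml_dir)
    )
```

The string returned by report() would be piped to send_nsca. Keeping the alias table as a plain mapping means the mismatched names can be fixed one at a time without touching the check itself.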

-sv
