<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
  <title>god - process and task monitoring done right</title>
  <link href="styles.css" rel="stylesheet" type="text/css" />
  <style type="text/css" media="screen">
    * {
      margin: 0;
      font-size: 100%;
    }

    body {
      font: normal .8em/1.5em "Trebuchet MS", Verdana, Arial, Helvetica, sans-serif;
      color: #484848;
      background: #E6EAE9 url(images/bg_grey.gif);
    }

    a {
      color: #c75f3e;
      text-decoration: none;
    }

    a:hover,
    a:active {
      text-decoration: underline;
    }

    #mothership {
      width: 307px;
      height: 117px;
      margin: 0 auto;
      background: url(images/god_logo1.gif);
    }

    #content {
      width: 700px;
      margin: 3px auto;
      background: white;
      border: 1px solid #444;
      padding: 0 24px;
      background: #f8f8ff;
      overflow: hidden;
    }

    .banner {
      margin-top: 24px;
      border: 1px solid #ddd;
      width: 698px;
      height: 150px;
      background: url(images/banner.jpg);
    }

    #menu {
      margin-top: 5px;
    }

    #menu div.dots {
      background: url(images/red_dot.gif) repeat;
      height: 5px;
      width: 700px;
      font-size: 0;
    }

    #menu ul {
      font-family: "Trebuchet MS", Verdana, Arial, Helvetica, sans-serif;
      font-weight: bold;
      text-transform: uppercase;
      color: #4D4D4D;
      font-size: 12px;
      padding: 0;
      margin: 0;
      margin-top: 0 !important;
      margin-top: -2px;
    }

    #menu li {
      display: inline;
      margin: 0 30px 0 0;
    }

    #menu a:link,
    #menu a:visited {
      color: #4D4D4D;
      text-decoration: none;
    }

    #menu a:hover,
    #menu a:active {
      color: black;
      text-decoration: none;
    }

    #page_home #menu li.menu_home a {
      color: #A70000;
    }

    .columnleft {
      float: left;
      width: 325px;
      margin-bottom: 20px;
    }

    .columnleft p {
      text-align: justify;
    }

    .columnright {
      float: right;
      width: 325px;
      margin-bottom: 20px;
    }
    h1 {
      font: bold 1.5em "Trebuchet MS", Verdana, Arial, Helvetica, sans-serif;
      color: #f36e21;
      text-transform: uppercase;
      margin: 1.5em 0 .5em 0;
      clear: both;
    }

    p {
      margin-bottom: 1em;
    }
    ul.features {
      padding: 0;
      margin-left: 1.5em !important;
      margin-left: 1.3em;
    }

    ul.features li {
      list-style-position: outside;
      list-style-type: circle;
      list-style-image: url(images/bullet.jpg);
      line-height: 1.4em;
    }

    #footer {
      text-align: center;
      color: white;
      margin-bottom: 50px;
    }

    pre {
      line-height: 1.3;
      border: 1px solid #ccc;
      padding: 1em;
      background-color: #efefef;
      margin: 1em 0;
    }

    code {
      font-size: 1.2em;
    }

    .ruby .keywords {
      color: blue;
    }

    .ruby .comment {
      color: green;
    }

    .ruby .string {
      color: teal;
    }

    .ruby .keywords {
      color: navy;
    }

    .ruby .brackets {
      color: navy;
    }
  </style>
  <script type="text/javascript" src="javascripts/code_highlighter.js"></script>
  <script type="text/javascript" src="javascripts/ruby.js"></script>
</head>

<body id="page_home">

<div id="mothership">
</div>
<div id="content">
  <div class="banner">
  </div>

  <!-- <div id="menu">
    <div class="dots"></div>
    <ul>
      <li class="menu_home"><a href="/">Home</a></li>
      <li class="menu_contact"><a href="mailto:tom@projectmothership.com">Contact</a></li>
    </ul>
    <div class="dots"></div>
  </div> -->
  <div class="columnleft">
    <h1>A Better Way to Monitor</h1>
    <p>God is an easy-to-configure, easy-to-extend monitoring framework written in Ruby.</p>
    <p>Keeping your server processes and tasks running should be a simple part of your deployment process. God aims to be the simplest, most powerful monitoring application available.</p>
    <p style="text-align: right">Tom Preston-Werner<br />tom at rubyisawesome dot com</p>
  </div>

  <div class="columnright">
    <h1>Features</h1>
    <ul class="features">
      <li>Config file is written in Ruby</li>
      <li>Easily write your own custom conditions in Ruby</li>
      <li>Supports both poll-based and event-based conditions</li>
      <li>Different poll conditions can have different intervals</li>
    </ul>
  </div>
  <h1>Installation (v 0.2.0)</h1>
  <p>The best way to get god is via rubygems:</p>
  <pre>$ sudo gem install god</pre>
  <p>You can also peruse or clone the code from <a href="http://repo.or.cz/w/god.git">http://repo.or.cz/w/god.git</a>.</p>

  <h1>Requirements</h1>

  <p>God currently only works on <b>Linux (kernel 2.6.15+), BSD,</b> and <b>Darwin</b> systems. No support for Windows is planned.</p>

  <p>The following systems have been tested. Help us test it on others!</p>

  <ul>
    <li>Darwin 10.4.10</li>
    <li>RedHat Fedora Core</li>
    <li>Ubuntu Feisty</li>
  </ul>
  <h1>Finally, a Config File that Makes Sense</h1>
  <p>The easiest way to understand how god will make your life better is by looking at a sample config file. The following configuration file is what I use at <a href="http://site.gravatar.com/">gravatar.com</a> to keep the mongrels running:</p>

<pre><code class="ruby"># file: gravatar.god
# run with: god start -c /path/to/gravatar.god

# This is the actual config file used to keep the mongrels of
# gravatar.com running.

RAILS_ROOT = "/var/www/gravatar2/current"

God.meddle do |god|
  %w{8200 8201 8202}.each do |port|
    god.watch do |w|
      w.name = "gravatar2-mongrel-#{port}"
      w.interval = 30 # default poll interval, in seconds
      w.start = "mongrel_rails cluster::start --only #{port} \
        -C #{RAILS_ROOT}/config/mongrel_cluster.yml"
      w.stop = "mongrel_rails cluster::stop --only #{port} \
        -C #{RAILS_ROOT}/config/mongrel_cluster.yml"
      w.grace = 10 # seconds

      pid_file = File.join(RAILS_ROOT, "log/mongrel.#{port}.pid")

      w.behavior(:clean_pid_file) do |b|
        b.pid_file = pid_file
      end

      w.start_if do |start|
        start.condition(:process_running) do |c|
          c.interval = 5 # seconds
          c.running = false
          c.pid_file = pid_file
        end
      end

      w.restart_if do |restart|
        restart.condition(:memory_usage) do |c|
          c.pid_file = pid_file
          c.above = (150 * 1024) # 150 MB, expressed in kilobytes
          c.times = [3, 5] # 3 out of 5 intervals
        end

        restart.condition(:cpu_usage) do |c|
          c.pid_file = pid_file
          c.above = 50 # percent
          c.times = 5
        end
      end
    end
  end
end</code></pre>
  <p>That's a lot to take in at once, so I'll break it down by section and explain what's going on in each.</p>

<pre><code class="ruby">RAILS_ROOT = "/var/www/gravatar2/current"</code></pre>

  <p>Here I've set a constant that is used throughout the file. Keeping the <code>RAILS_ROOT</code> value in a constant makes it easy to adapt this script to other applications. Because the config file is Ruby code, I can set whatever variables or constants I want that make the configuration more concise and easier to work with.</p>

<pre><code class="ruby">God.meddle do |god|
  # ...
end</code></pre>

  <p>The meat of the config file is defined inside a <code>God.meddle</code> block.</p>

<pre><code class="ruby">  %w{8200 8201 8202}.each do |port|
    # ...
  end</code></pre>

  <p>Because the config file is written in actual Ruby code, we can construct loops and do other intelligent things that are impossible in your everyday, run-of-the-mill config file. I need to watch three mongrels, so I simply loop over their port numbers, eliminating duplication and making my life a whole lot easier.</p>
<pre><code class="ruby">    god.watch do |w|
      w.name = "gravatar2-mongrel-#{port}"
      w.interval = 30 # default poll interval, in seconds
      w.start = "mongrel_rails cluster::start --only #{port} \
        -C #{RAILS_ROOT}/config/mongrel_cluster.yml"
      w.stop = "mongrel_rails cluster::stop --only #{port} \
        -C #{RAILS_ROOT}/config/mongrel_cluster.yml"
      w.grace = 10 # seconds

      # ...
    end</code></pre>

  <p>A <code>watch</code> represents a single process or task that has concrete start, stop, and/or restart operations. You can define as many watches as you like inside the <code>God.meddle</code> block. In the example above, I've got a Rails instance running in a Mongrel that I need to keep alive. Every watch must have a unique <code>name</code> so that it can be identified later on. The <code>interval</code> option sets the default poll interval (this can be overridden in each condition). The <code>start</code> and <code>stop</code> attributes specify the commands to start and stop the process. If no <code>restart</code> attribute is set, a restart is performed as a call to stop followed by a call to start. The optional <code>grace</code> attribute sets the amount of time to wait after a start/stop/restart command before resuming normal monitoring operations.</p>
<pre><code class="ruby">      pid_file = File.join(RAILS_ROOT, "log/mongrel.#{port}.pid")</code></pre>

  <p>A variable to hold the location of the PID file.</p>

<pre><code class="ruby">      w.behavior(:clean_pid_file) do |b|
        b.pid_file = pid_file
      end</code></pre>

  <p>Behaviors allow you to execute additional commands around start/stop/restart commands. In our case, if the process dies it will leave a PID file behind. The next time a start command is issued, it will fail, complaining about the leftover PID file. We'd like the PID file cleaned up before a start command is issued. The built-in behavior <code>clean_pid_file</code> will do just that. All we have to do is specify the location of the PID file.</p>
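  <p>To make the idea concrete, here is a minimal sketch of what a PID file cleanup step amounts to. The <code>clean_pid_file</code> helper below is a hypothetical stand-in written for illustration, not god's actual implementation, and the file name is made up for the demo.</p>

```ruby
require "tmpdir"

# Hypothetical helper (not god's code): remove a leftover PID file so the
# next start command does not fail on it. Returns true if a file was removed.
def clean_pid_file(pid_file)
  return false unless File.exist?(pid_file)

  File.delete(pid_file)
  true
end

# Demo against a throwaway directory.
Dir.mktmpdir do |dir|
  pid_file = File.join(dir, "mongrel.8200.pid")
  File.write(pid_file, "12345\n")   # simulate a PID file left by a dead process
  clean_pid_file(pid_file)          # removes the stale file
  clean_pid_file(pid_file)          # nothing left to remove, returns false
end
```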
<pre><code class="ruby">      w.start_if do |start|
        start.condition(:process_running) do |c|
          c.interval = 5 # seconds
          c.running = false
          c.pid_file = pid_file
        end
      end</code></pre>

  <p>Watches contain conditions grouped by the action to execute should they return <code>true</code>. I start with a <code>start_if</code> block that contains a single condition. Conditions are specified by calling <code>condition</code> with an identifier, in this case <code>:process_running</code>. Setting <code>running</code> to <code>false</code> makes the condition return <code>true</code> when the process is <em>not</em> running. Each condition can specify a poll interval that will override the default watch interval. In this case, I want to check that the process is still running every 5 seconds instead of the 30 second interval that other conditions will inherit. The ability to set condition-specific poll intervals makes it possible to run costly tests less often than cheap tests.</p>
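  <p>The override rule can be sketched in a few lines. The <code>Watch</code> and <code>Condition</code> structs and the <code>effective_interval</code> helper are assumptions made up for this illustration; god's own classes are not shown here.</p>

```ruby
# Hypothetical stand-ins for illustration only; god's real classes differ.
Watch = Struct.new(:name, :interval)
Condition = Struct.new(:kind, :interval)

# A condition polls at its own interval when one is set, otherwise it
# inherits the default interval of the watch it belongs to.
def effective_interval(watch, condition)
  condition.interval || watch.interval
end

watch = Watch.new("gravatar2-mongrel-8200", 30)
cheap = Condition.new(:process_running, 5)   # overrides the watch default
costly = Condition.new(:memory_usage)        # no interval set, inherits 30

puts effective_interval(watch, cheap)
puts effective_interval(watch, costly)
```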
<pre><code class="ruby">      w.restart_if do |restart|
        restart.condition(:memory_usage) do |c|
          c.pid_file = pid_file
          c.above = (150 * 1024) # 150 MB, expressed in kilobytes
          c.times = [3, 5] # 3 out of 5 intervals
        end

        # ...
      end</code></pre>

  <p>Similar to <code>start_if</code> there is a <code>restart_if</code> command that groups conditions that should trigger a restart. The <code>memory_usage</code> condition will fail if the specified process is using too much memory. Once again, the <code>pid_file</code> must be set. The maximum allowable amount of memory is specified with the <code>above</code> attribute in units of kilobytes. The number of times the test needs to fail in order to trigger a restart is set with <code>times</code>. This can be either an integer or an array. An integer means it must fail that many times in a row, while an array [x, y] means it must fail x times out of the last y tests.</p>
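  <p>The two forms of <code>times</code> can be modelled with a small sliding window. This is a sketch based only on the description above; the <code>FailureWindow</code> class is hypothetical and says nothing about how god tracks results internally.</p>

```ruby
# Hypothetical tracker for the `times` rule described above.
# An integer n is treated as [n, n]: n failures out of the last n tests,
# which is the same as n failures in a row.
class FailureWindow
  def initialize(times)
    @needed, @window = times.is_a?(Array) ? times : [times, times]
    @history = []
  end

  # Record one test result (true means the check failed) and report
  # whether enough failures have accumulated to trigger the action.
  def record(failed)
    @history << failed
    @history.shift if @history.size > @window
    @history.count(true) >= @needed
  end
end

window = FailureWindow.new([3, 5])
[true, false, true, false, true].each do |failed|
  puts window.record(failed)
end
```

  <p>With <code>[3, 5]</code> the threshold is reached on the fifth result, because three of the last five tests failed even though they were not consecutive.</p>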
<pre><code class="ruby">      w.restart_if do |restart|
        # ...

        restart.condition(:cpu_usage) do |c|
          c.pid_file = pid_file
          c.above = 50 # percent
          c.times = 5
        end
      end</code></pre>

  <p>To keep an eye on CPU usage, I've employed the <code>cpu_usage</code> condition. When CPU usage for a Mongrel process is over 50% for 5 consecutive intervals, it will be restarted.</p>
  <h1>Starting God</h1>

  <p>To start the god monitoring process, simply run the <code>god</code> executable, passing in the path to the config file:</p>

  <pre>$ god start -c /path/to/config.god</pre>

  <p>While you're writing your config file, it can be helpful to run god in the foreground so you can see the log messages. You can do that with:</p>

  <pre>$ god run -c /path/to/config.god</pre>
  <h1>Advanced Configuration with Transitions and Events</h1>

  <p>So far you've been introduced to a simple poll-based config file and seen how to run it. Poll-based monitoring works great for simple things, but falls short for highly critical tasks. God has native support for kqueue/netlink events on BSD/Darwin/Linux systems. For instance, instead of using the <code>process_running</code> condition to poll for the status of your process, you can use the <code>process_exits</code> condition that will be notified <b>immediately</b> upon the exit of your process. This means less load on your system and shorter downtime after a crash.</p>

  <p>While the configuration syntax you saw in the previous example is very simple, it lacks the power that we need to deal with event-based monitoring. In fact, the <code>start_if</code> and <code>restart_if</code> methods are really just calling out to a lower-level API. If we use the low-level API directly, we can harness the full power of god's event-based lifecycle system. Let's look at another example config file.</p>
<pre><code class="ruby">RAILS_ROOT = "/Users/tom/dev/git/helloworld"

God.meddle do |god|
  god.watch do |w|
    w.name = "local-3000"
    w.interval = 5 # seconds
    w.start = "mongrel_rails start -P ./log/mongrel.pid -c #{RAILS_ROOT}"
    w.stop = "mongrel_rails stop -P ./log/mongrel.pid -c #{RAILS_ROOT}"

    pid_file = File.join(RAILS_ROOT, "log/mongrel.pid")

    # clean pid files before start if necessary
    w.behavior(:clean_pid_file) do |b|
      b.pid_file = pid_file
    end

    # determine the state on startup
    w.transition(:init, { true => :up, false => :start }) do |on|
      on.condition(:process_running) do |c|
        c.running = true
        c.pid_file = pid_file
      end
    end

    # determine when process has finished starting
    w.transition([:start, :restart], :up) do |on|
      on.condition(:process_running) do |c|
        c.running = true
        c.pid_file = pid_file
      end
    end

    # start if process is not running
    w.transition(:up, :start) do |on|
      on.condition(:process_exits) do |c|
        c.pid_file = pid_file
      end
    end

    # restart if memory or cpu is too high
    w.transition(:up, :restart) do |on|
      on.condition(:memory_usage) do |c|
        c.interval = 20
        c.pid_file = pid_file
        c.above = (50 * 1024) # 50 MB, expressed in kilobytes
        c.times = [3, 5]
      end

      on.condition(:cpu_usage) do |c|
        c.interval = 10
        c.pid_file = pid_file
        c.above = 10 # percent
        c.times = [3, 5]
      end
    end
  end
end
</code></pre>
  <p>A bit longer, I know, but very straightforward once you understand how the <code>transition</code> calls work. The <code>name</code>, <code>interval</code>, <code>start</code>, and <code>stop</code> attributes should be familiar. We again define a <code>pid_file</code> variable and specify the <code>clean_pid_file</code> behavior.</p>

  <p>Before jumping into the code, it's important to understand the different states that a Watch can have, and how that state changes over time. At any given time, a Watch will be in one of the <code>init</code>, <code>up</code>, <code>start</code>, or <code>restart</code> states. As different conditions are satisfied, the Watch will progress from state to state, enabling and disabling conditions along the way.</p>

  <p>When god first starts, each Watch is placed in the <code>init</code> state.</p>

  <p>You'll use the <code>transition</code> method to tell god how to transition between states. It takes two arguments. The first argument may be either a symbol or an array of symbols representing the state or states during which the specified conditions should be enabled. The second argument may be either a symbol or a hash. If it is a symbol, then that is the state that will be transitioned to if any of the conditions return <code>true</code>. If it is a hash, then that hash must have both <code>true</code> and <code>false</code> keys, each of which points to a symbol that represents the state to transition to given the corresponding return from the single condition that must be specified.</p>
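  <p>As a rough illustration of how the second argument is interpreted, the sketch below resolves a destination from a condition's return value. The <code>next_state</code> helper is hypothetical and derived only from the description above; it is not part of god's API.</p>

```ruby
# Hypothetical resolver, not god's internals: given the second argument to
# `transition` and a condition's return value, pick the state to move to.
# Returns nil when no transition should happen.
def next_state(destination, result)
  case destination
  when Symbol
    # A symbol destination is only taken when the condition returns true.
    result ? destination : nil
  when Hash
    unless destination.key?(true) && destination.key?(false)
      raise ArgumentError, "hash destination needs both true and false keys"
    end
    destination[result ? true : false]
  else
    raise ArgumentError, "destination must be a Symbol or a Hash"
  end
end

puts next_state(:up, true).inspect
puts next_state(:up, false).inspect
puts next_state({ true => :up, false => :start }, false).inspect
```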
<pre><code class="ruby">    # determine the state on startup
    w.transition(:init, { true => :up, false => :start }) do |on|
      on.condition(:process_running) do |c|
        c.running = true
        c.pid_file = pid_file
      end
    end</code></pre>

  <p>The first transition block tells god what to do when the Watch is in the <code>init</code> state (first argument). This is where I tell god how to determine if my task is already running. Since I'm monitoring a process, I can use the <code>process_running</code> condition to determine whether the process is running. If the process is running, it will return true, otherwise it will return false. Since I sent a hash as the second argument to <code>transition</code>, the return from <code>process_running</code> will determine which of the two states will be transitioned to. If the process is running, the return is true and god will put the Watch into the <code>up</code> state. If the process is not running, the return is false and god will put the Watch into the <code>start</code> state.</p>
<pre><code class="ruby">    # determine when process has finished starting
    w.transition([:start, :restart], :up) do |on|
      on.condition(:process_running) do |c|
        c.running = true
        c.pid_file = pid_file
      end
    end</code></pre>

  <p>If god has determined that my process isn't running, the Watch will be put into the <code>start</code> state. Upon entering this state, the <code>start</code> command that I specified on the Watch will be called. In addition, the above transition specifies a condition that should be enabled when in either the <code>start</code> or <code>restart</code> states. The condition is another <code>process_running</code>; however, this time I'm only interested in moving to another state once it returns <code>true</code>. A <code>true</code> return from this condition means that the process is running and it's ok to transition to the <code>up</code> state (second argument to <code>transition</code>).</p>
<pre><code class="ruby">    # start if process is not running
    w.transition(:up, :start) do |on|
      on.condition(:process_exits) do |c|
        c.pid_file = pid_file
      end
    end</code></pre>

  <p>This is where the event-based system comes into play. Once in the <code>up</code> state, I want to be notified when my process exits. The <code>process_exits</code> condition registers a callback that will trigger a state transition when it is fired off. Event conditions (like this one) cannot be used in transitions that have a hash for the second argument (as they do not return true or false).</p>
<pre><code class="ruby">    # restart if memory or cpu is too high
    w.transition(:up, :restart) do |on|
      on.condition(:memory_usage) do |c|
        c.interval = 20
        c.pid_file = pid_file
        c.above = (50 * 1024) # 50 MB, expressed in kilobytes
        c.times = [3, 5]
      end

      on.condition(:cpu_usage) do |c|
        c.interval = 10
        c.pid_file = pid_file
        c.above = 10 # percent
        c.times = [3, 5]
      end
    end</code></pre>

  <p>Notice that I can have multiple transitions with the same start state. In this case, I want to have the <code>memory_usage</code> and <code>cpu_usage</code> poll conditions going at the same time that I listen for the process exit event. In the case of runaway CPU or memory usage, however, I want to transition to the <code>restart</code> state. When a Watch enters the <code>restart</code> state it will either call the <code>restart</code> command that you specified, or if none has been set, call the <code>stop</code> and then <code>start</code> commands.</p>
  <h1>Extend God with your own Conditions</h1>

  <p>God was designed from the start to allow you to easily write your own custom conditions, making it simple to add tests that are application specific.</p>

<pre><code class="ruby">module God
  module Conditions

    class ProcessRunning &lt; PollCondition
      attr_accessor :pid_file, :running

      # Check that the required attributes were set in the config file.
      def valid?
        valid = true
        valid &amp;= complain("You must specify the 'pid_file' attribute for :process_running") if self.pid_file.nil?
        valid &amp;= complain("You must specify the 'running' attribute for :process_running") if self.running.nil?
        valid
      end

      # Return true when the process state matches the 'running' attribute.
      def test
        # No PID file means the process is treated as not running.
        return !self.running unless File.exist?(self.pid_file)

        pid = File.read(self.pid_file).strip
        active = System::Process.new(pid).exists?

        (self.running &amp;&amp; active) || (!self.running &amp;&amp; !active)
      end
    end

  end
end</code></pre>
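  <p>Following the same pattern, here is a sketch of what a condition of your own might look like. Everything in it is an assumption made for illustration: <code>FileStale</code>, its <code>path</code> and <code>max_age</code> attributes, and the <code>:file_stale</code> name are hypothetical, and the small <code>PollCondition</code> class is only a stand-in so the example runs without god installed.</p>

```ruby
module God
  # Stand-in for god's base class so this sketch runs on its own.
  # With god loaded you would inherit from the real PollCondition instead.
  class PollCondition
    # Report a configuration problem and return false so `valid &=` fails.
    def complain(message)
      warn message
      false
    end
  end

  module Conditions
    # Hypothetical condition: true when a file is missing or has not been
    # modified within the last max_age seconds.
    class FileStale < PollCondition
      attr_accessor :path, :max_age

      def valid?
        valid = true
        valid &= complain("You must specify the 'path' attribute for :file_stale") if path.nil?
        valid &= complain("You must specify the 'max_age' attribute for :file_stale") if max_age.nil?
        valid
      end

      def test
        return true unless File.exist?(path)

        Time.now - File.mtime(path) > max_age
      end
    end
  end
end

condition = God::Conditions::FileStale.new
condition.path = "/path/to/heartbeat.log"   # placeholder path
condition.max_age = 60                      # seconds
puts condition.valid?
```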
</div>
<div id="footer">
  <p>Brought to you by <a href="http://rubyisawesome.com/">Ruby is Awesome</a></p>
</div>

<script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
</script>
<script type="text/javascript">
_uacct = "UA-2196727-1";
urchinTracker();
</script>

</body>
</html>