1 <!-- $PostgreSQL$ -->
3 <chapter id="source">
4 <title>PostgreSQL Coding Conventions</title>
6 <sect1 id="source-format">
7 <title>Formatting</title>
9 <para>
Source code formatting uses 4-column tab spacing, with
tabs preserved (i.e., tabs are not expanded to spaces).
Each logical indentation level is one additional tab stop.
13 </para>
15 <para>
Layout rules (brace positioning, etc.) follow BSD conventions. In
particular, curly braces for the controlled blocks of <literal>if</>,
<literal>while</>, <literal>switch</>, etc. go on their own lines.
19 </para>
21 <para>
22 Do not use C++ style comments (<literal>//</> comments). Strict ANSI C
23 compilers do not accept them. For the same reason, do not use C++
24 extensions such as declaring new variables mid-block.
25 </para>
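<para>
The layout and comment rules above can be sketched with a small,
hypothetical helper (<function>clamp_to_range</> is not part of
<productname>PostgreSQL</>; it is invented here for illustration).
Braces sit on their own lines, each indentation level is one tab, only
<literal>/* */</> comments appear, and the declaration comes at the top
of the block.
</para>

```c
/*
 * clamp_to_range
 *		Hypothetical helper, used only to illustrate the layout rules.
 */
static int
clamp_to_range(int value, int lo, int hi)
{
	int			result;		/* declared at the top of the block */

	if (value < lo)
	{
		result = lo;
	}
	else if (value > hi)
	{
		result = hi;
	}
	else
	{
		result = value;
	}

	return result;
}
```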
27 <para>
28 The preferred style for multi-line comment blocks is
29 <programlisting>
/*
 * comment text begins here
 * and continues here
 */
34 </programlisting>
35 Note that comment blocks that begin in column 1 will be preserved as-is
36 by <application>pgindent</>, but it will re-flow indented comment blocks
37 as though they were plain text. If you want to preserve the line breaks
38 in an indented block, add dashes like this:
39 <programlisting>
/*----------
 * comment text begins here
 * and continues here
 *----------
 */
45 </programlisting>
46 </para>
48 <para>
49 While submitted patches do not absolutely have to follow these formatting
50 rules, it's a good idea to do so. Your code will get run through
51 <application>pgindent</> before the next release, so there's no point in
52 making it look nice under some other set of formatting conventions.
53 </para>
55 <para>
56 The <filename>src/tools</filename> directory contains sample settings
57 files that can be used with the <productname>emacs</productname>,
58 <productname>xemacs</productname> or <productname>vim</productname>
59 editors to help ensure that they format code according to these
60 conventions.
61 </para>
63 <para>
64 The text browsing tools <application>more</application> and
65 <application>less</application> can be invoked as:
66 <programlisting>
67 more -x4
68 less -x4
69 </programlisting>
70 to make them show tabs appropriately.
71 </para>
72 </sect1>
74 <sect1 id="error-message-reporting">
75 <title>Reporting Errors Within the Server</title>
77 <indexterm>
78 <primary>ereport</primary>
79 </indexterm>
80 <indexterm>
81 <primary>elog</primary>
82 </indexterm>
84 <para>
85 Error, warning, and log messages generated within the server code
86 should be created using <function>ereport</>, or its older cousin
87 <function>elog</>. The use of this function is complex enough to
88 require some explanation.
89 </para>
91 <para>
92 There are two required elements for every message: a severity level
93 (ranging from <literal>DEBUG</> to <literal>PANIC</>) and a primary
94 message text. In addition there are optional elements, the most
95 common of which is an error identifier code that follows the SQL spec's
96 SQLSTATE conventions.
<function>ereport</> itself is just a shell function that exists
98 mainly for the syntactic convenience of making message generation
99 look like a function call in the C source code. The only parameter
100 accepted directly by <function>ereport</> is the severity level.
101 The primary message text and any optional message elements are
102 generated by calling auxiliary functions, such as <function>errmsg</>,
103 within the <function>ereport</> call.
104 </para>
106 <para>
107 A typical call to <function>ereport</> might look like this:
108 <programlisting>
ereport(ERROR,
        (errcode(ERRCODE_DIVISION_BY_ZERO),
         errmsg("division by zero")));
112 </programlisting>
113 This specifies error severity level <literal>ERROR</> (a run-of-the-mill
114 error). The <function>errcode</> call specifies the SQLSTATE error code
115 using a macro defined in <filename>src/include/utils/errcodes.h</>. The
116 <function>errmsg</> call provides the primary message text. Notice the
117 extra set of parentheses surrounding the auxiliary function calls &mdash;
118 these are annoying but syntactically necessary.
119 </para>
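<para>
Why the extra parentheses are needed can be seen in a simplified,
self-contained sketch. Every name beginning with
<literal>sketch_</> or <literal>SKETCH_</> below is a stand-in invented
for this illustration; the server's real definitions differ in detail.
The point is only that the parenthesized group is passed to the macro as
one argument and then pasted directly after a function name, where it
becomes that function's argument list.
</para>

```c
#include <stdarg.h>
#include <stdio.h>

/* Everything named sketch_* or SKETCH_* is a stand-in, not server code. */
#define SKETCH_ERROR				20
#define SKETCH_DIVISION_BY_ZERO		1

static int	sketch_level;
static int	sketch_code;
static char sketch_text[256];

/* Begin a report: reset the collected fields and decide whether to go on. */
static int
sketch_errstart(int elevel)
{
	sketch_level = elevel;
	sketch_code = 0;
	sketch_text[0] = '\0';
	return elevel > 0;
}

/* Auxiliary routine: record the error identifier code. */
static int
sketch_errcode(int code)
{
	sketch_code = code;
	return 0;
}

/* Auxiliary routine: format and record the primary message text. */
static int
sketch_errmsg(const char *fmt, ...)
{
	va_list		ap;

	va_start(ap, fmt);
	vsnprintf(sketch_text, sizeof(sketch_text), fmt, ap);
	va_end(ap);
	return 0;
}

/*
 * Finish a report.  The arguments are ignored; they exist only so that
 * the auxiliary calls are evaluated before this function runs.
 */
static void
sketch_errfinish(int dummy, ...)
{
	(void) dummy;
	fprintf(stderr, "level %d, code %d: %s\n",
			sketch_level, sketch_code, sketch_text);
}

/*
 * "rest" is a complete parenthesized argument list, so a single macro
 * parameter can carry any number of auxiliary calls.
 */
#define sketch_ereport(elevel, rest) \
	(sketch_errstart(elevel) ? (sketch_errfinish rest) : (void) 0)
```

<para>
With these stand-ins, a call is written as
<literal>sketch_ereport(SKETCH_ERROR, (sketch_errcode(SKETCH_DIVISION_BY_ZERO), sketch_errmsg("division by zero")));</literal>.
Dropping the inner parentheses would hand the two-parameter macro three
arguments, which the preprocessor rejects.
</para>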
121 <para>
122 Here is a more complex example:
123 <programlisting>
ereport(ERROR,
        (errcode(ERRCODE_AMBIGUOUS_FUNCTION),
         errmsg("function %s is not unique",
                func_signature_string(funcname, nargs,
                                      actual_arg_types)),
         errhint("Unable to choose a best candidate function. "
                 "You might need to add explicit typecasts.")));
131 </programlisting>
132 This illustrates the use of format codes to embed run-time values into
133 a message text. Also, an optional <quote>hint</> message is provided.
134 </para>
136 <para>
137 The available auxiliary routines for <function>ereport</> are:
138 <itemizedlist>
139 <listitem>
140 <para>
141 <function>errcode(sqlerrcode)</function> specifies the SQLSTATE error identifier
142 code for the condition. If this routine is not called, the error
143 identifier defaults to
144 <literal>ERRCODE_INTERNAL_ERROR</> when the error severity level is
145 <literal>ERROR</> or higher, <literal>ERRCODE_WARNING</> when the
146 error level is <literal>WARNING</>, otherwise (for <literal>NOTICE</>
147 and below) <literal>ERRCODE_SUCCESSFUL_COMPLETION</>.
While these defaults are often convenient, always consider whether they
are appropriate before omitting the <function>errcode()</> call.
150 </para>
151 </listitem>
152 <listitem>
153 <para>
154 <function>errmsg(const char *msg, ...)</function> specifies the primary error
155 message text, and possibly run-time values to insert into it. Insertions
156 are specified by <function>sprintf</>-style format codes. In addition to
157 the standard format codes accepted by <function>sprintf</>, the format
158 code <literal>%m</> can be used to insert the error message returned
159 by <function>strerror</> for the current value of <literal>errno</>.
160 <footnote>
161 <para>
162 That is, the value that was current when the <function>ereport</> call
163 was reached; changes of <literal>errno</> within the auxiliary reporting
164 routines will not affect it. That would not be true if you were to
165 write <literal>strerror(errno)</> explicitly in <function>errmsg</>'s
166 parameter list; accordingly, do not do so.
167 </para>
168 </footnote>
169 <literal>%m</> does not require any
170 corresponding entry in the parameter list for <function>errmsg</>.
171 Note that the message string will be run through <function>gettext</>
172 for possible localization before format codes are processed.
173 </para>
174 </listitem>
175 <listitem>
176 <para>
177 <function>errmsg_internal(const char *msg, ...)</function> is the same as
<function>errmsg</>, except that the message string will be neither
translated nor included in the internationalization message dictionary.
180 This should be used for <quote>cannot happen</> cases that are probably
181 not worth expending translation effort on.
182 </para>
183 </listitem>
184 <listitem>
185 <para>
186 <function>errdetail(const char *msg, ...)</function> supplies an optional
187 <quote>detail</> message; this is to be used when there is additional
188 information that seems inappropriate to put in the primary message.
189 The message string is processed in just the same way as for
190 <function>errmsg</>.
191 </para>
192 </listitem>
193 <listitem>
194 <para>
195 <function>errdetail_log(const char *msg, ...)</function> is the same as
196 <function>errdetail</> except that this string goes only to the server
197 log, never to the client. If both <function>errdetail</> and
198 <function>errdetail_log</> are used then one string goes to the client
199 and the other to the log. This is useful for error details that are
200 too security-sensitive or too bulky to include in the report
201 sent to the client.
202 </para>
203 </listitem>
204 <listitem>
205 <para>
206 <function>errhint(const char *msg, ...)</function> supplies an optional
207 <quote>hint</> message; this is to be used when offering suggestions
208 about how to fix the problem, as opposed to factual details about
209 what went wrong.
210 The message string is processed in just the same way as for
211 <function>errmsg</>.
212 </para>
213 </listitem>
214 <listitem>
215 <para>
216 <function>errcontext(const char *msg, ...)</function> is not normally called
217 directly from an <function>ereport</> message site; rather it is used
218 in <literal>error_context_stack</> callback functions to provide
219 information about the context in which an error occurred, such as the
220 current location in a PL function.
221 The message string is processed in just the same way as for
222 <function>errmsg</>. Unlike the other auxiliary functions, this can
223 be called more than once per <function>ereport</> call; the successive
224 strings thus supplied are concatenated with separating newlines.
225 </para>
226 </listitem>
227 <listitem>
228 <para>
229 <function>errposition(int cursorpos)</function> specifies the textual location
230 of an error within a query string. Currently it is only useful for
231 errors detected in the lexical and syntactic analysis phases of
232 query processing.
233 </para>
234 </listitem>
235 <listitem>
236 <para>
237 <function>errcode_for_file_access()</> is a convenience function that
238 selects an appropriate SQLSTATE error identifier for a failure in a
239 file-access-related system call. It uses the saved
240 <literal>errno</> to determine which error code to generate.
241 Usually this should be used in combination with <literal>%m</> in the
242 primary error message text.
243 </para>
244 </listitem>
245 <listitem>
246 <para>
247 <function>errcode_for_socket_access()</> is a convenience function that
248 selects an appropriate SQLSTATE error identifier for a failure in a
249 socket-related system call.
250 </para>
251 </listitem>
252 <listitem>
253 <para>
254 <function>errhidestmt(bool hide_stmt)</function> can be called to specify
255 suppression of the <literal>STATEMENT:</> portion of a message in the
256 postmaster log. Generally this is appropriate if the message text
257 includes the current statement already.
258 </para>
259 </listitem>
260 </itemizedlist>
261 </para>
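<para>
The footnote on <literal>%m</> can be illustrated outside the server.
The sketch below assumes a hypothetical helper,
<function>expand_percent_m</>, which is not a
<productname>PostgreSQL</> function. It expands <literal>%m</> from an
<literal>errno</> value that the caller saved earlier, so later changes
to <literal>errno</> cannot alter the reported text. Other format codes
are copied through unchanged to keep the example short.
</para>

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * expand_percent_m
 *		Hypothetical helper: copy fmt into buf, replacing each "%m" with
 *		the strerror() text for saved_errno.  Output is truncated to fit.
 */
static void
expand_percent_m(char *buf, size_t buflen, const char *fmt, int saved_errno)
{
	size_t		used = 0;
	const char *p;

	if (buflen == 0)
		return;

	for (p = fmt; *p != '\0' && used + 1 < buflen; p++)
	{
		if (p[0] == '%' && p[1] == 'm')
		{
			const char *text = strerror(saved_errno);
			size_t		n = strlen(text);

			/* leave room for the terminating NUL */
			if (n > buflen - 1 - used)
				n = buflen - 1 - used;
			memcpy(buf + used, text, n);
			used += n;
			p++;				/* skip the 'm' */
		}
		else
			buf[used++] = *p;
	}
	buf[used] = '\0';
}
```

<para>
A caller would copy <literal>errno</> into a local variable immediately
after the failing system call and pass that copy to the helper. Calling
<literal>strerror(errno)</> later, after other library calls have run,
risks describing a different error.
</para>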
263 <para>
264 There is an older function <function>elog</> that is still heavily used.
265 An <function>elog</> call:
266 <programlisting>
267 elog(level, "format string", ...);
268 </programlisting>
269 is exactly equivalent to:
270 <programlisting>
271 ereport(level, (errmsg_internal("format string", ...)));
272 </programlisting>
273 Notice that the SQLSTATE error code is always defaulted, and the message
274 string is not subject to translation.
275 Therefore, <function>elog</> should be used only for internal errors and
276 low-level debug logging. Any message that is likely to be of interest to
277 ordinary users should go through <function>ereport</>. Nonetheless,
278 there are enough internal <quote>cannot happen</> error checks in the
279 system that <function>elog</> is still widely used; it is preferred for
280 those messages for its notational simplicity.
281 </para>
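<para>
Since <function>elog</> always takes the default error code, it can help
to see the defaulting rule described for <function>errcode</> written
out. The following is only a sketch: the enum names and their values are
invented for illustration and include just the severity levels named in
this section, not the server's full set.
</para>

```c
/* Stand-in severity levels, lowest to highest; illustrative only. */
typedef enum DemoLevel
{
	DEMO_DEBUG,
	DEMO_NOTICE,
	DEMO_WARNING,
	DEMO_ERROR,
	DEMO_PANIC
} DemoLevel;

/* Stand-in error identifier codes; illustrative only. */
typedef enum DemoCode
{
	DEMO_CODE_SUCCESSFUL_COMPLETION,
	DEMO_CODE_WARNING,
	DEMO_CODE_INTERNAL_ERROR
} DemoCode;

/*
 * demo_default_code
 *		Pick the code used when no explicit code was supplied.
 */
static DemoCode
demo_default_code(DemoLevel level)
{
	if (level >= DEMO_ERROR)
		return DEMO_CODE_INTERNAL_ERROR;
	if (level == DEMO_WARNING)
		return DEMO_CODE_WARNING;
	return DEMO_CODE_SUCCESSFUL_COMPLETION;
}
```

<para>
Under this rule, every <function>elog</> call at <literal>ERROR</> or
above is reported as an internal error, which is one reason it suits
<quote>cannot happen</> checks better than user-facing conditions.
</para>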
283 <para>
284 Advice about writing good error messages can be found in
285 <xref linkend="error-style-guide">.
286 </para>
287 </sect1>
289 <sect1 id="error-style-guide">
290 <title>Error Message Style Guide</title>
292 <para>
293 This style guide is offered in the hope of maintaining a consistent,
294 user-friendly style throughout all the messages generated by
295 <productname>PostgreSQL</>.
296 </para>
298 <simplesect>
299 <title>What goes where</title>
301 <para>
302 The primary message should be short, factual, and avoid reference to
303 implementation details such as specific function names.
304 <quote>Short</quote> means <quote>should fit on one line under normal
305 conditions</quote>. Use a detail message if needed to keep the primary
306 message short, or if you feel a need to mention implementation details
307 such as the particular system call that failed. Both primary and detail
308 messages should be factual. Use a hint message for suggestions about what
309 to do to fix the problem, especially if the suggestion might not always be
310 applicable.
311 </para>
313 <para>
314 For example, instead of:
315 <programlisting>
316 IpcMemoryCreate: shmget(key=%d, size=%u, 0%o) failed: %m
317 (plus a long addendum that is basically a hint)
318 </programlisting>
319 write:
320 <programlisting>
321 Primary: could not create shared memory segment: %m
322 Detail: Failed syscall was shmget(key=%d, size=%u, 0%o).
323 Hint: the addendum
324 </programlisting>
325 </para>
327 <para>
328 Rationale: keeping the primary message short helps keep it to the point,
329 and lets clients lay out screen space on the assumption that one line is
330 enough for error messages. Detail and hint messages can be relegated to a
331 verbose mode, or perhaps a pop-up error-details window. Also, details and
332 hints would normally be suppressed from the server log to save
333 space. Reference to implementation details is best avoided since users
334 don't know the details anyway.
335 </para>
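<para>
The rationale above can be sketched from the client's side. The
structure and function below are hypothetical (they are not part of any
<productname>PostgreSQL</> client library), and the field labels are
only one possible layout. A terse display shows the primary message
alone, on one line, while a verbose display appends the detail and hint
when present.
</para>

```c
#include <stdio.h>

/* Hypothetical container for the three parts of a report. */
typedef struct DemoReport
{
	const char *primary;		/* required, expected to fit on one line */
	const char *detail;			/* optional, may be NULL */
	const char *hint;			/* optional, may be NULL */
} DemoReport;

/*
 * demo_render
 *		Write the report into buf; detail and hint appear only in
 *		verbose mode.  Output is truncated if buf is too small.
 */
static void
demo_render(char *buf, size_t buflen, const DemoReport *report, int verbose)
{
	int			used;

	used = snprintf(buf, buflen, "ERROR:  %s", report->primary);
	if (!verbose || used < 0 || (size_t) used >= buflen)
		return;

	if (report->detail != NULL)
	{
		int			n;

		n = snprintf(buf + used, buflen - used,
					 "\nDETAIL:  %s", report->detail);
		if (n < 0 || (size_t) n >= buflen - used)
			return;
		used += n;
	}

	if (report->hint != NULL)
		snprintf(buf + used, buflen - used, "\nHINT:  %s", report->hint);
}
```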
337 </simplesect>
339 <simplesect>
340 <title>Formatting</title>
342 <para>
343 Don't put any specific assumptions about formatting into the message
344 texts. Expect clients and the server log to wrap lines to fit their own
345 needs. In long messages, newline characters (\n) can be used to indicate
346 suggested paragraph breaks. Don't end a message with a newline. Don't
347 use tabs or other formatting characters. (In error context displays,
348 newlines are automatically added to separate levels of context such as
349 function calls.)
350 </para>
352 <para>
353 Rationale: Messages are not necessarily displayed on terminal-type
354 displays. In GUI displays or browsers these formatting instructions are
355 at best ignored.
356 </para>
358 </simplesect>
360 <simplesect>
361 <title>Quotation marks</title>
363 <para>
English text should use double quotes when quoting is appropriate.
Text in other languages should consistently use one kind of quotes that
matches publishing customs and the computer output of other programs.
367 </para>
369 <para>
Rationale: The choice of double quotes over single quotes is somewhat
arbitrary, but tends to be the preferred usage. Some have suggested
372 choosing the kind of quotes depending on the type of object according to
373 SQL conventions (namely, strings single quoted, identifiers double
374 quoted). But this is a language-internal technical issue that many users
375 aren't even familiar with, it won't scale to other kinds of quoted terms,
376 it doesn't translate to other languages, and it's pretty pointless, too.
377 </para>
379 </simplesect>
381 <simplesect>
382 <title>Use of quotes</title>
384 <para>
Always use quotes to delimit file names, user-supplied identifiers, and
386 other variables that might contain words. Do not use them to mark up
387 variables that will not contain words (for example, operator names).
388 </para>
390 <para>
391 There are functions in the backend that will double-quote their own output
392 at need (for example, <function>format_type_be</>()). Do not put
393 additional quotes around the output of such functions.
394 </para>
396 <para>
397 Rationale: Objects can have names that create ambiguity when embedded in a
398 message. Be consistent about denoting where a plugged-in name starts and
399 ends. But don't clutter messages with unnecessary or duplicate quote
400 marks.
401 </para>
403 </simplesect>
405 <simplesect>
406 <title>Grammar and punctuation</title>
408 <para>
409 The rules are different for primary error messages and for detail/hint
410 messages:
411 </para>
413 <para>
414 Primary error messages: Do not capitalize the first letter. Do not end a
415 message with a period. Do not even think about ending a message with an
416 exclamation point.
417 </para>
419 <para>
420 Detail and hint messages: Use complete sentences, and end each with
421 a period. Capitalize the first word of sentences. Put two spaces after
422 the period if another sentence follows (for English text; might be
423 inappropriate in other languages).
424 </para>
426 <para>
427 Rationale: Avoiding punctuation makes it easier for client applications to
428 embed the message into a variety of grammatical contexts. Often, primary
429 messages are not grammatically complete sentences anyway. (And if they're
430 long enough to be more than one sentence, they should be split into
431 primary and detail parts.) However, detail and hint messages are longer
432 and might need to include multiple sentences. For consistency, they should
433 follow complete-sentence style even when there's only one sentence.
434 </para>
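<para>
Some of the rules for primary messages are mechanical enough to check
automatically. The function below is a hypothetical lint helper, not an
existing <productname>PostgreSQL</> tool, and its capitalization test is
only a heuristic: it flags an upper-case letter followed by a lower-case
one, so that a message beginning with an upper-case SQL key word is not
reported. It also checks the trailing-newline and tab rules from the
Formatting subsection.
</para>

```c
#include <ctype.h>
#include <string.h>

#define STYLE_CAPITALIZED		0x01	/* starts with a capitalized word */
#define STYLE_TRAILING_PUNCT	0x02	/* ends with '.' or '!' */
#define STYLE_TRAILING_NEWLINE	0x04	/* ends with a newline */
#define STYLE_HAS_TAB			0x08	/* contains a tab character */

/*
 * check_primary_message
 *		Hypothetical checker: return a bit mask of style problems found
 *		in a primary message text, or 0 if none were detected.
 */
static int
check_primary_message(const char *msg)
{
	int			problems = 0;
	size_t		len = strlen(msg);

	if (isupper((unsigned char) msg[0]) && islower((unsigned char) msg[1]))
		problems |= STYLE_CAPITALIZED;
	if (len > 0 && (msg[len - 1] == '.' || msg[len - 1] == '!'))
		problems |= STYLE_TRAILING_PUNCT;
	if (len > 0 && msg[len - 1] == '\n')
		problems |= STYLE_TRAILING_NEWLINE;
	if (strchr(msg, '\t') != NULL)
		problems |= STYLE_HAS_TAB;

	return problems;
}
```

<para>
A result of zero means only that none of these particular checks fired;
it says nothing about whether the wording itself follows the guide.
</para>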
436 </simplesect>
438 <simplesect>
439 <title>Upper case vs. lower case</title>
441 <para>
442 Use lower case for message wording, including the first letter of a
443 primary error message. Use upper case for SQL commands and key words if
444 they appear in the message.
445 </para>
447 <para>
448 Rationale: It's easier to make everything look more consistent this
449 way, since some messages are complete sentences and some not.
450 </para>
452 </simplesect>
454 <simplesect>
455 <title>Avoid passive voice</title>
457 <para>
458 Use the active voice. Use complete sentences when there is an acting
459 subject (<quote>A could not do B</quote>). Use telegram style without
460 subject if the subject would be the program itself; do not use
461 <quote>I</quote> for the program.
462 </para>
464 <para>
465 Rationale: The program is not human. Don't pretend otherwise.
466 </para>
468 </simplesect>
470 <simplesect>
<title>Present vs. past tense</title>
473 <para>
474 Use past tense if an attempt to do something failed, but could perhaps
475 succeed next time (perhaps after fixing some problem). Use present tense
476 if the failure is certainly permanent.
477 </para>
479 <para>
480 There is a nontrivial semantic difference between sentences of the form:
481 <programlisting>
482 could not open file "%s": %m
483 </programlisting>
484 and:
485 <programlisting>
486 cannot open file "%s"
487 </programlisting>
488 The first one means that the attempt to open the file failed. The
489 message should give a reason, such as <quote>disk full</quote> or
490 <quote>file doesn't exist</quote>. The past tense is appropriate because
491 next time the disk might not be full anymore or the file in question might
492 exist.
493 </para>
495 <para>
496 The second form indicates that the functionality of opening the named file
497 does not exist at all in the program, or that it's conceptually
498 impossible. The present tense is appropriate because the condition will
499 persist indefinitely.
500 </para>
502 <para>
503 Rationale: Granted, the average user will not be able to draw great
504 conclusions merely from the tense of the message, but since the language
505 provides us with a grammar we should use it correctly.
506 </para>
508 </simplesect>
510 <simplesect>
511 <title>Type of the object</title>
513 <para>
514 When citing the name of an object, state what kind of object it is.
515 </para>
517 <para>
518 Rationale: Otherwise no one will know what <quote>foo.bar.baz</>
519 refers to.
520 </para>
522 </simplesect>
524 <simplesect>
525 <title>Brackets</title>
527 <para>
528 Square brackets are only to be used (1) in command synopses to denote
529 optional arguments, or (2) to denote an array subscript.
530 </para>
532 <para>
Rationale: Anything else does not correspond to widely known customary
usage and will confuse people.
535 </para>
537 </simplesect>
539 <simplesect>
540 <title>Assembling error messages</title>
542 <para>
543 When a message includes text that is generated elsewhere, embed it in
544 this style:
545 <programlisting>
546 could not open file %s: %m
547 </programlisting>
548 </para>
550 <para>
551 Rationale: It would be difficult to account for all possible error codes
552 to paste this into a single smooth sentence, so some sort of punctuation
553 is needed. Putting the embedded text in parentheses has also been
554 suggested, but it's unnatural if the embedded text is likely to be the
555 most important part of the message, as is often the case.
556 </para>
558 </simplesect>
560 <simplesect>
561 <title>Reasons for errors</title>
563 <para>
564 Messages should always state the reason why an error occurred.
565 For example:
566 <programlisting>
567 BAD: could not open file %s
568 BETTER: could not open file %s (I/O failure)
569 </programlisting>
If no reason is known, you had better fix the code.
571 </para>
573 </simplesect>
575 <simplesect>
576 <title>Function names</title>
578 <para>
579 Don't include the name of the reporting routine in the error text. We have
580 other mechanisms for finding that out when needed, and for most users it's
581 not helpful information. If the error text doesn't make as much sense
582 without the function name, reword it.
583 <programlisting>
584 BAD: pg_atoi: error in "z": cannot parse "z"
585 BETTER: invalid input syntax for integer: "z"
586 </programlisting>
587 </para>
589 <para>
590 Avoid mentioning called function names, either; instead say what the code
591 was trying to do:
592 <programlisting>
593 BAD: open() failed: %m
594 BETTER: could not open file %s: %m
595 </programlisting>
596 If it really seems necessary, mention the system call in the detail
597 message. (In some cases, providing the actual values passed to the
598 system call might be appropriate information for the detail message.)
599 </para>
601 <para>
602 Rationale: Users don't know what all those functions do.
603 </para>
605 </simplesect>
607 <simplesect>
608 <title>Tricky words to avoid</title>
610 <formalpara>
611 <title>Unable</title>
612 <para>
<quote>Unable</quote> is nearly the passive voice. It is better to use
<quote>cannot</quote> or <quote>could not</quote>, as appropriate.
615 </para>
616 </formalpara>
618 <formalpara>
619 <title>Bad</title>
620 <para>
621 Error messages like <quote>bad result</quote> are really hard to interpret
622 intelligently. It's better to write why the result is <quote>bad</quote>,
623 e.g., <quote>invalid format</quote>.
624 </para>
625 </formalpara>
627 <formalpara>
628 <title>Illegal</title>
629 <para>
<quote>Illegal</quote> stands for a violation of the law; anything else
is <quote>invalid</quote>. Better yet, say why it's invalid.
632 </para>
633 </formalpara>
635 <formalpara>
636 <title>Unknown</title>
637 <para>
638 Try to avoid <quote>unknown</quote>. Consider <quote>error: unknown
639 response</quote>. If you don't know what the response is, how do you know
640 it's erroneous? <quote>Unrecognized</quote> is often a better choice.
641 Also, be sure to include the value being complained of.
642 <programlisting>
643 BAD: unknown node type
644 BETTER: unrecognized node type: 42
645 </programlisting>
646 </para>
647 </formalpara>
649 <formalpara>
650 <title>Find vs. Exists</title>
651 <para>
652 If the program uses a nontrivial algorithm to locate a resource (e.g., a
653 path search) and that algorithm fails, it is fair to say that the program
654 couldn't <quote>find</quote> the resource. If, on the other hand, the
655 expected location of the resource is known but the program cannot access
656 it there then say that the resource doesn't <quote>exist</quote>. Using
657 <quote>find</quote> in this case sounds weak and confuses the issue.
658 </para>
659 </formalpara>
661 <formalpara>
662 <title>May vs. Can vs. Might</title>
663 <para>
664 <quote>May</quote> suggests permission (e.g. "You may borrow my rake."),
665 and has little use in documentation or error messages.
666 <quote>Can</quote> suggests ability (e.g. "I can lift that log."),
667 and <quote>might</quote> suggests possibility (e.g. "It might rain
668 today."). Using the proper word clarifies meaning and assists
669 translation.
670 </para>
671 </formalpara>
673 <formalpara>
674 <title>Contractions</title>
675 <para>
676 Avoid contractions, like <quote>can't</quote>; use
677 <quote>cannot</quote> instead.
678 </para>
679 </formalpara>
681 </simplesect>
683 <simplesect>
684 <title>Proper spelling</title>
686 <para>
687 Spell out words in full. For instance, avoid:
688 <itemizedlist>
689 <listitem>
690 <para>
691 spec
692 </para>
693 </listitem>
694 <listitem>
695 <para>
696 stats
697 </para>
698 </listitem>
699 <listitem>
700 <para>
701 parens
702 </para>
703 </listitem>
704 <listitem>
705 <para>
706 auth
707 </para>
708 </listitem>
709 <listitem>
710 <para>
711 xact
712 </para>
713 </listitem>
714 </itemizedlist>
715 </para>
717 <para>
718 Rationale: This will improve consistency.
719 </para>
721 </simplesect>
723 <simplesect>
724 <title>Localization</title>
726 <para>
727 Keep in mind that error message texts need to be translated into other
728 languages. Follow the guidelines in <xref linkend="nls-guidelines">
729 to avoid making life difficult for translators.
730 </para>
731 </simplesect>
733 </sect1>
735 </chapter>