nethack/doc/style.doc
2002-01-05 21:05:47 +00:00

Indian Hill C Style and Coding Standards
as amended for U of T Zoology UNIX-
L.W. Cannon
R.A. Elliott
L.W. Kirchhoff
J.H. Miller
J.M. Milner
R.W. Mitze
E.P. Schan
N.O. Whittington
Bell Labs
Henry Spencer
Zoology Computer Systems
University of Toronto
ABSTRACT
This document is an annotated (by the last
author) version of the original paper of the same
title. It describes a set of coding standards and
recommendations which are local standards for
officially-supported UNIX programs. The scope is
coding style, not functional organization.
April 18, 1990
_________________________
- UNIX is a trademark of Bell Laboratories.
1. Introduction
This document is a result of a committee formed at
Indian Hill to establish a common set of coding standards
and recommendations for the Indian Hill community. The
scope of this work is the coding style, not the functional
organization of programs. The standards in this document
are not specific to ESS programming only. [ In fact,
they're pretty good general standards. ``To be clear is
professional; not to be clear is unprofessional.''
- Sir Ernest Gowers. This document is presented
unadulterated; U of T variations, comments, exceptions, etc.
are presented in footnotes. ] { Now, U of T variations are
in []'s, while NetHack variations are in {}'s. Otherwise
it's just about impossible to read on-line. } We have
tried to combine previous work [1,6] on C style into a
uniform set of standards that should be appropriate for any
project using C. [ Of necessity, these standards cannot
cover all situations. Experience and informed judgement
count for much. Inexperienced programmers who encounter
unusual situations should consult 1) code written by
experienced C programmers following these rules, or 2)
experienced C programmers. ]
{ This document applies to the ``core'' code files and
should be at least considered for port-specific code files,
although in that case norms for the port's system may also
be considered. A couple of notes on general layout before
getting to the specific layout rules: each indenting level
should be 4 positions, and tabs ( ^I's ) should be multiples
of 8 positions. Occasionally tabs may be used for each
indenting level, provided this does not cause line
wrapping. }
2. File Organization
A file consists of various sections that should be
separated by several blank lines. Although there is no max-
imum length requirement for source files, files with more
than about 1500 lines are cumbersome to deal with. The edi-
tor may not have enough temp space to edit the file, compi-
lations will go slower, etc. Since most of us use 300 baud
terminals, entire rows of asterisks, for example, should be
discouraged. [ This is not a problem at U of T, or most
other sensible places, but rows of asterisks are still
annoying. ] Also lines longer than 80 columns are not
handled well by all terminals and should be avoided if pos-
sible. [ Excessively long lines which result from deep
indenting are often a symptom of poorly-organized code. ]
{ For NetHack, lines should be limited to 79 characters
(barring long strings) since even 80 is a nuisance in some
situations. A long string may be moved left from its
natural indentation to avoid line wrapping. }
The suggested order of sections for a file is as fol-
lows:
1. Any header file includes should be the first thing in
the file. { NetHack files should have a copyright/
license section before any of the others mentioned
here. }
2. Immediately after the includes should be a prologue
that tells what is in that file. A description of the
purpose of the objects in the files (whether they be
functions, external data declarations or definitions,
or something else) is more useful than a list of the
object names. [ A common variation, in both Bell code
and ours, is to reverse the order of sections 1 and 2.
This is an acceptable practice. ]
3. Any typedefs and defines that apply to the file as a
whole are next.
4. Next come the global (external) data declarations. If
a set of defines applies to a particular piece of glo-
bal data (such as a flags word), the defines should be
immediately after the data declaration. [ Such defines
should be indented to put the defines one level deeper
than the first keyword of the declaration to which they
apply. ] { For NetHack, the defines should not be
indented if there are a large number of them. They may
even go in a separate file if the declaration tells
where to find them. }
5. The functions come last. [ They should be in some sort
of meaningful order. Top-down is generally better than
bottom-up, and a ``breadth-first'' approach (functions
on a similar level of abstraction together) is
preferred over depth-first (functions defined as soon
as possible after their calls). Considerable judgement
is called for here. If defining large numbers of
essentially-independent utility functions, consider
alphabetical order. ]
2.1. File Naming Conventions
UNIX requires certain suffix conventions for names of
files to be processed by the cc command [5]. [ In addition
to the suffix conventions given here, it is conventional to
use `Makefile' (not `makefile') for the control file for
make and `README' for a summary of the contents of a
directory or directory tree. ] The following suffixes are
required:
o C source file names must end in .c
o Assembler source file names must end in .s
In addition the following conventions are universally
followed:
o Relocatable object file names end in .o
o Include header file names end in .h or .d [ .h is
preferred. An alternate convention that may be
preferable in multi-language environments is to use the
same suffix as an ordinary source file but with two
periods instead of one (e.g. ``foo..c''). ]
o Ldp specification file names end in .b [ No idea what
this is. ]
o Yacc source file names end in .y
o Lex source file names end in .l
3. Header Files
Header files are files that are included in other files
prior to compilation by the C preprocessor. Some are
defined at the system level like stdio.h which must be
included by any program using the standard I/O library.
Header files are also used to contain data declarations and
defines that are needed by more than one program. [ Don't
use absolute pathnames for header files. Use the <name>
construction for getting them from a standard place, or
define them relative to the current directory. The -I
option of the C compiler is the best way to handle extensive
private libraries of header files; it permits reorganizing
the directory structure without having to alter source
files. ] Header files should be functionally organized,
i.e., declarations for separate subsystems should be in
separate header files. Also, if a set of declarations is
likely to change when code is ported from one machine to
another, those declarations should be in a separate header
file.
Header files should not be nested. Some objects like
typedefs and initialized data definitions cannot be seen
twice by the compiler in one compilation. On non-UNIX sys-
tems this is also true of uninitialized declarations without
the extern keyword. [ It should be noted that declaring
variables in a header file is often a poor idea. Frequently
it is a symptom of poor partitioning of code between files.]
This can happen if include files are nested and will cause
the compilation to fail. { NetHack header files should use
the #ifndef HEADER_H/#define HEADER_H/contents/#endif idiom
to bracket their contents, and can then be nested as
desired. }
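As a sketch of the bracketing idiom in the NetHack note
above, the fragment below pastes the body of one header
twice, which is what the compiler sees when two nested
headers both include it. The names boat.h, BOAT_H, struct
hull, and hull_length() are hypothetical, chosen only for
illustration, and the function uses an ANSI prototype so
that current compilers accept it. The second copy is
skipped because BOAT_H is already defined, so the structure
is declared exactly once.

```c
/* ---- first inclusion of the hypothetical header boat.h ---- */
#ifndef BOAT_H
#define BOAT_H

struct hull {
    int wllength;       /* water line length in feet */
    long sarea;         /* sail area in square feet */
};

#endif /* BOAT_H */

/* ---- second inclusion, e.g. reached through a nested header ---- */
#ifndef BOAT_H          /* false now: everything up to #endif is skipped */
#define BOAT_H

struct hull {           /* would be a redefinition error if compiled */
    int wllength;
    long sarea;
};

#endif /* BOAT_H */

/*
 * hull_length(hp)
 *
 * Hypothetical helper showing that struct hull is usable.
 */
int                     /* water line length in feet */
hull_length(const struct hull *hp)
{
    return(hp->wllength);
}
```

Without the #ifndef lines the second structure declaration
would be seen by the compiler and the compilation would
fail, which is the problem the original text warns about.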
4. External Declarations
External declarations should begin in column 1. Each
declaration should be on a separate line. A comment
describing the role of the object being declared should be
included, with the exception that a list of defined constants
does not need comments if the constant names are sufficient
documentation. The comments should be tabbed so that
they line up underneath each other. [ So should the
constant names and their defined values. ] Use the tab
character (CTRL I if your terminal doesn't have a separate
key) rather than blanks. For structure and union template
declarations, each element should be alone on a line with a
comment describing it. { Closely related elements may be
grouped together on a line if a single comment can easily
cover them. } The opening brace ( { ) should be on the same
line as the structure tag, and the closing brace should be
alone on a line in column 1, i.e.
    struct boat {
        int wllength;   /* water line length in feet */
        int type;       /* see below */
        long sarea;     /* sail area in square feet */
    };

    /*
     * defines for boat.type
     * [ These defines are better put right after the
     * declaration of type, within the struct declaration,
     * with enough tabs after # to indent define one level
     * more than the structure member declarations. ]
     */
    #define KETCH 1
    #define YAWL 2
    #define SLOOP 3
    #define SQRIG 4
    #define MOTOR 5

If an external variable is initialized the equal sign should
not be omitted. [ Any variable whose initial value is
important should be explicitly initialized, or at the very
least should be commented to indicate that C's default
initialization to 0 is being relied on. The empty
initializer, ``{}'', should never be used. Structure
initializations should be fully parenthesized with braces.
Constants used to initialize longs should be explicitly
long. ]
    int x = 1;
    char *msg = "message";
    struct boat winner = {
        40,     /* water line length */
        YAWL,
        600     /* sail area */
    };

[ In any file which is part of a larger whole rather than a
self-contained program, maximum use should be made of the
static keyword to make functions and variables local to
single files. Variables in particular should be accessible
from other files only when there is a clear need that cannot
be filled in another way. Such usages should be commented
to make it clear that another file's variables are being
used; the comment should name the other file. ]
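A minimal sketch of the static-keyword advice above,
assuming a hypothetical file ident.c whose only external
entry point is next_id(). The names lastid, advance(), and
next_id() are invented for illustration, and ANSI
prototypes are used so that current compilers accept the
code. The counter is explicitly initialized because its
starting value matters.

```c
/*
 * Hypothetical file ident.c: only next_id() is visible to other files.
 */
static int lastid = 0;  /* last identifier handed out; starts at 0 */

/*
 * advance()
 *
 * Step the counter.  Static, so the name cannot clash with
 * a function of the same name in another file.
 */
static int              /* new value of lastid */
advance(void)
{
    lastid += 1;
    return(lastid);
}

/*
 * next_id()
 *
 * Hand out identifiers 1, 2, 3, ...
 */
int                     /* identifier, never 0 */
next_id(void)
{
    return(advance());
}
```

Other files can call next_id() but have no way to reach
lastid or advance(), so the counter can be changed only
through the one documented entry point.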
5. Comments
Comments that describe data structures, algorithms,
etc., should be in block comment form with the opening /* in
column one, a * in column 2 before each line of comment
text, and the closing */ in columns 2-3. [ Some automated
program-analysis packages use a different character in
column 2 as a marker for lines with specific items of
information. In particular, a line with a `-' here in a
comment preceding a function is sometimes assumed to be a
one-line summary of the function's purpose. ]
    /*
     * Here is a block comment.
     * The comment text should be tabbed over
     * and the opening /* and closing star-slash
     * should be alone on a line.
     * [ A common practice in both Bell and local code is
     * to use a space rather than a tab after the *. This
     * is acceptable. ]
     */

Note that grep ^.\* will catch all block comments in
the file. In some cases, block comments inside a function
are appropriate, and they should be tabbed over to the same
tab setting as the code that they describe. Short comments
may appear on a single line indented over to the tab setting
of the code that follows.
    if (argc > 1) {
        /* Get input file from command line. */
        if (freopen(argv[1], "r", stdin) == (FILE *)0)
            error("can't open %s\n", argv[1]);
    }

Very short comments may appear on the same line as the
code they describe, but should be tabbed over far enough to
separate them from the statements. If more than one short
comment appears in a block of code they should all be tabbed
to the same tab setting.
    if (a == 2)
        return(TRUE);           /* special case */
    else
        return(isprime(a));     /* works only for odd a */

6. Function Declarations
Each function should be preceded by a block comment
prologue that gives the name and a short description of what
the function does. [ Discussion of non-trivial design
decisions is also appropriate, but avoid duplicating infor-
mation that is present in (and clear from) the code. It's
too easy for such redundant information to get out of date.]
{ For NetHack, even the block comment can be simplified when
the entire function is considered trivial.} If the function
returns a value, the type of the value returned should be
alone on a line in column 1 (do not default to int). If the
function does not return a value then it should not be given
a return type. { Since this is what ``void'' was invented
for, use it. }
If the value returned requires a long explanation, it should
be given in the prologue; otherwise it can be on the same
line as the return type, tabbed over. The function name and
formal parameters should be alone on a line beginning in
column 1. Each parameter should be declared (do not default
to int), with a comment on a single line. { These parameter
declarations should begin in column 1; tabbing them over is
acceptable but not preferred. } The opening brace of the
function body should also be alone on a line beginning in
column 1. The function name, argument declaration list, and
opening brace should be separated by a blank line. [ Nei-
ther Bell nor local code has ever included these separating
blank lines, and it is not clear that they add anything
useful. Leave them out. ] { Unless deemed desirable in a
port-specific file where all compilers for the port support
ANSI C, all function declarations must be ``old-style''. }
All local declarations and code within the function body
should be tabbed over at least one tab.
If the function uses any external variables, these
should have their own declarations in the function body
using the extern keyword. If the external variable is an
array the array bounds must be repeated in the extern
declaration. There should also be extern declarations for
all functions called by a given function. This is particu-
larly beneficial to someone picking up code written by
another. If a function returns a value of type other than
int, it is required by the compiler that such functions be
declared before they are used. Having the extern declara-
tion in the calling function's declarations section avoids
all such problems. [ These rules tend to produce a lot of
clutter. Both Bell and local practice frequently omit
extern declarations for static variables and functions.
This is permitted. Omission of declarations for standard
library routines is also permissible, although if they are
declared it is better to declare them within the functions
that use them rather than globally. ] { All external
NetHack functions should be declared in extern.h, widely-
used system functions in system.h, and most widely-used
global NetHack variables in decl.h. Explicit extern
declarations elsewhere should be limited to things not
widely used. }
In general each variable declaration should be on a
separate line with a comment describing the role played by
the variable in the function. If the variable is external
or a parameter of type pointer which is changed by the func-
tion, that should be noted in the comment. All such com-
ments for parameters and local variables should be tabbed so
that they line up underneath each other. The declarations
should be separated from the function's statements by a
blank line.
A local variable should not be redeclared in nested
blocks. [ In fact, avoid any local declarations that over-
ride declarations at higher levels. ] Even though this is
valid C, the potential confusion is enough that lint will
complain about it when given the -h option.
6.1. Examples
    /*
     * skyblue()
     *
     * Determine if the sky is blue.
     */
    int                     /* TRUE or FALSE */
    skyblue()
    {
        extern int hour;

        if (hour < MORNING || hour > EVENING)
            return(FALSE);  /* black */
        else
            return(TRUE);   /* blue */
    }

    /*
     * tail(nodep)
     *
     * Find the last element in the linked list
     * pointed to by nodep and return a pointer to it.
     */
    NODE *                  /* pointer to tail of list */
    tail(nodep)
    NODE *nodep;            /* pointer to head of list */
    {
        register NODE *np;  /* current pointer advances to NULL */
        register NODE *lp;  /* last pointer follows np */

        np = lp = nodep;
        while ((np = np->next) != (NODE*) 0)
            lp = np;
        return(lp);
    }

7. Compound Statements
Compound statements are statements that contain lists
of statements enclosed in braces. The enclosed list should
be tabbed over one more than the tab position of the com-
pound statement itself. The opening left brace should be at
the end of the line beginning the compound statement and the
closing right brace should be alone on a line, tabbed under
the beginning of the compound statement. Note that the left
brace beginning a function body is the only occurrence of a
left brace which is alone on a line. { The case and default
keywords may be indented further within a switch statement.}
7.1. Examples
    if (expr) {
        statement;
        statement;
    }

    if (expr) {
        statement;
        statement;
    } else {
        statement;
        statement;
    }

Note that the right brace before the else and the right
brace before the while of a do-while statement (below) are
the only places where a right brace appears that is not
alone on a line.
    for (i = 0; i < MAX; i++) {
        statement;
        statement;
    }

    while (expr) {
        statement;
        statement;
    }

    do {
        statement;
        statement;
    } while (expr);

    switch (expr) {
    case ABC:
    case DEF:
        statement;
        break;
    case XYZ:
        statement;
        break;
    default:
        statement;
        break;
    }

[ The last break is, strictly speaking, unnecessary, but it
is required nonetheless because it prevents a fall-through
error if another case is added later after the last one. ]
Note that when multiple case labels are used, they are
placed on separate lines. The fall through feature of the C
switch statement should rarely if ever be used when code is
executed before falling through to the next one. If this is
done it must be commented for future maintenance. { Falling
through is used more widely in NetHack, but it should still
be commented if the code is split as described above. }
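A commented fall-through of the kind just described might
look like the sketch below. The weapon classes, damage
amounts, and the function damage() are hypothetical, and an
ANSI prototype is used so that current compilers accept the
code. The FALLTHROUGH comment is the marker traditionally
recognized by lint; it tells the reader that the missing
break is deliberate.

```c
#define WPN_NONE    0   /* hypothetical weapon classes */
#define WPN_LIGHT   1
#define WPN_HEAVY   2

#define DMG_BASE    1   /* hypothetical damage amounts */
#define DMG_BONUS   2

/*
 * damage(wclass)
 *
 * Damage done by a weapon of class wclass.
 */
int                     /* total damage, 0 if unarmed */
damage(int wclass)
{
    int dmg = 0;        /* running total */

    switch (wclass) {
    case WPN_HEAVY:
        dmg += DMG_BONUS;
        /* FALLTHROUGH */
    case WPN_LIGHT:
        dmg += DMG_BASE;
        break;
    default:
        break;
    }
    return(dmg);
}
```

Here code runs in the WPN_HEAVY case before control falls
into WPN_LIGHT, which is exactly the situation that must be
commented; stacked labels with no code between them, as in
the ABC/DEF example above, need no such comment.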
    if (strcmp(reply, "yes") == EQUAL) {
        statements for yes
        ...
    } else if (strcmp(reply, "no") == EQUAL) {
        statements for no
        ...
    } else if (strcmp(reply, "maybe") == EQUAL) {
        statements for maybe
        ...
    } else {
        statements for none of the above
        ...
    }

The last example is a generalized switch statement and the
tabbing reflects the switch between exactly one of several
alternatives rather than a nesting of statements.
8. Expressions
8.1. Operators
The old versions of equal-ops =+, =-, =*, etc. should
not be used. The preferred use is +=, -=, *=, etc. All
binary operators except . and -> should be separated from
their operands by blanks. [ Some judgement is called for in
the case of complex expressions, which may be clearer if
the ``inner'' operators are not surrounded by spaces and the
``outer'' ones are. ] In addition, keywords that are
followed by expressions in parentheses should be separated
from the left parenthesis by a blank. [ Sizeof is an
exception, see the discussion of function calls. Less
logically, so is return. ] Blanks should also appear after
commas in argument lists to help separate the arguments
visually. { Dropping the spaces after keywords and between
arguments is acceptable, especially if it avoids line
wrapping. } On the other hand, macros with arguments and
function calls should not have a blank between the name and
the left parenthesis. In particular, the C preprocessor
requires the left parenthesis to be immediately after the
macro name or else the argument list will not be recognized.
Unary operators should not be separated from their single
operand. Since C has some unexpected precedence rules,
all expressions involving mixed operators should be fully
parenthesized.
Examples
    a += c + d;
    a = (a + b) / (c * d);
    strp->field = str.fl - ((x & MASK) >> DISP);
    while (*d++ = *s++)
        ;   /* EMPTY BODY */

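One of the unexpected precedence rules mentioned above is
that == binds more tightly than &. The sketch below, with a
hypothetical MASK and hypothetical function names, writes
out how the compiler groups the unparenthesized test
x & MASK == 0 and sets it beside the fully parenthesized
form. ANSI prototypes are used so that current compilers
accept the code.

```c
#define MASK 07         /* hypothetical mask: low three bits */

/*
 * low_clear_as_parsed(x)
 *
 * The grouping the compiler gives to "x & MASK == 0":
 * MASK is compared with 0 first, then the result is and-ed with x.
 */
int                     /* always 0 for this MASK */
low_clear_as_parsed(int x)
{
    return(x & (MASK == 0));
}

/*
 * low_clear(x)
 *
 * The intended test, fully parenthesized.
 */
int                     /* 1 if the low three bits of x are clear */
low_clear(int x)
{
    return((x & MASK) == 0);
}
```

For x equal to 8 the parenthesized test yields 1, while the
grouping chosen by the compiler yields 0 for every x, since
MASK == 0 is false.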
8.2. Naming Conventions
Individual projects will no doubt have their own naming
conventions. There are some general rules however.
o An initial underscore should not be used for any user-
created names. [ Trailing underscores should be
avoided too. ] UNIX uses it for names that the user
should not have to know (like the standard I/O
library). [ This convention is reserved for system
purposes. If you must have your own private identi-
fiers, begin them with a capital letter identifying the
package to which they belong. ]
o Macro names, typedef names, and define names should be
all in CAPS. { NetHack typedef names tend to be
lowercase. }
o Variable names, structure tag names, and function names
should be in lower case. [ It is best to avoid names
that differ only in case, like foo and FOO. The
potential for confusion is considerable. ] { Some
systems do not distinguish case in function names, so
especially avoid using both foo() and Foo(). } Some
macros (such as getchar and putchar) are in lower case
since they may also exist as functions. Care is needed
when interchanging macros and functions since functions
pass their parameters by value whereas macros pass
their arguments by name substitution. [ This differ-
ence also means that carefree use of macros requires
care when they are defined. Remember that complex
expressions can be used as parameters, and operator-
precedence problems can arise unless all occurrences of
parameters in the definition have parentheses around
them. There is little that can be done about the
problems caused by side effects in parameters except to
avoid side effects in expressions (a good idea anyway).]
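The operator-precedence problem with macro parameters can
be seen in a small sketch. SQ_BARE, SQ, and square() are
hypothetical names used only for illustration, and the
function has an ANSI prototype so that current compilers
accept it.

```c
/* Hypothetical macros for illustration. */
#define SQ_BARE(x)  x * x           /* parameters not parenthesized */
#define SQ(x)       ((x) * (x))     /* each parameter and the whole body */

/*
 * square(x)
 *
 * Function version: the argument is evaluated once and
 * passed by value.
 */
int                     /* x squared */
square(int x)
{
    return(x * x);
}
```

SQ_BARE(2 + 1) expands to 2 + 1 * 2 + 1, which is 5, while
SQ(2 + 1) and square(2 + 1) both give 9. Parentheses do not
help with side effects: SQ(i++) still expands to two
increments of i, which is why such arguments are best
avoided altogether.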
8.3. Constants
Numerical constants should not be coded directly. [ At
the very least, any directly-coded numerical constant must
have a comment explaining the derivation of the value. ]
{ There are, however, a number of NetHack constants that
have no particular derivation, being numbers picked out of
the air for a particular piece of code, often indicating the
probability of something. As long as these constants do not
occur outside the limited bit of code, don't bother forcing
a name on something that doesn't want one. } The define
feature of the C preprocessor should be used to assign a
meaningful name. This will also make it easier to admin-
ister large programs since the constant value can be changed
uniformly by changing only the define. The enumeration data
type is the preferred way to handle situations where a
variable takes on only a discrete set of values, since
additional type checking is available through lint.
{ However, some older compilers cannot handle enumeration
types, so it's back to defined constants. }
There are some cases where the constants 0 and 1 may
appear as themselves instead of as defines. For example if
a for loop indexes through an array, then
    for (i = 0; i < ARYBOUND; i++)

is reasonable, as is

    fptr = fopen(filename, "r");
    if (fptr == (FILE*)0)
        error("can't open %s\n", filename);

In the last example, although the defined constant NULL is
available as part of most versions of the standard I/O
library's header file, stdio.h, its type is uncertain and it
requires considerable thought to decide when an uncast NULL
is safe for all allowed (and disallowed, but occurring!)
definitions of NULL and all degrees of prototyping. Since a
cast is sometimes necessary even with NULL, just explicitly
cast 0 to the pointer type in each use. { The wording of
this paragraph has changed from the original to suit the
needs of NetHack. }
{ The rest of this document is not binding on NetHack. One
added point, though -- #ifdef/#else/#endif sets should have
a single space inserted between the `#' and the first letter
for each ifdef nesting level (not counting OVLx, or
HEADER_H in header files). #else and #endif lines should be
commented with a reminder of their condition, unless the
conditional section is very short. }
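One reading of the #ifdef layout rule above is sketched
below: the outermost conditional has no space after the #,
and each nesting level adds one. PORT_UNIX, PORT_BSD,
PORT_NAME, and port_name() are hypothetical names, defined
here only so that the fragment is self-contained; a real
build would set such macros in a configuration header. An
ANSI prototype is used so that current compilers accept the
code.

```c
/*
 * Hypothetical configuration macros; normally set elsewhere.
 */
#define PORT_UNIX
#define PORT_BSD

#ifdef PORT_UNIX
# ifdef PORT_BSD
#define PORT_NAME "unix-bsd"
# else /* !PORT_BSD */
#define PORT_NAME "unix-other"
# endif /* PORT_BSD */
#else /* !PORT_UNIX */
#define PORT_NAME "non-unix"
#endif /* PORT_UNIX */

/*
 * port_name()
 *
 * Name of the port selected by the conditionals above.
 */
const char *            /* pointer to a constant string */
port_name(void)
{
    return(PORT_NAME);
}
```

The comments on the #else and #endif lines repeat the
condition they belong to, so a reader who lands in the
middle of a long conditional section can tell which branch
is in effect.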
9. Portability
The advantages of portable code are well known. This
section gives some guidelines for writing portable code,
where the definition of portable is taken to mean that a
source file contains portable code if it can be compiled and
executed on different machines with the only source change
being the inclusion of possibly different header files. The
header files will contain defines and typedefs that may vary
from machine to machine. Reference [1] contains useful
information on both style and portability. Many of the
recommendations in this document originated in [1]. The
following is a list of pitfalls to be avoided and recommen-
dations to be considered when designing portable code:
o First, one must recognize that some things are
inherently non-portable. Examples are code to deal
with particular hardware registers such as the program
status word, and code that is designed to support a
particular piece of hardware such as an assembler or
I/O driver. Even in these cases there are many rou-
tines and data organizations that can be made machine
independent. It is suggested that source files be
organized so that the machine-independent code and the
machine-dependent code are in separate files. Then if
the program is to be moved to a new machine, it is a
much easier task to determine what needs to be
changed. [ If you #ifdef dependencies, make sure that
if no machine is specified, the result is a syntax
error, not a default machine! ] It is also possible
that code in the machine-independent files may have
uses in other programs as well.
o Pay attention to word sizes. The following sizes apply
to basic types in C for the machines that will be used
most at IH: [ The 3B is a Bell Labs machine. The VAX,
not shown in the table, is similar to the 3B in these
respects. The 68000 resembles either the pdp11 or the
3B, depending on the particular compiler. ]
    type     pdp11    3B    IBM
    ____________________________
    char        8      8      8
    short      16     16     16
    int        16     32     32
    long       32     32     32

In general if the word size is important, short or long
should be used to get 16 or 32 bit items on any of the
above machines. [ Any unsigned type other than plain
unsigned int should be typedefed, as such types are
highly compiler-dependent. This is also true of long
and short types other than long int and short int.
Large programs should have a central header file which
supplies typedefs for commonly-used width-sensitive
types, to make it easier to change them and to aid in
finding width-sensitive code. ] If a simple loop
counter is being used where either 16 or 32 bits will
do, then use int, since it will get the most efficient
(natural) unit for the current machine. [ Beware of
making assumptions about the size of pointers. They
are not always the same size as int. Nor are all
pointers always the same size, or freely intercon-
vertible. Pointer-to-character is a particular trouble
spot on machines which do not address to the byte. ]
o Word size also affects shifts and masks. The code
    x &= 0177770
will clear only the three rightmost bits of an int on a
PDP11. On a 3B it will also clear the entire upper
halfword. Use
    x &= ~07
instead which works properly on all machines. [ The or
operator ( | ) does not have these problems, nor do
bitfields (which, unfortunately, are not very portable
due to defective compilers). ]
o Code that takes advantage of the two's complement
representation of numbers on most machines should not
be used. Optimizations that replace arithmetic operations
with equivalent shifting operations are particularly
suspect. You should weigh the time savings against
the potential for obscure and difficult bugs when your
code is moved, say, from a 3B to a 1A.
o Watch out for signed characters. On the PDP-11, char-
acters are sign extended when used in expressions,
which is not the case on any other machine. In partic-
ular, getchar is an integer-valued function (or macro)
since the value of EOF for the standard I/O library is
-1, which is not possible for a character on the 3B or
IBM. [ Actually, this is not quite the real reason why
getchar returns int, but the comment is valid: code
which assumes either that characters are signed or that
they are unsigned is unportable. It is best to
completely avoid using char to hold numbers. Manip-
ulation of characters as if they were numbers is also
often unportable. ]
o The PDP-11 is unique among processors on which C exists
in that the bytes are numbered from right to left
within a word. All other machines (3B, IBM, Interdata
8/32, Honeywell) number the bytes from left to right.
[ Actually, there are some more right-to-left machines
now, but the comments still apply. ] Hence any code
that depends on the left-right orientation of bits in a
word deserves special scrutiny. Bitfields within
structure members will only be portable so long as two
separate fields are never concatenated and treated as
a unit. [1,3] [ The same applies to variables in
general. Alignment considerations and loader
peculiarities make it very rash to assume that two
consecutively-declared variables are together in
memory, or that a variable of one type is aligned
appropriately to be used as another type. ]
o Do not default the boolean test for non-zero, i.e.
if (f() != FAIL)
is better than
if (f())
even though FAIL may have the value 0 which is con-
sidered to mean false by C. [ A particularly notorious
case is using strcmp to test for string equality, where
the result should never ever be defaulted. The
preferred approach is to define a macro STREQ:
#define STREQ(a, b) (strcmp((a), (b)) == 0)
]
This will help you out later when somebody decides that
a failure return should be -1 instead of 0. [ An
exception is commonly made for predicates, which are
functions which meet the following restrictions:
o Has no other purpose than to return true or false.
o Returns 0 for false, 1 for true, nothing else.
o Is named so that the meaning of (say) a `true'
return is absolutely obvious. Call a predicate isvalid
or valid, not checkvalid.
]
o Be suspicious of numeric values appearing in the code.
Even simple values like 0 or 1 could be better
expressed using defines like FALSE and TRUE (see previ-
ous item). [ Actually, YES and NO often read better. ]
Any other constants appearing in a program would be
better expressed as a defined constant. This makes it
easier to change and also easier to read.
o Become familiar with existing library functions and
defines. [ But not too familiar. The internal details
of library facilities, as opposed to their external
interfaces, are subject to change without warning.
They are also often quite unportable. ] You should not
be writing your own string compare routine, or making
your own defines for system structures. [ Or,
especially, writing your own code to control terminals.
Use the termcap package. ] Not only does this waste
your time, but it prevents your program from taking
advantage of any microcode assists or other means of
improving performance of system routines. [ It also
makes your code less readable, because the reader has
to figure out whether you're doing something special in
that reimplemented stuff to justify its existence.
Furthermore, it's a fruitful source of bugs. ]
o Use lint. It is a valuable tool for finding machine-
dependent constructs as well as other inconsistencies
or program bugs that pass the compiler. [ The use of
lint on all programs is strongly recommended. It is
difficult to eliminate complaints about functions whose
return value is not used (in the current version of C,
at least), but most other messages from lint really do
indicate something wrong. The -h, -p, -a, -x, and -c
options are worth learning. All of them will complain
about some legitimate things, but they will also pick
up many botches. Note that -p checks function-call
type-consistency for only a subset of Unix library
routines, so programs should be linted both with and
without this option for best ``coverage''. ]
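The shift-and-mask item in the list above can be checked
with a short sketch. The function names are hypothetical,
unsigned int is used instead of the int of the original
text to keep the bit operations well defined, and ANSI
prototypes are used so that current compilers accept the
code.

```c
/*
 * clear_low3_literal(x)
 *
 * Literal mask written for a 16-bit word: when unsigned int
 * is wider than 16 bits it also clears every bit above bit 15.
 */
unsigned int            /* x with only bits 3..15 kept */
clear_low3_literal(unsigned int x)
{
    return(x & 0177770u);
}

/*
 * clear_low3(x)
 *
 * Complemented mask: ~07u has every bit set except the low
 * three, whatever the width of unsigned int.
 */
unsigned int            /* x with the low three bits cleared */
clear_low3(unsigned int x)
{
    return(x & ~07u);
}
```

Given a word of all ones, clear_low3() returns all ones
less 7 on any machine, while clear_low3_literal() returns
0177770 and so loses the upper bits wherever the word is
wider than 16 bits.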
10. Lint
Lint is a C program checker [2] that examines C source
files to detect and report type incompatibilities, incon-
sistencies between function definitions and calls, potential
program bugs, etc. It is expected that projects will
require programs to use lint as part of the official accep-
tance procedure. [ Yes. ] In addition, work is going on in
department 5521 to modify lint so that it will check for
adherence to the standards in this document.
It is still too early to say exactly which of the
standards given here will be checked by lint. In some cases,
such as whether a comment is misleading or incorrect, there
is little hope of mechanical checking. In other cases, such
as checking that the opening brace of a function body is
alone on a line in column 1, the test has already been
added. [ Little of this is relevant at U of T. The version
of lint that we have lacks these mods. ] Future bulletins
will be used to announce new additions to lint as they
occur.
It should be noted that the best way to use lint is not
as a barrier that must be overcome before official acceptance
of a program, but rather as a tool to use whenever
major changes or additions to the code have been made. Lint
can find obscure bugs and ensure portability before problems
occur.
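One complaint noted earlier in this document is the one about function return values that are not used. The sketch below shows two ways of dealing with it: test the value, or mark the discard as deliberate with a cast to void. The cast is a long-standing convention, but whether a particular version of lint honours it is an assumption to check locally. put_line and report are hypothetical names, and the code is ANSI C.

```c
#include <stdio.h>

/* Write msg and a newline to fp; return 0 on success, -1 on error. */
static int
put_line(FILE *fp, const char *msg)
{
    /* The return values of fputs and putc are tested, not dropped. */
    if (fputs(msg, fp) == EOF || putc('\n', fp) == EOF) {
        return -1;
    }
    return 0;
}

/* Best-effort diagnostic: nothing useful can be done if stderr fails. */
static void
report(const char *msg)
{
    (void) put_line(stderr, msg);   /* result deliberately ignored */
}
```

The cast costs nothing at run time. Its value is as documentation: the reader can tell an intentional discard from an oversight.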
11. Special Considerations
This section contains some miscellaneous do's and
don'ts.
o Don't change syntax via macro substitution. It makes
the program unintelligible to all but the perpetrator.
o There is a time and a place for embedded assignment
statements. [ The ++ and -- operators count as assign-
ment statements. So, for many purposes, do functions
with side effects. ] In some constructs there is no
better way to accomplish the results without making the
code bulkier and less readable. The while loop in
section 8.1 is one example of an appropriate place.
Another is the common code segment:
    while ((c = getchar()) != EOF) {
        /* process the character */
    }
Using embedded assignment statements to improve run-
time performance is also possible. However, one should
consider the tradeoff between increased speed and
decreased maintainability that results when embedded
assignments are used in artificial places. For exam-
ple, the code:
    a = b + c;
    d = a + r;
should not be replaced by
    d = (a = b + c) + r;
even though the latter may save one cycle. Note that
in the long run the time difference between the two
will decrease as the optimizer gains maturity, while
the difference in ease of maintenance will increase as
the human memory of what's going on in the latter piece
of code begins to fade. [ Note also that side effects
within expressions can result in code whose semantics
are compiler-dependent, since C's order of evaluation
is explicitly undefined in most places. Compilers do
differ. ]
o There is also a time and place for the ternary ? :
operator and the binary comma operator. The logical
expression operand before the ? : should be
parenthesized:
    (x >= 0) ? x : -x
Nested ? : operators can be confusing and should be
avoided if possible. There are some macros like
getchar where they can be useful. The comma operator
can also be useful in for statements to provide multi-
ple initializations or incrementations.
o Goto statements should be used sparingly, as in any
well-structured code. [ The continue statement is
almost as bad. Break is less troublesome. ] The main
place where they can be usefully employed is to break
out of several levels of switch, for, and while
nesting, e.g.
    for (...)
        for (...) {
            ...
            if (disaster)
                goto error;
        }
    ...
error:
    /* clean up the mess */
[ The need to do such a thing may indicate that the
inner constructs should be broken out into a separate
function, with a success/failure return code. ]
When a goto is necessary the accompanying label should
be alone on a line and tabbed one tab position to the
left of the associated code that follows.
o This committee recommends that programmers not rely on
automatic beautifiers for the following reasons.
First, the main person who benefits from good program
style is the programmer himself. This is especially
true in the early design of handwritten algorithms or
pseudo-code. Automatic beautifiers can only be applied
to complete, syntactically correct programs and hence
are not available when the need for attention to white
space and indentation is greatest. It is also felt
that programmers can do a better job of making clear
the complete visual layout of a function or file, with
the normal attention to detail of a careful program-
mer. [ In other words, some of the visual layout is
dictated by intent rather than syntax. Beautifiers
cannot read minds. ] Sloppy programmers should learn
to be careful programmers instead of relying on a beau-
tifier to make their code readable. Finally, it is
felt that since beautifiers are non-trivial programs
that must parse the source, the burden of maintaining
them in the face of the continuing evolution of C is
not worth the benefits gained by such a program.
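Two of the points in the list above can be sketched in a few lines of ANSI C. The first function follows the annotation on goto: the nested loops live in a function of their own, and an early return with a failure code takes the place of the jump. The second uses the comma operator in a for statement for paired initializations and updates. OK, FAIL, NROWS, NCOLS, sum_cells, and reverse are hypothetical names, and treating a negative cell as the "disaster" is only an assumption for the example.

```c
#define OK    0     /* hypothetical success code */
#define FAIL  (-1)  /* hypothetical failure code */
#define NROWS 3
#define NCOLS 4

/*
 * Add up every cell of grid.  A negative cell is treated as an error.
 * On success store the total in *sump and return OK; otherwise leave
 * *sump untouched and return FAIL.
 */
static int
sum_cells(int grid[NROWS][NCOLS], long *sump)
{
    int r, c;
    long sum = 0;

    for (r = 0; r < NROWS; r++) {
        for (c = 0; c < NCOLS; c++) {
            if (grid[r][c] < 0) {
                return FAIL;    /* leaves both loops at once */
            }
            sum += grid[r][c];
        }
    }
    *sump = sum;
    return OK;
}

/* Reverse the first n elements of a in place. */
static void
reverse(int *a, int n)
{
    int i, j, tmp;

    /* Comma operator: two initializations and two updates. */
    for (i = 0, j = n - 1; i < j; i++, j--) {
        tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

The caller of sum_cells tests the return code and does any cleaning up itself, so the cleanup code sits beside the call rather than at the far end of a label.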
12. Project Dependent Standards
Individual projects may wish to establish additional
standards beyond those given here. The following issues are
some of those that should be addressed by each project program
administration group.
o What additional naming conventions should be followed?
In particular, systematic prefix conventions for func-
tional grouping of global data and also for structure
or union member names can be useful.
o What kind of include file organization is appropriate
for the project's particular data hierarchy?
o What procedures should be established for reviewing
lint complaints? A tolerance level needs to be esta-
blished in concert with the lint options to prevent
unimportant complaints from hiding complaints about
real bugs or inconsistencies.
o If a project establishes its own archive libraries, it
should plan on supplying a lint library file [2] to the
system administrators. This will allow lint to check
for compatible use of library functions.
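The first question above, on prefix conventions, might be answered along the following lines. This is only a sketch of one possible convention, not a recommendation from the original authors: a hypothetical symbol-table module uses sym_ for its global data and functions, SYM_ for its constants, and sy_ for the members of its structure, much as the system headers use st_ for the members of struct stat. The code is ANSI C.

```c
#include <string.h>

#define SYM_MAXNAME 16  /* hypothetical limits for the example */
#define SYM_MAXSYMS 8

struct symbol {
    char sy_name[SYM_MAXNAME];  /* sy_ marks members of struct symbol */
    int  sy_value;
};

static struct symbol sym_table[SYM_MAXSYMS];    /* module data: sym_ */
static int sym_count = 0;

/*
 * Add a symbol.  Return its index, or -1 if the table is full or the
 * name does not fit.
 */
int
sym_add(const char *name, int value)
{
    struct symbol *sp;

    if (sym_count >= SYM_MAXSYMS || strlen(name) >= SYM_MAXNAME) {
        return -1;
    }
    sp = &sym_table[sym_count];
    strcpy(sp->sy_name, name);  /* length checked above */
    sp->sy_value = value;
    return sym_count++;
}

/* Return a pointer to the named symbol, or a null pointer if absent. */
struct symbol *
sym_lookup(const char *name)
{
    int i;

    for (i = 0; i < sym_count; i++) {
        if (strcmp(sym_table[i].sy_name, name) == 0) {
            return &sym_table[i];
        }
    }
    return (struct symbol *)0;
}
```

With such a scheme a reader who meets sy_value or sym_count far from its declaration can tell at once which structure or module it belongs to.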
13. Conclusion
A set of standards has been presented for C programming
style. One of the most important points is the proper use
of white space and comments so that the structure of the
program is evident from the layout of the code. Another
good idea to keep in mind when writing code is that it is
likely that you or someone else will be asked to modify it
or make it run on a different machine sometime in the
future.
As with any standard, it must be followed if it is to
be useful. The Indian Hill version of lint will enforce
those standards that are amenable to automatic checking. If
you have trouble following any of these standards don't just
ignore them. Programmers at Indian Hill should bring their
problems to the Software Development System Group (Lee
Kirchhoff, contact) in department 5522. Programmers outside
Indian Hill should contact the Processor Application Group
(Layne Cannon, contact) in department 5512. [ At U of T
Zoology, it's Henry Spencer in 336B. ]
References
[1] B.A. Tague, "C Language Portability", Sept 22, 1977.
This document issued by department 8234 contains three
memos by R.C. Haight, A.L. Glasser, and T.L. Lyon deal-
ing with style and portability.
[2] S.C. Johnson, "Lint, a C Program Checker", Technical
Memorandum, 77-1273-14, September 16, 1977.
[3] R.W. Mitze, "The 3B/PDP-11 Swabbing Problem", Memoran-
dum for File, 1273-770907.01MF, September 14, 1977.
[4] R.A. Elliott and D.C. Pfeffer, "3B Processor Common
Diagnostic Standards- Version 1", Memorandum for File,
5514-780330.01MF, March 30, 1978.
[5] R.W. Mitze, "An Overview of C Compilation of UNIX User
Processes on the 3B", Memorandum for File, 5521-
780329.02MF, March 29, 1978.
[6] B.W. Kernighan and D.M. Ritchie, The C Programming
Language, Prentice-Hall 1978.
/*
* The C Style Summary Sheet Block comment,
* by Henry Spencer, U of T Zoology describes file.
*/
#include <errno.h> Headers; don't nest.
typedef int SEQNO; /* ... */ Global definitions.
#define STREQ(a, b) (strcmp((a), (b)) == 0)
static char *foo = (char *)0; /* ... */ Global declarations.
struct bar { Static whenever poss.
SEQNO alpha; /* ... */
# define NOSEQNO 0
int beta; /* ... */ Don't assume 16 bits.
};
/*
* Many unnecessary braces, to show where. Functions.
*/
static int /* what is returned */ Don't default int.
bletch(a)
int a; /* ... */ Don't default int.
{
int bar; /* ... */
extern int errno; /* ..., changed here */
extern char *index();
if (foobar() != FAIL) { if (!isvalid()) {
return(OK); errno = ERANGE;
} } else {
x = &y + z->field;
while (x == (y & MASK)) { }
f += (x >= 0) ? x : -x;
} for (i = 0; i < BOUND; i++) {
/* lint -h[p]cax. */
do { }
/* Avoid nesting ?: */
} while (index(a, b) != (char*)0); if (STREQ(x, "foo")) {
x |= 07; /* 07 is... */
switch (...) { } else if (STREQ(x, "bar")) {
case ABC: x &= ~077; /* 077 is... */
case DEF: } else if (STREQ(x, "ugh")) {
printf("...", a, b); /* Avoid gotos */
break; } else {
case XYZ: /* and continues. */
x = y; }
/* FALLTHROUGH */
default: while ((c = getchar()) != EOF)
/* Limit embedded =s. */ ; /* NULLBODY */
break;
}
}