Recently I was training a team in Clean Code for Embedded Systems. The team were very keen to define a coding standard, but as we talked, it became increasingly clear they were more interested in what I would call a coding style. Let me start with a couple of definitions that I use:
Coding Style: The aspects of the source code that are independent of the code generated: what the code looks like, its layout, indentation, white space, naming, etc. They are defined with the intent of making the code readable.
Coding Standard: Restrictions on the use of a programming language, often with the intent of producing a safer subset of the language.
In this post, I am focussing on coding style. I will consider why it is important, include examples of various styles, share the style I prefer and finally share how I encourage teams to apply coding style.
NOTE: Some formal coding standards do include aspects of code style, where poor readability can make code dangerous to modify.
So why separate style from standard, and treat them as different? I think there are several reasons:
- Coding standards can be debated on the basis of known issues that affect the portability and reliability of code; coding style is much more a matter of opinion.
- In a large organisation, multiple code bases may be developed over many years. It may be desirable to move these to a common coding standard over time, but there is probably less reason to change the style.
- Enforcement: violating a coding style is less dangerous, so less rigour needs to be applied when there is a deviation.
- Automatic application: many aspects of a coding style can be applied automatically, because applying them does not change what the code does. This is not the case with a coding standard.
Why is a coding style important?
As software engineers, we spend far more time reading code than writing it. Even while writing code, it is estimated that we spend around ten times as long reading and rereading what we have written as we do writing it. When we work as a team, we also spend a significant amount of time reading the code other team members have written, whether pairing, code reviewing or maintaining. A consistent style helps us read our own and each other’s code more easily.
How to define a coding style for your company/team?
Coding style is to a large extent a matter of opinion. In my experience, if I have a team of six engineers and ask them to define a coding style, typically I will get seven or eight proposals. Coding style is one area of software engineering where it helps to have a leader who can make the ultimate decision. My preference is to have a team discussion, share opinions and then the leader makes a decision based on the opinions that have been heard. Typically the details of the coding style are less important than the chosen style being followed by the whole team. It’s team rules that count.
What goes into a coding style?
Clean Code covers three topic areas that, in my opinion, should go into a coding style: naming, comments and formatting. Let’s look at these in turn.
Naming
Naming is important. Robert C. Martin states, “You should name a variable using the same care with which you name a first-born child”. I typically take several attempts at getting good names, and I rename identifiers frequently. Below are a few guidelines to consider when naming things.
Reveal Intention
Names should reveal their intention; it’s better to rename a poorly named variable than to add a comment.
double temp; // Temperature in degrees C
double temperatureDegreesC;
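The same applies to magic numbers: a named constant can replace a comment explaining what a literal means. Here is a minimal sketch. It assumes a hypothetical 85 °C operating limit, chosen purely for illustration. Compare it with writing temp > 85 followed by a comment saying what 85 is.

```c
#include <stdbool.h>

/* Hypothetical limit, chosen only for illustration. */
#define MAX_OPERATING_TEMPERATURE_DEGREES_C 85.0

/* The names carry the meaning, so no explanatory comment is needed. */
bool IsOverTemperature(double temperatureDegreesC)
{
    return temperatureDegreesC > MAX_OPERATING_TEMPERATURE_DEGREES_C;
}
```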
Don’t tell lies
For example, the code below lies about the type of measurementList: the name implies that it is a list when in fact it is an array.
Measurement measurementList[MAX_MEASUREMENTS];
Distinguishable
Use names that can easily be distinguished, and avoid very long identifiers that only differ at the end.
int veryLongIntegerIdentifierThatOnlyChangesAtTheEnd1;
int veryLongIntegerIdentifierThatOnlyChangesAtTheEnd2;
Do I need to say…
Avoid identifiers that could be confused with numbers. Apparently code like this exists, although fortunately I’ve never seen it in the wild!
int product(int O, int I)
{
    return O*I;
}
Distinguish arguments
Consider these declarations for strncpy:
char *strncpy(char *string1, const char *string2, size_t count);
char *strncpy(char *dest, const char *src, size_t n);
char * strncpy ( char * destination, const char * source, size_t num );
Which version has the most useful argument names? In the first version, we need to refer to the documentation to know which is the source and which is the destination. The second is clearer, but consider for a moment how it reads if English isn’t your first language and you aren’t used to abbreviations. (There was a good reason for abbreviations once upon a time: I remember editing source code in the 1980s to shorten the identifiers so that the compiler could fit a source module in memory!) The final version is the clearest, although I prefer count to num for the final argument.
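Well-chosen parameter names make a function’s contract readable without the documentation. Here is a sketch of a bounded copy whose names say which buffer is which and what the size refers to. CopyString is hypothetical, not a standard library function. Unlike strncpy, which leaves the destination unterminated when the source is too long, this sketch always terminates it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies at most destinationSize - 1 characters
   from source, always null-terminates destination, and returns the
   number of characters copied. */
size_t CopyString(char *destination, const char *source, size_t destinationSize)
{
    if (destinationSize == 0)
    {
        return 0; /* no room even for the terminator */
    }

    size_t sourceLength = strlen(source);
    size_t copyLength = (sourceLength < destinationSize) ? sourceLength
                                                         : destinationSize - 1;

    memcpy(destination, source, copyLength);
    destination[copyLength] = '\0';
    return copyLength;
}
```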
Don’t put types in names
Having the type in a name just adds noise. It was useful back in the days when we had to print out source code to read a module, but in modern IDEs the type is shown when you hover over an identifier.
Consider the example below: String is just noise, and the simple name is better.
char name[MAX_LENGTH];
char nameString[MAX_LENGTH];
Distinguishable Identifiers
Avoid having similar function names that don’t clearly distinguish what they do. For instance, what is the difference between these functions? It may be possible to guess, but it isn’t obvious.
Sample* GetSample();
Sample* GetSamples();
SampleInfo* GetSampleInfo();
Pronounceable
How much easier to read is the second example? Imagine for a moment that English isn’t your first language and then look at the first example: it looks like a random collection of letters.
struct Mmnt
{
    Date genhmsdmy;
    Date modhmsdmy;
};
struct Measurement
{
    Date generationTimestamp;
    Date modificationTimestamp;
};
Hungarian notation
Hungarian notation is a system of encoding the type of a variable into its name. An example would be m_szName, identifying a member variable that is a zero-terminated string. The system was popular in some circles in the 1980s, and it had some use when code was printed out and pored over on paper listings. However, it has problems: types were changed without the names being updated, and readability suffered. Microsoft was one of its strongest proponents, but even Microsoft now recommends against its use.
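Here is a minimal sketch of how the encoding goes stale, using a hypothetical sensor struct. The counter was widened from 8 to 16 bits during maintenance, but its u8 prefix was never updated, so the name now lies about the type.

```c
#include <stdint.h>

/* Hypothetical struct: the member was originally uint8_t, was widened
   to uint16_t, and kept its now-misleading u8 prefix. */
struct SensorState
{
    uint16_t u8SampleCount;
};

/* A reader trusting the prefix would expect this to wrap at 255. */
void AddSamples(struct SensorState *state, uint16_t count)
{
    state->u8SampleCount += count;
}
```

A plain name such as sampleCount cannot go out of date in this way.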
Class Names
Prefer using nouns, for example Measurement, File, ReportWriter.
Function/Method Names
Prefer using verbs or verb phrases, for example getName, write, deleteRecord.
Vocabulary
Consider the vocabulary that you use. When you are referring to design patterns, use the names of the patterns; those reading the code are most likely software engineers, so the pattern names convey meaning. Also consider your customers’ language, and use a consistent metaphor across your code, user stories, tests and conversations with your users.
Comments
My views on comments have changed over the years. When I started out, code was considered good when it was well commented. I worked in businesses where coding standards required every variable, method, class and constant to be commented, with standardised comments that could be turned into printed documentation. Perhaps there was some value in this in the early days. Identifiers had to be kept short and as a consequence were not descriptive, and code reviews were done by printing out listings on z-fold paper. These days, however, most code is read on a screen, IDEs show type information when you hover over an identifier, and identifier length is rarely a practical constraint. IDEs also autocomplete identifiers, so long, meaningful names take little extra time to type.
These days I encourage teams not to mandate comments in their coding style. The only comments I would mandate (if there is a need) are copyright and license notices; everything else I would leave to the professionalism of the developers.
I would specifically ban commented-out code; if old code is needed, it can be retrieved from version control.
I have a large collection of bad comments, gathered (and anonymised) over the years. Perhaps I should write another post highlighting them!
Formatting
Formatting is all about clarity: being able to read the structure of the code quickly and accurately. I’m convinced that if I asked three software engineers which formatting rules they wanted to follow, we would get at least four opinions. In reality, I believe that consistent formatting within a code base is what matters, rather than any one particular style. So, without saying which is better, I will show a few examples of the same code formatted differently so that you can consider which is easiest to understand.
Vertical formatting
Consider the balance between two qualities. Openness is the separation between different concepts, and density is how much code you can see at a time. The example with the documentation comments is one I found in the wild (identifiers changed to anonymise it). All that changes between the three examples is vertical spacing and the removal of redundant comments. Which do you find easiest to read?
class Sample {
public:
    struct sTime {
        uint8_t Hour;
        uint8_t Minute;
        uint8_t Second;
    };
    struct sDate {
        uint8_t Day;
        uint8_t Month;
        uint8_t Year;
    };
};
class Sample {
public:

    struct sTime {
        uint8_t Hour;
        uint8_t Minute;
        uint8_t Second;
    };

    struct sDate {
        uint8_t Day;
        uint8_t Month;
        uint8_t Year;
    };
};
///----------------------------------------------------------------------------
/// <summary> A sample </summary>
class Sample {
public:
    ///------------------------------------------------------------------------
    /// <summary> a sample time </summary>
    struct sTime {
        ///--------------------------------------------------------------------
        /// <summary> The hour </summary>
        uint8_t Hour;
        ///--------------------------------------------------------------------
        /// <summary> The minute </summary>
        uint8_t Minute;
        ///--------------------------------------------------------------------
        /// <summary> The second </summary>
        uint8_t Second;
    };
    ///------------------------------------------------------------------------
    /// <summary> a sample date </summary>
    struct sDate {
        ///--------------------------------------------------------------------
        /// <summary> The Day </summary>
        uint8_t Day;
        ///--------------------------------------------------------------------
        /// <summary> The Month </summary>
        uint8_t Month;
        ///--------------------------------------------------------------------
        /// <summary> The Year </summary>
        uint8_t Year;
    };
};
While considering vertical formatting, the order of items is significant too. Prefer placing definitions as close as possible to where they are used, and reduce scope wherever possible. Consider placing dependent functions close together and in calling order; this has the effect of ordering the code in decreasing levels of abstraction.
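Here is a minimal sketch of that ordering in C. It assumes a hypothetical 12-bit ADC with a 3.3 V reference and a stubbed-out read. The public function comes first, followed by the static helpers it calls, in the order it calls them. Forward declarations are what let the highest-level function appear first in a C file. Making the helpers static limits their scope to the file.

```c
#include <stdint.h>

/* Assumed hardware parameters, for illustration only. */
#define ADC_FULL_SCALE_MV 3300
#define ADC_COUNTS 4096

/* Forward declarations allow the highest-level function to come first. */
static int32_t ReadRawSample(void);
static int32_t ScaleToMilliVolts(int32_t rawCounts);

/* Highest level of abstraction: what the caller wants. */
int32_t ReadVoltage_mV(void)
{
    int32_t rawCounts = ReadRawSample();
    return ScaleToMilliVolts(rawCounts);
}

/* Then the functions it calls, in the order it calls them. */
static int32_t ReadRawSample(void)
{
    return 2048; /* hypothetical stand-in for a real ADC read */
}

static int32_t ScaleToMilliVolts(int32_t rawCounts)
{
    return (rawCounts * ADC_FULL_SCALE_MV) / ADC_COUNTS;
}
```

With the stubbed reading of 2048 counts, ReadVoltage_mV() returns 1650.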
Horizontal formatting
Firstly, how long should the maximum line length be? Should there be a limit at all? Back in the day, a limit of 80 characters was common because that was the number of characters that fitted across a terminal. Then we got bigger terminals, and the limit went up to perhaps 132 characters. My current monitor fits well over 300 characters across an editor window at my chosen font size. The fact that they fit on screen doesn’t make 300-character lines a good idea, though!
Consider the openness versus the density of code horizontally. Which of these reads better?
bool AppendMessage(LogHandle self, const char *fileName, const LogMessage *message)
{
    File_Open(self->file, fileName, FileModeAppend);
    File_Write(self->file, message->file, strlen(message->file));
    File_Write(self->file, message->line, strlen(message->line));
    File_Write(self->file, message->error, strlen(message->error));
    return File_Close(self->file);
}
bool AppendMessage(LogHandle self,const char *fileName,const LogMessage *message)
{
    File_Open(self->file,fileName,FileModeAppend);
    File_Write(self->file,message->file,strlen(message->file));
    File_Write(self->file,message->line,strlen(message->line));
    File_Write(self->file,message->error,strlen(message->error));
    return File_Close(self->file);
}
Alignment is a detail on which I have changed my opinion. I used to like everything lined up in neat columns. Before code beautifiers, this could take a significant amount of time, but the code looked beautiful. Was it more readable, though? Look at these two examples: in which is it easier to determine which value is being assigned to which variable?
const int   exampleInt                      = 5;
const int   secondExample                   = 6734;
const int   thirdExampleThisOneHasALongName = 2;
const char* exampleString                   = "Hello World";
const int exampleInt = 5;
const int secondExample = 6734;
const int thirdExampleThisOneHasALongName = 2;
const char* exampleString = "Hello World";
Indentation and brace placement seem to cause wars between opposing camps. Which do you prefer?
Error GetLastError(){
    if(errorBuffer.Used() == 0)
    {
        return NoError;
    }
    return errorBuffer.GetLatest();
}
Inconsistent placement
Error GetLastError()
{
    if(errorBuffer.Used() == 0)
    {
        return NoError;
    }
    return errorBuffer.GetLatest();
}
Allman
Error GetLastError(){
    if(errorBuffer.Used() == 0) {
        return NoError;
    }
    return errorBuffer.GetLatest();
}
K&R
Error GetLastError() {
    if(errorBuffer.Used() == 0)
        return NoError;
    return errorBuffer.GetLatest();
}
Clean Code
How are dummy scopes written to be most readable, and least likely to cause errors when code is maintained?
while(c != getc(stdin));
while(c != getc(stdin))
;
while(c != getc(stdin))
{}
while(c != getc(stdin))
{
}
while(c != getc(stdin)){}
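Whichever form you choose, note that all of these loops assume the character in c eventually arrives. If the input ends first, getc returns EOF on every call and the loop never terminates. Below is a self-contained sketch that also stops at end of file. It uses one of the empty-body forms above purely for illustration, and SkipPast is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical helper: reads and discards characters from stream until
   target has been consumed. Returns target if it was found, or EOF if
   the stream ended first. */
int SkipPast(FILE *stream, int target)
{
    int c;

    /* Empty body: all the work happens in the loop condition. */
    while ((c = getc(stream)) != EOF && c != target)
    {
    }

    return c;
}
```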
Applying or Enforcing a Coding Style
I prefer coding styles to be applied in two distinct ways:
Firstly, for naming and commenting, the guidelines should be documented so that all team members know what is expected, but keep them light; the goal is readability. Application and enforcement then become part of the coding, refactoring and reviewing cycle: comments should be peer reviewed along with the code, and names should be scrutinised.
The application of formatting is different. I prefer to have a coded set of rules that can be applied automatically. Any developer should be able to take any source file and format it automatically using a shared team definition of the format. Most IDEs have code formatting built in these days, and the rules can be exported, checked into version control and shared throughout the team. Some code formatters can be integrated with version control systems to apply formatting rules automatically when code is checked in, either to the whole file or only to the changed lines.
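Automatic formatting can fight with a layout that is deliberately hand-made, such as a lookup table. clang-format, for example, leaves everything between a "// clang-format off" comment and a "// clang-format on" comment untouched. This is a minimal sketch; the baud-rate table, its values and LookUpDivisor are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct BaudSetting
{
    uint32_t baudRate;
    uint16_t divisor;
};

/* Hypothetical values, assuming a 6 MHz peripheral clock
   (divisor = clock / baud, truncated). The formatter is switched off
   so the hand-aligned columns survive automatic reformatting. */
// clang-format off
static const struct BaudSetting baudSettings[] =
{
    {   9600, 625 },
    {  19200, 312 },
    {  38400, 156 },
    { 115200,  52 },
};
// clang-format on

/* Returns the divisor for baudRate, or 0 if the rate is unsupported. */
uint16_t LookUpDivisor(uint32_t baudRate)
{
    for (size_t i = 0; i < sizeof baudSettings / sizeof baudSettings[0]; i++)
    {
        if (baudSettings[i].baudRate == baudRate)
        {
            return baudSettings[i].divisor;
        }
    }
    return 0;
}
```

Such escape hatches are best kept rare, or the shared format stops being shared.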
My personal preference
Naming
I prefer to use
- PascalCase for types and for functions.
- camelCase for members and for variables.
- UPPER_CASE for all #defines.
- PascalCase for enumeration values (I used to use UPPER_CASE, this is a recent change)
I do make exceptions; for instance, I may name a method that returns an electrical current in milliamps Current_mA(). I also break my naming preferences accidentally, particularly when I’m working with a client that has a different style, as I find it hard to switch.
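To show these conventions side by side, here is a small sketch. The power-channel types and the TotalCurrent_mA function are hypothetical, invented only to exercise each rule, including the unit-suffix exception.

```c
#include <stdint.h>

/* UPPER_CASE for #defines. */
#define MAX_CHANNELS 4

/* PascalCase for types and for enumeration values. */
typedef enum
{
    ChannelOff,
    ChannelOn
} ChannelState;

/* camelCase for members; current_mA carries its unit as a suffix. */
typedef struct
{
    ChannelState state;
    int32_t current_mA;
} PowerChannel;

/* PascalCase for functions, with the same _mA unit-suffix exception. */
int32_t TotalCurrent_mA(const PowerChannel channels[MAX_CHANNELS])
{
    /* camelCase for local variables. */
    int32_t totalCurrent_mA = 0;

    for (int channelIndex = 0; channelIndex < MAX_CHANNELS; channelIndex++)
    {
        if (channels[channelIndex].state == ChannelOn)
        {
            totalCurrent_mA += channels[channelIndex].current_mA;
        }
    }

    return totalCurrent_mA;
}
```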
Comments
I prefer to keep comments to a minimum. If required I add a copyright/license block at the top of a file. Otherwise, comments are limited to explaining intent when I have failed to express myself in the code.
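As a hypothetical illustration of the kind of comment I do keep, here is a file with a license block and one comment explaining a choice the code alone cannot justify. The copyright holder, the license and the protocol requirement are all made up for the example.

```c
/*
 * Copyright (c) <year> <copyright holder>
 * SPDX-License-Identifier: MIT
 */

#include <stdint.h>

/* Truncates towards zero instead of rounding to nearest because the
   (hypothetical) receiving protocol truncates too; rounding here would
   make the two ends disagree by one degree. */
int32_t ToWholeDegrees(int32_t tenthsOfDegrees)
{
    return tenthsOfDegrees / 10;
}
```

The comment records why, not what; the name and the code already say what.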
I prefer to use markdown files kept with the source code to express higher-level concepts and to provide any necessary documentation.
Formatting
I like Allman-style formatting, with braces on their own lines. I use spaces rather than tabs, and I indent by four spaces. I prefer to let my IDE apply the formatting rules; my current preferred IDE is Visual Studio Code with clang-format.