Failures cause a lot of frustration, both when writing and when using software. One reason is that failures sometimes produce obscure error messages that explain neither what went wrong nor how to fix it. In the worst cases, the problem does not become clear even after several internet searches. This is not only an example of poor software but also of poor service towards your users. After writing and reading a lot of error messages, I saw that there are three things every error message needs in order to be descriptive and usable.
What went wrong?
It amazes me that some error messages fail to describe what went wrong. Asking yourself what went wrong should be a very basic part of handling a failure. The first part of the error message should describe what the software was doing, providing context from the beginning. This part usually names the action that failed, such as ‘Cannot read value...’, ‘Unable to contact…’, ‘Failed to load…’.
Describing what went wrong is just as important as describing why it went wrong. If you only know the reason for a problem, but not the problem itself, you are no wiser. ‘Cannot read value from file because file was in invalid format’ is a lot more descriptive than ‘File was in invalid format’. However, people often forget this first part because, when writing software, the context seems clear from the code. What we forget is that our code may run in different scenarios, where this context might be missing. When you start your error message with the action that failed, you have already made defects much easier to localize.
Why did it go wrong?
Just as often as the context, the reason is missing from error messages. Describing why something went wrong is critical, as it points to the place where you have to start your search for a fix. Is it something about the input? Something related to the environment? Missing information, invalid information or unavailable information are common reasons for failure.
Only describing what went wrong without describing why it went wrong is just as bad as forgetting to provide context. ‘Cannot write to file because it does not exist’ already gives you an idea why the ‘write’ function fails, but ‘Cannot write to file’ is so obscure that it is almost as bad as providing no error message at all. These types of error messages are also often a sign of poor error handling: they suggest that many different types of failures all receive the same error message, and that the reason for the failure was not known at the point where the error message was constructed. Make sure to include a reason why the failure occurred, and if you cannot determine a reason, fine-tune your error handling until you can.
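To make the idea of fine-tuning your error handling a bit more concrete, here is a minimal Python sketch. The function name `append_line` and the exact wording of the messages are my own assumptions for illustration. It catches each failure type separately, so that every message can state its own reason instead of a generic ‘Cannot write to file’.

```python
import os
from pathlib import Path


def append_line(path: Path, line: str) -> None:
    """Append a line to an existing file (hypothetical helper for illustration)."""
    try:
        # Mode "r+" refuses to create the file, so a missing file is a failure here.
        with path.open("r+", encoding="utf-8") as handle:
            handle.seek(0, os.SEEK_END)  # move to the end before writing
            handle.write(line + "\n")
    # Each failure type gets its own reason instead of one catch-all message.
    except FileNotFoundError as error:
        raise OSError(
            f"Cannot write to file '{path}' because it does not exist"
        ) from error
    except IsADirectoryError as error:
        raise OSError(
            f"Cannot write to file '{path}' because it is a directory"
        ) from error
    except PermissionError as error:
        raise OSError(
            f"Cannot write to file '{path}' because the current user is not allowed to write to it"
        ) from error
```

Raising with `from error` keeps the original exception attached, so the technical details are not lost when the friendlier message is shown.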
How to stop it from going wrong?
Adding context and a reason to your error message is already a big improvement and will put you in the group of better-styled failures, but there is more. People should know what to do after reading your error message. If that action is not clear, and it usually is not, then you should also end your error message with a description of how to fix the problem. ‘Cannot contact the service because the service did not respond, please make sure to start the service’ tells the user what to do. Because it also has a context and a reason, the user knows why they are doing it. Saying only ‘Please make sure to start the service’ does not help, and people will be hesitant to take this action if they do not know why.
A description of the fix without a context or a reason may seem like a ‘good enough’ error message in some cases, but it usually is not. ‘Cannot contact service, please start the service’ or ‘Service did not respond, please start the service’ may seem OK because of the simplicity of the error message. With a real-life error message, you can see the problem: ‘Cannot POST HTTP request to ‘https://my-service-is-running-here/’ because the service did not respond with a 202 Accepted, please make sure the service is enabled to handle these requests.’
Look at what happens when we remove the context or the reason: ‘Service did not respond with a 202 Accepted, please make sure the service is enabled to handle these requests’ or ‘Cannot POST HTTP request to https://my-service-is-running-here/, please make sure the service is enabled to handle these requests.’ These are still error messages that leave the user in the dark. And because the message does not carry enough information, the suggested action may not actually fix the problem, which can be a symptom of poor software quality. Then you have a real problem.
💡 Referring to an external URL could help if the action to be taken is too long or complex to be included in the error message itself. But make sure to at least give an idea as to what might help.
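As a small sketch of how the three parts fit together, the Python function below rebuilds the real-life message from above out of a context, a reason, and a solution. The function name `describe_post_failure` and the optional `help_url` parameter are assumptions of mine, not part of any library; the ‘(see …)’ suffix is just one possible way to attach an external link.

```python
from http import HTTPStatus
from typing import Optional


def describe_post_failure(
    url: str, expected: HTTPStatus, help_url: Optional[str] = None
) -> str:
    """Build a three-part error message for a failed POST (illustrative only)."""
    # 1. Context: what was the software trying to do?
    context = f"Cannot POST HTTP request to '{url}'"
    # 2. Reason: why did it fail?
    reason = f"the service did not respond with a {expected.value} {expected.phrase}"
    # 3. Solution: what can the user do about it?
    solution = "please make sure the service is enabled to handle these requests"

    message = f"{context} because {reason}, {solution}"
    if help_url is not None:
        # Longer instructions can live behind a link, but the hint above stays.
        message += f" (see {help_url})"
    return message


print(describe_post_failure("https://my-service-is-running-here/", HTTPStatus.ACCEPTED))
```

Keeping the three parts as separate variables makes it obvious when one of them is missing.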
Conclusion
Software is often written with only a ‘success’ path in mind. Because of this, failures are often handled globally and error messages are reduced to obscure sentences or even single words. You would think that this three-pillar error message structure is a commonly known concept and that the big software companies use this format, but that is anything but true.
This structure is something that I use when handling failures. It is not something that I have read or learned somewhere. It is by reading and handling a lot of errors that I realized that a good error message is actually constructed out of three separate parts: a context, a reason, and a solution.
Thanks for reading!
Stijn
P.S. If you really want to build this into your software, you could enforce this structure throughout.
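One possible way to do that, sketched here in Python, is a base exception that cannot be created without all three parts. The class name `DescriptiveError` and its keyword arguments are hypothetical; this is an illustration under those assumptions, not an existing library.

```python
class DescriptiveError(Exception):
    """Hypothetical base error that requires a context, a reason, and a solution."""

    def __init__(self, *, context: str, reason: str, solution: str) -> None:
        parts = {"context": context, "reason": reason, "solution": solution}
        missing = [name for name, value in parts.items() if not value.strip()]
        if missing:
            # Even this failure follows the three-part structure.
            raise ValueError(
                "Cannot create error message because these parts are empty: "
                + ", ".join(missing)
                + ", please provide a non-empty text for each part"
            )
        # Keep the parts separately, for example for structured logging.
        self.context = context
        self.reason = reason
        self.solution = solution
        super().__init__(f"{context} because {reason}, {solution}")


error = DescriptiveError(
    context="Cannot contact the service",
    reason="the service did not respond",
    solution="please make sure to start the service",
)
print(error)
```

Because the arguments are keyword-only, every place that raises such an error has to spell out which text is the context, which is the reason, and which is the solution.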