Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.

Considering When To Throw Errors, Why To Chain Them, And How To Report Them To Users

By Ben Nadel

The truth is, I am not great at dealing with Exceptions and Error reporting in web application development. And, as someone who's been in web development for over 15 years, this saddens me. But, I'm trying to do something about it. For the last several months, I've been reading articles, watching presentations, and listening to podcasts about error handling, all while constructing a foundational mental model for how I wanted to implement error handling in my own code. My mental model is not yet complete - and it will take more field experience in order to polish it; but, I'd like to share what I have so far.

The first thing I did on this journey was to start reading, watching, and listening to loads of content relating to error handling - just letting it wash over me. During this phase, I was doing nothing more than saturating my subconscious with interesting details. Then, once the content started to feel repetitive, I went back through all the content and extracted evidence that I found compelling in one way or another (see evidence below).

Once I had a good amount of evidence, I then went back and attempted to reduce the evidence down into a set of DO and DO NOT rules regarding error handling. I know that most expert programmers will say "it depends" to just about everything in computer science. But, that's not how my brain is wired (yet). I enjoy rules. I thrive on rules. And so, when I am learning, my mental model becomes an aggregation of rules that represent my understanding to date. That's not to say that these rules don't evolve over time - only to say that I capture what I've learned as concrete rules rather than as more abstract layers of wisdom.

That said, the following list contains the DO and DO NOT rules that I have come up with (so far) regarding error handling in web application development:

CAVEAT: I had already decided that throwing exceptions was cleaner than returning error codes. As such, I wasn't looking to see which approach was better - I was looking for evidence on how to go about error handling more coherently.

The DO Rules:

  • DO throw an error from any method that cannot carry out the function it was contracted to do.
  • DO design code that - at a minimum - provides the "Fundamental Guarantee", which says that no resources will be leaked and no component will be left in an undefined state if an error is thrown (use those "Finally" blocks to clean up after yourself!).
  • DO consider robustness far before performance. Errors should be rare enough that performance overhead doesn't really matter. If you suspect performance is an issue, measure the performance first so that you can make an educated decision about refactoring.
  • DO keep the granularity and robustness of error handling commensurate with the cost of failure. A pacemaker will necessarily have more error handling than a web-based TO-DO application.
  • DO have a global error handler at the boundary of your system (such as the web-application framework, aka the "delivery mechanism") that catches errors and recovers (preventing a crash). This handler should prevent technical details from leaking out of the system, while providing a meaningful response that the client can consume.
  • DO keep a centralized list of all the "error responses" that can be reported back to the user. This serves to document the behavior of the application and, as an added benefit, encourages consistency and facilitates internationalization of messages. The translation of a thrown error to one of these error responses should be done at the global / boundary error handler.
  • DO chain errors when you want to semantically-enrich the error.
  • DO semantically-enrich an error when you believe that the layer(s) above you might want to react to the semantically-enriched error. Reacting might involve something complex like retry and fail-over algorithms; or, something simple like responding with a more meaningful HTTP status code.
  • DO include the "root cause" error when semantically-enriching an error. You never want to lose the message or the stack trace from the underlying error.
  • DO include sufficient contextual information when throwing (or chaining) an error. Your error object should contain enough contextual information that a programmer can easily reconstruct why the error occurred when looking at the logs.
  • DO design your errors to be consumed specifically by programmers (NEVER USERS). Keep your errors full of helpful technical and contextual information that can facilitate the debugging of the application.
  • DO prefer text over numbers for API "error codes". Doing so makes it easier to search (Google) for further documentation on the error's meaning (ex, prefer "RATE_LIMIT_EXCEEDED" over "E9004"). This also makes the error code easier to understand without documentation.
  • DO feel free to create subtle variations of error types and codes that may help programmers and support engineers debug the application. For example, you can use "CONFLICT_ON_CREATE" and "CONFLICT_ON_RENAME" even when the same underlying "conflict" business rule is being violated. This subtle variation helps narrow in on the problem quickly and facilitates the tailoring of more specific error responses for the user (at the global / boundary error handler).
  • DO provide a way to collect superficial validation errors before a workflow is enacted. This allows the application to report multiple validation errors to the user instead of just one (based on the first error thrown in the workflow). That said, you should throw an exception if one of those validation errors is still present once the workflow has been engaged.
  • DO log "expected" domain errors and "unexpected" technical errors at different log-levels so that they can be easily identified and separated in the logs.
  • DO log meta-data about the request from which the error originated. Meta-data like "user ID", "IP address", and server / pod name can be really helpful when debugging. Correlation / request tracing IDs can also help track errors across distributed systems.
  • DO consider returning NULL instead of throwing a NOT_FOUND exception in low-level APIs if and only if it makes the code easier to consume and does not create ambiguity around the meaning of NULL (sorry Yegor).
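The "Fundamental Guarantee" rule is not exercised by the demo later in this post, so here is a minimal sketch of it, offered as an illustration only. The FakeConnection class and the withConnection() helper are hypothetical names invented for this example; the point is only that the "Finally" block releases the resource whether or not the operation throws.

```javascript
// Hypothetical resource, used only for illustration.
class FakeConnection {
    constructor() {
        this.isOpen = true;
    }
    query( sql ) {
        if ( ! sql ) {
            throw( new Error( "EmptyQuery" ) );
        }
        return( [ "row" ] );
    }
    close() {
        this.isOpen = false;
    }
}

// Runs the operation and always closes the connection afterwards, even when
// the operation throws. The error itself is NOT caught here; it keeps
// propagating up to the caller.
function withConnection( connection, operation ) {
    try {
        return( operation( connection ) );
    } finally {
        connection.close();
    }
}

var connection = new FakeConnection();
var caught = null;

try {
    withConnection( connection, ( c ) => c.query( "" ) );
} catch ( error ) {
    caught = error;
}

console.log( connection.isOpen, caught.message );
```

Note that the helper has no "Catch" block at all: cleanup and error handling are separate concerns, which keeps this consistent with the rule about not catching errors unless you really have to.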

The DO NOT Rules:

  • DO NOT catch errors unless you really have to. Every "Catch" block has to have a really good reason to exist in your application.
  • DO NOT chain errors just for the sake of chaining them. Only chain when semantically enriching an error or when adding critical contextual information.
  • DO NOT chain errors simply because of language constraints. Meaning, some languages let you catch a subset of errors (allowing other errors to continue to propagate up the call stack); some languages do not. If you have to catch all errors locally in order to inspect the type (so as to handle a subset of types), do not feel like you have to chain the rethrow of the non-handled errors.
  • DO NOT omit an underlying error when throwing a new error. Always include the underlying error as the "root cause" of the error you are throwing.
  • DO NOT let your end-users see error messages. Your application should catch errors at the boundary and translate them into "error responses" (which are always safe - and intended - to show the user).
  • DO NOT internationalize your error messages - doing so is an indication that you are targeting error messages at the users, not programmers. Error messages are for programmers.
  • DO NOT throw "domain errors" based on "technical errors". Meaning, a "NETWORK_FAILURE" technical error should never lead to a "USER_AUTHENTICATION_FAILED" domain error. Technical errors should bubble up through the system as technical errors (except for when they are being handled locally in retry or fallback algorithms, for example).
  • DO NOT log personally identifiable information (PII). Be sure to scrub common properties like "password", "ssn", and "creditcard" out of error payloads before logging them.
  • DO NOT swallow errors. Either log them or throw them (but never both).
  • DO NOT return error codes instead of throwing errors. It violates the Command Query Separation principle and litters your code with error handling logic.
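The exception mentioned in the "technical errors" rule (handling them locally in a retry algorithm) can be sketched roughly as follows. This is an illustration under stated assumptions, not part of the demo: retrySync() is a hypothetical, synchronous helper, and it assumes that domain errors can be recognized by a "type" property while technical errors have none.

```javascript
// Hypothetical helper: retries an operation when it fails with a technical
// error, up to maxAttempts times.
function retrySync( operation, maxAttempts ) {
    var lastError = null;

    for ( var attempt = 1 ; attempt <= maxAttempts ; attempt++ ) {
        try {
            return( operation( attempt ) );
        } catch ( error ) {
            // ASSUMPTION: domain errors carry a "type" property. They describe
            // a violated business rule, so retrying will not help.
            if ( error.type ) {
                throw( error );
            }
            lastError = error;
        }
    }

    // Out of attempts: rethrow the technical error as-is rather than
    // converting it into a domain error.
    throw( lastError );
}

var calls = 0;
var result = retrySync(
    () => {
        // Fail twice with a technical error, then succeed.
        if ( ++calls < 3 ) {
            throw( new Error( "NetworkFailure" ) );
        }
        return( "hello" );
    },
    5
);

console.log( result, calls );
```

When the attempts run out, the caller still sees the original "NetworkFailure" error, so the technical failure is never disguised as a domain failure.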

For me, learning effectively requires more than just reading and writing - I have to code. And, since writing is open to interpretation, I wanted to try and come up with a small Node.js demo that would illustrate how I interpret the rules that I have described above and how I would actually manifest them in an application.

The demo is rather trite - a JSON (JavaScript Object Notation) API that allows you to Set and Get a single message. I wanted something that was small in scope; but, sufficiently layered so as to include as many of the DO and DO NOT rules as I could manage. What I came up with was a small Express.js application with two routes that map to two Use-Cases that both share a single Repository:

[Image: Error handling, error chaining, error reporting rules.]

I tried to keep the code as simple as I could while still keeping it informative. As such, the organization of the code does not necessarily represent best practices. And, to be completely transparent, I am still not sure how I would want to implement validation prior to invocation (ie, collecting error messages without throwing errors).

I've also tried to keep all irrelevant comments out of the code (which, if you know me, was a monumental challenge of wills). The only comments in the demo are the inclusion of DO and DO NOT rules where they clarify the choices I am making.

That said, let's start from the bottom up - the repository - the source of data. This is where the messages live, using an implementation from which the rest of the application can remain decoupled. As we go through the code, I'll try to let the comments speak for themselves; but, I'll provide additional insight if I think it adds value.

  • // Require the application modules.
  • var AppError = require( "./app-error" ).AppError;
  •  
  • // ----------------------------------------------------------------------------------- //
  • // ----------------------------------------------------------------------------------- //
  •  
  • exports.MessageRepository = class MessageRepository {
  •  
  • constructor() {
  •  
  • this._message = null;
  •  
  • }
  •  
  • // ---
  • // PUBLIC METHODS.
  • // ---
  •  
  • getMessage() {
  •  
  • var promise = new Promise(
  • ( resolve, reject ) => {
  •  
  • // DO NOT throw "domain errors" based on "technical errors". Meaning, a
  • // "NETWORK_FAILURE" technical error should never lead to a
  • // "USER_AUTHENTICATION_FAILED" domain error. Technical errors should
  • // bubble up through the system as technical errors (except for when they
  • // are being handled locally in retry or fallback algorithms, for example).
  • this._simulateNetworkProblems();
  •  
  • // DO throw an error from any method that cannot carry out the function
  • // it was contracted to do.
  • if ( ! this._message ) {
  •  
  • throw(
  • new AppError({
  • type: "Message.NotFound"
  • })
  • );
  •  
  • }
  •  
  • resolve( this._message );
  •  
  • }
  • );
  •  
  • return( promise );
  •  
  • }
  •  
  • setMessage( newMessage ) {
  •  
  • this._message = newMessage;
  •  
  • return( Promise.resolve() );
  •  
  • }
  •  
  • // DO consider returning NULL instead of throwing a NOT_FOUND exception in low-
  • // level APIs if and only if it makes the code easier to consume and does not
  • // create ambiguity around the meaning of NULL (sorry Yegor).
  • tryGetMessageOrNull() {
  •  
  • var promise = new Promise(
  • ( resolve, reject ) => {
  •  
  • // DO NOT throw "domain errors" based on "technical errors". Meaning, a
  • // "NETWORK_FAILURE" technical error should never lead to a
  • // "USER_AUTHENTICATION_FAILED" domain error. Technical errors should
  • // bubble up through the system as technical errors (except for when they
  • // are being handled locally in retry or fallback algorithms, for example).
  • this._simulateNetworkProblems();
  •  
  • resolve( this._message || null );
  •  
  • }
  • );
  •  
  • return( promise );
  •  
  • }
  •  
  • // ---
  • // PRIVATE METHODS.
  • // ---
  •  
  • _simulateNetworkProblems() {
  •  
  • if ( Math.random() < 0.5 ) {
  •  
  • throw( new Error( "NetworkFailure" ) );
  •  
  • }
  •  
  • }
  •  
  • };

In this repository, I am including some simulated (albeit greatly exaggerated) network problems. I did this because I wanted to illustrate my choice to let non-domain errors propagate through the stack as "technical errors". Notice that I am not trying to wrap the network error in some sort of repository error (ex, "Message.ReadFailure"). I am doing this because I don't believe the calling context needs to handle network problems explicitly. That said, the application may evolve over time to require this differentiation, at which point I may consider chaining the underlying network failure error.
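If that differentiation ever did become necessary, chaining the network failure might look something like the sketch below. It is only an illustration: ChainedError is a hypothetical stand-in for the demo's AppError, and readMessage() and the "store" object are invented names.

```javascript
// Hypothetical stand-in for the demo's AppError class.
class ChainedError extends Error {
    constructor( type, rootCause ) {
        super( type );
        this.type = type;
        // Keep the underlying error so that its message and stack trace are
        // never lost.
        this.rootCause = rootCause;
    }
}

// Wraps any failure of the underlying store in a repository-level error.
function readMessage( store ) {
    try {
        return( store.read() );
    } catch ( error ) {
        throw( new ChainedError( "Message.ReadFailure", error ) );
    }
}

// Simulate a store whose read always fails with a technical error.
var failingStore = {
    read() {
        throw( new Error( "NetworkFailure" ) );
    }
};

var chained = null;

try {
    readMessage( failingStore );
} catch ( error ) {
    chained = error;
}

console.log( chained.type, chained.rootCause.message );
```

The trade-off is the one described above: this only pays for itself once a calling layer actually needs to react to "Message.ReadFailure" differently from other errors.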

The repository is then used by the two Use-Cases for getting and setting the message. Let's look at the GetMessageQuery first:

  • // Require the application modules.
  • var AppError = require( "./app-error" ).AppError;
  •  
  • // ----------------------------------------------------------------------------------- //
  • // ----------------------------------------------------------------------------------- //
  •  
  • exports.GetMessageQuery = class GetMessageQuery {
  •  
  • constructor( messageRepository ) {
  •  
  • this._repository = messageRepository;
  •  
  • }
  •  
  • // ---
  • // PUBLIC METHODS.
  • // ---
  •  
  • execute() {
  •  
  • var promise = this._repository
  • .getMessage()
  • .catch(
  • ( error ) => {
  •  
  • // DO chain errors when you want to semantically-enrich the error.
  • if ( AppError.is( error, "Message.NotFound" ) ) {
  •  
  • // DO semantically-enrich an error when you believe that the
  • // layer(s) above you might want to react to the semantically-
  • // enriched error. Reacting might involve something complex like
  • // retry and fail-over algorithms; or, something simple like
  • // responding with a more meaningful HTTP status code.
  • throw(
  • new AppError({
  • type: "Message.NotYetSet",
  • // DO design your errors to be consumed specifically by
  • // programmers (NEVER USERS). Keep your errors full of
  • // helpful technical and contextual information that can
  • // facilitate the debugging of the application.
  • detail: "The user has tried to access the message before any message has been set in the system.",
  • // DO include the "root cause" error when semantically-
  • // enriching an error. You never want to lose the message
  • // or the stack trace from the underlying error.
  • rootCause: error
  • })
  • );
  •  
  • }
  •  
  • // DO NOT chain errors simply because of language constraints.
  • // Meaning, some languages let you catch a subset of errors (allowing
  • // other errors to continue to propagate up the call stack); some
  • // languages do not. If you have to catch all errors locally in order
  • // to inspect the type (so as to handle a subset of types), do not feel
  • // like you have to chain the rethrow of the non-handled errors.
  • throw( error );
  •  
  • }
  • )
  • ;
  •  
  • return( promise );
  •  
  • }
  •  
  • };

Here, you can see where I am really stretching to come up with sufficiently illustrative cases in such a trite demo. In this application, you're not allowed to get the message until a message is set. I could have just let the NotFound error bubble up to the top; but, I am wrapping the "Message.NotFound" error in a "Message.NotYetSet" error (ie, chaining it) so as to enrich it semantically. Among other things, this will allow us to return a more informative error response to the user.

That said, you can see that there is a pathway in which I am just rethrowing the error without chaining it. This is to make up for the fact that JavaScript / Node.js doesn't let me differentiate errors at the Catch-level. As such, I'm blindly rethrowing the errors I don't care about in order to make up for the technical deficit in the language.

NOTE: Some user-land Promise libraries allow you to provide .catch() handlers that are bound to a particular "instanceof" Error. But, I have never used that feature personally (just never tried it).
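Without such a library, the same effect can be approximated with a small helper. The sketch below is an assumption-laden illustration, not something taken from the demo: catchType() is a hypothetical function, and it filters on the "type" property rather than on "instanceof".

```javascript
// Hypothetical helper: builds a .catch() handler that only handles errors
// whose "type" matches; every other error is rethrown untouched (not chained).
function catchType( type, handler ) {
    return(
        ( error ) => {
            if ( error && ( error.type === type ) ) {
                return( handler( error ) );
            }
            throw( error );
        }
    );
}

// Build a handler that turns "Message.NotFound" into a null result.
var handleNotFound = catchType( "Message.NotFound", ( error ) => null );

// A matching error is handled ...
var handled = handleNotFound({ type: "Message.NotFound" });

// ... while a non-matching (technical) error passes straight through.
var passedThrough = null;

try {
    handleNotFound( new Error( "NetworkFailure" ) );
} catch ( error ) {
    passedThrough = error;
}

console.log( handled, passedThrough.message );
```

In a promise chain, this would be used as .catch( catchType( "Message.NotFound", handler ) ), which keeps the blind rethrow out of the use-case code.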

The SetMessageCommand use-case is slightly more complicated because it attempts to provide a pre-action validation hook:

  • // Require the application modules.
  • var AppError = require( "./app-error" ).AppError;
  •  
  • // ----------------------------------------------------------------------------------- //
  • // ----------------------------------------------------------------------------------- //
  •  
  • exports.SetMessageCommand = class SetMessageCommand {
  •  
  • constructor( messageRepository ) {
  •  
  • this._repository = messageRepository;
  •  
  • }
  •  
  • // ---
  • // PUBLIC METHODS.
  • // ---
  •  
  • execute( newMessage ) {
  •  
  • var promise = new Promise(
  • ( resolve, reject ) => {
  •  
  • // DO provide a way to collect superficial validation errors before a
  • // workflow is enacted. This allows the application to report multiple
  • // validation errors to the user instead of just one (based on the first
  • // error thrown in the workflow).....
  • var errors = this.validate( newMessage );
  • // ..... That said, you should throw an exception if one of those
  • // validation errors is still present once the workflow has been engaged.
  • if ( errors.length ) {
  •  
  • throw(
  • new AppError({
  • type: errors[ 0 ].type,
  • extendedInfo: errors[ 0 ].extendedInfo
  • })
  • );
  •  
  • }
  •  
  • this._repository
  • // DO consider returning NULL instead of throwing a NOT_FOUND
  • // exception in low-level APIs if and only if it makes the code
  • // easier to consume and does not create ambiguity around the
  • // meaning of NULL (sorry Yegor).
  • .tryGetMessageOrNull()
  • .then(
  • ( message ) => {
  •  
  • if ( message === newMessage ) {
  •  
  • // DO NOT return error codes instead of throwing errors.
  • // It violates the Command Query Separation principle and
  • // litters your code with error handling logic.
  • throw(
  • new AppError({
  • type: "Message.Conflict",
  • // DO include sufficient contextual information
  • // when throwing (or chaining) an error. Your
  • // error object should contain enough contextual
  • // information that a programmer can easily
  • // reconstruct why the error occurred when
  • // looking at the logs.
  • extendedInfo: {
  • message: message,
  • newMessage: newMessage
  • }
  • })
  • );
  •  
  • }
  •  
  • return( this._repository.setMessage( newMessage ) );
  •  
  • }
  • )
  • .then( resolve, reject )
  • ;
  •  
  • }
  • );
  •  
  • return( promise );
  •  
  • }
  •  
  • // DO provide a way to collect superficial validation errors before a workflow is
  • // enacted. This allows the application to report multiple validation errors to the
  • // user instead of just one (based on the first error thrown in the workflow). That
  • // said, you should throw an exception if one of those validation errors is still
  • // present once the workflow has been engaged.
  • validate( newMessage, errors = [] ) {
  •  
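  • if ( ( typeof newMessage ) !== "string" ) {
  •  
  • errors.push({
  • type: "Message.Invalid",
  • extendedInfo: {
  • message: newMessage
  • }
  • });
  •  
  • // A non-string value cannot be checked for blankness or length (and
  • // reading .length on null / undefined would throw), so stop here.
  • return( errors );
  •  
  • }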
  •  
  • if ( ! newMessage ) {
  •  
  • errors.push({
  • type: "Message.Blank"
  • });
  •  
  • }
  •  
  • if ( newMessage.length > 100 ) {
  •  
  • errors.push({
  • type: "Message.TooLong",
  • extendedInfo: {
  • message: newMessage,
  • maxLength: 100
  • }
  • });
  •  
  • }
  •  
  • return( errors );
  •  
  • }
  •  
  • };

As I stated earlier, allowing for pre-command validation is the part of this entire application that feels the most shaky for me. That said, it's mostly an implementation detail. Hopefully, what I am illustrating here is that the pre-command validation allows the web-application to collect superficial validation errors. But, if the web-application were to continue trying to process the request - ignoring the validation errors - the use-case would throw an exception. This goes back to the base philosophy that a method should throw an error if it cannot do what it said it would do, which in this case, would be to "save a message."

Both of these use-cases represent the interactions that the "application core" is capable of performing. And, both of these use-cases are consumed by the Express.js application, which is the "delivery mechanism" for the application:

  • // Require the core and third-party node modules.
  • var bodyParser = require( "body-parser" );
  • var chalk = require( "chalk" );
  • var express = require( "express" );
  • var util = require( "util" );
  •  
  • // Require the application modules.
  • var AppError = require( "./app-error" ).AppError;
  • var ErrorTransformer = require( "./error-transformer" ).ErrorTransformer;
  • var GetMessageQuery = require( "./get-message-query" ).GetMessageQuery;
  • var MessageRepository = require( "./message-repository" ).MessageRepository;
  • var SetMessageCommand = require( "./set-message-command" ).SetMessageCommand;
  • var Sanitizer = require( "./sanitizer" ).Sanitizer;
  •  
  • // ----------------------------------------------------------------------------------- //
  • // ----------------------------------------------------------------------------------- //
  •  
  • var errorTransformer = new ErrorTransformer();
  • var messageRepository = new MessageRepository();
  • var sanitizer = new Sanitizer();
  •  
  • var getMessageQuery = new GetMessageQuery( messageRepository );
  • var setMessageCommand = new SetMessageCommand( messageRepository );
  •  
  • // ----------------------------------------------------------------------------------- //
  • // ----------------------------------------------------------------------------------- //
  •  
  • var app = express();
  • app.use( bodyParser.json() );
  • app
  • .route( "/" )
  • // I GET the message from the system.
  • .get(
  • function( request, response, next ) {
  •  
  • getMessageQuery
  • .execute()
  • .then(
  • ( message ) => {
  •  
  • response.send( message );
  •  
  • }
  • )
  • // DO have a global error handler at the boundary of your system (such
  • // as the web-application framework, aka the "delivery mechanism") that
  • // catches errors and recovers (preventing a crash). This handler should
  • // prevent technical details from leaking out of the system, while
  • // providing a meaningful response that the client can consume.
  • .catch( next )
  • ;
  •  
  • }
  • )
  • // I SET the message in the system.
  • .post(
  • function( request, response, next ) {
  •  
  • // DO provide a way to collect superficial validation errors before a
  • // workflow is enacted. This allows the application to report multiple
  • // validation errors to the user instead of just one (based on the first
  • // error thrown in the workflow). That said, you should throw an exception
  • // if one of those validation errors is still present once the workflow has
  • // been engaged.
  • var superficialErrors = setMessageCommand.validate( request.body.message );
  •  
  • // TODO: Possibly move this error handling to the global error handler using
  • // next(superficialErrors) -- this way we truly centralize all of the
  • // error recovery.
  • if ( superficialErrors.length ) {
  •  
  • response
  • .status( 400 )
  • .json( errorTransformer.getValidationResponse( superficialErrors ) )
  • ;
  • return;
  •  
  • }
  •  
  • setMessageCommand
  • .execute( request.body.message )
  • .then(
  • () => {
  •  
  • response
  • .status( 204 )
  • .end()
  • ;
  •  
  • }
  • )
  • // DO have a global error handler at the boundary of your system (such
  • // as the web-application framework, aka the "delivery mechanism") that
  • // catches errors and recovers (preventing a crash). This handler should
  • // prevent technical details from leaking out of the system, while
  • // providing a meaningful response that the client can consume.
  • .catch( next )
  • ;
  •  
  • }
  • )
  • ;
  •  
  • // DO have a global error handler at the boundary of your system (such as the web-
  • // application framework, aka the "delivery mechanism") that catches errors and
  • // recovers (preventing a crash). This handler should prevent technical details from
  • // leaking out of the system, while providing a meaningful response that the client
  • // can consume.
  • app.use(
  • function( error, request, response, next ) {
  •  
  • // DO log meta-data about the request from which the error originated. Meta-
  • // data like "user ID", "IP address", and server / pod name can be really
  • // helpful when debugging. Correlation / request tracing IDs can also help
  • // track errors across distributed systems.
  • var metaData = {
  • httpMethod: request.method,
  • httpResource: request.originalUrl,
  • ipAddress: request.ip,
  • // DO NOT log personally identifiable information (PII). Be sure to scrub
  • // common properties like "password", "ssn", and "creditcard" out of error
  • // payloads before logging them.
  • params: sanitizer.sanitizeScopeForLogging( request.params ),
  • query: sanitizer.sanitizeScopeForLogging( request.query ),
  • body: sanitizer.sanitizeScopeForLogging( request.body )
  • };
  •  
  • // DO log "expected" domain errors and "unexpected" technical errors at
  • // different log-levels so that they can be easily identified and separated
  • // in the logs.
  • if ( AppError.isDomainError( error ) ) {
  •  
  • console.log( chalk.bold.cyan( "WARNING - Domain Error" ) );
  • console.log({
  • level: "warn",
  • metaData: metaData,
  • // DO NOT log personally identifiable information (PII). Be sure to
  • // scrub common properties like "password", "ssn", and "creditcard"
  • // out of error payloads before logging them.
  • data: sanitizer.sanitizeScopeForLogging( error )
  • });
  •  
  • } else {
  •  
  • console.log( chalk.bold.red( "ERROR - Unexpected Error" ) );
  • console.log({
  • level: "error",
  • metaData: metaData,
  • // DO NOT log personally identifiable information (PII). Be sure to
  • // scrub common properties like "password", "ssn", and "creditcard"
  • // out of error payloads before logging them.
  • data: sanitizer.sanitizeScopeForLogging( error )
  • });
  •  
  • }
  •  
  • if ( ! response.headersSent ) {
  •  
  • // DO NOT let your end-users see error messages. Your application should
  • // catch errors at the boundary and translate them into "error responses"
  • // (which are always safe - and intended - to show the user).
  • var errorResponse = errorTransformer.getErrorResponse( error );
  •  
  • response
  • .type( "application/json" )
  • .status( errorResponse.status )
  • .send({
  • code: errorResponse.code,
  • status: errorResponse.status,
  • message: errorResponse.message
  • })
  • ;
  •  
  • }
  •  
  • }
  • );
  •  
  • app.listen( "8080" );

The primary point of illustration in this Express.js application is that I am using a global error handler that logs errors and translates them into responses for the user. The Express.js application is the "boundary" to our system where we prevent internal, technical details from leaking out into the Client.

In this case, I happen to be handling "validation errors" directly in the POST route handler. However, if I could go back and redo this demo (frankly, I'm exhausted), I think I would move the validation error handling down into the global error handler. This way, the route handlers could just call:

return( next( superficialErrors ) );

... and I wouldn't be littering my routes with special error responses.
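A rough sketch of that idea follows. It is not the demo's actual handler: the response catalog is trimmed to two entries, getErrorResponse() is a simplified stand-in for the ErrorTransformer, and createFakeResponse() is a hypothetical fake of the two Express response methods being called, included only so that the sketch can run without Express. It assumes that an array passed to next() reaches the error middleware unchanged.

```javascript
// Simplified stand-in for the demo's centralized error responses.
var sampleResponses = {
    "Message.Blank": { status: 422, message: "The message cannot be empty." },
    "Server": { status: 500, message: "There was an unexpected error." }
};

function getErrorResponse( error ) {
    var code = ( error && Object.prototype.hasOwnProperty.call( sampleResponses, error.type ) )
        ? error.type
        : "Server"
    ;

    return({
        code: code,
        status: sampleResponses[ code ].status,
        message: sampleResponses[ code ].message
    });
}

// Express-style error middleware signature: ( error, request, response, next ).
function globalErrorHandler( error, request, response, next ) {
    // An array is treated as a collection of superficial validation errors.
    if ( Array.isArray( error ) ) {
        response.status( 400 ).json( error.map( ( item ) => getErrorResponse( item ) ) );
        return;
    }

    var errorResponse = getErrorResponse( error );
    response.status( errorResponse.status ).json( errorResponse );
}

// Hypothetical fake of the response object, for demonstration only.
function createFakeResponse() {
    var fake = {
        statusCode: null,
        body: null,
        status( code ) { fake.statusCode = code; return( fake ); },
        json( payload ) { fake.body = payload; return( fake ); }
    };
    return( fake );
}

var fakeResponse = createFakeResponse();
globalErrorHandler( [ { type: "Message.Blank" } ], {}, fakeResponse, () => {} );

console.log( fakeResponse.statusCode, fakeResponse.body );
```

With this shape, the route handler no longer needs to know how validation errors are rendered; it only needs to hand them off.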

When the global error handler does receive an error, it doesn't just return it to the user. Doing so would, at best, be confusing to the user and, at worst, expose security vulnerabilities to the outside world. As such, the errors that have bubbled up to the top of the application are translated into "Error responses" using an error transformer:

  • // DO keep a centralized list of all the "error responses" that can be reported back to
  • // the user. This serves to document the behavior of the application and, as an added
  • // benefit, encourages consistency and facilitates internationalization of messages.
  • // The translation of a thrown error to one of these error responses should be done at
  • // the global / boundary error handler.
  • var errorResponses = exports.errorResponses = {
  • // DO prefer text over numbers for API "error codes". Doing so makes it easier to
  • // search (Google) for further documentation on the error's meaning (ex, prefer
  • // "RATE_LIMIT_EXCEEDED" over "E9004"). This also makes the error code easier to
  • // understand without documentation.
  • "Message.Blank": {
  • status: 422,
  • message: "The message cannot be empty. Please provide a longer message."
  • },
  • "Message.Invalid": {
  • status: 400,
  • message: "The message you sent cannot be parsed. Please double-check to make sure you're sending a String value."
  • },
  • "Message.NotYetSet": {
  • status: 403,
  • message: "The message has not yet been initialized. Once you SET a message, you will be able to GET a message."
  • },
  • "Message.TooLong": {
  • status: 422,
  • message: function( error ) {
  •  
  • var message = error.extendedInfo.maxLength
  • ? `The message is too long. Please provide a message that is less than ${ error.extendedInfo.maxLength } characters.`
  • : "The message is too long. Please provide a shorter message."
  • ;
  •  
  • return( message );
  •  
  • }
  • },
  • "Server": {
  • status: 500,
  • message: "There was an unexpected error - our support team has been notified and will be investigating."
  • }
  • };
  •  
  • exports.ErrorTransformer = class ErrorTransformer {
  •  
  • getErrorResponse( error ) {
  •  
  • var errorCode = ( error && errorResponses.hasOwnProperty( error.type ) )
  • ? error.type
  • : "Server"
  • ;
  • var errorResponse = errorResponses[ errorCode ];
  • var errorStatus = errorResponse.status;
  • // DO NOT let your end-users see error messages. Your application should catch
  • // errors at the boundary and translate them into "error responses" (which are
  • // always safe - and intended - to show the user).
  • var errorMessage = ( errorResponse.message instanceof Function )
  • ? errorResponse.message( error )
  • : errorResponse.message
  • ;
  •  
  • return({
  • code: errorCode,
  • status: errorStatus,
  • message: errorMessage
  • });
  •  
  • }
  •  
  • getValidationResponse( validationErrors ) {
  •  
  • var validationResponse = validationErrors.map(
  • ( validationError ) => {
  •  
  • return( this.getErrorResponse( validationError ) );
  •  
  • }
  • );
  •  
  • return( validationResponse );
  •  
  • }
  •  
  • };

The Error Transformer acts as the point of centralization for all errors that the application can return to the client. These error response messages can be static; or, they can depend on the error that has bubbled up through the application (notice that the "Message.TooLong" error provides the "message" as an invocable, not a string).
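To make the static-versus-invocable distinction concrete, here is a trimmed-down sketch. The responseCatalog object and the resolveMessage() function are hypothetical names for this example, and the message text is shortened.

```javascript
// Trimmed-down catalog: one static message and one invocable message.
var responseCatalog = {
    "Message.Blank": {
        status: 422,
        message: "The message cannot be empty."
    },
    "Message.TooLong": {
        status: 422,
        message: ( error ) => `Please provide a message that is less than ${ error.extendedInfo.maxLength } characters.`
    }
};

// Resolves the user-facing message for an error, invoking the catalog entry
// when it is a function and returning it directly when it is a string.
function resolveMessage( error ) {
    var entry = responseCatalog[ error.type ];

    return(
        ( typeof entry.message === "function" )
            ? entry.message( error )
            : entry.message
    );
}

var staticMessage = resolveMessage({ type: "Message.Blank" });
var dynamicMessage = resolveMessage({
    type: "Message.TooLong",
    extendedInfo: { maxLength: 100 }
});

console.log( staticMessage );
console.log( dynamicMessage );
```

Either way, the text that reaches the user comes from the catalog, never from the thrown error's own message.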

Of course, reporting the error to the user is one thing; but, the more important aspect of error reporting is, arguably, reporting it to the engineers that are building and maintaining the application. As such, the Express.js application both returns the "error response" and logs the "exception". In this case, I'm just logging to the console (where it might be consumed on the standard error / out stream); but, a critical point in all of this is that I'm doing my best not to persist Personally Identifiable Information (PII) to the logs. I attempt to prevent this by passing the logged data through a sanitizer before I write it to the console:

  • exports.Sanitizer = class Sanitizer {
  •  
  • // ---
  • // PUBLIC METHODS.
  • // ---
  •  
  • sanitizeScopeForLogging( scope ) {
  •  
  • var sanitizedScope = Object.assign( {}, scope );
  •  
  • // NOTE: These properties don't appear to be iterable on an Error object; so,
  • // we are explicitly checking for them (and copying them over for logging).
  • scope.message && ( sanitizedScope.message = scope.message );
  • scope.stack && ( sanitizedScope.stack = scope.stack );
  •  
  • for ( var key of Object.keys( scope ) ) {
  •  
  • // DO NOT log personally identifiable information (PII). Be sure to scrub
  • // common properties like "password", "ssn", and "creditcard" out of error
  • // payloads before logging them.
  • if ( this._isBlacklisted( key ) ) {
  •  
  • sanitizedScope[ key ] = "[sanitized]";
  •  
  • }
  •  
  • }
  •  
  • // Beyond the scope of this exploration:
  • // --
  • // TODO: Recursively walk through scope.
  • // TODO: Truncate long key-values to make sure the logs remain responsive
  • // and easy to consume.
  •  
  • return( sanitizedScope );
  •  
  • }
  •  
  • // ---
  • // PRIVATE METHODS.
  • // ---
  •  
  • _isBlacklisted( key ) {
  •  
  • return(
  • key.match( /creditcard/i ) ||
  • key.match( /expiration(month|year)/i ) ||
  • key.match( /password/i ) ||
  • key.match( /ssn/i )
  • );
  •  
  • }
  •  
  • };

In reality, I would probably need to recurse through the structure being logged; but, in order to keep things as simple as possible, I'm just sanitizing top-level keys, making sure that indicators like "password" or "ssn" (Social Security Number) don't get written to log files.
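
As a rough sketch of what that recursion might look like, the function below walks nested objects and arrays, masking any key that matches the same patterns as `_isBlacklisted()`. The function name `sanitizeDeep`, the `WeakSet` used to guard against circular references, and the "[circular]" placeholder are assumptions for illustration, not part of the demo application:

```javascript
// Same key patterns as _isBlacklisted(), combined into a single expression.
var BLACKLIST_PATTERN = /creditcard|expiration(month|year)|password|ssn/i;

// Hypothetical recursive sanitizer. The "ancestors" set holds the objects on the
// current path so that circular references don't recurse forever.
function sanitizeDeep( value, ancestors = new WeakSet() ) {

	// Primitives (and null) are returned as-is.
	if ( ( value === null ) || ( typeof value !== "object" ) ) {
		return( value );
	}

	if ( ancestors.has( value ) ) {
		return( "[circular]" );
	}

	ancestors.add( value );
	var sanitized;

	if ( Array.isArray( value ) ) {
		sanitized = value.map( ( item ) => sanitizeDeep( item, ancestors ) );
	} else {
		// NOTE: Non-plain objects (such as Date) are reduced to their enumerable
		// keys in this sketch.
		sanitized = {};

		// Error objects keep "message" and "stack" as non-enumerable properties,
		// so copy them over explicitly.
		if ( value instanceof Error ) {
			sanitized.message = value.message;
			sanitized.stack = value.stack;
		}

		for ( var key of Object.keys( value ) ) {
			sanitized[ key ] = BLACKLIST_PATTERN.test( key )
				? "[sanitized]"
				: sanitizeDeep( value[ key ], ancestors )
			;
		}
	}

	ancestors.delete( value );
	return( sanitized );

}

var payload = {
	userID: 4,
	password: "hunter2",
	billing: { creditCardNumber: "4111111111111111", expirationMonth: 12, city: "Boston" },
	attempts: [ { ssn: "000-00-0000", ok: false } ]
};
payload.self = payload;

var result = sanitizeDeep( payload );
console.log( JSON.stringify( result, null, 2 ) );
```

The original payload is left untouched; only the copy that gets logged is masked. Truncating long values, the other TODO in the class above, is still out of scope here.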

And, of course, under all of this is my custom AppError, which is a subclass of the native Error (since you can subclass native Constructors in ES6):

  • exports.AppError = class AppError extends Error {
  •  
  • constructor( settings = {} ) {
  •  
  • settings.type
  • ? super( `Domain Error: ${ settings.type }` )
  • : super( "Technical Error" )
  • ;
  •  
  • this.name = "AppError";
  • this.type = ( settings.type || null );
  • this.detail = ( settings.detail || null );
  • this.rootCause = ( settings.rootCause || null );
  • this.extendedInfo = ( settings.extendedInfo || null );
  • Error.captureStackTrace( this, this.constructor );
  •  
  • }
  •  
  • // ---
  • // STATIC METHODS.
  • // ---
  •  
  • static is( error, ...typePrefixes ) {
  •  
  • if ( ! ( ( error instanceof AppError ) && error.type ) ) {
  •  
  • return( false );
  •  
  • }
  •  
  • for ( const typePrefix of typePrefixes ) {
  •  
  • if (
  • ( error.type === typePrefix ) ||
  • ( error.type.startsWith( typePrefix + "." ) )
  • ) {
  •  
  • return( true );
  •  
  • }
  •  
  • }
  •  
  • return( false );
  •  
  • }
  •  
  • static isDomainError( error ) {
  •  
  • return( ( error instanceof AppError ) && error.type );
  •  
  • }
  •  
  • }
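
To show how the prefix matching in `AppError.is()` behaves, here is a condensed, self-contained sketch. The class is trimmed down to just what the example needs, and the error types ("Message.TooLong", "MessageQueue.Full") are hypothetical:

```javascript
// Trimmed-down copy of AppError, kept only so this example runs on its own.
class AppError extends Error {

	constructor( settings = {} ) {
		super( settings.type ? `Domain Error: ${ settings.type }` : "Technical Error" );
		this.name = "AppError";
		this.type = ( settings.type || null );
		this.rootCause = ( settings.rootCause || null );
	}

	static is( error, ...typePrefixes ) {
		if ( ! ( ( error instanceof AppError ) && error.type ) ) {
			return( false );
		}
		// Match either the exact type or a dot-delimited namespace prefix.
		return( typePrefixes.some(
			( typePrefix ) => {
				return( ( error.type === typePrefix ) || error.type.startsWith( typePrefix + "." ) );
			}
		));
	}

}

var tooLong = new AppError({ type: "Message.TooLong" });
var queueFull = new AppError({ type: "MessageQueue.Full" });
var technical = new AppError({ rootCause: new Error( "ECONNRESET" ) });

console.log( AppError.is( tooLong, "Message" ) );          // true
console.log( AppError.is( tooLong, "Message.TooLong" ) );  // true
console.log( AppError.is( queueFull, "Message" ) );        // false
console.log( AppError.is( technical, "Message" ) );        // false
console.log( AppError.is( tooLong, "User", "Message" ) );  // true
```

Appending the "." before calling `startsWith()` is what keeps "MessageQueue.Full" from being mistaken for a member of the "Message" namespace.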

And that's all there is to this application. Hopefully the code is simple enough to be understandable; yet, complex enough to illustrate the error handling rules that I am formulating without them seeming superfluous. And, remember that this mental model will continue to evolve over time, especially as I try to practice more of this approach in a production environment.

Now that we've looked at the code, I'll present the evidence (without explanation) upon which these rules have been built. In some cases, the content is from podcasts where I had to transcribe it manually. As such, it might not be entirely accurate.

One of the things that I found most interesting in all of this was the concept of the "Fundamental Guarantee":

Fundamental guarantee should be provided by every single part of your system - whatever happens, you must not have resource leaks; and, whatever happens, you must not have code that is in an undefined state but still available to clients.

To me, this makes so much sense! And, again, it speaks to one of my earlier posts, in which I stated that it seems odd to crash a process just because an error bubbled up to the top of the application.

ERROR HANDLING AND REPORTING EVIDENCE

This doesn't represent all of the content that I consumed; but, these were the pieces that I felt were worth sharing.

Elegant Objects By Yegor Bugayenko

It is an obvious choice we have to make when designing a method - to catch all exceptions here and now, making the method look "safe" for its users, or escalate the problems. I am in favor of the second option. Escalate them as much as you can. Every catch statement has to have a very strong reason for its existence. In other words, don't catch unless you really have to do it, and there is no other choice.

In an ideal design, there has to be only one catch statement per point of entrance into the application. For example, if it is a mobile app communicating with the user through the phone screen, it has a single entrance and must have a single catch in the entire application. Unfortunately, that is very rarely possible, mostly because Java itself and many existing frameworks are designed with a different idea in mind.

(Page 202)

I'm sure this is just obvious, but let me reiterate anyway: always chain exceptions, and never ignore original ones.

But why do we need exception chaining in the first place, you may ask. Why can't we just let the exceptions float up and declare all our methods unsafe? In the example above, why catch IOException and throw it again, "wrapped" into Exception? What is wrong with the existing IOException? The answer is obvious: exception chaining semantically enriches the problem context. In other words, just receiving "Too many open files (24)" is not enough. It is too low-level. Instead, we want to see a chain of exceptions where the original one says that there are too many open files, the next one says that the file length can't be calculated, the next one claims that the content of the image can't be read, etc. If a user can't open his or her own profile picture, just returning "too many files" is not enough.

Ideally, each method has to catch all exceptions possible and rethrow them through chaining. Once again, catch everything, chain, and immediately rethrow. This is the best approach to exception handling.

(Page 207)
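
Translated into JavaScript, the catch-chain-rethrow idea quoted above might be sketched as follows. This is an illustration under assumptions, not code from the book: the `ChainedError` class, the function names, and the file path are all hypothetical, and the low-level failure is simulated:

```javascript
// Hypothetical error type that keeps a reference to the error it wraps.
class ChainedError extends Error {
	constructor( message, rootCause = null ) {
		super( message );
		this.name = "ChainedError";
		this.rootCause = rootCause;
	}
}

function getFileLength( path ) {
	// Stand-in for a low-level failure coming out of the file system.
	throw( new Error( "Too many open files (24)" ) );
}

function readImage( path ) {
	try {
		return( getFileLength( path ) );
	} catch ( error ) {
		// Catch, chain, and immediately rethrow with more context.
		throw( new ChainedError( `The length of ${ path } can't be calculated`, error ) );
	}
}

function loadProfilePicture( userID ) {
	try {
		return( readImage( `/avatars/${ userID }.png` ) );
	} catch ( error ) {
		throw( new ChainedError( `The profile picture for user ${ userID } can't be read`, error ) );
	}
}

// Walk the chain from the outermost error down to the original one.
function describeChain( error ) {
	var messages = [];
	for ( var current = error ; current ; current = current.rootCause ) {
		messages.push( current.message );
	}
	return( messages );
}

var chain;
try {
	loadProfilePicture( 4 );
} catch ( error ) {
	// The single point of recovery: log the whole chain.
	chain = describeChain( error );
}

console.log( chain.join( "\n  caused by: " ) );
```

Newer JavaScript runtimes also accept a standard `cause` option on the `Error` constructor (`new Error( message, { cause } )`); the sketch sticks with a `rootCause` property to match the AppError class shown earlier.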

[main] is the only legal place for recovery... The same should happen at every entrance point. There are not so many of them, even in complex systems. What I'm saying is that there are just a few legal places for recovering in any software. Everywhere else, we must catch and rethrow or not catch at all. The first option is preferable. Always catch, chain, and rethrow. Recover only once at the highest level. That's it.

(Page 209)

If you agree with the "never recover" and "always chain" principles explained above, you would understand why exception typing is a redundant feature. Indeed, if we recover only once, we have an exception object that contains all other exceptions inside it when that happens. If properly chained, why do we need to know its type?

Moreover, we never use exceptions for flow control, right? We never catch exceptions in order to decide what to do next. We only catch in order to rethrow, right? If that's the case, we don't really care about the type of exception we're catching. We will rethrow it anyway. We simply don't need this information, because we never use it. We don't catch exceptions on their way up. Even when we do catch, we do it for only one purpose: to chain them and rethrow.

(Page 212)

Exception messages are for programmers

All of the above arguments even implicitly assume that you're writing exception messages for a known system. What if you're writing a reusable library? If you're writing a reusable library, you don't know the context in which it's going to be used. If you don't know the context, then how can you write an appropriate message for an end user?

It should be clear that when writing a reusable library, you're unlikely to know the language of the end user, but it's worse than that: you don't know anything about the end user. You don't even know if there's going to be an end user at all; your library may as well be running inside a batch job.

The converse is true as well. Even if you aren't writing a reusable library, end user-targeted exception messages increase the coupling in the system, because exception messages would be coupled to a particular user interface. This sort of coupling doesn't occur in the type system, but is conceptual. Exactly because it's not tied to any type system, an automated tool can't detect it, so it's much harder to notice, and more insidious as a result.

Exception messages are not for end users. Applications can catch known exception types and translate them to context-aware error messages, but this is a user interface concern - not a technical concern.

Exceptional practices, Part 3

... The explanatory error text simply does not belong in the Java code; it should be retrieved from an external source. What happens when you want to localize this code for a foreign market? Someone will have to comb through all the source code, identifying all text strings that will be used to build exceptions. Then a translator will have to translate them all, and then you have to figure out how to maintain different versions of the same code for multiple markets. And you have to repeat this process for every release.

Even if you never plan to localize, there are other good reasons for removing error message text from code: Error messages can easily get out of sync with the documentation -- a developer might change an error message and forget to inform the documentation writer. Similarly, you might want to change the wording of all error messages of a certain type to make them more consistent, but since you must comb through all the sources to find all such error messages, you could easily miss one, resulting in inconsistent error messages.

....

The error message is separated (somewhat) from the code, so it is much easier to find when localizing the class. Separating the error message from the code reduces the work required for localization and reduces the chance that an error message will be missed. It also encourages developers to reuse the same message if multiple methods throw the same error so the program will be more likely to have a consistent set of error messages.

....

Using message catalogs to store exception message strings offers another hidden benefit. Once you've placed all the exception messages in a message catalog, you now have a comprehensive list of all the exception messages your application might throw. This provides an easy starting point for the documentation writer to use when creating the manual's "Troubleshooting" or "Error Messages" section. Message catalogs also make it easy for documentation writers to track changes in the resource bundles and ensure the documentation stays in sync with the software.

Error handling considerations and best practices

Be descriptive in your error messages and include as much context as possible. Failure to do so will cost you dearly in support later on: if your client developers cannot figure out why their request went wrong, they will look for help - and eventually that will be you who will spend time tracking down client errors instead of coding new and exciting features for your service.

If it is a validation error, be sure to include why it failed, where it failed and what part of it failed. A message like "Invalid input" is horrible and client developers will bug you for it over and over again, wasting your precious development time. Be descriptive and include context: "Could not place order: the field 'Quantity' should be an integer between 0 and 99 (got 127)".

You may want to include both a short version for end users and a more verbose version for the client developer.

....

Error messages for end users should be localized (translated into other languages) if your service is already a multi-language service. Personally I don't think developer messages should be localized: it is difficult to translate technical terms correctly and it will make it more difficult to search online for more information.

....

I often find myself searching for online resources that can help me when I get some error while interacting with third party APIs. Usually I search for a combination of the API name, error messages and codes. If you include additional error codes in your response then you might want to use letters instead of digits: it is simply more likely to get a relevant hit for something like "OAUTH_AUTHSERVER_UNAVAILABLE" than "1625".
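
As a small illustration of the advice above (a searchable, letter-based code plus separate messages for end users and client developers), a response payload might be sketched like this. The function name, the property names, and the code string are assumptions, not something prescribed by the quoted article:

```javascript
// Hypothetical response builder for a failed "Quantity" validation.
function buildQuantityError( actualValue ) {

	return({
		// A letter-based code is easier to search for than a bare number like "1625".
		code: "ORDER_QUANTITY_OUT_OF_RANGE",
		// Short version, safe to show to the end user (and a candidate for localization).
		userMessage: "Please enter a quantity between 0 and 99.",
		// Verbose version for the client developer: why, where, and what failed.
		developerMessage: `Could not place order: the field 'Quantity' should be an integer between 0 and 99 (got ${ actualValue }).`
	});

}

var response = buildQuantityError( 127 );
console.log( JSON.stringify( response, null, 2 ) );
```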

Best Practices To Create Error Codes Pattern For an Enterprise Project in C#

There is a difference between error codes and error return values. An error code is for the user and help desk. An error return value is a coding technique to indicate that your code has encountered an error.

One can implement error codes using error return values, but I would advise against that. Exceptions are the modern way to report errors, and there is no reason why they should not carry an error code within them.

This is how I would organize it (Note that points 2-6 are language agnostic):

1. Use a custom exception type with an additional ErrorCode property. The catch in the main loop will report this field in the usual way (log file / error pop-up / error reply). Use the same exception type in all of your code.
2. Do not start at 1 and don't use leading zeros. Keep all error codes to the same length, so a wrong error code is easy to spot. Starting at 1000 usually is good enough. Maybe add a leading 'E' to make them clearly identifiable for users (especially useful when the support desk has to instruct users how to spot the error code).
3. Keep a list of all error codes, but don't do this in your code. Keep a short list on a wiki-page for developers, which they can easily edit when they need a new code. The help desk should have a separate list on their own wiki.
4. Do not try to enforce a structure on the error codes. There will always be hard-to-classify errors and you don't want to discuss for hours whether an error should be in the 45xx group or in the 54xx group. Be pragmatic.
5. Assign each throw in your code a separate code. Even though you think it's the same cause, the help desk might need to do different things in different cases. It's easier for them to have "E1234: See E1235" in their wiki, than to get the user to confess what he has done wrong.
6. Split error codes if the help desk asks for it. A simple if (...) throw new FooException(1234, ".."); else throw new FooException(1235, ".."); line in your code might save half an hour for the help desk.

And never forget that the purpose of the error codes is to make life easier for the help desk.

The Generation, Management and Handling of Errors (Part 1)

One of the key requirements for any group required to maintain a system is the ability to detect errors when they occur and to obtain sufficient information to diagnose and fix the underlying problems from which those errors spring. If incorrect or inappropriate error information is generated from a system it becomes difficult to maintain. Too much error information is just as much of a problem as too little. Although most modern development environments are well provisioned with mechanisms to indicate and log the occurrence of errors (such as exceptions and logging APIs), such tools must be used with consistency and discipline in order to build a maintainable application. Inconsistent error handling can lead to many problems in a system such as duplicated code, overly-complex algorithms, error logs that are too large to be useful, the absence of error logs and confusion over the meaning of errors. The incorrect handling of errors can also spill over to reduce the usability of the system as unhandled errors presented to the end user can cause confusion and will give the system a reputation for being faulty or unreliable. All of these problems are manifest in software systems targeted at a single machine. For distributed systems, these issues are magnified.

....

Split Domain And Technical Errors

This pattern language classifies errors as 'domain' or 'technical' and also as 'expected' and 'unexpected'. To a large degree the relationship between these classifications is orthogonal. You can have an expected domain error (no funds in the account), an unexpected domain error (account not in database), an expected technical error (WAN link down - retry), and an unexpected technical error (missing link library). Having said this, the most common combinations are expected domain errors and unexpected technical errors.

....

Design and development policies should be defined for domain and technical error handling. These policies should include:

* A technical error should never cause a domain error to be generated (never the twain should meet). When a technical error must cause business processing to fail, it should be wrapped as a SystemError.
* Domain errors should always start from a domain problem and be handled by domain code.
* Domain errors should pass 'seamlessly' through technical boundaries. It may be that such errors must be serialized and re-constituted for this to happen. Proxies and facades should take responsibility for doing this.
* Technical errors should be handled in particular points in the application, such as boundaries (see Log at Distribution Boundary).
....
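
The policies listed above could be sketched in JavaScript roughly as follows: domain errors pass through a boundary untouched, while any technical error is wrapped as a SystemError rather than being turned into a domain error. The class shapes, the `runAtBoundary()` and `capture()` helpers, and the "Account.InsufficientFunds" type are assumptions for illustration only:

```javascript
class DomainError extends Error {
	constructor( type ) {
		super( `Domain Error: ${ type }` );
		this.name = "DomainError";
		this.type = type;
	}
}

class SystemError extends Error {
	constructor( message, rootCause ) {
		super( message );
		this.name = "SystemError";
		this.rootCause = rootCause;
	}
}

// Hypothetical boundary wrapper around a unit of business processing.
function runAtBoundary( operation ) {
	try {
		return( operation() );
	} catch ( error ) {
		// Domain errors pass "seamlessly" through the boundary.
		if ( error instanceof DomainError ) {
			throw( error );
		}
		// Technical errors never become domain errors; they get wrapped instead.
		throw( new SystemError( "Business processing failed for a technical reason.", error ) );
	}
}

// Small helper that returns whatever the boundary throws (or null).
function capture( operation ) {
	try {
		runAtBoundary( operation );
		return( null );
	} catch ( error ) {
		return( error );
	}
}

var insufficientFunds = new DomainError( "Account.InsufficientFunds" );
var domainResult = capture( () => { throw( insufficientFunds ); } );
var technicalResult = capture( () => { throw( new TypeError( "Cannot read properties of undefined" ) ); } );

console.log( domainResult === insufficientFunds );
console.log( technicalResult.name, "<-", technicalResult.rootCause.name );
```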

Positive consequences:

* The business error handling code will be in a completely different part of the code to the technical error handling and will employ different strategies for coping with errors.
* Business code needs only to handle any business errors that occur during its execution and can ignore technical errors making it easier to understand and more maintainable.
* Business error handling code can



Reader Comments