Input sanitization vs input validation
Input sanitization and input validation are different topics, but they belong to the same family. Both are concerned with how we can safely use outside data in our secure and pure domain. However, the way they do this and where they sit in the system differ. Input validation is all about the input schema and domain specifications: ‘What is considered valid?’ Regular expressions, input ranges, and maximum lengths are all input validation requirements. If one of the requirements fails, we can respond with a user-friendly error message.
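To make the validation side concrete, here is a minimal sketch in plain F#. The rules themselves (a guest name of at most 10 characters, a reservation for 1 to 12 guests) and the function names are assumptions made up for illustration; they are not taken from a real domain or library.

```fsharp
open System
open System.Text.RegularExpressions

// Hypothetical rules for a guest name: required, at most 10 characters,
// and only letters, spaces, apostrophes or hyphens.
let validateGuestName (input: string) : Result<string, string> =
    if String.IsNullOrWhiteSpace(input) then
        Error "Please fill in a name."
    elif input.Length > 10 then
        Error "A name can be at most 10 characters long."
    elif not (Regex.IsMatch(input, @"^[\p{L} '\-]+$")) then
        Error "A name can only contain letters, spaces, apostrophes and hyphens."
    else
        Ok input

// Hypothetical range rule: a reservation is for 1 to 12 guests.
let validateGuestCount (count: int) : Result<int, string> =
    if count >= 1 && count <= 12 then
        Ok count
    else
        Error "A reservation is for 1 to 12 guests."
```

Every failing rule ends up in the `Error` case with a message that can be shown to the user as-is, which is what sets validation apart from the sanitization discussed next.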
Input sanitization is about preparing your input before it gets validated. At this stage, we are dealing with untrusted data, and the outcome of the sanitization is still untrusted within the context of the domain. Sanitization is a good example of ‘defense in depth’: an extra layer that tries to filter out malicious input. ‘Preparing input’ is a good way to think about this process. Removing leading and trailing whitespace before someone’s name gets validated is an example of how input sanitization can also take care of tedious tasks that input validation should not have to worry about. A person’s name with leading or trailing whitespace could still be considered a valid name, but instead of teaching the input validation about this, we can implement an input sanitization step where the input ‘gets prepared’.
I see it as the glue that brings untrusted data-transfer models and trusted core models together.
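The types tell the same story. In the sketch below, sanitization goes from string to string (untrusted in, untrusted out), and only validation can produce a trusted domain value. The `GuestName` type, its rules and the `sanitize` function are hypothetical and only serve the illustration.

```fsharp
// Hypothetical trusted domain type: the constructor is private,
// so the intended way to obtain a value is GuestName.create.
type GuestName = private GuestName of string

module GuestName =
    // Validation: untrusted string in, trusted model or error message out.
    let create (input: string) : Result<GuestName, string> =
        if input.Length = 0 then Error "Please fill in a name."
        elif input.Length > 10 then Error "A name can be at most 10 characters long."
        else Ok (GuestName input)

    let value (GuestName name) : string = name

// Sanitization: untrusted string in, still-untrusted string out.
let sanitize (input: string) : string =
    if isNull input then "" else input.Trim()

// The glue: prepare the input first, then let validation decide.
let parseGuestName : string -> Result<GuestName, string> =
    sanitize >> GuestName.create
```

With this in place, `parseGuestName "  Ann  "` yields a valid `GuestName`, while the validation itself never had to know about whitespace.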
F# FPrimitive sanitization
My own FPrimitive library bundles several string manipulation functions that help with input sanitization. Unlike input validation, the outcome is not a result model. Any failure that happens here should be considered a failure of the implementation, not of the user input.
A lot of functionality is available, from ‘leading/trailing whitespace removal’ to ‘regex replacements’ to ‘string truncation’ to ‘ASCII filter’ and lots more. These are commonly needed when dealing with inputs. We could even make sure that the input only contains European characters.
The following example shows how the guest name of a previously defined reservation model is sanitized with this library. ofNull makes sure that we’re dealing with empty strings instead of null values, trim_ws trims leading/trailing whitespace characters, max 10 makes sure that we only get inputs with a maximum of 10 characters, and ascii makes sure that only ASCII characters remain afterwards. An attentive reader already sees an opportunity to use lenses.
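As a rough illustration of those four steps, here is a minimal sketch that uses only the F# core library. The helper names and the `Reservation` record are hypothetical stand-ins; this is not FPrimitive's actual API or implementation, only the same idea spelled out by hand.

```fsharp
// Hypothetical stand-ins for the four steps described above.

// Turns null into an empty string so later steps never see null.
let ofNull (input: string) : string =
    if isNull input then "" else input

// Removes leading and trailing whitespace.
let trimWhitespace (input: string) : string =
    input.Trim()

// Truncates the input to at most `length` characters.
let maxLength (length: int) (input: string) : string =
    if input.Length > length then input.Substring(0, length) else input

// Drops every character outside the ASCII range (code points 0 to 127).
let asciiOnly (input: string) : string =
    input |> String.filter (fun c -> int c <= 127)

// The steps compose into one sanitizer, in the order given in the text.
let sanitizeGuestName : string -> string =
    ofNull >> trimWhitespace >> maxLength 10 >> asciiOnly

// Hypothetical reservation model, only here to show the record update.
type Reservation = { GuestName: string; Seats: int }

let sanitizeReservation (reservation: Reservation) : Reservation =
    { reservation with GuestName = sanitizeGuestName reservation.GuestName }
```

Note that none of these functions returns a `Result`: the output is always a string, and it is still up to validation to decide whether that string is acceptable.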
F# Giraffe input sanitization
F# Giraffe already has a great way to implement input validation right from the route pipelines. In the same manner, we can implement input sanitization and even bake it right into helper functions. In this case, we create a required IModelSanitization contract that we can let our records implement.
Note that in this case, lenses are used.
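The shape of such a contract could look like the sketch below. Everything in it is an assumption: the member name `Sanitize`, the `ReservationDto` record and the `sanitizeModel` helper are hypothetical, and the sketch deliberately leaves out Giraffe itself and the lenses so that it stays self-contained.

```fsharp
// Hypothetical contract: a model knows how to produce a sanitized copy of itself.
type IModelSanitization<'T> =
    abstract member Sanitize : unit -> 'T

// Hypothetical data-transfer model that implements the contract.
type ReservationDto =
    { GuestName: string
      Seats: int }
    interface IModelSanitization<ReservationDto> with
        member this.Sanitize () =
            let name =
                if isNull this.GuestName then "" else this.GuestName.Trim()
            { this with GuestName = name }

// Hypothetical helper that could sit between model binding and validation,
// so that every model passing through it is sanitized first.
let sanitizeModel (model: #IModelSanitization<'T>) : 'T =
    model.Sanitize ()
```

Because the helper only accepts models that implement the contract, a record without a `Sanitize` implementation would not compile when passed to it, which is one way to get the ‘sanitized by default’ guarantee.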
This example does not include any domain validation, only the data-transfer validation. It makes sure that we have an input that matches the expected schema (int, date…) but does not validate any domain-specific requirements (like: are there enough seats for the reservation on the requested day?).
Using this setup allows you to make sure that any input is sanitized by default.
C# ASP.NET Core input sanitization
ASP.NET Core is not as easy to alter as Giraffe because model binding and serialization are tightly coupled. Because of that, we cannot enforce input sanitization in the same way. We do, however, have the possibility to alter the data-transfer models themselves, so that values are automatically sanitized when they are stored.
Conclusion
This post looked at some practical examples of how input sanitization can be an integral part of your application, without altering the domain validation in the process. Input sanitization is an important part of application security and deserves its spot. These examples show that validation and sanitization are closely linked and benefit from being close to each other. Functional programming concepts are a good way of thinking about this, as the output of one step becomes the input of the next. That is why the Giraffe example feels so natural and the ASP.NET Core example feels like a workaround.
Thanks for reading.
Stijn