This topic contains 1 reply, has 2 voices, and was last updated by rathaus 1 month, 2 weeks ago.
- March 2, 2021 at 11:34 pm #368582
Hello, I am developing a really simple web app: the user uploads a CSV file, I process it in Python using the Django framework, and I respond with the results on another page. The only inputs are a text input and a file input. I use pandas to read the CSV file. I have heard a few things about string sanitization, but I have no idea what can be done about the file. First of all, what is the worst-case scenario if someone uploads a bad file? Do you have any suggestions or resources on file sanitization, if that exists? And do you see any other vulnerabilities, even with only the two inputs? In general, any tips would be greatly appreciated. Thank you for reading, and thanks in advance.
- March 2, 2021 at 11:34 pm #368586
- March 2, 2021 at 11:34 pm #368585
The million dollar question is – when you say that the content in the csv is “processed” what exactly does that mean?
The real risk here is that someone could add text to the csv that gets parsed as code in your back end. That may or may not be possible depending on what your code structure looks like – and to figure that out you really need to have a firm grip on what your app is doing with the input.
As a best practice, you always want to restrict the characters that are allowed to be transported from the user (through the csv) to your back end as much as possible.
Think about what the use case is for your app. For example, if the app is only supposed to do math on numbers that users put in the csv, then pass the csv inputs through a function that deletes every character that is not a digit, and store the result with the proper datatype before you try to do any math. There should be absolutely no way that anything that doesn’t have datatype: integer could come out of that csv to interact with any part of your code. If there is, then that is a huge potential vulnerability.
Sanitized means, I expect only a certain type of data to come from my user, therefore I am going to strictly enforce that datatype as soon as it comes in. If users start putting extra junk in their input, I need to strip it out in a little encapsulated function before it gets processed any further.
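To make the "little encapsulated function" idea concrete, here is a minimal sketch, assuming the app only expects whole numbers. It uses the standard `csv` module rather than pandas so it runs on its own, and the names `to_int_or_none` and `sanitize_csv` are made up for illustration.

```python
import csv
import io
import re

# Anything that is not an ASCII digit gets deleted.
_NON_DIGITS = re.compile(r"[^0-9]")


def to_int_or_none(cell: str):
    """Strip every non-digit character and convert what is left to int.

    Returns None when no digits remain (hypothetical helper, not a library API).
    """
    digits = _NON_DIGITS.sub("", cell)
    return int(digits) if digits else None


def sanitize_csv(text: str):
    """Parse CSV text and return rows that contain only int or None values."""
    reader = csv.reader(io.StringIO(text))
    return [[to_int_or_none(cell) for cell in row] for row in reader]


if __name__ == "__main__":
    raw = "10,20,30\n4,<script>,6\n7,8abc,\n"
    print(sanitize_csv(raw))
```

One thing to keep in mind with this approach: stripping changes the meaning of some cells (`8abc` becomes 8, and `-5` becomes 5 because the sign is deleted too). If that is not acceptable for your data, rejecting the whole cell or row is probably the safer variant.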
The quick and dirty way of structuring your code might be to simply let users put whatever string they want into the csv and then just throw exceptions when the inputs don’t contain what you expect. But this is extremely insecure, because the raw strings still reach your processing code, and there are a million ways that might lead to arbitrary code execution.
- March 2, 2021 at 11:34 pm #368584
I would sanitize the user input by reading the csv into a file stream and checking the values. Once done, I would save the results into a new file somewhere else. MongoDB is probably a good solution for this, as you could store the whole csv as a piece of data within a document along with metadata such as a “filename”.
This should stop remote execution of anything malicious that escaped your sanitization.
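A rough sketch of the "read into a stream, check the values, write the results somewhere new" approach could look like the following. The size and row limits, the digits-only rule, and the function name `copy_validated` are all assumptions for the example; Django's uploaded file objects do expose `read()`, so something like this should be adaptable to them.

```python
import csv
import io
import re

MAX_BYTES = 1_000_000  # assumed upper bound on upload size
MAX_ROWS = 10_000      # assumed upper bound on row count
_DIGITS_ONLY = re.compile(r"[0-9]+")


def copy_validated(upload, out):
    """Copy only fully valid rows from a binary stream to a text stream.

    `upload` is any object with a binary read() method, `out` is a writable
    text stream. Returns the number of rows written.
    """
    data = upload.read(MAX_BYTES + 1)
    if len(data) > MAX_BYTES:
        raise ValueError("file too large")
    text = data.decode("utf-8")  # raises UnicodeDecodeError on non-text input

    writer = csv.writer(out, lineterminator="\n")
    written = 0
    for row_number, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if row_number > MAX_ROWS:
            raise ValueError("too many rows")
        cells = [cell.strip() for cell in row]
        # Keep the row only if every cell is made of ASCII digits.
        if cells and all(_DIGITS_ONLY.fullmatch(cell) for cell in cells):
            writer.writerow(cells)
            written += 1
    return written


if __name__ == "__main__":
    source = io.BytesIO(b"1,2\nx,3\n4,5\n")
    target = io.StringIO()
    print(copy_validated(source, target), repr(target.getvalue()))
```

The original upload is never stored or executed here; only the rows that passed the check end up in the new stream, which could just as well be a file or a database document.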
- March 2, 2021 at 11:34 pm #368583
It all boils down to
1. How the file is stored between the browser uploading it and your code processing it: do you use the filename provided by the user, for example?
2. How the data is used: are you putting it into a SQL query insecurely, for example?
3. What you use the data for: if you expect it to edit certain records, can a user manipulate it in a way that would overflow or overwrite other people’s data?
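The three points above can be sketched roughly as follows, assuming SQLite as the database. The table layout and the helper names (`storage_name`, `make_db`, `save_value`, `values_for`) are invented for the example: the client-supplied filename is never used for the path, values go into queries only as bound parameters, and every query is scoped to the owner of the data.

```python
import sqlite3
import uuid


def storage_name(client_filename: str) -> str:
    """Return a random server-side name; the client's filename is ignored."""
    return f"{uuid.uuid4().hex}.csv"


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical results table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE results (owner_id INTEGER NOT NULL, value TEXT NOT NULL)")
    return conn


def save_value(conn: sqlite3.Connection, owner_id: int, value: str) -> None:
    """Insert with bound parameters, so the value is treated as data, not SQL."""
    conn.execute(
        "INSERT INTO results (owner_id, value) VALUES (?, ?)",
        (owner_id, value),
    )


def values_for(conn: sqlite3.Connection, owner_id: int):
    """Read back only the rows that belong to the given owner."""
    rows = conn.execute(
        "SELECT value FROM results WHERE owner_id = ? ORDER BY rowid",
        (owner_id,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = make_db()
    save_value(db, 1, "x'); DROP TABLE results; --")
    save_value(db, 2, "7")
    print(storage_name("../../etc/passwd"))
    print(values_for(db, 1), values_for(db, 2))
```

With the parameterized insert, the injection attempt is stored as an ordinary string and the table survives; building the same query with string formatting would not give that guarantee.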