CSV file import vulnerabilities? [python django pandas] – Digitalmunition

This topic contains 1 reply, has 2 voices, and was last updated by  rathaus 1 month, 2 weeks ago.

  • #368582


    Hello, I am developing a really simple web app: the user uploads a CSV file, I process it in Python using the Django framework, and I respond with the results on another page. The only inputs are a text input and a file input, and I use pandas to read the CSV file. I have heard a few things about string sanitization, but I have no idea what can be done about the file itself. First of all, what is the worst-case scenario if someone uploads a bad file? Do you have any suggestions or resources on file sanitization, if that exists? And do you see any other vulnerabilities, even with only the 2 inputs? In general, any tips would be greatly appreciated. Thank you for reading, and thanks in advance.

  • #368586


    If you’re displaying any content directly from the CSV, be on the lookout for [stored cross-site scripting](https://owasp.org/www-community/attacks/xss/). For example, someone could fill one of the CSV fields with JavaScript instead of the plain string you expect to display.
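To make that concrete, here is a minimal sketch of escaping a CSV value before it goes into HTML. It uses only the standard library's `html.escape`; the `render_cell` helper is a made-up name for illustration. Django templates escape variables by default, so this mostly matters if you build HTML yourself or mark content as safe.

```python
import html


def render_cell(value):
    """Return a table cell with the CSV value escaped for HTML output.

    render_cell is a hypothetical helper, not part of Django or pandas.
    """
    # html.escape converts &, <, > and quotes into entities, so the
    # browser shows the text instead of interpreting it as markup.
    return f"<td>{html.escape(str(value))}</td>"


if __name__ == "__main__":
    print(render_cell("<script>alert(1)</script>"))
```

With this in place a script tag in the CSV is displayed as literal text rather than executed.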

  • #368585


    The million-dollar question is: when you say that the content in the CSV is “processed”, what exactly does that mean?

    The real risk here is that someone could add text to the CSV that gets parsed as code in your back end. That may or may not be possible depending on how your code is structured, and to figure that out you need a firm grip on what your app does with the input.

    As a best practice, you want to restrict as tightly as possible the characters that are allowed to travel from the user (through the CSV) to your back end.

    Think about what the use case is for your app. For example, if the app is only supposed to do math on numbers that users put in the CSV, then pass the CSV inputs through a function that deletes every character that is not a digit, and store the values with the proper datatype before you try to do any math. Nothing that is not of type integer should be able to come out of that CSV and interact with any part of your code. If something can, that is a huge potential vulnerability.
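A rough sketch of that kind of function, assuming the app only ever needs non-negative integers. The names `to_int_or_none` and `read_int_rows` are invented for this example, and it uses the standard `csv` module rather than pandas to stay self-contained. Note that stripping non-digits also drops minus signs and decimal points, so adjust the pattern if your data needs them.

```python
import csv
import io
import re

# Everything except ASCII digits is removed before conversion.
_NON_DIGITS = re.compile(r"[^0-9]")


def to_int_or_none(raw):
    """Strip non-digit characters and return an int, or None if nothing is left."""
    digits = _NON_DIGITS.sub("", raw)
    return int(digits) if digits else None


def read_int_rows(csv_text):
    """Parse CSV text so that every cell is either an int or None."""
    reader = csv.reader(io.StringIO(csv_text))
    return [[to_int_or_none(cell) for cell in row] for row in reader]


if __name__ == "__main__":
    print(read_int_rows("1,2\n=cmd(),x3\n"))
```

After this step the rest of the code only ever sees `int` or `None`, never the raw strings from the upload.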

    Sanitized means: I expect only a certain type of data to come from my user, so I strictly enforce that datatype as soon as it comes in. If users put extra junk in their input, I strip it out in a small encapsulated function before it gets processed any further.

    The quick and dirty way of structuring your code might be to simply let users put whatever string they want into the CSV and then throw exceptions when the inputs don’t contain what you expect. But this is extremely insecure, and there are many ways it might lead to arbitrary code execution.

  • #368584


    I would sanitize the user input by reading the CSV into a file stream and checking the values. Once done, I would save the results into a new file somewhere else. MongoDB is probably a good solution for this, as you could store the whole CSV as a piece of data within a document along with metadata such as a “filename”.
    Because the upload is then only ever stored as inert data, this should help stop remote execution of anything malicious that escaped your sanitization.
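One way to picture the "read as a stream, check the values, write a new file" step is below. This is only a sketch: the `copy_validated` function and the `ALLOWED_CELL` pattern are assumptions chosen for illustration, and the allowlist should match whatever your real data looks like. It works on any text streams, so the destination could be a new file on disk or a buffer you then store elsewhere.

```python
import csv
import io
import re

# Hypothetical allowlist: letters, digits, space, underscore, dot, hyphen,
# at most 64 characters per cell.
ALLOWED_CELL = re.compile(r"[A-Za-z0-9 _.\-]{0,64}")


def copy_validated(src, dst):
    """Copy rows from src to dst, skipping any row with a disallowed cell.

    Returns the number of rejected rows.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst)
    rejected = 0
    for row in reader:
        if all(ALLOWED_CELL.fullmatch(cell) for cell in row):
            writer.writerow(row)
        else:
            rejected += 1
    return rejected


if __name__ == "__main__":
    upload = io.StringIO("name,score\nalice,10\n<script>alert(1)</script>,5\n")
    clean = io.StringIO()
    print(copy_validated(upload, clean), "row(s) rejected")
    print(clean.getvalue())
```

Only the cleaned copy is passed on to later processing; the original upload is never read again.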

  • #368583


    It all boils down to:
    1. How this file is stored between the browser uploading it and your code processing it. Do you use the filename provided by the client, for example?
    2. How the data is used. Are you putting it into a SQL query insecurely, for example?
    3. What you use this data for. If you expect it to edit certain records, can a user manipulate it so that it overflows or overwrites other people’s data?
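The three points above can be sketched roughly as follows, using the standard library's `sqlite3` and `uuid`. Everything named here (`storage_path`, `insert_row`, `rows_for`, the `results` table and its columns) is hypothetical and only meant to illustrate the ideas: pick the stored filename on the server, pass values as query parameters instead of building SQL strings, and scope every query to the owning user.

```python
import sqlite3
import uuid
from pathlib import Path


def storage_path(upload_dir):
    """Build a server-chosen path; the client-supplied filename is never used."""
    return Path(upload_dir) / f"{uuid.uuid4().hex}.csv"


def insert_row(conn, owner_id, value):
    """Insert a value with a parameterized query (no string formatting)."""
    conn.execute(
        "INSERT INTO results (owner_id, value) VALUES (?, ?)",
        (owner_id, value),
    )


def rows_for(conn, owner_id):
    """Return only the rows that belong to the given owner."""
    cursor = conn.execute(
        "SELECT value FROM results WHERE owner_id = ?",
        (owner_id,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE results (owner_id INTEGER, value TEXT)")
    insert_row(conn, 1, "x'); DROP TABLE results;--")
    print(rows_for(conn, 1))
    print(storage_path("/tmp/uploads"))
```

The hostile string ends up stored as plain text because the database driver treats parameters as data, not as part of the SQL statement. In a Django app the ORM does the same thing for you as long as you avoid raw string-built queries.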
