Protect Web Applications Against Malicious File Uploads

Many organizations run web applications that allow users to upload files such as resumes, insurance claims, proofs of identity, scanned documents and completed forms. Security leaders should go beyond single antivirus scanners to protect their backends against the upload of malicious files.

How do I protect web applications against malicious file uploads?

There are essentially three architectural options for integrating security controls into the file upload channel:

Integration into the application via APIs is preferred because it gives the most control.
Integration into an upstream gateway (application delivery controller [ADC]/web application firewall [WAF] etc.) provides the lowest effort and broad coverage across applications.
Integration into storage provides the most generic approach.

Restrict the file types to the minimum required. For allowed file types, there are essentially four options to limit the risk of malware upload:

Content disarm and reconstruction (CDR) provides the highest security.
Multi-AV scanning is an option if CDR does not meet the business requirements of the application.
Sandboxing provides good detection for custom and new malware, but the latency and scalability may be unacceptable for web uploads.
Single antivirus scanners are easy to implement but should generally be avoided because they are less effective against evasive malware.

More Detail

Many organizations run web applications that allow users to upload files. Examples include the upload of resumes by candidates to recruiting websites, information on accidents to an insurance portal or, more generally, scanned documents or completed forms. In most cases, these files are office (Word, Excel, PDF) or multimedia file types (pictures, video). Not all applications require additional security controls. However, if the application provides access to a large number of external users, security leaders should require an application design that protects against the abuse of the channel for uploading malware.

Security solution design consists of two choices:

1. Ways to integrate the security control

2. Types of security control

Integration

There are three options for integrating a security control for malware protection into a web application:

1. Application-integration — The preferred option is direct integration into the application. This gives full control over the application flow, and uploaders can be warned if, for example, they upload file types they shouldn’t. Application developers can temporarily store the uploaded file and directly call an antivirus engine or other security control from the web server code, or connect to a security service over Representational State Transfer (REST), Internet Content Adaptation Protocol (ICAP) or other exposed APIs. This option is feasible if the code is developed in-house, but it scales to only a small number of applications.

2. Gateway-integration — If the number of applications is large, or if the application is closed source, the integration can be done in the gateway in front of the applications (typically one, or a combination, of reverse proxy, WAF, ADC, load balancer). If such a gateway is present, and is set up to decrypt transport layer security (TLS), it can be used to connect to a security service over ICAP or REST API. Although such gateways are relatively common on-premises, cloud web applications often do not have a gateway with exposed APIs.

3. Storage-integration — The last option, which is gaining interest, is scanning files after they pass the web application and end up in storage. Various security solutions can process files at time of storage in on-premises network-attached storage (NAS) through ICAP or RPC, and an increasing number can do this for AWS S3 and Azure Blob storage through function PaaS (fPaaS) (examples include Broadcom, OPSWAT and Trend Micro). This option does not depend on the way the files end up in storage, so it is attractive for use cases that do not rely on HTTP and where there is no control over the application. Since this applies after the upload has completed, there is no real integration with the application and no business logic can rely on the scan results. It is common to hold files temporarily in one storage location, and move them to another location for further processing only after the security scan or CDR completes.
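The hold-then-move pattern described for storage integration can be sketched as follows. This is a minimal illustration that uses local directories rather than NAS or cloud object storage; the function name promote_clean_files, the directory roles and the is_clean callback are all hypothetical stand-ins for whatever scan or CDR service is actually deployed.

```python
import shutil
from pathlib import Path
from typing import Callable


def promote_clean_files(
    quarantine: Path,
    clean: Path,
    rejected: Path,
    is_clean: Callable[[Path], bool],
) -> dict[str, str]:
    """Move each held file to `clean` or `rejected` based on the scan result.

    `is_clean` is a placeholder for the real security control (assumption).
    Returns a mapping of file name to the destination directory name.
    """
    clean.mkdir(parents=True, exist_ok=True)
    rejected.mkdir(parents=True, exist_ok=True)
    results: dict[str, str] = {}
    # sorted() materializes the listing before any file is moved.
    for path in sorted(quarantine.iterdir()):
        if not path.is_file():
            continue
        try:
            verdict = is_clean(path)
        except Exception:
            # Fail closed: a scan error is treated as a rejection.
            verdict = False
        target_dir = clean if verdict else rejected
        shutil.move(str(path), str(target_dir / path.name))
        results[path.name] = target_dir.name
    return results
```

Downstream processing would read only from the clean location, so a file that has not been scanned yet is never visible to it.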

Some use cases require feedback loops, either to inform the uploader of the upload success, or to inform the back-office users of any denied or infected uploads or changes made to files during the process. Although all integration options allow for some feedback loop, direct integration into the application allows for the most control.
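To illustrate why direct application integration gives the most control over feedback, the sketch below shows an upload handler that stores the file temporarily, calls a security control, and returns a message for the uploader. It is an assumption-laden outline, not a reference implementation: handle_upload, UploadResult, and the scan and store callbacks are hypothetical names, and a real application would call its own antivirus engine or a REST or ICAP service inside scan.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Callable


@dataclass
class UploadResult:
    accepted: bool
    message: str  # feedback that the application can show to the uploader


def handle_upload(
    filename: str,
    data: bytes,
    scan: Callable[[str], bool],          # hypothetical: True means clean
    store: Callable[[str, bytes], None],  # hypothetical: persists the file
) -> UploadResult:
    """Write the upload to a temporary file, scan it, then store or reject."""
    fd, tmp_path = tempfile.mkstemp(suffix=".upload")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        try:
            clean = scan(tmp_path)
        except Exception:
            # Fail closed when the security control is unavailable.
            return UploadResult(False, "The file could not be checked. Please try again later.")
        if not clean:
            return UploadResult(False, "The file was rejected by a security check.")
        store(filename, data)
        return UploadResult(True, "Upload received.")
    finally:
        # The temporary copy is removed whatever the outcome.
        os.remove(tmp_path)
```

Because the verdict is available inside the request, the same code path could also notify back-office users of rejected uploads.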

Security Controls

The first layer of any security solution for file uploads should be to restrict the upload to specific file types and contents. Use allowlists of approved file types, not blocklists of unwanted file types. For each allowed file type, block files that have embedded content that you do not expect or need. For example, do not allow resumes that contain macros, and do not allow PDF files with scripts and forms.
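A first-layer check along these lines might look like the sketch below. The allowlist contents, the function name check_upload and the rejection messages are illustrative assumptions. The embedded-content checks are deliberately coarse: they look for a VBA project inside an Office Open XML container and for common script and form markers in a PDF, and obfuscated files can evade them, which is why a second layer is still needed.

```python
import io
import zipfile

# Hypothetical allowlist: extension -> leading bytes expected for that type.
ALLOWED_TYPES = {
    ".pdf": b"%PDF-",
    ".docx": b"PK\x03\x04",
    ".png": b"\x89PNG\r\n\x1a\n",
}


def check_upload(filename: str, data: bytes) -> tuple[bool, str]:
    """Return (accepted, reason) for an uploaded file."""
    dot = filename.rfind(".")
    ext = filename[dot:].lower() if dot != -1 else ""
    magic = ALLOWED_TYPES.get(ext)
    if magic is None:
        return False, "file type not on allowlist"
    if not data.startswith(magic):
        return False, "content does not match declared type"
    if ext == ".docx":
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                names = archive.namelist()
        except zipfile.BadZipFile:
            return False, "not a valid document container"
        # Macro-enabled Office files carry a vbaProject.bin part.
        if any(name.lower().endswith("vbaproject.bin") for name in names):
            return False, "macros are not allowed"
    if ext == ".pdf":
        # Naive byte search; may miss encoded names and may over-block.
        if any(marker in data for marker in (b"/JavaScript", b"/JS", b"/AcroForm")):
            return False, "scripts and forms are not allowed"
    return True, "ok"
```

For example, check_upload("cv.exe", data) is rejected because the extension is not on the allowlist, regardless of the file's contents.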

For the second layer, there are four options:

1. Content disarm and reconstruction (CDR) — The strongest option for the second layer is to use CDR. Done well, CDR removes all threats from uploaded files without adding significant latency. Since it does not depend on the detection of known threats, it can even protect against completely new attack types. The main disadvantage of CDR lies in the fact that it changes the files’ contents. Depending on the use case, this may or may not be an issue. Consider maintaining an archive of originals, and allowing recipients to request the originals if they need them. When originals are requested, deep analysis such as provided by sandboxing may scale well enough and any latency is likely acceptable. Example CDR vendors include Deep Secure, Glasswall Solutions, JiranSecurity, Odix, OPSWAT, Sasa Software, SoftCamp and Votiro.

2. Multi-AV — If CDR is unacceptable or cannot be fully implemented, the most common second layer is multi-AV. Multi-AV scans files with a carefully chosen combination of AV scanners to increase detection rates. The set of AV scanners cannot be picked at random: although branded differently, it is not uncommon for AV scanners to use the same engines for specific file types such as PDF. The most common vendors for multi-AV are OPSWAT and VirusTotal.

3. Sandboxing — Although a solid approach to detecting new malware, sandboxing introduces latency and does not always scale well. It is rarely used in-line in upload channels. Most endpoint and network security vendors offer sandbox solutions, in addition to dedicated vendors such as FireEye, Joe Security, Lastline (VMware) and VMRay. Some solutions position themselves as fast dynamic analysis alternatives to sandboxes. Examples of such alternatives are ReversingLabs and Perception Point. Many security gateways, such as WAFs, have integrations with leading sandboxes.

4. Single-AV — A single antivirus scanner can have a low detection rate when it statically scans a file of a type that is typically uploaded to applications. The reason for this poor detection rate (sometimes as low as 20% for newer variants) is that there are ample ways to obfuscate malicious content embedded in such files, which makes that content hard to parse during static scans. An attacker that uploads new malware to your application is targeting you, which makes nontrivial obfuscation very likely. Use single-AV scans only if you must, for example for compliance reasons, and not because you expect great detection rates for new malware.
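The multi-AV point about shared engines can be made concrete with a small sketch. It assumes each scanner is described by a brand, the underlying engine it uses for the file type in question, and a detection callback; the Scanner class, multi_av_verdict and the min_engines threshold are hypothetical and do not reflect any vendor's API. Scanners that share an engine are counted once, and a file is rejected if any remaining scanner flags it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Scanner:
    brand: str
    engine: str                      # underlying engine for this file type
    detect: Callable[[bytes], bool]  # hypothetical: True means malicious


def multi_av_verdict(
    data: bytes,
    scanners: list[Scanner],
    min_engines: int = 2,
) -> tuple[bool, list[str]]:
    """Return (clean, flagged_by) using one scanner per distinct engine."""
    by_engine: dict[str, Scanner] = {}
    for scanner in scanners:
        # Keep the first scanner seen for each engine; rebrands add nothing.
        by_engine.setdefault(scanner.engine, scanner)
    if len(by_engine) < min_engines:
        raise ValueError("scanner set does not cover enough distinct engines")
    flagged_by = [s.brand for s in by_engine.values() if s.detect(data)]
    return len(flagged_by) == 0, flagged_by
```

Raising an error when too few distinct engines are configured surfaces the case where a set of differently branded scanners is, in effect, a single-AV setup.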
