Protect Web Applications Against Malicious File Uploads
Many organizations run web applications that
allow users to upload files such as resumes, insurance claims, proofs of
identity, scanned documents and completed forms. Security leaders should go beyond
single antivirus scanners to protect their backends against the upload of
malicious files.
How do I protect web applications against malicious file
uploads?
- There are essentially
three architectural options to integrate security controls into the file
upload channel:
- Integration into the
application via APIs is preferred because it provides the most control.
- Integration into an upstream
gateway (application delivery controller [ADC]/web application firewall
[WAF] etc.) provides the lowest effort and broad coverage across
applications.
- Integration into storage
provides the most generic approach.
- Restrict the file types to the
minimum required. For allowed file types, there are essentially four
options to limit the risk of malware upload:
- Content disarm and
reconstruction (CDR) provides the highest security.
- Multi-AV scanning is an option
if CDR does not meet the business requirements of the application.
- Sandboxing provides good
detection for custom and new malware, but the latency and scalability may
be unacceptable for web uploads.
- Single antivirus scanners are
easy to implement but should generally be avoided because they are less
effective against evasive malware.
More Detail
Many
organizations run web applications that allow users to upload files. Examples
include the upload of resumes by candidates to recruiting websites, information
on accidents to an insurance portal or, more generally, scanned documents or
completed forms. In most cases, these files are office (Word, Excel, PDF) or
multimedia file types (pictures, video). Not all applications require
additional security controls. However, if the application provides access to a
large number of external users, security leaders should require an application
design that protects against the abuse of the channel for uploading malware.
Security
solution design consists of two choices:
1. Ways to integrate the security control
2. Types of security control
Integration
There are
three options for integrating a security control for malware protection into a
web application:
1. Application-integration
— The preferred option
is direct integration into the application. This gives full control over
the application flow and uploaders can be warned if, for example, they upload
file types they shouldn’t. Application developers can temporarily store the file and
directly call an antivirus engine or other security control from the web server
code, or connect to a security service over Representational State Transfer
(REST), Internet Content Adaptation Protocol (ICAP) or other exposed
APIs. This option is feasible only if the code is developed in house, and it
scales only to a small number of applications.
2. Gateway-integration
— If the number of
applications is large, or if the application is closed source, the integration
can be done in the gateway in front of the applications (typically one, or a
combination, of reverse proxy, WAF, ADC, load balancer). If such a gateway is
present, and is set up to decrypt transport layer security (TLS), it can be
used to connect to a security service over ICAP or a REST API. Such gateways
are relatively common on-premises, but cloud web applications often do not
have a gateway with exposed APIs.
3. Storage-integration
— The last option,
gaining interest, is scanning files after they pass the web application and end
up in storage. Various security solutions can process files at time of storage
in on-premises network-attached storage (NAS) through ICAP or RPC, and an
increasing number can do this for AWS S3 and Azure Blob storage through
function PaaS (fPaaS) (examples include Broadcom, OPSWAT and Trend Micro). This
option is not dependent on the way the files end up in storage, so is
attractive for use cases that do not rely on HTTP and where there is no control
over the application. Since this applies after the upload has completed, there
is no real integration with the application and no business logic can rely on
the scan results. It is common to hold files temporarily in one storage
location, and to move them to another location for further processing only
after the security scan or CDR completes.
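The hold-then-move pattern described for storage integration can be sketched as follows. This is a minimal illustration, not a vendor implementation: local directories stand in for storage locations, and `promote_scanned_files` and the `is_clean` callable are hypothetical names representing whatever scan or CDR service is attached to the storage. The sketch fails closed, so a scanner error sends the file to the rejected location.

```python
import shutil
from pathlib import Path
from typing import Callable, Dict


def promote_scanned_files(
    quarantine: Path,
    clean: Path,
    rejected: Path,
    is_clean: Callable[[Path], bool],
) -> Dict[str, str]:
    """Move each held file to `clean` or `rejected` based on the scan verdict."""
    clean.mkdir(parents=True, exist_ok=True)
    rejected.mkdir(parents=True, exist_ok=True)
    outcome: Dict[str, str] = {}
    # Materialize the listing first so moving files does not disturb iteration.
    for path in sorted(quarantine.iterdir()):
        if not path.is_file():
            continue
        try:
            verdict = is_clean(path)  # hypothetical scanner or CDR call
        except Exception:
            verdict = False  # fail closed: an unscannable file is not promoted
        target = clean if verdict else rejected
        shutil.move(str(path), str(target / path.name))
        outcome[path.name] = "clean" if verdict else "rejected"
    return outcome
```

Downstream processing would read only from the clean location, so nothing reaches the back end before a verdict exists.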
Some use
cases require feedback loops, either to inform the uploader of the upload
success, or to inform the back-office users of any denied or infected uploads
or changes made to files during the process. Although all integration options
allow for some feedback loop, direct integration into the application allows
for the most control.
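As an illustration of application integration with both feedback loops, the sketch below stores the upload temporarily, calls a security control, and reports the outcome to the uploader and to the back office. All names are hypothetical: `scan` stands in for an antivirus engine or a REST/ICAP client, and `back_office_log` stands in for whatever notification channel the organization uses.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class UploadOutcome:
    accepted: bool
    uploader_message: str  # feedback shown to the person uploading


def handle_upload(
    filename: str,
    data: bytes,
    scan: Callable[[str], bool],   # hypothetical: returns True if the file is clean
    back_office_log: List[str],    # hypothetical back-office notification channel
) -> UploadOutcome:
    """Store the upload temporarily, scan it, and report to both audiences."""
    fd, tmp_path = tempfile.mkstemp(prefix="upload-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        try:
            clean = scan(tmp_path)
        except Exception as exc:
            # Fail closed: an upload that cannot be checked is not accepted.
            back_office_log.append(f"scan error for {filename!r}: {exc}")
            return UploadOutcome(False, "We could not check your file. Please try again later.")
        if not clean:
            back_office_log.append(f"rejected upload {filename!r}")
            return UploadOutcome(False, "Your file was rejected by a security check.")
        return UploadOutcome(True, "Upload received.")
    finally:
        os.remove(tmp_path)  # the temporary copy never outlives the request
```

Because the verdict is available inside the request, the application's business logic can branch on it, which is not possible when scanning happens only after storage.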
Security Controls
The first
layer of any security solution for file uploads should be to restrict the
upload to specific file types and contents. Use allowlists on approved file
types, not blocklists of unwanted file types. For each allowed file type,
block files that have embedded content that you do not expect or need. For
example, do not allow resumes that contain macros or PDF files with scripts
and forms.
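A first layer along these lines might look like the sketch below. It assumes a resume portal that accepts only PDF and DOCX. The allowlist, the function name `check_upload` and the marker lists are illustrative assumptions. The embedded-content checks are deliberately naive: PDF names can be obfuscated or hidden in compressed streams, so this shows the allowlist principle and does not replace CDR or scanning.

```python
import io
import zipfile
from typing import Tuple

# Assumption: this application needs only PDF and DOCX uploads.
# Each allowed extension maps to the byte signature its content must start with.
ALLOWED = {
    ".pdf": b"%PDF-",
    ".docx": b"PK\x03\x04",  # DOCX is a ZIP container
}

# Naive markers for active PDF content (scripts and forms).
PDF_ACTIVE_MARKERS = (b"/JavaScript", b"/JS", b"/AcroForm")


def check_upload(filename: str, data: bytes) -> Tuple[bool, str]:
    """Return (accepted, reason) for an upload, using an allowlist."""
    ext = "." + filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    signature = ALLOWED.get(ext)
    if signature is None:
        return False, "file type not on allowlist"
    if not data.startswith(signature):
        return False, "content does not match extension"
    if ext == ".pdf":
        for marker in PDF_ACTIVE_MARKERS:
            if marker in data:
                return False, "PDF contains " + marker.decode()
    if ext == ".docx":
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                names = archive.namelist()
        except zipfile.BadZipFile:
            return False, "not a valid document container"
        # Office stores VBA macros in a part named vbaProject.bin.
        if any(name.lower().endswith("vbaproject.bin") for name in names):
            return False, "document contains macros"
    return True, "ok"
```

Anything not explicitly listed is refused, which is the difference between an allowlist and a blocklist: a new or unexpected file type is rejected by default.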
For the
second layer, there are four options:
1. Content
disarm and reconstruction (CDR) — The
strongest option for the second layer is to use CDR. Done well, CDR removes all
threats from uploaded files without adding significant latency. Since it does
not depend on the detection of known threats, it can even protect against
completely new attack types. The main disadvantage of CDR lies in the fact that
it changes the files’ contents. Depending on the use case, this may or may
not be an issue. Consider maintaining an archive of originals, and allowing
recipients to request the originals if they need them. When originals are
requested, deep analysis such as that provided by sandboxing may scale well enough
and any latency is likely acceptable. Example CDR vendors include Deep Secure,
Glasswall Solutions, JiranSecurity, Odix, OPSWAT, Sasa Software,
SoftCamp and Votiro.
2. Multi-AV
— In case CDR is
unacceptable or cannot be fully implemented, the most common second layer is
multi-AV. Multi-AV scans files with a deliberately chosen combination of AV
scanners to increase detection rates. The set of AV scanners cannot be picked at random:
although branded differently, it is not uncommon for AV scanners to use the
same engines for specific file types such as PDF. The most common vendors for
multi-AV are OPSWAT and VirusTotal.
3. Sandboxing
— Although a solid approach to
detecting new malware, sandboxing introduces latency and does not always scale
well. It is rarely used in-line in upload channels. Most endpoint and network
security vendors offer sandbox solutions, in addition to dedicated vendors such
as FireEye, Joe Security, Lastline (VMware) and VMRay. Some solutions
position themselves as fast dynamic analysis alternatives to sandboxes.
Examples of such alternatives are ReversingLabs and Perception Point. Many
security gateways, such as WAFs, have integrations with leading sandboxes.
4. Single-AV — A single antivirus scanner can have low detection rate
when it statically scans a file of a type that is typically uploaded to
applications. The reason for this poor detection rate (sometimes as low as 20%
for newer variants) is that there are ample ways to obfuscate malicious content
embedded in such files, and obfuscated content is hard to parse during static
scans. An attacker who uploads new malware to your application is targeting
you, which makes nontrivial obfuscation very likely. Run single-AV scans only
if you must, for example for compliance reasons, and do not rely on them to
detect new malware.
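The point made under multi-AV, that differently branded scanners may share an engine, can be made concrete with a small aggregation sketch. The `Scanner` record, its `engine` label and the `multi_av_verdict` function are hypothetical, and the policy shown (block if any distinct engine flags the file) is one possible choice rather than a recommendation from a specific vendor.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class Scanner:
    product: str                      # brand name of the scanner
    engine: str                       # underlying detection engine (assumed known)
    detects: Callable[[bytes], bool]  # hypothetical scan call; True means malicious


def multi_av_verdict(data: bytes, scanners: Iterable[Scanner]) -> dict:
    """Scan with one product per distinct engine and combine the verdicts."""
    by_engine: Dict[str, Scanner] = {}
    for scanner in scanners:
        # Two products on the same engine add cost but little extra detection,
        # so only the first product per engine is consulted.
        by_engine.setdefault(scanner.engine, scanner)
    flagged = sorted(
        engine for engine, scanner in by_engine.items() if scanner.detects(data)
    )
    return {
        "malicious": bool(flagged),
        "engines_consulted": len(by_engine),
        "flagged_by": flagged,
    }
```

If `engines_consulted` is noticeably smaller than the number of products purchased, the scanner set offers less diversity than it appears to.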