What are the advantages of using XPath over Regex for parsing HTML documents in PHP?

When parsing HTML documents in PHP, XPath has several advantages over regular expressions. XPath was designed for navigating and selecting nodes in XML documents, and PHP's DOM extension lets you run the same queries against parsed HTML. Because a query works on the document tree rather than on raw text, it is more robust: it keeps matching when attribute order, quoting, whitespace, or nesting changes, all of which tend to break a regex pattern. XPath expressions are also easier to read and maintain than the long, error-prone patterns a regex needs to target a specific element.

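// Load the HTML document and stop early if the file cannot be read
$html = file_get_contents('example.html');
if ($html === false) {
    exit("Could not read example.html\n");
}

// Collect libxml parse warnings instead of printing them (real-world HTML is rarely valid)
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();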

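// Use XPath to select every <div> whose class attribute is exactly "content"
// (a div with class="content main" would not match this expression)
$xpath = new DOMXPath($dom);
$elements = $xpath->query('//div[@class="content"]');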

// Loop through the selected elements
foreach ($elements as $element) {
    echo $element->nodeValue . "\n";
}
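
To make the robustness point concrete, here is a minimal sketch that runs a naive regex and an XPath query over the same markup. The sample HTML, the regex pattern, and the variable names are all assumptions made up for illustration; the XPath expression uses the common concat/normalize-space idiom to match "content" as one of several class names.

```php
<?php
// Hypothetical sample markup: two "content" divs written in different styles.
$html = <<<HTML
<html><body>
<div class="content">First</div>
<div id="b" class='content main'>Second <div>nested</div></div>
</body></html>
HTML;

// A naive regex that only recognises <div class="content"> written exactly this way.
preg_match_all('/<div class="content">(.*?)<\/div>/s', $html, $m);
$regexMatches = $m[1];

// The same selection via XPath, treating "content" as one class token among several.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$nodes = $xpath->query(
    '//div[contains(concat(" ", normalize-space(@class), " "), " content ")]'
);

$xpathMatches = [];
foreach ($nodes as $node) {
    $xpathMatches[] = trim($node->textContent);
}

echo count($regexMatches), " regex match(es)\n";
echo count($xpathMatches), " XPath match(es)\n";
```

With this sample input the regex finds only the first div, because the second one uses single quotes, an extra attribute, and a second class name. The XPath query finds both, and it returns the full text of the second div, including its nested child. A more elaborate regex could cover these cases, but each new variation in the markup would need another change to the pattern.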