How can invalid characters in XML files be handled when using simplexml_load_file in PHP?

When using simplexml_load_file in PHP to parse XML files, invalid characters can cause parsing errors. To handle this issue, you can use the libxml_use_internal_errors function to suppress errors and then manually clean the XML content by removing invalid characters before parsing it with simplexml_load_string.

libxml_use_internal_errors(true);
$xml = file_get_contents('example.xml');

// Remove invalid characters from XML content
$cleaned_xml = preg_replace('/[^\x09\x0A\x0D\x20-\x7F]/', '', $xml);

// Parse cleaned XML content with simplexml_load_string
$xml_object = simplexml_load_string($cleaned_xml);

if ($xml_object === false) {
    foreach(libxml_get_errors() as $error) {
        echo "XML Error: {$error->message}\n";
    }
    libxml_clear_errors();
} else {
    // XML parsing was successful, continue processing the data
    // Example: access XML elements using $xml_object->element_name
}