How can PHP developers ensure that attributes in HTML tags are not duplicated when filtering them using regular expressions?

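When filtering HTML attributes with regular expressions, PHP developers can avoid duplicates by capturing each attribute's name and value separately, tracking the names already seen, and rebuilding the tag from the first occurrence of each name. Keeping the first occurrence matches what HTML parsers do, since they ignore later duplicates. Two details matter: HTML attribute names are case-insensitive, so names should be compared in lowercase, and each tag should be deduplicated on its own so that attributes from one element are never carried over to another. Regular expressions are adequate for simple, well-formed markup like the example below, but they are fragile on arbitrary HTML.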

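$html = '<div class="example" id="test" class="duplicate">Content</div>';

// Attribute name (hyphens and colons allowed), then a quoted value whose
// closing quote must match the opening one (\2). Unquoted and valueless
// attributes are not matched, so they are dropped from the rebuilt tag.
$attrPattern = '/([\w:-]+)\s*=\s*(["\'])(.*?)\2/s';

// Opening tag: name, then attribute text in which ">" may appear inside quotes.
$tagPattern = '/<([a-zA-Z][\w:-]*)((?:[^>"\']|"[^"]*"|\'[^\']*\')*)>/';

// Process each opening tag separately so attributes from one tag are never
// copied onto another.
$filteredHtml = preg_replace_callback(
    $tagPattern,
    function (array $tag) use ($attrPattern): string {
        preg_match_all($attrPattern, $tag[2], $matches, PREG_SET_ORDER);

        $filteredAttributes = [];
        $capturedAttributes = [];

        foreach ($matches as $match) {
            // HTML attribute names are case-insensitive.
            $attribute = strtolower($match[1]);

            // Keep only the first occurrence of each name.
            if (!isset($capturedAttributes[$attribute])) {
                // Reuse the original quote character; the value cannot contain it.
                $filteredAttributes[] = $attribute . '=' . $match[2] . $match[3] . $match[2];
                $capturedAttributes[$attribute] = true;
            }
        }

        $attributes = $filteredAttributes ? ' ' . implode(' ', $filteredAttributes) : '';
        return '<' . $tag[1] . $attributes . '>';
    },
    $html
);

echo $filteredHtml; // <div class="example" id="test">Content</div>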
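
A pattern that only matches quoted values drops valueless attributes such as `disabled` and unquoted values such as `type=hidden` when the tag is rebuilt. As a rough sketch, the same seen-names check can be wrapped in a reusable function that also recognizes those forms and keeps each surviving attribute exactly as it was written. The function name `dedupeAttributes` and both patterns are illustrative assumptions, not a complete HTML tokenizer; in particular, a self-closing slash (`<br />`) is not preserved.

```php
<?php
// Hypothetical helper (name and patterns are assumptions for illustration):
// removes duplicate attributes from every opening tag in $html.
function dedupeAttributes(string $html): string
{
    // Attribute name, optionally followed by "=" and a double-quoted,
    // single-quoted, or unquoted value.
    $attrPattern = '/([^\s"\'<>\/=]+)(?:\s*=\s*("[^"]*"|\'[^\']*\'|[^\s"\'=<>`]+))?/';

    // Opening tag: name, then attribute text where ">" may appear inside quotes.
    $tagPattern = '/<([a-zA-Z][\w:-]*)((?:[^>"\']|"[^"]*"|\'[^\']*\')*)>/';

    return preg_replace_callback(
        $tagPattern,
        function (array $tag) use ($attrPattern): string {
            preg_match_all($attrPattern, $tag[2], $matches, PREG_SET_ORDER);

            $seen = [];
            $kept = [];
            foreach ($matches as $match) {
                // Compare names case-insensitively.
                $name = strtolower($match[1]);
                if (isset($seen[$name])) {
                    continue; // a later duplicate: skip it
                }
                $seen[$name] = true;
                $kept[] = $match[0]; // keep the attribute text as written
            }

            return '<' . $tag[1] . ($kept ? ' ' . implode(' ', $kept) : '') . '>';
        },
        $html
    );
}

$input = '<input type="text" TYPE=hidden disabled name=\'q\' disabled>'
       . '<p class="a" class="b">Hi</p>';

echo dedupeAttributes($input);
// <input type="text" disabled name='q'><p class="a">Hi</p>
```

Keeping `$match[0]` instead of reassembling name and value avoids any re-quoting, so a value that contains the other quote character passes through unchanged.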