What are common mistakes when using PHP regex in URL parsing and how can they be avoided?

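Common mistakes when parsing URLs with PHP regular expressions include failing to escape metacharacters (such as the dot, or the pattern delimiter when it also appears in the URL), not accounting for optional or repeating parts (the protocol, subdomains, the path, a query string), and misreading capturing groups, for example by assuming that an optional group which did not match still has an entry in $matches. To avoid these mistakes, build the pattern around the specific URL structure you expect and test it against a range of URL variations, including ones that should be rejected.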
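As a minimal sketch of the escaping problem, the snippet below builds a pattern from a literal host name with and without preg_quote(). The host name and the test URLs are assumptions chosen only for illustration.

```php
<?php
// Hypothetical host name, used only for illustration.
$host = 'www.example.com';

// Unescaped: each "." matches any character, so look-alike hosts slip through.
$loose = '#^https?://' . $host . '/#';

// Escaped: preg_quote() escapes regex metacharacters and the "#" delimiter.
$strict = '#^https?://' . preg_quote($host, '#') . '/#';

var_dump(preg_match($loose, 'https://wwwXexample.com/page1'));  // int(1)
var_dump(preg_match($strict, 'https://wwwXexample.com/page1')); // int(0)
var_dump(preg_match($strict, 'https://www.example.com/page1')); // int(1)
```

Using "#" rather than "/" as the delimiter also means the slashes in the URL need no escaping, which keeps the pattern easier to read.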

$url = "https://www.example.com/page1";
// "#" as the delimiter avoids escaping every "/"; the literal "-" sits last in the character class.
$pattern = '#^(https?://)?(www\.)?([a-zA-Z0-9-]+)\.([a-z]{2,})(/[a-zA-Z0-9/-]*)?$#';

if (preg_match($pattern, $url, $matches)) {
    $protocol = $matches[1];
    $subdomain = $matches[2];
    $domain = $matches[3];
    $tld = $matches[4];
    // Unmatched trailing groups are omitted from $matches, so default the optional path.
    $path = $matches[5] ?? '';
    
    echo "Protocol: $protocol\n";
    echo "Subdomain: $subdomain\n";
    echo "Domain: $domain\n";
    echo "TLD: $tld\n";
    echo "Path: $path\n";
} else {
    echo "Invalid URL\n";
}
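To test the pattern against different URL variations, a small loop such as the following sketch can help. The pattern is equivalent to the one above, written with "#" as the delimiter. The URL list is hypothetical and chosen to exercise optional and missing parts.

```php
<?php
// Equivalent to the pattern in the example above.
$pattern = '#^(https?://)?(www\.)?([a-zA-Z0-9-]+)\.([a-z]{2,})(/[a-zA-Z0-9/-]*)?$#';

// Illustrative URLs (assumptions, not taken from any real data set).
$urls = [
    'https://www.example.com/page1',
    'example.com',
    'https://blog.example.com/page1',
    'https://www.example.co.uk/',
    'https://www.example.com/search?q=php',
];

$results = [];
foreach ($urls as $url) {
    // preg_match() returns 1 on a match, 0 on no match.
    $results[$url] = preg_match($pattern, $url) === 1;
    printf("%-40s %s\n", $url, $results[$url] ? 'match' : 'no match');
}
```

Only the first two URLs match. The other three are valid URLs that the pattern rejects, because it allows a single label before the TLD (apart from an optional "www.") and its path character class excludes "?" and "=". Whether that is acceptable depends on which URLs the application needs to handle.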