What are some PHP functions or methods that can be used to extract only the visible text from an HTML page?

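To extract only the visible text from an HTML page, we need to remove the markup along with content that a browser never renders as text, such as scripts and styles. One way to achieve this is to use preg_replace() with a regular expression to remove <script> and <style> blocks, and then strip_tags() to remove the remaining HTML tags. The order matters: strip_tags() deletes the tags themselves but keeps the text between them, so script and style blocks have to be removed while their tags are still present.

// Function to extract only visible text from an HTML page
function extractVisibleText($html) {
    // Remove <script> and <style> blocks first, while the tags still exist.
    // strip_tags() keeps the text between tags, so running it first would
    // leave JavaScript and CSS source in the output.
    $text = preg_replace('/<script\b[^>]*>.*?<\/script>/is', '', $html);
    $text = preg_replace('/<style\b[^>]*>.*?<\/style>/is', '', $text);

    // Remove the remaining HTML tags
    $text = strip_tags($text);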
    
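    // Decode entities such as &amp; so the text reads as it would in a browser
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of whitespace and newlines into single spaces
    $text = preg_replace('/\s+/', ' ', $text);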
    
    return trim($text);
}

// Usage example
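$html = file_get_contents('https://example.com');
if ($html === false) {
    // file_get_contents() returns false when the request fails
    exit('Could not fetch the page.');
}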
$visibleText = extractVisibleText($html);
echo $visibleText;
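
Regular expressions can misfire on unusual or malformed markup, so another option is to let PHP's DOM extension parse the page and then read the text from the resulting tree. The following is only a minimal sketch of that idea, not a definitive implementation. The function name extractVisibleTextDom is hypothetical, and the list of removed elements (script, style, noscript, head) is an assumption about what counts as "not visible"; note that dropping head also drops the page title.

```php
<?php
// Hypothetical helper (not part of the original answer): DOM-based extraction.
function extractVisibleTextDom(string $html): string
{
    $dom = new DOMDocument();

    // Collect libxml warnings internally instead of emitting them;
    // real-world HTML is rarely well-formed.
    $previous = libxml_use_internal_errors(true);
    // Common workaround: the encoding hint asks libxml to read the input as UTF-8.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($dom->documentElement === null) {
        return '';
    }

    // Assumption: these elements never contribute visible text.
    $xpath = new DOMXPath($dom);
    $hidden = $xpath->query('//script | //style | //noscript | //head');
    foreach (iterator_to_array($hidden) as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
        }
    }

    // textContent concatenates the remaining text nodes with entities decoded.
    $text = $dom->documentElement->textContent;

    // Collapse whitespace; fall back to the raw text if the regex fails.
    $collapsed = preg_replace('/\s+/u', ' ', $text);
    return trim($collapsed ?? $text);
}

// Usage example
$sample = '<html><head><title>Ignored title</title></head>'
        . '<body><p>Hello &amp; <b>welcome</b> to<script>var x = 1;</script> the site</p></body></html>';
echo extractVisibleTextDom($sample), PHP_EOL;
```

Like strip_tags(), this approach does not insert spaces between adjacent block elements, so text from neighbouring paragraphs can run together when the source has no whitespace between them. It also ignores CSS, so elements hidden with rules such as display: none are still included.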