How can PHP developers ensure that only textual content is extracted from HTML body tags while excluding other elements like scripts and metadata?
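PHP's strip_tags() function removes HTML tags from a string and accepts an optional allowlist of tags to keep. Passing only <p>, <h1>, <h2>, <h3>, <h4>, <h5>, <h6>, <ul>, <ol>, <li>, <a>, <strong>, <em>, and <br> as allowed tags keeps that basic formatting markup and strips everything else, including the <script> and <meta> tags. Note, however, that strip_tags() removes only the tags themselves and keeps the text between them, so the contents of <title>, <script>, and <style> elements would still appear in the output. Remove those elements first (for example with a regular expression or a DOM parser), then call strip_tags(). To get plain text with no markup at all, omit the allowlist.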
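$html = '<html><head><title>Sample Page</title></head><body><h1>Hello World!</h1><p>This is a sample paragraph.</p><script>alert("Hello, World!");</script></body></html>';
// strip_tags() keeps the text inside removed tags, so drop whole <head>, <script>, and <style> blocks first.
// This simple regex suits well-formed input; it is not a full HTML parser.
$body_only = preg_replace('#<(head|script|style)\b[^>]*>.*?</\1>#is', '', $html);
$allowed_tags = '<p><h1><h2><h3><h4><h5><h6><ul><ol><li><a><strong><em><br>';
$clean_content = strip_tags($body_only, $allowed_tags);
echo $clean_content; // <h1>Hello World!</h1><p>This is a sample paragraph.</p>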
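If the goal is plain text from the body with no markup at all, a DOM parser is usually more robust than string functions on messy real-world HTML. The following is a minimal sketch, assuming PHP 8 with the bundled DOM extension enabled; the helper name extract_body_text() is hypothetical and not part of PHP or of the original answer.

```php
<?php
// Hypothetical helper: returns the visible text of <body>, skipping scripts and styles.
function extract_body_text(string $html): string
{
    if ($html === '') {
        return ''; // DOMDocument::loadHTML() rejects an empty string in PHP 8
    }

    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them for imperfect markup.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Remove elements whose contents are not readable text.
    $xpath = new DOMXPath($doc);
    $unwanted = $xpath->query('.//script | .//style | .//noscript', $body);
    foreach (iterator_to_array($unwanted) as $node) {
        $node->parentNode->removeChild($node);
    }

    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/', ' ', $body->textContent));
}

$html = '<html><head><title>Sample Page</title></head><body><h1>Hello World!</h1><p>This is a sample paragraph.</p><script>alert("Hello, World!");</script></body></html>';
echo extract_body_text($html); // Hello World!This is a sample paragraph.
```

Because textContent simply concatenates text nodes, adjacent block elements run together when the source has no whitespace between them, as the output above shows. Also note that loadHTML() assumes ISO-8859-1 when the document declares no charset, so non-ASCII input may need an explicit encoding declaration.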