What best practices should be followed when implementing a badword filter function in PHP to avoid unintentional censorship?
To avoid unintentional censorship in a PHP bad-word filter, match whole words rather than substrings, and take context and word variations into account. Wrapping the pattern in regular-expression word boundaries (\b) means a listed word is filtered only when it appears as a complete word, not when it happens to occur inside a longer, harmless one. A whitelist of allowed words or phrases then handles the remaining false positives, where a listed term is acceptable in a particular context.
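function filterBadWords($text, $badwords, $whitelist = array("goodword1", "goodword2")) {
    // With no bad words the alternation would be empty, so return the text unchanged.
    if (empty($badwords)) {
        return $text;
    }
    // Escape each word so characters such as "." or "+" are matched literally.
    $escaped = array_map(function($word) { return preg_quote($word, '/'); }, $badwords);
    // \b restricts matches to whole words; "i" ignores case; "u" treats pattern and text as UTF-8.
    $pattern = '/\b(' . implode('|', $escaped) . ')\b/iu';
    $filteredText = preg_replace_callback($pattern, function($match) use ($whitelist) {
        // Whitelist entries (stored in lowercase) override the bad-word list.
        if (in_array(mb_strtolower($match[1]), $whitelist, true)) {
            return $match[0];
        }
        // Mask one asterisk per character, not per byte (requires the mbstring extension).
        return str_repeat('*', mb_strlen($match[1]));
    }, $text);
    // Note: this is null if $text is not valid UTF-8, so callers should check for that.
    return $filteredText;
}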
$badwords = array("badword1", "badword2");
$text = "This is a badword1 example of badword2 filtering.";
// Prints: This is a ******** example of ******** filtering.
echo filterBadWords($text, $badwords);
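The answer mentions whitelisting phrases as well as single words. One way this could work is sketched below: whitelisted phrases are placed ahead of the bad words in the same pattern, so a phrase that contains a bad word is matched as a whole and left alone. The function name filterWithPhraseWhitelist, its parameters, and the sample words are hypothetical and chosen only for illustration; the sketch also assumes ASCII word lists.

```php
<?php
// Hypothetical helper, not part of the original answer: masks bad words but
// leaves whitelisted phrases that contain them untouched.
function filterWithPhraseWhitelist(string $text, array $badwords, array $allowedPhrases): string
{
    if ($badwords === []) {
        return $text; // nothing to filter
    }

    // Escape every entry so regex metacharacters are matched literally.
    $quote = function (string $s): string {
        return preg_quote($s, '/');
    };

    // Alternatives are tried left to right, so at any position a whitelisted
    // phrase is preferred over a bare bad word.
    $alternatives = [];
    if ($allowedPhrases !== []) {
        $alternatives[] = '(?<allowed>' . implode('|', array_map($quote, $allowedPhrases)) . ')';
    }
    $alternatives[] = '(?<bad>' . implode('|', array_map($quote, $badwords)) . ')';
    $pattern = '/\b(?:' . implode('|', $alternatives) . ')\b/i';

    $result = preg_replace_callback($pattern, function (array $m): string {
        // The "bad" group is only set when a bad word matched on its own.
        if (isset($m['bad']) && $m['bad'] !== '') {
            return str_repeat('*', strlen($m['bad'])); // byte count; fine for ASCII
        }
        return $m[0]; // whitelisted phrase: keep as written
    }, $text);

    // preg_replace_callback() returns null on a PCRE error; this sketch then
    // falls back to the original text, which a stricter filter may not want.
    return $result ?? $text;
}

$text = "Hello from Hell's Kitchen, what the hell.";
echo filterWithPhraseWhitelist($text, ["hell"], ["hell's kitchen"]);
// Prints: Hello from Hell's Kitchen, what the ****.
```

Here "Hello" survives because of the word boundaries, "Hell's Kitchen" survives because the phrase is whitelisted, and only the standalone word is masked.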