How can full-text search functionality be implemented in PHP for indexing HTML pages?
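To implement full-text search over HTML pages in PHP, you can build a simple inverted index. Read each page with PHP's file-handling functions, remove the markup, split the remaining text into words with a regular expression, and map every word to the list of pages that contain it. A search then looks up each query word in that index. The code below returns every page containing at least one query word and does not rank the results. To rank them or to tolerate misspelled queries, you can add a string-similarity measure such as the Levenshtein distance (PHP's built-in levenshtein() function).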
<?php
// Function to index HTML pages for full-text search
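function indexHTMLPages($directory){
    $index = array();
    $files = glob($directory . '/*.html') ?: array(); // glob() may return false on error
    foreach($files as $file){
        $content = file_get_contents($file);
        if($content === false){
            continue; // Skip unreadable files
        }
        // Remove <script> and <style> blocks; strip_tags() would keep their contents as text
        $content = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $content) ?? $content;
        // Put a space before every tag so "<p>foo</p><p>bar</p>" doesn't become "foobar",
        // then strip the tags and decode entities such as &amp; (pages assumed to be UTF-8)
        $text = html_entity_decode(strip_tags(str_replace('<', ' <', $content)), ENT_QUOTES | ENT_HTML5, 'UTF-8');
        // Lowercase and split on runs of non-letter/non-digit characters so punctuation is not indexed
        $words = preg_split('/[^\p{L}\p{N}]+/u', mb_strtolower($text, 'UTF-8'), -1, PREG_SPLIT_NO_EMPTY) ?: array();
        // array_unique() ensures each file is recorded at most once per word
        foreach(array_unique($words) as $word){
            if(mb_strlen($word, 'UTF-8') > 3){ // Ignore words of 3 characters or fewer
                $index[$word][] = $file;
            }
        }
    }
    return $index;
}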
// Function to search indexed HTML pages
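function searchHTMLPages($index, $query){
    $results = array();
    // Tokenize the query the same way as the indexed text so terms match the index keys
    $queryWords = preg_split('/[^\p{L}\p{N}]+/u', mb_strtolower($query, 'UTF-8'), -1, PREG_SPLIT_NO_EMPTY) ?: array();
    foreach($queryWords as $word){
        if(isset($index[$word])){
            $results = array_merge($results, $index[$word]);
        }
    }
    // OR semantics: a page matches if it contains any query word.
    // array_values() renumbers the keys left with gaps by array_unique().
    return array_values(array_unique($results));
}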
// Index HTML pages in a directory
$index = indexHTMLPages('path/to/html/pages');
// Search indexed HTML pages for a query
$searchResults = searchHTMLPages($index, 'search query');
print_r($searchResults);
?>
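The index above answers only exact-word lookups and does not rank pages. As a sketch of how the Levenshtein distance could add typo tolerance and a simple relevance order, the hypothetical function fuzzySearchHTMLPages() below compares each query term with every indexed word using PHP's built-in levenshtein(), treats words within $maxDistance edits as matches, and scores each page by how many query terms it matched. It assumes an index shaped like the one indexHTMLPages() returns (word => list of file paths). The sample $index is made up so the example runs on its own.

```php
<?php
// Hypothetical sketch: typo-tolerant, ranked search over an index shaped like
// array(word => list of file paths). Names here are illustrative, not a library API.
function fuzzySearchHTMLPages(array $index, string $query, int $maxDistance = 1): array
{
    $scores = array();
    $terms = preg_split('/[^\p{L}\p{N}]+/u', mb_strtolower($query, 'UTF-8'), -1, PREG_SPLIT_NO_EMPTY) ?: array();

    foreach ($terms as $term) {
        // Collect the files hit by this term, so each term counts once per file
        $matchedFiles = array();
        foreach ($index as $word => $files) {
            // levenshtein() counts byte-level edits, so it is exact for ASCII words only.
            // PHP < 8.0 returns -1 for strings over 255 bytes, hence the >= 0 guard.
            $distance = levenshtein($term, (string) $word);
            if ($distance >= 0 && $distance <= $maxDistance) {
                foreach ($files as $file) {
                    $matchedFiles[$file] = true;
                }
            }
        }
        // Score = number of distinct query terms that (fuzzily) matched the file
        foreach (array_keys($matchedFiles) as $file) {
            $scores[$file] = ($scores[$file] ?? 0) + 1;
        }
    }

    arsort($scores); // Highest score first
    return $scores;
}

// Made-up sample index so the example is self-contained
$index = array(
    'search'  => array('a.html', 'b.html'),
    'engine'  => array('a.html'),
    'privacy' => array('c.html'),
);

print_r(fuzzySearchHTMLPages($index, 'serch engne'));
```

With the sample index, the misspelled query still finds a.html (score 2, both terms matched) ahead of b.html (score 1). Comparing every query term with every indexed word costs time proportional to the vocabulary size, which is reasonable for a small set of pages but grows with the site.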