What potential challenges or limitations should be considered when trying to extract text from Word and PDF documents using PHP?

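Extracting text from Word and PDF documents in PHP is difficult mainly because the two formats store text very differently. A .docx file is a ZIP archive of XML parts, while a PDF stores positioned glyphs rather than paragraphs, so reading order, tables, and multi-column layouts are often lost on extraction. Other limitations include encrypted or password-protected files, scanned documents that contain only images and therefore need OCR, and non-standard font encodings that produce garbled characters. For Word documents, PHPWord can load a file and expose its sections and elements. For PDFs, note that TCPDF and FPDF are built to generate PDFs, not to read them; a parser such as smalot/pdfparser, or the pdftotext command-line tool from Poppler, is a better fit for extraction.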
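
For the PDF side, the encrypted-file and image-only cases can be handled defensively. The following is a minimal sketch, assuming the third-party smalot/pdfparser package is installed through Composer; extractPdfText() is a hypothetical helper name chosen for illustration, and the exact exceptions raised may differ between library versions.

```php
<?php
// Assumption: installed with `composer require smalot/pdfparser`.
require_once 'vendor/autoload.php';

/**
 * Returns the text of a PDF, or null when the file cannot be read,
 * cannot be parsed, or contains no extractable text.
 * extractPdfText() is a hypothetical helper, not part of any library.
 */
function extractPdfText(string $path): ?string
{
    if (!is_readable($path)) {
        return null;
    }

    try {
        $parser = new \Smalot\PdfParser\Parser();
        $pdf = $parser->parseFile($path);
        $text = $pdf->getText();
    } catch (\Exception $e) {
        // The parser throws on files it cannot handle,
        // for example encrypted or malformed PDFs.
        error_log('PDF extraction failed: ' . $e->getMessage());
        return null;
    }

    // Scanned PDFs hold only images, so the text comes back empty
    // and would need OCR instead.
    return trim($text) === '' ? null : $text;
}

$text = extractPdfText('example.pdf');
echo $text ?? 'No extractable text found.';
```

Returning null for every failure keeps the calling code simple, at the cost of hiding why extraction failed; the error_log() call is there so the reason is still recorded.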

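// Example using the PHPWord library to extract text from a Word document
// Assumes PHPWord was installed with Composer: composer require phpoffice/phpword
require_once 'vendor/autoload.php';

// IOFactory::load() returns a populated PhpWord object; no separate "new" is needed
$phpWord = \PhpOffice\PhpWord\IOFactory::load('example.docx');

foreach ($phpWord->getSections() as $section) {
    foreach ($section->getElements() as $element) {
        if ($element instanceof \PhpOffice\PhpWord\Element\TextRun) {
            // A TextRun holds several Text children; print them as one line
            foreach ($element->getElements() as $child) {
                if ($child instanceof \PhpOffice\PhpWord\Element\Text) {
                    echo $child->getText();
                }
            }
            echo PHP_EOL;
        } elseif ($element instanceof \PhpOffice\PhpWord\Element\Text) {
            // Some paragraphs may be loaded as plain Text elements
            echo $element->getText(), PHP_EOL;
        }
        // Tables, lists, headers, and footers are not handled here
    }
}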
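
Whichever library produces the text, the encoding problems mentioned above usually surface afterwards as invalid bytes, stray control characters, or odd whitespace. A minimal cleanup sketch using PHP's mbstring and PCRE functions follows; normalizeExtractedText() is a hypothetical helper, and treating non-UTF-8 input as Windows-1252 is an assumption that will not suit every document.

```php
<?php
/**
 * Normalizes extracted text to UTF-8 and tidies whitespace.
 * The function name and the Windows-1252 fallback are illustrative assumptions.
 */
function normalizeExtractedText(string $text, string $fallbackEncoding = 'Windows-1252'): string
{
    // Convert only when the bytes are not already valid UTF-8.
    if (!mb_check_encoding($text, 'UTF-8')) {
        $text = mb_convert_encoding($text, 'UTF-8', $fallbackEncoding);
    }

    // Replace non-breaking spaces with ordinary spaces.
    $text = str_replace("\u{00A0}", ' ', $text);

    // Remove control characters except tab, line feed, and carriage return.
    $text = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/u', '', $text) ?? $text;

    // Collapse runs of spaces and tabs into a single space.
    $text = preg_replace('/[ \t]+/u', ' ', $text) ?? $text;

    return trim($text);
}

echo normalizeExtractedText("caf\xE9  au lait"); // prints "café au lait"
```

Line breaks are left intact so that paragraph boundaries from the source document survive the cleanup.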