The DOMDocument class is used to parse and manipulate HTML and XML documents in PHP. Below are a few examples of how to use it together with DOMXPath.
Install PHP XML Dom Parser
On Ubuntu
sudo apt-get install php-xml
On Red Hat/CentOS
sudo yum install php-xml
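After installing, a quick check along these lines may help confirm that the extension is loaded. This is only a sketch; the `$hasDom` variable and the messages are illustrative.

```php
<?php
// The DOM classes ship with the "dom" extension, which the php-xml package provides
$hasDom = extension_loaded('dom') && class_exists('DOMDocument');

echo $hasDom
    ? "DOM extension is available\n"
    : "DOM extension is missing; install php-xml and restart PHP\n";
```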
Instantiate DOMDocument and Parse HTML
// fetch the HTML
$html = file_get_contents('https://www.bitbook.io/cron-job-at-7-am-everyday-and-other-crontab-examples/');
if ($html === false) {
    exit("Could not fetch the page\n");
}
// collect errors from invalid HTML internally instead of printing warnings
libxml_use_internal_errors(true);
// new dom parser on this html
$doc = new DOMDocument();
$doc->loadHTML($html);
// XPath object used by the queries below
$xpath = new DOMXPath($doc);
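With libxml_use_internal_errors(true), parse problems are buffered rather than printed. The sketch below shows one way they might be inspected and then cleared. The inline `$sample` markup is an assumption, used here so the example runs without a network request, and which errors (if any) are reported depends on the installed libxml2 version.

```php
<?php
// Assumed sample markup with an unclosed <p> and a non-standard <foo> tag
$sample = '<html><head><title>Demo</title></head><body><p>Hello<foo>bar</foo></body></html>';

libxml_use_internal_errors(true);

$doc = new DOMDocument();
$loaded = $doc->loadHTML($sample);

// Take whatever libxml buffered while parsing, then empty the buffer
$errors = libxml_get_errors();
libxml_clear_errors();

foreach ($errors as $error) {
    echo "Line {$error->line}: " . trim($error->message) . "\n";
}

// The document is still usable even if errors were reported
$xpath = new DOMXPath($doc);
$title = $xpath->evaluate('string(/html/head/title)');
echo "Title: $title\n";
```

Clearing the buffer matters in long-running scripts that parse many pages, since the buffered errors otherwise accumulate.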
Parse meta description
$metaDescription = '';
$contents = $xpath->query('/html/head/meta[@name="description"]/@content');
if ($contents->length != 0) {
    foreach ($contents as $content) {
        $metaDescription .= $content->value;
    }
}
echo("Meta Description: $metaDescription\n\n");
Meta Description:
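An empty result like the one above means the query matched no nodes. XPath string comparisons are case-sensitive, so a tag written as name="Description" would not match, and neither would a meta tag that sits outside /html/head. A minimal sketch of a more tolerant lookup follows; `metaContent()` and the `$sample` markup are hypothetical, and `$name` is assumed not to contain double quotes.

```php
<?php
// Hypothetical helper: returns the content of the first <meta> whose name
// matches $name regardless of case, or '' when none is found.
function metaContent(DOMXPath $xpath, string $name): string
{
    $upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $lower = 'abcdefghijklmnopqrstuvwxyz';
    // XPath 1.0 has no lower-case(), so translate() does the case folding
    $query = sprintf(
        'string(//meta[translate(@name, "%s", "%s") = "%s"]/@content)',
        $upper,
        $lower,
        strtolower($name)
    );
    return trim($xpath->evaluate($query));
}

// Assumed sample markup
$sample = '<html><head><meta name="Description" content="Crontab examples"></head><body></body></html>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($sample);
$xpath = new DOMXPath($doc);

echo "Meta Description: " . metaContent($xpath, 'description') . "\n";
echo "Meta Keywords: " . metaContent($xpath, 'keywords') . "\n";
```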
Parse meta keywords
// initialize first so the .= below does not hit an undefined variable
$metaKeywords = '';
$contents = $xpath->query('/html/head/meta[@name="keywords"]/@content');
if ($contents->length != 0) {
    foreach ($contents as $content) {
        $metaKeywords .= ' ' . $content->value;
    }
}
echo("Meta Keywords: $metaKeywords\n\n");
Meta Keywords:
Parse h1 tag text
$heading1 = '';
$contents = $xpath->query('//h1');
// query() returns false (not null) when the expression is invalid
if ($contents !== false) {
    foreach ($contents as $node) {
        $heading1 .= ' ' . $node->nodeValue;
    }
}
echo("h1: $heading1\n\n");
h1: Cron Job at 7 am Everyday and Other Crontab Examples
Parse h2 tag text
$heading2 = '';
$contents = $xpath->query('//h2');
// query() returns false (not null) when the expression is invalid
if ($contents !== false) {
    foreach ($contents as $node) {
        $heading2 .= ' ' . $node->nodeValue;
    }
}
echo("h2: $heading2\n\n");
h2: Examples Post navigation
Parse h3 and h4 tag text
$heading3and4 = '';
// the | operator returns the union of both node sets, in document order
$contents = $xpath->query('//h3 | //h4');
if ($contents !== false) {
    foreach ($contents as $node) {
        $heading3and4 .= ' ' . $node->nodeValue;
    }
}
echo("h3 and h4s: $heading3and4\n\n");
h3 and h4s: List out Cron Jobs for Current User Edit Cron Jobs for Current User Crontab Column Meanings Everyday at 7 am Everyday at 9:30 am Everyday at 9:30 am, Monday Through Friday 1st Day of the Month at 12:30 am Every Tuesday at Midnight Every Tuesday at Midnight Every 2 Minutes Leave a Reply Cancel reply Latest Posts Bitbook
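Concatenating every heading into one string makes it hard to tell where one heading ends and the next begins, as the output above shows. One possible alternative, sketched here with an assumed `$sample` document and a hypothetical `$headings` array, is to group the trimmed text by tag name.

```php
<?php
// Assumed sample markup
$sample = '<html><body><h1> Main Title </h1><h2>First</h2><h3>Detail</h3><h2>Second</h2></body></html>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($sample);
$xpath = new DOMXPath($doc);

$headings = [];
foreach ($xpath->query('//h1 | //h2 | //h3 | //h4') as $node) {
    // nodeName is the lowercase tag name (h1, h2, ...) for HTML parsed by loadHTML
    $headings[$node->nodeName][] = trim($node->textContent);
}

print_r($headings);
```

Each heading level then becomes its own list, for example `$headings['h2']`, which can be joined or counted as needed.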
Parse all text except inside script tags
$allPageText = '';
// every text node under <body>, including direct children, that is not inside a <script>
$contents = $xpath->query("//body//text()[not(ancestor::script)]");
if ($contents !== false) {
    foreach ($contents as $node) {
        $allPageText .= ' ' . $node->nodeValue;
    }
}
echo("All Text: $allPageText\n\n");
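Page text gathered this way usually still contains CSS from style tags and long runs of whitespace. The following sketch, which assumes a small inline `$sample` document, also skips style content and collapses whitespace; it is one possible cleanup, not the only one.

```php
<?php
// Assumed sample markup
$sample = '<html><body>Intro <p>Hello   <b>world</b></p><script>var x = 1;</script><style>p { color: red; }</style></body></html>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($sample);
$xpath = new DOMXPath($doc);

$parts = [];
// Skip text nodes that live inside <script> or <style>
foreach ($xpath->query('//body//text()[not(ancestor::script) and not(ancestor::style)]') as $node) {
    $text = trim($node->nodeValue);
    if ($text !== '') {
        $parts[] = $text;
    }
}

// Join the pieces and collapse any remaining whitespace runs to one space
$allPageText = preg_replace('/\s+/', ' ', implode(' ', $parts));
echo "All Text: $allPageText\n";
```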