DOMDocument is a PHP class used to parse and manipulate HTML and XML documents. Below are a few examples of how to use DOMDocument together with DOMXPath.
Install the PHP XML DOM Parser
On Ubuntu
sudo apt-get install php-xml
On Redhat/CentOS
sudo yum install php-xml
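After installing, it can help to confirm that PHP actually sees the extension. The following is a minimal sketch, assuming it is run with the same PHP binary (CLI or web server) that will run the parser; it only reports whether the `dom` and `libxml` extensions are loaded.

```php
<?php
// Report whether the extensions behind DOMDocument and DOMXPath are loaded.
foreach (['dom', 'libxml'] as $extension) {
    $status = extension_loaded($extension) ? 'loaded' : 'missing';
    echo "$extension: $status\n";
}
```

If `dom` shows as missing after installation, restarting the web server or PHP-FPM service is usually the next thing to try.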
Instantiate DOMDocument and Parse HTML
// get html
$html = file_get_contents('https://www.bitbook.io/cron-job-at-7-am-everyday-and-other-crontab-examples/');

// suppress any errors from invalid HTML
libxml_use_internal_errors(true);

// new dom parser on this html
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
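With `libxml_use_internal_errors(true)` the parser warnings are not discarded; they are stored in a buffer that can be read and cleared. The sketch below shows one way to inspect them, assuming a small hard-coded HTML string (hypothetical sample markup) in place of the downloaded page so that it runs without network access.

```php
<?php
// Hypothetical sample markup; the unknown <foo> tag and unclosed <p> are deliberate.
$html = '<html><head><title>Demo</title></head><body><foo><p>Hello</foo></body></html>';

// Collect parser warnings internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$loaded = $doc->loadHTML($html);

// Read the collected warnings, then clear the buffer and restore the old setting.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

foreach ($errors as $error) {
    echo trim($error->message) . " (line {$error->line})\n";
}

// evaluate() with string() returns a plain PHP string rather than a node list.
$xpath = new DOMXPath($doc);
$title = $xpath->evaluate('string(/html/head/title)');
echo "Title: $title\n";
```

Which warnings are reported, if any, depends on the installed libxml2 version, so the loop output is best treated as diagnostic information rather than something to rely on.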
Parse meta description
$metaDescription = '';
$contents = $xpath->query('/html/head/meta[@name="description"]/@content');
if ($contents->length != 0) {
    foreach ($contents as $content) {
        $metaDescription .= $content->value;
    }
}
echo("Meta Description: $metaDescription\n\n");
Meta Description:
Parse meta keywords
// start from an empty string so the .= below does not hit an undefined variable
$metaKeywords = '';
$contents = $xpath->query('/html/head/meta[@name="keywords"]/@content');
if ($contents->length != 0) {
    foreach ($contents as $content) {
        $metaKeywords .= ' ' . $content->value;
    }
}
echo("Meta Keywords: $metaKeywords\n\n");
Meta Keywords:
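The two meta lookups follow the same pattern, and the empty output above suggests the example page has no matching tags, so a lookup that returns an empty string when nothing is found is convenient. Here is a minimal sketch of that idea; `metaContent()` is a hypothetical helper (not part of PHP), and the HTML string is made-up sample data.

```php
<?php
/**
 * Return the content attribute of the first <meta name="..."> element,
 * or an empty string if there is none. Hypothetical helper for illustration.
 */
function metaContent(DOMXPath $xpath, string $name): string
{
    // Restrict names to simple tokens so they can be embedded in the query safely.
    if (!preg_match('/^[A-Za-z0-9_:.-]+$/', $name)) {
        throw new InvalidArgumentException("Unsupported meta name: $name");
    }
    // string() yields the first match's value, or '' when nothing matches.
    return trim($xpath->evaluate("string(//meta[@name='$name']/@content)"));
}

$html = '<html><head><meta name="description" content="A sample page.">'
      . '<meta name="keywords" content="php, dom"></head><body></body></html>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

echo 'Description: ' . metaContent($xpath, 'description') . "\n";
echo 'Keywords: ' . metaContent($xpath, 'keywords') . "\n";
echo 'Author: ' . metaContent($xpath, 'author') . "\n";
```

Note that the attribute comparison in XPath is case-sensitive, so a page that writes `name="Description"` would not match this query.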
Parse h1 tag text
$heading1 = '';
$contents = $xpath->query('//h1');
// query() returns false (not null) when the expression is invalid
if ($contents !== false) {
    foreach ($contents as $node) {
        $heading1 .= ' ' . $node->nodeValue;
    }
}
echo("h1: $heading1\n\n");
h1: Cron Job at 7 am Everyday and Other Crontab Examples
Parse h2 tag text
$heading2 = '';
$contents = $xpath->query('//h2');
// query() returns false (not null) when the expression is invalid
if ($contents !== false) {
    foreach ($contents as $node) {
        $heading2 .= ' ' . $node->nodeValue;
    }
}
echo("h2: $heading2\n\n");
h2: Examples Post navigation
Parse h3 and h4 tag text
$heading3and4 = '';
// the | operator returns the union of both node sets
$contents = $xpath->query('//h3 | //h4');
if ($contents !== false) {
    foreach ($contents as $node) {
        $heading3and4 .= ' ' . $node->nodeValue;
    }
}
echo("h3 and h4s: $heading3and4\n\n");
h3 and h4s: List out Cron Jobs for Current User Edit Cron Jobs for Current User Crontab Column Meanings Everyday at 7 am Everyday at 9:30 am Everyday at 9:30 am, Monday Through Friday 1st Day of the Month at 12:30 am Every Tuesday at Midnight Every Tuesday at Midnight Every 2 Minutes Leave a Reply Cancel reply Latest Posts Bitbook
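Concatenating every heading into one string makes it hard to tell where one heading ends and the next begins. As an alternative, the sketch below groups heading text into an array keyed by tag name; the HTML string is made-up sample data and the `$headings` variable name is an assumption for illustration.

```php
<?php
$html = '<html><body><h1>Title</h1><h2>First</h2><p>Text</p>'
      . '<h3>Detail</h3><h2>Second</h2></body></html>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Group heading text by tag name, keeping document order within each group.
$headings = [];
foreach ($xpath->query('//h1 | //h2 | //h3 | //h4') as $node) {
    $headings[$node->nodeName][] = trim($node->textContent);
}

foreach ($headings as $tag => $texts) {
    echo $tag . ': ' . implode(' | ', $texts) . "\n";
}
```

Keeping the headings separate also makes it easy to count them or to spot duplicates such as a heading that appears twice on the page.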
Parse all text except inside script tags
$allPageText = '';
// text nodes whose parent is any element under <body> other than <script>
$contents = $xpath->query("//body/descendant::*[name() != 'script']/text()");
if ($contents !== false) {
    foreach ($contents as $node) {
        $allPageText .= ' ' . $node->nodeValue;
    }
}
echo("All Text: $allPageText\n\n");
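The query above still returns the contents of `<style>` elements and a lot of whitespace-only text nodes. A possible variation, sketched below on made-up sample HTML, filters on the parent element instead and collapses whitespace before joining the pieces. Unlike the original query, it also picks up text placed directly inside `<body>`.

```php
<?php
$html = '<html><body><h1>Hello</h1><script>var hidden = 1;</script>'
      . '<style>p { color: red; }</style><p>Visible   text</p></body></html>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Select text nodes under <body> whose parent is neither <script> nor <style>.
$nodes = $xpath->query('//body//text()[not(parent::script or parent::style)]');

$parts = [];
foreach ($nodes as $node) {
    // Collapse runs of whitespace and skip nodes that are blank.
    $text = trim(preg_replace('/\s+/', ' ', $node->nodeValue));
    if ($text !== '') {
        $parts[] = $text;
    }
}

$allPageText = implode(' ', $parts);
echo "All Text: $allPageText\n";
```

Collecting the pieces in an array and joining them once avoids the leading space that the string concatenation approach produces.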