
DOMDocument PHP Tutorial

The DOMDocument class is used to parse and manipulate HTML and XML documents in PHP. Below are a few examples of how to use DOMDocument together with DOMXPath to pull specific pieces out of a page.

Install the PHP XML DOM Parser

On Ubuntu

sudo apt-get install php-xml

On Redhat/CentOS

sudo yum install php-xml
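
After installing the package, it can be worth confirming that PHP actually sees the extension. The check below is a minimal sketch that assumes the extension names `dom` and `libxml`, which is how they are reported by `php -m` on typical builds.

```php
<?php
// Sanity check: confirm the DOM and libxml extensions are loaded.
$required = ['dom', 'libxml'];
$missing = array_filter($required, function ($ext) {
    return !extension_loaded($ext);
});

if ($missing) {
    echo 'Missing extensions: ' . implode(', ', $missing) . "\n";
} else {
    echo "DOM support is available.\n";
}
```

If an extension is reported missing right after installation, restarting the web server or PHP-FPM is usually the next thing to try.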

 

Instantiate DOMDocument and Parse HTML

// get html
$html = file_get_contents('https://www.bitbook.io/cron-job-at-7-am-everyday-and-other-crontab-examples/');
if ($html === false) {
	exit("Could not download the page\n");
}

// collect errors from invalid HTML internally instead of emitting warnings
libxml_use_internal_errors(true);

// new dom parser on this html
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
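
Suppressing parse errors does not discard them: libxml buffers them, and they can be inspected afterwards. The sketch below uses a made-up HTML string (not the page fetched above) to show one way to read and then clear that buffer. Which errors are reported depends on the installed libxml2 version, so no particular output is assumed.

```php
<?php
// Hypothetical sample markup with an unclosed <p> and an unknown tag.
$html = '<html><body><h1>Title</h1><p>Unclosed paragraph<madeup>x</madeup></body></html>';

// Collect parse errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$loaded = $doc->loadHTML($html);

// Each entry is a LibXMLError object with level, code, line and message.
$errors = libxml_get_errors();
foreach ($errors as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}

// Empty the buffer so errors do not accumulate across documents.
libxml_clear_errors();

$xpath = new DOMXPath($doc);
```

Clearing the buffer matters mostly in long-running scripts that parse many pages, where the stored errors would otherwise keep growing.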

 

Parse meta description

$metaDescription = '';
$contents = $xpath->query('/html/head/meta[@name="description"]/@content');
if ($contents->length != 0) {
	foreach ($contents as $content) {
		$metaDescription .= $content->value;
	}
}
echo("Meta Description: $metaDescription\n\n");
Meta Description: 

 

Parse meta keywords

$metaKeywords = '';
$contents = $xpath->query('/html/head/meta[@name="keywords"]/@content');
if ($contents->length != 0) {
	foreach ($contents as $content) {
		$metaKeywords .= ' ' . $content->value;
	}
}
echo("Meta Keywords: $metaKeywords\n\n");
Meta Keywords: 
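
Both meta lookups follow the same pattern, so they can be folded into one small function. The sketch below is an alternative, not the approach used above: it relies on `DOMXPath::evaluate()` with the XPath `string()` function, searches `//meta` anywhere in the document rather than only under `/html/head`, and runs against a made-up HTML string. The helper name `metaContent` is hypothetical.

```php
<?php
// Hypothetical sample page; in practice $html would come from file_get_contents().
$html = '<html><head>'
      . '<meta name="description" content="Crontab examples">'
      . '<meta name="keywords" content="cron, crontab">'
      . '</head><body></body></html>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// metaContent() is a hypothetical helper, not part of the DOM extension.
function metaContent(DOMXPath $xpath, string $name): string
{
    // string() yields the first match's value, or '' when nothing matches.
    // $name is interpolated directly, so it must not contain double quotes.
    return trim($xpath->evaluate('string(//meta[@name="' . $name . '"]/@content)'));
}

echo 'Meta Description: ' . metaContent($xpath, 'description') . "\n";
echo 'Meta Keywords: ' . metaContent($xpath, 'keywords') . "\n";
echo 'Meta Author: ' . metaContent($xpath, 'author') . "\n"; // empty: no such tag
```

Because `string()` returns an empty string when nothing matches, the helper needs no length check or loop.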

 

Parse h1 tag text

$heading1 = '';
$contents = $xpath->query('//h1');
// query() returns false (not null) when the expression is invalid
if ($contents !== false) {
	foreach ($contents as $i => $node) {
		$heading1 .= ' ' . $node->nodeValue;
	}
}
echo("h1: $heading1\n\n");
h1:  Cron Job at 7 am Everyday and Other Crontab Examples

 

Parse h2 tag text

$heading2 = '';
$contents = $xpath->query('//h2');
// query() returns false (not null) when the expression is invalid
if ($contents !== false) {
	foreach ($contents as $i => $node) {
		$heading2 .= ' ' . $node->nodeValue;
	}
}
echo("h2: $heading2\n\n");
h2:   Examples Post navigation

 

Parse h3 and h4 tag text

$heading3and4 = '';
// the | operator returns the union of both node sets in document order
$contents = $xpath->query('//h3 | //h4');
if ($contents !== false) {
	foreach ($contents as $i => $node) {
		$heading3and4 .= ' ' . $node->nodeValue;
	}
}
echo("h3 and h4s: $heading3and4\n\n");
h3 and h4s:   List out Cron Jobs for Current User Edit Cron Jobs for Current User Crontab Column Meanings Everyday at 7 am Everyday at 9:30 am Everyday at 9:30 am, Monday Through Friday 1st Day of the Month at 12:30 am Every Tuesday at Midnight Every Tuesday at Midnight Every 2 Minutes Leave a Reply Cancel reply Latest Posts Bitbook
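
Concatenating every heading into one string makes it hard to tell where one heading ends and the next begins. A variation is to collect them into an array and record which tag each came from. The sketch below runs against a made-up HTML string and assumes nothing about the real page.

```php
<?php
// Hypothetical sample markup with mixed h3 and h4 headings.
$html = '<html><body><h1>Main</h1><h3> First </h3><p>text</p>'
      . '<h4>Second</h4><h3>Third</h3></body></html>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Collect each heading separately, keeping its tag name and trimmed text.
$headings = [];
foreach ($xpath->query('//h3 | //h4') as $node) {
    $headings[] = [
        'tag'  => $node->nodeName,
        'text' => trim($node->textContent),
    ];
}

foreach ($headings as $h) {
    echo $h['tag'] . ': ' . $h['text'] . "\n";
}
```

The union query returns nodes in document order, so the array preserves the page's outline rather than grouping all h3 elements before the h4 elements.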

 

Parse all text except inside script tags

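$allPageText = '';
// text nodes whose parent element is not <script>; text placed directly under <body> is not matched
$contents = $xpath->query("//body/descendant::*[name() != 'script']/text()");
if ($contents !== false) {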
	foreach ($contents as $i => $node) {
		$allPageText .= ' ' . $node->nodeValue;
	}
}

echo("All Text: $allPageText\n\n");
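
The raw text usually contains runs of whitespace and, on many pages, CSS from inline style elements. The sketch below is one possible variation, not the query used above: it filters text nodes by ancestor so that both script and style content are skipped, then collapses whitespace. The sample HTML is made up for illustration.

```php
<?php
// Hypothetical sample markup containing script and style blocks.
$html = '<html><body><h1>Title</h1><script>var x = 1;</script>'
      . '<style>p { color: red; }</style><p>Hello   <b>world</b></p></body></html>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Select every text node under <body> that is not inside <script> or <style>.
$parts = [];
foreach ($xpath->query('//body//text()[not(ancestor::script or ancestor::style)]') as $node) {
    $text = trim($node->nodeValue);
    if ($text !== '') {
        $parts[] = $text; // skip whitespace-only nodes
    }
}

// Join the pieces and collapse any remaining whitespace runs to one space.
$allPageText = preg_replace('/\s+/', ' ', implode(' ', $parts));
echo "All Text: $allPageText\n";
```

Filtering on `ancestor::` rather than on the parent element's name also picks up text that sits directly under the body element.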
