See http://simplehtmldom.sourceforge.net/
PHP Simple HTML DOM Parser: a powerful tool for parsing HTML


include('../simple_html_dom.php');
// Create DOM from URL or file
$dom = file_get_dom('http://www.torrentz.com/movies');
// Find all images and print their sources
foreach ($dom->find('img') as $element) {
    echo $element->src . "\n";
}
// Find all links and print the URL and link text
foreach ($dom->find('a') as $element) {
    echo $element->href . " " . $element->innertext . "\n";
}

Using curl, the main site can now fetch both text and images.
// curl example
$ch = curl_init("http://static.php.net/www.php.net/images/php.gif");
curl_setopt($ch, CURLOPT_HEADER, 0);         // leave response headers out of the result
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
$rawdata = curl_exec($ch);
curl_close($ch);
$fp = fopen("php.gif", 'wb');                // binary mode, so the image bytes are written unchanged
fwrite($fp, $rawdata);
fclose($fp);
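
The curl example above writes whatever curl_exec() returns, even when the request fails and the result is false. A minimal sketch of the same download with basic error checks follows. fetch_bytes() and save_bytes() are hypothetical helper names, and the redirect and timeout settings are assumptions that are not in the original snippet.

```php
<?php
// Hypothetical helpers for illustration; they are not part of curl or Simple HTML DOM.

/**
 * Fetch a URL with curl and return the raw body, or null on failure.
 */
function fetch_bytes(string $url): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt($ch, CURLOPT_HEADER, false);         // body only, no response headers
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // assumption: follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);           // assumption: give up after 15 seconds
    $body = curl_exec($ch);                          // string on success, false on failure
    // The handle is released when $ch goes out of scope.
    return is_string($body) ? $body : null;
}

/**
 * Write bytes to a file in binary mode; true only if every byte was written.
 */
function save_bytes(string $path, string $bytes): bool
{
    $fp = fopen($path, 'wb');
    if ($fp === false) {
        return false;
    }
    $written = fwrite($fp, $bytes);
    fclose($fp);
    return $written === strlen($bytes);
}

// Usage (not executed here because it needs network access):
// $data = fetch_bytes('http://static.php.net/www.php.net/images/php.gif');
// if ($data !== null) { save_bytes('php.gif', $data); }
```

Keeping the download and the file write in separate functions means a failed request never creates or truncates php.gif.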

===========================================================
me578022 Simple HTML parsing - PHP Simple HTML DOM Parser

URL: http://sourceforge.net/projects/simplehtmldom/

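Parsing HTML with PHP's built-in DOM objects is so painful that
I wrote my own parser and put it on SourceForge. I hope everyone can offer suggestions for its development.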

Features:
1. Requires PHP 5 or later.
2. Can parse invalid (non-strict) HTML.
3. Supports simple CSS selectors.
4. Simple DOM manipulation.
5. Preserves the original formatting of the HTML.

Examples:
<?php
// Demonstrates how to read HTML elements
include('html_dom_parser.php');

// Create the DOM object
$dom = file_get_dom('http://www.google.com/');

// Find all links on the page
$result = $dom->find('a');
foreach ($result as $v) { echo $v->href . '<br>'; }

// Find all images on the page
$result = $dom->find('img');
foreach ($result as $v) { echo $v->src . '<br>'; }

// Find all div tags with id=gbar
$result = $dom->find('div#gbar');
foreach ($result as $v) { echo $v->innertext . '<br>'; }

// Find all span tags with class=gb1
$result = $dom->find('span.gb1');
foreach ($result as $v) { echo $v->outertext . '<br>'; }

// Find all td tags with align=center
$result = $dom->find('td[align=center]');
foreach ($result as $v) { echo $v->outertext . '<br>'; }
?>
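
For comparison, the three selectors used above (div#gbar, span.gb1, td[align=center]) can be written as XPath queries against PHP's bundled DOM extension. This is a sketch of the built-in approach that the parser is meant to replace, not of Simple HTML DOM itself. query_html() and the sample markup are made up for illustration.

```php
<?php
// Comparison sketch with PHP's bundled DOM extension (DOMDocument + DOMXPath).
// query_html() is a hypothetical helper, not a Simple HTML DOM function.

/**
 * Return the outer HTML of every node matched by an XPath expression.
 */
function query_html(string $html, string $xpathExpr): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parse warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($doc))->query($xpathExpr);
    if ($nodes === false) {
        return []; // the expression itself was invalid
    }
    $out = [];
    foreach ($nodes as $node) {
        $out[] = $doc->saveHTML($node); // roughly what the examples above read via outertext
    }
    return $out;
}

$sample = '<div id="gbar"><span class="gb1">Web</span><span class="gb1 on">Images</span></div>'
        . '<table><tr><td align=center>cell</td><td align=left>other</td></tr></table>';

// div#gbar
print_r(query_html($sample, '//div[@id="gbar"]'));
// span.gb1: match gb1 as one whole token of the class attribute
print_r(query_html($sample, '//span[contains(concat(" ", normalize-space(@class), " "), " gb1 ")]'));
// td[align=center]
print_r(query_html($sample, '//td[@align="center"]'));
```

The class query is the awkward one: XPath 1.0 has no class selector, so the expression pads the attribute with spaces and looks for the padded token.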

<?php
// Demonstrates how to modify HTML elements
include('html_dom_parser.php');

// Create the DOM object
$dom = file_get_dom('http://www.google.com/');

// Remove all images from the page
$ret = $dom->find('img');
foreach ($ret as $v) { $v->outertext = ''; }

// Replace every input tag on the page
$ret = $dom->find('input');
foreach ($ret as $v) { $v->outertext = '[INPUT]'; }

// Output the modified page
echo $dom->save();
?>
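
The same two modifications can be sketched with PHP's bundled DOM extension, which makes it easier to see what assigning to outertext saves you. This is a comparison under stated assumptions, not the parser's own implementation. strip_images_and_inputs() is a hypothetical name and the sample markup is invented.

```php
<?php
// Comparison sketch with PHP's bundled DOM extension, not Simple HTML DOM.
// strip_images_and_inputs() is a hypothetical helper name.

/**
 * Remove every <img> and replace every <input> with the text "[INPUT]".
 */
function strip_images_and_inputs(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parse warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // XPath results are a snapshot, so nodes can be removed while looping over them.
    foreach ($xpath->query('//img') as $img) {
        $img->parentNode->removeChild($img);
    }
    foreach ($xpath->query('//input') as $input) {
        $input->parentNode->replaceChild($doc->createTextNode('[INPUT]'), $input);
    }

    return $doc->saveHTML();
}

echo strip_images_and_inputs('<p><img src="a.png">Name: <input type="text" name="q"></p>');
```

Unlike the parser described here, DOMDocument re-serializes the whole document, so the output gains a doctype and html/body wrappers and does not keep the original formatting.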

Posted by Felix on PIXNET


Comments (6)

  • mowd8574
  • Hello, I used your simple_html_dom.php and found a serious problem: after the page has been left running for a while, PHP starts failing and Apache has to be restarted to recover. The errors at that point are as follows:
    Warning: Attempt to assign property of non-object in D:\xampplite\htdocs\simple_html_dom.php on line 596
    Warning: Attempt to assign property of non-object in D:\xampplite\htdocs\simple_html_dom.php on line 597
    Warning: Attempt to assign property of non-object in D:\xampplite\htdocs\simple_html_dom.php on line 598
    Warning: Attempt to assign property of non-object in D:\xampplite\htdocs\simple_html_dom.php on line 599
    Warning: Attempt to assign property of non-object in D:\xampplite\htdocs\simple_html_dom.php on line 600
    Warning: Attempt to assign property of non-object in D:\xampplite\htdocs\simple_html_dom.php on line 601
    Warning: Attempt to assign property of non-object in D:\xampplite\htdocs\simple_html_dom.php on line 94
    Warning: Attempt to assign property of non-object in D:\xampplite\htdocs\simple_html_dom.php on line 602
    Warning: Attempt to assign property of non-object in D:\xampplite\htdocs\simple_html_dom.php on line 603
    Warning: Attempt to assign property of non-object in D:\xampplite\htdocs\simple_html_dom.php on line 605
    Warning: Attempt to assign property of non-object in D:\xampplite\htdocs\simple_html_dom.php on line 606
    Warning: Attempt to assign property of non-object in D:\xampplite\htdocs\simple_html_dom.php on line 608
    Warning: Attempt to assign property of non-object in D:\xampplite\htdocs\simple_html_dom.php on line 939
    Warning: Attempt to assign property of non-object in D:\xampplite\htdocs\simple_html_dom.php on line 939
    Warning: Attempt to assign property of non-object in D:\xampplite\htdocs\simple_html_dom.php on line 939
    Warning: Attempt to assign property of non-object in D:\xampplite\htdocs\simple_html_dom.php on line 939
    Warning: Attempt to assign property of non-object in D:\xampplite\htdocs\simple_html_dom.php on line 939
    Warning: Attempt to assign property of non-object in D:\xampplite\htdocs\simple_html_dom.php on line 939
    Warning: Attempt to assign property of non-object in D:\xampplite\htdocs\simple_html_dom.php on line 939
    Warning: Attempt to assign property of non-object in D:\xampplite\htdocs\simple_html_dom.php on line 939
    Warning: Attempt to assign property of non-object in D:\xampplite\htdocs\simple_html_dom.php on line 939
    Fatal error: Call to undefined method stdClass::find() in D:\xampplite\htdocs\simple_html_dom.php on line 577

    I don't know what is causing this problem.
  • registerboy
  • I'm very sorry, I don't really understand this one either.
  • andy
  • A question:
    1. How do I get the contents of the title?
    2. How do I get the content value of <meta name="keywords">?
    Thanks!
  • andy
  • Thanks for the answer!
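
The reply to the title and meta keywords question is not preserved in the thread above. As a hedged pointer: with the parser itself, something along the lines of $dom->find('title', 0)->innertext and $dom->find('meta[name=keywords]', 0)->content should work, assuming your copy of the library supports the index argument to find() (untested here). The runnable sketch below uses PHP's bundled DOM extension instead so that it has no dependencies. page_title_and_keywords() is a hypothetical helper name and the sample page is invented.

```php
<?php
// Sketch with PHP's bundled DOM extension, not Simple HTML DOM.
// page_title_and_keywords() is a hypothetical helper name.

/**
 * Extract the <title> text and the content of <meta name="keywords">.
 * A missing element yields an empty string.
 */
function page_title_and_keywords(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parse warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // string(...) returns the value of the first match, or '' when nothing matches.
    $title = $xpath->evaluate('string(//title)');
    $keywords = $xpath->evaluate('string(//meta[@name="keywords"]/@content)');

    return ['title' => trim($title), 'keywords' => trim($keywords)];
}

$page = '<html><head><title> Demo page </title>'
      . '<meta name="keywords" content="php, dom, parser"></head><body></body></html>';
print_r(page_title_and_keywords($page));
```

The attribute test @name="keywords" is case-sensitive, so a page that writes name="Keywords" would need a looser expression.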