Three Ways to Save a Web Page as a Word File with PHP

Front-end Technology 2023/09/02 PHP

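I. Two approaches to generating Word files with PHP

1. Use the COM component on Windows
2. Have PHP write the content into a .doc file
The concrete implementations follow.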

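II. Using the COM component on Windows

Principle: COM is exposed to PHP as an extension class. On a server with Office installed, PHP can call the word.application COM object and generate documents automatically. PHP manual: http://www.php.net/manual/en/class.com.php

The official example: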

<?php
// starting word
$word = new COM("word.application") or die("Unable to instantiate Word");
echo "Loaded Word, version {$word->Version}\n";

//bring it to front
$word->Visible = 1;

//open an empty document
$word->Documents->Add();

//do some weird stuff
$word->Selection->TypeText("This is a test...");
$word->Documents[1]->SaveAs("Useless test.doc");

//closing word
$word->Quit();

//free the object
$word = null;
?>

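Personal advice: every method on the COM instance has to be looked up in the official documentation, and editors offer no code completion for it, which is inconvenient. Performance is not great either. Not recommended.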
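Since this route only works on Windows with the COM extension loaded, it can help to check for it before calling new COM(). A minimal sketch follows; the helper name canUseWordCom is made up for illustration, and it only checks the OS family and the COM class, so it cannot tell whether Word itself is installed.

```php
<?php
// Hypothetical helper: reports whether the COM route is even possible here.
// It checks the OS family and the presence of the COM class only; it cannot
// tell whether Microsoft Word is actually installed on the machine.
function canUseWordCom(): bool
{
    return PHP_OS_FAMILY === 'Windows' && class_exists('COM', false);
}

if (canUseWordCom()) {
    echo "COM is available; new COM('word.application') may work.\n";
} else {
    echo "COM is not available; use one of the file-writing methods instead.\n";
}
```

On a Linux server this takes the second branch, which is one more reason to prefer the methods in the next section.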

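III. Having PHP write the content into a .doc file
This approach splits into two variants:

1. Generate MHT (very similar to HTML) and write it into a Word file
2. Write plain HTML into a Word file

1) Generate MHT (very similar to HTML) and write it into a Word file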

/**
* Build Word document content from HTML.
* Creates a document that is really an MHT file. The function parses the content and downloads the images in the page from their remote locations.
* Depends on the class MhtFileMaker.
* The function scans img tags and extracts the src value. The src value must be quoted, otherwise it is not extracted.
*
* @param string $content HTML content
* @param string $absolutePath Absolute URL of the page. If image paths in the HTML are relative, pass this so they can be completed into absolute URLs. It must end with /
* @param bool $isEraseLink Whether to strip links from the HTML content
*/
function getWordDocument( $content , $absolutePath = "" , $isEraseLink = true )
{
    $mht = new MhtFileMaker();
    if ($isEraseLink)
        $content = preg_replace('/<a\s*.*?\s*>(\s*.*?\s*)<\/a>/i' , '$1' , $content);   // strip links

    $images = array();
    $files = array();
    $matches = array();
    // This pattern requires the src value to be wrapped in quotes
    if ( preg_match_all('/<img[^>]*?src\s*=\s*["\'](.*?)["\'][^>]*>/i',$content ,$matches ) )
    {
        $arrPath = $matches[1];
        for ( $i=0;$i<count($arrPath);$i++)
        {
            $path = $arrPath[$i];
            $imgPath = trim( $path );
            if ( $imgPath != "" )
            {
                $files[] = $imgPath;
                if ( !preg_match('#^https?://#i', $imgPath) )
                {
                    // relative link: prepend the base URL (absolute links are left as they are)
                    $imgPath = $absolutePath.$imgPath;
                }
                $images[] = $imgPath;
            }
        }
    }
    $mht->AddContents("tmp.html",$mht->GetMimeType("tmp.html"),$content);

    for ( $i=0;$i<count($images);$i++)
    {
        $image = $images[$i];
        $imgcontent = @file_get_contents( $image );
        if ( $imgcontent !== false )
        {
            $mht->AddContents($files[$i],$mht->GetMimeType($image),$imgcontent);
        }
        else
        {
            echo "file:".$image." not exist!<br />";
        }
    }

    return $mht->GetFile();
}

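What this function mainly does is find every image URL in the HTML and download each one in turn. Once it has the image data, it calls the MhtFileMaker class to add the image to the MHT file. The details of how parts are added are encapsulated in MhtFileMaker.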
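The image-collection step can also be done with PHP's DOM extension instead of a regular expression, which copes with unquoted src values as well. This is only a sketch of that alternative, not the method used above; extractImageUrls and resolveImageUrl are hypothetical names, and the base URL is the sample one from this article.

```php
<?php
// Sketch: collect image URLs from HTML with DOMDocument instead of a regex.
// resolveImageUrl() and extractImageUrls() are hypothetical helper names.

// Prefix relative paths with $base; leave http:// and https:// URLs alone.
function resolveImageUrl(string $src, string $base): string
{
    if (preg_match('#^https?://#i', $src)) {
        return $src;
    }
    return $base . $src;
}

function extractImageUrls(string $html, string $base = ''): array
{
    if (trim($html) === '') {
        return array(); // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = array();
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = trim($img->getAttribute('src'));
        if ($src !== '') {
            $urls[] = resolveImageUrl($src, $base);
        }
    }
    return $urls;
}

$html = '<p><img src="a.png"><img src=b.gif><img src="https://example.com/c.jpg" /></p>';
print_r(extractImageUrls($html, 'http://www.yoursite.com/Music/etc/'));
```

The returned list plays the role of the $images array in getWordDocument; the download and AddContents steps would stay the same.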

Usage 1: fetching a remote page

$url = "http://www.***.com";

$content = file_get_contents($url);

$fileContent = getWordDocument($content, "http://www.yoursite.com/Music/etc/");
$fp = fopen("test.doc", 'w');
fwrite($fp, $fileContent);
fclose($fp);

Here $content should be the HTML source, and the URL passed after it should be one that can complete the relative image paths in that HTML.

Usage 2: generating locally and sending the file as a download

header("Cache-Control: no-cache, must-revalidate");
header("Pragma: no-cache");
$wordStr = 'PHP教程网站--phpstudy.net';
$fileContent = getWordDocument($wordStr);
// $intro is not defined in this snippet; it is expected to be set by the caller.
// The file name contains Chinese characters, hence the conversion to GBK.
$fileName = iconv("utf-8", "GBK", 'PHP教程' . '_' . $intro . '_' . rand(100, 999));
header("Content-Type: application/msword");
header("Content-Disposition: attachment; filename=" . $fileName . ".doc");
echo $fileContent;
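Converting the file name to GBK suits browsers on Chinese-language Windows. An alternative, which is my own assumption and not part of the original method, is to send the UTF-8 name in the filename*= form defined by RFC 5987/6266, with an ASCII fallback. buildContentDisposition is a hypothetical helper name.

```php
<?php
// Hypothetical helper: build a Content-Disposition header value that carries
// a non-ASCII file name. The plain filename= part is an ASCII fallback; the
// filename*= part is the percent-encoded UTF-8 form from RFC 5987/6266.
function buildContentDisposition(string $utf8Name): string
{
    // Replace every character outside a small safe set with "_" for the fallback.
    $fallback = preg_replace('/[^A-Za-z0-9._-]/u', '_', $utf8Name);
    return 'attachment; filename="' . $fallback . '"; '
         . "filename*=UTF-8''" . rawurlencode($utf8Name);
}

echo buildContentDisposition('PHP教程_1.doc'), "\n";
```

It would be used as header("Content-Disposition: " . buildContentDisposition($name)); before echoing the file content.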

Note: before using this function you need to include the class MhtFileMaker, which builds the MHT document for us.

<?php
/***********************************************************************
Class:        Mht File Maker
Version:      1.2 beta
Date:         02/11/2007
Author:       Wudi <wudicgi@yahoo.de>
Description: The class can make .mht file.
***********************************************************************/

class MhtFileMaker{
    var $config = array();
    var $headers = array();
    var $headers_exists = array();
    var $files = array();
    var $boundary;
    var $dir_base;
    var $page_first;

    function MhtFile($config = array()){

    }

    function SetHeader($header){
        $this->headers[] = $header;
        $key = strtolower(substr($header, 0, strpos($header, ':')));
        $this->headers_exists[$key] = TRUE;
    }

    function SetFrom($from){
        $this->SetHeader("From: $from");
    }

    function SetSubject($subject){
        $this->SetHeader("Subject: $subject");
    }

    function SetDate($date = NULL, $istimestamp = FALSE){
        if ($date == NULL) {
            $date = time();
        }
        if ($istimestamp == TRUE) {
            $date = date('D, d M Y H:i:s O', $date);
        }
        $this->SetHeader("Date: $date");
    }

    function SetBoundary($boundary = NULL){
        if ($boundary == NULL) {
            $this->boundary = '--' . strtoupper(md5(mt_rand())) . '_MULTIPART_MIXED';
        } else {
            $this->boundary = $boundary;
        }
    }

    function SetBaseDir($dir){
        $this->dir_base = str_replace("\\", "/", realpath($dir));
    }

    function SetFirstPage($filename){
        $this->page_first = str_replace("\\", "/", realpath("{$this->dir_base}/$filename"));
    }

    function AutoAddFiles(){
        if (!isset($this->page_first)) {
            exit ('Not set the first page.');
        }
        $filepath = str_replace($this->dir_base, '', $this->page_first);
        $filepath = 'http://mhtfile' . $filepath;
        $this->AddFile($this->page_first, $filepath, NULL);
        $this->AddDir($this->dir_base);
    }

    function AddDir($dir){
        $handle_dir = opendir($dir);
        // Compare against false so an entry named "0" does not end the loop early
        while (($filename = readdir($handle_dir)) !== false) {
            if (($filename!='.') && ($filename!='..') && ("$dir/$filename"!=$this->page_first)) {
                if (is_dir("$dir/$filename")) {
                    $this->AddDir("$dir/$filename");
                } elseif (is_file("$dir/$filename")) {
                    $filepath = str_replace($this->dir_base, '', "$dir/$filename");
                    $filepath = 'http://mhtfile' . $filepath;
                    $this->AddFile("$dir/$filename", $filepath, NULL);
                }
            }
        }
        closedir($handle_dir);
    }

    function AddFile($filename, $filepath = NULL, $encoding = NULL){
        if ($filepath == NULL) {
            $filepath = $filename;
        }
        $mimetype = $this->GetMimeType($filename);
        $filecont = file_get_contents($filename);
        $this->AddContents($filepath, $mimetype, $filecont, $encoding);
    }

    function AddContents($filepath, $mimetype, $filecont, $encoding = NULL){
        if ($encoding == NULL) {
            $filecont = chunk_split(base64_encode($filecont), 76);
            $encoding = 'base64';
        }
        $this->files[] = array('filepath' => $filepath,
                               'mimetype' => $mimetype,
                               'filecont' => $filecont,
                               'encoding' => $encoding);
    }

    function CheckHeaders(){
        if (!array_key_exists('date', $this->headers_exists)) {
            $this->SetDate(NULL, TRUE);
        }
        if ($this->boundary == NULL) {
            $this->SetBoundary();
        }
    }

    function CheckFiles(){
        if (count($this->files) == 0) {
            return FALSE;
        } else {
            return TRUE;
        }
    }

    function GetFile(){
        $this->CheckHeaders();
        if (!$this->CheckFiles()) {
            exit ('No file was added.');
        }
        $contents = implode("\r\n", $this->headers);
        $contents .= "\r\n";
        $contents .= "MIME-Version: 1.0\r\n";
        $contents .= "Content-Type: multipart/related;\r\n";
        $contents .= "\tboundary=\"{$this->boundary}\";\r\n";
        $contents .= "\ttype=\"" . $this->files[0]['mimetype'] . "\"\r\n";
        $contents .= "X-MimeOLE: Produced By Mht File Maker v1.0 beta\r\n";
        $contents .= "\r\n";
        $contents .= "This is a multi-part message in MIME format.\r\n";
        $contents .= "\r\n";
        foreach ($this->files as $file) {
            $contents .= "--{$this->boundary}\r\n";
            $contents .= "Content-Type: $file[mimetype]\r\n";
            $contents .= "Content-Transfer-Encoding: $file[encoding]\r\n";
            $contents .= "Content-Location: $file[filepath]\r\n";
            $contents .= "\r\n";
            $contents .= $file['filecont'];
            $contents .= "\r\n";
        }
        $contents .= "--{$this->boundary}--\r\n";
        return $contents;
    }

    function MakeFile($filename){
        $contents = $this->GetFile();
        $fp = fopen($filename, 'w');
        fwrite($fp, $contents);
        fclose($fp);
    }

    function GetMimeType($filename){
        $pathinfo = pathinfo($filename);
        // A name without an extension has no 'extension' key; fall back to ''
        $extension = isset($pathinfo['extension']) ? strtolower($pathinfo['extension']) : '';
        switch ($extension) {
            case 'htm': $mimetype = 'text/html'; break;
            case 'html': $mimetype = 'text/html'; break;
            case 'txt': $mimetype = 'text/plain'; break;
            case 'cgi': $mimetype = 'text/plain'; break;
            case 'php': $mimetype = 'text/plain'; break;
            case 'css': $mimetype = 'text/css'; break;
            case 'jpg': $mimetype = 'image/jpeg'; break;
            case 'jpeg': $mimetype = 'image/jpeg'; break;
            case 'jpe': $mimetype = 'image/jpeg'; break;
            case 'gif': $mimetype = 'image/gif'; break;
            case 'png': $mimetype = 'image/png'; break;
            default: $mimetype = 'application/octet-stream'; break;
        }
        return $mimetype;
    }
}
?>

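Comment: the drawback of this method is that it cannot produce several downloads in one go, because a single response can carry only one set of headers (whether the page is fetched remotely or built locally, a page that declares download headers can output only one file). Even if you generate in a loop, only one Word file comes out (you can of course modify the approach above to work around this).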
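One way to work around the single-download limit is to write each generated document to disk instead of sending it to the browser. A minimal sketch under that assumption: saveDocuments is a hypothetical helper, and the plain strings stand in for what getWordDocument would return.

```php
<?php
// Sketch: write several generated documents to disk in one request instead of
// sending each one as a download. saveDocuments() is a hypothetical helper;
// in this article's setting each value would come from getWordDocument().
function saveDocuments(array $documents, string $dir): array
{
    $paths = array();
    foreach ($documents as $name => $content) {
        $path = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $name . '.doc';
        if (file_put_contents($path, $content) === false) {
            throw new RuntimeException("Cannot write $path");
        }
        $paths[] = $path;
    }
    return $paths;
}

// Placeholder contents; the system temp directory is used only for the demo.
$paths = saveDocuments(
    array('report_1' => 'first document', 'report_2' => 'second document'),
    sys_get_temp_dir()
);
print_r($paths);
```

The saved files can then be linked individually or packed into one archive for a single download.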

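2) Write plain HTML into a Word file

Principle:

Use ob_start to buffer the HTML page first (this avoids the one-header-per-page problem, so files can be generated in batches), then write the buffered content into a .doc file.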
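The buffering idea on its own can be sketched like this. captureOutput is a hypothetical helper name, and the temp-file path is only for the demo.

```php
<?php
// Sketch of the buffering idea: everything echoed inside the callback is
// captured into a string instead of being sent to the client.
// captureOutput() is a hypothetical helper name.
function captureOutput(callable $render): string
{
    ob_start();
    try {
        $render();
        return (string) ob_get_contents();
    } finally {
        ob_end_clean(); // discard the buffer so nothing reaches the browser
    }
}

$html = captureOutput(function () {
    echo '<html><body><p>Hello Word</p></body></html>';
});

// The captured markup can now be written to a file with a .doc extension.
file_put_contents(sys_get_temp_dir() . '/buffer_demo.doc', $html);
```

Because nothing is sent to the client while the buffer is active, the same request can repeat this for as many files as needed.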

Code:

<?php
class word
{
    // Start buffering and emit the opening <html> tag with the Office namespaces
    function start()
    {
        ob_start();
        echo '<html xmlns:o="urn:schemas-microsoft-com:office:office"
        xmlns:w="urn:schemas-microsoft-com:office:word"
        xmlns="http://www.w3.org/TR/REC-html40">';
    }

    // Close the document, take the buffered output and write it to $path
    function save($path)
    {
        echo "</html>";
        $data = ob_get_contents();
        ob_end_clean();

        $this->writefile($path, $data);
    }

    function writefile($fn, $data)
    {
        $fp = fopen($fn, "wb");
        fwrite($fp, $data);
        fclose($fp);
    }
}

Usage:

$html = '
<table width=600 cellpadding="6" cellspacing="1" bgcolor="#336699">
<tr bgcolor="White">
  <td>PHP10086</td>
  <td><a href="http://www.phpstudy.net" target="_blank" >http://www.phpstudy.net</a></td>
</tr>
<tr bgcolor="red">
  <td>PHP10086</td>
  <td><a href="http://www.phpstudy.net" target="_blank" >http://www.phpstudy.net</a></td>
</tr>
<tr bgcolor="White">
  <td colspan=2 >
  PHP10086<br>
  The most reliable PHP tech-sharing site
  <img src="http://www.phpstudy.net/wp-content/themes/WPortal-Blue/images/logo.gif">
  </td>
</tr>
</table>
';

// Batch generation
for ($i = 1; $i <= 3; $i++) {
    $word = new word();
    $word->start();
    //$html = "aaa".$i;
    $wordname = 'PHP教程网站--phpstudy.net' . $i . ".doc";
    echo $html;
    $word->save($wordname);
    // Flush pending output before the next iteration; save() has already
    // closed its own buffer, so only flush an outer buffer if one exists
    if (ob_get_level() > 0) {
        ob_flush();
    }
    flush();
}

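Personal comment: this method works best, for three reasons:

First, the code is short and easy to understand.
Second, it supports batch generation of Word files (this matters a lot).
Third, it supports complete HTML code.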

Article URL: https://www.stayed.cn/item/3340

Please credit the source when reposting.
