Change the HTML structure of all img tags in WordPress

Date: 2013-01-14 Author: kaffolder

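I have a WordPress blog and am trying to implement the foresight.js image script. In short, I need to target all post images and swap the src, width, and height attributes for data-src, data-width, and data-height attributes. I then need to duplicate the image line and wrap the copy in <noscript> tags. This is the structure I am trying to get WordPress to generate: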

<img data-src="wordpress/image/url/pic.jpg" data-width="{get width of image with PHP and pass that value in here}" data-height="{get height of image with PHP and pass that value in here}" class="fs-img">
<noscript>
    <img src="wordpress/image/url/pic.jpg">
</noscript>

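I have searched the WordPress Codex, and the best route I can find is to use filters (i.e. 'get_image_tag' and 'image_tag') to modify the HTML that WordPress outputs for each image. I am thinking that one of these options should work, or that I could do some pattern matching with regex (I know, not ideal), run preg_replace, and then inject the result back through the the_content filter.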

I have tried some of these options, but none of them worked. Can anyone help? I found one suggestion here, but could not even get that to work!

'get_image_tag' attempt:

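I found this particular snippet on the web, but it needs to be modified to fit my logic (see the structure above). I could not make sense of the preg_replace arrays well enough to do that myself.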

<?php function image_tag($html, $id, $alt, $title) {
    // Patterns: the site URL, the width attribute, the height attribute,
    // and an empty alt attribute.
    return preg_replace(array(
        '/'.str_replace('//','\/\/',get_bloginfo('url')).'/i',
        '/\s+width="\d+"/i',
        '/\s+height="\d+"/i',
        '/alt=""/i'
    ),
    // Replacements: drop the first three, fill the empty alt with the title.
    array(
        '',
        '',
        '',
        'alt="' . $title . '"'
    ),
    $html);
}
add_filter('get_image_tag', 'image_tag', 0, 4);
?>

Another 'get_image_tag' attempt:

<?php function get_image_tag($id, $alt, $title, $align, $size='full') {
    list($width, $height, $type, $attr) = getimagesize($img_src);
    $hwstring = image_hwstring($width, $height);

    $class = 'align' . esc_attr($align) . ' size-' . esc_attr($size) . ' wp-image-' . $id;
    $class = apply_filters('get_image_tag_class', $class, $id, $align, $size);

    $html = '<img src="' . esc_attr($img_src) . '" alt="' . esc_attr($alt) . '" title="' . esc_attr($title).'" data-width="' . $width . '" data-height="' . $height . '" class="' . $class . ' fs-img" />';
    $html = apply_filters( 'get_image_tag', $html, $id, $alt, $title, $align, $size);

    return $html;
}
?>

Pattern-matching attempt: I tried writing my own regex for this one, but I am not sure it is correct.

<?php function restructure_imgs($content) {
    global $post;
    $pattern = "/<img(.*?)src=('|\")(.*?).(bmp|gif|jpeg|jpg|png)(|\")(.*?)>/i";

    list($width, $height, $type, $attr) = getimagesize($2$3.$4$5);
    $hwstring = image_hwstring($width, $height);

    $replacement = '<img$1data-src=$2$3.$4$5 title="'.$post->post_title.'" data-width="'.$width.'" data-height="'.$height.'" class="fs-img"$6>';
    $content = preg_replace($pattern, $replacement, $content);
    return $content;
}
add_filter('the_content', 'restructure_imgs');
?>

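Unfortunately, none of these examples work. Any help, or a pre-written script or function you can share, would be much appreciated! Thanks for helping a student learn!!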

2 Answers

SO user: Sunyatasattva

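The filters you are trying to use run on image insertion, so they cannot swap the images that already exist in your posts. They should work, however, if you only intend to change img tags from now on.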

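The the_content filter, on the other hand, is applied to the post after it has been retrieved from the database and before it is displayed on screen. I believe you can use this filter to change existing posts without re-inserting the images.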

You can parse the_content with the PHP DOMDocument class. When it comes to HTML parsing in PHP, do not use regex.

I wrote a sample function for what you want to do. It is a bit verbose so that each step is explained. Feel free to tweak it.

<?php
function foresight_hires_img_replace($the_content) {
    // Create a new instance of DOMDocument
    $post = new DOMDocument();
    // Load $the_content as HTML
    $post->loadHTML($the_content);
    // Look up all the <img> tags.
    $imgs = $post->getElementsByTagName('img');

    // Iteration time
    foreach( $imgs as $img ) {
        // Let's make sure the img has not already been manipulated by us
        // by checking if it has a data-src attribute (we could also check
        // if it has the fs-img class, or whatever check you feel is the
        // most appropriate).
        if( $img->hasAttribute('data-src') ) continue;

        // Also, let's check that the <img> we found is not a child of a
        // <noscript> tag; we want to leave those alone as well.
        if( $img->parentNode->tagName == 'noscript' ) continue;

        // Let's clone the node for later usage.
        $clone = $img->cloneNode();

        // Get the src attribute, remove it from the element, swap it with
        // data-src
        $src = $img->getAttribute('src');
        $img->removeAttribute('src');
        $img->setAttribute('data-src', $src);

        // Same goes for width...
        $width = $img->getAttribute('width');
        $img->removeAttribute('width');
        $img->setAttribute('data-width', $width);

        // And height... (and whatever other attribute your js may need)
        $height = $img->getAttribute('height');
        $img->removeAttribute('height');
        $img->setAttribute('data-height', $height);

        // Get the class and add fs-img to the existing classes
        $imgClass = $img->getAttribute('class');
        $img->setAttribute('class', $imgClass . ' fs-img');

        // Let's create the <noscript> element and append our original
        // tag, which we cloned earlier, as its child. Then, let's insert
        // it before our manipulated element
        $no_script = $post->createElement('noscript');
        $no_script->appendChild($clone);
        $img->parentNode->insertBefore($no_script, $img);
    }

    return $post->saveHTML();
}

add_filter('the_content', 'foresight_hires_img_replace');
?>

I have not tested it specifically with WordPress, but I did test it against sample post output, and it should work.

SO user: Marcos paiva

This code worked well for me, but I ran into a few problems on the way to the final version.

Warning

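First, the server started showing warnings such as Warning: DOMDocument::loadHTML(): Unexpected end tag. This question gives more detail about the error and how to work around it. In short, adding libxml_use_internal_errors(true); to the main function before the call to loadHTML solves the problem.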
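As a rough illustration of that workaround, here is a minimal standalone sketch. The sample fragment and variable names are my own, not from the answer above, and the exact messages libxml reports may vary between versions. It collects the parser errors instead of letting them surface as PHP warnings, then restores the previous setting:

```php
<?php
// Collect libxml parse errors internally instead of emitting PHP warnings.
// libxml_use_internal_errors() returns the previous setting, so keep it.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
// Deliberately sloppy sample fragment with a stray closing tag.
$doc->loadHTML('<p>Hello</span></p>');

// Read whatever libxml complained about, then empty its error buffer.
$errors = libxml_get_errors();
libxml_clear_errors();

// Put error handling back the way it was before this block ran.
libxml_use_internal_errors($previous);

// Each entry is a LibXMLError object; print its message for inspection.
foreach ($errors as $error) {
    echo trim($error->message), "\n";
}
```

Clearing the buffer and restoring the old setting keeps the change local, which matters when other plugins on the same request also use libxml.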

The second set of problems concerns the DOMDocument::loadHTML method.

Character encoding

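The documentation has a very important comment by Shane Harte about character encoding for UTF-8 documents. So, before calling loadHTML, I had to run the content through mb_convert_encoding with the 'HTML-ENTITIES', "UTF-8" parameters.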

HTML wrapper

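The second problem with this method is that the output always includes the doctype, html, and body tags. That is a huge problem in this case, because you are processing only a fragment of the document (the_content()) rather than the whole thing.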

The simplest way to solve this is to pass libxml constants as options to the loadHTML method:

LIBXML_HTML_NOIMPLIED turns off the automatic addition of implied html/body elements.
LIBXML_HTML_NODEFDTD prevents a default doctype from being added when one is not found.

Something like $output->loadHTML(mb_convert_encoding($the_content, 'HTML-ENTITIES', 'UTF-8'), LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
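To make the effect of the two flags visible, here is a small comparison sketch. The sample fragment and variable names are assumptions of mine, and the encoding step is left out because the sample is plain ASCII. It loads the same fragment with and without the options and keeps both serialised results:

```php
<?php
// ASCII-only sample fragment with a single root element.
$fragment = '<div><p>One</p><p>Two</p></div>';

// Default behaviour: libxml adds a doctype plus <html> and <body> wrappers.
$plain = new DOMDocument();
$plain->loadHTML($fragment);
$with_wrapper = $plain->saveHTML();

// With both flags: no implied html/body elements and no default doctype.
$trimmed = new DOMDocument();
$trimmed->loadHTML($fragment, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$without_wrapper = $trimmed->saveHTML();

echo $with_wrapper, "\n";
echo $without_wrapper, "\n";
```

Printing both strings side by side should show the doctype and body wrapper only in the first one.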

Unexpected closing tag for the first element of the_content()

Another problem I started to notice was that the first element of the_content was not being closed correctly.

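For example, if the first element was a <p>, the entire content of the_content ended up wrapped in that first <p>. In many cases the content starts with an H2, and the same problem appeared.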

After a lot of research, I found this comment by Nicholas Shanks, which opened my eyes:

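LibXML requires a root node, so it treats the first element it finds as the root node, removes the closing tag it finds midway (as being in the wrong place), and then outputs the closing tag of that first element at the end of the document.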

So the first part of my code looks like this:

libxml_use_internal_errors(true);
$encode_content = mb_convert_encoding($the_content, 'HTML-ENTITIES', 'UTF-8');
$post = new DOMDocument();
$workaround = '<section class="sanitized-content">'. $encode_content . '</section>';
$post->loadHTML( $workaround, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD );
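If you would rather not have the temporary wrapper in the final markup, one possible follow-up is to serialise only the wrapper's children. The sketch below is an illustration under assumptions, not the code I actually run: transform_fragment and its callback parameter are hypothetical names, the sample HTML is made up, and the encoding step is omitted because the sample is ASCII-only.

```php
<?php
/**
 * Hypothetical helper: parse an HTML fragment inside a temporary wrapper,
 * let a callback modify the DOM, and return the markup without the wrapper.
 */
function transform_fragment(string $html, callable $callback): string
{
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // The wrapper gives libxml one root element to hang everything on.
    $doc->loadHTML(
        '<section class="sanitized-content">' . $html . '</section>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $callback($doc);

    // Serialise each child of the wrapper, leaving the wrapper itself out.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Example: move src/width/height to data-* attributes, only when present.
$result = transform_fragment(
    '<h2>Title</h2><p>Text <img src="pic.jpg" width="10" height="20"></p>',
    function (DOMDocument $doc): void {
        foreach ($doc->getElementsByTagName('img') as $img) {
            foreach (['src', 'width', 'height'] as $name) {
                if ($img->hasAttribute($name)) {
                    $img->setAttribute('data-' . $name, $img->getAttribute($name));
                    $img->removeAttribute($name);
                }
            }
            $img->setAttribute('class', trim($img->getAttribute('class') . ' fs-img'));
        }
    }
);
echo $result, "\n";
```

Checking hasAttribute first avoids writing empty data-width or data-height values for images that never had those attributes.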

✌️
