Is sanitize_title_with_dashes too liberal (in terms of accepted characters)?

Date: 2011-04-22 Author: julien_c

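sanitize_title_with_dashes (see the code below for reference) is the function WordPress uses to format "pretty" URLs. However, contrary to what the function's comment header says, it lets through more than alphanumeric characters, underscores (_) and dashes (-). It also allows symbols such as °.

How can I really restrict the output to alphanumeric characters and dashes only?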

/**
 * Sanitizes title, replacing whitespace with dashes.
 *
 * Limits the output to alphanumeric characters, underscore (_) and dash (-).
 * Whitespace becomes a dash.
 *
 * @since 1.2.0
 *
 * @param string $title The title to be sanitized.
 * @return string The sanitized title.
 */
function sanitize_title_with_dashes($title) {
    $title = strip_tags($title);
    // Preserve escaped octets.
    $title = preg_replace('|%([a-fA-F0-9][a-fA-F0-9])|', '---$1---', $title);
    // Remove percent signs that are not part of an octet.
    $title = str_replace('%', '', $title);
    // Restore octets.
    $title = preg_replace('|---([a-fA-F0-9][a-fA-F0-9])---|', '%$1', $title);

    $title = remove_accents($title);
    if (seems_utf8($title)) {
        if (function_exists('mb_strtolower')) {
            $title = mb_strtolower($title, 'UTF-8');
        }
        $title = utf8_uri_encode($title, 200);
    }

    $title = strtolower($title);
    $title = preg_replace('/&.+?;/', '', $title); // kill entities
    $title = str_replace('.', '-', $title);
    $title = preg_replace('/[^%a-z0-9 _-]/', '', $title);
    $title = preg_replace('/\s+/', '-', $title);
    $title = preg_replace('|-+|', '-', $title);
    $title = trim($title, '-');

    return $title;
}

1 Answer
Best answer, by SO user fuxia

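Think of this function as a rough placeholder. It has more flaws than you might expect... :)
There are many plugins that improve the conversion for different languages and needs. You can take a look at my plugin Germanix to see how it can be done.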
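
For the narrower question of keeping only alphanumeric characters and dashes, one possible approach is a second pass that runs after the core function. Symbols such as ° survive because utf8_uri_encode() turns them into percent-encoded octets and the character class in the function above explicitly allows %. The sketch below is not taken from Germanix or from core; the function name strict_slug_only_alnum_dashes is made up for illustration, and it assumes the core callback is attached to the sanitize_title filter at the default priority of 10.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress): reduce an already
 * sanitized slug to a-z, 0-9 and dashes only.
 */
function strict_slug_only_alnum_dashes($title) {
    $title = strtolower($title);
    // Drop percent-encoded octets, e.g. "%c2%b0" for the degree sign.
    $title = preg_replace('/%[a-f0-9]{2}/', '', $title);
    // Turn underscores into dashes so words stay separated.
    $title = str_replace('_', '-', $title);
    // Remove everything that is not a-z, 0-9 or a dash.
    $title = preg_replace('/[^a-z0-9-]/', '', $title);
    // Collapse runs of dashes and trim them from both ends.
    $title = preg_replace('/-+/', '-', $title);
    return trim($title, '-');
}

// Inside WordPress only: run after the core callback.
// Priority 11 is an assumption based on core using the default of 10.
if (function_exists('add_filter')) {
    add_filter('sanitize_title', 'strict_slug_only_alnum_dashes', 11);
}
```

Keep in mind that stripping the octets means a title written entirely in a non-Latin script can end up as an empty string, so this only makes sense if your titles are mostly ASCII or are transliterated first.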
