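I had a similar need and couldn't find a ready-made solution, so I wrote a function built on the standard PHP function parse_url() and extended it over time to extract everything I could think of.
Below is my code with two sample outputs. It extracts the subdomain, root domain, TLD, extension, path, absolute address, and so on: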
/**
 * Parse and check the URL. Sets the following array elements:
 * scheme, host, port, user, pass, path, query, fragment, basename, filename, extension,
 * domain, tld, root, subdomain, base, abs (absolute address)
 *
 * @param string $url     URL of the site
 * @param bool   $retdata if true, return the parsed URL data; otherwise set the $urldata class variable
 * @return array|bool parsed data, true when stored on the class, or false for an invalid URL
 */
function parseURL($url, $retdata = true) {
    $url = substr($url, 0, 4) == 'http' ? $url : 'http://' . $url; // assume http if not supplied
    if (($urldata = parse_url(str_replace('&amp;', '&', $url))) && isset($urldata['host'])) {
        // note: pathinfo() runs on the host, so basename/filename/extension describe the host name
        $path_parts = pathinfo($urldata['host']);
        $tmp = explode('.', $urldata['host']);
        $n = count($tmp);
        if ($n >= 2) {
            // 4 labels, or 3 labels with a short middle label (e.g. co.uk): treat the TLD as two-part
            if ($n == 4 || ($n == 3 && strlen($tmp[$n - 2]) <= 3)) {
                $urldata['domain'] = $tmp[$n - 3] . "." . $tmp[$n - 2] . "." . $tmp[$n - 1];
                $urldata['tld'] = $tmp[$n - 2] . "." . $tmp[$n - 1]; // top-level domain
                $urldata['root'] = $tmp[$n - 3]; // second-level domain
                // with only 3 labels the first one is the root, so there is no subdomain
                // (the original unparenthesized nested ternary is a fatal error in PHP 8)
                $urldata['subdomain'] = $n == 4 ? $tmp[0] : '';
            } else {
                $urldata['domain'] = $tmp[$n - 2] . "." . $tmp[$n - 1];
                $urldata['tld'] = $tmp[$n - 1];
                $urldata['root'] = $tmp[$n - 2];
                $urldata['subdomain'] = $n == 3 ? $tmp[0] : '';
            }
        }
        //$urldata['dirname'] = $path_parts['dirname'];
        $urldata['basename'] = $path_parts['basename'];
        $urldata['filename'] = $path_parts['filename'];
        $urldata['extension'] = $path_parts['extension'] ?? ''; // absent when the host has no dot
        $urldata['base'] = $urldata['scheme'] . "://" . $urldata['host'];
        $urldata['abs'] = (isset($urldata['path']) && strlen($urldata['path'])) ? $urldata['path'] : '/';
        $urldata['abs'] .= (isset($urldata['query']) && strlen($urldata['query'])) ? '?' . $urldata['query'] : '';
        // Set data
        if ($retdata) {
            return $urldata;
        } else {
            // only valid when this function is used as a class method
            $this->urldata = $urldata;
            return true;
        }
    } else {
        // invalid URL
        return false;
    }
}
Example 1: if you submit the sample URL (https://developer.wordpress.org/reference/functions/wp_parse_url/), the output is:
array (
    'scheme' => 'https',
    'host' => 'developer.wordpress.org',
    'path' => '/reference/functions/wp_parse_url/',
    'domain' => 'wordpress.org',
    'tld' => 'org',
    'root' => 'wordpress',
    'subdomain' => 'developer',
    'basename' => 'developer.wordpress.org',
    'filename' => 'developer.wordpress',
    'extension' => 'org',
    'base' => 'https://developer.wordpress.org',
    'abs' => '/reference/functions/wp_parse_url/',
)
Example 2: a made-up URL with a few more components, http://dev.yoursite.com/some/other/directory/index.php?pg=7. The output is now:
array (
    'scheme' => 'http',
    'host' => 'dev.yoursite.com',
    'path' => '/some/other/directory/index.php',
    'query' => 'pg=7',
    'domain' => 'yoursite.com',
    'tld' => 'com',
    'root' => 'yoursite',
    'subdomain' => 'dev',
    'basename' => 'dev.yoursite.com',
    'filename' => 'dev.yoursite',
    'extension' => 'com',
    'base' => 'http://dev.yoursite.com',
    'abs' => '/some/other/directory/index.php?pg=7',
)
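The basename, filename, and extension values in both sample outputs describe the host name, because the function passes the host to pathinfo(). If you would rather have the file parts of the requested path, a minimal sketch (not part of the original function; the variable names $hostInfo and $fileInfo are only illustrative) could call pathinfo() on the path instead:

```php
<?php
// The made-up URL from example 2.
$url = 'http://dev.yoursite.com/some/other/directory/index.php?pg=7';
$parts = parse_url($url);

// pathinfo() on the host: this is what yields the values shown in the sample output.
$hostInfo = pathinfo($parts['host']);

// pathinfo() on the path: describes the requested file instead.
$fileInfo = pathinfo($parts['path']);

echo $hostInfo['basename'], "\n";  // dev.yoursite.com
echo $fileInfo['dirname'], "\n";   // /some/other/directory
echo $fileInfo['basename'], "\n";  // index.php
echo $fileInfo['filename'], "\n";  // index
echo $fileInfo['extension'], "\n"; // php
```

Note that a path ending in a slash, such as the one in example 1, has no 'extension' key at all, so check with isset() before reading it.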
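This is probably more information than you need, and some of it is redundant, but you can tweak the function to return only what you want, or use it as is and pick out the array elements you need.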
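As one way of picking out only the elements you need, here is a small sketch that filters the result array with array_intersect_key(). The $urldata array is simply the example 2 output copied in by hand so the snippet runs on its own, and the $wanted list is an arbitrary choice for illustration:

```php
<?php
// Output of the function for example 2, copied from the sample above.
$urldata = [
    'scheme'    => 'http',
    'host'      => 'dev.yoursite.com',
    'path'      => '/some/other/directory/index.php',
    'query'     => 'pg=7',
    'domain'    => 'yoursite.com',
    'tld'       => 'com',
    'root'      => 'yoursite',
    'subdomain' => 'dev',
    'basename'  => 'dev.yoursite.com',
    'filename'  => 'dev.yoursite',
    'extension' => 'com',
    'base'      => 'http://dev.yoursite.com',
    'abs'       => '/some/other/directory/index.php?pg=7',
];

// Keys this caller cares about (hypothetical selection).
$wanted = ['domain', 'subdomain', 'abs'];

// Keep only the wanted keys; the order of $urldata is preserved.
$subset = array_intersect_key($urldata, array_flip($wanted));

print_r($subset);
```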
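Note: if you pass https://developer.wordpress.org (with no trailing slash) to WordPress's or PHP's built-in URL parsing functions, 'path' will not be defined in the output. This function falls back to '/' in that case and stores it in the 'abs' element.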
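The behaviour described in the note can be checked with plain parse_url(). This sketch only uses PHP's built-in function and repeats the same fallback expression the function uses for 'abs'; the variable names are illustrative:

```php
<?php
// No trailing slash, so parse_url() returns no 'path' element.
$parts = parse_url('https://developer.wordpress.org');
var_dump(isset($parts['path'])); // bool(false)

// Same fallback as in the function: use '/' when the path is missing or empty.
$abs = (isset($parts['path']) && strlen($parts['path'])) ? $parts['path'] : '/';
echo $abs, "\n"; // /

// With a trailing slash the path is present.
$withSlash = parse_url('https://developer.wordpress.org/');
echo $withSlash['path'], "\n"; // /
```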