Trim URL with PHP
I have a form where a user can submit a web address, in the following format:
I want to trim the url so it looks like this:
I have used parse_url with PHP_URL_HOST which gives me
I just need to remove the leading 'www' or '123', or whatever other prefix may be there.
You can do a string search for the first occurrence of the "." and trim from there to the end of the string.
The strpos function will find the first dot, and the value it returns can be used with substr to return the bit you want.
Try $stripped = substr($url, strpos($url, ".") + 1);
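For what it's worth, here is a minimal sketch of that strpos/substr idea wrapped in a function. The name strip_first_label is made up for illustration, and it assumes $host is already a bare hostname (for example the result of parse_url with PHP_URL_HOST). The extra check is there because strpos returns false when there is no dot at all.

```php
<?php
// Hypothetical helper (name not from the thread): drop everything up to
// and including the first dot of a bare hostname.
function strip_first_label(string $host): string
{
    $dot = strpos($host, '.');

    // strpos() returns false when no dot is found, so compare strictly;
    // a plain "if (!$dot)" would also misfire on position 0.
    if ($dot === false) {
        return $host;
    }

    return substr($host, $dot + 1);
}

echo strip_first_label('www.site.com'), "\n"; // site.com
```

Note that this always removes one label, so a host that has no prefix to begin with loses part of its real name.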
Just be careful as to what format you think a URL should appear (before trying to parse it).
How to Obscure Any URL makes interesting reading!
How about explode(".",$url)?
I too thought of strrpos to search backwards, but .com addresses will have 1 dot and .co.uk addresses etc. will have 2.
I've struggled with this for a content management routine I provided for a friend of mine. It's tough to parse his updates and find the urls and make them active when displayed on the site.
Have you looked at the full array returned by parse_url? The manual shows that a lot of information can be returned...
[scheme] => http
[host] => hostname
[user] => username
[pass] => password
[path] => /path
[query] => arg=value
[fragment] => anchor
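A short sketch of how that array can be produced, in case it helps. The sample URL is simply built from the values listed above, and host_of is a hypothetical helper name. One thing to watch with user-submitted addresses: if there is no scheme, parse_url treats the whole string as a path and reports no host, so the sketch assumes "http://" when none is given.

```php
<?php
// Sample URL assembled from the values in the listing above.
$url = 'http://username:password@hostname/path?arg=value#anchor';

// Full array of components.
print_r(parse_url($url));

// Or ask for a single component.
echo parse_url($url, PHP_URL_HOST), "\n"; // hostname

// Hypothetical helper: return the host of a user-submitted address,
// assuming http:// when the user left the scheme off.
function host_of(string $input): ?string
{
    $input = trim($input);
    if (strpos($input, '://') === false) {
        $input = 'http://' . $input;
    }

    $host = parse_url($input, PHP_URL_HOST);

    // parse_url() gives null for a missing component and false for a
    // seriously malformed URL; normalise both to null.
    return is_string($host) ? $host : null;
}

echo host_of('www.site.com/page'), "\n"; // www.site.com
```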
You could use substr_count to see how many dots there are and then decide where to truncate.
Hmmm, doesn't seem like a simple fix then?
strpos() will find the 1st occurrence, so if you only want to remove the prefix, it should work fine.
strpos was my suggestion...seems easy enough to me
But in the instance when somebody submits what will happen? I'll be left with won't I?
Originally Posted by Gerry
What I started thinking was that you need to make sure that there are at least 2 full stops left before you remove anything - that deals with "google.com" - it has 1 full stop, you don't strip any more. Sadly, it fails with bbc.co.uk - it has 2 full stops so you remove to get co.uk
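To make the "at least 2 full stops" rule concrete, here is a rough sketch using substr_count. The function name is invented for illustration, and the www-prefixed hosts are just sample inputs. It shows both the case the rule handles and the case it gets wrong.

```php
<?php
// Hypothetical helper: only strip the first label when the host
// contains at least two dots.
function strip_if_two_dots(string $host): string
{
    if (substr_count($host, '.') < 2) {
        return $host; // e.g. "google.com" is left alone
    }

    return substr($host, strpos($host, '.') + 1);
}

foreach (['www.google.com', 'google.com', 'www.bbc.co.uk', 'bbc.co.uk'] as $host) {
    echo $host, ' => ', strip_if_two_dots($host), "\n";
}
// www.google.com => google.com
// google.com => google.com
// www.bbc.co.uk => bbc.co.uk
// bbc.co.uk => co.uk   (the failure described above)
```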
The only thing I can think of is to work back from the end and do an nslookup on each name that you get; once you can resolve, you know you've got a name.
eg - you're given Google so you try "com" and it fails; you try "google.com" and it works
Not sure how you do name lookups in PHP (I can do it in ASP.Net but that doesn't really help :-)) but I'm sure it must be there (everything else is)
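PHP does have DNS functions, checkdnsrr and gethostbyname among them. Below is a rough sketch of the "work back from the end" idea. The function name and the optional $resolves callback are my own additions, so that the logic can be tried without touching the network; by default it asks checkdnsrr for an A record. Treat it as an assumption-laden sketch: a name can exist in DNS without an A record, and the lookups are slow.

```php
<?php
// Hypothetical helper: build candidates from the right-hand end
// ("com", "google.com", "www.google.com") and return the first one
// that resolves, or null if none do.
function shortest_resolving_name(string $host, ?callable $resolves = null): ?string
{
    // Default check needs network access: does the name have an A record?
    $resolves = $resolves ?? function (string $name): bool {
        return checkdnsrr($name, 'A');
    };

    $labels = explode('.', $host);
    $candidate = '';

    // Walk from the last label towards the first.
    for ($i = count($labels) - 1; $i >= 0; $i--) {
        $candidate = $candidate === ''
            ? $labels[$i]
            : $labels[$i] . '.' . $candidate;

        if ($resolves($candidate)) {
            return $candidate;
        }
    }

    return null;
}

// Offline demonstration with a stand-in resolver (not real DNS).
$fake = function (string $name): bool {
    return in_array($name, ['google.com', 'www.google.com'], true);
};
echo shortest_resolving_name('www.google.com', $fake), "\n"; // google.com
```

Calling it without the second argument performs real lookups, so the result then depends on the resolver and on how the domain happens to be configured.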
Use strrpos, which searches backwards, and only slice if you find more than 1 dot for a .com or more than 2 for a .co.uk etc. That's the problem I found: what is the domain, and how many dots does it contain?
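One way to answer the "how many dots does the domain contain" question is to keep a list of the two-part endings you care about and count labels from the right. The sketch below uses explode rather than strrpos, the function name is invented, and the suffix list is a tiny illustrative sample, not a complete one; a fuller source such as the Public Suffix List would be needed to cover everything.

```php
<?php
// Illustrative sample only; a real list would be much longer.
const MULTI_PART_SUFFIXES = ['co.uk', 'org.uk', 'ac.uk', 'com.au'];

// Hypothetical helper: keep the suffix plus one label in front of it.
function base_domain(string $host): string
{
    $host = strtolower($host);
    $labels = explode('.', $host);
    $count = count($labels);

    if ($count < 2) {
        return $host; // nothing to trim
    }

    // Two-part ending such as "co.uk" means three labels are kept,
    // otherwise two (e.g. "google.com").
    $lastTwo = $labels[$count - 2] . '.' . $labels[$count - 1];
    $keep = in_array($lastTwo, MULTI_PART_SUFFIXES, true) ? 3 : 2;

    return implode('.', array_slice($labels, max(0, $count - $keep)));
}

echo base_domain('www.bbc.co.uk'), "\n";  // bbc.co.uk
echo base_domain('www.google.com'), "\n"; // google.com
echo base_domain('bbc.co.uk'), "\n";      // bbc.co.uk
```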
I'm wondering why you would want to do this in the first place - mainly for the reasons mentioned. www.site.com might be the only hostname that works - site.com might not have an appropriate record configured. And what has also been said is that it is very hit and miss as to the results you will get back from splitting, exploding, strpos'ing etc.
I have a database that staff can add sites to, by giving an address. The site is stored exactly as they enter it
Originally Posted by webman
and then is sent to a text file stripped of any rubbish
so that our whitelist unblocks it. The reason I want to unblock everything for bbc.co.uk is that styles and images are stored under a different prefix (as they are for many sites nowadays), and without this the sites don't display properly.
ah so now you have changed the question...
you originally said urls were entered as www.site.com etc so counting from the front would work... :D