RegExLib.com - The first Regular Expression Library on the Web!

Regular Expression Details

Title: Pattern Title
Expression
([\d\w-.]+?\.(a[cdefgilmnoqrstuwz]|b[abdefghijmnorstvwyz]|c[acdfghiklmnoruvxyz]|d[ejkmnoz]|e[ceghrst]|f[ijkmnor]|g[abdefghilmnpqrstuwy]|h[kmnrtu]|i[delmnoqrst]|j[emop]|k[eghimnprwyz]|l[abcikrstuvy]|m[acdghklmnopqrstuvwxyz]|n[acefgilopruz]|om|p[aefghklmnrstwy]|qa|r[eouw]|s[abcdeghijklmnortuvyz]|t[cdfghjkmnoprtvwz]|u[augkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw]|aero|arpa|biz|com|coop|edu|info|int|gov|mil|museum|name|net|org|pro)(\b|\W(?<!&|=)(?!\.\s|\.{3}).*?))(\s|$)
Description
This will find URLs in plain text, with or without a protocol. It matches against all top-level domains to find the URL in the text.
Matches
http://www.website.com/index.html | www.website.com | website.com
Non-Matches
Works in all my tests. Does not capture protocol.
Author: James Johnston
Source: Modified, can't remember the original source
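
As a rough illustration of how the expression above might be used from Python, here is a minimal sketch. It makes two assumptions that are not part of the original post: the character class `[\d\w-.]` is rewritten as `[\w.-]` (Python's `re` module does not accept the range-like `\w-.`), and the TLD alternation is cut down to `com|net|org` for readability. The helper name `find_urls` is hypothetical.

```python
import re

# Abbreviated TLD list (an assumption of this sketch); the posted
# expression enumerates every country-code and generic TLD instead.
TLDS = r"(com|net|org)"

# Same structure as the posted expression, with [\d\w-.] written as [\w.-]
# so that Python's re module accepts the character class.
URL_RE = re.compile(
    r"([\w.-]+?\." + TLDS + r"(\b|\W(?<!&|=)(?!\.\s|\.{3}).*?))(\s|$)"
)

def find_urls(text):
    """Return group 1 of every match: the URL without its protocol."""
    return [m.group(1) for m in URL_RE.finditer(text)]

print(find_urls("See http://www.website.com/index.html or website.com today"))
# ['www.website.com/index.html', 'website.com']
```

Group 1 starts after `http://`, which is consistent with the author's note that the protocol is not captured.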

Existing User Comments

Title: Doesn't work
Name: Anonymous
Date: 10/20/2011 5:05:14 AM
Comment:
Matches on everything!


Title: Alternate version
Name: Sect
Date: 5/29/2011 11:15:55 PM
Comment:
The following will work to capture "site.com/index.html." without including the trailing punctuation: \b(([\w-]+://?|www[.]|[\d\w-.]+?\.(a[cdefgilmnoqrstuwz]|b[abdefghijmnorstvwyz]|c[acdfghiklmnoruvxyz]|d[ejkmnoz]|e[ceghrst]|f[ijkmnor]|g[abdefghilmnpqrstuwy]|h[kmnrtu]|i[delmnoqrst]|j[emop]|k[eghimnprwyz]|l[abcikrstuvy]|m[acdghklmnopqrstuvwxyz]|n[acefgilopruz]|om|p[aefghklmnrstwy]|qa|r[eouw]|s[abcdeghijklmnortuvyz]|t[cdfghjkmnoprtvwz]|u[augkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw]|aero|arpa|biz|com|coop|edu|info|int|gov|mil|museum|name|net|org|pro)[/])[^\s()<>]*(\([\w\d]+\)|([^[:punct:]\s]|/)))


Title: This does not work
Name: JohnC
Date: 11/20/2008 2:14:35 AM
Comment:
This does not work at all. Very few of the regular expressions on this site do.


Title: James Johnston's URL regex
Name: DC
Date: 6/24/2008 10:54:01 PM
Comment:
Works on almost all my tests except ftp://myname@host.dom/%2Fetc/motd and prospero://host.dom//pros/name


Title: Ok... I got it.
Name: James Johnston
Date: 3/1/2005 11:03:43 PM
Comment:
I understand. Thanks. I appreciate the help. :) It includes all TLDs but how do I exclude certain matches? I've noticed that if a jpeg image is listed in the text it matches the .jp part of the extension and thinks it's a URL.
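
The JPEG false positive described here can be reproduced with a small sketch. It assumes the version under discussion ended with the tail quoted elsewhere in this thread (`[:/]?.*?` before `(\s|$)`), which lets any characters follow the TLD. One possible way to exclude such matches, shown as `strict`, is to require a word boundary right after the TLD. The two-entry TLD list and the names `loose` and `strict` are hypothetical.

```python
import re

# Only two TLDs, to keep the sketch short (an assumption).
TLD = r"(?:jp|com)"

# Tail as quoted in the thread: anything at all may follow the TLD.
loose = re.compile(r"([\w.-]+?\." + TLD + r"[:/]?.*?)(?:\s|$)")

# Variant requiring a word boundary after the TLD, so ".jp" cannot be
# the first two letters of a longer extension such as ".jpeg".
strict = re.compile(r"([\w.-]+?\." + TLD + r"\b(?:[:/].*?)?)(?:\s|$)")

text = "Open photo.jpeg or visit site.jp/index.html"

print(loose.search(text).group(1))  # photo.jpeg
print(strict.findall(text))         # ['site.jp/index.html']
```

The boundary check rejects `photo.jpeg` because `p` and `e` are both word characters, while `site.jp/` still passes.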


Title: Whuh? {1} DOES NOTHING, GET IT?
Name: Randal L. Schwartz
Date: 3/1/2005 9:37:50 PM
Comment:
You still have {1} there. It DOES NOTHING BUT WASTE THREE CHARACTERS OF YOUR REGEX. Get it? {1} is useless. Pointless. Always.
a matches 1 a
a{1} matches 1 a
Same exact thing. Get it?
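
The point about `{1}` can be checked directly. This is a small sketch using Python's `re` module; the helper name `same_spans` and the sample text are hypothetical.

```python
import re

def same_spans(pattern_a, pattern_b, text):
    """True if both patterns match exactly the same spans in text."""
    spans_a = [m.span() for m in re.finditer(pattern_a, text)]
    spans_b = [m.span() for m in re.finditer(pattern_b, text)]
    return spans_a == spans_b

text = "a aa coop.com"

# A single character with and without the {1} quantifier.
print(same_spans(r"a", r"a{1}", text))                    # True
# A group with and without the {1} quantifier.
print(same_spans(r"(com|coop)", r"(com|coop){1}", text))  # True
```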


Title: Updated...
Name: James Johnston
Date: 3/1/2005 2:16:24 PM
Comment:
There... updated it. :) That should be better.


Title: Sorry... :]
Name: James Johnston
Date: 3/1/2005 2:00:50 PM
Comment:
Thanks for the suggestion about the Perl "URI::Find". I'm new to regexps and this worked in my tests. I didn't see a regexp on this site that did exactly what I wanted. I did notice the error with the second {1} after I'd already posted this. The last part of the regexp should be "coop){1}[:/]?.*?)(\s|$)", which would also remove the need for the regexp to be followed by a newline or space. Is there a way to change my post?


Title: Bad
Name: Randal L. Schwartz
Date: 3/1/2005 7:00:05 AM
Comment:
First off, the {1} do absolutely nothing except take up three characters (twice!). Second, this is case sensitive. Third, you should probably look at the Perl "URI::Find" module to see how to do it right.
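
The case-sensitivity point can be illustrated with a short sketch. It assumes a heavily simplified pattern (three TLDs, no path handling) rather than the posted expression, and uses Python's `re.IGNORECASE` flag as one way to make the match case-insensitive.

```python
import re

# Simplified stand-in for the posted expression (an assumption).
PATTERN = r"[\w.-]+?\.(?:com|net|org)\b"

case_sensitive = re.compile(PATTERN)
case_insensitive = re.compile(PATTERN, re.IGNORECASE)

text = "WWW.WEBSITE.COM"

# The lowercase TLD list does not match "COM" without the flag.
print(case_sensitive.search(text))            # None
print(case_insensitive.search(text).group())  # WWW.WEBSITE.COM
```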


Copyright © 2001-2014, RegexAdvice.com