RegExLib.com - The first Regular Expression Library on the Web!

Regular Expression Details

Title
Pattern Title
Expression
^<a\s+href\s*=\s*"http:\/\/([^"]*)"([^>]*)>(.*?(?=<\/a>))<\/a>$
Description
Regexp to find external links in an HTML string. It can easily be modified to handle all or other links and protocols (such as file, https, or ftp). It uses a lookahead assertion and a non-greedy quantifier to find the closing </a> while still allowing HTML tags between the opening and closing A tags. It also allows for line breaks and other whitespace characters inside the opening tag. I am using it to find all external links in embedded HTML code and (1) change the target of the link and (2) insert a "Leaving Site" logo to show that you are leaving the site.
Matches
<a href="http://www.mysite.com">my external link</a> | <a href="http:/
Non-Matches
<a href="myinternalpage.html">my internal link</a>
Author
Anders Rask
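
A minimal sketch of how the expression above behaves in JavaScript, using the entry's own match and non-match samples plus one extra sample (an assumption added here) with whitespace inside the tag and nested HTML in the link text. The names externalLink and samples are illustrative only. Note that because the expression is anchored with ^ and $, it matches only when the whole string is a single link.

```javascript
// The expression from this entry, written as a JavaScript regex literal.
const externalLink = /^<a\s+href\s*=\s*"http:\/\/([^"]*)"([^>]*)>(.*?(?=<\/a>))<\/a>$/;

const samples = [
  '<a href="http://www.mysite.com">my external link</a>',
  '<a href="myinternalpage.html">my internal link</a>',
  // Extra sample (not from the entry): line break, spaces around "=", nested tag.
  '<a\n  href = "http://www.mysite.com" class="ext"><b>bold</b> link</a>',
];

for (const html of samples) {
  const m = externalLink.exec(html);
  if (m) {
    // m[1] = everything after "http://", m[2] = remaining attributes, m[3] = link content
    console.log("match:", { url: m[1], attrs: m[2], content: m[3] });
  } else {
    console.log("no match:", html);
  }
}
```

The first and third samples match and the second does not. To search inside a larger document, one would likely drop the ^ and $ anchors and add the g flag.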

Existing User Comments

Title: Limited...
Name: Tatham Oddie
Date: 7/10/2004 3:51:07 AM
Comment:
I don't mean to jump on Anders, but this is a limited regex. I have just finished writing a link validator (http://www.ssw.com.au/ssw/linkauditor), and the problem turns out to be a lot more complex. The regex doesn't handle JavaScript-based links - i.e. <a href="#" onclick="...">... would evaluate as an external link, even though it points to the same page. It doesn't handle nested links - which some fools try to do. If you were trying to extract links from a page for some form of auditing or crawling purpose, you would also need to handle the <base href="..."> tag, the associated HTTP header, or the associated HTTP-Equiv META tag.
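
One way to approach the points raised here is to stop classifying links by pattern alone and instead resolve each href against the page URL (and any <base href> value) before comparing hosts. The sketch below does that with the standard URL class. The function isExternal and its parameters are hypothetical, it treats "external" as "different host", and it does not cover HTTP headers, META tags, or nested links.

```javascript
// Hypothetical helper: decide whether an href leaves the site.
// pageUrl is the absolute URL of the page; baseHref is the optional <base href> value.
function isExternal(href, pageUrl, baseHref) {
  // A <base href> may itself be relative, so resolve it against the page URL first.
  const base = new URL(baseHref ?? pageUrl, pageUrl);
  let target;
  try {
    target = new URL(href, base);
  } catch {
    return false; // unparseable href: treated as not external in this sketch
  }
  // Skip non-web schemes such as javascript: or mailto:.
  if (target.protocol !== "http:" && target.protocol !== "https:") return false;
  return target.host !== new URL(pageUrl).host;
}

console.log(isExternal("#", "http://www.mysite.com/a/b.html"));              // false
console.log(isExternal("http://www.yahoo.com/", "http://www.mysite.com/"));  // true
console.log(isExternal("javascript:void(0)", "http://www.mysite.com/"));     // false
```

With this approach, <a href="#" onclick="..."> resolves to the current page and is not reported as external.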


Title: Unusual, but might appear
Name: Adi Rotaru
Date: 6/6/2004 4:16:04 AM
Comment:
Hi, there are some (we could say) exceptional situations which you did not address in your regexp.
Situation 1: <a href=http://www.yahoo.com>YAHOO!</a> or: <a href='http://www.yahoo.com'>YAHOO!</a> or: <a href=' http://www.yahoo.com '>YAHOO!</a>
Notice the different VALID ways to specify the URL. Don't forget that HTML is "blank-blind" :)
Situation 2: <a href=http://www.yahoo.com>YAHOO!</a > or: <a href = http://www.yahoo.com>YAHOO!</a >
Hope this is helpful ;) Now, get to work :)) Bye!
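
A rough sketch of how the expression could be loosened to accept the variants listed in this comment: unquoted, single-quoted, and space-padded href values, plus whitespace before the closing ">". This is not the author's pattern; the regex anchor and the function findExternalLinks are hypothetical names, and href is still assumed to be the first attribute, as in the original expression.

```javascript
// href value may be double-quoted, single-quoted, or unquoted; </a > is also accepted.
const anchor = /<a\s+href\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>"']+))([^>]*)>(.*?)<\/a\s*>/gi;

function findExternalLinks(html) {
  const found = [];
  for (const m of html.matchAll(anchor)) {
    // Exactly one of the three alternatives captured the href value.
    const href = (m[1] ?? m[2] ?? m[3]).trim();
    // Keep only http:// links, as the original expression does.
    if (/^http:\/\//i.test(href)) found.push(href);
  }
  return found;
}

console.log(findExternalLinks(
  "<a href=http://www.yahoo.com>YAHOO!</a> " +
  "<a href=' http://www.yahoo.com '>YAHOO!</a> " +
  "<a href = http://www.yahoo.com>YAHOO!</a > " +
  '<a href="myinternalpage.html">my internal link</a>'
));
```

The three yahoo.com links are returned with surrounding spaces trimmed, and the internal link is skipped.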


Title: Using the pattern
Name: Anders Rask
Date: 12/10/2003 1:58:31 PM
Comment:
Hi, the pattern contains parentheses (capture groups) that can be used in either a test or (as in your case) a replace. Depending on what language you code in, you can access the data matched inside the parentheses through properties. For JavaScript, check the following two references for RegExp and String.replace(): http://www.devguru.com/Technologies/ecmascript/quickref/regexp.html http://www.devguru.com/Technologies/ecmascript/quickref/string_replace.html Hope this helps. -Anders
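
As a hedged illustration of the advice above, the sketch below uses String.replace() with the $1, $2, and $3 group references to add a target attribute to external links. It assumes an unanchored, global variant of the entry's expression so that links inside a larger HTML string are found; addTarget and the sample page are hypothetical.

```javascript
// Unanchored, global variant of the entry's expression (an assumption for illustration).
const externalLinks = /<a\s+href\s*=\s*"http:\/\/([^"]*)"([^>]*)>(.*?(?=<\/a>))<\/a>/g;

function addTarget(html) {
  // $1 = URL after "http://", $2 = other attributes, $3 = link content.
  // A link that already has a target attribute would end up with two in this sketch.
  return html.replace(externalLinks, '<a href="http://$1"$2 target="_blank">$3</a>');
}

const page = '<p><a href="http://www.mysite.com">out</a> and <a href="page.html">in</a></p>';
console.log(addTarget(page));
// <p><a href="http://www.mysite.com" target="_blank">out</a> and <a href="page.html">in</a></p>
```

Only the http:// link is rewritten; the relative link is left unchanged.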


Title: add target attribute to external urls
Name: Michael Iantosca
Date: 12/10/2003 12:35:50 PM
Comment:
You mentioned you were going to use it to add targets to all external URLs. This is exactly what I need, but how would I use it to add a target attribute to all external URLs? Thanks, Michael


Copyright © 2001-2024, RegexAdvice.com | ASP.NET Tutorials