RegExLib.com - The first Regular Expression Library on the Web!

Regular Expression Details

Title
Pattern to find Anchor Tag in a web page
Expression
<a[\s]+[^>]*?href[\s]?=[\s\"\']*(.*?)[\"\']*.*?>([^<]+|.*?)?<\/a>
Description
This pattern is a slight modification of the pattern submitted by Jacek Sompel. It also matches anchor tags whose href value is not enclosed in ' (single quote) or " (double quote). This is useful for a web crawler that needs to find all links in a web page.
Matches
<a href='http://www.regexlib.com'>Text</a> | <a href="...">Text</a> | <a href=http://www.regexlib.com>Text</a>
Non-Matches
All other HTML tags
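
The following is a minimal sketch, not part of the original submission, of how the expression could be tried against the three sample strings using Python's `re` module. The names `ANCHOR_RE`, `SAMPLES`, and `link_text` are made up for illustration. Under Python's `re`, group 1 comes back empty for these samples, because the lazy `(.*?)` is allowed to match nothing while the later `.*?` absorbs the URL, so the sketch reads only group 2 (the link text).

```python
import re

# The expression from this page, copied verbatim into a raw string.
ANCHOR_RE = re.compile(
    r"""<a[\s]+[^>]*?href[\s]?=[\s\"\']*(.*?)[\"\']*.*?>([^<]+|.*?)?<\/a>"""
)

# The three sample strings listed under "Matches".
SAMPLES = [
    "<a href='http://www.regexlib.com'>Text</a>",
    '<a href="...">Text</a>',
    "<a href=http://www.regexlib.com>Text</a>",
]


def link_text(html):
    """Return the text of the first anchor the pattern finds, or None."""
    match = ANCHOR_RE.search(html)
    return match.group(2) if match else None


if __name__ == "__main__":
    for sample in SAMPLES:
        print(sample, "->", link_text(sample))
```

A caller that needs the URL itself would have to adjust the expression or post-process the full match; that change is left out here so the pattern stays exactly as submitted.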
Author
Kuleen Upadhyaya

Existing User Comments

Title: Do not match HTML with a regex
Name: Randal L. Schwartz
Date: 4/20/2006 5:53:57 PM
Comment:
HTML matching with a regex is very hard. This pattern is only an approximation, and will be fooled by some webpages.
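
One way to act on this comment is to let an HTML parser find the tags instead. The sketch below assumes Python and its standard-library `html.parser` module; the class `LinkCollector` and the function `extract_links` are hypothetical names, and this is only one possible approach, not something the commenter proposed. Unlike the regex, it skips anchors that sit inside an HTML comment.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> start tag that is fed in."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Return all href values of anchor tags found in the given HTML."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    print(extract_links("<a href=http://www.regexlib.com>Text</a>"))
```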


Copyright © 2001-2024, RegexAdvice.com | ASP.NET Tutorials