Title |
Pattern to find Anchor Tag in a web page
|
Expression |
<a[\s]+[^>]*?href[\s]?=[\s\"\']*(.*?)[\"\']*.*?>([^<]+|.*?)?<\/a> |
Description |
This pattern is a slight modification of the pattern submitted by Jacek Sompel. With it, one can also match anchor tags whose href value is not enclosed in ' (single quote) or " (double quote). This is useful for a web crawler that needs to collect all the links in a web page. |
Matches |
<a href='http://www.regexlib.com'>Text</a> | <a href="...">Text</a> | <a href=http://www.regexlib.com>Text</a> |
Non-Matches |
All other HTML tags |
Author |
Kuleen Upadhyaya
|
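As a rough illustration of how the expression behaves, here is a minimal Python sketch that runs it against the sample strings listed under Matches, plus one non-anchor tag. The names `ANCHOR_RE`, `samples`, and `link_text` are made up for this example and are not part of the original submission; the entry does not say which regex engine it was written for, so Python's `re` module is an assumption.

```python
import re

# The expression from the entry above, copied verbatim into a raw string.
ANCHOR_RE = re.compile(
    r"""<a[\s]+[^>]*?href[\s]?=[\s\"\']*(.*?)[\"\']*.*?>([^<]+|.*?)?<\/a>"""
)

# Sample inputs taken from the Matches row, plus one tag that is not an anchor.
samples = [
    "<a href='http://www.regexlib.com'>Text</a>",  # single-quoted href
    '<a href="...">Text</a>',                       # double-quoted href
    "<a href=http://www.regexlib.com>Text</a>",     # unquoted href
    "<p>Text</p>",                                  # not an anchor tag
]


def link_text(html):
    """Return the text of the first anchor tag found, or None if there is none."""
    match = ANCHOR_RE.search(html)
    return match.group(2) if match else None


for sample in samples:
    print(sample, "->", link_text(sample))
```

The sketch reads only the second capture group (the link text). Under Python's `re`, the first group comes back empty for these samples, because the lazy `(.*?)` is followed by optional parts that can absorb the URL instead, so the expression is better suited to detecting anchor tags than to extracting the href value as written.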
Title: Do not match HTML with a regex
Name: Randal L. Schwartz
Date: 4/20/2006 5:53:57 PM
Comment:
HTML matching with a regex is very hard. This pattern is only an approximation, and will be fooled by some webpages.
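
One way a page can fool a pattern like this is a link that has been commented out: a regex sees the tag text either way. As a hedged alternative, the sketch below collects href values with the `html.parser` module from Python's standard library rather than a regex. The class `LinkCollector`, the function `extract_links`, and the sample `page` string are hypothetical names and data invented for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> start tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Return the href values of all anchor tags found in the given HTML text."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Hypothetical page: the first link sits inside an HTML comment.
page = "<!-- <a href='old.html'>old</a> --><a href=\"new.html\">new</a>"
print(extract_links(page))  # prints ['new.html']
```

The parser reports the commented-out markup as a comment, not as a start tag, so only the live link is collected. A regex run over the same string would report both anchors.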