Title |
Pattern Title
|
Expression |
\[link="(?<link>((.|\n)*?))"\](?<text>((.|\n)*?))\[\/link\] |
Description |
This can be used in conjunction with a replace method to support pseudo-code link tags without having to enable HTML. The replacement string (in ASP.NET, use Regex.Replace(SourceString, RegularExpressionPattern, ReplacementString)) is <a href="${link}">${text}</a>. |
Matches |
[link="http://www.yahoo.com"]Yahoo[/link] |
Non-Matches |
[link]http://www.yahoo.com[/link] | [link=http://www.yahoo.com]Yahoo[/link] |
Author |
Rating:
Not yet rated.
Ryan S
|
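The replacement described in this entry can be sketched in Python. This is an adaptation under stated assumptions, not the author's ASP.NET code: Python spells named groups `(?P<name>...)` and references them as `\g<name>`, `re.DOTALL` stands in for the `(.|\n)*?` construct, and the function name `render_links` is hypothetical. The sketch does not escape HTML in the captured text.

```python
import re

# Python equivalent of the .NET pattern: (?<name>...) becomes (?P<name>...),
# and re.DOTALL lets "." cross newlines instead of writing (.|\n).
LINK_TAG = re.compile(
    r'\[link="(?P<link>.*?)"\](?P<text>.*?)\[/link\]',
    re.DOTALL,
)

def render_links(source):
    """Replace [link="URL"]text[/link] pseudo-tags with HTML anchors."""
    # \g<link> and \g<text> play the role of ${link} and ${text} in .NET.
    return LINK_TAG.sub(r'<a href="\g<link>">\g<text></a>', source)

print(render_links('[link="http://www.yahoo.com"]Yahoo[/link]'))
# prints: <a href="http://www.yahoo.com">Yahoo</a>
```

Input without the quoted attribute, such as `[link]http://www.yahoo.com[/link]`, is left unchanged.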
Title |
Pattern Title
|
Expression |
(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])? |
Description |
*CORRECTED: Again, thanks for all the comments below. If you want to include internal domains as well, change the partial pattern (\.[\w\-_]+)+ to (\.[\w\-_]+)?
See the comments below.*
This is the regular expression I use to add links in my email program. It also ignores commas, periods, and colons that happen to follow the URL, as in the sentence "check out http://www.yahoo.com/." (the period will be ignored). Note that it requires some modification to match URLs that don't start with a scheme such as http. |
Matches |
http://regxlib.com/Default.aspx | http://electronics.cnet.com/electronics/0-6342366-8-8994967-1.html |
Non-Matches |
www.yahoo.com |
Author |
Rating:
M H
|
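A short Python sketch of the trailing-punctuation behaviour described in this entry. The pattern is the one listed above with redundant escapes dropped; the function name `find_urls` and the sample sentence are illustrative assumptions.

```python
import re

# The listed pattern, split over two lines for readability.
# The last character class omits ".", "," and ":", so a URL cannot end on them.
URL_RE = re.compile(
    r"(http|ftp|https)://[\w\-_]+(\.[\w\-_]+)+"
    r"([\w\-.,@?^=%&:/~+#]*[\w\-@?^=%&/~+#])?"
)

def find_urls(text):
    """Return every URL found in text, without trailing punctuation."""
    return [m.group(0) for m in URL_RE.finditer(text)]

print(find_urls("check out http://www.yahoo.com/. Also www.yahoo.com"))
# prints: ['http://www.yahoo.com/']
```

The second address is skipped because it has no scheme, which matches the Non-Matches row.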
Title |
Pattern Title
|
Expression |
^(http|https|ftp)\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*[^\.\,\)\(\s]$ |
Description |
This regex (usable, for example, in PHP with eregi) matches http, https, and ftp URLs whose top-level domain is two or three letters long. Unlike the other examples here, it will NOT match a URL ending with a dot, comma, or parenthesis. This is important if you use this regex to find and "activate" links in a text. |
Matches |
https://www.restrictd.com/~myhome/ |
Non-Matches |
http://www.krumedia.com. | (http://www.krumedia.com) | http://www.krumedia.com, |
Author |
Rating:
Not yet rated.
Michael Krutwig
|
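Because this pattern is anchored with ^ and $, it works as a whole-string check. A minimal Python sketch, assuming the pattern carries over unchanged to Python's `re` module; the function name `is_clean_url` is hypothetical.

```python
import re

# The listed pattern, unchanged apart from being split across two lines.
URL_VALIDATOR = re.compile(
    r"^(http|https|ftp)\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?"
    r"([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*[^\.\,\)\(\s]$"
)

def is_clean_url(candidate):
    """True if the whole string is a URL that does not end in . , ( ) or whitespace."""
    return URL_VALIDATOR.match(candidate) is not None

for sample in (
    "https://www.restrictd.com/~myhome/",
    "http://www.krumedia.com.",
    "(http://www.krumedia.com)",
    "http://www.krumedia.com,",
):
    print(sample, is_clean_url(sample))
```

Only the first sample is accepted; the other three are the Non-Matches listed for this entry.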
Title |
Pattern Title
|
Expression |
(("|')[a-z0-9\/\.\?\=\&]*(\.htm|\.asp|\.php|\.jsp)[a-z0-9\/\.\?\=\&]*("|'))|(href=*?[a-z0-9\/\.\?\=\&"']*) |
Description |
Locates a URL in a web page.
It searches in two ways: first it tries to locate an href= and then reads to the end of the link. If there is no href=, it searches for the file extension instead (.asp, .htm, and so on) and takes the data between the enclosing "xxxxxx" or 'xxxxxx' quotes. |
Matches |
href="produktsida.asp?kategori2=218" | href="NuclearTesting.htm" |
Non-Matches |
U Suck |
Author |
Rating:
Not yet rated.
Henric Rosvall
|
Title |
Pattern Title
|
Expression |
^<a\s+href\s*=\s*"http:\/\/([^"]*)"([^>]*)>(.*?(?=<\/a>))<\/a>$ |
Description |
Regexp to find external (http://) links in an HTML string.
It can easily be modified to handle other links and protocols (such as file, https, or ftp).
It uses a lookahead assertion and a non-greedy quantifier to find the closing </a> while still allowing HTML tags between the opening and closing A tags.
It takes into account that there could be line breaks and other whitespace characters in the middle of the tag.
I use it to find all external links in embedded HTML code in order to (1) change the target of the link and (2) insert a "Leaving Site" logo to show that you are leaving the site. |
Matches |
<a href="http://www.mysite.com">my external link</a> | <a href="http:/ |
Non-Matches |
<a href="myinternalpage.html">my internal link</a> |
Author |
Rating:
Not yet rated.
Anders Rask
|
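The "change the target" use mentioned in this entry could look roughly like this in Python. Two assumptions differ from the listed pattern: the ^ and $ anchors are dropped so that links embedded in a larger string are found, and the `target="_blank"` attribute and the function name `mark_external` are illustrative choices, not the author's.

```python
import re

# Listed pattern without the ^...$ anchors, so it can match inside larger HTML.
# Group 1: host and path, group 2: remaining attributes, group 3: link body.
EXTERNAL_LINK_RE = re.compile(
    r'<a\s+href\s*=\s*"http://([^"]*)"([^>]*)>(.*?(?=</a>))</a>'
)

def mark_external(html):
    """Add a target attribute to every external (http://) link."""
    return EXTERNAL_LINK_RE.sub(
        r'<a href="http://\1"\2 target="_blank">\3</a>', html
    )

page = (
    '<p><a href="http://www.mysite.com">my external link</a> '
    '<a href="myinternalpage.html">my internal link</a></p>'
)
print(mark_external(page))
```

Only the first link is rewritten; the internal link does not start with http:// and is left as it was.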
Title |
Pattern Title
|
Expression |
<[aA][ ]{0,}([a-zA-Z0-9"'_,.:;!?@$&()%=/ ]|[-]|[ \f]){0,}>((<(([a-zA-Z0-9"'_,.:;!?@$&()%=/ ]|[-]|[ \f]){0,})>([a-zA-Z0-9"'_,.:;!?@$&()%=/ ]|[-]|[ \f]){0,})|(([a-zA-Z0-9"'_,.:;!?@$&()%=/ ]|[-]|[ \f]){0,})){0,} |
Description |
I wrote this sweet little (well, not so little really) regex to extract links from HTML source. It is very robust; give it a try.
The only limitation I have discovered is that it can't match invalid HTML. |
Matches |
<a href='javascript:functionA();'><i>this text is italicized</i></a> |
Non-Matches |
<A href='#'><P</A></P> |
Author |
Rating:
Not yet rated.
Brian Webb
|
Title |
Pattern Title
|
Expression |
\b(((\S+)?)(@|mailto\:|(news|(ht|f)tp(s?))\://)\S+)\b |
Description |
Whilst writing a plain-text to HTML function, I ran into the problem that links users had written with <a> tags (as opposed to just writing the URL) were being linked improperly. This regular expression matches many types of URL, along with any preceding characters. This allows you to handle each type of match appropriately. |
Matches |
|
Non-Matches |
www.deepart.org | deepart.org | 123.123.123.123 |
Author |
Rating:
Not yet rated.
Demo Gorgon
|
Title |
Pattern Title
|
Expression |
(mailto\:|(news|(ht|f)tp(s?))\://)(([^[:space:]]+)|([^[:space:]]+)( #([^#]+)#)?) |
Description |
This is a very small regex for use within content management software. Links within text fields do not have to be written in HTML. The editor of the CMS is instructed to use it like this:
1. Put spaces before and after the URL.
2. Start the URL with http://, mailto://, ftp:// ...
3. Optionally add link text as #linktext# (separated from the URL by a single space).
4. If there is no link text, the URL/email will show up as the link text.
5. Avoid URLs with spaces in the filename (encode them as %20).
Replace pattern (space in front): <a href="\\1\\3\\4" target="_blank">\\3\\6</a> |
Matches |
http://www.domain.com | http://www.domain.com/index%20page.htm #linktext# | mailto://user@domai |
Non-Matches |
<a href="http://www.domain.com">real html link</a> | http://www.without_space_ |
Author |
Rating:
Not yet rated.
Martin Schwedes
|
Title |
U.S. Street Address
|
Expression |
^(?n:(?<address1>(\d{1,5}(\ 1\/[234])?(\x20[A-Z]([a-z])+)+)|(P\.O\.\ Box\ \d{1,5}))\s{1,2}(?i:(?<address2>(((APT|BLDG|DEPT|FL|HNGR|LOT|PIER|RM|S(LIP|PC|T(E|OP))|TRLR|UNIT)\x20\w{1,5})|(BSMT|FRNT|LBBY|LOWR|OFC|PH|REAR|SIDE|UPPR)\.?)\s{1,2})?)(?<city>[A-Z]([a-z])+(\.?)(\x20[A-Z]([a-z])+){0,2})\,\x20(?<state>A[LKSZRAP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])\x20(?<zipcode>(?!0{5})\d{5}(-\d{4})?))$ |
Description |
Captures a US street address.
Address format: ##### Street 2ndunit City, ST zip+4
address1 - must have a street number and a proper-case street name with no punctuation, or P.O. Box #### (punctuation is mandatory for P.O.).
address2 - optional secondary unit abbreviation. A secondary range is required for some units.
City - proper-case city name.
State - state abbreviation, all caps.
zip - zip+4. Can't be all zeroes.
Abbreviations for secondary units and states are those used by the US Postal Service.
http://www.usps.com/ncsc/lookups/usps_abbreviations.html
Certain secondary units require a secondary range; see the above link.
This RE isn't unbreakable. It will probably allow some false positives but should work for most addresses. |
Matches |
123 Park Ave Apt 123 New York City, NY 10002 | P.O. Box 12345 Los Angeles, CA 12304 |
Non-Matches |
123 Main St | 123 City, State 00000 | 123 street city, ST 00000 |
Author |
Rating:
Michael Ash
|
Title |
Pattern Title
|
Expression |
(\s|\n|^)(\w+://[^\s\n]+) |
Description |
Matches free-floating protocol + URL strings in text. It will not touch the ones wrapped in a tag, so you can auto-link the ones that aren't :) A couple of things to know:
1. If the URL is directly next to a tag this won't work (e.g. <br>http://www.acme.com); the URL must be preceded by whitespace (\s), a newline (\n), or the start of the string.
2. The pattern matches the preceding \s or \n too, so put it back when you replace: $1 will be the \s or \n (or empty at the start of the string), and $2 will be the URL itself.
VB usage:
Set re = New RegExp
re.Global = True ' replace every match, not only the first
re.Pattern = "(\s|\n|^)(\w+://[^\s\n]+)"
strResult = re.Replace(strText, "$1<a href='$2' target='_new'>$2</a>") |
Matches |
http://www.acme.com | ftp://ftp.acme.com/hede | gopher://asdfasd.asdfasdf |
Non-Matches |
<a href="http://acme.com">http://www.acme.com</a> | <br>http://www.acme. |
Author |
Rating:
ic onur
|
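The same replacement can be sketched in Python, where the group references are written \1 and \2 rather than $1 and $2. The function name `autolink` is a hypothetical choice for illustration.

```python
import re

# Group 1 is the preceding whitespace (or empty at the start of the string),
# group 2 is the URL itself.
FREE_URL_RE = re.compile(r"(\s|\n|^)(\w+://[^\s\n]+)")

def autolink(text):
    """Wrap free-floating URLs in anchors, keeping the whitespace before them."""
    return FREE_URL_RE.sub(r"\1<a href='\2' target='_new'>\2</a>", text)

print(autolink("see http://www.acme.com now"))
# prints: see <a href='http://www.acme.com' target='_new'>http://www.acme.com</a> now

print(autolink("<br>http://www.acme.com"))
# prints: <br>http://www.acme.com
```

The second call shows the limitation from point 1: a URL directly after a tag is not preceded by whitespace, so it is left alone.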
Title |
email address (RFC 2822 mailbox)
|
Expression |
^((?>[a-zA-Z\d!#$%&'*+\-/=?^_`{|}~]+\x20*|"((?=[\x01-\x7f])[^"\\]|\\[\x01-\x7f])*"\x20*)*(?<angle><))?((?!\.)(?>\.?[a-zA-Z\d!#$%&'*+\-/=?^_`{|}~]+)+|"((?=[\x01-\x7f])[^"\\]|\\[\x01-\x7f])*")@(((?!-)[a-zA-Z\d\-]+(?<!-)\.)+[a-zA-Z]{2,}|\[(((?(?<!\[)\.)(25[0-5]|2[0-4]\d|[01]?\d?\d)){4}|[a-zA-Z\d\-]*[a-zA-Z\d]:((?=[\x01-\x7f])[^\\\[\]]|\\[\x01-\x7f])+)\])(?(angle)>)$ |
Description |
This accepts RFC 2822 email addresses in the form:
[email protected] OR
Blah < [email protected]>

RFC 2822 email 'mailbox':
mailbox = name-addr | addr-spec
name-addr = [display-name] "<" addr-spec ">"
addr-spec = local-part "@" domain
domain = rfc2821domain | rfc2821domain-literal

local-part conforms to RFC 2822.

domain is either:
An RFC 2821 domain (EXCEPT that the final sub-domain must consist of 2 or more letters only).
OR
An RFC 2821 address-literal.
(Note, no attempt is made to fully validate an IPv6 address-literal.)

Notes:
This pattern uses (.NET/Perl only?) features: named group "(?<name>)" and alternation/IF (?(name)).

See this regexadvice.com thread (http://regexadvice.com/forums/permalink/26742/26742/ShowThread.aspx#26742) for more info, including a version that does not use .NET features.

RFC 2822 (and 822) do allow embedded comments, whitespace, and newlines within *some* parts of an email address, but the pattern above DOES NOT.

RFC 2822 (and 822) allow the domain to be a simple domain with NO ".", but this pattern requires a compound domain with at least one "." in the domain name, as per RFC 2821 (4.1.2).

RFC 2822 allows/disallows certain whitespace characters in parts of an email address, such as TAB, CR, LF, BUT the pattern above does NOT test for these, and assumes that they are not present in the string (on the basis that these characters are hard to enter into an edit box). |
Matches |
|
Non-Matches |
|
Author |
Rating:
Mark Cranness
|
Title |
Pattern Title
|
Expression |
href[ ]*=[ ]*('|\")([^\"'])*('|\") |
Description |
The regexes on this site for pulling links off a page always seemed to be faulty, or at least never worked with PHP, so I made this one. It is simple, as I'm an amateur with regexes, but I stumbled through it and this one actually works. Tested with the PHP function: preg_match_all("/href[ ]*=[ ]*('|\")([^\"'])*('|\")/",$string,$matches) |
Matches |
href="index.php" | href = 'http://www.dailymedication.com' | href = "irc://irc.junk |
Non-Matches |
href=http://www.dailymedication.com |
Author |
Rating:
Jason Paschal
|
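A rough Python counterpart to the preg_match_all call, under one stated change: in the listed pattern the group `([^\"'])*` repeats a one-character group, so the group itself only retains the last character of the link. The sketch moves the `*` inside the group so the whole value is captured. The function name `extract_hrefs` and the sample markup are assumptions.

```python
import re

# Variant of the listed pattern: the "*" is inside the capturing group so the
# group holds the complete attribute value instead of its last character.
HREF_RE = re.compile(r"""href[ ]*=[ ]*['"]([^"']*)['"]""")

def extract_hrefs(html):
    """Return the values of all quoted href attributes."""
    return HREF_RE.findall(html)

sample = (
    '<a href="index.php">a</a> '
    "<a href = 'http://www.dailymedication.com'>b</a> "
    "<a href=http://www.dailymedication.com>c</a>"
)
print(extract_hrefs(sample))
# prints: ['index.php', 'http://www.dailymedication.com']
```

The unquoted third link is skipped, in line with the Non-Matches row.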
Title |
Pattern Title
|
Expression |
<\s*a\s[^>]*\bhref\s*=\s*
('(?<url>[^']*)'|""(?<url>[^""]*)""|(?<url>\S*))[^>]*>
(?<body>(.|\s)*?)<\s*/a\s*> |
Description |
Suitable for extraction of all hyperlinks in the format:
<a ... href="..." ...> some text </a>
from a text document. Separates in groups the components of the links (url and body). |
Matches |
<a href="javascript:'window.close()'">close the window</a> | <a target=&quo |
Non-Matches |
<aa href="test.htm">test</a> | < a href hr = 'http://www.nakov.com'>...& |
Author |
Rating:
Svetlin Nakov
|
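The listed pattern is written for .NET: it reuses the group name url in three alternatives and doubles the quotes ("") as in a C# verbatim string. Python's `re` module rejects duplicate group names, so the sketch below gives each alternative its own name (sq, dq, bare, all hypothetical) and picks whichever one took part in the match. `re.DOTALL` replaces `(.|\s)*?` for the link body.

```python
import re

LINK_RE = re.compile(
    r"""<\s*a\s[^>]*\bhref\s*=\s*"""
    r"""(?:'(?P<sq>[^']*)'|"(?P<dq>[^"]*)"|(?P<bare>\S*))[^>]*>"""
    r"""(?P<body>.*?)<\s*/a\s*>""",
    re.DOTALL,
)

def extract_links(html):
    """Return (url, body) pairs for every <a ... href=...>...</a> in html."""
    links = []
    for m in LINK_RE.finditer(html):
        # Exactly one of the three URL groups participates in a match.
        url = next(g for g in m.group("sq", "dq", "bare") if g is not None)
        links.append((url, m.group("body")))
    return links

print(extract_links('<a href="test.htm">test</a>'))
# prints: [('test.htm', 'test')]
print(extract_links('<aa href="test.htm">test</a>'))
# prints: []
```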
Title |
Pattern Title
|
Expression |
<a\s*href=(.*?)[\s|>] |
Description |
Retrieves all anchor links in an HTML document; useful for spidering. You will need to strip " and ' from the result afterwards, as the expression captures all links together with their quotes. As far as I know there is no way, even with \1 groupings, to make the match conditional on whether the link is wrapped in ", ', or nothing at all (" and ' are easy enough, but what happens if the link starts with " and contains a JavaScript function call with a string in it?). If there is, it's probably quicker to do it like this and do a string replace anyway. |
Matches |
<a href="http://www.blah.com"> | <a href='../blah.html' target="_top"&a |
Non-Matches |
<a href = http://www.idiothtmlprogrammers.com > |
Author |
Rating:
chris s
|
Title |
Pattern Title
|
Expression |
<a[\s]+[^>]*?href[\s]?=[\s\"\']+(.*?)[\"\']+.*?>([^<]+|.*?)?<\/a> |
Description |
This regex will extract the link and the link title for every a href in HTML source. Useful for crawling sites.
Note that this pattern will also allow for links that are spread over multiple lines. |
Matches |
<a href='http://www.regexlib.com'>Text</a> | <a href="...">Text</a> |
Non-Matches |
all other html tags |
Author |
Rating:
Not yet rated.
Jacek Sompel
|
Title |
Pattern Title
|
Expression |
(?<HTML><a[^>]*href\s*=\s*[\"\']?(?<HRef>[^"'>\s]*)[\"\']?[^>]*>(?<Title>[^<]+|.*?)?</a\s*>) |
Description |
Powerful href extractor for the HTML A element.
Groups the extracted results separately so that you can easily use the HTML element, the URI, or its title.
These may be useful too:
(?<HTML><area[^>]*href\s*=\s*[\"\']?(?<HRef>[^"'>\s]*)[\"\']?[^>]*>)
(?<HTML><form[^>]*action\s*=\s*[\"\']?(?<HRef>[^"'>\s]*)[\"\']?[^>]*>)
(?<HTML><frame[^>]*src\s*=\s*[\"\']?(?<HRef>[^"'>\s]*)[\"\']?[^>]*>)
(?<HTML><iframe[^>]*src\s*=\s*[\"\']?(?<HRef>[^"'>\s]*)[\"\']?[^>]*>)
(?<HTML><link[^>]*href\s*=\s*[\"\']?(?<HRef>[^"'>\s]*)[\"\']?[^>]*>) |
Matches |
<a href='http://www.regexlib.com'>Text</a> | <a href="...'>Text</a> | & |
Non-Matches |
all other html tags |
Author |
Rating:
Not yet rated.
Aivar Holyfield
|
Title |
Pattern Title
|
Expression |
<a\s*.*?href\s*=\s*['"](?!http:\/\/).*?>(.*?)<\/a> |
Description |
Finds all local links, but doesn't match external links.
Use replace with $1 to leave the link text but remove the link. |
Matches |
<a href='locallink.htm'>my local link</a> | <a title='click here' href="/a/local |
Non-Matches |
<a href='http://www.site.com/page.htm'>www.site.com</a> | <a href='http://www.site.co |
Author |
Rating:
Not yet rated.
james mountain
|
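The "replace with $1" step might look like this in Python, where the reference is written \1. The function name `unlink_local` is hypothetical. The sketch is only checked against single-link inputs like the examples above; on a string holding several links, the lazy `.*?` before href may reach across an external link into a later local one.

```python
import re

# Listed pattern; the negative lookahead (?!http://) rejects external links.
LOCAL_LINK_RE = re.compile(
    r"""<a\s*.*?href\s*=\s*['"](?!http://).*?>(.*?)</a>"""
)

def unlink_local(html):
    """Replace each local link with its link text."""
    return LOCAL_LINK_RE.sub(r"\1", html)

print(unlink_local("<a href='locallink.htm'>my local link</a>"))
# prints: my local link

print(unlink_local("<a href='http://www.site.com/page.htm'>www.site.com</a>"))
# prints: <a href='http://www.site.com/page.htm'>www.site.com</a>
```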
Title |
Pattern Title
|
Expression |
href\s*=\s*(?:(?:\"(?<url>[^\"]*)\")|(?<url>[^\s*] ))>(?<title>[^<]+)</\w> |
Description |
Finds the URL and the URL description for all links in a given text. |
Matches |
<td bgcolor="#ffffff" class="small">&nbsp;<A HREF=" http:// |
Non-Matches |
<td bgcolor="#ffffff" class="small">&nbsp;<A HREF http://www.thepla |
Author |
Rating:
Not yet rated.
Matt Bruce
|
Title |
Pattern Title
|
Expression |
((http\://|https\://|ftp\://)|(www.))+(([a-zA-Z0-9\.-]+\.[a-zA-Z]{2,4})|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(/[a-zA-Z0-9%:/-_\?\.'~]*)? |
Description |
This RE matches web links that begin with http://, ftp://, https:// or www.
It does not match bare hosts such as diskusneforum.sk or localhost, but this limitation is easy to change. |
Matches |
www.diskusneforum.sk | http://diskusneforum.sk | ftp://23.45.267.189/ |
Non-Matches |
diskusneforum.sk | localhost |
Author |
Rating:
Not yet rated.
Martin Ille
|
Title |
Pattern Title
|
Expression |
(((file|gopher|news|nntp|telnet|http|ftp|https|ftps|sftp)://)|(www\.))+(([a-zA-Z0-9\._-]+\.[a-zA-Z]{2,6})|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(/[a-zA-Z0-9\&%_\./-~-]*)? |
Description |
You can use this regular expression in your PHP scripts to convert URLs entered in text into links. Example:
$text=preg_replace("#(((file|gopher|news|nntp|telnet|http|ftp|https|ftps|sftp)://)|(www\.))+(([a-zA-Z0-9\._-]+\.[a-zA-Z]{2,6})|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(/[a-zA-Z0-9\&%_\./-~-]*)?#","<a href=\"./redir.php?url=\\0\" target=\"_blank\">\\0</a>",$text); |
Matches |
http://diskusneforum.sk | www.diskusneforum.sk | ftp://123.123.123.123/ |
Non-Matches |
diskusneforum.sk |
Author |
Rating:
Martin Ille
|
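For comparison, a Python sketch of the same URL-to-link conversion. It departs from the PHP example in two labeled ways: it links straight to the URL instead of going through redir.php, and it prepends http:// when the match starts with www. so the href is absolute. The function name `linkify` is hypothetical.

```python
import re

# The listed pattern, split across lines for readability.
URL_RE = re.compile(
    r"(((file|gopher|news|nntp|telnet|http|ftp|https|ftps|sftp)://)|(www\.))+"
    r"(([a-zA-Z0-9\._-]+\.[a-zA-Z]{2,6})"
    r"|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))"
    r"(/[a-zA-Z0-9\&%_\./-~-]*)?"
)

def linkify(text):
    """Turn URLs in plain text into HTML links."""
    def to_anchor(match):
        url = match.group(0)
        # Assumption: a bare "www." match is treated as an http URL.
        href = url if "://" in url else "http://" + url
        return '<a href="%s" target="_blank">%s</a>' % (href, url)
    return URL_RE.sub(to_anchor, text)

print(linkify("visit www.diskusneforum.sk today"))
# prints: visit <a href="http://www.diskusneforum.sk" target="_blank">www.diskusneforum.sk</a> today
```

A bare host such as diskusneforum.sk has neither a scheme nor a www. prefix and is returned unchanged.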