RegExLib.com - The first Regular Expression Library on the Web!


Regular Expression Details

Expression
<(?:[^"']+?|.+?(?:"|').*?(?:"|')?.*?)*?>
Description
This matches every tag in a string, which makes it useful for stripping HTML or XML tags to get the plain text. It handles attribute values that contain JavaScript or the characters < and >. It matches every tag in the following input, while the "not this" text between the tags is left unmatched:
<hr size="3" noshade color="#000000" align="left">
<p style="margin-top:0px;margin-bottom:0px" align="center"><font face="Times New Roman" size="5"><b>UNITED STATES</b></font></p>
<input type=button onclick='if(n.value>5)do_this();'> not this <br>
<input type=button onclick="n>5?a():b();" value=test> not this <br>
<input type=button onclick="n>5?a(\"OK\"):b('Not Ok');" value=test> not this <br>
<input type=button onclick='n>5' value=test onmouseover="n<5&&n>8" onmouseout='if(n>5)alert(\'True\');else alert("False")'> not this <br>
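A minimal sketch of using the expression for the two jobs described above, finding tags and stripping them. It assumes Python's re module (the entry does not name a host language), and find_tags and strip_tags are hypothetical helper names. The pattern is copied unchanged; because all of its groups are non-capturing, findall returns the whole matched tags.

```python
import re

# Pattern from this entry, copied unchanged (raw triple-quoted string so the
# embedded " and ' characters need no escaping).
TAG_PATTERN = re.compile(r"""<(?:[^"']+?|.+?(?:"|').*?(?:"|')?.*?)*?>""")


def find_tags(text: str) -> list[str]:
    """Return every tag the pattern matches, in order of appearance (hypothetical helper)."""
    return TAG_PATTERN.findall(text)


def strip_tags(text: str) -> str:
    """Remove every matched tag, keeping only the text between tags (hypothetical helper)."""
    return TAG_PATTERN.sub("", text)


if __name__ == "__main__":
    sample = (
        '<p style="margin-top:0px;margin-bottom:0px" align="center">'
        '<font face="Times New Roman" size="5"><b>UNITED STATES</b></font></p>'
    )
    print(strip_tags(sample))  # UNITED STATES

    # The '>' inside the quoted onclick value does not end the first tag.
    html = "<input type=button onclick='if(n.value>5)do_this();'> not this <br>"
    print(find_tags(html))
```

Both helpers inherit the pattern's behaviour, including the backtracking problem reported in the comments for text with an unmatched '<'.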
Matches
<input type=button onclick='n>5' value=test onmouseover="n<5&&n>8" onm
Non-Matches
Haven't found any exceptions yet.
Author: Toby Henderson

Existing User Comments

Title: Memory Peak
Name: Dave S
Date: 1/25/2006 10:06:34 AM
Comment:
This regex choked on a string containing the less-than character as part of invalid HTML, as in: 1 is < 2. Everything following the < character sends the regex into heavy backtracking, and with a long string (748 characters), this regular expression caused CPU usage to peak and stay at 100%. This happened consistently (i.e. EVERY TIME that string was passed through the regex). I tracked the problem down to THIS regex with a Microsoft tech agent who studied the memory dump produced by Windows and IIS. The memory dump pointed to this line:
isHTML = objRegExp.Test(str)
This indicates that the .Test method (in VBScript) of the regular expression object would choke on the 748-character string containing the less-than character. Obviously, in valid HTML that should be written as: 1 is &lt; 2. But many users don't know proper HTML entities. I've reverted to <[^>]+> for the time being.
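For comparison, a sketch of the fallback the commenter switched to, assuming Python's re module rather than the VBScript RegExp object used in the comment (strip_tags_simple is a hypothetical name). <[^>]+> has a single quantifier over a negated character class, so a failed attempt from an unmatched '<' costs time proportional to the rest of the string, instead of exploring the exponentially many ways the original's nested lazy quantifiers can split that text. The trade-off is that it stops at the first '>', so a '>' inside an attribute value ends the match early.

```python
import re

# Fallback from the comment: '<', then one or more non-'>' characters, then '>'.
# No nested quantifiers, so there is no exponential backtracking.
SIMPLE_TAG = re.compile(r"<[^>]+>")


def strip_tags_simple(text: str) -> str:
    """Remove everything from each '<' up to the next '>' (hypothetical helper)."""
    return SIMPLE_TAG.sub("", text)


if __name__ == "__main__":
    # No '>' anywhere, so nothing is removed and the call returns quickly.
    print(strip_tags_simple("1 is < 2"))  # 1 is < 2

    # The weakness: the '>' inside the onclick value ends the match too early.
    print(strip_tags_simple("<input onclick='if(n>5)go();'>Click"))  # 5)go();'>Click
```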


Title: best one so
Name: manit chanthavong
Date: 11/3/2005 6:42:57 PM
Comment:
Looked for a regex to strip HTML tags from a document. This is the best one I've seen.


Title: Very good
Name: Simon Cann
Date: 10/3/2005 11:34:30 AM
Comment:
Well done for a great expression, it's just what I needed.


Title: RE:Half right, half wrong
Name: Toby Henderson
Date: 4/5/2005 6:11:32 AM
Comment:
Gideon, you are correct that those are not valid HTML tags. But since they are meant to be tags, I would want them captured. I'm not testing for validity; I just want to find every tag in a document to do something with it.


Title: Half right, half wrong
Name: Gideon Engelberth
Date: 4/4/2005 11:27:53 AM
Comment:
This expression may not give false negatives (because it allows things inside tags), but it definitely gives false positives. Two examples of matches that, as far as I know, should not match are: <tag attr="test> and <tag attr="test'>
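A quick way to check the two inputs from this comment, assuming Python's re module (is_tag is a hypothetical helper). fullmatch requires the whole candidate to be consumed as one tag, so a True result confirms the pattern accepts it; the quoted-value branch is free to treat '="' as a closing quote, which is how the unterminated value still matches.

```python
import re

# Pattern from this entry, copied unchanged.
TAG_PATTERN = re.compile(r"""<(?:[^"']+?|.+?(?:"|').*?(?:"|')?.*?)*?>""")


def is_tag(candidate: str) -> bool:
    """True if the whole string is accepted as a single tag (hypothetical helper)."""
    return TAG_PATTERN.fullmatch(candidate) is not None


if __name__ == "__main__":
    # Gideon's examples: an unterminated attribute value and mismatched quotes.
    for candidate in ['<tag attr="test>', "<tag attr=\"test'>"]:
        print(candidate, "->", is_tag(candidate))  # prints True for both
```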


Copyright © 2001-2024, RegexAdvice.com | ASP.NET Tutorials