Title |
Cleaning HTML
|
Expression |
<\/{0,1}(?!\/|b>|i>|p>|a\s|a>|br|em>|ol|li|strong>)[^>]*>
|
Description |
After a bit of work this morning trying to strip out arbitrary HTML while leaving 'known' tags in place, we came up with the following, which may be useful. It uses the negative lookahead construct, '(?!...)'. It matches an opening angle bracket and an optional forward slash, as long as that is *not* followed by one of the terms in the '?!' section. The parentheses in this section do not capture anything; they are part of the lookahead construct. The regexp can therefore be used to replace all unknown tags with an empty string. You can add other 'good' HTML tags to the list.
|
Matches |
<table>...</table>
|
Non-Matches |
blah blah blah.
|
Author |
Gordon Buxton
|
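The replacement described above can be tried with a short Python sketch. The function name `strip_unknown_tags` and the sample strings are illustrative assumptions, not part of the original entry; only the pattern itself is taken from it.

```python
import re

# Pattern copied verbatim from the entry. The allowed tags are the
# alternatives inside the negative lookahead (?!...). The leading \/
# alternative keeps the optional slash from being skipped so that
# closing tags of allowed elements (e.g. </b>) are not matched.
UNKNOWN_TAG = re.compile(
    r"<\/{0,1}(?!\/|b>|i>|p>|a\s|a>|br|em>|ol|li|strong>)[^>]*>"
)


def strip_unknown_tags(html: str) -> str:
    """Replace every tag the pattern matches with an empty string."""
    return UNKNOWN_TAG.sub("", html)


if __name__ == "__main__":
    sample = "<table><tr><td>Hello <b>world</b><br></td></tr></table>"
    print(strip_unknown_tags(sample))  # Hello <b>world</b><br>
```

A few properties follow from the pattern as written: matching is case-sensitive unless `re.IGNORECASE` is passed, an allowed tag that carries attributes (such as `<p class="x">`) is still removed because the lookahead expects `p>`, and the bare prefixes `br`, `ol` and `li` also protect any tag that merely starts with those letters (for example `<link>`).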