Title |
Pattern Title
|
Expression |
(?s)( class=\w+(?=([^<]*>)))|(<!--\[if.*?<!\[endif\]-->)|(<!\[if !\w+\]>)|(<!\[endif\]>)|(<o:p>[^<]*</o:p>)|(<span[^>]*>)|(</span>)|(font-family:[^>]*[;'])|(font-size:[^>]*[;'])(?-s) |
Description |
Word HTML cleanup code. Use this expression to strip most of the markup that Word adds to an HTML document: the many span elements, font-family and font-size style declarations, class attributes, and the conditional blocks wrapped in [if ...] and [endif] markers. Use it in a replace call with an empty replacement string, for example Regex.Replace(originalHtml, regExpr, "").
|
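As a rough illustration of the replace call described above, here is a minimal sketch in Python rather than .NET. It assumes Python's re module, which does not accept a standalone (?-s), so the inline (?s)/(?-s) toggles are dropped and DOTALL is passed as a flag instead. The names WORD_CRUFT, clean_word_html, and the sample string are made up for this example and are not part of the original listing.

```python
import re

# The listing's alternatives, joined as-is. The inline (?s)/(?-s) toggles are
# replaced by the DOTALL flag, which is an adaptation for Python's re module.
WORD_CRUFT = re.compile(
    r"( class=\w+(?=([^<]*>)))"
    r"|(<!--\[if.*?<!\[endif\]-->)"
    r"|(<!\[if !\w+\]>)"
    r"|(<!\[endif\]>)"
    r"|(<o:p>[^<]*</o:p>)"
    r"|(<span[^>]*>)"
    r"|(</span>)"
    r"|(font-family:[^>]*[;'])"
    r"|(font-size:[^>]*[;'])",
    re.DOTALL,
)


def clean_word_html(original_html: str) -> str:
    """Delete every match, i.e. replace each one with the empty string."""
    return WORD_CRUFT.sub("", original_html)


# Hypothetical snippet in the style of Word's HTML output.
sample = (
    '<p class=MsoNormal><span style="font-size:12.0pt">Hello</span>'
    "<o:p></o:p></p><table><tr><td>kept</td></tr></table>"
)
print(clean_word_html(sample))
# prints: <p>Hello</p><table><tr><td>kept</td></tr></table>
```

With this sketch, the listing's own examples behave as stated: "<span>" is removed entirely and "<table>" is left untouched.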
Matches |
<span> |
Non-Matches |
<table> |
Author |
Peter Donker
|
Title: Nice Pattern
Name: Justin West
Date: 4/20/2005 1:59:26 PM
Comment:
You just saved me hours.
Title: Url field doesn't work?
Name: MikeG
Date: 1/2/2005 8:25:15 PM
Comment:
I guess the Url field doesn't work. Here's the Url for the app...
http://fresh.no-ip.org/include/downloads.html
Title: GoodOnYa
Name: MikeG
Date: 1/2/2005 8:22:36 PM
Comment:
Kudos and thanks for the work. I've incorporated part of your code into a C# app I wrote to parse out Word copy & paste snippets for repasting in HTML form posts (like blogs, forums, etc). Source code and exe are downloadable from the link above.
Thanks again and best regards!
Title: Nice pattern
Name: Darren Neimke
Date: 5/31/2004 7:53:02 AM
Comment:
Nice one!
For your interest, you can also save patterns with embedded whitespace, which can help make them more readable (maybe). For example:
(?s)
( class=\w+(?=([^<]*>)))
|(<!--\[if.*?<!\[endif\]-->)
|(<!\[if !\w+\]>)
|(<!\[endif\]>)
|(<o:p>[^<]*</o:p>)
|(<span[^>]*>)
|(</span>)
|(font-family:[^>]*[;'])
|(font-size:[^>]*[;'])
(?-s)