- The first Regular Expression Library on the Web!

Regular Expression Details

Title: Pattern Title
Expression: (?s)( class=\w+(?=([^<]*>)))|(<!--\[if.*?<!\[endif\]-->)|(<!\[if !\w+\]>)|(<!\[endif\]>)|(<o:p>[^<]*</o:p>)|(<span[^>]*>)|(</span>)|(font-family:[^>]*[;'])|(font-size:[^>]*[;'])(?-s)
Description: Word HTML cleanup code. Use this expression to strip most of the markup that Word adds to an HTML document: the many span elements, font-family and font-size style declarations, class attributes, and conditional blocks such as <!--[if ...]> ... <![endif]-->. Pass it as the pattern to Regex.Replace(originalHtml, regExpr, "").
Author: Peter Donker
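As a rough illustration of the replace call described above, here is a minimal Python sketch. It is not the author's code: the names `WORD_CLEANUP` and `clean_word_html` are made up for this example, and the inline `(?s)` / `(?-s)` switches are dropped in favour of `re.DOTALL`, because Python's `re` module does not accept a bare `(?-s)`.

```python
import re

# The alternatives from the expression above, one per line.
# Assumption: re.DOTALL stands in for the inline (?s) ... (?-s) switches.
WORD_CLEANUP = re.compile(
    r"( class=\w+(?=([^<]*>)))"        # class attributes inside a tag
    r"|(<!--\[if.*?<!\[endif\]-->)"    # conditional comments, may span lines
    r"|(<!\[if !\w+\]>)"               # <![if !...]> openers
    r"|(<!\[endif\]>)"                 # <![endif]> closers
    r"|(<o:p>[^<]*</o:p>)"             # Office paragraph markers
    r"|(<span[^>]*>)"                  # opening span tags
    r"|(</span>)"                      # closing span tags
    r"|(font-family:[^>]*[;'])"        # font-family declarations
    r"|(font-size:[^>]*[;'])",         # font-size declarations
    re.DOTALL,
)


def clean_word_html(original_html: str) -> str:
    """Replace every match of the cleanup pattern with the empty string."""
    return WORD_CLEANUP.sub("", original_html)


if __name__ == "__main__":
    sample = "<p class=MsoNormal><span lang=EN-US>Hello<o:p></o:p></span></p>"
    print(clean_word_html(sample))
```

Compiling the pattern once and reusing it is a design choice for this sketch only; a single `re.sub` call with the same pattern and flags should behave the same way.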

Existing User Comments

Title: Nice Pattern
Name: Justin West
Date: 4/20/2005 1:59:26 PM
You just saved me hours.

Title: Url field doesn't work?
Name: MikeG
Date: 1/2/2005 8:25:15 PM
I guess the Url field doesn't work. Here's the Url for the app...

Title: GoodOnYa
Name: MikeG
Date: 1/2/2005 8:22:36 PM
Kudos and thanks for the work. I've incorporated part of your code into a C# app I wrote to parse out Word copy & paste snippets for repasting in HTML form posts (like blogs, forums, etc). Source code and exe are downloadable from the link above. Thanks again and best regards!

Title: Nice pattern
Name: Darren Neimke
Date: 5/31/2004 7:53:02 AM
Nice one! For your interest, you can also save patterns with embedded whitespace, which can help to make them more readable (maybe). For example:
(?s)
( class=\w+(?=([^<]*>)))
|(<!--\[if.*?<!\[endif\]-->)
|(<!\[if !\w+\]>)
|(<!\[endif\]>)
|(<o:p>[^<]*</o:p>)
|(<span[^>]*>)
|(</span>)
|(font-family:[^>]*[;'])
|(font-size:[^>]*[;'])
(?-s)
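The whitespace-friendly layout suggested in this comment can be sketched in Python with `re.VERBOSE`, which plays a role similar to .NET's `RegexOptions.IgnorePatternWhitespace`. This is an illustration under assumptions, not the commenter's code: the name `WORD_CLEANUP_VERBOSE` is hypothetical, `re.DOTALL` replaces the inline `(?s)` / `(?-s)` switches, and the two literal spaces in the pattern (before `class=` and after `if`) are written as `[ ]`, since unescaped spaces are ignored once free-spacing mode is on.

```python
import re

# Free-spacing version of the cleanup pattern. Whitespace and "# ..."
# comments inside the pattern are ignored under re.VERBOSE, so literal
# spaces must be written as [ ] (or escaped) to keep their meaning.
WORD_CLEANUP_VERBOSE = re.compile(
    r"""
      ([ ]class=\w+(?=([^<]*>)))      # class attributes inside a tag
    | (<!--\[if.*?<!\[endif\]-->)     # conditional comments, may span lines
    | (<!\[if[ ]!\w+\]>)              # openers such as <![if !supportLists]>
    | (<!\[endif\]>)                  # closers
    | (<o:p>[^<]*</o:p>)              # Office paragraph markers
    | (<span[^>]*>)                   # opening span tags
    | (</span>)                       # closing span tags
    | (font-family:[^>]*[;'])         # font-family declarations
    | (font-size:[^>]*[;'])           # font-size declarations
    """,
    re.DOTALL | re.VERBOSE,
)


def clean_word_html_verbose(original_html: str) -> str:
    """Strip Word markup using the free-spacing form of the pattern."""
    return WORD_CLEANUP_VERBOSE.sub("", original_html)


if __name__ == "__main__":
    print(clean_word_html_verbose("<![if !supportLists]>1. item<![endif]>"))
```

Without the `[ ]` substitutions the first alternative would match a bare `class=...` with no leading space, and the third would no longer match `<![if !...]>` at all, so the readable layout needs that small adjustment to stay equivalent.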

Copyright © 2001-2021