Title |
Pattern Title
|
Expression |
&\#x0*(0|1|2|3|4|5|6|7|8|B|C|E|F|10|11|12|13|14|15|16|17|18|19|1A|1B|1C|1D|1E|1F); |
Description |
Can be used to match (and strip out) hexadecimal character references to low-order non-printable ASCII characters (ASCII 0-31) in string data before it is added to an XML document. Useful with parsers such as Microsoft's MSXML3 that strictly enforce the W3C specification on allowable characters. Does not match ASCII 9 (horizontal tab), 10 (line feed), or 13 (carriage return), which XML allows. |
Matches |
 |  |
Non-Matches |
  | � |
Author |
Matt Skone
|
Source |
|
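As a rough illustration of the entry above, here is a minimal Python sketch that applies the expression to strip the matching character references from a string. The function name `strip_low_order_char_refs` and the sample string are made up for this example; only the expression itself comes from the entry.

```python
import re

# The entry's expression, copied as-is and split across two raw strings
# only to keep the line short.
LOW_ORDER_CHAR_REF = re.compile(
    r"&\#x0*(0|1|2|3|4|5|6|7|8|B|C|E|F|10|11|12|13|14|15|16|17|18|19"
    r"|1A|1B|1C|1D|1E|1F);"
)


def strip_low_order_char_refs(text: str) -> str:
    """Remove hex character references to ASCII 0-31, except 9, 10 and 13.

    Hypothetical helper for illustration; not part of the original entry.
    """
    return LOW_ORDER_CHAR_REF.sub("", text)


# Assumed sample input: three disallowed references and one allowed tab (&#x9;).
sample = "a&#x3;b&#x0E;c&#x9;d&#x0000001F;e"
cleaned = strip_low_order_char_refs(sample)
print(cleaned)  # abc&#x9;de
```

Two things to keep in mind with the expression as written: the hex digits are uppercase only, so `&#x1f;` is left alone unless you compile with `re.IGNORECASE`, and decimal references such as `&#31;` are not matched at all.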
Title: Alternative?
Name: J'son
Date: 9/29/2005 2:56:48 PM
Comment:
I'm using .NET, and to get my XML text into the x0000 format you describe I had to encode the XML string with XmlConvert.EncodeName(rawXMLData), which converts undesirables in an XML string like "jason smith" into "jason_x0020_smith". I could then use your pattern (with a couple of mods) to remove the offending characters and decode back to normal.
I've found that I can skip the encode/decode process altogether by using the \u hex escape, so I've modified your pattern as follows:
\u0000|\u0001|\u0002|\u0003|\u0004|\u0005|\u0006|\u0007|\u0008|\u000B|\u000C|\u000E|\u000F|\u0010|\u0011|\u0012|\u0013|\u0014|\u0015|\u0016|\u0017|\u0018|\u0019|\u001A|\u001B|\u001C|\u001D|\u001E|\u001F
Works like a charm so far, but let me know if you see any holes - I'm new to hex, unicode, etc. :P
Thanx! -- J'son
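
For anyone trying the same approach outside .NET, here is a minimal Python sketch of the idea in this comment: match the raw control characters directly instead of their encoded form. It assumes the long \u alternation can be collapsed into an equivalent character class covering the same code points; the function name `strip_control_chars` and the sample string are invented for the example.

```python
import re

# Same code points as the \u0000|\u0001|...|\u001F alternation above,
# written as ranges: 0x00-0x08, 0x0B, 0x0C, 0x0E-0x1F.
# Tab (0x09), line feed (0x0A) and carriage return (0x0D) are left out.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F]")


def strip_control_chars(text: str) -> str:
    """Remove raw low-order control characters (hypothetical helper)."""
    return CONTROL_CHARS.sub("", text)


# Assumed sample input containing NUL, vertical tab and unit separator,
# plus a tab and a CR/LF pair that should survive.
raw = "jason\x00 smith\x0B\tline one\r\nline two\x1F"
cleaned = strip_control_chars(raw)
print(repr(cleaned))  # 'jason smith\tline one\r\nline two'
```

Note that this works on the actual characters in the string, so it will not touch text that has already been escaped as `&#x1F;`; that case still needs the original expression.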