RegExLib.com - The first Regular Expression Library on the Web!

Regular Expression Details

Title English Sentence Matching
Expression
\b(((["'/,&%\:\(\)\$\+\-\*\w\000-\032])|(-*\d+\.\d+[%]*))+[\s]+)+\b[\w"',%\(\)]+[.!?](['"\s]|$)
Description
Focused on scraping English sentences from HTML/Java (without having to parse). Correctly matches the vast majority of English sentences. There are undoubtedly a number of cases that do not match, but I felt they were obscure enough to be omitted. (Surely the fellow who commented on this script had some sentences that did not match, but the example he describes does match correctly, and I provide it as the fourth example.) Cheers
Matches
This is an example. | "Matching sentence." | A 9.7% increase over the last 10+ years. | The vehicle has a 5.2 liter, four-wheel drive engine.
Non-Matches
Class.Function
Author Scotty
Source Myself
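
To try the listed matches and the non-match, a minimal Python sketch follows. It assumes Python's `re` module interprets the expression the same way as the engine it was written for, which is not confirmed by the entry; the helper name `find_sentence` is made up for this illustration.

```python
import re

# Pattern copied from the Expression field; a triple-quoted raw string
# avoids escaping the quote characters the pattern contains.
SENTENCE = re.compile(
    r'''\b(((["'/,&%\:\(\)\$\+\-\*\w\000-\032])|(-*\d+\.\d+[%]*))+[\s]+)+\b[\w"',%\(\)]+[.!?](['"\s]|$)'''
)

def find_sentence(text):
    """Return the first substring the pattern matches, or None (hypothetical helper)."""
    m = SENTENCE.search(text)
    return m.group(0) if m else None

samples = [
    "This is an example.",
    '"Matching sentence."',
    "A 9.7% increase over the last 10+ years.",
    "The vehicle has a 5.2 liter, four-wheel drive engine.",
    "Class.Function",
]
for s in samples:
    print(repr(s), "->", repr(find_sentence(s)))
```

Under Python, the quoted example matches without its opening quote, since `\b` cannot match at the start of a string that begins with a non-word character. `Class.Function` yields no match because the expression requires at least one word followed by whitespace.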

Existing User Comments

Title: PHP?
Name: Noni
Date: 9/8/2010 10:49:09 AM
Comment:
Any idea on how to get this to work in PHP with preg_match?


Title: Edited to include sentences with square brackets in them
Name: cam8001
Date: 4/28/2008 12:35:40 AM
Comment:
This choked on a sentence with square brackets in it, so I've edited it just a little to fix that. \b(((["'/,&%\[\]\:\(\)\$\+\-\*\w\000-\032])|(-*\d+\.\d+[%]*))+[\s]+)+\b[\w"',%\(\)]+[.!?](['"\s]|$)


Title: sentence splitter
Name: Rudolf Stammis (Alkmaar, The Netherlands)
Date: 9/23/2005 2:15:41 PM
Comment:
Nice and useful, but... when there is a number in a sentence, it will sometimes have a dot (.) in the middle, and this regex also breaks such sentences. A workaround could be to glue these parts together again (that is, when a sentence starts with a numeral and the previous one ends in a numeral).
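
The gluing workaround could be sketched in Python roughly as follows. The function name `glue_decimal_fragments` and the sample fragments are hypothetical, and the sketch assumes each fragment keeps its trailing dot.

```python
import re

def glue_decimal_fragments(fragments):
    """Rejoin fragments that a splitter cut at a decimal point (hypothetical helper)."""
    glued = []
    for fragment in fragments:
        # Previous piece ends in "<digit>." and this one starts with a digit:
        # treat the dot as a decimal point rather than a full stop.
        if glued and re.search(r"\d\.$", glued[-1]) and re.match(r"\d", fragment):
            glued[-1] += fragment
        else:
            glued.append(fragment)
    return glued

# Hypothetical fragments, as a splitter might produce them.
fragments = ["The vehicle has a 5.", "2 liter engine.", "It is fast."]
print(glue_decimal_fragments(fragments))
# -> ['The vehicle has a 5.2 liter engine.', 'It is fast.']
```

This heuristic would also merge two genuine sentences when the first ends in a digit and the next begins with one, so it is only a rough repair.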


Copyright © 2001-2025, RegexAdvice.com