-
Oct 9th, 2008, 10:50 AM
#1
Thread Starter
Frenzied Member
[RESOLVED] Split Paragraph into sentences
I have an app that is passed one or more paragraphs, and I need to figure out how many sentences they contain.
Right now, this is how I'm doing it:
Code:
private long getSentenceCount(String text) {
    String delim = "@@@@";   // marks the end of a sentence (. ! or ? before a space or line break)
    String delim2 = "####";  // marks a line break that did not follow . ! or ?
    String tempText = text.replace(". ", delim);
    tempText = tempText.replace(".\r\n", delim);
    tempText = tempText.replace("! ", delim);
    tempText = tempText.replace("!\r\n", delim);
    tempText = tempText.replace("? ", delim);
    tempText = tempText.replace("?\r\n", delim);
    tempText = tempText.replace("\r\n", delim2);
    String[] sentences = tempText.split(delim);
    long sCnt = 0;
    for (String s : sentences) {
        if (s.contains(delim2)) {
            String[] temp = s.split(delim2);
            for (String t : temp) {
                // textIsSentence checks that the string is not empty, that it has more than
                // 4 words (an arbitrary number for now) and that the first letter is uppercase
                if (textIsSentence(t)) {
                    sCnt++;
                }
            }
        } else {
            if (textIsSentence(s)) {
                sCnt++;
            }
        }
    }
    return sCnt;
}
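The post only describes textIsSentence in a code comment. A minimal sketch of such a check, written from that comment, could look like this; the class name SentenceFilter and the MIN_WORDS constant are hypothetical and this is not the poster's actual helper.

```java
import java.util.regex.Pattern;

// Hypothetical version of the textIsSentence helper, written from the
// description in the code comment; the real implementation was not posted.
public class SentenceFilter {

    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // "More than 4 words" from the comment; the poster calls this number arbitrary.
    private static final int MIN_WORDS = 5;

    static boolean textIsSentence(String text) {
        if (text == null) {
            return false;
        }
        String trimmed = text.trim();
        // Not empty.
        if (trimmed.isEmpty()) {
            return false;
        }
        // First letter is uppercase (only the first character is checked).
        if (!Character.isUpperCase(trimmed.charAt(0))) {
            return false;
        }
        // Count whitespace-separated words.
        return WHITESPACE.split(trimmed).length >= MIN_WORDS;
    }

    public static void main(String[] args) {
        System.out.println(textIsSentence("The quick brown fox jumps")); // true
        System.out.println(textIsSentence("the quick brown fox jumps")); // false
        System.out.println(textIsSentence("Too short"));                 // false
    }
}
```

Because only the first character is checked, text that opens with a quotation mark is rejected; whether that is wanted depends on the input.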
I'm wondering if there is a better way to do this, perhaps with regex, but I'm having trouble figuring out how to write the pattern.
What it needs to find is a period, question mark or exclamation point followed by either a space or a newline, or a newline on its own.
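One way to express that requirement directly as a single Java regular expression is sketched below. It assumes line breaks are either \r\n or \n, and it leaves out the textIsSentence filter, simply counting the non-blank pieces; the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class SentenceSplitter {

    // One alternative per case in the requirement:
    //   [.!?] followed by a single space or a newline (\r\n or \n),
    //   or a newline on its own.
    private static final Pattern BOUNDARY =
            Pattern.compile("[.!?](?: |\\r?\\n)|\\r?\\n");

    static long countSentences(String text) {
        long count = 0;
        for (String piece : BOUNDARY.split(text)) {
            // Blank pieces come from consecutive boundaries (e.g. blank lines)
            // or extra spaces; they are not sentences.
            if (!piece.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "First sentence. Second one!\r\nThird line without a stop\nFourth?";
        System.out.println(countSentences(text)); // 4
    }
}
```

Like any punctuation-based split, this still treats abbreviations such as "Mr. Smith" as a sentence boundary.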
Thanks
-
Oct 9th, 2008, 07:13 PM
#2
Re: Split Paragraph into sentences
I hope this helps
Code:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.regex.Pattern;

public class Test
{
    private static final String lineSeparator = System.getProperty("line.separator");

    public static void main(final String[] args)
    {
        final File f = new File("test.txt");
        try
        {
            final String paragraph = Test.readFileString(f);
            // One of . ! ? followed by one or more whitespace characters (spaces or line breaks)
            final Pattern p = Pattern.compile("[\\.\\!\\?]\\s+");
            // Note: split() on an empty string returns one empty element,
            // so an empty file is reported as 1 sentence
            final int value = p.split(paragraph).length;
            System.out.println("Number Of Sentences: " + value);
        }
        catch (final FileNotFoundException e)
        {
            System.err.println("File \"" + f.getName() + "\" was not found");
        }
    }

    private static String readFileString(final File file) throws FileNotFoundException
    {
        final StringBuilder sBuilder = new StringBuilder();
        // try-with-resources closes the Scanner (and the file) when reading finishes
        try (final Scanner scanner = new Scanner(file))
        {
            while (scanner.hasNextLine())
                sBuilder.append(scanner.nextLine()).append(Test.lineSeparator);
        }
        return sBuilder.toString();
    }
}
You might also want to read the documentation for the java.util.regex.Pattern class.
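As an alternative to split(), the same Pattern class can be paired with a Matcher to count sentence-like spans directly. This sidesteps one quirk of split(): for an empty string it returns an array holding one empty string, so p.split(paragraph).length reports 1 for an empty file. In the sketch below (hypothetical names, and a deliberately simple notion of a sentence) the MULTILINE flag does matter, because it lets $ match at the end of every line rather than only at the end of the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SentenceMatcher {

    // A "sentence" here is a run of characters that are not terminators or line
    // breaks, ending either in one or more of . ! ? or at the end of a line.
    // MULTILINE lets $ match just before each line terminator, not only at the end of input.
    private static final Pattern SENTENCE =
            Pattern.compile("[^.!?\\r\\n]+(?:[.!?]+|$)", Pattern.MULTILINE);

    static int countSentences(String text) {
        Matcher m = SENTENCE.matcher(text);
        int count = 0;
        while (m.find()) {
            // Skip runs that are only whitespace, such as trailing spaces before a newline.
            if (!m.group().trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countSentences("Hello world. How are you?\nFine")); // 3
        System.out.println(countSentences(""));                              // 0
    }
}
```

Runs of terminators such as "..." or "?!" are consumed as one ending, so "Wait... what?!" counts as two sentences.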
"I'm not normally a praying man, but if you're up there, save me... Superman!" - Homer Simpson
My Blog
-
Oct 10th, 2008, 10:21 AM
#3
Thread Starter
Frenzied Member
Re: [RESOLVED] Split Paragraph into sentences
Thanks, that worked. I knew it could be done with regex, but for some reason I just can't grasp it. It's usually pure luck if I find the right pattern, and even then only for simple ones.
-
Oct 14th, 2008, 03:35 AM
#4
Re: [RESOLVED] Split Paragraph into sentences
Just bear in mind that square brackets define a character class, which matches exactly one character: [\\.\\!\\?] matches a period, exclamation mark or question mark at a single position. Brackets can't choose between multi-character sequences. If you want either aa or zz, [(aa)(zz)]+ won't do it: it is valid syntax, but it matches any run of the characters '(', 'a', ')' and 'z'. Use alternation inside a group instead, e.g. (aa|zz)+ or the non-capturing (?:aa|zz)+, which handles both sequences in one pattern.
-
Oct 14th, 2008, 05:22 AM
#5
Re: [RESOLVED] Split Paragraph into sentences
Originally Posted by leinad31
Just bear in mind that square brackets define a character class, which matches exactly one character: [\\.\\!\\?] matches a period, exclamation mark or question mark at a single position. Brackets can't choose between multi-character sequences. If you want either aa or zz, [(aa)(zz)]+ won't do it: it is valid syntax, but it matches any run of the characters '(', 'a', ')' and 'z'. Use alternation inside a group instead, e.g. (aa|zz)+ or the non-capturing (?:aa|zz)+, which handles both sequences in one pattern.
Are you a software engineer or something??
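The difference between a square-bracket character class and a group using | is easy to check in Java. The sketch below (class and field names are made up for illustration) compiles the bracket pattern [(aa)(zz)]+ from post #4 alongside (?:aa|zz)+ and tests a few strings against each.

```java
import java.util.regex.Pattern;

public class ClassVersusAlternation {

    // Square brackets form a character class, which matches ONE character from a set.
    // "[(aa)(zz)]" compiles, but the set is just the characters ( a ) z.
    static final Pattern BRACKET = Pattern.compile("[(aa)(zz)]+");

    // A group with | (alternation) chooses between whole sequences.
    // (?:...) groups without capturing.
    static final Pattern ALTERNATION = Pattern.compile("(?:aa|zz)+");

    public static void main(String[] args) {
        String[] inputs = {"aazz", "za", "(a)"};
        for (String s : inputs) {
            System.out.println(s
                    + "  bracket=" + BRACKET.matcher(s).matches()
                    + "  alternation=" + ALTERNATION.matcher(s).matches());
        }
        // Prints:
        // aazz  bracket=true  alternation=true
        // za  bracket=true  alternation=false
        // (a)  bracket=true  alternation=false
    }
}
```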