Thread: [RESOLVED] Split Paragraph into sentences

  1. #1

    Thread Starter
    Frenzied Member
    Join Date
    Jul 2005
    Posts
    1,521

    [RESOLVED] Split Paragraph into sentences

    I have an app that is passed one or more paragraphs, and I need to figure out how many sentences they contain.

    Right now, this is how I'm doing it:
    Code:
    	private long getSentenceCount(String text){
    		String delim = "@@@@";
    		String delim2 = "####";
    		String tempText = text.replace(". ", delim);
    		tempText = tempText.replace(".\r\n", delim);
    		tempText = tempText.replace("! ", delim);
    		tempText = tempText.replace("!\r\n", delim);
    		tempText = tempText.replace("? ", delim);
    		tempText = tempText.replace("?\r\n", delim);
    		tempText = tempText.replace("\r\n", delim2);
    		
    		String [] sentences = tempText.split(delim);
    		long sCnt = 0;
    		for(String s : sentences){
    			if(s.contains(delim2)){
    				String[] temp = s.split(delim2);
    				for(String t : temp){
    					if(textIsSentence(t) == true){ //textIsSentence checks that the string is not empty, that there are more than 4 words (arbitrary number for now) and that the first letter is uppercase
    						sCnt ++;
    					}
    				}
    			}else{
    				if(textIsSentence(s) == true){
    					sCnt ++;
    				}
    			}
    		}		
    		return sCnt;
    	}
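    A minimal sketch of what textIsSentence could look like, based only on the comment in the code above. It is not the actual helper: the class name SentenceCheck and the MIN_WORDS constant are assumptions made for illustration, and "more than 4 words" is read as "at least 5".

```java
final class SentenceCheck {
    // Assumption: "more than 4 words" from the comment above, i.e. at least 5.
    private static final int MIN_WORDS = 5;

    static boolean textIsSentence(String text) {
        if (text == null) {
            return false;
        }
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return false;
        }
        // Count words by splitting on runs of whitespace.
        if (trimmed.split("\\s+").length < MIN_WORDS) {
            return false;
        }
        // The first non-whitespace character must be an uppercase letter.
        return Character.isUpperCase(trimmed.charAt(0));
    }
}
```

    With this sketch, SentenceCheck.textIsSentence("This is a complete sentence") passes all three checks, while a short fragment such as "too short" is rejected by the word count.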
    I'm wondering if there is a better way to do this, perhaps with regex, but I'm having trouble figuring out how to write the pattern.

    What it needs to find is:
    period, question mark or exclamation point, followed by either a space or a new line. Or just a new line.

    Thanks
    Visual Studio Team Edition 2005
    GDI+ Links: Bob Powell VB.Net Heaven
    API Links: All API Pinvoke.Net
    VB6 to VB.Net: Visual Basic 6 to .NET Function Equivalents (Thread)

  2. #2
    Arabic Poster ComputerJy
    Join Date
    Nov 2005
    Location
    Happily misplaced
    Posts
    2,513

    Re: Split Paragraph into sentences

    I hope this helps
    Code:
    import java.io.File;
    import java.io.FileNotFoundException;
    import java.util.Scanner;
    import java.util.regex.Pattern;
    
    public class Test
    {
    	private static final String lineSeparator = System.getProperty("line.separator");
    
    	public static void main(final String[] args)
    	{
    		final File f = new File("test.txt");
    		try
    		{
    			final String paragraph = Test.readFileString(f);
    			final Pattern p = Pattern.compile("[\\.\\!\\?]\\s+", Pattern.MULTILINE);
    			final int value = p.split(paragraph).length;
    			System.out.println("Number Of Sentences: " + value);
    		}
    		catch (final FileNotFoundException e)
    		{
    			System.err.println("File \"test.txt\" was not found");
    		}
    
    	}
    
    	private static String readFileString(final File file) throws FileNotFoundException
    	{
    		final StringBuilder sBuilder = new StringBuilder();
    		// try-with-resources closes the Scanner (and the underlying file) when done
    		try (Scanner scanner = new Scanner(file))
    		{
    			while (scanner.hasNextLine())
    				sBuilder.append(scanner.nextLine()).append(Test.lineSeparator);
    		}
    		return sBuilder.toString();
    	}
    }
    You might also want to take a look at class java.util.regex.Pattern
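    If the text is already in a String, the same idea works without the file handling. The following is only a sketch under a few assumptions: the class name SentenceCounter and the BOUNDARY pattern are made up for illustration, a bare line break is also treated as a boundary (as the original question asked), and blank pieces are skipped so that empty input counts as zero.

```java
import java.util.regex.Pattern;

final class SentenceCounter {
    // One or more of . ! ? followed by whitespace, or a run of line-break characters.
    private static final Pattern BOUNDARY = Pattern.compile("[.!?]+\\s+|[\\r\\n]+");

    static int countSentences(String text) {
        if (text == null) {
            return 0;
        }
        int count = 0;
        for (String piece : BOUNDARY.split(text)) {
            // split can return empty or blank pieces (e.g. for empty input); skip them.
            if (!piece.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }
}
```

    For example, SentenceCounter.countSentences("Hello there. How are you? Fine!") returns 3. Abbreviations such as "Dr. Smith" will still be counted as a boundary; java.text.BreakIterator.getSentenceInstance() may be worth a look if that matters.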
    "I'm not normally a praying man, but if you're up there, save me... Superman!" - Homer Simpson
    My Blog

  3. #3

    Thread Starter
    Frenzied Member
    Join Date
    Jul 2005
    Posts
    1,521

    Re: [RESOLVED] Split Paragraph into sentences

    Thanks, that worked. I knew it could be done with regex. For some reason I just can't grasp regex. It's usually pure luck if I can figure out the correct pattern to use, and that's always a simple pattern.

  4. #4
    PowerPoster
    Join Date
    Nov 2002
    Location
    Manila
    Posts
    7,629

    Re: [RESOLVED] Split Paragraph into sentences

    Just bear in mind that square brackets select per character: with [\\.\\!\\?], that one character position can hold a period, an exclamation mark or a question mark. They can't select between character groups. If you want either aa or zz, [(aa)(zz)]+ won't do it, because inside brackets the parentheses and letters are just individual characters. For that you need alternation, e.g. (aa|zz)+.
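    For the aa-or-zz case, here is a small sketch assuming java.util.regex. A character class treats every character inside the brackets individually, whereas alternation with | chooses between whole sequences, so a single pattern can cover the choice. The class name AlternationDemo and the helper matchesWhole are made up for illustration.

```java
import java.util.regex.Pattern;

final class AlternationDemo {
    // Character class: one or more of the single characters '(', 'a', ')', 'z'.
    static final Pattern CHAR_CLASS = Pattern.compile("[(aa)(zz)]+");

    // Alternation in a non-capturing group: one or more repetitions of "aa" or "zz".
    static final Pattern ALTERNATION = Pattern.compile("(?:aa|zz)+");

    // True only if the pattern matches the entire input string.
    static boolean matchesWhole(Pattern p, String s) {
        return p.matcher(s).matches();
    }
}
```

    Here matchesWhole(ALTERNATION, "aazzaa") is true and matchesWhole(ALTERNATION, "az") is false, while matchesWhole(CHAR_CLASS, "az") is true because each character is checked on its own.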

  5. #5
    Arabic Poster ComputerJy
    Join Date
    Nov 2005
    Location
    Happily misplaced
    Posts
    2,513

    Re: [RESOLVED] Split Paragraph into sentences

    Quote Originally Posted by leinad31
    Just bear in mind that square brackets select per character: with [\\.\\!\\?], that one character position can hold a period, an exclamation mark or a question mark. They can't select between character groups. If you want either aa or zz, [(aa)(zz)]+ won't do it, because inside brackets the parentheses and letters are just individual characters. For that you need alternation, e.g. (aa|zz)+.
    Are you a software engineer or something??
