Thread: How to check the similar words in two text files. Pls Help

  1. #1

    Thread Starter
    Member
    Join Date
    Jul 2003
    Posts
    53

    How to check the similar words in two text files. Pls Help

    I have two text files named File1.txt and File2.txt. Both files contain some English words. I need to find the words that occur in both files, using a hash table or any other method you know. Also, please find which of those common words occurs most often. Please see the example.

    example:

    File1.txt has the text:
    How are you. Hope you are fine, you sent me a book. I got it.

    File2.txt has the text:
    I am fine. What about you.

    Program output would be:
    Similar words in two text files:
    you
    fine
    I


    Maximum similar words occurring:
    you

    Please help me in this regard.

    Waiting for your response. Please don't disappoint me.

    Thank you.

  2. #2
    Frenzied Member moinkhan
    Join Date
    Jun 2000
    Location
    Karachi, Pakistan
    Posts
    2,011
    import java.io.*;
    import java.util.*;

    public class CompareTwo
    {
        public static void main(String[] args) throws IOException
        {
            if (args.length != 2)
            {
                System.err.println("Syntax: java CompareTwo Filename1 Filename2");
                return;
            }
            String file1 = args[0];
            String file2 = args[1];

            // Distinct words seen in each file.
            Set<String> hs1 = new HashSet<String>();
            Set<String> hs2 = new HashSet<String>();

            // Word -> number of occurrences in each file.
            Hashtable<String, Integer> ht1 = new Hashtable<String, Integer>();
            Hashtable<String, Integer> ht2 = new Hashtable<String, Integer>();

            countWords(readFile(file1), hs1, ht1);
            countWords(readFile(file2), hs2, ht2);

            // Keep only the words that also occur in the second file.
            hs1.retainAll(hs2);
            System.out.println("These are the common words in both files");
            for (String word : hs1)
                System.out.println(word);

            printCounts(file1, ht1);
            printCounts(file2, ht2);
        }

        // Reads the whole file into a String using the platform default charset.
        static String readFile(String fileName) throws IOException
        {
            RandomAccessFile raf = new RandomAccessFile(fileName, "r");
            try
            {
                byte[] data = new byte[(int) raf.length()];
                raf.readFully(data);   // read() alone may return before the buffer is full
                return new String(data);
            }
            finally
            {
                raf.close();
            }
        }

        // Splits the text into words and records each word and its running count.
        // Commas and other punctuation are delimiters so that "fine," matches "fine".
        static void countWords(String text, Set<String> words, Hashtable<String, Integer> counts)
        {
            StringTokenizer st = new StringTokenizer(text, " .,;:!?\n\t\r");
            while (st.hasMoreTokens())
            {
                String word = st.nextToken();
                words.add(word);
                Integer prevCount = counts.get(word);
                counts.put(word, prevCount == null ? 1 : prevCount + 1);
            }
        }

        // Prints every word of one file together with its number of occurrences.
        static void printCounts(String fileName, Hashtable<String, Integer> counts)
        {
            System.out.println("Repeating Word count in " + fileName);
            for (Map.Entry<String, Integer> entry : counts.entrySet())
                System.out.println(entry.getKey() + ":\t\t" + entry.getValue());
        }
    }



    Okay, okay... I know it's complex, but I couldn't keep it simpler than that.
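
The program above lists the common words and the per-file counts, but it does not pick out the common word that occurs most often, which the first post also asked for. Here is a minimal sketch of one way to do that. It assumes "most often" means the highest combined count across both files, which the thread does not actually pin down. The class name MostCommonWord and both helper methods are made up for illustration, and Map.merge needs Java 8 or later.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class MostCommonWord
{
    // Hypothetical helper: counts how often each word occurs in the text.
    // Whitespace and common punctuation are treated as separators.
    public static Map<String, Integer> countWords(String text)
    {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        StringTokenizer st = new StringTokenizer(text, " .,;:!?\n\t\r");
        while (st.hasMoreTokens())
        {
            counts.merge(st.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    // Returns the word present in both texts with the highest combined count
    // (assumed definition), or null when the texts share no word.
    // Ties are broken arbitrarily.
    public static String mostFrequentCommon(String text1, String text2)
    {
        Map<String, Integer> counts1 = countWords(text1);
        Map<String, Integer> counts2 = countWords(text2);
        String best = null;
        int bestTotal = 0;
        for (Map.Entry<String, Integer> entry : counts1.entrySet())
        {
            Integer other = counts2.get(entry.getKey());
            if (other == null)
                continue;               // word is not in the second text
            int total = entry.getValue() + other;
            if (total > bestTotal)
            {
                best = entry.getKey();
                bestTotal = total;
            }
        }
        return best;
    }

    public static void main(String[] args)
    {
        // The sample texts from the first post.
        String text1 = "How are you. Hope you are fine, you sent me a book. I got it.";
        String text2 = "I am fine. What about you.";
        System.out.println(mostFrequentCommon(text1, text2));   // prints "you"
    }
}
```

With the sample texts, "you" occurs three times in the first text and once in the second, so it beats "fine" and "I" (two occurrences each in total). The comparison is case-sensitive, so "You" and "you" would count as different words unless you lowercase the tokens first.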
