-
Aug 7th, 2004, 08:58 AM
#1
Thread Starter
Member
How to check the similar words in two text files. Pls Help
I have two text files named File1.txt and File2.txt, each containing some English words. I need to find the words that occur in both files, using a hash table or any other method you know. I also need to find which of those common words occurs most often across the two files. Please see the example below.
Example:
File1.txt has the text:
How are you. Hope you are fine, you sent me a book. I got it.
File2.txt has the text:
I am fine. What about you.
Program output would be:
Similar words in two text files:
you
fine
I
Most frequently occurring common word:
you
Please help me in this regard.
Waiting for your response. Please don't disappoint me.
Thank you.
-
Aug 17th, 2004, 07:12 AM
#2
Frenzied Member
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.Hashtable;
import java.util.Set;
import java.util.StringTokenizer;

public class CompareTwo
{
    // Word separators: whitespace plus common punctuation, so "fine," and "fine" match.
    private static final String DELIMITERS = " .,;:!?\n\t\r";

    public static void main(String[] s) throws IOException
    {
        if (s.length != 2)
            throw new IllegalArgumentException("Syntax: java CompareTwo Filename1 Filename2");
        String file1 = s[0];
        String file2 = s[1];
        // Word -> number of occurrences, one table per file.
        Hashtable<String, Integer> ht1 = countWords(readFile(file1));
        Hashtable<String, Integer> ht2 = countWords(readFile(file2));

        // The common words are the intersection of the two key sets.
        Set<String> common = new HashSet<String>(ht1.keySet());
        common.retainAll(ht2.keySet());
        System.out.println("These are the common words in both files");
        for (String word : common)
        {
            System.out.println(word);
        }
        printCounts(file1, ht1);
        printCounts(file2, ht2);
    }

    // Reads the whole file into a String using the platform default charset.
    private static String readFile(String name) throws IOException
    {
        RandomAccessFile raf = new RandomAccessFile(name, "r");
        try
        {
            byte[] data = new byte[(int) raf.length()];
            raf.readFully(data); // read() alone may return before the buffer is full
            return new String(data);
        }
        finally
        {
            raf.close();
        }
    }

    // Splits the text into words and counts how often each one occurs.
    private static Hashtable<String, Integer> countWords(String text)
    {
        Hashtable<String, Integer> counts = new Hashtable<String, Integer>();
        StringTokenizer st = new StringTokenizer(text, DELIMITERS);
        while (st.hasMoreTokens())
        {
            String word = st.nextToken();
            Integer prevCount = counts.get(word);
            counts.put(word, prevCount == null ? 1 : prevCount + 1);
        }
        return counts;
    }

    // Prints every word of one file together with its count.
    private static void printCounts(String file, Hashtable<String, Integer> counts)
    {
        System.out.println("Word counts in " + file);
        Enumeration<String> e = counts.keys();
        while (e.hasMoreElements())
        {
            String key = e.nextElement();
            System.out.println(key + ":\t\t" + counts.get(key));
        }
    }
}
Okay, okay... I know it's complex, but I couldn't keep it simpler than that.
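The question also asks for the common word that occurs most often, which the program above only shows indirectly through the per-file counts. Here is a rough sketch of one way to get it, assuming "most often" means the highest combined count across both files (the question doesn't say exactly). The class name CommonWords and its three helper methods are made up for this illustration, and the text is passed in as strings rather than read from files to keep it short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.StringTokenizer;
import java.util.TreeSet;

public class CommonWords {
    // Counts how often each word occurs; splits on whitespace and basic punctuation.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        StringTokenizer st = new StringTokenizer(text, " .,;:!?\n\t\r");
        while (st.hasMoreTokens()) {
            counts.merge(st.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    // Words present in both maps, sorted so the output order is stable.
    static Set<String> commonWords(Map<String, Integer> a, Map<String, Integer> b) {
        Set<String> common = new TreeSet<>(a.keySet());
        common.retainAll(b.keySet());
        return common;
    }

    // Common word with the highest combined count (assumed meaning of "most often").
    // Returns null if the files share no words; ties go to the first word in sorted order.
    static String mostFrequentCommon(Map<String, Integer> a, Map<String, Integer> b) {
        String best = null;
        int bestTotal = 0;
        for (String word : commonWords(a, b)) {
            int total = a.get(word) + b.get(word);
            if (total > bestTotal) {
                best = word;
                bestTotal = total;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Sample text taken from the question.
        Map<String, Integer> f1 = countWords("How are you. Hope you are fine, you sent me a book. I got it.");
        Map<String, Integer> f2 = countWords("I am fine. What about you.");
        System.out.println("Common words: " + commonWords(f1, f2));
        System.out.println("Most frequent common word: " + mostFrequentCommon(f1, f2));
    }
}
```

With the sample text from the question this reports I, fine and you as the common words and you as the most frequent one. The comparison is case-sensitive, so "How" and "how" would count as different words; lower-casing each token before counting would change that.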