|
-
Dec 2nd, 2002, 01:50 PM
#1
Thread Starter
Registered User
Suggestions for speeding up my program? (RESOLVED)
I have a program that is running extremely slow. I need some suggestions on how to speed it up. While it will probably run somewhat slow because of the number of records, I fear that I may not be using the fastest possible logic/code. The program basically consists of 2 tables in an Access to Informix database link and two pipe delimited text files. The text files have roughly 23000 lines in them a piece and the tables have roughly 30000 records in them a piece. Here is what the program does:
1. Reads the first record of table 1 in.
2. Tries to find a matching record via primary key in table 2.
3. Tries to locate the records primary key in file 1.
4. If not found in file 1, tries to find it in file 2.
5. If it isn't in either file, deletes the record from the database.
6. Moves onto the next record in table 1.
I use a few for/next loops and I try to break out of any loop whenever I can. It takes roughly 3 seconds per record, and I currently only look in file 1! At 30000 records, that's 25 hours to complete the process. Doubling that by adding in the search for file 2. That's 50 hours. That's a long time!
I think that the slow part is finding the records primary key in the files. I am reading it in line by line. I am hoping there is some quicker way to search for a string in a text file and just return whether or not it is there.
Thanks for any help. If requested, I will post the code if it helps. Thanks again!
shootsnlad
Last edited by shootsnlad; Dec 10th, 2002 at 11:12 AM.
-
Dec 2nd, 2002, 02:00 PM
#2
What does your code look like right now?
Are you reading the files in once and then searching through them? What are you reading them into? What is the relation between the keys?
You could read the data into a dataset and then use Select methods to filter out certain matching rows.
-
Dec 2nd, 2002, 02:30 PM
#3
Thread Starter
Registered User
the code
Here is the code:
Code:
Dim foundcity, foundcounty As Boolean
Dim textline,newparcel As String
Dim rowcount, rowcount2, i, z As Integer
Dim dsparcel As New DataSet()
Dim dsparcelowner As New DataSet()
DaParcel.Fill(dsparcel)
DaParcelOwner.Fill(dsparcelowner)
Dim dtparcel As DataTable = dsparcel.Tables(0)
Dim dtparcelowner As DataTable = dsparcelowner.Tables(0)
rowcount = dtparcelowner.Rows.Count
rowcount2 = dtparcel.Rows.Count
For i = 0 To rowcount - 1
For z = 0 To rowcount2 - 1
If dtparcelowner.Rows(i).Item("parcel_no") = dtparcel.Rows(z).Item("parcel-number") Then
FileOpen(1, "cn-ci.txt", OpenMode.Binary, OpenAccess.Read, OpenShare.Shared)
FileOpen(2, "cn-co.txt", OpenMode.Binary, OpenAccess.Read, OpenShare.Shared)
While Not EOF(1)
textline = LineInput(1)
Dim fieldarray() As String = textline.Split("|")
newparcel = Mid(fieldarray(9), 1, 2) + "." + Mid(fieldarray(9), 3, 2) + "." + Mid(fieldarray(9), 5, 3) + "." + Mid(fieldarray(9), 8, 3)
If dtparcelowner.Rows(i).Item("parcel_no") = newparcel Then
foundcity = True
Exit While
End If
End While
While Not EOF(2)
textline = LineInput(2)
Dim fieldarray() As String = textline.Split("|")
newparcel = Mid(fieldarray(9), 1, 2) + "." + Mid(fieldarray(9), 3, 2) + "." + Mid(fieldarray(9), 5, 3) + "." + Mid(fieldarray(9), 8, 3)
If dtparcelowner.Rows(i).Item("parcel_no") = fieldarray(9) Then
foundcounty = True
End If
End While
If foundcity = True Or foundcounty = True Then
Else
dtparcelowner.Rows(i).Delete()
dtparcelowner.Rows(z).Delete()
foundcity = False
foundcounty = False
End If
FileClose(1)
FileClose(2)
z = rowcount2 - 1
End If
Next z
Next i
-
Dec 2nd, 2002, 03:37 PM
#4
Opening and reading the text files is probably what is so slow. Try reading these:
http://www.c-sharpcorner.com/databas...ctODBCText.asp
http://msdn.microsoft.com/library/de...DataSource.asp
http://www.aspalliance.com/ericm/articles/textdb.asp
http://msdn.microsoft.com/library/de...a_ini_file.asp
By converting the text to a datasource you can run normal queries on it to find a match and that should speed things up dramatically.
-
Dec 2nd, 2002, 04:15 PM
#5
Well I had read but not tested those links so I decided to test them and I couldn't get it to work. But I figured it out, after I made the DSN I downloaded the .NET ODBC provider and then everything worked peachy. MS has the ODBC provider here. After that the ODBC objects are the same as the SQL and OLEDB counterparts and you can fill a new table in your dataset from the textfiles.
-
Dec 2nd, 2002, 04:39 PM
#6
Member
The reason it's so slow is definitely because of the way you're doing file handling. You need to use Steams.
Here's a tutorial that should help clear things up;
http://www.elitevb.com/content/01,0072,01/01.aspx
-
Dec 3rd, 2002, 12:11 PM
#7
Thread Starter
Registered User
follow-up
It looks like I have two options, connecting through ADO.NET to the text file or streams. I have a couple of follow-up questions/problems.
1. If I want to use ADO.NET to connect to a textfile, I would have to create a schema.ini file. That would be a lot of work. This file has roughly 100 columns in it that vary in size from line to line, and you need to tell ADO.NET exactly what the file looks like. Probably not possible.
2. If I add a DSN to my machine, I would have to add it to any other machine that this may run on. (Not a big deal, just wanted to verify).
3. The other option, streams, is something I don't think I still fully understand. I've read through the tutorial and I'm still confused. I believe I can setup the streamreader for what I need, but how do I use it. Do I use it to continously read in the whole file into a variable? How does it work?
Thanks for the help! I'm getting closer.
shootsnlad
-
Dec 3rd, 2002, 12:20 PM
#8
Regarding question 1, when you create the DSN it creates the schema.ini for you. Also there is a 'Guess' button when making the DSN that fills in the column info for you. If there is a decent structure to the file it should work fine after you tell it that you are using the custom delimiter | .
Question 2, I believe you can make the DSN a file DSN and then just include it with your app (and the schema.ini file which goes in the same folder as the text file(s)).
Question 3 if you play around a bit with streams you should catch on quickly they are fairly use to pick up. Although if you choose the stream method then you'll still be reading through the file on every line of the first table, whereas if you use it as a datasource you'd read it once and then just filter the data to match.
-
Dec 3rd, 2002, 01:53 PM
#9
Thread Starter
Registered User
Well I definately believe I am making progress and I understand what I need to do. The "Guess" button doesn't work for me, for some reason. I put in that it is custom delimited with a | for a delimeter. I check the "Column name header" box and I hit the "Guess" button and get three columns. The first column is the first row of my file (with all the column names in it, separated by a | symbil) and the second and third column is F1 and F2. Without the guess working, I'm in trouble. Any ideas? I've actually tried numerous things with the file, as well as making it just 2 rows (the column names and 1 row of data).
Thanks again.
shootsnlad
-
Dec 3rd, 2002, 02:13 PM
#10
Is the first row of the file the field names? Or do you have an example data text file you can post that I can take a crack at?
It worked for me but I didn't have fieldnames as the first row. Here is what my data looked like:
Ed|Marquez|15420 Bello|1|54
Cookie|Marquez|15420 Bello|2|31
John|Doe|4100 Latham|3|27
Jane|Doe|4100 Latham|4|84
And it filled in the right data types on the right columns and named them F1,F2,F3... which I renamed. I think the data would need to be:
FirstName|LastName|Address|ID|No
Ed|Marquez|15420 Bello|1|54
Cookie|Marquez|15420 Bello|2|31
John|Doe|4100 Latham|3|27
Jane|Doe|4100 Latham|4|84
For the Column name header option to work.
-
Dec 3rd, 2002, 02:34 PM
#11
Thread Starter
Registered User
I did a copy and paste of your test data (the one without the column names) into a text file. I then tried to create it using that test file. The same thing is happening that happened with my file. I have attached a JPG of what my screen looks like after I hit the "Guess" button. Any ideas?
Thanks again for the prompt replies. I appreciate it.
-
Dec 3rd, 2002, 03:13 PM
#12
I dunno here is a screen of what I get:
What version of windows are you using? Can you send either the text file or a sample of it so I can try it at my end?
-
Dec 3rd, 2002, 03:42 PM
#13
Thread Starter
Registered User
solved!
I figured it out. The reason why it was doing that was because the file was located on a network drive rather than my local drive. It works on my local drive, but messes up on the network. I will continue on. Hopefully, by following the links you posted I should be able to wrap it up. I'll be back if I have some more trouble. Thanks again!
shootsnlad
-
Dec 4th, 2002, 03:06 PM
#14
Thread Starter
Registered User
no luck
Well, while the idea of ODBCing to a file seemed like a good idea, I have had no luck with it. I created a file DSN to my pipe delimited file. I was able to get the schema.ini created fine. When I am in VB.NET and I go to make a new connection to it, I chose the Microsoft OLE DB Provider for ODBC Drivers, and I then try to build a string. I then select my DSN file that I created I get a message saying (Unable to connect to database. It is only possible to connect to SQL Server Desktop Engine databases and Microsoft Access databases with this version of Visual Studio)
Any ideas?
Thanks...
shootsnlad
-
Dec 4th, 2002, 04:30 PM
#15
You can't use the OLEDB data provider you have to download the .NEt ODBC Data Provider (which works the same way as OLEDB). I had the same problem at first, I guess in the Beta you could use the OLEDB drivers to connect to a DSN but now you need the ODBC ones.
http://msdn.microsoft.com/downloads/...mpositedoc.xml
-
Dec 5th, 2002, 03:29 PM
#16
Thread Starter
Registered User
I have the ODBC .NET data provider installed. I also added the ODBCConnection and such icons onto my toolbar. Now, what do I need to do? Usually I can generate a connection string to be able to connect to the file, but I can't.
shootsnlad
Last edited by shootsnlad; Dec 5th, 2002 at 03:38 PM.
-
Dec 5th, 2002, 03:38 PM
#17
Its not under the system.data namespace its under the microsoft.data namespace, microsoft.data.odbc to be exact. That threw me off too.
-
Dec 5th, 2002, 03:41 PM
#18
Thread Starter
Registered User
See above edited post.. (your too quick) :-)
-
Dec 5th, 2002, 04:05 PM
#19
The connection string is just:
"DSN=NameOfDSNHere"
One other thing that is wierd instead of a table name you use the filename, including extension.
"SELECT * FROM pipes.txt"
-
Dec 5th, 2002, 04:08 PM
#20
This is the code from my test:
VB Code:
Dim cnn As New OdbcConnection("DSN=PipeTest")
Dim adp As New OdbcDataAdapter("SELECT * FROM pipes.txt", cnn)
Dim ds As New DataSet()
cnn.Open()
adp.Fill(ds, "Test")
cnn.Close()
TextBox1.Text = ds.Tables("Test").Rows(0).Item("FirstName")
Dim dr() As DataRow = ds.Tables("Test").Select(TextBox2.Text)
DataGrid1.DataSource = dr
I used TextBox1 just to make sure that everything loaded fine.
TextBox2 is a Filter string that I enter at runtime the results of which then are shown in the datagrid. An example would be: "FirstName Like e*"
-
Dec 6th, 2002, 11:28 AM
#21
Thread Starter
Registered User
1 question. Where do you get "test" from? That is referencing a table name. But in a file, there really is no table name. Mine is erroring out on that statement, with a "system error". Thanks!
shootsnlad
PS: The adp.fill line is the one I'm referrring to.
-
Dec 6th, 2002, 11:32 AM
#22
Actually it is more like a key for the table in the dataset's table collection. It doesn't have to be an actual table name.
What is the error you are getting? Or what is more of the StackTrace?
-
Dec 6th, 2002, 11:41 AM
#23
Thread Starter
Registered User
Here is the error on I am getting on that line.
An unhandled exception of type 'Microsoft.Data.Odbc.OdbcException' occurred in microsoft.data.odbc.dll
Additional information: System error.
-
Dec 6th, 2002, 06:31 PM
#24
I'm not sure can you post the code? Worked fine for me, I can post the sample project if you'd like, just let me know.
-
Dec 9th, 2002, 10:42 AM
#25
Thread Starter
Registered User
Here is the code:
Code:
Dim citycon As New OdbcConnection("DSN=City")
Dim cityadp As New OdbcDataAdapter("SELECT * FROM CN-CI.txt", citycon)
Dim cityds As New DataSet()
citycon.Open()
cityadp.Fill(cityds)
citycon.Close()
Dim foundcity, foundcounty As Boolean
Dim rowcount, rowcount2, i, z As Integer
Dim dsparcel As New DataSet()
Dim dsparcelowner As New DataSet()
DaParcel.Fill(dsparcel)
DaParcelOwner.Fill(dsparcelowner)
Dim dtparcel As DataTable = dsparcel.Tables(0)
Dim dtparcelowner As DataTable = dsparcelowner.Tables(0)
rowcount = dtparcelowner.Rows.Count
rowcount2 = dtparcel.Rows.Count
For i = 0 To rowcount - 1
For z = 0 To rowcount2 - 1
If dtparcelowner.Rows(i).Item("parcel_no") = dtparcel.Rows(z).Item("parcel-number") Then
Dim tablecmd As String
Dim cmdresults As Integer
Dim cmd As New Microsoft.Data.Odbc.OdbcCommand()
tablecmd = "SELECT * FROM cn-ci.txt WHERE PARCEL = " + dtparcelowner.Rows(i).Item("parcel_no")
cmd.CommandText = tablecmd
citycon.Open()
cmdresults = cmd.ExecuteNonQuery()
citycon.Close()
MessageBox.Show("After creating the table, results = " & cmdresults.ToString)
If foundcity = True Or foundcounty = True Then
Else
dtparcelowner.Rows(i).Delete()
dtparcelowner.Rows(z).Delete()
foundcity = False
foundcounty = False
End If
z = rowcount2 - 1
End If
Next z
Next i
There is some logic I still need to put in there, but the code is there. It errors out on the cityadp.fill line (#5). I have attached a bit of my file to see if you can get it working. Thanks again!
-
Dec 9th, 2002, 11:24 AM
#26
Ok I found the trouble, its the - in the filename which becomes the table name. For some reason it doesn't like it, but if you take it out then all is well. So just change the name of the file to: CNCI.txt and make that change in either the DSN or the schema file (which is what changing it in the DSN will do) and then everything works fine. Of course I don't have the parcel data so I couldn't test your code but I got the same error until I made those changes and after that I could query the text file.
-
Dec 9th, 2002, 11:53 AM
#27
Also on the other use of it I think this may be closer to what you are trying to do:
VB Code:
Dim cmd As New OdbcCommand("SELECT Count(*) FROM cnci.txt WHERE PARCEL = " & TextBox2.Text, cnn)
Dim result As Integer = cmd.ExecuteScalar
MsgBox("Record(s) found " & result)
cnn.Close()
Although since you already have all the data in a dataset its about 4 times faster to filter the dataset instead of running a new query.
VB Code:
Dim result() as Datarow=cityds.Tables(0).Select("PARCEL = " & TextBox2.Text, cnn)
MsgBox("Record(s) found " & result.Length)
Running a new query took .4 of a second but running a filter on the dataset took .14 of a second. Which I also left the connection open from filling the dataset and normally you'd have to reopen it which will take longer, but that is not needed with the filter.
-
Dec 9th, 2002, 04:37 PM
#28
Thread Starter
Registered User
Well, everything appears to be running as it should (and A LOT faster I might add). Now it's actually runable. I think that should take care of everything. I'll let you know if I have anymore problems, but everything looks good so far. Thanks again for all your help! I appreciate it.
shootsnlad
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|