-
Oct 22nd, 2005, 01:33 PM
#1
Thread Starter
Hyperactive Member
Extensive Operating System and Database Corruption
Greetings,
I have been having a problem that has been plaguing me off and on for over a year now. I am seeing extensive operating system corruption, and recently, database corruption.
I have been in contact with Microsoft and other vendors and have been unable to resolve the problem.
The symptoms are as follows:
*) File system becomes corrupted. All types of files, including system DLLs, are deleted, or their contents are replaced with ASCII characters like !@#$. New (empty) directories are being created with the same ASCII characters. This corruption will extend to files on an FTP site, if that FTP site is open in Internet Explorer, and to mounted drives if they are mounted to the client that becomes corrupted. Once the corruption occurs, the system will not boot again if it goes down. The system will stay up, but gets increasingly unstable as the corruption becomes more extensive.
*) Database records get corrupted (on a separate server running SQL Server 2000 Enterprise Edition). This one is very strange. The database itself is fine. The records themselves are fine, but a single field called WorksetId is getting corrupted. The value of this field is being changed to the name of a running application, some ASCII characters, and a running number.
Example: I run an application called "ftpSweeper". 2 minutes later, 200,000 records in the database have had their WorksetId set to !#errftpSweeper_1 through !#errftpSweeper_200000. This particular application (ftpSweeper) has NO database access in it. It doesn't have any ADO or RDO libraries linked, and it makes absolutely NO calls to a database whatsoever.
There are other applications (built by me) that do have database access. However, none of them are named ftpSweeper or are even aware of ftpSweeper's existence. There is no tie-in between that application and the other applications that do database access. The value being inserted is too much of a coincidence. But I cannot, for the life of me, figure out how every single record in a table is being updated with that value.
The frustrating thing about the situation is that none of these problems occur during development and testing. They only occur when rolled out to the production environment.
This has all the hallmarks of a virus. However, no virus has been detected. I have scanned my development machine with Symantec, AVG, ActiveScan from Pandasoft, and Sophos. The deployed environment is under the protection of Symantec corporate anti-virus.
There is obviously something very seriously wrong somewhere. I have had a number of engineers go through my code, both inside my company, and outside my company. My code has even been reviewed by Microsoft. Nobody can find anything wrong with it. I feel it's safe to say that there is nothing wrong with the actual code itself.
Which leads me to believe there is a problem with the environment or libraries somewhere.
What could possibly cause a situation like this? Bad libraries? Too many ADO connections? Memory leaks?
What can I do to further diagnose this problem?
I am using VB6 for the majority of the development. VB.NET is used in probably 3% of the applications, and a third party tool is being used for access to a 3270 terminal.
-
Oct 22nd, 2005, 02:11 PM
#2
Re: Extensive Operating System and Database Corruption
Wow - that stinks - I cannot imagine what would cause that.
The SQL server can be set in more of a trace mode - not sure of the terminology - but you can set it to log much more of the access to it from the network - maybe that will clue you in on what's going on. It can be talked to from several different avenues - TCP/IP, named pipes - maybe turn off as many of these as you can but still allow access to your apps...
Is the FTP site open to the outside world? Do you have a hardware firewall and all ports tightly closed down?
Sounds almost haunted to me...
-
Oct 22nd, 2005, 02:15 PM
#3
Re: Extensive Operating System and Database Corruption
Seems to be virus activity...
Pradeep
-
Oct 22nd, 2005, 02:19 PM
#4
Re: Extensive Operating System and Database Corruption
umilmi81,
You have checked for viruses. Have you checked for trojans?
-
Oct 22nd, 2005, 02:21 PM
#5
Thread Starter
Hyperactive Member
Re: Extensive Operating System and Database Corruption
My latest theory is that there is some problem with libraries.
I currently link all my DLLs through early binding. I'm going to change the code to do late binding. I'm seeing that some DLLs, like scrrun.dll and the ADO DLLs, have different create dates.
The DLLs on the production system are dated 2001, but my DLLs are dated 2004.
I'm flying out on Monday so I can actually be in the production shop and do more hands-on analysis. But my understanding was that if the libraries were incompatible, the application wouldn't finish loading, so I'm not hopeful that I'm working on the correct avenue.
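Since create dates alone can mislead (a copy or an installer can change them), one way to confirm whether two machines really carry different library builds is to compare file contents. Below is a minimal sketch in Python, not the tooling used here: the function names `file_digest` and `compare_libraries` are made up for illustration, `msado15.dll` is only an assumed example of an ADO library name, and the throwaway files stand in for the real DLLs.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path):
    """SHA-256 of the file's bytes; identical content gives an identical digest."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def compare_libraries(dev_dir, prod_dir, names):
    """Return {name: status}, where status is 'same', 'different', or 'missing'."""
    report = {}
    for name in names:
        dev_file = Path(dev_dir) / name
        prod_file = Path(prod_dir) / name
        if not dev_file.is_file() or not prod_file.is_file():
            report[name] = "missing"      # absent on at least one side
        elif file_digest(dev_file) == file_digest(prod_file):
            report[name] = "same"
        else:
            report[name] = "different"    # same name, different build
    return report


# Demo with placeholder files; real use would point at copies of the two
# machines' system directories instead of temporary folders.
with tempfile.TemporaryDirectory() as dev, tempfile.TemporaryDirectory() as prod:
    (Path(dev) / "scrrun.dll").write_bytes(b"build from 2004")
    (Path(prod) / "scrrun.dll").write_bytes(b"build from 2001")
    (Path(dev) / "msado15.dll").write_bytes(b"same bytes")
    (Path(prod) / "msado15.dll").write_bytes(b"same bytes")
    (Path(dev) / "MSCOMCTL.OCX").write_bytes(b"only on dev")
    report = compare_libraries(
        dev, prod, ["scrrun.dll", "msado15.dll", "MSCOMCTL.OCX"]
    )

print(report)
```

A "different" result only shows that the builds differ, not that the difference is the cause of the corruption.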
-
Oct 22nd, 2005, 02:23 PM
#6
Thread Starter
Hyperactive Member
Re: Extensive Operating System and Database Corruption
 Originally Posted by randem
umilmi81,
You have checked for viruses. Have you checked for trojans?
I'd assume the virus scan packages would detect trojans, wouldn't they?
-
Oct 22nd, 2005, 02:23 PM
#7
Re: Extensive Operating System and Database Corruption
 Originally Posted by umilmi81
*) Database records get corrupted (on a separate server running SQL Server 2000 Enterprise Edition). This one is very strange. The database itself is fine. The records themselves are fine, but a single field called WorksetId is getting corrupted. The value of this field is being changed to the name of a running application, some ASCII characters, and a running number.
Example: I run an application called "ftpSweeper". 2 minutes later, 200,000 records in the database have had their WorksetId set to !#errftpSweeper_1 through !#errftpSweeper_200000. This particular application (ftpSweeper) has NO database access in it. It doesn't have any ADO or RDO libraries linked, and it makes absolutely NO calls to a database whatsoever.
Find out how this column is being changed - put on whatever trace capabilities that the SQL Server has - and pinpoint this update to data.
I cannot imagine a worm/trojan so specific that it's affecting 200,000 rows in a table - disk/data corruption could not do that - it would have to be through a SQL connection with a query executed.
This one really sounds like a bug in your software - so pinpointing this one will be a major step forward.
-
Oct 22nd, 2005, 02:28 PM
#8
Re: Extensive Operating System and Database Corruption
 Originally Posted by szlamany
Find out how this column is being changed - put on whatever trace capabilities that the SQL Server has - and pinpoint this update to data.
I cannot imagine a worm/trojan so specific that it's affecting 200,000 rows in a table - disk/data corruption could not do that - it would have to be through a SQL connection with a query executed.
This one really sounds like a bug in your software - so pinpointing this one will be a major step forward.
I don't think so. As he already said, there's no link between this application and the database whatsoever.
The problem seems to be something else that maybe we are not able to understand.
Pradeep
-
Oct 22nd, 2005, 02:28 PM
#9
Thread Starter
Hyperactive Member
Re: Extensive Operating System and Database Corruption
 Originally Posted by szlamany
Find out how this column is being changed - put on whatever trace capabilities that the SQL Server has - and pinpoint this update to data.
I cannot imagine a worm/trojan so specific that it's affecting 200,000 rows in a table - disk/data corruption could not do that - it would have to be through a SQL connection with a query executed.
This one really sounds like a bug in your software - so pinpointing this one will be a major step forward.
Yes, I agree. It does look like a bug at the software level, but many eyes have been on that code, and nobody can find it. It's not that complicated to begin with.
What I'm afraid of is that the database access methods I'm using work fine on a small scale, but get unstable on much larger scales.
For example, I have a DLL that all applications use. That DLL does a bunch of database stuff. I have no problems when running the applications in my development environment. But when there are 10 or more applications on the same machine that all use the same DLL, and that are all making 4 or 5 ADO objects each, will that cause a problem?
Has anyone had any strange behavior with a LOT of ADO Connection objects being created in each application, and then several copies of that application being run on the same machine?
-
Oct 22nd, 2005, 02:29 PM
#10
Re: Extensive Operating System and Database Corruption
When you launch your application, maybe some process hooks to your app and does this.. Just a guess..
Pradeep
-
Oct 22nd, 2005, 02:31 PM
#11
Re: Extensive Operating System and Database Corruption
umilmi81,
No, antivirus programs do not check for trojans very well. Trojans act totally differently. Think of a trojan as an old DOS TSR program. It just sits and waits for something to happen. It could intercept and redirect calls etc...
You could also be in DLL hell. How are you deploying this app and files?
-
Oct 22nd, 2005, 02:31 PM
#12
Re: Extensive Operating System and Database Corruption
 Originally Posted by Pradeep1210
I don't think so. As he already said, there's no link between this application and the database whatsoever.
The problem seems to be something else that maybe we are not able to understand.
Pradeep
SQL Server, being a write-ahead logging system, is not going to get 200,000 rows updated with sequentially assigned values without a query being executed - unless someone has written a very specific SQL trojan.
The data isn't sitting in a notepad-like text file for assault.
-
Oct 22nd, 2005, 02:32 PM
#13
Re: Extensive Operating System and Database Corruption
Put a trigger on the update of this table and write to a "log" table the system_user value of who is affecting the data.
-
Oct 22nd, 2005, 02:37 PM
#14
Thread Starter
Hyperactive Member
Re: Extensive Operating System and Database Corruption
 Originally Posted by randem
umilmi81,
No, antivirus programs do not check for trojans very well. Trojans act totally differently. Think of a trojan as an old DOS TSR program. It just sits and waits for something to happen. It could intercept and redirect calls etc...
You could also be in DLL hell. How are you deploying this app and files?
Each of your points is something I'm afraid of. The decision was made against my will to deploy Windows 2000 SP4 instead of Windows XP SP2. Our IT guys are a little on the slow side, and feel that Win2K is "more secure" than XP.
Also, the image they use to deploy is awful. It has a bunch of anti-spyware tools like Ad-Aware and Spybot. It also has the Google Toolbar and a bunch of crap I wish they wouldn't put on.
I usually deploy by just copying the EXE, and then manually copying and registering any DLLs that are missing from the system. Usually MSCOMCTL.OCX and MSCOMCT2.OCX.
However, my action items on Monday are to:
A) Rebuild all applications to use late binding (CreateObject instead of references in the IDE)
B) Create Package installers for all apps
C) Use the "depends.exe" tool from the System Tools on the Windows disk to check the signatures of all DLLs for all applications.
-
Oct 22nd, 2005, 02:39 PM
#15
Thread Starter
Hyperactive Member
Re: Extensive Operating System and Database Corruption
 Originally Posted by szlamany
Put a trigger on the update of this table and write to a "log" table the system_user value of who is affecting the data.
We have a log of when the corruption occurs. However, my DBA is telling me he can't view the log without the purchase of a $300 software package. I've gotten the approval for that package, but the guy isn't too worried about fixing MY problem.
I need to light a fire under him to get him to understand how important this is.
-
Oct 22nd, 2005, 02:43 PM
#16
Re: Extensive Operating System and Database Corruption
 Originally Posted by umilmi81
We have a log of when the corruption occurs. However, my DBA is telling me he can't view the log without the purchase of a $300 software package. I've gotten the approval for that package, but the guy isn't too worried about fixing MY problem.
I need to light a fire under him to get him to understand how important this is.
It's free for you to put a trigger on the update of that table and free for you to create a table called TRIGGERLOG - where you insert a row every time the update to the "suspect" table occurs - you insert SYSTEM_USER (along with whatever else helps you figure out where the assault is coming from).
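To make the idea concrete, here is a rough sketch of such a logging trigger. It is written in Python against an in-memory SQLite database purely so it can be run anywhere; it is not SQL Server code. Assumptions to note: SQLite has no SYSTEM_USER, so a user-defined `system_user()` function is registered as a stand-in; the table layout and the names `make_audited_db`, `RecordId`, `WhoDidIt`, `OldValue` and `NewValue` are invented for the example; and SQLite fires the trigger once per row, whereas a SQL Server trigger fires once per statement and would read the changed rows from its `inserted` and `deleted` tables instead.

```python
import sqlite3


def make_audited_db(login_name):
    """Build an in-memory database whose Workset table logs WorksetId updates."""
    conn = sqlite3.connect(":memory:")
    # Stand-in for SQL Server's SYSTEM_USER, which SQLite does not provide.
    conn.create_function("system_user", 0, lambda: login_name)
    conn.executescript("""
        CREATE TABLE Workset (Id INTEGER PRIMARY KEY, WorksetId TEXT);

        CREATE TABLE TRIGGERLOG (
            LoggedAt TEXT DEFAULT CURRENT_TIMESTAMP,
            WhoDidIt TEXT,
            RecordId INTEGER,
            OldValue TEXT,
            NewValue TEXT
        );

        -- Record who changed WorksetId, plus the value before and after.
        CREATE TRIGGER trg_workset_update
        AFTER UPDATE OF WorksetId ON Workset
        BEGIN
            INSERT INTO TRIGGERLOG (WhoDidIt, RecordId, OldValue, NewValue)
            VALUES (system_user(), OLD.Id, OLD.WorksetId, NEW.WorksetId);
        END;
    """)
    return conn


conn = make_audited_db("some_app_login")
conn.executemany(
    "INSERT INTO Workset (Id, WorksetId) VALUES (?, ?)",
    [(1, "A100"), (2, "A101")],
)
# Simulate the kind of mass update described in this thread.
conn.execute("UPDATE Workset SET WorksetId = '!#err' || Id")

log_rows = conn.execute(
    "SELECT WhoDidIt, RecordId, OldValue, NewValue "
    "FROM TRIGGERLOG ORDER BY RecordId"
).fetchall()
for row in log_rows:
    print(row)
```

On SQL Server the log row could also carry values such as the client host and application name, which is what would point at the process doing the update.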
-
Oct 22nd, 2005, 02:43 PM
#17
Re: Extensive Operating System and Database Corruption
umilmi81,
Your deployment is flawed. You should have more things to deploy than that.
If you want to check for dependencies, try running ********** on your VBP project. You will get a list of all dependencies for that project or project group in the debug log. What package are you using to create the install package?
There is no reason to do A.
-
Oct 22nd, 2005, 02:57 PM
#18
Thread Starter
Hyperactive Member
Re: Extensive Operating System and Database Corruption
 Originally Posted by randem
umilmi81,
Your deployment is flawed. You should have more things to deploy than that.
If you want to check for dependencies, try running ********** on your VBP project. You will get a list of all dependencies for that project or project group in the debug log. What package are you using to create the install package?
There is no reason to do A.
I'm afraid I don't have a lot of experience deploying applications. I usually write billions of little programs that run on one or two computers.
I've always had the luxury of simply copying the EXE and manually registering any DLL's.
Any advice you have on deployment would be greatly appreciated.
What is **********?
I just use the Package and Deployment Wizard that comes with Visual Studio 6.0 when I do make a setup file.
-
Oct 22nd, 2005, 02:59 PM
#19
Re: Extensive Operating System and Database Corruption
This is not a "make a sale" moment for ********** - you need to understand what's going on with your application.
P&D wizard from VB tools is fine...
-
Oct 22nd, 2005, 03:05 PM
#20
Thread Starter
Hyperactive Member
Re: Extensive Operating System and Database Corruption
 Originally Posted by szlamany
you need to understand what's going on with your application.
I agree. My problem is, I can't simulate the error anywhere but in the production environment.
My diagnosis skills are lacking anywhere except in the IDE. So I don't know how to tell what's going on.
Even after we read these SQL logs and I find the application that's doing the update, the question is WHY it did it. There certainly is NO code that says update all records with a crazy sequential number.
-
Oct 22nd, 2005, 03:06 PM
#21
Re: Extensive Operating System and Database Corruption
szlamany,
Get off it. Work on his problem not yours.
umilmi81,
When you are deploying apps you need to find out all the things that need to be deployed and if they actually should be deployed. ********** will search your VBP, EXE, DLL and OCX files for any DLL/OCX/EXE that is referenced and needs to be on the target computer in order for correct operation. It's just a tool.
Ultimately, you need to research to find out if the DLL/EXE/OCX should be deployed. Please read Installation Problems in my signature.
-
Oct 22nd, 2005, 03:09 PM
#22
Re: Extensive Operating System and Database Corruption
Diagnosis of all problems is pinpointing the "where" in code. I've seen so many times when DBA and tech support folks get all over the "look what's happening" aspect of a bug or problem. What is occurring is so irrelevant - what matters is where it's coming from.
A trigger in the production database inserting a row into a simple table - this will tell you where it's coming from.
Would you like me to ask Si_the_geek to move this to the DB section of the forum - you will get a lot of helpful replies from that area...
-
Oct 22nd, 2005, 03:10 PM
#23
Re: Extensive Operating System and Database Corruption
 Originally Posted by randem
szlamany,
Get off it. Work on his problem not yours.
umilmi81,
When you are deploying apps you need to find out all the things that need to be deployed and if they actually should be deployed. ********** will search your VBP, EXE, DLL and OCX files for any DLL/OCX/EXE that is referenced and needs to be on the target computer in order for correct operation. It's just a tool.
Ultimately, you need to research to find out if the DLL/EXE/OCX should be deployed. Please read Installation Problems in my signature.
Randem - what in the world do you mean by that? I've seen many posts from you telling people to get on your website and find answers. That's not the way the forum works. What's up with that?
I am working on his problem. With simple suggestions that might yield immediate results.
-
Oct 22nd, 2005, 03:14 PM
#24
Thread Starter
Hyperactive Member
Re: Extensive Operating System and Database Corruption
 Originally Posted by szlamany
Diagnosis of all problems is pinpointing the "where" in code. I've seen so many times when DBA and tech support folks get all over the "look what's happening" aspect of a bug or problem. What is occurring is so irrelevant - what matters is where it's coming from.
A trigger in the production database inserting a row into a simple table - this will tell you where it's coming from.
Would you like me to ask Si_the_geek to move this to the DB section of the forum - you will get a lot of helpful replies from that area...
Well, the database record corruption is a second and new symptom. The original and more severe problem is the file system corruption.
I suspect that when this is solved, the problem will have more to do with VB and the environment rather than the database. I think the database is just one of the ways the problem is manifesting itself.
-
Oct 22nd, 2005, 03:18 PM
#25
Thread Starter
Hyperactive Member
Re: Extensive Operating System and Database Corruption
 Originally Posted by szlamany
I am working on his problem. With simple suggestions that might yield immediate results.
Your suggestions are very helpful and well taken. I appreciate your efforts. But I think I'm better off analyzing the logs we already have rather than waiting for an event to occur again.
Again, this is a production environment. I'm afraid of slowing the system down with triggers on all inserts, updates, and deletes, on all tables.
We have this event actually logged, I just need to punch some people in the face till they get this damn log reading software.
-
Oct 22nd, 2005, 03:19 PM
#26
Re: Extensive Operating System and Database Corruption
szlamany,
And your problem is... NIH
It's not as important where they get the information as long as it helps. Or should we all just default to PDW and MS for everything? It's the content, not the location.
umilmi81,
Have you checked for trojans yet? You should probably eliminate the simplest situation first. Since you believe that it should not be database specific (since DLLs are getting deleted), check the other avenues.
Last edited by randem; Oct 22nd, 2005 at 03:33 PM.
-
Oct 22nd, 2005, 03:22 PM
#27
Thread Starter
Hyperactive Member
Re: Extensive Operating System and Database Corruption
 Originally Posted by randem
umilmi81
Have you checked for trojans yet? You should probably eliminate the simplest situation first. Since you believe that it should not be database specific (since DLLs are getting deleted), check the other avenues.
Ok, but how do I check for trojans other than anti-virus packages?
Does anybody think it could be because of TOO MANY ADO connections from a single computer? What if there were dozens, or hundreds of different connection objects? Could that be an issue, or should it not be that?
-
Oct 22nd, 2005, 03:25 PM
#28
Re: Extensive Operating System and Database Corruption
There is no reason to have more than one connection with ADO from a client to the server. You can have dozens and dozens of recordset and command objects all using the same connection.
Access, when it talks to MS SQL Server, makes a couple of connections - I've always found that sloppy and bothersome - but not hundreds of connections.
-
Oct 22nd, 2005, 03:32 PM
#29
Re: Extensive Operating System and Database Corruption
umilmi81,
Look in my signature for Trojan Detector. That's the one I use and it is effective. There are others available also. Too many ADO connections could be causing that if your code opens and closes connections rapidly. If a client/server database is hit with many requests to open and close connections, sometimes it does not handle the connection pool the way one might think and actually issues a new connection even though the pooled one is still there. This is one reason not to open and close many connections.
For more on that subject read Database Problems also in my signature.
-
Oct 22nd, 2005, 03:38 PM
#30
Thread Starter
Hyperactive Member
Re: Extensive Operating System and Database Corruption
 Originally Posted by szlamany
There is no reason to have more than one connection with ADO from a client to the server. You can have dozens and dozens of recordset and command objects all using the same connection.
It's like this. I have an object called a workset object.
That object has its own ADODB.Connection object.
But a new copy of that object is created for every workset. There are sometimes hundreds, even thousands of workset objects in use by an application at any one time.
And there are a dozen or so applications that all have workset objects. So there could easily be 10,000 workset objects in memory, each with their own ADODB.Connection.
Am I an idiot? Was this a bad way to do this? Even if this is a bad way to do it, can anyone envision this causing file system corruption and database corruption as described?
-
Oct 22nd, 2005, 03:44 PM
#31
Re: Extensive Operating System and Database Corruption
I'm afraid that's not a good idea.
Each connection basically becomes an "application thread" on the server side. There is not enough memory on the server to have this many "application threads".
I think I've read that each connection takes 2 MB of memory - so you can see how this could be a problem.
Why not have a global connection shared in the app by each workset?
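As a rough sketch of what a shared connection looks like, here is the pattern in Python, with `sqlite3` standing in for ADODB.Connection. Everything named here is hypothetical: the `Workset` class, its `save` method and the table layout are invented for the example, and the 2 MB per connection figure is only the recollection above, not a measured number.

```python
import sqlite3

MB_PER_CONNECTION = 2  # recalled figure from this post, not a measurement


def estimated_server_memory_mb(connection_count):
    """Back-of-the-envelope server memory used by that many open connections."""
    return connection_count * MB_PER_CONNECTION


class Workset:
    """Hypothetical workset object that borrows a connection instead of owning one."""

    def __init__(self, workset_id, conn):
        self.workset_id = workset_id
        self._conn = conn  # shared by every workset; never opened or closed here

    def save(self):
        self._conn.execute(
            "INSERT INTO Workset (WorksetId) VALUES (?)", (self.workset_id,)
        )


# One connection for the whole application, handed to each workset.
shared = sqlite3.connect(":memory:")
shared.execute("CREATE TABLE Workset (WorksetId TEXT)")

worksets = [Workset(f"W{i}", shared) for i in range(1000)]
for workset in worksets:
    workset.save()
shared.commit()

row_count = shared.execute("SELECT COUNT(*) FROM Workset").fetchone()[0]
print(row_count, "rows written through 1 connection")
print("10,000 separate connections at 2 MB each:",
      estimated_server_memory_mb(10_000), "MB")
```

Under that 2 MB assumption, 10,000 separate connections would need about 20,000 MB on the server, while the shared design needs one connection per application no matter how many worksets exist.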
-
Oct 22nd, 2005, 03:49 PM
#32
Re: Extensive Operating System and Database Corruption
umilmi81,
If you are opening that many connections in one app, you should know that when VB allocates memory it does not release it until the whole VB subsystem is exited. You could easily run out of memory and have corruption in memory that transfers to disk. Now envision a hundred connections, each with several recordsets retrieving data. Each one wants its own area in memory. This could cause you problems.
-
Oct 22nd, 2005, 04:24 PM
#33
Thread Starter
Hyperactive Member
Re: Extensive Operating System and Database Corruption
 Originally Posted by randem
umilmi81,
If you are opening that many connections in one app, you should know that when VB allocates memory it does not release it until the whole VB subsystem is exited. You could easily run out of memory and have corruption in memory that transfers to disk. Now envision a hundred connections, each with several recordsets retrieving data. Each one wants its own area in memory. This could cause you problems.
Then maybe this is the problem.
So here is what I'll do. I'll rewrite the workflow layer to only use a single ADO connection object.
Thanks Guys! I'd prefer to leave this thread as unresolved for now. If I don't see the problem again by the end of a week of production, I'll bump it and mark it resolved.