-
Nov 19th, 2000, 02:31 PM
#1
Thread Starter
Addicted Member
I have made a program that calculates a user-specified number of random DNA base pairs. The four possible pair combinations are:
A:T
T:A
G:C
C:G
The problem comes when calculating millions of these pairs. How can I quickly calculate 10 million base pairs? With the code that I am using it takes over 15 seconds for each million base pairs as a compiled EXE. All that really needs to be calculated is the number of each kind so that I can display percentages.
So in summary I need a way to randomly choose one of these four base pairs millions of times and display totals. Thanks for any help!
Portion of code used:
Code:
NumberofTimes = 10000000
Randomize
For X = 1 To NumberofTimes
    r = Int(Rnd * 8)
    If r = 0 Or r = 1 Then
        '-Base Pair A:T
        AT = AT + 1
    ElseIf r = 2 Or r = 3 Then
        '-Base Pair T:A
        TA = TA + 1
    ElseIf r = 4 Or r = 5 Then
        '-Base Pair G:C
        GC = GC + 1
    ElseIf r = 6 Or r = 7 Or r = 8 Then
        '-Base Pair C:G
        CG = CG + 1
    End If
Next X
[Edited by overhill on 11-19-2000 at 02:37 PM]
-
Nov 19th, 2000, 02:41 PM
#2
transcendental analytic
May I ask what you need this code for? What's the purpose? I'm pretty sure the results will be about the same every time. It also looks like you wanted C:G to occur more often, but that won't work anyway, since r will never be 8: Rnd is always less than 1, so Int(Rnd * 8) only returns 0 through 7.
writing software in C++ is like driving rivets into steel beam with a toothpick.
writing haskell makes your life easier:
reverse (p (6*9)) where p x|x==0=""|True=chr (48+z): p y where (y,z)=divMod x 13
To throw away OOP for low level languages is myopia, to keep OOP is hyperopia. To throw away OOP for a high level language is insight.
-
Nov 19th, 2000, 08:47 PM
#3
Thread Starter
Addicted Member
Purpose
Thank you for pointing out that it will never equal 8. I originally thought that 7.5 and above would round up to 8, but not with Int(). Thanks.
The purpose of using smaller numbers (1 to 1,000,000) is to show how the percentage of each base pair gets closer and closer to 25% as the number increases. As for larger numbers, the human genome contains 3 billion base pairs. Some students wanted to have the computer calculate a sample human's DNA. When I figured out how long it was going to take, I wanted to see if some other code was more efficient.
Essentially, I guess I was wondering if there is a more efficient way to code this. Here is the updated code.
Code:
NumberofTimes = 10000000
Randomize
For X = 1 To NumberofTimes
    r = Int(Rnd * 4)
    If r = 0 Then
        '-Base Pair A:T
        AT = AT + 1
    ElseIf r = 1 Then
        '-Base Pair T:A
        TA = TA + 1
    ElseIf r = 2 Then
        '-Base Pair G:C
        GC = GC + 1
    ElseIf r = 3 Then
        '-Base Pair C:G
        CG = CG + 1
    End If
Next X
-
Nov 20th, 2000, 12:41 AM
#4
Fanatic Member
Just to insert a little comment here: that counting wouldn't really be that helpful. If you only have 4 possible outcomes, the larger your sample size gets, the closer the percentage of each one will get to 25%. In general, as the sample size increases, each of N equally likely values will appear about 1/N of the time. The exception would be if you generated more numbers and mapped more of them to one branch, for example to skew the occurrence of G:C. You could make it slightly more efficient (I think) using something like this, although making VB do something this many times isn't really efficient in the first place.
Code:
Dim alngCounts(1 To 4) As Long
Dim intIndex As Integer
Dim i As Long
Randomize Timer
For i = 1 To 1000000
    intIndex = Int(Rnd * 4) + 1
    alngCounts(intIndex) = alngCounts(intIndex) + 1
Next i
Oh, and speaking as a biology person, you might want to take Chargaff's rules into account when you make a program for this. Because adenine pairs with thymine and cytosine pairs with guanine, the percentages of A and T are very close to each other, as are those of G and C. A genome might be 30% A, 30% T, 20% G and 20% C. And if you're really a stickler for biochem, don't forget uracil and other purine/pyrimidine bases that can get put in (in RNA, for example). I could ramble about bio all day...
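If you did want skewed frequencies along those lines, here is a rough sketch of one way to do it: compare a single Rnd value against cumulative cut-offs. The 30/30/20/20 split across the four pair types is only an illustrative assumption taken from the example percentages above, not real genome data, and the procedure name and the adblCutoff array are made up for this sketch. Everything is declared with an explicit type, because undeclared VB6 variables default to Variant, which is usually slower in a tight loop.

```vb
Option Explicit

' Sketch only: the weights are illustrative assumptions, not measured data.
Sub CountWeightedPairs()
    Const NUM_PAIRS As Long = 1000000
    Dim adblCutoff(1 To 4) As Double   ' cumulative probability cut-offs
    Dim alngCounts(1 To 4) As Long     ' 1 = A:T, 2 = T:A, 3 = G:C, 4 = C:G
    Dim sngRoll As Single
    Dim i As Long, k As Long

    ' Cumulative weights: A:T 30%, T:A 30%, G:C 20%, C:G 20%
    adblCutoff(1) = 0.3
    adblCutoff(2) = 0.6
    adblCutoff(3) = 0.8
    adblCutoff(4) = 1#

    Randomize
    For i = 1 To NUM_PAIRS
        sngRoll = Rnd                  ' 0 <= sngRoll < 1
        k = 1
        ' Walk up the cut-offs; the last one is 1, so k never passes 4
        Do While sngRoll >= adblCutoff(k)
            k = k + 1
        Loop
        alngCounts(k) = alngCounts(k) + 1
    Next i

    ' Print each pair type's share of the total
    For k = 1 To 4
        Debug.Print k, Format$(alngCounts(k) / NUM_PAIRS, "0.00%")
    Next k
End Sub
```

Setting all four cut-offs 0.25 apart gives the same uniform behaviour as the array version above, so the weights are the only thing you would need to change.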
I'm baaaack...
VB5 Professional Edition, VC++ 6
Using a 1 gHz Thunderbird, 256 mb RAM, 40 gb HD system with Win98se
I feel special because I finally figured out how to loop midis: Post link
I'm a fanatic too 
-
Nov 20th, 2000, 12:03 PM
#5
transcendental analytic
Well, I'm not as into genetics and stuff as you are, Kaverin, although I had some of that in my biology courses and it was the part I liked most. Anyway, you're absolutely right: there's no point in emulating the randomness by counting each of the 10,000,000 individual pairs. A waste of CPU, I think. You could do it the mathematical way instead, using the binomial formula. I'm not sure how to implement it with 10,000,000 elements, but I'm searching for a shortcut right now.
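One possible shortcut, as a sketch rather than a finished answer: if each pair type is chosen independently with probability p = 0.25, the count of any one type over n draws follows a binomial distribution with mean n * p and standard deviation Sqr(n * p * (1 - p)). For 10,000,000 pairs that works out to 2,500,000 expected, give or take roughly 1,370, so no loop is needed just to see how close to 25% the result should be. The procedure and constant names below are made up for the example.

```vb
Option Explicit

' Sketch only: assumes four equally likely, independent outcomes per draw.
Sub ExpectedPairCount()
    Const NUM_PAIRS As Double = 10000000#   ' Double, so the product cannot overflow a Long
    Const PROB As Double = 0.25             ' chance of any one pair type
    Dim dblMean As Double
    Dim dblStdDev As Double

    dblMean = NUM_PAIRS * PROB                       ' binomial mean: n * p
    dblStdDev = Sqr(NUM_PAIRS * PROB * (1 - PROB))   ' binomial standard deviation

    Debug.Print "Expected count: "; dblMean
    Debug.Print "Standard deviation: "; Format$(dblStdDev, "0.0")
    Debug.Print "As a share of all pairs: "; Format$(dblStdDev / NUM_PAIRS, "0.0000%")
End Sub
```

This only tells you how much the totals are expected to wander. If the actual random counts are what you want to display, the loop is still needed.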