Thread: DataSet with ThreadPool causing problems...

  1. #1

    Thread Starter
    Addicted Member
    Join Date
    Feb 2011
    Posts
    151

    DataSet with ThreadPool causing problems...

    Hey, I'm already getting a bit frustrated with this. I've been trying to find a solution for a few hours already.

    So, I'm making a proxy scraper, and because adding proxies directly to the DataGridView was making the program run slowly, somebody said I should use a database for it.

    As I said in the thread title, this is using the ThreadPool, and it's giving me this error:

    Code:
    DataTable internal index is corrupted: 5
    So, this is my code that should add a proxy:

    Code:
                    Dim proxyregex As New Regex("[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}:[0-9]{1,4}")
                    Dim proxies As MatchCollection = proxyregex.Matches(html)
                    Dim pcount As Integer = 0
    
                    For Each m As Match In proxies
                        Dim splitted() As String = m.Value.Split(":"c)
                        If splitted.Length > 1 Then
                            Dim newProxyRow As DataRow = ProxiesDBDataSet.Tables(0).NewRow()
                            newProxyRow(0) = tcount.ToString
                            newProxyRow(1) = splitted(0).ToString
                            newProxyRow(2) = splitted(1).ToString
                            ProxiesDBDataSet.Tables(0).Rows.Add(newProxyRow)
                            tcount += 1
                            pcount += 1
                        End If
                    Next
    Any help?

  2. #2
    Super Moderator jmcilhinney's Avatar
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    110,344

    Re: DataSet with ThreadPool causing problems...

    Is your DataTable bound to a DataGridView at the time? If so then that may be the root of the issue.

    By the way, why are you using the ThreadPool? That is generally for queuing multiple work items. If all you want to do is run a single loop on a single thread then there are better options.

  3. #3
    Angel of Code Niya's Avatar
    Join Date
    Nov 2011
    Posts
    8,600

    Re: DataSet with ThreadPool causing problems...

    Hmm... We need a little more info on what you're doing. As suggested by jmc, it would help to know whether that DataTable is bound or not. Also, I'd like to know what it is you're doing that requires threading and, more specifically, how you're doing it. Also, explain what this 'proxy' thing is about, because that totally lost me.

    Your error instinctively tells me a couple of SyncLocks may need to be factored into your code, but I can't make such an assertion accurately without knowing exactly what you're doing.
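
    For readers who haven't used SyncLock before, here is a minimal sketch of what it does, assuming a shared counter that several ThreadPool threads increment. The names counterLock, sharedCount and Worker are hypothetical and are not taken from the code in this thread; whether a lock is actually the right fix here depends on the details asked about above.

```vbnet
Imports System
Imports System.Threading

Module SyncLockSketch
    ' Hypothetical names for illustration only.
    Private ReadOnly counterLock As New Object()
    Private sharedCount As Integer = 0

    Private Sub Worker(state As Object)
        For i As Integer = 1 To 1000
            ' Only one thread at a time can be inside this block,
            ' so the read-modify-write of sharedCount cannot interleave.
            SyncLock counterLock
                sharedCount += 1
            End SyncLock
        Next
        ' Tell the main thread that this work item has finished.
        DirectCast(state, CountdownEvent).Signal()
    End Sub

    Sub Main()
        Using done As New CountdownEvent(4)
            For n As Integer = 1 To 4
                ThreadPool.QueueUserWorkItem(AddressOf Worker, done)
            Next
            ' Block until all four work items have signalled.
            done.Wait()
        End Using
        Console.WriteLine(sharedCount)
    End Sub
End Module
```

    Without the SyncLock, the four work items could overwrite each other's increments and the final count could come out lower than expected. Note that a lock like this only protects code that takes the same lock; it does not by itself make it safe to touch a data-bound DataTable from a background thread.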

  4. #4
    Junior Member
    Join Date
    Sep 2012
    Posts
    17

    Re: DataSet with ThreadPool causing problems...

    Thank you

  5. #5

    Thread Starter
    Addicted Member
    Join Date
    Feb 2011
    Posts
    151

    Re: DataSet with ThreadPool causing problems...

    I use the ThreadPool because there are over a few hundred sites that my program scrapes proxies from.
    And yes, it's bound to a DataGridView.

    This is how I have written it now:

    Code:
        Public Sub dowork()
            For i = 0 To ProxiesDBDataSet.Proxies.Rows.Count - 1
                ProxiesDBDataSet.Proxies.Rows(i).Delete()
            Next
            Dim upperbound As Integer = multisource.Items.Count() - 1
            totalTaskCount = upperbound
            For i = 0 To upperbound
                ThreadPool.QueueUserWorkItem(AddressOf Scrape, multisource.Items(i).ToString())
            Next
        End Sub
    Code:
        Private Sub Scrape(ByVal webaddress As Object)
            Dim website As String = CType(webAddress, String)
            Dim http As New Chilkat.Http()
            Dim success As Boolean = http.UnlockComponent(ChilKatLicenses.Lic_Chilkat_HTTP)
            If success = True Then
                http.ConnectTimeout = 5
                Dim html As String = http.QuickGetStr(website)
                If String.IsNullOrEmpty(html) Then
                    logger("Timedout: " & website)
                Else
                    Dim proxyregex As New Regex("[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}:[0-9]{1,4}")
                    Dim proxies As MatchCollection = proxyregex.Matches(html)
                    Dim pcount As Integer = 0
    
                    For Each m As Match In proxies
                        Dim splitted() As String = m.Value.Split(":"c)
                        If splitted.Length > 1 Then
                            Dim newProxyRow As DataRow = ProxiesDBDataSet.Tables(0).NewRow()
                            newProxyRow(0) = tcount.ToString
                            newProxyRow(1) = splitted(0).ToString
                            newProxyRow(2) = splitted(1).ToString
                            ProxiesDBDataSet.Tables(0).Rows.Add(newProxyRow)
                            tcount += 1
                            pcount += 1
                        End If
                    Next
    
                    Dim msg As String = "Scraped " & pcount.ToString & " proxies from " & website
                    SetControlText(totalcount, tcount.ToString)
                    SetControlText(Label28, msg)
                    logger(msg)
                    Interlocked.Decrement(totalTaskCount)
                    If Interlocked.Read(totalTaskCount) <= 0 Then
                        logger("Scraping completed for website " & website)
                    End If
                End If
            End If
        End Sub

  6. #6
    Super Moderator jmcilhinney's Avatar
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    110,344

    Re: DataSet with ThreadPool causing problems...

    As I said, the fact that it's bound is probably the issue. When you make a change to the DataTable, that will cause a change in the DataGridView. That means that you are updating the UI from a secondary thread, which is a no-no. You should be marshalling a method call to the UI thread to update the DataTable. Judging by the fact that you're calling a SetControlText method, you already know how to do that. I would tend to do this:
    Code:
    Private Sub AddRowToDataTable(table As DataTable, ParamArray values As Object())
        If InvokeRequired Then
            Invoke(New Action(Of DataTable, Object())(AddressOf AddRowToDataTable), table, values)
        Else
            table.Rows.Add(values)
        End If
    End Sub
    and then replace this:
    Code:
                            Dim newProxyRow As DataRow = ProxiesDBDataSet.Tables(0).NewRow()
                            newProxyRow(0) = tcount.ToString
                            newProxyRow(1) = splitted(0).ToString
                            newProxyRow(2) = splitted(1).ToString
                            ProxiesDBDataSet.Tables(0).Rows.Add(newProxyRow)
    with this:
    Code:
    AddRowToDataTable(ProxiesDBDataSet.Tables(0), tcount.ToString(), splitted(0).ToString(), splitted(1).ToString())

  7. #7

    Thread Starter
    Addicted Member
    Join Date
    Feb 2011
    Posts
    151

    Re: DataSet with ThreadPool causing problems...

    Thanks a lot! That error is gone and it works very fast.

    Anyway, I added this to the scraper:
    Code:
                            Dim row As ProxiesDBDataSet.ProxiesRow = ProxiesDBDataSet.Proxies.FindByIpPort(splitted(0).ToString, splitted(1).ToString)
                            If row Is Nothing Then
                                AddRowToDataTable(ProxiesDBDataSet.Tables(0), tcount.ToString(), splitted(0).ToString(), splitted(1).ToString())
                                tcount += 1
                                pcount += 1
                            End If
    It should check whether the table already contains that row, but it still gives this error:
    Column Ip Port is constrained to be unique. Value ..... is already present.

    On this code:
    Code:
        Private Sub AddRowToDataTable(table As DataTable, ParamArray values As Object())
            If InvokeRequired Then
                Invoke(New Action(Of DataTable, Object())(AddressOf AddRowToDataTable), table, values)
            Else
                table.Rows.Add(values)
            End If
        End Sub
    Any ideas?

  8. #8
    Super Moderator jmcilhinney's Avatar
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    110,344

    Re: DataSet with ThreadPool causing problems...

    You could call the table's Select method to see if there is an existing row with a particular value in a particular column. That will be a bit on the slow side though, so I'd suggest a different option. I'd use a HashSet to store the values that are in that column. When you want to add a new row, you add the value you want in that column to the HashSet first. If the Add method returns True then the item was added, meaning that it was not already in the HashSet, meaning that it's not already in the DataTable, so you can go ahead and add the row. If Add returns False then the value already exists so you shouldn't add the row. Note that that check can safely be performed on the background thread because it doesn't affect the DataTable so it doesn't affect the grid.
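
    As a rough sketch of the first option (the Select check), assuming the table has string columns named "Ip" and "Port": those column names are guessed from FindByIpPort and the error message, and AddIfMissing is a hypothetical helper, not part of the code posted above.

```vbnet
Imports System
Imports System.Data

Module SelectCheckSketch
    ' Hypothetical helper. Returns True if the row was added,
    ' False if a row with the same Ip and Port was already present.
    ' The column names "Ip" and "Port" are assumptions.
    Function AddIfMissing(table As DataTable, id As String,
                          ip As String, port As String) As Boolean
        ' ip and port come from a digits-and-dots regex match, so they
        ' contain no quote characters that would need escaping here.
        Dim filter As String = String.Format("Ip = '{0}' AND Port = '{1}'", ip, port)
        If table.Select(filter).Length > 0 Then
            Return False
        End If
        table.Rows.Add(id, ip, port)
        Return True
    End Function

    Sub Main()
        Dim table As New DataTable("Proxies")
        table.Columns.Add("Id", GetType(String))
        table.Columns.Add("Ip", GetType(String))
        table.Columns.Add("Port", GetType(String))

        Console.WriteLine(AddIfMissing(table, "0", "10.0.0.1", "8080")) ' True
        Console.WriteLine(AddIfMissing(table, "1", "10.0.0.1", "8080")) ' False
    End Sub
End Module
```

    If you go this way with a bound table, the check and the add need to run together on the UI thread (for example inside the Else branch of AddRowToDataTable). Otherwise two worker threads can both pass the check before either one has added its row, and the unique constraint will still fire.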

  9. #9

    Thread Starter
    Addicted Member
    Join Date
    Feb 2011
    Posts
    151

    Re: DataSet with ThreadPool causing problems...

    Could you give an example of this? I haven't heard of HashSet before. I checked on Google but can't understand how to make it work the way you described.

  10. #10
    Super Moderator jmcilhinney's Avatar
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    110,344

    Re: DataSet with ThreadPool causing problems...

    It's not rocket science. You already know how to Add an item to a collection because you're already doing it in the code you posted. In the case of a HashSet, it cannot contain duplicates so Add returns True or False depending on whether the item was unique and added successfully or a duplicate and rejected. You simply use that Boolean value to determine whether to add your row or not.
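
    A minimal sketch of the idea, assuming the uniqueness key is the "ip:port" string. The names seenProxies, seenLock and TryRegisterProxy are hypothetical. The SyncLock is an addition on top of what was described above: HashSet(Of T) is not thread-safe, so when several ThreadPool threads call Add at the same time the call should be guarded.

```vbnet
Imports System
Imports System.Collections.Generic

Module HashSetSketch
    ' Hypothetical names; not part of the code posted in this thread.
    Private ReadOnly seenProxies As New HashSet(Of String)()
    Private ReadOnly seenLock As New Object()

    ' Returns True the first time an ip:port pair is seen, False afterwards.
    Function TryRegisterProxy(ip As String, port As String) As Boolean
        Dim key As String = ip & ":" & port
        ' Guard Add because HashSet(Of T) is not safe for concurrent writers.
        SyncLock seenLock
            ' Add returns False when the key is already in the set.
            Return seenProxies.Add(key)
        End SyncLock
    End Function

    Sub Main()
        Console.WriteLine(TryRegisterProxy("10.0.0.1", "8080")) ' True
        Console.WriteLine(TryRegisterProxy("10.0.0.1", "8080")) ' False
        Console.WriteLine(TryRegisterProxy("10.0.0.1", "3128")) ' True
    End Sub
End Module
```

    In the Scrape method this would replace the FindByIpPort check: call TryRegisterProxy(splitted(0), splitted(1)) and only call AddRowToDataTable when it returns True. If the table is emptied at the start of a new run, the set would need to be cleared at the same point so the two stay in sync.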
