Thread: Parsing problem with youtube downloader

  1. #1

    Thread Starter
    Junior Member
    Join Date
    Jan 2013
    Posts
    24

    Parsing problem with youtube downloader

    Hello guys.

    I downloaded the source code of a YouTube information grabber:

    Code:
    Imports System.Net
    Imports System.Text.RegularExpressions
    
    Public Class Form1
    
        Private Function GetBetween(ByVal Source As String, ByVal Str1 As String, ByVal Str2 As String, Optional ByVal Index As Integer = 0) As String
            Return Regex.Split(Regex.Split(Source, Str1)(Index + 1), Str2)(0)
        End Function
    
        Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
            Dim fs As SaveFileDialog = New SaveFileDialog
            fs.RestoreDirectory = True
            fs.Filter = "txt files (*.txt)|*.txt"
            fs.FilterIndex = 1
            fs.ShowDialog()
            Dim url As String = ""
            If (TextBox1.Text.ToLower().Contains("youtube")) Then
                If (TextBox1.Text.ToLower().StartsWith("http://") Or TextBox1.Text.ToLower().StartsWith("https://")) Then
                    url = TextBox1.Text
                Else
                    If (TextBox1.Text.ToLower().StartsWith("www.")) Then
                        url = "http://" & TextBox1.Text
                    Else
                        url = "http://www." & TextBox1.Text
                    End If
                End If
            ElseIf (TextBox1.Text.ToLower().StartsWith("/watch")) Then
                url = "http://www.youtube.com" & TextBox1.Text.ToLower()
            ElseIf (TextBox1.Text.ToLower().StartsWith("watch")) Then
                url = "http://www.youtube.com/" & TextBox1.Text.ToLower()
            End If
            Dim r As HttpWebRequest = HttpWebRequest.Create(url)
            Dim re As HttpWebResponse = r.GetResponse
            Dim src As String = New System.IO.StreamReader(re.GetResponseStream()).ReadToEnd()
            Dim title2 As String = GetBetween(src, "<span id=""eow-title""", ">")
            Dim title As String = GetBetween(title2, "title=""", """")
            Dim desc As String = GetBetween(src, "<p id=""eow-description"" >", "</p>")
            Dim likes As String = GetBetween(src, "<span class=""likes-count"">", "</span")
            Dim dislikes As String = GetBetween(src, "<span class=""dislikes-count"">", "</span")
            Dim views As String = GetBetween(src, "<span class=""watch-view-count "" >", "</span")
            title = removeExtras(title, False)
            desc = removeExtras(desc, False)
            likes = removeExtras(likes)
            dislikes = removeExtras(dislikes)
            views = removeExtras(views)
            Using sw As New System.IO.StreamWriter(fs.FileName)
                sw.WriteLine(title)
                sw.WriteLine(desc)
                sw.WriteLine("Likes: " & likes)
                sw.WriteLine("Dislikes: " & dislikes)
                sw.WriteLine("Total Views: " & views)
            End Using
        End Sub
    
        Private Function removeExtras(ByVal s As String, Optional ByVal removeSpaces As Boolean = True)
            Dim ret As String = s
            If (s.Contains(" ") And removeSpaces) Then
                ret = ""
                For Each c As String In s
                    If (Not c = " ") Then ret &= c
                Next
            End If
            If (ret.Contains("<") And ret.Contains(">")) Then
                Dim sa As Boolean = True
                Dim temp As String = ""
                For Each c As String In ret
                    If (c = "<") Then sa = False
                    If (c = ">") Then sa = True
                    If (Not c = "<" And Not c = ">" And sa) Then
                        temp &= c
                    End If
                Next
                ret = temp
            End If
            If (ret.Contains("&quot;")) Then
                ret = ret.Replace("&quot;", """")
            End If
            If (ret.Contains("&#39;")) Then ret = ret.Replace("&#39;", "'")
            Return ret
        End Function
    
        Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
    
        End Sub
    End Class

    The first problem is here: Return Regex.Split(Regex.Split(Source, Str1)(Index + 1), Str2)(0) because Index shows 0 and the array access is out of bounds. I put just 0 there instead and it ran, but for the item:

    Code:
    Total Views:
    I'm getting the following:

    Code:
    Total Views: varytcsi={gt:function(n){n=(n||'')+'data_';returnytcsi[n]||(ytcsi[n]={tick:{},span:{},info:{}});},tick:function(l,t,n){ytcsi.gt(n).tick[l]=t||+newDate();},span:function(l,s,n){ytcsi.gt(n).span[l]=(typeofs=='number')?s:+newDate()-ytcsi.data_.tick[l];},info:function(k,v,n){ytcsi.gt(n).info[k]=v;}};ytcsi.perf=window.perfor bla bla bla
    I checked the HTML code and it's:
    Code:
    <span class="watch-view-count">59,672</span>
    Then why can't the regex parser extract the number of views properly?

    Any ideas?

    Thanks in advance

  2. #2
    Bad man! ident
    Join Date
    Mar 2009
    Location
    Cambridge
    Posts
    5,398

    Re: Parsing problem with youtube downloader

    The problem is that whoever wrote that has no idea about regex. The whole source code above is very badly written. This is usually the case because YouTube applications are generally written by kids. The whole point of regex is to match patterns, not to do silly split after split.
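
    Judging by the HTML quoted in the first post, the views lookup searches for <span class="watch-view-count " > (with a space after the class name and another before the >), while the page contains <span class="watch-view-count"> with no spaces. When the delimiter is not found, Regex.Split returns a one-element array holding the whole input, so (Index + 1) is out of bounds. Changing that index to 0 returns the entire page source up to the first </span, which is where the script text comes from. A minimal sketch of matching the pattern directly with a capture group instead, assuming the markup still looks like the quoted span (ExtractViewCount is a hypothetical helper name, not part of the original code):

    ```vbnet
    Imports System
    Imports System.Text.RegularExpressions

    Module ViewCountDemo

        ' Hypothetical helper: returns the text inside
        ' <span class="watch-view-count">...</span>, or Nothing if it is not found.
        Function ExtractViewCount(ByVal html As String) As String
            ' \s* tolerates optional whitespace after the class name and before ">".
            ' ([^<]*) captures everything up to the next tag.
            Dim pattern As String = "<span\s+class=""watch-view-count\s*""\s*>([^<]*)</span>"
            Dim m As Match = Regex.Match(html, pattern)
            If Not m.Success Then Return Nothing
            Return m.Groups(1).Value.Trim()
        End Function

        Sub Main()
            ' Sample input copied from the HTML quoted in the first post.
            Dim sample As String = "<span class=""watch-view-count"">59,672</span>"
            Console.WriteLine(ExtractViewCount(sample))
        End Sub

    End Module
    ```

    Checking m.Success before reading the group avoids the out-of-bounds error when the markup changes. Also note that GetBetween passes Str1 and Str2 to Regex.Split as patterns, so any literal delimiter containing regex metacharacters would need Regex.Escape first.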
