
Thread: [RESOLVED] Extract word between &quot;

  1. #1

    Thread Starter
    PowerPoster Radjesh Klauke's Avatar
    Join Date
    Dec 2005
    Location
    Sexbierum (Netherlands)
    Posts
    2,244

    [RESOLVED] Extract word between &quot;

    Heya.

    I have an html file, which I imported into an rtb.
    Does anyone know how I can extract all words that are between &quot;...myword...&quot;
    Need to list them in another rtb.

    Thanks for the help in advance.
    Last edited by Radjesh Klauke; Nov 25th, 2013 at 05:14 AM.


    If you found my post helpful, please rate it.

    Codebank Submission: FireFox Browser (Gecko) in VB.NET, Load files, (sub)folders treeview with Windows icons

  2. #2
    eXtreme Programmer .paul.'s Avatar
    Join Date
    May 2007
    Location
    Chelmsford UK
    Posts
    25,464

    Re: Extract word between &quot;

    you could use a simple regex:

    Code:
    Imports System.Text.RegularExpressions
    
    Public Class Form1
    
        Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
            ' Fill the box with sample text: &quot;word1&quot;&quot;word2&quot;...
            For x As Integer = 1 To 100
                RichTextBox1.Text &= String.Format("&quot;word{0}&quot;", x)
            Next
        End Sub
    
        Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
            ' Lazily capture whatever sits between two &quot; delimiters.
            Dim rx As New Regex("&quot;(.+?)&quot;")
            Dim words() As String = rx.Matches(RichTextBox1.Text).Cast(Of Match).Select(Function(m) m.Groups(1).Value).ToArray
            Stop ' breaks into the debugger so words can be inspected
        End Sub
    
    End Class

  3. #3

    Thread Starter
    PowerPoster Radjesh Klauke's Avatar
    Join Date
    Dec 2005
    Location
    Sexbierum (Netherlands)
    Posts
    2,244

    Re: Extract word between &quot;

    Heya Paul. How have you been. Been a while.

    Thanks for the example. Unfortunately I can't seem to get the "words" in RichTextbox2. (Now I think of it... perhaps it would be even better when reading it into a stream. (import them directly from the html-file))

    Can you explain this part of the code?
    Code:
    Dim words() As String = rx.Matches(RichTextBox1.Text).Cast(Of Match).Select(Function(m) m.Groups(1).Value).ToArray
    Stop
    Thanks in advance.



  4. #4
    eXtreme Programmer .paul.'s Avatar
    Join Date
    May 2007
    Location
    Chelmsford UK
    Posts
    25,464

    Re: Extract word between &quot;

    Code:
    Dim words() As String = rx.Matches(RichTextBox1.Text).Cast(Of Match).Select(Function(m) m.Groups(1).Value).ToArray
    RichTextBox2.Lines = RichTextBox2.Lines.Concat(words).ToArray

  5. #5
    Bad man! ident's Avatar
    Join Date
    Mar 2009
    Location
    Cambridge
    Posts
    5,398

    Re: Extract word between &quot;

    Quote Originally Posted by Radjesh Klauke View Post
    Heya Paul. How have you been. Been a while.

    Thanks for the example. Unfortunately I can't seem to get the "words" in RichTextbox2. (Now I think of it... perhaps it would be even better when reading it into a stream. (import them directly from the html-file))

    Can you explain this part of the code?
    Code:
    Dim words() As String = rx.Matches(RichTextBox1.Text).Cast(Of Match).Select(Function(m) m.Groups(1).Value).ToArray
    Stop
    Thanks in advance.
    words() is an array of strings. The line is a LINQ query that uses a lambda expression. rx.Matches returns a MatchCollection of Match objects. In the .NET Framework, MatchCollection is not a generic collection, so Cast(Of Match) explicitly casts each item before Select can pick a value from it. The pattern (.+?) uses the lazy (ungreedy) quantifier +?, which matches as little as possible. Each match exposes several groups: Groups(0) is the whole match, delimiters included, and Groups(1) is the text captured by the parentheses. If you changed 1 to 0 the value returned would not be what you wanted.
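
    To make the one-liner concrete, here is a minimal sketch that unrolls it into a plain loop. It assumes the delimiter is the literal text &quot; (as in the later posts), and the sample input string and module name are made up for illustration; in the form, the text would come from RichTextBox1.Text.

    ```vbnet
    Imports System
    Imports System.Collections.Generic
    Imports System.Text.RegularExpressions

    Module UnrolledExample
        Sub Main()
            ' Assumed sample input, standing in for RichTextBox1.Text.
            Dim input As String = "a &quot;one&quot; b &quot;two&quot; c"
            Dim rx As New Regex("&quot;(.+?)&quot;")

            Dim words As New List(Of String)
            For Each m As Match In rx.Matches(input)
                ' Groups(0) is the whole match, delimiters included;
                ' Groups(1) is only the text captured by the parentheses.
                words.Add(m.Groups(1).Value)
            Next

            Console.WriteLine(String.Join(", ", words)) ' prints: one, two
        End Sub
    End Module
    ```

    Typing the loop variable as Match does the same job as Cast(Of Match) in the LINQ version.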

  6. #6
    Bad man! ident's Avatar
    Join Date
    Mar 2009
    Location
    Cambridge
    Posts
    5,398

    Re: Extract word between &quot;

    Where is this HTML file? Online or on a hard drive? I don't see any reason to import the HTML data into a control when it's not being used.

    vb Code:
    Imports System.IO
    Imports System.Net
    Imports System.Text.RegularExpressions

    Public Class Form1

        Private ReadOnly m_pattern As String = "&quot;(.+?)&quot;"
        Private ReadOnly m_match As New Regex(Me.m_pattern)

        ''' <summary>
        ''' Reads text from a file and returns an array of
        ''' strings using regular expressions.
        ''' </summary>
        ''' <param name="path">The file path.</param>
        ''' <returns>An array of string.</returns>
        ''' <remarks></remarks>
        Private Function GetDataFromFile(ByVal path As String) As String()
            Return Me.m_match.Matches(File.ReadAllText(path)) _
                                      .Cast(Of Match) _
                                      .Select(Function(m) m.Groups(1).Value) _
                                      .ToArray
        End Function

        ''' <summary>
        ''' Reads text from a web resource and returns an array of
        ''' strings using regular expressions.
        ''' </summary>
        ''' <param name="uri">The uniform resource identifier.</param>
        ''' <returns>An array of string.</returns>
        ''' <remarks>I would normally make use of the using blocks. Just a quick example.</remarks>
        Private Function GetDataFromWeb(ByVal uri As Uri) As String()
            Return Me.m_match.Matches(New WebClient() _
                                      .DownloadString(uri)) _
                                      .Cast(Of Match) _
                                      .Select(Function(m) m.Groups(1).Value) _
                                      .ToArray
        End Function
    End Class

  7. #7
    eXtreme Programmer .paul.'s Avatar
    Join Date
    May 2007
    Location
    Chelmsford UK
    Posts
    25,464

    Re: Extract word between &quot;

    Quote Originally Posted by Radjesh Klauke View Post
    Heya Paul. How have you been. Been a while.

    Thanks for the example. Unfortunately I can't seem to get the "words" in RichTextbox2. (Now I think of it... perhaps it would be even better when reading it into a stream. (import them directly from the html-file))

    Can you explain this part of the code?
    Code:
    Dim words() As String = rx.Matches(RichTextBox1.Text).Cast(Of Match).Select(Function(m) m.Groups(1).Value).ToArray
    Stop
    Thanks in advance.
    it gets all matches from RichTextBox1.Text, then creates an array of strings containing each match's value, using LINQ + Lambdas as ident told you...

  8. #8

    Thread Starter
    PowerPoster Radjesh Klauke's Avatar
    Join Date
    Dec 2005
    Location
    Sexbierum (Netherlands)
    Posts
    2,244

    Re: Extract word between &quot;

    Heya. Thanks for the information.
    @ident: Just as a test I tried on button1.Click: msgbox(GetDataFromWeb(textbox1.text)). --> Value of type 'String' cannot be converted to 'System.Uri'.
    Tried New Uri... whatever. Can't get it to work.



  9. #9
    Bad man! ident's Avatar
    Join Date
    Mar 2009
    Location
    Cambridge
    Posts
    5,398

    Re: Extract word between &quot;

    Quote Originally Posted by Radjesh Klauke View Post
    Heya. Thanks for the information.
    @ident: Just as a test I tried on button1.Click: msgbox(GetDataFromWeb(textbox1.text)). --> Value of type 'String' cannot be converted to 'System.Uri'.
    Tried New Uri... whatever. Can't get it to work.
    GetDataFromWeb expects a Uri and returns an array of strings.

    Code:
     Dim items() As String = GetDataFromWeb(New Uri(Me.TextBox1.Text))

  10. #10
    Fanatic Member AceInfinity's Avatar
    Join Date
    May 2011
    Posts
    696

    Re: Extract word between &quot;

    This is one of the things I don't like about Visual Basic. Standards need to be better enforced IMO: http://msdn.microsoft.com/en-us/libr.../h63fsef3.aspx

    () before or after the type for an array? They deviate from that standard within the very article itself. Intellisense in Visual Studio also shows it the way they suggest not to write it...

    Anyways, a little off topic, but how's the speed of Regex in this case? For larger data you may want to think about setting specific regex flags, such as the one for a compiled Regex: http://msdn.microsoft.com/en-us/libr...vs.110%29.aspx Otherwise, use regular string manipulation methods and maybe a StringBuilder, if the data is large enough and lots of string manipulation is required.
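
    As a rough illustration of the compiled flag, the sketch below builds the pattern once with RegexOptions.Compiled and reuses it. The module name, field name and sample input are assumptions for the example, and whether the flag helps depends on the data, so it is worth measuring before relying on it.

    ```vbnet
    Imports System
    Imports System.Text.RegularExpressions

    Module CompiledRegexExample
        ' Built once and reused. RegexOptions.Compiled trades slower
        ' construction for faster matching on repeated or large inputs.
        Private ReadOnly QuotedWord As New Regex("&quot;(.+?)&quot;", RegexOptions.Compiled)

        Sub Main()
            ' Assumed sample input.
            Dim input As String = "x &quot;alpha&quot; y &quot;beta&quot; z"
            For Each m As Match In QuotedWord.Matches(input)
                Console.WriteLine(m.Groups(1).Value)
            Next
        End Sub
    End Module
    ```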
    <<<------------
    Improving Managed Code Performance | .NET Application Performance
    < Please if this helped you out. Any kind of thanks is gladly appreciated >


    .NET Programming (2012 - 2018)
    ®Crestron - DMC-T Certified Programmer | Software Developer
    <<<------------

  11. #11
    Bad man! ident's Avatar
    Join Date
    Mar 2009
    Location
    Cambridge
    Posts
    5,398

    Re: Extract word between &quot;

    Quote Originally Posted by AceInfinity View Post
    This is one of the things I don't like about Visual Basic. Standards need to be better enforced IMO: http://msdn.microsoft.com/en-us/libr.../h63fsef3.aspx

    () before or after the type for an array? They deviate from that standard within the very article itself. Intellisense in Visual Studio also shows it the way they suggest not to write it...

    Anyways, a little off topic, but how's the speed of Regex in this case? For larger data you may want to think about setting specific regex flags, such as the one for a compiled Regex: http://msdn.microsoft.com/en-us/libr...vs.110%29.aspx Otherwise, use regular string manipulation methods and maybe a StringBuilder, if the data is large enough and lots of string manipulation is required.
    Your points are nitpicking. MSDN also suggests using CStr over ToString... I think I'll follow the examples I've seen from dbasnett, paul and forum mod Joacim Andersso. To be honest, if a member like sitten uses it, who isn't famous on here but is one of the best VB coders I've ever known on another forum, I don't think I'll worry.

    http://www.vbforums.com/showthread.p...nsional-arrays

    The argument about regex speed on a few lines is laughable. 100k lines will take what, 1 second?

  12. #12
    Fanatic Member AceInfinity's Avatar
    Join Date
    May 2011
    Posts
    696

    Re: Extract word between &quot;

    Quote Originally Posted by ident View Post
    Your points are nitpicking. MSDN also suggests using CStr over ToString... I think I'll follow the examples I've seen from dbasnett, paul and forum mod Joacim Andersso. To be honest, if a member like sitten uses it, who isn't famous on here but is one of the best VB coders I've ever known on another forum, I don't think I'll worry.

    http://www.vbforums.com/showthread.p...nsional-arrays

    The argument about regex speed on a few lines is laughable. 100k lines will take what, 1 second?
    A few lines? For starters, he never posted the source he is parsing, so how do you know what it looks like? Regex can be significantly slower on larger string data (in some cases 3-5 times slower than standard string manipulation). That is why I asked about speed rather than suggesting that regex flag right off the bat, and it is precisely why I avoid Regex for string parsing unless it really is necessary to match a pattern (which is what Regex was truly designed for).

    1. I wanted to know if the performance was a consideration here or not (ie. It is not fast enough for his requirements)
    2. I was curious about what kind of source data he was dealing with; he has yet to post anything which would provide insight in that regard

    You also can't judge by line count without knowing how many chars there are per line, plus the other factors that affect performance, not to mention the regex pattern itself.

    Although, to do something just because another programmer does it is nonsense. You may be able to learn from a better programmer, but to strive to do everything they do is silly.

    My argument was that there should be better formulated standards for .NET languages. I didn't say because MSDN says to put () after or before the type in your array declarations that you should do it. I pointed it out as a mention because it's something I've been watching for a while now. I don't see how you can claim my "argument" to be nitpicking because I never instigated one, I posted it as a tidbit of information just for the curious, because it had caught my mind before I started brainstorming my relevant statement for this thread.

    IMHO you didn't get that Hungarian notation habit from a good .NET programmer, wherever it came from. Read this article: http://10rem.net/articles/net-naming...best-practices (See: Why Hungarian Has Fallen Out of Favor with .NET)

    ~Ace
    Last edited by AceInfinity; Nov 28th, 2013 at 04:37 PM.

  13. #13
    Fanatic Member AceInfinity's Avatar
    Join Date
    May 2011
    Posts
    696

    Re: Extract word between &quot;

    csharp Code:
    public static void SomeMethod()
    {
        const string findStr = @"&quot;";
        List<string> words = GetAllWords(File.ReadAllText(@"Z:\quotes.txt"), findStr);
        words.ForEach(Console.WriteLine);
    }
    static List<string> GetAllWords(string source, string findStr)
    {
        List<string> words = new List<string>();
        int pos, index = 0;
        while ((pos = source.IndexOf(findStr, index, StringComparison.OrdinalIgnoreCase)) > -1)
        {
            index = pos + findStr.Length;
            pos = source.IndexOf(findStr, index, StringComparison.OrdinalIgnoreCase);
            if (pos > -1)
            {
                words.Add(source.Substring(index, pos - index));
                index = pos + findStr.Length;
            }
        }
        return words;
    }

    This is C# but the same principle still applies to VB.NET. (I had a C# project open at the time, but if you need me to translate this to VB.NET I can write an example together in VB.NET.)

    Here was the test paragraph I used:
    Lorem &quot;ipsum&quot; dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat. Duis autem vel eum &quot;iriure&quot; dolor &quot;in&quot; hendrerit &quot;in&quot; vulputate velit esse molestie consequat, vel &quot;illum&quot; dolore eu feugiat nulla facilisis at vero eros et accumsan et &quot;iusto&quot; odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Nam liber tempor *** soluta nobis eleifend option congue nihil &quot;imperdiet&quot; doming &quot;id&quot; quod mazim placerat facer possim assum. Typi non habent claritatem &quot;insitam&quot;; est usus legentis &quot;in&quot; &quot;iis&quot; qui facit eorum claritatem. &quot;investigationes&quot; demonstraverunt lectores legere me lius quod &quot;ii&quot; legunt saepius. Claritas est etiam processus dynamicus, qui sequitur mutationem consuetudium lectorum. Mirum est notare quam littera gothica, quam nunc putamus parum claram, anteposuerit litterarum formas humanitatis per seacula quarta decima et quinta decima. Eodem modo typi, qui nunc nobis videntur parum clari, fiant sollemnes &quot;in&quot; futurum.&quot;
    Along with the results:
    Code:
    ipsum
    iriure
    in
    in
    illum
    iusto
    imperdiet
    id
    insitam
    in
    iis
    investigationes
    ii
    in
    Quick VB.NET conversion:
    vbnet Code:
    Public Sub SomeMethod()
        Const findStr As String = "&quot;"
        Dim words As List(Of String) = GetAllWords(File.ReadAllText("Z:\quotes.txt"), findStr)
        words.ForEach(Sub(n) Console.WriteLine(n))
    End Sub

    Private Function GetAllWords(source As String, findStr As String) As List(Of String)
        Dim words As New List(Of String)
        Dim index As Integer = 0
        Dim pos As Integer = source.IndexOf(findStr, index, StringComparison.OrdinalIgnoreCase)
        While pos > -1
            index = pos + findStr.Length
            pos = source.IndexOf(findStr, index, StringComparison.OrdinalIgnoreCase)
            If pos > -1 Then
                words.Add(source.Substring(index, pos - index))
                index = pos + findStr.Length
            End If
            pos = source.IndexOf(findStr, index, StringComparison.OrdinalIgnoreCase)
        End While
        Return words
    End Function
    Last edited by AceInfinity; Nov 28th, 2013 at 04:35 PM.

  14. #14

    Thread Starter
    PowerPoster Radjesh Klauke's Avatar
    Join Date
    Dec 2005
    Location
    Sexbierum (Netherlands)
    Posts
    2,244

    Re: Extract word between &quot;

    Don't know what's wrong with me the last few weeks...

    @ident: How do I get the values? It's returning an error: Index was outside the bounds of the array.

    Code:
    For i As Integer = 0 To items.Length
      RichTextBox1.text= items(i).ToString
    Next
    EDIT: Now that I look at the code again, it seems that it will only return numbers...
    How do I return the values between the quotes in the rtb?



  15. #15
    Fanatic Member AceInfinity's Avatar
    Join Date
    May 2011
    Posts
    696

    Re: Extract word between &quot;

    Did you skip my example? I posted a function that basically did 99% of the work.

  16. #16

    Thread Starter
    PowerPoster Radjesh Klauke's Avatar
    Join Date
    Dec 2005
    Location
    Sexbierum (Netherlands)
    Posts
    2,244

    Re: Extract word between &quot;

    Ha!! Hi Ace

    Sorry, I was trying to work through ident's example first. I want to understand how it works. After that I'll try yours.

    Code:
    For Each x As String In items
       RichTextBox1.Text = x
    Next
    Seems I'm almost there. Gimme a moment.



  17. #17

    Thread Starter
    PowerPoster Radjesh Klauke's Avatar
    Join Date
    Dec 2005
    Location
    Sexbierum (Netherlands)
    Posts
    2,244

    Re: Extract word between &quot;

    Got it!
    Code:
            For Each x As String In items
                RichTextBox1.Text += x & vbNewLine
            Next
    Now I'll try yours Ace.
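
    For reference, the earlier "Index was outside the bounds of the array" error comes from looping 0 To items.Length: valid indexes stop at Length - 1. The sketch below shows the corrected bounds and a way to build the whole text in one step with String.Join. The items array here is a stand-in for the result of GetDataFromWeb, and the module name is made up.

    ```vbnet
    Imports System

    Module ListItemsExample
        Sub Main()
            ' Stand-in for the array returned by GetDataFromWeb.
            Dim items() As String = {"ipsum", "iriure", "in"}

            ' Valid indexes run from 0 to Length - 1.
            For i As Integer = 0 To items.Length - 1
                Console.WriteLine(items(i))
            Next

            ' One string with a line break between items, built in a single
            ' call instead of appending to the Text property in a loop.
            Dim text As String = String.Join(Environment.NewLine, items)
            Console.WriteLine(text)
        End Sub
    End Module
    ```

    In the form itself, assigning the array to RichTextBox1.Lines, as in post #4, should give the same result without any loop.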



  18. #18

    Thread Starter
    PowerPoster Radjesh Klauke's Avatar
    Join Date
    Dec 2005
    Location
    Sexbierum (Netherlands)
    Posts
    2,244

    Re: Extract word between &quot;

    Ace: Your code works fine also
    Wonder how to use it with an URL. Want to compare the speed of the codes online.
    Everyone +REP. Really helped me out here. A real time-saver.
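
    One possible way to compare the two approaches is a Stopwatch around each, as in the rough sketch below. The synthetic input, the module name and the CountDelimited helper are assumptions for illustration; timings will vary with the real data and should be taken from several runs, not one.

    ```vbnet
    Imports System
    Imports System.Diagnostics
    Imports System.Text
    Imports System.Text.RegularExpressions

    Module TimingExample
        Sub Main()
            ' Synthetic input: 100,000 delimited words (not the original poster's data).
            Dim sb As New StringBuilder()
            For i As Integer = 1 To 100000
                sb.Append("text &quot;word").Append(i).Append("&quot; ")
            Next
            Dim source As String = sb.ToString()

            Dim rx As New Regex("&quot;(.+?)&quot;")
            Dim sw As Stopwatch = Stopwatch.StartNew()
            ' Reading Count forces every match to be evaluated.
            Dim regexCount As Integer = rx.Matches(source).Count
            sw.Stop()
            Console.WriteLine("Regex:   {0} matches in {1} ms", regexCount, sw.ElapsedMilliseconds)

            sw = Stopwatch.StartNew()
            Dim indexOfCount As Integer = CountDelimited(source, "&quot;")
            sw.Stop()
            Console.WriteLine("IndexOf: {0} matches in {1} ms", indexOfCount, sw.ElapsedMilliseconds)
        End Sub

        ' Hypothetical helper: counts delimiter-enclosed words with plain IndexOf scanning.
        Private Function CountDelimited(source As String, delimiter As String) As Integer
            Dim count As Integer = 0
            Dim index As Integer = 0
            Do
                Dim openPos As Integer = source.IndexOf(delimiter, index, StringComparison.Ordinal)
                If openPos < 0 Then Exit Do
                Dim closePos As Integer = source.IndexOf(delimiter, openPos + delimiter.Length, StringComparison.Ordinal)
                If closePos < 0 Then Exit Do
                count += 1
                index = closePos + delimiter.Length
            Loop
            Return count
        End Function
    End Module
    ```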



  19. #19
    Fanatic Member AceInfinity's Avatar
    Join Date
    May 2011
    Posts
    696

    Re: [RESOLVED] Extract word between &quot;

    To use it with a URL, all you have to do is pass the page source (as a string that contains the page HTML) to my function, along with the search string as "&quot;". Easiest way is to use the WebClient wrapper.
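
    A minimal sketch of that idea follows. The address is a hypothetical placeholder, the module name is made up, and GetAllWords is restated here in a slightly condensed form so the example is self-contained; it is meant to follow the same logic as the function in post #13.

    ```vbnet
    Imports System
    Imports System.Collections.Generic
    Imports System.Net

    Module WebExample
        Sub Main()
            ' Hypothetical address; substitute the page you want to parse.
            Dim address As String = "http://example.com/quotes.html"

            Dim html As String
            Using client As New WebClient()
                ' Downloads the page source as a single string.
                html = client.DownloadString(address)
            End Using

            For Each word As String In GetAllWords(html, "&quot;")
                Console.WriteLine(word)
            Next
        End Sub

        ' Collects the text between each pair of findStr delimiters.
        Private Function GetAllWords(source As String, findStr As String) As List(Of String)
            Dim words As New List(Of String)
            Dim index As Integer = 0
            Do
                Dim startPos As Integer = source.IndexOf(findStr, index, StringComparison.OrdinalIgnoreCase)
                If startPos < 0 Then Exit Do
                Dim wordStart As Integer = startPos + findStr.Length
                Dim endPos As Integer = source.IndexOf(findStr, wordStart, StringComparison.OrdinalIgnoreCase)
                If endPos < 0 Then Exit Do
                words.Add(source.Substring(wordStart, endPos - wordStart))
                index = endPos + findStr.Length
            Loop
            Return words
        End Function
    End Module
    ```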

  20. #20
    Bad man! ident's Avatar
    Join Date
    Mar 2009
    Location
    Cambridge
    Posts
    5,398

    Re: [RESOLVED] Extract word between &quot;

    @Mr. Read MSDN: you back up your point with an article that uses VB6 hangovers? vbCrLf is a constant that returns a string containing a carriage return and a line feed. The .NET standard for that is ControlChars.CrLf. And MsgBox? Good article.

    MsgBox("hello" & vbCrLf & "goodbye") you

  21. #21
    Fanatic Member AceInfinity's Avatar
    Join Date
    May 2011
    Posts
    696

    Re: [RESOLVED] Extract word between &quot;

    Quote Originally Posted by ident View Post
    @Mr. Read MSDN: you back up your point with an article that uses VB6 hangovers? vbCrLf is a constant that returns a string containing a carriage return and a line feed. The .NET standard for that is ControlChars.CrLf. And MsgBox? Good article.

    MsgBox("hello" & vbCrLf & "goodbye") you
    The points in the article I specifically linked to still stand as far as naming conventions go. There are errors on MSDN too, but by no means does that mean the entire site is bogus; inadequate judgement would dictate that it is. Many avoid the use of Hungarian notation in .NET for comparable reasons, and it is not part of the standard .NET naming conventions currently outlined in MSDN's design guidelines documentation.

    I could provide another link: http://stackoverflow.com/questions/1...ion-in-c-sharp (Jon Skeet provides an MSDN link there with the naming conventions that are standard for .NET in one of the top answers. You can read them if you like, but you should anyways.)
    Quote Originally Posted by Jon Skeet
    Private names are up to you, but I tend to follow the same conventions as for everything else. Hungarian notation (in the style of Win32) is discouraged, although many places use "m_" or "_" as a prefix for instance variables.
    Discouraged, combined with the fact that it is still used, doesn't mean it is a proper habit however.

    Last edited by AceInfinity; Dec 1st, 2013 at 09:21 PM.
