-
Nov 25th, 2013, 05:11 AM
#1
Thread Starter
PowerPoster
[RESOLVED] Extract word between "
Heya.
I have an html file, which I imported into an rtb.
Does anyone know how I can extract all words that are between "...myword..."
Need to list them in another rtb.
Thanks for the help in advance.
Last edited by Radjesh Klauke; Nov 25th, 2013 at 05:14 AM.
-
Nov 25th, 2013, 08:19 AM
#2
Re: Extract word between "
you could use a simple regex:
Code:
Imports System.Text.RegularExpressions

Public Class Form1

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        ' Fill the box with test data; inside a VB string literal, "" is one literal double quote
        For x As Integer = 1 To 100
            RichTextBox1.Text &= String.Format("""word{0}""", x)
        Next
    End Sub

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        ' Opening quote, as few characters as possible (captured), closing quote
        Dim rx As New Regex("""(.+?)""")
        Dim words() As String = rx.Matches(RichTextBox1.Text).Cast(Of Match).Select(Function(m) m.Groups(1).Value).ToArray
        Stop ' breaks into the debugger so words can be inspected
    End Sub

End Class
-
Nov 26th, 2013, 03:11 AM
#3
Thread Starter
PowerPoster
Re: Extract word between "
Heya Paul. How have you been? It's been a while.
Thanks for the example. Unfortunately, I can't seem to get the "words" into RichTextBox2. (Now that I think of it, it might be even better to read the html file directly through a stream instead of importing it.)
Can you explain this part of the code?
Code:
Dim words() As String = rx.Matches(RichTextBox1.Text).Cast(Of Match).Select(Function(m) m.Groups(1).Value).ToArray
Stop
Thanks in advance.
-
Nov 26th, 2013, 04:06 AM
#4
Re: Extract word between "
Code:
Dim words() As String = rx.Matches(RichTextBox1.Text).Cast(Of Match).Select(Function(m) m.Groups(1).Value).ToArray
RichTextBox2.Lines = RichTextBox2.Lines.Concat(words).ToArray
-
Nov 26th, 2013, 05:17 AM
#5
Re: Extract word between "
words() is an array of strings. rx.Matches returns a MatchCollection that holds one Match per hit. Cast(Of Match) turns that collection into a typed sequence so LINQ can work on it, and Select uses a lambda expression (Function(m) ...) to pick a value out of each Match. In the pattern (.+?), the ungreedy quantifier +? matches as little as possible, so each match stops at the next closing quote. Every match exposes groups: Groups(0) is the whole match, quotes included, and Groups(1) is only the text captured by the parentheses. If you changed 1 to 0, the group captured would not be what you wanted.
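To make the difference between the two groups concrete, here is a minimal console sketch. The module name and the sample input are made up for illustration; only the pattern comes from the thread.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GroupDemo
    Sub Main()
        ' Hypothetical sample input. Inside a VB string literal, "" is one double quote.
        Dim input As String = "say ""hello"" and ""world"""
        Dim rx As New Regex("""(.+?)""")

        For Each m As Match In rx.Matches(input)
            ' Groups(0) is the whole match, quotes included;
            ' Groups(1) is only what the parentheses captured.
            Console.WriteLine("{0} -> {1}", m.Groups(0).Value, m.Groups(1).Value)
        Next
    End Sub
End Module
```

Group 0 keeps the surrounding quotes and group 1 drops them, which is why the thread's code reads Groups(1).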
-
Nov 26th, 2013, 08:22 AM
#6
Re: Extract word between "
Where is this HTML file. Online or on a hard drive? I don't see any reason to import the html data into a control when it's not being used.
vb Code:
Imports System.IO
Imports System.Net
Imports System.Text.RegularExpressions

Public Class Form1

    ' Inside a VB string literal, "" is one literal double quote
    Private ReadOnly m_pattern As String = """(.+?)"""
    Private ReadOnly m_match As New Regex(Me.m_pattern)

    ''' <summary>
    ''' Reads text from a file and returns an array of
    ''' strings using regular expressions.
    ''' </summary>
    ''' <param name="path">The file path.</param>
    ''' <returns>An array of string.</returns>
    ''' <remarks></remarks>
    Private Function GetDataFromFile(ByVal path As String) As String()
        Return Me.m_match.Matches(File.ReadAllText(path)) _
                         .Cast(Of Match) _
                         .Select(Function(m) m.Groups(1).Value) _
                         .ToArray
    End Function

    ''' <summary>
    ''' Reads text from a web resource and returns an array of
    ''' strings using regular expressions.
    ''' </summary>
    ''' <param name="uri">The uniform resource identifier.</param>
    ''' <returns>An array of string.</returns>
    ''' <remarks>I would normally make use of the using blocks. Just a quick example.</remarks>
    Private Function GetDataFromWeb(ByVal uri As Uri) As String()
        Return Me.m_match.Matches(New WebClient() _
                         .DownloadString(uri)) _
                         .Cast(Of Match) _
                         .Select(Function(m) m.Groups(1).Value) _
                         .ToArray
    End Function

End Class
Last edited by ident; Nov 26th, 2013 at 08:34 AM.
-
Nov 26th, 2013, 09:59 AM
#7
Re: Extract word between "
It gets all matches from RichTextBox1.Text, then creates an array of strings containing each match's captured value, using LINQ and lambdas, as ident told you.
-
Nov 28th, 2013, 07:29 AM
#8
Thread Starter
PowerPoster
Re: Extract word between "
Heya. Thanks for the information.
@ident: Just as a test, I tried this in Button1.Click: MsgBox(GetDataFromWeb(TextBox1.Text)) --> Value of type 'String' cannot be converted to 'System.Uri'.
Tried New Uri... whatever. Can't get it to work.
-
Nov 28th, 2013, 08:34 AM
#9
Re: Extract word between "
GetDataFromWeb returns an array of strings, and it expects a Uri rather than a String.
Code:
Dim items() As String = GetDataFromWeb(New Uri(Me.TextBox1.Text))
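Because New Uri throws when the text box holds something that is not a valid address, it may be worth checking the input first. This is only a sketch: ParseWebUri is a hypothetical helper, not part of the code posted above, and it relies on the framework's Uri.TryCreate.

```vbnet
Imports System

Module UriDemo
    ' Hypothetical helper: returns Nothing unless the text is an absolute http/https URL.
    Function ParseWebUri(text As String) As Uri
        Dim result As Uri = Nothing
        If Uri.TryCreate(text, UriKind.Absolute, result) AndAlso
           (result.Scheme = Uri.UriSchemeHttp OrElse result.Scheme = Uri.UriSchemeHttps) Then
            Return result
        End If
        Return Nothing
    End Function

    Sub Main()
        Console.WriteLine(ParseWebUri("http://www.vbforums.com/") IsNot Nothing)
        Console.WriteLine(ParseWebUri("not a url") IsNot Nothing)
    End Sub
End Module
```

In the form, the result could be passed to GetDataFromWeb only when it is not Nothing.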
-
Nov 28th, 2013, 02:44 PM
#10
Re: Extract word between "
This is one of the things I don't like about Visual Basic. Standards need to be better enforced IMO: http://msdn.microsoft.com/en-us/libr.../h63fsef3.aspx
Should the () go before or after the type for an array? They deviate from that standard within the very article itself. IntelliSense in Visual Studio also shows it the way they suggest not to write it...
Anyways, a little off topic, but how's the speed of Regex in this case? For larger data you may want to think about setting the regex flag for a compiled Regex: http://msdn.microsoft.com/en-us/libr...vs.110%29.aspx Otherwise, use regular string manipulation methods, and maybe a StringBuilder if the data is large enough and lots of string manipulation is required.
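For reference, a rough sketch of what setting that flag could look like. The module, field name, and sample text are placeholders; RegexOptions.Compiled is the real framework option, and whether it pays off depends on how often the same Regex instance is reused.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CompiledDemo
    ' Built once and reused. RegexOptions.Compiled costs more at construction
    ' time in exchange for faster matching afterwards.
    Private ReadOnly QuotedText As New Regex("""(.+?)""", RegexOptions.Compiled)

    Sub Main()
        ' Hypothetical sample text.
        Dim sample As String = "Lorem ""ipsum"" dolor ""sit"" amet"
        For Each m As Match In QuotedText.Matches(sample)
            Console.WriteLine(m.Groups(1).Value)
        Next
    End Sub
End Module
```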
<<<------------
.NET Programming (2012 - 2018)
®Crestron - DMC-T Certified Programmer | Software Developer <<<------------
-
Nov 28th, 2013, 03:18 PM
#11
Re: Extract word between "
Your points are nitpicking. MSDN also suggests using CStr over ToString... I think I'll follow the examples I've seen from dbasnett, Paul, and forum mod Joacim Andersso. To be honest, if a member like sitten uses it, who's not famously known on here but is one of the best VB coders I've ever known on another forum, I don't think I'll worry.
http://www.vbforums.com/showthread.p...nsional-arrays
The argument about regex speed on a few lines is laughable. 100k lines will take, what, 1 second?
-
Nov 28th, 2013, 03:26 PM
#12
Re: Extract word between "
A few lines? For starters, he never posted the source he is parsing, so how do you know what it looks like? Regex can be significantly slower when dealing with larger string data (in some cases up to 3-5 times slower than standard string manipulation)... That's why I asked about speed and didn't suggest using that regex flag right off the bat, but it is precisely the reason why I avoid Regex for string parsing unless it really is necessary to match a pattern (which is what Regex was truly designed for).
1. I wanted to know if the performance was a consideration here or not (ie. It is not fast enough for his requirements)
2. I was curious about what kind of source data he was dealing with; he has yet to post anything which would provide insight in that regard
You also can't judge by lines without knowing how many chars there are per line and the other factors that determine performance, not to mention the regex pattern itself.
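One way to settle the speed question for a given input is simply to measure it. The sketch below is an assumption-laden illustration, not a benchmark: the input is synthetic, the function names are made up, and a single Stopwatch run gives only a rough figure that will vary by machine and data.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Text
Imports System.Text.RegularExpressions

Module TimingDemo
    ' Counts quoted words with the regex used earlier in the thread.
    Function CountWithRegex(source As String) As Integer
        Return Regex.Matches(source, """(.+?)""").Count
    End Function

    ' Counts quoted words by scanning for pairs of double-quote characters.
    Function CountWithIndexOf(source As String) As Integer
        Dim count As Integer = 0
        Dim startPos As Integer = source.IndexOf(""""c)
        While startPos > -1
            Dim endPos As Integer = source.IndexOf(""""c, startPos + 1)
            If endPos = -1 Then Exit While
            count += 1
            startPos = source.IndexOf(""""c, endPos + 1)
        End While
        Return count
    End Function

    Sub Main()
        ' Synthetic input (an assumption): 100,000 copies of a short phrase.
        Dim sb As New StringBuilder()
        For i As Integer = 1 To 100000
            sb.Append("lorem ""word"" ipsum ")
        Next
        Dim source As String = sb.ToString()

        Dim sw As Stopwatch = Stopwatch.StartNew()
        Dim regexCount As Integer = CountWithRegex(source)
        sw.Stop()
        Console.WriteLine("Regex:   {0} words in {1} ms", regexCount, sw.ElapsedMilliseconds)

        sw.Restart()
        Dim indexOfCount As Integer = CountWithIndexOf(source)
        sw.Stop()
        Console.WriteLine("IndexOf: {0} words in {1} ms", indexOfCount, sw.ElapsedMilliseconds)
    End Sub
End Module
```

Running it against the real source data rather than the synthetic string would give a more meaningful comparison.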
Although, to do something just because another programmer does it is nonsense. You may be able to learn from a better programmer, but to strive to do everything they do is silly.
My argument was that there should be better formulated standards for .NET languages. I didn't say because MSDN says to put () after or before the type in your array declarations that you should do it. I pointed it out as a mention because it's something I've been watching for a while now. I don't see how you can claim my "argument" to be nitpicking because I never instigated one, I posted it as a tidbit of information just for the curious, because it had caught my mind before I started brainstorming my relevant statement for this thread.
IMHO you didn't get that Hungarian notation habit from a good .NET programmer, wherever it came from. Read this article: http://10rem.net/articles/net-naming...best-practices (See: Why Hungarian Has Fallen Out of Favor with .NET)
~Ace
Last edited by AceInfinity; Nov 28th, 2013 at 04:37 PM.
-
Nov 28th, 2013, 04:03 PM
#13
Re: Extract word between "
csharp Code:
using System;
using System.Collections.Generic;
using System.IO;

public static void SomeMethod()
{
    // A string containing a single double-quote character
    const string findStr = "\"";
    List<string> words = GetAllWords(File.ReadAllText(@"Z:\quotes.txt"), findStr);
    words.ForEach(Console.WriteLine);
}

static List<string> GetAllWords(string source, string findStr)
{
    List<string> words = new List<string>();
    int pos, index = 0;
    // Find the next opening delimiter
    while ((pos = source.IndexOf(findStr, index, StringComparison.OrdinalIgnoreCase)) > -1)
    {
        index = pos + findStr.Length;
        // Find the matching closing delimiter
        pos = source.IndexOf(findStr, index, StringComparison.OrdinalIgnoreCase);
        if (pos > -1)
        {
            words.Add(source.Substring(index, pos - index));
            index = pos + findStr.Length;
        }
    }
    return words;
}
This is C#, but the same principle applies to VB.NET. (I had a C# project open at the time, but if you need me to translate this, I can put an example together in VB.NET.)
Here was the test paragraph I used:
Lorem "ipsum" dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat. Duis autem vel eum "iriure" dolor "in" hendrerit "in" vulputate velit esse molestie consequat, vel "illum" dolore eu feugiat nulla facilisis at vero eros et accumsan et "iusto" odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Nam liber tempor *** soluta nobis eleifend option congue nihil "imperdiet" doming "id" quod mazim placerat facer possim assum. Typi non habent claritatem "insitam"; est usus legentis "in" "iis" qui facit eorum claritatem. "investigationes" demonstraverunt lectores legere me lius quod "ii" legunt saepius. Claritas est etiam processus dynamicus, qui sequitur mutationem consuetudium lectorum. Mirum est notare quam littera gothica, quam nunc putamus parum claram, anteposuerit litterarum formas humanitatis per seacula quarta decima et quinta decima. Eodem modo typi, qui nunc nobis videntur parum clari, fiant sollemnes "in" futurum."
Along with the results:
Code:
ipsum
iriure
in
in
illum
iusto
imperdiet
id
insitam
in
iis
investigationes
ii
in
Quick VB.NET conversion:
vbnet Code:
Public Sub SomeMethod()
    ' A string containing a single double-quote character
    Const findStr As String = """"
    Dim words As List(Of String) = GetAllWords(File.ReadAllText("Z:\quotes.txt"), findStr)
    words.ForEach(Sub(n) Console.WriteLine(n))
End Sub

Private Function GetAllWords(source As String, findStr As String) As List(Of String)
    Dim words As New List(Of String)
    Dim index As Integer = 0
    ' Position of the first opening delimiter
    Dim pos As Integer = source.IndexOf(findStr, index, StringComparison.OrdinalIgnoreCase)
    While pos > -1
        index = pos + findStr.Length
        ' Position of the matching closing delimiter
        pos = source.IndexOf(findStr, index, StringComparison.OrdinalIgnoreCase)
        If pos > -1 Then
            words.Add(source.Substring(index, pos - index))
            index = pos + findStr.Length
        End If
        ' Position of the next opening delimiter, or -1 when none is left
        pos = source.IndexOf(findStr, index, StringComparison.OrdinalIgnoreCase)
    End While
    Return words
End Function
Last edited by AceInfinity; Nov 28th, 2013 at 04:35 PM.
-
Nov 29th, 2013, 03:55 AM
#14
Thread Starter
PowerPoster
Re: Extract word between "
-
Nov 29th, 2013, 03:56 AM
#15
Re: Extract word between "
Did you skip my example? I posted a function that basically did 99% of the work.
-
Nov 29th, 2013, 04:22 AM
#16
Thread Starter
PowerPoster
Re: Extract word between "
Ha!! Hi Ace
Sorry, I was trying to work through ident's example first. I want to understand how it works. After that I'll try yours.
Code:
For Each x As String In items
RichTextBox1.Text = x
Next
Seems I'm almost there. Gimme a moment.
-
Nov 29th, 2013, 04:30 AM
#17
Thread Starter
PowerPoster
Re: Extract word between "
Got it!
Code:
For Each x As String In items
RichTextBox1.Text += x & vbNewLine
Next
Now I'll try yours Ace.
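As an aside, appending to the Text property inside a loop rebuilds the string on every pass. A possible alternative, sketched here with a made-up items array standing in for the one returned by GetDataFromWeb, is to join everything once with String.Join:

```vbnet
Imports System

Module JoinDemo
    Sub Main()
        ' Hypothetical stand-in for the array returned by GetDataFromWeb.
        Dim items() As String = {"ipsum", "iriure", "in"}

        ' One string built in a single call; in the form this could be assigned
        ' to RichTextBox1.Text once instead of appending inside a loop.
        Dim joined As String = String.Join(Environment.NewLine, items)
        Console.WriteLine(joined)
    End Sub
End Module
```

Unlike the loop, this leaves no trailing newline after the last item.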
-
Nov 29th, 2013, 04:48 AM
#18
Thread Starter
PowerPoster
Re: Extract word between "
Ace: Your code works fine also
I wonder how to use it with a URL. I want to compare the speed of both versions against online sources.
Everyone +REP. Really helped me out here. A real time-saver.
-
Nov 29th, 2013, 02:44 PM
#19
Re: [RESOLVED] Extract word between "
To use it with a URL, all you have to do is pass the page source (as a string that contains the page HTML) to my function, along with the search string, which is a single double-quote character ("""" in VB.NET). The easiest way is to use the WebClient wrapper.
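A minimal sketch of that idea follows. The address is a placeholder rather than anything from this thread, the module name is made up, and GetAllWords is a condensed copy of the function posted earlier so the example stands on its own.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Net

Module UrlDemo
    ' Condensed copy of the GetAllWords function from earlier in the thread.
    Function GetAllWords(source As String, findStr As String) As List(Of String)
        Dim words As New List(Of String)
        Dim index As Integer = 0
        Dim pos As Integer = source.IndexOf(findStr, index, StringComparison.OrdinalIgnoreCase)
        While pos > -1
            index = pos + findStr.Length
            pos = source.IndexOf(findStr, index, StringComparison.OrdinalIgnoreCase)
            If pos > -1 Then
                words.Add(source.Substring(index, pos - index))
                index = pos + findStr.Length
            End If
            pos = source.IndexOf(findStr, index, StringComparison.OrdinalIgnoreCase)
        End While
        Return words
    End Function

    Sub Main()
        ' Placeholder address (an assumption, not from the thread).
        Dim address As String = "http://example.com/"

        ' The Using block disposes the WebClient when the download is done.
        Using client As New WebClient()
            Dim html As String = client.DownloadString(address)
            For Each word As String In GetAllWords(html, """")
                Console.WriteLine(word)
            Next
        End Using
    End Sub
End Module
```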
-
Nov 30th, 2013, 10:36 AM
#20
Re: [RESOLVED] Extract word between "
@Mr read msdn: You back up your point with an article that uses VB6 hangovers? vbCrLf is a constant that returns a string containing a carriage return and a line feed. The .NET standard for that is ControlChars.CrLf. And MsgBox? Good article.
MsgBox("hello" & vbCrLf & "goodbye")
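For anyone wondering whether the two names behave differently, a small console sketch (module name made up) suggests they do not: both come from the Microsoft.VisualBasic namespace and hold the same two characters.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module NewLineDemo
    Sub Main()
        ' Both names refer to the same string: carriage return followed by line feed.
        Console.WriteLine(vbCrLf = ControlChars.CrLf)
        Console.WriteLine(ControlChars.CrLf.Length)
    End Sub
End Module
```

So the choice between them is a matter of convention rather than behavior.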
-
Dec 1st, 2013, 09:02 PM
#21
Re: [RESOLVED] Extract word between "
The points in the article I specifically linked to still stand as far as naming conventions go. There are errors on MSDN too, but that by no means makes the entire site bogus; only poor judgement would conclude that it does. Many avoid the use of Hungarian notation in .NET for comparable reasons, and it is not part of the standard .NET naming conventions currently outlined in MSDN's design guidelines documentation.
I could provide another link: http://stackoverflow.com/questions/1...ion-in-c-sharp (Jon Skeet provides an MSDN link there with the naming conventions that are standard for .NET in one of the top answers. You can read them if you like, but you should anyways.)
Originally Posted by Jon Skeet
Private names are up to you, but I tend to follow the same conventions as for everything else. Hungarian notation (in the style of Win32) is discouraged, although many places use "m_" or "_" as a prefix for instance variables.
Discouraged, combined with the fact that it is still used, doesn't mean it is a proper habit however.
Last edited by AceInfinity; Dec 1st, 2013 at 09:21 PM.