-
Find string between two other strings
Hi All,
I am trying to extract a value from an HTML file. I know what markup surrounds the value, so I want to split the HTML document on those markers and take the value between them.
For example:
Code:
<h4>Availability</h4>
<p>
<b>Availability</b>: VALUE IS HERE <br />
</p>
So
Code:
<h4>Availability</h4>
<p>
<b>Availability</b>:
is at the start and <br /> is at the end.
Could someone help?
Cheers
-
Re: Find string between two other strings
Have a look at regex. Search the forum, there are many similar questions that have been answered.
-
Re: Find string between two other strings
OK, I have had a look at regex and downloaded the Expresso software.
But as I don't understand how to use it, I am not sure what expression to put in.
I have tried this:
Code:
"(?<=<b>Availability</b>: ).+?(?=<br />)"
But it gives no matches. This was just a copy-and-paste job; I don't really know what it means.
cheers
-
Re: Find string between two other strings
Try this:
Code:
Dim input As String = "<h4>Availability</h4>" & Environment.NewLine & _
"<p>" & Environment.NewLine & _
"<b>Availability</b>: VALUE IS HERE <br />" & Environment.NewLine & _
"</p>"
Dim pattern As String = "(?<=<b>Availability</b>).*?(?=<br />)"
For Each m As System.Text.RegularExpressions.Match In System.Text.RegularExpressions.Regex.Matches(input, pattern)
MessageBox.Show(m.Value)
Next
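If it helps to see what the pieces of the pattern do, here is a minimal sketch of the same idea wrapped in a reusable function. ExtractBetween and BetweenDemo are hypothetical names made up for this example (they are not part of .NET); the function escapes both delimiters with Regex.Escape so they are matched literally, and trims the result.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module BetweenDemo
    ' Hypothetical helper (not a framework method): returns the text found
    ' between startText and endText, or Nothing when there is no match.
    Function ExtractBetween(source As String, startText As String, endText As String) As String
        ' (?<=...) is a lookbehind: the match must be preceded by startText.
        ' .*?      matches as few characters as possible (lazy).
        ' (?=...)  is a lookahead: the match must be followed by endText.
        Dim pattern As String = "(?<=" & Regex.Escape(startText) & ").*?(?=" & Regex.Escape(endText) & ")"
        ' Singleline lets "." match line breaks too, in case the value wraps.
        Dim m As Match = Regex.Match(source, pattern, RegexOptions.Singleline)
        If m.Success Then Return m.Value.Trim()
        Return Nothing
    End Function

    Sub Main()
        Dim html As String = "<h4>Availability</h4>" & Environment.NewLine & _
                             "<p>" & Environment.NewLine & _
                             "<b>Availability</b>: VALUE IS HERE <br />" & Environment.NewLine & _
                             "</p>"
        ' Prints: VALUE IS HERE
        Console.WriteLine(ExtractBetween(html, "<b>Availability</b>:", "<br />"))
    End Sub
End Module
```

The lookbehind and lookahead only check the surrounding text, so the delimiters themselves are not part of the returned value.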
-
Re: Find string between two other strings
OK, I got this to work when I just created a button and used your code.
But when I try to implement it in my own code it produces very weird results. The code is in the DocumentCompleted procedure.
Here's my code:
vb Code:
For i = start To finish
s = objExcel.Range("E" & i).Value
If s.Contains("RS") Then
s = objExcel.Range("D" & i).Value
status = 2
rs.Navigate("http://uk.rs-online.com/web/search/searchBrowseAction.html?searchProducts&searchTerm=" + s)
r_loaded = False
Do While r_loaded = False
Application.DoEvents()
Loop
ElseIf s.Contains("Farnell") Then
s = objExcel.Range("D" & i).Value
status = 2
farnell.Navigate("http://uk.farnell.com/jsp/search/productdetail.jsp?sku=" + s)
f_loaded = False
Do While f_loaded = False
Application.DoEvents()
Loop
End If
i = +1
Next
vb Code:
Private Sub farnell_DocumentCompleted(ByVal sender As System.Object, ByVal e As System.Windows.Forms.WebBrowserDocumentCompletedEventArgs) Handles farnell.DocumentCompleted
f_loaded = True
If status = 1 Then
ElseIf status = 2 Then
Dim strSource As String
If status = 1 Then
ElseIf status = 2 Then
strSource = farnell.DocumentText
'find stock
Dim input As String = "<h4>Availability</h4>" & Environment.NewLine & _
"<p>" & Environment.NewLine & _
"<b>Availability</b>: VALUE IS HERE <br />" & Environment.NewLine & _
"</p>"
Dim pattern As String = "(?<=<b>Availability</b>.*?(?=<br />)"
For Each m As System.Text.RegularExpressions.Match In System.Text.RegularExpressions.Regex.Matches(input, pattern)
MsgBox(m.Value)
Next
ElseIf status = 3 Then
End If
ElseIf status = 3 Then
End If
End Sub
When it gets to the For Each m loop, execution jumps back to If s.Contains("RS") Then, but the variable i is one less than before, and it is not changed anywhere in between.
Cheers
-
Re: Find string between two other strings
You need to change the input string to the actual HTML source code in the DocumentCompleted event handler:
Code:
Change this:
strSource = farnell.DocumentText
'find stock
Dim input As String = "<h4>Availability</h4>" & Environment.NewLine & _
"<p>" & Environment.NewLine & _
"<b>Availability</b>: VALUE IS HERE <br />" & Environment.NewLine & _
"</p>"
'To this
'find stock
Dim input As String = farnell.DocumentText
-
Re: Find string between two other strings
I had tried that too, but I went back to the hardcoded one to see if it made a difference.
I really can't figure out what is going on. As soon as it gets to the For Each loop, the program jumps from
vb Code:
For Each m As System.Text.RegularExpressions.Match In System.Text.RegularExpressions.Regex.Matches(input, pattern)
to the If s.Contains("RS") Then line in the first loop.
But the variable i is one less than it was before.
For reference, s = 1301912 and i = 3 when DocumentCompleted is processed, but when it jumps back i = 2. That makes s = Nothing, as there is nothing in that cell of the spreadsheet.
Any suggestions?
-
Re: Find string between two other strings
Of course it'll jump back to the loop, because you have this in the For loop:
Code:
f_loaded = False
Do While f_loaded = False
Application.DoEvents()
Loop
What happens is this: when your code executes those lines, f_loaded is False, so it enters the Do While loop. That loop calls Application.DoEvents, which in turn allows the DocumentCompleted event handler to run. In the handler you set f_loaded = True, so the Do While loop exits and your For loop continues with the next iteration.
I don't know what the logic behind all this is, so I can't give you a recommendation. But if you tell me exactly what you are trying to achieve, maybe I can give some suggestions.
-
Re: Find string between two other strings
I have a spreadsheet which has stock codes and the supplier for each one, either RS or Farnell.
The first For loop goes through each row. Each time it gets a stock code, it loads that product in one of the two web browser components.
It then waits for the page to load, and the regex is supposed to find the part of the HTML that shows how many are in stock. This will then be saved in the spreadsheet.
I know it will jump back to the first For loop, but the For Each loop does not run or find any matches, so it seems to jump prematurely. When I step through the code, it does not even go to Next, as you would expect even if there were no matches.
Whereas if you put this code in a button event and click that button, it finds "VALUE IS HERE".
Also, when the code jumps back the variable i is 1 less, and I do not know how this has happened.
Hope this helps.
-
Re: Find string between two other strings
You need to move the line f_loaded = True to the end of the DocumentCompleted sub. This way, you only allow the next iteration of the For loop to run after you've finished working with the HTML document.
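As a rough sketch of that ordering (not a drop-in replacement for your form): StockForm and FindStock are hypothetical names for this example, while farnell and f_loaded are taken from the code posted above. The Try/Finally is my own addition, so that the flag is still set if the parsing throws.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports System.Windows.Forms

Public Class StockForm
    Inherits Form

    Private WithEvents farnell As New WebBrowser()
    Private f_loaded As Boolean = False

    ' Hypothetical helper: pulls the availability text out of the page source.
    Public Shared Function FindStock(html As String) As String
        Dim m As Match = Regex.Match(html, "(?<=<b>Availability</b>:\s*).*?(?=<br />)")
        Return If(m.Success, m.Value.Trim(), Nothing)
    End Function

    Private Sub farnell_DocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs) Handles farnell.DocumentCompleted
        Try
            Dim stock As String = FindStock(farnell.DocumentText)
            ' ... write stock back to the spreadsheet here ...
        Finally
            ' Release the waiting Do While loop only after the page has been
            ' processed, even if the code above throws an exception.
            f_loaded = True
        End Try
    End Sub
End Class
```

Two details in the code posted earlier may also be worth checking, although neither is confirmed in this thread. The line i = +1 assigns the value 1 to i rather than adding 1, so i would read 2 after Next. And the pattern inside the DocumentCompleted handler has no closing bracket after </b>, which makes Regex.Matches throw an ArgumentException instead of returning matches.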
-
Re: Find string between two other strings
OK, I have fixed this problem now, but I have another one: sometimes the HTML has an extra new line before the value.
E.g.
Code:
Dim input As String = "<h4>Availability</h4>" & Environment.NewLine & _
"<p>" & Environment.NewLine & _
"<b>Availability</b>:" & Environment.NewLine & "VALUE IS HERE <br />" & Environment.NewLine & _
"</p>"
Instead of
Code:
"<h4>Availability</h4>" & Environment.NewLine & _
"<p>" & Environment.NewLine & _
"<b>Availability</b>: VALUE IS HERE <br />" & Environment.NewLine & _
"</p>"
This means that these values are not found. Could someone give me an idea of how to change the expression so that it finds the value in both cases?
-
Re: Find string between two other strings
Just change the regex pattern to this:
Code:
Dim pattern As String = "(?<=<b>Availability</b>:[\s\n]*).*?(?=<br />)"
The rest of the code remains the same and that should do it.
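For what it's worth, \s already matches line breaks in .NET regular expressions, so \s* on its own behaves the same as [\s\n]* here. A small sketch to check both layouts (WhitespaceDemo and FindValue are hypothetical names for this example):

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module WhitespaceDemo
    ' Same pattern as above, with [\s\n]* shortened to \s*.
    Public Const Pattern As String = "(?<=<b>Availability</b>:\s*).*?(?=<br />)"

    ' Hypothetical helper: returns the trimmed value, or Nothing if not found.
    Function FindValue(html As String) As String
        Dim m As Match = Regex.Match(html, Pattern)
        Return If(m.Success, m.Value.Trim(), Nothing)
    End Function

    Sub Main()
        Dim sameLine As String = "<b>Availability</b>: VALUE IS HERE <br />"
        Dim nextLine As String = "<b>Availability</b>:" & Environment.NewLine & "VALUE IS HERE <br />"
        ' Both calls print: VALUE IS HERE
        Console.WriteLine(FindValue(sameLine))
        Console.WriteLine(FindValue(nextLine))
    End Sub
End Module
```

The Trim call removes the spaces that the match still picks up around the value when it sits on the same line as the label.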
-
Re: Find string between two other strings
Thanks Stanav,
I had tried using \n but didn't know it needed a [] round it!
I have to do it for a couple of other websites now, so I may be back.
-
Re: Find string between two other strings
This may be a bit off subject, but it is still the same project and still about getting a string value.
I am going to two supplier websites to check stock, as you know. The problem is that one of the sites seems to update the stock text with JavaScript, and this value does not show up in the page source.
But if you highlight it in Firefox and look at the selection source, it does.
E.g.
go to http://uk.rs-online.com/web/search/s...duct&R=0261094
and use the check stock facility on the right. The text changes, but if you look at the source code it has not changed from "In stock for next working day delivery".
This means I have no way of checking how many are in stock.
Any ideas?
-
Re: Find string between two other strings
I could not find the link/button/or whatever it is that reads "check stock facility" on that webpage. There's a link that says "Check a different quantity" which does nothing when clicked. Is that the one you're talking about?
-
Re: Find string between two other strings
Yes, sorry. If you enter a number, say 10, in the text box above the "Check a different quantity" link and then click that link, the text above the text box will change, but this change is not shown in the source.
Cheers
-
Re: Find string between two other strings
I tried that, and yes, the text in that textbox changed, but only because I changed it. That is, whatever value I put in it stays there. In other words, the JavaScript failed to run (I get a script error on that page), so I cannot really see what's happening. BTW, I'm using IE8.
-
Re: Find string between two other strings
Ah, I see. I'm using Firefox and IE7, so there must be a problem with that script on IE8. Have you got another browser installed to take a look?
This is the only bit I have left to do; the rest is done. I suppose they are using AJAX to change the text, but I thought this would still update the source.
It's kind of hard to explain what is happening without you seeing it.
-
Re: Find string between two other strings
OK... I tried it with Firefox and saw what got changed. If you look at the page source code, in this section:
Code:
<div id="lineAtp">
<div class="stockY" id="stockY">In stock for next working day delivery</div>
<!-- LIVE STOCK CHECK -->
<div id="lineCheck">
<form id="form20261094" name="form20261094" action="/web/cart/shoppingCart.html?method=updateShoppingCart" method="post">
<input type="text" size="4" id="qty" maxlength="6" name="qty" onkeydown="return checkCount('form20261094',event);return false;" value="1">
<input type="hidden" size="3" id="stocknum" name="stocknum" value="0261094"/>
<input type="hidden" name="ean" type="text" />
<input type="hidden" name="addProduct" value="Y"/>
</form>
<a href="javascript:void(0);" onclick="checkAvailability('form20261094')">Check a different quantity</a>
</div>
</div>
You can see that the "Check a different quantity" link has no id. So you will have to get the collection of all links on the page (using webbrowser.Document.Links), loop through the collection, and test each one's "onclick" attribute or InnerText to find the right link. Once you find it, you can call InvokeMember("click") on that HtmlElement. Wait a little for the AJAX call to retrieve the value, then use webbrowser.Document.GetElementById to get the "stockY" div. You can now read the InnerText of this div, which contains the data you're after.
-
Re: Find string between two other strings
Hi stan,
I am able to do everything up until actually getting the value. How would you recommend waiting for the value to be updated?
At the moment I only get "In stock for next working day delivery", and the new text seems to load after I have already read it. I have tried using Thread.Sleep, but would this stop the page loading?
Here's my code:
vb Code:
For i2 = 0 To rs.Document.Links.Count - 1
t = rs.Document.Links.Item(i2).InnerText
If rs.Document.Links.Item(i2).InnerText = "Check a different quantity" Then
rs.Document.Links.Item(i2).InvokeMember("click")
System.Threading.Thread.Sleep(2000)
t = rs.Document.GetElementById("stockY").InnerText
End If
Next
The value does change but it does not show in t.
cheers
-
Re: Find string between two other strings
OK, I have got that working now, but I don't seem to be able to change the value in the text box so that I can check a certain amount.
I am using this:
vb Code:
rs.Document.GetElementById("qty").SetAttribute("value", needed)
where needed is a string.
-
Re: Find string between two other strings
OK, I have most of this sorted now. I just have one more regex question, about getting the price. I am able to get it to work on one website but not the other.
Regex I have tried:
Code:
(?<=<td>10 - 40</td>[\s\n]*<td>[\s\n]*).*?(?=</td>)
Need to find £0.059 in here:
Code:
<td>10 - 40</td>
<td>
£0.059
</td>
-
Re: Find string between two other strings
ok sorry done it:
Code:
(?<=<td>10 - 40</td>[\s\n]*<td>[\s\n]*).*?(?=[\s\n]*</td>)
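A quick way to check this against the sample rows, as a minimal sketch: PriceDemo and FindPrice are hypothetical names for this example, and the pattern is the one above with [\s\n]* written as \s* (which matches the same characters). The Trim call is an extra safeguard in case the price line is indented in the real page.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PriceDemo
    ' The expression above, with [\s\n]* shortened to \s*.
    Public Const PricePattern As String = "(?<=<td>10 - 40</td>\s*<td>\s*).*?(?=\s*</td>)"

    ' Hypothetical helper: returns the price text, or Nothing if not found.
    Function FindPrice(html As String) As String
        Dim m As Match = Regex.Match(html, PricePattern)
        Return If(m.Success, m.Value.Trim(), Nothing)
    End Function

    Sub Main()
        Dim html As String = "<td>10 - 40</td>" & Environment.NewLine & _
                             "<td>" & Environment.NewLine & _
                             "£0.059" & Environment.NewLine & _
                             "</td>"
        ' FindPrice returns the text of the price cell for this sample.
        Console.WriteLine(FindPrice(html))
    End Sub
End Module
```

Without the [\s\n]* in the lookahead, the match has to run all the way up to </td>, and since "." does not match a line feed, the earlier attempt found nothing when the closing tag was on its own line.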
-
Re: Find string between two other strings
Quote:
Originally Posted by samtaylor08
ok sorry done it:
Code:
(?<=<td>10 - 40</td>[\s\n]*<td>[\s\n]*).*?(?=[\s\n]*</td>)
Good job. I was pretty sure that you would figure it out yourself, and that was why I didn't follow up on this thread after my last post. It feels very rewarding when you get something accomplished through your own efforts, doesn't it?
Anyway, keep up the good spirits :)