-
Jul 21st, 2013, 12:26 AM
#1
Thread Starter
Hyperactive Member
-
Jul 21st, 2013, 06:24 AM
#2
Re: Disable adobe reader plugin in Webbrowser
Is there a particular reason why you are using the browser control if you are simply scraping the site?
-
Jul 21st, 2013, 06:56 AM
#3
Thread Starter
Hyperactive Member
Re: Disable adobe reader plugin in Webbrowser
 Originally Posted by ident
Is there a particular reason why you are using the browser control if you are simply scraping the site?
Yes, because I am using Document.Body.InnerText to avoid some messy highlighting/link code from the HTML source.
If my post has been helpful, please rate it! 
-
Jul 21st, 2013, 07:00 AM
#4
Re: Disable adobe reader plugin in Webbrowser
Yes, but why don't you simply use the WebClient class?
-
Jul 21st, 2013, 07:07 AM
#5
Thread Starter
Hyperactive Member
Re: Disable adobe reader plugin in Webbrowser
Sorry, what do you mean? Can you give an example? I think it's because my program performs a search that involves extracting links from the first results page, browsing through those links, and scraping the results from the second set of pages. Hence the second set of results depends on the links scraped from the first, rather than on a fixed set of URLs that can be scraped using the WebClient class.
Not sure if the above makes sense? O.o Sorry for the confusion.
-
Jul 21st, 2013, 07:14 AM
#6
Re: Disable adobe reader plugin in Webbrowser
A web browser is a UI element. You are not using it as such. If all you want is the page's HTML, then use the WebClient class.
vb Code:
Public Class Form1
    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        Dim source As String = Nothing
        Using wClient As New Net.WebClient
            Try
                source = wClient.DownloadString(New Uri("url"))
            Catch ex As Net.WebException
                MessageBox.Show(ex.Message)
            End Try
        End Using
        ' do what ever with the pages source....
    End Sub
End Class
In real code you would want to download the page using the async method, because DownloadString as written here blocks the calling thread. But it's enough to give you an idea.
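A minimal sketch of a non-blocking version of the same idea, assuming .NET 4.5 or later (for Async/Await and DownloadStringTaskAsync) and a Form1 created by the Windows Forms designer. The address is a placeholder, not a URL from this thread.

```vb
Imports System
Imports System.Net
Imports System.Windows.Forms

Public Class Form1
    ' Async Sub is acceptable here because this is an event handler.
    Private Async Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        Dim source As String = Nothing
        Using wClient As New WebClient()
            Try
                ' Placeholder URL (assumption). The UI thread stays free while the download runs.
                source = Await wClient.DownloadStringTaskAsync(New Uri("http://example.com/"))
            Catch ex As WebException
                MessageBox.Show(ex.Message)
            End Try
        End Using
        ' source is Nothing if the download failed; otherwise it holds the page's HTML.
    End Sub
End Class
```

On older framework versions, WebClient.DownloadStringAsync together with the DownloadStringCompleted event should give the same effect without Await.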
-
Jul 21st, 2013, 07:57 AM
#7
Thread Starter
Hyperactive Member
Re: Disable adobe reader plugin in Webbrowser
Yes, there are a few issues with using that. The main thing is that I have existing code in place that uses the browser, mainly because the online database requires a login password on the first search.
The second issue is that I need to inject variables systematically (names and dates) to perform the search on the website, which gives me a list of links. I then browse through that second list of links to extract only WebBrowser1.Document.Body.InnerText, as an easier way to capture the wanted extract in a readable format without all the HTML code bits left in.
I don't know why the results show in a pdf reader when used in my program but not through any other browsers.
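On the login point: a login requirement does not necessarily rule out WebClient. Below is a rough sketch of a cookie-aware WebClient, assuming the site uses an ordinary form-based login that sets a session cookie. The class name CookieAwareWebClient, the URLs, and the form field names are all hypothetical; the real ones would have to be read from the site's login form. It will not help if the login depends on JavaScript.

```vb
Imports System
Imports System.Collections.Specialized
Imports System.Net

' WebClient that keeps cookies between requests, so a session
' started by a login POST is reused by later downloads.
Public Class CookieAwareWebClient
    Inherits WebClient

    Private ReadOnly _cookies As New CookieContainer()

    Protected Overrides Function GetWebRequest(address As Uri) As WebRequest
        Dim request As WebRequest = MyBase.GetWebRequest(address)
        Dim httpRequest As HttpWebRequest = TryCast(request, HttpWebRequest)
        If httpRequest IsNot Nothing Then
            ' Attach the shared cookie container to every HTTP request.
            httpRequest.CookieContainer = _cookies
        End If
        Return request
    End Function
End Class

Module Example
    Sub Main()
        Using client As New CookieAwareWebClient()
            ' Hypothetical field names and URLs (assumptions).
            Dim form As New NameValueCollection()
            form.Add("username", "me")
            form.Add("password", "secret")
            client.UploadValues("http://example.com/login", form)

            ' The search variables (names and dates) go into the query string here.
            Dim resultsHtml As String =
                client.DownloadString("http://example.com/search?name=Smith&date=2013-07-21")
            Console.WriteLine(resultsHtml.Length)
        End Using
    End Sub
End Module
```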
If my post has been helpful, please rate it! 
-
Jul 21st, 2013, 11:16 AM
#8
Re: Disable adobe reader plugin in Webbrowser
I am currently trying to program a web scrapping program
How very Luddite!
I don't know why the results show in a pdf reader when used in my program but not through any other browsers.
I suspect that you need to find out before any progress can be made. It could be a browser recognition problem. Unlike Internet Explorer the browser does not announce itself as an advanced browser. Having said that, it seems a little counter-intuitive for the site to default to the more complicated format if it cannot determine the browser's capabilities.
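If it does turn out to be a browser recognition problem, one thing to try is sending a different User-Agent header with the request. The sketch below uses the Navigate overload that takes additional headers. The helper name NavigateWithUserAgent and the exact User-Agent string are assumptions, and as far as I know the header only applies to that one navigation, not to links followed afterwards.

```vb
Imports System
Imports System.Windows.Forms
Imports Microsoft.VisualBasic

Public Module UserAgentNavigation
    ' An Internet Explorer 10 style string; which value the site expects is an assumption.
    Private Const ModernUserAgent As String =
        "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)"

    ' Hypothetical helper: navigates with a custom User-Agent header.
    Public Sub NavigateWithUserAgent(browser As WebBrowser, url As String)
        ' Arguments: URL, target frame, POST data, additional headers.
        browser.Navigate(url, Nothing, Nothing, "User-Agent: " & ModernUserAgent & vbCrLf)
    End Sub
End Module
```

Usage would be something like NavigateWithUserAgent(WebBrowser1, "http://example.com/search") in place of WebBrowser1.Navigate(...).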
As I seem to have said a lot recently, the control of plug-ins etc. is handled in Windows by Internet Options, a separation of powers which is intended to make it impossible for a programmer to interfere with the user's personal choices. That means that there is no way (or at least none that I know of) of changing settings on the fly (which, on balance, is probably a good thing!).
As the 6-dimensional mathematics professor said to the brain surgeon, "It ain't Rocket Science!"
Reviews: "dunfiddlin likes his DataTables" - jmcilhinney
Please be aware that whilst I will read private messages (one day!) I am unlikely to reply to anything that does not contain offers of cash, fame or marriage!
-
Jul 21st, 2013, 08:19 PM
#9
Thread Starter
Hyperactive Member
Re: Disable adobe reader plugin in Webbrowser
Is there some setting in either Internet Explorer or Adobe Reader that I can change to handle this?
-
Jul 21st, 2013, 09:39 PM
#10
Addicted Member
Re: Disable adobe reader plugin in Webbrowser
It would be impossible to scrape a PDF page anyway, wouldn't it? It's not HTML. It's like an exe file: if you try to open that in Internet Explorer, you'll get a bunch of symbols.
If you open a PDF page in Internet Explorer with view-source:http://to.com/file.pdf, what do you get?
That's the answer too. Just use
Code:
view-source:http://www.bapio.co.uk/uploads/publications/1342172154.pdf
if the file extension is PDF instead of html/php etc.
Here is what your scraper will see; it's not HTML code:
Code:
%PDF-1.2
%âãÏÓ
9 0 obj
<<
/Length 10 0 R
/Filter /FlateDecode
>>
stream
H‰ÍÑJÃ0†Ÿ ïð{§²fç$M“ínÒ-‚[&jeŠâÛÛ¤ñ~‚$ÉÉÿ}ÉÉ…¬Ij«¬ÌsÀ—‚Ç~€XÖ-],÷‚$Y—÷Ó)ü'N«u*1!œ„ÀVÙ?ŸÁ?
žb1RbbœÒ‰ÉH²[¹™TD:#ž&Ø*ÙÌX®¦øiç»$qnf¬ƒ¿†¶]»ÀõËîãaÿ¶{ÿÂØ£‰›×q|JªLs]™QÒI¸¬jî„%¯Œ9Øé`ß঺¼ÅU»itezÛ$›’Ú¿OeBÆÄ’Ò¯á¸Råþ@zÜ—úóÿgª¼ø<õ¡ª
endstream
endobj
10 0 obj
246
endobj
4 0 obj
<<
/Type /Page
/Parent 5 0 R
/Resources <<
/Font <<
/F0 6 0 R
/F1 7 0 R
>>
/ProcSet 2 0 R
>>
/Contents 9 0 R
-
Jul 22nd, 2013, 12:35 AM
#11
Thread Starter
Hyperactive Member
Re: Disable adobe reader plugin in Webbrowser
 Originally Posted by sspoke
It would be impossible to scrape a PDF page anyway, wouldn't it? It's not HTML.
I don't just mean the actual file, but rather an embedded PDF viewer within the webpage. I'm guessing their code has something to detect whether or not the PDF viewer plugin is enabled, and then feeds the results either in HTML as text or through the PDF viewer as an embedded PDF.
-
Jul 22nd, 2013, 01:27 AM
#12
Addicted Member
Re: Disable adobe reader plugin in Webbrowser
Well, the best you can do is wait until the WebBrowser is done loading and, when the DocumentCompleted event fires, call
webBrowser1.Stop()
It may stop the PDF viewer from loading.
Or is the PDF inside an iframe? Then just delete the iframe and the problem is solved. Just detect whether the iframe holds a PDF first.
Here is some code that removes all iframes.
Code:
For Each x As HtmlElement In DirectCast(sender, WebBrowser).Document.GetElementsByTagName("iframe")
x.OuterHtml = String.Empty
Next
But if it's not an iframe but an embed element instead, you can try
Code:
For Each x As HtmlElement In DirectCast(sender, WebBrowser).Document.GetElementsByTagName("embed")
x.OuterHtml = String.Empty
' or
' x.SetAttribute("src", String.Empty)
Next
In Chrome it normally looks like this:
Code:
<embed width="100%" height="100%" name="plugin" src="http://example.com/pdf.pdf" type="application/pdf">
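The "detect whether it holds a PDF first" step could be sketched roughly as below. The names PdfElementRemover, LooksLikePdf and RemovePdfElements are made up for illustration, and the check (a type of application/pdf, or a src ending in .pdf) is only a heuristic; a site that serves PDFs from a URL without the extension would slip through. Matching elements are collected first and removed afterwards, so the collection is not modified while it is being enumerated.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Windows.Forms

Public Module PdfElementRemover
    ' True when the src/type pair looks like it points at a PDF (heuristic).
    Public Function LooksLikePdf(src As String, mimeType As String) As Boolean
        If Not String.IsNullOrEmpty(mimeType) AndAlso
           mimeType.Trim().Equals("application/pdf", StringComparison.OrdinalIgnoreCase) Then
            Return True
        End If
        If String.IsNullOrEmpty(src) Then Return False
        ' Ignore any query string or fragment before checking the extension.
        Dim path As String = src.Split("?"c, "#"c)(0)
        Return path.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase)
    End Function

    ' Removes iframe and embed elements that appear to display a PDF.
    Public Sub RemovePdfElements(document As HtmlDocument)
        For Each tagName As String In {"iframe", "embed"}
            Dim targets As New List(Of HtmlElement)()
            For Each element As HtmlElement In document.GetElementsByTagName(tagName)
                If LooksLikePdf(element.GetAttribute("src"), element.GetAttribute("type")) Then
                    targets.Add(element)
                End If
            Next
            For Each element As HtmlElement In targets
                element.OuterHtml = String.Empty
            Next
        Next
    End Sub
End Module
```

It would be called from the DocumentCompleted handler, e.g. RemovePdfElements(DirectCast(sender, WebBrowser).Document).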
Last edited by sspoke; Jul 22nd, 2013 at 01:54 AM.
-
Jul 22nd, 2013, 11:46 AM
#13
Re: Disable adobe reader plugin in Webbrowser
 Originally Posted by JXDOS
Is there some setting in either Internet Explorer or Adobe Reader that I can change to handle this?
Was I not clear?
the control of plug-ins etc. is handled
entirely, solely and exclusively
in Windows by Internet Options
Better?
-
Jul 23rd, 2013, 12:19 AM
#14
Thread Starter
Hyperactive Member