[RESOLVED] Disable adobe reader plugin in Webbrowser
Hi all,
I am currently trying to program a web scrapping program. It works fine when I search using Internet Explorer or Chrome, since the results appear as text (HTML). However, when I load the page in VB.NET 2010, it always loads with the Adobe Reader plugin embedded within the webpage (PDF). Is there a way I can disable this plugin, or tell the WebBrowser control that it does not have this plugin?
Thanks in advance
Re: Disable adobe reader plugin in Webbrowser
Is there a particular reason why you are using the browser control if you are simply scraping the site?
Re: Disable adobe reader plugin in Webbrowser
Quote:
Originally Posted by
ident
Is there a particular reason why you are using the browser control if you are simply scraping the site?
Yes, because I am using the document's InnerText property to avoid some messy highlighting/link code from the HTML source.
Re: Disable adobe reader plugin in Webbrowser
Yes, but why don't you simply use the WebClient class?
Re: Disable adobe reader plugin in Webbrowser
Sorry, what do you mean? Can you give an example? I think it's because my program performs a search that involves extracting links from the first results page, browsing through these links and scraping the results from the second set of pages. Hence the second set of results depends on the links scraped from the first, rather than on a fixed set of URLs that can be scraped using the WebClient class.
Not sure if the above makes sense? O.o Sorry for the confusion.
Re: Disable adobe reader plugin in Webbrowser
A web browser is a UI element. You are not using it as such. If all you want is the page's HTML, then use the WebClient class.
vb Code:
Public Class Form1
    Private Sub Form1_Load(ByVal sender As System.Object,
                           ByVal e As System.EventArgs) Handles MyBase.Load
        Dim source As String = Nothing
        Using wClient As New Net.WebClient
            Try
                ' "url" is a placeholder for the absolute address of the page
                source = wClient.DownloadString(New Uri("url"))
            Catch ex As Net.WebException
                MessageBox.Show(ex.Message)
            End Try
        End Using
        ' do whatever with the page's source....
    End Sub
End Class
In practice you would want to download the page using the async method, because DownloadString as used here blocks the calling thread. But it's enough to give you an idea.
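For reference, a rough sketch of the non-blocking version, using WebClient's event-based DownloadStringAsync (VB 2010 has no Async/Await). It assumes a WinForms project where Form1 is the designer-generated form; the field name wClient and the handler name are illustrative only, and "url" is still a placeholder that must be replaced with an absolute address.

```vb
Imports System.Net

Public Class Form1
    ' Held as a field (not in a Using block) so the client stays alive
    ' until the download finishes.
    Private WithEvents wClient As New WebClient()

    Private Sub Form1_Load(ByVal sender As Object,
                           ByVal e As EventArgs) Handles MyBase.Load
        ' Returns immediately; the result arrives in DownloadStringCompleted.
        ' "url" is a placeholder, replace it with an absolute address.
        wClient.DownloadStringAsync(New Uri("url"))
    End Sub

    Private Sub wClient_DownloadStringCompleted(ByVal sender As Object,
            ByVal e As DownloadStringCompletedEventArgs) Handles wClient.DownloadStringCompleted
        If e.Cancelled Then Return
        If e.Error IsNot Nothing Then
            ' Network and HTTP failures are reported here instead of being thrown.
            MessageBox.Show(e.Error.Message)
            Return
        End If
        Dim source As String = e.Result
        ' do whatever with the page's source....
    End Sub
End Class
```

Errors come back through e.Error rather than a Try/Catch around the call, so check it before reading e.Result.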
Re: Disable adobe reader plugin in Webbrowser
Yeah, there are a few issues with using that. The main thing is that I have existing code in place that uses the browser, mainly because the online database requires a login password on the first search.
The second issue is that I need to inject variables systematically (names and dates) to perform the search on the website which will give me a list of links. I then browse through the second list of links to extract the WebBrowser1.Document.Body.InnerText only, as an easier way to capture the extract wanted in a readable format without all the html code bits left.
I don't know why the results show in a pdf reader when used in my program but not through any other browsers.
Re: Disable adobe reader plugin in Webbrowser
Quote:
I am currently trying to program a web scrapping program
How very Luddite!
Quote:
I don't know why the results show in a pdf reader when used in my program but not through any other browsers.
I suspect that you need to find out before any progress can be made. It could be a browser recognition problem. Unlike Internet Explorer, the browser control does not announce itself as an advanced browser. Having said that, it seems a little counter-intuitive for the site to default to the more complicated format if it cannot determine the browser's capabilities.
As I seem to have said a lot recently, the control of plug-ins etc. is handled in Windows by Internet Options, a separation of powers which is intended to make it impossible for a programmer to interfere with the user's personal choices. That means that there is no way (or at least none that I know of) of changing settings on the fly (which, on balance, is probably a good thing!)
Re: Disable adobe reader plugin in Webbrowser
Is there some setting in either internet explorer or adobe reader that I can change to handle this?
Re: Disable adobe reader plugin in Webbrowser
It would be impossible to scrape a PDF page anyway; it's not HTML. It's like trying to open an exe file in Internet Explorer: you'll get a bunch of symbols.
If you open a PDF page in Internet Explorer with view-source:http://to.com/file.pdf, what do you get?
That's the answer too. Just use
Code:
view-source:http://www.bapio.co.uk/uploads/publications/1342172154.pdf
if the file extension is PDF instead of html/php etc.
Here is what your scraper will see; it's not HTML code:
Code:
%PDF-1.2
%âãÏÓ
9 0 obj
<<
/Length 10 0 R
/Filter /FlateDecode
>>
stream
H‰ÍÑJÃ0†Ÿ ïð{§²fç$M“ínÒ-‚[&jeŠâÛÛ¤ñ~‚$ÉÉÿ}ÉÉ…¬Ij«¬ÌsÀ—‚Ç~€XÖ-],÷‚$Y—÷Ó)ü'N«u*1!œ„ÀVÙ?ŸÁ?
žb1RbbœÒ‰ÉH²[¹™TD:#ž&Ø*ÙÌX®¦øiç»$qnf¬ƒ¿†¶]»ÀõËîãaÿ¶{ÿÂØ£‰›×q|JªLs]™QÒI¸¬jî„%¯Œ9Øé`ß঺¼ÅU»itezÛ$›’Ú¿OeBÆÄ’Ò¯á¸Råþ@zÜ—úóÿgª¼ø<õ¡ª
endstream
endobj
10 0 obj
246
endobj
4 0 obj
<<
/Type /Page
/Parent 5 0 R
/Resources <<
/Font <<
/F0 6 0 R
/F1 7 0 R
>>
/ProcSet 2 0 R
>>
/Contents 9 0 R
Re: Disable adobe reader plugin in Webbrowser
Quote:
Originally Posted by
sspoke
It would be impossible to scrape a PDF page anyway; it's not HTML. It's like trying to open an exe file in Internet Explorer: you'll get a bunch of symbols.
I don't just mean the actual file, but rather an embedded PDF viewer within the webpage. I'm guessing their code has something to detect whether or not the PDF viewer plugin is enabled, and then feeds results either in HTML as text or through PDF viewer as an embedded PDF.
Re: Disable adobe reader plugin in Webbrowser
Well, the best you can do is call this once the WebBrowser has finished loading, i.e. when the DocumentCompleted event fires:
webBrowser1.Stop()
It may stop the PDF viewer from loading.
Or is the PDF inside an iframe? Then just delete the iframe and the problem is solved; just detect whether the iframe contains a PDF first.
Here is code that removes all iframes.
Code:
For Each x As HtmlElement In DirectCast(sender, WebBrowser).Document.GetElementsByTagName("iframe")
    x.OuterHtml = String.Empty
Next
But if it's not an iframe but an embed element instead, you can try
Code:
For Each x As HtmlElement In DirectCast(sender, WebBrowser).Document.GetElementsByTagName("embed")
    x.OuterHtml = String.Empty
    ' or
    ' x.SetAttribute("src", String.Empty)
Next
In Chrome it normally looks like this:
Code:
<embed width="100%" height="100%" name="plugin" src="http://example.com/pdf.pdf" type="application/pdf">
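The loops above remove every iframe or embed on the page. If only the PDF ones should go, a check along these lines might work. It is only a sketch: LooksLikePdf is a hypothetical helper, the ".pdf" and "application/pdf" tests are guesses at how the site marks its viewer, and WebBrowser1 is assumed to be the designer-created control on the form.

```vb
Imports System.Collections.Generic
Imports System.Windows.Forms

Public Class Form1
    Private Sub WebBrowser1_DocumentCompleted(ByVal sender As Object,
            ByVal e As WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
        Dim doc As HtmlDocument = DirectCast(sender, WebBrowser).Document
        If doc Is Nothing Then Return

        ' Collect the matches first, so nothing is removed from the page
        ' while its element collections are still being enumerated.
        Dim pdfHosts As New List(Of HtmlElement)
        For Each tagName As String In New String() {"iframe", "embed"}
            For Each el As HtmlElement In doc.GetElementsByTagName(tagName)
                If LooksLikePdf(el) Then pdfHosts.Add(el)
            Next
        Next

        For Each el As HtmlElement In pdfHosts
            el.OuterHtml = String.Empty
        Next
    End Sub

    ' Hypothetical helper: treats an element as a PDF host if its src
    ' mentions ".pdf" or its type attribute is application/pdf.
    Private Shared Function LooksLikePdf(ByVal el As HtmlElement) As Boolean
        Dim src As String = el.GetAttribute("src")
        Dim mime As String = el.GetAttribute("type")
        If src Is Nothing Then src = String.Empty
        Return src.ToLowerInvariant().Contains(".pdf") OrElse
               String.Equals(mime, "application/pdf", StringComparison.OrdinalIgnoreCase)
    End Function
End Class
```

If the site serves the PDF from a URL without a .pdf extension and without a type attribute, this check will miss it, and removing all iframes/embeds as above is the fallback.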
Re: Disable adobe reader plugin in Webbrowser
Quote:
Originally Posted by
JXDOS
Is there some setting in either internet explorer or adobe reader that I can change to handle this?
Was I not clear?
Quote:
the control of plug-ins etc. is handled
entirely, solely and exclusively
Quote:
in Windows by Internet Options
Better?
Re: Disable adobe reader plugin in Webbrowser
Thanks for the help guys. sspoke's solution seems to do the trick :)