[RESOLVED] Disable adobe reader plugin in Webbrowser

**JXDOS** · Jul 21st, 2013, 12:26 AM

Hi all,

I am currently trying to write a web scraping program. It should work fine, because if I run the search in Internet Explorer or Chrome the results appear as text (HTML). However, when I load the page in the WebBrowser control in VB.NET 2010, it always loads with the Adobe Reader plugin embedded within the webpage (PDF). Is there a way to disable this plugin, or to tell the WebBrowser that it does not have the plugin?

Thanks in advance

**ident** · Jul 21st, 2013, 06:24 AM

Is there a particular reason why you are using the browser control if you are simply scraping the site?

**JXDOS** · Jul 21st, 2013, 06:56 AM

Originally Posted by ident

Is there a particular reason why you are using the browser control if you are simply scraping the site?

Yes, because I am using Document.Body.InnerText to avoid the messy highlighting/link markup in the HTML source.

**ident** · Jul 21st, 2013, 07:00 AM

Yes, but why don't you simply use the WebClient class?

**JXDOS** · Jul 21st, 2013, 07:07 AM

Sorry, what do you mean? Can you give an example? I think it's because my program performs a search that involves extracting links from the first results page, browsing through those links and scraping the results from the second set of pages. The second set of results therefore depends on the links scraped from the first, rather than being a fixed set of URLs that could be scraped using the WebClient class.

Not sure if the above makes sense? O.o Sorry for the confusion.

**ident** · Jul 21st, 2013, 07:14 AM

A web browser is a UI element, and you are not using it as one. If all you want is the page's HTML, use the WebClient class.

vb Code:

Public Class Form1
 
    Private Sub Form1_Load(ByVal sender As System.Object,
                           ByVal e As System.EventArgs) Handles MyBase.Load
        Dim source As String = Nothing
        Using wClient As New Net.WebClient
            Try
                source = wClient.DownloadString(New Uri("url"))
            Catch ex As Net.WebException
                MessageBox.Show(ex.Message)
            End Try
        End Using
 
        ' do what ever with the pages source....
    End Sub
End Class

In practice you would want to download the page using the async method (DownloadStringAsync), because the code above blocks the calling thread. But it's enough to give you an idea.
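A minimal sketch of the asynchronous version, assuming a Windows Forms project on .NET 4.0 (the VB 2010 default). `WebClient.DownloadStringAsync` returns immediately and raises `DownloadStringCompleted` when the page arrives; because the call is made from the UI thread, the event is raised back on that thread. The form name, the `http://example.com/` URL and `ProcessSource` are hypothetical placeholders.

```vb
Imports System
Imports System.Net
Imports System.Windows.Forms

Public Class AsyncDownloadForm
    Inherits Form

    ' Held as a field so it stays alive until the download completes.
    Private WithEvents wClient As New WebClient()

    Private Sub AsyncDownloadForm_Load(ByVal sender As Object,
                                       ByVal e As EventArgs) Handles MyBase.Load
        ' Starts the download and returns immediately; the UI stays responsive.
        wClient.DownloadStringAsync(New Uri("http://example.com/"))
    End Sub

    ' Raised when the download finishes, fails or is cancelled.
    Private Sub wClient_DownloadStringCompleted(ByVal sender As Object,
            ByVal e As DownloadStringCompletedEventArgs) Handles wClient.DownloadStringCompleted
        If e.Cancelled Then Return
        If e.Error IsNot Nothing Then
            MessageBox.Show(e.Error.Message)
            Return
        End If
        ProcessSource(e.Result)
    End Sub

    ' Hypothetical placeholder for "do whatever with the page's source".
    Private Sub ProcessSource(ByVal source As String)
        Me.Text = String.Format("Downloaded {0} characters", source.Length)
    End Sub

    Private Sub AsyncDownloadForm_FormClosed(ByVal sender As Object,
            ByVal e As FormClosedEventArgs) Handles Me.FormClosed
        ' Cancel any download still in flight, then release the client.
        wClient.CancelAsync()
        wClient.Dispose()
    End Sub
End Class
```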

**JXDOS** · Jul 21st, 2013, 07:57 AM

Yeah, there are a few issues with using that. The main one is that I already have code in place that uses the browser, mainly because the online database requires a login password on the first search.

The second issue is that I need to inject variables systematically (names and dates) to perform the search on the website, which gives me a list of links. I then browse through that second list of links and extract only WebBrowser1.Document.Body.InnerText, as an easier way to capture the wanted extract in a readable format without the leftover HTML code.
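The "visit each link and grab `Document.Body.InnerText`" step can be driven from the `DocumentCompleted` event instead of a blocking loop. This is a sketch under several assumptions: the login and search have already been done in the same control (so the session carries over), the first results page has finished loading when `StartScrape` is called, and the results are plain `<a>` elements whose `href` contains the hypothetical marker `"/record/"`. The form and member names are illustrative only.

```vb
Imports System.Collections.Generic
Imports System.Windows.Forms

Public Class ScraperForm
    Inherits Form

    Private WithEvents browser As New WebBrowser()
    ' Links collected from the first results page, visited one at a time.
    Private ReadOnly pendingLinks As New Queue(Of String)()
    ' Readable text extracted from each visited page.
    Private ReadOnly texts As New List(Of String)()
    Private scraping As Boolean = False

    Public Sub New()
        browser.Dock = DockStyle.Fill
        browser.ScriptErrorsSuppressed = True
        Controls.Add(browser)
    End Sub

    Public ReadOnly Property PageTexts As List(Of String)
        Get
            Return texts
        End Get
    End Property

    ' Call once the first results page has finished loading.
    Public Sub StartScrape()
        If browser.Document Is Nothing Then Return
        For Each link As HtmlElement In browser.Document.GetElementsByTagName("a")
            Dim href As String = link.GetAttribute("href")
            ' Hypothetical filter: keep only links that point at result records.
            If Not String.IsNullOrEmpty(href) AndAlso href.Contains("/record/") Then
                pendingLinks.Enqueue(href)
            End If
        Next
        scraping = True
        NavigateNext()
    End Sub

    Private Sub NavigateNext()
        If pendingLinks.Count = 0 Then
            scraping = False
            Return
        End If
        browser.Navigate(pendingLinks.Dequeue())
    End Sub

    Private Sub browser_DocumentCompleted(ByVal sender As Object,
            ByVal e As WebBrowserDocumentCompletedEventArgs) Handles browser.DocumentCompleted
        ' DocumentCompleted also fires per frame; only react to the top-level page.
        If Not scraping OrElse e.Url <> browser.Url Then Return
        If browser.Document.Body IsNot Nothing Then
            texts.Add(browser.Document.Body.InnerText)
        End If
        NavigateNext()
    End Sub
End Class
```

Chaining each navigation from the previous page's completion keeps the control processing one page at a time, which matches how the second set of links depends on the first.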

I don't know why the results show in a pdf reader when used in my program but not through any other browsers.

**dunfiddlin** · Jul 21st, 2013, 11:16 AM

I am currently trying to program a web scrapping program

How very Luddite!

I don't know why the results show in a pdf reader when used in my program but not through any other browsers.

I suspect that you need to find out why before any progress can be made. It could be a browser recognition problem: unlike Internet Explorer itself, the WebBrowser control does not announce itself as an advanced browser (by default it emulates IE7). Having said that, it seems a little counter-intuitive for the site to default to the more complicated format if it cannot determine the browser's capabilities.

As I seem to have said a lot recently, the control of plug-ins etc. is handled in Windows by Internet Options, a separation of powers which is intended to make it impossible for a programmer to interfere with the user's personal choices. That means there is no way (or at least none that I know of) to change those settings on the fly (which, on balance, is probably a good thing!)
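If it does turn out to be a browser recognition problem, one thing to try is sending a different `User-Agent` header through the four-argument `WebBrowser.Navigate(urlString, targetFrameName, postData, additionalHeaders)` overload. This is a sketch that assumes the site chooses HTML or PDF based on the user agent, which nobody in the thread has confirmed. The UA string is only an example, and the header applies to that single navigation, not to links the control follows afterwards.

```vb
Imports Microsoft.VisualBasic
Imports System.Windows.Forms

Public Module UserAgentNavigation

    ' Example desktop IE 10 user-agent string; substitute whatever the site expects.
    Public Const ExampleUserAgent As String = "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)"

    ' Builds the additional header block; each header line must end with CRLF.
    Public Function BuildHeaders(ByVal userAgent As String) As String
        Return "User-Agent: " & userAgent & vbCrLf
    End Function

    ' Navigates while overriding the user agent for this one request only.
    Public Sub NavigateAs(ByVal browser As WebBrowser, ByVal url As String,
                          ByVal userAgent As String)
        browser.Navigate(url, Nothing, Nothing, BuildHeaders(userAgent))
    End Sub
End Module
```

For example, `NavigateAs(WebBrowser1, searchUrl, ExampleUserAgent)`, where `searchUrl` stands for whatever URL you currently pass to `Navigate`.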

**JXDOS** · Jul 21st, 2013, 08:19 PM

Is there some setting in either Internet Explorer or Adobe Reader that I can change to handle this?

**sspoke** · Jul 21st, 2013, 09:39 PM

It would be impossible to scrape a PDF page anyway, wouldn't it? It's not HTML. It's like opening an .exe file in Internet Explorer: you'll just get a bunch of symbols.

If you open a PDF in Internet Explorer with view-source:http://to.com/file.pdf, what do you get?

That's the answer too. Just use

Code:

view-source:http://www.bapio.co.uk/uploads/publications/1342172154.pdf

if the file extension is .pdf instead of .html, .php etc.

Here is what your scraper will see; it's not HTML code:

Code:

%PDF-1.2 
%âãÏÓ
 
9 0 obj
<<
/Length 10 0 R
/Filter /FlateDecode 
>>
stream
H‰ÍÑJÃ0†Ÿ ïð{§²fç$M“ínÒ-‚[&jeŠâÛÛ¤ñ~‚$ÉÉÿ}ÉÉ…¬Ij«¬ÌsÀ—‚Ç~€XÖ-],÷‚$Y—÷Ó)ü'N«u*1!œ„ÀVÙ?ŸÁ?
žb1RbbœÒ‰ÉH²[¹™TD:#ž&Ø*ÙÌX®¦øiç»$qnf¬ƒ¿†¶]»ÀõËîãaÿ¶{ÿÂØ£‰›×q|JªLs]™QÒI¸¬jî„%¯Œ9Øé`ßà¦º¼ÅU»itezÛ$›’Ú¿OeBÆÄ’Ò¯á¸Råþ@zÜ—úóÿgª¼ø<õ¡ª
endstream
endobj
10 0 obj
246
endobj
4 0 obj
<<
/Type /Page
/Parent 5 0 R
/Resources <<
/Font <<
/F0 6 0 R 
/F1 7 0 R 
>>
/ProcSet 2 0 R
>>
/Contents 9 0 R
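Because a PDF normally begins with the `%PDF-` signature visible on the first line of that dump, a scraper that downloads raw bytes can check for it before treating the response as HTML. A minimal sketch; the module and function names are hypothetical, and checking the response's `Content-Type` for `application/pdf` would be a complementary test.

```vb
Imports System
Imports System.Text

Public Module PdfSniffer

    ' PDF files normally start with the ASCII bytes "%PDF-" followed by the version.
    Private ReadOnly PdfSignature As Byte() = Encoding.ASCII.GetBytes("%PDF-")

    ' Returns True when the downloaded bytes start with the PDF signature.
    Public Function LooksLikePdf(ByVal data As Byte()) As Boolean
        If data Is Nothing OrElse data.Length < PdfSignature.Length Then Return False
        For i As Integer = 0 To PdfSignature.Length - 1
            If data(i) <> PdfSignature(i) Then Return False
        Next
        Return True
    End Function
End Module
```

For example, `LooksLikePdf(wClient.DownloadData(someUrl))` tells you whether to skip the HTML parsing step for that URL.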

**JXDOS** · Jul 22nd, 2013, 12:35 AM

Originally Posted by sspoke

It would be impossible to scrape a PDF page anyway, wouldn't it? It's not HTML. It's like opening an .exe file in Internet Explorer: you'll just get a bunch of symbols.

I don't just mean the actual file, but rather an embedded PDF viewer within the webpage. I'm guessing their code detects whether or not the PDF viewer plugin is enabled, and then serves the results either as HTML text or as an embedded PDF in the viewer.

**sspoke** · Jul 22nd, 2013, 01:27 AM

Well, the best you can do is call WebBrowser1.Stop() once WebBrowser1 has finished loading, i.e. when its DocumentCompleted event fires.

It may stop the PDF viewer from loading.
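A sketch of the `Stop()` approach, assuming a form with a `WebBrowser` named `WebBrowser1` as in the earlier posts (created in code here so the example is self-contained). Because `DocumentCompleted` also fires once per frame, the `e.Url` check makes the handler act only when the top-level page is done. Whether `Stop()` actually prevents the embedded viewer from loading depends on the site and is not guaranteed.

```vb
Imports System.Windows.Forms

Public Class StopOnCompleteForm
    Inherits Form

    Private WithEvents WebBrowser1 As New WebBrowser()

    Public Sub New()
        WebBrowser1.Dock = DockStyle.Fill
        Controls.Add(WebBrowser1)
    End Sub

    Private Sub WebBrowser1_DocumentCompleted(ByVal sender As Object,
            ByVal e As WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
        ' Ignore completion events raised for individual frames.
        If e.Url <> WebBrowser1.Url Then Return
        ' Cancel anything still loading, which may include the PDF plugin content.
        WebBrowser1.Stop()
    End Sub
End Class
```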

Or is it an iframe that contains the PDF? Then just delete the iframe and the problem is solved; just detect whether the iframe has a PDF in it first.
Here is code that removes all iframes:

Code:

For Each x As HtmlElement In DirectCast(sender, WebBrowser).Document.GetElementsByTagName("iframe")
	x.OuterHtml = String.Empty
Next

But if it's not an iframe but an embed element, you can try:

Code:

For Each x As HtmlElement In DirectCast(sender, WebBrowser).Document.GetElementsByTagName("embed")
	x.OuterHtml = String.Empty
	' or
	' x.SetAttribute("src", String.Empty)
Next

In Chrome it normally looks like this:

Code:

<embed width="100%" height="100%" name="plugin" src="http://example.com/pdf.pdf" type="application/pdf">
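To handle the "detect if it has a PDF first" part, the two loops can be combined and limited to elements that look like PDFs: an `embed` whose `type` is `application/pdf` (as in the Chrome markup above), or an `embed`/`iframe` whose `src` ends in `.pdf`. This is a sketch; the `.pdf` suffix rule is an assumption, and a site that serves PDFs from URLs without that extension would need a different test. Module and function names are hypothetical.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Windows.Forms

Public Module PdfElementRemover

    ' True when an element's type/src attributes suggest it hosts a PDF.
    ' The ".pdf" extension rule is an assumption about how the site names files.
    Public Function LooksLikePdfElement(ByVal typeAttr As String,
                                        ByVal srcAttr As String) As Boolean
        If String.Equals(typeAttr, "application/pdf", StringComparison.OrdinalIgnoreCase) Then
            Return True
        End If
        If String.IsNullOrEmpty(srcAttr) Then Return False
        ' Drop any query string or fragment before checking the extension.
        Dim path As String = srcAttr.Split("?"c, "#"c)(0)
        Return path.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase)
    End Function

    ' Removes embed/iframe elements that appear to host a PDF.
    ' Intended to be called from a DocumentCompleted handler.
    Public Sub RemovePdfElements(ByVal doc As HtmlDocument)
        If doc Is Nothing Then Return
        Dim toRemove As New List(Of HtmlElement)()
        For Each tagName As String In New String() {"embed", "iframe"}
            For Each el As HtmlElement In doc.GetElementsByTagName(tagName)
                If LooksLikePdfElement(el.GetAttribute("type"), el.GetAttribute("src")) Then
                    toRemove.Add(el)
                End If
            Next
        Next
        ' Collected first, then removed, so no collection is changed mid-loop.
        For Each el As HtmlElement In toRemove
            el.OuterHtml = String.Empty
        Next
    End Sub
End Module
```

In the DocumentCompleted handler this would be called as `RemovePdfElements(DirectCast(sender, WebBrowser).Document)`.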

**dunfiddlin** · Jul 22nd, 2013, 11:46 AM

Originally Posted by JXDOS

Is there some setting in either Internet Explorer or Adobe Reader that I can change to handle this?

Was I not clear?

the control of plug-ins etc. is handled

entirely, solely and exclusively

in Windows by Internet Options

Better?

**JXDOS** · Jul 23rd, 2013, 12:19 AM

Thanks for the help, guys. sspoke's solution seems to do the trick.
