Thread: Help needed translating c# code

  1. #1

    Thread Starter
    Addicted Member
    Join Date
    Sep 2018
    Posts
    160

    Help needed translating c# code

    Hi

    I've been trying to translate some example itext code into vb.net for a few days now.

    Link is here

    The code is to grab text from a specified area of a pdf and only if it's a specified font. I'm trying to use this to separate out some text that has some other text beneath it in a different font.

    The code on the page is available in C# and Java, but I only know VB.NET, so I've had to copy and paste it into online translators like carloslag. That's taken me to the point where I can extract by font, but the specified area is being ignored and I'm getting all the text in the PDF with the specified font.

    I could post what I've done so far, but I thought it might be better for someone to look at this from scratch. If it would help to post what I've done I'll happily do so.

    Please help - I'm really struggling with this as I don't have sufficient knowledge of either c# OR itext!

    The c# code is below.

    Code:
    using System;
    using System.IO;
    using iText.Kernel.Font;
    using iText.Kernel.Geom;
    using iText.Kernel.Pdf;
    using iText.Kernel.Pdf.Canvas.Parser;
    using iText.Kernel.Pdf.Canvas.Parser.Data;
    using iText.Kernel.Pdf.Canvas.Parser.Filter;
    using iText.Kernel.Pdf.Canvas.Parser.Listener;
    
    namespace iText.Samples.Sandbox.Parse
    {
        public class ParseCustom
        {
            public static readonly String DEST = "results/txt/parse_custom.txt";
    
            public static readonly String SRC = "../../../resources/pdfs/nameddestinations.pdf";
    
            public static void Main(String[] args)
            {
                FileInfo file = new FileInfo(DEST);
                file.Directory.Create();
    
                new ParseCustom().ManipulatePdf(DEST);
            }
    
            public virtual void ManipulatePdf(String dest)
            {
                PdfDocument pdfDoc = new PdfDocument(new PdfReader(SRC));
    
                Rectangle rect = new Rectangle(36, 750, 523, 56);
                CustomFontFilter fontFilter = new CustomFontFilter(rect);
                FilteredEventListener listener = new FilteredEventListener();
    
                // Create a text extraction renderer
                LocationTextExtractionStrategy extractionStrategy = listener
                    .AttachEventListener(new LocationTextExtractionStrategy(), fontFilter);
    
                // Note: If you want to re-use the PdfCanvasProcessor, you must call PdfCanvasProcessor.reset()
                new PdfCanvasProcessor(listener).ProcessPageContent(pdfDoc.GetFirstPage());
    
                // Get the resultant text after applying the custom filter
                String actualText = extractionStrategy.GetResultantText();
    
                pdfDoc.Close();
    
                // See the resultant text in the console
                Console.Out.WriteLine(actualText);
    
                using (StreamWriter writer = new StreamWriter(dest))
                {
                    writer.Write(actualText);
                }
            }
    
            // The custom filter filters only the text of which the font name ends with Bold or Oblique.
            protected class CustomFontFilter : TextRegionEventFilter
            {
                public CustomFontFilter(Rectangle filterRect)
                    : base(filterRect)
                {
                }
    
                public override bool Accept(IEventData data, EventType type)
                {
                    if (type.Equals(EventType.RENDER_TEXT))
                    {
                        TextRenderInfo renderInfo = (TextRenderInfo) data;
                        PdfFont font = renderInfo.GetFont();
                        if (null != font)
                        {
                            String fontName = font.GetFontProgram().GetFontNames().GetFontName();
                            return fontName.EndsWith("Bold") || fontName.EndsWith("Oblique");
                        }
                    }
    
                    return false;
                }
            }
        }
    }
    Thanks

  2. #2
    jmcilhinney
    Super Moderator
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    111,221

    Re: Help needed translating c# code

    Don't use online translators. Download Instant VB from Tangible Software Solutions. Once you've done the conversion, compare the C# and VB code to see how similar they are and where the specific differences are, so you will know what to look for next time.
    Last edited by jmcilhinney; Mar 16th, 2021 at 11:39 AM.

  3. #3

    Thread Starter
    Addicted Member
    Join Date
    Sep 2018
    Posts
    160

    Re: Help needed translating c# code

    Thanks for the tip re Instant VB - it seems to work much better than the online translators.

    Unfortunately though the resulting code still seems to ignore the filter area - 'Rectangle(36, 750, 523, 56)' - and pulls out all the text across the whole pdf.

    I'll keep trying to suss out what's wrong - maybe something I've done is causing an issue somehow.

  4. #4

    Thread Starter
    Addicted Member
    Join Date
    Sep 2018
    Posts
    160

    Re: Help needed translating c# code

    OK I'm still scratching my head as to why this won't work. Below is the code from Instant VB - I've just changed the input/output file names.

    I've attached my test file here: test page.pdf

    .... which is just some text spread over the page, plus some lines saying 'THIS TEXT SHOULD NOT BE READ' in Arial, which I've used for testing the font filter.

    When I run the below I get the whole page of text, not just a region.

    Any suggestions as to what I'm missing gratefully received! - I get the feeling it must be something minor but I'm stumped.

    Code:
    Imports System
    Imports System.IO
    Imports iText.Kernel.Font
    Imports iText.Kernel.Geom
    Imports iText.Kernel.Pdf
    Imports iText.Kernel.Pdf.Canvas.Parser
    Imports iText.Kernel.Pdf.Canvas.Parser.Data
    Imports iText.Kernel.Pdf.Canvas.Parser.Filter
    Imports iText.Kernel.Pdf.Canvas.Parser.Listener
    
    Namespace iText.Samples.Sandbox.Parse
    	Public Class ParseCustom
    
    		Public Shared ReadOnly DEST As String = "C:\test\output.txt"
    
    		Public Shared ReadOnly SRC As String = "C:\test\test page.pdf"
    
    		Public Shared Sub Main(ByVal args() As String)
    			Dim file As New FileInfo(DEST)
    			file.Directory.Create()
    
    			Call (New ParseCustom()).ManipulatePdf(DEST)
    		End Sub
    
    		Public Overridable Sub ManipulatePdf(ByVal dest As String)
    			Dim pdfDoc As New PdfDocument(New PdfReader(SRC))
    
    			Dim rect As New Rectangle(36, 750, 523, 56)
    
    			Dim fontFilter As New CustomFontFilter(rect)
    			Dim listener As New FilteredEventListener()
    
    			' Create a text extraction renderer
    			Dim extractionStrategy As LocationTextExtractionStrategy = listener.AttachEventListener(New LocationTextExtractionStrategy(), fontFilter)
    
    			' Note: If you want to re-use the PdfCanvasProcessor, you must call PdfCanvasProcessor.reset()
    			Call (New PdfCanvasProcessor(listener)).ProcessPageContent(pdfDoc.GetFirstPage())
    
    			' Get the resultant text after applying the custom filter
    			Dim actualText As String = extractionStrategy.GetResultantText()
    
    			pdfDoc.Close()
    
    			' See the resultant text in the console
    			Console.Out.WriteLine(actualText)
    
    			Using writer As New StreamWriter(dest)
    				writer.Write(actualText)
    			End Using
    		End Sub
    
    		' The custom filter filters only the text of which the font name ends with Calibri.
    		Protected Class CustomFontFilter
    			Inherits TextRegionEventFilter
    
    			Public Sub New(ByVal filterRect As Rectangle)
    				MyBase.New(filterRect)
    			End Sub
    
    			Public Overrides Function Accept(ByVal data As IEventData, ByVal type As EventType) As Boolean
    				If type.Equals(EventType.RENDER_TEXT) Then
    					Dim renderInfo As TextRenderInfo = DirectCast(data, TextRenderInfo)
    					Dim font As PdfFont = renderInfo.GetFont()
    					If Nothing IsNot font Then
    						Dim fontName As String = font.GetFontProgram().GetFontNames().GetFontName()
    						Return fontName.EndsWith("Calibri")
    					End If
    				End If
    
    				Return False
    			End Function
    		End Class
    	End Class
    End Namespace

  5. #5
    Frenzied Member
    Join Date
    Jul 2011
    Location
    UK
    Posts
    1,335

    Re: Help needed translating c# code

    The problem lies in the overridden Accept method.

    As it stands, it returns True if text is about to be rendered using the Calibri Font, and False for all other circumstances.

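    Returning True allows the processing to continue, so all the Calibri text is processed. Returning False stops further processing of that event, so the text in Arial font is not rendered. But because the override never calls the base class's Accept method, the rectangle check in TextRegionEventFilter never runs, so the clipping rectangle is also ignored.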

    What it should be doing is returning False when text is about to be rendered in a font that is NOT Calibri (so rendering of that text does not happen). For all other circumstances, the base class's Accept method should be called to check if processing should be allowed to continue according to any other filters in effect, and then the result of that call to the base method is what needs to be returned by the overridden method:
    Code:
    Public Overrides Function Accept(ByVal data As IEventData, ByVal type As EventType) As Boolean
    
    	'   ignore all text rendering where the Font is not Calibri
    	If type.Equals(EventType.RENDER_TEXT) Then
    		Dim renderInfo As TextRenderInfo = DirectCast(data, TextRenderInfo)
    
    		Dim font As PdfFont = renderInfo.GetFont()
    		If font IsNot Nothing Then
    			Dim fontName As String = font.GetFontProgram().GetFontNames().GetFontName()
    			If Not fontName.EndsWith("Calibri") Then
    				'   font is not Calibri so
    				'   do not continue processing this TEXT RENDER event
    				Return False
    			End If
    		End If
    
    	End If
    
    	'   check if the base class allows processing of everything else
    	Return MyBase.Accept(data, type)
    End Function
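
    To see the difference without iText in the way, here is a rough, library-free sketch of the same idea in Python. Everything in it (Rect, TextChunk, RegionFilter and the two subclasses) is a made-up stand-in rather than the iText API, and the region test is simplified to a point-in-rectangle check, which is an assumption and not how iText decides whether text falls in the region. It only illustrates why an override that skips the base call loses the region test.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    # Same argument order as Rectangle(x, y, width, height) in the thread.
    x: float
    y: float
    width: float
    height: float

    def contains(self, px: float, py: float) -> bool:
        return (self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height)


@dataclass
class TextChunk:
    # Hypothetical stand-in for a text render event.
    text: str
    font_name: str
    x: float
    y: float


class RegionFilter:
    """Toy base filter: accepts chunks positioned inside the rectangle."""

    def __init__(self, rect: Rect):
        self.rect = rect

    def accept(self, chunk: TextChunk) -> bool:
        return self.rect.contains(chunk.x, chunk.y)


class FontOnlyFilter(RegionFilter):
    """Overrides accept() without calling the base, so the region test never runs."""

    def accept(self, chunk: TextChunk) -> bool:
        return chunk.font_name.endswith("Calibri")


class FontAndRegionFilter(RegionFilter):
    """Rejects other fonts first, then defers to the base region test."""

    def accept(self, chunk: TextChunk) -> bool:
        if not chunk.font_name.endswith("Calibri"):
            return False
        return super().accept(chunk)


def extract(chunks, event_filter):
    # Keep the text of every chunk the filter accepts.
    return [c.text for c in chunks if event_filter.accept(c)]


rect = Rect(36, 750, 523, 56)  # covers x 36..559 and y 750..806
chunks = [
    TextChunk("header", "Calibri", 40, 760),  # right font, inside region
    TextChunk("skip me", "Arial", 40, 770),   # wrong font, inside region
    TextChunk("body", "Calibri", 40, 400),    # right font, outside region
]

print(extract(chunks, FontOnlyFilter(rect)))       # ['header', 'body']
print(extract(chunks, FontAndRegionFilter(rect)))  # ['header']
```

    The first filter behaves like the translated code (font is checked, region is ignored); the second mirrors the Return MyBase.Accept(data, type) pattern, so both conditions have to hold.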
    Last edited by Inferrd; Mar 16th, 2021 at 03:52 PM.

  6. #6

    Thread Starter
    Addicted Member
    Join Date
    Sep 2018
    Posts
    160

    Re: Help needed translating c# code

    You nailed it! - just tried out your code and it works just fine.

    Many, many thanks for taking the time to explain it - I'll learn from this.

    Do you have a favourite charity? - I feel like I should make a small donation on your behalf as getting this right was really important to me!

  7. #7
    Frenzied Member
    Join Date
    Jul 2011
    Location
    UK
    Posts
    1,335

    Re: Help needed translating c# code

    Glad it worked for you. I'm happy to help when I'm able to, just as others have helped me. Always been an advocate for "Pay it Forward"

  8. #8

    Thread Starter
    Addicted Member
    Join Date
    Sep 2018
    Posts
    160

    Re: Help needed translating c# code

    That's a great attitude, and since I spent so long banging my head against the wall on this....




  9. #9
    Frenzied Member
    Join Date
    Jul 2011
    Location
    UK
    Posts
    1,335

    Re: Help needed translating c# code

    Nice. My cat died 2 years ago and I still miss her deeply, so I thank you
