Thread: Copy to Clipboard as Unicode and Html Form

  1. #1

    Thread Starter
    Fanatic Member
    Join Date
    May 2014
    Location
    Preveza Greece
    Posts
    948

    Copy to Clipboard as Unicode and Html Form

    While working on the M2000 Interpreter I found this: https://support.microsoft.com/en-us/kb/274326
    It copies text to the clipboard in HTML Format, but without converting to UTF-8 (it works for English only because UTF-8 uses one byte per English letter). So I changed it to send the text as UTF-8, so it can be used to export colored text, or text in other formats, and we can paste it into an office application like Word or into a blog (on Blogspot, as I do for my interpreter, M2000).
    Put this in a Module and call TestThis from the Immediate window.
    I also include two helpers: SpellUnicode, which takes a string and returns a string of character codes, and ListenUnicode, which takes those codes as parameters and converts them back to a Unicode string. This is the only way to put Unicode strings in a Module file (without using an external file or a resource such as a .res file).

    Enjoy it

    Code:
    Private Declare Function RegisterClipboardFormat Lib "user32" Alias _
       "RegisterClipboardFormatA" (ByVal lpString As String) As Long
    Private m_cfHTMLClipFormat As Long
    Private Const Utf8CodePage As Long = 65001
    Private Declare Function WideCharToMultiByte Lib "kernel32" (ByVal codepage As Long, ByVal dwFlags As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long, ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, ByVal lpDefaultChar As Long, ByVal lpUsedDefaultChar As Long) As Long
    Private Declare Function MultiByteToWideChar& Lib "kernel32" (ByVal codepage&, ByVal dwFlags&, MultiBytes As Any, ByVal cBytes&, ByVal pWideChars&, ByVal cWideChars&)
    Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (lpvDest As Any, lpvSource As Any, ByVal cbCopy As Long)
    Private Declare Function GlobalAlloc Lib "kernel32" (ByVal wFlags As Long, ByVal dwBytes As Long) As Long
    Private Declare Function GlobalFree Lib "kernel32" (ByVal hMem As Long) As Long
    Private Declare Function GlobalLock Lib "kernel32" (ByVal hMem As Long) As Long
    Private Declare Function GlobalUnlock Lib "kernel32" (ByVal hMem As Long) As Long
    Private Declare Function OpenClipboard Lib "user32" (ByVal hWnd As Long) As Long
    Private Declare Function CloseClipboard Lib "user32" () As Long
    Private Declare Function EmptyClipboard Lib "user32" () As Long
    Private Declare Function SetClipboardData Lib "user32" (ByVal wFormat As Long, ByVal hMem As Long) As Long
    
    Public Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
    ' Here is the sub that sends text to the clipboard as Unicode text and as HTML Format (UTF-8)
    Public Sub TestThis()
    Copy2Clipboard ListenUnicode(915, 953, 974, 961, 947, 959, 962, 32, 922, 945, 961, 961, 940, 962) + vbCrLf + "Greetings from George Karras from West Greece"
    End Sub
    Public Sub Copy2Clipboard(ByVal unicodetext As String)
    Dim ph As String
    Clipboard.Clear  ' always
    DoEvents
    Sleep 10
    ph = PrepareHtml(unicodetext) ' here you have to prepare for html
    SimpleHtmlData ph
    SetTextData 13, unicodetext  ' 13 = CF_UNICODETEXT (UTF-16LE, same as VB strings)
    End Sub
    Function ReplaceStr(sStr As String, dStr As String, fromStr As String) As String
    ' Sorry, but I like this argument order, with the search string first
      ReplaceStr = Replace$(fromStr, sStr, dStr)
    End Function
    Private Function PrepareHtml(neodata As String) As String
    Dim A$
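    ' WE DO SOME WORK TO PRESERVE FORMAT
    ' MAYBE IT IS NOT COMPLETE BUT IT IS A TRY
    ' Escape "&" first, so the entities added below are not escaped again
    A$ = ReplaceStr("&", "&amp;", neodata)
    A$ = ReplaceStr("</", Chr$(1) + Chr$(2), A$)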
    A$ = ReplaceStr("<", Chr$(3), A$)
    A$ = ReplaceStr(">", Chr$(4), A$)
    A$ = ReplaceStr("  ", Chr$(7) + Chr$(7), A$)
    A$ = ReplaceStr(Chr$(7) + " ", Chr$(7) + Chr$(7), A$)
    '' here you can process line by line and or embed tags
    A$ = "<FONT COLOR=blue>" + A$ + "</FONT>"
    
    A$ = ReplaceStr(Chr$(1) + Chr$(2), "&lt;/", A$)
    A$ = ReplaceStr(Chr$(3), "&lt;", A$)
    A$ = ReplaceStr(Chr$(4), "&gt;", A$)
    ' SO ALL SPACES ARE NOW NBSP IF ARE IN A SEQUENCE OF TWO OR MORE
    A$ = ReplaceStr(Chr$(7), "&nbsp;", A$)
    
    PrepareHtml = Replace(A$, vbCrLf, "<br>")  ' or you can use <p>
    End Function
    
    Public Function HTML(sText As String, _
    Optional sContextStart As String = "<HTML><BODY>", _
    Optional sContextEnd As String = "</BODY></HTML>") As Byte()
    ' part of this code is from an example from Microsoft
        Dim m_sDescription As String
        m_sDescription = "Version:1.0" & vbCrLf & _
        "StartHTML:aaaaaaaaaa" & vbCrLf & _
        "EndHTML:bbbbbbbbbb" & vbCrLf & _
        "StartFragment:cccccccccc" & vbCrLf & _
        "EndFragment:dddddddddd" & vbCrLf
        Dim A() As Byte, b() As Byte, c() As Byte
        A() = Utf16toUtf8(sContextStart & "<!--StartFragment -->")
        b() = Utf16toUtf8(sText)
        c() = Utf16toUtf8("<!--EndFragment -->" & sContextEnd)
        Dim sData As String, mdata As Long, eData As Long, fData As Long
        eData = UBound(A()) - LBound(A()) + 1
        mdata = UBound(b()) - LBound(b()) + 1
        fData = UBound(c()) - LBound(c()) + 1
        m_sDescription = Replace(m_sDescription, "aaaaaaaaaa", Format(Len(m_sDescription), "0000000000"))
        m_sDescription = Replace(m_sDescription, "bbbbbbbbbb", Format(Len(m_sDescription) + eData + mdata + fData, "0000000000"))
        m_sDescription = Replace(m_sDescription, "cccccccccc", Format(Len(m_sDescription) + eData, "0000000000"))
        m_sDescription = Replace(m_sDescription, "dddddddddd", Format(Len(m_sDescription) + eData + mdata, "0000000000"))
        Dim all() As Byte, m() As Byte
        ' upper bound = total length, so the array has one spare zero byte at the end
        ReDim all(Len(m_sDescription) + eData + mdata + fData)
        m() = Utf16toUtf8(m_sDescription)
        CopyMemory all(0), m(0), Len(m_sDescription)
        CopyMemory all(Len(m_sDescription)), A(0), eData
        CopyMemory all(Len(m_sDescription) + eData), b(0), mdata
        CopyMemory all(Len(m_sDescription) + eData + mdata), c(0), fData
        HTML = all()
    End Function
    Function RegisterCF() As Long
    
    
       'Register the HTML clipboard format
       If (m_cfHTMLClipFormat = 0) Then
          m_cfHTMLClipFormat = RegisterClipboardFormat("HTML Format")
       End If
       RegisterCF = m_cfHTMLClipFormat
       
    End Function
    Public Function SimpleHtmlData(ByVal sText As String)
        Dim lFormatId As Long, bb() As Byte
        lFormatId = RegisterCF
        If lFormatId <> 0 Then
        If sText = "" Then Exit Function
        bb() = HTML(sText)
        If CBool(OpenClipboard(0)) Then
              Dim hMemHandle As Long, lpData As Long
              ' &H42 = GMEM_MOVEABLE Or GMEM_ZEROINIT, so the copied data ends with zero bytes
              hMemHandle = GlobalAlloc(&H42, UBound(bb()) - LBound(bb()) + 10)
              If CBool(hMemHandle) Then
                 lpData = GlobalLock(hMemHandle)
                 If lpData <> 0 Then
                    CopyMemory ByVal lpData, bb(0), UBound(bb()) - LBound(bb())
                    GlobalUnlock hMemHandle
                    EmptyClipboard
                    SetClipboardData lFormatId, hMemHandle
                 End If
              End If
              Call CloseClipboard
           End If
    End If
    End Function
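    Private Function SetTextData( _
            ByVal lFormatId As Long, _
            ByVal sText As String _
        ) As Boolean
        If lFormatId = 0 Then Exit Function
        Dim hMem As Long, lPtr As Long
        Dim lSize As Long
        lSize = LenB(sText)
        ' &H42 = GMEM_MOVEABLE Or GMEM_ZEROINIT; + 2 bytes for the UTF-16 null terminator
        hMem = GlobalAlloc(&H42, lSize + 2)
        If hMem = 0 Then Exit Function
        lPtr = GlobalLock(hMem)
        If lPtr = 0 Then
            GlobalFree hMem
            Exit Function
        End If
        ' copy the characters only; the buffer is zero-filled, so it is already terminated
        If lSize > 0 Then CopyMemory ByVal lPtr, ByVal StrPtr(sText), lSize
        GlobalUnlock hMem
        If OpenClipboard(0) <> 0 Then
            ' on success the clipboard owns hMem and we must not free it
            SetTextData = (SetClipboardData(lFormatId, hMem) <> 0)
            CloseClipboard
        End If
        If Not SetTextData Then GlobalFree hMem
    End Function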
    Public Function Utf16toUtf8(s As String) As Byte()
        ' code from vbforum
        ' UTF-8 returned to VB6 as a byte array (zero based) because it's pretty useless to VB6 as anything else.
        Dim iLen As Long
        Dim bbBuf() As Byte
        '
        iLen = WideCharToMultiByte(Utf8CodePage, 0, StrPtr(s), Len(s), 0, 0, 0, 0)
        ReDim bbBuf(0 To iLen - 1) ' Will be initialized as all &h00.
        iLen = WideCharToMultiByte(Utf8CodePage, 0, StrPtr(s), Len(s), VarPtr(bbBuf(0)), iLen, 0, 0)
        Utf16toUtf8 = bbBuf
    End Function
    Public Function SpellUnicode(A$)
    ' Use SpellUnicode in the Immediate window to get the character codes:
    ' ? SpellUnicode("Γιώργος Καρράς") ' Greek letters
    ' then pass those numbers to ListenUnicode to rebuild the text.
    ' You can see the result if you have a font with Greek letters, such as Arial Greek:
    ' ? ListenUnicode(915,953,974,961,947,959,962,32,922,945,961,961,940,962)
    Dim b$, i As Long
    For i = 1 To Len(A$) - 1
    b$ = b$ & CStr(AscW(Mid$(A$, i, 1))) & ","
    Next i
    SpellUnicode = b$ & CStr(AscW(Right$(A$, 1)))
    End Function
    Public Function ListenUnicode(ParamArray aa() As Variant) As String
    Dim all$, i As Long
    For i = 0 To UBound(aa)
        all$ = all$ & ChrW(aa(i))
    Next i
    ListenUnicode = all$
    End Function

  2. #2
    PowerPoster
    Join Date
    Jul 2010
    Location
    NYC
    Posts
    2,244

    Re: Copy to Clipboard as Unicode and Html Form

    HTML is usually given in ANSI (I know at least Chrome and Firefox do); and there's no separate A/W like for other text clipboard formats... so, is your code posting HTML in Unicode format (UTF-8, UTF-16LE?) If so how could one tell if the html on the clipboard is ansi or unicode? It's important to know when you're receiving HTML. With the browsers if I use the unicode ptr->str functions I get all zeroes back, but I don't know if that should be relied on.

  3. #3

    Thread Starter
    Fanatic Member
    Join Date
    May 2014
    Location
    Preveza Greece
    Posts
    948

    Re: Copy to Clipboard as Unicode and Html Form

    fafalone,
    HTML Format, as I understand it, is UTF-8, but ANSI and UTF-8 are the same if you use only English letters. I checked my routine with Word, and with Internet Explorer 10 and Chrome on Blogspot
    (I also write at http://georgekarras.blogspot.gr in Greek, so I made this routine for Greek. You can also see that today's post there has a paragraph in Finnish; I put it there as an example, I don't know that language).
    So check it in Chrome (it works for me), and if you have an office suite like MS Office, check it in the word processor.
    If you don't see my name in Greek from the example above, just tell me (I also put the Unicode part on a separate line, so check whether you get blue text).

    fafalone,
    On a second reading, I suppose you mean getting HTML from Chrome. No, my routine is for copying from your VB program to Chrome, Word, or any other "target" where we can paste HTML.
    Last edited by georgekar; Nov 26th, 2015 at 03:50 PM.

  4. #4

    Thread Starter
    Fanatic Member
    Join Date
    May 2014
    Location
    Preveza Greece
    Posts
    948

    Re: Copy to Clipboard as Unicode and Html Form

    In the M2000 interpreter I put the HTML export only in Cut and Copy (not in drag and drop).
    See Finnish and Greek in one text.
    Recorded in VirtualBox (running on Ubuntu Studio):
    https://youtu.be/L_3ZSO7CSck

  5. #5
    PowerPoster
    Join Date
    Jul 2010
    Location
    NYC
    Posts
    2,244

    Re: Copy to Clipboard as Unicode and Html Form

    I understand your code is for copying HTML format, but if Word and others can read your Unicode HTML format, that means HTML format can be either UTF-8 or ANSI. So in VB the methods of retrieving text are different for each; so it's required to know which format the HTML is.

  6. #6

    Thread Starter
    Fanatic Member
    Join Date
    May 2014
    Location
    Preveza Greece
    Posts
    948

    Re: Copy to Clipboard as Unicode and Html Form

    I make two copies, one Unicode and one HTML. The first is UTF-16LE, the same as strings in VB; the second is HTML without line breaks. HTML needs no line breaks; I put <p>. If the target can parse our HTML, we get the option to paste HTML, and if not, we get the Unicode text, without HTML formatting but with line breaks. So far I haven't found any problem. I paste my colored code from M2000 programs into my blog without any problem (after some trial and error). So take it and try to prove the opposite: that there is some program X that can't import HTML from this routine because it accepts only ANSI.
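
    To see which of the two copies is actually on the clipboard after calling Copy2Clipboard, something like the following could be used. This is only a minimal sketch: ShowClipboardFormats is a hypothetical name, and the only API calls assumed are RegisterClipboardFormat and IsClipboardFormatAvailable from user32.

    ```vb
    Option Explicit
    Private Declare Function RegisterClipboardFormat Lib "user32" Alias _
       "RegisterClipboardFormatA" (ByVal lpString As String) As Long
    Private Declare Function IsClipboardFormatAvailable Lib "user32" (ByVal wFormat As Long) As Long
    Private Const CF_UNICODETEXT As Long = 13

    ' Hypothetical helper: reports which of the two formats the clipboard holds right now
    Public Sub ShowClipboardFormats()
        Dim cfHtml As Long
        ' registering an already registered name returns the same id
        cfHtml = RegisterClipboardFormat("HTML Format")
        Debug.Print "HTML Format present:    "; (IsClipboardFormatAvailable(cfHtml) <> 0)
        Debug.Print "CF_UNICODETEXT present: "; (IsClipboardFormatAvailable(CF_UNICODETEXT) <> 0)
    End Sub
    ```

    Run it from the Immediate window right after TestThis; a target application makes the same kind of check before it decides which format to paste.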

  7. #7

  8. #8

    Thread Starter
    Fanatic Member
    Join Date
    May 2014
    Location
    Preveza Greece
    Posts
    948

    Re: Copy to Clipboard as Unicode and Html Form

    HTML Format is UTF-8, not ANSI, but there is no indicator that says it is UTF-8. Parsing the data is the job of the target. If the parser finds an error (it expects to find the closing tags and the special clipboard comment "<!--EndFragment -->"), it decides internally that this is not proper HTML Format, so the target application gets only the Unicode text, without HTML tags.

  9. #9
    PowerPoster
    Join Date
    Jul 2010
    Location
    NYC
    Posts
    2,244

    Re: Copy to Clipboard as Unicode and Html Form

    But the actual text content on the clipboard is ANSI... I ran your example, and I can only later read the text with ANSI string pointer functions; the Unicode pointer ones (like those used for CF_UNICODETEXT and LPWSTR api returns in general) don't work.

    Edit: Here's what the data looks like; it's clearly 1 byte per character ANSI encoding:


    Whereas the UTF-8 Unicode in CF_UNICODETEXT is 2 bytes per character:

  10. #10

    Thread Starter
    Fanatic Member
    Join Date
    May 2014
    Location
    Preveza Greece
    Posts
    948

    Re: Copy to Clipboard as Unicode and Html Form

    Note this: UTF-8 is not UTF-16LE. The second image shows UTF-16LE, so you see 2 bytes for A, but in UTF-8 each English letter takes one byte, so ABC needs three bytes. That is why you say it is ANSI, but it isn't.
    Look carefully at my routine above. I use Utf16toUtf8(), which returns a byte array, not a string. I put the arrays into one big array and then copy the big array to a block of memory for which I get a handle (which, as far as I know, is the same as the memory address in Win32 for fixed memory). So there is no conversion.
    This is a part of the code above. From the line bb() = HTML(sText) on, we leave the string behind and handle a byte array. There is no way the array gets converted to anything; it keeps the same bytes that we pass to the CopyMemory statement.

    Code:
    Public Function SimpleHtmlData(ByVal sText As String)
        Dim lFormatId As Long, bb() As Byte
        lFormatId = RegisterCF
        If lFormatId <> 0 Then
        If sText = "" Then Exit Function
        bb() = HTML(sText)
        If CBool(OpenClipboard(0)) Then
              Dim hMemHandle As Long, lpData As Long
              ' &H42 = GMEM_MOVEABLE Or GMEM_ZEROINIT, so the copied data ends with zero bytes
              hMemHandle = GlobalAlloc(&H42, UBound(bb()) - LBound(bb()) + 10)
              If CBool(hMemHandle) Then
                 lpData = GlobalLock(hMemHandle)
                 If lpData <> 0 Then
                    CopyMemory ByVal lpData, bb(0), UBound(bb()) - LBound(bb())
                    GlobalUnlock hMemHandle
                    EmptyClipboard
                    SetClipboardData lFormatId, hMemHandle
                 End If
              End If
              Call CloseClipboard
           End If
    End If
    So we provide UTF-8 encoded text, in which Greek letters take two bytes but English letters take one byte. You are comparing UTF-8 data with UTF-16LE data and concluding that one is ANSI text and the other is Unicode text. Do you get the point?
    If you were right, how could you see Greek letters from ANSI text when the system code page is not Greek? On a non-Greek system that can only happen if you read Unicode, and Unicode is not only UTF-16LE (with each character in two bytes, as in your second image).
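
    The difference is easy to see by dumping the bytes of the same two characters in both encodings. The following is a small sketch for that, not part of the routine above: HexDump and CompareEncodings are hypothetical names, and the only API used is WideCharToMultiByte, declared as in the first post.

    ```vb
    Option Explicit
    Private Declare Function WideCharToMultiByte Lib "kernel32" (ByVal codepage As Long, ByVal dwFlags As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long, ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, ByVal lpDefaultChar As Long, ByVal lpUsedDefaultChar As Long) As Long

    ' Hypothetical helper: returns the bytes of an array as two-digit hex values
    Private Function HexDump(b() As Byte) As String
        Dim i As Long, s As String
        For i = LBound(b) To UBound(b)
            s = s & Right$("0" & Hex$(b(i)), 2) & " "
        Next i
        HexDump = RTrim$(s)
    End Function

    Public Sub CompareEncodings()
        Dim s As String, u16() As Byte, u8() As Byte, n As Long
        s = "A" & ChrW$(913)   ' Latin A followed by Greek capital Alpha (U+0391)
        u16 = s                ' assigning a string to a byte array copies its raw UTF-16LE bytes
        n = WideCharToMultiByte(65001, 0, StrPtr(s), Len(s), 0, 0, 0, 0)
        ReDim u8(0 To n - 1)
        WideCharToMultiByte 65001, 0, StrPtr(s), Len(s), VarPtr(u8(0)), n, 0, 0
        Debug.Print "UTF-16LE: "; HexDump(u16)   ' 41 00 91 03
        Debug.Print "UTF-8:    "; HexDump(u8)    ' 41 CE 91
    End Sub
    ```

    The letter A is one byte in UTF-8 and looks exactly like ANSI, while Alpha takes two bytes (CE 91) that differ from its two UTF-16LE bytes (91 03).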

    Reading HTML Format from VB is not as easy as getting a UTF-16LE string in one step. You have to read the bytes into an array first. Because the offsets (encoded in the header) are in bytes, not characters, you have to extract the bytes into separate arrays and then convert them to UTF-16.
    You need these:
    Code:
    Private Declare Function MultiByteToWideChar& Lib "kernel32" (ByVal codepage&, ByVal dwFlags&, MultiBytes As Any, ByVal cBytes&, ByVal pWideChars&, ByVal cWideChars&)
    Private Declare Function GlobalSize Lib "kernel32" (ByVal hMem As Long) As Long
    ' You need GlobalSize to get the size of the clipboard hMem; then you make the big array and copy the data into it.
    ' Let's say that shortbuf() is not so short and holds all the parts; then change the index 1 below to the real offset (in bytes) of the main HTML part.
    ' THIS IS A PART OF THE M2000 CODE - THERE IS A LINK IN MY SIGNATURE
    ' st is the number of bytes to convert
    WChars = MultiByteToWideChar(65001, 0, shortbuf(1), st, 0, 0)
    getUniStringLineUtF8 = Space$(WChars)
    MultiByteToWideChar 65001, 0, shortbuf(1), st, StrPtr(getUniStringLineUtF8), WChars
    So here think of shortbuf() as the copy of the big array that you get from the clipboard (the clipboard provides the hMem, and you can find its size with GlobalSize()). If you do that, you can make a Paste routine for HTML Format (I didn't make one, because I have no need for it).
    If you make it, I will be happy to find a place to use it.
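
    As a starting point, the steps above could look roughly like this. It is an untested sketch and not code from M2000: GetHtmlFragment and HeaderValue are hypothetical names, it assumes the header (StartFragment, EndFragment) sits in the first 512 bytes, and it returns only the fragment between the two offsets.

    ```vb
    Option Explicit
    Private Declare Function RegisterClipboardFormat Lib "user32" Alias "RegisterClipboardFormatA" (ByVal lpString As String) As Long
    Private Declare Function OpenClipboard Lib "user32" (ByVal hWnd As Long) As Long
    Private Declare Function CloseClipboard Lib "user32" () As Long
    Private Declare Function GetClipboardData Lib "user32" (ByVal wFormat As Long) As Long
    Private Declare Function GlobalLock Lib "kernel32" (ByVal hMem As Long) As Long
    Private Declare Function GlobalUnlock Lib "kernel32" (ByVal hMem As Long) As Long
    Private Declare Function GlobalSize Lib "kernel32" (ByVal hMem As Long) As Long
    Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (lpvDest As Any, lpvSource As Any, ByVal cbCopy As Long)
    Private Declare Function MultiByteToWideChar Lib "kernel32" (ByVal codepage As Long, ByVal dwFlags As Long, MultiBytes As Any, ByVal cBytes As Long, ByVal pWideChars As Long, ByVal cWideChars As Long) As Long

    ' Hypothetical helper: reads the decimal number that follows a header key,
    ' for example "StartFragment:". Returns -1 if the key is not found.
    Private Function HeaderValue(ByVal header As String, ByVal key As String) As Long
        Dim p As Long, c As String, n As Long
        HeaderValue = -1
        p = InStr(1, header, key, vbTextCompare)
        If p = 0 Then Exit Function
        p = p + Len(key)
        Do While p <= Len(header)
            c = Mid$(header, p, 1)
            If Not (c Like "#") Then Exit Do   ' stop at the first non-digit
            n = n * 10 + CLng(c)
            p = p + 1
        Loop
        HeaderValue = n
    End Function

    ' Hypothetical paste routine: returns the HTML fragment as a VB (UTF-16) string
    Public Function GetHtmlFragment() As String
        Dim cf As Long, hMem As Long, lpData As Long, size As Long
        Dim buf() As Byte, hdr() As Byte, header As String, result As String
        Dim startFrag As Long, endFrag As Long, nBytes As Long, wChars As Long
        Dim got As Boolean

        cf = RegisterClipboardFormat("HTML Format")
        If cf = 0 Then Exit Function
        If OpenClipboard(0) = 0 Then Exit Function
        hMem = GetClipboardData(cf)      ' owned by the clipboard, so we never free it
        If hMem <> 0 Then
            size = GlobalSize(hMem)
            lpData = GlobalLock(hMem)
            If lpData <> 0 Then
                If size > 0 Then
                    ReDim buf(0 To size - 1)
                    CopyMemory buf(0), ByVal lpData, size   ' raw bytes, no conversion
                    got = True
                End If
                GlobalUnlock hMem
            End If
        End If
        CloseClipboard
        If Not got Then Exit Function

        ' The header is plain ASCII, so converting its first bytes as ANSI is safe
        nBytes = size
        If nBytes > 512 Then nBytes = 512
        ReDim hdr(0 To nBytes - 1)
        CopyMemory hdr(0), buf(0), nBytes
        header = StrConv(hdr, vbUnicode)

        ' Offsets are in bytes from the start of the data, which match the indexes of buf()
        startFrag = HeaderValue(header, "StartFragment:")
        endFrag = HeaderValue(header, "EndFragment:")
        If startFrag < 0 Or endFrag <= startFrag Or endFrag > size Then Exit Function

        nBytes = endFrag - startFrag
        wChars = MultiByteToWideChar(65001, 0, buf(startFrag), nBytes, 0, 0)
        If wChars = 0 Then Exit Function
        result = Space$(wChars)
        MultiByteToWideChar 65001, 0, buf(startFrag), nBytes, StrPtr(result), wChars
        GetHtmlFragment = result
    End Function
    ```

    After calling TestThis, ? GetHtmlFragment() in the Immediate window should show the blue FONT block if everything works; the Greek letters may appear as ? there, because the Immediate window is not Unicode.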

    EDIT: The two bytes for a Greek letter in UTF-8 are not the same two bytes as in UTF-16LE; it is not the same encoding.
    Last edited by georgekar; Nov 30th, 2015 at 06:33 AM.
