Thread: AscU unicode equivalent of Asc

  1. #1

    Thread Starter
    Lively Member anycoder's Avatar
    Join Date
    Jan 2025
    Posts
    67

    AscU unicode equivalent of Asc

    This proposal introduces the AscU utility, a variant of the Asc function designed to retrieve the Unicode code point of any character.

    While the standard AscW function is often used for this purpose, it is frequently misunderstood and misused. AscW returns a signed 16-bit Integer (ranging from -32,768 to 32,767), so any UTF-16 code unit from &H8000 upward comes back negative. Some developers use it directly for UTF-8 conversion, and range tests on its result then pick the wrong branch. For example:

    Code:
          Select Case  AscW(Mid(Txt, i, 1))  'AscW may return a negative value
            Case Is < 128:
                
            Case Is < 2048:
               
           ...
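    A minimal sketch of why such a test misfires, and of the masking step that avoids it. MaskedAscW and DemoSignedAscW are hypothetical names used only for illustration:

```vb
' Illustrative only: MaskedAscW is a hypothetical helper name.
Function MaskedAscW(ByVal ch As String) As Long
    ' AscW returns a signed Integer; the mask maps it back to 0..65535.
    MaskedAscW = AscW(ch) And &HFFFF&
End Function

Sub DemoSignedAscW()
    Dim ch As String
    ch = ChrW$(&HFF21&)          ' U+FF21, a code unit above &H7FFF
    Debug.Print AscW(ch)         ' negative, so "Case Is < 128" matches by mistake
    Debug.Print MaskedAscW(ch)   ' the unsigned code unit value
End Sub
```

    Masking only repairs the sign; it still yields a single UTF-16 code unit, not a full code point.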
    AscU:


    Code:
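    Function AscU(s As String, ByRef aPos As Long) As Long
    ' Returns the code point starting at aPos and advances aPos past it.
    Dim h As Long, l As Long
      h = AscW(Mid(s, aPos, 1)) And &HFFFF&   ' mask because AscW is signed
      aPos = aPos + 1
      ' Only look for a low surrogate if there is a character left to read;
      ' AscW("") would raise a run-time error.
      If (h >= &HD800&) And (h <= &HDBFF&) And (aPos <= Len(s)) Then
         l = AscW(Mid(s, aPos, 1)) And &HFFFF&
         If (l >= &HDC00&) And (l <= &HDFFF&) Then
            aPos = aPos + 1   ' consume the low surrogate only for a valid pair
            AscU = (h And &H3FF&) * 1024
            AscU = (AscU Or (l And &H3FF&)) + &H10000
            Exit Function
         End If
      End If
      AscU = h   ' BMP character, or an unpaired surrogate returned as-is
    End Function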
    Example:
    Code:
    Private Sub test()
    Dim i As Long, s As String
    i = 1
    s = TextBox1 ' Unicode office TextBox control
    While i <= Len(s)
      Debug.Print Hex(AscU(s, i)) ' AscU advances i past the character it read
    Wend
    
    End Sub
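    Once a full, non-negative code point is available, the range test from the first snippet becomes safe. A minimal sketch, assuming the goal is to pick the UTF-8 sequence length; Utf8ByteCount is a hypothetical name, and input validation (surrogates, values above &H10FFFF) is left out:

```vb
' Hypothetical helper: number of UTF-8 bytes needed for one code point.
' Assumes cp is a valid, non-negative Unicode code point.
Function Utf8ByteCount(ByVal cp As Long) As Long
    Select Case cp
        Case Is < &H80&
            Utf8ByteCount = 1      ' ASCII
        Case Is < &H800&
            Utf8ByteCount = 2
        Case Is < &H10000
            Utf8ByteCount = 3      ' rest of the BMP
        Case Else
            Utf8ByteCount = 4      ' supplementary planes (surrogate pairs in UTF-16)
    End Select
End Function
```

    For example, Utf8ByteCount(&H20024) gives 4, which a test on a raw AscW value could never reach.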
    Hope this works.

  2. #2
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    10,909

    Re: AscU unicode equivalent of Asc

    https://www.vbforums.com/showthread....urrogate-pairs

    All done for ALL of the VB6 string functions.
    Any software I post in these forums written by me is provided "AS IS" without warranty of any kind, expressed or implied, and permission is hereby granted, free of charge and without restriction, to any person obtaining a copy. To all, peace and happiness.

  3. #3

    Thread Starter
    Lively Member anycoder's Avatar
    Join Date
    Jan 2025
    Posts
    67

    Re: AscU unicode equivalent of Asc

    Thanks for your contribution.
    I think your code has some major limitations, at least for AscWEx:

    It doesn't handle an individual character at an arbitrary position, in the middle of a string for example.
    Valid code points range from 0 to 0x10FFFF, but AscWEx returns the two surrogates concatenated instead of the code point.

    Ex: the character at code point 0x20024

    AscWEx returns: D840DC24
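
    The relation between the two values can be checked by hand. A small sketch, where PairToCodePoint is a hypothetical name and the two code units are passed separately:

```vb
' Hypothetical helper: combines a high and a low surrogate into a code point.
' Assumes hi is in &HD800..&HDBFF and lo is in &HDC00..&HDFFF.
Function PairToCodePoint(ByVal hi As Long, ByVal lo As Long) As Long
    Dim top10 As Long, low10 As Long
    top10 = hi And &H3FF&          ' &HD840 -> &H40
    low10 = lo And &H3FF&          ' &HDC24 -> &H24
    ' &H40 * &H400 = &H10000, plus &H24, plus the &H10000 offset = &H20024
    PairToCodePoint = top10 * &H400& + low10 + &H10000
End Function
```

    Hex(PairToCodePoint(&HD840&, &HDC24&)) gives 20024, the code point, whereas D840DC24 is just the two code units written side by side.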
    Last edited by anycoder; May 5th, 2026 at 04:00 AM.

  4. #4
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    10,909

    Re: AscU unicode equivalent of Asc

    anycoder, when you're dealing with a surrogate pair (four bytes), the entire four bytes represent a single character. That's the way surrogate pairs work.

    All my functions are doing is expanding the handling of strings from the UCS-2 characterset, which VB6 was designed around, to the entire UTF-16 characterset, which includes the surrogate pairs.

    VB6, with its intrinsic functions, assumes there's no such thing as surrogate pairs, which, when dealing with the complete UTF-16 characterset, isn't correct. My functions simply correct this oversight.

    Now, if you wish to bring UTF-8 into the discussion (which you may be trying to do), that's a completely different discussion. And none of the VB6 intrinsic string functions ever make a UTF-8 assumption. To treat a VB6 BSTR string as UTF-8 would take a completely different set of functions. And, truth be told, isn't really worth it. It'd be far easier to convert the UTF-8 to UTF-16, get your work done, and then possibly convert back to UTF-8, if that's what you need.
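
    A rough sketch of the "convert first" route, assuming 32-bit VB6 on Windows and the Win32 MultiByteToWideChar API (64-bit VBA would need PtrSafe and LongPtr in the Declare). Utf8ToString is a hypothetical helper name, and the byte array is assumed to be non-empty:

```vb
' Assumes 32-bit VB6; 64-bit VBA would need PtrSafe and LongPtr.
Private Declare Function MultiByteToWideChar Lib "kernel32" ( _
    ByVal CodePage As Long, ByVal dwFlags As Long, _
    ByVal lpMultiByteStr As Long, ByVal cbMultiByte As Long, _
    ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long

Private Const CP_UTF8 As Long = 65001

' Hypothetical helper: decodes a non-empty UTF-8 byte array into a VB String.
Public Function Utf8ToString(b() As Byte) As String
    Dim cb As Long, n As Long, s As String
    cb = UBound(b) - LBound(b) + 1
    ' First call: ask how many UTF-16 code units are needed.
    n = MultiByteToWideChar(CP_UTF8, 0, VarPtr(b(LBound(b))), cb, 0, 0)
    If n = 0 Then Exit Function          ' invalid input or API failure
    s = String$(n, vbNullChar)
    ' Second call: write the converted text into the string buffer.
    MultiByteToWideChar CP_UTF8, 0, VarPtr(b(LBound(b))), cb, StrPtr(s), n
    Utf8ToString = s
End Function
```

    The resulting String is ordinary UTF-16, so a four-byte UTF-8 sequence arrives as a surrogate pair and Len counts it as 2.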
    Last edited by Elroy; May 7th, 2026 at 07:15 AM.

  5. #5

    Thread Starter
    Lively Member anycoder's Avatar
    Join Date
    Jan 2025
    Posts
    67

    Re: AscU unicode equivalent of Asc

    Quote Originally Posted by Elroy
    VB6, with its intrinsic functions, assumes there's no such thing as surrogate pairs, which, when dealing with the complete UTF-16 characterset, isn't correct. My functions simply correct this oversight.
    Nowadays, surrogate pairs are appearing more frequently in text, and they aren't just limited to emojis.
    Since the beginning, scripts that use them have caused issues not only for storage, but also due to linguistic characteristics that make standard string functions like Mid or InStr unusable.

  6. #6
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    10,909

    Re: AscU unicode equivalent of Asc

    Quote Originally Posted by anycoder
    Nowadays, surrogate pairs are appearing more frequently in text, and they aren't just limited to emojis.
    Since the beginning, scripts that use them have caused issues not only for storage, but also due to linguistic characteristics that make standard string functions like Mid or InStr unusable.
    I agree. Also, regarding your statement about wanting a "code point": I assume you're talking about a Unicode code point, with no specific Unicode 'flavor' (encoding) implied. It might be useful to write some functions with names like:

    CodePointFromUTF16(UTF16 As Long) As Long
    UTF16FromCodePoint(CodePoint As Long) As Long

    For anyone actually wanting the Unicode Code Point values, they could use those functions. They'd be quite easy to write, as all the non-surrogate-pairs are already the code point values. And the math for the surrogate-pairs is fairly straightforward.
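
    A possible sketch of those two helpers, not a tested library. It assumes a surrogate pair is packed into one Long as high unit then low unit (e.g. &HD840DC24, which VB6 stores as a negative number), and it passes arguments ByVal, which is my choice rather than part of the signatures above:

```vb
' Hypothetical sketch; a surrogate pair is assumed packed as &HHHHHLLLL.
Function CodePointFromUTF16(ByVal UTF16 As Long) As Long
    Dim hi As Long, lo As Long
    hi = ((UTF16 And &HFFFF0000) \ &H10000) And &HFFFF&   ' upper 16 bits
    lo = UTF16 And &HFFFF&                                ' lower 16 bits
    If (hi >= &HD800&) And (hi <= &HDBFF&) And _
       (lo >= &HDC00&) And (lo <= &HDFFF&) Then
        CodePointFromUTF16 = (hi And &H3FF&) * &H400& + (lo And &H3FF&) + &H10000
    Else
        CodePointFromUTF16 = UTF16   ' not a packed pair: returned unchanged
    End If
End Function

Function UTF16FromCodePoint(ByVal CodePoint As Long) As Long
    Dim v As Long, hi As Long, lo As Long
    If CodePoint < &H10000 Then
        UTF16FromCodePoint = CodePoint   ' BMP: the code unit is the code point
    Else
        v = CodePoint - &H10000
        hi = &HD800& Or (v \ &H400&)
        lo = &HDC00& Or (v And &H3FF&)
        ' hi * &H10000 would overflow a signed Long, so use its signed equivalent.
        UTF16FromCodePoint = ((hi - &H10000) * &H10000) Or lo
    End If
End Function
```

    With these, Hex(UTF16FromCodePoint(&H20024)) gives D840DC24 and CodePointFromUTF16(&HD840DC24) gives &H20024 back.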
