AscU unicode equivalent of Asc

**anycoder** · May 1st, 2026, 11:09 AM

This proposal introduces the AscU utility, a variant of the Asc function designed to retrieve the Unicode code point of any character.

While the standard AscW function is often used for this purpose, it is frequently misunderstood and misused. AscW returns a signed 16-bit Integer (ranging from -32,768 to 32,767), so any UTF-16 code unit at or above &H8000 comes back negative. Some developers use it for UTF-8 conversion, and range tests on its result are then misinterpreted, for example:

Code:

      Select Case AscW(Mid(Txt, i, 1))  ' AscW may return a negative value
        Case Is < 128:
            ' negative results (code units >= &H8000) wrongly land here
        Case Is < 2048:
           
       ...
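
To make the pitfall concrete, here is a minimal sketch. The sub name ShowAscWSign is made up for illustration; it only uses the intrinsic ChrW$ and AscW functions and shows that masking with &HFFFF& recovers the unsigned code unit.

```vb
Private Sub ShowAscWSign()
  Dim ch As String
  ch = ChrW$(&H8000&)                 ' U+8000, an ordinary BMP character
  Debug.Print AscW(ch)                ' prints -32768
  Debug.Print AscW(ch) < 128          ' prints True: the first Case branch would match
  Debug.Print AscW(ch) And &HFFFF&    ' prints 32768, the unsigned code unit
End Sub
```

The mask works because the Integer result is widened to a Long before the And, so only the low 16 bits survive.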

AscU:

Code:

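' Returns the Unicode code point at position aPos in s.
' aPos is passed ByRef and is advanced past the code unit(s) consumed.
Function AscU(s As String, ByRef aPos As Long) As Long
  Dim h As Long, l As Long
  h = AscW(Mid$(s, aPos, 1)) And &HFFFF&   ' mask to an unsigned code unit
  aPos = aPos + 1
  AscU = h                                 ' default: BMP character or lone surrogate
  ' High surrogate: look at the next code unit, if there is one
  If (h >= &HD800&) And (h <= &HDBFF&) And (aPos <= Len(s)) Then
     l = AscW(Mid$(s, aPos, 1)) And &HFFFF&
     If (l >= &HDC00&) And (l <= &HDFFF&) Then
        ' Valid low surrogate: combine the pair and consume it
        AscU = (h And &H3FF&) * 1024
        AscU = (AscU Or (l And &H3FF&)) + &H10000
        aPos = aPos + 1
     End If
  End If
End Function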

Example:

Code:

Private Sub test()
Dim i As Long, s As String
i = 1
s = TextBox1 ' Unicode office TextBox control
While i <= Len(s)
  Debug.Print Hex(AscU(s, i)) ' AscU advances i (passed ByRef) past each code point
Wend
End Sub

Hope this works.

**Elroy** · May 4th, 2026, 08:31 AM

https://www.vbforums.com/showthread....urrogate-pairs

All done for ALL of the VB6 string functions.

**anycoder** · May 5th, 2026, 03:53 AM

Thanks for your contribution.
I think your code has some major limitations, at least for AscWEx:

It doesn't handle an individual character at a given position, in the middle of a string for example.
Valid code points range from 0 to 0x10FFFF, but AscWEx returns the two surrogates concatenated instead of the code point.

Ex: 𠀤 (code point 0x20024)

AscWEx returns: D840DC24
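
For reference, the arithmetic that turns the pair D840 DC24 into 0x20024 can be sketched as follows. The sub name PairToCodePoint is hypothetical; the math is the same masking and shifting that AscU performs.

```vb
Private Sub PairToCodePoint()
  Dim h As Long, l As Long, cp As Long
  h = &HD840&: l = &HDC24&                 ' the two UTF-16 code units
  cp = (h And &H3FF&) * &H400&             ' &H40 * &H400 = &H10000
  cp = (cp Or (l And &H3FF&)) + &H10000    ' (&H10000 Or &H24) + &H10000
  Debug.Print Hex$(cp)                     ' prints 20024
End Sub
```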

**Elroy** · May 7th, 2026, 07:09 AM

anycoder, when you're dealing with a surrogate pair (four bytes), all four bytes together represent a single character. That's the way surrogate pairs work.

All my functions are doing is expanding the handling of strings from the UCS-2 character set, which VB6 was designed around, to the entire UTF-16 character set, which includes the surrogate pairs.

VB6, with its intrinsic functions, assumes there's no such thing as surrogate pairs, which, when dealing with the complete UTF-16 character set, isn't correct. My functions simply correct this oversight.

Now, if you wish to bring UTF-8 into the discussion (which you may be trying to do), that's a completely different discussion. And none of the VB6 intrinsic string functions ever make a UTF-8 assumption. To treat a VB6 BSTR string as UTF-8 would take a completely different set of functions. And, truth be told, isn't really worth it. It'd be far easier to convert the UTF-8 to UTF-16, get your work done, and then possibly convert back to UTF-8, if that's what you need.
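
A minimal sketch of that round trip, assuming 32-bit VB6 and the Win32 MultiByteToWideChar and WideCharToMultiByte APIs with code page 65001 (CP_UTF8). The wrapper names Utf8ToString and StringToUtf8 are made up for illustration, and Utf8ToString assumes a non-empty byte array.

```vb
Private Declare Function MultiByteToWideChar Lib "kernel32" ( _
    ByVal CodePage As Long, ByVal dwFlags As Long, _
    ByVal lpMultiByteStr As Long, ByVal cbMultiByte As Long, _
    ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long
Private Declare Function WideCharToMultiByte Lib "kernel32" ( _
    ByVal CodePage As Long, ByVal dwFlags As Long, _
    ByVal lpWideCharStr As Long, ByVal cchWideChar As Long, _
    ByVal lpMultiByteStr As Long, ByVal cbMultiByte As Long, _
    ByVal lpDefaultChar As Long, ByVal lpUsedDefaultChar As Long) As Long
Private Const CP_UTF8 As Long = 65001

' UTF-8 bytes -> VB6 (UTF-16) string. Assumes b() holds at least one byte.
Function Utf8ToString(b() As Byte) As String
  Dim cb As Long, n As Long, s As String
  cb = UBound(b) - LBound(b) + 1
  ' First call asks how many UTF-16 code units are needed
  n = MultiByteToWideChar(CP_UTF8, 0, VarPtr(b(LBound(b))), cb, 0, 0)
  If n = 0 Then Exit Function
  s = String$(n, vbNullChar)
  MultiByteToWideChar CP_UTF8, 0, VarPtr(b(LBound(b))), cb, StrPtr(s), n
  Utf8ToString = s
End Function

' VB6 (UTF-16) string -> UTF-8 bytes. Returns an unallocated array for "".
Function StringToUtf8(s As String) As Byte()
  Dim n As Long, b() As Byte
  ' First call asks how many bytes are needed
  n = WideCharToMultiByte(CP_UTF8, 0, StrPtr(s), Len(s), 0, 0, 0, 0)
  If n = 0 Then Exit Function
  ReDim b(0 To n - 1)
  WideCharToMultiByte CP_UTF8, 0, StrPtr(s), Len(s), VarPtr(b(0)), n, 0, 0
  StringToUtf8 = b
End Function
```

For example, StringToUtf8(ChrW$(&HD840&) & ChrW$(&HDC24&)) should yield the four bytes F0 A0 80 A4, and passing them to Utf8ToString should give back the same two-unit string.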

**anycoder** · May 15th, 2026, 02:04 AM

Originally Posted by Elroy

VB6, with its intrinsic functions, assumes there's no such thing as surrogate pairs, which, when dealing with the complete UTF-16 character set, isn't correct. My functions simply correct this oversight.

Nowadays, surrogate pairs are appearing more frequently in text, and they aren't just limited to emojis.
Since the beginning, scripts that use them have caused issues not only for storage, but also due to linguistic characteristics that make standard string functions like Mid or InStr unusable.

**Elroy** · May 19th, 2026, 02:50 PM

Originally Posted by anycoder

Nowadays, surrogate pairs are appearing more frequently in text, and they aren't just limited to emojis.
Since the beginning, scripts that use them have caused issues not only for storage, but also due to linguistic characteristics that make standard string functions like Mid or InStr unusable.

I agree. Also, regarding your statement about wanting a "code point": I assume you're talking about a Unicode code point (with no specific Unicode 'flavor' specified). It might be useful to write some functions with names like:

CodePointFromUTF16(UTF16 As Long) As Long
UTF16FromCodePoint(CodePoint As Long) As Long

For anyone actually wanting the Unicode Code Point values, they could use those functions. They'd be quite easy to write, as all the non-surrogate-pairs are already the code point values. And the math for the surrogate-pairs is fairly straightforward.
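
A rough sketch of what those two functions might look like. It assumes a surrogate pair is packed into one Long with the high surrogate in the upper 16 bits and the low surrogate in the lower 16 bits (the layout suggested by the D840DC24 value earlier in the thread), and that a BMP character is just its code unit. The parameters are taken ByVal here, and raising error 5 on invalid input is an arbitrary choice.

```vb
' Packed UTF-16 value (as described above) -> Unicode code point
Function CodePointFromUTF16(ByVal UTF16 As Long) As Long
  Dim h As Long, l As Long
  h = ((UTF16 And &HFFFF0000) \ &H10000) And &HFFFF&   ' upper 16 bits, unsigned
  l = UTF16 And &HFFFF&                                ' lower 16 bits
  If h = 0 Then
    CodePointFromUTF16 = l                             ' no pair: already the code point
  ElseIf h >= &HD800& And h <= &HDBFF& And l >= &HDC00& And l <= &HDFFF& Then
    CodePointFromUTF16 = (((h And &H3FF&) * &H400&) Or (l And &H3FF&)) + &H10000
  Else
    Err.Raise 5                                        ' not a valid surrogate pair
  End If
End Function

' Unicode code point -> packed UTF-16 value
Function UTF16FromCodePoint(ByVal CodePoint As Long) As Long
  Dim v As Long, h As Long, l As Long
  If CodePoint < 0 Or CodePoint > &H10FFFF Then Err.Raise 5
  If CodePoint < &H10000 Then
    UTF16FromCodePoint = CodePoint                     ' BMP: value is unchanged
  Else
    v = CodePoint - &H10000
    h = &HD800& + (v \ &H400&)                         ' high surrogate
    l = &HDC00& + (v And &H3FF&)                       ' low surrogate
    ' h * &H10000 would overflow a signed Long, so shift (h - &H10000) instead
    UTF16FromCodePoint = ((h - &H10000) * &H10000) Or l
  End If
End Function
```

With this layout, Hex$(UTF16FromCodePoint(&H20024)) gives D840DC24 and Hex$(CodePointFromUTF16(&HD840DC24)) gives 20024. Packed values for pairs are negative Longs, since the high surrogate always sets the sign bit.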
