[RESOLVED] em dash display
A database (Access) I have includes em dashes (the long dash, as opposed to the short, en dash). When I retrieve a field with that character (within a sentence or phrase of normal text), how do I recognize it to display as an em dash in, say, a flexgrid, listbox or textbox? Currently a thick vertical bar is displayed in my controls.
Re: [RESOLVED] em dash display
"Em Dash" is ANSI &H97 and "En Dash" is ANSI &H96, both display just fine in ANSI controls for me. Are you sure you don't have some other dash instead?
Re: [RESOLVED] em dash display
Quote: Originally Posted by dilettante
"Em Dash" is ANSI &H97 and "En Dash" is ANSI &H96, both display just fine in ANSI controls for me. Are you sure you don't have some other dash instead?
No, I'm not sure at all. The character appears several times in a free Bible app downloaded from the Microsoft Store. I copied the contents from the app (via copy/paste), and it appears to be an em dash in the source as well as in the Excel file I pasted it into (I then imported that file into MS Access). I have no idea what it really is, but it didn't appear as a long dash in my program. I got around it by replacing it in Excel: I copy/pasted the 'long dash' (whatever it really is) into the Search field and replaced it with a regular (en) dash. Good to know a real em dash displays fine.
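The same replacement could also be done in VB6 when the field is read, rather than by hand in Excel. This is only a sketch: the function name NormalizeDashes is hypothetical, the list of dash-like code points is an assumption about what the source text might contain, and it substitutes a plain hyphen, which may or may not be what you want to display.

```vb
Option Explicit

' Hypothetical helper: replace several dash-like Unicode characters with a plain hyphen.
Public Function NormalizeDashes(ByVal Text As String) As String
    Dim Codes As Variant
    Dim i As Long

    ' Assumed candidates: figure dash, en dash, em dash, horizontal bar, minus sign
    Codes = Array(&H2012, &H2013, &H2014, &H2015, &H2212)

    NormalizeDashes = Text
    For i = LBound(Codes) To UBound(Codes)
        ' ChrW$ builds the character from its Unicode code point, independent of the code page
        NormalizeDashes = Replace(NormalizeDashes, ChrW$(Codes(i)), "-")
    Next
End Function
```

It would be called on the field value before assigning it to the control, for example Text1.Text = NormalizeDashes(SomeFieldValue), where SomeFieldValue stands for whatever string came back from the recordset.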
Re: [RESOLVED] em dash display
Quote: Originally Posted by dilettante
"Em Dash" is ANSI &H97 and "En Dash" is ANSI &H96, both display just fine in ANSI controls for me. Are you sure you don't have some other dash instead?
Sounds like it might be Unicode U+2014
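One way to find out what the character really is would be to print the code point of every character in the retrieved string. A minimal sketch, assuming the field value is already in a VB6 String; the name DumpCodePoints is hypothetical:

```vb
Option Explicit

' Hypothetical helper: print each character position with its Unicode code point.
Public Sub DumpCodePoints(ByVal Text As String)
    Dim i As Long
    Dim Code As Long

    For i = 1 To Len(Text)
        ' AscW returns a signed Integer; masking gives a value in 0..65535
        Code = AscW(Mid$(Text, i, 1)) And &HFFFF&
        Debug.Print i, "U+" & Right$("0000" & Hex$(Code), 4)
    Next
End Sub
```

For example, DumpCodePoints "a" & ChrW$(&H2014) & "b" should list U+0061, U+2014 and U+0062 in the Immediate window. If the mystery dash shows up as something other than U+2013 or U+2014, that would explain why it does not display as expected.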
Re: [RESOLVED] em dash display
Quote: Originally Posted by DllHell
Sounds like it might be Unicode U+2014
That's the same character as &H97.
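A quick way to check this on your own machine is sketched below. The routine name ShowEmDashCodes is hypothetical, and the values noted for Asc() and the comparison assume the system ANSI code page is Windows-1252; other code pages may give different results.

```vb
Option Explicit

Private Sub ShowEmDashCodes()
    Dim s As String

    s = ChrW$(&H2014)              ' em dash, built from its Unicode code point

    Debug.Print Hex$(AscW(s))      ' Unicode code point: 2014
    Debug.Print Hex$(Asc(s))       ' ANSI byte under the current code page: 97 on Windows-1252
    Debug.Print (s = Chr$(&H97))   ' True on Windows-1252
End Sub
```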
Re: [RESOLVED] em dash display
Quote: Originally Posted by dilettante
That's the same character as &H97.
The miracles of Windows code pages :-))
Everything at or above &H80 is interpreted according to the current system code page in a non-Unicode application, but... en and em dashes are &H96 and &H97 in every code page except the Far Eastern ones.
cheers,
</wqw>
Re: [RESOLVED] em dash display
Yep, anything over 127 is subject to code page translation. The default code page for US English is Windows-1252. What's more, when you create characters in the range 0 to 255 with Chr(), they are converted internally to Unicode, and for some of them the Unicode code point is not the same as the code page 1252 value, so AscW() won't always give you a number in the range 0 to 255. In Unicode itself, the code points 128 to 255 are fixed and are not subject to code page translation.
Here is a simple loop that goes through 0 to 255 and prints the characters whose Unicode code point differs from their value in the active code page (1252 here):
VB Code:
Option Explicit

Private Declare Function GetACP Lib "kernel32" () As Long

Private Sub Form_Load()
    Dim i As Long
    Dim s As String

    Debug.Print "Active Code Page = " & GetACP()
    Debug.Print "Chr(Dec)", "Chr(Hex)", "Asc(Hex)", "AscW(Hex)"
    For i = 0 To 255
        s = Chr(i)
        ' Compare Chr() with ChrW(), and print where they differ
        If s <> ChrW(i) Then
            Debug.Print i, Hex(i), Hex(Asc(s)), GetHex(AscW(s))
        End If
    Next
End Sub

' Get Hex value padded with 0 to the left
Public Function GetHex(ByVal i As Long) As String
    GetHex = Right("000" & Hex(i), 4)
End Function
Output:
Code:
Active Code Page = 1252
Chr(Dec)      Chr(Hex)      Asc(Hex)      AscW(Hex)
128           80            80            20AC
130           82            82            201A
131           83            83            0192
132           84            84            201E
133           85            85            2026
134           86            86            2020
135           87            87            2021
136           88            88            02C6
137           89            89            2030
138           8A            8A            0160
139           8B            8B            2039
140           8C            8C            0152
142           8E            8E            017D
145           91            91            2018
146           92            92            2019
147           93            93            201C
148           94            94            201D
149           95            95            2022
150           96            96            2013
151           97            97            2014
152           98            98            02DC
153           99            99            2122
154           9A            9A            0161
155           9B            9B            203A
156           9C            9C            0153
158           9E            9E            017E
159           9F            9F            0178
Re: [RESOLVED] em dash display
Quote: Originally Posted by qvb6
Yep, anything over 127 is subject to code page translation.
Not really.
If you are talking about the Chr$() function, it makes a call to MultiByteToWideChar() passing CP_ACP. What happens depends on the current ANSI codepage.
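To take the current ANSI code page out of the picture, the same API can be called with an explicit code page instead of CP_ACP. The following is a sketch under stated assumptions: the function name DecodeBytes is hypothetical, the byte array is assumed to be dimensioned and non-empty, and error handling is omitted.

```vb
Option Explicit

Private Declare Function MultiByteToWideChar Lib "kernel32" ( _
    ByVal CodePage As Long, _
    ByVal dwFlags As Long, _
    ByRef MultiByteStr As Byte, _
    ByVal cbMultiByte As Long, _
    ByVal lpWideCharStr As Long, _
    ByVal cchWideChar As Long) As Long

' Hypothetical helper: decode a byte array with an explicitly chosen code page.
Public Function DecodeBytes(ByRef Bytes() As Byte, ByVal CodePage As Long) As String
    Dim Count As Long
    Dim Written As Long

    Count = UBound(Bytes) - LBound(Bytes) + 1

    ' The result never needs more UTF-16 characters than there are input bytes
    DecodeBytes = String$(Count, vbNullChar)
    Written = MultiByteToWideChar(CodePage, 0, Bytes(LBound(Bytes)), Count, _
                                  StrPtr(DecodeBytes), Count)

    ' Trim to the number of characters actually written (0 on failure)
    DecodeBytes = Left$(DecodeBytes, Written)
End Function
```

With a one-element array holding &H97, Hex$(AscW(DecodeBytes(b, 1252))) should give 2014 regardless of the system locale, whereas Chr$(&H97) depends on whatever CP_ACP happens to be.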
Re: [RESOLVED] em dash display
Code:
Option Explicit

Private Const CP_USASCII As Long = 20127

Private Declare Function MultiByteToWideChar Lib "Kernel32" ( _
    ByVal CodePage As Long, _
    ByVal dwFlags As Long, _
    ByRef MultiByteStr As Byte, _
    ByVal cbMultiByte As Long, _
    ByVal lpWideCharStr As Long, _
    ByVal cchWideChar As Long) As Long

Private Sub Dump()
    Dim Bytes(0 To 255) As Byte
    Dim I As Long
    Dim Chars As String

    ' Fill the buffer with every byte value, 0 through 255
    For I = 0 To 255
        Bytes(I) = CByte(I)
    Next
    Chars = Space$(256)
    ' Convert all 256 bytes to UTF-16 using code page 20127
    MultiByteToWideChar CP_USASCII, 0, Bytes(0), 256, StrPtr(Chars), 256
    ' Left number = 1-based position in Chars (byte value + 1),
    ' right number = code point the byte was converted to
    For I = 1 To 64
        Debug.Print I; "="; AscW(Mid$(Chars, I, 1)), _
                    I + 64; "="; AscW(Mid$(Chars, I + 64, 1)), _
                    I + 128; "="; AscW(Mid$(Chars, I + 128, 1)), _
                    I + 192; "="; AscW(Mid$(Chars, I + 192, 1))
    Next
End Sub
Results in:
Code:
1 = 0 65 = 64 129 = 0 193 = 64
2 = 1 66 = 65 130 = 1 194 = 65
3 = 2 67 = 66 131 = 2 195 = 66
4 = 3 68 = 67 132 = 3 196 = 67
5 = 4 69 = 68 133 = 4 197 = 68
6 = 5 70 = 69 134 = 5 198 = 69
7 = 6 71 = 70 135 = 6 199 = 70
8 = 7 72 = 71 136 = 7 200 = 71
9 = 8 73 = 72 137 = 8 201 = 72
10 = 9 74 = 73 138 = 9 202 = 73
11 = 10 75 = 74 139 = 10 203 = 74
12 = 11 76 = 75 140 = 11 204 = 75
13 = 12 77 = 76 141 = 12 205 = 76
14 = 13 78 = 77 142 = 13 206 = 77
15 = 14 79 = 78 143 = 14 207 = 78
16 = 15 80 = 79 144 = 15 208 = 79
17 = 16 81 = 80 145 = 16 209 = 80
18 = 17 82 = 81 146 = 17 210 = 81
19 = 18 83 = 82 147 = 18 211 = 82
20 = 19 84 = 83 148 = 19 212 = 83
21 = 20 85 = 84 149 = 20 213 = 84
22 = 21 86 = 85 150 = 21 214 = 85
23 = 22 87 = 86 151 = 22 215 = 86
24 = 23 88 = 87 152 = 23 216 = 87
25 = 24 89 = 88 153 = 24 217 = 88
26 = 25 90 = 89 154 = 25 218 = 89
27 = 26 91 = 90 155 = 26 219 = 90
28 = 27 92 = 91 156 = 27 220 = 91
29 = 28 93 = 92 157 = 28 221 = 92
30 = 29 94 = 93 158 = 29 222 = 93
31 = 30 95 = 94 159 = 30 223 = 94
32 = 31 96 = 95 160 = 31 224 = 95
33 = 32 97 = 96 161 = 32 225 = 96
34 = 33 98 = 97 162 = 33 226 = 97
35 = 34 99 = 98 163 = 34 227 = 98
36 = 35 100 = 99 164 = 35 228 = 99
37 = 36 101 = 100 165 = 36 229 = 100
38 = 37 102 = 101 166 = 37 230 = 101
39 = 38 103 = 102 167 = 38 231 = 102
40 = 39 104 = 103 168 = 39 232 = 103
41 = 40 105 = 104 169 = 40 233 = 104
42 = 41 106 = 105 170 = 41 234 = 105
43 = 42 107 = 106 171 = 42 235 = 106
44 = 43 108 = 107 172 = 43 236 = 107
45 = 44 109 = 108 173 = 44 237 = 108
46 = 45 110 = 109 174 = 45 238 = 109
47 = 46 111 = 110 175 = 46 239 = 110
48 = 47 112 = 111 176 = 47 240 = 111
49 = 48 113 = 112 177 = 48 241 = 112
50 = 49 114 = 113 178 = 49 242 = 113
51 = 50 115 = 114 179 = 50 243 = 114
52 = 51 116 = 115 180 = 51 244 = 115
53 = 52 117 = 116 181 = 52 245 = 116
54 = 53 118 = 117 182 = 53 246 = 117
55 = 54 119 = 118 183 = 54 247 = 118
56 = 55 120 = 119 184 = 55 248 = 119
57 = 56 121 = 120 185 = 56 249 = 120
58 = 57 122 = 121 186 = 57 250 = 121
59 = 58 123 = 122 187 = 58 251 = 122
60 = 59 124 = 123 188 = 59 252 = 123
61 = 60 125 = 124 189 = 60 253 = 124
62 = 61 126 = 125 190 = 61 254 = 125
63 = 62 127 = 126 191 = 62 255 = 126
64 = 63 128 = 127 192 = 63 256 = 127
Re: [RESOLVED] em dash display
Quote: Originally Posted by wqweto
En and Em dashes are &H96 and &H97 in every code page except far-eastern
This explains a lot. I have several clients in the area, so I tend to always test Unicode stuff using PRC settings.
Re: [RESOLVED] em dash display
Quote: Originally Posted by dilettante
Private Const CP_USASCII As Long = 20127
Not to pick a fight, but I found that this page defines CP_USASCII as 1252, while this one shows that "20127" is the code page for "us-ascii". I am not surprised by the confusing information out there, even in MSDN. I find that specifying character codes without the encoding scheme (ANSI plus code page, other single-byte-per-character schemes, or Unicode) is like saying that the temperature is 40 degrees without saying the units (C or F). Even in VB6, character values obtained with Asc (not AscW) are like a temperature without units (the code page). Stating the code page each time, and trying to be as accurate as possible, would mean writing walls of text, which not many have the time for.
Re: [RESOLVED] em dash display
My only point was that things are more complicated than any absolute rules might indicate. You could have an SBCS codepage that produces almost anything from inputs of 0-255.
You are correct that Microsoft defines a CODEPAGEID of CP_USASCII as 1252 in one place, as a local constant only, in an interface definition for IMimeInternational. And they warn "Do not use." In other words, this symbolic code page name has no real definition and certainly no global definition in the Win32 API. Meanwhile, "us-ascii" is not valid syntax in most programming languages anyway (aside from maybe COBOL), and the code page value 20127 has no official symbolic name assigned anywhere.
Would it have changed anything if I had named the constant US_ASCII or HIGH_BIT_STRIPPER_ASCII? No.