I wonder why I can't use a Unicode text box.
Can I use the Forms 2.0 controls in my application? It is a free application.
Is there any free-to-distribute control for this purpose?
Or a robust crack for a known control, or a custom control that works well?
Try searching this forum, mainly in the CodeBank. You will find controls that support Unicode. :wave:
Also PSC is your friend on this one (Planet-Source-Code.com)
Good Luck
I happen to have made a Unicode-empowered textbox user control which is:
(1) capable of accepting input of text in both ANSI and Unicode and displaying it accordingly, e.g. English and/or Chinese/Japanese/Korean, if the applicable fonts exist on the system.
(2) DBCS-aware for clipboard, keyboard and/or handwriting device input.
(3) equipped with an automatic popup menu for Copy/Cut/Paste etc. of ANSI and Unicode text, without any additional code.
(4) suitable to serve as a label user control as well.
No extra file required and no subclassing involved.
For Windows XP and later only. The file is as small as 10K.
Attached is the said textbox user control, together with a ready-to-run test project.
Note: After adding the user control to the application's Form, please set its TabStop property to False.
Not having any subclassing and not accounting for IOleInPlaceActiveObject results in behavior that makes the TextBox unusable in many cases. Arrow keys may not always work. Sometimes hitting an arrow key will result in focus jumping to another control.
The textbox user control in question is not meant to be a full-fledged one, e.g. it only includes Chinese/Japanese/Korean. The intention is to keep it simple for basic usage and thus help some users. As such, I don't want to use subclassing (or self-subclassing). Able users, please feel free to add other languages and/or other features as you see fit. Those who would like to use subclassing, please feel free to modify the code as you see fit.
AFAIK, the RichTextBox control can handle unicode, have you tried it?
The RichTextBox control can't handle Unicode itself, but you can use a few tricks to "reach under the hood" to the underlying Windows RichEdit control if your system has a recent enough version.
Here's a simple demo that should work in various Windows versions. Vista and Win7 for sure, probably XP and Win2K as well.
dilettante:
Thank you for sharing your work. Some feedback from my test: by clicking the two buttons on the Form, I could see the default Unicode strings in the 4 RTB boxes all right. However, when I tried to input some Chinese characters through a writing device, I failed: the first 3 boxes showed only a series of "?" characters and the 4th one showed nothing. The test was done on Vista.
The reason I don't use the RTB is that I totally avoid using any extrinsic control in the user control, so that there is no chance of a "version conflict" if one has to distribute his/her product (e.g. a user has been using my user control in his shareware paint programs so that his customers can superimpose both ANSI and Unicode characters on pictures). I also purposely avoid subclassing, because I believe it is more convenient for users to add just a single ctl file to their project (i.e. if I used subclassing at all, I would do self-subclassing instead).
Well, I don't have one of these "writing devices" so I couldn't say.
However I can enter various symbols via the keyboard using their hex values and it works fine:
- Move cursor to the RTB (tab, click with the mouse, etc.).
- Enter the hexadecimal value, e.g. fa22, 9a21, etc.
- Alternatively, many can be entered in decimal, e.g. hold Alt type 01244 and release Alt.
- Press Alt-x to tell the RTB to convert to the Unicode character.
I believe this to be a bogus argument, yet one seemingly very popular here. This is exactly why VB was designed as it was: to be able to create programs out of different components. Versioning of the RTB control isn't an issue; it was never available in more than one version unless you count Service Pack fixes and such. My guess is that what you fear is doing proper application deployment, which addresses the issue. Beyond that, we've had isolated applications and registration-free COM since XP came out, making the problem disappear entirely if you create the proper isolation manifests.
This is another issue that goes away if you create an OCX instead of a CTL file.
Do as you like, we can use more ideas to choose from. I was just offering a suggestion that might help.
Can you tell us why in your example the cursor disappears and text jumps around and such? Perhaps that has something to do with my attempts to run it on a non-Asian machine?
I do not intend to enter into an argument here, because a forum discussion in this manner would lead to nowhere. I just want to clarify one point:
When I say input, I mean I write out the real Chinese characters directly (like the ones shown in your 3rd RTB box), not via the codes representing them (such as the ones you listed: fa22, 9a21). It is like when we input English: we write "ABCDE...." directly, rather than entering ASCII "65, 66, 67, 68, 69....".
Obviously there is a difference in perception.
Have now set the user control's TabStop property to False (after it is placed on the application's Form).
Entering Unicode characters in hex or decimal format was just an example showing that the RTB will accept them from a standard keyboard on a non-Asian version of Windows. Those of us with such systems can't work with IME, though the documentation suggests that the RTB can do so.
Have you had success using the Forms20 controls that were mentioned in the original post?
All that was requested was Unicode support anyway. I'm not sure the OP needs IME character entry. But then again his stated requirements were not very precise.
As I explained previously, I don't use any extrinsic control (Forms20 and RTB being among them) in the user control. In fact, it does not use any control at all, because I call CreateWindowExW to create a "virtual" Windows TextBox (note the "W" in CreateWindowExW). If you inspect the user control in the test project, you will see that it is only a blank form.
If I used Forms20, I wouldn't need to create a user control at all, and life would be much easier. But I don't want to distribute Forms20, nor the RTB. It is cumbersome and troublesome in some cases (version conflicts, the so-called "DLL hell", are something many of my fellow programmers often grumble about).
There are millions of people in China using a writing pad for input; people also use the keyboard, of course. But the user control is not aimed at them, because they are already using the Chinese version of Windows.
As I see it, yours is just doing the job of conversion between UTF16 and UTF8. When I ask my users to write out the phrase "My dear friend" in Chinese on my control, I don't expect them to enter code numbers.
If you are interested, a quick search on PSC shows there is a good Unicode textbox user control (search "AndRay"), which uses subclassing (and a bit of ASM). I have just tested it, and it properly accepts Chinese input (it omits handling of the Tab key, but this is only a very minor side issue, unless you want to split hairs). When it comes to self-subclassing, "Paul Caton" there is the guy (he really knows his potatoes and yet is so humble).
All I was doing in my example was using the RichTextBox control in a way that allows the program to write and read Unicode (UTF-16LE) String data to/from the underlying RichEdit control. It is doing no conversions at all, certainly not to UTF-8.
Note that UTF-8 is not ANSI even though people seem to keep confusing the two over and over. Perhaps that's because characters 0 to 127 in both have the same single byte value?
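To make that distinction concrete, here is a small sketch. It is written in Python purely for illustration (it is not VB6), and it assumes that "ANSI" means Windows-1252, as on a Western European machine. It encodes the same characters three ways and suggests that only the 0 to 127 range agrees between the ANSI code page and UTF-8.

```python
# Compare how the same characters are stored as "ANSI" (assumed here to be
# Windows-1252), as UTF-8 and as UTF-16LE.

def encodings_of(text):
    """Return the byte sequences for text in three encodings."""
    return {
        "cp1252": text.encode("cp1252"),
        "utf-8": text.encode("utf-8"),
        "utf-16-le": text.encode("utf-16-le"),
    }

ascii_char = encodings_of("A")     # code point 65, inside 0..127
accented = encodings_of("\u00e9")  # e with acute accent, code point 233

for label, result in (("A", ascii_char), ("U+00E9", accented)):
    for name, raw in result.items():
        # Print each encoding as hex so the byte counts are visible.
        print(label, name, raw.hex())
```

For "A" the ANSI and UTF-8 bytes are identical; for the accented letter ANSI still uses one byte while UTF-8 uses two, and UTF-16LE uses two bytes in both cases.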
My example didn't involve any conversions at all. It is pure UTF-16LE. That was entirely the point!
I wasn't suggesting you should use the Forms 2.0 controls, just curious about whether they work with your writing pad devices.
Neither can TxText, the premier text box control. In fact, when typical ASCII conversions are done on the original strings read in from files, you might as well forget common encrypted files when dealing with machines requiring Unicode. Even when your encrypted files are read in using binary mode, you cannot decrypt them on Asian-based machines.
On USA-based machines, the encryption-decryption will work. But, on Asian machines, the same routines will fail. :sick:
No, I don't like to "play" with stuff like Fm20.dll. No fun at all, stiff and tied to MS Office.
Not just Asian machines, pretty much any machine with different locale settings will fail.
So you either use binary I/O or else another text I/O method that doesn't convert to/from ANSI. This is one of the few useful things about TextStream I/O (Scripting Runtime): it can handle UTF-16LE I/O.
Or you can use 3rd party textstream I/O objects that handle various encodings losslessly, like UTF-8 or UTF-16BE.
And it isn't just encryption. One of the worst errors I see with the Winsock control is people using it with String variables containing binary data. Once again you get ANSI conversions that scramble data.
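The scrambling described above can be sketched outside VB. The following is Python, used only for illustration. The byte values are a made-up stand-in for ciphertext, and Python's cp932 and gbk codecs stand in for Japanese and Simplified Chinese ANSI code pages. Windows' own conversion tables differ in detail, so treat this as a rough model rather than a reproduction of what VB does.

```python
# Hypothetical "ciphertext": arbitrary bytes, not text in any code page.
CIPHERTEXT = bytes([0x81, 0x20, 0xFC, 0x0A, 0x41])

def survives_text_round_trip(data, codec):
    """Decode bytes as text in codec, encode back, and compare."""
    text = data.decode(codec, errors="replace")
    return text.encode(codec, errors="replace") == data

# latin-1 maps every byte to exactly one code point, so it stands in for
# "treat the data as bytes". The other two are multi-byte ANSI code pages
# in which 0x81 and 0xFC are lead bytes that need a valid trail byte.
for codec in ("latin-1", "cp932", "gbk"):
    print(codec, survives_text_round_trip(CIPHERTEXT, codec))
```

The point of the sketch is that the same bytes survive on one locale and are damaged on another, which is why binary data belongs in Byte arrays rather than Strings.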
You will also run into TrueType fonts that do not work with "different locale settings", as dilettante calls them (BTW, that's a good description, not a criticism. It's not limited to Asian machines.)
I ran into this problem especially when the font calls for a character code of 128 or above (high bit set). The only solution was to find a font that produced a similar character in the low-bit range and substitute it. Loads of fun. :sick:
Hmm? It isn't what I call them, it's what they are called.
Code Doc,
Frankly I didn't look at dilettante's code, because a simple test had already shown that the code couldn't accept Chinese input (i.e. it did not meet the main "Unicode TextBox" criterion, the subject of this thread). Because of what you pointed out, I have now had a quick look at it, and I agree with you.
I got a bit lost meanwhile, from dilettante's postings I saw "different locale settings", "encryption" and "winsock" etc, and now the quotable quotes, are we still talking about unicode textbox (the thread is on "Unicode TextBox")?
Obviously there is not only a difference in perception, a big one!
Just to sum it up: if you don't use any third party component (this including Forms 2.0), the only way you get a true Unicode textbox is by creating it with CreateWindowW (or CreateWindowExW).
Anything else is a hack that won't work with every Windows locale setting. The Microsoft provided RichTextBox control will fail displaying all characters correctly on a non 8-bit ANSI computer when trying to use Unicode with it. Also, more importantly, it doesn't support direct input – be it via writing pad or IME. I think it should be totally forgotten as a suggestion for Unicode aware textboxes to use.
Merri said, "Anything else is a hack that won't work with every Windows locale setting. The Microsoft provided RichTextBox control will fail displaying all characters correctly on a non 8-bit ANSI computer when trying to use Unicode with it. Also, more importantly, it doesn't support direct input – be it via writing pad or IME. I think it should be totally forgotten as a suggestion for Unicode aware textboxes to use."
-----------------
I could not agree more. However, I am not sure whether to supply a thumb up for Merri or a thumb down for language and/or system weakness. Unicode problems remain somewhat of a can of worms. :sick:
The Unicode snake has bitten me so many times that I am lucky to still be alive.
I wish to add a key point: in my Unicode TextBox posting, there are two comboboxes:
------------------------
cboFontName.AddItem "Arial Unicode MS"
cboFontName.AddItem "SimSun"
------------------------
cboFontCharSet.AddItem "128 Japanese Shift-JIS"
cboFontCharSet.AddItem "129 Korean Hangeul (Wansung)"
cboFontCharSet.AddItem "130 Korean Johab"
cboFontCharSet.AddItem "134 Chinese GB2312 Simplified - PRC & Singapore"
cboFontCharSet.AddItem "135 Chinese BIG5 Traditional - Taiwan & HK"
Such limited choices are not without reason; readers who would like to add other languages should pay attention to this point.
That isn't as important as you think it is. A font's character set is only meaningful when the textbox holds ANSI data. With Unicode, Windows has a built-in mechanism to use character glyphs from other fonts when the font in use does not have the glyphs that are used in the text. So even if you use a font that does not itself contain Chinese glyphs, you will still see them if you have any font installed that has those characters.
I can no longer recall what this mechanism is called; but unless you can prove otherwise, setting a specific font or charset is a non-issue. Windows has solved this problem for you.
Merri,
I am not so sure about that. To a foreigner, Chinese is Chinese and Korean is Korean, but there are variations within each. For example, someone has sent me a letter in Chinese; I know it is in Chinese, but the display shows many of the characters as tiny squares (boxes) or strange characters. So I have to use trial and error to find which setting to use, until I see that the display is all right. For the same reason, if you go to a Chinese website, it is not uncommon that a button is provided on screen to allow the page viewer to "adjust".
That isn't a problem for a textbox to handle. If the message is transmitted in Unicode (UTF-16 or UTF-8), there should be no problem. The problem only arises when you enter the world of ANSI, ie. the message is saved in a specific character set and then needs to be loaded using the correct character set. This is not a problem for Unicode textbox to handle. The problem is how the information about the used character set is transferred. Traditional text files did not contain this information. Unicode text files often include Byte-order mark in Windows.
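As a rough illustration of the byte-order mark mentioned above, here is a minimal Python sketch (Python is used only for illustration; `sniff_bom` is a hypothetical helper name, and UTF-32 marks are deliberately ignored). It checks whether raw file bytes start with a known BOM, which is the piece of information a traditional ANSI text file does not carry.

```python
import codecs

# Byte-order marks this sketch recognises (UTF-32 is not handled).
BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)

def sniff_bom(raw):
    """Return (encoding, payload) if raw starts with a known BOM, else (None, raw)."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name, raw[len(bom):]
    return None, raw

# A one-character "file" saved as UTF-16LE with a BOM in front.
sample = codecs.BOM_UTF16_LE + "\u56fd".encode("utf-16-le")
encoding, payload = sniff_bom(sample)
print(encoding, payload.decode(encoding))
```

When no BOM is found, the loader is back in the ANSI situation: it has to be told, or has to guess, which character set the bytes were saved in.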
So, the square characters only appear because they really are a character code that does not have any glyph defined in Unicode. The problem gets fixed when the correct character set for loading the ANSI text is being used.
The website issue you've mentioned sounds a lot like a server misconfiguration issue, ie. server is configured to Chinese Simplified, page is in Chinese Traditional, server tells Chinese Simplified -> client uses Chinese Simplified, because server says so -> problem. But this is only a guess; I have never seen such a button on a website, and I haven't been to many Chinese websites.
I only know that at the moment there is Unicode + three character sets in wide use (Chinese Simplified, Chinese Traditional & National Standard GB 18030). Sounds like a mess. There were some issues when people started switching to UTF-8 in the ISO-8859 world, but these days UTF-8 is such common knowledge among webmasters & web developers that it gets done right most of the time. New software is almost all in UTF-8 too.
Let me say it again, there might well be variations or versions within a language and we cannot just rely on "auto fallback" (I don't mean "fallback" is not a great thing here).
To put the discussion in the simplest terms: take Chinese for example and assume there are only two variations in existence, the traditional characters and the simplified ones (published by the Government for certain selected characters, in the thousands, to make life easier). There is still the need to let the user select which specific one to use, and Microsoft does not have the ability to automate it; no one has.
To help a non-Chinese speaker understand the issue better, let us further assume there are two friends viewing a letter on the screen. The older guy (by the way, he is very learned) may ask the younger guy to make a switch (they don't know, nor care about, the technical work in the background, which is in fact to switch "font+codepage"), because the elder has difficulty recognizing the simplified characters. Should the young guy refuse such a simple request by just saying "No, because the current font+codepage supports only simplified characters"? In a typical scenario (almost invariably so), the young guy would say "Yes", and a quick change of selection in the combobox satisfies the older guy.
All, yes all, software packages providing a Chinese writing pad (they enable not only writing, but viewing) list the possible variations and allow users to make a quick switch.
As your explanation doesn't really explain anything technically, I did a little googling.
http://people.w3.org/rishida/scripts/chinese/
Changing the font does have an effect, but only on minor strokes (Song/Ming fonts). For other characters to differ in Unicode they'd have to be changed by some other means, i.e. conversion between Simplified & Traditional, which appears not to be straightforward for a computer. For the most part, with Unicode you can write both Simplified & Traditional in the same textbox. As well as any other language, of course.
Quote:
So the characters for 'country' in Simplified and Traditional Chinese, 国 and 國 respectively, are stored as separate codes and you cannot simply switch between the two by using a different font. On the other hand, the character for 'the world' in both Simplified and Traditional writing looks the same, 界, and both writing systems do share the same code point. Then there are characters such as 雪 ('snow') which share the code point because they are not significantly different in appearance, but may typically exhibit systematic differences in stroke overshoot and rotation of minor strokes between simplified and traditional writing systems. To see these correctly you need to apply the right font, eg. a Song font for simplified and a Ming font for traditional.
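The quoted point about separate codes can be checked with a few lines. This is a Python sketch for illustration only (`code_point` is a hypothetical helper): it prints the Unicode code points of the characters named in the quote and stores the Simplified and Traditional forms side by side in one UTF-16LE string.

```python
# Characters taken from the quoted passage.
SIMPLIFIED_COUNTRY = "\u56fd"   # 国
TRADITIONAL_COUNTRY = "\u570b"  # 國
WORLD = "\u754c"                # 界, shared by both writing systems

def code_point(ch):
    """Format a single character as U+XXXX."""
    return "U+%04X" % ord(ch)

# Both forms of 'country' plus 'world' in the same string, as a Unicode
# textbox would hold them, with no code page involved.
both = SIMPLIFIED_COUNTRY + TRADITIONAL_COUNTRY + WORLD
utf16 = both.encode("utf-16-le")

print([code_point(c) for c in both])
print(len(utf16), utf16.decode("utf-16-le") == both)
```

Since 国 and 國 have different code points, switching between them is a text conversion, not a font or charset change.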
Merri,
I wouldn't blame you, as Code Doc said Unicode and associated issues are really complex. Having laid out "in simplest terms" earlier, let me try the explanation in programming sense with the following:
Mr. A and Mr. B are using the same font, "Arial Unicode MS". On Mr. A's machine, the default code page is "1234" (which means Traditional characters). One day, Mr. B sends him a letter written using Simplified characters (code page "5678" in the background). On receiving the letter, Mr. A sees some square boxes and funny characters on his screen. Because Mr. A knows Mr. B well, he knows the letter must be in Chinese, so he makes a quick trial-and-error switch from a combobox (there are always only a few selections available, in fact). Soon he sees the correct display.
In above, it is said "Mr. A and Mr. B are using the same font", the implications here are, not only Font matters, but CodePage AT THE SAME TIME.
This problem has nothing to do with Unicode textbox. The problem is in ANSI -> Unicode conversion, ie. before passing the text string to the Unicode textbox.
If one has a deep-rooted dogmatic idea and refuses to listen, then it would not be possible to go further. I make a last try (and then stop as my knowledge on the subject is limited).
Why is there an Enum in the code making up my Unicode textbox?
Public Enum CharSetConstants
    [Font Default] = 1
    [Japanese Shift-JIS] = 128
    [Korean Wansung] = 129
    [Korean Jahab] = 130
    [Chinese GB2312] = 134
    [Chinese BIG5] = 135
End Enum
(In the background, the above concerns the CodePage. If one wants to cover more, add more entries.)
It is there to cover the variations I have repeatedly mentioned. One should realize that, unlike English, Chinese has more than one CodePage, and so does Korean ....
That is the main reason ALL software accompanying a Chinese writing pad provides a combobox for the user to switch Font+CodePage within Chinese. (To avoid complicating the matter further, I don't want to explain why the available selections are always more than two, Traditional and Simplified.)
[Remarks: The above is a re-type of what I remember -- a bit earlier I might have clicked a wrong button. If more than one copies of this text (not exactly the same) appear, just discard it. The above text already conveys what I said.]
That is because you're mixing ANSI stuff with Unicode. Unicode does not "know" about character sets at all. ANSI is any specific character set that is not Unicode, most likely a character set that is specialized for one language only. Unicode instead tries to have all the characters of the world in one big set (a continuing standardization process).
ANSI can only display one character set at once. For example, you simply can't write Thai if the character set is set to Japanese. Font.Charset is very useful for ANSI, because by changing its value you can try to force the use of a font that supports the character set (if the font that is currently selected does not support the character set that is being asked for). When a textbox is ANSI and Charset changes, you will also see the characters change: in many cases you end up with garbage/boxes for non-English characters. This won't happen with a Unicode textbox, because it can display all the characters. The reason it happens with ANSI is that the bytes that represent characters are interpreted differently when Font.Charset changes. With Unicode the exact character codes are already known and reprocessing the text bytes is not required.
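The difference can be sketched as follows. This is Python, used only for illustration; `load_ansi` is a hypothetical helper, and Python's gb2312 and big5 codecs stand in for the Simplified and Traditional ANSI character sets. The same two bytes give different results depending on the character set assumed, whereas the UTF-16 form needs no such assumption.

```python
# The character 国 as saved by a sender whose ANSI character set is GB2312.
original = "\u56fd"
ansi_bytes = original.encode("gb2312")

def load_ansi(raw, codec):
    """ANSI -> Unicode conversion: the result depends entirely on codec."""
    return raw.decode(codec, errors="replace")

right = load_ansi(ansi_bytes, "gb2312")  # receiver picks the sender's charset
wrong = load_ansi(ansi_bytes, "big5")    # receiver assumes Traditional (Big5)

# Once the text is Unicode, storing it involves no code page at all.
as_utf16 = right.encode("utf-16-le")

print(ansi_bytes.hex(), right == original, wrong == original)
print(as_utf16.decode("utf-16-le") == original)
```

In this model the combobox in the earlier posts corresponds to choosing the `codec` argument, which only matters while the text is still ANSI bytes.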
I don't know the details of how writing pads work and what is their level of Unicode support, ie. do they send their window messages as ANSI or as Unicode (and if ANSI, how that works with Unicode control). However, the task of providing the character codes of written characters is a problem of the writing pad software, not one of a Unicode textbox in a program. So: if you write using one of these writing pads into your textbox control that has "incorrect" Font.Charset, do you get an incorrect character?
If you bothered to read the code making up the Unicode textbox, don't you think you would see that I know something about ANSI and Unicode? There is no use in just talking about the bookish stuff. Follow the logic of what I said if you really want to learn something. If you don't, I have to stop any further discussion and only say that you are great!
Very interesting.
I took my exact code as posted above, installed the Japanese IME, ran the program, and followed the instructions here:
How to Use Microsoft IME to Input East Asian Characters
I got exactly the results I expected: Japanese keyboard entry via IME.
And as I have stated repeatedly there is no ANSI conversion involved here whatsoever. Pure UTF-16 in and UTF-16 out via the code I supplied, as well as entry of mixed charset Unicode text from the keyboard or clipboard.
As far as I'm concerned the matter is closed: my code works perfectly. I don't have a writing pad to evaluate this code against and I take your word that it didn't work with yours. However I find nothing in Microsoft's documentation to suggest writing pad input should not work and a great deal suggesting it should.
I'll note that this test was done on Vista SP2, which seems to result in the RichTextBox using Riched20.dll version 3.0, which relies on "1.0 emulation" for the RichTextBox control's usual (ANSI) properties and methods. I am not using those ANSI properties and methods here though. It appears that this should work the same way on Win2K, XP, or Win7 as well.
About Rich Edit Controls
dilettante,
Interesting indeed. You adhere to the topic this time. Since you have actually done something on the subject matter (good or not so good is not the issue), not just talking only the bookish stuff, I would like to further the discussion.
I believe what you say you have tried. Now, I am not sure whether you would like to satisfy my curiosity: would you be able to add a Paste function to your code? If you do, I would like to download it. What I am driving at is this:
The 3rd character of the last line is a Simplified Chinese character. I can paste a Traditional Chinese character of it into the RTB, to see whether your RTB can display it (forget about writing pad input for the time being).
This version has an added button to paste any Unicode text on the clipboard into the larger bottom RTB. I also turned on the context menus (right click) for all four RTBs here.
Feedback on the test result: I confirm that I can paste a Traditional Chinese character into all your RTB boxes.
A side note: along the way I found an interesting thing. The first time I copied the Traditional Chinese characters (equivalent to the 2nd, 3rd & 4th ones in your RTB) from somewhere into WordPad (before copying from there into your RTB), WordPad automatically switched to a JGothic font and showed Japanese in its 2nd combo (i.e. the one for the character set codepage). This is the first time I have come across this.
The 2nd time, I copied and pasted exactly the same characters again from another place; WordPad switched to the Arial Unicode MS font and showed the character set to be Chinese. This is what I have always seen in the past.
Having resolved input by Paste (see the posting above) and by keyboard (as you said), only one item is left: input by a writing pad. It remains an unknown: a textbox created by CreateWindowExW is able to accept input via a writing pad, while the RTB is not.
You said you cannot find writing pads in Microsoft's documentation. They don't call it a writing pad as such; would you like to search Windows Help? I've come across ".... handwriting recognition" (meaning input using a writing pad).
Goodbye.