  1. #1
    Regular Coder
    Join Date
    Feb 2007
    Location
    London
    Posts
    225
    Thanks
    16
    Thanked 2 Times in 2 Posts

    UTF-8 Korean characters not showing

    I've got a UTF-8 Notepad2 file, but Korean Hangul characters typed (or pasted) into it simply show up as rectangles, as you would expect if the encoding were ANSI.

    What's the deal here?

    Googling reveals many people having trouble with Korean, whereas Japanese, Chinese, Armenian, what-have-you, characters in the UTF-8 encoded Notepad file are no problem.

    I can't get a definitive answer anywhere, and am being forced to enter extremely-hard-to-proof-read nonsense like
    Code:
    &amp;#54616;&amp;#45796;
    in order to make them appear on the webpage.
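
    For proof-reading, the conversion between literal characters and numeric character references can be scripted in both directions. A minimal Python sketch, assuming decimal references are wanted; the function names `to_char_refs` and `from_char_refs` are made up for illustration:

```python
import html


def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


def from_char_refs(markup: str) -> str:
    """Turn character references back into literal characters for proof-reading.

    Note: html.unescape also decodes named references such as &amp; and &lt;,
    so use the result for reading, not as replacement markup.
    """
    return html.unescape(markup)


encoded = to_char_refs("하다")
print(encoded)                  # the reference form to paste into the page
print(from_char_refs(encoded))  # the readable form
```

    This only sidesteps the editor's display problem; the references still render as Hangul in the browser.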

    Any theories?

    Thanks

  • #2
    Senior Coder Arbitrator's Avatar
    Join Date
    Mar 2006
    Location
    Splendora, Texas, United States of America
    Posts
    3,316
    Thanks
    29
    Thanked 279 Times in 273 Posts
    The characters show up fine in Microsoft Notepad and in the output when I copy and paste the literal characters into an HTML document.

    Notepad++ is another story; I get replacement character glyphs in place of the Hangul glyphs. It seems that Notepad++ will only show glyphs present in the current font (Lucida Console); in other words, it doesn’t seem to support glyph substitution (borrowing from other fonts). If I change it to a font that contains more glyphs, such as Code2000, then the characters appear as expected. Unfortunately, Code2000 isn’t a monospace font (and lacks elegance), so I won’t use it for code.

    In short, I would try some other fonts, ensure that the fonts contain the relevant characters or that glyph substitution is supported, and ensure that the document (A) has been encoded correctly and (B) tells the browser to use the correct encoding. You can tell what encoding is being used to display a document by looking at the View > Character Encoding menu in Firefox 2, for example.
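
    Checks (A) and (B) can also be done outside the editor. A rough Python sketch, assuming the encoding is declared in a `<meta>` tag near the top of the file (a server can also declare it in the HTTP Content-Type header, which this does not look at); `is_valid_utf8` and `declared_charset` are illustrative names:

```python
import re


def is_valid_utf8(data: bytes) -> bool:
    """(A) True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def declared_charset(data: bytes):
    """(B) Return the charset named in the first 1024 bytes, or None."""
    head = data[:1024].decode("ascii", errors="replace")
    match = re.search(r'charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', head, re.IGNORECASE)
    return match.group(1).lower() if match else None


page = (
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
    "<p>한국</p>"
).encode("utf-8")

print(is_valid_utf8(page), declared_charset(page))
```

    If the two disagree (for example, the bytes are EUC-KR but the page declares utf-8), the browser will show garbage even though the editor looked fine.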
    For every complex problem, there is an answer that is clear, simple, and wrong.

  • #3
    Regular Coder
    Join Date
    Feb 2007
    Location
    London
    Posts
    225
    Thanks
    16
    Thanked 2 Times in 2 Posts
    Thanks so much for the input - Notepad++'s font lacking certain character sets never occurred to me!

  • #4
    Senior Coder koyama's Avatar
    Join Date
    Dec 2006
    Location
    Copenhagen, Denmark
    Posts
    1,246
    Thanks
    1
    Thanked 5 Times in 5 Posts
    Quote Originally Posted by Arbitrator View Post
    Notepad++ is another story; I get replacement character glyphs in place of the Hangul glyphs. It seems that Notepad++ will only show glyphs present in the current font (Lucida Console); in other words, it doesn’t seem to support glyph substitution (borrowing from other fonts).
    This is annoying me too. I see some weird things going on in Notepad++ (version 4.1.2)

    I cannot for the life of me find out what is going on:

    I open a new file (ANSI, UTF-8 without BOM). Then when I paste two Hangul glyphs 한국 (copied from Wikipedia) I get those replacement characters as you mention: check this screenshot.

    But now I paste two more characters after those replacement characters for the Hangul. This time I paste Japanese Kanji: 日本 (which appear fine for some reason). Then suddenly (to my joy) the original replacement characters for the Hangul are replaced by real Hangul glyphs: check screenshot.

    Now I thought that I could just trick Notepad++ and delete those Kanji and the Hangul would stay as they are. But when I do that the Hangul reverts to replacement characters.

    If someone knows what is going on I would like to know.

  • #5
    Regular Coder
    Join Date
    Feb 2007
    Location
    London
    Posts
    225
    Thanks
    16
    Thanked 2 Times in 2 Posts
    Koyama, what you describe about the characters appearing correctly, but then reverting to squares after deleting adjacent special characters is exactly what I'm finding too.

    Did you ever make any progress with this problem?
    I'm still stumped. (And cannot find a text editor that deals with it correctly)

    I started a new thread recently:

    http://www.codingforums.com/showthread.php?t=120424

    which was getting at pretty much the same thing.

    Feedback from anyone very much appreciated.
    For now, I'm manually entering char refs with ampersand hash numbers semicolon to reference Korean, IPA, musical symbols etc (all of which, unfortunately, I need to display very often on my sites), and proof-reading is nigh on impossible.

  • #6
    Supreme Master coder! abduraooft's Avatar
    Join Date
    Mar 2007
    Location
    N/A
    Posts
    14,862
    Thanks
    160
    Thanked 2,223 Times in 2,210 Posts
    I had the same problem (for Malayalam utf-8), but I solved this as...

    First I created a file in Notepad (v5.1, which came with XP), chose "Save As" and selected "UTF-8" as the encoding. From then on I can use Dreamweaver or NP++ to edit/modify the same file.
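
    A likely reason this works (an assumption, nobody in the thread confirms it): Windows Notepad writes a byte order mark (BOM) at the start of a file saved as UTF-8, and other editors can use those three bytes to detect the encoding instead of guessing. A small Python sketch of what that looks like at the byte level; `has_utf8_bom` is an illustrative name:

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the bytes start with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


text = "മലയാളം"
with_bom = text.encode("utf-8-sig")  # what a "UTF-8 with BOM" save produces
without_bom = text.encode("utf-8")   # "UTF-8 without BOM"

print(has_utf8_bom(with_bom), has_utf8_bom(without_bom))

# Decoding with "utf-8-sig" strips the BOM if present, so both give the same text.
print(with_bom.decode("utf-8-sig") == without_bom.decode("utf-8-sig"))
```

    The BOM affects encoding detection only; whether the glyphs are drawn still depends on the editor's font.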

    Edit: Now I'm using the exact character in code, not its hexadecimal (&*****; ) equivalent.

    regards,
    art
    Last edited by abduraooft; 08-05-2007 at 12:29 PM.
    The Dream is not what you see in sleep; Dream is the thing which doesn't let you sleep. --(Dr. APJ. Abdul Kalam)

  • #7
    Senior Coder koyama's Avatar
    Join Date
    Dec 2006
    Location
    Copenhagen, Denmark
    Posts
    1,246
    Thanks
    1
    Thanked 5 Times in 5 Posts
    Quote Originally Posted by cfructose View Post
    Did you ever make any progress with this problem?
    I'm still stumped. (And cannot find a text editor that deals with it correctly)

    I started a new thread recently:

    http://www.codingforums.com/showthread.php?t=120424

    which was getting at pretty much the same thing.
    I seem to (partially) have overcome the problem in Notepad++ by doing just about what Arbitrator suggested. (I couldn't even find similar information in the Notepad++ forum). Here are the steps:

    1. Download and install Notepad++
    2. Download and install the Code2000 font having “many” characters. (see Arbitrator's link)
    3. In the styler configurator in Notepad++ set for Global Styles the font style to be Code2000 (e.g. font size 10).
    4. Also, in the styler configurator, set the font style for language HTML/TAG and HTML/ATTRIBUTE to e.g. Courier New, because here you want a monospace font. There will be other places where you will want to set a monospace font. Just don't set a monospace font where there may be special characters, such as HTML/DEFAULT; leave the font style blank there so that it is inherited from Global Styles.
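
    To decide which font coverage a file needs before picking fonts in the steps above, it can help to list which scripts actually occur in it. A rough Python sketch using only the standard library; it groups characters by the first word of their Unicode name, which is an approximation of the script and not the real Unicode Script property, and `scripts_used` is an illustrative name:

```python
import unicodedata
from collections import Counter


def scripts_used(text: str) -> Counter:
    """Count non-ASCII characters by the first word of their Unicode name."""
    counts = Counter()
    for ch in text:
        if ord(ch) < 128:
            continue  # plain ASCII is covered by any code font
        name = unicodedata.name(ch, "UNKNOWN")
        counts[name.split()[0]] += 1
    return counts


# Prints a Counter keyed by name prefix, e.g. HANGUL, CJK, MALAYALAM.
print(scripts_used("<p>한국 日本 മലയാളം</p>"))
```

    It does not tell you whether a given font contains those glyphs; that still has to be checked in the font itself.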

    I am not sure about your problems in Notepad2 which is different from Notepad++. Is Notepad2 even being developed anymore?
    Quote Originally Posted by Arbitrator View Post
    It seems that Notepad++ will only show glyphs present in the current font (Lucida Console); in other words, it doesn’t seem to support glyph substitution (borrowing from other fonts).
    Or rather, in light of my previous example, I would say that Notepad++ 4.1.2 does have some support for glyph substitution, albeit broken. I am just hoping that they (or rather he) will fix this in a future version.
    Quote Originally Posted by abduraooft View Post
    I had the same problem (for Malayalam utf-8)
    I tried with Malayalam characters too. Code2000 seems to include Malayalam characters so the above steps may do the trick for you as well. I do see some weirdness with highlighted text being repeated, but I'm not sure what is going on.
    Last edited by koyama; 08-05-2007 at 08:17 PM. Reason: typos

  • #8
    Senior Coder Arbitrator's Avatar
    Join Date
    Mar 2006
    Location
    Splendora, Texas, United States of America
    Posts
    3,316
    Thanks
    29
    Thanked 279 Times in 273 Posts
    Quote Originally Posted by koyama View Post
    If someone knows what is going on I would like to know.
    I posted a brief inquiry in the official Notepad++ forum shortly after your post and never got a response. [1]

    1. http://sourceforge.net/forum/message.php?msg_id=4378687

    Quote Originally Posted by koyama View Post
    Download and install the Code2000 font having “many” characters. (see Arbitrator's link)
    I’ve found that the DejaVu group of fonts has many characters as well and also includes a monospace font. It exhibits significantly more elegant aesthetics than Code2000 and is still being actively worked on; it was last updated today. The main disadvantage (personally speaking) is that it lacks characters from the Asian character blocks.

    Unfortunately, Notepad++ won’t allow me to select the DejaVu Sans Mono (regular) font; I can only select the Bold Oblique variant. As a result, I’m trying out a new editor, jEdit, which seems to allow me to select more fonts and also has some new features. Unfortunately, it also lacks some features of Notepad++, including the ability to substitute in Japanese characters when the font lacks them; I get rectangles instead of the proper glyphs. On the other hand, I can get certain other characters to display correctly while sticking to a monospace font.

    Quote Originally Posted by koyama View Post
    I am not sure about your problems in Notepad2 which is different from Notepad++. Is Notepad2 even being developed anymore?
    Since it was last updated ten days ago, it is, presumably, still being developed.

  • #9
    Senior Coder koyama's Avatar
    Join Date
    Dec 2006
    Location
    Copenhagen, Denmark
    Posts
    1,246
    Thanks
    1
    Thanked 5 Times in 5 Posts
    Quote Originally Posted by Arbitrator View Post
    I posted a brief inquiry in the official Notepad++ forum shortly after your post and never got a response. [1]

    1. http://sourceforge.net/forum/message.php?msg_id=4378687
    wow... I really appreciate your help. Thank you very much for your tireless efforts for finding solutions.
    Quote Originally Posted by Arbitrator View Post
    I’ve found that the DejaVu group of fonts has many characters as well and also includes a monospace font. It exhibits significantly more elegant aesthetics than Code2000 and is still being actively worked on; it was last updated today. The main disadvantage (personally speaking) is that it lacks characters from the Asian character blocks.
    I will definitely check out the DejaVu group of fonts. After trying Code2000 I would agree with you that the look and feel of that font isn't suitable in a code editing environment. Currently, I am also trying out Bitstream CyberBit holding 29,934 glyphs including the Asian CJK glyphs.
    Quote Originally Posted by Arbitrator View Post
    Unfortunately, Notepad++ won’t allow me to select the DejaVu Sans Mono (regular) font; I can only select the Bold Oblique variant.
    Now that you mention this problem with Notepad++ I recall having seen this thread in the Notepad++ forum by someone having a similar problem just with the “Bitstream Vera Sans Mono” font. I guess that the cause of the problem is the same as yours with the “DejaVu Sans Mono” font.

    Indeed, when I look in styler configurator in Notepad++ then “Bitstream Vera Sans Mono” isn't available from the drop-down menu, but strangely “Bitstream Vera Sans Mono Bold” is. Luckily, the fix suggested in the thread to manually edit the styler.xml configuration file worked for me. On Windows XP, what I did was to manually edit the file C:\Documents and Settings\<User>\Application Data\Notepad++\styler.xml. Towards the bottom of the file (line 691) I found the global styles which I edited manually and set to the desired font.
    Code:
    <GlobalStyles>
        <!-- Attention : Don't modify the name of styleID="0" -->
        <WidgetStyle name="Default Style" styleID="32" fgColor="000000" bgColor="FFFFFF"
          fontName="Bitstream Vera Sans Mono" fontStyle="0" fontSize="10" />
        <WidgetStyle name="Indent guideline style" styleID="37" fgColor="C0C0C0" bgColor="FFFFFF" fontName="" fontStyle="0" fontSize="" />
        <WidgetStyle name="Brace highlight style" styleID="34" fgColor="FF0000" bgColor="FFFFFF" fontName="" fontStyle="1" fontSize="10" />
        <WidgetStyle name="Bad brace colour" styleID="35" fgColor="800000" bgColor="FFFFFF" fontName="" fontStyle="0" fontSize="" />
        <WidgetStyle name="Current line background colour" styleID="0" bgColor="E8E8FF" />
        <WidgetStyle name="Mark colour" styleID="0" fgColor="C00000" bgColor="FFFF00" />
        <WidgetStyle name="Selected text colour" styleID="0" bgColor="C0C0C0" />
        <WidgetStyle name="Caret colour" styleID="2069" fgColor="8000FF" />
        <WidgetStyle name="Find Mark Style" styleID="31" fgColor="FFFF00" bgColor="FF0000" fontName="" fontStyle="1" fontSize="" />
        <WidgetStyle name="Edge colour" styleID="0" fgColor="80FFFF" />
        <WidgetStyle name="Line number margin" styleID="33" fgColor="808080" bgColor="E4E4E4" fontName="" fontStyle="0" fontSize="" />
        <WidgetStyle name="Fold" styleID="0" fgColor="808080" bgColor="F3F3F3" />
        <WidgetStyle name="Fold margin" styleID="0" fgColor="FFFFFF" bgColor="E9E9E9" />
        <WidgetStyle name="White space symbol" styleID="0" fgColor="FFB56A" />
    </GlobalStyles>
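
    The same manual edit could be scripted. A minimal Python sketch, assuming the `<GlobalStyles>` layout shown above; it works on an in-memory fragment and never touches the real configuration file, and `set_default_font` is an illustrative name:

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the <GlobalStyles> section of the config file.
FRAGMENT = """<GlobalStyles>
    <WidgetStyle name="Default Style" styleID="32" fgColor="000000" bgColor="FFFFFF" fontName="" fontStyle="0" fontSize="10" />
    <WidgetStyle name="Caret colour" styleID="2069" fgColor="8000FF" />
</GlobalStyles>"""


def set_default_font(root: ET.Element, font_name: str) -> bool:
    """Set fontName on the 'Default Style' entry; return True if it was found."""
    for style in root.iter("WidgetStyle"):
        if style.get("name") == "Default Style":
            style.set("fontName", font_name)
            return True
    return False


root = ET.fromstring(FRAGMENT)
found = set_default_font(root, "Bitstream Vera Sans Mono")
print(found)
print(ET.tostring(root, encoding="unicode"))
```

    Note that ElementTree drops XML comments when parsing with its default settings, so back up the real file before writing anything back to it.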
    Quote Originally Posted by Arbitrator View Post
    As a result, I’m trying out a new editor, jEdit, which seems to allow me to select more fonts and also has some new features.
    Thanks for the tip. Never heard of that editor, but I will check it out.
    Quote Originally Posted by Arbitrator View Post
    Since it was last updated ten days ago, it is, presumably, still being developed.
    LOL... then I guess it's still alive. I had confused Notepad2 with Notepad2 MOD which seems to be a branch of Notepad2. The latest release of the latter was from July 26, 2006.

    Reading the latest news (July 6, 2007) about Notepad++, one finds that the next release has been delayed because the programmer's cat tipped his cup of coffee over his laptop, which has gone kaput.

  • #10
    Regular Coder
    Join Date
    Feb 2007
    Location
    London
    Posts
    225
    Thanks
    16
    Thanked 2 Times in 2 Posts
    Just came back from holiday and saw the 'evolution' of this thread.

    Thanks for all the research - it's comforting to know that others are battling with similar problems. Long live the internet!

    Now, time to investigate all your invaluable input...

