  1. #1
    New Coder
    Join Date
    Sep 2007
    Posts
    21
    Thanks
    0
    Thanked 0 Times in 0 Posts

    character encoding

    I have been using an ISO-8859-1 charset declaration in my HTML docs, but I just discovered that the editor I use to create them (UltraEditStudio) can only save as ANSI/ASCII, UTF-8, or UTF-16. I have been saving in the default format; I am not sure which of those it is, but I presume it is not ISO-8859-1, since there is no option for it. So how come my browser renders the page correctly anyway, even though I have declared ISO-8859-1? Would it only cause a problem if someone tried to view my site on a foreign computer?
    I think I should declare UTF-8, since that is what my HTML editor (UE) supports. But then do I also need to declare the character encoding in my CSS docs? I think that if it is undeclared, the CSS doc takes the encoding of the HTML doc, but I cannot find out which encoding my CSS editor (TopStyle Pro) saves as. Is there a way to find out the character encoding of a saved text (CSS) document?

  • #2
    Regular Coder GO ILLINI
    Join Date
    Jun 2005
    Location
    USA
    Posts
    634
    Thanks
    0
    Thanked 7 Times in 7 Posts
    It probably saves as UTF-8, but read the help docs to be sure. As for the site working or not: a browser that supports the encoding the page was saved in will render it correctly. If the browser doesn't support that charset, it falls back to its default, which may be close to or exactly the same as what you intended, but in other cases the text may not display correctly at all.

    -Adam
    Why not thank me?

    http://adamsworld.name

  • #3
    New Coder
    Join Date
    Sep 2007
    Posts
    21
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I couldn't find anything that mentioned which encoding the other program saves with.
    I was wondering if there was a program or something that could analyse a document to see what character encoding it used... but thinking about it, if text files had some kind of embedded header data saying "I use this charset", then surely web browsers and computers would be able to tell the character encoding automatically, and there wouldn't be any need for such declarations within the text itself. Hmm.

  • #4
    Senior Coder Arbitrator
    Join Date
    Mar 2006
    Location
    Splendora, Texas, United States of America
    Posts
    3,387
    Thanks
    32
    Thanked 288 Times in 282 Posts
    Quote: Originally posted by cyjetsu
    I have been using ISO-8859-1 charset declaration in my html docs but I just discovered that the editor I use to create my html docs (UltraEditStudio) can only save in the format of ANSI/ASCII, UTF-8, or UTF-16. I have been saving in the default format, which I am not sure which one of those is, but I presume it is not ISO, since there is no option for it.
    The encoding is most likely ANSI, which, I believe, ends up being Windows-1252. Windows-1252 is the same as ISO-8859-1 except that Microsoft added some proprietary characters in place of the (generally unused) second set of control characters.

    If you serve a Windows-1252 document as ISO-8859-1, it should display fine as long as you don’t use any of the proprietary code points. If you use the proprietary code points, you still won’t notice any issues though since major browsers silently treat ISO-8859-1 as if it were Windows-1252.

    To see the list of characters that need to be avoided if your Windows-1252 document is mis‐served* as ISO-8859-1, you can check out the table in Wikipedia’s Windows-1252 article [1]; the table cells with green and yellow backgrounds identify the relevant characters. Compare that table with the last table in Wikipedia’s ISO-8859-1 article [2] to see what the correct characters are.

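    * Technically, if you avoid those characters, your document isn’t mis‐served at all since it’s both a correct Windows-1252 document and a correct ISO-8859-1 document at the same time. (It would also be a correct UTF-8 document only if it stays within the ASCII range, bytes 0x00 to 0x7F. Unicode’s first 256 code points match ISO-8859-1, but UTF-8 encodes U+0080 through U+00FF as two bytes each, so the two are not byte‐compatible above ASCII.)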
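
    As a rough illustration of the point about the proprietary code points, here is a minimal Python sketch. It assumes Python's `cp1252` and `latin-1` codecs as stand-ins for Windows-1252 and ISO-8859-1; the sample bytes are made up for the example.

```python
# Bytes 0x80-0x9F are where Windows-1252 and ISO-8859-1 disagree.
data = b"\x93hello\x94"  # curly quotes as a Windows-1252 editor would save them

as_cp1252 = data.decode("cp1252")
as_latin1 = data.decode("latin-1")

# Windows-1252 maps the two bytes to the curly quotation marks...
assert as_cp1252 == "\u201chello\u201d"
# ...while strict ISO-8859-1 maps them to invisible C1 control characters.
assert as_latin1 == "\x93hello\x94"

# From 0xA0 upward the two encodings agree byte for byte.
upper = bytes(range(0xA0, 0x100))
assert upper.decode("cp1252") == upper.decode("latin-1")
```

    A strict decoder shows the difference; browsers hide it because they treat the ISO-8859-1 label as Windows-1252.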

    Quote: Originally posted by cyjetsu
    So how come when I have declared ISO in my html doc, my browser renders it correctly anyway?
    You haven’t used any of the offending characters as mentioned above, your browser is purposely misinterpreting ISO-8859-1 as Windows-1252, or you didn’t declare the character encoding and the browser has successfully auto‐detected the encoding.

    Quote: Originally posted by cyjetsu
    Would it only cause a problem if someone tried to view my site on a foreign computer? I think I should declare UTF-8, since my HTML editor (UE) only supports this.
    As far as successful rendering goes, the encoding chosen doesn’t matter as long as it’s been declared, the declared encoding is the encoding that was actually used, and the browser supports the chosen encoding.
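
    To show what happens when the declared encoding is not the one actually used, here is a small Python sketch. The sample word is just an illustration, and `cp1252` is Python's name for Windows-1252.

```python
# The file on disk: "café" saved as UTF-8 (the encoding actually used).
saved = "caf\u00e9".encode("utf-8")

# Declared encoding matches the actual one: the text survives.
declared_right = saved.decode("utf-8")
assert declared_right == "caf\u00e9"          # café

# Declared encoding is wrong: the two UTF-8 bytes of "é" are shown
# as two separate Windows-1252 characters.
declared_wrong = saved.decode("cp1252")
assert declared_wrong == "caf\u00c3\u00a9"    # cafÃ©
```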

    However, use of UTF-8 is beneficial since, unlike ISO-8859-1 and Windows-1252, it allows you to directly input more than 256 characters. Otherwise, if you want to enter characters outside of the encoding, you’d have to enter them in escaped form. For example, to enter the character U+2605 Black Star* (★), you could enter it directly under UTF-8; to do this under ISO-8859-1 or Windows-1252, you would need to use a numeric character reference, either &#9733; or &#x2605;. Thus, your document’s file size and source code readability are improved by using UTF-8.

    * You may need to install a font containing a glyph for the character to be able to see it properly.
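
    The escaped form is a numeric character reference. A short Python sketch of the size difference for U+2605, using the standard `xmlcharrefreplace` error handler (the variable names are only for illustration):

```python
star = "\u2605"  # U+2605 BLACK STAR

# Entered directly in a UTF-8 document: three bytes.
utf8_bytes = star.encode("utf-8")

# ISO-8859-1 has no byte for this character, so it has to be escaped.
escaped = star.encode("iso-8859-1", errors="xmlcharrefreplace")

# The hexadecimal form of the same reference.
hex_ref = "&#x{:X};".format(ord(star))

print(utf8_bytes, len(utf8_bytes))  # b'\xe2\x98\x85' 3
print(escaped, len(escaped))        # b'&#9733;' 7
print(hex_ref)                      # &#x2605;
```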

    Quote: Originally posted by cyjetsu
    But then do I also need to declare the character encoding in my CSS docs? I think if it is undeclared, the CSS doc will take the encoding of the HTML doc, but I cannot find out what encoding my CSS editor (TopStyle Pro) saves as.
    You should declare the encoding of all documents.

    You can declare the encoding of your CSS documents with a charset at‐rule. If your CSS document contains a byte order mark (BOM) character, the (UTF-8) encoding will be detected automatically. Alternatively, you can set an HTTP header (i.e., Content-Type: text/css; charset=utf-8). If you use one of the latter two methods, the at‐rule isn’t necessary.
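
    The three declaration methods can be sketched in Python roughly as follows. `sniff_css_encoding` is a hypothetical helper written for this post, not part of any library, and the order of precedence shown is an assumption for illustration; the CSS specifications define the exact rules.

```python
import codecs
import re
from typing import Optional


def sniff_css_encoding(raw: bytes, http_charset: Optional[str] = None) -> Optional[str]:
    """Return the declared encoding of a style sheet, or None if undeclared."""
    # 1. charset parameter of the HTTP Content-Type header, if one was sent.
    if http_charset:
        return http_charset.lower()
    # 2. UTF-8 byte order mark at the very start of the file.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # 3. A charset at-rule, which must be the first thing in the file.
    match = re.match(rb'@charset "([^"]+)";', raw)
    if match:
        return match.group(1).decode("ascii").lower()
    return None


print(sniff_css_encoding(b'@charset "ISO-8859-1";\nbody { margin: 0 }'))  # iso-8859-1
print(sniff_css_encoding(codecs.BOM_UTF8 + b"body { margin: 0 }"))        # utf-8
print(sniff_css_encoding(b"body { margin: 0 }"))                          # None
```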

    Quote: Originally posted by cyjetsu
    Is there a way to find out the character encoding of a saved text (CSS) document?
    There’s no sure way for a computer to tell the difference between various encodings; this is exactly why encodings are supposed to be declared.

    Unless a charset at‐rule, HTTP header, or BOM was used, you won’t know the encoding without a little guesswork. The best way to guess is to display the document under a given encoding and, if all of the characters show up fine, use that encoding; try Windows-1252 and UTF-8. If it displays fine in both, the document contains only ASCII characters and you can declare Windows-1252, ISO-8859-1, or UTF-8. If it doesn’t (you see replacement characters under UTF-8 but not under Windows-1252), then the document is most likely Windows-1252‐encoded.
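
    The same guesswork can be automated with trial decoding. This is only a heuristic sketch; `guess_encoding` is a made-up helper, and it assumes the only candidates are ASCII, UTF-8, and Windows-1252.

```python
def guess_encoding(raw: bytes) -> str:
    """Guess among ASCII, UTF-8 and Windows-1252 by trial decoding."""
    try:
        raw.decode("ascii")
        # Pure ASCII is valid under all three encodings discussed here.
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        # Non-ASCII bytes that happen to form valid UTF-8 are rarely an accident.
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: assume the editor's ANSI default, Windows-1252.
        return "cp1252"


print(guess_encoding(b"body { margin: 0 }"))      # ascii
print(guess_encoding("caf\u00e9".encode("utf-8")))  # utf-8
print(guess_encoding(b"caf\xe9"))                 # cp1252
```

    The last branch is a guess, not a proof: any other single‐byte encoding would also land there.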

    1. http://en.wikipedia.org/wiki/Windows-1252
    2. http://en.wikipedia.org/wiki/ISO-8859-1#ISO-8859-1
    For every complex problem, there is an answer that is clear, simple, and wrong.

  • #5
    New Coder
    Join Date
    Sep 2007
    Posts
    21
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Thanks. That's a lot of useful info. I will check it all out now... It will take a moment for me to absorb it as I am new to charsets.
    I am guessing the " character is among those that could cause problems because sometimes when I copy text from a webpage:
    {fred said "hello"}
    would paste as {fred said ?hello?} in my text editors, or even some weird o symbol.
    The " character had better not be trouble because it is critical to the syntax of all HTML/CSS strings, properties, etc.

  • #6
    Senior Coder Arbitrator
    Join Date
    Mar 2006
    Location
    Splendora, Texas, United States of America
    Posts
    3,387
    Thanks
    32
    Thanked 288 Times in 282 Posts
    Quote: Originally posted by cyjetsu
    I am guessing the " character is among those that could cause problems because sometimes when I copy text from a webpage:
    {fred said "hello"}
    would paste as {fred said ?hello?} in my text editors, or even some weird o symbol.
    The character U+0022 Quotation Mark (") is not a problematic character, as is shown in the two articles that I had linked to. The offending characters are probably U+201C Left Double Quotation Mark (“) and U+201D Right Double Quotation Mark (”); these characters cannot be entered directly under ISO-8859-1. They can be entered directly under Windows-1252 and UTF-8.

    Note that it may seem that the Quotation Mark character is the source of problems if you use an application that automatically converts them into the Left and Right Double Quotation Mark characters. I believe that Microsoft Word is an example of such an application.
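
    A quick Python check of which encodings can hold the curly quotes, and where the question marks in the pasted text likely come from. The sample string mirrors the one quoted above; the `replace` error handler stands in for whatever the text editor does on paste, which is an assumption.

```python
text = "fred said \u201chello\u201d"  # curly quotes, not U+0022

# Windows-1252 and UTF-8 can both store the curly quotes directly.
print(text.encode("cp1252"))  # b'fred said \x93hello\x94'
print(text.encode("utf-8"))   # b'fred said \xe2\x80\x9chello\xe2\x80\x9d'

# ISO-8859-1 has no code point for them, so strict encoding fails...
try:
    text.encode("iso-8859-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False
print(fits_latin1)            # False

# ...and a lossy conversion substitutes question marks.
lossy = text.encode("iso-8859-1", errors="replace")
print(lossy)                  # b'fred said ?hello?'

# The plain quotation mark U+0022 is ASCII and safe everywhere.
print('"'.encode("iso-8859-1"))  # b'"'
```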