  1. #1
    Regular Coder
    Join Date
    Oct 2009
    Posts
    461
    Thanks
    7
    Thanked 3 Times in 3 Posts

    How do I convert all characters from non-UTF-8 to UTF-8?

    I have been playing around with my feed all day and think I may have a lot of non-UTF-8 characters in my database. How can I convert all of them to their UTF-8 equivalents?

    I think it's some of the special ` ' " characters, and maybe some others that are hidden from view: when I hit delete, something is removed from the field but the cursor does not move! (Doing this actually fixed some of them.) I have no idea which characters I need to convert.

    I have this bit of code from an RSS feed I did some time ago. It still works for that feed, but on my recent site's feed it does not seem to convert everything that needs converting.

    Is there a list of the special characters and their equivalents, so I could run some sort of search and replace on the string before it is echoed to the page?

    Code:
    $description = stripslashes($result['description']);
    $description = str_replace("&quot;", '"', $description); // convert &quot; to a double quote mark
    $description = str_replace("&amp;", "&", $description);  // convert &amp; to an ampersand
    $description = str_replace("&#039;", "'", $description); // convert &#039; to a single quote mark
    $title = stripslashes($result['title']);
    $title = str_replace("&quot;", '"', $title); // convert &quot; to a double quote mark
    $title = str_replace("&amp;", "&", $title);  // convert &amp; to an ampersand
    $title = str_replace("&#039;", "'", $title); // convert &#039; to a single quote mark
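    One possible way to handle this without a hand-made list of characters is sketched below. It rests on an assumption the thread does not confirm: that any text which is not already valid UTF-8 was stored as Windows-1252 (where the curly quotes live). The helper names toUtf8 and cleanField are hypothetical. The entity replacements done with str_replace above are covered by a single html_entity_decode call.

```php
<?php
// Hypothetical helper (not from the thread): returns $text as valid UTF-8.
// ASSUMPTION: anything that is not already valid UTF-8 is Windows-1252.
function toUtf8(string $text): string
{
    // The "u" modifier makes PCRE reject byte sequences that are not valid UTF-8.
    if (preg_match('//u', $text) === 1) {
        return $text; // already valid UTF-8, leave untouched
    }
    $converted = iconv('Windows-1252', 'UTF-8', $text);
    if ($converted === false) {
        throw new InvalidArgumentException('Text could not be converted to UTF-8');
    }
    return $converted;
}

// Hypothetical helper: fix the encoding first, then decode HTML entities
// such as &quot; &amp; and &#039; in one call.
function cleanField(string $raw): string
{
    return html_entity_decode(toUtf8($raw), ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// "\x93" and "\x94" are the curly double quotes in Windows-1252.
$sample = "\x93Tom &amp; Jerry&#039;s\x94";
echo cleanField($sample), "\n"; // the quotes come out as UTF-8 curly quotes
```

    If the real source encoding is something else (ISO-8859-1, for instance), the first argument to iconv would need to change accordingly.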

  • #2
    Senior Coder
    Join Date
    Sep 2010
    Posts
    2,259
    Thanks
    15
    Thanked 255 Times in 255 Posts
    Characters written like &#48; are not a separate non-UTF-8 charset: they are numeric character references, and they work the same in any encoding. Plain ASCII characters are also already valid UTF-8.

    &#48; is zero, with the character number in decimal. The same zero can be written as &#x30;, where the 30 is hexadecimal. To interconvert among them you need these four PHP functions: dechex and hexdec to change the numbers, chr to make the character from the decimal number, and ord to get the decimal number from the character. So ord('0') = 48. chr and ord work on single bytes (0 to 255), so they can also handle the extended characters of a single-byte charset, and with those you may have to use charset=iso-8859-1 instead of utf-8, or convert the text to UTF-8 first.
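    As a rough illustration of the four functions named above, the sketch below builds the decimal and hexadecimal references for the character zero and converts back again. The variable names are made up for the example. The last lines show a limitation: ord reads a single byte, so it does not return the code point of a multi-byte UTF-8 character.

```php
<?php
// Decimal and hexadecimal numeric character references for the same character.
$char = '0';
$code = ord($char);                       // 48, the byte value of "0"
$decimalRef = '&#' . $code . ';';         // &#48;
$hexRef = '&#x' . dechex($code) . ';';    // &#x30;

echo $decimalRef, ' ', $hexRef, "\n";

// hexdec() turns the hex digits back into a number, chr() into the character.
$roundTrip = chr(hexdec('30'));
echo $roundTrip, "\n";                    // 0

// ord() only looks at one byte, so multi-byte UTF-8 characters need another tool.
$eAcute = "\xC3\xA9";                     // "é" encoded as UTF-8 (two bytes)
echo strlen($eAcute), "\n";               // 2
echo ord($eAcute), "\n";                  // 195: the first byte, not code point 233
```

    For characters outside the single-byte range, the mbstring extension provides mb_ord and mb_chr (PHP 7.2 and later), which work on code points rather than bytes.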
    Welcome to http://www.myphotowizard.net

    where you can edit images, make a photo calendar, add text to images, and do much more.


    When you know what you're doing it's called Engineering, when you don't know, it's called Research and Development. And you can always charge more for Research and Development.

  • #3
    Regular Coder
    Hey, thank you.

    What I have is an RSS feed, and validating it at w3.org gives me the following messages:

    -------
    This feed is valid, but interoperability with the widest range of feed readers could be improved by implementing the following recommendations.
    -------
    Feeds should not be served with the "text/html" media type
    -------
    Your feed appears to be encoded as "utf-8", but your server is reporting "US-ASCII"
    -------

    I have this at the start, just before the <channel>:

    Code:
    <?xml version="1.0" encoding="utf-8" ?>
    <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    This is an edited version of my feed (I have only removed the product text and replaced it with random words):

    Code:
    <?xml version="1.0" encoding="utf-8" ?>
    <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
           <title>My Site</title>
           <description>MySite</description>
           <link>http://www.MySite.com</link>
           <language>en</language>
           <copyright>www.MySite.com</copyright>
           <pubDate>Fri, 16 Aug 2013 00:50:11 GMT</pubDate>
           <lastBuildDate>Fri, 16 Aug 2013 00:50:11 GMT</lastBuildDate>
           <generator>www.MySite.com</generator>
           <ttl>30</ttl>
           <atom:link href="http://www.MySite.com/sitemap.xml" rel="self" type="application/rss+xml" />
           <item>
             <title><![CDATA[My Product Title]]></title>
             <description><![CDATA[My Product Details]]></description>
             <link><![CDATA[http://www.MySite.com/store/index.php?route=product/product&product_id=1&h=r]]></link>
             <pubDate>Sat, 06 Jul 2013 14:05:13 GMT</pubDate>
             <source url="http://www.MySite.com">www.MySite.com</source>
             <guid isPermaLink="true"><![CDATA[http://www.MySite.com/store/index.php?route=product/product&product_id=1&h=r]]></guid>
           </item>
    </channel>
    </rss>
    Is this the correct format I should be using? It seems to validate, apart from the warning that it is not UTF-8, even though the file itself declares that encoding.

  • #4
    Senior Coder
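    The validator is complaining about the HTTP headers your server sends, not about the XML itself. I would certainly try sending the feed with the mime type application/rss+xml and charset=utf-8 in the Content-Type header. If the data in your database turns out to be Latin-1, change the encoding to iso-8859-1 in both the header and the XML declaration instead. I can't really tell you which is right for your data. You just have to try things till everything works.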
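    Both validator messages quoted earlier concern the HTTP response headers rather than the XML. A minimal sketch of setting them from PHP follows, assuming the feed is produced by a PHP script that can call header() before any output. The helper name feedContentType is hypothetical.

```php
<?php
// Hypothetical helper: builds the Content-Type header line for an RSS feed.
function feedContentType(string $charset = 'utf-8'): string
{
    return 'Content-Type: application/rss+xml; charset=' . $charset;
}

// header() must run before any output, including stray whitespace
// in front of the opening PHP tag.
if (!headers_sent()) {
    header(feedContentType());
}

// The XML declaration should name the same encoding as the header.
echo '<?xml version="1.0" encoding="utf-8" ?>', "\n";
```

    Whatever charset is passed here should match both the XML declaration and the actual bytes being echoed; declaring utf-8 while sending Latin-1 data would only move the problem.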

  • #5
    Regular Coder
    So true, there is never a right or wrong way, just whether it works or doesn't. Will try that in the morning, as I am shattered. Been on this for 6 hours straight now.

