  1. #1
    New to the CF scene
    Join Date
    Oct 2007
    Posts
    5
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Stripping dynamic HTML from a string...

    In a project I am working on at the moment, I am looking to pull specific data from ESPN's RSS feed. This was working fine until they recently changed the format for their feed. I now have to adjust my code to get the result I had before.

    Here is a current example of one item from ESPN's Top News feed:

    Code:
    <item>
    <dc:creator><![CDATA[ESPN.com news services]]></dc:creator>
    <title><![CDATA[Donte Stallworth to sign contract with Baltimore Ravens, source says]]></title>
    <description><![CDATA[<a href="http://api.tweetmeme.com/share?url=http://sports.espn.go.com/nfl/news/story?id=4921333&amp;campaign=rss&amp;source=ESPNHeadlines&amp;service=tinyurl.com&amp;source=espn"><img style="padding-left:10px;" align="right" border="0" style="border:none;" src="http://api.tweetmeme.com/imagebutton.gif?url=http://sports.espn.go.com/nfl/news/story?id=4921333&amp;campaign=rss&amp;source=ESPNHeadlines" height="49" width="41" /></a>Wide receiver Donte Stallworth, whose contract with Cleveland was terminated last week, will sign a one-year contract with the Baltimore Ravens, a league source told ESPN NFL Insider Adam Schefter.]]></description>
    <pubDate>Wed, 17 Feb 2010 09:10:52 PST</pubDate>
    <guid>http://sports.espn.go.com/nfl/news/story?id=4921333&amp;campaign=rss&amp;source=ESPNHeadlines</guid>
    <link>http://sports.espn.go.com/nfl/news/story?id=4921333&amp;campaign=rss&amp;source=ESPNHeadlines</link>
    </item>
    The problem I'm encountering is that I want to pull only the text summary from the description field. They now include HTML as well: a linked image. I want to strip this away and leave just the text summary of the news item. In the above example, I want to end up with just: Wide receiver Donte Stallworth, whose contract with Cleveland was terminated last week, will sign a one-year contract with the Baltimore Ravens, a league source told ESPN NFL Insider Adam Schefter.

    What is the best approach to this? Should I use the preg_split() function to strip out all of the HTML with a regular expression? Or is this example not suited to regular expressions?

    I don't necessarily want someone to do the work for me, just point me in the right direction. I'd like the challenge of figuring it out for myself.

    Thanks!

  • #2
    God Emperor Fou-Lu's Avatar
    Join Date
    Sep 2002
    Location
    Saskatoon, Saskatchewan
    Posts
    16,994
    Thanks
    4
    Thanked 2,662 Times in 2,631 Posts
    Can you pull the full description as an HTML string? If so, use strip_tags() to remove the HTML from it. Since the only things you don't want are the elements themselves, not the text inside them, strip_tags() should remove both the <a> and <img> tags and leave the summary.
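    A minimal sketch of that idea, assuming the description has already been read into a string. The variable names are made up for illustration, and the URLs in the sample are shortened from the feed item quoted above:

```php
<?php
// Sample description value, shortened from the feed item above.
// $description and $summary are hypothetical names for illustration.
$description = '<a href="http://api.tweetmeme.com/share?url=http://sports.espn.go.com/nfl/news/story?id=4921333">'
    . '<img align="right" border="0" src="http://api.tweetmeme.com/imagebutton.gif" height="49" width="41" /></a>'
    . 'Wide receiver Donte Stallworth, whose contract with Cleveland was terminated last week, '
    . 'will sign a one-year contract with the Baltimore Ravens, '
    . 'a league source told ESPN NFL Insider Adam Schefter.';

// strip_tags() drops the <a> and <img> tags but keeps any text between tags;
// trim() removes stray whitespace left at either end.
$summary = trim(strip_tags($description));

echo $summary, PHP_EOL;
```

    Because the link wraps only an image and no text, nothing from the anchor survives. If a feed ever put text inside the <a> element, strip_tags() would keep that text, so a regular expression or DOM parsing might be needed in that case.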
    PHP Code:
    header('HTTP/1.1 420 Enhance Your Calm'); 
    Been gone for a few months, and haven't programmed in that long of a time. Meh, I'll wing it ;)

  • #3
    New to the CF scene
    Quote Originally Posted by Fou-Lu View Post
    Can you pull the full description as an HTML string? If so, use strip_tags() to remove the HTML from it. Since the only things you don't want are the elements themselves, not the text inside them, strip_tags() should remove both the <a> and <img> tags and leave the summary.
    I was unaware of that particular function. That worked quite well, thank you.
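    For anyone following along, here is a rough sketch of how this might fit into a feed loop. It assumes SimpleXML is used for parsing, which the thread does not confirm. The feed string is a cut-down, hypothetical stand-in with example.com URLs, and the variable names are made up:

```php
<?php
// Hypothetical, cut-down feed with a single item; a real feed would be fetched over HTTP.
$xml = '<rss version="2.0"><channel><item>'
    . '<title><![CDATA[Donte Stallworth to sign contract with Baltimore Ravens, source says]]></title>'
    . '<description><![CDATA[<a href="http://example.com/share">'
    . '<img src="http://example.com/button.gif" height="49" width="41" /></a>'
    . 'Wide receiver Donte Stallworth will sign a one-year contract with the Baltimore Ravens.]]></description>'
    . '</item></channel></rss>';

// LIBXML_NOCDATA merges CDATA sections into ordinary text nodes.
$feed = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);

$summaries = [];
foreach ($feed->channel->item as $item) {
    // Cast each element to a string, then strip the embedded HTML from the description.
    $title = (string) $item->title;
    $summaries[$title] = trim(strip_tags((string) $item->description));
}

print_r($summaries);
```

    Each title ends up mapped to a plain-text summary with the linked image removed.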

  • #4
    God Emperor Fou-Lu's Avatar
    No problem, glad it worked out!

