  • #1

    Shorten text and HTML 4.01 strict

    I asked this question in the HTML/CSS area a while back and didn't get any ideas, not even on where to start. Basically, I'm shortening text with a PHP function to display on a blog front page, and the cut drops closing HTML tags (like </blockquote> or </ul>, etc.).

    The page still displays just fine, but the unclosed tags can cause it to fail validation as W3C HTML 4.01 Strict. I'm wondering if there's any way to fix this, prevent it, or even just an idea of how to approach the problem.

    Here's my original post:

    HTML 4.01 Strict & shortening text

  • #2
    Most 'teasers', if autogenerated, are posted without formatting (strip_tags() etc.). I think half of the reason for this is that it's not straightforward to do what you want to do, and the other half is that the formatting used within a page may or may not work in the context of a small 'teaser'.

    You could simply store a separate field (in your DB or however you are storing the posts) just for the teaser, since you may often want to summarize the main contents rather than grab the first $x words.

    Trying to parse the content and repair it is not that easy, since there may be nested tags etc.; there is no regex one-liner to cover that.
    You could possibly use a third-party sanitizer like HTML Tidy, but that seems overkill to me.

    A separate field for the teaser or strip_tags() would be my choice (and in that order).
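    For the strip_tags() route, a minimal sketch might look like the following. The function name plainTeaser and the $maxWords parameter are made up for illustration; only strip_tags(), preg_replace(), explode(), array_slice() and implode() are standard PHP. Note that strip_tags() simply deletes the tags, so words on either side of a tag with no whitespace between them will run together.

```php
<?php
// Hypothetical helper (not from the thread): build a plain-text teaser
// of at most $maxWords words from an HTML fragment.
function plainTeaser(string $html, int $maxWords = 50): string
{
    // Remove all markup, then collapse runs of whitespace to single spaces.
    $text = trim(preg_replace('/\s+/', ' ', strip_tags($html)));
    if ($text === '') {
        return '';
    }

    $words = explode(' ', $text);
    if (count($words) <= $maxWords) {
        return $text; // already short enough, nothing to cut
    }

    // Keep the first $maxWords words and mark the cut with an ellipsis.
    return implode(' ', array_slice($words, 0, $maxWords)) . '...';
}
```

    Because the result contains no tags at all, it cannot leave anything unclosed, which is why this approach is safe for validation.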

  • #3
    Originally posted by firepages:
    "A separate field for the teaser or strip_tags() would be my choice (and in that order)."

    Thanks for your reply, firepages.

    I had thought of the strip_tags() option, but decided against it because I didn't want to lose anchor tags, bold, italic, etc. in the 'teaser', as you call it.

    I hadn't thought of storing a teaser in a separate field. I suppose that's the DB designer in me putting blinders on to anything that even comes close to data duplication. I'm not sure the extra work involved here (a recurring cost for every post, not a one-time cost like some magical function would be) would be worth having my page validate as HTML 4.01 Strict.

    I was also considering that since I'm parsing every character anyway with my ShortenText function, why not simply keep track of which tags are 'open' (recursively maybe) and close the remaining open tags once the teaser string is long enough?
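    The open-tag tracking idea can be sketched with a simple stack instead of recursion. This is only a rough illustration, not the ShortenText function from the post (which isn't shown): the name shortenHtml and the $maxChars parameter are hypothetical. It assumes the input is well formed and that every non-empty element has an explicit end tag (HTML 4.01 lets you omit some, such as </p> and </li>, which would confuse the stack). It also counts bytes rather than characters and may split an entity like &amp; at the cut point.

```php
<?php
// Hypothetical helper (not the poster's ShortenText): cut $html after
// $maxChars bytes of visible text, then close any elements still open.
function shortenHtml(string $html, int $maxChars): string
{
    // Elements whose end tag is forbidden in HTML 4.01; never stacked.
    $void = ['area', 'base', 'br', 'col', 'hr', 'img', 'input', 'link', 'meta', 'param'];
    $open = [];  // stack of currently open tag names
    $out = '';
    $count = 0;  // bytes of visible text emitted so far

    // Split into tags and text runs, keeping the tags as tokens.
    $tokens = preg_split('/(<[^>]*>)/', $html, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    foreach ($tokens as $token) {
        if ($token[0] === '<') {
            if (preg_match('/^<(\/?)([a-zA-Z][a-zA-Z0-9]*)/', $token, $m)) {
                $name = strtolower($m[2]);
                if ($m[1] === '/') {
                    // End tag: pop only if it matches the innermost element.
                    if (end($open) === $name) {
                        array_pop($open);
                    }
                } elseif (!in_array($name, $void, true)) {
                    $open[] = $name; // start tag: remember it
                }
            }
            $out .= $token; // tags are copied through uncounted
            continue;
        }
        if (trim($token) === '') {
            $out .= $token; // whitespace between tags is not counted
            continue;
        }
        $remaining = $maxChars - $count;
        if (strlen($token) > $remaining) {
            // Limit reached inside this text run: cut here and stop.
            $out .= substr($token, 0, $remaining) . '...';
            break;
        }
        $out .= $token;
        $count += strlen($token);
    }

    // Close whatever is still open, innermost first.
    while ($open) {
        $out .= '</' . array_pop($open) . '>';
    }
    return $out;
}

echo shortenHtml('<blockquote><p>Hello <b>bold</b> world</p></blockquote>', 8);
// prints: <blockquote><p>Hello <b>bo...</b></p></blockquote>
```

    Since tags are never counted or cut, the teaser can't end in the middle of a tag, and popping the stack in reverse order keeps the nesting correct. Whether the shortened markup still looks right in the teaser context is a separate question, as noted earlier in the thread.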

