  1. #1
    New Coder
    Join Date
    May 2008
    Posts
    19
    Thanks
    1
    Thanked 0 Times in 0 Posts

    perl substr - match

    Hi,

    I am reading a text file with a Perl script and want to show only the first couple of thousand characters. The text file contains HTML tags such as <p> and <a href..>ada</a>, and the content changes daily. Sometimes the cut falls in the middle of an <a href ..> tag, so the tag is never closed properly, and the unclosed tag turns the content below it into a link. Is there a way to cut at a safe point instead, such as a "." (period) or a closing tag (</a> or </p>)?

    Please help me out.

    The one line of code that gets the content is:

    Code:
    $content .=  substr($text,0,2100);
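
    To illustrate the problem described above, here is a minimal sketch that backs the cut up whenever it lands inside a tag. The helper name truncate_outside_tag is hypothetical and not part of the original script. It only avoids splitting a tag such as <a href ..>; it does not close an <a> element that was opened before the cut.

```perl
use strict;
use warnings;

# Hypothetical helper: truncate $text to at most $limit characters,
# backing up so the cut never lands inside a tag like <a href="...">.
sub truncate_outside_tag {
    my ($text, $limit) = @_;
    return $text if length($text) <= $limit;

    my $cut        = substr($text, 0, $limit);
    my $last_open  = rindex($cut, '<');   # -1 if there is no '<'
    my $last_close = rindex($cut, '>');   # -1 if there is no '>'

    # A '<' that comes after the last '>' means the cut fell inside a tag,
    # so drop the partial tag.
    $cut = substr($cut, 0, $last_open) if $last_open > $last_close;
    return $cut;
}
```

    With this helper, the line above might become $content .= truncate_outside_tag($text, 2100);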

  • #2
    New Coder
    Join Date
    May 2008
    Posts
    19
    Thanks
    1
    Thanked 0 Times in 0 Posts
    Any help, please?

  • #3
    Master Coder
    Join Date
    Apr 2003
    Location
    in my house
    Posts
    5,211
    Thanks
    39
    Thanked 201 Times in 197 Posts
    You'll need to post your code because we can't guess what it does.

    bazz
    "The day you stop learning is the day you become obsolete"! - my late Dad.

    Why do some people say "I don't know for sure"? If they don't know for sure then, they don't know!
    Useful MySQL resource
    Useful MySQL link

  • #4
    New Coder
    Join Date
    May 2008
    Posts
    19
    Thanks
    1
    Thanked 0 Times in 0 Posts
    Hi,

    I have the content in $text, and I am using substr to show 2100 characters of it. I am not sure what else I should provide other than the code below. $text gets its content from the text file, which contains HTML tags (mostly <p> and <a>).

    Code:
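    $text_file = "story.txt";
    # Three-argument open with an error check
    open(FILE, '<', $text_file) or die "Cannot open $text_file: $!";
    
    # Assigning to a single array element reads only the first line of the file
    $content1[0] = <FILE>;
    close(FILE);
    
    $content1_html = '';
    
    foreach $content1 (@content1) {
    	# Each record has the form date===text
    	($date,$text) = split(/===/,$content1);
    	$content1_html .=  substr($text,0,2000);
    }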

  • #5
    Master Coder
    Join Date
    Apr 2003
    Location
    in my house
    Posts
    5,211
    Thanks
    39
    Thanked 201 Times in 197 Posts
    I've re-read your post and I understand it differently this time.

    I think you will need to use a regex to split the text after a </p> tag, perhaps the first one that appears after, say, 1500 characters.

    I am hopeless at regexes but I'll try to think it through overnight and post back if you haven't had an answer.
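
    For what it's worth, the idea of cutting after the first </p> past a minimum length can be sketched without a regex, using Perl's built-in index. This is only a sketch: the helper name cut_after_paragraph is hypothetical, and it assumes the closing tags are written in lowercase exactly as </p>, as in the examples in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: keep everything up to and including the first
# closing </p> found at or after character offset $min_chars.
sub cut_after_paragraph {
    my ($text, $min_chars) = @_;
    my $tag = '</p>';

    # index returns -1 when no match exists at or after the start offset.
    my $pos = index($text, $tag, $min_chars);

    # No closing tag past that point: return the text unchanged.
    return $text if $pos < 0;

    return substr($text, 0, $pos + length($tag));
}
```

    In the posted loop this might be used as $content1_html .= cut_after_paragraph($text, 1500);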

    bazz
    "The day you stop learning is the day you become obsolete"! - my late Dad.

    Why do some people say "I don't know for sure"? If they don't know for sure then, they don't know!
    Useful MySQL resource
    Useful MySQL link

  • #6
    New Coder
    Join Date
    May 2008
    Posts
    19
    Thanks
    1
    Thanked 0 Times in 0 Posts
    Hi,

    I couldn't get it to work. Please let me know if you have any ideas now.

    Regards

  • #7
    Super Moderator
    Join Date
    May 2005
    Location
    Southern tip of Silicon Valley
    Posts
    2,877
    Thanks
    2
    Thanked 164 Times in 159 Posts
    I see 2 key problems.

    1. You're trying to manually parse an HTML file instead of using one of Perl's HTML parser modules.

    2. Manually truncating the file to a hard-coded and possibly arbitrary length is bound to cause problems that may not have a viable solution.

    What is the source of the "story.txt" file?

    After processing the file, what does the script do with it?

    Why truncate the file at 2100 characters?

    Have you looked at any of Perl's html parsers and text formatting modules?

  • #8
    New Coder
    Join Date
    May 2008
    Posts
    19
    Thanks
    1
    Thanked 0 Times in 0 Posts
    Hi,

    Thanks for the reply.
    This is what I am doing:
    1) I open a text file (it gets its contents from another script when that script is run).
    2) I show the contents of the text file.

    Because the text file has HTML tags like <a href="">sample text</a> or <p>sample text</p>, cutting it off at a fixed number of characters causes problems. I cut it off at 2100 characters because I want to show just a snippet, not the complete text. If the text around the 2100th character contains an <a> tag, the tag is left unclosed and the link flows into the next section of the page. Another solution for me would be to cut the text after the 4th or 5th <p> tag, as the text file will contain at least 8 paragraphs.

    thanks
    hope this is clear.
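
    The paragraph-count alternative mentioned above could look roughly like the following. It is a minimal sketch, not a tested solution for the real file: the helper name first_paragraphs is hypothetical, and it assumes each paragraph ends with a lowercase </p> and that paragraphs are not nested.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text up to and including the
# $count-th closing </p> tag.
sub first_paragraphs {
    my ($text, $count) = @_;
    my $tag = '</p>';
    my $end = 0;

    for (1 .. $count) {
        my $pos = index($text, $tag, $end);
        last if $pos < 0;              # fewer paragraphs than requested
        $end = $pos + length($tag);    # continue searching after this tag
    }

    # If no closing tag was found at all, return the text unchanged.
    return $end ? substr($text, 0, $end) : $text;
}
```

    For example, first_paragraphs($text, 4) would keep the first four paragraphs, so every <a> that is closed inside those paragraphs stays closed.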

