  1. #1
    New to the CF scene
    Join Date
    Jul 2002
    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Parsing parts of an HTML file?

    I have a huge webpage (over 300 KB of just links), and I want to parse just pieces of that big page into a template or another page. Basically, I want to put comments, anchors, or some other markers in the big HTML file to tell a CGI parsing script where to start and where to stop. I also want it to work with variables, so that I can parse Part A or Part B rather than the entire page. I have found many scripts that use CGI and SSI to parse entire webpages, but nothing that will parse custom-defined parts of a page. Is this possible? If so, please point me to a script that already does this, or to some code I could use as a starting point for writing one.

    To help you visualize what I want to do: I want to use a CGI script to parse out different parts of this lyrics page (www.smasonline.com/lyrics/list.html), so I can divide it into sections for each letter of the alphabet.


    If you could help me I would be forever grateful.

    Thanks in advance,
    Sancho

  • #2
    Regular Coder
    Join Date
    Jun 2002
    Location
    Brisbane, Australia
    Posts
    181
    Thanks
    1
    Thanked 0 Times in 0 Posts


    What you could do is something like this:
    (not guaranteed to work, and you'd definitely have to test it)

    Code:
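    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::Simple;   # exports get()

    my $addr = "http://www.somewhere.com/";
    my $html = get($addr);   # returns undef if the fetch fails

    # A CGI script must send a header and a blank line before any output
    print "Content-type: text/html\n\n";
    die "Could not fetch $addr\n" unless defined $html;

    my $printing = 0;   # true while we are between the two marker comments
    foreach my $line (split /\n/, $html) {
        if ($line =~ /<!--\s*LIST (START|END)\s*-->/) {
            $printing = ($1 eq 'START');
            next;   # do not print the marker line itself
        }
        print "$line\n" if $printing;   # split removed the newline, so add it back
    }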
    Note: you have to put the comment <!--LIST START--> where the content or links begin and <!--LIST END--> where they end.
    Last edited by mr_ego; 07-14-2002 at 05:25 AM.

  • #3
    Regular Coder
    Join Date
    Jul 2002
    Location
    London, UK
    Posts
    126
    Thanks
    0
    Thanked 0 Times in 0 Posts
    What exactly are you trying to do here?

    Do you want to split the whole page into a group of pages or just print out the content within the <!-- LIST ... --> comments?

    By the way, it is LWP::Simple that you want here.

    If you want to parse HTML documents, there are a few modules out there that can help you.

  • #4
    New to the CF scene
    Join Date
    Jul 2002
    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Hmm...

    I want to split the page into lots of smaller pages, but I want a script to do it for me. I want to keep maintaining the one big webpage full of links and have a script split it into smaller pages using comments, with one page for each letter of the alphabet.

    That way, when I get new lyrics, I can just update the big page and all the other pages will include the new lyrics as well, because they are just parsing what's in between the comments. The idea is to use the big HTML file the way I would use a database, except that I only want to pull things from it, not search it or anything like that.
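    As a rough illustration of the idea above, here is a minimal sketch in core Perl (no extra modules) that pulls one named section out of a page held in a string. The marker format (<!--A START--> ... <!--A END-->), the function name extract_section, and the sample links are assumptions made up for this example; a real script would read the big list file instead of the inline sample.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the text between <!--NAME START--> and <!--NAME END-->,
# or nothing (undef in scalar context) if the marker pair is not found.
# The "NAME START" / "NAME END" marker format is an assumption for this sketch.
sub extract_section {
    my ($html, $name) = @_;
    # \Q...\E makes $name match literally; /s lets "." span newlines;
    # .*? stops at the first matching END marker.
    if ($html =~ /<!--\s*\Q$name\E START\s*-->(.*?)<!--\s*\Q$name\E END\s*-->/s) {
        return $1;
    }
    return;
}

# Tiny inline sample standing in for the big list page (hypothetical links).
my $page = <<'HTML';
<!--A START-->
<a href="a1.html">Alpha Song</a>
<!--A END-->
<!--B START-->
<a href="b1.html">Beta Song</a>
<!--B END-->
HTML

my $section = extract_section($page, 'B');
print defined $section ? $section : "No such section\n";
```

    Because the small pages only ever read from the big file, adding a link between a pair of markers is enough to make it show up in the matching section.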

    I know this all sounds confusing, sorry. Hopefully you will understand what I mean.

    As far as modules go, I can't use them. Thanks for the idea though. The site is being hosted by a crappy webhost company. So I can't change anything like that, or use PHP or use anything useful besides Perl and SSI.

    I have tried the script you posted, mr_ego. Thanks for pointing me in the right direction. But I know very little about Perl. I've always just used other people's scripts and never took the time to learn any language. Anyway, I set up my own web server temporarily to test it out. I always get a 500 error, and when I check the Apache error log I get "Syntax error on line 23 of EOF". Has anybody got any ideas how to fix this, or what I'm doing wrong?

    I have posted this same question in multiple forums, and you guys are the first people that even responded. Thanks a lot.
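    One common cause of a 500 error from a Perl CGI script, though not necessarily the cause here, is output that starts without a Content-type header. The sketch below shows a minimal script that sends the header first and reads which letter was requested from the query string. The parameter name "letter" and the function name letter_from_query are assumptions for illustration only. Running `perl -c` on the script from the command line checks its syntax without going through the web server.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pull a single-letter "letter" parameter out of a raw query string
# such as "letter=b". Returns the uppercased letter, or nothing
# (undef in scalar context) if the parameter is absent or invalid.
# The parameter name "letter" is an assumption for this sketch.
sub letter_from_query {
    my ($query) = @_;
    return unless defined $query;
    return uc $1 if $query =~ /(?:^|&)letter=([A-Za-z])(?:&|$)/;
    return;
}

# The web server passes the query string in this environment variable.
my $letter = letter_from_query($ENV{QUERY_STRING});

# A CGI script must print a header and a blank line before any content.
print "Content-type: text/html\n\n";
print defined $letter
    ? "<p>Would show section $letter here.</p>\n"
    : "<p>Please choose a letter.</p>\n";
```

    Accepting only a single letter keeps arbitrary request input out of whatever the script does next with that value.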

