Thread: HTTP request

  1. #1
    Regular Coder
    Join Date
    Oct 2011
    Posts
    106
    Thanks
    12
    Thanked 0 Times in 0 Posts

    HTTP request

    TLDR at bottom

    At the beginning of this school year, I was given a .doc with links to the pages (one definition each) for all the vocabulary words for the semester. As you might imagine, clicking a link for every word is not the most efficient method.

    I opened it in gedit and found that the link text was not obscured, or even encoded, so I got typing.

    I now have a Java application which builds a stack of all of the links and requests each page by parsing the link String as a URL and calling URL.openConnection(). In the final version I will reduce the HTML to just the text, but right now I have a problem:

    TLDR:
    Only part of each page comes back. The pages are static HTML (with CSS, but no JavaScript), yet the main div in the body is empty in what I read. This doesn't happen even in text-based browsers, so what is the problem?
    Last edited by Scriptr; 09-22-2012 at 12:49 AM.

  • #2
    God Emperor Fou-Lu's Avatar
    Join Date
    Sep 2002
    Location
    Saskatoon, Saskatchewan
    Posts
    16,994
    Thanks
    4
    Thanked 2,662 Times in 2,631 Posts
    Assuming you've properly read it and you have a valid structure for the html, and the JS doesn't exist, the only thing I can think of is that an IFrame is in use.
    If it terminates at the point of the div, I'd say you haven't completed the read.

  • #3
    Regular Coder
    Quote, originally posted by Fou-Lu:
    Assuming you've properly read it and you have a valid structure for the html, and the JS doesn't exist, the only thing I can think of is that an IFrame is in use.
    If it terminates at the point of the div, I'd say you haven't completed the read.
    This is the source code for one of the pages:
    Code:
    <html>
    <head>
    <title>sustainable development</title>
    <link rel="Stylesheet" href="../../../activityshared/code/glossary.css">
    </head>
    <body class=defBody>
    
    <span class=defTerm>sustainable development</span>
    
    <span class=defText>
    The long-term prosperity of human societies and the ecosystems that support them
    </span>
    
    </body>
    </html>
    All pages are identical, save for the text content.

    This is the code that is read:
    Code:
    <head>
    <title>sustainable development</title>
    <link rel="Stylesheet" href="../../../activityshared/code/glossary.css">
    </head>
    <body class=defBody>
    
    <span class=defTerm>sustainable development</span>
    
    <span class=defText>
    </span>
    
    </body>
    Notice the missing <html> tags and the empty defText span.

    This is the code for the reader:
    Code:
    public void get(URL u) throws IOException{
    		URLConnection con = u.openConnection();
    		con.connect();
    		BufferedReader c = new BufferedReader(new InputStreamReader(con.getInputStream()));
    		System.out.println(u);
    		String s = "";
    		
    		while(c.readLine() != null){
    			s += c.readLine() + "\n";
    		}
    		System.out.println(s);
    		c.close();
    	}
    Lastly, allow me to apologize for failing to include this in the original post.

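    Edit: I am unfamiliar with web sockets, but the problem turned out to be that I was skipping lines: the loop condition calls readLine() once and the loop body calls it again, so the line read by the condition is thrown away. This does not happen with a Scanner, whose hasNextLine() checks for input without consuming it. On a hunch I replaced the reader code with this: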
    Code:
    String s = "";
    String sub;
    while ((sub = c.readLine()) != null) {
        s += sub + "\n";
    }
    and all works perfectly.
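
For anyone reusing the corrected loop, here is a minimal self-contained sketch of it. The class name `PageReader`, the helper names `readAll` and `fetch`, and the UTF-8 charset are assumptions made for illustration; they are not part of the original application. It uses a StringBuilder instead of repeated String concatenation and try-with-resources (Java 7 or later) so the reader is closed even if reading fails.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PageReader {

    // Reads every line from the given Reader, calling readLine() exactly
    // once per iteration so that no line is consumed and then discarded.
    public static String readAll(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Hypothetical helper: fetches the body of a URL as text.
    // Assumption: the server sends UTF-8; adjust the charset if it does not.
    public static String fetch(URL u) throws IOException {
        return readAll(new InputStreamReader(
                u.openConnection().getInputStream(), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Offline check with an in-memory page instead of a network request.
        String page = "<html>\n<body>\n</body>\n</html>\n";
        System.out.print(readAll(new StringReader(page)));
    }
}
```

Keeping the loop in a method that takes a plain Reader means it can be checked against a StringReader without touching the network.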
    Last edited by Scriptr; 09-22-2012 at 12:49 AM.

  • #4
    God Emperor Fou-Lu's Avatar
    That will do it. Since BufferedReader has no hasNext or similar look-ahead check, you simply do the assignment inside the loop condition. As you said, calling readLine twice per iteration effectively skips every other line (which you can see here).
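
To see the skipping in isolation, here is a small sketch that runs the double-readLine loop shape over an in-memory string. The class name `SkipDemo` and the method name `readTwicePerIteration` are made up for this illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class SkipDemo {

    // Reproduces the problematic loop shape: readLine() is called once in
    // the condition and once more in the body.
    public static String readTwicePerIteration(String text) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            while (reader.readLine() != null) {            // consumes a line and discards it
                sb.append(reader.readLine()).append('\n'); // keeps only the line after it
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Four input lines: only "line 2" and "line 4" are printed.
        System.out.print(readTwicePerIteration("line 1\nline 2\nline 3\nline 4\n"));
    }
}
```

With an odd number of input lines, the second call in the final iteration returns null, so the literal text "null" ends up as the last line of the result.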

