  • #1
    Regular Coder
    Join Date
    May 2006
    Posts
    123
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Why, whitespace, why?

    I'm trying to figure out how to read XML into JavaScript, and, frustrating as that is on its own, what really boggles my mind is Mozilla's default NOT to ignore whitespace! I realize this may have its applications, but for the sake of my sanity (not to mention being cross-browser), I NEED to parse my XML document WITHOUT whitespace!

    All I want to do, essentially, is read in an XML document that has, let's say, 100 or so <character> nodes off the root, and output their text values into the HTML. Not so hard, right? But if I make an XML document that I can actually READ (so that I don't go insane), I'm going to wind up with many more than 100 nodes thanks to the whitespace being read in... what can I do about this?

    I've seen custom functions that will remove the whitespace nodes for me, but certainly there's an easier way to do this?!?

    I can't imagine implementations not anticipating this need with XML.

  • #2
    Master Coder
    Join Date
    Feb 2003
    Location
    Umeå, Sweden
    Posts
    5,575
    Thanks
    0
    Thanked 83 Times in 74 Posts
    Well, the reason why no rational implementation of an XML DOM ignores whitespace-only text nodes should be pretty clear when you consider some other facts:

    - In XML, the XML application is not bound to the engine; it's bound to the document. That is entirely unlike HTML engines.
    - For HTML, it's reasonable to remove whitespace nodes from the DOM when they are not significant. There are good arguments either way.
    - XML is an entirely different setting. Since the XML application and the XML engine are unconnected, the engine must defer the handling of whitespace.
    - In XML, only validating engines have any knowledge at all about the whitespace handling of the XML application.
    - But even that doesn't help much, since CSS can change the whitespace treatment, and with it whether whitespace between elements is significant or not.

    (XML application = XML language, not "computer program handling XML")

    In short, for XML there isn't really any choice. Keep the whitespace in the DOM or lose functionality.

    As for your problem, there are a slew of solutions to it:
    - Do you know the element types you want to handle? Then getElementsByTagName (for namespace-unaware engines like the one in IE/Win) and getElementsByTagNameNS (for namespace-aware engines, like pretty much everything else) might be a good solution.
    - Do you know which element types may contain significant whitespace? If so, you can filter the whitespace nodes out of the childNodes NodeList beforehand.
    - You could also do a simple check on each node for nodeType to tell whether it is an element or not, and jump to the next one if it isn't.
    - XPath, NodeIterators or TreeWalkers should be able to do the filtering you want effectively, but they are not supported by IE/Win.
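    A minimal sketch of the first and third options above, in JavaScript. It assumes only that nodes expose the standard `childNodes`, `nodeType`, `nodeValue` and `getElementsByTagName` members; the function names are hypothetical, and reading text through direct text-node children is a simplification (CDATA sections and nested elements are not included).

```javascript
// Sketch only: works on any DOM-like node exposing childNodes, nodeType and nodeValue.
const ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE
const TEXT_NODE = 3;    // same value as Node.TEXT_NODE

// Third option: check nodeType and skip anything that is not an element,
// which includes the whitespace-only text nodes between tags.
function elementChildren(node) {
  const result = [];
  for (let i = 0; i < node.childNodes.length; i++) {
    const child = node.childNodes[i];
    if (child.nodeType !== ELEMENT_NODE) continue; // jump to the next one
    result.push(child);
  }
  return result;
}

// Join the direct text-node children of an element, e.g. "A" in <character>A</character>.
// Simplification: CDATA sections (nodeType 4) and nested elements are ignored.
function textOf(element) {
  let text = "";
  for (let i = 0; i < element.childNodes.length; i++) {
    const child = element.childNodes[i];
    if (child.nodeType === TEXT_NODE) text += child.nodeValue;
  }
  return text;
}

// First option: ask for the <character> elements by name; text nodes never match.
function characterTexts(doc) {
  const nodes = doc.getElementsByTagName("character");
  const texts = [];
  for (let i = 0; i < nodes.length; i++) {
    texts.push(textOf(nodes[i]));
  }
  return texts;
}
```

    In a browser, `doc` would typically be `xhr.responseXML`, or the result of `new DOMParser().parseFromString(text, "application/xml")` where DOMParser is available. Neither function ever counts the indentation, so an indented file with 100 <character> elements yields exactly 100 strings.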
    Last edited by liorean; 05-11-2006 at 07:28 PM.
    liorean <[lio@wg]>
    Articles: RegEx evolt wsabstract , Named Arguments
    Useful Threads: JavaScript Docs & Refs, FAQ - HTML & CSS Docs, FAQ - XML Doc & Refs
    Moz: JavaScript DOM Interfaces MSDN: JScript DHTML KDE: KJS KHTML Opera: Standards

  • #3
    Master Coder (felgall)
    Join Date
    Sep 2005
    Location
    Sydney, Australia
    Posts
    6,642
    Thanks
    0
    Thanked 649 Times in 639 Posts
    It is not just Mozilla that keeps the whitespace. ALL browsers except intranet exploder do the same. IE is the non-standard one.
    Stephen
    Learn Modern JavaScript - http://javascriptexample.net/
    Helping others to solve their computer problem at http://www.felgall.com/

    Don't forget to start your JavaScript code with "use strict"; which makes it easier to find errors in your code.

  • #4
    Regular Coder
    Join Date
    May 2005
    Location
    Michigan, USA
    Posts
    566
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by liorean
    - You could also do a simple check on each node for nodeType to tell whether it is an element or not, and jump to the next one if it isn't.
    nodeType
    1. element/tag
    2. attribute (This will come through as either a 2 or undefined for backwards compatibility)
    3. text node
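    The values in the list above correspond to the DOM's `Node.ELEMENT_NODE` (1), `Node.ATTRIBUTE_NODE` (2) and `Node.TEXT_NODE` (3). A small sketch, with hypothetical helper names, that names these types and detects the whitespace-only text nodes an indented XML file produces. It tests against XML's own whitespace characters (space, tab, carriage return, line feed) rather than JavaScript's broader `\s`.

```javascript
// Names for the nodeType values listed above (1, 2, 3 match the DOM's
// Node.ELEMENT_NODE, Node.ATTRIBUTE_NODE and Node.TEXT_NODE).
const NODE_TYPE_NAMES = { 1: "element", 2: "attribute", 3: "text" };

function describeNodeType(node) {
  return NODE_TYPE_NAMES[node.nodeType] || "other (" + node.nodeType + ")";
}

// True for a text node made only of XML whitespace: space, tab, CR, LF.
// (JavaScript's \s would also match characters such as U+00A0, which XML
// does not treat as whitespace.)
function isWhitespaceText(node) {
  return node.nodeType === 3 && /^[ \t\r\n]*$/.test(node.nodeValue);
}
```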
    Note: I do not test code. I just write it off the top of my head. There might be bugs in it! But if nothing else, I gave you the overall theory of what you need to accomplish. Also, there are plenty of other ways to accomplish this same thing; I just gave one example of it. Other ways might be faster and more efficient.

  • #5
    Regular Coder
    Join Date
    May 2006
    Posts
    123
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Fine... IE is the exceptional one.

    Fine... there are reasons to read in the white space.

    It's even FINE that it DEFAULTS to reading in the whitespace!

    But if there are such good arguments either way, then why is there no simple option to tell the parser how to handle whitespace?? Or is there?

  • #6
    Master Coder
    Join Date
    Feb 2003
    Location
    Umeå, Sweden
    Posts
    5,575
    Thanks
    0
    Thanked 83 Times in 74 Posts
    Quote Originally Posted by bowser1111
    But if there are such good arguments either way, then why is there no simple option to tell the parser how to handle whitespace??
    Well, "good arguments either way" applies to HTML. That is because, with the HTML engine, the browser knows where text nodes may or may not appear.

    However, the XML engine doesn't know. The XML application may or may not allow text node children, but everything that has to do with the XML application happens after the XML engine has already parsed the XML. Take XHTML, for example: sure, it has the same whitespace treatment as HTML, and the browser knows this. But the browser also uses the XML engine for all other XML content types, and they may have different whitespace treatment. So the engine is built to retain all the data.

    The engine builds either the DOM itself, or an internal structure that can be accessed through the DOM. IE/Win does the latter, and that's why its DOM works differently: the DOM is not its internal representation. The internal representation keeps the whitespace (since it allows setting arbitrary elements to display: inline, it has to keep the whitespace), but the DOM doesn't.
    Or is there?
    Well, in at least some XML engines there probably is such an option for the program that embeds the XML engine. But I would wager that it is a compile-time option, which means the script is stuck with whatever the program was built to use.

    Just use one of the methods I listed above and work around it. It's really not that much of a roadblock unless you have really high demands on the performance of the script.
    liorean <[lio@wg]>

  • #7
    Regular Coder
    Join Date
    May 2006
    Posts
    123
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Well, I have already found a script that removes the whitespace nodes... I will use that. I just wanted to be sure that I wasn't missing an option that can be passed to the parser. It appears that I'm not...
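    For reference, a whitespace-stripping script of the kind mentioned above can be quite small. This is a sketch, not the script the poster found; `stripWhitespaceNodes` is a hypothetical name, and it assumes DOM-like nodes with `childNodes`, `nodeType`, `nodeValue` and `removeChild`.

```javascript
// Hypothetical helper: remove whitespace-only text nodes from a DOM-like tree, in place.
// Assumes nodes expose childNodes, nodeType, nodeValue and removeChild (as DOM nodes do).
function stripWhitespaceNodes(node) {
  // Walk backwards: childNodes is live in the DOM, so removing a node
  // shifts later indexes but leaves the ones still to be visited intact.
  for (let i = node.childNodes.length - 1; i >= 0; i--) {
    const child = node.childNodes[i];
    if (child.nodeType === 3 && /^[ \t\r\n]*$/.test(child.nodeValue)) {
      node.removeChild(child); // whitespace-only text node (XML whitespace chars)
    } else if (child.nodeType === 1) {
      stripWhitespaceNodes(child); // recurse into elements only
    }
  }
  return node;
}
```

    Each node is visited once, so the cost grows linearly with document size. It strips whitespace everywhere; if some element types carry significant whitespace, you would skip recursing into those.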


    However, this still seems like a ridiculous problem. If the need had been anticipated, the option could easily exist... instead, everyone has to do it all by hand. I guess I should just be happy that I'm not using a large XML file, where removing whitespace nodes would hurt performance heavily. If that were the case, I think I would have to resort to a custom parser.
