Page 3 of 3
Results 31 to 41 of 41
  1. #31
    Moderator
    Join Date: May 2002
    Location: Hayward, CA
    Posts: 1,461
    Thanks: 1
    Thanked 23 Times in 21 Posts
    That's what the DOM spec says it should do.
    "The first step to confirming there is a bug in someone else's work is confirming there are no bugs in your own."
    June 30, 2001
    author, Verbosio prototype XML Editor
    author, JavaScript Developer's Dictionary
    https://alexvincent.us/blog

  2. #32
    Moderator
    Join Date: May 2002
    Location: Hayward, CA
    Posts: 1,461
    Thanks: 1
    Thanked 23 Times in 21 Posts

    Originally posted by jkd
    Code:
    document.addEventListener('load', function() {
      var treeWalker = document.createTreeWalker(document, NodeFilter.SHOW_TEXT, { acceptNode: function(node) { return /\S/.test(node.nodeValue) ? NodeFilter.FILTER_REJECT : NodeFilter.FILTER_ACCEPT } }, false);
    
      while (treeWalker.nextNode())
        treeWalker.currentNode.parentNode.removeChild(treeWalker.currentNode);
    
    }, true);
    Behold the awesomeness of DOM2 Traversal.
    EVIL, EVIL, BROKEN CODE!!!

    And besides, it's hard to read.

    Jason, your code just does not work.
    Once you remove the node, you've got to reset treeWalker.currentNode to a node still in the document. A removed node has no parent or siblings left, so nextNode() has nowhere to go from it and the loop stops after the first removal.

    Also, for some XML languages, whitespace nodes matter. XHTML is not one of them, except in the <pre/> element (which I think I can get away with removing whitespace nodes from anyway).

    (Note: this is something I forgot for my own code as well.)

    This code works.

    Code:
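    // Mozilla-specific handle to the filter constants; in standard DOM the same
    // constants are available on the global NodeFilter object.
    const nsIDOMNodeFilter = Components.interfaces.nsIDOMNodeFilter;
    window.addEventListener("load", function(evt) {
      var filter = {
        acceptNode: function(node) {
          // Keep text nodes that contain non-whitespace or live outside XHTML.
          if (/\S/.test(node.nodeValue) || node.parentNode.namespaceURI != "http://www.w3.org/1999/xhtml") {
            return nsIDOMNodeFilter.FILTER_SKIP;
          }
          // Whitespace-only XHTML text node: hand it to the loop below for removal.
          return nsIDOMNodeFilter.FILTER_ACCEPT;
        }
      };

      var treeWalker = document.createTreeWalker(document, nsIDOMNodeFilter.SHOW_TEXT, filter, true);
      while (treeWalker.nextNode()) {
        treeWalker.currentNode.parentNode.removeChild(treeWalker.currentNode);
        // The removed node is no longer in the tree, so restart from the document.
        treeWalker.currentNode = document;
      }
      // Debug output: report the node type of the root element's first child.
      var output = document.getElementById("output");
      var node = document.documentElement.firstChild;
      output.appendChild(document.createTextNode(node.nodeType));
    }, true);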
    And to think I was going to use that code you gave me in a DevEdge article I'm writing... Read the spec, Jason.
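
For comparison, a minimal sketch of a variant that avoids resetting currentNode at all: it first collects every node the walker yields and only then detaches them. The helper name removeWalkedNodes is made up for illustration, and the sketch assumes nothing else modifies the tree while the walker is running.

```javascript
// Sketch only: `walker` is anything exposing nextNode() and currentNode,
// for example a DOM TreeWalker created with a whitespace-only filter.
function removeWalkedNodes(walker) {
  var doomed = [];
  // Pass 1: walk the untouched tree, so the walker never loses its place.
  while (walker.nextNode()) {
    doomed.push(walker.currentNode);
  }
  // Pass 2: detach the collected nodes.
  for (var i = 0; i < doomed.length; i++) {
    var node = doomed[i];
    if (node.parentNode) {
      node.parentNode.removeChild(node);
    }
  }
  return doomed.length; // number of nodes collected
}
```

With the filter from the post above, this would be called as removeWalkedNodes(treeWalker) in place of the while loop.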
    Last edited by Alex Vincent; 06-18-2003 at 02:22 AM.

  3. #33
    jkd
    Senior Coder
    Join Date: May 2002
    Location: metro DC
    Posts: 3,163
    Thanks: 1
    Thanked 18 Times in 18 Posts
    You went thread digging just to blast me for code I hadn't tested? Now you're getting desperate.
    Last edited by jkd; 06-18-2003 at 02:24 AM.

  4. #34
    Master Coder
    Join Date: Feb 2003
    Location: Umeå, Sweden
    Posts: 5,575
    Thanks: 0
    Thanked 83 Times in 74 Posts
    Just a question, but how would you handle significant whitespace between nodes, such as "<a ...></a> <em>...</em> <img .../> <strong>...</strong>"? If I'm not mistaken, this function will remove that whitespace.
    liorean <[lio@wg]>
    Articles: RegEx evolt wsabstract , Named Arguments
    Useful Threads: JavaScript Docs & Refs, FAQ - HTML & CSS Docs, FAQ - XML Doc & Refs
    Moz: JavaScript DOM Interfaces MSDN: JScript DHTML KDE: KJS KHTML Opera: Standards

  5. #35
    Moderator
    Join Date: May 2002
    Location: Hayward, CA
    Posts: 1,461
    Thanks: 1
    Thanked 23 Times in 21 Posts

    Try again?

    liorean, for some reason I don't see anything significant about that whitespace. Unless you're talking about single spaces.

    The code does remove that, and if that's what you're referring to, then yes, that's probably a bug. Easily fixed, though.

    Code:
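        acceptNode: function(node) {
          // Keep text with visible content and anything outside the XHTML namespace.
          if (/\S/.test(node.nodeValue) || node.parentNode.namespaceURI != "http://www.w3.org/1999/xhtml") {
            return nsIDOMNodeFilter.FILTER_SKIP;
          }
          // Keep one-character whitespace nodes, such as a single space between inline elements.
          if (node.nodeValue.length == 1) {
            return nsIDOMNodeFilter.FILTER_SKIP;
          }
          // Everything else is whitespace-only and gets removed by the walker loop.
          return nsIDOMNodeFilter.FILTER_ACCEPT;
        }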
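
Note that the length check keeps any one-character whitespace node, including a lone newline or tab. A stricter sketch, assuming the intent is to preserve only a literal single space, could move the decision into a plain predicate. The name isRemovableWhitespace is hypothetical, and this still does not cover longer whitespace runs between inline elements.

```javascript
// Sketch: decide from the text alone whether a text node is safe to drop.
function isRemovableWhitespace(text) {
  if (/\S/.test(text)) {
    return false; // contains visible characters
  }
  if (text === " ") {
    return false; // a lone space may separate two inline elements
  }
  return true; // indentation, line breaks, or an empty string
}
```

Inside acceptNode, the two if statements would then collapse into one test on isRemovableWhitespace(node.nodeValue), with the namespace check kept separate.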

  6. #36
    Senior Coder
    Join Date: Jul 2004
    Location: New Zealand
    Posts: 1,315
    Thanks: 0
    Thanked 2 Times in 2 Posts
    So does the normalize (sic) function actually work? I've never managed to get it to work. Is there a trick to it?

  7. #37
    Master Coder
    Join Date: Feb 2003
    Location: Umeå, Sweden
    Posts: 5,575
    Thanks: 0
    Thanked 83 Times in 74 Posts
    It works, kinda. It just isn't suitable for this purpose. What it does is merge adjacent #text nodes into a single one. It doesn't remove any node except empty #text nodes and those whose value it merges into another node.
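
A toy model may make the point clearer. The sketch below is not the DOM API: it represents one child list as an array in which strings stand for #text nodes and objects stand for elements, and the function name normalizeChildList is made up. It mimics what normalize() does to that list: adjacent text is merged and empty text is dropped, but whitespace-only text survives.

```javascript
// Toy model: strings are #text nodes, objects are element nodes.
function normalizeChildList(children) {
  var result = [];
  for (var i = 0; i < children.length; i++) {
    var child = children[i];
    var last = result[result.length - 1];
    if (typeof child === "string") {
      if (child === "") {
        continue; // empty text nodes are dropped
      }
      if (typeof last === "string") {
        result[result.length - 1] = last + child; // merge into previous text
        continue;
      }
    }
    result.push(child);
  }
  return result;
}

var em = { tag: "em" };
var normalized = normalizeChildList(["  ", "\n", em, "a", "", "b"]);
console.log(normalized.length); // 3: the whitespace text is merged, not removed
```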

  8. #38
    New Coder
    Join Date: Nov 2004
    Posts: 12
    Thanks: 0
    Thanked 0 Times in 0 Posts
    Re: jkd's post #6

    There is some kind of growth going on in the function, but not for the reason you put forward. There are n nodes in total, and it visits them all once.

    The TreeWalker uses much the same algorithm.

    The reasons why it will be faster are that

    1) it is "internal", not relying itself on script.

    2) It is a proper list iterator. Collections are probably implemented internally as some weirdo data type like a linked list (Have you ever noticed that you can loop an array faster than you can loop a collection?).

    When moving through each flat collection (childNodes), the iterator object keeps a reference to the current link and moves directly on from there. Meanwhile, the scripted function accesses each childNode by index. This appears to be "direct access", but it probably isn't. Internally, the list must be searched from [0] up to the required index each time.

    I suppose that must lead to an arithmetic progression. Looping a collection by index gets more inefficient, compared with an iterator object, the longer the collection gets, to the tune of

    (n(n-1)/2) / n --> (n-1)/2

    It could be that using neighbour relationships instead of indices would remove this issue. So maybe try walking using node.nextSibling.

    Then again, all this could all be 'ked up.
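
To put numbers on the linked-list hypothesis, here is a toy model that counts link hops. It is only a sketch under the assumption that a collection really is a singly linked list searched from the head on every indexed access, which is not a claim about how any browser stores its collections. All function names are made up.

```javascript
// Build a singly linked list holding the values 0..n-1.
function buildList(n) {
  var head = null;
  for (var i = n - 1; i >= 0; i--) {
    head = { value: i, next: head };
  }
  return head;
}

// Indexed loop: every item(i) restarts at the head and hops i links.
function hopsForIndexedLoop(head, n) {
  var hops = 0;
  for (var i = 0; i < n; i++) {
    var node = head;
    for (var j = 0; j < i; j++) {
      node = node.next;
      hops++;
    }
  }
  return hops;
}

// Iterator-style loop: one step per node, continuing from the current link.
function hopsForIteratorLoop(head) {
  var hops = 0;
  for (var node = head; node !== null; node = node.next) {
    hops++;
  }
  return hops;
}

var n = 100;
var list = buildList(n);
var indexed = hopsForIndexedLoop(list, n);   // n(n-1)/2 = 4950
var iterated = hopsForIteratorLoop(list);    // n = 100
console.log(indexed / iterated);             // 49.5, i.e. (n-1)/2
```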
    Last edited by Passin Thru; 11-27-2004 at 05:57 PM.

  9. #39
    Master Coder
    Join Date: Feb 2003
    Location: Umeå, Sweden
    Posts: 5,575
    Thanks: 0
    Thanked 83 Times in 74 Posts
    Quote Originally Posted by Passin Thru
    There is some kind of growth going on in the function, but not for the reason you put forward. There are n nodes in total, and it visits them all once.

    The TreeWalker uses much the same algorithm.

    The reasons why it will be faster are that

    1) it is "internal", not relying itself on script.
    No more internal than the item() syntax is. They're both layers above an array of references, and TreeWalker has way more overhead in terms of scripting since it has to create closures while the loop doesn't.
    2) It is a proper list iterator. Collections are probably implemented internally as some weirdo data type like a linked list (Have you ever noticed that you can loop an array faster than you can loop a collection?).
    Of course, arrays are presumably more compact, don't need to be "live", don't have to carry synchronisation code etc.
    When moving through each flat collection (childNodes), the iterator object keeps a reference to the current link and moves directly on from there. Meanwhile, the scripted function accesses each childNode by index. This appears to be "direct access", but it probably isn't. Internally, the list must be searched from [0] up to the required index each time.
    No, it doesn't. Over a sequence of 5000 accesses, the access times for object 0 and object length-1 are about equal.
    I suppose that must lead to an arithmetic progression. Looping a collection by index gets more inefficient, compared with an iterator object, the longer the collection gets, to the tune of

    (n(n-1)/2) / n --> (n-1)/2
    As I've just shown, it doesn't.
    It could be that using neighbour relationships instead of indices would remove this issue. So maybe try walking using node.nextSibling.
    Actually, the only place I see TreeWalkers or simple node traversal as faster than indexed access is in either concurrent handling (which neither is well suited to, really) or in precompiled static arrays, which DOMCollections are NOT, according to specification. TreeWalkers still have to walk through each element (so they don't get the benefit of just travelling #text nodes).
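
The test code behind the near/far access comparison was not posted. A rough sketch of how such a comparison might be timed is below. timeAccesses is a hypothetical helper, Date.now() has only millisecond resolution, and a single run says little, so any result should be treated as indicative at best.

```javascript
// Sketch: read collection[index] `repetitions` times and report elapsed time.
function timeAccesses(collection, index, repetitions) {
  var start = Date.now();
  var sink;
  for (var i = 0; i < repetitions; i++) {
    sink = collection[index]; // the access being measured
  }
  return { elapsedMs: Date.now() - start, lastValue: sink };
}
```

In a browser one might compare timeAccesses(nodes, 0, 5000) with timeAccesses(nodes, nodes.length - 1, 5000), where nodes is a live collection such as document.getElementsByTagName("*").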






    Alex: Whitespace is significant in the following cases:
    - Formatting preserved contexts.
    - Elements with CDATA content.
    - Elements with #PCDATA content.
    In the latter two cases, the normalised whitespace is significant (or is it only for #PCDATA that normalisation takes place? I'll have to go read the XML spec again...). In the first case, all whitespace is significant. In cases with element-only content models, whitespace is not significant. Thus, an implementation can only know when it may skip whitespace nodes if it has knowledge of the DTD. However, thanks to CSS, we can change the first point as necessary, so it doesn't make sense to leave the text nodes out of the DOM tree, even if the behavior itself is entirely up to the user agent: iew does not break any spec by leaving out the source-formatting-only whitespace.
    Last edited by liorean; 11-27-2004 at 08:45 PM.

  10. #40
    New Coder
    Join Date: Nov 2004
    Posts: 12
    Thanks: 0
    Thanked 0 Times in 0 Posts
    Thanks for that liorean. I wasn't 100% sure about my premises, so I didn't want to sound too certain. The motivation was that I thought the criticism of the script - that it was somehow walking further than it should - was unfair.

    Interesting, the test that accesses a 'near' and a 'distant' collection member. I think I've actually done similar myself, then completely forgotten about it. Your explanation has opened some things up a little. I have tried using a JScript Iterator and found that it actually seems slower than simple indexed looping. I've been thinking that I perhaps could have used a more efficient control structure for the iterator, but your info confirms that I shouldn't bother using it at all apart from for objects that can't be enumerated any other way.

    Javascript 'arrays' surely aren't arrays either, when it comes to access, internally speaking. They just happen to be more 'digit oriented'.

    Using the 'correct' item(i) method does seem marginally slower than array-style access. Does anyone think that scripts that use the convenient approach will one day pay the price and fail?

  11. #41
    New to the CF scene
    Join Date: Nov 2006
    Location: Leeds, UK
    Posts: 1
    Thanks: 0
    Thanked 0 Times in 0 Posts

    My Proposal

    Hi all,
    I've written a way of doing this which is hopefully smaller and faster.

    Code:
    var someElementRef = document.getElementsByTagName('body')[0];
    someElementRef.innerHTML = someElementRef.innerHTML.replace(/\B\s\B|[\n\r\t]/g,'');
    Most of the above is for the purpose of example; it's the regular expression doing the work.

    What it's looking for is:

    1) \B\s\B - a single whitespace character with no word boundary on either side, i.e. with a non-word character (such as > or <) or the end of the string on both sides
    2) \n - a line feed anywhere
    3) \r - a carriage return anywhere
    4) \t - a tab anywhere
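
The pattern can be tried on plain strings without touching the DOM. The sketch below wraps it in a helper (stripWhitespace is a made-up name) and runs three inputs: indentation between tags, a single space between two inline elements, and a line break inside text.

```javascript
// The regular expression from the post, applied to a plain string.
var stripper = /\B\s\B|[\n\r\t]/g;

function stripWhitespace(html) {
  return html.replace(stripper, "");
}

// Indentation and line breaks between tags go; the space inside "a b" stays.
console.log(stripWhitespace("<p>\n\t<em>a b</em>\n</p>")); // <p><em>a b</em></p>

// A single space between two tags sits between ">" and "<", so it is removed too.
console.log(stripWhitespace("<a></a> <em>x</em>")); // <a></a><em>x</em>

// A line break inside running text is removed, which joins the two words.
console.log(stripWhitespace("foo\nbar")); // foobar
```

The second and third cases are worth checking against real markup before relying on the pattern, since both change how the text renders.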


    Thanks a lot,



    Jamie Mason


 