Hello and welcome to our community! Is this your first visit?
Register
Enjoy an ad free experience by logging in. Not a member yet? Register.
Results 1 to 9 of 9
  1. #1
    Senior Coder xelawho's Avatar
    Join Date
    Nov 2010
    Posts
    3,021
    Thanks
    56
    Thanked 566 Times in 563 Posts

    storing references to nodes during recursion

    hello,

    I don't know much about the DOM so forgive my ignorance.

    At the moment, my code recurses all the nodes in a div and pushes the text nodes onto an array that I later access.

    Which works fine, but if the document is very large it takes a few seconds to do this. The document I am working with at the moment ends up with an array of 3,880 text nodes in it, which takes about 5 seconds to compile.

    So what I was thinking is it would be better, in the recursion, to just be storing a reference to those nodes rather than the nodes themselves.

    I know that there is a nodeList (although I don't know if that includes nested elements or if each child node has its separate nodeList), but to me I think ideal would be to push the index of the node onto the array during recursion and then later to access it by something like

    Code:
    document.getElementById("thediv").nodeList[nodesarray[i]]
    but is that possible, and would it work?

    (and I guess more importantly, will it be significantly quicker than storing the nodes themselves?)

    thanks for any thoughts...

  • #2
    Senior Coder rnd me's Avatar
    Join Date
    Jun 2007
    Location
    Urbana
    Posts
    4,461
    Thanks
    11
    Thanked 600 Times in 580 Posts
    nodelists will only contain the direct child nodes of the parent, not nested nodes in sub-tags.

    it would be much faster to store strings in the array instead of nodes. if you can pluck the properties of the textNodes into a new soft object, that object should be much faster to operate on in a loop of many.


    textnodes are an older interface, what are you doing with the array that you need the actual textNodes for?
    my site (updated 2014/10/20)
    BROWSER STATS [% share] (2014/9/03) IE7:0.1, IE8:4.3, IE11:9.2, IE9:2.7, IE10:2.6, FF:16.8, CH:47.5, SF:7.8, NON-MOUSE:37%

  • #3
    Senior Coder xelawho's Avatar
    Join Date
    Nov 2010
    Posts
    3,021
    Thanks
    56
    Thanked 566 Times in 563 Posts
    hmmm. let's see if I can do this briefly. The reason why I (think I) need the actual nodes is:
    - once I have them I start looping through each node, and looping through each word in each node, feeding it to the spellcheck engine (fair enough, could just use the nodeValue for that)
    - then if it comes up misspelt, the user gets the option to change it, either by direct input or from a list (ok, that bit was irrelevant)
    - but then, the page only registers a change under two conditions (and these I can't do anything about):
    1. the text has actually changed
    2. the parent node of the text node has gained and lost focus

    yeah, so that's the bit - when/if the change does occur I dispatch a focus and blur event to the parentNode of the text node that got changed, thereby registering the change. Which is why I need to know, at the time of that change, which node the text belongs to.

    I think. I mean, I could just store a reference to the parentNode that the text belongs to at the time of pushing it onto the array, but then I'm back where I started...

    unless there's some better way of going about this?

  • #4
    Senior Coder rnd me's Avatar
    Join Date
    Jun 2007
    Location
    Urbana
    Posts
    4,461
    Thanks
    11
    Thanked 600 Times in 580 Posts
    a couple of things.
    1.
    maybe you were using data or new text nodes?
    it seems that if you set the .textContent property of a node, it updates on-screen right away, no bluring required. seems to work even in contentEditable.


    2. presumably, all the text you are scanning is in an html element container.

    rather than looping through the dom and spell-checking each word, it would likely be much faster to find all the misspellings at once using a unique word list from the textContent of the container.

    once you have a short list of misspelled words, you then iterate your nodes, checking for words in the node that appear in your misspelled list. This means you have to search from a dozen words each node word, rather than from all the words each node word.

    Code:
    function uniqueWords(elm){
       return (elm.textContent.match(/\b\w+\b/g)||[])
         .filter(function(a){return this[a] ? false : (this[a]=1); }, {}).sort();
     }
    
    var dict={};// preload dictionary into object for fast property lookup:
    "fred run see".split(" ").forEach(function(a){this[a]=1;return this;}, dict);
    
    //the simulated content of the container node (for this demo):
    var content="see fred run. rin fred run!";
    
    //a list of words in the container: (using spoofed container element)
    var allWords=uniqueWords({textContent:content});
    
    //a look-up table of words in the container and not in the dictionary:
    var mispelled={};
    
    //populate misspellings:
    allWords.forEach(function(word){if(!dict[word]){mispelled[word]=1;}; });
    
    // mispelled == {"rin":1}...
    //now we can make a super-fast prefilter to sit in front on your existing routine that operates on each text node. 
    //feed the string content of the node to this function, and skip to the next node if false:
    
    function containsMispelling(strText){
      return (str.text.match(/\b\w+\b/g)||[]).some(function(word){return mispelled[word];});
    }
    using that type of pre-filter, you can use your existing code more-or-less as is.
    this gives you the ability to basically instantly skip a node if nothing is mispelled.
    it's faster because we save a lot of looping on each node using a unique word list and the look-up-table, comparing only (#words * #misspellings) to find a bad text, instead of (#words * #dictionaryWords) to find a bad one. if there is a bad one, your normal id/suggestion routine takes over.


    this pre-routine sweep also allows us to display a list of all the found bad words before the replacement wizard UI is shown. Even if it somehow took longer (which it wont), the user won't mind as much because we can say "fixing 12 of 14" somewhere.
    this can save a lot of user frustration/doubt if their misspellings occur early in the text, since using just document-order selections to indicate progress would make the process appear to be taking a long time.


    edit:
    re-reading the OP, i think i might have a faster textNode collector for you:

    Code:
    function getNodes(prop, val, meth, nd, useSelf ){
     var r=[], any= getNodes[val]===true;
     nd=nd||document.documentElement;
     if(nd.constructor===Array){nd={childNodes:nd};}
      for(var cn=nd.childNodes, i=0, mx=cn.length;i<mx;i++){
        var it=cn[i];
        if( it.childNodes.length && !useSelf ){r=r.concat(getNodes(prop, val, meth, it,useSelf ));}
    	if( any ? it[prop] : (it[prop]!==undefined && (meth ? ""[meth] && 
    		String(it[prop])[meth](val) : it[prop]==val))){
    		  r[r.length]=it; 
    	}
      }//nxt
     
    return r;
    };getNodes[null]=true; getNodes[undefined]=true;
    
    var d1=+new Date, n, rt;
    [n=getNodes("nodeType", 3).length, rt= +new Date() - d1, (n/rt)*1000 ];
    run in chrome console on http://en.wikipedia.org/wiki/Rio_de_...tional_Airport, i see:

    Code:
    nodes, ms, nodes/sec
    [3089, 25, 123560]
    which looks to be a lot faster (25ms) than 5 seconds to collect 3089 (almost 3,880) text nodes into an array. my computer is not that fast.


    i see you've been working on this project quite a bit. hopefully you can use some of the above snips or concepts to get the performance where you want it to be.

    if you are using indexOf() anywhere or an array loop, a compatible look-up-table version can run dozens of times faster...
    Last edited by rnd me; 02-24-2013 at 09:19 AM.
    my site (updated 2014/10/20)
    BROWSER STATS [% share] (2014/9/03) IE7:0.1, IE8:4.3, IE11:9.2, IE9:2.7, IE10:2.6, FF:16.8, CH:47.5, SF:7.8, NON-MOUSE:37%

  • #5
    Senior Coder xelawho's Avatar
    Join Date
    Nov 2010
    Posts
    3,021
    Thanks
    56
    Thanked 566 Times in 563 Posts
    Quote Originally Posted by rnd me View Post
    i see you've been working on this project quite a bit.
    ha, ha. yes. I am somewhere in between loving it and hating it - and this status changes various times per day.

    Thanks for your thoughts. It's kind of complicated to describe what's going on here, as this is an extension designed to run over the top of another extension (and the base extension is password protected, otherwise I would just point you at it). But basically:

    1) It's not the on-screen display that is the problem, it's a question of the page's editor registering that changes have been made. I tried it with textContent, but it's the same result - it appears we have to spoof the page into thinking that human changes have been made, and the least obtrusive way I can find is with the focus/blur routine. My guess is that the editor has some sort of "contentchanged" event that doesn't get fired if the text is altered programmatically. This bit works OK, anyway, so I'm not that worried about it.

    2) The pre-lookup idea certainly seems interesting and makes a whole lot of sense. I gather that this is to reduce waiting time while the code finds the next misspelt word. This is actually not such a big deal, either - it skips along quite nicely once it actually starts running, and part of the requirement for running the base extension is a certain amount of RAM and processing speed, so we can assume that everybody will more or less get the same results (I have a slow computer, too, which I guess is good for benchmarking). I am going to look into this later, though, as it's an interesting idea and would make the code really click rather than just amble along at an acceptable pace.

    3) Your node collector does appear to be seriously faster than what I am using at the moment. The only thing is that during collection I filter out some elements that aren't of interest, but plonking that filter down in the middle of your function of course just broke it, and being that I don't really understand how yours works I am at a bit of a loss. Here's what I'm using at the moment, which works fine in terms of grabbing what I want and ignoring the rest. If you could help me incorporate the "if" into your function I'd be really grateful:

    Code:
    function process(element) { 
      var children = element.childNodes; 
      for(var i = 0; i < children.length; i++) { 
        var child = children[i]; 
        if(child.nodeType === 3) { 
          if(child.data && content.getComputedStyle(child.parentNode, null).getPropertyValue("visibility") != "hidden" 
    	  && child.parentNode.tagName.toLowerCase() != "span" && child.parentNode.tagName.toLowerCase() != "div") { 
    	    namespace_checkManifest.txts.push(child); 
          } 
        } else { 
          process(child); 
        } 
      } 
    } 
    
    process(content.document.getElementById("manifest_wrapper"));
    4) I remembered the other reason why I need to keep track of the nodes - because once a misspelling has been found the code selects the word using a range and scrolls to that part of the page. Maybe running a simple regex there would work, but it still seems to me that, even having found that text the code would still need to know where it is on the page, which would mean looping through all of the text nodes every time a misspelling is found, as opposed to once at the start.

    Thanks again for your thoughts, and sorry it gets kind of abstract. The code's heavily reliant on elements in the base extension and some NSI components, so I guess just dumping code here wouldn't be that useful. If you're really interested, I can send you the xpi file and an html page to run it on, but either way I appreciate the input.

  • #6
    Senior Coder rnd me's Avatar
    Join Date
    Jun 2007
    Location
    Urbana
    Posts
    4,461
    Thanks
    11
    Thanked 600 Times in 580 Posts
    Quote Originally Posted by xelawho View Post

    3) Your node collector does appear to be seriously faster than what I am using at the moment. The only thing is that during collection I filter out some elements that aren't of interest, but plonking that filter down in the middle of your function of course just broke it, and being that I don't really understand how yours works I am at a bit of a loss. Here's what I'm using at the moment, which works fine in terms of grabbing what I want and ignoring the rest. If you could help me incorporate the "if" into your function I'd be really grateful:

    to be honest, that code i wrote is older and bit magical, i don't fully understand how it all works...

    it's made to support a ton of filter condtions, maybe you can use one.
    i posted it here, along with some usage examples. Checkout post#1 for the examples.

    the corralling is flexible on that utility, but it's still a bit stiff customizing multi-criteria filtering. since it gathers the nodes a lot faster, we're going to be ahead of ourselves if we simply gather them all and then filter the complete set to make the subset we're interested in.

    i would try something like this:

    Code:
    getNodes("nodeType", 3).filter(function filtNodes(node, mom){
      mom=node.parentNode;
      return  !{span:1, div:1}[mom.tagName.toLowerCase()] &&
         getComputedStyle(mom, null).visibility != "hidden";
    });
    note that adding getComputedStyle() will drastically slow down the loop execution!

    if i had the isMispelled() function working, i would add "&&" to the last line and add a new last line saying isMispelled(node.textContent) or something to find only the misspelled nodes, or invert it if i needed the good nodes, you get the idea...


    EDIT
    thinking about that getComputedStyle overhead, i see that it's applied on elements only.
    if we check every node, we check a lot more often than if we check every element, which there are fewer of.
    therefore, it should be faster to break our problem into two steps.

    1. mark all elements as hidden or visible using a fast-accessible property
    2. check each node against the element's property instead of running the method each node


    Code:
    [].slice.call(document.body.getElementsByTagName("*")).forEach(function noteVisibility( elm ){
         elm._visible = getComputedStyle(elm, null).visibility != "hidden";
    });
    
    getNodes("nodeType", 3).filter(function filtNodes(node, mom){
       return  !this[(mom=node.parentNode).tagName.toLowerCase()] && mom._visible;
    }, {span:1, div:1});
    this should again cut down on the repetitive slow comparison, drastically speeding up the loop execution.

    i made a couple other optimizations on the last function for performance and lower garbage generation, sorry if it's harder to read...
    Last edited by rnd me; 02-24-2013 at 10:17 PM.
    my site (updated 2014/10/20)
    BROWSER STATS [% share] (2014/9/03) IE7:0.1, IE8:4.3, IE11:9.2, IE9:2.7, IE10:2.6, FF:16.8, CH:47.5, SF:7.8, NON-MOUSE:37%

  • Users who have thanked rnd me for this post:

    xelawho (02-25-2013)

  • #7
    Senior Coder xelawho's Avatar
    Join Date
    Nov 2010
    Posts
    3,021
    Thanks
    56
    Thanked 566 Times in 563 Posts
    Quote Originally Posted by rnd me View Post
    note that adding getComputedStyle() will drastically slow down the loop execution!
    reading that, a little light blinked on:

    the only test I really need to do is if the parent node is editable. If it is then it needs to be spellchecked. If it's not then it doesn't. Simple.

    So (still not understanding really what goes on in your function) I hacked this together:
    Code:
    getNodes("nodeType", 3).filter(function filtNodes(node, mom){
      mom=node.parentNode;
      return  !{false:1}[mom.isContentEditable];
    });
    which seems to do the job nicely - running the benchmark you posted in #4, it scores:
    nodes, ms, nodes/sec

    [8617,3159,2728]

    (and just for comparison, without the filter, the results are)
    [32243,2754,11708]

    so the filter obviously slows stuff down. But unless there's some other way around it, a three second startup time on an 80k+ word document seems to be acceptable.

    Then I had other stuff to do, so I turned the computer off (I should really do this more often - I'm sure it would make me write better code, like those Russian filmmakers who didn't have any film) and thought on and off about the idea of pre-compiling a list of misspelt words, and how to find the node the word belongs to in that case.

    Which made me think I could use nsIWebBrowserFind, because once I have the word highlighted (which is just a selection range) I can get the node it belongs to, etc, etc.

    But then I thought, really, I'm just reinventing all this stuff that XPCOM already has, because nsIEditor has an nsIEditorSpellCheck which has all sorts of funky stuff, including a GetNextMisspelledWord method... and as much as I love rolling my own, when you see how fast the find bar in Firefox finds a word within a massive document, it's kind of hard to resist...

    all of which is to say that I think I am about to do what I have been annoyed by in the past: The Massive Mid-Thread U-Turn.

    And I only really describe the above process to let you know that while it may look like this thread was a waste of time, it was actually very useful to me in making me think about a really sensible way to be doing this that will (I hope - I haven't even begun to write it yet) get rid of all the bugs and annoyances encountered so far.

    So thanks again, rnd me

  • #8
    Senior Coder rnd me's Avatar
    Join Date
    Jun 2007
    Location
    Urbana
    Posts
    4,461
    Thanks
    11
    Thanked 600 Times in 580 Posts
    the object was only to test many possibilities at once, so we can simply a single comparison down to the statement.

    Code:
    getNodes("nodeType", 3).filter(function filtNodes(node){
      return node.parentNode.isContentEditable;
    });
    that might run a hair faster.
    window.find(strToSearchFor) works in firefox to select a phrase in the dom; perhaps you can just call it on each misspelled word to jump to the word in a wizard...
    my site (updated 2014/10/20)
    BROWSER STATS [% share] (2014/9/03) IE7:0.1, IE8:4.3, IE11:9.2, IE9:2.7, IE10:2.6, FF:16.8, CH:47.5, SF:7.8, NON-MOUSE:37%

  • #9
    Senior Coder xelawho's Avatar
    Join Date
    Nov 2010
    Posts
    3,021
    Thanks
    56
    Thanked 566 Times in 563 Posts
    Quote Originally Posted by rnd me View Post
    window.find(strToSearchFor) works in firefox to select a phrase in the dom
    mmm... so it does...this just keeps getting simpler


  •  

    Posting Permissions

    • You may not post new threads
    • You may not post replies
    • You may not post attachments
    • You may not edit your posts
    •