JavaScript Performance: Iterating over Arrays with Holes

I was sketching some code that took rows from a database and cached them in an array for repeated reuse. Conveniently, each row had a unique ID, so I could use that as the array index. But is that a good idea? It means that when I iterate over the array, I’ll need to check whether each element is defined before I try to use it. Is it better to code around this? Say, by storing the id and the other fields in an object and simply pushing the objects onto the array? That would save needlessly iterating over undefined values later. Or is it better to use the id as an array index, accept the holes, and do quick and dirty iterations with a definedness check? This sounds like a job for jsPerf! The details of my experiment follow; if you like, you can check it out in full and try the jsPerf yourself.

The Setup

Here are the simple and complex arrays, the select indexes for the entries that we want to look up later, and the pairs of IDs and names that comprise our data. At the end, there’s a get() function so that we can look up entries without a numerical index.

var simple = [], complex = [],
  selectIndexes = [1, 2, 10, 14, 300, 1000],
  pairs = [
    {"id": 1,    "name": "Albert"   },
    {"id": 2,    "name": "Bailey"   },
    {"id": 4,    "name": "Charlotte"},
    {"id": 5,    "name": "Darlene"  },
    {"id": 10,   "name": "Edna"     },
    {"id": 12,   "name": "Faron"    },
    {"id": 13,   "name": "Gary"     },
    {"id": 14,   "name": "Helen"    },
    {"id": 15,   "name": "Igor"     },
    {"id": 16,   "name": "Justin"   },
    {"id": 17,   "name": "Kyle"     },
    {"id": 300,  "name": "Lynette"  },
    {"id": 500,  "name": "Morgan"   },
    {"id": 1000, "name": "Nora"     }
  ],

  get = function (a, i) {
    var ret = null;
    // Linear scan with jQuery's $.each: each element is passed as `this`.
    // The index i arrives Object-wrapped (jQuery boxes primitives), so
    // unbox it with valueOf() before the strict comparison.
    $.each(a, function () {
      if (this.id === i.valueOf()) {
        ret = this;
      }
    });
    return ret;
  };
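
A side note on the i.valueOf() call in get(): jQuery’s $.each invokes its callback with `this` set to an Object-wrapped copy of each primitive, and strict equality treats the wrapper and the raw number as different values. A minimal sketch of that effect, in plain JavaScript with no jQuery:

```javascript
// Why get() needs valueOf(): calling a (non-strict) function with a
// primitive as `this` boxes it into an Object wrapper, and === treats
// the wrapper and the raw number as different values.
var boxed = Object(10); // roughly what `this` looks like inside $.each
var strictEqual = (boxed === 10);            // false: object vs primitive
var unboxedEqual = (boxed.valueOf() === 10); // true once unboxed
```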

The Experiment

simple

A simple array with integer indexes and holes.

$.each(pairs, function () {
  simple[this.id] = this.name;
});

$.each(simple, function () {
  if (this && this !== window) { // this !== window is a workaround for IE
    this.charAt(0);
  }
});

$.each(selectIndexes, function () {
  simple[this].charAt(0);
});

complex

A complex array that holds objects with id and name attributes. Since we’re not using integer indexes, we have to supply a get() function (defined in the Setup above).

$.each(pairs, function () {
  complex.push(this);
});

$.each(complex, function () {
  this.name.charAt(0);
});

$.each(selectIndexes, function () {
  get(complex, this).name.charAt(0);
});

The Results

Have a look at this graph:

[Browser performance graph. Simple arrays with holes are the clear winner.]

Conclusion

The simple implementation wipes the floor with the complex one across the board. Not only is it easier to implement and read, but checking whether an element is undefined before you access it appears to be a trivially cheap operation. Arrays with holes are simple and safe to use.


3 Responses to JavaScript Performance: Iterating over Arrays with Holes

  1. Christian Iversen says:

    Sorry to be a pain, but I’ve had to consider this sort of thing quite a few times while developing pyjaco, so I thought I’d offer my 2 øre ;-)

    1. You use an iteration function that itself always iterates over the entire dataset, turning your complex solution into O(n^2), while the simple one uses the built-in hash lookup of the JavaScript engine, which averages between O(1) and O(n), depending on the hash details and implementation of the JS engine.
    2. You never exploit the fact that the numeric ID is (I presume?) unique. You could use this to build an index table, so you have id(n) == index[n] and val(n) == value[n].
    3. If you had this kind of table, you would have multiple options, the most obvious being a binary search of the table for the ID (inlined instead of recursive for a constant-factor speed bonus). Buckets+linear or buckets+binary would also be viable options.
    4. Even if the complex solution weren’t O(n^2), it would still be unlikely to win for datasets this small. Even a very well-tuned implementation will probably need around 100 elements before we can see any real effect of the choice of algorithm.
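
Point 2 might look something like this sketch (plain JavaScript rather than jQuery; the pairs data mirrors the post’s Setup, trimmed for brevity): build the index table once, and every lookup by ID becomes a single hash access instead of a scan.

```javascript
// Sketch of an index table: map each id to its position in pairs,
// so lookups are a property access instead of a linear scan.
var pairs = [
  {id: 1,    name: "Albert"},
  {id: 300,  name: "Lynette"},
  {id: 1000, name: "Nora"}
];

var index = {};
pairs.forEach(function (pair, i) {
  index[pair.id] = i; // id -> position in pairs
});

function getById(id) {
  var i = index[id];
  return i === undefined ? null : pairs[i];
}
```

With this in place, getById(300) returns the Lynette entry and getById(7) returns null, with no iteration at lookup time.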

    On a related, but not identical note, here’s the jsPerf results of iterating over sparse arrays

    http://jsperf.com/iterate-sparse

    Now, this is just getting to the elements at all, without even searching for them, so it’s only half the puzzle. But you can see quite clearly that in this case the simplest solution also wins (for key in array), though the competing solutions are only marginally slower. I have an (unproven) feeling that some of these sparse-array implementations would lend themselves well to fast searching, too.

    “Fun” fact: I went through something similar when implementing Python dictionaries in Pyjaco. Keys can be any hashable object (pretty much any python object at all), so JavaScript can’t even store the key, unless I do something like this, using the object as the ID. Now, I ended up using a (key, value) array-of-arrays, and just searching through it linearly when looking for a value!

    It won’t scale up to many elements, but for the rather small datasets I’ve worked with so far, it’s actually the fastest solution (and by far the simplest). Lesson? Don’t ever prematurely optimize. So yes, I do agree with your point, even if I don’t entirely agree with your findings :-)

    • force says:

      Very interesting! Thanks for the thorough reply. Addressing point 2: I actually did consider indexing the IDs (they are unique) and using them for lookup. It’s an approach I’ve used in the past, binary search and all. But then I weighed the complexity and readability against the possible performance implications, and I decided not to write it.

      I should probably have set some more parameters. The jsPerf that I made really only tells us anything about fairly small data sets. For the problem that inspired this post, I can be certain that we’ll never have more than 100 records, so that’s roughly what I was looking at. The main goal of my jsPerf was to see whether skipping a bunch of undefined array elements with a simple “if (this)” was negligibly cheap.

      The novelty for me was the realization that I should use the simplest solution rather than try to optimize the hell out of every part of the application. Of course, I spent more time thinking about it, constructing the jsPerf, and writing about it than I saved by not implementing a lookup solution. But I’ll still call it a win, since I think I’m getting better at learning when the “good enough” solution is better than the “best” solution.

      P.S. You know, I never apply big O. I don’t know why. Obviously, I need to retrain myself. :)

      P.P.S. I just realized that I don’t break the loop when I find the element in the “complex” lookup, meaning every single lookup scans all n elements instead of averaging around n/2. Derp.
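
The early exit mentioned in the P.P.S. can be sketched like this (plain JavaScript for brevity; with jQuery’s $.each, the equivalent is returning false from the callback to stop the loop):

```javascript
// Sketch of the fix: stop scanning as soon as the id matches,
// instead of continuing through the rest of the array.
function getWithBreak(a, id) {
  for (var j = 0; j < a.length; j++) {
    if (a[j].id === id) {
      return a[j]; // first match: bail out immediately
    }
  }
  return null; // not found
}

var sample = [{id: 1, name: "Albert"}, {id: 2, name: "Bailey"}];
```

Here getWithBreak(sample, 2) finds Bailey after two comparisons and returns, rather than walking the whole array.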

      • Christian Iversen says:

        If you look for one sorted list of integers inside another sorted list of integers, you can do that in O(n+m) time, which is O(n) when m is a constant or “always smallish”. It’s certainly better than O(n^2). Sort each list, then repeatedly step the index of the list with the smaller current value; when the two values are equal, you’ve found an element, and you step both indices. It’s a very simple algorithm, and it’s actually very efficient.
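
That sorted-merge walk can be sketched as follows (plain JavaScript; the function name is illustrative). Both lists are assumed already sorted, and the loop advances whichever index points at the smaller value:

```javascript
// Sorted-merge lookup: walk two sorted integer lists in lockstep,
// advancing the index of whichever list has the smaller current value.
// Equal values are matches, so both indices step. O(n+m) total.
function intersectSorted(ids, wanted) {
  var out = [], i = 0, j = 0;
  while (i < ids.length && j < wanted.length) {
    if (ids[i] < wanted[j]) {
      i++;
    } else if (ids[i] > wanted[j]) {
      j++;
    } else {
      out.push(ids[i]); // same value in both lists: found one
      i++;
      j++;
    }
  }
  return out;
}
```

For example, intersectSorted([1, 2, 4, 10, 300], [2, 10, 999]) returns [2, 10] after a single pass over each list.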

        I agree though, if you never have more than 100 records(ish), no complex solution will ever beat the simple one, because of the rather large constants involved in setting it up. JavaScript is (thankfully) rarely a language for large dataset processing anyway :)

        And yes, I completely agree with the main point: the “good enough” solution takes maybe 20% of the time of the “best”, so it’s vastly better overall.

        And yes, a break (with $.each, returning false from the callback) would be appropriate :)
