Are there faster alternatives and less memory hungry alternatives that we could use as a drop in replacement to std::map and std::set?
The things that I commonly use std::map and std::set for are:
- Allow me to store objects in a retrievable collection.
- Confirm that an object exists in the collection.
- Retrieve objects by entering the key to the data.
- Iterate through the structure in the order specified by the less function (that you can optionally provide).
- Merge collections.
- Get a count of objects in the collections.
- Store objects by value or by pointer.
- and others...
Secondly these two stl classes are slow than some other solutions when searching for data in a large collection especially for strings.
What are the alternatives? Well there are lots but they basically come down to a short list that do:
Red/black or b-tree
What stl:map/set is made of, traditionally the most flexible, but performance gets hurt when data gets big. Some of these structures also have problems with node balancing that can have a serious performance hit especially when data is entered already sorted. Its strong point is that they are very fast in iteration, as the key does not need to be unpacked along with the value.Hash tables
Fast lookup, but you cannot iterate through nearby key values. Has performance hits when you have chosen a poor hash algoritum for your data, or the hash table needs to be resized, often this can be fixed by predeclaring how big the table is likely to be. You cannot iterate through similar values though so is not very useful in similar word searches etc. The structure of the hash table can also be inefficent with memory as the table may be poorly tuned to the requirements of the data.If you want the definitive rundown on hashed collection classes have a look at the tommyds site.
Patricia, Ternary and Ned tries, DWAG
Yes tries not trees! The key (e.g. a string) is split into an array of numbers or charaters and values are found by following a breadcrumb trail of these characters in the memory structure.Judy array
A highly compact exotic datastructure, that I do not fully understand (maybe one day), it took the bloke years to get it right, and named it after his sister. I have written an STL style wrapper around it, along with iterators. The keys are sorted.Performance Test
Test harness
Below are the results of my tests, I do think that some of my timing results are suspect, and there is a bit of apple and pear comparisons going here however you are welcome to compile my tests and see for your self. This was done on a lenovo x200 laptop with 64 bit windows 7, core 2 duo, 8 gig of ram running as a 32 bit process. The collections were made to store 1,000,000 strings containing values "0" to "999,999".
I have tested against
A set of 32 bit values
- std::set<const_char*>
- xt::judy_set<const_char*>
- std::set<std::string>
- xt::judy_string_set
- nPatriciaTrie
- std::map<std::string,int>
- stdext::hash_map<std::string,int>
- ToolBox::Trie
- uxn::patl::trie_map<std::string,int>
- xt::judy_string_map<int>
- xt::judy_string_val_map<int>
Anything that required the contents of the key got a speed penalty. If your string class did internal reference counting then this helped considerably with the std:: maps and sets, as the key was not copied.
Iteration was the red/black btree main strength, as the key was uncompressed. This was especially the case when the key value was a std::string, as the other collections had to make them from scratch.
There is only one good reasonably fast all rounder that can do exciting things such as hamming and levenshtein distances is the PATRICIA C++ library. You can also have a look at the Ternary Search Tree but performance seemed and issue.
Name | clear | copy | find | iterate | load | Memory used |
nPatriciaTrie | 421 | 1466 | 45596 | |||
std::map<std::string,int> | 375 | 749 | 3416 | 63 | 5335 | 55136 |
std::set<const_char*> | 468 | 531 | 562 | 62 | 1092 | 31408 |
std::set<std::string> | 405 | 749 | 3573 | 250 | 4868 | 54368 |
stdext::hash_map<std::string,int> | 483 | 1295 | 1466 | 32 | 2418 | 48480 |
tommy_hash_tester<char*,int> | 31 | 890 | 39536 | |||
ToolBox::Trie | 983 | 1060 | 21684 | |||
uxn::patl::trie_map<std::string,int> | 390 | 1778 | 1591 | 172 | 2558 | 54904 |
xt::judy_set<const_char*> | 47 | 951 | 499 | 484 | 577 | 3396 |
xt::judy_string_map<int> | 655 | 0 | 2168 | 1950 | 2683 | 10248 |
xt::judy_string_set | 671 | 2902 | 2044 | 2059 | 2714 | 12500 |
xt::judy_string_val_map<int> | 2184 | 3058 | 1997 | 1950 | 2762 | 26212 |
No comments:
Post a Comment