
Boost Trie

From Data Structure to C++ Library
Cosmin Boaca
cosmin.boaca1994@gmail.com

About Me
Cosmin Boaca
Passionate about algorithms, data structures and big data
Silver / Bronze medal at the National Olympiad in Informatics
Internships at Intel, Facebook and Adobe
C++ fan(atic)

Trie
In-memory data structure that stores strings in sorted order
Operations supported
Insert(key)
Find(key)
Erase(key)
LongestCommonPrefix(key)
Iteration over the keys in ascending / descending order

Trie
The execution time of each key operation is proportional to the length of the key, regardless of how many keys are currently stored in the data structure
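
As a rough comparison (a sketch under assumptions, not a measurement from the talk): let L be the key length and n the number of stored keys, and assume each child lookup in the Trie takes constant time, as with a fixed array of sons. A balanced search tree, the usual implementation of std::map, performs O(log n) key comparisons, each of which may inspect up to L characters.

```latex
T_{\text{trie}}(L, n) = O(L)
\qquad
T_{\text{balanced tree}}(L, n) = O(L \log n)
```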

Trie
Node structure:
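    struct Trie {
        bool isKeyEnd = false;  // true if a stored key ends at this node
        Trie *sons[26] = {};    // one slot per lowercase letter, nullptr when unused
    };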
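
To make the operations concrete, here is a minimal sketch built on a node of the same shape. It is not the Boost Trie implementation: the names TrieNode, slot, insertKey, findKey, eraseKey, longestCommonPrefix and collectKeys are hypothetical, keys are assumed to contain only the letters 'a' to 'z', and std::unique_ptr replaces the raw pointers so that nodes are freed automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Same shape as the slide's node, with owning pointers (an assumption).
struct TrieNode {
    bool isKeyEnd = false;
    std::unique_ptr<TrieNode> sons[26];
};

// Maps a character to its index in sons; only 'a'..'z' is supported.
std::size_t slot(char c) {
    assert(c >= 'a' && c <= 'z');
    return static_cast<std::size_t>(c - 'a');
}

// Creates one node per missing character, then marks the end of the key.
void insertKey(TrieNode& root, const std::string& key) {
    TrieNode* node = &root;
    for (char c : key) {
        std::unique_ptr<TrieNode>& son = node->sons[slot(c)];
        if (!son) son = std::make_unique<TrieNode>();
        node = son.get();
    }
    node->isKeyEnd = true;
}

// Follows the key character by character; one step per character.
bool findKey(const TrieNode& root, const std::string& key) {
    const TrieNode* node = &root;
    for (char c : key) {
        node = node->sons[slot(c)].get();
        if (!node) return false;
    }
    return node->isKeyEnd;
}

// Length of the longest prefix of key that is also a prefix of a stored key.
std::size_t longestCommonPrefix(const TrieNode& root, const std::string& key) {
    const TrieNode* node = &root;
    std::size_t length = 0;
    for (char c : key) {
        node = node->sons[slot(c)].get();
        if (!node) break;
        ++length;
    }
    return length;
}

// Clears the end-of-key mark and prunes branches that no longer hold a key.
// Returns true when node itself holds no key and has no sons left.
bool eraseKey(TrieNode& node, const std::string& key, std::size_t depth = 0) {
    if (depth == key.size()) {
        node.isKeyEnd = false;
    } else {
        std::unique_ptr<TrieNode>& son = node.sons[slot(key[depth])];
        if (son && eraseKey(*son, key, depth + 1)) son.reset();
    }
    if (node.isKeyEnd) return false;
    for (const auto& son : node.sons) {
        if (son) return false;
    }
    return true;
}

// Depth-first walk in slot order, which yields the keys in ascending order.
void collectKeys(const TrieNode& node, std::string& prefix,
                 std::vector<std::string>& out) {
    if (node.isKeyEnd) out.push_back(prefix);
    for (std::size_t i = 0; i < 26; ++i) {
        if (node.sons[i]) {
            prefix.push_back(static_cast<char>('a' + i));
            collectKeys(*node.sons[i], prefix, out);
            prefix.pop_back();
        }
    }
}
```

Every function except collectKeys touches at most one node per character of the key, which is where the independence from the number of stored keys comes from. Walking the slots from last to first in collectKeys would yield the keys in descending order.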

Trie
Downsides
High memory consumption: each node has SIGMA sons, where SIGMA is the size of the alphabet
Most of those sons aren't even used
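
The waste can be made visible with a small sketch. DenseNode mirrors the slide's node; SparseNode is an assumption for illustration only: it keeps the sons in a std::map so that only the links actually used are stored. The helper names insertKey, countNodes, denseSlots and usedSlots are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// The slide's layout: every node reserves SIGMA = 26 pointers up front.
struct DenseNode {
    bool isKeyEnd = false;
    DenseNode* sons[26] = {};
};

// Sparse alternative (an assumption): store only the sons that exist,
// kept in sorted order by std::map.
struct SparseNode {
    bool isKeyEnd = false;
    std::map<char, std::unique_ptr<SparseNode>> sons;
};

void insertKey(SparseNode& root, const std::string& key) {
    SparseNode* node = &root;
    for (char c : key) {
        std::unique_ptr<SparseNode>& son = node->sons[c];
        if (!son) son = std::make_unique<SparseNode>();
        node = son.get();
    }
    node->isKeyEnd = true;
}

// Number of nodes in the trie, root included.
std::size_t countNodes(const SparseNode& node) {
    std::size_t total = 1;
    for (const auto& entry : node.sons) total += countNodes(*entry.second);
    return total;
}

// Pointer slots a dense layout would reserve for the same set of keys.
std::size_t denseSlots(const SparseNode& root) {
    return countNodes(root) * 26;
}

// Slots actually in use: every node except the root has one incoming link.
std::size_t usedSlots(const SparseNode& root) {
    return countNodes(root) - 1;
}
```

The trade-off is that a lookup in the sorted container costs O(log SIGMA) per character instead of O(1), and each stored link carries the container's own per-element overhead.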

From data structure to library
Requirements for a C++ library
Flexibility
Users should be able to define their own allocators and plug them into the library without too much effort (especially useful in embedded systems)
Advanced features (such as the one above) should be transparent to users who don't need them
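
The standard containers already follow this convention, so they make a convenient illustration: the allocator is a defaulted template parameter, invisible unless the user asks for it. The sketch below uses std::map because it is certain to exist; a trie_map with the same parameter order is an assumption here, not a confirmed interface. CountingAllocator and g_allocationCount are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical instrumentation: counts allocations made through
// CountingAllocator (inline variable, requires C++17).
inline std::size_t g_allocationCount = 0;

// Minimal user-defined allocator: forwards to std::allocator and counts.
template <class T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++g_allocationCount;
        return std::allocator<T>().allocate(n);
    }
    void deallocate(T* p, std::size_t n) noexcept {
        std::allocator<T>().deallocate(p, n);
    }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return true;
}
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return false;
}

// Users who do not care about allocation keep the short spelling ...
using PlainMap = std::map<std::string, int>;

// ... while users who do care only add template arguments.
using CountedMap = std::map<std::string, int, std::less<std::string>,
                            CountingAllocator<std::pair<const std::string, int>>>;
```

Both aliases expose the same member functions, so code written against PlainMap keeps compiling when the alias is switched to CountedMap.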

From data structure to library
Requirements for a C++ library
Generality
The library should work with most (ideally all) user-defined types while requiring minimal effort from the user

From data structure to library
Requirements for a C++ library
Efficiency
The library code should perform as well as code written to solve one specific problem
The abstractions used should carry as little overhead as possible

From data structure to library
Requirements for a C++ library
Interoperability with the standard library
Anyone switching existing code from std::map to boost::trie_map should be able to do so by textually replacing map with trie_map
Algorithms from the <algorithm> header should work with the data structures from the library
The exposed iterators should conform to the C++ standard's iterator requirements (harder than you would ever imagine)
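
To give a feel for what the iterator requirements involve, here is a minimal forward iterator that the <algorithm> functions accept. It is a sketch over a hypothetical singly linked list (ListNode, ListIterator), not the Trie iterator itself, which additionally has to walk up and down the tree and support decrementing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical container node, used only for this illustration.
struct ListNode {
    int value;
    ListNode* next;
};

class ListIterator {
public:
    // The five member types that std::iterator_traits looks for.
    using iterator_category = std::forward_iterator_tag;
    using value_type = int;
    using difference_type = std::ptrdiff_t;
    using pointer = const int*;
    using reference = const int&;

    ListIterator() = default;  // a default-constructed iterator acts as end()
    explicit ListIterator(const ListNode* node) : node_(node) {}

    reference operator*() const {
        assert(node_ != nullptr);
        return node_->value;
    }
    pointer operator->() const { return &node_->value; }

    ListIterator& operator++() {  // pre-increment
        node_ = node_->next;
        return *this;
    }
    ListIterator operator++(int) {  // post-increment returns the old position
        ListIterator old = *this;
        ++*this;
        return old;
    }

    friend bool operator==(const ListIterator& a, const ListIterator& b) {
        return a.node_ == b.node_;
    }
    friend bool operator!=(const ListIterator& a, const ListIterator& b) {
        return !(a == b);
    }

private:
    const ListNode* node_ = nullptr;
};
```

With these pieces in place, calls such as std::find(first, last, 2) or std::distance(first, last) compile and behave as they do for the standard containers. A map-like container needs at least a bidirectional iterator, which adds operator-- and the matching category tag.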

C++ iterator specification

From data structure to library
Requirements for Boost Trie
The same data structure should be flexible enough to serve as the container behind trie_map / multimap and set / multiset
Keys may be any kind of iterable structure (including user-defined ones) that iterates over any comparable type
Better performance in terms of time and memory than the equivalent standard containers
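
The key requirement can be sketched as follows: the container is parameterized on the element type, and any range whose elements convert to that type can serve as a key. GenericTrieSet and its members are hypothetical names, and the std::map of sons is an assumption made to keep the example short; it only requires operator< on the element type.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Set-like trie over keys that are sequences of Element.
template <class Element>
class GenericTrieSet {
    struct Node {
        bool isKeyEnd = false;
        std::map<Element, std::unique_ptr<Node>> sons;  // sorted by operator<
    };

public:
    // Inserts any iterable key; returns false if the key was already present.
    template <class Key>
    bool insert(const Key& key) {
        Node* node = &root_;
        for (const Element& element : key) {
            std::unique_ptr<Node>& son = node->sons[element];
            if (!son) son = std::make_unique<Node>();
            node = son.get();
        }
        const bool inserted = !node->isKeyEnd;
        node->isKeyEnd = true;
        return inserted;
    }

    // Returns true only if exactly this sequence was inserted before.
    template <class Key>
    bool contains(const Key& key) const {
        const Node* node = &root_;
        for (const Element& element : key) {
            auto it = node->sons.find(element);
            if (it == node->sons.end()) return false;
            node = it->second.get();
        }
        return node->isKeyEnd;
    }

private:
    Node root_;
};
```

The same template then accepts std::string keys with Element = char and std::vector<int> keys with Element = int, without any extra code on the user's side.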

From data structure to library
Challenges
Performance
In practice, when the key elements aren't characters, the usual Trie performs poorly in terms of both memory consumption and execution time
Maintaining both performance and generality is hard; it requires a lot of technical depth and a deep understanding of the language

From data structure to library
Challenges
Iterators
C++ iterators are the most complicated feature to implement and support
There are dedicated libraries for this (Boost.Iterator)

From data structure to library
Challenges
Generality
Handling allocation / deallocation through allocators instead of new / delete
Handling value types that are not default constructible
Handling key types that are not comparable / default constructible
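
The first two points can be sketched together. The code below is an illustration under assumptions, not the library's implementation: Node, NodeFactory and NoDefault are hypothetical names, std::optional is one possible way to hold a value that cannot be default constructed, and the allocator is assumed to hand out plain pointers. All memory management goes through std::allocator_traits rather than new / delete.

```cpp
#include <cassert>
#include <memory>
#include <optional>

// A value type without a default constructor.
struct NoDefault {
    explicit NoDefault(int v) : value(v) {}
    int value;
};

// Hypothetical node: the value is optional, so a node can be created
// before (or without) a value being stored in it.
template <class Value>
struct Node {
    std::optional<Value> value;
    Node* parent = nullptr;
};

template <class Value, class Allocator = std::allocator<Value>>
class NodeFactory {
    // Rebind the user's allocator from Value to Node<Value>.
    using NodeAllocator = typename std::allocator_traits<
        Allocator>::template rebind_alloc<Node<Value>>;
    using Traits = std::allocator_traits<NodeAllocator>;

public:
    explicit NodeFactory(const Allocator& allocator = Allocator())
        : allocator_(allocator) {}

    // Replaces "new Node": allocate raw storage, then construct in place.
    Node<Value>* create() {
        Node<Value>* node = Traits::allocate(allocator_, 1);
        try {
            Traits::construct(allocator_, node);
        } catch (...) {
            Traits::deallocate(allocator_, node, 1);
            throw;
        }
        return node;
    }

    // Replaces "delete node": run the destructor, then release the storage.
    void destroy(Node<Value>* node) noexcept {
        assert(node != nullptr);
        Traits::destroy(allocator_, node);
        Traits::deallocate(allocator_, node, 1);
    }

private:
    NodeAllocator allocator_;
};
```

A node created this way starts with an empty value, and value.emplace(...) constructs the stored object later with whatever arguments its constructor needs.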

From data structure to library
Optimizations
Using a Boost.Intrusive set as the node container instead of std::set
Variations of Trie
Compact Trie
Burst Trie
Some other Trie variant that I have implemented for now

From data structure to library
Achievements
30% less time than std::map for inserts / finds, while maintaining almost the same level of generality
Interoperable with std::map
Partially functional iterators

Boost vs Corporate
Boost
The best C++ people from all over the world
A lot to learn about C++ and programming in general
Hardcore code review
Insanely high code quality standards
Remote collaboration

Corporate
Average programmers
Using only a small subset of C++ features
More relaxed code review
Code quality standards emphasize readability and maintainability
Work environment
