Solution

Choosing the right container is a difficult task!

By default, SortAO uses Sorted-Block-Array as the underlying container, but this may not be the optimal choice.

SortAO currently provides two types of containers:

Block Series
Tree

Block Series

The containers in the Block series are suitable for medium-sized data (up to 1,000,000 elements).

Ordered containers that can be used for packaging, including:

At a more fundamental level, the unordered containers they correspond to include:

Features

The Block series of containers serves medium-sized data, with algorithm performance biased towards querying, enabling possibilities for imbalanced performance patterns.

For mutation operations (insertions and deletions), the complexity of these containers is O(B + N/B), but the constant factors differ. For query operations, the algorithms of these containers can reduce the complexity order of magnitude of queries by increasing the constant factor of insertion.

Sorted-Block-Array a versatile structure, balancing performance in both access and insertion.
With a data volume of 500,000 elements, its query efficiency is 1/2 of that of a native array.
Sorted-Block-List K-th query in O(1) time, with efficiency consistent with that of native array.
Due to the maintenance of stricter physical block sizes, the efficiency of its mutation operation is lower than that of the former.
Sorted-Block-Deque significantly enhances the efficiency of mutation operations near extreme values (over a wide range).
With a data volume of 500,000 elements, its query efficiency is 3/4 of that of a native array.

Unlike the common tree structure, their underlying structure employs a more contiguous memory layout. Therefore, for medium-sized data, they can better utilize CPU caches and are more likely to be optimized by the JIT, often resulting in better performance (especially in query efficiency).

Cursor invalidation

The Block container is similar to the vector in C++ STL. When inserting/deleting elements, it can lead to a large range of cursor invalidation situations.

You can read Cursor to learn more about cursors.

Due to the subtle differences in cursor invalidation scenarios for Block containers, please refer to the Cursor Invalidation page for details.

Tree

In V8, the theoretical constants of algorithms for advanced data structures are often overwhelmed by memory latency, branch prediction failures, and GC overhead.

We tested various tree data structures, including strict red-black trees, amortized Splay, and expected Zip Tree (a variant of the spinless treap proposed in 2018, characterized by its concise code and expected O(1) modification efficiency).

However, under random data, there is almost no difference in their efficiency, which is likely due to various factors such as Cache Miss and write barriers.

Therefore, we currently only use Splay trees as the implementation of tree structures.

Features

Each operation of SplayTree elevates the accessed node to the root node of the tree, thereby accelerating subsequent accesses. For scenarios with hotspot data, its efficiency can be significantly higher than any other data structure.

If your business scenario involves a relatively special data access pattern, why not test the performance of this data structure!

Cursor invalidation

SplayTree is a tree structure, so unless it is deleted, its cursor will not physically become invalid (it will always point to a valid node).

However, after inserting or deleting elements, the cursor may become logically invalid (its ranking index has changed, but the cursor cannot actively perceive this).

At this point, you can use the refresh() method to refresh the rankings. For details, please refer to the Cursor Invalidation page.