Previously, a string segment was encoded as:
LL OOOOOOOOOOOO
where L are the upper 8 bits, used to encode the length
of the segment, and O are the lower 24 bits, used to
encode the offset of the string data in the character
buffer. The new code encodes it as follows:
OOOOOOOOOOOO LL
Furthermore, the most significant bit of the length
LL is now used to mark whether the current string segment
is a label boundary.
This means a cell can't reference a segment longer than
127 characters. To work around this limitation when a
segment is longer than 127 characters (a rare occurrence),
the algorithm simply splits the segment into multiple
adjacent cells.
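
The layout described above can be sketched as follows. This is only an illustration, not the actual HNTrieContainer code: the function names are hypothetical, and placing the boundary flag on the last piece of a split segment is an assumption made for this example.

```javascript
// Assumed 32-bit cell layout, taken from the description above:
//   bits 31..8 : offset of the string data in the character buffer (24 bits)
//   bit  7     : label-boundary flag (most significant bit of LL)
//   bits 6..0  : segment length (at most 127)
const MAX_SEGMENT_LEN = 0x7F;

// Hypothetical helper: pack one segment into a single unsigned 32-bit value.
function encodeSegment(offset, length, isBoundary) {
    if ( length > MAX_SEGMENT_LEN ) { throw new RangeError('segment too long'); }
    if ( offset > 0xFFFFFF ) { throw new RangeError('offset too large'); }
    return ((offset << 8) | (isBoundary ? 0x80 : 0) | length) >>> 0;
}

// Hypothetical helper: unpack a cell produced by encodeSegment().
function decodeSegment(cell) {
    return {
        offset: cell >>> 8,
        length: cell & 0x7F,
        isBoundary: (cell & 0x80) !== 0,
    };
}

// Hypothetical helper: split a long segment into adjacent cells of at most
// 127 characters each. Assumption: only the last piece carries the flag.
function splitSegment(offset, length, isBoundary) {
    const cells = [];
    while ( length > MAX_SEGMENT_LEN ) {
        cells.push(encodeSegment(offset, MAX_SEGMENT_LEN, false));
        offset += MAX_SEGMENT_LEN;
        length -= MAX_SEGMENT_LEN;
    }
    cells.push(encodeSegment(offset, length, isBoundary));
    return cells;
}
```

For example, a 300-character segment would be stored as three adjacent cells of 127, 127 and 46 characters.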
As a result, there is no longer a need to encode
"boundariness" into special cells, which simplifies
both the storing and matching algorithms.
Additionally, minimal documentation was added for the NPM
package on how to import and use HNTrieContainer as a
standalone API.
The erroneous test does not seem to interfere
with the proper functioning of the trie, because
nodes are never split without an OR node or
boundary node being present.
The issue was found when undertaking a rewrite
of the algorithm to avoid having to create
boundary nodes.
This is for clients who may wish to persist the intermediate compiled
form in order to skip costly parsing operations when the
list is fed to the static network filtering engine.
In the static network filtering engine (snfe), the
compiling-related code was spread across two classes.
This commit makes it so that all the compiling-related
code is in the FilterCompiler class, whose clear purpose is
to compile raw filters into a form which can be persisted
and later fed to the snfe with no parsing overhead.
To compile raw static network filters, the new approach is:
const compiler = snfe.createCompiler(parser);
Then for each single raw filter to compile:
compiler.compile(parser, writer);
The caller is responsible for keeping a reference to the
compiler instance for as long as it is needed. This removes
the need for the clunky code used to keep an instance of
the compiler alive in the snfe.
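
A rough sketch of this caller-owned pattern is shown below. The `snfe`, `parser` and writer objects here are minimal stand-ins written for this example only, not the real uBlock Origin objects. The `parser.analyze()` call, the `compiled:` output format and the `compileList()` helper are likewise assumptions made for illustration. Only `createCompiler(parser)` and `compile(parser, writer)` are taken from the description above.

```javascript
// Stand-in objects for illustration only; the real snfe, parser and writer
// are provided by uBlock Origin and behave differently.
const snfe = {
    createCompiler(parser) {
        // The stub ignores its argument and returns a trivial compiler.
        return {
            compile(parser, writer) {
                writer.push(`compiled:${parser.raw}`);
                return true;
            },
        };
    },
};
const parser = {
    raw: '',
    // Assumed step: feed one raw filter line to the parser.
    analyze(raw) { this.raw = raw; },
};

// Hypothetical helper: the caller creates the compiler once, keeps the
// reference for the whole list, then lets it go out of scope.
function compileList(rawFilters) {
    const writer = [];
    const compiler = snfe.createCompiler(parser);
    for ( const raw of rawFilters ) {
        parser.analyze(raw);
        compiler.compile(parser, writer);
    }
    return writer;
}
```

Because the compiler instance lives in the caller's scope, the snfe itself does not need to hold on to it.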
Additionally, snfe.tokenHistograms() has been moved to
benchmarks.js: it has no dependency on the snfe and is
just a utility function.
The code exported to the nodejs package was revised to use modern
JavaScript syntax. A few issues were fixed at the same time.
The exported classes are:
- DynamicHostRuleFiltering
- DynamicURLRuleFiltering
- DynamicSwitchRuleFiltering
These relate to the content of the "My rules" pane in the
uBlock Origin extension.