When matching a network request in the static network filtering
engine ("snfe"), these are the possible outcomes, from most
to least likely:
- No block
- Block
- Unblock ("exception" filter overriding the block)
- Block-important ("important" filter override the unblock)
Hence why the matching in the snfe always check for a match in
the "block" realm, and the "unblock" realm would be checked
if and only if there was a match in the "block" realm.
However the "block-important" realm was always matched against
first, and when a match in that realm was found, there would
be no need to check in other realms since nothing can override
the "important" option. The problem with this approach though
is that matches in the "block-important" realm are most
unlikely, which means pointless work being done for vast
majority of network requests.
This commit makes it so that the "block-important" realm is
matched against ONLY when there is a matched "unblock" filter.
The result is a measurable improvement in the snfe-related
benchmarks (though given the numbers involved, end users won't
perceive a difference).
Somewhat related discussion which was the motivation to look
more into this:
https://github.com/cliqz-oss/adblocker/discussions/2170#discussioncomment-1168125
This is a replacement for the dubious approach when the
extension itself was used to run benchmarks to detect
performance and filtering behavior regressions.
The main nodejs flavor is "npm", which is to be used to
lint/test and the publication of an official npm
package -- and by design it has dependencies on mocha,
eslint, etc.
A new flavor "dig" has been created with minimal
dependencies and which purpose is to easily allow to
write specialized code to investigate local code changes
in uBO -- and it's not meant for publication.
Consequently, "make nodejs" has been replaced with
"make npm", and a new "dig" target has been added to the
makefile, to be used for instrumenting local code changes
for investigation purpose.
Whereas before the string segment was encoded as:
LL OOOOOOOOOOOO
where L are the upper 8 bits and used to encode the length
of the segment, and O are the lower 24 bits and used to
encode the offset of the string data in the character
buffer, the new code encode as follow:
OOOOOOOOOOOO LL
And furthermore the most significant bit of the length
LL is now used to mark whether the current string segment
is a label boundary.
This means a cell can't reference a segment longer then
127 characters. To work around this limitation for when a
segment is longer than 127 characters (a rare occurrence),
the algorithm will simply split the segment into multiple
adjacent cells.
As a result, there is no longer a need to encode
"boundariness" into special cells, which simplifies
both the storing and matching algorithms.
Additionally, added minimal documentation for the NPM
package on how to import and use HNTrieContainer as a
standalone API.
The erroneous test does not seem to interfere
with the proper functioning of the trie, due
to the fact that nodes are never split without
a OR node or boundary node being present.
The issue was found when undertaking a rewrite
of the algorithm to avoid having to create
boundary nodes.