Refactored heuristics to collate set of origin-related
filter units are collated into a hostname trie, and
for better reuse of existing classes.
Generalized pre-test idea for bucket of filters, such
that in addition to origin-related filter units, there is
now a class to collate regex-based pattern-related units
into a new pre-test bucket class, FilterBucketIfRegexHits,
in order to test with a single regex test whether there is
a chance of a hit in the underlying bucket of filters.
Instances of these are rare, but at time of commit I found
this occurs with AdGuard France filter list.
Fine-tuned the "SNFE: Dump" output -- this new ability to
see the internal details of the SNFE has been really key
into finding/fixing issues during refactoring.
As was done with generic pattern-based filters, the source
string of regex-based filters is now stored into the
bidi-trie (pattern) buffer.
Additionally, added a new "dev tools" page to more
conveniently peer into uBO's internals at run time, without
having to do so from the browser's dev console -- something
which has become more difficult with the use of JS modules.
The new page can be launched from the Support pane through
the "More" button in the troubleshooting section.
The benchmark button in the About pane has been moved to this
new "dev tools" page.
The new "dev tools" page is for development purpose only,
do not open issues about it.
There are currently over 160 patterns with such pointless
trailing `*^` in uBO's filter lists, which ended up being
compiled as generic pattern filters (i.e. regex-based
internally), while the trailing `*^` accomplishes nothing
since it will always match the end of a URL ( `^` can
also match the end of URL).
This commit discards pointless trailing `*^` in patterns,
thus allowing most of those filters to be compiled as
plain pattern filters.
The syntax highlighter will reflect that a trailing
`*^` is pointless.
Rearrange logic to instantiate and add `important` filters
to the block realm when compiled lists are loaded instead
of when lists are compiled.
Additionally, removed now unused properties following
commit 68e14793cc.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1863
As per internal discussion with team, best to have a simpler
scriplet, and which is hard-coded to work only on a specific
set of domains -- only those seen used by BAB.
Turns out the various benchmarks show no benefits when compiling
filters whose pattern contains a single wildcard character into
specialized classes which threat the pattern as two sub-patterns,
and actually there is a slight improvement in performance as per
benchamrks when treating these patterns as generic ones.
This also fixes the following related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1207
Fixed serious regression in previous dev build in applying
`csp=` filters. Reported internally by uBO team.
Promote usage of `removeparam` in code instead of `queryprune`,
which is to be deprecated.
Removed test against previously tested hostname in
FilterHostnameDict since as per various benchmark, the
test does not really help.
Remove serialization API in Node.js code as the API is now
present in SNFE itself.
All the auxiliary data structures must be fully loaded before
the data structure used as entry point is populated. The race
condition could lead to a case of the entry point data structure
being populated while the auxiliary data structures are still
unpopulated, potentially causing exceptions to be thrown at
launch when the static network filtering engine is queried.
I haven't been able to reproduce such exceptions -- but it
could happen on browsers which do not support being suspended
at launch time (i.e. chromium-based browsers).
Additionally, added convenience methods to easily
serialize/unserialize when SNFE is used as a npm package.
Related feedback:
- https://github.com/orgs/uBlockOrigin/teams/ublock-issues-volunteers/discussions/293
Related commit:
- 725e6931f5
Through all the changes, forgot to pay attention to scenarios
where the `filterData` needs to grow -- the buffer's defautl
size is set to accomodate default filter lists, and subscribing
to more lists would cause the static network filtering engine
to fail because the buffer was not resized when needed.
The original motivation is to further speed up launch time
for either non-selfie-based and selfie-based initialization
of the static network filtering engine (SNFE).
As a result of the refactoring:
Filters are no longer instance-based, they are sequence-of-
integer-based. This eliminates the need to create instances
of filters at launch, and consequently eliminates all the
calls to class constructors, the resulting churning of memory,
and so forth.
All the properties defining filter instances are now as much
as possible 32-bit integer-based, and these are allocated in a
single module-scoped typed array -- this eliminates the need
to allocate memory for every filter being instantiated.
Not all filter properties can be represented as a 32-bit
integer, and in this case a filter class can allocate slots
into another module-scoped array of references.
As a result, this eliminates a lot of memory allocations when
the SNFE is populated with filters, and this makes the saving
and loading of selfie more straightforward, as the operation
is reduced to saving/loading two arrays, one of 32-bit
integers, and the other, much smaller, an array JSON-able
values.
All filter classes now only contain static methods, and all
of these methods are called with an index to the specific
filter data in the module-scoped array of 32-bit integers.
The filter sequences (used to avoid the use of JS arrays) are
also allocated in the single module-scoped array of 32-bit
integers -- they used to be stored in their own dedicated
array.
Additionally, some filters are now loaded more in a deferred
way, so as reduce uBO's time-to-readiness -- the outcome of
this still needs to be evaluated, time-to-readiness is
especially a concern in Firefox for Android or less powerful
computers.
Add ability to bring back logger button in popup panel through
the advanced setting `uiPopupConfig`. Adding `+logger` token
to `uiPopupConfig` will bring back the logger icon in the mobile
version of the popup panel.
Additionally, the link to the logger in the Support pane will
take into account whether the <Shift> key is pressed, so as
to behave like the logger icon in the popup panel.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1847
The troubleshooting information has been further fine-tuned to
report popup panel data related to the reported page, for better
diagnosis by disclosing any customization to uBO which was
affecting the reported page.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1836
The URL to report can now be picked from a list of related
URLs in order to allow the reporter to publish edited version
of the reported URL.
Additionally, the hash, user name, and password which could be
present in a reported URL are always removed.
Unredacted settings is unlikely to be useful after all,
and removing the ability to unredact ensure users won't
mistakenly publish unredacted information.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1541
A "chat" icon has been added to the popup panel to make
it easy to report filter issue on specific sites.
Reporting filter issues require a GitHub account, since
uBO does not have a home server through which reports could
be sent.
The report icon is available only for when uBO is enabled
on a given site.
On mobile devices, the logger icon is replaced by the "chat"
icon since it is more likely to be useful on small display
devices. The logger can always be opened from the Support
pane in the dashboard.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1791
The following case of invalid syntax was not reported as
invalid by the syntax highlighter:
... example.com image ...
With dynamic filtering, there can't be a specific
hostname when a specific type is used, or a
specific type when a specific hostname is used, one
or the other must be `*`.
uBO support's `#?#`, which in AdGuard and ABP means that a
cosmetic filter is procedural.
However, uBO interprets this syntax as "probably procedural"
and will use the filter in a declarative way if the filter
is found to be stylesheet-compatible.
In reality though, the likelihood that a "probably procedural"
filter is sheet-selectable is very low, so treating the filter
as procedural a priori help saves pointless tests against
sheet-selectability when using lists primarily designed for
AdGuard or ABP.
Related commits:
- 4f923384de
- 97a33c9572
- ef07171f5a
For instance, with "Experimental Web Platform features" enabled, the
following filter becomes natively query-selectable:
.fail:has(+ a > b)
Meaning uBO won't need to emulate the `:has()` operator, it will
be executed natively using `querySelectorAll()`.
This commit fixes the erroneous assumption that a query-selectable
is also sheet-selectable.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1751
Related feedback:
- https://www.reddit.com/r/uBlockOrigin/comments/qgd6fe/
It turns out Chromium has started to implement the `:has()`
operator, which becomes recognized when the browser flag
"Experimental Web Platform features" is enabled. However the
hic is that `:has()` is not supported as a declarative CSS
style rule and is only supported through `querySelector()`
et al.
The fix is to no longer detect plain CSS selectors through
`querySelector` et al. but rather use an actual stylesheet
to validate that a cosmetic filter can be injected into a
stylesheet in a declarative way.
Additionally, I added support to enforce ABP's semantic
regarding cosmetic filter with the `#?#` anchor: when using
such anchor, uBO will _first_ try to compile the filter as
a procedural one rather than a declarative one.
Related discussion:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1011#issuecomment-884824166
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/757
Sometimes a tab event may fire for a tab which is not
yet known to uBO. In such case, bind the tab internally
so that it can be processed properly in the future.
Related commit:
- a3a43c7cb4
Specifically:
- Support i18n
- Autofill issues opened through Support pane with configuration
information
- Remove from About pane items found in Support pane
For now the language locales are not available as the text on
the page needs to stabilize before asking translation
volunteers to contribute their time working on the new text.
By default uBO assumed the Shortcut pane was needed,
unless it found the current version of FF was higher
than 73. This commit reverses the test, it assumes
the Shortcut pane is not needed, unless the current
version is lower than 74.
isBlockImportant() was relying strictly on the hash bits
to detect whether a matching filter was `important`, but
this approach regressed with changes with how `important`
filters are compiled. This commit fixed this by no longer
relying on the hash bits but rather on an internal
register variable being set by `important` filters when
they match.
I couldn't find any actual cases in default filter lists
(including a couple of default regional lists) that the
regression is having any effect, due to the limited cases
for which isBlockImportant() is called.
A test was added in a previous commit to detect such
regression in the future:
- a76935b232
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1732
The regression affect filter with the `important` option when
the following conditions were fulfilled:
- The filter pattern is pure hostname
- The filter has not one of the following options:
- domain
- denyallow
- header
- strict1p, strict3p
- csp
- removeparam
- There is a matching exception filter
Related commit:
- a2a8ef7e85
A related mocha test has been added in order to detect this
specific regression in the future through `make test`.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1730
A new filter unit, FilterNotType, is introduced to enforce
negated filter type options.
Before this commit, there was no actual negated types in the
static network filtering engine, as a negated type was internally
converted to non-negated types at compile time. As a result,
the logger would never output a matching filter with its original
negated type options.
This commit no longer causes an internal conversion to take place
at compile time, but explicitly enforce negated types at match time,
and as a result the logger will from now on output matching filter
with their original negated type options.
Name: modifyWebextFlavor
Value: A list of space-separated tokens to be added/removed from the
computed default webext flavor.
The primary purpose is to give filter list authors the ability to
test mobile flavor on desktop computers. Though mobile versions of
web pages can be emulated using browser dev tools, it's not
possible to do so for uBO itself.
By using `+mobile` as a value for this setting will force uBO
to act as if it's being executed on a mobile device.
Important: this setting is best used in a dedicated browser
profile, as this affects how filter lists are compiled. So best
to set it in a new browser profile, then force all filter lists
to be recompiled, and use the profile in the future when there
is a need to test the specific webext flavor.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1692
The ids/classes from html/body elements will leave out
looking up lowly generic cosmetic filters made of a single
identifier.
This does not absolutely guarantee that html/body elements
will never be targeted, but it should greatly mitigate the
probability that this erroneously happens.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1690
New procedural operator: `:matches-path(...)`
Description: this is a all-or-nothing passthrough operator, which
on/off behavior is dictated by whether the argument match the
path of the current location. The argument can be either plain
text to be found at any position in the path, or a literal regex
against which the path is tested.
Whereas cosmetic filters can be made specific to whole domain,
the new `:matches-path()` operator allows to further narrow
the specificity according to the path of the current document
lcoation.
Typically this procedural operator is used as first operator in
a procedural cosmetic filter, so as to ensure that no further
matching work is performed should there be no match against the
current path of the current document location.
Example of usage:
example.com##:matches-path(/shop) p
Will hide all `p` elements when visiting `https://example.com/shop/stuff`,
but not when visiting `https://example.com/` or any other page
on `example.com` which has no instance of `/shop` in the path part
of the URL.
So as to allow nodejs usage to better deal with
out of date serialization/compilation.
Additionally, use FilterImportant() only when a
"block-important" filter is stored in the "block" realm.
When matching a network request in the static network filtering
engine ("snfe"), these are the possible outcomes, from most
to least likely:
- No block
- Block
- Unblock ("exception" filter overriding the block)
- Block-important ("important" filter override the unblock)
Hence why the matching in the snfe always check for a match in
the "block" realm, and the "unblock" realm would be checked
if and only if there was a match in the "block" realm.
However the "block-important" realm was always matched against
first, and when a match in that realm was found, there would
be no need to check in other realms since nothing can override
the "important" option. The problem with this approach though
is that matches in the "block-important" realm are most
unlikely, which means pointless work being done for vast
majority of network requests.
This commit makes it so that the "block-important" realm is
matched against ONLY when there is a matched "unblock" filter.
The result is a measurable improvement in the snfe-related
benchmarks (though given the numbers involved, end users won't
perceive a difference).
Somewhat related discussion which was the motivation to look
more into this:
https://github.com/cliqz-oss/adblocker/discussions/2170#discussioncomment-1168125
Whereas before the string segment was encoded as:
LL OOOOOOOOOOOO
where L are the upper 8 bits and used to encode the length
of the segment, and O are the lower 24 bits and used to
encode the offset of the string data in the character
buffer, the new code encode as follow:
OOOOOOOOOOOO LL
And furthermore the most significant bit of the length
LL is now used to mark whether the current string segment
is a label boundary.
This means a cell can't reference a segment longer then
127 characters. To work around this limitation for when a
segment is longer than 127 characters (a rare occurrence),
the algorithm will simply split the segment into multiple
adjacent cells.
As a result, there is no longer a need to encode
"boundariness" into special cells, which simplifies
both the storing and matching algorithms.
Additionally, added minimal documentation for the NPM
package on how to import and use HNTrieContainer as a
standalone API.
The erroneous test does not seem to interfere
with the proper functioning of the trie, due
to the fact that nodes are never split without
a OR node or boundary node being present.
The issue was found when undertaking a rewrite
of the algorithm to avoid having to create
boundary nodes.
In the static network filtering engine (snfe), the
compiling-related code was spread across two classes.
This commit makes it so that all the compiling-related
code is in FilterCompiler class, which clear purpose is
to compile raw filters into a form which can be persisted
and later fed to the snfe with no parsing overhead.
To compile raw static network filter, the new approach is:
snfe.createCompiler(parser);
Then for each single raw filter to compile:
compiler.compile(parser, writer);
The caller is responsible to keep a reference to the
compiler instance for as long as it is needed. This removes
the need for the clunky code used to keep an instance of
compiler alive in the snfe.
Additionally, snfe.tokenHistograms() has been moved to
benchmarks.js, as it has no dependency on the snfe, it's
just a utility function.
The code exported to nodejs package was revised to use modern
JavaScript syntax. A few issues were fixed at the same time.
The exported classes are:
- DynamicHostRuleFiltering
- DynamicURLRuleFiltering
- DynamicSwitchRuleFiltering
These related to the content the of "My rules" pane in the
uBlock Origin extension.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1664
This change allows to add the redirect engine into the
nodejs package. The purpose of the redirect engine is to
resolve a redirect token into a path to a local resource,
to be used by the caller as wished.
Invalid URLs like "http://" and "http://foo@" trigger TypeErrors
when they are passed to the URL constructor. These TypeErrors
caused the scriptlet to stop processing subsequent noscript nodes
due to uncaught exceptions.
These exceptions are now caught to allow all noscript nodes to
be processed.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1664
The various filtering engine benchmarking functions are best
isolated in their own file since they have specific
dependencies that should not be suffered by the filtering
engines.
Additionally, moved decomposeHostname() into uri-utils.js
as it's a hostname-related function required by many
filtering engine cores -- this allows to further reduce
or outright remove dependency on `µb`.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1664
The changes are enough to fulfill the related issue.
A new platform has been added in order to allow for building
a NodeJS package. From the root of the project:
./tools/make-nodejs
This will create new uBlock0.nodejs directory in the
./dist/build directory, which is a valid NodeJS package.
From the root of the package, you can try:
node test
This will instantiate a static network filtering engine,
populated by easylist and easyprivacy, which can be used
to match network requests by filling the appropriate
filtering context object.
The test.js file contains code which is typical example
of usage of the package.
Limitations: the NodeJS package can't execute the WASM
versions of the code since the WASM module requires the
use of fetch(), which is not available in NodeJS.
This is a first pass at modularizing the codebase, and
while at it a number of opportunistic small rewrites
have also been made.
This commit requires the minimum supported version for
Chromium and Firefox be raised to 61 and 60 respectively.
The title of tabs in uBO is solely to have a better
presentation in the logger -- no other purpose.
This commit simplify keeping track of the titles, from
an active approach by directly querying it from tabs
whenever a change occurs, to a passive approach by
storing it when the title string become available in
some tab event handlers.