Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1692
The ids/classes from html/body elements will leave out
looking up lowly generic cosmetic filters made of a single
identifier.
This does not absolutely guarantee that html/body elements
will never be targeted, but it should greatly mitigate the
probability that this erroneously happens.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1690
New procedural operator: `:matches-path(...)`
Description: this is a all-or-nothing passthrough operator, which
on/off behavior is dictated by whether the argument match the
path of the current location. The argument can be either plain
text to be found at any position in the path, or a literal regex
against which the path is tested.
Whereas cosmetic filters can be made specific to whole domain,
the new `:matches-path()` operator allows to further narrow
the specificity according to the path of the current document
lcoation.
Typically this procedural operator is used as first operator in
a procedural cosmetic filter, so as to ensure that no further
matching work is performed should there be no match against the
current path of the current document location.
Example of usage:
example.com##:matches-path(/shop) p
Will hide all `p` elements when visiting `https://example.com/shop/stuff`,
but not when visiting `https://example.com/` or any other page
on `example.com` which has no instance of `/shop` in the path part
of the URL.
So as to allow nodejs usage to better deal with
out of date serialization/compilation.
Additionally, use FilterImportant() only when a
"block-important" filter is stored in the "block" realm.
When matching a network request in the static network filtering
engine ("snfe"), these are the possible outcomes, from most
to least likely:
- No block
- Block
- Unblock ("exception" filter overriding the block)
- Block-important ("important" filter override the unblock)
Hence why the matching in the snfe always check for a match in
the "block" realm, and the "unblock" realm would be checked
if and only if there was a match in the "block" realm.
However the "block-important" realm was always matched against
first, and when a match in that realm was found, there would
be no need to check in other realms since nothing can override
the "important" option. The problem with this approach though
is that matches in the "block-important" realm are most
unlikely, which means pointless work being done for vast
majority of network requests.
This commit makes it so that the "block-important" realm is
matched against ONLY when there is a matched "unblock" filter.
The result is a measurable improvement in the snfe-related
benchmarks (though given the numbers involved, end users won't
perceive a difference).
Somewhat related discussion which was the motivation to look
more into this:
https://github.com/cliqz-oss/adblocker/discussions/2170#discussioncomment-1168125
Whereas before the string segment was encoded as:
LL OOOOOOOOOOOO
where L are the upper 8 bits and used to encode the length
of the segment, and O are the lower 24 bits and used to
encode the offset of the string data in the character
buffer, the new code encode as follow:
OOOOOOOOOOOO LL
And furthermore the most significant bit of the length
LL is now used to mark whether the current string segment
is a label boundary.
This means a cell can't reference a segment longer then
127 characters. To work around this limitation for when a
segment is longer than 127 characters (a rare occurrence),
the algorithm will simply split the segment into multiple
adjacent cells.
As a result, there is no longer a need to encode
"boundariness" into special cells, which simplifies
both the storing and matching algorithms.
Additionally, added minimal documentation for the NPM
package on how to import and use HNTrieContainer as a
standalone API.
The erroneous test does not seem to interfere
with the proper functioning of the trie, due
to the fact that nodes are never split without
a OR node or boundary node being present.
The issue was found when undertaking a rewrite
of the algorithm to avoid having to create
boundary nodes.
In the static network filtering engine (snfe), the
compiling-related code was spread across two classes.
This commit makes it so that all the compiling-related
code is in FilterCompiler class, which clear purpose is
to compile raw filters into a form which can be persisted
and later fed to the snfe with no parsing overhead.
To compile raw static network filter, the new approach is:
snfe.createCompiler(parser);
Then for each single raw filter to compile:
compiler.compile(parser, writer);
The caller is responsible to keep a reference to the
compiler instance for as long as it is needed. This removes
the need for the clunky code used to keep an instance of
compiler alive in the snfe.
Additionally, snfe.tokenHistograms() has been moved to
benchmarks.js, as it has no dependency on the snfe, it's
just a utility function.
The code exported to nodejs package was revised to use modern
JavaScript syntax. A few issues were fixed at the same time.
The exported classes are:
- DynamicHostRuleFiltering
- DynamicURLRuleFiltering
- DynamicSwitchRuleFiltering
These related to the content the of "My rules" pane in the
uBlock Origin extension.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1664
This change allows to add the redirect engine into the
nodejs package. The purpose of the redirect engine is to
resolve a redirect token into a path to a local resource,
to be used by the caller as wished.
Invalid URLs like "http://" and "http://foo@" trigger TypeErrors
when they are passed to the URL constructor. These TypeErrors
caused the scriptlet to stop processing subsequent noscript nodes
due to uncaught exceptions.
These exceptions are now caught to allow all noscript nodes to
be processed.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1664
The various filtering engine benchmarking functions are best
isolated in their own file since they have specific
dependencies that should not be suffered by the filtering
engines.
Additionally, moved decomposeHostname() into uri-utils.js
as it's a hostname-related function required by many
filtering engine cores -- this allows to further reduce
or outright remove dependency on `µb`.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1664
The changes are enough to fulfill the related issue.
A new platform has been added in order to allow for building
a NodeJS package. From the root of the project:
./tools/make-nodejs
This will create new uBlock0.nodejs directory in the
./dist/build directory, which is a valid NodeJS package.
From the root of the package, you can try:
node test
This will instantiate a static network filtering engine,
populated by easylist and easyprivacy, which can be used
to match network requests by filling the appropriate
filtering context object.
The test.js file contains code which is typical example
of usage of the package.
Limitations: the NodeJS package can't execute the WASM
versions of the code since the WASM module requires the
use of fetch(), which is not available in NodeJS.
This is a first pass at modularizing the codebase, and
while at it a number of opportunistic small rewrites
have also been made.
This commit requires the minimum supported version for
Chromium and Firefox be raised to 61 and 60 respectively.
The title of tabs in uBO is solely to have a better
presentation in the logger -- no other purpose.
This commit simplify keeping track of the titles, from
an active approach by directly querying it from tabs
whenever a change occurs, to a passive approach by
storing it when the title string become available in
some tab event handlers.
Related discussion:
- https://www.reddit.com/r/uBlockOrigin/comments/oicch9/
The new replacement script contains the smallest API
possible to resolve the reported case.
Please report instances where it's not sufficient to
unbreak a site, in which case I will extend the neutered
API to address these cases on an on-demand basis.
Related issue:
- https://github.com/gorhill/uBlock/issues/3212
The element picker will now properly work on sites where
cosmetic filtering is disabled, but will not allow the
creation of cosmetic filters when specific cosmetic filters
are not meant to be enforced in the current page.
When specific cosmetic filters are not meant to be enforced,
the element picker will still allow the creation of network
filters, that is unless the current page is trusted, in which
case using the element picker is pointless.
Related issue:
- https://github.com/gorhill/uBlock/issues/3037
This takes care of the specific case reported. There are
other edge cases which are likely not addressed though, i.e.
those involving wildcards -- those should be rather rare and
at this point I rather leave them unaddressed to not
risk regressions (as they are less trivial to address).
Related feedback:
- https://ilakovac.com/teespring-ublock-issue/
The surrogate script googletagmanager_gtm.js was essentially a
subset of surrogate script google-analytics_analytics.js. This
commit makes it a plain alias so that the whole GA API -- often
expected by clients of GTM -- is properly stubbed.
Reported internally:
> STR --
>
> Import https://cdn.statically.io/gh/uBlockOrigin/uAssets/master/filters/filters.txt
> as a Custom filter list.
>
> Observe the filter count at 24K instead of true count
> being 29K.
>
> Force updating is sometimes compiling 24K filters and
> on subsequent updates 29K, so I narrowed it down to
> this host.
Findings:
One of the sublists was erroring when being fetched on
this particular CDN server (reason unknown).
uBO was not properly handling network errors when
fetching a sublist. This commit make it so that if
a sublist can't be fetched, then the error is propagated
as if it affected the whole list, in which case uBO will
use an alternative URL if any.
This advanced setting is not really needed, as the
same can be accomplished with a broad exception
filter such as `#@#+js()`.
Related feedback:
- f5b453fae3 (commitcomment-49499082)
Ever since the `redirect` code was refactored:
157cef6034
This advanced setting is no longer needed, as the same
can be accomplished with a plain network filter:
@@*$redirect-rule
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1553
This commit ensures FLoC is opt-in. The generic filter
`*##+js(no-floc)` in "uBlock filters -- Privacy" ensures
the feature is disabled when using default settings/lists.
Users can opt-in to FLoC by adding a generic exception
filter to their custom filters, `#@#+js(no-floc)`; or they
can opt-in only for a specific set of websites through a
more specific exception filter:
example.com,shopping.example#@#+js(no-floc)
The syntax to remove response header is a special case
of HTML filtering, whereas the response headers are
targeted, rather than the response body:
example.com##^responseheader(header-name)
Where `header-name` is the name of the header to
remove, and must always be lowercase.
The removal of response headers can only be applied to
document resources, i.e. main- or sub-frames.
Only a limited set of headers can be targeted for
removal:
location
refresh
report-to
set-cookie
This limitation is to ensure that uBO never lowers the
security profile of web pages, i.e. we wouldn't want to
remove `content-security-policy`.
Given that the header removal occurs at onHeaderReceived
time, this new ability works for all browsers.
The motivation for this new filtering ability is instance
of website using a `refresh` header to redirect a visitor
to an undesirable destination after a few seconds.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1513
Prior to this commit, the ability to enable/disable the
uncloaking of canonical names was only available to advanced
users. This commit make it so that the setting can be
toggled from the _Settings_ pane.
The setting is enabled by default. The documentation should
be clear that the setting should not be disabled unless it
actually solves serious network issues, for example:
https://bugzilla.mozilla.org/show_bug.cgi?id=1694404
Also, as a result, the advanced setting `cnameUncloak` is no
longer available from within the advanced settings editor.