The `urltransform` option allows to redirect a non-blocked network
request to another URL. There are restrictions on its usage:
- require a trusted source -- thus uBO-maintained lists or user
filters
- the `urltransform` value must start with a `/`
If at least one of these conditions is not fulfilled, the filter
will be invalid and rejected.
The requirement to start with `/` is to enforce that only the path
part of a URL can be modified, thus ensuring the network request
is redirected to the same scheme and authority (as defined at
https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Syntax).
Usage example (redirect requests for CSS resources to a non-existing
resource, for demonstration purpose):
||iana.org^$css,urltransform=/notfound.css
Name of this option is inspired from DNR API:
https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/declarativeNetRequest/URLTransform
This commit required to bring the concept of "trusted source" to
the static network filtering engine.
Related issue:
- https://github.com/AdguardTeam/Scriptlets/issues/332
Additionally, uBO's own scriplet syntax now also accept quoting
the parameters with either `'` or `"`. This can be used to avoid
having to escape commas when they are present in a parameter.
This commit is a rewrite of the static filtering parser into a
tree-based data structure, for easier maintenance and better
abstraction of parsed filters.
This simplifies greatly syntax coloring of filters and also
simplify extending filter syntax.
The minimum version of Chromium-based browsers has been raised
to version 73 because of usage of String.matchAll().
The new parser no longer uses the browser DOM to validate
that a cosmetic filter is valid or not, this is now done
through a JS library, CSSTree.
This means filter list authors will have to be more careful
to ensure that a cosmetic filter is really valid, as there is
no more guarantee that a cosmetic filter which works for a
given browser/version will still work properly on another
browser, or different version of the same browser.
This change has become necessary because of many reasons,
one of them being the flakiness of the previous parser as
exposed by many issues lately:
- https://github.com/uBlockOrigin/uBlock-issues/issues/2262
- https://github.com/uBlockOrigin/uBlock-issues/issues/2228
The new parser introduces breaking changes, there was no way
to do otherwise. Some current procedural cosmetic filters will
be shown as invalid with this change. This occurs because the
CSSTree library gets confused with some syntax which was
previously allowed by the previous parser because it was more
permissive.
Mainly the issue is with the arguments passed to some procedural
cosmetic filters, and these issues can be solved as follow:
Use quotes around the argument. You can use either single or
double-quotes, whichever is most convenient. If your argument
contains a single quote, use double-quotes, and vice versa.
Additionally, try to escape a quote inside an argument using
backslash. THis may work, but if not, use quotes around the
argument.
When the parser encounter quotes around an argument, it will
discard them before trying to process the argument, same with
escaped quotes inside the argument. Examples:
Breakage:
...##^script:has-text(toscr')
Fix:
...##^script:has-text(toscr\')
Breakage:
...##:xpath(//*[contains(text(),"VPN")]):upward(2)
Fix:
...##:xpath('//*[contains(text(),"VPN")]'):upward(2)
There are not many filters which break in the default set of
filter lists, so this should be workable for default lists.
Unfortunately those fixes will break the filter for previous
versions of uBO since these to not deal with quoted argument.
In such case, it may be necessary to keep the previous filter,
which will be discarded as broken on newer version of uBO.
THis was a necessary change as the old parser was becoming
more and more flaky after being constantly patched for new
cases arising, The new parser should be far more robust and
stay robist through expanding procedural cosmetic filter
syntax.
Additionally, in the MV3 version, filters are pre-compiled
using a Nodejs script, i.e. outside the browser, so validating
cosmetic filters using a live DOM no longer made sense.
This new parser will have to be tested throughly before stable
release.
In the static network filtering engine (snfe), the
compiling-related code was spread across two classes.
This commit makes it so that all the compiling-related
code is in FilterCompiler class, which clear purpose is
to compile raw filters into a form which can be persisted
and later fed to the snfe with no parsing overhead.
To compile raw static network filter, the new approach is:
snfe.createCompiler(parser);
Then for each single raw filter to compile:
compiler.compile(parser, writer);
The caller is responsible to keep a reference to the
compiler instance for as long as it is needed. This removes
the need for the clunky code used to keep an instance of
compiler alive in the snfe.
Additionally, snfe.tokenHistograms() has been moved to
benchmarks.js, as it has no dependency on the snfe, it's
just a utility function.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1664
The changes are enough to fulfill the related issue.
A new platform has been added in order to allow for building
a NodeJS package. From the root of the project:
./tools/make-nodejs
This will create new uBlock0.nodejs directory in the
./dist/build directory, which is a valid NodeJS package.
From the root of the package, you can try:
node test
This will instantiate a static network filtering engine,
populated by easylist and easyprivacy, which can be used
to match network requests by filling the appropriate
filtering context object.
The test.js file contains code which is typical example
of usage of the package.
Limitations: the NodeJS package can't execute the WASM
versions of the code since the WASM module requires the
use of fetch(), which is not available in NodeJS.
This is a first pass at modularizing the codebase, and
while at it a number of opportunistic small rewrites
have also been made.
This commit requires the minimum supported version for
Chromium and Firefox be raised to 61 and 60 respectively.
The syntax to remove response header is a special case
of HTML filtering, whereas the response headers are
targeted, rather than the response body:
example.com##^responseheader(header-name)
Where `header-name` is the name of the header to
remove, and must always be lowercase.
The removal of response headers can only be applied to
document resources, i.e. main- or sub-frames.
Only a limited set of headers can be targeted for
removal:
location
refresh
report-to
set-cookie
This limitation is to ensure that uBO never lowers the
security profile of web pages, i.e. we wouldn't want to
remove `content-security-policy`.
Given that the header removal occurs at onHeaderReceived
time, this new ability works for all browsers.
The motivation for this new filtering ability is instance
of website using a `refresh` header to redirect a visitor
to an undesirable destination after a few seconds.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1365
This commit adds the compiled magic version number to the
compiled data itself, and consequently this allows uBO
to no longer require that any given compiled list with a
mismatched format to be detected and discarded at launch
time.
Given this change, uBO no longer needs to rely on the
deletion of cached data at launch time to ensure it
won't use no longer valid compiled lists.
filterUnits is now treated as a buffer which is
pre-allocated and which will grow in chunks so as
to minimize memory allocations. Entries are never
released, just null-ed.
Additionally, move urlTokenizer into the static
network filtering engine, since it's not used
anywhere else.
Before this commit, CodeMirror's add-on for search occurrences
was limited to find at most 1000 first occurrences, because of
performance considerations.
This commit removes this low limit by having the search
occurrences done in a dedicated worker. The limit is now
time-based, and highly unlikely to ever be hit under normal
condition.
With this change, all search occurrences are gathered,
and as a result:
- All occurrences are reported in the scrollbar instead of
just the 1,000 first
- The total count of all occurrences is now reported, instead
of capping at "1000+".
- The current occurrence rank at the cursor or selection
position is now reported -- this was not possible to report
this before.
The number of occurrences is line-based, it's not useful to
report finer-grained occurences in uBO.
Since it's possible to execute specific code paths according
to whether the context is that of a worker or not, it's possible
to keep the main/thread code in a single file. Keeping the
main/worker code paths into a single file is more convenient
for both code maintenance and code review.
Reported internally. The issue could cause the logger
to be unable to successfully reverse-lookup a filter
list for a filter which had tokens longer than 6
characters followed by wildcard.
Regression from:
- 01b1ed9a98
A new standalone static filtering parser is introduced,
vAPI.StaticFilteringParser. It's purpose is to parse
line of text into representation suitable for
compiling filters. It can additionally serves for
syntax highlighting purpose.
As a side effect, this solves:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1038
This is a first draft, there are more work left to do
to further perfect the implementation and extend its
capabilities, especially those useful to assist filter
authors.
For the time being, this commits break line-continuation
syntax highlighting -- which was already flaky prior to
this commit anyway.
This concerns the static network filtering engine.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/943
* * *
New static network filter type: `cname`
By default, network requests which are result of
resolving a canonical name are subject to filtering.
This filtering can be bypassed by creating exception
filters using the `cname` option. For example:
@@*$cname
The filter above tells the network filtering engine
to except network requests which fulfill all the
following conditions:
- network request is blocked
- network request is that of an unaliased hostname
Filter list authors are discouraged from using
exception filters of `cname` type, unless there no
other practical solution such that maintenance
burden become the greater issue. Of course, such
exception filters should be as narrow as possible,
i.e. apply to specific domain, etc.
* * *
New static network filter option: `denyallow`
The purpose of `denyallow` is bring
default-deny/allow-exceptionally ability into static
network filtering arsenal. Example of usage:
*$3p,script, \
denyallow=x.com|y.com \
domain=a.com|b.com
The above filter tells the network filtering engine that
when the context is `a.com` or `b.com`, block all
3rd-party scripts except those from `x.com` and `y.com`.
Essentially, the new `denyallow` option makes it easier
to implement default-deny/allow-exceptionally in static
filter lists, whereas before this had to be done with
unwieldy regular expressions[1], or through the mix of
broadly blocking filters along with exception filters[2].
[1] https://hg.adblockplus.org/ruadlist/rev/f362910bc9a0
[2] Typically filters which pattern are of the
form `|http*://`
Related issues:
- https://github.com/uBlockOrigin/uBlock-issues/issues/761
- https://github.com/uBlockOrigin/uBlock-issues/issues/528
The previous bidi-trie code could only hold filters which
are plain pattern, i.e. no wildcard characters, and which
had no origin option (`domain=`), right and/or left anchor,
and no `csp=` option.
Example of filters that could be moved into a bidi-trie
data structure:
&ad_box_
/w/d/capu.php?z=$script,third-party
||liveonlinetv247.com/images/muvixx-150x50-watch-now-in-hd-play-btn.gif
Examples of filters that could NOT be moved to a bidi-trie:
-adap.$domain=~l-adap.org
/tsc.php?*&ses=
||ibsrv.net/*forumsponsor$domain=[...]
@@||imgspice.com/jquery.cookie.js|$script
||view.atdmt.com^*/iview/$third-party
||postimg.cc/image/$csp=[...]
Ideally the filters above should be able to be moved to a
bidi-trie since they are basically plain patterns, or at
least partially moved to a bidi-trie when there is only a
single wildcard (i.e. made of two plain patterns).
Also, there were two distinct bidi-tries in which
plain-pattern filters can be moved to: one for patterns
without hostname anchoring and another one for patterns
with hostname-anchoring. This was required because the
hostname-anchored patterns have an extra condition which
is outside the bidi-trie knowledge.
This commit expands the number of filters which can be
stored in the bidi-trie, and also remove the need to
use two distinct bidi-tries.
- Added ability to associate a pattern with an integer
in the bidi-trie [1].
- The bidi-trie match code passes this externally
provided integer when calling an externally
provided method used for testing extra conditions
that may be present for a plain pattern found to
be matching in the bidi-trie.
- Decomposed existing filters into smaller logical units:
- FilterPlainLeftAnchored =>
FilterPatternPlain +
FilterAnchorLeft
- FilterPlainRightAnchored =>
FilterPatternPlain +
FilterAnchorRight
- FilterExactMatch =>
FilterPatternPlain +
FilterAnchorLeft +
FilterAnchorRight
- FilterPlainHnAnchored =>
FilterPatternPlain +
FilterAnchorHn
- FilterWildcard1 =>
FilterPatternPlain + [
FilterPatternLeft or
FilterPatternRight
]
- FilterWildcard1HnAnchored =>
FilterPatternPlain + [
FilterPatternLeft or
FilterPatternRight
] +
FilterAnchorHn
- FilterGenericHnAnchored =>
FilterPatternGeneric +
FilterAnchorHn
- FilterGenericHnAndRightAnchored =>
FilterPatternGeneric +
FilterAnchorRight +
FilterAnchorHn
- FilterOriginMixedSet =>
FilterOriginMissSet +
FilterOriginHitSet
- Instances of FilterOrigin[...], FilterDataHolder
can also be added to a composite filter to
represent `domain=` and `csp=` options.
- Added a new filter class, FilterComposite, for
filters which are a combination of two or more
logical units. A FilterComposite instance is a
match when *all* filters composing it are a
match.
Since filters are now encoded into combination of
smaller units, it becomes possible to extract the
FilterPatternPlain component and store it in the
bidi-trie, and use the integer as a handle for the
remaining extra conditions, if any.
Since a single pattern in the bidi-trie may be a
component for different filters, the associated
integer points to a sequence of extra conditions,
and a match occurs as soon as one of the extra
conditions (which may itself be a sequence of
conditions) is fulfilled.
Decomposing filters which are currently single
instance into sequences of smaller logical filters
means increasing the storage and CPU overhead when
evaluating such filters. The CPU overhead is
compensated by the fact that more filters can now
moved into the bidi-trie, where the first match is
efficiently evaluated. The extra conditions have to
be evaluated if and only if there is a match in the
bidi-trie.
The storage overhead is compensated by the
bidi-trie's intrinsic nature of merging similar
patterns.
Furthermore, the storage overhead is reduced by no
longer using JavaScript array to store collection
of filters (which is what FilterComposite is):
the same technique used in [2] is imported to store
sequences of filters.
A sequence of filters is a sequence of integer pairs
where the first integer is an index to an actual
filter instance stored in a global array of filters
(`filterUnits`), while the second integer is an index
to the next pair in the sequence -- which means all
sequences of filters are encoded in one single array
of integers (`filterSequences` => Uint32Array). As
a result, a sequence of filters can be represented by
one single integer -- an index to the first pair --
regardless of the number of filters in the sequence.
This representation is further leveraged to replace
the use of JavaScript array in FilterBucket [3],
which used a JavaScript array to store collection
of filters. Doing so means there is no more need for
FilterPair [4], which purpose was to be a lightweight
representation when there was only two filters in a
collection.
As a result of the above changes, the map of `token`
(integer) => filter instance (object) used to
associate tokens to filters or collections of filters
is replaced with a more efficient map of `token`
(integer) to filter unit index (integer) to lookup a
filter object from the global `filterUnits` array.
Another consequence of using one single global
array to store all filter instances means we can reuse
existing instances when a logical filter instance is
parameter-less, which is the case for FilterAnchorLeft,
FilterAnchorRight, FilterAnchorHn, the index to these
single instances is reused where needed.
`urlTokenizer` now stores the character codes of the
scanned URL into a bidi-trie buffer, for reuse when
string matching methods are called.
New method: `tokenHistogram()`, used to generate
histograms of occurrences of token extracted from URLs
in built-in benchmark. The top results of the "miss"
histogram are used as "bad tokens", i.e. tokens to
avoid if possible when compiling filter lists.
All plain pattern strings are now stored in the
bidi-trie memory buffer, regardless of whether they
will be used in the trie proper or not.
Three methods have been added to the bidi-trie to test
stored string against the URL which is also stored in
then bidi-trie.
FilterParser is now instanciated on demand and
released when no longer used.
***
[1] 135a45a878/src/js/strie.js (L120)
[2] e94024d350
[3] 135a45a878/src/js/static-net-filtering.js (L1630)
[4] 135a45a878/src/js/static-net-filtering.js (L1566)
Related documentation:
- https://help.eyeo.com/en/adblockplus/how-to-write-filters#element-hiding
Related feedback/discussion:
- https://www.reddit.com/r/uBlockOrigin/comments/d6vxzj/
The `elemhide` filter option as per ABP semantic is
now supported. Previously uBO would consider `elemhide`
to be an alias of `generichide`.
The support of `elemhide` is through the convenient
conversion of `elemhide` option into existing
`generichide` option and new `specifichide` option.
The purpose of the new `specifichide` filter option
is to disable all specific cosmetic filters, i.e.
those who target a specific site.
Additionally, for convenience purpose, the filter
options `generichide`, `specifichide` and `elemhide`
can be aliased using the shorter forms `ghide`,
`shide` and `ehide` respectively.
* refactoring assets management code
* finalizing refactoring of assets management
* various code review of new assets management code
* fix#2281
* fix#1961
* fix#1293
* fix#1275
* fix update scheduler timing logic
* forward compatibility (to be removed once 1.11+ is widespread)
* more codereview; give admins ability to specify own assets.json
* "assetKey" is more accurate than "path"
* fix group count update when building dom incrementally
* reorganize content (order, added URLs, etc.)
* ability to customize updater through advanced settings
* better spinner icon