Related discussions:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1356#issuecomment-732411286
- https://github.com/AdguardTeam/CoreLibs/issues/1384
Changes:
Negation character is `~` (instead of `!`).
Drop special anchor character `|` -- leading `|`
will be supported until no such filter is present
in uBO's own filter lists. For example, instance
of `queryprune=|ad` will have to be replaced with
`queryprune=/^ad/` (or `queryprune=ad` if the name
of the parameter to remove is exactly `ad`).
Align semantic with that of AdGuard's `removeparam=`,
except that specifying multiple `|`-separated names
is not supported.
`match-case`
------------
Related issue:
- https://github.com/uBlockOrigin/uAssets/issues/8280#issuecomment-735245452
The new filter option `match-case` can be used only for
regex-based filters. Using `match-case` with any other
sort of filters will cause uBO to discard the filter.
`redirect=`
-----------
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1366
`redirect=` filters with unresolvable resource token at
runtime will be discarded.
Additionally, the implicit priority is now set to 1
(was 0). The idea is to allow custom `redirect=` filters
to be used strictly as fallback `redirect=` filters in case
another `redirect=` filter is not picked up.
For example, one might create a `redirect=click2load.html:0`
filter, to be taken if and only if the blocked resource is
not already being redirected by another "official" filter
in one of the enabled filter lists.
Related issue:
- https://github.com/gorhill/uBlock/issues/3590
Since the `redirect=` option was refactored into a modifier
filter, presence of a type (`script`, `xhr`, etc.) is no
longer a requirement.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1356
Related commit:
- bde3164eb4
It is not possible to achieve perfect compatiblity at this
point, but reasonable compatibility should be achieved for
a majority of instances of `removeparam=`.
Notable differences:
--------------------
uBO always matches in a case insensitive manner, there is
no need to ask for case-insensitivity, and no need to use
uppercase characters in `queryprune=` values.
uBO does not escape special regex characters since the
`queryprune=` values are always assumed to be literal
regex expression (leaving out the documented special
characters). This means `removeparam=` with characters
which are special regex characters won't be properly
translated and are unlikely to work properly in uBO.
For example, the `queryprune` value of a filter such as
`$removeparam=__xts__[0]` internally become the literal
regex `/__xts__[0]/`, and consequently would not match
a query parameter such as `...?__xts__[0]=...`.
Notes:
------
Additionally, for performance reason, when uBO encounter
a pattern-less `queryprune=` (or `removeparam=`) filter,
it will try to extract a valid pattern from the
`queryprune=` value. For instance, the following filter:
$queryprune=utm_campaign
Will be translated internally into:
utm_campaign$queryprune=utm_campaign
The logger will reflect this internal translation.
New filter options
==================
Strict partyness: `1P`, `3P`
----------------------------
The current options 1p/3p are meant to "weakly" match partyness, i.e. a
network request is considered 1st-party to its context as long as both the
context and the request share the same base domain.
The new partyness options are meant to check for strict partyness, i.e. a
network request will be considered 1st-party if and only if both the context
and the request share the same hostname.
For examples:
- context: `www.example.org`
- request: `www.example.org`
- `1p`: yes, `1P`: yes
- `3p`: no, `3P`: no
- context: `www.example.org`
- request: `subdomain.example.org`
- `1p`: yes, `1P`: no
- `3p`: no, `3P`: yes
- context: `www.example.org`
- request: `www.example.com`
- `1p`: no, `1P`: no
- `3p`: yes, `3P`: yes
The strict partyness options will be visually emphasized in the editor so as
to prevent mistakenly using `1P` or `3P` where weak partyness is meant to be
used.
Filter on response headers: `header=`
-------------------------------------
Currently experimental and under evaluation. Disabled by default, enable by
toggling `filterOnHeaders` to `true` in advanced settings.
Ability to filter network requests according to whether a specific response
header is present and whether it matches or does not match a specific value.
For example:
*$1p,3P,script,header=via:1\.1\s+google
The above filter is meant to block network requests which fullfill all the
following conditions:
- is weakly 1st-party to the context
- is not strictly 1st-party to the context
- is of type `script`
- has a response HTTP header named `via`, which value matches the regular
expression `1\.1\s+google`.
The matches are always performed in a case-insensitive manner.
The header value is assumed to be a literal regular expression, except for
the following special characters:
- to anchor to start of string, use leading `|`, not `^`
- to anchor to end of string, use trailing `|`, not `$`
- to invert the test, use a leading `!`
To block a network request if it merely contains a specific HTTP header is
just a matter of specifying the header name without a header value:
*$1p,3P,script,header=via
Generic exception filters can be used to disable specific block `header=`
filters, i.e. `@@*$1p,3P,script,header` will override the block `header=`
filters given as example above.
Dynamic filtering's `allow` rules override block `headers=` filters.
Important: It is key that filter authors use as many narrowing filter options
as possible when using the `header=` option, and the `header=` option should
be used ONLY when other filter options are not sufficient.
More documentation justifying the purpose of `header=` option will be
provided eventually if ever it is decided to move it from experimental to
stable status.
To be decided: to restrict usage of this filter option to only uBO's own
filter lists or "My filters".
Changes
=======
Fine tuning `queryprune=`
-------------------------
The following changes have been implemented:
The special value `*` (i.e. `queryprune=*`) means "remove all query
parameters".
If the `queryprune=` value is made only of alphanumeric characters
(including `_`), the value will be internally converted to regex equivalent
`^value=`. This ensures a better future compatibility with AdGuard's
`removeparam=`.
If the `queryprune=` value starts with `!`, the test will be inverted. This
can be used to remove all query parameters EXCEPT those who match the
specified value.
Other
-----
The legacy code to test for spurious CSP reports has been removed. This
is no longer an issue ever since uBO redirects to local resources through
web accessible resources.
Notes
=====
The following new and recently added filter options are not compatible with
Chromium's manifest v3 changes:
- `queryprune=`
- `1P`
- `3P`
- `header=`
The auto-complete feature in the _"My filters"_ pane will
use hostname/domain from the set of opened tabs to assist
in entering values for `domain=` option. This also works
for the implict `domain=` option ṗrepending static extended
filters.
`about:srcdoc` frames are their own origin, trying to
use the origin of the parent context causes an
exception to be thrown when accessing location.href.
Notably, add clickable link to open the widget
in its own tab. Also, allows the URL to be text-
selected so that it becomes possible to use the
selection in a browser contextual menu's "Open
in a new tab" option.
Notably, make `queryprune` option available only
to filter list authors, until there are guards
against bad filters in some future and until the
option syntax and behavior is fully settled.
Instances of `queryprune` in filter lists will be
compiled, however instances of `queryprune` in
_"My filters"_ will be ignored unless users
indicated they are a filter list author.
The important bit is now considered an action bit
so that there is no more a need for the `important`
property in the parser. The modify bit is now
considered a realm bit.
When the modify bit is set, the action bits become
available to be used to further narrow the realm.
This could be useful in the future if we want to
spread the population of modifier filters across
different buckets.
Reusing the same iterator instance for all cases
of `domain=` option parsing should reduce memory
churning.
Additonally, fine tune regex used to extract
valid token from regex-based filters to increase
likelihood of being able to extract a valid
token.
Reported internally by @gwarser.
In rare occasion, a timing issue could cause uBO to redirect
to a web accessible resource meant to be used for another
network request. This is a regression introduced with the
following commit:
- 2e5d32e967
Additionally, I identified another issue which would cause
cached redirection to fail when a cache entry with redirection
to a web accessible resource was being reused, an issue which
could especially affect pages which are generated dynamically
(i.e. without full page reload).
filterUnits is now treated as a buffer which is
pre-allocated and which will grow in chunks so as
to minimize memory allocations. Entries are never
released, just null-ed.
Additionally, move urlTokenizer into the static
network filtering engine, since it's not used
anywhere else.
Notably, defer the post-load optimization operations
to a few seconds after the filters have been all
loaded in memory -- this is not a critical step for
the filtering engine to work properly, hence this
can be delayed in order to ensure readiness as soon
as possible.
Most notably, the `denyallow=` option now requires
the presence of a valid `domain=` option to not be
rejected.
Using `denyallow=` without narrowing down using the
`domain=` option leads to catastrophic blocking
behvior, hence the requirement for a valid `domain=`
option.
Related commit:
- b265f2644d
The optimization in the commit above was meant to
improve the performance of lookup operations of
modifier filters, but I forgot to enable the
optimisation for that class of filters.
This means this commit brings another significant
performance gain on top of the previous commit, as
shown by the built-in benchmark.
Additionally a few minor code rearrangements.
Performance-related work.
There is a fair number of filters which can't be tokenized
in uBO's own filter lists. Majority of those filters also
declare a `domain=` option, examples:
*$script,redirect-rule=noopjs,domain=...
*$script,3p,domain=...,denyallow=...
*$frame,3p,domain=...
Such filters can be found in uBO's asset viewer using the
following search expression:
/^\*?\$[^\n]*?domain=/
Some filter buckets will contain many of those filters, for
instance one of the bucket holding untokenizable `redirect=`
filters has over 170 entries, which must be all visited when
collating all matching `redirect=` filters.
When a bucket contains many such filters, I found that it's
worth to extract all the non-negated hostname values from
`domain=` options into a single hntrie and perform a pre-test
at match() time to find out whether the current origin of a
network request matches any one of the collected hostnames,
so as to avoid iterating through all the filters.
Since there is rarely a match() for vast majority of network
requests with `domain=` option, this pre-test saves a good
amount of work, and this is measurable with the built-in
benchmark.
This commit moves the parsing, compiling and enforcement
of the `redirect=` and `redirect-rule=` network filter
options into the static network filtering engine as
modifier options -- just like `csp=` and `queryprune=`.
This solves the two following issues:
- https://github.com/gorhill/uBlock/issues/3590
- https://github.com/uBlockOrigin/uBlock-issues/issues/1008#issuecomment-716164214
Additionally, `redirect=` option is not longer afflicted
by static network filtering syntax quirks, `redirect=`
filters can be used with any other static filtering
modifier options, can be excepted using `@@` and can be
badfilter-ed.
Since more than one `redirect=` directives could be found
to apply to a single network request, the concept of
redirect priority is introduced.
By default, `redirect=` directives have an implicit
priority of 0. Filter authors can declare an explicit
priority by appending `:[integer]` to the token of the
`redirect=` option, for example:
||example.com/*.js$1p,script,redirect=noopjs:100
The priority dictates which redirect token out of many
will be ultimately used. Cases of multiple `redirect=`
directives applying to a single blocked network request
are expected to be rather unlikely.
Explicit redirect priority should be used if and only if
there is a case of redirect ambiguity to solve.