Notably, defer the post-load optimization operations
to a few seconds after the filters have been all
loaded in memory -- this is not a critical step for
the filtering engine to work properly, hence this
can be delayed in order to ensure readiness as soon
as possible.
Most notably, the `denyallow=` option now requires
the presence of a valid `domain=` option to not be
rejected.
Using `denyallow=` without narrowing down using the
`domain=` option leads to catastrophic blocking
behvior, hence the requirement for a valid `domain=`
option.
Related commit:
- b265f2644d
The optimization in the commit above was meant to
improve the performance of lookup operations of
modifier filters, but I forgot to enable the
optimisation for that class of filters.
This means this commit brings another significant
performance gain on top of the previous commit, as
shown by the built-in benchmark.
Additionally a few minor code rearrangements.
Performance-related work.
There is a fair number of filters which can't be tokenized
in uBO's own filter lists. Majority of those filters also
declare a `domain=` option, examples:
*$script,redirect-rule=noopjs,domain=...
*$script,3p,domain=...,denyallow=...
*$frame,3p,domain=...
Such filters can be found in uBO's asset viewer using the
following search expression:
/^\*?\$[^\n]*?domain=/
Some filter buckets will contain many of those filters, for
instance one of the bucket holding untokenizable `redirect=`
filters has over 170 entries, which must be all visited when
collating all matching `redirect=` filters.
When a bucket contains many such filters, I found that it's
worth to extract all the non-negated hostname values from
`domain=` options into a single hntrie and perform a pre-test
at match() time to find out whether the current origin of a
network request matches any one of the collected hostnames,
so as to avoid iterating through all the filters.
Since there is rarely a match() for vast majority of network
requests with `domain=` option, this pre-test saves a good
amount of work, and this is measurable with the built-in
benchmark.
This commit moves the parsing, compiling and enforcement
of the `redirect=` and `redirect-rule=` network filter
options into the static network filtering engine as
modifier options -- just like `csp=` and `queryprune=`.
This solves the two following issues:
- https://github.com/gorhill/uBlock/issues/3590
- https://github.com/uBlockOrigin/uBlock-issues/issues/1008#issuecomment-716164214
Additionally, `redirect=` option is not longer afflicted
by static network filtering syntax quirks, `redirect=`
filters can be used with any other static filtering
modifier options, can be excepted using `@@` and can be
badfilter-ed.
Since more than one `redirect=` directives could be found
to apply to a single network request, the concept of
redirect priority is introduced.
By default, `redirect=` directives have an implicit
priority of 0. Filter authors can declare an explicit
priority by appending `:[integer]` to the token of the
`redirect=` option, for example:
||example.com/*.js$1p,script,redirect=noopjs:100
The priority dictates which redirect token out of many
will be ultimately used. Cases of multiple `redirect=`
directives applying to a single blocked network request
are expected to be rather unlikely.
Explicit redirect priority should be used if and only if
there is a case of redirect ambiguity to solve.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/760
The purpose of this new network filter option is to remove
query parameters form the URL of network requests.
The name `queryprune` has been picked over `querystrip`
since the purpose of the option is to remove some
parameters from the URL rather than all parameters.
`queryprune` is a modifier option (like `csp`) in that it
does not cause a network request to be blocked but rather
modified before being emitted.
`queryprune` must be assigned a value, which value will
determine which parameters from a query string will be
removed. The syntax for the value is that of regular
expression *except* for the following rules:
- do not wrap the regex directive between `/`
- do not use regex special values `^` and `$`
- do not use literal comma character in the value,
though you can use hex-encoded version, `\x2c`
- to match the start of a query parameter, prepend `|`
- to match the end of a query parameter, append `|`
`queryprune` regex-like values will be tested against each
key-value parameter pair as `[key]=[value]` string. This
way you can prune according to either the key, the value,
or both.
This commit introduces the concept of modifier filter
options, which as of now are:
- `csp=`
- `queryprune=`
They both work in similar way when used with `important`
option or when used in exception filters. Modifier
options can apply to any network requests, hence the
logger reports the type of the network requests, and no
longer use the modifier as the type, i.e. `csp` filters
are no longer reported as requests of type `csp`.
Though modifier options can apply to any network requests,
for the time being the `csp=` modifier option still apply
only to top or embedded (frame) documents, just as before.
In some future we may want to apply `csp=` directives to
network requests of type script, to control the behavior
of service workers for example.
A new built-in filter expression has been added to the
logger: "modified", which allow to see all the network
requests which were modified before being emitted. The
translation work for this new option will be available
in a future commit.
Additionally, add a button in the About pane
to launch benchmark sessions. The button will
be available only when advanced setting
`benchmarkDatasetURL` is set and pointing to
a valid dataset.