Notably, make `queryprune` option available only
to filter list authors, until there are guards
against bad filters in some future and until the
option syntax and behavior is fully settled.
Instances of `queryprune` in filter lists will be
compiled, however instances of `queryprune` in
_"My filters"_ will be ignored unless users
indicated they are a filter list author.
The important bit is now considered an action bit
so that there is no more a need for the `important`
property in the parser. The modify bit is now
considered a realm bit.
When the modify bit is set, the action bits become
available to be used to further narrow the realm.
This could be useful in the future if we want to
spread the population of modifier filters across
different buckets.
Reusing the same iterator instance for all cases
of `domain=` option parsing should reduce memory
churning.
Additonally, fine tune regex used to extract
valid token from regex-based filters to increase
likelihood of being able to extract a valid
token.
Reported internally by @gwarser.
In rare occasion, a timing issue could cause uBO to redirect
to a web accessible resource meant to be used for another
network request. This is a regression introduced with the
following commit:
- 2e5d32e967
Additionally, I identified another issue which would cause
cached redirection to fail when a cache entry with redirection
to a web accessible resource was being reused, an issue which
could especially affect pages which are generated dynamically
(i.e. without full page reload).
filterUnits is now treated as a buffer which is
pre-allocated and which will grow in chunks so as
to minimize memory allocations. Entries are never
released, just null-ed.
Additionally, move urlTokenizer into the static
network filtering engine, since it's not used
anywhere else.
Notably, defer the post-load optimization operations
to a few seconds after the filters have been all
loaded in memory -- this is not a critical step for
the filtering engine to work properly, hence this
can be delayed in order to ensure readiness as soon
as possible.
Most notably, the `denyallow=` option now requires
the presence of a valid `domain=` option to not be
rejected.
Using `denyallow=` without narrowing down using the
`domain=` option leads to catastrophic blocking
behvior, hence the requirement for a valid `domain=`
option.
Related commit:
- b265f2644d
The optimization in the commit above was meant to
improve the performance of lookup operations of
modifier filters, but I forgot to enable the
optimisation for that class of filters.
This means this commit brings another significant
performance gain on top of the previous commit, as
shown by the built-in benchmark.
Additionally a few minor code rearrangements.
Performance-related work.
There is a fair number of filters which can't be tokenized
in uBO's own filter lists. Majority of those filters also
declare a `domain=` option, examples:
*$script,redirect-rule=noopjs,domain=...
*$script,3p,domain=...,denyallow=...
*$frame,3p,domain=...
Such filters can be found in uBO's asset viewer using the
following search expression:
/^\*?\$[^\n]*?domain=/
Some filter buckets will contain many of those filters, for
instance one of the bucket holding untokenizable `redirect=`
filters has over 170 entries, which must be all visited when
collating all matching `redirect=` filters.
When a bucket contains many such filters, I found that it's
worth to extract all the non-negated hostname values from
`domain=` options into a single hntrie and perform a pre-test
at match() time to find out whether the current origin of a
network request matches any one of the collected hostnames,
so as to avoid iterating through all the filters.
Since there is rarely a match() for vast majority of network
requests with `domain=` option, this pre-test saves a good
amount of work, and this is measurable with the built-in
benchmark.