This commit is a rewrite of the static filtering parser into a
tree-based data structure, for easier maintenance and better
abstraction of parsed filters.
This simplifies greatly syntax coloring of filters and also
simplify extending filter syntax.
The minimum version of Chromium-based browsers has been raised
to version 73 because of usage of String.matchAll().
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1791
The following case of invalid syntax was not reported as
invalid by the syntax highlighter:
... example.com image ...
With dynamic filtering, there can't be a specific
hostname when a specific type is used, or a
specific type when a specific hostname is used, one
or the other must be `*`.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1690
New procedural operator: `:matches-path(...)`
Description: this is a all-or-nothing passthrough operator, which
on/off behavior is dictated by whether the argument match the
path of the current location. The argument can be either plain
text to be found at any position in the path, or a literal regex
against which the path is tested.
Whereas cosmetic filters can be made specific to whole domain,
the new `:matches-path()` operator allows to further narrow
the specificity according to the path of the current document
lcoation.
Typically this procedural operator is used as first operator in
a procedural cosmetic filter, so as to ensure that no further
matching work is performed should there be no match against the
current path of the current document location.
Example of usage:
example.com##:matches-path(/shop) p
Will hide all `p` elements when visiting `https://example.com/shop/stuff`,
but not when visiting `https://example.com/` or any other page
on `example.com` which has no instance of `/shop` in the path part
of the URL.
The code exported to nodejs package was revised to use modern
JavaScript syntax. A few issues were fixed at the same time.
The exported classes are:
- DynamicHostRuleFiltering
- DynamicURLRuleFiltering
- DynamicSwitchRuleFiltering
These related to the content the of "My rules" pane in the
uBlock Origin extension.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1664
The changes are enough to fulfill the related issue.
A new platform has been added in order to allow for building
a NodeJS package. From the root of the project:
./tools/make-nodejs
This will create new uBlock0.nodejs directory in the
./dist/build directory, which is a valid NodeJS package.
From the root of the package, you can try:
node test
This will instantiate a static network filtering engine,
populated by easylist and easyprivacy, which can be used
to match network requests by filling the appropriate
filtering context object.
The test.js file contains code which is typical example
of usage of the package.
Limitations: the NodeJS package can't execute the WASM
versions of the code since the WASM module requires the
use of fetch(), which is not available in NodeJS.
This is a first pass at modularizing the codebase, and
while at it a number of opportunistic small rewrites
have also been made.
This commit requires the minimum supported version for
Chromium and Firefox be raised to 61 and 60 respectively.
The syntax to remove response header is a special case
of HTML filtering, whereas the response headers are
targeted, rather than the response body:
example.com##^responseheader(header-name)
Where `header-name` is the name of the header to
remove, and must always be lowercase.
The removal of response headers can only be applied to
document resources, i.e. main- or sub-frames.
Only a limited set of headers can be targeted for
removal:
location
refresh
report-to
set-cookie
This limitation is to ensure that uBO never lowers the
security profile of web pages, i.e. we wouldn't want to
remove `content-security-policy`.
Given that the header removal occurs at onHeaderReceived
time, this new ability works for all browsers.
The motivation for this new filtering ability is instance
of website using a `refresh` header to redirect a visitor
to an undesirable destination after a few seconds.
This commit fixes mouse double-click-and-drag operations,
which was broken due to the implementation of a custom
word selection in the filter list editor/viewer.
Regex-based static network filters are those most likely to
cause performance degradation, and as such the best guard
against undue performance degradation caused by regex-based
filters is the ability to extract valid and good tokens
from regex patterns.
This commit introduces a complete regex parser so that the
static network filtering engine can now safely extract
tokens regardless of the complexity of the regex pattern.
The regex parser is a library imported from:
https://github.com/foo123/RegexAnalyzer
The syntax highlighter adds an underline to regex-based
filters as a visual aid to filter authors so as to avoid
mistakenly creating regex-based filters. This commit
further colors the underline as a warning when a regex-based
filter is found to be untokenizable.
Filter list authors are invited to spot these untokenizable
regex-based filters in their lists to verify that no
mistake were made for those filters, causing them to be
untokenizabke. For example, what appears to be a mistake:
/^https?:\/\/.*\/sw.js?.[a-zA-Z0-9%]{50,}/
Though the mistake is minor, the regex-based filter above
is untokenizable as a result, and become tokenizable when
the `.` is properly escaped:
/^https?:\/\/.*\/sw\.js?.[a-zA-Z0-9%]{50,}/
Filter list authors can use this search expression in the
asset viewer to find instances of regex-based filters:
/^(@@)?\/[^\n]+\/(\$|$)/
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/857
The recognized resources are:
- abp-resource:blank-mp3
- abp-resource:blank-js
ABP's tokens are excluded from auto-complete so as to not
get in the way of uBO's filter list maintainers.
All matching `redirect-rule` directives will now be reported
in the logger, instead of just the effective one.
The highest-ranked redirect directive will be the one
effectively used for redirection. This way filter list
authors can see whether a lower priority redirect is
being overriden by a higher priority one.
The default priority has been changed to 10, so as to allow
more leeway to create lower ranked redirect directives.
Additonally, rendering of redirect directives with explicit
priority has been fixed in the logger, they will no longer
be rendered as unknown redirect tokens.
New filter options
==================
Strict partyness: `1P`, `3P`
----------------------------
The current options 1p/3p are meant to "weakly" match partyness, i.e. a
network request is considered 1st-party to its context as long as both the
context and the request share the same base domain.
The new partyness options are meant to check for strict partyness, i.e. a
network request will be considered 1st-party if and only if both the context
and the request share the same hostname.
For examples:
- context: `www.example.org`
- request: `www.example.org`
- `1p`: yes, `1P`: yes
- `3p`: no, `3P`: no
- context: `www.example.org`
- request: `subdomain.example.org`
- `1p`: yes, `1P`: no
- `3p`: no, `3P`: yes
- context: `www.example.org`
- request: `www.example.com`
- `1p`: no, `1P`: no
- `3p`: yes, `3P`: yes
The strict partyness options will be visually emphasized in the editor so as
to prevent mistakenly using `1P` or `3P` where weak partyness is meant to be
used.
Filter on response headers: `header=`
-------------------------------------
Currently experimental and under evaluation. Disabled by default, enable by
toggling `filterOnHeaders` to `true` in advanced settings.
Ability to filter network requests according to whether a specific response
header is present and whether it matches or does not match a specific value.
For example:
*$1p,3P,script,header=via:1\.1\s+google
The above filter is meant to block network requests which fullfill all the
following conditions:
- is weakly 1st-party to the context
- is not strictly 1st-party to the context
- is of type `script`
- has a response HTTP header named `via`, which value matches the regular
expression `1\.1\s+google`.
The matches are always performed in a case-insensitive manner.
The header value is assumed to be a literal regular expression, except for
the following special characters:
- to anchor to start of string, use leading `|`, not `^`
- to anchor to end of string, use trailing `|`, not `$`
- to invert the test, use a leading `!`
To block a network request if it merely contains a specific HTTP header is
just a matter of specifying the header name without a header value:
*$1p,3P,script,header=via
Generic exception filters can be used to disable specific block `header=`
filters, i.e. `@@*$1p,3P,script,header` will override the block `header=`
filters given as example above.
Dynamic filtering's `allow` rules override block `headers=` filters.
Important: It is key that filter authors use as many narrowing filter options
as possible when using the `header=` option, and the `header=` option should
be used ONLY when other filter options are not sufficient.
More documentation justifying the purpose of `header=` option will be
provided eventually if ever it is decided to move it from experimental to
stable status.
To be decided: to restrict usage of this filter option to only uBO's own
filter lists or "My filters".
Changes
=======
Fine tuning `queryprune=`
-------------------------
The following changes have been implemented:
The special value `*` (i.e. `queryprune=*`) means "remove all query
parameters".
If the `queryprune=` value is made only of alphanumeric characters
(including `_`), the value will be internally converted to regex equivalent
`^value=`. This ensures a better future compatibility with AdGuard's
`removeparam=`.
If the `queryprune=` value starts with `!`, the test will be inverted. This
can be used to remove all query parameters EXCEPT those who match the
specified value.
Other
-----
The legacy code to test for spurious CSP reports has been removed. This
is no longer an issue ever since uBO redirects to local resources through
web accessible resources.
Notes
=====
The following new and recently added filter options are not compatible with
Chromium's manifest v3 changes:
- `queryprune=`
- `1P`
- `3P`
- `header=`
The auto-complete feature in the _"My filters"_ pane will
use hostname/domain from the set of opened tabs to assist
in entering values for `domain=` option. This also works
for the implict `domain=` option ṗrepending static extended
filters.
Notably, make `queryprune` option available only
to filter list authors, until there are guards
against bad filters in some future and until the
option syntax and behavior is fully settled.
Instances of `queryprune` in filter lists will be
compiled, however instances of `queryprune` in
_"My filters"_ will be ignored unless users
indicated they are a filter list author.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/760
The purpose of this new network filter option is to remove
query parameters form the URL of network requests.
The name `queryprune` has been picked over `querystrip`
since the purpose of the option is to remove some
parameters from the URL rather than all parameters.
`queryprune` is a modifier option (like `csp`) in that it
does not cause a network request to be blocked but rather
modified before being emitted.
`queryprune` must be assigned a value, which value will
determine which parameters from a query string will be
removed. The syntax for the value is that of regular
expression *except* for the following rules:
- do not wrap the regex directive between `/`
- do not use regex special values `^` and `$`
- do not use literal comma character in the value,
though you can use hex-encoded version, `\x2c`
- to match the start of a query parameter, prepend `|`
- to match the end of a query parameter, append `|`
`queryprune` regex-like values will be tested against each
key-value parameter pair as `[key]=[value]` string. This
way you can prune according to either the key, the value,
or both.
This commit introduces the concept of modifier filter
options, which as of now are:
- `csp=`
- `queryprune=`
They both work in similar way when used with `important`
option or when used in exception filters. Modifier
options can apply to any network requests, hence the
logger reports the type of the network requests, and no
longer use the modifier as the type, i.e. `csp` filters
are no longer reported as requests of type `csp`.
Though modifier options can apply to any network requests,
for the time being the `csp=` modifier option still apply
only to top or embedded (frame) documents, just as before.
In some future we may want to apply `csp=` directives to
network requests of type script, to control the behavior
of service workers for example.
A new built-in filter expression has been added to the
logger: "modified", which allow to see all the network
requests which were modified before being emitted. The
translation work for this new option will be available
in a future commit.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1134
Double-clicking on...
... a filter option will cause the option to be
wholly selected, including `=[value]` if present;
... a value assigned to a filter option will cause
the value to be wholly selected, except when the
value is a hostname/entity, in which case all the
labels from the cursor position to the right-most
label will be selected.
This allows to bring in all the benefits of
syntax highlighting and enhanced editing
features in the element picker, like auto-
completion, etc.
This is also a necessary step to possibly solve
the following issue:
- https://github.com/gorhill/uBlock/issues/2035
Additionally, incrementally improved the behavior
of uBO's custom CodeMirror static filtering syntax
mode when double-clicking somewhere in a static
extended filter:
- on a class/id string will cause the whole
class/id string to be selected, including the
prepending `.`/`#`.
- somewhere in a hostname/entity will cause all
the labels from the cursor position to the
right-most label to be selected (subject to
change/fine-tune as per feedback of filter
list maintainers).
Related feedback:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1134#issuecomment-679421316
Double-cliking on a URL will cause the whole URL to be
selected, thus making it easier to navigate to this
URL (through your browser "Open in new tab" entry in
contextual menu).
Unrecognized scriptlet names will be highlighted so as
to warn that the filter is not going to be effective.
Added a dotted box around found text occurrences,
as just pale yellow to highlight the text is not
enough to visually distinguish from surrounding text.
Iterating through found text occurrences will now
ensure they are vertically positioned in the middle
of the editor.
Code review following latest changes.
Also, move the input field to the left so that it
renders properly on smaller displays and does not
jump around when the result position/count numbers
change.
This also makes it easier to add more functionality
to the editor's toolbar in the future.
Related commit:
- 23332400f5
Since the search worker can go away after its time-to-live
elapsed, we may need to pass again the haystack on which
search operations are performed.
Before this commit, CodeMirror's add-on for search occurrences
was limited to find at most 1000 first occurrences, because of
performance considerations.
This commit removes this low limit by having the search
occurrences done in a dedicated worker. The limit is now
time-based, and highly unlikely to ever be hit under normal
condition.
With this change, all search occurrences are gathered,
and as a result:
- All occurrences are reported in the scrollbar instead of
just the 1,000 first
- The total count of all occurrences is now reported, instead
of capping at "1000+".
- The current occurrence rank at the cursor or selection
position is now reported -- this was not possible to report
this before.
The number of occurrences is line-based, it's not useful to
report finer-grained occurences in uBO.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1134
CodeMirror's code folding reference:
- https://codemirror.net/doc/manual.html#addon_foldcode
This commit adds support for code-folding to the filter
list editor/viewer.
The following blocks of code are foldable by clicking the
corresponding marker in the gutter:
- !#if/#endif blocks
- !#include blocks
Addtionally, the following changes:
- The `!#include` line is now preserved when importing a
sublist
- The `!#if` directives will be syntax-colored according
to whether they evaluate to true or false on the current
platform
- Double-clicking on a foldable line in the gutter will
select the content of the foldable block
- Minor visual improvement to matching brackets
Auto-completion will work only for uBO's own
tokens, compatibility-related tokens[1] will not be
taken into account for auto-completion.
The reason is to not have the compatibility-related
tokens get in the way of auto-completion in order
to not inconvenience uBO's filter list maintainers.
[1] `adguard_ext_chromium`, `adguard_ext_firefox`,
etc.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1134
Invalid values for `!#if ...` will be highlighted as errors.
Auto completion is now supported for both the directives
themselves and the valid values for `!#if ...`.
For examples, when pressing ctrl-space:
- `!#e` will auto-complete to `!#endif`
- `!#i` will offer to choose between `!#if ` or `!#include `
- `!#if fir` will auto-complete to `!#if env_firefox`
Additionally, support for some of AdGuard preparsing
directives, i.e. `!#if adguard` is now a valid and will be
honoured -- it always evaluate to `false` in uBO.