There is a lot of asynchronicity in the auto-update code, and
the fix here is to detect and then fix instances of out-of-sync
state between a cached filter list and its metadata, which is
stored separately.
When done compiling, force CSSTree to parse an empty string, so
as to ensure it doesn't keep a reference to that string.
Typically, the string passed to CSSTree is a small slice of a
larger string which is a whole filter list. This means that
holding a reference to the sliced string causes the JS engine
to keep in memory the whole filter list last parsed.
Reference documentation:
https://adguard.com/kb/general/ad-filtering/create-own-filters/#replace-modifier
This is a network filter option which can only be loaded from a
trusted source.
Since this filter is about modifying the response body, it currently
only works in Firefox.
As discussed with filter list maintainers.
Manual update of one or more lists will cause the most recent version
of these lists to be fetched from the "origin" server, and since the
lists from "origin" servers cannot be updated through differential
update, the lists will be subsequently updated according to their
`Expires` directive.
When the lists are auto-updated, the "CDN" servers will be used,
and as a result the lists will start to be updated through
differential updates every 6 hours (currently).
Thus it is recommended and optimal to let the lists auto-update,
since you will benefit from a much shorter delay to get up-to-date
lists (i.e. every 6 hours instead of every 6 days).
You can force the auto-updater to fetch all the lists by clicking
"Purge all caches", then restart uBO without clicking "Update".
This will cause uBO to perform an emergency auto-update at restart
time, after which you will have all the lists which are candidates
for differential update.
The "Update now" button in the "Support" pane will also cause lists
to be fetched from their "origin" server.
Related discussion:
https://github.com/ameshkov/diffupdates
The benefits of diff-patching filter lists are a much shorter update
schedule and significantly less bandwidth consumed.
At the moment, only default filter lists are subject to being
diff-patched.
External filter lists can make their lists diff-patchable by
following the specification linked above.
Only filter lists fetched by the auto-updater are candidates for
diff-patching.
Forcing a manual update of the filter lists will prevent the
diff-patcher from kicking in until one or more lists are
auto-updated.
Some back-of-the-envelope calculations regarding the load on free
CDN solutions used by uBO to distribute its own filter lists:
Currently, for each CDN (with lists updating after days):
~560 M req/month, ~78 TB/month
With diff-patching lists on a 6-hour schedule:
~390 M req/month, 1 TB/month
Those estimates were done according to statistics shown by
jsDelivr, which is one of 4 CDNs picked randomly when a list
updates:
https://www.jsdelivr.com/package/gh/uBlockOrigin/uAssetsCDN?tab=stats
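As a rough reading of the estimates above, the average payload per
request can be derived by dividing the monthly bandwidth by the monthly
request count. The sketch below only does that arithmetic; it assumes
decimal units (1 TB = 10^12 bytes), which the figures above do not
specify, and `bytesPerRequest` is a name made up for illustration.

```javascript
// Average bytes per request from the monthly estimates quoted above.
// Assumption: decimal units (1 TB = 1e12 bytes, 1 M = 1e6 requests).
function bytesPerRequest(terabytesPerMonth, millionRequestsPerMonth) {
    return (terabytesPerMonth * 1e12) / (millionRequestsPerMonth * 1e6);
}

const fullFetch = bytesPerRequest(78, 560);   // lists updating after days
const diffPatch = bytesPerRequest(1, 390);    // 6-hour diff-patching schedule

console.log(`full fetch: ~${Math.round(fullFetch / 1000)} kB/request`);
console.log(`diff patch: ~${(diffPatch / 1000).toFixed(1)} kB/request`);
```

Under these estimates, the average request shrinks from roughly 139 kB
to under 3 kB.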
I worked through some of the websites listed in the google-ima shim
script issue[1], to see what was going wrong. It turned out the
addEventListener method supports an optional context Object, which is
bound to the listener if provided. Some websites make use of that,
and then break when `this` is not bound correctly when events are
dispatched.
See also https://github.com/duckduckgo/tracker-surrogates/pull/24
1 - https://github.com/uBlockOrigin/uBlock-issues/issues/2265
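To illustrate the `this` binding issue described above, here is a
minimal sketch of an event dispatcher which honors an optional context
object. The class name `EventHandlerSketch` and the argument order
(type, callback, capture, context) are assumptions made for this
example, not the actual shim code.

```javascript
// Minimal event dispatcher sketch. `EventHandlerSketch` is a hypothetical
// name, and the argument order is an assumption of this example.
class EventHandlerSketch {
    constructor() {
        this.listeners = new Map();
    }
    // `context` is optional; when provided it becomes `this` for the listener.
    addEventListener(type, callback, capture = false, context = undefined) {
        if (this.listeners.has(type) === false) {
            this.listeners.set(type, []);
        }
        this.listeners.get(type).push({ callback, context });
    }
    dispatch(type, event) {
        for (const { callback, context } of this.listeners.get(type) || []) {
            // Fall back to the dispatcher itself when no context was given.
            callback.call(context || this, event);
        }
    }
}

const player = { name: 'player' };
const results = [];
const handler = new EventHandlerSketch();
handler.addEventListener('loaded', function(ev) {
    // `this` is expected to be `player`, not the dispatcher.
    results.push(`${this.name}:${ev.type}`);
}, false, player);
handler.dispatch('loaded', { type: 'loaded' });
console.log(results);
```

Without the `callback.call(context || this, event)` step, a listener
relying on `this` would see the wrong object when the event is
dispatched.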
Related discussion:
https://github.com/uBlockOrigin/uBlock-issues/discussions/2895
Changes:
The content of the _My filters_ pane is now considered untrusted by
default, and only uBO's own lists are now trusted by default.
It has been observed that too many people will readily copy-paste
filters from random sources. Copy-pasting filters which require trust
represents a security risk to users with no understanding of how the
filters work and their potential abuse.
Using a filter which requires trust in a filter list from an untrusted
source will cause the filter to be invalid, i.e. shown as an error.
A new advanced setting has been added to control which lists are
considered trustworthy: `trustedListPrefixes`, which is a space-
separated list of tokens. Examples of possible values:
- `ublock-`: trust only uBO lists, exclude everything else including
content of _My filters_ (default value)
- `ublock- user-`: trust uBO lists and content of _My filters_
- `-`: trust no list, essentially disabling all filters requiring
trust (admins or people who don't trust us may want to use this)
One can also decide to trust lists maintained elsewhere. For example,
for stock AdGuard lists add ` adguard-`. To trust stock EasyList lists,
add ` easylist-`.
To trust a specific regional stock list, look up its token in
assets.json and add it to `trustedListPrefixes`.
The matching is made with String.startsWith(), which is why `ublock-`
matches all of uBO's own filter lists.
This also allows trusting imported lists, for example add
` https://filters.adtidy.org/extension/ublock/filters/` to trust all
non-stock AdGuard lists.
Add the complete URL of a given imported list to trust only that one
list.
URLs not starting with `https://` or `file:///` will be rejected,
i.e. `http://example.org` will be ignored.
Invalid URLs are rejected.
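The rules above can be sketched as follows. This is an illustration
under stated assumptions rather than the actual implementation: the
function names `parseTrustedPrefixes` and `isTrustedList` and the
`user-filters` list token are made up for this example.

```javascript
// Sketch of prefix-based trust matching. Function names are hypothetical.
function parseTrustedPrefixes(setting) {
    return setting.split(/ +/).filter(prefix => {
        if (prefix === '') { return false; }
        // Plain tokens such as `ublock-` are kept as-is.
        if (prefix.includes('://') === false) { return true; }
        // URL prefixes must use `https://` or `file:///`.
        if (prefix.startsWith('file:///')) { return true; }
        if (prefix.startsWith('https://') === false) { return false; }
        // Invalid URLs are rejected.
        try {
            return new URL(prefix).hostname !== '';
        } catch {
            return false;
        }
    });
}

// A list is trusted when its token or URL starts with any trusted prefix.
function isTrustedList(assetKey, prefixes) {
    return prefixes.some(prefix => assetKey.startsWith(prefix));
}

const prefixes = parseTrustedPrefixes(
    'ublock- http://example.org https://filters.adtidy.org/extension/ublock/filters/'
);
console.log(prefixes);
console.log(isTrustedList('ublock-filters', prefixes));
console.log(isTrustedList('user-filters', prefixes));
```

Note that a value of `-` survives parsing but matches nothing, since no
list token starts with `-`.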
Related issue:
https://github.com/uBlockOrigin/uBlock-issues/issues/2896
TODO: Eventually, distinguish between filtering profile increasing
or decreasing so as to avoid flushing caches when increasing
filtering, which should not affect the scriptlets cache.
(In addition to the already supported single and double quotes.)
The parsing of (optionally) quoted arguments from an argument
list has been spun off into a standalone helper in order to
be reused in other parts of the parser eventually.
The `urltransform` option allows redirecting a non-blocked network
request to another URL. There are restrictions on its usage:
- it requires a trusted source -- thus uBO-maintained lists or user
filters
- the `urltransform` value must start with a `/`
If at least one of these conditions is not fulfilled, the filter
will be invalid and rejected.
The requirement to start with `/` is to enforce that only the path
part of a URL can be modified, thus ensuring the network request
is redirected to the same scheme and authority (as defined at
https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Syntax).
Usage example (redirect requests for CSS resources to a non-existing
resource, for demonstration purposes):
||iana.org^$css,urltransform=/notfound.css
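The effect of the path-only restriction can be sketched with the
standard `URL` API. This is only an illustration: `applyUrlTransform`
is a hypothetical helper, the request URL is invented, and rejecting
values starting with `//` is an extra precaution of this sketch, not
something stated above.

```javascript
// Sketch only: `applyUrlTransform` is a hypothetical helper name.
function applyUrlTransform(requestUrl, value) {
    // The value must start with `/`. Rejecting `//` is an extra assumption
    // of this sketch, to avoid anything resembling an authority.
    if (value.startsWith('/') === false || value.startsWith('//')) {
        return undefined;
    }
    const url = new URL(requestUrl);
    // Only the path is replaced; scheme and authority are left untouched.
    url.pathname = value;
    return url.href;
}

const transformed = applyUrlTransform(
    'https://www.iana.org/_css/site.css',   // made-up request URL
    '/notfound.css'
);
console.log(transformed);
```

Because only `pathname` is assigned, the transformed request always
targets the same scheme and authority as the original one.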
The name of this option is inspired by the DNR API:
https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/declarativeNetRequest/URLTransform
This commit required bringing the concept of "trusted source" to
the static network filtering engine.
As per discussion with uBO volunteers.
Volunteers offering support for uBO will be able to craft links with
specially formed URLs, which once clicked will cause uBO to automatically
force an update of specified filter lists.
The URL must be crafted as shown in the example below:
https://ublockorigin.github.io/uAssets/update-lists.html?listkeys=ublock-filters,easylist
Where the `listkeys` parameter is a comma-separated list of tokens
corresponding to filter lists. If a token does not match an enabled
filter list, it will be ignored.
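A minimal sketch of how such a link could be interpreted, assuming the
set of enabled lists is available as a `Set` of tokens. The helper name
`listKeysFromUpdateLink` and the content of `enabled` are made up for
this example.

```javascript
// Sketch: extract list keys from an update link and keep only enabled lists.
// `enabledLists` stands in for whatever the extension tracks internally.
function listKeysFromUpdateLink(link, enabledLists) {
    const url = new URL(link);
    const raw = url.searchParams.get('listkeys') || '';
    return raw.split(',')
        .map(key => key.trim())
        .filter(key => key !== '' && enabledLists.has(key));
}

// Hypothetical state: `easylist` is not enabled here, so it is ignored.
const enabled = new Set(['ublock-filters', 'ublock-privacy']);
const keys = listKeysFromUpdateLink(
    'https://ublockorigin.github.io/uAssets/update-lists.html?listkeys=ublock-filters,easylist',
    enabled
);
console.log(keys);
```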
The ability to update filter lists through a specially crafted link
is available only on uBO's own support sites:
- https://github.com/uBlockOrigin/
- https://reddit.com/r/uBlockOrigin/
- https://ublockorigin.github.io/
Additionally, a visual cue has been added in the "Filter lists" pane
to easily spot the filter lists which have been recently updated, where
"recently" is currently defined as less than an hour ago.
Additionally, the versioning scheme for uBOL has been finalized. Since
most updates will simply be ruleset updates, the version will from now
on reflect the date at which the extension package was created:
year.month.day.minutes
So for example:
2023.8.19.690
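One plausible way to derive such a version string is sketched below. It
assumes the last field counts minutes since midnight and that UTC is
used; neither is stated above, and `versionFromDate` is a hypothetical
name.

```javascript
// Sketch: derive a `year.month.day.minutes` version from a date.
// Assumptions: UTC is used, and `minutes` counts minutes since midnight.
function versionFromDate(date) {
    const minutes = date.getUTCHours() * 60 + date.getUTCMinutes();
    return [
        date.getUTCFullYear(),
        date.getUTCMonth() + 1,   // getUTCMonth() is zero-based
        date.getUTCDate(),
        minutes,
    ].join('.');
}

// 11:30 UTC is 690 minutes after midnight.
const version = versionFromDate(new Date(Date.UTC(2023, 7, 19, 11, 30)));
console.log(version);
```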
Related issue:
https://github.com/uBlockOrigin/uBlock-issues/issues/2778
Regression from:
bb41d9594f
The regression occurred because the modified code made the assumption
that a leading combinator would never be preceded by whitespace, while
the parser didn't prevent this.
The parser has been fixed to ensure there is never a leading
whitespace in a selector.
Related issue:
https://github.com/uBlockOrigin/uBlock-issues/issues/2773
The `randomize` parameter introduced in https://github.com/gorhill/uBlock/commit/418087de9c
is now named `directive`, and besides the `true` value, which is meant
to respond with a random 10-character string, it can now take the
following value:
war:[web_accessible_resource name]
This mocks the XHR response with a web accessible resource. For
example:
piquark6046.github.io##+js(no-xhr-if, adsbygoogle.js, war:googlesyndication_adsbygoogle.js)
This will cause the XHR performed by the webpage to resolve to the
content of `/web_accessible_resources/googlesyndication_adsbygoogle.js`.
Should the resource not exist, the empty string will be returned.
Additionally:
Use `export UBO_VERSION=local` at the console to build the MV3 extension
using the current version of the uBO code base. By default, the version is
taken from `./platform/mv3/ubo-version` and is usually set to the last
stable release.
Prepend a pattern with `!` to test for unmatched patterns in the
stack trace. This applies to scriptlet parameters whose purpose
is to test against the stack, i.e. `aost` and `json-prune`.
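The negation can be sketched as below. This is not the scriptlet code:
`stackMatches` is a hypothetical helper, and treating a `/.../` pattern
as a regex and anything else as a plain substring is an assumption of
this example.

```javascript
// Sketch: test a stack trace against a pattern, with optional `!` negation.
// Assumption: a pattern wrapped in slashes is a regex, otherwise a substring.
function stackMatches(pattern, stack) {
    const negated = pattern.startsWith('!');
    const needle = negated ? pattern.slice(1) : pattern;
    const isRegex = needle.length > 2 &&
        needle.startsWith('/') && needle.endsWith('/');
    const found = isRegex
        ? new RegExp(needle.slice(1, -1)).test(stack)
        : stack.includes(needle);
    // With `!`, the test succeeds only when the pattern is NOT found.
    return negated ? !found : found;
}

const stack = 'Error\n    at foo (https://example.com/ads.js:1:2)';
console.log(stackMatches('ads.js', stack));
console.log(stackMatches('!ads.js', stack));
```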
Additionally, dropped support for JSON notation in favor of
optional variable arguments notation.
Related discussion:
- https://github.com/uBlockOrigin/uBlock-discussions/discussions/789#discussioncomment-6520330
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/2730
CSS selectors used in cosmetic filtering are normalized in order
to ignore non-functional differences. For instance:
example.org##body p
example.org#@#body p
The first cosmetic filter should be excepted by the second one,
but this was not the case because the fast path used to compile
common CSS selectors was not causing normalization to take
place.
The fix is to ensure that the fast path used to compile most
common CSS selectors is taken only when in presence of already
normalized CSS selectors.
The change allows better parsing of AdGuard filters with the `replace=`
option when the value of the `replace=` option contains a dollar
sign character `$`. uBO will still reject these filters but will
better identify which dollar sign `$` is the real filter option
anchor.
Reference:
- https://adguard.com/kb/general/ad-filtering/create-own-filters/#noop-modifier
uBO already supported the noop filter option `_` to allow filter
authors to resolve possible ambiguities arising when crafting network
filters with many options.
AdGuard extended the semantics of the `_` option to also resolve
readability issues by supporting multiple instances of the `_` option
in a single filter, and also by supporting any number of consecutive
`_` in a single noop filter option.
Reference:
https://adguard.com/kb/general/ad-filtering/create-own-filters/#conditions-directive
This commit should make uBO fully compatible with the `!#if`
directives found throughout AdGuard's filter lists.
Additionally, added the new `!#else` directive for convenience
to filter list authors:
!#if cap_html_filtering
example.com##^script:has-text(fakeAd)
!#else
example.com##+js(rmnt, script, fakeAd)
!#endif
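The behavior of `!#if`/`!#else`/`!#endif` can be sketched with a small
line-based preprocessor. This is a simplified illustration, not uBO's
parser: `preprocess` is a hypothetical name, and conditions are assumed
to be a single token optionally negated with `!`.

```javascript
// Sketch of a line-based preprocessor for `!#if`, `!#else`, `!#endif`.
// Assumption: a condition is a single token, optionally negated with `!`.
function preprocess(text, enabledTokens) {
    const out = [];
    const stack = [];   // one entry per open `!#if` block
    const active = () => stack.every(frame => frame.take);
    for (const line of text.split('\n')) {
        const trimmed = line.trim();
        if (trimmed.startsWith('!#if ')) {
            const expr = trimmed.slice(5).trim();
            const negated = expr.startsWith('!');
            const token = negated ? expr.slice(1) : expr;
            stack.push({ take: enabledTokens.has(token) !== negated });
        } else if (trimmed === '!#else') {
            // Flip the innermost block: keep what was skipped, and vice versa.
            if (stack.length !== 0) {
                const frame = stack[stack.length - 1];
                frame.take = !frame.take;
            }
        } else if (trimmed === '!#endif') {
            stack.pop();
        } else if (active()) {
            out.push(line);
        }
    }
    return out;
}

const list = [
    '!#if cap_html_filtering',
    'example.com##^script:has-text(fakeAd)',
    '!#else',
    'example.com##+js(rmnt, script, fakeAd)',
    '!#endif',
].join('\n');
const withCap = preprocess(list, new Set(['cap_html_filtering']));
const withoutCap = preprocess(list, new Set());
console.log(withCap, withoutCap);
```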
Related issue:
- https://github.com/AdguardTeam/Scriptlets/issues/332
Additionally, uBO's own scriptlet syntax now also accepts quoting
the parameters with either `'` or `"`. This can be used to avoid
having to escape commas when they are present in a parameter.
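A simplified sketch of comma-splitting which honors quotes is shown
below. It is not the parser's actual code: `splitArgs` is a hypothetical
helper, escaped commas are not handled, and anything following a closing
quote up to the next comma is ignored.

```javascript
// Sketch: split a scriptlet argument list on commas, honoring quotes.
// Escaped commas (`\,`) are not handled here.
function splitArgs(argList) {
    const args = [];
    let current = '';
    let quote = '';          // the quote character currently open, if any
    let wasQuoted = false;   // whether the current argument was quoted
    for (const ch of argList) {
        if (quote !== '') {
            if (ch === quote) { quote = ''; } else { current += ch; }
        } else if ((ch === '"' || ch === "'") && current.trim() === '') {
            // A quote only opens at the start of an argument.
            quote = ch;
            current = '';
            wasQuoted = true;
        } else if (ch === ',') {
            args.push(wasQuoted ? current : current.trim());
            current = '';
            wasQuoted = false;
        } else if (wasQuoted === false) {
            current += ch;
        }
    }
    args.push(wasQuoted ? current : current.trim());
    return args;
}

const parsed = splitArgs(`set, 'a, b', "c"`);
console.log(parsed);
```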
The syntax highlighter could throw with some invalid static
network filter patterns. This was caused by the syntax
highlighter still drilling down the pattern parts after
having told codemirror to style the whole pattern as an
error, thus causing the codemirror stream position to go
backward.
Since in uBOL filter lists from various sources are combined into
a single list, there must be a way to turn on/off trust level
inside the resulting combined filter list so as to be able to
validate the trust level of filters requiring trust.
This commit adds new parser directives, understood only by the MV3
compiler, to turn the trust flag on/off internally.
New official name: `no-window-open-if`.
The pattern will now be matched against all arguments passed
to `window.open()`: all the arguments are joined as a single
space-separated string, and the result is used as the target
for matching the pattern.
To enable logging, use the extra parameters approach, i.e.
`log, 1`, which should come after the positional arguments
`pattern`, `delay`, and `decoy`.
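The matching described above can be sketched as follows. This is an
illustration only: `shouldPreventOpen` is a hypothetical helper, and
treating a `/.../` pattern as a regex and anything else as a plain
substring is an assumption of this example.

```javascript
// Sketch: join all `window.open()` arguments and test the pattern.
// Assumption: a pattern wrapped in slashes is a regex, otherwise a substring.
function shouldPreventOpen(pattern, openArgs) {
    const target = openArgs.map(arg => String(arg)).join(' ');
    if (pattern.length > 2 && pattern.startsWith('/') && pattern.endsWith('/')) {
        return new RegExp(pattern.slice(1, -1)).test(target);
    }
    return target.includes(pattern);
}

const openArgs = ['https://ads.example/', '_blank', 'width=1'];
console.log(shouldPreventOpen('_blank', openArgs));
console.log(shouldPreventOpen('/ads\\..*width=/', openArgs));
```

Since the target is the joined string, a single pattern can span
several arguments, as the second call shows.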
Source code of scriptlets is now fetched directly from the uBO
project, so there is no longer a need to keep duplicate
versions of scriptlet code.
All scriptlet filters are now supported.
Some filters with entity-based domain options can be salvaged
when there are also non-entity-based domain options, but since we
are throwing away the entity-based entries, we are only partially
converting to DNR. This commit will log a warning about this
in log.txt. Before this commit, only non-salvageable filters
were logged.
This reflects the _world_ of the MV3 scripting API:
https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/scripting/ExecutionWorld
MAIN: page's world
ISOLATED: extension's content script world
Some scriptlets are best executed in one world or the other, so this
commit allows picking the world in which a scriptlet should execute
(defaults to MAIN).
For instance, the new sed.js scriptlet will now execute in
the ISOLATED world.
At the moment, the only filter lists deemed from a "trusted source"
are uBO-specific filter lists (i.e. "uBlock filters -- ..."), and
the user's own filters from "My filters".
A new scriptlet which can only be used by filter lists from trusted
sources has been introduced: `sed.js`.
The new `sed.js` scriptlet provides the ability to perform
text-level substitutions. Usage:
example.org##+js(sed, nodeName, pattern, replacement, ...)
`nodeName`
The name of the node for which the text content must be substituted.
Valid node names can be found at:
https://developer.mozilla.org/en-US/docs/Web/API/Node/nodeName
`pattern`
A string or regex to find in the text content of the node as the target of
substitution.
`replacement`
The replacement text. Can be omitted if the goal is to delete the text which
matches the pattern. Cannot be omitted if extra pairs of parameters have to be
used (see below).
Optionally, extra pairs of parameters to modify the behavior of the scriptlet:
`condition, pattern`
A string or regex which must be found in the text content of the node
in order for the substitution to occur.
`sedCount, n`
This will cause the scriptlet to stop after n instances of substitution. Since
a mutation observer is used by the scriptlet, it's advised to stop it whenever
it becomes pointless. Defaults to zero, which means the scriptlet never stops.
`tryCount, n`
This will cause the scriptlet to stop after n instances of mutation observer
run (regardless of whether a substitution occurred). Defaults to zero, which
means the scriptlet never stops.
`log, 1`
This will cause the scriptlet to output information at the console, useful as
a debugging tool for filter authors. The logging ability is supported only
in the dev build of uBO.
Examples of usage:
example.com##+js(sed, script, /devtoolsDetector\.launch\(\)\;/, , sedCount, 1)
example.com##+js(sed, #text, /^Advertisement$/)
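The substitution step itself can be sketched on a plain string, leaving
out the DOM and mutation observer parts. This is not the scriptlet's
code: `toRegex` and `sedText` are hypothetical helpers, and replacing
all occurrences (global flag) is an assumption of this example.

```javascript
// Sketch of the substitution step only, on a plain string.
// Assumption: `/.../` denotes a regex, anything else a literal string.
function toRegex(pattern, flags = '') {
    if (pattern.length > 2 && pattern.startsWith('/') && pattern.endsWith('/')) {
        return new RegExp(pattern.slice(1, -1), flags);
    }
    // Escape regex metacharacters so the string matches literally.
    return new RegExp(pattern.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'), flags);
}

// `replacement` defaults to the empty string, i.e. matched text is deleted.
// `condition`, when given, must be found for the substitution to occur.
function sedText(text, pattern, replacement = '', condition = '') {
    if (condition !== '' && toRegex(condition).test(text) === false) {
        return text;
    }
    // Assumption: all occurrences are replaced.
    return text.replace(toRegex(pattern, 'g'), replacement);
}

console.log(sedText('Advertisement', '/^Advertisement$/'));
console.log(sedText('foo', 'o', '0'));
console.log(sedText('foo', 'o', '0', 'bar'));
```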
Related feedback:
- https://www.reddit.com/r/uBlockOrigin/comments/13enzvv/
When assessing which default lists to disable/enable after
updating from 1.48.x to 1.49.x, uBO has to ignore imported
lists, as these do not have an `off` property -- the
non-existence of this property was used to determine whether
a list was default or not. There needs to be an extra test for
whether the list is imported or not.
As discussed internally with filter list maintainers.
Additionally, added a search field to filter out lists. This
is still a work in progress, no need to open issues about this,
I am aware of what is missing (i18n, more tags, etc.)
Related discussion:
- https://github.com/uBlockOrigin/uBlock-issues/discussions/2582
If any built-in filter list was last updated more than 2 hours
ago, the "Report a filter issue" page will ask the user to update
their filter lists and then verify that the issue still exists.
Once filter lists are updated, the troubleshooting information
will reflect the change in update time.
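The staleness check can be sketched as below. The field names
(`key`, `builtin`, `lastUpdate`) and the helper `staleBuiltinLists` are
made up for this example and do not reflect uBO's internal data
structures.

```javascript
// Sketch: find built-in lists last updated more than two hours ago.
// The `key`, `builtin` and `lastUpdate` field names are hypothetical.
const STALE_AFTER_MS = 2 * 60 * 60 * 1000;

function staleBuiltinLists(lists, now = Date.now()) {
    return lists
        .filter(list => list.builtin && now - list.lastUpdate > STALE_AFTER_MS)
        .map(list => list.key);
}

const now = Date.UTC(2023, 4, 15, 12, 0);
const stale = staleBuiltinLists([
    { key: 'ublock-filters', builtin: true, lastUpdate: now - 3 * 3600 * 1000 },
    { key: 'easylist', builtin: true, lastUpdate: now - 1 * 3600 * 1000 },
    { key: 'https://example.org/list.txt', builtin: false, lastUpdate: 0 },
], now);
console.log(stale);
```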