Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1008
This commit adds support entity-matching in the filter
option `domain=`. Example:
pattern$domain=google.*
The `*` above is meant to match any suffix from the Public
Suffix List. The semantic is exactly the same as the
already existing entity-matching support in static
extended filtering:
- https://github.com/gorhill/uBlock/wiki/Static-filter-syntax#entity
Additionally, in this commit:
Fix cases where "just-origin" filters of the form `|http*://`
were erroneously normalized to `|http://`. The proper
normalization of `|http*://` is `*`.
Add support to store hostname strings into the character
buffer of a hntrie container. As of commit time, there are
5,544 instances of FilterOriginHit, and 732 instances of
FilterOriginMiss, which filters require storing/matching a
single hostname string. Those strings are now stored in the
character buffer of the already existing origin-related
hntrie container. (The same approach is used for plain
patterns which are not part of a bidi-trie.)
There have been too many examples out there of users
opting-in to "I am an advanced user" and yet still misusing
dynamic filtering by creating _allow_ rules where _noop_
rules should be used.
Creating _allow_ rules has serious consequences as these
override blocking static filters and can potentially
disable other advanced filtering ability such as
HTML filtering and scriptlet injection -- often used
to deal with anti-blocker mechanisms.
The ability to point-and-click to create _allow_ rules
from the popup panel is no longer allowed by default.
An new advanced setting has been added to enable
the ability to create _allow_ rules from the popup
panel, `popupPanelGodMode`, which default to `false`.
Set to `true` to restore ability to set _allow_ rules
from popup panel.
Since the creation of _allow_ rules is especially useful
to filter list authors, to diagnose and narrow down site
breakage as a result of problematic blocking filter,
the creation of _allow_ rules will still be available
when the advanced setting `filterAuthorMode` is `true`.
This change is probably going to be problematic to all
those users who were misusing dynamic filtering by
creating _allow_ rules instead of _noop_ rules -- but
the breakage is going to bring their misusing to their
attention, a positive outcome.
Default to `unset`.
To allow users to bypass uBO's default CSS styles in
case they are causing issues to specific users. It is
the responsibility of the user to ensure the value of
`uiStyles` contains valid CSS property declarations.
uBO will assign the value to `document.body.style.cssText`.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1044
For example, in the case of the issue above, one could
set `uiStyles` to `font-family: sans-serif` to force uBO
to the system font for its user interface.
FilterPlainHostname, an atomic filter unit, has been
removed and is being replaced with a composite filter
made of a pattern filter and a filter which test
hostname boundaries.
Doing so enables filters formerly being represented
by FilterPlainHostname to be now represented as a
plain pattern, and thus to be potentially stored in
a bidi-trie.
Comparing the new filter histogram with the previous
one:
FilterPatternPlain 24612 26432 1820
FilterComposite 17656 17125 -531
FilterPlainTrie Content 12977 13519 542
FilterPlainHostname 2904 0 -2904
FilterBucket 2121 1961 -160
FilterPlainTrie 1418 1578 160
Which means:
- An extra 542 patterns could be stored in bidi-tries
- There are 531 less composite filters needed
- An extra 160 buckets could be aggregated into 160
bidi-trie
Memory-wise, it's a marginal gain (as per Chromium's
Javascript VM instance figure) -- i.e. not worth
talking about). CPU-wise, no measurable difference.
The benefit is that I consider this conceptually
simplifies slightly the static network filtering
code base.
The old "classic" popup panel will still be used
when at least one of the following is true:
- advanced setting `uiFlavor` is set to `classic`; or
- the browser is Chromium 65 or older; or
- the browser is Firefox 67 or older
The default configuration of the new popup panel
at installation time is to show the power button,
statistics and the basic tool icons, i.e. access
to dashboard, logger, pickers.
For existing installations, the new popup panel
will be configured by respecting the existing
configuration of the classic one.
The new popup panel is currently already in use
on Firefox for Android, and the visual redesign
was made according to suggestions and feedback
from <https://github.com/brampitoyo> to be
optimal for Firefox for Android.
The new popup panel will allow closing the following
pending issues:
- https://github.com/uBlockOrigin/uBlock-issues/issues/255
- https://github.com/uBlockOrigin/uBlock-issues/issues/178
This commit specifically address bringing the
desktop version of the new popup panel's look
and feel more inline with the classic one:
- Hide tool captions on desktop
- Bring back no-popups switch on desktop
- Bring back tooltips on desktop (though they
are now rendered natively by the browser)
- Use the Photon icons suggested by @brampitoyo
for the no-popups and no-remote-fonts
switches
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/974
Related feedback:
- https://www.reddit.com/r/uBlockOrigin/comments/fuscia/
The race condition was that a content script could
query the main process to retrieve cosmetic filters
while the cosmetic filters had not been yet fully
loaded into memory. The fix ensure that an already
injected content script will re-query once the
cosmetic filters are fully loaded in memory at
browser launch time.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/911
Since cname-uncloaking is available only on Firefox
at the moment, the fix is relevant only to Firefox.
By default uBO will no longer cname-uncloak when it
detects that network requests are being being proxied.
This default behavior can be overriden by setting the
new advanced setting `cnameUncloakProxied` to `true`.
The new setting default to `false`, i.e. cname-uncloaking
is disabled when uBO detects that a proxy is in use.
This new advanced setting may disappear once the
following Firefox issue is fixed:
- https://bugzilla.mozilla.org/show_bug.cgi?id=1618271
This concerns the static network filtering engine.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/943
* * *
New static network filter type: `cname`
By default, network requests which are result of
resolving a canonical name are subject to filtering.
This filtering can be bypassed by creating exception
filters using the `cname` option. For example:
@@*$cname
The filter above tells the network filtering engine
to except network requests which fulfill all the
following conditions:
- network request is blocked
- network request is that of an unaliased hostname
Filter list authors are discouraged from using
exception filters of `cname` type, unless there no
other practical solution such that maintenance
burden become the greater issue. Of course, such
exception filters should be as narrow as possible,
i.e. apply to specific domain, etc.
* * *
New static network filter option: `denyallow`
The purpose of `denyallow` is bring
default-deny/allow-exceptionally ability into static
network filtering arsenal. Example of usage:
*$3p,script, \
denyallow=x.com|y.com \
domain=a.com|b.com
The above filter tells the network filtering engine that
when the context is `a.com` or `b.com`, block all
3rd-party scripts except those from `x.com` and `y.com`.
Essentially, the new `denyallow` option makes it easier
to implement default-deny/allow-exceptionally in static
filter lists, whereas before this had to be done with
unwieldy regular expressions[1], or through the mix of
broadly blocking filters along with exception filters[2].
[1] https://hg.adblockplus.org/ruadlist/rev/f362910bc9a0
[2] Typically filters which pattern are of the
form `|http*://`
New advanced setting: `benchmarkDatasetURL`
Default value: `unset`
To specify a URL from where the benchmark dataset will be
fetched. This allows to launch benchmark operations from
within published versions of uBO, rather than from just
a locally built version.
First draft of changes as discussed with Firefox
Preview people.
In order to allow testing/evaluating these changes,
the new advanced setting `uiFlavor` has been added.
Default to `unset`; and can currently only be set
to `fenix`.
The new setting takes effect at launch only. This
new setting is not to be mentioned in official
documentation for now.
This is ongoing work, not open to external feedback.
Advanced setting `cnameAliasList` has been removed.
New advanced settings:
cnameUncloak:
Boolean
Default value:
true
Description:
Whether to CNAME-uncloak hostnames.
cnameIgnoreExceptions:
Boolean
Default value:
true
Description:
Whether to bypass the uncloaking of network requests
which were excepted by filters/rules. This is
necessary so as to avoid undue breakage by having
exception filters being rendered useless as a result
of CNAME-uncloaking.
For example, `google-analytics.com` uncloaks to
`www-google-analytics.l.google.com` and both hostnames
appear in Peter Lowe's list, which means exception
filters for `google-analytics.com` (to fix site
breakage) would be rendered useless as the uncloaking
would cause the network request to be ultimately
blocked.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/780
Related commit:
- https://github.com/gorhill/uBlock/commit/3a564c199260
This adds two new advanced settings:
- cnameIgnoreRootDocument
- Default to `true`
- Tells uBO to skip CNAME-lookup for root document.
- cnameReplayFullURL
- Default to `false`
- Tells uBO whether to replay the whole URL or just
the origin part of it.
Replaying only the origin part is meant to lower
undue breakage and improve performance by avoiding
repeating the pattern-matching of the whole URL --
which pattern-matching was most likely already
accomplished with the original request.
This commit is meant to explore enabling CNAME-lookup
by default for the next stable release while:
- Eliminating a development burden by removing the
need to create a new filtering syntax to deal with
undesirable CNAME-cloaked hostnames
- Eliminating a filter list maintainer burden by
removing the need to find/deal with all base
domains which engage in undesirable CNAME-cloaked
hostnames
The hope is that the approach implemented in this
commit should require at most a few unbreak rules
with no further need for special filtering syntax
or filter list maintance efforts.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/780
New webext permission added: `dns`, which purpose is
to allow an extension to fetch the DNS record of
specific hostnames, reference documentation:
https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/dns
The webext API `dns` is available in Firefox 60+ only.
The new API will enable uBO to "uncloak" the actual
hostname used in network requests. The ability is
currently disabled by default for now -- this is only
a first commit related to the above issue to allow
advanced users to immediately use the new ability.
Four advanced settings have been created to control the
uncloaking of actual hostnames:
cnameAliasList: a space-separated list of hostnames.
Default value: unset => empty list.
Special value: * => all hostnames.
A space-separated list of hostnames => this tells uBO
to "uncloak" the hostnames in the list will.
cnameIgnoreList: a space-separated list of hostnames.
Default value: unset => empty list.
Special value: * => all hostnames.
A space-separated list of hostnames => this tells uBO
to NOT re-run the network request through uBO's
filtering engine with the CNAME hostname. This is
useful to exclude commonly used actual hostnames
from being re-run through uBO's filtering engine, so
as to avoid pointless overhead.
cnameIgnore1stParty: boolean.
Default value: true.
Whether uBO should ignore to re-run a network request
through the filtering engine when the CNAME hostname
is 1st-party to the alias hostname.
cnameMaxTTL: number of minutes.
Default value: 120.
This tells uBO to clear its CNAME cache after the
specified time. For efficiency purpose, uBO will
cache alias=>CNAME associations for reuse so as
to reduce calls to `browser.dns.resolve`. All the
associations will be cleared after the specified time
to ensure the map does not grow too large and too
ensure uBO uses up to date CNAME information.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/756
As per various feedbacks:
Added an advanced setting to keep the original behavior,
which can be potentially costly CPU-wise on some sites:
popupCosmeticFilterBadgeSlow
Default to `false`. Set to `true` to restore original
method of surveying the number of elements hidden as
a result of applying cosmetic filtering.
As suggested by <https://github.com/gwarser>, skip
descendant of nodes which have been found to be a
match in order to potentially increase the number
of nodes which can be surveyed in the alloted time.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/759
This commit adds code to rely less on the state of the
cache storage to decide whether filter lists should be
re-compiled or whether the selfie is currently valid
at launch time when a change in compiled/selfie format
is detected.
Related issues:
- https://github.com/uBlockOrigin/uBlock-issues/issues/761
- https://github.com/uBlockOrigin/uBlock-issues/issues/528
The previous bidi-trie code could only hold filters which
are plain pattern, i.e. no wildcard characters, and which
had no origin option (`domain=`), right and/or left anchor,
and no `csp=` option.
Example of filters that could be moved into a bidi-trie
data structure:
&ad_box_
/w/d/capu.php?z=$script,third-party
||liveonlinetv247.com/images/muvixx-150x50-watch-now-in-hd-play-btn.gif
Examples of filters that could NOT be moved to a bidi-trie:
-adap.$domain=~l-adap.org
/tsc.php?*&ses=
||ibsrv.net/*forumsponsor$domain=[...]
@@||imgspice.com/jquery.cookie.js|$script
||view.atdmt.com^*/iview/$third-party
||postimg.cc/image/$csp=[...]
Ideally the filters above should be able to be moved to a
bidi-trie since they are basically plain patterns, or at
least partially moved to a bidi-trie when there is only a
single wildcard (i.e. made of two plain patterns).
Also, there were two distinct bidi-tries in which
plain-pattern filters can be moved to: one for patterns
without hostname anchoring and another one for patterns
with hostname-anchoring. This was required because the
hostname-anchored patterns have an extra condition which
is outside the bidi-trie knowledge.
This commit expands the number of filters which can be
stored in the bidi-trie, and also remove the need to
use two distinct bidi-tries.
- Added ability to associate a pattern with an integer
in the bidi-trie [1].
- The bidi-trie match code passes this externally
provided integer when calling an externally
provided method used for testing extra conditions
that may be present for a plain pattern found to
be matching in the bidi-trie.
- Decomposed existing filters into smaller logical units:
- FilterPlainLeftAnchored =>
FilterPatternPlain +
FilterAnchorLeft
- FilterPlainRightAnchored =>
FilterPatternPlain +
FilterAnchorRight
- FilterExactMatch =>
FilterPatternPlain +
FilterAnchorLeft +
FilterAnchorRight
- FilterPlainHnAnchored =>
FilterPatternPlain +
FilterAnchorHn
- FilterWildcard1 =>
FilterPatternPlain + [
FilterPatternLeft or
FilterPatternRight
]
- FilterWildcard1HnAnchored =>
FilterPatternPlain + [
FilterPatternLeft or
FilterPatternRight
] +
FilterAnchorHn
- FilterGenericHnAnchored =>
FilterPatternGeneric +
FilterAnchorHn
- FilterGenericHnAndRightAnchored =>
FilterPatternGeneric +
FilterAnchorRight +
FilterAnchorHn
- FilterOriginMixedSet =>
FilterOriginMissSet +
FilterOriginHitSet
- Instances of FilterOrigin[...], FilterDataHolder
can also be added to a composite filter to
represent `domain=` and `csp=` options.
- Added a new filter class, FilterComposite, for
filters which are a combination of two or more
logical units. A FilterComposite instance is a
match when *all* filters composing it are a
match.
Since filters are now encoded into combination of
smaller units, it becomes possible to extract the
FilterPatternPlain component and store it in the
bidi-trie, and use the integer as a handle for the
remaining extra conditions, if any.
Since a single pattern in the bidi-trie may be a
component for different filters, the associated
integer points to a sequence of extra conditions,
and a match occurs as soon as one of the extra
conditions (which may itself be a sequence of
conditions) is fulfilled.
Decomposing filters which are currently single
instance into sequences of smaller logical filters
means increasing the storage and CPU overhead when
evaluating such filters. The CPU overhead is
compensated by the fact that more filters can now
moved into the bidi-trie, where the first match is
efficiently evaluated. The extra conditions have to
be evaluated if and only if there is a match in the
bidi-trie.
The storage overhead is compensated by the
bidi-trie's intrinsic nature of merging similar
patterns.
Furthermore, the storage overhead is reduced by no
longer using JavaScript array to store collection
of filters (which is what FilterComposite is):
the same technique used in [2] is imported to store
sequences of filters.
A sequence of filters is a sequence of integer pairs
where the first integer is an index to an actual
filter instance stored in a global array of filters
(`filterUnits`), while the second integer is an index
to the next pair in the sequence -- which means all
sequences of filters are encoded in one single array
of integers (`filterSequences` => Uint32Array). As
a result, a sequence of filters can be represented by
one single integer -- an index to the first pair --
regardless of the number of filters in the sequence.
This representation is further leveraged to replace
the use of JavaScript array in FilterBucket [3],
which used a JavaScript array to store collection
of filters. Doing so means there is no more need for
FilterPair [4], which purpose was to be a lightweight
representation when there was only two filters in a
collection.
As a result of the above changes, the map of `token`
(integer) => filter instance (object) used to
associate tokens to filters or collections of filters
is replaced with a more efficient map of `token`
(integer) to filter unit index (integer) to lookup a
filter object from the global `filterUnits` array.
Another consequence of using one single global
array to store all filter instances means we can reuse
existing instances when a logical filter instance is
parameter-less, which is the case for FilterAnchorLeft,
FilterAnchorRight, FilterAnchorHn, the index to these
single instances is reused where needed.
`urlTokenizer` now stores the character codes of the
scanned URL into a bidi-trie buffer, for reuse when
string matching methods are called.
New method: `tokenHistogram()`, used to generate
histograms of occurrences of token extracted from URLs
in built-in benchmark. The top results of the "miss"
histogram are used as "bad tokens", i.e. tokens to
avoid if possible when compiling filter lists.
All plain pattern strings are now stored in the
bidi-trie memory buffer, regardless of whether they
will be used in the trie proper or not.
Three methods have been added to the bidi-trie to test
stored string against the URL which is also stored in
then bidi-trie.
FilterParser is now instanciated on demand and
released when no longer used.
***
[1] 135a45a878/src/js/strie.js (L120)
[2] e94024d350
[3] 135a45a878/src/js/static-net-filtering.js (L1630)
[4] 135a45a878/src/js/static-net-filtering.js (L1566)
This commits make it so that `csp=` filters
are now stored in the same data structures as
all other static network filters rather than
being stored in a separate one.
This internal change is motivated by the wish
to bring session filters to the static network
filtering engine, as has already been done for
the static extended filtering engine in the
following commit:
59c9a34d34
This is a feature under development, hidden behind
a new advanced setting, `filterAuthorMode` which
default to `false`.
Ability to point-and-click to create temporary
exception filters for static extended filters (i.e.
cosmetic, scriptlet & html filters) from within
the summary pane in the logger. The button to
toggle on/off temporary exception filter is
labeled `#@#`.
The created exceptions are temporary and will be
lost when restarting uBO, or manually toggling off
the exception filters.
Creating temporary exception filters does not
cause the filter lists to reloaded, and thus there
is no overhead in creating/removing these temporary
exception filters.
Related documentation:
- https://help.eyeo.com/en/adblockplus/how-to-write-filters#element-hiding
Related feedback/discussion:
- https://www.reddit.com/r/uBlockOrigin/comments/d6vxzj/
The `elemhide` filter option as per ABP semantic is
now supported. Previously uBO would consider `elemhide`
to be an alias of `generichide`.
The support of `elemhide` is through the convenient
conversion of `elemhide` option into existing
`generichide` option and new `specifichide` option.
The purpose of the new `specifichide` filter option
is to disable all specific cosmetic filters, i.e.
those who target a specific site.
Additionally, for convenience purpose, the filter
options `generichide`, `specifichide` and `elemhide`
can be aliased using the shorter forms `ghide`,
`shide` and `ehide` respectively.
No need to store mouse coordinates in background
page, thus no need to post mouse coordinates
information for every click.
Rename/group element picker arguments and popup
arguments separately.
Related commit & feedback:
- 7ff750eaf6
The color value for the icon badge is now
"attached" to the blocking profile value.
Additionally, as per feedback, `3p` rules
will be relaxing before master JavaScript
switch rules.
Related feedback:
- https://www.reddit.com/r/uBlockOrigin/comments/cmh910/
Additionally, the `3p` rule has been made distinct from
`3p-script`/`3p-frame` for the purpose of
"Relax blocking mode" command.
The badge color will hint at the current blocking mode.
There are four colors for the four following blocking
modes:
- JavaScript wholly disabled
- All 3rd parties blocked
- 3rd-party scripts and frames blocked
- None of the above
The default badge color will be used when JavaScript is not
wholly disabled and when there are no rules for `3p`,
`3p-script` or `3p-frame`.
A new advanced setting has been added to let the user choose
the badge colors for the various blocking modes,
`blockingProfileColors`. The value *must* be a sequence of
4 valid CSS color values that match 6 hexadecimal digits
prefixed with`#` -- anything else will be ignored.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/682#issuecomment-515197130
The following advanced setting has been added:
updateAssetBypassBrowserCache
Default to `false`. If set to `true`, uBO will ensure the
browser cache is bypassed when fetching a remote resource.
This is for the convenience of filter list maintainers who
may want to test the latest version of their lists when
fetched from their remote location.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/371
By default, no specific keyboard shortcut is predefined,
this will have to be assigned by the user. The command
name in English is "Toggle blocking profile".
The default behavior is to toggle down according to one
of the following scenarios.
a) If script execution is disabled through the no-scripting
switch, the no-scripting switch will be locally toggled
so as to allow script execution. The page will be
automatically reloaded.
b) If script execution is not blocked but the 3rd-party
script and/or frame cells are blocked, local no-op rules
will be set so as to no longer block 3rd-party scripts
and/or frames. The page will be automatically reloaded.
Given this, it may take more than one toggle down command
to reach the lowest blocking profile, which is one where
JavaScript execution is not blocked and 3rd-party scripts
and frames resources block rules, if any, are bypassed
with local no-op rules.
TODO: At this point, I haven't yet decided whether
toggling from the lowest profile should restore the
original highest blocking profile.
The bidirectional trie allows storing the right
and left parts of a string into a trie given a
pivot position.
Releated issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/528
Additionally, the mandatory token-at-index-0 rule
for FilterPlainHnAnchored has been lifted, thus
allowing the engine to pick a potentially better token
at any position in the filter string.
***
TODO: Eventually rename `strie.js` to `biditrie.js`.
TODO: Fix dump() method, it currently only show the
right-hand side of a filter string.
Related issue:
- https://github.com/gorhill/uBlock/issues/2394
Additionally, I added a new advanced setting to control
how long after launch an auto-update session should be
started -- value is in seconds:
autoUpdateDelayAfterLaunch 180
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/131
The new advanced setting and its default value is:
allowGenericProceduralFilters false
Whenever this setting is toggled, the user is responsible
of forcing a reload of all filter lists so as to allow uBO
to process differently any existing generic procedural
cosmetic filters.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/214
Built-in whitelist directives are now rendered differently
than user-defined whitelist directives. Also, removing a
built-in whitelist directive will only cause that directive
to be commented out, so that users do not have to remember
built-in directives should they want to bring them back.
Related issue:
https://github.com/uBlockOrigin/uBlock-issues/issues/494
The built-in per-site switch rule
`no-scripting: behind-the-scene false` has been removed,
it should not ever be needed since there will always be a
valid root context for main- and sub-frames.
This was a TODO item:
- 07cbae66a4/src/js/cosmetic-filtering.js (L375)
µBlock.staticExtFilteringEngine.HostnameBasedDB has been
re-factored to accomodate the storing of specific cosmetic
filters.
As a result of this refactoring:
- Memory usage has been further decreased
- Performance of selector retrieval marginally
improved
- New internal representation opens the door
to use a specialized version of HNTrie, which
should further improve performance/memory
usage
The `null` placeholder are not necessary, we can just use
default arguments instead, and add the HNTrieContainer
references if and only if they are instanciated.
Consider the two following filters:
example.com
www.example.com
This commit make it so that if the first filter is
already present in a given HNTrie, the second filter
will not be stored, since HNTrie will _always_
return the first filter as a match whenever the
hostname to match is example.com or any subdomain
of example.com.
The detection of such pointless filters is
virtually free when adding a hostname to an HNTrie
instance (given how data is stored in the trie), so
in practice no overhead is incurred to detect such
pointless filters.
The ability to ignore impossible to match filters
in HNTrie instances will _especially_ benefit those
using large hosts files.
Examples of how this helps using real configurations:
- Default lists:
444 filters out of 100,382 were ignored as a result
of this commit.
- Default lists + "Energized Ultimate Protection":
283,669 filters out of 903,235 were ignored as a
result of this commit.
Side note: There was no measurable difference between
the two configurations above in the performance of
the matching algorithm as reported by the built-in
benchmark tool.
The staticNetFilteringEngine uses token hashes to store/lookup
filters into Map objects.
Before this commit, the tokens were encoded into token hashes
as JS numbers (not exceeding MAX_SAFE_INTEGER) using at most
the 8 first characters of the token.
With this commit, token hashes are now restricted to fit
into 32-bit integers, and are derived from at most the 7 first
characters. This improves filter look-up performance as per
built-in benchmark().
Related commit:
- 69a43e07c4
Using 32 bits of token hash rather than just the 16 lower
bits does help discard more unknown tokens.
Using the default filter lists, the known-token lookup
table is populated by 12,276 entries, out of 65,536, thus
making the case that theoretically there is a lot of
possible tokens which can be discarded.
In practice, running the built-in
staticNetFilteringEngine.benchmark() with default filter
lists, I find that 1,518,929 tokens were skipped out of
4,441,891 extracted tokens, or 34%.
Given that all tokens extracted from one single URL are potentially
iterated multiple times in a single URL-matching cycle, it pays to
ignore extracted tokens which are known to not be used anywhere in
the static filtering engine.
The gain in processing a single network request in the static
filtering engine can become especially high when dealing with
long and random-looking URLs, which URLs have a high likelihood
of containing a majority of tokens which are known to not be in
use.
Related commit:
- 99390390fc
The token information available at compile time can be stored
in the filter to be used at match() time. This allows the use of
startsWith() rather than a more costly indexOf() call as a first
quick test to detect mismatches.
Performance- and memory-related work. Three more classes have
been created to avoid regex-based filters internally.
Purpose is to enforce filters which have only one single
wildcard in their pattern, a common occurrence. The filter
pattern is split in two literal string segments.
Similar as above, with the added condition that the filter is
hostname-anchored (`||`). The "Wildcard2" variant is a further
specialization to enforce filters where the only wildcard
is immediately preceded by the `^` special character, again
a very common occurrence.
Using two literal string segments in lieu of regexes allows to
quickly detect a mismatch by just testing the first segment.
Additionally, this reduces memory footprint as regexes are
much more expensive memory-wise than plain strings.
These three new filter classes allow to replace the use of
5276 regex-based filters internally with plain string-based
filters.
Often-called isHnAnchored() has been further fine-tuned to
avoid as much work as possible. I have also observed that
using an arrow function for closure-purpose helps measurably
performance, as per built-in benchmark.
The purpose of using a custom base128 encoder is to
convert array buffers into strings, to allow a direct
string-to-array buffer conversion at load time:
string => array buffer
Whereas a JSON array would require an extra step:
JSON array as string => JS array => array buffer
Turns out that the current use of a custom base128 encoding
results in a significantly larger selfie storage usage when
converting array buffers into strings.
Speculation: possibly the browser convert the strings to
save into JSON strings internally. Since the custom base128
encoder is likely to cause the resulting string to contain
a lot of unprintable ASCII characters, these will need to
be escaped when converted to JSON -- escaped characters
occupy more space than non-escaped ones.
Using a sequence of base 64 numbers means only printable
will be present in the output string, hence no escaping
necessary. I have observed significant reduction in
storage usage for selfie purpose.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/528#issuecomment-484408622
Following STrie-related work in above issue, I noticed that a large
number of filters in EasyList were filters which only had to match
against the document origin. For instance, among just the top 10
most populous buckets, there were four such buckets with over
hundreds of entries each:
- bits: 72, token: "http", 146 entries
- bits: 72, token: "https", 139 entries
- bits: 88, token: "http", 122 entries
- bits: 88, token: "https", 118 entries
These filters in these buckets have to be matched against all
the network requests.
In order to leverage HNTrie for these filters[1], they are now handled
in a special way so as to ensure they all end up in a single HNTrie
(per bucket), which means that instead of scanning hundreds of entries
per URL, there is now a single scan per bucket per URL for these
apply-everywhere filters.
Now, any filter which fulfill ALL the following condition will be
processed in a special manner internally:
- Is of the form `|https://` or `|http://` or `*`; and
- Does have a `domain=` option; and
- Does not have a negated domain in its `domain=` option; and
- Does not have `csp=` option; and
- Does not have a `redirect=` option
If a filter does not fulfill ALL the conditions above, no change
in behavior.
A filter which matches ALL of the above will be processed in a special
manner:
- The `domain=` option will be decomposed so as to create as many
distinct filter as there is distinct value in the `domain=` option
- This also apply to the `badfilter` version of the filter, which
means it now become possible to `badfilter` only one of the
distinct filter without having to `badfilter` all of them.
- The logger will always report these special filters with only a
single hostname in the `domain=` option.
***
[1] HNTrie is currently WASM-ed on Firefox.
Implement a plain string trie container class: STrieContainer.
Make use of STrieContainer where beneficial
Some filter buckets can grow quite large, and in such case
coalescing "trieable" filter classes into a single trie reduces
lookup performance and memory usage.
For instance, at time of commit, the filter bucket for the
`ad` keyword contains 919 entries[1].
Coalescing trieable filters of the same class into a single plain
string trie reduced the size of the bucket into 50 entries + two
tries which are scanned only once each whenever the bucket is
visited.
[1] Enter the following code at uBO's dev console:
µBlock.staticNetFilteringEngine.categories.get(0).get(µBlock.urlTokenizer.tokenHashFromString('ad'))
Refactor static network filtering engine code to make use of
ES6's syntactic sugar `class`.
Change first auto-update run from 7 to 5 minutes.
The value of `suspendTabsUntilReady` was disregarded in Firefox and
uBO defaulted to always defer tab loading until it was ready.
This commit allows to disable the deferring of tab loading in
Firefox. The new valid values for `suspendTabsUntilReady` are:
- `unset`: leave it to the platform to pick the optimal
behavior (default)
- `no`: do no suspend tab loading at launch time
- `yes`: suspend tab loading at launch time
This removes the derivation of FilterOrigin flavors from
FilterOrigin itself and simplify code paths. FilterOrigin
flavors are small specialized classes, no need to
overcomplicate with derivation.
Specifically, this removes an indirect call to reach the
match() method.
The motivation is to address the higher peak memory usage at launch
time with 3rd-gen HNTrie when a selfie was present.
The selfie generation prior to this change was to collect all
filtering data into a single data structure, and then to serialize
that whole structure at once into storage (using JSON.stringify).
However, HNTrie serialization requires that a large UintArray32 be
converted into a plain JS array, which itslef would be indirectly
converted into a JSON string. This was the main reason why peak
memory usage would be higher at launch from selfie, since the JSON
string would need to be wholly unserialized into JS objects, which
themselves would need to be converted into more specialized data
structures (like that Uint32Array one).
The solution to lower peak memory usage at launch is to refactor
selfie generation to allow a more piecemeal approach: each filtering
component is given the ability to serialize itself rather than to be
forced to be embedded in the master selfie. With this approach, the
HNTrie buffer can now serialize to its own storage by converting the
buffer data directly into a string which can be directly sent to
storage. This avoiding expensive intermediate steps such as
converting into a JS array and then to a JSON string.
As part of the refactoring, there was also opportunistic code
upgrade to ES6 and Promise (eventually all of uBO's code will be
proper ES6).
Additionally, the polyfill to bring getBytesInUse() to Firefox has
been revisited to replace the rather expensive previous
implementation with an implementation with virtually no overhead.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/399
The advanced setting `cacheStorageAPI` has been added to allow
a user to force the use of IndexedDB as cache storage. Set to
`IndexedDB` to force use of IndexedDB. Default to `unset`.
Related issues:
- https://github.com/uBlockOrigin/uBlock-issues/issues/372
- https://github.com/gorhill/uBlock/issues/93
A new advanced settings has been added: `autoCommentFilterTemplate`.
Default value is `{{date}} {{origin}}`.
Placeholders are identified by `{{...}}`. There are currently
only three placeholders supported:
- `{{date}}`: will be replaced with current date
- `{{time}}`: will be replaced with current time
- `{{origin}}`: will be replaced with site information on which
the filter(s) was created
If no placeholder is found in `autoCommentFilterTemplate`, this
will disable auto-commenting. So one can use `-` to disable
auto-commenting.
Additionally, if auto-commenting is enabled, uBO will not emit a
comment if an emitted comment would be a duplicate of the last
one found in the user filter list.
commit 7c6cacc59b27660fabacb55d668ef099b222a9e6
Author: Raymond Hill <rhill@raymondhill.net>
Date: Sat Nov 3 08:52:51 2018 -0300
code review: finalize support for wasm-based hntrie
commit 8596ed80e3bdac2c36e3c860b51e7189f6bc8487
Merge: cbe1f2e 000eb82
Author: Raymond Hill <rhill@raymondhill.net>
Date: Sat Nov 3 08:41:40 2018 -0300
Merge branch 'master' of github.com:gorhill/uBlock into trie-wasm
commit cbe1f2e2f38484d42af3204ec7f1b5decd30f99e
Merge: 270fc7f dbb7e80
Author: Raymond Hill <rhill@raymondhill.net>
Date: Fri Nov 2 17:43:20 2018 -0300
Merge branch 'master' of github.com:gorhill/uBlock into trie-wasm
commit 270fc7f9b3b73d79e6355522c1a42ce782fe7e5c
Merge: d2a89cf d693d4f
Author: Raymond Hill <rhill@raymondhill.net>
Date: Fri Nov 2 16:21:08 2018 -0300
Merge branch 'master' of github.com:gorhill/uBlock into trie-wasm
commit d2a89cf28f0816ffd4617c2c7b4ccfcdcc30e1b4
Merge: d7afc78 649f82f
Author: Raymond Hill <rhill@raymondhill.net>
Date: Fri Nov 2 14:54:58 2018 -0300
Merge branch 'master' of github.com:gorhill/uBlock into trie-wasm
commit d7afc78b5f5675d7d34c5a1d0ec3099a77caef49
Author: Raymond Hill <rhill@raymondhill.net>
Date: Fri Nov 2 13:56:11 2018 -0300
finalize wasm-based hntrie implementation
commit e7b9e043cf36ad055791713e34eb0322dec84627
Author: Raymond Hill <rhill@raymondhill.net>
Date: Fri Nov 2 08:14:02 2018 -0300
add first-pass implementation of wasm version of hntrie
commit 1015cb34624f3ef73ace58b58fe4e03dfc59897f
Author: Raymond Hill <rhill@raymondhill.net>
Date: Wed Oct 31 17:16:47 2018 -0300
back up draft work toward experimenting with wasm hntries
<https://github.com/gorhill/uBlock/issues/3436>: a new per-site switch
has been added, no-scripting, which purpose is to wholly disable/enable
javascript for a given site. This new switch has precedence over all
other ways javascript can be disabled, including precedence over dynamic
filtering rules.
The popup panel will report the number of script resources which have
been seen by uBO for the current page. There is a minor inaccuracy to
be fixed regarding the count, and which fix requires to extend request
journaling.
<https://github.com/gorhill/uBlock/issues/308>: the `noscript` tags will
now be respected when the new no-scripting switch is in effect on a given
site.
A default setting has been added to the _Settings_ pane to
disable/enable globally the new no-script switch, such that one can
work in default-deny mode regarding javascript execution.
<https://github.com/uBlockOrigin/uBlock-issues/issues/155>: a new
hidden setting, `requestJournalProcessPeriod`, has been added to
allow controlling the delay before uBO internally process it's
network request journal queue. Default to 1000 (milliseconds).
Squashed commit of the following:
commit 6a8473822537636ac54d5dabdb14472114bb730b
Author: Raymond Hill <rhill@raymondhill.net>
Date: Mon Aug 6 10:56:44 2018 -0400
remove remnant of snappyjs and spurious instruction
commit 9a4b709bee97d3cc2235fab602359fa5953bdb46
Author: Raymond Hill <rhill@raymondhill.net>
Date: Mon Aug 6 09:48:58 2018 -0400
make cache storage compression optionally available on all platforms
New advanced setting: `cacheStorageCompression`. Default is `false`.
commit 22ee6547f2f7c9c5aefe25dea1262a1b31612155
Author: Raymond Hill <rhill@raymondhill.net>
Date: Sun Aug 5 19:16:26 2018 -0400
remove Chromium from lz4 experiment
commit ee3e201c45afe983508f70713a2d43af74737d8d
Author: Raymond Hill <rhill@raymondhill.net>
Date: Sun Aug 5 18:52:43 2018 -0400
import lz4-block-codec.wasm library
commit 883a3118efcfd749c82356fde7134754d6ae371d
Author: Raymond Hill <rhill@raymondhill.net>
Date: Sun Aug 5 18:50:46 2018 -0400
implement storage compression through lz4-wasm [draft]
commit 48d1ccaba407de447c2cd6747dc3a90839c260a7
Merge: 8ae77e6 b34c897
Author: Raymond Hill <rhill@raymondhill.net>
Date: Sat Aug 4 08:56:51 2018 -0400
Merge branch 'master' of github.com:gorhill/uBlock into lz4
commit 8ae77e6aeeaa85af335e664c2560d2afd37288c6
Author: Raymond Hill <rhill@raymondhill.net>
Date: Wed Jul 25 18:17:45 2018 -0400
experiment with compression
- collate together specific filters with same base domain
- replace string-based hash to integer-based hash
- revisit code to benefit from ES6-specific syntax
- Default to being detached
- Default to "Current tab"
- Append current tab title to "Current tab" entry
- Avoid iterating through all tabs when no change