1
0
mirror of https://github.com/gorhill/uBlock.git synced 2024-09-29 22:27:12 +02:00
Commit Graph

236 Commits

Author SHA1 Message Date
Raymond Hill
0e3071dd50
Add filterLists property to managed storage
The entry `toOverwrite.filterLists` is an array of
string, where each string is a token identifying a
stock filter list, or a URL for an external filter
list.

This new entry is to make it easier for an
administrator to centrally configure uBO with a
custom set of filter lists.
2021-01-08 09:18:26 -05:00
Raymond Hill
e4e7cbc78f
Use better identifying name for overview panel 2021-01-07 08:19:47 -05:00
Raymond Hill
cc9c45f1e4
Adding to and further reviewing admin-managed settings 2021-01-06 11:39:24 -05:00
Raymond Hill
c1130ec843
Add support for admin-managed hidden settings
Related discussion:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1437#issuecomment-754127066
2021-01-05 12:16:50 -05:00
Raymond Hill
b28acfccbc
Add "extraTrustedSiteDirectives" as new admin policy
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1433

The new "extraTrustedSiteDirectives" policy is an array
of strings, each of which is parsed as a trusted-site
directive to append to a user's own set of trusted-site
directives at launch time.

The added trusted-site directives will be considered as
part of the default set of directives by uBO.
2021-01-04 07:54:24 -05:00
Raymond Hill
5d7b2918ef
Harden processing of changes in compiled list format
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1365

This commit adds the compiled magic version number to the
compiled data itself, and consequently this allows uBO
to no longer require that any given compiled list with a
mismatched format to be detected and discarded at launch
time.

Given this change, uBO no longer needs to rely on the
deletion of cached data at launch time to ensure it
won't use no longer valid compiled lists.
2020-12-08 10:00:47 -05:00
Raymond Hill
e8e4a1ac74
Wait for removal of storage entries to be completed
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1365

When compiled data format changes, do not rely on order
of operations at launch to assume deletion of storage
occurs before attempts to access it. It's unclear this
commit will fix the reported issue, as I could not
reproduce it except when outright commenting out the code
to prevent the storage deletion from occurring.
2020-12-04 06:17:18 -05:00
Raymond Hill
ee2fd45f00
Ensure we do not extract truncated URL for Homepage directive
Related feedback:
- b12e0e05ea (commitcomment-44309540)
2020-11-18 12:14:23 -05:00
Raymond Hill
b12e0e05ea
Extract Homepage URL from a list when present
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1346

Additionally, fixed a case of filter list being compiled
twice at subscription time.
2020-11-18 10:02:22 -05:00
Raymond Hill
2cfeaddbed
Fine tune various static filtering code
Notably, make `queryprune` option available only
to filter list authors, until there are guards
against bad filters in some future and until the
option syntax and behavior is fully settled.

Instances of `queryprune` in filter lists will be
compiled, however instances of `queryprune` in
_"My filters"_ will be ignored unless users
indicated they are a filter list author.
2020-11-13 09:23:25 -05:00
Raymond Hill
0196993828
Use buffer-like approach for filterUnits array
filterUnits is now treated as a buffer which is
pre-allocated and which will grow in chunks so as
to minimize memory allocations. Entries are never
released, just null-ed.

Additionally, move urlTokenizer into the static
network filtering engine, since it's not used
anywhere else.
2020-11-09 06:54:51 -05:00
Raymond Hill
4f00c08f6b
Fix detection of already present comment
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1281

Related feedback:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1281#issuecomment-705081202
2020-10-07 14:23:57 -04:00
Raymond Hill
46ec969411
Add ability to use full URL in auto-generated comment
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1281

New supported placeholder: `{{url}}`, which will be
replaced by the full URL of the page for which a filter
is created.
2020-10-07 11:52:38 -04:00
Raymond Hill
f4aebc9390
Backup/restore only modified advanced settings
This reduces the size of the backup file and also
ensures that default values can be changed.
2020-10-03 12:34:21 -04:00
Raymond Hill
4c7635514a
Fine tuning changes to click-to-subscribe code
Related feedback:
- https://github.com/uBlockOrigin/uBlock-issues/issues/763#issuecomment-691682195

Additionally, enable an existing subscription when
subscribing again to it.
2020-09-13 11:44:42 -04:00
Raymond Hill
d4182add6e
Add ability to outright remove/ignore "really bad lists"
In addition to what is deemed really bad lists by consensus,
some lists will also be labelled "really bad list"
temporarily so as to force-remove them from the set of
filter lists.

This will be the case for filter lists which are not
necessarily "bad lists" but which were once part of
uBO's stock filter lists and have been removed since
then for various reasons.

This will ensure that the majority of users who do not
modifies uBO's default listset will still have a
configuration which matches the official default listset.
2020-09-09 09:57:29 -04:00
Raymond Hill
0e9d4714e9
Mibor: better variable name 2020-08-24 12:40:36 -04:00
Raymond Hill
4150c17f4a
Add concept of "really bad list" to badlists infrastructure
This commit adds concept of "really bad list" to the
badlists infrastructure. Really bad lists won't be
fetched from a remote server, while plain bad list
will be fetched but won't be compiled.

A really bad list is denoted by the `nofetch` token
following the URL.

Really bad lists can cause more serious issues such
as causing undue launch delays because the remote
server where a really bad list is hosted fails to
respond properly and times out.

Such an example of really bad list is hpHosts which
original server no longer exist.
2020-08-22 08:43:16 -04:00
Raymond Hill
23f08f0274
Add support for blocklist of filter lists
Many filter lists are known to cause serious filtering
issues in uBO and are not meant to be used in uBO.

Unfortunately, unwitting users keep importing these
filter lists and as a result this ends up causing
filtering issues for which the resolution is always
to remove the incompatible filter list.

Example of inconpatible filter lists:
- Reek's Anti-Adblock Killer
- AdBlock Warning Removal List
- ABP anti-circumvention filter list

uBO will use the following resource to know
which filter lists are incompatible:
- https://github.com/uBlockOrigin/uAssets/blob/master/filters/badlists.txt

Incompatible filter lists can still be imported into
uBO, useful for asset-viewing purpose, but their content
will be discarded at compile time.
2020-08-21 11:57:20 -04:00
Raymond Hill
24ef0cb753
Fix typo in comment 2020-08-13 09:40:43 -04:00
Raymond Hill
00b790ce72
Add support for more !#if pre-parser directive tokens
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1205
2020-08-13 09:32:34 -04:00
C0rn3j
3fed25a52d
Use ISO8061 dates in filter comments 2020-08-03 10:30:36 -04:00
Raymond Hill
e44a568278
Add CoreMirror's code-folding ability to list editor/viewer
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1134

CodeMirror's code folding reference:
- https://codemirror.net/doc/manual.html#addon_foldcode

This commit adds support for code-folding to the filter
list editor/viewer.

The following blocks of code are foldable by clicking the
corresponding marker in the gutter:

- !#if/#endif blocks
- !#include blocks

Addtionally, the following changes:

- The `!#include` line is now preserved when importing a
  sublist
- The `!#if` directives will be syntax-colored according
  to whether they evaluate to true or false on the current
  platform
- Double-clicking on a foldable line in the gutter will
  select the content of the foldable block
- Minor visual improvement to matching brackets
2020-07-10 08:01:39 -04:00
Raymond Hill
ebf7fb145e
Fine tune auto-completion for !#if directives
Auto-completion will work only for uBO's own
tokens, compatibility-related tokens[1] will not be
taken into account for auto-completion.

The reason is to not have the compatibility-related
tokens get in the way of auto-completion in order
to not inconvenience uBO's filter list maintainers.

[1] `adguard_ext_chromium`, `adguard_ext_firefox`,
    etc.
2020-07-09 08:09:51 -04:00
Raymond Hill
83c01fb352
Add syntax highlighting/auto-completion for preparsing directives
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1134

Invalid values for `!#if ...` will be highlighted as errors.

Auto completion is now supported for both the directives
themselves and the valid values for `!#if ...`.

For examples, when pressing ctrl-space:

- `!#e` will auto-complete to `!#endif`
- `!#i` will offer to choose between `!#if ` or `!#include `
- `!#if fir` will auto-complete to `!#if env_firefox`

Additionally, support for some of AdGuard preparsing
directives, i.e. `!#if adguard` is now a valid and will be
honoured -- it always evaluate to `false` in uBO.
2020-07-08 09:52:27 -04:00
Raymond Hill
18a5f41a04
Better processing of Expires directive in filter list
In case of invalid `Expires` value -- i.e. `NaN` -- do
not use `1` as default value, just let uBO pick the
value according to the global default (which is `5` as
of commit time).
2020-07-06 08:31:53 -04:00
Raymond Hill
0da34f7edf
Handle properly Unicode characters in static network filters
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/772

Unicode characters inside hostname part of a filter will
be converted to punycode.

Unicode characters anywhere else in the pattern will be
percent-encoded.

Unicode characters which cannot be encoded will cause a
filter to be invalid.
2020-07-04 14:47:33 -04:00
Raymond Hill
aab3812089
Ignore !#include directives within inactive !#if/!#endif blocks
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1113
2020-07-03 08:43:40 -04:00
Raymond Hill
bc7f149252
Minor code review of static parser code 2020-06-09 11:58:27 -04:00
Raymond Hill
01b1ed9a98
Add a new static filtering parser
A new standalone static filtering parser is introduced,
vAPI.StaticFilteringParser. It's purpose is to parse
line of text into representation suitable for
compiling filters. It can additionally serves for
syntax highlighting purpose.

As a side effect, this solves:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1038

This is a first draft, there are more work left to do
to further perfect the implementation and extend its
capabilities, especially those useful to assist filter
authors.

For the time being, this commits break line-continuation
syntax highlighting -- which was already flaky prior to
this commit anyway.
2020-06-04 07:18:54 -04:00
Raymond Hill
4687c60bf9
Support fetching assets from CDNs when auto-updating
This commit add the ability to fetch from CDN servers
when an asset is fetched as a result of auto-update.

If an asset has a `cdnURLs` entry in `assets.json`,
the asset will be auto-updated using one of those
CDN URLs. When many CDN URLs are specified, those
URLs will be shuffled in order to spread the bandwidth
across all specified CDN servers. If all specified CDN
servers fail to respond, uBO will fall back to usual
`contentURLs` entry.

The `cdnURLs` are used only when an asset is
auto-updated, this ensures a user will get the more
recent available version of an asset when manually
updating.

The motivation of this new feature is to relieve
GitHub from acting as a CDN (which it is not) for
uBO -- an increasing concern with the growing adoption
of uBO along with the growing size of key uBO assets.
2020-04-08 09:57:55 -04:00
Raymond Hill
11d24abea0
Move proxy-detection code to Firefox-specific code
Related commit:
- https://github.com/uBlockOrigin/uBlock-issues/issues/911

The motivation is to avoid executing code which is
unnecessary on platforms not supporting the browser.dns
API.
2020-03-23 13:31:43 -04:00
Raymond Hill
3f7ece9469
Do not cname-uncloak when a proxy is in use
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/911

Since cname-uncloaking is available only on Firefox
at the moment, the fix is relevant only to Firefox.

By default uBO will no longer cname-uncloak when it
detects that network requests are being being proxied.

This default behavior can be overriden by setting the
new advanced setting `cnameUncloakProxied` to `true`.
The new setting default to `false`, i.e. cname-uncloaking
is disabled when uBO detects that a proxy is in use.

This new advanced setting may disappear once the
following Firefox issue is fixed:
- https://bugzilla.mozilla.org/show_bug.cgi?id=1618271
2020-03-22 14:52:58 -04:00
Raymond Hill
ca80d2826b
Add indentation requirement for line continuation
Related commit:
- https://github.com/gorhill/uBlock/commit/703c525b01aa

This adds an indentation requirement for line
continuation to take place. The conditions are now
as follow:
- Current line ends with ` \`: ASCII space + backslash
- Next line starts with `    `: four ASCII spaces
2020-03-15 08:15:17 -04:00
Raymond Hill
703c525b01
Support line continuation in filter lists
If a line in a filter list ends with a space
(ASCII code 32) followed by a backslash
(ASCII code 92), those two characters will be
removed, the line will be trimmed and the next
line will be trimmed and concatenated to form
a new, longer line.

The purpose is to give filter list authors
a way to visually break apart unduly long
filters and thus make maintenance easier.

When line continuation is used, it is suggested
that the extra lines are prepended with four
space so as to make it more visually obvious that
the extra line(s) are the continuation of a
previous line.

Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/943

The filter referenced in the above issue was the
motivation to implement this feature:
- https://hg.adblockplus.org/ruadlist/rev/f362910bc9a0

I verified and could not find any instance in major
filter lists of lines ending with ` \`, thus the
change should be safe.
2020-03-14 13:34:13 -04:00
Raymond Hill
3621792f16
Rework/remove remnant of code dependent on localStorage
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/899
2020-02-23 12:18:45 -05:00
Raymond Hill
15470bcbdc
Ensure disableWebAssembly setting is loaded before use
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/899

WASM modules are now loaded on demand rather than at
script evaluation time.
2020-02-22 13:36:22 -05:00
JustOff
a806dd4bd2
Add env_legacy to the pre-processor supported tokens (#3768)
This will allow specifically target uBlock Origin for Firefox legacy-based browsers in shared filter lists,
see https://github.com/gorhill/uBlock-for-firefox-legacy/pull/1.
2020-02-09 08:23:00 -05:00
Raymond Hill
651955b97c
Throw if mismatched size when unserializing an array buffer
An exception will be thrown if the length of an unserialized
array buffer does not match exactly the original size at
serialization time.
2020-02-04 09:55:02 -05:00
Raymond Hill
f8ec54c635
Fix compatibility issue with hosts files
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/847

IP address `0` is a valid synonym of `0.0.0.0`.
2020-01-10 11:07:31 -05:00
Raymond Hill
91e702cebb
Enable CNAME uncloaking by default
Advanced setting `cnameAliasList` has been removed.

New advanced settings:

cnameUncloak:
  Boolean
Default value:
  true
Description:
  Whether to CNAME-uncloak hostnames.

cnameIgnoreExceptions:
  Boolean
Default value:
  true
Description:
  Whether to bypass the uncloaking of network requests
  which were excepted by filters/rules. This is
  necessary so as to avoid undue breakage by having
  exception filters being rendered useless as a result
  of CNAME-uncloaking.
  For example, `google-analytics.com` uncloaks to
  `www-google-analytics.l.google.com` and both hostnames
  appear in Peter Lowe's list, which means exception
  filters for `google-analytics.com` (to fix site
  breakage) would be rendered useless as the uncloaking
  would cause the network request to be ultimately
  blocked.
2019-12-01 12:05:49 -05:00
Raymond Hill
e98a4b1ace
Discard :: from parsed hosts files
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/800
2019-12-01 09:15:25 -05:00
Raymond Hill
a16e4161de
Fine tune hostname uncloaking through CNAME-lookup
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/780

Related commit:
- https://github.com/gorhill/uBlock/commit/3a564c199260

This adds two new advanced settings:

- cnameIgnoreRootDocument
  - Default to `true`
  - Tells uBO to skip CNAME-lookup for root document.

- cnameReplayFullURL
  - Default to `false`
  - Tells uBO whether to replay the whole URL or just
    the origin part of it.
    Replaying only the origin part is meant to lower
    undue breakage and improve performance by avoiding
    repeating the pattern-matching of the whole URL --
    which pattern-matching was most likely already
    accomplished with the original request.

This commit is meant to explore enabling CNAME-lookup
by default for the next stable release while:

- Eliminating a development burden by removing the
  need to create a new filtering syntax to deal with
  undesirable CNAME-cloaked hostnames

- Eliminating a filter list maintainer burden by
  removing the need to find/deal with all base
  domains which engage in undesirable CNAME-cloaked
  hostnames

The hope is that the approach implemented in this
commit should require at most a few unbreak rules
with no further need for special filtering syntax
or filter list maintance efforts.
2019-11-23 13:07:23 -05:00
Raymond Hill
68e1b58bb6
Trim trailing spaces from string values in advanced settings 2019-11-20 11:45:10 -05:00
Raymond Hill
3a564c1992
Add ability to uncloak CNAME records
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/780

New webext permission added: `dns`, which purpose is
to allow an extension to fetch the DNS record of
specific hostnames, reference documentation:

https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/dns

The webext API `dns` is available in Firefox 60+ only.

The new API will enable uBO to "uncloak" the actual
hostname used in network requests. The ability is
currently disabled by default for now -- this is only
a first commit related to the above issue to allow
advanced users to immediately use the new ability.

Four advanced settings have been created to control the
uncloaking of actual hostnames:

cnameAliasList: a space-separated list of hostnames.
Default value: unset => empty list.
Special value: * => all hostnames.
A space-separated list of hostnames => this tells uBO
to "uncloak" the  hostnames in the list will.

cnameIgnoreList: a space-separated list of hostnames.
Default value: unset => empty list.
Special value: * => all hostnames.
A space-separated list of hostnames => this tells uBO
to NOT re-run the network request through uBO's
filtering engine with the CNAME hostname. This is
useful to exclude commonly used actual hostnames
from being re-run through uBO's filtering engine, so
as to avoid pointless overhead.

cnameIgnore1stParty: boolean.
Default value: true.
Whether uBO should ignore to re-run a network request
through the filtering engine when the CNAME hostname
is 1st-party to the alias hostname.

cnameMaxTTL: number of minutes.
Default value: 120.
This tells uBO to clear its CNAME cache after the
specified time. For efficiency purpose, uBO will
cache alias=>CNAME associations for reuse so as
to reduce calls to `browser.dns.resolve`. All the
associations will be cleared after the specified time
to ensure the map does not grow too large and too
ensure uBO uses up to date CNAME information.
2019-11-19 12:05:33 -05:00
Raymond Hill
a69b301d81
Fine-tune new bidi-trie code
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/761
2019-10-29 10:26:34 -04:00
Raymond Hill
d7b2d31180
Harden compiled/selfie format change detection at launch
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/759

This commit adds code to rely less on the state of the
cache storage to decide whether filter lists should be
re-compiled or whether the selfie is currently valid
at launch time when a change in compiled/selfie format
is detected.
2019-10-27 11:49:05 -04:00
Raymond Hill
7971b22385
Expand bidi-trie usage in static network filtering engine
Related issues:
- https://github.com/uBlockOrigin/uBlock-issues/issues/761
- https://github.com/uBlockOrigin/uBlock-issues/issues/528

The previous bidi-trie code could only hold filters which
are plain pattern, i.e. no wildcard characters, and which
had no origin option (`domain=`), right and/or left anchor,
and no `csp=` option.

Example of filters that could be moved into a bidi-trie
data structure:

    &ad_box_
    /w/d/capu.php?z=$script,third-party
    ||liveonlinetv247.com/images/muvixx-150x50-watch-now-in-hd-play-btn.gif

Examples of filters that could NOT be moved to a bidi-trie:

    -adap.$domain=~l-adap.org
    /tsc.php?*&ses=
    ||ibsrv.net/*forumsponsor$domain=[...]
    @@||imgspice.com/jquery.cookie.js|$script
    ||view.atdmt.com^*/iview/$third-party
    ||postimg.cc/image/$csp=[...]

Ideally the filters above should be able to be moved to a
bidi-trie since they are basically plain patterns, or at
least partially moved to a bidi-trie when there is only a
single wildcard (i.e. made of two plain patterns).

Also, there were two distinct bidi-tries in which
plain-pattern filters can be moved to: one for patterns
without hostname anchoring and another one for patterns
with hostname-anchoring. This was required because the
hostname-anchored patterns have an extra condition which
is outside the bidi-trie knowledge.

This commit expands the number of filters which can be
stored in the bidi-trie, and also remove the need to
use two distinct bidi-tries.

- Added ability to associate a pattern with an integer
  in the bidi-trie [1].
    - The bidi-trie match code passes this externally
      provided integer when calling an externally
      provided method used for testing extra conditions
      that may be present for a plain pattern found to
      be matching in the bidi-trie.

- Decomposed existing filters into smaller logical units:
    - FilterPlainLeftAnchored =>
        FilterPatternPlain +
        FilterAnchorLeft
    - FilterPlainRightAnchored =>
        FilterPatternPlain +
        FilterAnchorRight
    - FilterExactMatch =>
        FilterPatternPlain +
        FilterAnchorLeft +
        FilterAnchorRight
    - FilterPlainHnAnchored =>
        FilterPatternPlain +
        FilterAnchorHn
    - FilterWildcard1 =>
        FilterPatternPlain + [
          FilterPatternLeft or
          FilterPatternRight
        ]
    - FilterWildcard1HnAnchored =>
        FilterPatternPlain + [
          FilterPatternLeft or
          FilterPatternRight
        ] +
        FilterAnchorHn
    - FilterGenericHnAnchored =>
        FilterPatternGeneric +
        FilterAnchorHn
    - FilterGenericHnAndRightAnchored =>
        FilterPatternGeneric +
        FilterAnchorRight +
        FilterAnchorHn
    - FilterOriginMixedSet =>
        FilterOriginMissSet +
        FilterOriginHitSet
    - Instances of FilterOrigin[...], FilterDataHolder
      can also be added to a composite filter to
      represent `domain=` and `csp=` options.

- Added a new filter class, FilterComposite, for
  filters which are a combination of two or more
  logical units. A FilterComposite instance is a
  match when *all* filters composing it are a
  match.

Since filters are now encoded into combination of
smaller units, it becomes possible to extract the
FilterPatternPlain component and store it in the
bidi-trie, and use the integer as a handle for the
remaining extra conditions, if any.

Since a single pattern in the bidi-trie may be a
component for different filters, the associated
integer points to a sequence of extra conditions,
and a match occurs as soon as one of the extra
conditions (which may itself be a sequence of
conditions) is fulfilled.

Decomposing filters which are currently single
instance into sequences of smaller logical filters
means increasing the storage and CPU overhead when
evaluating such filters. The CPU overhead is
compensated by the fact that more filters can now
moved into the bidi-trie, where the first match is
efficiently evaluated. The extra conditions have to
be evaluated if and only if there is a match in the
bidi-trie.

The storage overhead is compensated by the
bidi-trie's intrinsic nature of merging similar
patterns.

Furthermore, the storage overhead is reduced by no
longer using JavaScript array to store collection
of filters (which is what FilterComposite is):
the same technique used in [2] is imported to store
sequences of filters.

A sequence of filters is a sequence of integer pairs
where the first integer is an index to an actual
filter instance stored in a global array of filters
(`filterUnits`), while the second integer is an index
to the next pair in the sequence -- which means all
sequences of filters are encoded in one single array
of integers (`filterSequences` => Uint32Array). As
a result, a sequence of filters can be represented by
one single integer -- an index to the first pair --
regardless of the number of filters in the sequence.

This representation is further leveraged to replace
the use of JavaScript array in FilterBucket [3],
which used a JavaScript array to store collection
of filters. Doing so means there is no more need for
FilterPair [4], which purpose was to be a lightweight
representation when there was only two filters in a
collection.

As a result of the above changes, the map of `token`
(integer)  => filter instance (object) used to
associate tokens to filters or collections of filters
is replaced with a more efficient map of `token`
(integer) to filter unit index (integer) to lookup a
filter object from the global `filterUnits` array.

Another consequence of using one single global
array to store all filter instances means we can reuse
existing instances when a logical filter instance is
parameter-less, which is the case for FilterAnchorLeft,
FilterAnchorRight, FilterAnchorHn, the index to these
single instances is reused where needed.

`urlTokenizer` now stores the character codes of the
scanned URL into a bidi-trie buffer, for reuse when
string matching methods are called.

New method: `tokenHistogram()`, used to generate
histograms of occurrences of token extracted from URLs
in built-in benchmark. The top results of the "miss"
histogram are used as "bad tokens", i.e. tokens to
avoid if possible when compiling filter lists.

All plain pattern strings are now stored in the
bidi-trie memory buffer, regardless of whether they
will be used in the trie proper or not.

Three methods have been added to the bidi-trie to test
stored string against the URL which is also stored in
then bidi-trie.

FilterParser is now instanciated on demand and
released when no longer used.

***

[1] 135a45a878/src/js/strie.js (L120)
[2] e94024d350
[3] 135a45a878/src/js/static-net-filtering.js (L1630)
[4] 135a45a878/src/js/static-net-filtering.js (L1566)
2019-10-21 08:15:58 -04:00
Raymond Hill
8c47fa1a3e
Use async/await instead of chained thens 2019-09-21 19:48:02 -04:00
Raymond Hill
eb871ae558
Fix regression in selfie destruction code
Related commit:
- 915687fddb (diff-73ef8c4664f2ec8c02320d50b2908efdR1100-R1113)

Since selfie destruction is now deferred so as to
coallesce burst of call to destroy(), the selfie
load code must mind whether there is a pending
destruction in order to decide whether the
selfie can be safely loaded.

Related feedback:
- 23c4c80136 (commitcomment-35179834)
2019-09-21 19:24:47 -04:00