1
0
mirror of https://github.com/gorhill/uBlock.git synced 2024-11-18 00:13:30 +01:00
Commit Graph

197 Commits

Author SHA1 Message Date
Raymond Hill
a69b301d81
Fine-tune new bidi-trie code
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/761
2019-10-29 10:26:34 -04:00
Raymond Hill
5cc797fb47
Add WASM implementation for BidiTrieContainer.matches()
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/761
2019-10-28 13:57:35 -04:00
Raymond Hill
d7b2d31180
Harden compiled/selfie format change detection at launch
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/759

This commit adds code to rely less on the state of the
cache storage to decide whether filter lists should be
re-compiled or whether the selfie is currently valid
at launch time when a change in compiled/selfie format
is detected.
2019-10-27 11:49:05 -04:00
Raymond Hill
7971b22385
Expand bidi-trie usage in static network filtering engine
Related issues:
- https://github.com/uBlockOrigin/uBlock-issues/issues/761
- https://github.com/uBlockOrigin/uBlock-issues/issues/528

The previous bidi-trie code could only hold filters which
are plain pattern, i.e. no wildcard characters, and which
had no origin option (`domain=`), right and/or left anchor,
and no `csp=` option.

Example of filters that could be moved into a bidi-trie
data structure:

    &ad_box_
    /w/d/capu.php?z=$script,third-party
    ||liveonlinetv247.com/images/muvixx-150x50-watch-now-in-hd-play-btn.gif

Examples of filters that could NOT be moved to a bidi-trie:

    -adap.$domain=~l-adap.org
    /tsc.php?*&ses=
    ||ibsrv.net/*forumsponsor$domain=[...]
    @@||imgspice.com/jquery.cookie.js|$script
    ||view.atdmt.com^*/iview/$third-party
    ||postimg.cc/image/$csp=[...]

Ideally the filters above should be able to be moved to a
bidi-trie since they are basically plain patterns, or at
least partially moved to a bidi-trie when there is only a
single wildcard (i.e. made of two plain patterns).

Also, there were two distinct bidi-tries in which
plain-pattern filters can be moved to: one for patterns
without hostname anchoring and another one for patterns
with hostname-anchoring. This was required because the
hostname-anchored patterns have an extra condition which
is outside the bidi-trie knowledge.

This commit expands the number of filters which can be
stored in the bidi-trie, and also remove the need to
use two distinct bidi-tries.

- Added ability to associate a pattern with an integer
  in the bidi-trie [1].
    - The bidi-trie match code passes this externally
      provided integer when calling an externally
      provided method used for testing extra conditions
      that may be present for a plain pattern found to
      be matching in the bidi-trie.

- Decomposed existing filters into smaller logical units:
    - FilterPlainLeftAnchored =>
        FilterPatternPlain +
        FilterAnchorLeft
    - FilterPlainRightAnchored =>
        FilterPatternPlain +
        FilterAnchorRight
    - FilterExactMatch =>
        FilterPatternPlain +
        FilterAnchorLeft +
        FilterAnchorRight
    - FilterPlainHnAnchored =>
        FilterPatternPlain +
        FilterAnchorHn
    - FilterWildcard1 =>
        FilterPatternPlain + [
          FilterPatternLeft or
          FilterPatternRight
        ]
    - FilterWildcard1HnAnchored =>
        FilterPatternPlain + [
          FilterPatternLeft or
          FilterPatternRight
        ] +
        FilterAnchorHn
    - FilterGenericHnAnchored =>
        FilterPatternGeneric +
        FilterAnchorHn
    - FilterGenericHnAndRightAnchored =>
        FilterPatternGeneric +
        FilterAnchorRight +
        FilterAnchorHn
    - FilterOriginMixedSet =>
        FilterOriginMissSet +
        FilterOriginHitSet
    - Instances of FilterOrigin[...], FilterDataHolder
      can also be added to a composite filter to
      represent `domain=` and `csp=` options.

- Added a new filter class, FilterComposite, for
  filters which are a combination of two or more
  logical units. A FilterComposite instance is a
  match when *all* filters composing it are a
  match.

Since filters are now encoded into combination of
smaller units, it becomes possible to extract the
FilterPatternPlain component and store it in the
bidi-trie, and use the integer as a handle for the
remaining extra conditions, if any.

Since a single pattern in the bidi-trie may be a
component for different filters, the associated
integer points to a sequence of extra conditions,
and a match occurs as soon as one of the extra
conditions (which may itself be a sequence of
conditions) is fulfilled.

Decomposing filters which are currently single
instance into sequences of smaller logical filters
means increasing the storage and CPU overhead when
evaluating such filters. The CPU overhead is
compensated by the fact that more filters can now
moved into the bidi-trie, where the first match is
efficiently evaluated. The extra conditions have to
be evaluated if and only if there is a match in the
bidi-trie.

The storage overhead is compensated by the
bidi-trie's intrinsic nature of merging similar
patterns.

Furthermore, the storage overhead is reduced by no
longer using JavaScript array to store collection
of filters (which is what FilterComposite is):
the same technique used in [2] is imported to store
sequences of filters.

A sequence of filters is a sequence of integer pairs
where the first integer is an index to an actual
filter instance stored in a global array of filters
(`filterUnits`), while the second integer is an index
to the next pair in the sequence -- which means all
sequences of filters are encoded in one single array
of integers (`filterSequences` => Uint32Array). As
a result, a sequence of filters can be represented by
one single integer -- an index to the first pair --
regardless of the number of filters in the sequence.

This representation is further leveraged to replace
the use of JavaScript array in FilterBucket [3],
which used a JavaScript array to store collection
of filters. Doing so means there is no more need for
FilterPair [4], which purpose was to be a lightweight
representation when there was only two filters in a
collection.

As a result of the above changes, the map of `token`
(integer)  => filter instance (object) used to
associate tokens to filters or collections of filters
is replaced with a more efficient map of `token`
(integer) to filter unit index (integer) to lookup a
filter object from the global `filterUnits` array.

Another consequence of using one single global
array to store all filter instances means we can reuse
existing instances when a logical filter instance is
parameter-less, which is the case for FilterAnchorLeft,
FilterAnchorRight, FilterAnchorHn, the index to these
single instances is reused where needed.

`urlTokenizer` now stores the character codes of the
scanned URL into a bidi-trie buffer, for reuse when
string matching methods are called.

New method: `tokenHistogram()`, used to generate
histograms of occurrences of token extracted from URLs
in built-in benchmark. The top results of the "miss"
histogram are used as "bad tokens", i.e. tokens to
avoid if possible when compiling filter lists.

All plain pattern strings are now stored in the
bidi-trie memory buffer, regardless of whether they
will be used in the trie proper or not.

Three methods have been added to the bidi-trie to test
stored string against the URL which is also stored in
then bidi-trie.

FilterParser is now instanciated on demand and
released when no longer used.

***

[1] 135a45a878/src/js/strie.js (L120)
[2] e94024d350
[3] 135a45a878/src/js/static-net-filtering.js (L1630)
[4] 135a45a878/src/js/static-net-filtering.js (L1566)
2019-10-21 08:15:58 -04:00
Raymond Hill
4bf6503f0a
Store csp= filters into main data structure
This commits make it so that `csp=` filters
are now stored in the same data structures as
all other static network filters rather than
being stored in a separate one.

This internal change is motivated by the wish
to bring session filters to the static network
filtering engine, as has already been done for
the static extended filtering engine in the
following commit:

59c9a34d34
2019-09-28 11:30:26 -04:00
Raymond Hill
59c9a34d34
Add ability to quickly create exceptions in logger
This is a feature under development, hidden behind
a new advanced setting, `filterAuthorMode` which
default to `false`.

Ability to point-and-click to create temporary
exception filters for static extended filters (i.e.
cosmetic, scriptlet & html filters) from within
the summary pane in the logger. The button to
toggle on/off temporary exception filter is
labeled `#@#`.

The created exceptions are temporary and will be
lost when restarting uBO, or manually toggling off
the exception filters.

Creating temporary exception filters does not
cause the filter lists to reloaded, and thus there
is no overhead in creating/removing these temporary
exception filters.
2019-09-24 17:05:03 -04:00
Raymond Hill
010635acd6
Add support for ping static filter option
Related issue:
- https://github.com/gorhill/uBlock/issues/1493

Documentation:
- https://help.eyeo.com/adblockplus/how-to-write-filters#type-options

Test page:
- https://testpages.adblockplus.org/en/filters/ping

Additionally, network requests of type `beacon` will
be mapped to `ping` by the static filtering engine.
2019-09-22 09:11:55 -04:00
Raymond Hill
eb871ae558
Fix regression in selfie destruction code
Related commit:
- 915687fddb (diff-73ef8c4664f2ec8c02320d50b2908efdR1100-R1113)

Since selfie destruction is now deferred so as to
coallesce burst of call to destroy(), the selfie
load code must mind whether there is a pending
destruction in order to decide whether the
selfie can be safely loaded.

Related feedback:
- 23c4c80136 (commitcomment-35179834)
2019-09-21 19:24:47 -04:00
Raymond Hill
23c4c80136
Add support for elemhide (through specifichide)
Related documentation:
- https://help.eyeo.com/en/adblockplus/how-to-write-filters#element-hiding

Related feedback/discussion:
- https://www.reddit.com/r/uBlockOrigin/comments/d6vxzj/

The `elemhide` filter option as per ABP semantic is
now supported. Previously uBO would consider `elemhide`
to be an alias of `generichide`.

The support of `elemhide` is through the convenient
conversion of `elemhide` option into existing
`generichide` option and new `specifichide` option.

The purpose of the new `specifichide` filter option
is to disable all specific cosmetic filters, i.e.
those who target a specific site.

Additionally, for convenience purpose, the filter
options `generichide`, `specifichide` and `elemhide`
can be aliased using the shorter forms `ghide`,
`shide` and `ehide` respectively.
2019-09-21 11:30:38 -04:00
Raymond Hill
917f3620e0
Revisit element picker arguments code
No need to store mouse coordinates in background
page, thus no need to post mouse coordinates
information for every click.

Rename/group element picker arguments and popup
arguments separately.
2019-09-18 12:17:45 -04:00
Raymond Hill
0051f3b5c7
Work toward modernizing code base: promisification
Swathes of code have been converted to use
Promises/async/await. More left to do.

Related commits:
- eec53c0154
- 915687fddb
- 55cc0c6997
- e27328f931
2019-09-16 16:17:48 -04:00
Raymond Hill
93f438f55e
Add advanced setting for extension reload on update
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/717

Related feedback:
- https://github.com/uBlockOrigin/uBlock-issues/issues/717#issuecomment-530275655

New advanced setting: `extensionUpdateForceReload`

Default value: `false`

If set to `true`, the extension will unconditionally reload
when an update is available; otherwise the extension will
reload only when being explicitly disabled then enabled, or
when the browser is restarted.
2019-09-11 08:00:55 -04:00
Raymond Hill
bcf5ac1fee
Add advanced setting to control logger popup type
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/663

The advanced setting `loggerPopupType` has been added, to
control the type of window to be used when the logger is
launched as a separate window.

The default value is `popup`, it can be changed to any of
the values documented at:

https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/windows/CreateType
2019-09-06 11:41:07 -04:00
Raymond Hill
5ad809c07d
Code review color badge code
Related commit:
- 07c950f1e5

Cache [blocking mode, color] pair for fast
lookup in subsequent calls.
2019-08-19 09:00:53 -04:00
Raymond Hill
07c950f1e5
Review icon badge color management
Related commit & feedback:
- 7ff750eaf6

The color value for the icon badge is now
"attached" to the blocking profile value.
Additionally, as per feedback, `3p` rules
will be relaxing before master JavaScript
switch rules.
2019-08-11 13:55:39 -04:00
Raymond Hill
7ff750eaf6
Reflect blocking mode in badge color of toolbar icon
Related feedback:
- https://www.reddit.com/r/uBlockOrigin/comments/cmh910/

Additionally, the `3p` rule has been made distinct from
`3p-script`/`3p-frame` for the purpose of
"Relax blocking mode" command.

The badge color will hint at the current blocking mode.
There are four colors for the four following blocking
modes:
- JavaScript wholly disabled
- All 3rd parties blocked
- 3rd-party scripts and frames blocked
- None of the above

The default badge color will be used when JavaScript is not
wholly disabled and when there are no rules for `3p`,
`3p-script` or `3p-frame`.

A new advanced setting has been added to let the user choose
the badge colors for the various blocking modes,
`blockingProfileColors`. The value *must* be a sequence of
4 valid CSS color values that match 6 hexadecimal digits
prefixed with`#` -- anything else will be ignored.
2019-08-10 10:57:24 -04:00
Raymond Hill
048bfd251c
Add ability to bypass browser cache when fetching a resource
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/682#issuecomment-515197130

The following advanced setting has been added:

    updateAssetBypassBrowserCache

Default to `false`. If set to `true`, uBO will ensure the
browser cache is bypassed when fetching a remote resource.

This is for the convenience of filter list maintainers who
may want to test the latest version of their lists when
fetched from their remote location.
2019-07-26 09:52:11 -04:00
Raymond Hill
10fe9fe656
Allow setting assetsBootstrapLocation from admin settings
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/666
2019-07-18 10:53:08 -04:00
Raymond Hill
f930da7ad6
Fix regression of reverse-lookup of scriptlet filters in logger
Related commit:
- 5552d6717d
2019-07-05 11:44:40 -04:00
Raymond Hill
1fb9845c35
Remove useless code 2019-07-04 14:10:23 -04:00
Raymond Hill
0ba9a35818
Convert more resources as immutable
Related commit:
- 152cea2dfe
2019-07-03 14:33:06 -04:00
Raymond Hill
6c34b3c3c9
Use "relax" instead of "toggle"
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/371
2019-06-27 08:16:18 -04:00
Raymond Hill
693687fd74
Add keyboard support for toggling down blocking profile
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/371

By default, no specific keyboard shortcut is predefined,
this will have to be assigned by the user. The command
name in English is "Toggle blocking profile".

The default behavior is to toggle down according to one
of the following scenarios.

a) If script execution is disabled through the no-scripting
switch, the no-scripting switch will be locally toggled
so as to allow script execution. The page will be
automatically reloaded.

b) If script execution is not blocked but the 3rd-party
script and/or frame cells are blocked, local no-op rules
will be set so as to no longer block 3rd-party scripts
and/or frames. The page will be automatically reloaded.

Given this, it may take more than one toggle down command
to reach the lowest blocking profile, which is one where
JavaScript execution is not blocked and 3rd-party scripts
and frames resources block rules, if any, are bypassed
with local no-op rules.

TODO: At this point, I haven't yet decided whether
toggling from the lowest profile should restore the
original highest blocking profile.
2019-06-26 07:47:14 -04:00
Raymond Hill
9065bbdd48
Code review of whitelisting-related code
- Use `Map()` instead of `{}` for internal data
  structure
- Export as array of directives instead of as
  a string
2019-06-25 11:57:14 -04:00
Raymond Hill
cfc2ce333d
Implement bidirectional plain-string trie
The bidirectional trie allows storing the right
and left parts of a string into a trie given a
pivot position.

Releated issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/528

Additionally, the mandatory token-at-index-0 rule
for FilterPlainHnAnchored has been lifted, thus
allowing the engine to pick a potentially better token
at any position in the filter string.

***

TODO: Eventually rename `strie.js` to `biditrie.js`.

TODO: Fix dump() method, it currently only show the
      right-hand side of a filter string.
2019-06-18 19:16:39 -04:00
Raymond Hill
72d9758faa
Ensure the "Filter lists" pane is in sync with update status
Related issue:
- https://github.com/gorhill/uBlock/issues/2394

Additionally, I added a new advanced setting to control
how long after launch an auto-update session should be
started -- value is in seconds:

    autoUpdateDelayAfterLaunch 180
2019-05-19 18:31:12 -04:00
Raymond Hill
1caff7429e
Add optional support for generic procedural cosmetic filters
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/131

The new advanced setting and its default value is:

    allowGenericProceduralFilters false

Whenever this setting is toggled, the user is responsible
of forcing a reload of all filter lists so as to allow uBO
to process differently any existing generic procedural
cosmetic filters.
2019-05-18 18:57:32 -04:00
Raymond Hill
3cf71835c4
Set default delay for creating selfie to 3 minutes
Related discussion:
- https://www.reddit.com/r/uBlockOrigin/comments/bq49zi/
2019-05-18 14:43:44 -04:00
Raymond Hill
f7bbc80717
Improve "Whitelist pane"; remove now useless built-in switch rule
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/214

Built-in whitelist directives are now rendered differently
than user-defined whitelist directives. Also, removing a
built-in whitelist directive will only cause that directive
to be commented out, so that users do not have to remember
built-in directives should they want to bring them back.

Related issue:
 https://github.com/uBlockOrigin/uBlock-issues/issues/494

The built-in per-site switch rule
`no-scripting: behind-the-scene false` has been removed,
it should not ever be needed since there will always be a
valid root context for main- and sub-frames.
2019-05-18 14:20:05 -04:00
Raymond Hill
0ca44b847c
Avoid duplicated strings in filterOrigin w/ new approach
The new approach is simpler and should benefit selfie
serialization/unserialization.

This renders stringDeduplicater obsolete -- it has been
removed.
2019-05-17 10:13:58 -04:00
Raymond Hill
93f80eedfa
Refactor runtime storage of specific cosmetic filters
This was a TODO item:
- 07cbae66a4/src/js/cosmetic-filtering.js (L375)

µBlock.staticExtFilteringEngine.HostnameBasedDB has been
re-factored to accomodate the storing of specific cosmetic
filters.

As a result of this refactoring:

- Memory usage has been further decreased
- Performance of selector retrieval marginally
  improved
- New internal representation opens the door
  to use a specialized version of HNTrie, which
  should further improve performance/memory
  usage
2019-05-14 08:52:34 -04:00
Raymond Hill
915c1f1f3c
Report resources blocked by csp= option in logger
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/552
2019-05-11 10:40:34 -04:00
Raymond Hill
12bdd01595
Ensure "Ignore generic cosmetic filters" sticks on Fennec
Related issue:
- https://www.reddit.com/r/uBlockOrigin/comments/blkudl/

The setting was not sticking at first-install time.
2019-05-11 09:04:13 -04:00
Raymond Hill
0e4fbefd07
Remove unecessary null placeholders FilterOriginHitSet et al.
The `null` placeholder are not necessary, we can just use
default arguments instead, and add the HNTrieContainer
references if and only if they are instanciated.
2019-05-01 18:54:11 -04:00
Raymond Hill
adabb56dc9
Do not store impossible to match filters in HNTrie
Consider the two following filters:

    example.com
    www.example.com

This commit make it so that if the first filter is
already present in a given HNTrie, the second filter
will not be stored, since HNTrie will _always_
return the first filter as a match whenever the
hostname to match is example.com or any subdomain
of example.com.

The detection of such pointless filters is
virtually free when adding a hostname to an HNTrie
instance (given how data is stored in the trie), so
in practice no overhead is incurred to detect such
pointless filters.

The ability to ignore impossible to match filters
in HNTrie instances will _especially_ benefit those
using large hosts files.

Examples of how this helps using real configurations:

- Default lists:
  444 filters out of 100,382 were ignored as a result
  of this commit.

- Default lists + "Energized Ultimate Protection":
  283,669 filters out of 903,235 were ignored as a
  result of this commit.

Side note: There was no measurable difference between
the two configurations above in the performance of
the matching algorithm as reported by the built-in
benchmark tool.
2019-04-29 13:15:16 -04:00
Raymond Hill
ac58b8e688
Make token hashes fit within a 32-bit integer
The staticNetFilteringEngine uses token hashes to store/lookup
filters into Map objects.

Before this commit, the tokens were encoded into token hashes
as JS numbers (not exceeding MAX_SAFE_INTEGER) using at most
the 8 first characters of the token.

With this commit, token hashes are now restricted to fit
into 32-bit integers, and are derived from at most the 7 first
characters. This improves filter look-up performance as per
built-in benchmark().
2019-04-28 10:15:15 -04:00
Raymond Hill
96dce22218
Increase resolution of known-token lookup table
Related commit:
- 69a43e07c4

Using 32 bits of token hash rather than just the 16 lower
bits does help discard more unknown tokens.

Using the default filter lists, the known-token lookup
table is populated by 12,276 entries, out of 65,536, thus
making the case that theoretically there is a lot of
possible tokens which can be discarded.

In practice, running the built-in
staticNetFilteringEngine.benchmark() with default filter
lists, I find that 1,518,929 tokens were skipped out of
4,441,891 extracted tokens, or 34%.
2019-04-27 08:18:01 -04:00
Raymond Hill
69a43e07c4
Ignore unknown tokens in urlTokenizer.getTokens()
Given that all tokens extracted from one single URL are potentially
iterated multiple times in a single URL-matching cycle, it pays to
ignore extracted tokens which are known to not be used anywhere in
the static filtering engine.

The gain in processing a single network request in the static
filtering engine can become especially high when dealing with
long and random-looking URLs, which URLs have a high likelihood
of containing a majority of tokens which are known to not be in
use.
2019-04-26 17:14:00 -04:00
Raymond Hill
19ece97b0c
Leverage compile-time token information in new fitler classes
Related commit:
- 99390390fc

The token information available at compile time can be stored
in the filter to be used at match() time. This allows the use of
startsWith() rather than a more costly indexOf() call as a first
quick test to detect mismatches.
2019-04-26 11:16:47 -04:00
Raymond Hill
99390390fc
Introduce three more specialized filter classes to avoid regexes
Performance- and memory-related work. Three more classes have
been created to avoid regex-based filters internally.

Purpose is to enforce filters which have only one single
wildcard in their pattern, a common occurrence. The filter
pattern is split in two literal string segments.

Similar as above, with the added condition that the filter is
hostname-anchored (`||`). The "Wildcard2" variant is a further
specialization to enforce filters where the only wildcard
is immediately preceded by the `^` special character, again
a very common occurrence.

Using two literal string segments in lieu of regexes allows to
quickly detect a mismatch by just testing the first segment.
Additionally, this reduces memory footprint as regexes are
much more expensive memory-wise than plain strings.

These three new filter classes allow to replace the use of
5276 regex-based filters internally with plain string-based
filters.

Often-called isHnAnchored() has been further fine-tuned to
avoid as much work as possible. I have also observed that
using an arrow function for closure-purpose helps measurably
performance, as per built-in benchmark.
2019-04-25 17:48:08 -04:00
Raymond Hill
fa83744b58
Use a sequence of base 64 numbers to encode array buffers
The purpose of using a custom base128 encoder is to
convert array buffers into strings, to allow a direct
string-to-array buffer conversion at load time:

  string => array buffer

Whereas a JSON array would require an extra step:

  JSON array as string => JS array => array buffer

Turns out that the current use of a custom base128 encoding
results in a significantly larger selfie storage usage when
converting array buffers into strings.

Speculation: possibly the browser convert the strings to
save into JSON strings internally. Since the custom base128
encoder is likely to cause the resulting string to contain
a lot of unprintable ASCII characters, these will need to
be escaped when converted to JSON -- escaped characters
occupy more space than non-escaped ones.

Using a sequence of base 64 numbers means only printable
will be present in the output string, hence no escaping
necessary. I have observed significant reduction in
storage usage for selfie purpose.
2019-04-20 09:06:54 -04:00
Raymond Hill
3f3a1543ea
Add HNTrie-based filter classes to store origin-only filters
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/528#issuecomment-484408622

Following STrie-related work in above issue, I noticed that a large
number of filters in EasyList were filters which only had to match
against the document origin. For instance, among just the top 10
most populous buckets, there were four such buckets with over
hundreds of entries each:

- bits: 72, token: "http", 146 entries
- bits: 72, token: "https", 139 entries
- bits: 88, token: "http", 122 entries
- bits: 88, token: "https", 118 entries

These filters in these buckets have to be matched against all
the network requests.

In order to leverage HNTrie for these filters[1], they are now handled
in a special way so as to ensure they all end up in a single HNTrie
(per bucket), which means that instead of scanning hundreds of entries
per URL, there is now a single scan per bucket per URL for these
apply-everywhere filters.

Now, any filter which fulfill ALL the following condition will be
processed in a special manner internally:

- Is of the form `|https://` or `|http://` or `*`; and
- Does have a `domain=` option; and
- Does not have a negated domain in its `domain=` option; and
- Does not have `csp=` option; and
- Does not have a `redirect=` option

If a filter does not fulfill ALL the conditions above, no change
in behavior.

A filter which matches ALL of the above will be processed in a special
manner:

- The `domain=` option will be decomposed so as to create as many
  distinct filter as there is distinct value in the `domain=` option
- This also apply to the `badfilter` version of the filter, which
  means it now become possible to `badfilter` only one of the
  distinct filter without having to `badfilter` all of them.
- The logger will always report these special filters with only a
  single hostname in the `domain=` option.

***

[1] HNTrie is currently WASM-ed on Firefox.
2019-04-19 16:33:46 -04:00
Raymond Hill
c229003d31
Performance + code maintenance work on static network filtering engine
Implement a plain string trie container class: STrieContainer.

Make use of STrieContainer where beneficial

  Some filter buckets can grow quite large, and in such case
  coalescing "trieable" filter classes into a single trie reduces
  lookup performance and memory usage.

  For instance, at time of commit, the filter bucket for the
  `ad` keyword contains 919 entries[1].

  Coalescing trieable filters of the same class into a single plain
  string trie reduced the size of the bucket into 50 entries + two
  tries which are scanned only once each whenever the bucket is
  visited.

  [1] Enter the following code at uBO's dev console:
      µBlock.staticNetFilteringEngine.categories.get(0).get(µBlock.urlTokenizer.tokenHashFromString('ad'))

Refactor static network filtering engine code to make use of
ES6's syntactic sugar `class`.

Change first auto-update run from 7 to 5 minutes.
2019-04-14 16:45:20 -04:00
Raymond Hill
87feb47b51
Support disabling suspendTabsUntilReady in Firefox
The value of `suspendTabsUntilReady` was disregarded in Firefox and
uBO defaulted to always defer tab loading until it was ready.

This commit allows to disable the deferring of tab loading in
Firefox. The new valid values for `suspendTabsUntilReady` are:
- `unset`: leave it to the platform to pick the optimal
  behavior (default)
- `no`: do no suspend tab loading at launch time
- `yes`: suspend tab loading at launch time
2019-02-19 12:30:37 -05:00
Raymond Hill
426a6ea9a7
Fix spurious output at uBO's dev console
Regression from https://github.com/gorhill/uBlock/commit/0d369cda21bb
2019-02-18 14:41:04 -05:00
Raymond Hill
e93062bcdf
Spin-off FilterOrigin flavors into standalone classes
This removes the derivation of FilterOrigin flavors from
FilterOrigin itself and simplify code paths. FilterOrigin
flavors are small specialized classes, no need to
overcomplicate with derivation.

Specifically, this removes an indirect call to reach the
match() method.
2019-02-16 12:16:30 -05:00
Raymond Hill
ed7e34fb07
Refactor selfie generation into a more flexible persistence mechanism
The motivation is to address the higher peak memory usage at launch
time with 3rd-gen HNTrie when a selfie was present.

The selfie generation prior to this change was to collect all
filtering data into a single data structure, and then to serialize
that whole structure at once into storage (using JSON.stringify).

However, HNTrie serialization requires that a large UintArray32 be
converted into a plain JS array, which itslef would be indirectly
converted into a JSON string. This was the main reason why peak
memory usage would be higher at launch from selfie, since the JSON
string would need to be wholly unserialized into JS objects, which
themselves would need to be converted into more specialized data
structures (like that Uint32Array one).

The solution to lower peak memory usage at launch is to refactor
selfie generation to allow a more piecemeal approach: each filtering
component is given the ability to serialize itself rather than to be
forced to be embedded in the master selfie. With this approach, the
HNTrie buffer can now serialize to its own storage by converting the
buffer data directly into a string which can be directly sent to
storage. This avoiding expensive intermediate steps such as
converting into a JS array and then to a JSON string.

As part of the refactoring, there was also opportunistic code
upgrade to ES6 and Promise (eventually all of uBO's code will be
proper ES6).

Additionally, the polyfill to bring getBytesInUse() to Firefox has
been revisited to replace the rather expensive previous
implementation with an implementation with virtually no overhead.
2019-02-14 13:33:55 -05:00
Raymond Hill
a026e9ae54
Fix reverting use of IndexedDB as default cache storage on Chromium
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/399

The advanced setting `cacheStorageAPI` has been added to allow
a user to force the use of IndexedDB as cache storage. Set to
`IndexedDB` to force use of IndexedDB. Default to `unset`.
2019-01-25 18:49:30 -05:00
Raymond Hill
64bea27881
Add ability to control auto-commenting at filter creation time
Related issues:
- https://github.com/uBlockOrigin/uBlock-issues/issues/372
- https://github.com/gorhill/uBlock/issues/93

A new advanced settings has been added: `autoCommentFilterTemplate`.

Default value is `{{date}} {{origin}}`.

Placeholders are identified by `{{...}}`. There are currently
only three placeholders supported:

- `{{date}}`: will be replaced with current date
- `{{time}}`: will be replaced with current time
- `{{origin}}`: will be replaced with site information on which
  the filter(s) was created

If no placeholder is found in `autoCommentFilterTemplate`, this
will disable auto-commenting. So one can use `-` to disable
auto-commenting.

Additionally, if auto-commenting is enabled, uBO will not emit a
comment if an emitted comment would be a duplicate of the last
one found in the user filter list.
2019-01-08 07:37:50 -05:00
Raymond Hill
610ca2684b
Remove (broken) benchmark pane 2018-12-21 12:01:24 -05:00