Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/528#issuecomment-484408622
Following STrie-related work in above issue, I noticed that a large
number of filters in EasyList were filters which only had to match
against the document origin. For instance, among just the top 10
most populous buckets, there were four such buckets with over
hundreds of entries each:
- bits: 72, token: "http", 146 entries
- bits: 72, token: "https", 139 entries
- bits: 88, token: "http", 122 entries
- bits: 88, token: "https", 118 entries
These filters in these buckets have to be matched against all
the network requests.
In order to leverage HNTrie for these filters[1], they are now handled
in a special way so as to ensure they all end up in a single HNTrie
(per bucket), which means that instead of scanning hundreds of entries
per URL, there is now a single scan per bucket per URL for these
apply-everywhere filters.
Now, any filter which fulfill ALL the following condition will be
processed in a special manner internally:
- Is of the form `|https://` or `|http://` or `*`; and
- Does have a `domain=` option; and
- Does not have a negated domain in its `domain=` option; and
- Does not have `csp=` option; and
- Does not have a `redirect=` option
If a filter does not fulfill ALL the conditions above, no change
in behavior.
A filter which matches ALL of the above will be processed in a special
manner:
- The `domain=` option will be decomposed so as to create as many
distinct filter as there is distinct value in the `domain=` option
- This also apply to the `badfilter` version of the filter, which
means it now become possible to `badfilter` only one of the
distinct filter without having to `badfilter` all of them.
- The logger will always report these special filters with only a
single hostname in the `domain=` option.
***
[1] HNTrie is currently WASM-ed on Firefox.
The motivation is to address the higher peak memory usage at launch
time with 3rd-gen HNTrie when a selfie was present.
The selfie generation prior to this change was to collect all
filtering data into a single data structure, and then to serialize
that whole structure at once into storage (using JSON.stringify).
However, HNTrie serialization requires that a large UintArray32 be
converted into a plain JS array, which itslef would be indirectly
converted into a JSON string. This was the main reason why peak
memory usage would be higher at launch from selfie, since the JSON
string would need to be wholly unserialized into JS objects, which
themselves would need to be converted into more specialized data
structures (like that Uint32Array one).
The solution to lower peak memory usage at launch is to refactor
selfie generation to allow a more piecemeal approach: each filtering
component is given the ability to serialize itself rather than to be
forced to be embedded in the master selfie. With this approach, the
HNTrie buffer can now serialize to its own storage by converting the
buffer data directly into a string which can be directly sent to
storage. This avoiding expensive intermediate steps such as
converting into a JS array and then to a JSON string.
As part of the refactoring, there was also opportunistic code
upgrade to ES6 and Promise (eventually all of uBO's code will be
proper ES6).
Additionally, the polyfill to bring getBytesInUse() to Firefox has
been revisited to replace the rather expensive previous
implementation with an implementation with virtually no overhead.
A new filtering class has been created: "static extended filtering".
This new class is an umbrella class for more specialized filtering
engines:
- Cosmetic filtering
- Scriptlet filtering
- HTML filtering
HTML filtering is available only on platforms which support modifying
the response body on the fly, so only Firefox 57+ at the moment.
With the ability to modify the response body, HTML filtering has
been introduced: removing elements from the DOM before the source
data has been parsed by the browser.
A consequence of HTML filtering ability is to bring back script tag
filtering feature.
- badfilter option was no longer working following last refactoring
changes.
- performance work:
- reduce duplication of large strings.
- new lighter FilterBucket to use when only 2 filters: FilterPair.
.jshintrc's otion-set is a personal choice, merely a suggestion.
Beside that, it includes some common globals for specific browsers, so
there's no need to set the globals in every .js file.
In order to force strict coding, "use strict" directive was added into
every .js file.