1
0
mirror of https://github.com/gorhill/uBlock.git synced 2024-09-29 14:17:11 +02:00
Commit Graph

27 Commits

Author SHA1 Message Date
Raymond Hill
a71b71e4c8
New cosmetic filter parser using CSSTree library
The new parser no longer uses the browser DOM to validate
that a cosmetic filter is valid or not, this is now done
through a JS library, CSSTree.

This means filter list authors will have to be more careful
to ensure that a cosmetic filter is really valid, as there is
no more guarantee that a cosmetic filter which works for a
given browser/version will still work properly on another
browser, or different version of the same browser.

This change has become necessary because of many reasons,
one of them being the flakiness of the previous parser as
exposed by many issues lately:

- https://github.com/uBlockOrigin/uBlock-issues/issues/2262
- https://github.com/uBlockOrigin/uBlock-issues/issues/2228

The new parser introduces breaking changes, there was no way
to do otherwise. Some current procedural cosmetic filters will
be shown as invalid with this change. This occurs because the
CSSTree library gets confused with some syntax which was
previously allowed by the previous parser because it was more
permissive.

Mainly the issue is with the arguments passed to some procedural
cosmetic filters, and these issues can be solved as follow:

Use quotes around the argument. You can use either single or
double-quotes, whichever is most convenient. If your argument
contains a single quote, use double-quotes, and vice versa.

Additionally, try to escape a quote inside an argument using
backslash. THis may work, but if not, use quotes around the
argument.

When the parser encounter quotes around an argument, it will
discard them before trying to process the argument, same with
escaped quotes inside the argument. Examples:

Breakage:

    ...##^script:has-text(toscr')

Fix:

    ...##^script:has-text(toscr\')

Breakage:

    ...##:xpath(//*[contains(text(),"VPN")]):upward(2)

Fix:

    ...##:xpath('//*[contains(text(),"VPN")]'):upward(2)

There are not many filters which break in the default set of
filter lists, so this should be workable for default lists.

Unfortunately those fixes will break the filter for previous
versions of uBO since these to not deal with quoted argument.
In such case, it may be necessary to keep the previous filter,
which will be discarded as broken on newer version of uBO.

THis was a necessary change as the old parser was becoming
more and more flaky after being constantly patched for new
cases arising, The new parser should be far more robust and
stay robist through expanding procedural cosmetic filter
syntax.

Additionally, in the MV3 version, filters are pre-compiled
using a Nodejs script, i.e. outside the browser, so validating
cosmetic filters using a live DOM no longer made sense.

This new parser will have to be tested throughly before stable
release.
2022-09-23 16:03:13 -04:00
Raymond Hill
e31637af78
[mv3] Add ability to enable/disable filter lists 2022-09-13 17:44:24 -04:00
Raymond Hill
83d028ac7d
Report specific filter before generic one
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/2092

Regression from:
- 72bb89495b
2022-04-25 09:49:31 -04:00
Raymond Hill
4818405cf6
Remove need to pass parser at every compile() call
The compiler instance is already initialized with a
reference to the parser, no need to keep passing the
reference at each call to compile().
2021-08-05 13:30:20 -04:00
Raymond Hill
85c68116bd
Group all compiling-related code into FilterCompiler() class
In the static network filtering engine (snfe), the
compiling-related code was spread across two classes.
This commit makes it so that all the compiling-related
code is in FilterCompiler class, which clear purpose is
to compile raw filters into a form which can be persisted
and later fed to the snfe with no parsing overhead.

To compile raw static network filter, the new approach is:

    snfe.createCompiler(parser);

Then for each single raw filter to compile:

    compiler.compile(parser, writer);

The caller is responsible to keep a reference to the
compiler instance for as long as it is needed. This removes
the need for the clunky code used to keep an instance of
compiler alive in the snfe.

Additionally, snfe.tokenHistograms() has been moved to
benchmarks.js, as it has no dependency on the snfe, it's
just a utility function.
2021-08-04 15:14:48 -04:00
Raymond Hill
62b6826dd5
Further modularize uBO's codebase
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1664

Modularization is a necessary step toward possibly publishing
a more complete nodejs package to allow using uBO's filtering
capabilities outside of the uBO extension.

Additionally, as per feedback, remove undue usage of console
output as per feedback:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1664#issuecomment-888451032
2021-07-28 19:48:38 -04:00
Raymond Hill
22022f636f
Modularize codebase with export/import
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1664

The changes are enough to fulfill the related issue.

A new platform has been added in order to allow for building
a NodeJS package. From the root of the project:

    ./tools/make-nodejs

This will create new uBlock0.nodejs directory in the
./dist/build directory, which is a valid NodeJS package.

From the root of the package, you can try:

    node test

This will instantiate a static network filtering engine,
populated by easylist and easyprivacy, which can be used
to match network requests by filling the appropriate
filtering context object.

The test.js file contains code which is typical example
of usage of the package.

Limitations: the NodeJS package can't execute the WASM
versions of the code since the WASM module requires the
use of fetch(), which is not available in NodeJS.

This is a first pass at modularizing the codebase, and
while at it a number of opportunistic small rewrites
have also been made.

This commit requires the minimum supported version for
Chromium and Firefox be raised to 61 and 60 respectively.
2021-07-27 17:26:04 -04:00
Raymond Hill
f876b68171
Add support for removal of response headers
The syntax to remove response header is a special case
of HTML filtering, whereas the response headers are
targeted, rather than the response body:

  example.com##^responseheader(header-name)

Where `header-name` is the name of the header to
remove, and must always be lowercase.

The removal of response headers can only be applied to
document resources, i.e. main- or sub-frames.

Only a limited set of headers can be targeted for
removal:

  location
  refresh
  report-to
  set-cookie

This limitation is to ensure that uBO never lowers the
security profile of web pages, i.e. we wouldn't want to
remove `content-security-policy`.

Given that the header removal occurs at onHeaderReceived
time, this new ability works for all browsers.

The motivation for this new filtering ability is instance
of website using a `refresh` header to redirect a visitor
to an undesirable destination after a few seconds.
2021-03-13 08:53:34 -05:00
Raymond Hill
5d7b2918ef
Harden processing of changes in compiled list format
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1365

This commit adds the compiled magic version number to the
compiled data itself, and consequently this allows uBO
to no longer require that any given compiled list with a
mismatched format to be detected and discarded at launch
time.

Given this change, uBO no longer needs to rely on the
deletion of cached data at launch time to ensure it
won't use no longer valid compiled lists.
2020-12-08 10:00:47 -05:00
Raymond Hill
0196993828
Use buffer-like approach for filterUnits array
filterUnits is now treated as a buffer which is
pre-allocated and which will grow in chunks so as
to minimize memory allocations. Entries are never
released, just null-ed.

Additionally, move urlTokenizer into the static
network filtering engine, since it's not used
anywhere else.
2020-11-09 06:54:51 -05:00
Raymond Hill
23332400f5
Improve annotations for search operations in CodeMirror editor
Before this commit, CodeMirror's add-on for search occurrences
was limited to find at most 1000 first occurrences, because of
performance considerations.

This commit removes this low limit by having the search
occurrences done in a dedicated worker. The limit is now
time-based, and highly unlikely to ever be hit under normal
condition.

With this change, all search occurrences are gathered,
and as a result:

- All occurrences are reported in the scrollbar instead of
just the 1,000 first

- The total count of all occurrences is now reported, instead
of capping at "1000+".

- The current occurrence rank at the cursor or selection
position is now reported -- this was not possible to report
this before.

The number of occurrences is line-based, it's not useful to
report finer-grained occurences in uBO.
2020-08-02 12:18:01 -04:00
Raymond Hill
90c7e79f4f
Consolidate filter list reverse lookup code into a single file
Since it's possible to execute specific code paths according
to whether the context is that of a worker or not, it's possible
to keep the main/thread code in a single file. Keeping the
main/worker code paths into a single file is more convenient
for both code maintenance and code review.
2020-08-01 10:32:40 -04:00
Raymond Hill
7dc962281f
Set max token length on parser for consistent compilation
Reported internally. The issue could cause the logger
to be unable to successfully reverse-lookup a filter
list for a filter which had tokens longer than 6
characters followed by wildcard.

Regression from:
- 01b1ed9a98
2020-06-07 08:50:20 -04:00
Raymond Hill
01b1ed9a98
Add a new static filtering parser
A new standalone static filtering parser is introduced,
vAPI.StaticFilteringParser. It's purpose is to parse
line of text into representation suitable for
compiling filters. It can additionally serves for
syntax highlighting purpose.

As a side effect, this solves:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1038

This is a first draft, there are more work left to do
to further perfect the implementation and extend its
capabilities, especially those useful to assist filter
authors.

For the time being, this commits break line-continuation
syntax highlighting -- which was already flaky prior to
this commit anyway.
2020-06-04 07:18:54 -04:00
Raymond Hill
c3bc2c741d
Add support for cname type and denyallow option
This concerns the static network filtering engine.

Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/943

* * *

New static network filter type: `cname`

By default, network requests which are result of
resolving a canonical name are subject to filtering.
This filtering can be bypassed by creating exception
filters using the `cname` option. For example:

    @@*$cname

The filter above tells the network filtering engine
to except network requests which fulfill all the
following conditions:

- network request is blocked
- network request is that of an unaliased hostname

Filter list authors are discouraged from using
exception filters of `cname` type, unless there no
other practical solution such that maintenance
burden become the greater issue. Of course, such
exception filters should be as narrow as possible,
i.e. apply to specific domain, etc.

* * *

New static network filter option: `denyallow`

The purpose of `denyallow` is bring
default-deny/allow-exceptionally ability into static
network filtering arsenal. Example of usage:

    *$3p,script, \
        denyallow=x.com|y.com \
        domain=a.com|b.com

The above filter tells the network filtering engine that
when the context is `a.com` or `b.com`, block all
3rd-party scripts except those from `x.com` and `y.com`.

Essentially, the new `denyallow` option makes it easier
to implement default-deny/allow-exceptionally in static
filter lists, whereas before this had to be done with
unwieldy regular expressions[1], or through the mix of
broadly blocking filters along with exception filters[2].

[1] https://hg.adblockplus.org/ruadlist/rev/f362910bc9a0

[2] Typically filters which pattern are of the
    form `|http*://`
2020-03-15 12:23:25 -04:00
Raymond Hill
7971b22385
Expand bidi-trie usage in static network filtering engine
Related issues:
- https://github.com/uBlockOrigin/uBlock-issues/issues/761
- https://github.com/uBlockOrigin/uBlock-issues/issues/528

The previous bidi-trie code could only hold filters which
are plain pattern, i.e. no wildcard characters, and which
had no origin option (`domain=`), right and/or left anchor,
and no `csp=` option.

Example of filters that could be moved into a bidi-trie
data structure:

    &ad_box_
    /w/d/capu.php?z=$script,third-party
    ||liveonlinetv247.com/images/muvixx-150x50-watch-now-in-hd-play-btn.gif

Examples of filters that could NOT be moved to a bidi-trie:

    -adap.$domain=~l-adap.org
    /tsc.php?*&ses=
    ||ibsrv.net/*forumsponsor$domain=[...]
    @@||imgspice.com/jquery.cookie.js|$script
    ||view.atdmt.com^*/iview/$third-party
    ||postimg.cc/image/$csp=[...]

Ideally the filters above should be able to be moved to a
bidi-trie since they are basically plain patterns, or at
least partially moved to a bidi-trie when there is only a
single wildcard (i.e. made of two plain patterns).

Also, there were two distinct bidi-tries in which
plain-pattern filters can be moved to: one for patterns
without hostname anchoring and another one for patterns
with hostname-anchoring. This was required because the
hostname-anchored patterns have an extra condition which
is outside the bidi-trie knowledge.

This commit expands the number of filters which can be
stored in the bidi-trie, and also remove the need to
use two distinct bidi-tries.

- Added ability to associate a pattern with an integer
  in the bidi-trie [1].
    - The bidi-trie match code passes this externally
      provided integer when calling an externally
      provided method used for testing extra conditions
      that may be present for a plain pattern found to
      be matching in the bidi-trie.

- Decomposed existing filters into smaller logical units:
    - FilterPlainLeftAnchored =>
        FilterPatternPlain +
        FilterAnchorLeft
    - FilterPlainRightAnchored =>
        FilterPatternPlain +
        FilterAnchorRight
    - FilterExactMatch =>
        FilterPatternPlain +
        FilterAnchorLeft +
        FilterAnchorRight
    - FilterPlainHnAnchored =>
        FilterPatternPlain +
        FilterAnchorHn
    - FilterWildcard1 =>
        FilterPatternPlain + [
          FilterPatternLeft or
          FilterPatternRight
        ]
    - FilterWildcard1HnAnchored =>
        FilterPatternPlain + [
          FilterPatternLeft or
          FilterPatternRight
        ] +
        FilterAnchorHn
    - FilterGenericHnAnchored =>
        FilterPatternGeneric +
        FilterAnchorHn
    - FilterGenericHnAndRightAnchored =>
        FilterPatternGeneric +
        FilterAnchorRight +
        FilterAnchorHn
    - FilterOriginMixedSet =>
        FilterOriginMissSet +
        FilterOriginHitSet
    - Instances of FilterOrigin[...], FilterDataHolder
      can also be added to a composite filter to
      represent `domain=` and `csp=` options.

- Added a new filter class, FilterComposite, for
  filters which are a combination of two or more
  logical units. A FilterComposite instance is a
  match when *all* filters composing it are a
  match.

Since filters are now encoded into combination of
smaller units, it becomes possible to extract the
FilterPatternPlain component and store it in the
bidi-trie, and use the integer as a handle for the
remaining extra conditions, if any.

Since a single pattern in the bidi-trie may be a
component for different filters, the associated
integer points to a sequence of extra conditions,
and a match occurs as soon as one of the extra
conditions (which may itself be a sequence of
conditions) is fulfilled.

Decomposing filters which are currently single
instance into sequences of smaller logical filters
means increasing the storage and CPU overhead when
evaluating such filters. The CPU overhead is
compensated by the fact that more filters can now
moved into the bidi-trie, where the first match is
efficiently evaluated. The extra conditions have to
be evaluated if and only if there is a match in the
bidi-trie.

The storage overhead is compensated by the
bidi-trie's intrinsic nature of merging similar
patterns.

Furthermore, the storage overhead is reduced by no
longer using JavaScript array to store collection
of filters (which is what FilterComposite is):
the same technique used in [2] is imported to store
sequences of filters.

A sequence of filters is a sequence of integer pairs
where the first integer is an index to an actual
filter instance stored in a global array of filters
(`filterUnits`), while the second integer is an index
to the next pair in the sequence -- which means all
sequences of filters are encoded in one single array
of integers (`filterSequences` => Uint32Array). As
a result, a sequence of filters can be represented by
one single integer -- an index to the first pair --
regardless of the number of filters in the sequence.

This representation is further leveraged to replace
the use of JavaScript array in FilterBucket [3],
which used a JavaScript array to store collection
of filters. Doing so means there is no more need for
FilterPair [4], which purpose was to be a lightweight
representation when there was only two filters in a
collection.

As a result of the above changes, the map of `token`
(integer)  => filter instance (object) used to
associate tokens to filters or collections of filters
is replaced with a more efficient map of `token`
(integer) to filter unit index (integer) to lookup a
filter object from the global `filterUnits` array.

Another consequence of using one single global
array to store all filter instances means we can reuse
existing instances when a logical filter instance is
parameter-less, which is the case for FilterAnchorLeft,
FilterAnchorRight, FilterAnchorHn, the index to these
single instances is reused where needed.

`urlTokenizer` now stores the character codes of the
scanned URL into a bidi-trie buffer, for reuse when
string matching methods are called.

New method: `tokenHistogram()`, used to generate
histograms of occurrences of token extracted from URLs
in built-in benchmark. The top results of the "miss"
histogram are used as "bad tokens", i.e. tokens to
avoid if possible when compiling filter lists.

All plain pattern strings are now stored in the
bidi-trie memory buffer, regardless of whether they
will be used in the trie proper or not.

Three methods have been added to the bidi-trie to test
stored string against the URL which is also stored in
then bidi-trie.

FilterParser is now instanciated on demand and
released when no longer used.

***

[1] 135a45a878/src/js/strie.js (L120)
[2] e94024d350
[3] 135a45a878/src/js/static-net-filtering.js (L1630)
[4] 135a45a878/src/js/static-net-filtering.js (L1566)
2019-10-21 08:15:58 -04:00
Raymond Hill
23c4c80136
Add support for elemhide (through specifichide)
Related documentation:
- https://help.eyeo.com/en/adblockplus/how-to-write-filters#element-hiding

Related feedback/discussion:
- https://www.reddit.com/r/uBlockOrigin/comments/d6vxzj/

The `elemhide` filter option as per ABP semantic is
now supported. Previously uBO would consider `elemhide`
to be an alias of `generichide`.

The support of `elemhide` is through the convenient
conversion of `elemhide` option into existing
`generichide` option and new `specifichide` option.

The purpose of the new `specifichide` filter option
is to disable all specific cosmetic filters, i.e.
those who target a specific site.

Additionally, for convenience purpose, the filter
options `generichide`, `specifichide` and `elemhide`
can be aliased using the shorter forms `ghide`,
`shide` and `ehide` respectively.
2019-09-21 11:30:38 -04:00
Raymond Hill
e27328f931
Work toward modernizing code base: promisification
Swathes of code have been converted to use
Promises/async/await. More left to do.

In the process, a regression affecting the fix to
<https://github.com/uBlockOrigin/uBlock-issues/issues/682>
has been fixed.
2019-09-15 07:58:28 -04:00
Raymond Hill
51a4e9ccf4
fix #2763 2018-07-22 10:47:02 -04:00
Raymond Hill
8e9fe020b5
allow to view list content from blocked-document page 2018-07-21 12:22:53 -04:00
gorhill
f3e6057e07
fix #2598: refactor to address the cause rather than the symptoms 2017-05-25 17:46:59 -04:00
Raymond Hill
3b9fd49c50 Assets management refactored (#2314)
* refactoring assets management code

* finalizing refactoring of assets management

* various code review of new assets management code

* fix #2281

* fix #1961

* fix #1293

* fix #1275

* fix update scheduler timing logic

* forward compatibility (to be removed once 1.11+ is widespread)

* more codereview; give admins ability to specify own assets.json

* "assetKey" is more accurate than "path"

* fix group count update when building dom incrementally

* reorganize content (order, added URLs, etc.)

* ability to customize updater through advanced settings

* better spinner icon
2017-01-18 13:17:47 -05:00
gorhill
b190f0b183 this fixes #536 2015-07-27 17:55:25 -04:00
gorhill
9a5404ef07 this fixes the other half of #58: from which list(s) a cosmetic filter originates 2015-06-13 11:21:55 -04:00
gorhill
e22fdaa9f2 code review 2015-06-11 15:35:35 -04:00
gorhill
5c39998de4 List name for user filters come from i18n 2015-06-11 15:12:14 -04:00
gorhill
060a43fe81 this addresses half of #58: find list(s) from which a static network filter originates 2015-06-11 12:12:23 -04:00