Mirrors/uBlock - uBlock - Git.je (Gitea)

mirror of https://github.com/gorhill/uBlock.git synced 2024-09-29 14:17:11 +02:00

Author	SHA1	Message	Date
Raymond Hill	a71b71e4c8	New cosmetic filter parser using CSSTree library The new parser no longer uses the browser DOM to check whether a cosmetic filter is valid; this is now done through a JS library, CSSTree. This means filter list authors will have to be more careful to ensure that a cosmetic filter is really valid, as there is no longer any guarantee that a cosmetic filter which works for a given browser/version will still work properly on another browser, or a different version of the same browser. This change has become necessary for many reasons, one of them being the flakiness of the previous parser as exposed by many issues lately: - https://github.com/uBlockOrigin/uBlock-issues/issues/2262 - https://github.com/uBlockOrigin/uBlock-issues/issues/2228 The new parser introduces breaking changes; there was no way to do otherwise. Some current procedural cosmetic filters will be shown as invalid with this change. This occurs because the CSSTree library gets confused by some syntax which the previous, more permissive parser allowed. Mainly the issue is with the arguments passed to some procedural cosmetic filters, and these issues can be solved as follows: Use quotes around the argument. You can use either single or double quotes, whichever is most convenient. If your argument contains a single quote, use double quotes, and vice versa. Additionally, try to escape a quote inside an argument using a backslash. This may work, but if not, use quotes around the argument. When the parser encounters quotes around an argument, it will discard them before trying to process the argument; the same goes for escaped quotes inside the argument. 
Examples: Breakage: ...##^script:has-text(toscr') Fix: ...##^script:has-text(toscr\') Breakage: ...##:xpath(//*[contains(text(),"VPN")]):upward(2) Fix: ...##:xpath('//*[contains(text(),"VPN")]'):upward(2) There are not many filters which break in the default set of filter lists, so this should be workable for default lists. Unfortunately those fixes will break the filter for previous versions of uBO, since these do not deal with quoted arguments. In such cases, it may be necessary to keep the previous filter, which will be discarded as broken on newer versions of uBO. This was a necessary change as the old parser was becoming more and more flaky after being constantly patched for new cases arising. The new parser should be far more robust and stay robust through expanding procedural cosmetic filter syntax. Additionally, in the MV3 version, filters are pre-compiled using a Nodejs script, i.e. outside the browser, so validating cosmetic filters using a live DOM no longer made sense. This new parser will have to be tested thoroughly before stable release.	2022-09-23 16:03:13 -04:00
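
The quote handling described in the commit above can be pictured with a small sketch. This is not uBO's parser code: `unquoteArgument` is a hypothetical helper, and it assumes the raw text between the parentheses of a procedural operator has already been extracted.

```javascript
// Hypothetical helper (not part of uBO): normalizes the raw argument of a
// procedural operator such as :has-text(...) or :xpath(...).
function unquoteArgument(raw) {
    let arg = raw.trim();
    const first = arg.charAt(0);
    // Discard a matching pair of surrounding single or double quotes.
    if (arg.length >= 2 && (first === '"' || first === "'") && arg.endsWith(first)) {
        arg = arg.slice(1, -1);
    }
    // Discard the backslash placed in front of an escaped quote.
    return arg.replace(/\\(["'])/g, '$1');
}
```

With this sketch, both fixed forms from the examples reduce to the intended argument: the escaped form of the `has-text` argument yields a plain `toscr'`, and the quoted `xpath` argument yields the bare XPath expression.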
Raymond Hill	e31637af78	[mv3] Add ability to enable/disable filter lists	2022-09-13 17:44:24 -04:00
Raymond Hill	83d028ac7d	Report specific filter before generic one Related issue: - https://github.com/uBlockOrigin/uBlock-issues/issues/2092 Regression from: - `72bb89495b`	2022-04-25 09:49:31 -04:00
Raymond Hill	4818405cf6	Remove need to pass parser at every compile() call The compiler instance is already initialized with a reference to the parser, no need to keep passing the reference at each call to compile().	2021-08-05 13:30:20 -04:00
Raymond Hill	85c68116bd	Group all compiling-related code into FilterCompiler() class In the static network filtering engine (snfe), the compiling-related code was spread across two classes. This commit makes it so that all the compiling-related code is in the FilterCompiler class, whose clear purpose is to compile raw filters into a form which can be persisted and later fed to the snfe with no parsing overhead. To compile raw static network filters, the new approach is: snfe.createCompiler(parser); Then for each single raw filter to compile: compiler.compile(parser, writer); The caller is responsible for keeping a reference to the compiler instance for as long as it is needed. This removes the need for the clunky code used to keep an instance of the compiler alive in the snfe. Additionally, snfe.tokenHistograms() has been moved to benchmarks.js: as it has no dependency on the snfe, it's just a utility function.	2021-08-04 15:14:48 -04:00
Raymond Hill	62b6826dd5	Further modularize uBO's codebase Related issue: - https://github.com/uBlockOrigin/uBlock-issues/issues/1664 Modularization is a necessary step toward possibly publishing a more complete nodejs package to allow using uBO's filtering capabilities outside of the uBO extension. Additionally, remove undue usage of console output as per feedback: - https://github.com/uBlockOrigin/uBlock-issues/issues/1664#issuecomment-888451032	2021-07-28 19:48:38 -04:00
Raymond Hill	22022f636f	Modularize codebase with export/import Related issue: - https://github.com/uBlockOrigin/uBlock-issues/issues/1664 The changes are enough to fulfill the related issue. A new platform has been added in order to allow for building a NodeJS package. From the root of the project: ./tools/make-nodejs This will create a new uBlock0.nodejs directory in the ./dist/build directory, which is a valid NodeJS package. From the root of the package, you can try: node test This will instantiate a static network filtering engine, populated by easylist and easyprivacy, which can be used to match network requests by filling the appropriate filtering context object. The test.js file contains code which is a typical example of usage of the package. Limitations: the NodeJS package can't execute the WASM versions of the code since the WASM module requires the use of fetch(), which is not available in NodeJS. This is a first pass at modularizing the codebase, and while at it a number of opportunistic small rewrites have also been made. This commit requires that the minimum supported versions of Chromium and Firefox be raised to 61 and 60 respectively.	2021-07-27 17:26:04 -04:00
Raymond Hill	f876b68171	Add support for removal of response headers The syntax to remove a response header is a special case of HTML filtering, whereby the response headers are targeted, rather than the response body: example.com##^responseheader(header-name) Where `header-name` is the name of the header to remove, and must always be lowercase. The removal of response headers can only be applied to document resources, i.e. main- or sub-frames. Only a limited set of headers can be targeted for removal: location, refresh, report-to, set-cookie. This limitation is to ensure that uBO never lowers the security profile of web pages, i.e. we wouldn't want to remove `content-security-policy`. Given that the header removal occurs at onHeadersReceived time, this new ability works for all browsers. The motivation for this new filtering ability is instances of websites using a `refresh` header to redirect a visitor to an undesirable destination after a few seconds.	2021-03-13 08:53:34 -05:00
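
As a rough illustration of the restrictions listed in the commit above, the sketch below validates a `responseheader` filter against the allowed set and strips the header from a header list. The function names are hypothetical, and the `{ name, value }` header shape is assumed to follow the WebExtensions `webRequest` convention; uBO's actual code may differ.

```javascript
// The only headers the commit message lists as removable.
const removableHeaders = new Set(['location', 'refresh', 'report-to', 'set-cookie']);

// Hypothetical validator: returns the targeted header name, or undefined
// when the filter is malformed or targets a header which may not be removed.
function parseResponseHeaderFilter(filter) {
    const match = /##\^responseheader\(([^)]*)\)$/.exec(filter);
    if (match === null) { return undefined; }
    const name = match[1];
    // The header name must always be lowercase.
    if (name !== name.toLowerCase()) { return undefined; }
    return removableHeaders.has(name) ? name : undefined;
}

// Hypothetical helper: drops every occurrence of `name` from a list of
// { name, value } header objects.
function removeHeader(headers, name) {
    return headers.filter(header => header.name.toLowerCase() !== name);
}
```

A filter such as `example.com##^responseheader(content-security-policy)` is rejected by this sketch, which mirrors the stated goal of never lowering the security profile of a page.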
Raymond Hill	5d7b2918ef	Harden processing of changes in compiled list format Related issue: - https://github.com/uBlockOrigin/uBlock-issues/issues/1365 This commit adds the compiled magic version number to the compiled data itself, and consequently uBO no longer requires that any given compiled list with a mismatched format be detected and discarded at launch time. Given this change, uBO no longer needs to rely on the deletion of cached data at launch time to ensure it won't use compiled lists which are no longer valid.	2020-12-08 10:00:47 -05:00
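
The idea of embedding the format version in the compiled data can be sketched as follows. The JSON layout, the constant value, and the function names are assumptions made for illustration; they are not uBO's actual compiled format.

```javascript
// Hypothetical format version, bumped whenever the compiled format changes.
const COMPILED_MAGIC = 42;

// Store the version alongside the compiled content.
function serializeCompiled(lines, magic = COMPILED_MAGIC) {
    return JSON.stringify({ magic, lines });
}

// Return the compiled lines, or null when the data was produced with another
// format version, in which case the caller recompiles from the raw list.
function loadCompiled(serialized, magic = COMPILED_MAGIC) {
    const data = JSON.parse(serialized);
    if (data.magic !== magic) { return null; }
    return data.lines;
}
```

Because the check happens whenever a compiled list is read, stale data is rejected at the point of use rather than through a purge of cached data at launch.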
Raymond Hill	0196993828	Use buffer-like approach for filterUnits array filterUnits is now treated as a buffer which is pre-allocated and which will grow in chunks so as to minimize memory allocations. Entries are never released, just null-ed. Additionally, move urlTokenizer into the static network filtering engine, since it's not used anywhere else.	2020-11-09 06:54:51 -05:00
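
A minimal sketch of the buffer-like approach described in the commit above: the array is pre-allocated, grows one chunk at a time, and released entries are set to null rather than removed. The class name and chunk size are hypothetical.

```javascript
// Illustrative only: a pre-allocated array which grows in fixed-size chunks.
class UnitBuffer {
    constructor(chunkSize = 1024) {
        this.chunkSize = chunkSize;
        this.units = new Array(chunkSize).fill(null);
        this.writePtr = 0;
    }
    // Store a unit and return its index, growing by one chunk when full.
    add(unit) {
        if (this.writePtr === this.units.length) {
            const oldLength = this.units.length;
            this.units.length = oldLength + this.chunkSize;
            this.units.fill(null, oldLength);
        }
        const index = this.writePtr++;
        this.units[index] = unit;
        return index;
    }
    // Entries are never spliced out, only null-ed, so indices stay stable.
    release(index) {
        this.units[index] = null;
    }
}
```

Stable indices matter here because other structures refer to filter units by integer index.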
Raymond Hill	23332400f5	Improve annotations for search operations in CodeMirror editor Before this commit, CodeMirror's add-on for search occurrences was limited to finding at most the first 1000 occurrences, because of performance considerations. This commit removes this low limit by having the search for occurrences done in a dedicated worker. The limit is now time-based, and highly unlikely to ever be hit under normal conditions. With this change, all search occurrences are gathered, and as a result: - All occurrences are reported in the scrollbar instead of just the first 1,000 - The total count of all occurrences is now reported, instead of capping at "1000+". - The current occurrence rank at the cursor or selection position is now reported -- it was not possible to report this before. The number of occurrences is line-based; it's not useful to report finer-grained occurrences in uBO.	2020-08-02 12:18:01 -04:00
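
The line-based counting and time-based limit described above could look roughly like the sketch below. It runs synchronously for simplicity, whereas the commit performs the search in a dedicated worker; the function names and the default budget are assumptions.

```javascript
// Hypothetical sketch: collect the indices of all lines containing `needle`,
// giving up once the time budget (in milliseconds) is exhausted.
function findMatchingLines(text, needle, timeBudgetMs = 50) {
    const deadline = Date.now() + timeBudgetMs;
    const lines = text.split('\n');
    const hits = [];
    for (let i = 0; i < lines.length; i++) {
        // Line-based: a line counts once, however many times it matches.
        if (lines[i].includes(needle)) { hits.push(i); }
        if (Date.now() > deadline) { break; }
    }
    return hits;
}

// 1-based rank of the occurrence on the cursor line, 0 if that line is not a hit.
function rankAt(hits, cursorLine) {
    return hits.indexOf(cursorLine) + 1;
}
```

The length of the returned array gives the total count, and `rankAt` gives the kind of "current occurrence" position mentioned in the commit.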
Raymond Hill	90c7e79f4f	Consolidate filter list reverse lookup code into a single file Since it's possible to execute specific code paths according to whether the context is that of a worker or not, it's possible to keep the main/worker code in a single file. Keeping the main/worker code paths in a single file is more convenient for both code maintenance and code review.	2020-08-01 10:32:40 -04:00
Raymond Hill	7dc962281f	Set max token length on parser for consistent compilation Reported internally. The issue could cause the logger to be unable to successfully reverse-lookup a filter list for a filter which had tokens longer than 6 characters followed by a wildcard. Regression from: - `01b1ed9a98`	2020-06-07 08:50:20 -04:00
Raymond Hill	01b1ed9a98	Add a new static filtering parser A new standalone static filtering parser is introduced, vAPI.StaticFilteringParser. Its purpose is to parse a line of text into a representation suitable for compiling filters. It can additionally serve for syntax highlighting purposes. As a side effect, this solves: - https://github.com/uBlockOrigin/uBlock-issues/issues/1038 This is a first draft; there is more work left to do to further perfect the implementation and extend its capabilities, especially those useful to assist filter authors. For the time being, this commit breaks line-continuation syntax highlighting -- which was already flaky prior to this commit anyway.	2020-06-04 07:18:54 -04:00
Raymond Hill	c3bc2c741d	Add support for `cname` type and `denyallow` option This concerns the static network filtering engine. Related issue: - https://github.com/uBlockOrigin/uBlock-issues/issues/943 * * * New static network filter type: `cname` By default, network requests which are the result of resolving a canonical name are subject to filtering. This filtering can be bypassed by creating exception filters using the `cname` option. For example: @@$cname The filter above tells the network filtering engine to except network requests which fulfill all the following conditions: - network request is blocked - network request is that of an unaliased hostname Filter list authors are discouraged from using exception filters of `cname` type, unless there is no other practical solution such that the maintenance burden becomes the greater issue. Of course, such exception filters should be as narrow as possible, i.e. apply to a specific domain, etc. * * * New static network filter option: `denyallow` The purpose of `denyallow` is to bring default-deny/allow-exceptionally ability into the static network filtering arsenal. Example of usage: $3p,script,denyallow=x.com|y.com,domain=a.com|b.com The above filter tells the network filtering engine that when the context is `a.com` or `b.com`, block all 3rd-party scripts except those from `x.com` and `y.com`. Essentially, the new `denyallow` option makes it easier to implement default-deny/allow-exceptionally in static filter lists, whereas before this had to be done with unwieldy regular expressions[1], or through a mix of broadly blocking filters along with exception filters[2]. [1] https://hg.adblockplus.org/ruadlist/rev/f362910bc9a0 [2] Typically filters whose patterns are of the form `|http://`	2020-03-15 12:23:25 -04:00
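
The default-deny/allow-exceptionally semantics of the `denyallow` example above can be sketched as a plain predicate. The request shape, the function names, and the assumption that a listed domain also covers its subdomains are illustrative choices, not uBO's implementation.

```javascript
// Assumption: a hostname matches a listed domain if it equals the domain or
// is one of its subdomains.
function hostnameMatches(hostname, domain) {
    return hostname === domain || hostname.endsWith('.' + domain);
}

// Hypothetical evaluation of: $3p,script,denyallow=x.com|y.com,domain=a.com|b.com
// `request` is a made-up shape: { contextHostname, requestHostname, type, thirdParty }.
function shouldBlock(request, filter) {
    // The filter only concerns 3rd-party scripts.
    if (request.type !== 'script' || request.thirdParty !== true) { return false; }
    // The filter only applies in the contexts named by `domain=`.
    if (!filter.domain.some(d => hostnameMatches(request.contextHostname, d))) { return false; }
    // Default-deny, except for the hostnames named by `denyallow=`.
    return !filter.denyallow.some(d => hostnameMatches(request.requestHostname, d));
}
```

For a page on `a.com`, a third-party script from `x.com` is left alone by this predicate, while one from any other third-party hostname is blocked.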
Raymond Hill	7971b22385	Expand bidi-trie usage in static network filtering engine Related issues: - https://github.com/uBlockOrigin/uBlock-issues/issues/761 - https://github.com/uBlockOrigin/uBlock-issues/issues/528 The previous bidi-trie code could only hold filters which are plain patterns, i.e. no wildcard characters, and which had no origin option (`domain=`), no right and/or left anchor, and no `csp=` option. Examples of filters that could be moved into a bidi-trie data structure: &ad_box_ /w/d/capu.php?z=$script,third-party ||liveonlinetv247.com/images/muvixx-150x50-watch-now-in-hd-play-btn.gif Examples of filters that could NOT be moved to a bidi-trie: -adap.$domain=~l-adap.org /tsc.php?&ses= ||ibsrv.net/forumsponsor$domain=[...] @@||imgspice.com/jquery.cookie.js|$script ||view.atdmt.com^/iview/$third-party ||postimg.cc/image/$csp=[...] Ideally the filters above should be able to be moved to a bidi-trie since they are basically plain patterns, or at least partially moved to a bidi-trie when there is only a single wildcard (i.e. made of two plain patterns). Also, there were two distinct bidi-tries into which plain-pattern filters could be moved: one for patterns without hostname anchoring and another one for patterns with hostname anchoring. This was required because the hostname-anchored patterns have an extra condition which is outside the bidi-trie's knowledge. This commit expands the number of filters which can be stored in the bidi-trie, and also removes the need to use two distinct bidi-tries. - Added ability to associate a pattern with an integer in the bidi-trie [1]. - The bidi-trie match code passes this externally provided integer when calling an externally provided method used for testing extra conditions that may be present for a plain pattern found to be matching in the bidi-trie. 
- Decomposed existing filters into smaller logical units: - FilterPlainLeftAnchored => FilterPatternPlain + FilterAnchorLeft - FilterPlainRightAnchored => FilterPatternPlain + FilterAnchorRight - FilterExactMatch => FilterPatternPlain + FilterAnchorLeft + FilterAnchorRight - FilterPlainHnAnchored => FilterPatternPlain + FilterAnchorHn - FilterWildcard1 => FilterPatternPlain + [ FilterPatternLeft or FilterPatternRight ] - FilterWildcard1HnAnchored => FilterPatternPlain + [ FilterPatternLeft or FilterPatternRight ] + FilterAnchorHn - FilterGenericHnAnchored => FilterPatternGeneric + FilterAnchorHn - FilterGenericHnAndRightAnchored => FilterPatternGeneric + FilterAnchorRight + FilterAnchorHn - FilterOriginMixedSet => FilterOriginMissSet + FilterOriginHitSet - Instances of FilterOrigin[...], FilterDataHolder can also be added to a composite filter to represent `domain=` and `csp=` options. - Added a new filter class, FilterComposite, for filters which are a combination of two or more logical units. A FilterComposite instance is a match when all filters composing it are a match. Since filters are now encoded into combinations of smaller units, it becomes possible to extract the FilterPatternPlain component and store it in the bidi-trie, and use the integer as a handle for the remaining extra conditions, if any. Since a single pattern in the bidi-trie may be a component for different filters, the associated integer points to a sequence of extra conditions, and a match occurs as soon as one of the extra conditions (which may itself be a sequence of conditions) is fulfilled. Decomposing filters which are currently single instances into sequences of smaller logical filters means increasing the storage and CPU overhead when evaluating such filters. The CPU overhead is compensated by the fact that more filters can now be moved into the bidi-trie, where the first match is efficiently evaluated. 
The extra conditions have to be evaluated if and only if there is a match in the bidi-trie. The storage overhead is compensated by the bidi-trie's intrinsic nature of merging similar patterns. Furthermore, the storage overhead is reduced by no longer using a JavaScript array to store collections of filters (which is what FilterComposite is): the same technique used in [2] is imported to store sequences of filters. A sequence of filters is a sequence of integer pairs where the first integer is an index to an actual filter instance stored in a global array of filters (`filterUnits`), while the second integer is an index to the next pair in the sequence -- which means all sequences of filters are encoded in one single array of integers (`filterSequences` => Uint32Array). As a result, a sequence of filters can be represented by one single integer -- an index to the first pair -- regardless of the number of filters in the sequence. This representation is further leveraged to replace the use of a JavaScript array in FilterBucket [3], which used a JavaScript array to store collections of filters. Doing so means there is no more need for FilterPair [4], whose purpose was to be a lightweight representation when there were only two filters in a collection. As a result of the above changes, the map of `token` (integer) => filter instance (object) used to associate tokens to filters or collections of filters is replaced with a more efficient map of `token` (integer) to filter unit index (integer) to look up a filter object from the global `filterUnits` array. Another consequence of using one single global array to store all filter instances is that we can reuse existing instances when a logical filter instance is parameter-less, which is the case for FilterAnchorLeft, FilterAnchorRight and FilterAnchorHn: the index to these single instances is reused where needed. 
`urlTokenizer` now stores the character codes of the scanned URL into a bidi-trie buffer, for reuse when string matching methods are called. New method: `tokenHistogram()`, used to generate histograms of occurrences of tokens extracted from URLs in the built-in benchmark. The top results of the "miss" histogram are used as "bad tokens", i.e. tokens to avoid if possible when compiling filter lists. All plain pattern strings are now stored in the bidi-trie memory buffer, regardless of whether they will be used in the trie proper or not. Three methods have been added to the bidi-trie to test stored strings against the URL, which is also stored in the bidi-trie. FilterParser is now instantiated on demand and released when no longer used. *** [1] `135a45a878/src/js/strie.js (L120)` [2] `e94024d350` [3] `135a45a878/src/js/static-net-filtering.js (L1630)` [4] `135a45a878/src/js/static-net-filtering.js (L1566)`	2019-10-21 08:15:58 -04:00
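
The integer-pair encoding of filter sequences described in the commit above can be sketched as a linked list stored in a single `Uint32Array`. The names `filterUnits` and `filterSequences` come from the commit message; everything else (the helper functions, the use of index 0 as an end-of-sequence marker, units modelled as plain predicate functions) is an assumption made for illustration.

```javascript
// Global array of filter units; here each unit is just a predicate on a URL.
const filterUnits = [ null ];               // index 0 is left unused
let filterSequences = new Uint32Array(8);   // pairs of (unit index, next pair index)
let sequenceWritePtr = 2;                   // pair 0 is reserved: 0 means "end of sequence"

function addUnit(unit) {
    filterUnits.push(unit);
    return filterUnits.length - 1;
}

// Prepend a unit to an existing sequence (0 = empty) and return the new head.
function prependToSequence(unitIndex, nextPair = 0) {
    if (sequenceWritePtr + 2 > filterSequences.length) {
        const grown = new Uint32Array(filterSequences.length * 2);
        grown.set(filterSequences);
        filterSequences = grown;
    }
    const pair = sequenceWritePtr;
    filterSequences[pair + 0] = unitIndex;
    filterSequences[pair + 1] = nextPair;
    sequenceWritePtr += 2;
    return pair;
}

// A composite is a match when every unit in its sequence is a match.
function matchAll(sequence, url) {
    for (let i = sequence; i !== 0; i = filterSequences[i + 1]) {
        if (filterUnits[filterSequences[i]](url) === false) { return false; }
    }
    return true;
}
```

In this sketch a whole composite filter is carried around as a single integer (the index of its first pair), and a parameter-less unit can be shared between sequences simply by reusing its unit index.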
Raymond Hill	23c4c80136	Add support for `elemhide` (through `specifichide`) Related documentation: - https://help.eyeo.com/en/adblockplus/how-to-write-filters#element-hiding Related feedback/discussion: - https://www.reddit.com/r/uBlockOrigin/comments/d6vxzj/ The `elemhide` filter option as per ABP semantics is now supported. Previously uBO would consider `elemhide` to be an alias of `generichide`. The support of `elemhide` is through the convenient conversion of the `elemhide` option into the existing `generichide` option and the new `specifichide` option. The purpose of the new `specifichide` filter option is to disable all specific cosmetic filters, i.e. those which target a specific site. Additionally, for convenience, the filter options `generichide`, `specifichide` and `elemhide` can be aliased using the shorter forms `ghide`, `shide` and `ehide` respectively.	2019-09-21 11:30:38 -04:00
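
The conversion described in the commit above amounts to alias resolution followed by expanding `elemhide` into its two components. The sketch below shows that idea only; the function name and the array-of-strings input are hypothetical and do not reflect uBO's option parser.

```javascript
// Short forms named in the commit message.
const hideOptionAliases = new Map([
    [ 'ghide', 'generichide' ],
    [ 'shide', 'specifichide' ],
    [ 'ehide', 'elemhide' ],
]);

// Hypothetical helper: resolve aliases, then expand `elemhide` into
// `generichide` + `specifichide`. Other options pass through unchanged.
function expandHideOptions(options) {
    const out = new Set();
    for (const raw of options) {
        const name = hideOptionAliases.get(raw) || raw;
        if (name === 'elemhide') {
            out.add('generichide');
            out.add('specifichide');
        } else {
            out.add(name);
        }
    }
    return Array.from(out);
}
```

Under this sketch, an exception filter carrying `ehide` ends up with the same two options as one which spells out `generichide,specifichide`.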
Raymond Hill	e27328f931	Work toward modernizing code base: promisification Swathes of code have been converted to use Promises/async/await. More left to do. In the process, a regression affecting the fix to <https://github.com/uBlockOrigin/uBlock-issues/issues/682> has been fixed.	2019-09-15 07:58:28 -04:00
Raymond Hill	51a4e9ccf4	fix #2763	2018-07-22 10:47:02 -04:00
Raymond Hill	8e9fe020b5	allow to view list content from blocked-document page	2018-07-21 12:22:53 -04:00
gorhill	f3e6057e07	fix #2598 : refactor to address the cause rather than the symptoms	2017-05-25 17:46:59 -04:00
Raymond Hill	3b9fd49c50	Assets management refactored (#2314) * refactoring assets management code * finalizing refactoring of assets management * various code review of new assets management code * fix #2281 * fix #1961 * fix #1293 * fix #1275 * fix update scheduler timing logic * forward compatibility (to be removed once 1.11+ is widespread) * more codereview; give admins ability to specify own assets.json * "assetKey" is more accurate than "path" * fix group count update when building dom incrementally * reorganize content (order, added URLs, etc.) * ability to customize updater through advanced settings * better spinner icon	2017-01-18 13:17:47 -05:00
gorhill	b190f0b183	this fixes #536	2015-07-27 17:55:25 -04:00
gorhill	9a5404ef07	this fixes the other half of #58 : from which list(s) a cosmetic filter originates	2015-06-13 11:21:55 -04:00
gorhill	e22fdaa9f2	code review	2015-06-11 15:35:35 -04:00
gorhill	5c39998de4	List name for user filters comes from i18n	2015-06-11 15:12:14 -04:00
gorhill	060a43fe81	this addresses half of #58 : find list(s) from which a static network filter originates	2015-06-11 12:12:23 -04:00