Related feedback:
- https://www.reddit.com/r/uBlockOrigin/comments/dzw57l/
Each token requires two slots in the token indices
array. This commit fixes uBO breaking when dealing
with very long URLs which contain many distinct
tokens.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/761
Changes related to the above issue made it possible to
create WASM versions of methods used in the bidi-trie.
In this commit, WASM versions for startsWith(), indexOf()
and lastIndexOf() have been implemented.
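Such buffer-based methods lend themselves to WASM because they
reduce to integer loads and compares over a shared buffer. As an
illustration only -- the buffer layout, argument order and names
below are assumptions, not the actual strie.js code -- a minimal
JavaScript sketch of the idea:

    // Illustrative sketch: the URL and the stored patterns are assumed
    // to be laid out as character codes in one shared typed-array
    // buffer, so matching is plain integer comparison over buffer
    // cells -- no JS string objects involved.
    const buf = new Uint8Array(65536);

    // Does the region [haystackBeg, haystackEnd) start with the
    // pattern stored at [needleBeg, needleEnd)?
    function startsWith(haystackBeg, haystackEnd, needleBeg, needleEnd) {
        const needleLen = needleEnd - needleBeg;
        if ( haystackEnd - haystackBeg < needleLen ) { return false; }
        for ( let i = 0; i < needleLen; i++ ) {
            if ( buf[haystackBeg+i] !== buf[needleBeg+i] ) { return false; }
        }
        return true;
    }

    // indexOf() is then a sliding-window application of the same
    // comparison; lastIndexOf() scans in the opposite direction.
    function indexOf(haystackBeg, haystackEnd, needleBeg, needleEnd) {
        const lastStart = haystackEnd - (needleEnd - needleBeg);
        for ( let i = haystackBeg; i <= lastStart; i++ ) {
            if ( startsWith(i, haystackEnd, needleBeg, needleEnd) ) { return i; }
        }
        return -1;
    }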
Related issues:
- https://github.com/uBlockOrigin/uBlock-issues/issues/761
- https://github.com/uBlockOrigin/uBlock-issues/issues/528
The previous bidi-trie code could only hold filters which
are plain patterns, i.e. with no wildcard characters, and
which have no origin option (`domain=`), no right and/or
left anchor, and no `csp=` option.
Examples of filters that could be moved into a bidi-trie
data structure:
&ad_box_
/w/d/capu.php?z=$script,third-party
||liveonlinetv247.com/images/muvixx-150x50-watch-now-in-hd-play-btn.gif
Examples of filters that could NOT be moved to a bidi-trie:
-adap.$domain=~l-adap.org
/tsc.php?*&ses=
||ibsrv.net/*forumsponsor$domain=[...]
@@||imgspice.com/jquery.cookie.js|$script
||view.atdmt.com^*/iview/$third-party
||postimg.cc/image/$csp=[...]
Ideally, the filters above should be movable into a
bidi-trie since they are basically plain patterns, or at
least partially movable when there is only a single
wildcard (i.e. when they are made of two plain patterns).
Also, there were two distinct bidi-tries into which
plain-pattern filters could be moved: one for patterns
without hostname anchoring and another one for
hostname-anchored patterns. This was required because
hostname-anchored patterns have an extra condition which
is outside the bidi-trie's knowledge.
This commit expands the number of filters which can be
stored in the bidi-trie, and also removes the need to
use two distinct bidi-tries.
- Added ability to associate a pattern with an integer
in the bidi-trie [1].
- The bidi-trie match code passes this externally
provided integer when calling an externally
provided method used for testing extra conditions
that may be present for a plain pattern found to
be matching in the bidi-trie.
- Decomposed existing filters into smaller logical units:
- FilterPlainLeftAnchored =>
FilterPatternPlain +
FilterAnchorLeft
- FilterPlainRightAnchored =>
FilterPatternPlain +
FilterAnchorRight
- FilterExactMatch =>
FilterPatternPlain +
FilterAnchorLeft +
FilterAnchorRight
- FilterPlainHnAnchored =>
FilterPatternPlain +
FilterAnchorHn
- FilterWildcard1 =>
FilterPatternPlain + [
FilterPatternLeft or
FilterPatternRight
]
- FilterWildcard1HnAnchored =>
FilterPatternPlain + [
FilterPatternLeft or
FilterPatternRight
] +
FilterAnchorHn
- FilterGenericHnAnchored =>
FilterPatternGeneric +
FilterAnchorHn
- FilterGenericHnAndRightAnchored =>
FilterPatternGeneric +
FilterAnchorRight +
FilterAnchorHn
- FilterOriginMixedSet =>
FilterOriginMissSet +
FilterOriginHitSet
- Instances of FilterOrigin[...], FilterDataHolder
can also be added to a composite filter to
represent `domain=` and `csp=` options.
- Added a new filter class, FilterComposite, for
filters which are a combination of two or more
logical units. A FilterComposite instance is a
match when *all* filters composing it are a
match.
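To illustrate the composite idea with a minimal sketch -- class
and property names below are hypothetical, not the actual code --
every logical unit is reduced to a match() predicate over a shared
matching context, and a composite matches only when all of its
units do:

    // Hypothetical sketch, not the actual class layout: a composite
    // filter holds a list of logical units and is a match only when
    // *all* of them match.
    class FilterCompositeSketch {
        constructor(units) { this.units = units; }
        match(ctx) {
            return this.units.every(unit => unit.match(ctx));
        }
    }

    // Example: a plain pattern plus a stand-in anchoring condition.
    const ctx = { url: 'https://ads.example.com/w/d/capu.php', matchBeg: 8 };
    const composite = new FilterCompositeSketch([
        { match: ctx => ctx.url.startsWith('ads.', ctx.matchBeg) }, // pattern unit
        { match: ctx => ctx.matchBeg === 8 },                       // anchoring unit stand-in
    ]);
    console.log(composite.match(ctx)); // true only if every unit matches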
Since filters are now encoded into combinations of
smaller units, it becomes possible to extract the
FilterPatternPlain component and store it in the
bidi-trie, and use the integer as a handle for the
remaining extra conditions, if any.
Since a single pattern in the bidi-trie may be a
component for different filters, the associated
integer points to a sequence of extra conditions,
and a match occurs as soon as one of the extra
conditions (which may itself be a sequence of
conditions) is fulfilled.
Decomposing filters which were previously single
instances into sequences of smaller logical filters
increases the storage and CPU overhead when
evaluating such filters. The CPU overhead is
compensated by the fact that more filters can now be
moved into the bidi-trie, where the first match is
efficiently evaluated. The extra conditions have to
be evaluated if and only if there is a match in the
bidi-trie.
The storage overhead is compensated by the
bidi-trie's intrinsic nature of merging similar
patterns.
Furthermore, the storage overhead is reduced by no
longer using a JavaScript array to store a collection
of filters (which is what FilterComposite is):
the same technique used in [2] is imported to store
sequences of filters.
A sequence of filters is a sequence of integer pairs
where the first integer is an index to an actual
filter instance stored in a global array of filters
(`filterUnits`), while the second integer is an index
to the next pair in the sequence -- which means all
sequences of filters are encoded in one single array
of integers (`filterSequences` => Uint32Array). As
a result, a sequence of filters can be represented by
one single integer -- an index to the first pair --
regardless of the number of filters in the sequence.
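A small sketch of the pair encoding described above (the names
`filterUnits` and `filterSequences` come from the text, the rest
is an assumption made for illustration; a real implementation
would grow the buffer as needed):

    // Slot i holds an index into filterUnits, slot i+1 holds the index
    // of the next pair; 0 terminates the sequence.
    const filterUnits = [ null ];               // slot 0 reserved
    const filterSequences = new Uint32Array(1024);
    let sequenceWritePtr = 2;                   // pair 0 reserved as the "null" pair

    function sequenceFromUnits(unitIndices) {   // returns index of first pair
        const seq = sequenceWritePtr;
        for ( let j = 0; j < unitIndices.length; j++ ) {
            const i = seq + j * 2;
            filterSequences[i+0] = unitIndices[j];
            filterSequences[i+1] = j + 1 < unitIndices.length ? i + 2 : 0;
        }
        sequenceWritePtr = seq + unitIndices.length * 2;
        return seq;
    }

    function forEachUnit(seq, fn) {             // walk a sequence
        for ( let i = seq; i !== 0; i = filterSequences[i+1] ) {
            fn(filterUnits[filterSequences[i+0]]);
        }
    }

    // Usage: a two-unit sequence is identified by one single integer.
    filterUnits.push({ name: 'patternPlain' }, { name: 'anchorLeft' }); // indices 1, 2
    const seq = sequenceFromUnits([ 1, 2 ]);
    forEachUnit(seq, unit => console.log(unit.name)); // patternPlain, anchorLeft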
This representation is further leveraged to replace
the use of a JavaScript array in FilterBucket [3],
which used such an array to store a collection of
filters. Doing so means there is no more need for
FilterPair [4], whose purpose was to be a lightweight
representation when there were only two filters in a
collection.
As a result of the above changes, the map of `token`
(integer) => filter instance (object) used to
associate tokens to filters or collections of filters
is replaced with a more efficient map of `token`
(integer) to filter unit index (integer), used to look up
a filter object from the global `filterUnits` array.
Another consequence of using one single global
array to store all filter instances is that we can reuse
existing instances when a logical filter instance is
parameter-less, which is the case for FilterAnchorLeft,
FilterAnchorRight and FilterAnchorHn: the index to these
single instances is reused wherever needed.
`urlTokenizer` now stores the character codes of the
scanned URL into a bidi-trie buffer, for reuse when
string matching methods are called.
New method: `tokenHistogram()`, used to generate
histograms of occurrences of tokens extracted from URLs
in the built-in benchmark. The top results of the "miss"
histogram are used as "bad tokens", i.e. tokens to
avoid if possible when compiling filter lists.
All plain pattern strings are now stored in the
bidi-trie memory buffer, regardless of whether they
will be used in the trie proper or not.
Three methods have been added to the bidi-trie to test
stored strings against the URL, which is also stored in
the bidi-trie.
FilterParser is now instantiated on demand and
released when no longer used.
***
[1] 135a45a878/src/js/strie.js (L120)
[2] e94024d350
[3] 135a45a878/src/js/static-net-filtering.js (L1630)
[4] 135a45a878/src/js/static-net-filtering.js (L1566)
Related documentation:
- https://help.eyeo.com/en/adblockplus/how-to-write-filters#element-hiding
Related feedback/discussion:
- https://www.reddit.com/r/uBlockOrigin/comments/d6vxzj/
The `elemhide` filter option as per ABP semantics is
now supported. Previously uBO would consider `elemhide`
to be an alias of `generichide`.
Support for `elemhide` is implemented through the
conversion of the `elemhide` option into the existing
`generichide` option and the new `specifichide` option.
The purpose of the new `specifichide` filter option
is to disable all specific cosmetic filters, i.e.
those which target a specific site.
Additionally, for convenience, the filter
options `generichide`, `specifichide` and `elemhide`
can be aliased using the shorter forms `ghide`,
`shide` and `ehide` respectively.
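A minimal sketch of the conversion described above -- the helper
name and its exact shape are assumptions, not the actual parser
code:

    // Hypothetical sketch: aliases map to canonical option names, and
    // `elemhide` expands into the two narrower options it is
    // equivalent to.
    function normalizeHideOptions(options) {
        const aliases = new Map([
            [ 'ghide', 'generichide' ],
            [ 'shide', 'specifichide' ],
            [ 'ehide', 'elemhide' ],
        ]);
        const out = [];
        for ( let opt of options ) {
            opt = aliases.get(opt) || opt;
            if ( opt === 'elemhide' ) {
                out.push('generichide', 'specifichide');
            } else {
                out.push(opt);
            }
        }
        return out;
    }

    // normalizeHideOptions([ 'ehide' ]) => [ 'generichide', 'specifichide' ]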
Related feedback:
- https://www.reddit.com/r/uBlockOrigin/comments/cmh910/
Additionally, the `3p` rule has been made distinct from
`3p-script`/`3p-frame` for the purpose of the
"Relax blocking mode" command.
The badge color will hint at the current blocking mode.
There are four colors for the four following blocking
modes:
- JavaScript wholly disabled
- All 3rd parties blocked
- 3rd-party scripts and frames blocked
- None of the above
The default badge color will be used when JavaScript is not
wholly disabled and when there are no rules for `3p`,
`3p-script` or `3p-frame`.
A new advanced setting has been added to let the user choose
the badge colors for the various blocking modes,
`blockingProfileColors`. The value *must* be a sequence of
4 valid CSS color values, each made of 6 hexadecimal digits
prefixed with `#` -- anything else will be ignored.
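As a sketch of how such a value might be validated -- the setting
name comes from the text above, the validation code itself is an
assumption:

    // Hypothetical sketch: accept the value only if it is a sequence of
    // four 6-digit hexadecimal CSS colors prefixed with `#`; otherwise
    // ignore it.
    function parseBlockingProfileColors(value) {
        const colors = value.trim().split(/\s+/);
        if ( colors.length !== 4 ) { return; }
        if ( colors.every(c => /^#[0-9a-f]{6}$/i.test(c)) === false ) { return; }
        return colors;
    }

    // parseBlockingProfileColors('#666666 #ff0000 #ffa500 #ffcc00')
    //   => [ '#666666', '#ff0000', '#ffa500', '#ffcc00' ]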
The staticNetFilteringEngine uses token hashes to store and
look up filters in Map objects.
Before this commit, the tokens were encoded into token hashes
as JS numbers (not exceeding MAX_SAFE_INTEGER) using at most
the first 8 characters of the token.
With this commit, token hashes are now restricted to fit
into 32-bit integers, and are derived from at most the first
7 characters. This improves filter look-up performance as per
built-in benchmark().
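For illustration, a minimal sketch of deriving such a 32-bit hash
from at most the first 7 characters -- the exact mixing used by
uBO may differ:

    // Hypothetical sketch: fold at most the first 7 character codes of
    // a token into a value forced to fit into a 32-bit integer.
    function tokenHashFromString(s) {
        const len = Math.min(s.length, 7);
        let hash = 0;
        for ( let i = 0; i < len; i++ ) {
            hash = (hash << 4) ^ s.charCodeAt(i);
        }
        return hash >>> 0;  // keep it an unsigned 32-bit integer
    }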
Related commit:
- 69a43e07c4
Using 32 bits of token hash rather than just the 16 lower
bits does help discard more unknown tokens.
Using the default filter lists, the known-token lookup
table is populated with 12,276 entries out of 65,536, thus
making the case that, theoretically, there are a lot of
possible tokens which can be discarded.
In practice, running the built-in
staticNetFilteringEngine.benchmark() with default filter
lists, I find that 1,518,929 tokens were skipped out of
4,441,891 extracted tokens, or 34%.
Given that all tokens extracted from one single URL are potentially
iterated multiple times in a single URL-matching cycle, it pays to
ignore extracted tokens which are known to not be used anywhere in
the static filtering engine.
The gain in processing a single network request in the static
filtering engine can become especially high when dealing with
long and random-looking URLs, which have a high likelihood
of containing mostly tokens known to not be in use.
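A sketch of the skip-unknown-tokens idea -- folding all 32 bits of
the hash into a 16-bit table index is an assumption made here for
illustration, not necessarily how the actual table is keyed:

    // Hypothetical sketch: a 65,536-entry table marks which token
    // hashes are in use by compiled filters; tokens extracted from a
    // URL are discarded early when absent from it.
    const knownTokens = new Uint8Array(65536);

    const foldHash = th => (th & 0xFFFF) ^ (th >>> 16);

    function addKnownToken(tokenHash) {
        knownTokens[foldHash(tokenHash)] = 1;
    }

    function isKnownToken(tokenHash) {
        return knownTokens[foldHash(tokenHash)] !== 0;
    }

    // During tokenization, tokens for which isKnownToken() is false are
    // skipped outright and never looked up against the filter buckets.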
The purpose of using a custom base128 encoder is to
convert array buffers into strings, to allow a direct
string-to-array buffer conversion at load time:
string => array buffer
Whereas a JSON array would require an extra step:
JSON array as string => JS array => array buffer
It turns out that the current use of a custom base128 encoding
results in significantly larger selfie storage usage when
converting array buffers into strings.
Speculation: possibly the browser converts the strings to
be saved into JSON strings internally. Since the custom base128
encoder is likely to cause the resulting string to contain
a lot of unprintable ASCII characters, these will need to
be escaped when converted to JSON -- escaped characters
occupy more space than non-escaped ones.
Using a sequence of base-64 numbers means only printable
characters will be present in the output string, hence no
escaping is necessary. I have observed a significant
reduction in storage usage for selfie purposes.
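A minimal sketch of the idea -- the digit alphabet and the fixed
6-digits-per-value chunking below are assumptions, not the actual
encoder:

    // Hypothetical sketch: serialize a Uint32Array as a sequence of
    // printable base-64 digits (6 digits per 32-bit value), so the
    // stored string contains no characters needing escaping.
    const digits =
        'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

    function encodeUint32Array(arr) {
        let out = '';
        for ( const v of arr ) {
            for ( let shift = 30; shift >= 0; shift -= 6 ) {
                out += digits[(v >>> shift) & 0b111111];
            }
        }
        return out;
    }

    function decodeUint32Array(s) {
        const out = new Uint32Array(s.length / 6);
        for ( let i = 0; i < out.length; i++ ) {
            let v = 0;
            for ( let j = 0; j < 6; j++ ) {
                v = v * 64 + digits.indexOf(s.charAt(i * 6 + j));
            }
            out[i] = v;
        }
        return out;
    }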
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/528#issuecomment-484408622
Following STrie-related work in above issue, I noticed that a large
number of filters in EasyList were filters which only had to match
against the document origin. For instance, among just the top 10
most populous buckets, there were four such buckets with
hundreds of entries each:
- bits: 72, token: "http", 146 entries
- bits: 72, token: "https", 139 entries
- bits: 88, token: "http", 122 entries
- bits: 88, token: "https", 118 entries
The filters in these buckets have to be matched against all
network requests.
In order to leverage HNTrie for these filters[1], they are now handled
in a special way so as to ensure they all end up in a single HNTrie
(per bucket), which means that instead of scanning hundreds of entries
per URL, there is now a single scan per bucket per URL for these
apply-everywhere filters.
Now, any filter which fulfills ALL the following conditions will be
processed in a special manner internally:
- Is of the form `|https://` or `|http://` or `*`; and
- Does have a `domain=` option; and
- Does not have a negated domain in its `domain=` option; and
- Does not have a `csp=` option; and
- Does not have a `redirect=` option
If a filter does not fulfill ALL the conditions above, there is
no change in behavior.
A filter which matches ALL of the above will be processed in a special
manner:
- The `domain=` option will be decomposed so as to create as many
distinct filters as there are distinct values in the `domain=`
option (see the sketch below)
- This also applies to the `badfilter` version of the filter, which
means it now becomes possible to `badfilter` only one of the
distinct filters without having to `badfilter` all of them.
- The logger will always report these special filters with only a
single hostname in the `domain=` option.
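A minimal sketch of the decomposition -- the helper name is
hypothetical, and the real compiler works on parsed filters rather
than raw text:

    // Hypothetical sketch: a filter whose `domain=` option lists
    // several hostnames is compiled as if it were as many filters as
    // there are hostnames in that option.
    function decomposeDomainOption(pattern, domainOption) {
        return domainOption.split('|').map(hostname =>
            `${pattern}$domain=${hostname}`
        );
    }

    // decomposeDomainOption('*', 'a.com|b.com')
    //   => [ '*$domain=a.com', '*$domain=b.com' ]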
***
[1] HNTrie is currently WASM-ed on Firefox.
The motivation is to address the higher peak memory usage at launch
time with 3rd-gen HNTrie when a selfie was present.
The selfie generation prior to this change was to collect all
filtering data into a single data structure, and then to serialize
that whole structure at once into storage (using JSON.stringify).
However, HNTrie serialization requires that a large Uint32Array be
converted into a plain JS array, which itself would be indirectly
converted into a JSON string. This was the main reason why peak
memory usage would be higher at launch from selfie, since the JSON
string would need to be wholly deserialized into JS objects,
which themselves would need to be converted into more
specialized data structures (such as the Uint32Array mentioned above).
The solution to lower peak memory usage at launch is to refactor
selfie generation to allow a more piecemeal approach: each filtering
component is given the ability to serialize itself rather than
being embedded in the master selfie. With this approach, the
HNTrie buffer can now serialize to its own storage by converting the
buffer data directly into a string which can be directly sent to
storage. This avoids expensive intermediate steps such as
converting into a JS array and then into a JSON string.
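A minimal sketch of the piecemeal approach -- method names, the
storage stand-in and the 16-bit chunking are assumptions, not the
actual serialization code:

    // Hypothetical sketch: each filtering component serializes itself
    // to its own storage entry, so no single master selfie has to hold
    // everything at once.
    const storage = new Map();      // stands in for the extension storage API

    const hntrieContainer = {
        buf: new Uint32Array(8),
        toSelfie() {
            // Convert the raw buffer directly into a string -- no
            // intermediate JS array, no JSON.stringify of a huge structure.
            return String.fromCharCode(...new Uint16Array(this.buf.buffer));
        },
        fromSelfie(s) {
            const u16 = new Uint16Array(s.length);
            for ( let i = 0; i < s.length; i++ ) { u16[i] = s.charCodeAt(i); }
            this.buf = new Uint32Array(u16.buffer);
        },
    };

    storage.set('hntrie.selfie', hntrieContainer.toSelfie());   // save
    hntrieContainer.fromSelfie(storage.get('hntrie.selfie'));   // load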
As part of the refactoring, there was also an opportunistic code
upgrade to ES6 and Promises (eventually all of uBO's code will be
proper ES6).
Additionally, the polyfill to bring getBytesInUse() to Firefox has
been revisited to replace the rather expensive previous
implementation with one that has virtually no overhead.
A new filtering class has been created: "static extended filtering".
This new class is an umbrella class for more specialized filtering
engines:
- Cosmetic filtering
- Scriptlet filtering
- HTML filtering
HTML filtering is available only on platforms which support modifying
the response body on the fly, so only Firefox 57+ at the moment.
With the ability to modify the response body, HTML filtering has
been introduced: removing elements from the DOM before the source
data has been parsed by the browser.
A consequence of the HTML filtering ability is to bring back the
script tag filtering feature.
- badfilter option was no longer working following the last
refactoring changes.
- performance work:
- reduce duplication of large strings.
- new lighter FilterBucket to use when only 2 filters: FilterPair.