Add removeparam modifier tips

2024-07-03 02:37:21 +02:00 · 2023-03-02 18:22:48 +03:30 · 2023-03-02 18:22:48 +03:30 · 1775146070
commit 1775146070
parent a2ca25a2b1
1 changed files with 57 additions and 8 deletions
--- a/Filter-performance.md
+++ b/Filter-performance.md
@ -1,23 +1,27 @@
 See also: [Syntax quirks](./Syntax-quirks)

-1- Using `:upward()` instead of `:has()` can be more performant. `:upward()` is fast, it just lookup ancestors -- there is only one parent per element, `:has()` has to lookup descendants, there can be many children per elements.
+- [`:upward()` vs `:has()`](#upward-vs-has)
+- [Narrowing options for network filters](#narrowing-options-for-network-filters)
+- [`removeparam` modifier](#removeparam-modifier)
+
+## `:upward()` vs `:has()`
+Using `:upward()` instead of `:has()` can be more performant. `:upward()` is fast, it just lookup ancestors -- there is only one parent per element, `:has()` has to lookup descendants, there can be many children per elements.

 ```html
-<div id="text-19" class="widget">
-  <div class="section-heading">
-    <span class="h-text">Promoted Links</span>
-  </div>
+<div class="widget">
+  <span class="h-text">Promoted Links</span>
  <div class="textwidget">
    <ul>
      <li>
        <a href="http://example.com" target="_blank" rel="nofollow noopener">Ad Link</a>
      </li>
    </ul>
-  </div>
+</div>
 ```
 In the case above, it's better to use `##.widget .h-text:has-text(Promoted Links):upward(.widget)` instead of `##.widget:has(.h-text:has-text(Promoted Links))`.

-2- For all static network filters, ensure that they have as many primary narrowing options as possible, i.e. that they are well targeted. Pure hostname-based filters (such as `||example.com^`) are most optimized memory- and cpu-wise. For non-pure hostname, i.e. pattern-based filters (such as `||example.com/path/to/file`) , ensure they are tokenizable, narrow their target with filter options and/or `domain=` if they are meant to apply only on specific site(s).
+## Narrowing options for network filters
+For all static network filters, ensure that they have as many primary narrowing options as possible, i.e. that they are well targeted. Pure hostname-based filters (such as `||example.com^`) are most optimized memory- and cpu-wise. For non-pure hostname, i.e. pattern-based filters (such as `||example.com/path/to/file`) , ensure they are tokenizable, narrow their target with filter options and/or `domain=` if they are meant to apply only on specific site(s).

 Regarding static network filters, pure hostname ones are most optimized, as they are stored in a compressed [trie](https://en.wikipedia.org/wiki/Trie), and WebAssembly code is used to look them up (in Firefox). With the trie, the number of comparisons to perform to find a hit will always be more proportional to the number of characters in the hostname to match, less to the number of filters in the list, so you could have tens of thousands of such filters with no measurable performance hit.

@ -69,4 +73,49 @@ Filter above will be treated as a regex-based filter since it starts and ends wi

 The trailing wildcard will be dropped, but the filter is now the pattern `/path/to/`.

-Source: https://reddit.com/r/uBlockOrigin/comments/mdh9jz/order_of_complexity_for_network_filters/gsfn0xl/
+Source: https://reddit.com/r/uBlockOrigin/comments/mdh9jz/order_of_complexity_for_network_filters/gsfn0xl/
+
+## `removeparam` modifier
+Avoiding `removeparam` from being visited at all is best (by using [narrowing options and being tokenizable](#narrowing-options)), I do hope filter authors will be as carefully as possible when crafting `removeparam` filters as I am careful at minimizing all overhead in the code -- otherwise all the coding efforts are going to waste. 
+
+When no pattern is present, uBO will try to derive a pattern from the `removeparam=` value. 
+For example, `*$doc,xhr,removeparam=utm_` will be translated to `utm_$doc,xhr,removeparam=utm_` internally, and consequently uBO will be able to use the `utm` token for that filter. The logger will always output the internally translated version.
+
+It's best to split filters with multiple `removeparam` queries in distinct filters to ensure tokenization can take place:
+
+_Bad_
+```adb
+*clid=$removeparam=/gclid|yclid|fbclid/=
+```
+
+_Good_
+```adb
+gclid=$removeparam=gclid=
+yclid=$removeparam=yclid=
+fbclid=$removeparam=fbclid=
+```
+
+Typically the query parameter of interest will be part of the filter pattern:
+
+_Bad_
+```adb
+||reddit.com^$removeparam=utm_
+
+||youtube.com^$removeparam=/fbclid|gclid/
+```
+
+_Good_
+```adb
+||reddit.com^*utm_$removeparam=utm_
+
+||youtube.com^*fbclid$removeparam=fbclid
+||youtube.com^*gclid$removeparam=gclid
+```
+
+This way uBO will scan the query parameters only when the URL is found to match the targeted query parameters. Mind performance when crafting filters. Your proposed filters forces uBO to scan every URL matching `reddit.com` and `youtube.com`.
+
+
+Sources:
+- https://github.com/uBlockOrigin/uBlock-issues/issues/760#issuecomment-719952467
+- https://github.com/uBlockOrigin/uBlock-issues/issues/760#issuecomment-726824559
+- https://github.com/uBlockOrigin/uBlock-issues/issues/760#issuecomment-734440655