Regex-based static network filters are those most likely to
cause performance degradation, and as such the best guard
against undue performance degradation caused by regex-based
filters is the ability to extract valid and good tokens
from regex patterns.
This commit introduces a complete regex parser so that the
static network filtering engine can now safely extract
tokens regardless of the complexity of the regex pattern.
The regex parser is a library imported from:
https://github.com/foo123/RegexAnalyzer
The syntax highlighter adds an underline to regex-based
filters as a visual aid to filter authors so as to avoid
mistakenly creating regex-based filters. This commit
further colors the underline as a warning when a regex-based
filter is found to be untokenizable.
Filter list authors are invited to spot these untokenizable
regex-based filters in their lists to verify that no
mistake were made for those filters, causing them to be
untokenizabke. For example, what appears to be a mistake:
/^https?:\/\/.*\/sw.js?.[a-zA-Z0-9%]{50,}/
Though the mistake is minor, the regex-based filter above
is untokenizable as a result, and become tokenizable when
the `.` is properly escaped:
/^https?:\/\/.*\/sw\.js?.[a-zA-Z0-9%]{50,}/
Filter list authors can use this search expression in the
asset viewer to find instances of regex-based filters:
/^(@@)?\/[^\n]+\/(\$|$)/
This should improve usability of uBO's hard-mode
and "relax blocking mode" operations. This is the
new default behavior.
The previous behavior of forcing a reload of the
page can be re-enabled by simply setting the `3p`
bit of the advanced setting `blockingProfiles`
to 1.
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1204
Not much can be done beside reporting to tabless network
requests to all tabs for which the context is a match.
A short term local cache is used to avoid having to iterate
through all existing tabs for each tabless network request
just to find and report to the matching ones -- users
reporting having a lot of opened tabs at once is not so
uncommon.