Though Firefox shares a lot of WebExtensions code with Chromium,
these platforms have their own specific code paths, for various
reasons.
The reorganization here makes it clear that the Chromium platform is
just one flavor of WebExtensions, and as such Chromium-specific
code paths should no longer be automatically pulled in by other
platforms which do not need them.
Given that many files changed paths, here is the
parent commit to quickly browse back to the previous directory
layout:
ec7db30b2f
The syntax to remove a response header is a special case
of HTML filtering, in which the response headers are
targeted rather than the response body:
example.com##^responseheader(header-name)
Where `header-name` is the name of the header to
remove, and must always be lowercase.
The removal of response headers can only be applied to
document resources, i.e. main- or sub-frames.
Only a limited set of headers can be targeted for
removal:
location
refresh
report-to
set-cookie
This limitation is to ensure that uBO never lowers the
security profile of web pages, e.g. we wouldn't want to
remove `content-security-policy`.
Given that the header removal occurs at onHeadersReceived
time, this new ability works for all browsers.
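For reference, here is a minimal sketch of the general mechanism,
assuming a listener registered with the `blocking` and
`responseHeaders` options; this is an illustration only, not uBO's
actual implementation:
    // Illustrative only -- not uBO's actual code. Remove the `refresh`
    // header from document responses of example.com, as would result
    // from the filter `example.com##^responseheader(refresh)`.
    browser.webRequest.onHeadersReceived.addListener(
        details => {
            const headers = details.responseHeaders.filter(
                header => header.name.toLowerCase() !== 'refresh'
            );
            return { responseHeaders: headers };
        },
        { urls: [ 'https://example.com/*' ], types: [ 'main_frame', 'sub_frame' ] },
        [ 'blocking', 'responseHeaders' ]
    );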
The motivation for this new filtering ability is instances
of websites using a `refresh` header to redirect visitors
to an undesirable destination after a few seconds.
Regex-based static network filters are those most likely to
cause performance degradation, and as such the best guard
against this is the ability to extract valid and good tokens
from regex patterns.
This commit introduces a complete regex parser so that the
static network filtering engine can now safely extract
tokens regardless of the complexity of the regex pattern.
The regex parser is a library imported from:
https://github.com/foo123/RegexAnalyzer
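To illustrate the tokenization idea, here is a naive sketch of
extracting a literal token candidate from a regex pattern; the
imported parser is far more complete (it handles groups, classes,
alternations, etc.), and the names below are illustrative only:
    // Naive sketch, not uBO's actual tokenizer: find the longest run of
    // token characters which is not altered by a following quantifier,
    // so that the run is guaranteed to appear verbatim in matching URLs.
    // It knowingly ignores character classes, groups and alternations.
    function extractTokenFromRegex(pattern) {
        let best = '';
        const re = /[0-9a-z%]{2,}/g;
        let match;
        while ( (match = re.exec(pattern)) !== null ) {
            const next = pattern.charAt(re.lastIndex);
            if ( /[?*{]/.test(next) ) { continue; }
            if ( match[0].length > best.length ) { best = match[0]; }
        }
        return best;
    }
    // Hypothetical example:
    extractTokenFromRegex('^https?:\\/\\/[0-9a-z]+\\.example\\.com\\/banner');
    // → 'example'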
The syntax highlighter adds an underline to regex-based
filters as a visual aid to filter authors, to help them avoid
mistakenly creating regex-based filters. This commit
further colors the underline as a warning when a regex-based
filter is found to be untokenizable.
Filter list authors are invited to spot these untokenizable
regex-based filters in their lists to verify that no
mistakes were made for those filters, causing them to be
untokenizable. For example, what appears to be a mistake:
/^https?:\/\/.*\/sw.js?.[a-zA-Z0-9%]{50,}/
Though the mistake is minor, the regex-based filter above
is untokenizable as a result, and becomes tokenizable when
the `.` is properly escaped:
/^https?:\/\/.*\/sw\.js?.[a-zA-Z0-9%]{50,}/
Filter list authors can use this search expression in the
asset viewer to find instances of regex-based filters:
/^(@@)?\/[^\n]+\/(\$|$)/
A new standalone static filtering parser is introduced,
vAPI.StaticFilteringParser. Its purpose is to parse a
line of text into a representation suitable for
compiling filters. It can additionally serve for
syntax highlighting purposes.
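A hypothetical usage sketch -- the actual method names and output
shape of vAPI.StaticFilteringParser are not documented here, so
everything below is illustrative only:
    // Illustrative only: parse one line of filter text, then hand the
    // resulting representation either to the filter compiler or to the
    // syntax highlighter.
    const parser = new vAPI.StaticFilteringParser();
    parser.analyze('||example.com^$script,3p');
    // ...inspect the parsed representation and compile or highlight it.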
As a side effect, this solves:
- https://github.com/uBlockOrigin/uBlock-issues/issues/1038
This is a first draft; there is more work left to do
to further perfect the implementation and extend its
capabilities, especially those useful to assist filter
authors.
For the time being, this commit breaks line-continuation
syntax highlighting -- which was already flaky prior to
this commit anyway.
Implement a plain string trie container class: STrieContainer.
Make use of STrieContainer where beneficial
Some filter buckets can grow quite large, and in such cases
coalescing "trieable" filter classes into a single trie improves
lookup performance and reduces memory usage.
For instance, at time of commit, the filter bucket for the
`ad` keyword contains 919 entries[1].
Coalescing trieable filters of the same class into a single plain
string trie reduced the size of the bucket to 50 entries plus two
tries, which are scanned only once each whenever the bucket is
visited.
[1] Enter the following code at uBO's dev console:
µBlock.staticNetFilteringEngine.categories.get(0).get(µBlock.urlTokenizer.tokenHashFromString('ad'))
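The actual STrieContainer is implemented differently (and far more
compactly); the sketch below only illustrates the idea of coalescing
many plain strings into a single structure that is scanned once:
    // Minimal sketch, not STrieContainer itself: many plain strings are
    // coalesced into one trie, which is then scanned a single time per
    // bucket visit instead of testing each filter separately.
    class PlainStringTrie {
        constructor() {
            this.root = new Map();
        }
        add(s) {
            let node = this.root;
            for ( const ch of s ) {
                if ( node.has(ch) === false ) { node.set(ch, new Map()); }
                node = node.get(ch);
            }
            node.set('', null); // end-of-string marker
        }
        // Does any stored string start at position i of `url`?
        matches(url, i) {
            let node = this.root;
            for ( ; i < url.length; i++ ) {
                if ( node.has('') ) { return true; }
                node = node.get(url.charAt(i));
                if ( node === undefined ) { return false; }
            }
            return node.has('');
        }
    }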
Refactor static network filtering engine code to make use of
ES6's syntactic sugar `class`.
Change first auto-update run from 7 to 5 minutes.
The motivation is to address the higher peak memory usage at launch
time with 3rd-gen HNTrie when a selfie was present.
The selfie generation prior to this change was to collect all
filtering data into a single data structure, and then to serialize
that whole structure at once into storage (using JSON.stringify).
However, HNTrie serialization requires that a large Uint32Array be
converted into a plain JS array, which itself would be indirectly
converted into a JSON string. This was the main reason why peak
memory usage would be higher at launch from selfie, since the JSON
string would need to be wholly deserialized into JS objects, which
themselves would need to be converted into more specialized data
structures (like that Uint32Array one).
The solution to lower peak memory usage at launch is to refactor
selfie generation to allow a more piecemeal approach: each filtering
component is given the ability to serialize itself rather than being
forced to be embedded in the master selfie. With this approach, the
HNTrie buffer can now serialize to its own storage by converting the
buffer data directly into a string which can be sent as-is to
storage. This avoids expensive intermediate steps such as
converting into a JS array and then into a JSON string.
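A simplified sketch of the idea, assuming the buffer is a Uint32Array
and glossing over character-encoding details of the storage layer
(function names are illustrative only):
    // Illustrative only: reinterpret the buffer's bytes as 16-bit units
    // and map each unit to one character code, avoiding the intermediate
    // plain JS array and JSON.stringify steps described above.
    function serializeBuffer(buf32) {
        const buf16 = new Uint16Array(buf32.buffer, 0, buf32.length * 2);
        let s = '';
        for ( let i = 0; i < buf16.length; i += 8192 ) {
            // Chunked to stay within fromCharCode's argument limits
            s += String.fromCharCode(...buf16.subarray(i, i + 8192));
        }
        return s;
    }
    function deserializeBuffer(s) {
        const buf16 = new Uint16Array(s.length);
        for ( let i = 0; i < s.length; i++ ) {
            buf16[i] = s.charCodeAt(i);
        }
        return new Uint32Array(buf16.buffer);
    }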
As part of the refactoring, there was also an opportunistic upgrade
of code to ES6 and Promises (eventually all of uBO's code will be
proper ES6).
Additionally, the polyfill to bring getBytesInUse() to Firefox has
been revisited: the rather expensive previous implementation was
replaced with one that has virtually no overhead.
A new filtering class has been created: "static extended filtering".
This new class is an umbrella class for more specialized filtering
engines:
- Cosmetic filtering
- Scriptlet filtering
- HTML filtering
HTML filtering is available only on platforms which support modifying
the response body on the fly, so only Firefox 57+ at the moment.
With the ability to modify the response body, HTML filtering has
been introduced: removing elements from the DOM before the source
data has been parsed by the browser.
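A minimal sketch of the underlying mechanism (Firefox 57+ only; this
is an illustration, not uBO's actual implementation):
    // Illustrative only: intercept the response body of document requests
    // through filterResponseData(), filter the HTML, then hand it back to
    // the browser before it gets parsed.
    browser.webRequest.onBeforeRequest.addListener(details => {
        const filter = browser.webRequest.filterResponseData(details.requestId);
        const decoder = new TextDecoder('utf-8');
        const encoder = new TextEncoder();
        const chunks = [];
        filter.ondata = event => {
            chunks.push(decoder.decode(event.data, { stream: true }));
        };
        filter.onstop = ( ) => {
            // A real implementation would parse and prune the HTML here
            const filtered = chunks.join('');
            filter.write(encoder.encode(filtered));
            filter.close();
        };
    }, {
        urls: [ 'https://example.com/*' ],
        types: [ 'main_frame', 'sub_frame' ]
    }, [ 'blocking' ]);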
A consequence of the HTML filtering ability is to bring back the
script tag filtering feature.
* refactoring assets management code
* finalizing refactoring of assets management
* various code review of new assets management code
* fix #2281
* fix #1961
* fix #1293
* fix #1275
* fix update scheduler timing logic
* forward compatibility (to be removed once 1.11+ is widespread)
* more code review; give admins ability to specify their own assets.json
* "assetKey" is more accurate than "path"
* fix group count update when building DOM incrementally
* reorganize content (order, added URLs, etc.)
* ability to customize updater through advanced settings
* better spinner icon
... for the sake of portability.
When vapi-common.js is included in an HTML file, the body element there
will have a "dir" attribute filled with the current locale's direction
(ltr or rtl).
The following languages are considered right-to-left: ar, he, fa, ps, ur.
Everything else is left-to-right.
After the "dir" attribute is set, we can decide in CSS which elements
should have different styling for rtl languages (e.g., body[dir=rtl] #id).
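A minimal sketch of the idea, assuming the UI language is available
through the i18n API (the actual vapi-common.js code may obtain it
differently):
    // Illustrative only: derive the document direction from the locale.
    const rtlLanguages = [ 'ar', 'he', 'fa', 'ps', 'ur' ];
    const lang = browser.i18n.getUILanguage().split('-')[0];
    document.body.setAttribute('dir', rtlLanguages.includes(lang) ? 'rtl' : 'ltr');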
Chrome has getManifest(), Safari doesn't have anything, Firefox has an
asynchronous API...
So, instead of using extension APIs, store the common information
(extension name, version, homepage URL) in a file (vapi-appinfo.js), which
can be included when it's needed (its data will be available at vAPI.app.____).
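A hypothetical shape of the generated file -- the actual property
names and values are written by the build script and are not shown
here:
    // vapi-appinfo.js (illustrative content only)
    vAPI.app = {
        name: 'uBlock Origin',      // extension name
        version: '0.0.0.0',         // filled in at build time
        homeURL: 'https://github.com/gorhill/uBlock'  // homepage URL
    };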
The file's content is updated each time the extension is built, so
it shouldn't be modified manually.