mirror of
https://github.com/gorhill/uBlock.git
synced 2024-10-06 09:37:12 +02:00
c6fb70b1f0
Whereas before the string segment was encoded as:

LL OOOOOOOOOOOO

where L are the upper 8 bits, used to encode the length of the segment, and O are the lower 24 bits, used to encode the offset of the string data in the character buffer, the new code encodes it as follows:

OOOOOOOOOOOO LL

Furthermore, the most significant bit of the length LL is now used to mark whether the current string segment is a label boundary. This means a cell can't reference a segment longer than 127 characters. To work around this limitation when a segment is longer than 127 characters (a rare occurrence), the algorithm simply splits the segment into multiple adjacent cells.

As a result, there is no longer a need to encode "boundariness" into special cells, which simplifies both the storing and matching algorithms.

Additionally, added minimal documentation for the NPM package on how to import and use HNTrieContainer as a standalone API.
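The new cell layout described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual code from `hntrie.js`: all function and constant names are hypothetical, and placing the boundary flag on the last cell of a split segment is an assumption.

```js
// Assumed layout of a 32-bit cell, per the description above:
// - upper 24 bits: offset of the string data in the character buffer
// - lower 8 bits: boundary flag (most significant bit) + length (7 bits)
const MAX_SEGMENT_LENGTH = 0x7F;
const BOUNDARY_BIT = 0x80;

function encodeSegment(offset, length, isBoundary) {
    if ( length > MAX_SEGMENT_LENGTH ) {
        throw new RangeError('A single cell can reference at most 127 characters');
    }
    if ( offset > 0xFFFFFF ) {
        throw new RangeError('Offset does not fit in 24 bits');
    }
    const flag = isBoundary ? BOUNDARY_BIT : 0;
    // `>>> 0` converts the result back to an unsigned 32-bit integer.
    return ((offset << 8) | flag | length) >>> 0;
}

function decodeSegment(cell) {
    return {
        offset: cell >>> 8,
        length: cell & MAX_SEGMENT_LENGTH,
        isBoundary: (cell & BOUNDARY_BIT) !== 0,
    };
}

// Split a segment longer than 127 characters into adjacent cells.
// Assumption: only the last cell carries the boundary flag.
function splitSegment(offset, length, isBoundary) {
    const cells = [];
    while ( length > MAX_SEGMENT_LENGTH ) {
        cells.push(encodeSegment(offset, MAX_SEGMENT_LENGTH, false));
        offset += MAX_SEGMENT_LENGTH;
        length -= MAX_SEGMENT_LENGTH;
    }
    cells.push(encodeSegment(offset, length, isBoundary));
    return cells;
}
```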
159 lines
4.9 KiB
Markdown
# uBlock Origin Core

This package contains the core filtering engines used in the uBlock Origin
("uBO") extension. It has no external dependencies.

## Installation

Install: `npm install @gorhill/ubo-core`

This is a very early version and the API is subject to change at any time.

This package uses [native JavaScript modules](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Modules).

## Description

The package contains uBO's static network filtering engine ("SNFE"), whose
purpose is to parse and enforce filter lists. The matching algorithm is highly
efficient, and _especially_ optimized to match against large sets of pure
hostnames.

The SNFE can be fed filter lists from a variety of sources, such as [EasyList/EasyPrivacy](https://easylist.to/),
[uBlock filters](https://github.com/uBlockOrigin/uAssets/tree/master/filters),
and also lists of domain names or files in hosts file format (e.g. block lists from [The Block List Project](https://github.com/blocklistproject/Lists#the-block-list-project),
[Steven Black's HOSTS](https://github.com/StevenBlack/hosts#readme), etc.).

## Usage

At the moment, there can be only one instance of the static network filtering
engine ("SNFE"), whose proxy API must be imported as follows:

```js
import { StaticNetFilteringEngine } from '@gorhill/ubo-core';
```

If you must import as a NodeJS module:

```js
// Dynamic import(); in a CommonJS module, `await` must be used inside an
// async function.
const { StaticNetFilteringEngine } = await import('@gorhill/ubo-core');
```

Create an instance of SNFE:

```js
const snfe = StaticNetFilteringEngine.create();
```

Feed the SNFE with filter lists -- `useLists()` accepts an array of
objects (or promises resolving to such objects) which expose the raw text of a
list through the `raw` property, and optionally the name of the list through
the `name` property (how you fetch the lists is up to you):

```js
// `fetch()` below stands for your own function, which must resolve to the
// raw text of the named list.
await snfe.useLists([
    fetch('easylist').then(raw => ({ name: 'easylist', raw })),
    fetch('easyprivacy').then(raw => ({ name: 'easyprivacy', raw })),
]);
```

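As a sketch of one way to produce the objects expected by `useLists()`, the hypothetical helpers below wrap any source of text into the `{ name, raw }` shape. `asList()` and `fetchList()` are not part of the package. `fetchList()` assumes the standard `fetch()` API is available globally, as in browsers and recent versions of Node.js, and the URL in the usage comment is a placeholder:

```js
// Hypothetical helper: resolves to the { name, raw } shape described above.
// `source` is a string, or a promise resolving to the raw text of the list.
function asList(name, source) {
    return Promise.resolve(source).then(raw => ({ name, raw: String(raw) }));
}

// Hypothetical helper built on the standard fetch() API: downloads the list
// and rejects when the server responds with an error status.
function fetchList(name, url) {
    const text = fetch(url).then(response => {
        if ( response.ok === false ) {
            throw new Error(`Failed to fetch ${url}: ${response.status}`);
        }
        return response.text();
    });
    return asList(name, text);
}

// Usage, assuming `snfe` was created as shown earlier:
//   await snfe.useLists([
//       fetchList('mylist', 'https://example.invalid/mylist.txt'),
//   ]);
```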
Now we are ready to match network requests:

```js
// Not blocked
if ( snfe.matchRequest({
    originURL: 'https://www.bloomberg.com/',
    url: 'https://www.bloomberg.com/tophat/assets/v2.6.1/that.css',
    type: 'stylesheet'
}) !== 0 ) {
    console.log(snfe.toLogData());
}

// Blocked
if ( snfe.matchRequest({
    originURL: 'https://www.bloomberg.com/',
    url: 'https://securepubads.g.doubleclick.net/tag/js/gpt.js',
    type: 'script'
}) !== 0 ) {
    console.log(snfe.toLogData());
}

// Unblocked
if ( snfe.matchRequest({
    originURL: 'https://www.bloomberg.com/',
    url: 'https://sourcepointcmp.bloomberg.com/ccpa.js',
    type: 'script'
}) !== 0 ) {
    console.log(snfe.toLogData());
}
```

It is possible to pre-parse filter lists and save the intermediate results for
later use -- useful to speed up the loading of filter lists. This will be
documented eventually, but if you feel adventurous, you can look at the code
and use this capability now if you figure out the details.

---

## Extras

You can directly use specific APIs exposed by this package. Here are some of
them, which are used internally by uBO's SNFE.

### `HNTrieContainer`

A well-optimized [compressed trie](https://en.wikipedia.org/wiki/Trie#Compressing_tries)
container specialized for storing and looking up hostnames.

The matching algorithm is designed for hostnames, i.e. the hostname labels
making up a hostname are matched from right to left, such that `www.example.org`
will be a match if `example.org` is stored in the trie, while
`anotherexample.org` won't be a match.

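To make the expected results concrete, here is a minimal sketch of label-aware matching using a plain `Set`. It only illustrates the behavior described above and is not how `HNTrieContainer` is implemented; `matchHostname()` is a hypothetical name:

```js
// Naive stand-in for label-aware hostname matching: returns the position at
// which the matching stored hostname starts, or -1 when there is no match.
// A match can only start at the beginning of a label.
function matchHostname(storedHostnames, hostname) {
    let pos = 0;
    for (;;) {
        if ( storedHostnames.has(hostname.slice(pos)) ) { return pos; }
        const dot = hostname.indexOf('.', pos);
        if ( dot === -1 ) { return -1; }
        // Skip to the start of the next label.
        pos = dot + 1;
    }
}

const stored = new Set([ 'example.org' ]);

console.log(matchHostname(stored, 'www.example.org'));    // 4
console.log(matchHostname(stored, 'example.org'));        // 0
console.log(matchHostname(stored, 'anotherexample.org')); // -1
```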
`HNTrieContainer` is designed to store a large number of hostnames with CPU and
memory efficiency as the main concerns -- and is a key component of uBO.

To create and use a standalone `HNTrieContainer` object:

```js
import HNTrieContainer from '@gorhill/ubo-core/js/hntrie.js';

const trieContainer = new HNTrieContainer();

const aTrie = trieContainer.createOne();
aTrie.add('example.org');
aTrie.add('example.com');

const anotherTrie = trieContainer.createOne();
anotherTrie.add('foo.invalid');
anotherTrie.add('bar.invalid');

// matches() returns the position at which the match starts, or -1 when
// there is no match.

// Matches: returns 4
console.log("aTrie.matches('www.example.org')", aTrie.matches('www.example.org'));

// Does not match: returns -1
console.log("aTrie.matches('www.foo.invalid')", aTrie.matches('www.foo.invalid'));

// Does not match: returns -1
console.log("anotherTrie.matches('www.example.org')", anotherTrie.matches('www.example.org'));

// Matches: returns 0
console.log("anotherTrie.matches('foo.invalid')", anotherTrie.matches('foo.invalid'));
```

The `reset()` method must be used to remove all the tries from a trie container;
you can't remove a single trie from the container.

```js
trieContainer.reset();
```

When you reset a trie container, references to previously created tries become
invalid, i.e. `aTrie` and `anotherTrie` shouldn't be used following a reset.