# uBlock Origin Core
This package contains the core filtering engines used in the uBlock Origin
("uBO") extension. It has no external dependencies.
## Installation
Install: `npm install @gorhill/ubo-core`

This is a very early version and the API is subject to change at any time.

This package uses [native JavaScript modules](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Modules).
## Description
The package contains uBO's static network filtering engine ("SNFE"), whose
purpose is to parse and enforce filter lists. The matching algorithm is highly
efficient, and _especially_ optimized to match against large sets of pure
hostnames.

The SNFE can be fed filter lists from a variety of sources, such as [EasyList/EasyPrivacy](https://easylist.to/),
[uBlock filters](https://github.com/uBlockOrigin/uAssets/tree/master/filters),
and also lists of domain names or lists in hosts file format (e.g. block lists from [The Block List Project](https://github.com/blocklistproject/Lists#the-block-list-project),
[Steven Black's HOSTS](https://github.com/StevenBlack/hosts#readme), etc.).
## Usage
See `./demo.js` in the package for instructions to get started quickly.

At the moment, there can be only one instance of the static network filtering
engine ("SNFE"), whose proxy API must be imported as follows:
```js
import { StaticNetFilteringEngine } from '@gorhill/ubo-core';
```
If you must import from a NodeJS (CommonJS) module, use a dynamic `import()`:
```js
const { StaticNetFilteringEngine } = await import('@gorhill/ubo-core');
```
Create an instance of SNFE:
```js
const snfe = await StaticNetFilteringEngine.create();
```
Feed the SNFE with filter lists. `useLists()` accepts an array of objects (or
promises resolving to such objects), each of which exposes the raw text of a
list through the `raw` property, and optionally the name of the list through
the `name` property (how you fetch the lists is up to you):
```js
// 'easylist' and 'easyprivacy' are placeholders for the actual list URLs
await snfe.useLists([
    fetch('easylist').then(r => r.text()).then(raw => ({ name: 'easylist', raw })),
    fetch('easyprivacy').then(r => r.text()).then(raw => ({ name: 'easyprivacy', raw })),
]);
```
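Since how the lists are fetched is left to the caller, here is a minimal sketch of building the `{ name, raw }` objects from files already saved on disk. `listFromFile` and the file paths are hypothetical and not part of this package. Only the shape of the returned object comes from the description above.

```js
// Hypothetical helper, not part of @gorhill/ubo-core: reads a filter list
// from disk and resolves to the { name, raw } shape accepted by useLists().
async function listFromFile(name, path) {
    // Dynamic import keeps this usable from both ES modules and CommonJS
    const { readFile } = await import('node:fs/promises');
    const raw = await readFile(path, 'utf8');
    return { name, raw };
}

// Usage, assuming the lists were downloaded beforehand to these
// (hypothetical) paths:
//
// await snfe.useLists([
//     listFromFile('easylist', './data/easylist.txt'),
//     listFromFile('easyprivacy', './data/easyprivacy.txt'),
// ]);
```

Because `useLists()` accepts promises, the files can be read concurrently without awaiting each one first.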
Now we are ready to match network requests:
```js
// Not blocked
if ( snfe.matchRequest({
    originURL: 'https://www.bloomberg.com/',
    url: 'https://www.bloomberg.com/tophat/assets/v2.6.1/that.css',
    type: 'stylesheet'
}) !== 0 ) {
    console.log(snfe.toLogData());
}

// Blocked
if ( snfe.matchRequest({
    originURL: 'https://www.bloomberg.com/',
    url: 'https://securepubads.g.doubleclick.net/tag/js/gpt.js',
    type: 'script'
}) !== 0 ) {
    console.log(snfe.toLogData());
}

// Unblocked
if ( snfe.matchRequest({
    originURL: 'https://www.bloomberg.com/',
    url: 'https://sourcepointcmp.bloomberg.com/ccpa.js',
    type: 'script'
}) !== 0 ) {
    console.log(snfe.toLogData());
}
```
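The three cases above suggest that `matchRequest()` returns a small integer code. The sketch below turns that code into a label. It assumes the convention 0 = no match, 1 = blocked, 2 = unblocked by an exception filter. Apart from `0`, which the examples compare against, this README does not state that convention, so verify it against `./demo.js` before relying on it. `describeMatchResult` is a hypothetical helper.

```js
// Assumed convention (not confirmed by this README):
//   0 = no filter matched, 1 = blocked, 2 = unblocked by an exception filter
function describeMatchResult(result) {
    switch ( result ) {
    case 0: return 'not blocked';
    case 1: return 'blocked';
    case 2: return 'unblocked';
    default: return `unknown (${result})`;
    }
}

// Usage with an engine created as shown earlier:
//
// const result = snfe.matchRequest({ originURL, url, type });
// console.log(describeMatchResult(result));
```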
Once all the filter lists are loaded into the static network filtering engine,
you can serialize the content of the engine into a JS string:
```js
const serializedData = await snfe.serialize();
```
You can save and later use that JS string to fast-load the content of the
static network filtering engine without having to parse and compile the lists:
```js
const snfe = await StaticNetFilteringEngine.create();
await snfe.deserialize(serializedData);
```
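One way to save and restore the serialized string is to keep it in a file. The following is only a sketch: `saveSnapshot`, `loadSnapshot` and the snapshot path are hypothetical names, and `engine` stands for any object exposing the `serialize()` and `deserialize()` methods shown above.

```js
// Hypothetical helpers, not part of @gorhill/ubo-core.

// Writes the serialized content of the engine to a file
async function saveSnapshot(engine, path) {
    const { writeFile } = await import('node:fs/promises');
    const serializedData = await engine.serialize();
    await writeFile(path, serializedData, 'utf8');
}

// Resolves to true when a snapshot was found and loaded, and to false when
// the file does not exist, in which case the lists must be compiled again
async function loadSnapshot(engine, path) {
    const { readFile } = await import('node:fs/promises');
    let serializedData;
    try {
        serializedData = await readFile(path, 'utf8');
    } catch ( err ) {
        if ( err.code === 'ENOENT' ) { return false; }
        throw err;
    }
    await engine.deserialize(serializedData);
    return true;
}

// Usage, assuming a (hypothetical) snapshot path:
//
// const snfe = await StaticNetFilteringEngine.create();
// if ( await loadSnapshot(snfe, './snfe-snapshot.txt') === false ) {
//     await snfe.useLists([ /* ... */ ]);
//     await saveSnapshot(snfe, './snfe-snapshot.txt');
// }
```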
---
## Extras
You can also directly use specific APIs exposed by this package. Some of them,
which are used internally by uBO's SNFE, are described here.
### HNTrieContainer
A well-optimized [compressed trie](https://en.wikipedia.org/wiki/Trie#Compressing_tries)
container specialized to store and look up hostnames.

The matching algorithm is designed for hostnames, i.e. the labels making up a
hostname are matched from right to left, such that `www.example.org` will be a
match if `example.org` is stored in the trie, while `anotherexample.org` won't
be a match.

`HNTrieContainer` is designed to store a large number of hostnames with CPU and
memory efficiency as the main concerns, and is a key component of uBO.

To create and use a standalone `HNTrieContainer` object:
```js
import HNTrieContainer from '@gorhill/ubo-core/js/hntrie.js';

const trieContainer = new HNTrieContainer();

const aTrie = trieContainer.createOne();
trieContainer.add(aTrie, 'example.org');
trieContainer.add(aTrie, 'example.com');

const anotherTrie = trieContainer.createOne();
trieContainer.add(anotherTrie, 'foo.invalid');
trieContainer.add(anotherTrie, 'bar.invalid');

// matches() returns the position at which the match starts, or -1 when
// there is no match.

// Matches: returns 4
console.log("trieContainer.matches(aTrie, 'www.example.org')", trieContainer.matches(aTrie, 'www.example.org'));

// Does not match: returns -1
console.log("trieContainer.matches(aTrie, 'www.foo.invalid')", trieContainer.matches(aTrie, 'www.foo.invalid'));

// Does not match: returns -1
console.log("trieContainer.matches(anotherTrie, 'www.example.org')", trieContainer.matches(anotherTrie, 'www.example.org'));

// Matches: returns 0
console.log("trieContainer.matches(anotherTrie, 'foo.invalid')", trieContainer.matches(anotherTrie, 'foo.invalid'));
```
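To make the matching rule concrete, here is a deliberately simplified stand-in that reproduces the same observable results with a plain `Set`. It is not how `HNTrieContainer` is implemented: there is no trie and no compression. `matchHostname` is a hypothetical name. It only shows why a match must start at a label boundary.

```js
// Simplified illustration only: a Set-based stand-in for the matching
// semantics described above, not the compressed trie used by HNTrieContainer.
// Returns the position at which the match starts, or -1 when there is none.
function matchHostname(storedHostnames, hostname) {
    let pos = 0;
    for (;;) {
        // Candidates are the hostname itself, then each of its parent
        // domains, so every candidate starts at a label boundary
        if ( storedHostnames.has(hostname.slice(pos)) ) { return pos; }
        const dot = hostname.indexOf('.', pos);
        if ( dot === -1 ) { return -1; }
        pos = dot + 1;
    }
}

const stored = new Set([ 'example.org', 'example.com' ]);
console.log(matchHostname(stored, 'www.example.org'));    // 4
console.log(matchHostname(stored, 'anotherexample.org')); // -1
console.log(matchHostname(stored, 'example.com'));        // 0
```

`anotherexample.org` ends with the characters `example.org`, but that suffix does not start at a label boundary, so it is never tested as a candidate.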
Use the `reset()` method to remove all the tries from a trie container. You
can't remove a single trie from the container.
```js
trieContainer.reset();
```
After you reset a trie container, references to previously created tries are no
longer valid: `aTrie` and `anotherTrie` must not be used following a reset.