mirror of https://github.com/gorhill/uBlock.git synced 2024-11-06 02:42:33 +01:00
Commit Graph

14 Commits

Author SHA1 Message Date
Raymond Hill
c6fb70b1f0
Refactor hntrie to avoid the need for boundary cells
Whereas before the string segment was encoded as:

LL OOOOOOOOOOOO

where L are the upper 8 bits and used to encode the length
of the segment, and O are the lower 24 bits and used to
encode the offset of the string data in the character
buffer, the new code encodes it as follows:

OOOOOOOOOOOO LL

And furthermore the most significant bit of the length
LL is now used to mark whether the current string segment
is a label boundary.

This means a cell can't reference a segment longer than
127 characters. To work around this limitation for when a
segment is longer than 127 characters (a rare occurrence),
the algorithm will simply split the segment into multiple
adjacent cells.

As a result, there is no longer a need to encode
"boundariness" into special cells, which simplifies
both the storing and matching algorithms.
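
The cell layout described above can be illustrated with a minimal Python
sketch. It is not the actual hntrie code: the function names, the 32-bit
cell width, and the choice to set the boundary flag only on the last cell
of a split segment are assumptions made for illustration.

```python
# Hypothetical helpers, not part of uBlock's hntrie implementation.
MAX_SEGMENT_LEN = 0x7F   # 7 bits are left for the length
BOUNDARY_BIT = 0x80      # most significant bit of the length byte


def pack_segment(offset, length, boundary):
    """Pack a segment as: offset in the upper 24 bits, length byte in the lower 8."""
    if not 0 < length <= MAX_SEGMENT_LEN:
        raise ValueError("length must be in 1..127")
    if not 0 <= offset < (1 << 24):
        raise ValueError("offset must fit in 24 bits")
    return (offset << 8) | (BOUNDARY_BIT if boundary else 0) | length


def unpack_segment(cell):
    """Return (offset, length, boundary) for a packed cell."""
    return cell >> 8, cell & MAX_SEGMENT_LEN, bool(cell & BOUNDARY_BIT)


def split_segment(offset, length, boundary):
    """Split a segment longer than 127 characters into adjacent cells.

    Assumption: only the final cell carries the boundary flag.
    """
    cells = []
    while length > MAX_SEGMENT_LEN:
        cells.append(pack_segment(offset, MAX_SEGMENT_LEN, False))
        offset += MAX_SEGMENT_LEN
        length -= MAX_SEGMENT_LEN
    cells.append(pack_segment(offset, length, boundary))
    return cells
```

A 300-character segment would, under these assumptions, be stored as
three adjacent cells of 127, 127 and 46 characters.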

Additionally, added minimal documentation for the NPM
package on how to import and use HNTrieContainer as a
standalone API.
2021-08-10 09:27:59 -04:00
Raymond Hill
b54bf554a8
Fix bad test in WASM version of HNTrieContainer
The erroneous test does not seem to interfere
with the proper functioning of the trie, due
to the fact that nodes are never split without
an OR node or boundary node being present.

The issue was found when undertaking a rewrite
of the algorithm to avoid having to create
boundary nodes.
2021-08-09 07:02:00 -04:00
Raymond Hill
a08cdd721a
Fix edge case involving filter with a single wildcard
This fixes the case of the following filter:

    trk*.vidible.tv

Not matching:

    https://trk.vidible.tv/trk/.vidible.tv

The wildcard is supposed to match any number of
characters, including zero characters. The issue
is that the code was not matching zero characters.

This is due to an incorrect comparison in
BidiTrieContainer.indexOf(), causing the code to
bail out before testing for the zero character
condition.
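
The intended wildcard semantics can be sketched in Python as follows.
This is only an illustration of "zero or more characters", not the
BidiTrieContainer code; the function name is hypothetical and the
pattern is assumed to contain at most one `*`.

```python
def matches_single_wildcard(pattern, text):
    """Return True if `pattern` (with at most one `*`) occurs in `text`.

    The wildcard matches any number of characters, including zero.
    """
    left, star, right = pattern.partition("*")
    if not star:
        # No wildcard: plain substring match.
        return pattern in text
    start = text.find(left)
    if start == -1:
        return False
    # Search for the right part starting exactly where the left part
    # ends, so that a gap of zero characters is accepted.
    return text.find(right, start + len(left)) != -1
```

With this sketch, `trk*.vidible.tv` matches the URL above because
`.vidible.tv` starts immediately after the first `trk`.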
2020-06-27 07:58:46 -04:00
Raymond Hill
a36566b348
Allow empty needle in BidiTrieContainer.lastIndexOf()
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/882

Related commit:
- https://github.com/gorhill/uBlock/commit/7c0294bd5f54

The changes in the commit above have been reverted, and
the new fix is to add the ability to handle an empty
needle in BidiTrieContainer.lastIndexOf() -- in which
case the method will return the end of the currently
matched pattern.
2020-03-19 13:16:41 -04:00
Raymond Hill
609e9a6428
Remove elision of leading wildcard in some filter patterns
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/882

Related commits:
- https://github.com/gorhill/uBlock/commit/a95ef16e064a
- https://github.com/gorhill/uBlock/commit/7971b223855d

Leading wildcards before valid token characters need to
be kept in order to respect the semantics of the filter.
A leading wildcard in such a case changes the semantics of
a filter, i.e. the following two filters are semantically
different:

    example/abc
    *example/abc

As a result, µBlock.BidiTrieContainer.indexOf() is now
able to deal with a needle of length zero -- which is
what happens in FilterPatternLeft(Ex) with filter
patterns starting with `*` (or `^*`) and followed by
valid token characters (0-9, a-z and %).
2020-02-03 14:09:37 -05:00
Raymond Hill
a69b301d81
Fine-tune new bidi-trie code
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/761
2019-10-29 10:26:34 -04:00
Raymond Hill
5cc797fb47
Add WASM implementation for BidiTrieContainer.matches()
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/761
2019-10-28 13:57:35 -04:00
Raymond Hill
0373410635
Fix comments in WASM code
2019-10-26 15:34:40 -04:00
Raymond Hill
6c3296958c
Fix last commit due to bad last second change
Related feedback:
- b0cbc47d9a (commitcomment-35677572)

It seems I completely forgot to test the last
"trivial" change to the WASM code.
2019-10-26 15:25:47 -04:00
Raymond Hill
b0cbc47d9a
Add WASM versions for some bidi-trie methods
Related issue:
- https://github.com/uBlockOrigin/uBlock-issues/issues/761

Changes related to above issue made it possible to
create WASM versions of methods used in the bidi-trie.
In this commit, WASM versions for startsWith(), indexOf()
and lastIndexOf() have been implemented.
2019-10-26 13:13:53 -04:00
Raymond Hill
adabb56dc9
Do not store impossible to match filters in HNTrie
Consider the following two filters:

    example.com
    www.example.com

This commit makes it so that if the first filter is
already present in a given HNTrie, the second filter
will not be stored, since HNTrie will _always_
return the first filter as a match whenever the
hostname to match is example.com or any subdomain
of example.com.

The detection of such pointless filters is
virtually free when adding a hostname to an HNTrie
instance (given how data is stored in the trie), so
in practice no overhead is incurred to detect such
pointless filters.
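
The idea can be illustrated with a small Python sketch. It is not the
HNTrie implementation (which uses a compact typed-array layout); the
class name, the nested-dict representation and the return values are
assumptions made for illustration. The point it shows is that the walk
needed to insert a hostname already passes through any stored ancestor,
so detecting a pointless entry costs nothing extra.

```python
class LabelTrie:
    """Hypothetical hostname trie keyed on labels, from right to left."""

    _END = ""  # marker key; real hostname labels are never empty

    def __init__(self):
        self.root = {}

    def add(self, hostname):
        """Store `hostname`; return False if it can never be the first match."""
        node = self.root
        for label in reversed(hostname.split(".")):
            node = node.setdefault(label, {})
            if self._END in node:
                # An ancestor (or the hostname itself) is already stored.
                return False
        node[self._END] = True
        return True

    def matches(self, hostname):
        """Return the first stored hostname matching `hostname`, or None."""
        node = self.root
        consumed = []
        for label in reversed(hostname.split(".")):
            node = node.get(label)
            if node is None:
                return None
            consumed.append(label)
            if self._END in node:
                return ".".join(reversed(consumed))
        return None
```

For example, after `add("example.com")` succeeds, `add("www.example.com")`
is rejected, and looking up `www.example.com` still reports
`example.com` as the match.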

The ability to ignore impossible to match filters
in HNTrie instances will _especially_ benefit those
using large hosts files.

Examples of how this helps using real configurations:

- Default lists:
  444 filters out of 100,382 were ignored as a result
  of this commit.

- Default lists + "Energized Ultimate Protection":
  283,669 filters out of 903,235 were ignored as a
  result of this commit.

Side note: There was no measurable difference between
the two configurations above in the performance of
the matching algorithm as reported by the built-in
benchmark tool.
2019-04-29 13:15:16 -04:00
Raymond Hill
1b6fea16da
3rd-gen hntrie, suitable for large set of hostnames
2018-12-04 13:02:09 -05:00
Raymond Hill
bf266eb757
recompile wat file using latest https://github.com/WebAssembly/wabt/releases
2018-11-25 12:12:07 -05:00
Raymond Hill
d7d544cda0
Squashed commit of the following:
commit 7c6cacc59b27660fabacb55d668ef099b222a9e6
Author: Raymond Hill <rhill@raymondhill.net>
Date:   Sat Nov 3 08:52:51 2018 -0300

    code review: finalize support for wasm-based hntrie

commit 8596ed80e3bdac2c36e3c860b51e7189f6bc8487
Merge: cbe1f2e 000eb82
Author: Raymond Hill <rhill@raymondhill.net>
Date:   Sat Nov 3 08:41:40 2018 -0300

    Merge branch 'master' of github.com:gorhill/uBlock into trie-wasm

commit cbe1f2e2f38484d42af3204ec7f1b5decd30f99e
Merge: 270fc7f dbb7e80
Author: Raymond Hill <rhill@raymondhill.net>
Date:   Fri Nov 2 17:43:20 2018 -0300

    Merge branch 'master' of github.com:gorhill/uBlock into trie-wasm

commit 270fc7f9b3b73d79e6355522c1a42ce782fe7e5c
Merge: d2a89cf d693d4f
Author: Raymond Hill <rhill@raymondhill.net>
Date:   Fri Nov 2 16:21:08 2018 -0300

    Merge branch 'master' of github.com:gorhill/uBlock into trie-wasm

commit d2a89cf28f0816ffd4617c2c7b4ccfcdcc30e1b4
Merge: d7afc78 649f82f
Author: Raymond Hill <rhill@raymondhill.net>
Date:   Fri Nov 2 14:54:58 2018 -0300

    Merge branch 'master' of github.com:gorhill/uBlock into trie-wasm

commit d7afc78b5f5675d7d34c5a1d0ec3099a77caef49
Author: Raymond Hill <rhill@raymondhill.net>
Date:   Fri Nov 2 13:56:11 2018 -0300

    finalize wasm-based hntrie implementation

commit e7b9e043cf36ad055791713e34eb0322dec84627
Author: Raymond Hill <rhill@raymondhill.net>
Date:   Fri Nov 2 08:14:02 2018 -0300

    add first-pass implementation of wasm version of hntrie

commit 1015cb34624f3ef73ace58b58fe4e03dfc59897f
Author: Raymond Hill <rhill@raymondhill.net>
Date:   Wed Oct 31 17:16:47 2018 -0300

    back up draft work toward experimenting with wasm hntries
2018-11-03 08:58:46 -03:00