Mike Fährmann
56039d2456
add 'hash_md5' and 'hash_sha1' functions ( #3679 )
...
... to global eval namespace
2023-02-22 10:58:44 +01:00
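For illustration of 56039d2456, helpers of this kind could look like the sketch below. The log does not show the actual signatures; taking a `str`, encoding it as UTF-8, and returning a hex digest are all assumptions.

```python
import hashlib


def hash_md5(value: str) -> str:
    """Return the hex MD5 digest of a string (UTF-8 encoding assumed)."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def hash_sha1(value: str) -> str:
    """Return the hex SHA-1 digest of a string (UTF-8 encoding assumed)."""
    return hashlib.sha1(value.encode("utf-8")).hexdigest()
```

Exposed in an eval namespace, such functions would allow expressions like `hash_md5(filename)` in filters or format strings.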
Mike Fährmann
e1df7f73b1
[deviantart] add 'search' extractor
...
(#538 , #1264 , #2954 , #2970 , #3577 )
Requires login to fetch any results, since the API endpoint raises an
error for requests that are not logged in.
TODO: parse HTML search results
2023-02-20 20:54:46 +01:00
Gray Manley
38a6389e2c
Fix lint.
2023-02-20 00:33:30 -06:00
Gray Manley
56cbae92ec
Use more pythony naming.
2023-02-19 06:14:34 -06:00
Gray Manley
8e2ba4f32e
Add test.
2023-02-19 06:13:21 -06:00
Mike Fährmann
dd884b02ee
replace json.loads with direct calls to JSONDecoder.decode
2023-02-09 15:22:00 +01:00
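The idea behind dd884b02ee can be sketched with the standard library alone: create one `json.JSONDecoder` and reuse its bound `decode` method. The name `json_loads` is only an illustrative alias, not necessarily what the project uses.

```python
import json

# One shared decoder; calling its bound method skips the per-call
# argument handling that json.loads() performs.
_decoder = json.JSONDecoder()
json_loads = _decoder.decode

data = json_loads('{"id": 1, "tags": ["a", "b"]}')
```

Unlike `json.loads()`, `JSONDecoder.decode()` only accepts `str` input, so byte strings need to be decoded first.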
Mike Fährmann
b7337d810e
[postprocessor:metadata] add 'sort' and 'separators' options
2023-02-07 18:28:14 +01:00
Mike Fährmann
3436c6b117
[postprocessor:metadata] speed up JSON encoding
2023-02-06 12:35:28 +01:00
Mike Fährmann
925b467496
split e621 from danbooru module ( #3425 )
2023-02-03 19:24:31 +01:00
Mike Fährmann
c2bc70593e
implement ability to load external extractor classes
...
- -X/--extractors
- extractor.module-sources
2023-01-30 23:10:10 +01:00
ClosedPort22
b6706b373a
[downloader:http] add signature checks for some formats
...
also add the MIME type for .obj files
2023-01-15 23:40:55 +08:00
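A signature check of the kind b6706b373a describes compares the first bytes of a downloaded file against the known "magic number" of its format. The sketch below is a minimal illustration; the table contents and the function name `has_valid_signature` are assumptions, not the downloader's actual code.

```python
# Leading bytes of a few common formats.
SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpg": b"\xff\xd8\xff",
    "gif": (b"GIF87a", b"GIF89a"),
    "pdf": b"%PDF-",
}


def has_valid_signature(ext: str, head: bytes) -> bool:
    """Check whether 'head' starts with the signature expected for 'ext'."""
    signature = SIGNATURES.get(ext)
    if signature is None:
        return True  # no known signature for this extension: nothing to check
    # bytes.startswith() accepts a single prefix or a tuple of prefixes
    return head.startswith(signature)
```

A check like this catches cases where a server returns an HTML error page with a 200 status in place of the requested file.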
Mike Fährmann
71d3143c35
fix bug in test_extractors.py
...
pattern matching tests would succeed
if there is exactly one match
but for the wrong extractor
2023-01-08 15:35:05 +01:00
Mike Fährmann
fa144f38ed
[ytdl] fix dfe4f00c
for legacy yt-dlp
2023-01-04 21:42:22 +01:00
Mike Fährmann
dfe4f00ca2
[ytdl] update for yt-dlp changes
2023-01-04 13:12:24 +01:00
Mike Fährmann
d651d45239
implement specifying ranges in slice notation ( #918 , #2865 )
...
e.g.
- '1:101' or ':101' or ':101:' for files 1 to 100
- '1::2' or '::2' for every second file
- '1:101:5' or ':101:5' for files 1, 6, 11, ..., 91, 96
(the second argument specifies the first index NOT included)
2022-12-27 18:21:12 +01:00
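The slice notation from d651d45239 maps naturally onto Python's `range`. The sketch below assumes 1-based file numbers and uses `sys.maxsize` as a stand-in for "no upper bound"; `parse_slice` is a hypothetical helper name.

```python
import sys


def parse_slice(spec: str) -> range:
    """Convert 'start:stop[:step]' text into a range of 1-based file numbers."""
    parts = spec.split(":")
    if not 2 <= len(parts) <= 3:
        raise ValueError("expected 'start:stop' or 'start:stop:step'")
    start, stop, step = (parts + [""])[:3]
    return range(
        int(start) if start else 1,           # files are counted from 1
        int(stop) if stop else sys.maxsize,   # stop is exclusive
        int(step) if step else 1,
    )
```

With this, `parse_slice(":101:5")` covers files 1, 6, 11, ..., 96, matching the example in the commit message.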
Mike Fährmann
3616adfc75
implement '--range' with Python ranges
2022-12-26 18:32:34 +01:00
Mike Fährmann
1800bd7d14
allow '*-filter' options to be a list of expressions
2022-12-23 22:20:21 +01:00
Mike Fährmann
43c211f1a7
extend and rename util.CustomNone
2022-12-06 22:08:51 +01:00
Mike Fährmann
42481aed59
[formatter] implement 'S' format specifier ( #3266 )
...
to Sort lists
2022-11-21 21:44:42 +01:00
Mike Fährmann
6e08ad26f7
update downloader tests
2022-11-16 22:59:18 +01:00
Mike Fährmann
05255f5be0
add 'default' argument to 'text.extr()'
2022-11-09 11:00:32 +01:00
Mike Fährmann
8124c16a50
split 'build_path' from 'set_filename' and 'set_extension'
...
Do not automatically build a new path
when setting file metadata or updating its extension.
2022-11-08 17:03:24 +01:00
Mike Fährmann
eb33e6cf2d
add 'text.extr()'
...
a stripped-down version of text.extract() that
- always returns a string (like 'extract_from')
- only returns a string
- does not deal with 'pos' arguments
- is ~20% faster
2022-11-04 21:37:36 +01:00
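A function with the properties listed for eb33e6cf2d could be sketched as follows. This is an illustration, not the project's code; the `default` parameter mirrors the later commit 05255f5be0, and the name `extr_sketch` is made up here.

```python
def extr_sketch(txt: str, begin: str, end: str, default: str = "") -> str:
    """Return the text between 'begin' and 'end', or 'default' if not found."""
    try:
        first = txt.index(begin) + len(begin)
        return txt[first:txt.index(end, first)]
    except ValueError:
        # str.index() raises ValueError when a marker is missing
        return default
```

Because there is no position bookkeeping and the return value is always a string, call sites can chain it directly, e.g. `extr_sketch(page, "<title>", "</title>").strip()`.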
Mike Fährmann
460095adca
update downloader tests
2022-11-01 18:48:35 +01:00
Mike Fährmann
f037429fa4
attempt to improve '-K' output for lists
...
- use [N] instead of [] to indicate a Number needs to be placed there
- enumerate list items
2022-10-28 12:04:58 +02:00
thatfuckingbird
062ef238a6
add support for aibooru (using danbooru extractor) ( #3075 )
2022-10-19 11:53:59 +02:00
Mike Fährmann
b57015cf0a
[postprocessor:metadata] assume 'mode: custom' when format is set
...
{"name": "metadata", "format": "foobar"}
will now implicitly use mode:custom and no longer mode:json like before
2022-10-04 22:35:26 +02:00
enduser420
f7ba19a1c0
[nana] add 'nana' extractors ( #2967 )
2022-10-04 09:23:24 +02:00
Mike Fährmann
b36125333f
[postprocessor:zip] implement 'files' option ( #2872 )
2022-09-09 11:41:27 +02:00
Mike Fährmann
67bad04dda
[formatter] add 'g' conversion to sluGify a string ( #2410 )
2022-08-26 17:57:17 +02:00
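What "slugify" means in 67bad04dda is not spelled out in the log. One common interpretation is sketched below: lowercase the text, drop punctuation, and collapse whitespace and hyphens into single hyphens. The exact rules are an assumption.

```python
import re


def slugify_sketch(value) -> str:
    """Turn arbitrary text into a lowercase, hyphen-separated slug."""
    value = re.sub(r"[^\w\s-]", "", str(value).lower())
    return re.sub(r"[-\s]+", "-", value).strip("-_")
```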
Mike Fährmann
6990ad0ba8
[formatter] do NOT apply :J to strings ( #2833 )
2022-08-16 16:41:19 +02:00
Mike Fährmann
c0051d7d4c
fix test
2022-08-01 21:40:35 +02:00
Mike Fährmann
dd3a6a9fd1
make 'enumerate_reversed()' work with generators ( #2795 )
2022-08-01 14:08:44 +02:00
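The problem addressed in dd3a6a9fd1 is that generators support neither `len()` nor `reversed()`. A minimal sketch of one way around that, assuming it is acceptable to materialize the generator into a list (the actual fix may differ):

```python
def enumerate_reversed_sketch(iterable, start=0):
    """Like enumerate(), but yield (index, item) pairs from last to first."""
    try:
        length = len(iterable)
    except TypeError:
        # generators have no length: collect their items first
        iterable = list(iterable)
        length = len(iterable)
    return zip(range(start + length - 1, start - 1, -1), reversed(iterable))
```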
Mike Fährmann
0c73914848
[postprocessor:metadata] implement 'mode: modify' ( #2640 )
2022-07-19 12:24:26 +02:00
Mike Fährmann
f3de6b7a87
[postprocessor:metadata] implement 'mode: delete' ( #2640 )
2022-07-19 00:57:29 +02:00
Mike Fährmann
9704c04172
[postprocessor:zip] ensure target directory exists ( #2758 )
2022-07-14 11:55:39 +02:00
Mike Fährmann
74865adae5
implement 'format-separator' option ( #2737 )
...
a global option that serves as a workaround for shortcomings due to
lack of a proper format string parser
2022-07-10 13:31:43 +02:00
bradenhilton
117eeefda0
[postprocessor:mtime] add 'value' option ( #2739 )
2022-07-08 20:56:01 +02:00
Mike Fährmann
90ae48c40c
[formatter] implement 'O' format specifier ( #2736 )
...
to apply a UTC offset to 'date' values and other datetime objects
2022-07-08 12:51:03 +02:00
Mike Fährmann
04bed1eba3
[formatter] allow for custom "format" functions ( #2721 )
2022-07-05 12:22:01 +02:00
Mike Fährmann
54525d2e21
[formatter] implement slice operator as format specifier
...
this allows using a slice operator alongside other (special) format
specifiers like J, to first join list elements to a string and then
trimming that with a slice.
{tags:J, /[:50]}
2022-06-25 16:52:58 +02:00
Mike Fährmann
241e82e18d
[horne] add support for horne.red ( #2700 )
2022-06-25 16:52:16 +02:00
Mike Fährmann
42525cfe8d
fix '{…!j}' for otherwise non-serializable types ( #2624 )
...
like 'datetime'
2022-06-07 17:47:07 +02:00
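The failure mode behind 42525cfe8d is easy to reproduce: `json.dumps()` raises `TypeError` for `datetime` objects. One standard-library way to handle this is a `default` callback; using `str` for it, as below, is an assumption about the approach rather than a description of the actual fix.

```python
import json
from datetime import datetime


def to_json(obj) -> str:
    """Serialize to JSON, falling back to str() for unsupported types."""
    return json.dumps(obj, default=str)


text = to_json({"date": datetime(2010, 1, 1)})
```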
Mike Fährmann
5b43faffed
[postprocessor:metadata] write to stdout by setting filename to "-"
...
(#2624 )
2022-05-30 21:17:31 +02:00
Mike Fährmann
6ad39f2b68
add ytdl tests
...
they only run when youtube-dl or yt-dlp are installed,
i.e. if __import__("<ytdl-package>") succeeds
2022-05-23 18:30:26 +02:00
Mike Fährmann
688d6553b4
replace calls to print() with stdout_write() ( #2529 )
2022-05-19 17:09:24 +02:00
Mike Fährmann
f3408a9d92
implement string literals in replacement fields
...
- either {_lit[foo]} or {'foo'}
- useful as alternative for empty metadata fields: {title|'no title'}
- due to using '_string.formatter_field_name_split()' to parse format
strings, using certain characters will result in an error: [].:!
2022-05-09 23:49:33 +02:00
Mike Fährmann
c4b9f7bab8
update functions working with cookies.txt files
...
- rename
- load_cookiestxt -> cookiestxt_load
- save_cookiestxt -> cookiestxt_store
- in cookiestxt_load, add cookies directly to a cookie jar
instead of storing them in a list first
- other unnoticeable performance increases
2022-05-06 13:21:29 +02:00
Mike Fährmann
ca3a364db7
fix build_duration_func() ( #2533 )
...
for extractors with request_interval_min > 0
2022-04-27 20:28:14 +02:00
Mike Fährmann
7fe54bab2a
attempt to fix some issues with 'contains()' ( #2446 )
...
add a third argument that gets used
when the values to search are given as a string
2022-04-08 14:40:26 +02:00
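The three 'contains()' commits (413b77757b, d78a2c7163, 7fe54bab2a) suggest a helper that tests whether any of the given elements occurs in a collection, with a separator used to split string input. The sketch below is one plausible shape; the any-match semantics and the default separator are assumptions.

```python
def contains_sketch(values, elements, separator=" "):
    """Return True if any of 'elements' is an item of 'values'."""
    if isinstance(values, str):
        # split first so that "ab" does not match inside "abc"
        values = values.split(separator)
    if not isinstance(elements, (list, tuple)):
        elements = (elements,)
    return any(element in values for element in elements)
```

In a filter expression this would read like `contains(tags, ["foo", "bar"])`.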
Mike Fährmann
d78a2c7163
re.escape() arguments for 'contains()' ( #2446 )
2022-04-07 15:35:54 +02:00
Mike Fährmann
413b77757b
implement 'contains()' ( #2446 )
...
and add it to globals() in compiled expressions for --filter etc
2022-03-30 16:18:33 +02:00
Mike Fährmann
e7b30866d0
[postprocessor:mtime] fix timestamps from datetime objects ( #2307 )
...
'datetime.timestamp()', which got used to convert datetime objects to
POSIX timestamps, assumes naive datetimes represent LOCAL time, while
datetimes in 'date' metadata fields represent UTC time.
Ref: https://docs.python.org/3/library/datetime.html#datetime.datetime.timestamp
> Naive datetime instances are assumed to represent local time
> you can obtain the POSIX timestamp by … calculating the timestamp directly
2022-03-23 23:05:14 +01:00
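The "calculating the timestamp directly" approach quoted from the Python documentation in e7b30866d0 looks like this. The function name is illustrative only.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1)


def utc_naive_to_timestamp(dt: datetime) -> float:
    """POSIX timestamp for a naive datetime that represents UTC."""
    return (dt - EPOCH) / timedelta(seconds=1)


ts = utc_naive_to_timestamp(datetime(2010, 1, 1))
```

Calling `dt.timestamp()` on the same naive value would instead interpret it as local time and shift the result by the local UTC offset.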
Mike Fährmann
29db716a63
implement 'datetime_to_timestamp()'
...
and rename 'to_timestamp()'
to the more descriptive 'datetime_to_timestamp_string()'
2022-03-23 22:36:01 +01:00
Mike Fährmann
8295bc6d97
fix loading/storing cookies without domain
2022-03-19 15:14:55 +01:00
Mike Fährmann
500a479026
fix a third(!) bug in _check_cookies() ( #2372 )
...
turns out tests are worthless if you get em wrong ...
2022-03-18 19:52:37 +01:00
Mike Fährmann
cf44aba333
[formatter] allow evaluating f-string literals
...
by starting a format string with '\fF'.
This was technically already possible with '\fE',
but this makes it a bit more convenient.
2022-03-18 13:31:01 +01:00
Mike Fährmann
94452761ed
fix cookies tests
2022-03-11 18:16:00 +01:00
Mike Fährmann
bddcec49f1
implement 'text.root_from_url()'
...
use domain from input URL for kemono
2022-03-01 03:09:57 +01:00
Mike Fährmann
f5b2b9333f
fix another bug in _check_cookies ( #2160 )
...
regression introduced in ed317bfc
Added a couple of tests to hopefully catch such bugs
before they land in a release.
2022-02-16 22:58:57 +01:00
Mike Fährmann
563bd0ecf4
[danbooru] inherit from BaseExtractor
...
- merge danbooru and e621 code
- support booru.allthefallen.moe (closes #2283 )
- remove support for old e621 tag search URLs
2022-02-11 21:01:51 +01:00
Mike Fährmann
b5b4f5a168
use 'build_extractor_filter' in test_results.py
2021-12-28 17:25:07 +01:00
Mike Fährmann
64cf26eaf4
allow specifying sleep-* options as string
...
either as single value or as range: "3.5", "2.1 - 5.0"
2021-12-18 23:28:56 +01:00
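Parsing the string forms mentioned in 64cf26eaf4 might look like the sketch below, which returns a function producing a duration in seconds. Picking a uniformly random value within a range is an assumption, and `parse_duration` is a hypothetical name.

```python
import random


def parse_duration(value):
    """Return a zero-argument function that yields a sleep time in seconds."""
    if isinstance(value, str):
        lower, sep, upper = value.partition("-")
        lower = float(lower)                       # float() ignores whitespace
        upper = float(upper) if sep else lower
    else:
        lower = upper = float(value)
    if lower == upper:
        return lambda: lower
    return lambda: random.uniform(lower, upper)
```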
Mike Fährmann
010d65dcec
extend blacklist/whitelist syntax ( #2025 )
...
Each entry in such a list can now also include a subcategory
'<category>:<subcategory>'
and it is possible to use '*' or an empty string as placeholder
'*:<subcategory>', ':<subcategory>', '<category>:*'
For example
"blacklist": "imgur,*:tag,gfycat:user" or
"blacklist": ["imgur", "*:tag", "gfycat:user"]
will filter all 'imgur' extractors, all extractors with a 'tag'
subcategory (e.g. https://danbooru.donmai.us/posts?tags=bonocho ),
and all 'gfycat' user extractors.
2021-11-23 20:31:43 +01:00
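The extended syntax from 010d65dcec can be modelled as a list of (category, subcategory) pairs with '*' as wildcard. The following is a minimal sketch under that assumption; both function names are hypothetical.

```python
def parse_entries(spec):
    """Parse a comma-separated string or a list into (category, subcategory) pairs."""
    if isinstance(spec, str):
        spec = spec.split(",")
    entries = []
    for item in spec:
        category, _, subcategory = item.strip().partition(":")
        # an empty part acts as a wildcard, same as '*'
        entries.append((category or "*", subcategory or "*"))
    return entries


def is_listed(entries, category, subcategory):
    """Check whether an extractor matches any entry."""
    return any(
        c in ("*", category) and s in ("*", subcategory)
        for c, s in entries
    )
```

For the example in the commit message, `parse_entries("imgur,*:tag,gfycat:user")` matches every 'imgur' extractor, every 'tag' subcategory, and only the 'user' subcategory of 'gfycat'.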
Mike Fährmann
af6424f398
allow testing metadata in list elements
2021-11-21 22:46:34 +01:00
Mike Fährmann
3842cdcd8f
[formatter] implement 'D' format specifier
...
To be able to parse any string into a 'datetime' object
and format it as necessary.
Example:
{created_at:D%Y-%m-%dT%H:%M:%S%z}
->
"2010-01-01 00:00:00"
{created_at:D%Y-%m-%dT%H:%M:%S%z/%b %d %Y %I:%M %p}
->
"Jan 01 2010 12:00 AM"
with 'created_at' == "2010-01-01T01:00:00+0100"
2021-11-20 23:04:34 +01:00
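The example output in 3842cdcd8f is consistent with parsing the string, converting it to UTC, and dropping the timezone before formatting. The sketch below reproduces that behaviour with the standard library; the UTC normalization step is inferred from the example, not confirmed by the log.

```python
from datetime import datetime, timezone


def parse_and_format(value, parse_fmt, out_fmt=None):
    """Parse 'value' with strptime, normalize to naive UTC, then format."""
    dt = datetime.strptime(value, parse_fmt)
    if dt.tzinfo is not None:
        dt = dt.astimezone(timezone.utc).replace(tzinfo=None)
    return dt.strftime(out_fmt) if out_fmt else str(dt)


created_at = "2010-01-01T01:00:00+0100"
plain = parse_and_format(created_at, "%Y-%m-%dT%H:%M:%S%z")
pretty = parse_and_format(created_at, "%Y-%m-%dT%H:%M:%S%z", "%b %d %Y %I:%M %p")
```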
Mike Fährmann
2ab190ce08
add tests for special format strings
2021-11-01 23:26:18 +01:00
Mike Fährmann
46e17c5e61
support accessing the current local datetime in format strings
...
{_now}, {_now:%Y-%m-%d}, etc
(#1968 )
2021-10-30 21:41:09 +02:00
Mike Fährmann
38193dba46
support accessing environment variables in format strings ( #1968 )
...
{_env[HOME]} to get the value of $HOME
every other format string feature is supported as well
2021-10-28 19:18:55 +02:00
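Python's built-in `str.format()` already supports item access in replacement fields, which gives a feel for how `{_env[HOME]}` from 38193dba46 can work. The sketch uses the built-in formatter rather than the project's own, and `GDL_DEMO` is a made-up variable name.

```python
import os

# Set a variable for the demonstration only.
os.environ["GDL_DEMO"] = "/tmp/demo"

# '_env[GDL_DEMO]' performs a dictionary-style lookup on os.environ.
path = "{_env[GDL_DEMO]}/{name}".format(_env=os.environ, name="file.txt")
```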
Mike Fährmann
f2d6b3e6b4
run tests without using 'nose'
...
run_tests.sh -> run_tests.py
2021-10-13 04:07:41 +02:00
Mike Fährmann
12fc646c53
fix filename formatting tests
2021-09-29 23:39:02 +02:00
Mike Fährmann
e0bdacd932
[fappic] add 'image' extractor ( closes #1898 )
2021-09-28 23:35:29 +02:00
Mike Fährmann
c22ff97743
remove 'unit' argument from 'util.format_value()'
2021-09-28 23:07:55 +02:00
Mike Fährmann
cad85640de
move 'util.PathFormat' into its own 'path' module
...
to prevent circular imports between 'formatter' and 'util'
2021-09-27 21:29:37 +02:00
Mike Fährmann
74145467dd
move 'util.Formatter' into its own 'formatter' module
2021-09-27 02:37:04 +02:00
Mike Fährmann
9377543162
[mastodon] add 'following' extractor ( #1891 )
2021-09-26 00:12:34 +02:00
Mike Fährmann
bd845303ad
implement a way to shorten filenames with east-asian characters
...
(#1377 )
Setting 'output.shorten' to "eaw" (East-Asian Width) uses a slower
algorithm that also considers characters with a width > 1.
2021-09-13 21:38:33 +02:00
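A width-aware shortening routine in the spirit of bd845303ad can be built on `unicodedata.east_asian_width()`. Counting wide ('W') and fullwidth ('F') characters as two columns is a common convention and an assumption here; `shorten_eaw` is a hypothetical name.

```python
import unicodedata


def shorten_eaw(text: str, limit: int) -> str:
    """Cut 'text' so that its display width does not exceed 'limit' columns."""
    width = 0
    for index, char in enumerate(text):
        width += 2 if unicodedata.east_asian_width(char) in ("W", "F") else 1
        if width > limit:
            return text[:index]
    return text
```

The per-character lookup is what makes this slower than a plain `text[:limit]` slice.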
Mike Fährmann
292fffc83c
add 'j' format string conversion
...
to convert to a JSON formatted string
2021-08-28 01:19:36 +02:00
Mike Fährmann
bb6a130942
automatically set required DDoS-GUARD cookies ( #1779 )
...
for kemono.party and seiso.party
2021-08-16 17:40:29 +02:00
Mike Fährmann
2792ed6e4b
implement 'util.format_value()'
2021-07-26 02:11:22 +02:00
Mike Fährmann
9e42cd58ea
replace ChainPredicate class with 'functools.partial'
2021-07-20 20:21:32 +02:00
Mike Fährmann
36ac2197db
[ytdl] add extractor for sites supported by youtube-dl
...
(#1680 , #878 )
Can be used by prefixing any URL with 'ytdl:',
or by setting 'extractor.ytdl.enabled' to 'true'.
2021-07-10 20:55:47 +02:00
Mike Fährmann
64240c8d42
[imagevenue] fix extraction
...
(closes #1677 )
2021-07-09 20:13:18 +02:00
Mike Fährmann
0179581340
add 'T' format string conversion ( #1646 )
...
to convert 'date'/datetime to timestamp
2021-06-25 22:35:45 +02:00
Mike Fährmann
f74cf52e2b
[seisoparty] add 'user' and 'post' extractors ( #1635 )
2021-06-25 18:40:11 +02:00
Mike Fährmann
759735fb02
[kemonoparty] fix 'username' extraction ( fixes #1652 )
...
The site's <title> content changed from
<title>NAME | Kemono</title>
to
<title>
NAME | Kemono
</title>
2021-06-25 15:35:20 +02:00
Mike Fährmann
07c8adbd8b
[mangadex] implement login with username & password ( #1535 )
2021-06-08 02:12:57 +02:00
Mike Fährmann
4a747a31a3
[postprocessor:metadata] handle dicts in mode:tags ( fixes #1598 )
2021-06-04 22:37:43 +02:00
Mike Fährmann
3cbbefd4ed
support 'filter' option for post processors ( #1460 )
2021-06-04 18:23:32 +02:00
Mike Fährmann
0abad8bc12
implement 'compile_expression()'
2021-06-03 22:34:58 +02:00
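A helper like the one named in 0abad8bc12 typically compiles an expression once and returns a callable that evaluates it against a metadata dictionary. The sketch below shows that pattern with `compile()`, `eval()`, and `functools.partial`; the signature is an assumption.

```python
import functools


def compile_expression_sketch(expr, name="<expr>", globals=None):
    """Compile 'expr' once; the result takes a dict of local variables."""
    code = compile(expr, name, "eval")
    # eval(code, globals, locals): the caller supplies 'locals' later
    return functools.partial(eval, code, globals or {})


check = compile_expression_sketch("width > 50 and extension == 'jpg'")
result = check({"width": 100, "extension": "jpg"})
```

Since `eval()` runs arbitrary Python code, expressions should only come from trusted configuration.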
Mike Fährmann
da6806a161
fix job tests for Python 3.4 and 3.5
...
assert_called() and assert_not_called() got added in Python 3.6
2021-05-22 21:40:52 +02:00
Mike Fährmann
8fd8126117
fix ISO 639-1 code for Japanese
...
"jp" -> "ja"
2021-05-22 16:07:04 +02:00
Mike Fährmann
af9dba4684
add DataJob tests
2021-05-21 02:59:54 +02:00
Mike Fährmann
adf4d661b3
use '_extractor' info in UrlJobs
2021-05-19 15:52:30 +02:00
Mike Fährmann
1eabfa5c7a
[pillowfort] implement login with username & password ( #846 )
2021-05-19 02:59:16 +02:00
Mike Fährmann
559462789d
add some tests for job.py
2021-05-14 19:44:16 +02:00
Mike Fährmann
c5ca7905ce
add 'noop()' and 'identity()' functions
2021-05-04 19:27:17 +02:00
Mike Fährmann
bc868e7bb8
consider apparently long extensions as part of the filename
...
(#1516 )
2021-05-02 21:15:50 +02:00
Mike Fährmann
bdfcc9c4b1
update extractor test results
2021-04-18 20:28:15 +02:00
Mike Fährmann
387fe415d5
unescape items in text.split_html()
2021-03-29 02:12:29 +02:00