String Formatting
Format strings in gallery-dl follow the general rules of str.format()
(PEP 3101) plus several extras.
The syntax for replacement fields is {<field-name>!<conversion>:<format-specifiers>}, where !<conversion> and :<format-specifiers> are both optional and can be used to specify how the value selected by <field-name> should be transformed.
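As a rough illustration of the three parts of a replacement field, the sketch below uses only Python's built-in str.format(), which gallery-dl's format strings are modeled on. The metadata values are made up for the example, and none of gallery-dl's extras are involved.

```python
from datetime import datetime

# Hypothetical metadata, for illustration only.
metadata = {"title": "Hello World", "num": 7, "date": datetime(2010, 1, 1)}

name_only = "{title}".format(**metadata)        # field name only
with_conv = "{title!r}".format(**metadata)      # field name + conversion
with_spec = "{num:>03}".format(**metadata)      # field name + format specifier
with_date = "{date:%Y%m%d}".format(**metadata)  # datetime format specifier
with_both = "{title!s:>15}".format(**metadata)  # conversion first, then specifier

print(name_only, with_conv, with_spec, with_date, with_both, sep="\n")
```

The conversion is applied before the format specifier, which is why {title!s:>15} pads the already converted string.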
Field Names
Field names select the metadata value to use in a replacement field.
While simple names are usually enough, more complex forms like accessing values by attribute, element index, or slicing are also supported.
Form | Example | Result |
---|---|---|
Name | {title} | Hello World |
Element Index | {title[6]} | W |
Slicing | {title[3:8]} | lo Wo |
Slicing (Bytes) | {title_ja[b3:18]} | ロー・ワー |
Alternatives | {empty\|title} | Hello World |
Attribute Access | {extractor.url} | https://example.org/ |
Element Access | {user[name]} | John Doe |
Element Access | {user['name']} | John Doe |
All of these methods can be combined as needed.
For example, {title[24]|empty|extractor.url[15:-1]} would result in .org.
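The field-name forms above map closely onto ordinary Python indexing and slicing. The following sketch mimics them in plain Python; it is not gallery-dl's implementation. The value of title_ja, the fixed UTF-8 encoding (gallery-dl is described as using the filesystem encoding), and the helpers slice_bytes and first_truthy are assumptions made for illustration.

```python
title = "Hello World"
title_ja = "ハロー・ワールド"   # assumed value; each character is 3 bytes in UTF-8
empty = ""
url = "https://example.org/"


def slice_bytes(text, start, stop, encoding="utf-8"):
    """Slice the encoded bytes of text, dropping any incomplete characters."""
    return text.encode(encoding)[start:stop].decode(encoding, "ignore")


def first_truthy(*getters):
    """Hypothetical stand-in for 'a|b|c': first alternative that works and is truthy."""
    for get in getters:
        try:
            value = get()
        except (KeyError, IndexError, AttributeError, TypeError):
            continue  # alternative could not be resolved, try the next one
        if value:
            return value
    return None  # assumption: nothing matched


index = title[6]                             # {title[6]}
sliced = title[3:8]                          # {title[3:8]}
sliced_bytes = slice_bytes(title_ja, 3, 18)  # {title_ja[b3:18]}
combined = first_truthy(                     # {title[24]|empty|extractor.url[15:-1]}
    lambda: title[24],
    lambda: empty,
    lambda: url[15:-1],
)

print(index, sliced, sliced_bytes, combined)
```

In the combined case, title[24] is out of range and empty is falsy, so the last alternative supplies the value.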
Conversions
Conversion specifiers convert the value to a different form or type. Such a specifier must consist of exactly one character. gallery-dl supports the default three (s, r, a) as well as several others:
Conversion | Description | Example | Result |
---|---|---|---|
l | Convert a string to lowercase | {foo!l} | foo bar |
u | Convert a string to uppercase | {foo!u} | FOO BAR |
c | Capitalize a string, i.e. convert the first character to uppercase and all others to lowercase | {foo!c} | Foo bar |
C | Capitalize each word in a string | {foo!C} | Foo Bar |
g | Slugify a value | {foo!g} | foo-bar |
j | Serialize value to a JSON formatted string | {tags!j} | ["sun", "tree", "water"] |
t | Trim a string, i.e. remove leading and trailing whitespace characters | {bar!t} | FooBar |
T | Convert a datetime object to a unix timestamp | {date!T} | 1262304000 |
d | Convert a unix timestamp to a datetime object | {created!d} | 2010-01-01 00:00:00 |
U | Convert HTML entities | {html!U} | <p>foo & bar</p> |
H | Convert HTML entities & remove HTML tags | {html!H} | foo & bar |
s | Convert value to str | {tags!s} | ['sun', 'tree', 'water'] |
S | Convert value to str while providing a human-readable representation for lists | {tags!S} | sun, tree, water |
r | Convert value to str using repr() | | |
a | Convert value to str using ascii() | | |
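Several of these conversions have close counterparts in the Python standard library. The sketch below shows approximate equivalents, not the code gallery-dl runs. The escaped HTML input, the untrimmed " FooBar " string, and the use of UTC for the timestamp conversions are assumptions.

```python
import html
import json
import string
from datetime import datetime, timezone

foo = "Foo Bar"
tags = ["sun", "tree", "water"]
escaped = "&lt;p&gt;foo &amp; bar&lt;/p&gt;"  # assumed input for the U example

lower = foo.lower()                  # roughly {foo!l}
upper = foo.upper()                  # roughly {foo!u}
capitalized = foo.capitalize()       # roughly {foo!c}
capwords = string.capwords(foo)      # roughly {foo!C}
as_json = json.dumps(tags)           # roughly {tags!j}
trimmed = " FooBar ".strip()         # roughly {bar!t}
unescaped = html.unescape(escaped)   # roughly {html!U}
as_str = str(tags)                   # roughly {tags!s}
joined = ", ".join(tags)             # roughly {tags!S}

# Timestamp conversions, assuming the datetime is in UTC.
timestamp = int(datetime(2010, 1, 1, tzinfo=timezone.utc).timestamp())  # {date!T}
as_datetime = datetime.fromtimestamp(1262304000, timezone.utc)          # {created!d}

print(lower, upper, capitalized, capwords, as_json, trimmed, sep="\n")
print(unescaped, as_str, joined, timestamp, as_datetime, sep="\n")
```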
Format Specifiers
Format specifiers can be used for advanced formatting by using the options provided by Python (see Format Specification Mini-Language), like zero-filling a number ({num:>03}) or formatting a datetime object ({date:%Y%m%d}), or with gallery-dl's extra formatting specifiers:
Format Specifier | Description | Example | Result |
---|---|---|---|
?<start>/<end>/ | Adds <start> and <end> to the actual value if it evaluates to True. Otherwise the whole replacement field becomes an empty string. | {foo:?[/]/} | [Foo Bar] |
?<start>/<end>/ | | {empty:?[/]/} | |
[<start>:<stop>] | Applies a slicing operation to the current value, similar to Field Names | {foo:[1:-1]} | oo Ba |
[b<start>:<stop>] | Same as above, but applies to the bytes() representation of a string in filesystem encoding | {foo_ja:[b3:-1]} | ー・バ |
L<maxlen>/<repl>/ | Replaces the entire output with <repl> if its length exceeds <maxlen> | {foo:L15/long/} | Foo Bar |
L<maxlen>/<repl>/ | | {foo:L3/long/} | long |
J<separator>/ | Concatenates elements of a list with <separator> using str.join() | {tags:J - /} | sun - tree - water |
R<old>/<new>/ | Replaces all occurrences of <old> with <new> using str.replace() | {foo:Ro/()/} | F()() Bar |
S<order>/ | Sort a list. <order> can be either ascending or descending/reverse. (default: a) | {tags:Sd} | ['water', 'tree', 'sun'] |
D<format>/ | Parse a string value to a datetime object according to <format> | {updated:D%b %d %Y %I:%M %p/} | 2010-01-01 00:00:00 |
O<offset>/ | Apply <offset> to a datetime object, either as ±HH:MM or local for local UTC offset | {date:O-06:30/} | 2009-12-31 17:30:00 |
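The effect of each extra specifier can be approximated in plain Python, as sketched below. The helpers wrap and limit are hypothetical stand-ins for the ? and L specifiers, and the input string passed to strptime() is an assumed value for the updated field. The sketch only mirrors the table's results and does not reflect how gallery-dl parses specifiers.

```python
from datetime import datetime, timedelta

foo = "Foo Bar"
empty = ""
tags = ["sun", "tree", "water"]


def wrap(value, start, end):
    """Hypothetical equivalent of ?<start>/<end>/: wrap only truthy values."""
    return f"{start}{value}{end}" if value else ""


def limit(value, maxlen, repl):
    """Hypothetical equivalent of L<maxlen>/<repl>/."""
    return repl if len(value) > maxlen else value


wrapped = wrap(foo, "[", "]")             # {foo:?[/]/}
wrapped_empty = wrap(empty, "[", "]")     # {empty:?[/]/}
sliced = foo[1:-1]                        # {foo:[1:-1]}
kept = limit(foo, 15, "long")             # {foo:L15/long/}
too_long = limit(foo, 3, "long")          # {foo:L3/long/}
joined = " - ".join(tags)                 # {tags:J - /}
replaced = foo.replace("o", "()")         # {foo:Ro/()/}
sorted_desc = sorted(tags, reverse=True)  # {tags:Sd}

# {updated:D%b %d %Y %I:%M %p/} with an assumed input string
# (%b month names depend on the active locale).
parsed = datetime.strptime("Jan 01 2010 12:00 AM", "%b %d %Y %I:%M %p")

# {date:O-06:30/} applied to 2010-01-01 00:00:00
shifted = datetime(2010, 1, 1) + timedelta(hours=-6, minutes=-30)

print(wrapped, repr(wrapped_empty), sliced, kept, too_long, sep="\n")
print(joined, replaced, sorted_desc, parsed, shifted, sep="\n")
```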
All special format specifiers (?, L, J, R, D, O, etc.) can be chained and combined with one another, but must always appear before any standard format specifiers.
For example, {foo:?//RF/B/Ro/e/> 10} -> "   Bee Bar" (padded with three leading spaces):
?// - Tests if foo has a value
RF/B/ - Replaces F with B
Ro/e/ - Replaces o with e
> 10 - Left-fills the string with spaces until it is 10 characters long
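The chain above can be traced step by step in plain Python. This is a sketch of the described order of operations, not gallery-dl's parser. The final padding is written as ">10" here because Python's built-in format() does not accept the space in "> 10" for str values.

```python
foo = "Foo Bar"

if foo:                               # ?//   only continue if foo has a value
    value = foo.replace("F", "B")     # RF/B/ "Foo Bar" becomes "Boo Bar"
    value = value.replace("o", "e")   # Ro/e/ "Boo Bar" becomes "Bee Bar"
    result = format(value, ">10")     # right-align within 10 characters
else:
    result = ""                       # empty values yield an empty string

print(repr(result))
```

The special specifiers run left to right, and the standard specifier is applied last to whatever value they produced.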
Global Replacement Fields
Replacement field names that are available in all format strings.
Field Name | Description | Example | Result |
---|---|---|---|
_env | Environment variables | {_env[HOME]} | /home/john |
_now | Current local date and time | {_now:%Y-%m} | 2022-08 |
_lit | String literals | {_lit[foo]} | foo |
_lit | | {'bar'} | bar |
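For orientation, the first two global fields correspond roughly to the following standard-library lookups. This is a sketch of equivalent plain Python, not gallery-dl code, and the empty-string fallback for a missing HOME variable is an assumption.

```python
import os
from datetime import datetime

home = os.environ.get("HOME", "")             # roughly {_env[HOME]}
year_month = format(datetime.now(), "%Y-%m")  # roughly {_now:%Y-%m}

print(home, year_month)
```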
Special Type Format Strings
Starting a format string with \f<Type> selects a format string type other than the default. Available ones are:
Type | Description | Usage |
---|---|---|
F | An f-string literal | \fF '{title.strip()}' by {artist.capitalize()} |
E | An arbitrary Python expression | \fE title.upper().replace(' ', '-') |
T | Path to a template file containing a regular format string | \fT ~/.templates/booru.txt |
TF | Path to a template file containing an f-string literal | \fTF ~/.templates/fstr.txt |
M | Path or name of a Python module followed by the name of one of its functions. This function gets called with the current metadata dict as argument and should return a string. | \fM my_module:generate_text |
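A module for the M type could look like the minimal sketch below. The names my_module and generate_text come from the usage column above, while the metadata keys title and artist and the fallback values are assumptions borrowed from the F example.

```python
# my_module.py


def generate_text(metadata):
    """Receive the current metadata dict and return the formatted string."""
    # The keys below are assumed; real metadata depends on the extractor.
    title = str(metadata.get("title", "")).strip()
    artist = str(metadata.get("artist", "unknown")).capitalize()
    return f"'{title}' by {artist}"


if __name__ == "__main__":
    # Quick manual check with made-up metadata.
    print(generate_text({"title": " Hello World ", "artist": "john"}))
```

Using dict.get() with defaults keeps the function from raising when a key is missing, which may or may not be the desired behavior for a given setup.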