Regular Expressions in Import Rules

Regular expressions used in import rules support the following standard syntax:

Pattern    Meaning
.          any non-newline character
^          start of line
$          end of line
\b         word boundary
\B         non-word boundary
\A         start of subject (usually the same as ^)
\z         end of subject (usually the same as $)
\d         decimal digit
\D         non-decimal digit
\s         whitespace
\S         non-whitespace
\w         word character
\W         non-word character
(a|z)      a or z
[az]       a or z
[^az]      not a or z
[a-z]      a through z
(foo)      capture foo
a?         0 or 1 occurrences of a
a*         0 or more occurrences of a
a+         1 or more occurrences of a
a{3}       exactly 3 occurrences of a
a{3,}      3 or more occurrences of a
a{3,5}     between 3 and 5 occurrences of a (inclusive)

All regular expressions are case-sensitive and Unicode-aware. For example, \s matches Unicode whitespace characters as well as ASCII ones.
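As a rough illustration of the syntax above, the sketch below runs a few of the listed patterns through Python's standard `re` module. Python is only a stand-in here: the engine behind import rules may differ in details, and the `matches` helper and the sample strings are hypothetical. The sketch avoids \z, which Python's `re` has traditionally spelled \Z.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if `pattern` matches anywhere in `text` (hypothetical helper)."""
    return re.search(pattern, text) is not None


# Anchors plus a character range: the whole string must be "pre_" + lowercase letters.
print(matches(r"^pre_[a-z]+$", "pre_logs"))   # True

# Word boundaries: "foo" must stand alone as a word.
print(matches(r"\bfoo\b", "a foo b"))         # True
print(matches(r"\bfoo\b", "afoob"))           # False

# Bounded repetition: between 3 and 5 consecutive a's.
print(matches(r"a{3,5}", "caaaat"))           # True

# Case-sensitive: lowercase "foo" does not match uppercase "FOO".
print(matches(r"foo", "FOO"))                 # False

# Unicode-aware: \s also matches non-ASCII whitespace such as U+2003 (EM SPACE).
print(matches(r"a\sb", "a\u2003b"))           # True
```

Trying a pattern locally like this is a quick sanity check, but behaviour in an actual import rule should be confirmed against Scanner itself.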

Limitations

Certain regular expression features aren't supported in import rules. Specifically:

  • Lookarounds (i.e. lookahead and lookbehind), both negative and positive.

    • Positive lookaround can usually be matched directly instead. E.g. foo(?=bar) could just be matched as foobar.

    • Negative lookaround can usually be matched as a normal regex, but it can be tricky.

      • E.g. pre_(?!no/)[^/]*/ (any pre_ folder except pre_no/) can be matched as pre_([^/]?|[^n/][^/]|[^/][^o/]|[^/]{3,})/.

        • Because complex regexes like this are hard to maintain, we recommend just positive-matching the specific known items instead, e.g. pre_(yes|yeah|sure).

  • Backreferences.

    • Due to the nature of backreferences (i.e. that they are non-regular), it isn't generally possible to replicate the same match without them.

      • When possible, we recommend enumerating the items instead. E.g. instead of trying to match all folders of the form foob(a+)rb\1z/, you can enumerate the folders you know exist, like foo(barbaz|baarbaaz)/.
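The workarounds above can be checked locally before a rule is written. The sketch below uses Python's `re` module, which does support lookarounds and backreferences, purely as a reference engine. The lookahead pattern pre_(?!no/)[^/]*/ is an assumed reading of the intent (any pre_ folder except pre_no/), the backreference is written \1 as Python spells a reference to the first group, and the folder names are made up for illustration.

```python
import re

# Reference patterns using features that import rules do NOT support.
with_lookahead = re.compile(r"pre_(?!no/)[^/]*/")
with_backref = re.compile(r"foob(a+)rb\1z/")

# Rewrites that avoid those features, as described in the list above.
without_lookahead = re.compile(r"pre_([^/]?|[^n/][^/]|[^/][^o/]|[^/]{3,})/")
enumerated = re.compile(r"foo(barbaz|baarbaaz)/")


def accepted(pattern: re.Pattern, names: list[str]) -> list[str]:
    """Return the names that `pattern` matches in full (hypothetical helper)."""
    return [name for name in names if pattern.fullmatch(name)]


# Hypothetical folder names, including edge cases around "no".
pre_folders = ["pre_/", "pre_n/", "pre_no/", "pre_on/", "pre_yes/", "pre_nope/"]

# The lookahead-free rewrite accepts exactly the same folders: all but "pre_no/".
print(accepted(with_lookahead, pre_folders) == accepted(without_lookahead, pre_folders))  # True
print(accepted(without_lookahead, ["pre_no/"]))  # []

foo_folders = ["foobarbaz/", "foobaarbaaz/", "foobaaarbaaaz/", "foobaarbaz/"]

# The backreference form matches any number of a's, as long as both runs are equal.
print(accepted(with_backref, foo_folders))
# ['foobarbaz/', 'foobaarbaaz/', 'foobaaarbaaaz/']

# The enumeration only covers the folders that were listed explicitly.
print(accepted(enumerated, foo_folders))
# ['foobarbaz/', 'foobaarbaaz/']
```

The last two results show the trade-off of enumeration: it is simple to read and maintain, but any new folder (such as the hypothetical foobaaarbaaaz/) has to be added to the rule by hand.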

These features are unsupported because they make it possible to construct extremely slow (exponential-time) regexes that are hard for Scanner to detect.
