I’ve been using procmail for a while to filter mailing list messages into per-list folders, with one rule per list. Recently I decided to try sorting lists automatically by their List-Id headers. When I searched for pre-built rules to do this, I found some that came close, but none that did quite what I wanted, so I wrote my own compound rule and am posting it here. The code below takes the identifier in angle brackets from a List-Id header, such as foo.example.com or foo.lists.example.com, and sorts the message into the Maildir folder .Lists.example-com.foo.
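```
# Match List-Id headers of the form <name.domain.tld> or <name.lists.domain.tld>
:0
* ^List-Id: .*\/<[a-zA-Z0-9_-]+(\.lists)?\.[a-zA-Z0-9_-]+\.[a-zA-Z0-9_-]+>$
{
  LISTID="$MATCH"

  # First component: the list name
  :0
  * LISTID ?? ^<\/[a-zA-Z0-9_-]+
  {
    LISTID_PART1="$MATCH"

    # Component after the list name (and the optional ".lists"): the domain
    :0
    * LISTID ?? ^<[a-zA-Z0-9_-]+(\.lists)?\.\/[a-zA-Z0-9_-]+
    {
      LISTID_PART2="$MATCH"

      # Last component: the TLD; deliver to the Maildir .Lists.domain-tld.name/
      :0
      * LISTID ?? ^<[a-zA-Z0-9_-]+(\.lists)?\.[a-zA-Z0-9_-]+\.\/[a-zA-Z0-9_-]+
      .Lists.${LISTID_PART2}-${MATCH}.${LISTID_PART1}/
    }
  }
}
```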
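The three nested recipes can be hard to follow, so here is a rough Python model of the mapping they are meant to perform. It collapses the three matches into one regular expression with capture groups. That means it mirrors the intended result rather than procmail's own matching engine. The names `folder_for` and `LIST_ID_RE` are made up for this sketch.

```python
import re
from typing import Optional

# Rough model of the procmail recipe (not procmail itself).
# COMPONENT is the same character class the recipe uses.
COMPONENT = r"[a-zA-Z0-9_-]+"

# One regex with capture groups stands in for the three nested recipes:
# group 1 = list name, group 2 = domain, group 3 = TLD.
# re.IGNORECASE mirrors procmail's default case-insensitive conditions.
LIST_ID_RE = re.compile(
    rf"^List-Id: .*<({COMPONENT})(?:\.lists)?\.({COMPONENT})\.({COMPONENT})>$",
    re.IGNORECASE,
)


def folder_for(header_line: str) -> Optional[str]:
    """Return the Maildir folder the recipe targets, or None if it would not fire."""
    m = LIST_ID_RE.match(header_line)
    if m is None:
        return None
    name, domain, tld = m.groups()
    # Domain and TLD are joined with a hyphen; the list name becomes a subfolder.
    return f".Lists.{domain}-{tld}.{name}/"


if __name__ == "__main__":
    for line in (
        "List-Id: Foo discussion <foo.example.com>",
        "List-Id: <foo.lists.example.com>",
        "List-Id: <foo.bar.example.com>",
    ):
        print(repr(line), "->", folder_for(line))
```

The first two headers both map to `.Lists.example-com.foo/`. The four-part List-Id matches neither supported form, so the recipe would not fire and such messages fall through to whatever recipes come after it.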
The procmail rule has a few advantages over the others I found online:
- The entire List-Id is used, not just the part before the first dot.
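- The resulting folders are hierarchical: each list becomes a subfolder of a per-domain folder such as .Lists.example-com.
- Only letters, digits, underscores, and hyphens are allowed in the parts of the folder name. This prevents a hostile header such as List-Id: <foo/../../../etc/passwd> from steering delivery to some other path.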
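The protection comes from the whitelist itself. A component matching `[a-zA-Z0-9_-]+` can contain neither a slash nor a dot, so it can never be `..` or a path. Below is a minimal Python check of that property. It uses a hypothetical helper, `is_safe_component`; the recipe has no such function and simply refuses to match.

```python
import re

# Same whitelist the procmail recipe applies to each List-Id component.
SAFE_COMPONENT = re.compile(r"[a-zA-Z0-9_-]+")


def is_safe_component(part: str) -> bool:
    """True if part could be substituted into the folder name by the recipe."""
    return SAFE_COMPONENT.fullmatch(part) is not None


if __name__ == "__main__":
    for part in ("foo", "bar-baz", "..", "etc/passwd", "foo/../.."):
        print(repr(part), is_safe_component(part))
```

The recipe only ever joins whitelisted components with its own fixed `.` and `-` separators. As a result, the destination is always a single directory name starting with `.Lists.`, never a path that climbs out of the Maildir.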
The rule also has a few disadvantages:
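- It only works as intended for List-Ids of the form list-name.domain.tld or list-name.lists.domain.tld. Other List-Ids are either misread as one of those forms or ignored. For example, foo.bar.example.com is skipped entirely, while news.co.uk would be filed as the list "news" on the domain co.uk.
- It only supports a limited character set. This has security advantages but usability disadvantages: a List-Id containing any other character, such as foo+announce.example.com, is ignored.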
- It does not place a limit on the length of the directory name. Unfortunately, procmail makes that difficult to do.
- It converts dots in the domain into hyphens, but hyphens are allowed in domain names. This could cause both foo.bar-baz.quux and foo.bar.baz-quux to be sorted into the same folder, .Lists.bar-baz-quux.foo.
- It’s a bit hard to read.
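To check whether the hyphen collision affects the lists you actually subscribe to, one option is to run your List-Ids through the same mapping and group them by folder. Below is a Python sketch. It splits on dots rather than reproducing procmail's regex matching, handles only the two supported forms, and skips the character-set check. The function names `folder_for_list_id` and `find_collisions` are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def folder_for_list_id(list_id: str) -> Optional[str]:
    """Folder the recipe aims for, given a bare List-Id like 'foo.example.com'.

    Models only name.domain.tld and name.lists.domain.tld; the recipe's
    character-set check is omitted for brevity.
    """
    parts = list_id.split(".")
    # Drop the optional ".lists" (procmail matches it case-insensitively).
    if len(parts) == 4 and parts[1].lower() == "lists":
        parts = [parts[0], parts[2], parts[3]]
    if len(parts) != 3:
        return None  # the recipe would ignore this List-Id
    name, domain, tld = parts
    return f".Lists.{domain}-{tld}.{name}/"


def find_collisions(list_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Return each folder that more than one List-Id would share, with those List-Ids."""
    by_folder: Dict[str, List[str]] = defaultdict(list)
    for list_id in list_ids:
        folder = folder_for_list_id(list_id)
        if folder is not None:
            by_folder[folder].append(list_id)
    return {folder: ids for folder, ids in by_folder.items() if len(ids) > 1}


if __name__ == "__main__":
    print(find_collisions(["foo.bar-baz.quux", "foo.bar.baz-quux", "foo.example.com"]))
```

With the example from the list above, this reports that foo.bar-baz.quux and foo.bar.baz-quux both land in `.Lists.bar-baz-quux.foo/`, while foo.example.com has a folder to itself.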