Context
E-mail address parsing is a requirement in many web applications. It's not uncommon for web applications or services to need to store e-mail addresses and to send e-mail messages for one reason or another.
The Problem
A new security vulnerability surfaced in the sydent identity server of Matrix.org. It was identified by Elliot Alderson and has been patched as of 2019-04-18.
The vulnerability stems from sydent's reliance on Python's email.utils.parseaddr() function to parse e-mail addresses before sending validation e-mail messages.
It turns out that email.utils.parseaddr() isn't very reliable. In particular, it takes certain malformed e-mail addresses, treats them as valid, and returns parsed values that may themselves be valid but are no longer equivalent to the input. We know the e-mail addresses are malformed because RFC 5322 describes the format of valid e-mail addresses. The shortcomings of email.utils.parseaddr() had been reported at least as early as 2018-07-19. I also happen to have provided an example in the bug ticket thread that is exactly the behaviour being exploited.
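To see the mismatch for yourself, a minimal sketch like the following compares what parseaddr() returns against what it was given. The helper name parse_and_compare is made up for illustration, and foo@bar@com is simply the malformed address used later in this post. No expected output is shown, because as far as I know the function's behaviour has been changed in later Python releases, so the exact result depends on your version.

```python
from email.utils import parseaddr


def parse_and_compare(candidate):
    """Return (parsed_address, unchanged) for a candidate addr-spec.

    parse_and_compare is a hypothetical helper, not part of sydent or
    the standard library.
    """
    # parseaddr() returns a (display_name, address) tuple.
    _display_name, parsed = parseaddr(candidate)
    # If the parsed address differs from the input, parseaddr() has
    # silently rewritten (or rejected) what it was given.
    return parsed, parsed == candidate


for candidate in ("john.doe@example.com", "foo@bar@com"):
    parsed, unchanged = parse_and_compare(candidate)
    print(f"{candidate!r} -> {parsed!r} (unchanged: {unchanged})")
```

The point of the comparison is only to make the problem visible: a parsed value that differs from the input is a sign that the address was not what it appeared to be.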
But if we cannot trust email.utils.parseaddr(), what can we do?
The Solution
Enter email.headerregistry.Address. Note that it comes from the email.headerregistry module, which deals with, unsurprisingly, the e-mail message header. You might not think about it very often, but when you compose an e-mail message in a mail client and you write John Doe <john.doe@example.com> as the recipient, that's actually conforming to the syntax for the To: header field described in RFC 5322.
The sydent server wants to validate e-mail addresses in the form of john.doe@example.com, without the display name component. In this case, the email.headerregistry.Address class is perfect for the job. It's newer, (more) correct, and therefore safer. In contrast, email.utils.parseaddr() is a legacy function and rather hacky; its parsing logic is naïve and doesn't completely conform to RFC 5322. Note that email.headerregistry.Address cannot parse, only construct, addresses in the form of John Doe <john.doe@example.com> - i.e. a name-addr including a display-name component. Fortunately, for sydent's use case, it doesn't matter.
So how do you use it? Here is a simple example:
from email.headerregistry import Address

try:
    address = Address(addr_spec="john.doe@example.com")
except ValueError:
    print('Invalid e-mail address')
Here is the ValueError you would get if you tried to pass it an invalid addr-spec:
>>> from email.headerregistry import Address
>>> Address(addr_spec='foo@bar@com')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/email/headerregistry.py", line 46, in __init__
    a_s, addr_spec))
ValueError: Invalid addr_spec; only 'foo@bar' could be parsed from 'foo@bar@com'
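One caveat worth hedging against: the traceback above comes from Python 3.7, and to the best of my knowledge some other Python versions reject the same kind of input with email.errors.HeaderParseError instead, which is not a subclass of ValueError. Below is a minimal sketch of a validation helper that catches both. The name is_valid_addr_spec is made up for illustration, and the extra IndexError clause is purely defensive, on the assumption that the parser may index into truncated input on some versions.

```python
from email.errors import HeaderParseError
from email.headerregistry import Address


def is_valid_addr_spec(candidate):
    """Return True if Address accepts candidate as a bare addr-spec.

    is_valid_addr_spec is a hypothetical helper name, not a stdlib API.
    """
    if not candidate:
        # Reject empty input up front instead of relying on the parser.
        return False
    try:
        Address(addr_spec=candidate)
    except (ValueError, HeaderParseError, IndexError):
        # ValueError: the case shown in the traceback above.
        # HeaderParseError: assumed to be raised instead on some Python
        # versions; it does not inherit from ValueError.
        # IndexError: defensive only, for truncated input.
        return False
    return True


print(is_valid_addr_spec('john.doe@example.com'))  # True
print(is_valid_addr_spec('foo@bar@com'))           # False
```

Catching a small, explicit tuple of exceptions keeps genuinely unexpected errors visible rather than silently treating them as "invalid address".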
Incidentally, Address can be used to construct address strings that are valid according to RFC 5322, complete with validation and any necessary escape sequences - e.g. if the display name contains "exotic" characters - which is very useful for sending e-mail from Python:
addr_string_1 = str(Address(display_name='John Doe', username='john.doe', domain='example.com'))
addr_string_2 = str(Address(display_name='John Doe', addr_spec='john.doe@example.com'))
expected = 'John Doe <john.doe@example.com>'
assert addr_string_1 == expected
assert addr_string_2 == expected
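To illustrate the escaping, here is a small sketch with a display name that contains a comma, which is a special character in RFC 5322 and so has to be quoted. It also shows an Address being assigned directly to a header of an email.message.EmailMessage. The sender address, subject and body are placeholders invented for this example.

```python
from email.headerregistry import Address
from email.message import EmailMessage

# A display name containing a comma must be quoted in the header.
recipient = Address(display_name='Doe, John', addr_spec='john.doe@example.com')
assert str(recipient) == '"Doe, John" <john.doe@example.com>'

# EmailMessage with its default policy accepts Address objects as
# header values. The values below are placeholders for illustration.
message = EmailMessage()
message['From'] = Address(display_name='Example Service', addr_spec='noreply@example.com')
message['To'] = recipient
message['Subject'] = 'Please confirm your e-mail address'
message.set_content('Hello John, please confirm your e-mail address.')

print(message['To'])
```

Building the header from an Address object rather than by string concatenation means the quoting is handled by the library instead of by hand.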
Spērō ūsuī sit. (I hope it proves useful.)