Chapter 16
Table 16-3 (continued)
Search key: Meaning

'FROM string', 'TO string', 'CC string', 'BCC string'
    Returns all messages where string is found in the “from” email address, “to” addresses, “cc” (carbon copy) addresses, or “bcc” (blind carbon copy) addresses, respectively. If there are multiple email addresses in string, separate them with spaces and enclose them all in double quotes: 'CC "firstcc@example.com secondcc@example.com"'

'SEEN', 'UNSEEN'
    Returns all messages with and without the \Seen flag, respectively. An email obtains the \Seen flag if it has been accessed with a fetch() method call (described later) or if it is clicked when you’re checking your email in an email program or web browser. It’s more common to say the email has been “read” rather than “seen,” but they mean the same thing.

'ANSWERED', 'UNANSWERED'
    Returns all messages with and without the \Answered flag, respectively. A message obtains the \Answered flag when it is replied to.

'DELETED', 'UNDELETED'
    Returns all messages with and without the \Deleted flag, respectively. Email messages deleted with the delete_messages() method are given the \Deleted flag but are not permanently deleted until the expunge() method is called (see “Deleting Emails” on page 375). Note that some email providers, such as Gmail, automatically expunge emails.

'DRAFT', 'UNDRAFT'
    Returns all messages with and without the \Draft flag, respectively. Draft messages are usually kept in a separate Drafts folder rather than in the INBOX folder.

'FLAGGED', 'UNFLAGGED'
    Returns all messages with and without the \Flagged flag, respectively. This flag is usually used to mark email messages as “Important” or “Urgent.”

'LARGER N', 'SMALLER N'
    Returns all messages larger or smaller than N bytes, respectively.

'NOT search-key'
    Returns the messages that search-key would not have returned.

'OR search-key1 search-key2'
    Returns the messages that match either the first or second search-key.

Note that some IMAP servers may have slightly different implementations for how they handle their flags and search keys. It may require some experimentation in the interactive shell to see exactly how they behave.
You can pass multiple IMAP search key strings in the list argument to the search() method. The messages returned are the ones that match all the search keys. If you want to match any of the search keys, use the OR search key. For the NOT and OR search keys, one and two complete search keys follow the NOT and OR, respectively.
Here are some example search() method calls along with their meanings:
imapObj.search(['ALL'])
    Returns every message in the currently selected folder.
imapObj.search(['ON 05-Jul-2015'])
    Returns every message sent on July 5, 2015.
Sending Email and Text Messages
imapObj.search(['SINCE 01-Jan-2015', 'BEFORE 01-Feb-2015', 'UNSEEN'])
    Returns every message sent in January 2015 that is unread. (Note that this means on and after January 1 and up to but not including February 1.)
imapObj.search(['SINCE 01-Jan-2015', 'FROM alice@example.com'])
    Returns every message from alice@example.com sent since the start of 2015.
imapObj.search(['SINCE 01-Jan-2015', 'NOT FROM alice@example.com'])
    Returns every message sent from everyone except alice@example.com since the start of 2015.
imapObj.search(['OR FROM alice@example.com FROM bob@example.com'])
    Returns every message ever sent from alice@example.com or bob@example.com.
imapObj.search(['FROM alice@example.com', 'FROM bob@example.com'])
    Trick example! This search will never return any messages, because messages must match all search keywords. Since there can be only one “from” address, it is impossible for a message to be from both alice@example.com and bob@example.com.
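The rules above (every key in the list must match, and OR takes exactly two complete search keys) can be sketched as a small helper that assembles the list for you. This is only an illustration: imap_date() and any_of() are hypothetical names, not part of IMAPClient, and the sketch assumes the server accepts the same key strings used in the examples above.

```python
from datetime import date

# Month abbreviations are spelled out so the result does not depend on locale.
MONTHS = ('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec')

def imap_date(d):
    """Format a date the way the examples write it, such as 05-Jul-2015."""
    return '%02d-%s-%d' % (d.day, MONTHS[d.month - 1], d.year)

def any_of(keys):
    """Join search keys with OR. OR takes exactly two keys, so nest for more."""
    if not keys:
        raise ValueError('need at least one search key')
    combined = keys[-1]
    for key in reversed(keys[:-1]):
        combined = 'OR %s %s' % (key, combined)
    return combined

# Every item in this list must match (the items are ANDed together).
criteria = [
    'SINCE ' + imap_date(date(2015, 1, 1)),
    'BEFORE ' + imap_date(date(2015, 2, 1)),
    any_of(['FROM alice@example.com', 'FROM bob@example.com']),
]
print(criteria)
```

The resulting list could then be passed to imapObj.search(criteria) in the same way as the hand-written lists above.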
The search() method doesn’t return the emails themselves but rather unique IDs (UIDs) for the emails, as integer values. You can then pass these UIDs to the fetch() method to obtain the email content.
Continue the interactive shell example by entering the following:
>>> UIDs = imapObj.search(['SINCE 05-Jul-2015'])
>>> UIDs
[40032, 40033, 40034, 40035, 40036, 40037, 40038, 40039, 40040, 40041]
Here, the list of message IDs (for messages received July 5 onward) returned by search() is stored in UIDs. The list of UIDs returned on your computer will be different from the ones shown here; they are unique to a particular email account. When you later pass UIDs to other function calls, use the UID values you received, not the ones printed in this book’s examples.
Size Limits
If your search matches a large number of email messages, Python might raise an exception that says imaplib.error: got more than 10000 bytes. When this happens, you will have to disconnect and reconnect to the IMAP server and try again.
This limit is in place to prevent your Python programs from eating up
too much memory. Unfortunately, the default size limit is often too small.
You can change this limit from 10,000 bytes to 10,000,000 bytes by running
this code:
>>> import imaplib
>>> imaplib._MAXLINE = 10000000
This should prevent this error message from coming up again. You may
want to make these two lines part of every IMAP program you write.
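If you do put this at the top of a program, one cautious variation is to raise the limit only when it is currently smaller. This is a sketch resting on an assumption: that some Python releases ship a default larger than 10,000 bytes. Keep in mind that _MAXLINE is a private imaplib attribute, so its behavior may differ between releases. WANTED_LIMIT is a made-up name for this example.

```python
import imaplib

# Hypothetical constant for this sketch: the limit suggested in the text.
WANTED_LIMIT = 10000000

# _MAXLINE is private to imaplib. Only raise it; never lower a bigger default.
imaplib._MAXLINE = max(imaplib._MAXLINE, WANTED_LIMIT)
print(imaplib._MAXLINE)
```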
Fetching an Email and Marking It As Read
Once you have a list of UIDs, you can call the IMAPClient object’s fetch() method to get the actual email content.
The list of UIDs will be fetch()’s first argument. The second argument should be the list ['BODY[]'], which tells fetch() to download all the body content for the emails specified in your UID list.
Using IMAPClient’s gmail_search() Method
If you are logging in to the imap.gmail.com server to access a Gmail account, the IMAPClient object provides an extra search function that mimics the search bar at the top of the Gmail web page, as highlighted in Figure 16-1.
Figure 16-1: The search bar at the top of the Gmail web page
Instead of searching with IMAP search keys, you can use Gmail’s more sophisticated search engine. Gmail does a good job of matching closely related words (for example, a search for driving will also match drive and drove) and sorting the search results by most significant matches. You can also use Gmail’s advanced search operators (see http://nostarch.com/automatestuff/ for more information). If you are logging in to a Gmail account, pass the search terms to the gmail_search() method instead of the search() method, like in the following interactive shell example:
>>> UIDs = imapObj.gmail_search('meaning of life')
>>> UIDs
[42]
Ah, yes, there’s that email with the meaning of life! I was looking for that.
Let’s continue our interactive shell example.
>>> rawMessages = imapObj.fetch(UIDs, ['BODY[]'])
>>> import pprint
>>> pprint.pprint(rawMessages)
{40040: {'BODY[]': 'Delivered-To: my_email_address@gmail.com\r\n'
'Received: by 10.76.71.167 with SMTP id '
--snip--
'\r\n'
'------=_Part_6000970_707736290.1404819487066--\r\n',
'SEQ': 5430}}
Import pprint and pass the return value from fetch(), stored in the variable rawMessages, to pprint.pprint() to “pretty print” it, and you’ll see that this return value is a nested dictionary of messages with UIDs as the keys. Each message is stored as a dictionary with two keys: 'BODY[]' and 'SEQ'. The 'BODY[]' key maps to the actual body of the email. The 'SEQ' key is for a sequence number, which has a similar role to the UID. You can safely ignore it.
As you can see, the message content in the 'BODY[]' key is pretty unintelligible. It’s in a format called RFC 822, which is designed for IMAP servers to read. But you don’t need to understand the RFC 822 format; later in this chapter, the pyzmail module will make sense of it for you.
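To make the nested-dictionary shape concrete, here is a minimal sketch that walks a hand-built stand-in for the fetch() return value; no server is involved. It assumes that, depending on the IMAPClient and Python versions you have installed, the inner keys may arrive as bytes (b'BODY[]') rather than strings, so the hypothetical helper get_raw_body() tries both.

```python
# Hand-built stand-in for what fetch() returns; real data comes from the server.
raw_messages = {
    40040: {'BODY[]': 'Delivered-To: my_email_address@gmail.com\r\n\r\nHi!\r\n',
            'SEQ': 5430},
    # Assumption: some versions use bytes for both the keys and the body.
    40041: {b'BODY[]': b'Delivered-To: my_email_address@gmail.com\r\n\r\nHi!\r\n',
            b'SEQ': 5431},
}

def get_raw_body(messages, uid):
    """Return the raw message for uid, whether the keys are str or bytes."""
    entry = messages[uid]
    for key in ('BODY[]', b'BODY[]'):
        if key in entry:
            return entry[key]
    raise KeyError('no BODY[] data for UID %d' % uid)

# The outer dictionary is keyed by UID; each value is the per-message dictionary.
for uid in sorted(raw_messages):
    body = get_raw_body(raw_messages, uid)
    print(uid, type(body).__name__, len(body))
```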
When you selected a folder to search through, you called select_folder() with the readonly=True keyword argument. Doing this will prevent you from accidentally deleting an email, but it also means that emails will not get marked as read if you fetch them with the fetch() method. If you do want emails to be marked as read when you fetch them, you will need to pass readonly=False to select_folder(). If the selected folder is already in read-only mode, you can reselect the current folder with another call to select_folder(), this time with the readonly=False keyword argument:
>>> imapObj.select_folder('INBOX', readonly=False)
Getting Email Addresses from a Raw Message
The raw messages returned from the fetch() method still aren’t very useful to people who just want to read their email. The pyzmail module parses these raw messages and returns them as PyzMessage objects, which make the subject, body, “To” field, “From” field, and other sections of the email easily accessible to your Python code.
Continue the interactive shell example with the following (using UIDs
from your own email account, not the ones shown here):
>>> import pyzmail
>>> message = pyzmail.PyzMessage.factory(rawMessages[40041]['BODY[]'])
First, import pyzmail. Then, to create a PyzMessage object of an email, call the pyzmail.PyzMessage.factory() function and pass it the 'BODY[]' section of the raw message. Store the result in message. Now message contains a PyzMessage object, which has several methods that make it easy to get the email’s subject line, as well as all sender and recipient addresses. The get_subject() method returns the subject as a simple string value. The get_addresses() method returns a list of addresses for the field you pass it. For example, the method calls might look like this:
>>> message.get_subject()
'Hello!'
>>> message.get_addresses('from')
[('Edward Snowden', 'esnowden@nsa.gov')]
>>> message.get_addresses('to')
[('Jane Doe', 'my_email_address@gmail.com')]
>>> message.get_addresses('cc')
[]
>>> message.get_addresses('bcc')
[]
Notice that the argument for get_addresses() is 'from', 'to', 'cc', or 'bcc'. The return value of get_addresses() is a list of tuples. Each tuple contains two strings: The first is the name associated with the email address, and the second is the email address itself. If there are no addresses in the requested field, get_addresses() returns a blank list. Here, the 'cc' carbon copy and 'bcc' blind carbon copy fields both contained no addresses and so returned empty lists.
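If pyzmail is not available, the same list-of-tuples shape can be seen with Python’s standard email package. To be clear, this is a swapped-in technique and not the pyzmail API: the sketch parses a small made-up message (the addresses are placeholders), and addresses() is a hypothetical helper written for this example.

```python
import email
from email.utils import getaddresses

# A tiny made-up raw message; a real one would come from fetch().
raw = (b'From: Alice Example <alice@example.com>\r\n'
       b'To: Jane Doe <my_email_address@gmail.com>, bob@example.com\r\n'
       b'Subject: Hello!\r\n'
       b'\r\n'
       b'Body text\r\n')

parsed = email.message_from_bytes(raw)

def addresses(msg, field):
    """Return a list of (name, address) tuples for one header field."""
    # get_all() returns [] here when the field is missing from the message.
    return getaddresses(msg.get_all(field, []))

print(parsed['subject'])
print(addresses(parsed, 'from'))
print(addresses(parsed, 'to'))
print(addresses(parsed, 'cc'))
```

As with get_addresses(), a missing field yields an empty list, and an address without a display name gets an empty string as its first tuple item.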
Getting the Body from a Raw Message
Emails can be sent as plaintext, HTML, or both. Plaintext emails contain only text, while HTML emails can have colors, fonts, images, and other features that make the email message look like a small web page. If an email is only plaintext, its PyzMessage object will have its html_part attribute set to None. Likewise, if an email is only HTML, its PyzMessage object will have its text_part attribute set to None.
Otherwise, the text_part or html_part value will have a get_payload() method that returns the email’s body as a value of the bytes data type. (The bytes data type is beyond the scope of this book.) But this still isn’t a string value that we can use. Ugh! The last step is to call the decode() method on the bytes value returned by get_payload(). The decode() method takes one argument: the message’s character encoding, stored in the text_part.charset or html_part.charset attribute. This, finally, will return the string of the email’s body.
Continue the interactive shell example by entering the following:
➊ >>> message.text_part != None
True
>>> message.text_part.get_payload().decode(message.text_part.charset)
➋ 'So long, and thanks for all the fish!\r\n\r\n-Al\r\n'
➌ >>> message.html_part != None
True
➍ >>> message.html_part.get_payload().decode(message.html_part.charset)
'<div dir="ltr"><div>So long, and thanks for all the fish!<br><br></div>-Al<br></div>\r\n'
The email we’re working with has both plaintext and HTML content, so the PyzMessage object stored in message has text_part and html_part attributes not equal to None ➊ ➌. Calling get_payload() on the message’s text_part and then calling decode() on the bytes value returns a string of the text version of the email ➋. Using get_payload() and decode() with the message’s html_part returns a string of the HTML version of the email ➍.
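The bytes-then-decode step can also be tried without a mail server, using the standard email package in place of pyzmail. This is a sketch under stated assumptions: the sample message is built locally just so there is something to parse, and decoded_part() is a hypothetical helper, not a pyzmail function.

```python
import email
from email.message import EmailMessage

# Build a sample message with both a plaintext and an HTML version.
sample = EmailMessage()
sample['Subject'] = 'Hello!'
sample.set_content('So long, and thanks for all the fish!\n\n-Al\n')
sample.add_alternative(
    '<div>So long, and thanks for all the fish!<br><br>-Al</div>\n',
    subtype='html')
raw = sample.as_bytes()  # stands in for the raw message that fetch() returns

parsed = email.message_from_bytes(raw)

def decoded_part(msg, content_type):
    """Return the first part of the given type as a string, or None."""
    for part in msg.walk():
        if part.get_content_type() == content_type:
            payload = part.get_payload(decode=True)  # a bytes value
            charset = part.get_content_charset() or 'utf-8'
            return payload.decode(charset)  # bytes -> str
    return None

text_body = decoded_part(parsed, 'text/plain')
html_body = decoded_part(parsed, 'text/html')
print(text_body)
print(html_body)
```

A message with no part of the requested type gives None, which plays the same role as a text_part or html_part attribute set to None.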
Deleting Emails
To delete emails, pass a list of message UIDs to the IMAPClient object’s delete_messages() method. This marks the emails with the \Deleted flag. Calling the expunge() method will permanently delete all emails with the \Deleted flag in the currently selected folder. Consider the following interactive shell example:
➊ >>> imapObj.select_folder('INBOX', readonly=False)
➋ >>> UIDs = imapObj.search(['ON 09-Jul-2015'])
>>> UIDs
[40066]
>>> imapObj.delete_messages(UIDs)
➌ {40066: ('\\Seen', '\\Deleted')}
>>> imapObj.expunge()
('Success', [(5452, 'EXISTS')])
Here we select the inbox by calling select_folder() on the IMAPClient object and passing 'INBOX' as the first argument; we also pass the keyword argument readonly=False so that we can delete emails ➊. We search the inbox for messages received on a specific date and store the returned message IDs in UIDs ➋. Calling delete_messages() and passing it UIDs returns a dictionary; each key-value pair is a message ID and a tuple of the message’s flags, which should now include \Deleted ➌. Calling expunge() then permanently deletes messages with the \Deleted flag and returns a success message if there were no problems expunging the emails. Note that some email providers, such as Gmail, automatically expunge emails deleted with delete_messages() instead of waiting for an expunge command from the IMAP client.
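The two-phase nature of deletion (flag first, remove on expunge) can be illustrated with a toy in-memory model. This is not IMAPClient and does not talk to a server; ToyFolder is an invented class whose method names merely echo the real ones, and its return values are simplified.

```python
class ToyFolder:
    """In-memory stand-in for a selected folder: maps UID -> set of flags."""

    def __init__(self, uids):
        self.flags = {uid: set() for uid in uids}

    def delete_messages(self, uids):
        # Phase 1: only mark the messages; nothing is removed yet.
        for uid in uids:
            self.flags[uid].add('\\Deleted')
        return {uid: tuple(sorted(self.flags[uid])) for uid in uids}

    def expunge(self):
        # Phase 2: permanently remove everything carrying the \Deleted flag.
        doomed = [uid for uid, flags in self.flags.items()
                  if '\\Deleted' in flags]
        for uid in doomed:
            del self.flags[uid]
        return doomed

folder = ToyFolder([40065, 40066])
marked = folder.delete_messages([40066])
still_there = 40066 in folder.flags  # flagged, but not gone yet
removed = folder.expunge()
print(marked, still_there, removed, sorted(folder.flags))
```

A provider that expunges automatically behaves as if phase 2 ran immediately after phase 1.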
Disconnecting from the IMAP Server
When your program has finished retrieving or deleting emails, simply call the IMAPClient object’s logout() method to disconnect from the IMAP server.
>>> imapObj.logout()
If your program runs for several minutes or more, the IMAP server may time out, or automatically disconnect. In this case, the next method call your program makes on the IMAPClient object will raise an exception like the following:
imaplib.abort: socket error: [WinError 10054] An existing connection was forcibly closed by the remote host
In this event, your program will have to call imapclient.IMAPClient() to connect again.
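One way to handle such a timeout is to wrap each operation so that a dropped connection triggers a reconnect and a retry. The following is a minimal sketch, not a definitive implementation: run_with_reconnect() and FakeConnection are hypothetical names, and the demonstration uses stand-in objects instead of a real server. In a real program, the connect function would call imapclient.IMAPClient(), log in, and select a folder again.

```python
import imaplib

def run_with_reconnect(connect, action, attempts=2):
    """Call action(conn); if the connection was dropped, reconnect and retry."""
    conn = connect()
    for attempt in range(attempts):
        try:
            return action(conn)
        except imaplib.IMAP4.abort:
            if attempt == attempts - 1:
                raise  # out of retries; let the caller see the error
            conn = connect()  # start a fresh session

# Stand-ins for this demonstration only.
class FakeConnection:
    def __init__(self, alive):
        self.alive = alive

    def search(self, criteria):
        if not self.alive:
            raise imaplib.IMAP4.abort('socket error')
        return [40032, 40033]

# The first connection has timed out; the second one works.
connections = iter([FakeConnection(False), FakeConnection(True)])
result = run_with_reconnect(lambda: next(connections),
                            lambda conn: conn.search(['ALL']))
print(result)
```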
Whew! That’s it. There were a lot of hoops to jump through, but you
now have a way to get your Python programs to log in to an email account
and fetch emails. You can always consult the overview in “Retrieving and
Deleting Emails with IMAP” on page 366 whenever you need to remember
all of the steps.
Project: Sending Member Dues Reminder Emails
Say you have been “volunteered” to track member dues for the Mandatory
Volunteerism Club. This is a truly boring job, involving maintaining a
spreadsheet of everyone who has paid each month and emailing reminders
to those who haven’t. Instead of going through the spreadsheet yourself and
copying and pasting the same email to everyone who is behind on dues,
let’s—you guessed it—write a script that does this for you.
At a high level, here’s what your program will do:
• Read data from an Excel spreadsheet.
• Find all members who have not paid dues for the latest month.
• Find their email addresses and send them personalized reminders.
This means your code will need to do the following:
• Open and read the cells of an Excel document with the openpyxl module. (See Chapter 12 for working with Excel files.)
• Create a dictionary of members who are behind on their dues.
• Log in to an SMTP server by calling smtplib.SMTP(), ehlo(), starttls(), and login().
• For all members behind on their dues, send a personalized reminder email by calling the sendmail() method.
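Before touching Excel or SMTP, the middle step (building the dictionary of members who are behind) can be sketched on plain Python lists. The row layout here is an assumption made for illustration: a header row, then one row per member with name, email, and one column per month holding 'paid' or nothing. The names and addresses are made up, and find_unpaid() is a hypothetical helper.

```python
# Assumed layout for this sketch: header row, then name, email, monthly status.
rows = [
    ['Member', 'Email', 'Jan 2014', 'Feb 2014'],
    ['Alice', 'alice@example.com', 'paid', 'paid'],
    ['Bob', 'bob@example.com', 'paid', None],
    ['Carol', 'carol@example.com', None, None],
]

def find_unpaid(rows):
    """Return the latest month and a {name: email} dict of unpaid members."""
    header, members = rows[0], rows[1:]
    latest_month = header[-1]
    unpaid = {name: email
              for name, email, *months in members
              if months[-1] != 'paid'}
    return latest_month, unpaid

latest_month, unpaid = find_unpaid(rows)
for name, email in unpaid.items():
    print('To: %s\nDear %s, our records show no dues payment for %s.'
          % (email, name, latest_month))
```

In the finished program, the rows would come from the spreadsheet and each message would go out through sendmail() instead of print().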
Open a new file editor window and save it as sendDuesReminders.py.
Step 1: Open the Excel File
Let’s say the Excel spreadsheet you use to track membership dues payments looks like Figure 16-2 and is in a file named duesRecords.xlsx. You can download this file from http://nostarch.com/automatestuff/.