
5/18/2020 Python : How to parse the Body from a raw email , given that raw email does not have a "Body" tag or anything - Stack Overflow

Python : How to parse the Body from a raw email , given that raw email
does not have a “Body” tag or anything
Asked 6 years, 9 months ago Active 7 days ago Viewed 117k times

It seems easy to get the From, To, Subject, etc. via

import email
b = email.message_from_string(a)
bbb = b['from']
ccc = b['to']
assuming that "a" is the raw email string, which looks something like this:

a = """From root@a1.local.tld Thu Jul 25 19:28:59 2013
Received: from a1.local.tld (localhost [127.0.0.1])
by a1.local.tld (8.14.4/8.14.4) with ESMTP id r6Q2SxeQ003866
for <ooo@a1.local.tld>; Thu, 25 Jul 2013 19:28:59 -0700
Received: (from root@localhost)
by a1.local.tld (8.14.4/8.14.4/Submit) id r6Q2Sxbh003865;
Thu, 25 Jul 2013 19:28:59 -0700
From: root@a1.local.tld
Subject: oooooooooooooooo
To: ooo@a1.local.tld
Cc:
X-Originating-IP: 192.168.15.127
X-Mailer: Webmin 1.420
Message-Id: <1374805739.3861@a1>
Date: Thu, 25 Jul 2013 19:28:59 -0700 (PDT)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="bound1374805739"

This is a multi-part message in MIME format.

--bound1374805739
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

ooooooooooooooooooooooooooooooooooooooooooooooo
ooooooooooooooooooooooooooooooooooooooooooooooo
ooooooooooooooooooooooooooooooooooooooooooooooo
--bound1374805739--"""

https://stackoverflow.com/questions/17874360/python-how-to-parse-the-body-from-a-raw-email-given-that-raw-email-does-not 1/7

THE QUESTION

How do you get the body of this email via Python?

So far this is the only code I am aware of, but I have yet to test it:

if b.is_multipart():
    for part in b.get_payload():
        print(part.get_payload())
else:
    print(b.get_payload())

Is this the correct way?

Or maybe there is something simpler, such as...

import email
b = email.message_from_string(a)
bbb = b['body']
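For reference, here is roughly what the standard library does with a message shaped like the one above (Python 3 syntax; the trimmed sample and the names raw/msg are illustrative only). Indexing a message only looks up header fields, so msg['body'] gives None rather than the body, and the leading mbox "From " line is kept as the envelope rather than as a header:

```python
import email

# Trimmed-down copy of the sample message above (illustrative only).
raw = """From root@a1.local.tld Thu Jul 25 19:28:59 2013
From: root@a1.local.tld
Subject: oooooooooooooooo
To: ooo@a1.local.tld
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="bound1374805739"

This is a multi-part message in MIME format.

--bound1374805739
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

ooooooooooooooooooooooooooooooooooooooooooooooo
--bound1374805739--
"""

msg = email.message_from_string(raw)

print(msg.get_unixfrom())        # the mbox envelope line, not a header
print(msg['from'], msg['to'])    # header lookup is case-insensitive
print(msg['body'])               # there is no "Body" header, so this is None

# The body lives in the payload; for a multipart message that is a list of parts.
plain = [p.get_payload() for p in msg.walk() if p.get_content_type() == 'text/plain']
print(plain)
```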

python email python-2.7 mod-wsgi wsgi

edited Aug 18 '14 at 18:19 by codegeek, asked Jul 26 '13 at 6:25 by user2621078

7 Answers

Use Message.get_payload:

b = email.message_from_string(a)
if b.is_multipart():
    for payload in b.get_payload():
        # if payload.is_multipart(): ...
        print(payload.get_payload())
else:
    print(b.get_payload())
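The commented-out is_multipart() check matters: a part of a multipart message can itself be multipart (e.g. multipart/alternative nested inside multipart/mixed), in which case its get_payload() is again a list of parts. A minimal recursive sketch of the same idea, using a made-up nested sample and a hypothetical helper name leaf_payloads (Message.walk() performs the same depth-first traversal for you):

```python
import email

def leaf_payloads(msg):
    """Recursively collect (content type, decoded bytes) for every non-multipart part."""
    if msg.is_multipart():
        result = []
        for sub in msg.get_payload():      # a list of Message objects
            result.extend(leaf_payloads(sub))
        return result
    return [(msg.get_content_type(), msg.get_payload(decode=True))]

# Made-up message: multipart/alternative nested inside multipart/mixed.
raw = """MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="outer"

--outer
Content-Type: multipart/alternative; boundary="inner"

--inner
Content-Type: text/plain

plain body
--inner
Content-Type: text/html

<p>html body</p>
--inner--
--outer--
"""

parts = leaf_payloads(email.message_from_string(raw))
for ctype, payload in parts:
    print(ctype, payload)
```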

edited Mar 4 '14 at 13:34 by Gagandeep Singh, answered Jul 26 '13 at 6:30 by falsetru

Never mind! I realized that I could just use the base64 library and do a base64.b64decode(). – user4822346 Jul 23 '15 at 15:23

been searching for an hour trying to figure this out! thank you! – user2709115 Mar 26 at 17:50


To be highly confident you are working with the actual email body (though still with the possibility you're not parsing the right part), you have to skip attachments and focus on the plain or HTML part (depending on your needs) for further processing.

Because attachments can be, and very often are, text/plain or text/html parts themselves, this non-bulletproof sample skips them by checking the Content-Disposition header:

b = email.message_from_string(a)
body = ""

if b.is_multipart():
    for part in b.walk():
        ctype = part.get_content_type()
        cdispo = str(part.get('Content-Disposition'))

        # skip any text/plain (txt) attachments
        if ctype == 'text/plain' and 'attachment' not in cdispo:
            body = part.get_payload(decode=True)  # decode
            break
# not multipart - i.e. plain text, no attachments, keeping fingers crossed
else:
    body = b.get_payload(decode=True)

BTW, walk() iterates marvelously over MIME parts, and get_payload(decode=True) does the dirty work of decoding base64, quoted-printable, etc. for you.
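To see what decode=True buys you, here is a minimal sketch with a made-up single-part message whose body is base64-encoded UTF-8. get_payload() returns the raw transfer-encoded text, get_payload(decode=True) returns bytes with the base64 removed, and turning those bytes into str is still up to you, here via the part's charset parameter with an assumed UTF-8 fallback:

```python
import email

# Hypothetical message: "Hello, wörld!" encoded as UTF-8, then base64.
raw = """MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64

SGVsbG8sIHfDtnJsZCE=
"""

msg = email.message_from_string(raw)

encoded = msg.get_payload()                      # str, still base64
data = msg.get_payload(decode=True)              # bytes, transfer encoding removed
charset = msg.get_content_charset() or 'utf-8'   # assumed fallback when no charset is given
text = data.decode(charset, errors='replace')

print(encoded.strip())
print(text)
```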

Some background: as I implied, the wonderful world of MIME emails presents a lot of pitfalls for "wrongly" finding the message body. In the simplest case it's in the sole "text/plain" part and get_payload() is very tempting, but we don't live in a simple world; it's often wrapped in multipart/alternative, related, mixed, etc. content. Wikipedia describes it tightly (MIME), but considering that all the cases below are valid, and common, one has to put safety nets all around:

Very common, pretty much what you get from a normal editor (Gmail, Outlook) sending formatted text with an attachment:

multipart/mixed
|
+- multipart/related
| |
| +- multipart/alternative
| | |
| | +- text/plain
| | +- text/html
| |
| +- image/png
|
+-- application/msexcel

Relatively simple, just an alternative representation:

multipart/alternative
|
+- text/plain
+- text/html

For good or bad, this structure is also valid:

multipart/alternative
|
+- text/plain
+- multipart/related
|
+- text/html
+- image/jpeg

Hope this helps a bit.

P.S. My point is don't approach email lightly - it bites when you least expect it :)
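One way to get a feel for these shapes is to build one and print its tree. A minimal sketch (Python 3.6+, EmailMessage API) that assembles the "formatted text plus attachment" case without the multipart/related layer, then applies the same Content-Disposition idea as above to pick the body; the helper names tree/first_plain_body and the attachment are made up for illustration:

```python
from email.message import EmailMessage

msg = EmailMessage()
msg['Subject'] = 'demo'
msg.set_content('plain text body')                        # text/plain
msg.add_alternative('<p>html body</p>', subtype='html')   # now multipart/alternative
msg.add_attachment(b'not really a spreadsheet', maintype='application',
                   subtype='vnd.ms-excel', filename='report.xls')  # now multipart/mixed

def tree(part, depth=0):
    """Return the MIME structure as indented lines."""
    lines = ['    ' * depth + part.get_content_type()]
    if part.is_multipart():
        for sub in part.get_payload():
            lines.extend(tree(sub, depth + 1))
    return lines

def first_plain_body(message):
    """Return the first text/plain part that is not an attachment, or None."""
    for part in message.walk():
        if (part.get_content_type() == 'text/plain'
                and part.get_content_disposition() != 'attachment'):
            return part.get_payload(decode=True)
    return None

print('\n'.join(tree(msg)))
print(first_plain_body(msg))
```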

answered Sep 29 '15 at 9:30 by Todor Minakov

Thanks for this thorough example and for spelling out a warning, in contrast to the accepted answer. I think this is a far better/safer approach. – Simon Steinberger Jun 23 '17 at 15:04

Ah, very good! .get_payload(decode=True) instead of just .get_payload() has made life much easier,
thanks! – Mark Jul 30 '19 at 3:55

There is a very good package available for parsing email contents, with proper documentation:

import mailparser

mail = mailparser.parse_from_file(f)
mail = mailparser.parse_from_file_obj(fp)
mail = mailparser.parse_from_string(raw_mail)
mail = mailparser.parse_from_bytes(byte_mail)

How to use:

mail.attachments: list of all attachments
mail.body
mail.to

answered Mar 15 '18 at 9:05 by Amit Sharma

Library is great, but I had to make my own class that inherits from MailParser and override the body method, because it joins the parts of the email's body with "\n--- mail_boundary ---\n", which was not ideal for me. – avram Sep 21 '18 at 12:30
hi @avram, could you please share the class that you have written ? – Amey P Naik May 13 '19 at 12:53

I managed to split the result on "\n--- mail_boundary ---\n". – Amey P Naik May 14 '19 at 7:16

@AmeyPNaik Here I made a quick GitHub gist: gist.github.com/aleksaa01/ccd371869f3a3c7b3e47822d5d78ccdf – avram May 14 '19 at 20:11

@AmeyPNaik in their documentation, it says: mail-parser can parse Outlook email format (.msg). To use this feature, you need to install the libemail-outlook-message-perl package. – Ciprian Tomoiagă Dec 3 '19 at 10:57

There is no b['body'] in Python. You have to use get_payload.

if isinstance(mailEntity.get_payload(), list):
    for eachPayload in mailEntity.get_payload():
        # ...do things you want...
        # ...real mail body is in eachPayload.get_payload()...
        print(eachPayload.get_payload())
else:
    # ...means there is only a single part (e.g. text/plain)...
    # ...use mailEntity.get_payload() to get the body...
    print(mailEntity.get_payload())

Good luck.

answered Jul 26 '13 at 6:36 by Jimmy Lin

If emails is the pandas DataFrame and emails.message is the column holding the raw email text:

## Helper functions
def get_text_from_email(msg):
    '''To get the content from email objects'''
    parts = []
    for part in msg.walk():
        if part.get_content_type() == 'text/plain':
            parts.append(part.get_payload())
    return ''.join(parts)

def split_email_addresses(line):
    '''To separate multiple email addresses'''
    if line:
        addrs = line.split(',')
        addrs = frozenset(map(lambda x: x.strip(), addrs))
    else:
        addrs = None
    return addrs

import email
# Parse the emails into a list of email objects
messages = list(map(email.message_from_string, emails['message']))
emails.drop('message', axis=1, inplace=True)
# Get fields from parsed email objects
keys = messages[0].keys()
for key in keys:
    emails[key] = [doc[key] for doc in messages]
# Parse content from emails
emails['content'] = list(map(get_text_from_email, messages))
# Split multiple email addresses
emails['From'] = emails['From'].map(split_email_addresses)
emails['To'] = emails['To'].map(split_email_addresses)

# Extract the root of 'file' as 'user'
emails['user'] = emails['file'].map(lambda x: x.split('/')[0])
del messages

emails.head()
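One caveat with split_email_addresses: splitting on ',' breaks when a quoted display name itself contains a comma. The standard library's email.utils.getaddresses() parses address lists properly; a minimal sketch of a hypothetical drop-in replacement (split_addresses is a made-up name, and it keeps only the bare addresses):

```python
from email.utils import getaddresses

def split_addresses(line):
    """Hypothetical alternative to split_email_addresses using getaddresses()."""
    if not line:
        return None
    # getaddresses() takes a list of header values and returns (name, address) pairs
    return frozenset(addr for _name, addr in getaddresses([line]) if addr)

field = '"Doe, Jane" <jane@example.com>, bob@example.com'
print([a.strip() for a in field.split(',')])   # the naive split cuts the display name in half
print(split_addresses(field))
```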

edited Dec 4 '18 at 15:18 by Wayne Werner, answered Aug 28 '18 at 6:10 by Ajay Ohri

Python 3.6+ provides built-in convenience methods to find and decode the plain text body, as in @Todor Minakov's answer. You can use the EmailMessage.get_body() and get_content() methods:

import email
import email.policy

msg = email.message_from_string(s, policy=email.policy.default)
body = msg.get_body(('plain',))
if body is not None:  # a part with no headers is falsy, so compare with None
    body = body.get_content()
print(body)

Note this will give None if there is no (obvious) plain text body part.
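get_body() also takes a preference list, which is handy when a message has no plain text part at all. A minimal sketch (Python 3.6+) with a made-up HTML-only message; the helper name body_text and the choice to fall back to HTML are assumptions for illustration:

```python
import email
import email.policy

# Made-up message with only an HTML alternative.
raw = """MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="b1"

--b1
Content-Type: text/html; charset="utf-8"

<p>only html here</p>
--b1--
"""

def body_text(msg, preferences=('plain', 'html')):
    """Return the preferred body part's decoded text, or None if nothing matches."""
    part = msg.get_body(preferencelist=preferences)
    # compare with None: len() of a message counts headers, so a headerless part is falsy
    return None if part is None else part.get_content()

msg = email.message_from_string(raw, policy=email.policy.default)
print(body_text(msg, ('plain',)))   # no text/plain part here
print(body_text(msg))               # falls back to the HTML part
```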

If you are reading from e.g. an mbox file, you can give the mailbox constructor an EmailMessage
factory:

import email
import email.policy
import mailbox

mbox = mailbox.mbox(
    mboxfile,
    factory=lambda f: email.message_from_binary_file(f, policy=email.policy.default),
    create=False,
)
for msg in mbox:
    ...

Note that you must pass email.policy.default as the policy, since it's not the default (the parsing functions default to email.policy.compat32).
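A self-contained sketch of the mbox approach, assuming a throwaway mbox file written to a temporary directory (the file name demo.mbox, the sample message, and the email_factory helper are made up); a named function replaces the lambda only for readability:

```python
import email
import email.policy
import mailbox
import os
import tempfile

def email_factory(f):
    # mailbox passes the factory a binary file-like object for each message
    return email.message_from_binary_file(f, policy=email.policy.default)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.mbox')

    # Write one message so there is something to read back.
    box = mailbox.mbox(path)
    box.add("From: alice@example.com\nSubject: hello\n\nbody text\n")
    box.close()

    results = []
    mbox = mailbox.mbox(path, factory=email_factory, create=False)
    for msg in mbox:
        body = msg.get_body(('plain',))
        results.append((msg['Subject'], None if body is None else body.get_content()))
    mbox.close()

print(results)
```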

answered May 10 at 6:53 by Doctor J

Here's the code that works for me every time (for Outlook emails):

# to read Subjects and Body of email in a folder (or subfolder)
import win32com.client  # import package (pywin32, Windows only)

# create object
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")

# get to the desired folder (MyEmail@xyz.com is my root folder;
# 'Inbox' and 'SubFolderName' are the subfolders)
root_folder = outlook.Folders['MyEmail@xyz.com'].Folders['Inbox'].Folders['SubFolderName']

messages = root_folder.Items

for message in messages:  # the for loop already iterates, no GetNext() needed
    if message.Unread:  # gets only 'Unread' emails
        subject_content = message.subject  # to store subject lines of mails
        body_content = message.body  # to store Body of mails

        print(subject_content)
        print(body_content)

        message.Unread = False  # mark the mail as 'Read'

edited Jan 30 '19 at 8:57, answered Jan 30 '19 at 8:16 by Deepesh Verma

Perhaps spell out that this is for Outlook on Windows, not for real email. – tripleee Jan 30 '19 at 8:20
