How MIME works
RFC 822 describes a generic message format and specifies some standard header fields. It says almost nothing about the body of a message, except that it is a chunk of ASCII text.
MIME takes advantage of that by superimposing a format on the body text, and it also defines some header lines of its own. The neat thing is that since the body is still just a chunk of text, it doesn't break any rules established by RFC 822. This means that any RFC 822-compliant transport system can handle MIME messages without modification. The transport system expects a message to contain a chunk of 7-bit text, and MIME messages happily oblige. The transport system neither knows, nor has reason to care, that lines 50 through 700 of the text chunk actually represent an executable file. For that matter, it doesn't care that the chunk of text in a non-MIME message represents the English sentence "attack at dawn". As far as a mail server transporting the message is concerned, it's all just a bunch of 7-bit characters to send up the chain.
Of course, the clients receiving the message do care, because that chunk of text has to be converted back into a binary file. They can do this only if they know the format of the message, i.e., only if they are MIME-compliant mail readers.
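The property the transport relies on can be checked directly. Here is a minimal Python sketch, assuming a simplified reading of RFC 2045's definition of "7bit data" (no octets above 127, no NULs, lines of at most 998 octets; the rule about bare CR and LF is not checked). The name `is_7bit_clean` is ours. The usage lines show arbitrary binary data failing the check, its base64 encoding passing it, and the receiving side recovering the original bytes.

```python
import base64


def is_7bit_clean(data: bytes) -> bool:
    """Simplified check for RFC 2045 "7bit data" (bare CR/LF rules not checked)."""
    if any(b == 0 or b > 127 for b in data):
        return False                      # NULs and 8-bit octets are not allowed
    # Each line, not counting its CRLF, may be at most 998 octets long.
    return all(len(line.rstrip(b"\r")) <= 998 for line in data.split(b"\n"))


payload = bytes(range(256))               # arbitrary binary data, e.g. a slice of an .exe
encoded = base64.encodebytes(payload)     # base64 text, broken into lines of 76 characters
print(is_7bit_clean(payload), is_7bit_clean(encoded))   # False True
print(base64.decodebytes(encoded) == payload)           # True: the MIME reader's job
```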
MIME headers
MIME defines five additional header lines that tell the receiving client how the body should be interpreted.
Header                      Meaning
Content-Type                Specifies the type of data contained in the body.
Content-Transfer-Encoding   Specifies how the data was encoded for transport
                            (for example, converted into 7-bit text).
MIME-Version                Indicates the MIME version to which the message
                            conforms; in practice it is always 1.0.
Content-ID                  Uniquely identifies a body part so that it can be
                            referenced from elsewhere (for example, an inline
                            image referenced from an HTML part).
Content-Description         Optional free-text description of the content for
                            a human reader; not used when decoding.
The Content-Type header tells decoders what kind of data is contained in the body, giving both a broad media type (such as "image") and a specific subtype (such as "gif"), separated by a forward slash ("/"). Following the type information, the Content-Type header may include additional parameters as name=value pairs, each preceded by a semicolon (";"). Which parameters are allowed depends on the type of data.
Everyone, and their mothers, and their mothers' dogs, are proposing new content types (along with their parameters) all the time, so you'll have to go swimming in an ocean of standards documents to discover them all. IANA's registry of media types is the place to start.
Example:
Content-Type: text/plain; charset="iso-8859-1"
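To make that structure concrete, here is a minimal Python sketch that splits a Content-Type value into type, subtype, and parameters. It is deliberately simplified, and the name `parse_content_type` is ours: it assumes no semicolon ever appears inside a quoted value and ignores RFC 822 comments and RFC 2231 parameter continuations. For real mail, Python's `email.message.Message` methods such as `get_content_type()` and `get_param()` handle the full grammar.

```python
def parse_content_type(value: str) -> tuple[str, str, dict[str, str]]:
    """Split 'type/subtype; name=value; ...' into its pieces (simplified sketch)."""
    pieces = [p.strip() for p in value.split(";")]
    # Type and subtype names are case-insensitive, so normalize them.
    maintype, _, subtype = pieces[0].lower().partition("/")
    params: dict[str, str] = {}
    for piece in pieces[1:]:
        if not piece:
            continue                          # tolerate a trailing ";"
        name, _, val = piece.partition("=")
        val = val.strip()
        if len(val) >= 2 and val[0] == val[-1] == '"':
            val = val[1:-1]                   # drop the surrounding quotes
        # Parameter names are case-insensitive; values (e.g. boundaries) are not.
        params[name.strip().lower()] = val
    return maintype.strip(), subtype.strip(), params


print(parse_content_type('text/plain; charset="iso-8859-1"'))
# ('text', 'plain', {'charset': 'iso-8859-1'})
```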
The more content types your mail reader can handle, the more capable it is. Outlook supports text/html, so it can display HTML messages to the user directly. It supports image/jpeg, so it can show you pictures right there in the message.
Any content type that the mail reader can't handle itself should be saved to disk so that the user can open it with something that knows how to handle it.
In other words, there's no such thing as a "file attachment"; there are only content types that are encoded and decoded. What you do after decoding is up to you. Do you support application/zip? Then unzip the binary block. If not, save it to disk and let someone think of it as a "file attachment".
What most people think of as a file attachment is usually a body of type application/octet-stream. That signifies a chunk of arbitrary 8-bit bytes... but that could be anything! Without more concrete information about the data, there's really nothing the mail reader can do except save it to disk.
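The decision a reader makes for each decoded body can be sketched as a lookup table with "save to disk" as the fallback. The table contents and the action names below are hypothetical; a real reader would fill them in from whatever it can actually render or unpack.

```python
# Hypothetical capabilities of an imaginary mail reader.
HANDLERS = {
    "text/plain": "display",
    "text/html": "display",
    "image/jpeg": "display",
    "application/zip": "unzip",
}


def action_for(content_type: str) -> str:
    """Pick what to do with a decoded body; anything unknown is saved to disk."""
    mime_type = content_type.split(";", 1)[0].strip().lower()  # drop parameters
    return HANDLERS.get(mime_type, "save")
```

For example, `action_for("application/octet-stream")` returns `"save"`, which is exactly the "file attachment" case described above.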
The Content-Transfer-Encoding header identifies the mechanism a decoder should use to convert the message body back into its original form.
The biggies are 7bit, 8bit, binary, quoted-printable, and base64. The first three are identity encodings: they only declare what kind of data is present, and nothing has to be undone. Regardless of which content types your mail reader supports, you need to be able to decode the message in the first place. You don't have to support image/gif, but you need to be able to turn the body back into a GIF file that you can save to disk. Your program can't do this unless it handles every encoding mechanism (you can't control how someone else's program encoded the data, so you need to handle them all). You can get away with supporting only text/plain, because you can just dump other types of data to disk, but you can't do anything if you can't read the data to begin with.
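That requirement can be sketched in Python as one function that undoes whichever transfer encoding was used, built on the standard `base64` and `quopri` modules. The name `decode_body` is ours, and raising on an unknown encoding is one possible policy: RFC 2045 says such a body should be treated as application/octet-stream, so a caller could catch the error and save the undecoded bytes to disk.

```python
import base64
import quopri


def decode_body(raw: bytes, transfer_encoding: str = "7bit") -> bytes:
    """Undo a Content-Transfer-Encoding; 7bit is the default when the header is absent."""
    cte = transfer_encoding.strip().lower()   # the value is case-insensitive
    if cte == "base64":
        return base64.b64decode(raw)          # non-alphabet bytes such as CRLF are skipped
    if cte == "quoted-printable":
        return quopri.decodestring(raw)       # "=XX" escapes and "=" soft line breaks
    if cte in ("7bit", "8bit", "binary"):
        return raw                            # identity encodings: nothing to undo
    raise ValueError(f"unknown Content-Transfer-Encoding: {transfer_encoding!r}")


print(decode_body(b"aGVs\r\nbG8=", "base64"))         # b'hello'
print(decode_body(b"caf=C3=A9", "Quoted-Printable"))  # b'caf\xc3\xa9'
```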
Luckily, only base64 encoding could be considered even slightly difficult to implement.
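To show how little is involved, here is a from-scratch base64 decoder sketch (the name `b64_decode` is ours; in practice you would call Python's `base64.b64decode`). Each character stands for 6 bits; the bits are concatenated and read back out 8 at a time. As RFC 2045 requires, characters outside the base64 alphabet (such as line breaks) are ignored, "=" marks the end of the data, and leftover bits after the last full byte are dropped.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
VALUE = {ch: i for i, ch in enumerate(ALPHABET)}


def b64_decode(text: str) -> bytes:
    """Decode base64 text by hand: every 4 characters (24 bits) become 3 bytes."""
    data = text.split("=", 1)[0]          # "=" padding marks the end of the data
    out = bytearray()
    buffer = 0                            # bits not yet written out
    bits = 0                              # how many bits the buffer holds
    for ch in data:
        if ch not in VALUE:
            continue                      # ignore line breaks and other stray characters
        buffer = (buffer << 6) | VALUE[ch]
        bits += 6
        if bits >= 8:
            bits -= 8
            out.append((buffer >> bits) & 0xFF)
            buffer &= (1 << bits) - 1     # keep only the unread low bits
    return bytes(out)


print(b64_decode("TWFu"))          # b'Man'
print(b64_decode("aGVs\nbG8="))    # b'hello'
```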
Multipart messages
MIME completes the illusion of file attachments by allowing the message body to be divided into distinct parts, each with its own headers. The content type multipart/mixed means that the body is divided into blocks separated by lines consisting of "--" followed by a unique boundary string that is guaranteed not to appear anywhere else in the message. If you say that your boundary string is "myboundarystring", then any line beginning with "--myboundarystring" will be treated as a boundary, so it had better not appear in the message the user typed, or the message won't be decoded correctly.
The boundary string is specified as a parameter of the Content-Type header:
Content-Type: multipart/mixed; boundary="myboundarystring"
Do not include the leading "--" in the value. The parts are then separated by this line:
--myboundarystring
and the end of the last part is marked by the same line with a trailing "--" added:
--myboundarystring--
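The rules above can be sketched in Python as a small composer. The names `make_boundary` and `build_multipart` are hypothetical, each part is assumed to arrive as an already-encoded (header block, body) pair, and lines are joined with CRLF as the wire format expects. The boundary is random and is redrawn if it happens to occur in any part; starting it with "=_" follows a suggestion in RFC 2045, since that sequence can never appear in quoted-printable output and "_" is not in the base64 alphabet.

```python
import secrets


def make_boundary(texts: list[str]) -> str:
    """Pick a random boundary that occurs in none of the given texts."""
    while True:
        candidate = "=_" + secrets.token_hex(16)   # 34 characters, within the 70 allowed
        if not any(candidate in text for text in texts):
            return candidate


def build_multipart(parts: list[tuple[str, str]]) -> tuple[str, str]:
    """parts: (header block, encoded body) pairs. Returns (Content-Type value, body)."""
    boundary = make_boundary([text for pair in parts for text in pair])
    lines: list[str] = []
    for headers, body in parts:
        lines.append("--" + boundary)      # delimiter line starts each part
        if headers:
            lines.append(headers)
        lines.append("")                   # blank line ends the part's headers
        lines.append(body)
    lines.append("--" + boundary + "--")   # close delimiter ends the last part
    return f'multipart/mixed; boundary="{boundary}"', "\r\n".join(lines)


content_type, body = build_multipart([("Content-Type: text/plain", "Hello!")])
print(content_type)   # multipart/mixed; boundary="=_..." (random on each run)
```

Because "=" is a special character in header parameters, the boundary has to be quoted in the Content-Type value, which is why the function wraps it in double quotes.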
Text before the first boundary (the preamble) and after the closing boundary (the epilogue) is ignored by MIME decoders, but since a non-MIME reader simply displays the whole thing as text, these areas can be used to tell the user to get a better mail reader.
From: me@here.com
To: you@there.com
Subject: In one ear and out your mother
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="myboundarystring"

This is a MIME message. If the next few lines look like gibberish,
then your mail reader sucks.
If you are using a MIME reader, then you aren't even seeing this.
--myboundarystring
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

The charset= parameter is omitted and will default to us-ascii.
In fact, I could have had a blank line as my header and it would have
defaulted to exactly what I specified on those header lines.
--myboundarystring
Content-Type: application/octet-stream; name="attachment.exe"
Content-Transfer-Encoding: base64

axfhfujropadladnggnfjgwsaiubvnmkadiuhterqhjsffuajkfhrqpeorlakfn
jnfhgt7fjd9dfkliodf==
--myboundarystring--
This text is ignored.
Note that each part of the message conforms to RFC 822, including its own "sub" headers: a header block, a blank line, then the body.
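Going the other way, a decoder can be sketched in two steps: split the body on the boundary lines, then split each part into its header block and body at the first blank line, just as for an RFC 822 message. This is a simplified sketch under these assumptions: the function names are ours, line endings are normalized to "\n", and a part without Content-Type or Content-Transfer-Encoding gets RFC 2045's defaults (text/plain; charset=us-ascii and 7bit), which is why the first part of the sample could have had an empty header block. Trailing blanks on a delimiter line are tolerated, as RFC 2046 allows.

```python
from typing import Optional


def split_multipart(body: str, boundary: str) -> list[str]:
    """Return the raw text of each part; the preamble and epilogue are dropped."""
    delimiter = "--" + boundary
    parts: list[str] = []
    current: Optional[list[str]] = None      # None until the first delimiter line
    for line in body.replace("\r\n", "\n").split("\n"):
        stripped = line.rstrip(" \t")        # delimiter lines may carry trailing blanks
        if stripped == delimiter + "--":     # close delimiter: the rest is epilogue
            break
        if stripped == delimiter:            # a new part starts
            if current is not None:
                parts.append("\n".join(current))
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:                  # flush the last part
        parts.append("\n".join(current))
    return parts


def parse_part(part: str) -> tuple[dict[str, str], str]:
    """Split one part into (headers, body) at the first blank line."""
    lines = part.split("\n")
    headers: dict[str, str] = {}
    last: Optional[str] = None
    i = 0
    while i < len(lines) and lines[i] != "":
        line = lines[i]
        if line[:1] in (" ", "\t") and last is not None:
            headers[last] += " " + line.strip()   # folded continuation line
        else:
            name, _, value = line.partition(":")
            last = name.strip().lower()           # header names are case-insensitive
            headers[last] = value.strip()
        i += 1
    headers.setdefault("content-type", "text/plain; charset=us-ascii")
    headers.setdefault("content-transfer-encoding", "7bit")
    return headers, "\n".join(lines[i + 1:])


sample = (
    "Ignored preamble.\n"
    "--myboundarystring\n"
    "\n"
    "No headers, so the defaults apply.\n"
    "--myboundarystring--\n"
    "This text is ignored.\n"
)
for part in split_multipart(sample, "myboundarystring"):
    print(parse_part(part))
```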
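SMTP commands
MAIL  Starts a mail transaction:
      MAIL FROM:
RCPT  Identifies a single recipient; usually follows the MAIL command.
      There can be several RCPT TO: commands.
DATA  Sent after one or more RCPT commands; indicates that all recipients
      have been identified and starts the transfer of the message data,
      which ends with a line containing only ".".
VRFY  Verifies whether a given user/mailbox exists; for security reasons,
      servers often disable this command.
EXPN  Checks whether a given mailing list exists and expands it into its
      members; also often disabled.
HELP  Asks which commands the server supports.
NOOP  No operation; the server should reply OK.
QUIT  Ends the session.
RSET  Resets the session; the current transaction is aborted.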
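The command sequence above can be sketched as the client side of a single transaction. This is a simplification: the function name `smtp_commands` and the host and addresses are hypothetical, the opening HELO (not in the list above) is included because a session normally starts with it, the server's numeric replies (for example 250 after MAIL and RCPT, 354 after DATA) are not modeled, and on the wire every line would end in CRLF. It also shows dot-stuffing (RFC 5321, section 4.5.2), which keeps a message line that starts with "." from being mistaken for the end of the data. Python's `smtplib` performs these steps for you.

```python
def smtp_commands(client_name: str, sender: str, recipients: list[str], message: str) -> list[str]:
    """Client-side lines of one SMTP transaction (server replies not modeled)."""
    lines = [f"HELO {client_name}", f"MAIL FROM:<{sender}>"]
    lines += [f"RCPT TO:<{rcpt}>" for rcpt in recipients]   # one RCPT TO per recipient
    lines.append("DATA")
    for text in message.split("\n"):
        # Dot-stuffing: double a leading "." so the line can't be taken as the end marker.
        lines.append("." + text if text.startswith(".") else text)
    lines.append(".")       # a line with only "." ends the message data
    lines.append("QUIT")
    return lines


for line in smtp_commands("client.example", "me@here.com",
                          ["you@there.com", "boss@here.com"],
                          "Subject: hi\n\n.signature follows"):
    print(line)
```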
--------------------------------
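8. The address given in the MAIL FROM command is called the envelope-from address; it does not have to match the sender's own address.
RCPT TO is the same: the recipient address it names is called the envelope-to address and has nothing to do with what the To: line actually says.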
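9. Why are there no RCPT CC: and RCPT BCC: commands?
All recipients are negotiated with RCPT TO. A Bcc recipient is given in its own RCPT TO like any other; because envelope recipients are not shown to the people who receive the message, the Bcc recipient stays hidden (the Bcc: header line itself is normally removed before the message is sent).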
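Items 8 to 10 can be illustrated by how a client typically prepares a message that has a Bcc: line. The sketch below (the function name `envelope_recipients` is ours) collects the addresses from To:, Cc: and Bcc: for the RCPT TO commands, then deletes the Bcc: header so that the copy everyone receives does not reveal the hidden recipient. The envelope sender for MAIL FROM would be chosen separately and need not match From:. Python's `smtplib.SMTP.send_message` behaves in a similar way when no explicit recipient list is passed.

```python
from email.message import EmailMessage
from email.utils import getaddresses


def envelope_recipients(msg: EmailMessage) -> list[str]:
    """Return the RCPT TO addresses and strip Bcc: from the headers (in place)."""
    values = [str(v) for name in ("To", "Cc", "Bcc") for v in msg.get_all(name, [])]
    recipients = [addr for _, addr in getaddresses(values) if addr]
    del msg["Bcc"]          # every recipient sees the headers, so Bcc: must go
    return recipients


msg = EmailMessage()
msg["From"] = "me@here.com"
msg["To"] = "you@there.com"
msg["Bcc"] = "boss@here.com"
print(envelope_recipients(msg))   # ['you@there.com', 'boss@here.com']
print(msg["Bcc"])                 # None
```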
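10. A message is divided into three parts: the envelope, the header, and the body.
The envelope from and envelope to addresses are completely unrelated to the message's From: and To: header lines.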
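11. How can you tell from a message's trace information (its Received: lines) whether it is forged?
a. Whether the Received: lines link up with one another.
In today's SMTP mail systems, set aside the hops handled by the internal hosts at either end and consider the stretch between the two companies' firewalls. If the two firewall machines are A and B, but the recipient finds on checking the Received: lines that the message passed through C, the message is forged.
b. Whether the host name and IP address in a Received: line correspond, for example:
Received: from galangal.org (turmeric.com [104.128.23.115]) by mail.bieberdorf.edu...
Here the sending host introduced itself as galangal.org, but the receiving server recorded its address 104.128.23.115 as belonging to turmeric.com.
c. Received: lines added by hand at the bottom:
Received: from galangal.org ([104.128.23.115]) by mail.bieberdorf.edu (8.8.5)
Received: from lemongrass.org by galangal.org (8.7.3)
Received: from graprao.com by lemongrass.org (8.6.4)
Received: lines are added at the top as a message travels, so the lines below the first one were written before the message reached the recipient's own server, mail.bieberdorf.edu, and nothing vouches for them.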
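Checks (a) and (b) can be partly automated. The sketch below is a heuristic only, under these assumptions: each Received: value looks like "from <name> (<reverse-DNS name> [<IP>]) ... by <host>", which real servers do not all follow, and the function names are ours. It flags a hop where the name the sender gave differs from the name the receiving server recorded, and a break in the chain where an older line's "by" host is not the next newer line's "from" host.

```python
import re
from typing import NamedTuple, Optional

# "from <name> [(<comment>)] ... by <host>"; real Received: formats vary widely.
RECEIVED_RE = re.compile(
    r"from\s+(\S+)(?:\s+\(([^)]*)\))?.*?\bby\s+(\S+)", re.IGNORECASE | re.DOTALL
)


class Hop(NamedTuple):
    helo: str            # name the sending host gave for itself
    rdns: Optional[str]  # name the receiving host recorded for the IP, if any
    by: str              # host that wrote this Received: line


def parse_received(value: str) -> Optional[Hop]:
    m = RECEIVED_RE.search(value)
    if m is None:
        return None
    helo, comment, by = m.groups()
    tokens = (comment or "").split()
    # "[1.2.3.4]" on its own means no name was recorded for the address.
    rdns = tokens[0].lower() if tokens and not tokens[0].startswith("[") else None
    return Hop(helo.lower(), rdns, by.lower())


def check_received_chain(received: list[str]) -> list[str]:
    """received[0] is the topmost (newest) Received: value. Returns warnings."""
    warnings = []
    hops = [parse_received(v) for v in received]
    for i, hop in enumerate(hops):
        if hop is None:
            warnings.append(f"line {i}: not understood")
        elif hop.rdns is not None and hop.rdns != hop.helo:
            warnings.append(f"line {i}: claims {hop.helo}, address belongs to {hop.rdns}")
    for i in range(len(hops) - 1):
        newer, older = hops[i], hops[i + 1]
        if newer and older and older.by != newer.helo:
            warnings.append(f"lines {i}/{i + 1}: {older.by} does not hand off to {newer.helo}")
    return warnings


chain = [
    "from galangal.org ([104.128.23.115]) by mail.bieberdorf.edu (8.8.5)",
    "from lemongrass.org by galangal.org (8.7.3)",
    "from graprao.com by lemongrass.org (8.6.4)",
]
print(check_received_chain(chain))   # []
print(check_received_chain(
    ["from galangal.org (turmeric.com [104.128.23.115]) by mail.bieberdorf.edu"]
))   # ['line 0: claims galangal.org, address belongs to turmeric.com']
```

Note that the hand-added lines from (c) pass the chain check: a forger can make them link up perfectly, so only the lines written by servers you trust, starting from the top, are worth believing.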