<?xml version="1.0"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">

<?rfc compact='yes'?>
<?rfc toc='yes'?>


<!-- RFC references as names, not numbers -->
<?rfc symrefs="yes" ?>

<rfc ipr="full3978" docName="draft-hurtta-eai-messagestore-00"
     category="info">

   <front>
     <title abbrev="EAI Message Store">
Message Store requirements for Internationalized Email
     </title>
     <author initials="K.E.H." surname="Hurtta" fullname="Kari Hurtta">
     <organization/>

     <address>
       <postal>
          <street>Kala-Matti 4 B 24</street>
          <city>02230 Espoo</city>
          <country>FI</country>
       </postal>

       <email>hurtta-ietf@elmme-mailer.org
       </email>
       <uri>http://iki.fi/keh/</uri>
     </address>

     </author>

     <date month="March" year="2007" />
     <workgroup>Email Address Internationalization (EAI)</workgroup>


     <abstract>
     <t>Email Address Internationalization (EAI) is implemented
        by allowing UTF-8 characters in the SMTP envelope and in mail
        headers. The UTF8SMTP extension of ESMTP ensures that messages
        with UTF-8 characters in the SMTP envelope or mail
        headers are not delivered to SMTP servers that do not support EAI.
        This document describes a mechanism for keeping messages
        with UTF-8 characters in mail headers away from
        Mail User Agents that do not support EAI. This document also
        describes general requirements for a UTF8SMTP Message Store.
      </t>
     </abstract>


  </front>

   <middle>

       <section title="Introduction">

       <t>
       Internationalized email <xref target="ietf-eai-framework" />
       allows UTF-8 characters
       <xref target="RFC3629" /> in email headers
       <xref target="ietf-eai-utf8headers" />. Internationalized
       email is incompatible with EAI unaware Mail Transfer Agents and
       Mail User Agents. Internationalized email therefore has to be
       kept away from EAI unaware mail agents.
       </t>

       <t>
        The UTF8SMTP extension <xref target="ietf-eai-smtpext" />
        of ESMTP ensures that messages
        with UTF-8 characters in the SMTP envelope or mail
        headers are not delivered to SMTP servers that do not support EAI.
        An EAI compliant Message Store
        likewise needs to ensure that messages
        with UTF-8 characters in mail
        headers are not seen by Mail User Agents that do not support EAI.
       </t>
       <t>
        The IMAP protocol extension <xref target="ietf-eai-imap-utf8" />
        and the POP protocol extension <xref target="ietf-eai-pop" />
        ensure that messages
        with UTF-8 characters in mail
        headers are not seen by Mail User Agents that do not support EAI
        when mail is served via IMAP or POP. This
        document therefore focuses on traditional Unix mailboxes,
        where Mail User Agents access mail directly through the file system.
       </t>
       <t>This document focuses on the requirements for Message Stores
          rather than on particular implementation choices, although some
          suggestions are given. The IMAP and POP protocols assume the
          existence of a Message Store. The requirements given in this
          document also apply to
          Message Stores used by POP and IMAP servers.
       </t>
       <t>Mail submission is not discussed in this document.
       </t>

       </section>

       <section title="Model">

       <t>
       The terms "Mail Transfer Agent" (MTA), "Mail User Agent" (MUA),
       "Message Delivery Agent" (MDA) and "Message Store" (MS) are
       used as defined in <xref target="crocker-email-arch" />.
       </t>
       <t>
       The term "final delivery MTA" is used as defined in
       <xref target="ietf-eai-framework" />.
       </t>
       <t>The term "UTF8SMTP message" is used for messages that
          follow <xref target="ietf-eai-utf8headers" />. </t>
       <t>The term "ASCII header message" is used for messages that
          follow <xref target="RFC2822" />. </t>

       <figure>
         <preamble>
          This document assumes the following model of the mail architecture.
         </preamble>

<artwork>

                +----------+         +---------+
  -- SMTP --->  |  final   |  -----> |         |
     (a)        | delivery |  (b)    |  MDA    |
                |   MTA    |         |         |
                +----------+         +---------+
                                         | |    
                                write    \ /
                                (c)       |     
                                     (---------)
                                     (         )
                                     (    MS   )
                                     (         )
                                     (---------)
                                          |
                                (d)      / \
                                access   | |
                                         | |
                                     +---------+
                                     |         |
                                     |   MUA   |
                                     |         |
                                     +---------+
</artwork>
       </figure>
       <t>
       <list style='hanging'>
         <t hangText="(a)">Mail arrives via SMTP <xref target="RFC2821" />.
         </t>
         <t hangText="(b)">The final delivery MTA passes mail to the MDA
            by a proprietary method or via LMTP <xref target="RFC2033" />.
         </t>
          <t hangText="(c)">The MDA writes the message to the MS.
          </t>
          <t hangText="(d)">The MUA accesses messages in the MS by some means.</t>
       </list>
       </t>
       <t>This model is divided into two special cases:
        <list style='numbers'>
          <t>The MUA accesses the MS via the POP or IMAP protocol, and </t>
          <t>The MUA accesses the MS directly through the file system. </t>
        </list>
        </t>

        <section title="Message Store with IMAP and POP access">

         <figure>
         <preamble>
          This model assumes that MUAs do not access the MS directly
          through the file system.
         </preamble>
<artwork>

                +----------+  
  -- SMTP --->  |  final   |  
     (a)        | delivery |  
                |   MTA    |  
                +----------+  
                    |          
                    | (b)          
                    |          
                    v                     (----)
          +-----------------------+  (c)  (    )
          |         MDA           | ----- (    )
          |.......................|       (    )
          |                       |       ( MS )
          |    mail server        |  (d)  (    )
          |                       | ----- (    )
          |                       |       (    )
          +-----------------------+       (----)
            |         |         |
      (e) IMAP    (f)POP    (g)HTTP
            |         |         |
            |         v         |
     +---------+   +-------+  +--------+
     |         |   |       |  | WWW    |
     |   MUA   |   |  MUA  |  | browser|
     |         |   |       |  +--------+
     +---------+   +-------+
                      |
                    (-------)
                    ( MUA's )
                    (folders)
                    (-------)

</artwork>
       </figure>
       <t>
       <list style='hanging'>
         <t hangText="(a)-(c)">As in the general model. </t>
         <t hangText="(d)">The mail server accesses messages in the MS by
                 some means.</t>
         <t hangText="(e)">The MUA accesses the mail server (IMAP server)
            via IMAP <xref target="RFC3501" />.  </t>
         <t hangText="(f)">The MUA accesses the mail server (POP server)
            via POP <xref target="RFC1939" />.   </t>
         <t hangText="(g)">The WWW browser accesses the mail server
            via HTTP <xref target="RFC2616" />.   </t>
        </list>
        </t>

        </section>

        <section title="Message Store with direct file system access">

         <figure>
         <preamble>
           This model assumes that the MUA accesses the Message Store
           directly through the file system.
         </preamble>
<artwork>

                +----------+         +---------+
  -- SMTP --->  |  final   |  -----> |         |
     (a)        | delivery |  (b)    |  MDA    |
                |   MTA    |         |         |
                +----------+         +---------+
                                         ||
                               (c) write ||
                                         vv
                   (----------------------------)
                   (   MS [incoming mailboxes]  )
                   (----------------------------)
                          | |
                          | | (d) read,write
                          | |
                       +-------+
                       |       |
                       |  MUA  | 
                       |       |
                       +-------+
                          |
                       (-------)
                       ( MUA's )
                       (folders)
                       (-------)

</artwork>
       </figure>
       <t>
       <list style='hanging'>
         <t hangText="(a)-(c)">As in the general model. </t>
         <t hangText="(d)">The MUA accesses the MS directly through the file system.</t>
       </list>
       </t>

       <t>Traditionally, Unix incoming mailboxes are files in the
          /var/mail/ or /var/spool/mail/ directory or
          a similar location. A user's incoming mail is stored
          in a file whose name corresponds to the user's username.
          These files are in "mbox" format
          <xref target="RFC4155" />.
          <list style="hanging">
            <t hangText="NOTE:">Other arrangements and formats
                also exist.</t>
          </list>
        </t>
        </section>


       </section>

       <section title="Message Store requirements"
           anchor="ms-requirements">

       <t>
       The requirements for a UTF8SMTP aware Message Store are as follows:
       <list  style="symbols">
       <t>UTF8SMTP messages must be accessible to UTF8SMTP aware MUAs
          in UTF8SMTP form without loss of information.
       </t>
       <t>UTF8SMTP ignorant MUAs must not see UTF8SMTP messages
          (messages can be hidden or downgraded
           <xref target="ietf-eai-downgrade" />
          for UTF8SMTP ignorant MUAs).
       </t>
       <t>All messages should be accessible to UTF8SMTP aware MUAs
          in the form in which the MS received them. This
          requirement is needed so that any signatures and
          hashes calculated over the message can be verified.
       </t>
       </list>
       </t>

       <t>There are two main reasons why UTF8SMTP ignorant MUAs
          must not see UTF8SMTP messages:
          <list  style="symbols">
             <t>Earlier standards allow only ASCII in header fields.
                UTF-8 may therefore cause the MUA to malfunction. </t>
             <t>The address syntax of UTF8SMTP header fields includes
                a fallback address. UTF8SMTP ignorant MUAs
                are therefore unable to parse these header fields. </t>
          </list>
       </t>


       <t>Notes:
         <list  style="symbols">
         <t>A UTF8SMTP ignorant MUA is not required to be able to see a
            UTF8SMTP message in downgraded form. When MUAs access the MS
            through the file system, this is considered too difficult
            (or wasteful) a requirement. </t>
         <t>Although this document suggests that messages with ASCII
            <xref target="ASCII"/>
            header fields and UTF8SMTP messages be handled separately, a
            UTF8SMTP aware MS is allowed to handle all messages as
            UTF8SMTP messages, because
            ASCII header messages can be treated as a subset of UTF8SMTP
            messages. In that case a UTF8SMTP ignorant MUA
            does not see any messages (unless downgrading is provided).</t>
         <t>In some environments it is common that Subject: and some other
            header fields are sometimes not encoded, but instead use some
            arbitrary 8-bit
            (presumably system local) character set. If the character set
            used is not UTF-8, these are not UTF8SMTP messages. Unlike
            ASCII header messages, these messages therefore cannot be
            treated as UTF8SMTP messages.
         </t>
         <t>The first requirement (messages accessible in UTF8SMTP form)
            in practice forbids storing messages only in downgraded
            form.</t>
         </list>
       </t>


       </section>

       <section title="Mail Delivery Agent requirements">

       <t>
       The requirements for a UTF8SMTP aware Mail Delivery Agent are as follows:
         <list  style="symbols">
         <t>The MDA must not deliver UTF8SMTP messages to a UTF8SMTP
            unaware Message Store.
         </t>
         </list>
       </t>
       
       <t>Notes:
         <list  style="symbols"> 
         <t>
           In practice this means that if the Mail Delivery Agent
           is UTF8SMTP aware, the MS must be constructed so that the
           requirements of the <xref target="ms-requirements">previous
           section</xref> are fulfilled.
         </t>
         <t>UTF8SMTP messages do not carry a label that tells
            which messages are UTF8SMTP messages and which are
            ASCII header messages.
            Determining whether a message is a UTF8SMTP message may require
            parsing all of its header fields (including header fields
            of MIME body parts). Because
            MDAs are not expected to parse
            messages, it is suggested that the final delivery MTA
            pass that information to the MDA. </t>
         <t>In some legacy environments it is common that Subject: and
            some other header fields are not encoded. The presence
            of 8-bit bytes therefore does not by itself indicate that a
            message is a UTF8SMTP message. It is also necessary to test
            that the sequences of 8-bit bytes form valid UTF-8
            characters. </t> 
         </list>
       </t>
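       <figure>
         <preamble>
          The test described in the last note could be sketched in Python
          as follows. This is a minimal illustration and not part of the
          requirements; the function name classify_header and its three
          return labels are hypothetical. A header that passes the UTF-8
          test is only a candidate UTF8SMTP message, because text in
          another 8-bit character set can occasionally form valid UTF-8
          sequences by accident.
         </preamble>
<artwork><![CDATA[
```python
def classify_header(raw: bytes) -> str:
    """Classify raw header bytes as 'ascii', 'utf8' or 'other-8bit'.

    Hypothetical helper: the labels are illustrative only.
    """
    # No byte has the high bit set: plain ASCII header.
    if all(b < 0x80 for b in raw):
        return "ascii"
    # 8-bit bytes are present; check that they form valid UTF-8.
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # Some other (presumably system local) 8-bit character set.
        return "other-8bit"
    return "utf8"
```
]]></artwork>
         <postamble>
          For example, a Subject: field stored in ISO 8859-1 would be
          classified as 'other-8bit', while the same text encoded in
          UTF-8 would be classified as 'utf8'.
         </postamble>
       </figure>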

       </section>

       <section title="Final delivery MTA requirements">

       <t>
         UTF8SMTP aware MTAs must fulfil the requirements of
         <xref target="ietf-eai-smtpext" />. For a UTF8SMTP aware
         final delivery MTA there is the following additional requirement:
         <list  style="symbols">
         <t>The final delivery MTA must not deliver UTF8SMTP messages
            to a UTF8SMTP unaware mail delivery agent.
         </t>
         </list>
       </t>

         <t>Notes:
         <list  style="symbols"> 
           <t>
           If the final delivery MTA is UTF8SMTP aware, it is recommended
           that the MDA be UTF8SMTP aware as well.
           </t>
           <t>
           If the final delivery MTA and the
           MDA communicate using LMTP, a UTF8SMTP keyword in the response
           to the LHLO command indicates that the MDA is UTF8SMTP aware.
           </t>
           <t>
           In general, the final delivery MTA can be expected to parse
           messages, so it knows which of them are UTF8SMTP messages.
           Passing that information to the MDA may require that
           LMTP be extended.
           </t>
         </list>
         </t>
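         <figure>
           <preamble>
            The LHLO check mentioned in the notes could look roughly like
            the following Python sketch. It assumes that the MDA lists
            UTF8SMTP as an extension keyword in a multi-line 250 reply, in
            the same way ESMTP extensions are listed; the function name
            lhlo_advertises and the example host name are hypothetical.
           </preamble>
<artwork><![CDATA[
```python
def lhlo_advertises(response: str, keyword: str = "UTF8SMTP") -> bool:
    """Return True if a 250 reply to LHLO lists `keyword`.

    Hypothetical helper; `response` is the complete reply text.
    """
    lines = response.splitlines()
    # The first line carries the greeting, so extension keywords
    # are looked for from the second line onwards.
    for line in lines[1:]:
        # Each line must look like "250-..." or "250 ...".
        if len(line) < 4 or not line.startswith("250"):
            return False
        if line[3] not in "- ":
            return False
        words = line[4:].split()
        if words and words[0].upper() == keyword.upper():
            return True
    return False
```
]]></artwork>
           <postamble>
            A final delivery MTA could call such a function once per LMTP
            session and refuse to hand UTF8SMTP messages to an MDA for
            which it returns False.
           </postamble>
         </figure>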

       </section>

       <section title="Traditional Unix mailboxes">

       <t>This section suggests how traditional
          Unix mailboxes can be handled.
       </t>
       <t>Assume that incoming mail for a user is stored in the
          /var/mail/{username} file on a UTF8SMTP unaware MS in
          "mbox" format <xref target="RFC4155" />,
          where {username} is the user's account name.
          To make the MS UTF8SMTP aware, it is modified in the following
          way:
         <list  style="symbols"> 
           <t>UTF8SMTP messages are stored in the /var/mail/{username}:UTF8
              file by the MDA (that is, for UTF8SMTP messages ':UTF8' is
              appended to the name of the file).
              If the environment is fully UTF8SMTP aware, all messages
              are stored in the /var/mail/{username}:UTF8 file
              by the MDA.
           </t>
           <t>ASCII header messages can still be stored in the
              /var/mail/{username} file.
           </t>
           <t>The MDA is allowed (but not required) to store a copy of a
             UTF8SMTP message in downgraded form in the
             /var/mail/{username} file. This means that if
             downgrading is provided, the storage space required for
             those messages is roughly doubled. If downgrading fails,
             the MDA must not reject the message, but must instead store
             just the original
             UTF8SMTP message in the /var/mail/{username}:UTF8 file.
           </t>
           <t>In some environments it is common that Subject: and
            some other header fields are not encoded and use some
            arbitrary 8-bit character set (presumably the local character
            set of the system). If UTF-8 is not used in the header
            fields, these messages are not UTF8SMTP messages
            and are therefore not stored in the /var/mail/{username}:UTF8
            file. The handling of these messages is not changed.
           </t>

         </list>
       </t>
       <t>It is very unlikely that Unix account names include ':'
          characters, so the ':UTF8'
          suffix is not expected to conflict with users' account names.
       </t>        
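       <figure>
         <preamble>
          The storage rules for the MDA can be sketched in Python as
          follows. This is only an illustration under stated assumptions:
          the function name plan_delivery, its parameters, and the idea of
          passing the downgrading step in as a callable are all
          hypothetical, and the sketch only computes which file receives
          which copy rather than performing mbox locking or writing.
         </preamble>
<artwork><![CDATA[
```python
import posixpath


def plan_delivery(username, message, is_utf8smtp,
                  downgrade=None, spool="/var/mail"):
    """Return a list of (path, message) pairs the MDA would append.

    Hypothetical helper. `downgrade` is an optional callable that
    returns the downgraded form of `message` or raises on failure.
    """
    plain = posixpath.join(spool, username)
    if not is_utf8smtp:
        # ASCII header messages stay in the traditional mailbox.
        return [(plain, message)]
    # The original UTF8SMTP message always goes to the :UTF8 file.
    plan = [(plain + ":UTF8", message)]
    if downgrade is not None:
        try:
            plan.append((plain, downgrade(message)))
        except Exception:
            # Downgrading failed: keep only the original message
            # and do not reject the delivery.
            pass
    return plan
```
]]></artwork>
         <postamble>
          When downgrading is offered, a UTF8SMTP message therefore
          produces two entries in the plan, which reflects the increased
          storage requirement.
         </postamble>
       </figure>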

        <t>A UTF8SMTP aware MUA needs to do the following:
           <list  style="symbols"> 
             <t>When the incoming mailbox is opened, both files,
                /var/mail/{username} and /var/mail/{username}:UTF8,
                are parsed.
             </t>
             <t>If a message with the same Message-ID header field is
                present in both the /var/mail/{username} and
                /var/mail/{username}:UTF8 files, only the message
                from the /var/mail/{username}:UTF8 file is presented.
             </t>
             <t>A UTF8SMTP aware MUA must not write UTF8SMTP messages
                to the /var/mail/{username} file.
             </t>
           </list>  
        </t>
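        <figure>
          <preamble>
           The Message-ID rule for a UTF8SMTP aware MUA could be sketched
           in Python as follows. The function name merge_incoming is
           hypothetical, and the sketch assumes that both mailbox files
           have already been parsed into lists of message objects (for
           example with the standard email package).
          </preamble>
<artwork><![CDATA[
```python
from email import message_from_bytes


def _message_id(msg):
    """Return the stripped Message-ID of `msg`, or '' if absent."""
    return str(msg.get("Message-ID") or "").strip()


def merge_incoming(plain_msgs, utf8_msgs):
    """Merge two parsed mailboxes for presentation (hypothetical).

    `plain_msgs` come from /var/mail/{username} and `utf8_msgs`
    from /var/mail/{username}:UTF8; both must be lists.
    """
    utf8_ids = {_message_id(m) for m in utf8_msgs}
    utf8_ids.discard("")
    merged = list(utf8_msgs)
    for msg in plain_msgs:
        # Skip the downgraded copy when the original is available.
        if _message_id(msg) in utf8_ids:
            continue
        merged.append(msg)
    return merged
```
]]></artwork>
          <postamble>
           The sketch simply lists messages from the :UTF8 file first; a
           real MUA would presumably sort the merged list, for example by
           arrival order.
          </postamble>
        </figure>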

        <t>It is recommended that the MDA provide configuration
           that allows specifying
           <list  style="symbols"> 
             <t>which users accept only non-UTF8SMTP messages, and </t>
             <t>which users also accept UTF8SMTP messages.</t>
           </list>
              In the absence of such configuration, the following heuristic
              is suggested:
              <list  style="symbols"> 
              <t>The presence of the /var/mail/{username}:UTF8 file
                 indicates that the user accepts UTF8SMTP messages.</t>
              <t>If the /var/mail/{username} file exists and the
                 /var/mail/{username}:UTF8 file does not exist,
                 the user can be treated as accepting
                 only non-UTF8SMTP messages. </t>
              <t>The presence of the /var/mail/{username}:UTF8 file without
                 the /var/mail/{username} file can be taken to mean that
                 all messages should be treated as UTF8SMTP messages. </t>
              <t>If neither the /var/mail/{username} nor the
                 /var/mail/{username}:UTF8 file exists, that can be
                 treated as a temporary error condition. </t>
              </list>
        </t>
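        <figure>
          <preamble>
           The heuristic can be written down as a small decision function,
           sketched here in Python. The function name user_policy and the
           four result labels are hypothetical, and the sketch reads the
           last case of the heuristic as the one where neither file exists,
           which is an interpretation rather than something the text
           states unambiguously. The exists parameter is there only so
           that the logic can be exercised without touching a real spool
           directory.
          </preamble>
<artwork><![CDATA[
```python
import os


def user_policy(username, spool="/var/mail", exists=os.path.exists):
    """Guess what a user accepts from the mailbox files present.

    Hypothetical helper; the returned labels are illustrative.
    """
    plain = spool + "/" + username
    utf8 = plain + ":UTF8"
    has_plain = exists(plain)
    has_utf8 = exists(utf8)
    if has_utf8 and has_plain:
        # Both files: UTF8SMTP and ASCII header messages are split.
        return "accepts-utf8smtp"
    if has_utf8:
        # Only the :UTF8 file: treat every message as UTF8SMTP.
        return "all-as-utf8smtp"
    if has_plain:
        # Only the traditional file: non-UTF8SMTP messages only.
        return "non-utf8smtp-only"
    # Neither file exists (assumed reading of the last case).
    return "temporary-error"
```
]]></artwork>
        </figure>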
         
       </section>


       <section title="IANA Considerations">

       <t>There are no IANA considerations in this document.</t>

       </section>

       <section title="Security Considerations">

       <t>If a user uses a UTF8SMTP unaware MUA, UTF8SMTP
          messages may appear to that user to have gone to the bit bucket,
          although as far as the sender is concerned they appear to
          have been delivered.</t> 

       <t>See the "Security Considerations" section of
          <xref target="ietf-eai-framework" /> for more discussion. 
       </t>

       </section>

       <section title="Acknowledgements">

       <t>Various requirements and ideas were suggested
          in discussions on the IMA mailing list.
       </t>

       </section>

   </middle>


   <back>


<!--                                   -->

<references title="Normative References">

 <reference anchor="RFC2822">
     <front>
       <title>
          Internet Message Format
       </title>
       <author role="editor" initials="P.W.R." surname="Resnick"
              fullname="Peter W. Resnick">
          <organization>QUALCOMM Incorporated</organization>
       </author>
       <date month="April" year="2001" />
     </front>
     <seriesInfo name='RFC' value='2822' />
 </reference>

 <reference anchor="RFC3629">
    <front>
      <title>
        UTF-8, a transformation format of ISO 10646
      </title>
      <author initials="F.Y." surname="Yergeau" fullname="Francois Yergeau">
            <organization /></author>
      <date month="November" year="2003" />
     </front>
    <seriesInfo name='RFC' value='3629' />
    <seriesInfo name='STD' value='63' />
 </reference>

 <reference anchor="ietf-eai-framework">
    <front>
        <title>
        Overview and Framework for Internationalized Email
        </title>
        <author initials="J.C.K." surname="Klensin" fullname="John C Klensin">
            <organization /></author>
        <author initials="Y.W.K." surname="Ko" fullname="YangWoo Ko">
            <organization>ICU</organization>
        </author>
        <date month="February" year="2007" />
    </front>

    <seriesInfo name="Internet-Draft" value="draft-ietf-eai-framework-05" />
</reference>

<reference anchor="ietf-eai-utf8headers">
    <front>
    <title>
    Internationalized Email Headers
    </title>
    <author initials="J.Y." surname="Yeh" fullname="Jeff YEH">
        <organization>TWNIC</organization>
    </author>
    <author surname="Yang" fullname="Abel Yang">
        <organization>TWNIC</organization>
    </author>
    <date month="March" day="4" year="2007" />
    </front>
    <seriesInfo name="Internet-Draft" value="draft-ietf-eai-utf8headers-04"/>
</reference>

<reference anchor="ietf-eai-downgrade">
     <front>
     <title>
     Downgrading mechanism for Email Address Internationalization
     </title>
     <author initials="Y.Y." surname="YONEYA" fullname="Yoshiro YONEYA"
       role="editor">
        <organization> JPRS </organization> 
     </author>
     <author initials="K.F." surname="Fujiwara" fullname="Kazunori Fujiwara"
       role="editor">
        <organization> JPRS </organization> 
     </author>
     <date month="March" day="5" year="2007" />
     </front>
   <seriesInfo name="Internet-Draft" value="draft-ietf-eai-downgrade-03"/>

</reference>
 
<reference anchor="ietf-eai-smtpext">
    <front>
    <title>
     SMTP extension for internationalized email address
    </title>
    <author role="editor" initials="J.K.Y." surname="Yao" fullname="Jiankang YAO">
    <organization> CNNIC</organization>
    </author>
    <author role="editor" initials="W.M." surname="Mao" fullname="MAO Wei">
    <organization> CNNIC</organization>
    </author>
    <date month="March"  year="2007" />
    </front>
    <seriesInfo name="Internet-Draft" value="draft-ietf-eai-smtpext-04"/>
</reference>

<reference anchor="crocker-email-arch">
    <front>
    <title>Internet Mail Architecture</title>
    <author initials="D.C." surname="Crocker" fullname="Dave Crocker">
      <organization>Brandenburg InternetWorking</organization>
    </author>

    <date month="March" day="4" year="2007" />
    </front>
    <seriesInfo name="Internet-Draft" 
       value="draft-crocker-email-arch-06" />
</reference>

</references>

<!--                                   -->

<references title="Informative References">

<reference anchor='ASCII'>
        <front>
          <title>USA Code for Information Interchange</title>
          <author>
            <organization abbrev="ANSI">
              American National Standards Institute
                (formerly United States of America Standards Institute)
            </organization>
          </author>
          <date year="1968"/>
        </front>
      <seriesInfo name="ANSI" value="X3.4-1968" />
      <annotation>ANSI X3.4-1968 has been replaced by newer
          versions with slight modifications, but the 1968 version
          remains definitive for the Internet. </annotation>
 </reference>

<reference anchor='RFC1939'>
   <front>
      <title>Post Office Protocol - Version 3</title>
      <author initials="J.G.M." surname="Myers" fullname="John G. Myers">
        <organization>Carnegie-Mellon University</organization>
      </author>
      <author initials="M.T.R." surname="Rose" fullname="Marshall T. Rose">
         <organization>Dover Beach Consulting, Inc.</organization>
       </author>
      <date month="May" year="1996" />
   </front>
   <seriesInfo name='RFC' value='1939' />
   <seriesInfo name='STD' value='53' />
</reference>

<reference anchor='RFC2033'>
    <front>
      <title>Local Mail Transfer Protocol</title>
      <author initials="J.G.M." surname="Myers" 
         fullname="John G. Myers">
         <organization>Carnegie-Mellon University</organization>
      </author>
      <date month="October"  year="1996" />
    </front>
    <seriesInfo name='RFC' value='2033' />
</reference>

<reference anchor='RFC2616'>
    <front>
       <title>Hypertext Transfer Protocol -- HTTP/1.1</title>
       <author initials="R.T.F." surname="Fielding" fullname="Roy T. Fielding">
             <organization abbrev="UC Irvine">
               Department of Information and Computer Science
             </organization>
       </author>
       <author initials="J.G." surname="Gettys" fullname="James Gettys">
             <organization abbrev="Compaq/W3C">
               World Wide Web Consortium
             </organization>
       </author>
       <author initials="J." surname="Mogul" fullname="Jeffrey C. Mogul">
              <organization abbrev="Compaq">
                Compaq Computer Corporation
              </organization>
       </author>
       <author initials="H." surname="Frystyk" 
              fullname="Henrik Frystyk Nielsen">
              <organization abbrev="W3C/MIT">
                 World Wide Web Consortium
              </organization>
       </author>
       <author initials="L." surname="Masinter" fullname="Larry Masinter">
              <organization abbrev="Xerox">
                 Xerox Corporation
              </organization>
       </author>
       <author initials="T." surname="Berners-Lee" fullname="Tim Berners-Lee">
               <organization abbrev="W3C/MIT">
                  World Wide Web Consortium
               </organization>
       </author>
       <date month="June"  year="1999" />
    </front>
    <seriesInfo name='RFC' value='2616' /> 
</reference>

<reference anchor='RFC2821'>
    <front>
    <title>Simple Mail Transfer Protocol</title>
        <author role="editor" initials="J.C.K." surname="Klensin" 
             fullname="John C Klensin">
            <organization>AT&amp;T Laboratories</organization>
         </author>
    
       <date month="April"  year="2001" />
    </front>
    <seriesInfo name='RFC' value='2821' />
</reference>

<reference anchor='RFC3501'>
    <front>
       <title>INTERNET MESSAGE ACCESS PROTOCOL - VERSION 4rev1</title>
       <author initials="M.R.C." surname="Crispin" fullname="Mark R. Crispin">
          <organization>Networks and Distributed Computing</organization>
        </author>
       <date month="March"  year="2003" />
    </front>
    <seriesInfo name='RFC' value='3501' />
</reference>

<reference anchor='RFC4155'>
    <front>
       <title>The application/mbox Media Type</title>
       <author initials="E.A.H." surname="Hall" fullname="Eric A. Hall">
          <organization />        
       </author>
       <date month="September"  year="2005" /> 
    </front>
    <seriesInfo name='RFC' value='4155' />
</reference>

<reference anchor="ietf-eai-imap-utf8">
    <front>
      <title>IMAP Support for UTF-8</title>
      <author initials="P.R." surname="Resnick" fullname="Pete Resnick">
        <organization>QUALCOMM Incorporated </organization>
      </author>
      <author initials="C.N." surname="Newman" fullname="Chris Newman">
        <organization>Sun Microsystems</organization>
      </author>
      <date month="March"  year="2007" />
    </front>
    <seriesInfo name="Internet-Draft" value="draft-ietf-eai-imap-utf8-01" />
</reference>

<reference anchor="ietf-eai-pop">
    <front>
      <title>POP3 Support for UTF-8</title>
      <author initials="C.N." surname="Newman" fullname="Chris Newman">
        <organization>Sun Microsystems</organization>
      </author>
      <date month="January" day="25" year="2007" />
    </front>
    <seriesInfo name="Internet-Draft" value="draft-ietf-eai-pop-01" />
</reference>

   </references>

   </back>


</rfc>
