mbox  –  packer plugin for Total Commander

version 1.10                Freeware                 © Jürgen Lüthje 2003-2005


Contents

What it is
Install
Usage
Names of extracted mail files
Note concerning long file names
Mail / news programs and mbox files
Background information about the mbox format
For programmers
References
Credits
License
Contact


What it is

mbox.wcx is a 32-bit Windows DLL for reading and writing Unix mailbox files (mbox format). This packer plugin basically makes it possible for Total Commander to treat mbox files like "directories" of mail files.

Install

This packer plugin works with Total Commander (TC) version 5.50 and later (some functions are not supported by older TC versions). I generally recommend using the most recent version of TC.

Semi-automatic installation (TC 6.50 or later required)
Start TC and open the directory that contains the file mbox110.zip. Double-click the file, or select it and press Enter; TC then leads you through the installation process. The file extensions mbx and mbs are proposed for association with the plugin.

Manual installation
1. Unzip the file mbox.wcx to a directory, e.g. C:\Totalcmd\Plugins.
2. In Total Commander, choose "Configuration" > "Options" > "Packer".
3. Click the [Configure packer extension DLLs] button.
4. Type the file extension that you want to associate with mbox.wcx (normally "mbx").
4a. If an older version of mbox.wcx is installed on your PC, uninstall it by choosing "Associate with:" [none].
      Then click [OK], and continue with step 3.
5. Click [New type], and select the file mbox.wcx.
6. Click [OK].

Carry out the association process for each additional file extension that you want to associate with the plugin.

Languages
The built-in language is English. The plugin can read language files that are in the same directory as the file mbox.wcx. It comes with language files for Dutch, German and Polish, which will be used automatically if appropriate.
To get a file in your own language, translate one of the existing language files. %s is used as a placeholder for file names, %d and %02d as placeholders for numbers. Take care not to add or remove placeholders and line breaks (\n), or to change their position relative to each other. If you name your file "default.lng" and put it in the directory where the plugin is, mbox uses it as its language file (it might be necessary to restart Total Commander).
If you send me the file, I will offer it for download on my website, and include it in the next release of mbox, so that your language will then be supported automatically.
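
A translated line can be checked against the original by comparing the sequence of placeholders. The following Python sketch is not part of mbox; the function names and the sample strings in the usage note are made up for illustration, and it assumes that the only placeholders are %s, %d, %02d and the two-character sequence \n.

```python
import re

# Matches the placeholders named above, plus the literal two-character
# sequence "\n" as it is written in a language file.
PLACEHOLDER = re.compile(r"%s|%d|%02d|\\n")


def placeholders(line):
    """Return the placeholders of one language-file line, in order."""
    return PLACEHOLDER.findall(line)


def same_placeholders(original, translated):
    """True if both lines contain the same placeholders in the same order."""
    return placeholders(original) == placeholders(translated)
```

For example, same_placeholders("Cannot open %s", "Kann %s nicht öffnen") is true, while a translation that swaps %d and %s is rejected.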

Usage

Total Commander can handle several types of archives; this plugin adds support for the mbox file type. If you don't know what archives are and how to work with them in Total Commander, please read Total Commander help, Chapter 3.e. "Archive handling (ZIP etc)" first.

All functions that Total Commander provides for working with archives are supported by the plugin. Here is an overview:
* list the mails inside mbox files, and sort them by subject, size, or date
   (for consistent sorting by subject, SortUpper in 'wincmd.ini' must be 0 (= Standard) or 1)
* compare the contents of two open mbox files, and synchronize them
* compare two mails in the same or in different mailboxes
* view or edit selected mails
* extract messages to mail files (including all attachments)
* search for mails inside mbox files (using TC's powerful searching capabilities)
* delete selected mails from an mbox file
* create new mbox files
* add mails (including all attachments) to an open mbox file
* test whether mbox files are intact, and whether they are in a form that most e-mail
   programs can probably read  (You'll get no message if everything is OK.)

With all these functions, Total Commander in conjunction with this plugin is a powerful tool, e.g. for finding and deleting duplicate mails, and for synchronizing mails in different mbox files.

Total Commander can generally do more with an archive file when its extension is associated with the corresponding packer plugin. The following overview shows some differences that I'm aware of (tested with TC 6.53):

Open
    associated:      Ctrl + PageDown, Enter, or double-click
    not associated:  Ctrl + PageDown
                     In rare cases this does not work because another packer plugin catches the keypress. Then open the file 'wincmd.ini', and move all lines that contain mbox.wcx to the beginning of the [PackerPlugins] section.

Commands > Search... with [v] Search archives
    associated:      automatically supported
    not associated:  The file 'wincmd.ini' must be changed: Say your archives have the extension "box". Then in the [Configuration] section you have to add the line
                         SearchInFiles = *.box

Copy files from one open archive to another
    associated:      supported
    not associated:  not supported if the target archive is not associated

Synchronize dirs
    associated:      compare and delete supported, copy not supported
    not associated:  not supported if at least one of the archives is not associated
The plugin only adds files to an mbox if it can identify them as e-mail, because only then will it be able to find them inside the mbox again later.

Each extracted mail file gets a time stamp according to its Date header field. Since this time stamp is expressed in Coordinated Universal Time (UTC) for all mails, messages from all parts of the world (e.g. on a mailing list) can be sorted by time in a consistent way. If the Date header field of a mail contains invalid data, "01.01.1980 00:00:00" is used as the time stamp.
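
The conversion described above can be illustrated with a short Python sketch. It is not the plugin's code (the plugin is written in Euphoria); the names mail_timestamp and FALLBACK are hypothetical, and what counts as "invalid data" here is simply whatever Python's standard parser rejects, which may differ from the plugin's own checks.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Fallback time stamp for invalid Date fields, as described in the text.
FALLBACK = datetime(1980, 1, 1, tzinfo=timezone.utc)


def mail_timestamp(date_header):
    """Return the UTC time stamp for the body of a Date header field."""
    try:
        parsed = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # Unparseable field (the exception type depends on the Python version).
        return FALLBACK
    if parsed is None:
        return FALLBACK
    if parsed.tzinfo is None:
        # A "-0000" zone yields a naive datetime; assumption: treat it as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)
```

Two mails sent at the same moment from different time zones, e.g. "Sat, 29 Oct 2005 14:30:00 +0200" and "Sat, 29 Oct 2005 07:30:00 -0500", get the same time stamp this way and therefore sort together.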

The plugin locks the mbox files while it reads from or writes to them, so that they can't be altered by other programs at the same time. The plugin can read even corrupted files that contain binary data. When extracting messages, every ASCII character 26 ("End of File" marker) is replaced with the string "<EOF>", so even corrupted messages can be opened with any text editor after extraction. If the last line of a message contains only a dot, that line will not be copied to the mail file during extraction, because some programs cannot handle mail files containing such a line correctly.
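
The two clean-up steps applied during extraction can be sketched in Python as follows. This only illustrates the behaviour described above and is not the plugin's code; the function name clean_message is hypothetical.

```python
def clean_message(raw):
    """Apply the two extraction clean-ups to the bytes of one message."""
    # Replace every ASCII character 26 (Ctrl-Z) with the visible string "<EOF>".
    cleaned = raw.replace(b"\x1a", b"<EOF>")
    # Split into lines but keep the line endings, so nothing else changes.
    lines = cleaned.splitlines(keepends=True)
    # Drop the last line if it consists of a single dot only.
    if lines and lines[-1].strip(b"\r\n") == b".":
        lines.pop()
    return b"".join(lines)
```

A dot-only line anywhere else in the message is left untouched; only the final line is checked.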

This software has been used to extract messages from mailboxes with a size up to about 190 MB, containing more than 38,000 messages. mbox files up to 2 GB shouldn't cause problems, as long as there is sufficient (virtual) memory available on the machine. It will take some time to open a huge file, though. This is because an mbox file doesn't contain any header data, and therefore the whole file must be parsed on opening.

Names of extracted mail files

An mbox does not contain file names, so the plugin must create them itself. Each mail file is named after its subject. In doing so, special characters that are not allowed in FAT32 and NTFS file names under Windows [1] are replaced:
    " is replaced with '
    : is replaced with .
    /\?*<>| are each replaced with a blank


"Re:", "Re[2]:", "Re[3]:", "Re[4]:", "Aw:", and "Fw:" at the beginning of a name, and superfluous blanks and tabs are removed. Long file names are truncated, so that they don't exceed a maximum of 60 characters (including the extension ".eml"). A truncated name is denoted by "...". If a mail doesn't have a Subject header field, or the field body is empty, "[no subject]" is used as file name.
In order to get a unique name, '_' and a 9-digit hexadecimal number (representing date, time and time zone of the mail) are appended to the file name. This way, we'll almost always get a file name that only depends on characteristics of the message itself.

When there are duplicate names, serial numbers in square brackets will be added to all names except the first one, e.g.
    important_message_2F5B5B73A    eml
    important_message_2F5B5B73A[2] eml
    important_message_2F5B5B73A[3] eml
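
The replacement and numbering rules of this section can be sketched in Python. This is an illustration, not the plugin's code: the function names are hypothetical, and the sketch leaves out the truncation to 60 characters and the hexadecimal suffix, because their exact details are not given here. It further assumes that prefixes are matched case-sensitively, that several prefixes in a row are all removed, and that a subject consisting only of prefixes counts as empty.

```python
import re

# Character replacements listed above: " -> ', : -> ., /\?*<>| -> blank.
REPLACEMENTS = str.maketrans({'"': "'", ":": ".", **{c: " " for c in '/\\?*<>|'}})

# Reply/forward prefixes listed above, possibly repeated (assumption).
PREFIX = re.compile(r"^(?:(?:Re(?:\[[234]\])?|Aw|Fw):\s*)+")


def base_name(subject):
    """Derive a file name (without suffix and extension) from a subject."""
    if subject is None or not subject.strip():
        return "[no subject]"
    name = PREFIX.sub("", subject.strip())
    name = name.translate(REPLACEMENTS)
    name = " ".join(name.split())  # remove superfluous blanks and tabs
    return name or "[no subject]"


def number_duplicates(names):
    """Append [2], [3], ... to every repeated name except the first one."""
    seen = {}
    result = []
    for name in names:
        seen[name] = seen.get(name, 0) + 1
        count = seen[name]
        result.append(name if count == 1 else f"{name}[{count}]")
    return result
```

The prefix is removed before the characters are replaced, because otherwise the colon in "Re:" would already have been turned into a dot.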

Note concerning long file names

Long file names (LFN) can cause unexpected problems on FAT file systems, but not on NTFS file systems. (To see which file system a drive uses: "My Computer" > right-click the drive in question > "Properties".)

LFN are stored using a series of linked directory entries. An LFN uses one directory entry for its short 8.3 name, and a hidden secondary directory entry for every 13 characters in its long name (including dot and extension). So a file name that is 120 characters long uses 11 entries!
This can cause problems on FAT file systems, because on those the number of entries in a directory is limited! That means if you write too many files with too long names into one directory, this directory will at some point be "full", even if there is enough free space on the disk!
E.g. on FAT32 under Windows 98 there seems to be a maximum of 65,535 entries per directory. Say we have an mbox file that contains about 20,000 mails, and for the sake of a simple calculation let's assume that the names of all these mails have the same length. Then each file may use at most 3 directory entries (65,535 / 20,000 is about 3.3): one for the short name and two for the long name. So if the whole mbox file is to be unpacked to one directory, the names of the mails must not be longer than 2 x 13 = 26 characters.
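
The arithmetic behind these numbers, as a small Python sketch. The function names are hypothetical, and the sketch follows the simplified model used above: every name is treated as a long name, and all names in the directory have the same length.

```python
import math


def directory_entries(name_length):
    """Directory entries used by one long file name (dot and extension included)."""
    # One entry for the short 8.3 name, plus one hidden entry per 13 characters.
    return 1 + math.ceil(name_length / 13)


def max_name_length(max_entries, file_count):
    """Longest name length that lets file_count files fit into one directory."""
    entries_per_file = max_entries // file_count
    # One entry per file is needed for the 8.3 name; the rest hold 13 characters each.
    return max(entries_per_file - 1, 0) * 13
```

With these definitions, directory_entries(120) gives 11, and max_name_length(65535, 20000) gives 26, matching the figures in the text.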

So if your disk has a FAT file system, and Total Commander shows the message
    "Error writing <filename>.eml"
although it has already successfully written many files to the directory in question, and although there is enough free space on the disk, then this LFN problem might be the cause. In this case unpack the messages in the mbox file to several separate directories, or unpack them to a different hard disk that uses the NTFS file system.

Mail / news programs and mbox files

Some programs can't read mbox files generated by certain other programs. Some programs can only write/export mbox files while others can only read mail files (according to RFC 2822 [5]), and vice versa. This plugin can therefore be very useful for migrating from one program to another (see also Examples of e-mail conversion).

Many mail clients and news readers such as Mozilla Mail, Thunderbird, Opera Mail, Pegasus Mail, Becky, The Bat, PocoMail, Forte Agent, and 40tude Dialog can read from or write to mbox files. Several programs use mbx as the extension for mbox files; Opera Mail uses mbs. Mozilla Mail and Thunderbird are somewhat special: their mbox files normally don't have any extension at all.

Before you change mbox files that are used by your mail or news client, be careful, and keep a backup of the original files. The general procedure is as follows:
First, "compact" the relevant folder(s) in your client, so that messages that have previously been "deleted" are actually removed from the corresponding mbox file. Then exit the program, and work on the mbox file(s) with Total Commander. Most mail or news clients keep a corresponding index file for each mbox file, which contains internal information. E.g. in Thunderbird the file "Inbox" is an mbox file that contains the messages, and the file "Inbox.msf" is the index file. Once you have changed the mbox file, the information in its index file is no longer valid. So delete the index file; your mail client will create a new one the next time it starts.

Background information about the mbox format [2,3]

This is a common format for storage of mail messages. There is no true specification of it, though. An mbox is a text file containing an arbitrary number of e-mail messages. Each message consists of a postmark, followed by an e-mail message, formatted according to RFC 2822 [5]. The file format is line-oriented. The postmark is a line that begins with the string "From " (note the space!), not followed by a colon. Because of the wide range of variations in practice, nothing else on the "From " line should be considered.

However, this software does not regard every such "From " line as the beginning of a new message, because sometimes it is a normal line in the text body of the mail (e.g. "From now on ..."). Only if the lines immediately following the "From " line look like an e-mail header is this "From " line regarded as the delimiter between two messages. This makes the program robust, and it recognizes even syntactically incorrect headers (see RFC 2822 [5]), if they are not too seriously damaged.
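
The idea can be sketched in Python as follows. This is a deliberately simplified illustration, not the plugin's algorithm: the names are hypothetical, and the sketch only checks whether the single line after the "From " line looks like a header field ("name:"), whereas the plugin's real test is more tolerant and is not described in detail here.

```python
import re

# A header field name: printable ASCII characters except the colon, then a colon.
HEADER_LINE = re.compile(r"^[!-9;-~]+:")


def is_postmark(lines, i):
    """True if line i starts a new message (simplified one-line look-ahead)."""
    if not lines[i].startswith("From "):
        return False
    return i + 1 < len(lines) and HEADER_LINE.match(lines[i + 1]) is not None


def split_mbox(text):
    """Split mbox text into messages; the postmark lines are not included."""
    lines = text.splitlines()
    messages = []
    current = None
    for i, line in enumerate(lines):
        if is_postmark(lines, i):
            current = []
            messages.append(current)
        elif current is not None:
            current.append(line)
    return ["\n".join(message) for message in messages]
```

A body line such as "From now on we meet on Mondays." followed by ordinary text is kept as part of the current message. The one-line look-ahead can still be fooled by a body line that happens to look like a header field, which is one reason why a real implementation has to inspect more than one line.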

Eudora [4]
The Eudora mailbox format is nearly Unix mbox format, but contrary to popular belief it is not identical to it. Unfortunately, Eudora nevertheless also uses the file extension 'mbx'.
The Date header field is often omitted from Eudora messages, presumably because the date is contained in the initial "From " line. This does not conform to RFC 2822 [5]. Also in contrast to the mbox format, Eudora extracts all attachments, and saves them as separate files.

For programmers

Programmers can use this 32-bit Windows DLL with their own programs. The DLL exports the routines
    OpenArchive()
    ReadHeader()
    ProcessFile()
    CloseArchive()
    SetProcessDataProc()
    SetChangeVolProc()   [not supported]
    GetPackerCaps()
    CanYouHandleThisFile()
    DeleteFiles()
    PackFiles()
    ConfigurePacker()
    PackSetDefaultParams()
mboxdemo.exw is a sample Euphoria program that uses the DLL. For additional information see "WCX Writer's Reference" [6].

The source code of this DLL is available as a separate package, mbox_src110.zip.

References

File systems
[1] http://en.wikipedia.org/wiki/Comparison_of_file_systems

Mbox format
[2] http://www.qmail.org/man/man5/mbox.html
[3] http://www.python.org/doc/current/lib/module-mailbox.html

Eudora mailbox format
[4] http://eudora2unix.sourceforge.net/details.html

Internet Message Format (RFC 2822)
[5] http://www.faqs.org/rfcs/rfc2822.html

WCX Writer's Reference
[6] http://ghisler.fileburst.com/plugins/wcx_ref2.1.zip

Credits

This software was written in Euphoria, and translated using the Euphoria To C Translator 2.5. Thanks to RDS for one of the best general purpose programming languages, and for outstanding support.

Thanks to Elliott Sales de Andrade, Bob Elia and Matt Lewis from the "Euphoria Community" for their valuable help, to Jiri Babor and George Papadopoulos for some good code, and to Matt Lewis also for his indispensable tool 'make_atom'.
Thanks to Wilhelm M. from the "Total Commander Community" for helpful hints.

The C code produced by the translator was compiled with the Borland C++ 5.5.1 Command-line Compiler. Thanks to Borland Software Corporation for providing this powerful tool free of charge.

Thanks to Christian Ghisler for Total Commander, the "Swiss army knife" for the Windows PC. Total Commander is an essential tool that greatly enhances my productivity.

Thanks to the translators (see respective language files for the names) and to everyone who sent me feedback on the program.

License

If you do not accept the following license, then you are not allowed to use or distribute this software.

1. Copyright
mbox is copyright 2003-2005 by the author Jürgen Lüthje, all rights are reserved.

2. Right to use
mbox is Freeware. You may use the program free of charge and unlimited in time.

3. Copying
You may copy and distribute the software and its documentation, as long as the file mbox110.zip is not modified. This means, among other things, that you are not allowed to rename the file, or split it into pieces.
Without express written permission from the author, you are not allowed to distribute the program included in another archive or file.
You are not allowed to sell the program, or to enclose it with a commercial program or a commercial collection of programs. The program may be distributed in the same ways that are allowed for Total Commander, though.
When you want to distribute the program on a data medium such as CD or DVD, be sure that you have got the most recent version from my website (see below). But do not distribute beta versions.

4. Support
You are not entitled to support by the author. However, the author tries to answer inquiries by e-mail.

5. Disclaimer
This software is distributed WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. I do not accept responsibility for any effects, adverse or otherwise, that this code may have on you or your computer. Use it at your own risk.

Contact

This software can be downloaded from http://home.arcor.de/luethje/prog/. If you are in doubt whether you have got an original, unaltered copy of the file mbox110.zip, calculate its MD5 checksum. (This can be done with Total Commander via "Files" > "Create CRC Checksums".) The MD5 checksum must be equal to the one published on my website.
Please send questions, suggestions, comments, and bug reports to <prog.lue AT arcor.de>.
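
If Total Commander is not at hand, the MD5 checksum mentioned above can also be calculated with a few lines of Python. This is a generic sketch using only the standard library; the function names are hypothetical, and the published checksum has to be taken from the website, it is not reproduced here.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the MD5 checksum of a file as a lowercase hexadecimal string."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large files need not fit into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_md5):
    """Compare a file's checksum with a published one, ignoring case and blanks."""
    return md5_of_file(path) == published_md5.strip().lower()
```

A call such as matches_published("mbox110.zip", checksum_from_website) then returns True only for an unaltered copy.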

Security note:
I use this mail address only for receiving mails, never for sending. If you get a mail that seems to come from this address, it is a fake, probably caused by a virus or something similar, and does not come from me.


J. Lüthje, 29 October 2005