Last Updated Mon Jun 7 16:11:39 2010

trmail update 2006

Current Status 3 October 2006

Over the next couple of weeks, we intend to upgrade the TRIUMF email service. Planned improvements include: We also intend to improve performance by splitting off the email store (IMAP) from the email transfer agent(s) and filtering. For now, existing configurations will continue to work. However, please note the following:

Migration Strategy

The intention is to migrate user accounts with a minimum of disruption, without losing any mail.

Phase 0 (October 27th)

The IMAP server was upgraded to 2006a to allow testing of the new file format on a test account. New Internet standards for IMAP (support for UID List) required that some old mail messages with invalid UIDs be renumbered. Users using the POP protocol, particularly those who had not deleted messages on the server, saw duplicated emails and old messages re-appearing. (The mail program keeps a list of the message UIDs it has already seen.)

Phase 1 (the night of October 3rd/4th)

"mail.triumf.ca" alias created as an outgoing server, points to "trmail"

For each user:

Rename trmail to trmail-old and rename trmail-new to trmail. The new machine (now called trmail) accepts all incoming mail, and users can read mail.

Hopefully, most users will be able to access their mail on "trmail" in the evening, and access it on "trmail" the next morning, with no disruption. (some hope!)

Phase 2 (again, probably at night)

For each user:

Phase 3

Phase 4

Phase 5

Hiccups

Since the IMAP mailserver was introduced at TRIUMF, Internet standards have changed.
Messages stored on the server now need a monotonic message ID number; renumbering of old messages caused a mismatch with status stored in the client (Thunderbird) and spurious new messages to be reported.
Unencrypted login is now deprecated. A software upgrade a few days ago caused users with plaintext logins to be locked out temporarily. Although we continue to support the old behaviour, users are recommended to switch to secure connection (TLS), particularly when using wireless. Typically, this setting is under "server properties" for each account on the client.

Undoubtedly there will be some more, though we will endeavour to provide (nearly) uninterrupted service to all users.


Current status:

Wednesday May 26 2010
Around 17:00 on Tuesday May 25 trmail logged errors on the sunstore connection, and the device went offline shortly afterwards. This affected some 70 users. In order to reset the connection, trmail was rebooted and subsequently had a boot failure where the system volume became unreadable. An attempt to recover this volume failed and the system was re-installed from distribution and backup media.

Around 08:00 on Wednesday the system appeared to be functioning normally, and deferred mail from other sites was being delivered to users' inboxes. Shortly thereafter some mail folders on the sunstore appeared to be corrupted - were unreadable, and the directory listing was similar to that seen on a failed disk drive. On the assumption that all data on the sunstore was suspect, new mail inboxes were created for affected users on another storage device - the Coraid. This device already contained a copy of user data prior to May 19th. At 12:00 on Wednesday, each affected user had 3 inboxes - a new one on the Coraid collecting incoming mail since 9am, an old one (INBOX-prior) on the Coraid with mail from before May 19, and a suspect one on the sunstore containing mail up till 9am. At this time, it appeared to users that most of their mail had disappeared, but they could send and receive new mail normally.

Once all live email was removed from the sunstore, the connection was reset and it was apparent that the files were not in fact corrupted on disk - the file corruption was an artefact of the network connection. Accordingly a procedure was implemented to merge the 3 inboxes into a single one on the Coraid. At the completion of this, users would have seen a complete inbox, but other folders would have reverted to a state from 10 days previous. An error in setting file permissions also caused folders to be unavailable for a while on Wednesday afternoon, and some incoming mail was delivered to a mail spool file.

After the merging of inboxes, other folders from the sunstore were checked for integrity and, if they were more recent than those already on the Coraid, restored there. The permission problem was fixed and mail from the spool file was merged with the normal inboxes. This was completed about 5pm on Wednesday.

Mon May 31 2010 The above procedure left a number of mailboxes, principally variations of "Sent" and "Drafts", which had been in use on the Coraid since May 26, along with a mailbox on the Sunstore containing messages saved between May 19-25. It was not possible to simply discard one copy, since messages would have been lost, and it was not practical to simply append one mailbox to the other since that would have left thousands of duplicate messages. Mailboxes for some users who had urgent needs were restored manually on Thu-Fri 27-28, while the majority awaited the development of an automated solution, since there were a total of some 500 affected mailboxes - too many for manual restoration.

Wed, Thu Jun 2-3 2010 A script was run to merge "Drafts" and "postponed-messages" folders. Messages with a "unique ID" seen on the Sunstore but not on the Coraid were copied across to the active folder on the Coraid, preserving their date. Depending on the mail client settings, they might appear in date order, or appended at the end of the folder.
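The script itself is not reproduced on this page. As an illustration only, the merge rule described above (copy a message across only if its unique ID is not already in the active folder) might look like the sketch below. It assumes the "unique ID" is the Message-ID header and uses Python's standard mbox reader as a stand-in for the server's real mailbox format; the function name and paths are hypothetical.

```python
import mailbox


def merge_missing(source_path, target_path):
    """Copy messages from source into target unless target already has them.

    Messages are matched on their Message-ID header (an assumption; the
    real script's notion of "unique ID" is not documented here).
    Returns the number of messages copied.
    """
    source = mailbox.mbox(source_path, create=False)
    target = mailbox.mbox(target_path, create=False)
    copied = 0
    try:
        # IDs already present in the active folder.
        seen = {msg["Message-ID"] for msg in target if msg["Message-ID"]}
        for msg in source:
            msg_id = msg["Message-ID"]
            # Simplification: messages with no Message-ID are skipped.
            if msg_id is None or msg_id in seen:
                continue
            # The message is added unchanged, so its Date header and
            # mbox "From " line (delivery date) are preserved.
            target.add(msg)
            seen.add(msg_id)
            copied += 1
        target.flush()
    finally:
        target.close()
        source.close()
    return copied
```

Running the function a second time on the same pair of folders copies nothing, since every ID is then already present. No locking is done in this sketch, so it would only be safe on folders that are not receiving mail at the time. Paths are passed as ordinary strings, so folder names containing spaces need no special quoting.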

Fri Jun 4 2010 The script was extended to all other folders, other than Inbox and Drafts which had already been processed. Affected folders were backed up to a folder suffixed ".bk4jun", and merged as described. A problem handling folders with spaces in the folder name was quickly found and fixed, and the process run for all affected users.

Mon Jun 7 2010 A few folders that had failed the script on Friday, and some that could not be merged because they did not exist on the Coraid, were copied.
At this point, all mailboxes should have been restored.

Sun Aug 16 10:50 PDT 2009
The Sun storage device (sunstore), which was being used to hold mail for high-volume users, failed. Email has been restored, and mail folders recovered from backup. Some mail will have been delayed. Mail received between the time of last backup and time of failure will be unavailable until sunstore is repaired.



21 May 2007 - remaining MBX and Unix mailboxes converted to MIX format. All mailfolders may now contain subfolders, not just MIX ones.

4 May 2007 - imap updated to version imap-h production release

27 March 2007 - imap updated to version imap-2006g snapshot to address some issues with Outlook clients

27 Feb 2007 - imap updated to version imap-2006e
Nov 7 - squirrelmail personal addressbooks imported/merged from old server. Some addresses recovered from IMP addressbooks, marked "from IMP"
Backup files (.mbx) removed for most users.

Oct 29 00:00 - imapd upgraded to version 2006c1
More mailboxes converted; most large boxes. Some .mbx backups kept for now; will be deleted.

Phase 1 completed October 4th; some problems...

Service     Plaintext       TLS          SSL
Webserver   80: OK          (443: OK)    443: OK
SMTP        25: OK          25: OK       465: OK
SMTP        587: OK         587: OK
IMAP        143: OK         143: OK      993: OK
POP3        110: OK         110: OK      995: OK
POP2        109: disabled   N/A          N/A
LDAP        389: OK         (636: OK)    636: OK

Service         Status
SpamAssassin    Running
AVP antivirus   Running
Whitelist       OK
Blacklist       Running
Mailman         Running

Recommended Client Settings:

Users should uncheck "use name and password" in Thunderbird outgoing server settings - see this screenshot. On Windows, this is under Tools, Account Settings, Outgoing Server (SMTP). On Linux, it is under Edit, Account Settings, Outgoing Server (SMTP).
Windows clients using Symantec antivirus should select port 587 for secure connection (recommended for laptops); Symantec will not permit a TLS connection on port 25 and will generate a popup about "encrypted email detected".
(The old system did not support authentication, so ignored logins sent by many clients. Login is not necessary to send mail from inside TRIUMF.)

Outgoing mail: See Setting an outgoing SMTP server
Reading mail: See Setting an IMAP server

The following addressbook (LDAP) settings may be used:

TLS may not work with either of these.


POP3 is now enabled.
The new recommended outgoing SMTP server "mail.triumf.ca" generates a certificate warning on some clients if TLS or SSL is used. This can be ignored. In the future, this may be a separate machine with its own certificate and the problem will go away.

Upgrade History

trmail was moved to new hardware and operating system early Oct 4th.

Some users reported problems due to Web cache - their Web browser was showing the old page. Solution: reload/refresh page.
Some users reported problems due to ARP cache - their computer was still looking at the old hardware based on its Ethernet address. Solution: flush ARP cache, or reboot computer.

SpamAssassin was not enabled on the morning of Oct 4th. Users received some untagged spam in their inbox instead of tagged or in "junk".
Some users were running a spam filter but had no "junk" folder and the mailer was reporting "unsafe" delivery; this folder has been recreated.

Many users were found to be requesting authentication "if possible" when sending mail. As the old mailserver did not support authentication this worked. Since authentication is enabled on the new server, users saw a password challenge and in some cases a Symantec antivirus alert. Initially the authentication mechanism did not work properly. Solution: turn off authentication for outgoing mail from within TRIUMF.

Initially POP3 service was configured to require encryption - the default on new servers for security reasons. Only TLS was working, which is not supported in older clients such as Netscape Communicator. Later, POP3 with SSL was enabled, and on Oct 14th, POP3 with plain login was enabled. Note: Encryption is still recommended, especially on laptops using wireless networking. POP3 passwords are trivially visible on wireless.

Due to a change in Internet standards, all email messages stored on the server now need a unique ID number; this forced renumbering of messages in some mailboxes. For clients using POP3 to read mail, this caused a mismatch with status stored in the client program (e.g. Thunderbird, Outlook). The client then downloaded "new" messages which the user thought they had deleted or already read.
This may happen again as we migrate to the new mailbox format.
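To see why renumbering produces "new" mail on a POP3 client, consider this small sketch. It is a simplified model, not the behaviour of any particular mail program; the function name and the ID values are made up for illustration.

```python
def messages_to_download(server_uids, seen_uids):
    """Return the server IDs that the client has no record of having seen."""
    return [uid for uid in server_uids if uid not in seen_uids]


# IDs the client remembers from earlier sessions (hypothetical values).
seen = {"101", "102", "103"}

# Server listing before the upgrade, and the same three messages afterwards
# once the server has renumbered them.
before = ["101", "102", "103"]
after = ["2001", "2002", "2003"]

print(messages_to_download(before, seen))  # []
print(messages_to_download(after, seen))   # ['2001', '2002', '2003']
```

Nothing about the messages themselves has changed, but because the client compares IDs only, all three are fetched again after the renumbering.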

LDAP (address lookup and website authentication) was not working. Temporary solution: use the LDAP service on the old machine (trmail-old). Note: A new CNAME "ldap.triumf.ca" has been created; please use that instead of "trmail" in addressbook and authentication configuration.

Mailman (mailing list archives) was not working. Temporary solution: use mailing lists on the old server. The mailing list addresses (listxxx@trmail) are redirected and should work.

Some people's incoming mail was going in a mailbox called "broken" due to a missing XXX filter. "broken" mail was merged into normal inboxes. Some users may have seen old lost mail and spam reappear. Some mail may have been duplicated.

Between October 4th - 11th, the spam blacklists were not rejecting mail - the whitelist was misconfigured and was accepting all mail. Users may have seen more spam than usual, though most would have been tagged or filtered by SpamAssassin.

Between October 4th - 10th the virtual host Web pages - ca.triumf.ca, cbl.triumf.ca, rbl2.triumf.ca - were not working.

On October 4th, the Secure Web (https/SSL) was not working, blocking access to Webmail (Squirrelmail). As a temporary workaround users were directed to use the unencrypted service.

Between October 4th - 6th: the SpamAssassin central whitelists (CERN, JLAB etc.) were not configured - some "spammy" looking email from "trusted" institutions may have been flagged as spam, though this is fairly unlikely.

On Oct 16th sending mail with authentication (such as from laptops) was broken for a while following an LDAP rebuild.

Oct 19th: LDAP directories enabled for Squirrelmail
Oct 19th AM: trshare was rebooted when it became unresponsive; users could not send mail through smtp.triumf.ca. Unrelated to the email upgrade.
Oct 19th: Domain now set correctly for Squirrelmail outgoing mail
Oct 20-22: Some mailboxes (junk, Trash...) converted to Mix format, starting with larger files where most benefit will be seen.

Oct 25: Most inboxes converted to Mix format. Old format preserved temporarily as INBOX.mbx - you must subscribe to this if you want to check. This will be deleted later.

Oct 24: Mailman switched to new server, now called lists.triumf.ca

Nov 6 - Attn. Pine users - see this page for notes on Mix issues

Nov 2 15:20 - LDAP server "ldap.triumf.ca" moved to point to trmail (new). All authentication and addressbooks should use this CNAME (ldap.triumf.ca) as it may move again. Stop using trmail-old.triumf.ca

Services Supported:

LDAP

LDAP (Lightweight Directory Access Protocol) is a service used to perform addressbook lookups for TRIUMF. There are several servers which may be used, e.g. The LDAP service is also used to authenticate users for certain websites for access from offsite (e.g. www.triumf.ca/internal). It is also used to authenticate offsite users to use the SMTP service smtp.triumf.ca running on trshare.
(Note: with Korganizer one might have to check "Sub-tree query")

HTTP (webserver)

The trmail webserver supports Squirrelmail webmail, plus a user configuration tool. Secure HTTP (SSL) is enabled to safeguard passwords sent over wireless. Spam whitelist management also uses this service.

SMTP

SMTP (Simple Mail Transfer Protocol) is the service used to transport mail on the Internet. For a user, it is used for outgoing mail. For more detail, see Setting an outgoing SMTP server
The requirement to authenticate on port 587 (MSA) when using TLS from inside TRIUMF was removed Oct 18th. Windows users with Symantec Antivirus may use this; recommended for laptops (though a password is still required from offsite).

POP3

POP3 (Post Office Protocol) is a protocol used to retrieve mail from a mailserver. It is traditionally used by ISPs to support customers at a single fixed location. TRIUMF users who wish to use POP should make sure they delete their mail from the server. Thunderbird offers an option to delete mail after a given period of time, as well as the traditional "immediately" and "never".

POP2

An earlier version of POP. This is unsupported at TRIUMF.

IMAP

IMAP (Internet Message Access Protocol) is a protocol used to manage mailboxes on a mailserver from one or more remote clients. For more detail, see Setting an IMAP server

SpamAssassin

The open-source SpamAssassin mail filtering system is running. User configuration is available under "User Config" on trmail

AVP antivirus

Kaspersky AVP antivirus is running on the new system

Whitelist

Blacklist

The RBL2 active blacklist is running.

Mailman

The Mailman mailing lists on trmail are currently not working. Lists are functioning on trmail-old, and the standard addresses (listname@trmail) still work.

TLS

TLS (Transport Layer Security) is a standards-based encryption protocol similar to SSL which is built into most modern clients and services. It is typically offered on the normal (unencrypted) port as a negotiated service; the server lists it as a capability and the client, if it is supported, elects to use it. TLS is not available in older clients.
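For SMTP, the "server lists it as a capability" step happens in the reply to the client's EHLO greeting, where the keyword is STARTTLS. The sketch below shows roughly how a client could check for it; the function name and the sample reply (including the hostname mail.example.org) are illustrative assumptions, not output captured from trmail.

```python
def offers_starttls(ehlo_response):
    """Return True if a multi-line SMTP EHLO reply advertises STARTTLS."""
    for line in ehlo_response.splitlines():
        # Reply lines look like "250-KEYWORD args", or "250 KEYWORD" for
        # the final line; anything else is ignored.
        if len(line) < 4 or not line.startswith("250"):
            continue
        keyword = line[4:].split(" ", 1)[0]
        if keyword.upper() == "STARTTLS":
            return True
    return False


# Hypothetical reply from a server that supports negotiated TLS.
sample = (
    "250-mail.example.org Hello\r\n"
    "250-SIZE 10485760\r\n"
    "250-STARTTLS\r\n"
    "250 HELP"
)
print(offers_starttls(sample))  # True
```

In practice a mail library does this check itself; with Python's smtplib, for example, `has_extn("starttls")` can be called after `ehlo()`, and `starttls()` then upgrades the connection.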

Either TLS or SSL should be used on wireless devices (e.g. laptops) whenever a password is sent over the network. This includes reading email with IMAP or POP, using the Squirrelmail Webmail client, logging on to TRIUMF internal pages from offsite, and sending mail via smtp.triumf.ca from offsite. In some cases (e.g. smtp.triumf.ca) this is enforced - the system will not accept a password unless encryption is used.

SSL

SSL (Secure Sockets Layer) is the original "Secure Web" protocol developed by Netscape to encrypt network traffic. It is typically offered on a separate port as a wrapper around a standard (unencrypted) protocol.
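The practical difference between the two, as far as client settings go, is which port is used and when encryption starts. A minimal sketch follows, using the port numbers from the service table on this page; the dictionary layout, the service labels and the function name are assumptions made for illustration.

```python
# Port numbers as listed in the service table; "submission" is the label
# used here for the second SMTP row (port 587), which has no SSL entry.
PORTS = {
    "smtp": {"plaintext": 25, "tls": 25, "ssl": 465},
    "submission": {"plaintext": 587, "tls": 587},
    "imap": {"plaintext": 143, "tls": 143, "ssl": 993},
    "pop3": {"plaintext": 110, "tls": 110, "ssl": 995},
}

HANDSHAKE = {
    "plaintext": "none",
    "tls": "negotiated after connecting",  # same port as plaintext
    "ssl": "immediately on connect",       # wrapper on a dedicated port
}


def connection_plan(service, mode):
    """Return (port, when encryption starts) for a service and mode.

    Raises KeyError if the table lists no such combination.
    """
    return PORTS[service][mode], HANDSHAKE[mode]


print(connection_plan("imap", "tls"))  # (143, 'negotiated after connecting')
print(connection_plan("imap", "ssl"))  # (993, 'immediately on connect')
```

A client set to TLS therefore keeps the standard port and upgrades in place, while a client set to SSL must be pointed at the separate port.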

Problems?

Please report problems not listed here to Andrew Daviel, local 7376, or to: (this address does not rely on trmail; you can mail from a home ISP or via smtp.triumf.ca or another system)
A.Daviel