The Complexity of Time Data Programming
Anybody writing software that has to work in more than one geographic area must at some point think about how to handle time zones. Many developers have an incomplete picture of how time zones work, and this post is written in an attempt to describe this convoluted area.
What is a time zone?
Because the earth rotates once every 24 hours, solar noon occurs at a different moment depending on the longitude of our location. If we wanted noon at 12:00 (a.k.a. 12 pm) everywhere, we would have to set our clocks slightly differently depending on location. If we required noon to fall within one minute of 12:00, we would need 1440 different clock settings (zones) around the planet, one for each minute of the day.
This would not be a problem if we did not travel or communicate over long distances. It is in fact what was done in the 19th century, when each town and city kept its own time based on its longitude. But then came fast long-distance travel with the railroads and instantaneous communication with the telegraph and later the radio.
This is when time zones were created. Instead of a granularity of one minute, one hour was chosen as a trade-off between keeping noon close to 12:00 and limiting the number of zones. The primary time zone, called Greenwich Mean Time (GMT), spans from 7.5 degrees east longitude to 7.5 degrees west. Then we would have only 23 other time zones, each spanning 15 degrees of longitude, right? Not quite. Countries and territories that cross these artificial boundaries have instead decided to draw the lines differently, as shown on the map below.
Even if that was it, handling time zones would not be that complicated. We would always be able to calculate the local time in any place only based on its location. But there is more to it.
UTC versus GMT
GMT is the time zone around the Prime Meridian, which runs through Greenwich in England. The UK uses GMT in the winter and British Summer Time (BST) while daylight saving time is in effect. UTC, on the other hand, is not a time zone but an international time standard. Clocks on GMT follow UTC, and other time zones are defined by their offset from UTC.
Daylight saving time (DST)
The idea of shifting the clocks by one hour during the summer was first implemented for a whole country in Germany during the First World War in order to save energy. With the clocks shifted, daylight lasts later into the evening, which saves energy otherwise used for lighting. The practice is still used in North America and Western Europe, even though research has shown doubtful energy savings, as only a small part of the energy is now used for lights compared to manufacturing, transportation, heating and cooling. Russia and most of Asia and Africa do not use DST.
Daylight saving time means that we cannot determine local time based only on location. In addition, we have to know if daylight saving time was in effect. There is a particularly complex situation that occurs twice a year in time zones that use daylight saving time. In the spring when DST takes effect, there is an hour in local time that does not exist and a day that is only 23 hours long! In the fall the opposite happens – a whole hour occurs twice in the same day and that day is 25 hours long!
Here is what happens to the local clock around 09:00 UTC during the “Spring forward” event in the “America/Denver” time zone (also called Mountain Time). Just before 9:00 UTC the local time is about to be 2am, but then it jumps to 3am. Note this scenario happens in the US and Canada one hour at a time from east to west every spring.
The sequence of events in the fall is even worse. Just before 8:00 UTC the Denver clock is approaching 2am. Then it “falls back” to 1am, causing the hour between 1am and 2am to happen twice!
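The two Denver transitions can be reproduced in a few lines of code. What follows is a minimal sketch, assuming Python 3.9 or later with the standard zoneinfo module and an IANA time zone database available on the machine; the helper name to_local is made up for the illustration.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

DENVER = ZoneInfo("America/Denver")
UTC = timezone.utc

def to_local(utc_text: str) -> str:
    """Convert a UTC timestamp given as text to Denver wall-clock time."""
    instant = datetime.fromisoformat(utc_text).replace(tzinfo=UTC)
    return instant.astimezone(DENVER).strftime("%Y-%m-%d %H:%M:%S %Z")

# One second apart in UTC on both sides of each transition in 2016.
spring = [to_local("2016-03-13 08:59:59"), to_local("2016-03-13 09:00:00")]
fall = [to_local("2016-11-06 07:59:59"), to_local("2016-11-06 08:00:00")]
print(spring)
print(fall)

# The repeated hour: 01:30 happens twice, and the fold attribute selects which one.
first = datetime(2016, 11, 6, 1, 30, tzinfo=DENVER)           # fold=0, still on MDT
second = datetime(2016, 11, 6, 1, 30, fold=1, tzinfo=DENVER)  # fold=1, back on MST
gap_seconds = (second.astimezone(UTC) - first.astimezone(UTC)).total_seconds()

# The missing hour: 02:30 on March 13 never appears on a Denver clock.
# Converting it to UTC and back lands on a wall-clock time that does exist.
skipped = datetime(2016, 3, 13, 2, 30, tzinfo=DENVER)
round_trip = skipped.astimezone(UTC).astimezone(DENVER)
print(gap_seconds, round_trip)
```

Note that the two 01:30 readings are converted to UTC before they are subtracted. Python subtracts two timestamps that share the same tzinfo object by wall-clock value, which would report a difference of zero here.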
This whole scenario would be easier to deal with if it was known with certainty when these changes are going to happen, but that is not even the case. Up until 2007, Western Europe, the United States and Canada all ended DST on the same Sunday night. In 2007 an extension of the DST period passed by the US Congress took effect, among other things to allow children in Wyoming to go trick-or-treating in more daylight. Since then there are three to four weeks a year where the time difference between Western Europe and North America is not the usual one, causing interesting challenges for everyone having to communicate in real time across the Atlantic. Not to mention that the European Union countries all “fall back” at the same instant, 1am UTC, rather than one time zone at a time.
All these complications might sound bad enough to deal with, but the reality is even worse. There are states and provinces that do not use daylight saving time. Arizona and Saskatchewan are examples. The Navajo Nation inside Arizona does use DST, though. Most states have the same time zone and daylight saving rules throughout the state. This is not the case in Indiana, where 12 of the 92 counties use Central Time while the rest are on Eastern Time. Now all counties observe daylight saving time, but that has not always been the case. Imagine a single state with four different local time standards.
Not all time zones are whole hours away from UTC. Newfoundland in Canada is three and a half hours behind UTC, and two and a half hours behind during DST. There are several time zones with a 45-minute offset from UTC, including Nepal Standard Time, which is 5:45 ahead of UTC.
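To make the shifting transatlantic difference concrete, here is a small sketch that asks the time zone database how far London is ahead of New York on a few dates in 2016. It assumes Python 3.9 or later with the zoneinfo module; the function name hours_between and the sample dates are chosen for the illustration only.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def hours_between(zone_a: str, zone_b: str, utc_instant: datetime) -> float:
    """Hours by which the clock in zone_a is ahead of the clock in zone_b."""
    offset_a = utc_instant.astimezone(ZoneInfo(zone_a)).utcoffset()
    offset_b = utc_instant.astimezone(ZoneInfo(zone_b)).utcoffset()
    return (offset_a - offset_b).total_seconds() / 3600

differences = {}
for month, day in [(3, 1), (3, 20), (7, 1), (11, 2), (12, 1)]:
    noon_utc = datetime(2016, month, day, 12, tzinfo=timezone.utc)
    differences[(month, day)] = hours_between(
        "Europe/London", "America/New_York", noon_utc
    )
    print(noon_utc.date(), differences[(month, day)])
```

On March 20 and November 2 the difference is four hours instead of the usual five, because New York is on daylight saving time while London is not.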
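These offsets are easy to check against the time zone database. The sketch below assumes Python 3.9 or later with the zoneinfo module; offset_minutes is a hypothetical helper, and America/St_Johns and Asia/Kathmandu are the IANA names for Newfoundland and Nepal.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

def offset_minutes(zone_name: str, utc_instant: datetime) -> int:
    """Offset from UTC in whole minutes for the given zone at the given instant."""
    offset = utc_instant.astimezone(ZoneInfo(zone_name)).utcoffset()
    return offset // timedelta(minutes=1)

january = datetime(2016, 1, 15, 12, tzinfo=timezone.utc)
july = datetime(2016, 7, 15, 12, tzinfo=timezone.utc)

print(offset_minutes("America/St_Johns", january))  # winter in Newfoundland
print(offset_minutes("America/St_Johns", july))     # summer in Newfoundland
print(offset_minutes("Asia/Kathmandu", january))    # Nepal, no DST
```

Any code that stores an offset as a whole number of hours cannot represent these zones at all.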
Names and references to time zones
The earliest Unix systems could handle a single pair of time zones for local time: one for winter time and one for summer time. These names and their rules were compiled into the operating system kernel.
Later the user could select a time zone at run time by setting the TZ environment variable to a value like EST5EDT for Eastern Standard Time and Eastern Daylight Time. It is clear from all the exceptions to regular time zones that this scheme is incomplete. The POSIX Operating Systems Standards group decided to do something about it and came up with a really complicated scheme using a string notation for the TZ variable to capture the dates and offset from UTC for the individual areas. The standard prescribes a string that encodes the zone abbreviations, their offsets from UTC and the rules for when daylight saving time starts and ends.
In spite of this complexity, the POSIX scheme cannot express time zone information for historical data. This is why the preferred source of time zone data is the tzdata database from IANA (Internet Assigned Numbers Authority). This database uses names like America/Denver and America/Phoenix instead of the incomplete MST7MDT and MST respectively. This database also has the old time zone names, but their use is discouraged.
Linux and other Unix-like operating systems support the IANA scheme, as does Apple’s OS X. Microsoft Corporation has a similar scheme for Windows, but the time zone names are not the same.
The principal benefit of the IANA database is that it also contains historical information on its time zones. It can be used to find out, for example, that London stayed one hour ahead of GMT all year round in 1970, whereas now it is on GMT in the winter and switches to BST (British Summer Time) in the summer.
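As an illustration of the notation, a full POSIX value for Mountain Time is MST7MDT,M3.2.0,M11.1.0, where M3.2.0 means month 3, week 2, day 0 (Sunday), and week 5 stands for the last such weekday in the month. The sketch below resolves such a rule to a calendar date. It is a simplified reading of the Mm.w.d part only, assuming Python 3; the function name posix_rule_date is made up, and the transition time of day and the other rule formats are ignored.

```python
import calendar
from datetime import date

def posix_rule_date(year: int, rule: str) -> date:
    """Resolve an 'Mm.w.d' rule: weekday d (0 = Sunday) in week w (5 = last) of month m."""
    month, week, weekday = (int(part) for part in rule.lstrip("M").split("."))
    # date.weekday() numbers Monday as 0, whereas POSIX numbers Sunday as 0.
    python_weekday = (weekday - 1) % 7
    last_day = calendar.monthrange(year, month)[1]
    matches = [
        date(year, month, day)
        for day in range(1, last_day + 1)
        if date(year, month, day).weekday() == python_weekday
    ]
    # Week 5 means "the last such weekday", even in months that have only four.
    return matches[-1] if week == 5 else matches[week - 1]

names_and_offset, start_rule, end_rule = "MST7MDT,M3.2.0,M11.1.0".split(",")
dst_start = posix_rule_date(2016, start_rule)
dst_end = posix_rule_date(2016, end_rule)
print(names_and_offset, dst_start, dst_end)

# The rules in force in the US before 2007 need a different string altogether.
old_start = posix_rule_date(2006, "M4.1.0")
old_end = posix_rule_date(2006, "M10.5.0")
print(old_start, old_end)
```

A single TZ value holds exactly one pair of rules, so it can describe either the current rules or the pre-2007 ones, never both.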
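A historical lookup of this kind might be sketched as follows, assuming Python 3.9 or later with the zoneinfo module and a complete IANA database installed. The helper utc_offset_hours is hypothetical; it compares London in January 1970, when the UK kept its clocks one hour ahead of GMT all year, with January and July of 2016.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def utc_offset_hours(zone_name: str, year: int, month: int, day: int) -> float:
    """Offset from UTC, in hours, that applied at local noon on the given date."""
    local_noon = datetime(year, month, day, 12, tzinfo=ZoneInfo(zone_name))
    return local_noon.utcoffset().total_seconds() / 3600

for year, month in [(1970, 1), (2016, 1), (2016, 7)]:
    print(year, month, utc_offset_hours("Europe/London", year, month, 15))
```

The same call works for any zone and date that the database covers, which is what a fixed POSIX rule string cannot offer.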
Programming using time data
It should be obvious at this point that handling date and time correctly across time zones is not a trivial task. Please consider the following points before you determine how to handle time zones in your system.
Local time is ambiguous
This is especially true when local time uses DST. If we only store the local timestamp, it is impossible to reliably put historical data in chronological order because of the “Fall back” behavior described above. For the same reason, it is impossible to convert a local timestamp to local time in another time zone without first converting to UTC. Also note that the time of day is not sufficient for such a conversion: the date part of the timestamp must be used to determine the offset from UTC. Dealing with historical data can be really tricky, as daylight saving periods sometimes change from one year to the next.
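The ordering problem can be shown with three events recorded half an hour apart across the fall transition in Denver. This is a sketch only, assuming Python 3.9 or later with the zoneinfo module; the event labels and the wall_clock helper are invented for the example.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

DENVER = ZoneInfo("America/Denver")

# Three events, half an hour apart, spanning the end of DST in 2016.
events_utc = [
    (datetime(2016, 11, 6, 7, 15, tzinfo=timezone.utc), "first"),   # 01:15 MDT
    (datetime(2016, 11, 6, 7, 45, tzinfo=timezone.utc), "second"),  # 01:45 MDT
    (datetime(2016, 11, 6, 8, 15, tzinfo=timezone.utc), "third"),   # 01:15 MST
]

def wall_clock(instant: datetime) -> datetime:
    """What a system storing only local time would keep: no zone, no offset."""
    return instant.astimezone(DENVER).replace(tzinfo=None)

stored_local = [(wall_clock(instant), label) for instant, label in events_utc]

order_by_local = [label for _, label in sorted(stored_local, key=lambda e: e[0])]
order_by_utc = [label for _, label in sorted(events_utc, key=lambda e: e[0])]
print(order_by_local)
print(order_by_utc)
```

Sorting on the stored local values puts the third event before the second, and nothing in the stored data can reveal that this happened.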
Store timestamps in UTC
If data has to be viewed in different time zones, the common practice is to convert local timestamps to UTC before they are persisted and then convert them to the desired time zone when needed. This is almost always a better strategy than storing the value of the local time. Even if the system is currently only used in a single time zone, this allows for correct chronological ordering of data even with DST.
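A minimal sketch of that round trip, assuming Python 3.9 or later with the zoneinfo module, could look like this. The function names to_utc and to_zone are made up for the illustration, and the fold argument is only needed for wall-clock times inside a repeated hour.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def to_utc(local_naive: datetime, zone_name: str, fold: int = 0) -> datetime:
    """Interpret a wall-clock time in the given zone and return the UTC instant to store."""
    local = local_naive.replace(tzinfo=ZoneInfo(zone_name), fold=fold)
    return local.astimezone(timezone.utc)

def to_zone(utc_instant: datetime, zone_name: str) -> datetime:
    """Convert a stored UTC instant to the zone of whoever is looking at it."""
    return utc_instant.astimezone(ZoneInfo(zone_name))

# A user in Denver enters 18:00 on July 4; this is the value that gets persisted.
stored = to_utc(datetime(2016, 7, 4, 18, 0), "America/Denver")

# Later the same record is displayed to users in Denver and in Paris.
seen_in_denver = to_zone(stored, "America/Denver")
seen_in_paris = to_zone(stored, "Europe/Paris")
print(stored, seen_in_denver, seen_in_paris)
```

The conversion needs the name of the zone where the value was entered, so that name has to be known at the moment of input even though only UTC is stored.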
Converting time data is not as hard as it might sound. Most programming environments provide library functions to perform the conversion, even for historical data. This is true for code written in C/C++ on Unix-based systems and Windows. More recent programming environments like Java, .NET, Python, Ruby/Rails and PHP also have good support for time zone conversions. If somebody tells you that you can support time zones by storing local time and an integer preference for the user as the offset from UTC, you can tell them they are wrong!
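To see why the integer offset fails, the sketch below compares a fixed offset of minus seven hours, a hypothetical stored preference for a user in Denver, with a proper zone lookup. It assumes Python 3.9 or later with the zoneinfo module; the constant and function names are invented for the example.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

USER_OFFSET_HOURS = -7  # hypothetical per-user setting; right for Denver in winter only

def local_from_fixed_offset(utc_instant: datetime) -> datetime:
    """The flawed approach: add a stored integer offset to the UTC value."""
    return (utc_instant + timedelta(hours=USER_OFFSET_HOURS)).replace(tzinfo=None)

def local_from_zone(utc_instant: datetime) -> datetime:
    """The library approach: let the time zone database pick the offset for that date."""
    return utc_instant.astimezone(ZoneInfo("America/Denver")).replace(tzinfo=None)

winter = datetime(2016, 1, 15, 19, 0, tzinfo=timezone.utc)
summer = datetime(2016, 7, 15, 18, 0, tzinfo=timezone.utc)

for instant in (winter, summer):
    print(local_from_fixed_offset(instant), local_from_zone(instant))
```

Both approaches agree in January, but in July the fixed offset is an hour off. Updating the stored integer twice a year does not help either, because historical values would then be converted with the wrong offset.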
Most databases support a DateTime data type. Some of them have built-in time zone conversion and storage using a scheme similar to the tzdata described above. Before using a new database you should make sure you understand all the details of how time zones are handled (if at all).
Using PostgreSQL, the following SQL demonstrates the “Spring forward” and “Fall back” for the year 2016 in Denver, Colorado.
    DROP TABLE IF EXISTS t;
    CREATE TABLE t(the_time TIMESTAMP WITH TIME ZONE NOT NULL);
    INSERT INTO t(the_time) VALUES
      ('2016-03-13 08:59:59Z'), ('2016-03-13 09:00:00Z'),
      ('2016-11-06 07:59:59Z'), ('2016-11-06 08:00:00Z');
    -- The first column is aliased gmt_time so that it does not shadow the the_time column.
    SELECT the_time AT TIME ZONE 'GMT' AS gmt_time,
           the_time AT TIME ZONE 'America/Denver' AS local_time
    FROM t
    ORDER BY the_time;
Giving the following results:
    gmt_time             local_time
    2016-03-13 08:59:59  2016-03-13 01:59:59
    2016-03-13 09:00:00  2016-03-13 03:00:00
    2016-11-06 07:59:59  2016-11-06 01:59:59
    2016-11-06 08:00:00  2016-11-06 01:00:00
This is the same example as before, showing two pairs of timestamps around the start of daylight saving time on March 13th and the end of DST on November 6th in 2016. In GMT there is one second between the timestamps in each pair. In local time the clock jumps forward by an hour in the spring and back by an hour in the fall, so the second timestamp of the fall pair appears to come before the first!
There are several different issues that have to be taken into account regarding time zones when writing a web application. The first issue is whether the time zone of the user can be determined automatically. The answer is yes, but it is not as simple as getting the time zone from the HTTP headers in the request.
Another issue with web applications is that many web servers are multithreaded with a common thread pool. Each thread from the pool will serve one request from a user, send a reply and then return to the pool. In such an architecture you must be careful to set the time zone of the thread correctly before beginning any processing on each request. Care must be taken that this operation is not too costly as it must happen on every request, so the time zone information is often stored in the user session.
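In Python the same concern can be handled without touching any process-wide setting, by keeping the zone of the current request in a context variable. The sketch below is one possible arrangement, not a description of any particular web framework. It assumes Python 3.9 or later; the session dictionary, its time_zone key and the function names are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from contextvars import ContextVar
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Zone of the request being served; falls back to UTC when nothing has been set.
current_zone: ContextVar[ZoneInfo] = ContextVar("current_zone", default=ZoneInfo("UTC"))

EVENT = datetime(2016, 7, 4, 18, 0, tzinfo=timezone.utc)

def render(utc_instant: datetime) -> str:
    """Format a stored UTC instant in the zone of the current request."""
    return utc_instant.astimezone(current_zone.get()).strftime("%H:%M")

def handle_request(session: dict) -> str:
    # ZoneInfo caches instances by name, so this lookup is cheap after the first use.
    token = current_zone.set(ZoneInfo(session["time_zone"]))
    try:
        return render(EVENT)
    finally:
        # Restore the previous value so the zone does not leak into the
        # next request served by the same pooled thread.
        current_zone.reset(token)

sessions = [
    {"time_zone": "America/Denver"},
    {"time_zone": "Europe/London"},
    {"time_zone": "Asia/Kathmandu"},
]
with ThreadPoolExecutor(max_workers=2) as pool:
    pages = list(pool.map(handle_request, sessions))
print(pages)
```

Resetting the variable in the finally clause is the important part: with only two worker threads and three requests, at least one thread serves more than one user.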
Testing time zone code
A good way to test any code that uses time zones is to investigate the behavior of the system around the daylight saving boundaries, as I’ve been doing in the examples above. This is especially important in systems that depend on durations or on differences between timestamps. Conversion to UTC makes testing those properties very easy, even when the original events originate in different time zones. An example of this is the duration of transport between locations that are not in the same time zone.
It should be clear that making computer systems that handle multiple time zones correctly is not simple. If the system also has to handle historical data, the complexity increases considerably. This is in part because the political system that controls time zones, and daylight saving time in particular, does not appreciate the technical problems that it creates.
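A test of this kind might look like the sketch below, which computes the duration of a trip from local departure and arrival times in two zones. It assumes Python 3.9 or later with the zoneinfo module; the duration function and the trip itself are invented for the example.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

def duration(depart_local: datetime, depart_zone: str,
             arrive_local: datetime, arrive_zone: str) -> timedelta:
    """Elapsed time between two wall-clock readings taken in different zones."""
    depart_utc = depart_local.replace(tzinfo=ZoneInfo(depart_zone)).astimezone(timezone.utc)
    arrive_utc = arrive_local.replace(tzinfo=ZoneInfo(arrive_zone)).astimezone(timezone.utc)
    return arrive_utc - depart_utc

# Leaves Denver at 22:00 on the night DST ends and arrives in New York at 05:00.
overnight = duration(datetime(2016, 11, 5, 22, 0), "America/Denver",
                     datetime(2016, 11, 6, 5, 0), "America/New_York")

# Same zone on both ends, but the repeated hour falls in between.
across_fall_back = duration(datetime(2016, 11, 6, 0, 30), "America/Denver",
                            datetime(2016, 11, 6, 3, 30), "America/Denver")
print(overnight, across_fall_back)
```

The wall clocks suggest seven hours and three hours respectively. Subtracting the usual two-hour difference between Denver and New York would give five hours for the first trip, while the real duration is six.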
All this would be much easier to deal with if daylight saving time was done away with as most of the world has already done.
Ideally, we would do away with time zones as well! The whole planet could of course simply use UTC. Then a point in time would be defined without any ambiguity. The only difference would be that we would go to work and/or school at a different time of day. But noon would no longer happen at or around 12:00 (except on the prime meridian), which is probably not acceptable for the general public. If you look carefully at the map in the beginning of this post you see that China has already taken a step in this direction and has a single time zone covering an area comparable to four time zones in Russia.