Considerations when formatting phone numbers
If you ever read my post about considerations when customising address fields, I reckon you would know by now that I am all about consistency and standards. No, I am not aspy (my mother had me tested). But I could argue that computers are, which is why I am all for data quality and integrity.
I reckon everyone reading this would agree that normally, the formatting of phone numbers is a minor issue. However, when one is involved in a project as a solutions architect and a requirement arises asking for phone numbers to be ‘perfectly’ formatted on systems, I am sure they’ll beg to differ. In this post I’ll be going over misconceptions and considerations around phone numbers and the consequences they might have on computer systems.
The assertions in this article are not based on whims. The International Telecommunication Union (ITU) has published recommendation E.164 through its Telecommunication Standardization Sector (ITU-T), which defines a numbering plan for the world-wide public switched telephone network, and E.123, which concerns the international notation of phone numbers.
Understanding the phone numbering scheme
So the first thing we need to clarify about the way the public switched telephone network (PSTN) works is that when dialling on the exchange, we enter a combination of numbers which includes the number we are trying to reach, along with the necessary trunk prefixes (or trunk codes) that tell the network how to route the call. The phone number is a constant, and the trunk codes are variables that differ depending on where the number is being dialled from. Examples of trunk prefixes include:
- International call prefix – Also known as exit code, this is the prefix used to tell the phone network that one is trying to dial a phone number in another country/region. For example, when calling Brazil from the UK, a user would dial 00 as the international call prefix, followed by 55 which is Brazil’s country code. From the USA a user would dial 011 instead of 00.
- Long distance call prefix – The prefix used to tell the phone network that one is trying to dial a phone number outside the local area. For example, when calling Birmingham from London, a user would dial 0 as the call prefix, followed by the area code for Birmingham which is 121.
- Operator prefix – In some countries, the user can/must choose the operator they will use for the call by adding the operator prefix to the number being dialled. In Brazil for example, a user must choose the operator when making an international or long distance call by adding the two-digit operator code between the trunk code and the phone number.
Besides the previously mentioned prefixes, we then have the actual phone number, often referred to as the subscriber phone number. Now that we have covered the basics of the phone numbering scheme, let’s start covering the main issues with telephone number formatting.
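To make the "constant number, variable prefixes" idea concrete, here is a minimal sketch in Python. The function name `build_dial_string` is made up for this illustration, the operator code `21` is only a placeholder value, and the Brazilian subscriber digits are dummy data.

```python
def build_dial_string(number: str, trunk_prefix: str = "",
                      operator_code: str = "") -> str:
    """Prepend the variable prefixes to the constant phone number.

    `number` is whatever must follow the prefixes (area code plus
    subscriber number, optionally preceded by a country code).
    """
    return trunk_prefix + operator_code + number


# London to Birmingham: long distance call prefix 0, then area code 121.
london_to_birmingham = build_dial_string("121" + "3036789", trunk_prefix="0")

# Long distance call inside Brazil: prefix 0, then a two-digit operator
# code (placeholder "21"), then area code 11 and a dummy subscriber number.
brazil_long_distance = build_dial_string("11" + "30001234",
                                         trunk_prefix="0",
                                         operator_code="21")

print(london_to_birmingham)
print(brazil_long_distance)
```

The point of the sketch is only that the number argument never changes; what gets prepended depends entirely on where the caller is.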
First issue: Writing down the international call prefix with a phone number
When writing a phone number down, a lot of people add the international prefix (e.g. ‘00’) in front of the country code. This is fundamentally wrong, as 00 is not part of the number; it is a trunk code used in some countries to inform the phone network that the number that follows is an international number. The problem is that different countries have different international trunk codes (e.g. in the USA and Canada it is 011). So for someone in the USA, Canada, Japan, Israel or Australia (to name a few), seeing 00 in front of a phone number makes absolutely no sense.
Since the international trunk code differs among countries and regions, E.123 recommends using a ‘+’ (plus sign) to denote it instead. Many network operators and computer systems understand the plus sign and translate it accordingly in order to use the correct exit code depending on the requirements of the current phone network. This is particularly important when roaming across countries/regions that use different exit codes. So one should always use the plus sign when saving contacts in mobile phones.
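The translation of the plus sign could look roughly like the sketch below. It is only an illustration: `EXIT_CODES` and `resolve_plus` are names I made up, the table holds just the exit codes quoted in this post, and the saved number is a placeholder.

```python
# Exit codes quoted in this post. A real system would need a complete,
# maintained table rather than this hand-written sample.
EXIT_CODES = {"GB": "00", "FR": "00", "US": "011", "CA": "011"}


def resolve_plus(number: str, caller_country: str) -> str:
    """Replace a leading '+' with the caller's international call prefix."""
    if not number.startswith("+"):
        raise ValueError("expected a number in international '+' notation")
    # Spaces and other separators are only formatting, so drop them.
    digits = "".join(ch for ch in number[1:] if ch.isdigit())
    return EXIT_CODES[caller_country] + digits


saved = "+55 21 5550 0000"  # placeholder Rio de Janeiro number

print(resolve_plus(saved, "GB"))
print(resolve_plus(saved, "US"))
```

The same saved contact produces a different dial string in each country, which is exactly why the exit code should never be written into the number itself.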
Second issue: Confusing trunk prefixes and area code
This is by far the biggest sin around the formatting of phone numbers. In order to understand it, we need to look into different aspects of telephone numbering plans.
First, area codes differ from country to country, and these can have a fixed length or a variable length. For example:
- Fixed-length area code – In Brazil, every area code consists of two digits: 21 for Rio de Janeiro, 11 for São Paulo, etc.
- Variable-length area code – In the United Kingdom, the length of the area code varies from two to five digits depending on the area: 20 for London, 121 for Birmingham, etc.
Now as you can see, I have omitted the 0 from the examples of area codes mentioned above. The reason is that both in Brazil and in the UK, 0 is a long distance call prefix which one would dial only if the phone number they are trying to reach is outside their area. Moreover, if I am in New York and I want to call Rio de Janeiro or Birmingham, I would not include the 0 between the country code and the area code. This is because both Brazil and the United Kingdom follow a variable-length dialling plan instead of a fixed-length dialling plan.
NOTE: Fixed-length and variable-length area codes should not be confused with fixed-length and variable-length dialling plans.
Variable-length dialling
A variable-length dialling plan is one with different dialling arrangements for local and long distance calls, meaning that trunk prefixes are added or removed depending on where the call originates. For instance, in both Brazil and the UK, if someone wants to call a phone number within the same city (i.e. phone area), the person simply dials the number without the area code. However, if calling the number from outside the area, the long distance call prefix and the area code are required.
For example, to call the Birmingham City Council:
- From Birmingham: 303 6789 (just dial the number directly).
- From outside Birmingham, but within the UK: 0 121 303 6789 (dial the long distance call prefix which is 0 in the UK, followed by the area code for Birmingham which is 121, followed by the subscriber phone number).
- From France: 00 44 121 303 6789 (dial the international call prefix which is 00, followed by the country code for the UK which is 44, followed by the area code for Birmingham which is 121, followed by the subscriber phone number. Notice that the 0 between the country code and the area code is not included).
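The three cases above can be sketched as a small function. This is an illustration under simplifying assumptions: `dial_uk` and its parameters are hypothetical names, and the caller's location is passed in explicitly rather than detected.

```python
UK_COUNTRY_CODE = "44"
UK_LONG_DISTANCE_PREFIX = "0"


def dial_uk(area_code, subscriber, caller_country,
            caller_area_code=None, caller_exit_code="00"):
    """Return the digits to dial for a UK number from the caller's location."""
    if caller_country != "GB":
        # From abroad: exit code + country code + area code + number,
        # with no long distance prefix in between.
        return caller_exit_code + UK_COUNTRY_CODE + area_code + subscriber
    if caller_area_code == area_code:
        # Local call: the subscriber number alone is enough.
        return subscriber
    # Long distance call within the UK: prefix 0 + area code + number.
    return UK_LONG_DISTANCE_PREFIX + area_code + subscriber


print(dial_uk("121", "3036789", "GB", caller_area_code="121"))
print(dial_uk("121", "3036789", "GB", caller_area_code="20"))
print(dial_uk("121", "3036789", "FR"))
```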
As per the ITU-T Recommendation E.123, it is customary in such cases to put parentheses around the digits that are not always dialled, in order to indicate the bit that is not required when calling from the same area as the number. Following this logic, the way to write down the phone number for the Birmingham City Council would be: +44 (121) 303 6789. A lot of people, however, tend to write such phone numbers down as +44 (0) 121 XXX XXXX or +44 (0121) XXX XXXX, which is incorrect since 121 is the area code and 0 is just the prefix. This is due to the misconception that the parentheses are meant to denote trunk digits which are not required when the call originates from abroad. This tends to cause a lot of confusion, to the point that some people do not know how or when to include the prefix and/or the area code, which results in calls not being completed or people dialling area codes when not required.
This confusion is particularly common within the UK, as discussed in this article on Wikipedia. However, at the time of this writing, the article on Wikipedia does not address the issue in full, as it considers the long distance call prefix to be part of the area code. For example, the article in question says that the area code for London is 020, which is incorrect since 0 is the long distance call prefix and 20 is the area code. So just to be clear, the correct way to format a Greater London phone number would be +44 (20) 8XXX XXXX, not +44 (020) 8XXX XXXX as the Wikipedia article suggests.
Full-number dialling plan
A full-number dialling plan is one where the phone number must include the long distance call prefix regardless of whether it is being called from within the same area (i.e. a local call) or not. For example, in France the area code for Paris is 1.
If someone would like to call the Paris City Hall (Hôtel de Ville):
- From Paris: 0 1 4276 4040 (the first 0 is dialled regardless).
- From outside Paris, but within France: 0 1 4276 4040 (same as above).
- From the UK: 00 33 1 4276 4040 (dial the international call prefix which is 00, followed by the country code for France which is 33, followed by the area code for Paris which is 1, followed by the subscriber phone number. Notice that the 0 between the country code and the area code is not included).
Now some countries that followed the full-number dialling plan have decided to go a step further and simply abolish the concept of a long distance call prefix. One good example is Italy, where the area code for Rome is 06. The 0 in this case is not a long distance call prefix. It is actually part of Rome’s area code!
If someone would like to call the City of Rome information hotline:
- From Rome: 06 0606 (the first 0 is dialled regardless).
- From outside Rome, but within Italy: 06 0606 (same as above).
- From the UK: 00 39 06 0606 (dial the international call prefix which is 00, followed by the country code for Italy which is 39, followed by the area code for Rome which is 06, followed by the subscriber phone number).
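The differences between the UK, France and Italy can be captured as data rather than as special cases in code. The sketch below is my own simplification: `NationalPlan`, `PLANS` and the two functions are hypothetical names, and the table only covers the three countries discussed in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NationalPlan:
    country_code: str
    trunk_prefix: str      # long distance call prefix, "" if there is none
    local_dialling: bool   # True if the area code may be omitted locally


PLANS = {
    "GB": NationalPlan("44", "0", True),    # variable-length dialling
    "FR": NationalPlan("33", "0", False),   # prefix always dialled
    "IT": NationalPlan("39", "", False),    # no prefix; 0 belongs to area code
}


def national_dial(country, area_code, subscriber, same_area):
    """Digits to dial from inside the country."""
    plan = PLANS[country]
    if plan.local_dialling and same_area:
        return subscriber
    return plan.trunk_prefix + area_code + subscriber


def international_dial(country, area_code, subscriber, exit_code):
    """Digits to dial from abroad: the trunk prefix is never included."""
    plan = PLANS[country]
    return exit_code + plan.country_code + area_code + subscriber


print(national_dial("FR", "1", "42764040", same_area=True))
print(international_dial("FR", "1", "42764040", exit_code="00"))
print(international_dial("IT", "06", "0606", exit_code="00"))
```

Note that Rome’s area code is stored as the string "06", so its leading zero survives without any special handling.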
Practical implications
Just to be clear: My problem is not that most people do not know by heart the idiosyncrasies of phone number formatting. I am quite aware that this is an eccentric subject. The problem is assuming that the way one thinks is right is right for everyone else. And when such presumptuousness is employed in systems design, things tend to go pear-shaped. Remember that it does not matter what you or I think is right: The only thing that matters here is the computer logic.
A few years ago, I remember being abroad with my father and seeing him struggling to make a phone call on his Blackberry. He was trying to call a number back home which was saved in his phone’s address book, but the call would not complete. I had a look at his phone and chuckled. The number he was trying to call was formatted as a long distance call, as if he were manually dialling the number from Rio de Janeiro (it even included an operator prefix). I had to update all of his phone numbers to follow the correct format.
The correct phone format
Which format is the correct format? A format that will allow the number to be successfully dialled no matter where the dialler currently is: The format established in the ITU-T Recommendation E.123:
+[COUNTRY CODE] [AREA CODE] [PHONE NUMBER]
That simple. Just make sure that you do not include any prefixes. Here are some examples:
- Birmingham City Council: +44 121 3036789
- Paris City Hall: +33 1 42764040
- City of Rome information hotline: +39 06 0606
This is how I have my phone numbers formatted in every system that I use and create, including my mobile phone contacts. I have never had a single problem dialling numbers formatted in this fashion, no matter the country I was trying to dial from. When making a local call or long distance call, the country code and area code will be disregarded as appropriate. Also, any prefixes will be added as required. If an operator prefix is required, a default one will be specified, which is normally the network carrier you are dialling from (at least in my experience based in Brazil). When making an international phone call, every modern phone system/network should understand how to read the plus sign, and will replace it with the appropriate exit code depending on where the call originates. Bear in mind that spaces, dashes and parentheses are just used for formatting purposes and are disregarded when dialling.
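As a small sketch of the format above, the two helpers below build the written form from its parts and reduce a written number to what actually gets dialled. The function names are mine and not taken from any library.

```python
def format_international(country_code: str, area_code: str,
                         subscriber: str) -> str:
    """Write a number as +[COUNTRY CODE] [AREA CODE] [PHONE NUMBER]."""
    return f"+{country_code} {area_code} {subscriber}"


def dialable(formatted: str) -> str:
    """Keep only the plus sign and the digits.

    Spaces, dashes and parentheses are purely visual and are dropped.
    """
    return "".join(ch for ch in formatted if ch.isdigit() or ch == "+")


print(format_international("44", "121", "3036789"))
print(format_international("39", "06", "0606"))
print(dialable("+44 (121) 303-6789"))
```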
Dialling assistants: Do not take them for granted
There are cases where calls are completed even when phone numbers have been formatted incorrectly. Most modern phones today have a dialling assistant function which tends to correct people’s mistakes when dialling numbers, which is good for fixing common mistakes, but (in my view) bad for perpetuating misconceptions. For example, if someone has the phone number for the Birmingham City Council saved as 0044 121 3036789 in their phone and they try to call it while they are roaming within the USA, the call might complete even though the exit code within the USA is 011, not 00. This is because the phone’s dialling assistant (if available) will send the number in the correct format to the phone network. Even if the number was saved with a long distance call prefix between the country code and the area code, a dialling assistant would remove it.
I feel it is important to emphasise here that these dialling assistants are hard-coded into smartphones like the Apple iPhone or Google Android and other systems. They also take a long time to develop, are costly to maintain (since telephone numbering schemes tend to change), and are not always perfect (I could write a whole post about the mess Ireland made of the 1 area code). So if you are designing a computer system, take into account that phone numbers formatted incorrectly won’t just magically work.
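To give a feel for what even a very naive dialling assistant has to know, here is a deliberately incomplete sketch. Everything in it is an assumption made for illustration: the function name, the two exit codes it recognises and the four-country prefix table. Real assistants are far more elaborate.

```python
import re

# Exit codes and long distance prefixes mentioned in this post only.
EXIT_CODES = ("011", "00")
TRUNK_PREFIX = {"44": "0", "33": "0", "55": "0", "39": ""}


def naive_assistant(saved: str) -> str:
    """Try to turn a badly saved international number into '+digits'."""
    digits = re.sub(r"[^\d+]", "", saved)  # drop spaces, dashes, parentheses
    if not digits.startswith("+"):
        for code in EXIT_CODES:
            if digits.startswith(code):
                digits = "+" + digits[len(code):]
                break
        else:
            raise ValueError("cannot tell which country this number is in")
    for country_code, trunk in TRUNK_PREFIX.items():
        if digits.startswith("+" + country_code):
            rest = digits[1 + len(country_code):]
            # Remove a stray long distance prefix after the country code,
            # but only where the country actually has one (not Italy).
            if trunk and rest.startswith(trunk):
                rest = rest[len(trunk):]
            return "+" + country_code + rest
    return digits


print(naive_assistant("0044 121 3036789"))
print(naive_assistant("+44 (0) 121 303 6789"))
print(naive_assistant("+39 06 0606"))
```

Even this toy version shows how fragile the guessing is: a UK number saved in national form with the prefix 0 and the area code 113 (Leeds) begins with 011, which the sketch would mistake for the North American exit code.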
Storing and presenting phone numbers
Now that we have discussed the recommendations on how to format phone numbers, let’s consider how to store them. The first thing to take into account is that we should not necessarily store data in the same way we want to present it. We want our data to be stored in the most agnostic way possible, and any subjectivity should be handled at the presentation layer. Therefore, taking the ITU-T Recommendation E.123 into account, I opt for simply storing the numbers without any prefixes whatsoever, including the plus sign denoting the international call prefix. I also refrain from adding any other sign, such as dashes for separating numbers, or parentheses.
Following this rationale makes it easier to format the number afterwards for presentation purposes at the application layer. For instance, a script could simply add the plus sign in front of the numbers to denote the international calling prefix. Or perhaps the application logic could first determine the client’s location and then show the appropriate prefix, such as 00 for users in the United Kingdom or 011 for users in the United States. Same goes for long distance prefixes.
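A minimal sketch of that presentation step, assuming the database holds digits only (country code, area code and subscriber number run together). The function name `present` and its `exit_code` parameter are hypothetical.

```python
from typing import Optional


def present(stored_digits: str, exit_code: Optional[str] = None) -> str:
    """Turn a stored, digits-only number into something to show a user.

    With no exit code, the neutral plus sign is used. With an exit code
    (e.g. "00" for the UK or "011" for the USA) that prefix is shown instead.
    """
    if not (stored_digits.isascii() and stored_digits.isdigit()):
        raise ValueError("stored numbers must contain digits only")
    if exit_code is None:
        return "+" + stored_digits
    return exit_code + " " + stored_digits


stored = "441213036789"  # Birmingham City Council, as stored

print(present(stored))
print(present(stored, exit_code="00"))
print(present(stored, exit_code="011"))
```

How the client’s location is determined is left out on purpose, since that depends entirely on the application.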
Going back to the subject of storing the numbers, the core issue is whether the phone number should be stored as a whole within one field, or if it should be split into three or more fields such as country code, area code, subscriber number. This is a key architectural question that should not be decided on a whim, but on the main purpose of having phone numbers and how it fits with the overall purpose of the application. That is, what will the phone number be used for? Think of all functional and non-functional requirements around phone numbers.
From a data standpoint, is the data stored in the system (including the phone number) used for transactional purposes or analytical purposes? Even if used for analytical purposes, how suitable would it be to perform analysis based on fields related to phone numbers? Unless you have a specific requirement for analysing patterns in phone numbers, it is likely that you can obtain the answers you need by looking elsewhere in your data.
For instance, if you want to analyse customers by country, it is likely to be easier to consider a country field related to the customer’s address instead of the country dialling code in their phone number, since there are cases in which countries share the same code (e.g. NANP). Again, I am sure that there might be valid cases to categorise data solely on phone numbers, but you have to consider the architectural implications if you want to go down that route. If you really have a valid reason to do so, perhaps it would be better to have this data extracted and analysed in a data mart, or even directly in an analytical model (e.g. Microsoft Analysis Services) where phone numbers could be split and categorised accordingly.
If you really need to have phone numbers split into different fields, consider whether this information will be stored in a normalised or denormalised form. Again, you must think of the functional and non-functional requirements. If you have a table for country codes, another for area codes, and another for the phone number, remember the following points:
- That there are cases in which countries share the same country code (e.g. USA, Canada and others). In this case the country is identified by the combination of the country code and the area code.
- Remember that in some countries (e.g. Italy) the area codes include a zero which is not a long distance prefix, but is part of the area code itself.
- Also remember that the length of those numbers varies, in some cases even within the same country/region. For example, in Finland the length of a subscriber phone number can vary within a local exchange.
Bottom line, it would be fairly complex to design a regular expression to validate phone numbers for each country/region, so consider such a requirement carefully!
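If you do go down the split-field route, the points above translate into something like the following sketch. The class, the lookup tables and the function are hypothetical, the tables are tiny samples rather than complete data, and the North American subscriber digits are dummy values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PhoneNumber:
    # All parts are strings so that leading zeros (e.g. Rome's 06) survive
    # and no fixed length is assumed.
    country_code: str
    area_code: str
    subscriber: str

    def digits(self) -> str:
        return self.country_code + self.area_code + self.subscriber


# Small samples only. Country code 1 is shared, so the area code decides.
COUNTRY_BY_CODE = {"44": "GB", "33": "FR", "39": "IT", "55": "BR"}
NANP_AREA_TO_COUNTRY = {"212": "US", "416": "CA", "876": "JM"}


def country_of(number: PhoneNumber) -> Optional[str]:
    """Best-effort country lookup; returns None when the sample has no match."""
    if number.country_code == "1":
        return NANP_AREA_TO_COUNTRY.get(number.area_code)
    return COUNTRY_BY_CODE.get(number.country_code)


rome = PhoneNumber("39", "06", "0606")
toronto = PhoneNumber("1", "416", "5550100")

print(rome.digits(), country_of(rome))
print(toronto.digits(), country_of(toronto))
```

Had the area code been stored as an integer, Rome’s 06 would silently become 6, which is one of the traps listed above.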
When allowing users to input phone numbers, you have three options. The first is to enforce the correct format on users as they type at the client level. For instance, disallowing any non-numerical character (remember that the plus sign, parentheses and dashes should not really be stored as part of the phone number), automatically adding or removing spaces, and giving feedback to the user indicating whether the number is valid (e.g. a red X icon indicating the number is invalid, or a green check-mark indicating that the number is valid). An overlay could be displayed explaining why the number is invalid. The second option is to allow the user to type phone numbers as they wish, and fix the phone number at the application level before it is committed to the database. The third option is to allow users to type phone numbers as they wish and have them submitted as is. Then a data cleansing job can be run afterwards to perform any corrections (e.g. through Microsoft SQL Data Quality Services).
In my experience the best option is a combination of options one and two. It is best to allow a little freedom to the user, and they can see the correction as soon as the data is committed in order to “educate them” on how to use the system. The third option can still be employed in order to fix any possible anomaly that might come from integrating with other systems, or from importing data through other means.
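For the second option, the application-level fix could be as simple as the sketch below. It assumes the user typed the number in international notation. `normalise_input` is a made-up name, and the only validation applied is the 15-digit maximum that E.164 sets for international numbers.

```python
import re

MAX_E164_DIGITS = 15  # upper bound on digits in an E.164 number


def normalise_input(raw: str) -> str:
    """Accept free-form input and return the digits-only form for storage."""
    # Remove the separators people commonly type: spaces, dashes,
    # parentheses and dots.
    cleaned = re.sub(r"[\s\-().]", "", raw)
    if cleaned.startswith("+"):
        cleaned = cleaned[1:]
    if not (cleaned.isascii() and cleaned.isdigit()):
        raise ValueError(f"unexpected characters in {raw!r}")
    if len(cleaned) > MAX_E164_DIGITS:
        raise ValueError(f"{raw!r} has more than {MAX_E164_DIGITS} digits")
    return cleaned


print(normalise_input("+44 (121) 303-6789"))
print(normalise_input("33 1 42 76 40 40"))
```

The error messages are what would feed the overlay explaining to the user why the number was rejected.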
Going back to the presentation of phone numbers, once the number is fetched from the database by the application, it can be transformed at the application or client layer in order to be displayed to the audience in the expected convention. This can be done with any language, such as C#, JavaScript, Python or Ruby. My suggestion is to add support for different conventions as the need arises.
Conclusion
Phone numbers follow a complex pattern that varies from region to region: not only the lengths of phone numbers, but also the prefixes used and the convention used to display them. Remember that mobile phones and other phone systems have complex dialling assistants coded into them which correct the majority of mistakes. So unless you have a dialling assistant in place, you should not expect phone numbers to magically work if they are saved incorrectly. Also, it is important to separate the concept of storing data from presenting data. It is best to follow a neutral convention for storing phone numbers that makes it easier to apply any convention afterwards at the application or client layer.