Category Archives: Excel

Excel corrupts data in long numbers

I thought I was going to get some work done today, but instead I spent a significant amount of time dealing with a data corruption problem in Excel.

I had a number of files (e.g. over 250) that were all *.csv files downloaded from a server and full of useful data. Among the data is a 13-digit account number. But when I opened my files I discovered that all the account numbers had the last six digits replaced by zeros. After some experimentation I worked out what was happening. Excel was first converting all of these numbers to what is called Scientific Notation. That is what you see when a long number turns into something like 2.690565 +E11. You can change the format of the cells (to number) to make this temporarily go away, and if you do so before you attempt to save the file, you are OK. But as soon as you open the file again everything goes back to Scientific Notation again. Now, the reason that this is a Bad Thing is that if you do anything that triggers a save of the file while it is in this state it will throw away some of your data. So for example with a number like 2690565134729, Excel first converts it to 2.690565 +E11, and when you save it, Excel throws away the digits that weren’t displayed. So when you reopen your file, the number is now 2690565000000. This is a mess. And once the data is gone, it is gone and unrecoverable.

It turns out that this a known problem with Excel affecting lots of people.If you do a web search on “Disable Scientific Notation in Excel” you will find lots of people with this problem, and also that you cannot turn off this behavior. There were some suggestions of convoluted scripts or macros you could use that might help, but basically Microsoft designed it to work this way and is not interested in changing it.

Years ago I worked with a fellow who had a number of Microsoft certifications. When I complained that apparently Microsoft never did any usability testing, he quickly corrected me. As he explained it, Microsoft does extensive usability testing. They just never let it affect the design of their products.

Well, I was using Excel because that is what they give me at work. But after losing two months worth of data to this stuff, I opened up my copy of OpenOffice.org, which I keep on a thumb drive, and was happy to discover that I had no problems at all. OpenOffice.org hs no compulsion to convert numbers to Scientific Notation, does not corrupt data, and as a result I feel much safer. I configured my workstation to use OpenOffice.org by default for all *.csv files from now on.