Category Archives: OpenOffice.org

Separate Presentation From Content – Office Software

In my previous post I made the point that the Web works best when you separate presentation from content. That is good as far as it goes, but I want to now extend the discussion in another direction, and that is how to use Office software to the maximum advantage. This applies to any Office suite, whether you use Microsoft Office, WordPerfect Office, OpenOffice.org, Libre Office, or indeed any other office productivity suite. I have worked with all the above, have trained people in several of them, and have had experience with how powerful these techniques can be. In fact, I developed an 18-hour course for college students that employed these techniques. The students had mostly been putting off taking this course as long as possible because they did not see the need for this. But the University had made this a requirement, and they would frequently take it near the end of their degree program. But after taking the course, I almost always got the reaction that they were angry that they had not had the course at the beginning of the program because it was so useful. This course covered the basics of using Microsoft Office (Word, Excel, PowerPoint, and Access), but I have used this approach in training on other office suites with just as much success.

Now, something that just happened to me illustrates a useful point. I was on a Web page, and clicked on a link to a PDF file. The file opened, I read what I wanted to read, and when I was done, by force of habit, I clicked the close button on the upper right to get rid of the PDF file. After all, I was done with it, right? Arrrggghhh! I just closed my browser and all of the tabs I had opened. This is why the first thing I do with a new browser is set it up to always open the tabs I last had opened. But the point I want to make here is that my browser automatically opened and displayed a PDF file. That used to require calling a separate program, but apparently that is no longer necessary. And I suspect we will see more of this. For instance, Google Docs is starting to bring all of your Office documents into the browser. At some point the technology is going to treat any piece of data/text/whatever as raw material and display it. And when that happens, all of the arguments on how to construct proper Web pages will apply equally to constructing Office documents.

That is not all, though. In my day job I am a Project Manager, and I have a need to manage large numbers of documents. Documentation management becomes a real concern, and I have to say that most of the places I have worked do not do a good job of it. I think Microsoft Sharepoint, if used properly, could be a good step in the right direction. And for those who are in as position to go the Open Source route, Alfresco could be the solution. While I generally prefer Open Source solutions, Sharepoint is really pretty good, and if you work in a Microsoft shop you may find it easier to promote as a solution. In either case, all of the issues of semantic encoding, of finding the document you need from a large haystack of documents, still apply.

Going back to my academic days at that university, as I was the Office expert I was given the task of putting together the catalog. What that meant was combining a large number of documents, each from a different department, into something that could be considered a unified whole. And these departments did not make the job easy. No two of them used the same convention for laying out their information. and as I recall none of them used the proper semantic tagging at all. Everything was done using font changes, the space bar, inconsistent lists, and if any of them used tabs at all they did it the wrong way. So my first major task was to go through all of these submissions and use semantic tagging. In word processing programs this is done by using what are called styles, and maybe you can see the relationship between styles and style sheets. They are really the same idea, just applied in different domains.

So the proper way to use a word processing program (and again, this applies to Word, WordPerfect, Write, AbiWord, or any other program out there) is to apply a style to each element, just the way you apply a tag in a Web page. The title of your document should be given a style like Header 1, a major section sub-head should be Header 2, and so on. Now, the word processing programs may take you in the wrong direction  at first because they will have an appearance already assigned, or will ask you to specify an appearance when you use the style. Resist the urge! The point in creating your document should be to get the semantic encoding done correctly. Once that is done, you can assign an appearance to each element, and achieve a unified look-and-feel to your document, or even to a whole group of documents.

I will illustrate this with an example from my academic days. In the early days of using personal computers, they were adopted by universities as a tool for their students and faculty. In one large university, they were adopted for use in Freshman Composition classes. In the U.S. at least, these classes are pretty much universal, as the faculty want to make sure that all students can write papers at a minimal (at the very least) level of competency. At this particular university, they had different sections of the course, some of which used Macintosh computers, and the others using DOS computers running WordPerfect. When they did a comparative study of the writing of these two groups, they found something very interesting. The DOS/WordPerfect group were consistently writing better papers with superior content. This was a surprise, and they looked for any possible correlation that might explain it. But the two groups of students seemed to have comparable grades coming out of high school, they had comparable test scores on the standardized tests used for admission, and in general on all measurements the could think of the two groups were in effect identical, except that one group used Macintosh and the other DOS/WordPerfect.

They finally decided that the most likely explanation lay in what each platform allowed you to do. Macintosh computers were the first to have Graphical User Interface. They came with a variety of font tools, graphics tools, and were in general the first personal computers with a graphic design capability. That is one reason why Macintosh got such a big head start with graphic designers and maintains that to this day. DOS computers running WordPerfect were quite different. In the mid-to-late 1980s, they ran on monochrome screens, and basically you were presented with a black screen with a blinking cursor. In later versions, for early color monitors, the screen became blue instead of black, but otherwise the same: a blank screen with a blinking cursor. The only thing you could do with these computers was write. On a Macintosh, though, you were presented right away with font choices, with graphics choices, page layout considerations, etc. The conclusion of the researchers was that having all of these choices available to the students distracted them from the main point, which was to write good compositions.

Nor is this only applicable to word processing. Another area where this crops up is with presentation software (e.g. PowerPoint, Impress, etc.). Most presentation programs will start you with a choice among graphical templates and similar distractions. Again, resist the urge! To make a good presentation, your first concern should be to logically organize your information. When I am creating a presentation I frequently start with an outline. Many programs will let you take an outline and turn it into a presentation with a few mouse clicks. When you have done so, you can then apply any template you like to give the presentation the graphic look you want.

One other advantage of properly using semantic tagging, which is similar to what we found in looking at Web pages, is that it becomes a real time saver. For instance, suppose you had a long document with a number of sections. Each time you came to a section you could set the appearance of your section header by clicking on the font you want, what size it should be, whether or not it should be indented, and so on. Or you could do it properly by just declaring the element to be a particular header (say Header 2), and then setting the appearance for all Header 2’s in your document. Furthermore, if you need to make a change, for whatever reason, you don’t need to go page by page through the document looking for all of the places that need to be changed. You just change the characteristics of the Header 2 style once and the whole document updates.

So for all of the reasons given, using proper semantic encoding and separating the presentation from the content is just as important in Office software applications as it is in building Web pages. In fact, it is a fundamental principle of good information architecture.

Listen to the audio version of this post on Hacker Public Radio!

Excel corrupts data in long numbers

I thought I was going to get some work done today, but instead I spent a significant amount of time dealing with a data corruption problem in Excel.

I had a number of files (e.g. over 250) that were all *.csv files downloaded from a server and full of useful data. Among the data is a 13-digit account number. But when I opened my files I discovered that all the account numbers had the last six digits replaced by zeros. After some experimentation I worked out what was happening. Excel was first converting all of these numbers to what is called Scientific Notation. That is what you see when a long number turns into something like 2.690565 +E11. You can change the format of the cells (to number) to make this temporarily go away, and if you do so before you attempt to save the file, you are OK. But as soon as you open the file again everything goes back to Scientific Notation again. Now, the reason that this is a Bad Thing is that if you do anything that triggers a save of the file while it is in this state it will throw away some of your data. So for example with a number like 2690565134729, Excel first converts it to 2.690565 +E11, and when you save it, Excel throws away the digits that weren’t displayed. So when you reopen your file, the number is now 2690565000000. This is a mess. And once the data is gone, it is gone and unrecoverable.

It turns out that this a known problem with Excel affecting lots of people.If you do a web search on “Disable Scientific Notation in Excel” you will find lots of people with this problem, and also that you cannot turn off this behavior. There were some suggestions of convoluted scripts or macros you could use that might help, but basically Microsoft designed it to work this way and is not interested in changing it.

Years ago I worked with a fellow who had a number of Microsoft certifications. When I complained that apparently Microsoft never did any usability testing, he quickly corrected me. As he explained it, Microsoft does extensive usability testing. They just never let it affect the design of their products.

Well, I was using Excel because that is what they give me at work. But after losing two months worth of data to this stuff, I opened up my copy of OpenOffice.org, which I keep on a thumb drive, and was happy to discover that I had no problems at all. OpenOffice.org hs no compulsion to convert numbers to Scientific Notation, does not corrupt data, and as a result I feel much safer. I configured my workstation to use OpenOffice.org by default for all *.csv files from now on.