Friday, April 23, 2010

Data type issues in spreadsheet-based mail merges in OpenOffice, or why your mail merge doesn't work with 3.2


Let's say you're putting together a mail merge listing the fees people in the county have to pay for getting their new cat license. You'd put together the spreadsheet data kind of like this, then create a database based on that spreadsheet.

Name Fee

Bob $12

Jean $13

Your mail merge document, with the various prompts and the field names, would look kind of like this:

Name: <Name>

Fee for license: <Fee>

So far so good. But let's say that sometimes there's no fee, perhaps organizations don't need to pay for cat licenses. So your data might look like this:

Name Fee


Bob $12


Jean $13


Library N/A

BUT that will not work because mail merges as of 3.2 (possibly earlier) don't like you to mix your types of data in a column. You can have all text, like N/A, or all numbers like 12, but not both.

Here's an example of some test and screen shots I took.

Here's the spreadsheet.

Data1

Here's what is in the database, and therefore what a mail merge would have access to.

Data2

So: in the scenario I outlined you could:

just put in 0, or just blank, instead of N/A

OR do this, which is workable but a bit more complicated.

Name Fee Applicable or Not



Bob $12 [leave blank]



Jean $13 [leave blank]



Library [leave blank] N/A

Then for your mail merge put both the fee and the applicable or not fields next to the Fee prompt. If there's no data the field won't take up any room, and only the correct field will show.

Name: <Name>


Fee for license: <Fee><Applicable or Not>

Note: This change in 3.2 really is a proper approach to databases; having mixed types is not proper database structure. But it is probably causing a few headaches.



Friday, February 16, 2007

CONDITION FORMATING

We will use conditional formatting to identify unique entries between two lists.

In the sheet below, we have two lists of US States. We would like to identify the list entries that are unique to each list. This technique can also be applied to two versions of the same list to identify additions/removals.

listsunique_1.jpg


Step 1: Select B2:B19 and open the Conditional Formatting dialog - below. We have already defined a formatting style that has a darker background.

In the Conditional Formatting dialog, select the "Formula is"option from the drop down list on the left and enter the formula as displayed below. Choose the predefined style. Exit the dialog.

We apply similar conditonal formatting to the D2:D19 array. I will leave that as an exercise for the reader.

LOOK-UP FUNCTIONS IN CALC

Table of Contents

One Goal – Many Ways

LOOKUP

VLOOKUP & HLOOKUP

MATCH & INDEX

CHOOSE

What's In The Middle?

Summary

Introduction

Everyone who uses spreadsheets comes to a point where they want to look up data from a table. The database-like structure of spreadsheets naturally promotes the desire to perform database-like functions. Some businesses use spreadsheets to produce invoices, track inventory, estimate costs, or manage budgets; all of these tasks require some form of look-up capability. This tutorial will help you learn the look-up functions available in Calc.

One Goal – Many Ways

Calc has many ways to help you look up data in columns and arrays. The six look-up functions available are: LOOKUP, VLOOKUP, HLOOKUP, MATCH, INDEX, and CHOOSE.

The first three functions are obviously look-up functions; the last three are not so obvious. The spreadsheets you create/use will become more complex as your skills develop. You will discover that data is not always organized in the optimum configuration for looking up values. There will be times when you can change the data configuration; there will be other times when you can't make any changes. That's why you need a toolbox of methods to extract the information you want from any data set. This tutorial will give you the basic tools; advanced tools will come through practice and application. Let's start with LOOKUP.

LOOKUP

What do you do when you want to find a topic in a book? You go to the book's index, look up the topic, and get the page number. LOOKUP uses the same principle to retrieve values from a list. The syntax is:

LOOKUP(Search criterion;Search vector;result_vector).

The Search criterion is the "topic" you are looking for. It can be a cell reference, regular expression (if you don't know what regular expressions are, the on-line help can help you), or a value that is "hard-wired" into the formula (hard-wired means the data is entered as part of the formula and cannot be changed without editing the formula).

The Search vector is a range of cells in a row or column and is akin to a book index. LOOKUP compares the Search criterion to the values in the Search vector to find a match. The Search vector must be sorted in ascending order.

The result_vector is a range of cells that correlates to those of the Search vector. The data in the result_vector are the page numbers in our book example.

The Search vector and result_vector can be located anywhere on the worksheet; they can even be defined on different worksheets. Most people place them in adjacent rows or columns. This arrangement is often called a look-up table.

Are you ready to see how it works?

  1. Open a workbook.

  2. Enter 1 in cell E11.

  3. Select E11.

  4. Drag E11 down to E15. (AutoFill from 1 to 5.)

  5. Enter Green in F11, Orange in F12, Purple in F13, Red in F14, and Blue in F15.

  6. Select A1.

  7. Enter 5 in A1, 4 in A2, 3 in A3, 2 in A4, and 1 in A5.

  8. Select B1.

  9. Enter =LOOKUP(A1;E11:E15;F11:F15).

  10. Select B1 again and drag the formula to B5.

Why do two cells contain #N/A? When you copy formulas in a spreadsheet, the software tries to help out by changing the cell references to maintain the relationships between the cells and the formulas. This is called relative addressing. Watch it in action:

  1. Click on B1.

  2. Click on the Input Line. (Pay close attention to the colored boxes.)

  3. Press Esc.

  4. Click on B2.

  5. Click on the Input Line.

  6. Press Esc.

  7. Repeat Steps 1-6 for the remaining cells.

You should have seen the colored boxes around the look-up table shift one cell down every time. The solution to this behavior is absolute addressing. Absolute addressing tells the software, "Don't mess with the addresses of these cells!" You make a cell reference absolute by placing a dollar sign ($) before the column letter and row number ($A$1). If you want to anchor the column only, put the $ before the letter ($A1). If you want to anchor the row only, put the $ before the number (A$1). Edit the formulas in your worksheet as follows:

  1. Select B1.

  2. Highlight everything in the parenthesis, except A1;

  3. Press the key combination Shift+F4. (The $'s are inserted automatically.)

  4. Click the green check mark.

  5. Drag the updated formula to B5.

Note: If you press Shift+F4 a second time, you will get something like A$1. If you press Shift+F4 a third time, you will get $A1.

There, it's fixed! All the colors match the numbers as defined by the look-up table. Clear column B and try this:

  1. Select F11:F15. (The lookup table colors.)

  2. Press Ctrl+x. (The Cut keyboard shortcut.)

  3. Click on J2.

  4. Click Edit > Paste Special...

  5. Check Transpose on the bottom left.

  6. Click OK.

  7. Enter =LOOKUP(A1;E11:E15;J2:N2) in B1.

  8. Use Shift+F4 to make the appropriate cell references absolute.

  9. Drag the formula to B5.

What do you think? Pretty Cool? Now that you have used LOOKUP, you need to understand how it works.

LOOKUP keeps an internal count of the number of cells in the Search vector. If we use 3 as the Search criterion, LOOKUP knows that 3 is in the 3rd cell of the Search vector. LOOKUP then goes to the result_vector and returns the value in the 3rd cell in that range (Purple); that's why you can place the vectors anywhere. If the Search criterion lies between two Search vector values, LOOKUP will return the lower value. Confused? Enter 4.5 in A2. The color (red) did not change because LOOKUP couldn't find 4.5. LOOKUP stopped searching when it found 5 and used the next lowest value (4). If the Search criterion is greater than all of the Search vector values, LOOKUP will return the last value in the list. If the Search criterion is less than all of the Search vector values, LOOKUP will return #N/A.

VLOOKUP & HLOOKUP

There are lots of situations where several columns are related to one index value. This is where VLOOKUP and HLOOKUP come into play. I combined the discussion of these functions because they are identical, except for the way they search. The VLOOKUP and HLOOKUP functions search arrays. An array is a block of cells similar to a data table in a book. Suppose you want to convert 70.5 degrees Fahrenheit to degrees Celsius. You would find a conversion table, move down the left column until you found 70, then you would move along the row until you found to the .4 column and read the temperature in Celsius. That's how VLOOKUP works. HLOOKUP works the opposite way; it searches the top row first, then it searches down the appropriate column. The syntax for the functions is:

V(H)LOOKUP(Search criterion;array;index;sort order).

You already know what the Search Criterion is, so I won't repeat the definition. The index is the nth column (row for HLOOKUP) of the array. Sort order is a boolean (TRUE or FALSE) parameter that tells V(H)LOOKUP whether the first column (row) is sorted in ascending order or not. The default value is TRUE and can be omitted; otherwise, type FALSE for sort order. If the Search Criterion lies between two values in the first column (row), V(H)LOOKUP will respond just like LOOKUP; it will return the lower value. Let's make make a fast-food receipt to see how they work.

  1. Rename Sheet1 to LOOKUP. (Look under Format > Sheet...)

  2. Bring up Sheet2 and rename it VLOOKUP.

  3. Enter the information in the table below. Begin in cell H1.



3

Bacon burger

$2.75



4

Small Fries

$1.00



5

Medium Fries

$1.25



6

Large Fries

$1.50



7

Small Drink

$0.75



8

Medium Drink

$0.90



9

Large Drink

$1.25

  1. Type Order ID in A1.

  2. Type Menu Item in B1.

  1. Type Price in C1.

  2. Click on B2.

  3. Enter =IF($A2="";"";VLOOKUP($A2;$H$1:$J$9;2)). (I'll explain IF soon.)

  4. Drag the formula to C2. (Did you notice the $A2?)

  5. Change the last number in the parenthesis in C2 from 2 to 3.

  6. Select B2 and C2.

  7. Drag the selection to row 15.

When I was preparing this tutorial, I got a bunch of #N/A's because there was no data in the A column. They were ugly, so I decided to introduce you to the logical function IF. In my experience, look-up functions and IF are almost always used together. When you use look-up functions, you will see a lot of #N/A's unless you hide them.. The IF statement (function) is handy if you want to set up a blank form or calculation sheet; especially if someone else is going to use it. IF may look scary, but it's really simple. Here's the syntax:

IF(Test; Then_value; Otherwise_value)

The Test is any logical expression that returns true or false. The formula in Steps 6 and 7 tell Calc to leave the cells in column B and C blank if the A column is blank; otherwise, show the result of VLOOKUP. Dress the receipt up a little:

  1. Put a border along the bottom of A1:C1.

  2. Put another border along the bottom of A15:C15.

  3. Format column C for currency.

  4. Type Subtotal: in B16. (Align right.)

  5. Enter =sum(C2:C15) in C16.

  6. Type Tax: in B17. (Align right.)

  7. Enter =C16*0.05 in C17. (5% sales tax)

  8. Type Total: in B18 and make it bold. (Align right.)

  9. Enter =SUM(C16:C17) in C18.

  10. Put a border around C18.

  1. Order lunch. (Enter some Order ID values to test the functionality.)

  2. Now enter 10, 0, and a number like 5.4 in the Order ID column and check the results. (Note the responses to 0 and 5.4.)

  3. (Challenge Step) If you are feeling really good about the IF statement, make the Subtotal (C16), Tax (C17), and Total (C18) cells blank when there are no entries in the A column! My solution is hidden in the blank space below (There are several solutions).

C16: =IF(SUM($C$2:$C$15)=0;"";SUM(C2:C15))

C17: =IF(C16="";"";C16*0.05)

C18: =IF(C17="";"";SUM(C16:C17))

Before you go to the next topic, transpose the look-up array and change VLOOKUP to HLOOKUP. It's a good exercise to prove to yourself that it works!

MATCH & INDEX

Individually, the capabilities of these functions are not very impressive; however, they form a powerful tool to look up information when they are combined. I'll start with MATCH.

MATCH is used to find the position of a value in a row or column. For example, if you have a column of 2000 words starting with S and you are looking for "Special", MATCH may return a number like 1670 (1670th row in the column). The syntax is:

MATCH(search_criterion;lookup_array;type).

The lookup_array is a single column or row of cells. Type is a sort parameter that tells the function how the column or row is sorted; it can be 1 (ascending), 0 (exact match), or -1 (descending). The default is 1 (ascending). If you use the exact match option and there are multiple instances, MATCH will return the position of the first instance. If the search_criterion falls between two entries in the column or row, the lower position is returned.

INDEX, on the other hand, returns the contents of a given cell address. The syntax is:

INDEX(reference;row;column).

The reference is a range or array of cells. The other parameters are self-explanatory. Are any light bulbs starting to glow?

Here's a scenario: You are the Vice President of a company. You hear about a certain employee and you want to find information on that person. The company uses Calc as its database and stores employee data in the format shown in the table below (I only listed the Department Heads to keep things simple).

Employee ID

Last Name

First Name

Department

Employees

1021

Avery

Walter

Dispersing

12

2022

Daniels

Mary

Warehouse

150

1549

Fairchild

Cynthia

Accounting

10

3115

Moore

Jack

Shipping

109

4752

Simpson

Suzanne

Receiving

200

2503

Turner

Mark

Engineering

3

2278

Williams

Robert

Sales

30

2827

Garland

Priscilla

Public Relations

5

2680

Barton

Kyle

Advertising

9

1250

Laramie

Sheila

Marketing

27

Enter the table in a new worksheet (include the column headings). Start in cell P1 so you can't see the table later on. When you've got the data entered, sort it by Employee ID in ascending order. This is how you sort:

  1. Select P2:T9.

  2. Click Data > Sort...

  3. Sort by Column P.

  4. Click Ascending.

  5. Click OK.

Now set up a simple query form.

  1. Select A1 and type Find: (Align right.)

  2. Type First Name: in A3. (Align right.)

  3. Type Department: in A4. (Align right.)

  4. Type No. Employees in A5. (Align right.)

  5. Select B3.

  6. Enter =INDEX($P$2:$T$9;MATCH($B$1;$Q$2:$Q$9;0);3).

  7. Copy the formula to B4 and change the last number in B4 from 3 to 4.

  8. Select B5.

  9. Enter =VLOOKUP(INDEX($P$2:$T$9;MATCH($B$1;$Q$2:$Q$9;0);1);$P$2:$T$9;5)).

I had you enter the last formula that way because I wanted to show you the power of nesting (nesting means using functions in functions). Nesting is really nice, but It can get pretty crazy.

  1. Type Turner into B2.

  2. Press Enter.

Did you notice that we never exposed an ID number? How did we do that if VLOOKUP needs the Employee ID to perform its search? We worked around the Employee ID by substituting it with the INDEX-MATCH combination, thus keeping private information secure and giving VLOOKUP what it needs! If you really want to make sure no one can see the ID numbers, hide the columns.

  1. Select columns P through T.

  2. Click Format > Column > Hide.

Let me explain what's happening. MATCH is using the text in B1 to identify a row position in the array (Turner is in row 4). INDEX then returns the data in the 4th row and the 3rd column to cell B3 (First Name). INDEX also returns the data in the 4th row and the 4th column to cell B4 (Department). In the last formula, INDEX returns data from the 4th row and 1st column, which is passed to VLOOKUP as the Search criterion. VLOOKUP uses the Search criterion to get data in the 5th column (No. Employees). That's all there is to it!

If you want the practice, wrap everything with IF to blank the three result cells when the Last Name field is blank. When you're done, rename the worksheet MATCH and save your work, then we'll move on to the CHOOSE function.

CHOOSE

You'll be happy to hear that the CHOOSE function is really easy. The syntax is:

CHOOSE(Index; value1;...value30).

Index is a number from 1 to 30. The values are an embedded list; usually text. CHOOSE is useful when you want to look up things like days of the week, months of the year, colors of the rainbow, or any other categorical data.

CHOOSE differs from the other look-up functions in that there is always a one-to-one relationship between the index values and the list values. Let's see how it works by building a teacher's grade book.

  1. Insert a new worksheet. (Insert > Sheet...)

  2. Rename the worksheet CHOOSE.

  3. Copy the Last Name and First Name columns from the MATCH worksheet to the CHOOSE worksheet.

  4. Sort them by Last Name in ascending order.

  5. AutoFill cells C1:L1 with numbers 1-10.

  6. Type Average in M1.

  7. Select the top row.

  8. Format the cells with a bold font; background color of Gray 10%; center alignment; and a bottom space of 3.00pt.

  9. Insert a row between each name.

  10. Enter 0 in R1.

  11. Enter 60 in R2.

  12. Enter 65 in R3.

  13. Select R2:R3.

  14. Drag the selection to R9. (AutoFill with increments of 5.)

  15. AutoFill S1:S9 with numbers 1-9.

  16. Select C3.

  17. Enter =IF(C2="";"";CHOOSE(LOOKUP(C2;$R$1:$R$9;$S$1:$S$9);"F";"D";"D+";"C";"C+";"B";"B+";"A";"A+")).

  18. Drag the formula to M3.

  19. Copy line 3 to line 5, line 7..., line 17.

  20. Select Light Blue as the font color for the rows in Step 19.

  21. Select M2.

  22. Enter =AVERAGE(C2:L2).

  23. Copy M2 to M4, M6,...M16.

  24. Format the table to suit your tastes.

  25. Enter 61 in C2. (The letter grade should be D.)

  26. Enter more grades to make sure everything works correctly.

  27. Save your work.

As you can see, the CHOOSE function has a lot of potential. You could use it with various lists to add cool stuff to your spreadsheet. Here's a fun application: Do you want to know the day of the week you were born on?

  1. Click on B27

  2. Enter =CHOOSE(WEEKDAY(A27);"Sunday";"Monday";"Tuesday";

    "Wednesday";"Thursday";"Friday";"Saturday").

  3. Enter your birthday in A27.

You can also enter cell references instead of typing text. An alternative to the formula above is:

=CHOOSE(WEEKDAY(A27);U1;U2;U3;U4;U5;U6;U7).

You would put the days of the week into the cells, beginning with Sunday in cell U1. The same could be done with the grade list. The cell references must be individual cells; CHOOSE will not accept cell ranges.

What's In The Middle?

We have discussed what happens when a Search Criterion falls between two Search vector values; the look-up functions return the lower value. But what if we need a result_vector value between two Search vector values? What do you do? You interpolate.

You interpolate numbers almost daily; however, most of the time you do it in your head without stopping to think how it's done. For instance, what number is halfway between 0 and 5? The number 2.5 probably comes to mind immediately. Most of us know that from experience, but how do you calculate other intermediate values?

Before you can interpolate between values you have to know something about the data. Some data sets are linear, which means that if you were to plot them on a chart they would form a straight line. Other data sets are nonlinear, meaning that a plot of the data would form a curved line. We're going to keep things simple by assuming that we can interpolate values by connecting data points with straight lines, even if the points form a curve (this assumption is good for most common data you will encounter). Using this assumption to get intermediate values is called linear interpolation. If the assumption is not valid, then you have to use nonlinear interpolation, which is a topic too complex to cover in this tutorial.

Now let's talk about lines. If you think way back to your grade school days (that's a long, long time for some of us), you probably learned that if you know two points on a line, you can find any other point. Do you recall something like that? You may also recall that the equation of a line has the general form of y = mx + b, where m is the slope (rise over run) of the line and b is the y-intercept (the point where the line crosses the y axis). The line equation is the basis for the method I am going to present. Let's look at some data.

  1. Add a new sheet to the workbook you are using for this tutorial.

  2. Rename it Interp.

  3. Enter the data below, beginning in F1.

1

7

2

11

3

17

4

25

5

35

6

47

7

61

8

77

9

95

10

115

11

137

12

161

13

187

14

215

15

245

16

277

17

311

18

347

19

385

20

425

  1. Add a chart and look at the graph of the data points. (Definitely not a straight line!)

Note: The data are the result of using the quadratic equation y = x2 + x + 5 if you want to check your answers.

What is the result_vector value for a Search criterion of 6.5? The exact answer from the equation is 53.75. If you calculated it by hand, you would do this way:

= 54

If you look at the calculation closely you will see that it is the equation of a line in the form of y = m*x + b. Also note the difference between the exact answer and the interpolated value. There is always error involved when you interpolate. In this case the difference is only 0.46%, which is negligible.

So what? What does that have to do with using look-up functions to interpolate between values? Well, now you have a pattern to follow! I've broken it down below:

  1. Assume we enter 6.5 in A1.

  2. (61-47): INDEX($F$1:$G$20;MATCH(A1;$F$1:$F$20)+1;2) – LOOKUP(A1;$F$1:$F$20;$G$1:$G$20). (Note the use of MATCH(...)+1 to get 61.)

  3. (7-6): INDEX($F$1:$G$20;MATCH(A1;$F$1:$F$20)+1;1) – INDEX($F$1:$G$20;MATCH(A1;$F$1:$F$20);1).

  4. (6.5-6): A1-INDEX($F$1:$G$20;MATCH(A1;$F$1:$F$20);1).

  5. 47: LOOKUP(A1;$F$1:$F$20;$G$1:$G$20).

  6. Put it all together in B1.

  7. Enter =(((INDEX($F$1:$G$20;MATCH(A1;$F$1:$F$20)+1;2) – LOOKUP(A1;$F$1:$F$20;$G$1:$G$20)))/(INDEX($F$1:$G$20;MATCH(A1;$F$1:$F$20)+1;1) – INDEX($F$1:$G$20;MATCH(A1;$F$1:$F$20);1)))*(A1-INDEX($F$1:$G$20;MATCH(A1;$F$1:$F$20);1))+LOOKUP(A1;$F$1:$F$20;$G$1:$G$20).

Yes!!! The answer in B1 is 54! Enter more values in the A column, then drag the formula down. You should get good answers down the B column. If you have more than one column of data, you could use VLOOKUP and HLOOKUP instead of LOOKUP; just make sure that your column numbers in INDEX and MATCH point to the appropriate column(s).

Summary

Congratulations! You have learned to use six different look-up techniques: LOOKUP, VLOOKUP, HLOOKUP, MATCH, INDEX, and CHOOSE. In addition, you learned how to use the IF statement to control how empty cells behave, how to nest functions to supercharge your look-up capabilities, and how to to get result_vector values between Search vector values. The skills you learned will make you a valuable spreadsheet user. Keep up the good work. Thanks for giving me your time and attention. Good Luck!