One Goal – Many Ways
LOOKUP
VLOOKUP & HLOOKUP
MATCH & INDEX
CHOOSE
What's In The Middle?
Summary
Introduction
Everyone who uses spreadsheets comes to a point where they want to look up data from a table. The database-like structure of spreadsheets naturally promotes the desire to perform database-like functions. Some businesses use spreadsheets to produce invoices, track inventory, estimate costs, or manage budgets; all of these tasks require some form of look-up capability. This tutorial will help you learn the look-up functions available in Calc.
One Goal – Many Ways
Calc has many ways to help you look up data in columns and arrays. The six look-up functions available are: LOOKUP, VLOOKUP, HLOOKUP, MATCH, INDEX, and CHOOSE.
The first three functions are obviously look-up functions; the last three are not so obvious. The spreadsheets you create/use will become more complex as your skills develop. You will discover that data is not always organized in the optimum configuration for looking up values. There will be times when you can change the data configuration; there will be other times when you can't make any changes. That's why you need a toolbox of methods to extract the information you want from any data set. This tutorial will give you the basic tools; advanced tools will come through practice and application. Let's start with LOOKUP.
LOOKUP
What do you do when you want to find a topic in a book? You go to the book's index, look up the topic, and get the page number. LOOKUP uses the same principle to retrieve values from a list. The syntax is:
LOOKUP(Search criterion;Search vector;result_vector).
The Search criterion is the "topic" you are looking for. It can be a cell reference, regular expression (if you don't know what regular expressions are, the on-line help can help you), or a value that is "hard-wired" into the formula (hard-wired means the data is entered as part of the formula and cannot be changed without editing the formula).
The Search vector is a range of cells in a row or column and is akin to a book index. LOOKUP compares the Search criterion to the values in the Search vector to find a match. The Search vector must be sorted in ascending order.
The result_vector is a range of cells that correlates to those of the Search vector. The data in the result_vector are the page numbers in our book example.
The Search vector and result_vector can be located anywhere on the worksheet; they can even be defined on different worksheets. Most people place them in adjacent rows or columns. This arrangement is often called a look-up table.
Are you ready to see how it works?
Open a workbook.
Enter 1 in cell E11.
Select E11.
Drag E11 down to E15. (AutoFill from 1 to 5.)
Enter Green in F11, Orange in F12, Purple in F13, Red in F14, and Blue in F15.
Select A1.
Enter 5 in A1, 4 in A2, 3 in A3, 2 in A4, and 1 in A5.
Select B1.
Enter =LOOKUP(A1;E11:E15;F11:F15).
Select B1 again and drag the formula to B5.
Why do two cells contain #N/A? When you copy formulas in a spreadsheet, the software tries to help out by changing the cell references to maintain the relationships between the cells and the formulas. This is called relative addressing. Watch it in action:
Click on B1.
Click on the Input Line. (Pay close attention to the colored boxes.)
Press Esc.
Click on B2.
Click on the Input Line.
Press Esc.
Repeat Steps 1-6 for the remaining cells.
You should have seen the colored boxes around the look-up table shift one cell down every time. The solution to this behavior is absolute addressing. Absolute addressing tells the software, "Don't mess with the addresses of these cells!" You make a cell reference absolute by placing a dollar sign ($) before the column letter and row number ($A$1). If you want to anchor the column only, put the $ before the letter ($A1). If you want to anchor the row only, put the $ before the number (A$1). Edit the formulas in your worksheet as follows:
Select B1.
Highlight everything in the parenthesis, except A1;
Press the key combination Shift+F4. (The $'s are inserted automatically.)
Click the green check mark.
Drag the updated formula to B5.
Note: If you press Shift+F4 a second time, you will get something like A$1. If you press Shift+F4 a third time, you will get $A1.
There, it's fixed! All the colors match the numbers as defined by the look-up table. Clear column B and try this:
Select F11:F15. (The lookup table colors.)
Press Ctrl+x. (The Cut keyboard shortcut.)
Click on J2.
Click Edit > Paste Special...
Check Transpose on the bottom left.
Click OK.
Enter =LOOKUP(A1;E11:E15;J2:N2) in B1.
Use Shift+F4 to make the appropriate cell references absolute.
Drag the formula to B5.
What do you think? Pretty Cool? Now that you have used LOOKUP, you need to understand how it works.
LOOKUP keeps an internal count of the number of cells in the Search vector. If we use 3 as the Search criterion, LOOKUP knows that 3 is in the 3rd cell of the Search vector. LOOKUP then goes to the result_vector and returns the value in the 3rd cell in that range (Purple); that's why you can place the vectors anywhere. If the Search criterion lies between two Search vector values, LOOKUP will return the lower value. Confused? Enter 4.5 in A2. The color (red) did not change because LOOKUP couldn't find 4.5. LOOKUP stopped searching when it found 5 and used the next lowest value (4). If the Search criterion is greater than all of the Search vector values, LOOKUP will return the last value in the list. If the Search criterion is less than all of the Search vector values, LOOKUP will return #N/A.
VLOOKUP & HLOOKUP
There are lots of situations where several columns are related to one index value. This is where VLOOKUP and HLOOKUP come into play. I combined the discussion of these functions because they are identical, except for the way they search. The VLOOKUP and HLOOKUP functions search arrays. An array is a block of cells similar to a data table in a book. Suppose you want to convert 70.5 degrees Fahrenheit to degrees Celsius. You would find a conversion table, move down the left column until you found 70, then you would move along the row until you found to the .4 column and read the temperature in Celsius. That's how VLOOKUP works. HLOOKUP works the opposite way; it searches the top row first, then it searches down the appropriate column. The syntax for the functions is:
V(H)LOOKUP(Search criterion;array;index;sort order).
You already know what the Search Criterion is, so I won't repeat the definition. The index is the nth column (row for HLOOKUP) of the array. Sort order is a boolean (TRUE or FALSE) parameter that tells V(H)LOOKUP whether the first column (row) is sorted in ascending order or not. The default value is TRUE and can be omitted; otherwise, type FALSE for sort order. If the Search Criterion lies between two values in the first column (row), V(H)LOOKUP will respond just like LOOKUP; it will return the lower value. Let's make make a fast-food receipt to see how they work.
Rename Sheet1 to LOOKUP. (Look under Format > Sheet...)
Bring up Sheet2 and rename it VLOOKUP.
Enter the information in the table below. Begin in cell H1.
-
|
| 3 | Bacon burger | $2.75 |
|
| 4 | Small Fries | $1.00 |
|
| 5 | Medium Fries | $1.25 |
|
| 6 | Large Fries | $1.50 |
|
| 7 | Small Drink | $0.75 |
|
| 8 | Medium Drink | $0.90 |
|
| 9 | Large Drink | $1.25 |
Type Order ID in A1.
Type Menu Item in B1.
Type Price in C1.
Click on B2.
Enter =IF($A2="";"";VLOOKUP($A2;$H$1:$J$9;2)). (I'll explain IF soon.)
Drag the formula to C2. (Did you notice the $A2?)
Change the last number in the parenthesis in C2 from 2 to 3.
Select B2 and C2.
Drag the selection to row 15.
When I was preparing this tutorial, I got a bunch of #N/A's because there was no data in the A column. They were ugly, so I decided to introduce you to the logical function IF. In my experience, look-up functions and IF are almost always used together. When you use look-up functions, you will see a lot of #N/A's unless you hide them.. The IF statement (function) is handy if you want to set up a blank form or calculation sheet; especially if someone else is going to use it. IF may look scary, but it's really simple. Here's the syntax:
IF(Test; Then_value; Otherwise_value)
The Test is any logical expression that returns true or false. The formula in Steps 6 and 7 tell Calc to leave the cells in column B and C blank if the A column is blank; otherwise, show the result of VLOOKUP. Dress the receipt up a little:
Put a border along the bottom of A1:C1.
Put another border along the bottom of A15:C15.
Format column C for currency.
Type Subtotal: in B16. (Align right.)
Enter =sum(C2:C15) in C16.
Type Tax: in B17. (Align right.)
Enter =C16*0.05 in C17. (5% sales tax)
Type Total: in B18 and make it bold. (Align right.)
Enter =SUM(C16:C17) in C18.
Put a border around C18.
Order lunch. (Enter some Order ID values to test the functionality.)
Now enter 10, 0, and a number like 5.4 in the Order ID column and check the results. (Note the responses to 0 and 5.4.)
(Challenge Step) If you are feeling really good about the IF statement, make the Subtotal (C16), Tax (C17), and Total (C18) cells blank when there are no entries in the A column! My solution is hidden in the blank space below (There are several solutions).
C16: =IF(SUM($C$2:$C$15)=0;"";SUM(C2:C15))
C17: =IF(C16="";"";C16*0.05)
C18: =IF(C17="";"";SUM(C16:C17))
Before you go to the next topic, transpose the look-up array and change VLOOKUP to HLOOKUP. It's a good exercise to prove to yourself that it works!
MATCH & INDEX
Individually, the capabilities of these functions are not very impressive; however, they form a powerful tool to look up information when they are combined. I'll start with MATCH.
MATCH is used to find the position of a value in a row or column. For example, if you have a column of 2000 words starting with S and you are looking for "Special", MATCH may return a number like 1670 (1670th row in the column). The syntax is:
MATCH(search_criterion;lookup_array;type).
The lookup_array is a single column or row of cells. Type is a sort parameter that tells the function how the column or row is sorted; it can be 1 (ascending), 0 (exact match), or -1 (descending). The default is 1 (ascending). If you use the exact match option and there are multiple instances, MATCH will return the position of the first instance. If the search_criterion falls between two entries in the column or row, the lower position is returned.
INDEX, on the other hand, returns the contents of a given cell address. The syntax is:
INDEX(reference;row;column).
The reference is a range or array of cells. The other parameters are self-explanatory. Are any light bulbs starting to glow?
Here's a scenario: You are the Vice President of a company. You hear about a certain employee and you want to find information on that person. The company uses Calc as its database and stores employee data in the format shown in the table below (I only listed the Department Heads to keep things simple).
-
Employee ID | Last Name | First Name | Department | Employees |
1021 | Avery | Walter | Dispersing | 12 |
2022 | Daniels | Mary | Warehouse | 150 |
1549 | Fairchild | Cynthia | Accounting | 10 |
3115 | Moore | Jack | Shipping | 109 |
4752 | Simpson | Suzanne | Receiving | 200 |
2503 | Turner | Mark | Engineering | 3 |
2278 | Williams | Robert | Sales | 30 |
2827 | Garland | Priscilla | Public Relations | 5 |
2680 | Barton | Kyle | Advertising | 9 |
1250 | Laramie | Sheila | Marketing | 27 |
Enter the table in a new worksheet (include the column headings). Start in cell P1 so you can't see the table later on. When you've got the data entered, sort it by Employee ID in ascending order. This is how you sort:
Select P2:T9.
Click Data > Sort...
Sort by Column P.
Click Ascending.
Click OK.
Now set up a simple query form.
Select A1 and type Find: (Align right.)
Type First Name: in A3. (Align right.)
Type Department: in A4. (Align right.)
Type No. Employees in A5. (Align right.)
Select B3.
Enter =INDEX($P$2:$T$9;MATCH($B$1;$Q$2:$Q$9;0);3).
Copy the formula to B4 and change the last number in B4 from 3 to 4.
Select B5.
Enter =VLOOKUP(INDEX($P$2:$T$9;MATCH($B$1;$Q$2:$Q$9;0);1);$P$2:$T$9;5)).
I had you enter the last formula that way because I wanted to show you the power of nesting (nesting means using functions in functions). Nesting is really nice, but It can get pretty crazy.
Type Turner into B2.
Press Enter.
Did you notice that we never exposed an ID number? How did we do that if VLOOKUP needs the Employee ID to perform its search? We worked around the Employee ID by substituting it with the INDEX-MATCH combination, thus keeping private information secure and giving VLOOKUP what it needs! If you really want to make sure no one can see the ID numbers, hide the columns.
Select columns P through T.
Click Format > Column > Hide.
Let me explain what's happening. MATCH is using the text in B1 to identify a row position in the array (Turner is in row 4). INDEX then returns the data in the 4th row and the 3rd column to cell B3 (First Name). INDEX also returns the data in the 4th row and the 4th column to cell B4 (Department). In the last formula, INDEX returns data from the 4th row and 1st column, which is passed to VLOOKUP as the Search criterion. VLOOKUP uses the Search criterion to get data in the 5th column (No. Employees). That's all there is to it!
If you want the practice, wrap everything with IF to blank the three result cells when the Last Name field is blank. When you're done, rename the worksheet MATCH and save your work, then we'll move on to the CHOOSE function.
CHOOSE
You'll be happy to hear that the CHOOSE function is really easy. The syntax is:
CHOOSE(Index; value1;...value30).
Index is a number from 1 to 30. The values are an embedded list; usually text. CHOOSE is useful when you want to look up things like days of the week, months of the year, colors of the rainbow, or any other categorical data.
CHOOSE differs from the other look-up functions in that there is always a one-to-one relationship between the index values and the list values. Let's see how it works by building a teacher's grade book.
Insert a new worksheet. (Insert > Sheet...)
Rename the worksheet CHOOSE.
Copy the Last Name and First Name columns from the MATCH worksheet to the CHOOSE worksheet.
Sort them by Last Name in ascending order.
AutoFill cells C1:L1 with numbers 1-10.
Type Average in M1.
Select the top row.
Format the cells with a bold font; background color of Gray 10%; center alignment; and a bottom space of 3.00pt.
Insert a row between each name.
Enter 0 in R1.
Enter 60 in R2.
Enter 65 in R3.
Select R2:R3.
Drag the selection to R9. (AutoFill with increments of 5.)
AutoFill S1:S9 with numbers 1-9.
Select C3.
Enter =IF(C2="";"";CHOOSE(LOOKUP(C2;$R$1:$R$9;$S$1:$S$9);"F";"D";"D+";"C";"C+";"B";"B+";"A";"A+")).
Drag the formula to M3.
Copy line 3 to line 5, line 7..., line 17.
Select Light Blue as the font color for the rows in Step 19.
Select M2.
Enter =AVERAGE(C2:L2).
Copy M2 to M4, M6,...M16.
Format the table to suit your tastes.
Enter 61 in C2. (The letter grade should be D.)
Enter more grades to make sure everything works correctly.
Save your work.
As you can see, the CHOOSE function has a lot of potential. You could use it with various lists to add cool stuff to your spreadsheet. Here's a fun application: Do you want to know the day of the week you were born on?
Click on B27
Enter =CHOOSE(WEEKDAY(A27);"Sunday";"Monday";"Tuesday";
"Wednesday";"Thursday";"Friday";"Saturday").
Enter your birthday in A27.
You can also enter cell references instead of typing text. An alternative to the formula above is:
=CHOOSE(WEEKDAY(A27);U1;U2;U3;U4;U5;U6;U7).
You would put the days of the week into the cells, beginning with Sunday in cell U1. The same could be done with the grade list. The cell references must be individual cells; CHOOSE will not accept cell ranges.
What's In The Middle?
We have discussed what happens when a Search Criterion falls between two Search vector values; the look-up functions return the lower value. But what if we need a result_vector value between two Search vector values? What do you do? You interpolate.
You interpolate numbers almost daily; however, most of the time you do it in your head without stopping to think how it's done. For instance, what number is halfway between 0 and 5? The number 2.5 probably comes to mind immediately. Most of us know that from experience, but how do you calculate other intermediate values?
Before you can interpolate between values you have to know something about the data. Some data sets are linear, which means that if you were to plot them on a chart they would form a straight line. Other data sets are nonlinear, meaning that a plot of the data would form a curved line. We're going to keep things simple by assuming that we can interpolate values by connecting data points with straight lines, even if the points form a curve (this assumption is good for most common data you will encounter). Using this assumption to get intermediate values is called linear interpolation. If the assumption is not valid, then you have to use nonlinear interpolation, which is a topic too complex to cover in this tutorial.
Now let's talk about lines. If you think way back to your grade school days (that's a long, long time for some of us), you probably learned that if you know two points on a line, you can find any other point. Do you recall something like that? You may also recall that the equation of a line has the general form of y = mx + b, where m is the slope (rise over run) of the line and b is the y-intercept (the point where the line crosses the y axis). The line equation is the basis for the method I am going to present. Let's look at some data.
Add a new sheet to the workbook you are using for this tutorial.
Rename it Interp.
Enter the data below, beginning in F1.
-
1 | 7 |
2 | 11 |
3 | 17 |
4 | 25 |
5 | 35 |
6 | 47 |
7 | 61 |
8 | 77 |
9 | 95 |
10 | 115 |
11 | 137 |
12 | 161 |
13 | 187 |
14 | 215 |
15 | 245 |
16 | 277 |
17 | 311 |
18 | 347 |
19 | 385 |
20 | 425 |
Add a chart and look at the graph of the data points. (Definitely not a straight line!)
Note: The data are the result of using the quadratic equation y = x2 + x + 5 if you want to check your answers.
What is the result_vector value for a Search criterion of 6.5? The exact answer from the equation is 53.75. If you calculated it by hand, you would do this way:
= 54
If you look at the calculation closely you will see that it is the equation of a line in the form of y = m*x + b. Also note the difference between the exact answer and the interpolated value. There is always error involved when you interpolate. In this case the difference is only 0.46%, which is negligible.
So what? What does that have to do with using look-up functions to interpolate between values? Well, now you have a pattern to follow! I've broken it down below:
Assume we enter 6.5 in A1.
(61-47): INDEX($F$1:$G$20;MATCH(A1;$F$1:$F$20)+1;2) – LOOKUP(A1;$F$1:$F$20;$G$1:$G$20). (Note the use of MATCH(...)+1 to get 61.)
(7-6): INDEX($F$1:$G$20;MATCH(A1;$F$1:$F$20)+1;1) – INDEX($F$1:$G$20;MATCH(A1;$F$1:$F$20);1).
(6.5-6): A1-INDEX($F$1:$G$20;MATCH(A1;$F$1:$F$20);1).
47: LOOKUP(A1;$F$1:$F$20;$G$1:$G$20).
Put it all together in B1.
Enter =(((INDEX($F$1:$G$20;MATCH(A1;$F$1:$F$20)+1;2) – LOOKUP(A1;$F$1:$F$20;$G$1:$G$20)))/(INDEX($F$1:$G$20;MATCH(A1;$F$1:$F$20)+1;1) – INDEX($F$1:$G$20;MATCH(A1;$F$1:$F$20);1)))*(A1-INDEX($F$1:$G$20;MATCH(A1;$F$1:$F$20);1))+LOOKUP(A1;$F$1:$F$20;$G$1:$G$20).
Yes!!! The answer in B1 is 54! Enter more values in the A column, then drag the formula down. You should get good answers down the B column. If you have more than one column of data, you could use VLOOKUP and HLOOKUP instead of LOOKUP; just make sure that your column numbers in INDEX and MATCH point to the appropriate column(s).
Summary
Congratulations! You have learned to use six different look-up techniques: LOOKUP, VLOOKUP, HLOOKUP, MATCH, INDEX, and CHOOSE. In addition, you learned how to use the IF statement to control how empty cells behave, how to nest functions to supercharge your look-up capabilities, and how to to get result_vector values between Search vector values. The skills you learned will make you a valuable spreadsheet user. Keep up the good work. Thanks for giving me your time and attention. Good Luck!