Excel Cheat Sheet
Category: Formulas / General VBA
Here's a VBA function that might be useful in some situations. The ExactWordInString function returns True if a specified word is contained in a text string. You might think that this function is just a variation on Excel's FIND function or VBA's InStr function, but there's a subtle difference: ExactWordInString looks for a complete word, not text that might be part of a different word. The examples in the accompanying figure should clarify how this function works. Cell C2 contains this formula, which was copied to the cells below:
=ExactWordInString(A2,B2)
The function identifies the complete word trapped, but not the word trap, which is part of trapped. Also, note that a space is not required after a word in order to identify it as a word; for example, the word can be followed by a punctuation mark. The function, listed below, converts the first argument (Text) to uppercase and replaces every character other than the letters A-Z with a space character. It then adds a leading and trailing space to both arguments. Finally, it uses the InStr function to determine whether the modified Word argument is present in the modified Text argument. To use this function in a formula, just copy and paste it into a VBA module in your workbook.
Function ExactWordInString(ByVal Text As String, ByVal Word As String) As Boolean
' Returns TRUE if Word is contained in Text as an exact word match
' (ByVal keeps the function from changing the caller's variables)
    Dim i As Long
    Const Space As String = " "
    Text = UCase(Text)
'   Replace non-text characters with a space
    For i = 0 To 64
        Text = Replace(Text, Chr(i), Space)
    Next i
    For i = 91 To 255
        Text = Replace(Text, Chr(i), Space)
    Next i
'   Add initial and final space to Text & Word
    Text = Space & Text & Space
    Word = UCase(Space & Word & Space)
    ExactWordInString = InStr(Text, Word) <> 0
End Function
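If you want to experiment with the matching rule outside Excel, the following Python function is a minimal sketch of the same steps: uppercase the text, turn every character outside A-Z into a space, pad both strings with spaces, then search. The name exact_word_in_string is only an illustrative counterpart to the VBA function. Like the VBA version, it treats only the letters A-Z as word characters.

```python
def exact_word_in_string(text: str, word: str) -> bool:
    """Python sketch of the VBA ExactWordInString logic (letters A-Z only)."""
    # Uppercase the text, then replace every character that is not A-Z with a space.
    cleaned = "".join(ch if "A" <= ch <= "Z" else " " for ch in text.upper())
    # Pad both strings so a match must be bounded by spaces on each side.
    return f" {word.upper()} " in f" {cleaned} "


print(exact_word_in_string("The mouse was trapped.", "trapped"))  # True
print(exact_word_in_string("The mouse was trapped.", "trap"))     # False
```

As with the VBA function, a Word that itself contains a non-letter (an apostrophe, for example) never matches. That character is blanked out of Text but not out of Word.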
* Update * Excel MVP Rick Rothstein sent me a much simpler function that produces the same result. In fact, it uses just one statement:
Function ExactWordInString(Text As String, Word As String) As Boolean
    ExactWordInString = " " & UCase(Text) & " " Like "*[!A-Z]" & UCase(Word) & "[!A-Z]*"
End Function
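Python's standard fnmatch module understands the same kind of pattern: * matches any run of characters, and [!A-Z] matches any character other than an uppercase letter. As a result, the one-statement version translates almost directly. This is a sketch for illustration. As with VBA's Like operator, characters such as *, ? or [ inside Word would be treated as pattern syntax rather than literal text.

```python
from fnmatch import fnmatchcase


def exact_word_like(text: str, word: str) -> bool:
    """Sketch of the one-statement Like version using fnmatch."""
    padded = " " + text.upper() + " "               # " " & UCase(Text) & " "
    pattern = "*[!A-Z]" + word.upper() + "[!A-Z]*"  # "*[!A-Z]" & UCase(Word) & "[!A-Z]*"
    # fnmatchcase is case-sensitive, which is fine because both sides are uppercased.
    return fnmatchcase(padded, pattern)


print(exact_word_like("The mouse was trapped.", "trapped"))  # True
print(exact_word_like("The mouse was trapped.", "trap"))     # False
```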
This formula is not always accurate, however. If you specify a day number that doesn't exist (for example, the 6th Friday), it returns a date in the following month. Cell D6 contains a modified formula that displays "(none)" if the date isn't in the month specified. This formula is much longer:
=IF(MONTH(DATE(C3,C4,1+((C6-(C5>=WEEKDAY(DATE(C3,C4,1))))*7)+ (C5-WEEKDAY(DATE(C3,C4,1)))))<>C4,"(none)",DATE(C3,C4,1+ ((C6-(C5>=WEEKDAY(DATE(C3,C4,1))))*7)+(C5-WEEKDAY(DATE(C3,C4,1)))))
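The same result can be reached step by step. The Python sketch below does not copy the megaformula's arithmetic, but it computes the same date. It finds the first occurrence of the requested weekday in the month, adds whole weeks, and returns None (the counterpart of "(none)") when the result spills into another month. It assumes Excel's default WEEKDAY numbering (1 = Sunday through 7 = Saturday). Judging from the formula, the parameters play the roles of C3 (year), C4 (month), C5 (day of week) and C6 (occurrence). The function name is hypothetical.

```python
import datetime


def nth_weekday(year: int, month: int, weekday: int, n: int):
    """Date of the n-th given weekday in a month, or None if it doesn't exist.

    weekday uses Excel's default WEEKDAY numbering: 1 = Sunday ... 7 = Saturday.
    """
    first = datetime.date(year, month, 1)
    # Python's weekday() is Monday=0 ... Sunday=6; convert to Excel's Sunday=1 ... Saturday=7.
    first_wd = (first.weekday() + 1) % 7 + 1
    # Days from the 1st to the first occurrence of the requested weekday (0-6).
    offset = (weekday - first_wd) % 7
    candidate = first + datetime.timedelta(days=offset + 7 * (n - 1))
    # Like the "(none)" version of the formula: reject dates that fall in another month.
    return candidate if candidate.month == month else None


print(nth_weekday(2008, 3, 6, 1))  # 2008-03-07 (first Friday of March 2008)
print(nth_weekday(2008, 3, 6, 5))  # None (there is no 5th Friday)
```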
In some cases, you might need to determine the last occurrence of a day in a particular month. This calculation requires a different formula (refer to the figure below):
=DATE(C9,C10+1,1)-1+IF(C11>WEEKDAY(DATE(C9,C10+1,1)-1), C11-WEEKDAY(DATE(C9,C10+1,1)-1)-7,C11-WEEKDAY(DATE(C9,C10+1,1)-1))
In this figure, the formula in cell D10 displays the date of the last Friday in March 2008. The download file for this tip contains another example that has an easy-to-use interface. The user can select the parameters from drop-down lists. The megaformula in the Calculated Date column is very complex because it needs to convert words into values.
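The last-occurrence formula works by taking the last day of the month and stepping back to the requested weekday. A Python sketch of that idea follows, again assuming Excel's default WEEKDAY numbering (1 = Sunday through 7 = Saturday). The function name is hypothetical.

```python
import datetime


def last_weekday(year: int, month: int, weekday: int) -> datetime.date:
    """Date of the last given weekday in a month (weekday: 1 = Sunday ... 7 = Saturday)."""
    # Last day of the month = first day of the next month minus one day,
    # the same idea as DATE(C9,C10+1,1)-1 (Python needs December handled explicitly).
    if month == 12:
        next_first = datetime.date(year + 1, 1, 1)
    else:
        next_first = datetime.date(year, month + 1, 1)
    last = next_first - datetime.timedelta(days=1)
    last_wd = (last.weekday() + 1) % 7 + 1  # convert to Excel's WEEKDAY numbering
    # Step back 0-6 days to reach the requested weekday; the IF in the formula does the same.
    return last - datetime.timedelta(days=(last_wd - weekday) % 7)


print(last_weekday(2008, 3, 6))  # 2008-03-28 (last Friday of March 2008)
```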
If you copy this formula and paste it to the next column, the references are adjusted and the pasted formula is:
=SUM(B2:B13)
Making an exact copy of a single formula is easy: Press F2, highlight the formula, and press Ctrl+C to copy it as text. Then paste it to another cell. In some situations, however, you might need to make an exact copy of a range of formulas. In an older tip, I described a rather complicated way to do this. See Making An Exact Copy Of A Range Of Formulas. Matthew D. Healy saw that tip and shared another method, which uses Notepad. Here's how it works:
1. Put Excel in formula view mode. The easiest way to do this is to press Ctrl+` (that character is a "backwards apostrophe," and is usually on the same key that has the ~ (tilde)).
2. Select the range to copy.
3. Press Ctrl+C.
4. Start Windows Notepad.
5. Press Ctrl+V to paste the copied data into Notepad.
6. In Notepad, press Ctrl+A followed by Ctrl+C to copy the text.
7. Activate Excel and activate the upper left cell where you want to paste the formulas. Make sure that the sheet you are copying to is in formula view mode.
8. Press Ctrl+V to paste.
9. Press Ctrl+` to toggle out of formula view mode.
Note: If the paste operation back to Excel doesn't work correctly, chances are that you've used Excel's Text-to-Columns feature recently, and Excel is trying to be helpful by remembering how you last parsed your data. You need to fire up the Convert Text to Columns Wizard. Choose the Delimited option and click Next. Clear all of the Delimiter option checkmarks except Tab.
Calculating Easter
Category: Formulas
Easter is one of the most difficult holidays to calculate. Several years ago, a Web site had a contest to see who could come up with the best formula to calculate the date of Easter for any year. Here's one of the formulas submitted (it assumes that cell A1 contains a year):
=DOLLAR(("4/"&A1)/7+MOD(19*MOD(A1,19)-7,30)*14%,)*7-6
Just for fun, I calculated the date of Easter for 300 years from 1900 through 2199. Then I created a pivot table, and grouped the dates by day. And then, a pivot chart:
During this 300-year period, the most common date for Easter is March 31 (13 occurrences). The least common is March 24 (only one occurrence). I also learned that the next time Easter falls on April Fool's Day will be in 2018.
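For comparison, here is a Python sketch that uses a different, well-known technique: the "anonymous Gregorian" (Meeus/Jones/Butcher) algorithm. It relies on integer arithmetic instead of the date-string trick in the DOLLAR formula, so it is not the contest formula above. The easter_date_counts helper is a hypothetical way to repeat the pivot-table tally in code. No output is claimed for it here.

```python
import datetime
from collections import Counter


def easter(year: int) -> datetime.date:
    """Gregorian Easter Sunday (anonymous Gregorian / Meeus-Jones-Butcher algorithm)."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return datetime.date(year, month, day + 1)


def easter_date_counts(first_year: int, last_year: int) -> Counter:
    """Tally Easter dates by (month, day), like grouping the dates in a pivot table."""
    return Counter((d.month, d.day) for d in map(easter, range(first_year, last_year + 1)))


print(easter(2018))  # 2018-04-01 (Easter on April Fool's Day)
```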
To create an Excel formula to convert a Unix timestamp to a readable date and time, start by converting the seconds to days. This formula assumes that the Unix timestamp is in cell A1:
=(((A1/60)/60)/24)
Then, you need to add the result to the date value for January 1, 1970. The modified formula is:
=(((A1/60)/60)/24)+DATE(1970,1,1)
Finally, you need to adjust the formula for the GMT offset. For example, if you're in New York the GMT offset is -5. Therefore, the final formula is:
=(((A1/60)/60)/24)+DATE(1970,1,1)+(-5/24)
A simpler (but much less clear) formula that returns the same result is:
=(A1/86400)+25569+(-5/24)
Both of these formulas return a date/time serial number, so you need to apply a number format to make it readable as a date and time.
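The arithmetic is easy to check outside Excel. This Python sketch assumes the default 1900 date system, in which serial number 25569 is January 1, 1970. It treats the GMT offset as a fixed number of hours and does not know about daylight saving time, so New York would need -4 rather than -5 in summer. The function names are illustrative.

```python
import datetime

# In the 1900 date system, serial 0 corresponds to 1899-12-30 for all dates after
# February 1900 (Excel treats 1900 as a leap year), so this base works for modern dates.
EXCEL_BASE = datetime.datetime(1899, 12, 30)


def unix_to_excel_serial(timestamp: float, gmt_offset_hours: float = 0.0) -> float:
    """Same as =(A1/86400)+25569+(offset/24)."""
    return timestamp / 86400 + 25569 + gmt_offset_hours / 24


def excel_serial_to_datetime(serial: float) -> datetime.datetime:
    """Turn a serial date/time number into a datetime (what a date format displays)."""
    return EXCEL_BASE + datetime.timedelta(days=serial)


print(unix_to_excel_serial(86400))                            # 25570.0
print(excel_serial_to_datetime(unix_to_excel_serial(86400)))  # 1970-01-02 00:00:00
```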
Naming Techniques
Most Excel users know how to name cells and ranges. Using named cells and ranges can make your formulas more readable, and less prone to errors. Most users, however, don't realize that Excel lets you provide names for other types of items. This document describes some useful naming techniques that you may not be aware of.
Naming a constant
If formulas in your worksheet use a constant value (such as an interest rate), the common procedure is to insert the value for the constant into a cell. Then, if you give a name to the cell (such as InterestRate), you can use the name in your formulas. Here's how to create a named constant that doesn't appear in a cell:
1. Select the Insert, Name, Define command to display the Define Name dialog box.
2. Enter the name (such as InterestRate) in the field labeled Names in workbook.
3. Enter the value for the name in the Refers to field (this field normally holds a formula). For example, you can enter =.075
4. Click OK.
Try it out by entering the name into a cell (preceded by an equal sign). For example, if you defined a name called InterestRate, enter the following into a cell:
=InterestRate
This formula returns the constant value that you defined for the InterestRate name, even though that value does not appear in any cell.
You can use the Define Name dialog box to edit the formula for a name, and you can use all of the standard operators and worksheet functions. Try this:
1. Create a name for cell D4. Call it Amount.
2. Enter =Amount into any cell. The cell will display the value in cell D4.
3. Use the Insert, Name, Define command and edit the Refers to field so it reads:
=$D$4*2
You'll find that entering =Amount now displays the value in cell D4 multiplied by 2.
You'll find that this formula always returns the contents of the cell directly below. NOTE: It's important to understand that the formula you enter in Step 4 above depends on the active cell. Since cell A1 was the active cell, =A2 is the formula that returns the cell below. If, for example, cell C6 was the active cell when you created the name, you would enter =C7 in step 4.
3. In the Names in workbook field, enter SumAbove.
4. In the Refers to field, enter =SUM(A$1:A2)
Notice that the first reference in the formula entered in Step 4 (A$1) is a mixed reference: the row part is absolute, but the column part is relative. Try it out by entering =SumAbove into any cell. You'll find that this formula returns the sum of all cells in the column from Row 1 to the row directly above the cell.
Range("A1") = "Address"
Range("B1") = "Formula"
Range("C1") = "Value"
Range("A1:C1").Font.Bold = True
End With
' Process each formula
Row = 2
For Each Cell In FormulaCells
    Application.StatusBar = Format((Row - 1) / FormulaCells.Count, "0%")
    With FormulaSheet
        ' The leading dots write to FormulaSheet, not whichever sheet is active
        .Cells(Row, 1) = Cell.Address _
            (RowAbsolute:=False, ColumnAbsolute:=False)
        .Cells(Row, 2) = " " & Cell.Formula
        .Cells(Row, 3) = Cell.Value
        Row = Row + 1
    End With
Next Cell
' Adjust column widths
FormulaSheet.Columns("A:C").AutoFit
Application.StatusBar = False
End Sub
The DCOUNT function. The data must be set up in a table, and a separate criterion range is required.
The COUNT function. Simply counts the number of cells in a range that contain a number.
The COUNTA function. Counts the number of non-empty cells in a range.
The COUNTBLANK function. Counts the number of empty cells in a range.
The COUNTIF function. Very flexible, but often not quite flexible enough.
An array formula. Useful when the other techniques won't work.
Formula Examples
Listed below are some formula examples that demonstrate various counting techniques. These formulas all use a range named data. To count the number of cells that contain a negative number:
=COUNTIF(data,"<0")
To count the number of cells that contain the word "yes" (not case sensitive):
=COUNTIF(data,"yes")
To count the number of cells that contain text that begins with the letter "s" (not case-sensitive):
=COUNTIF(data,"s*")
To count the number of cells that contain the letter "s" (not case-sensitive):
=COUNTIF(data,"*s*")
To count the number of cells that contain either "yes" or "no" (not case-sensitive):
=COUNTIF(data,"yes")+COUNTIF(data,"no")
To count the number of cells that contain a value between 1 and 10:
=COUNTIF(data,">=1")-COUNTIF(data,">10")
To count the number of cells that contain an error value (this is an array formula, entered with Ctrl+Shift+Enter):
=SUM(IF(ISERROR(data),1,0))
The other formula examples listed above can also be converted to VBA.
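To see what these criteria actually test, here is a rough Python model of several of them. It assumes the range is a list in which numeric cells are int or float values and text cells are strings. It uses fnmatch for the * and ? wildcards. That is an approximation, because fnmatch also gives [ ] a special meaning, which COUNTIF does not. The function names and sample data are hypothetical.

```python
from fnmatch import fnmatchcase


def count_text_like(data, pattern):
    """Rough analogue of =COUNTIF(data,"pattern") for text: case-insensitive, * and ? wildcards."""
    pat = pattern.lower()
    return sum(1 for v in data if isinstance(v, str) and fnmatchcase(v.lower(), pat))


def numbers_in(data):
    """Numeric cells only (bool excluded because Python treats True/False as integers)."""
    return [v for v in data if isinstance(v, (int, float)) and not isinstance(v, bool)]


def count_between(data, low, high):
    """Analogue of =COUNTIF(data,">=low")-COUNTIF(data,">high")."""
    nums = numbers_in(data)
    return sum(1 for v in nums if v >= low) - sum(1 for v in nums if v > high)


data = ["Yes", "no", "sales", "mess", -3, 5, 10, 12]
print(sum(1 for v in numbers_in(data) if v < 0))                   # 1 (negative numbers)
print(count_text_like(data, "yes"))                                # 1
print(count_text_like(data, "s*"))                                 # 1 ("sales")
print(count_text_like(data, "*s*"))                                # 3 ("Yes", "sales", "mess")
print(count_text_like(data, "yes") + count_text_like(data, "no"))  # 2
print(count_between(data, 1, 10))                                  # 2 (5 and 10)
```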
Linear Trendline
Equation: y = m * x + b
m: =SLOPE(y,x)
b: =INTERCEPT(y,x)
Logarithmic Trendline
Equation: y = (c * LN(x)) + b
c: =INDEX(LINEST(y,LN(x)),1)
b: =INDEX(LINEST(y,LN(x)),1,2)
Power Trendline
Equation: y = c * x^b
c: =EXP(INDEX(LINEST(LN(y),LN(x),,),1,2))
b: =INDEX(LINEST(LN(y),LN(x),,),1)
Exponential Trendline
Equation: y = c * e^(b * x)
c: =EXP(INDEX(LINEST(LN(y),x),1,2))
b: =INDEX(LINEST(LN(y),x),1)
2nd Order Polynomial Trendline
b: =INDEX(LINEST(y,x^{1,2}),1,3)
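The linear, logarithmic, power and exponential coefficient formulas are all ordinary least-squares fits, applied either to the raw data or to logarithms of it. The Python sketch below makes that explicit for those four cases. It assumes x and y are plain lists of numbers, with positive values wherever a logarithm is taken. It is an illustration, not a description of how Excel computes LINEST internally.

```python
import math


def linear_fit(xs, ys):
    """Least-squares slope and intercept: what SLOPE(y,x) and INTERCEPT(y,x) return."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def logarithmic_fit(xs, ys):
    """y = c*LN(x) + b: regress y on LN(x); returns (c, b)."""
    return linear_fit([math.log(x) for x in xs], ys)


def power_fit(xs, ys):
    """y = c*x^b: regress LN(y) on LN(x); c = EXP(intercept). Returns (c, b)."""
    b, ln_c = linear_fit([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(ln_c), b


def exponential_fit(xs, ys):
    """y = c*e^(b*x): regress LN(y) on x; c = EXP(intercept). Returns (c, b)."""
    b, ln_c = linear_fit(xs, [math.log(y) for y in ys])
    return math.exp(ln_c), b


print(linear_fit([1, 2, 3, 4], [3, 5, 7, 9]))  # (2.0, 1.0)
```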
Following are step-by-step instructions to accomplish this task without using VBA (contributed by Bob Umlas):
1. Select the source range (A1:D10 in this example).
2. Group the source sheet with another empty sheet (say Sheet2). To do this, press Ctrl while you click the sheet tab for Sheet2.
3. Select Edit - Fill - Across worksheets (choose the All option in the dialog box).
4. Ungroup the sheets (click the sheet tab for Sheet2).
5. In Sheet2, the copied range will be selected. Choose Edit - Cut.
6. Activate cell A11 (in Sheet2) and press Enter to paste the cut cells. A11:D20 will be selected.
7. Re-group the sheets. Press Ctrl and click the sheet tab for Sheet1.
8. Once again, use Edit - Fill - Across worksheets.
9. Activate Sheet1, and you'll find that A11:D20 contains an exact replica of the formulas in A1:D10.
Note: For another method of performing this task, see Making An Exact Copy Of A Range Of Formulas, Take 2.
Excel's Conditional Formatting feature has many uses. Suppose you need to compare two lists, and identify the items that are different. The figure below shows an example. These lists happen to contain text, but this technique also works with numeric data.
The first list is in A2:B19, and this range is named OldList. The second list is in D2:E19, and the range is named NewList. The ranges were named using the Insert - Name - Define command. Naming the ranges is not necessary, but it makes them easier to work with. As you can see, items in OldList that do not appear in NewList are highlighted with a yellow background. Items in NewList that do not appear in OldList are highlighted with a green background. These colors are the result of Conditional Formatting.
How to do it
1. Start by selecting the OldList range.
2. Choose Format - Conditional Formatting.
3. In the Conditional Formatting dialog box, use the drop-down list to choose Formula is.
4. Enter this formula:
=COUNTIF(NewList,A2)=0
5. Click the Format button and specify the formatting to apply when the condition is true (a yellow background in this example).
6. Click OK.
The cells in the NewList range will use a similar conditional formatting formula.
1. Select the NewList range.
2. Choose Format - Conditional Formatting.
3. In the Conditional Formatting dialog box, use the drop-down list to choose Formula is.
4. Enter this formula:
=COUNTIF(OldList,D2)=0
5. Click the Format button and specify the formatting to apply when the condition is true (a green background in this example).
6. Click OK.
Both of these conditional formatting formulas use the COUNTIF function. This function counts the number of times a particular value appears in a range. If COUNTIF returns 0, the item does not appear in the other range, so the conditional formatting kicks in and the cell's background color is changed. The cell reference in the COUNTIF function should always be the upper left cell of the selected range.
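The rule applied to each cell boils down to one question: does this value occur anywhere in the other list? Here is a minimal Python sketch of that test, using a set for the lookup. The sample lists are hypothetical. The sketch ignores the * and ? wildcards that COUNTIF would honor in text criteria.

```python
def _key(value):
    # COUNTIF matches text without regard to case; numbers compare by value.
    return value.lower() if isinstance(value, str) else value


def items_not_in(source, other):
    """Items of source for which =COUNTIF(other, item)=0 would be TRUE (wildcards ignored)."""
    other_keys = {_key(v) for v in other}
    return [v for v in source if _key(v) not in other_keys]


old_list = ["Apple", "Pear", "Kiwi"]
new_list = ["apple", "Plum", "Kiwi"]
print(items_not_in(old_list, new_list))  # ['Pear'] (would get the yellow background)
print(items_not_in(new_list, old_list))  # ['Plum'] (would get the green background)
```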
Q. Whenever I open a particular Excel workbook, I get a message asking if I want to update the links. I've examined every formula in the workbook, and I am absolutely certain that the workbook contains no links to any other file. What can I do to convince Excel that the workbook has no links?
You've encountered the infamous "phantom link" phenomenon. I've never known Excel to be wrong about identifying links, so there's an excellent chance that your workbook does contain one or more links, but they are probably not formula links. Follow these steps to identify and eradicate any links in a workbook.
1. Select Edit, Links. In many cases, this command may not be available. If it is available, the Links dialog box will tell you the name of the source file for the link. Click the Change Source button and change the link so it refers to the active file.
2. Select Insert, Name, Define. Scroll through the list of names in the Define Name dialog box and examine the Refers to box (see the figure below). If a name refers to another workbook or contains an erroneous reference such as #REF!, delete the name. This is, by far, the most common cause of phantom links.
3. If you have a chart in your workbook, click on each data series in the chart and examine the SERIES formula displayed in the formula bar. If the SERIES formula refers to another workbook, you've identified your link. To eliminate the link, move or copy the chart's data into the current workbook and recreate your chart.
4. If your workbook contains any custom dialog sheets, select each object in each dialog sheet and examine the formula bar. If any object contains a reference to another workbook, edit or delete the reference.
Next, save your workbook and then re-open it. It should open up without asking you to update the links.
Because Excel stores dates and times as numeric values, it's possible to add or subtract one from the other. However, if you have a workbook containing only times (no dates), you may have discovered that subtracting one time from another doesn't always work. Negative time values appear as a series of hash marks (########), even though you've assigned the [h]:mm format to the cells. By default, Excel uses a date system that begins with January 1, 1900. A negative time value generates a date/time combination that falls before this date, which is invalid. The solution is to use the optional 1904 date system. Select Tools, Options, click the Calculation tab, and check the 1904 date system box to change the starting date to January 2, 1904. Your negative times will now be displayed correctly, as shown below. Be careful if your workbook contains links to other files that don't use the 1904 date system. In such a case, the mismatch of date systems could cause erroneous results.
Q. I often import data into Excel from various applications, including Access. I've found that values are sometimes imported as text, which means I can't use them in calculations or with commands that require values. I've tried formatting the cells as values, with no success. The only way I've found to convert the text into values is to edit the cell and then press Enter. Is there an easier way to make these conversions?
This is a common problem in Excel. The good news is that Excel 2002 is able to identify such cells, and you can easily correct them. If you're using an older version of Excel, you can use this method:
1. Select any empty cell.
2. Enter the value 1 into that cell.
3. Choose Edit, Copy.
4. Select all the cells that need to be converted.
5. Choose Edit, Paste Special.
6. In the Paste Special dialog box, select the Multiply option, then click OK.
This operation multiplies each cell by 1, and in the process converts the cell's contents to a value.
8. Click the Format button and select the type of formatting you want for the cells that contain a formula.
9. Click OK. After you've completed these steps, every cell that contains a formula and is within the range you selected in Step 4 will display the formatting of your choice.
How does it work? The key component is creating a named formula in Steps 2 and 3. This formula, unlike standard formulas, doesn't reside in a cell, but it still acts like a formula by returning a value -- in this case either 'True' or 'False'. The formula uses the GET.CELL function, which is part of the XLM macro language (VBA's predecessor) and cannot be used directly in a worksheet. Using a value of 48 as the first argument for GET.CELL causes the function to return 'True' if the cell contains a formula. The INDIRECT function essentially creates a reference to each cell in the selected range.
        Select Case .Operator
            Case xlAnd
                Filter = Filter & " AND " & .Criteria2
            Case xlOr
                Filter = Filter & " OR " & .Criteria2
        End Select
    End With
End With
Finish:
    FilterCriteria = Filter
End Function
After you've entered the VBA code, you can use the function in your formulas. The single-cell argument for the FilterCriteria function can refer to any cell within the column of interest. The formula will return the current AutoFilter criteria (if any) for the specified column. When you turn AutoFiltering off, the formulas don't display anything. The figure below shows the FilterCriteria function in action. The function is used in the cells in row 1. For example, cell A1 contains this formula:
=FilterCriteria(A3)
As you can see, the list is currently filtered to show rows in which column A contains January, column C contains a code of A or B, and column D contains a value greater than 125 (column B is not filtered, so the formula in cell B1 displays nothing). The rows that don't match these criteria are hidden.
In the real world, a simple average often isn't adequate for your needs. For example, an instructor might calculate student grades by averaging a series of test scores but omitting the two lowest scores. Or you might want to compute an average that ignores both the highest and lowest values. In cases such as these, the AVERAGE function won't do, so you must create a more complex formula. The following Excel formula computes the average of the values contained in a range named "scores," but excludes the highest and lowest values:
=(SUM(scores)-MIN(scores)-MAX(scores))/(COUNT(scores)-2)
Here's an example that calculates an average excluding the two lowest scores:
=(SUM(scores)-MIN(scores)-SMALL(scores,2))/(COUNT(scores)-2)
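Both formulas follow the same pattern: subtract the values to be dropped from the total, then divide by the reduced count. Here is a Python sketch, assuming scores is a plain list of at least three numbers. Note that len() counts every item, whereas COUNT would also skip text and blank cells.

```python
def average_excluding_extremes(scores):
    """=(SUM(scores)-MIN(scores)-MAX(scores))/(COUNT(scores)-2)"""
    return (sum(scores) - min(scores) - max(scores)) / (len(scores) - 2)


def average_excluding_two_lowest(scores):
    """=(SUM(scores)-MIN(scores)-SMALL(scores,2))/(COUNT(scores)-2)"""
    ordered = sorted(scores)  # ordered[0] is MIN, ordered[1] is SMALL(scores,2)
    return (sum(scores) - ordered[0] - ordered[1]) / (len(scores) - 2)


scores = [70, 80, 90, 100, 60]
print(average_excluding_extremes(scores))    # 80.0
print(average_excluding_two_lowest(scores))  # 90.0
```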
Column A consists of formulas that refer to column B. The formula in cell A1 is:
=IF(B1<>"",COUNTA($B$1:B1)&".","")
This formula, which is copied down to the other cells in column A, displays the next consecutive item number if the corresponding cell in column B is not empty. If the cell in column B is empty, the formula displays nothing. As items are added to or deleted from column B, the numbering updates automatically.
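The expanding range $B$1:B1 is what makes this work. Each copy of the formula counts the non-empty cells from the top of column B down to its own row. Below is a small Python sketch of that running count. It assumes blank cells are represented as empty strings, and the sample items are made up.

```python
def item_numbers(column_b):
    """Labels that =IF(Bn<>"",COUNTA($B$1:Bn)&".","") would show, row by row."""
    labels = []
    count = 0  # running COUNTA over $B$1:Bn
    for cell in column_b:
        if cell != "":
            count += 1
            labels.append(f"{count}.")
        else:
            labels.append("")
    return labels


print(item_numbers(["Eggs", "", "Milk", "Bread"]))  # ['1.', '', '2.', '3.']
```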
To create this type of message box for your worksheet: 1. Select the cells for which you need to punch in unique entries (here, the correct range to select is A2:A20).
Conditional Formatting. You'll need to keep this in mind when you're cutting and pasting in mission-critical applications.
The formula in cell H4 looks up the entries in cells H2 and H3 and then returns the corresponding value from the table. The formula in H4 is: =INDEX(A1:E14, MATCH(H2,A1:A14,0), MATCH(H3,A1:E1,0)).
The formula uses the INDEX function, with three arguments. The first is the entire table range (A1:E14). The second uses the MATCH function to return the offset of the desired month in column A. The third argument uses the MATCH function to return the offset of the desired product in row 1. You may prefer to take advantage of Excel's natural-language formulas. For example, enter the following formula to return Sprocket sales for June: =June Sprockets. If natural-language formulas aren't working, select Tools, Options, click the Calculation tab, and place a check mark next to "Accept labels in formulas." Be aware that using natural-language formulas is not 100% reliable!
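The two MATCH calls simply turn labels into positions, and INDEX reads the value at that row and column. Below is a Python sketch with a small hypothetical table; the month and product names and the numbers are invented for illustration. Unlike MATCH, list.index is case-sensitive.

```python
def index_match(table, row_label, col_label):
    """Sketch of =INDEX(table, MATCH(row_label, first column, 0), MATCH(col_label, first row, 0))."""
    first_column = [row[0] for row in table]
    header_row = table[0]
    r = first_column.index(row_label)  # like MATCH(...,0); raises ValueError where Excel shows #N/A
    c = header_row.index(col_label)
    return table[r][c]


sales = [
    ["Month", "Sprockets", "Widgets"],
    ["January", 120, 85],
    ["June", 140, 90],
]
print(index_match(sales, "June", "Sprockets"))  # 140
```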
Press F5 to display the Go To dialog box, and click the Special button. In the Go To Special dialog box, choose the Constants option and select Numbers. When you click OK, the nonformula numeric cells will be selected. Press Delete to delete the values. The Go To Special dialog box has many other options for selecting cells of a particular type.
The other, more efficient approach also uses the Paste Special dialog box. To increase a range of values (prices, in this example) by 5 percent:
1. Enter 1.05 into any blank cell.
2. Select the cell and choose Edit, Copy.
3. Select the range of values and choose Edit, Paste Special.
4. Choose the Multiply option and click OK.
5. Delete the cell that contains the 1.05.
When you enter this formula, you must press Ctrl-Shift-Enter. Pressing only Enter will give you the wrong result. Excel will place curly braces around the formula to remind you that you've created an array formula.
The preceding formula works fine in many cases, but it will return an error if the range contains any blank cells. The formula below (also an array formula, so input it with Ctrl-Shift-Enter) is more complex, but it will handle a range that contains a blank cell.
The steps below are specific to this example. But they can easily be adapted to other types of data transformations.
2. Choose Edit - Copy.
3. Select the first cell in the original data column (in this case, cell A2).
4. Choose Edit - Paste Special. This displays the Paste Special dialog box.
5. In the Paste Special dialog box, click the Values option button. This step is critical: it pastes the results of the formulas, not the formulas.
6. Click OK. At this point, the worksheet looks like this:
Creating A Megaformula
This tip describes how to create what I call a "megaformula" -- a single formula that does the work of several intermediate formulas.
An Example
The goal is to create a formula that returns the string of characters following the final occurrence of a specified character. For example, consider the text string below (which happens to be a URL):
http://spreadsheetpage.com/index.php/tips
Excel does not provide a straightforward way to extract the characters following the final slash character (i.e., "tips") from this string. It is possible, however, to do so by using a number of intermediate formulas. The figure below shows a multi-formula solution. The original text is in cell A1. Formulas in A2:A6 are used to produce the desired result. The formulas are displayed in column B.
Following is a description of the intermediate formulas (which will eventually be combined into a single formula).
=RIGHT(A1,LEN(A1)-FIND(CHAR(1),SUBSTITUTE(A1,"/",CHAR(1),LEN(A1)-LEN(SUBSTITUTE(A1,"/","")))))
The formula now refers only to cell A1, and the intermediate formulas are no longer necessary. This single formula does the work of five other formulas. This general technique can be applied to other situations in which a final result uses several intermediate formulas.
NOTE: You may think that using such a complex formula would cause the worksheet to calculate more slowly. In fact, you may find just the opposite: using a single formula in place of multiple formulas may speed up recalculation. Any calculation speed differences, however, will probably not be noticeable unless you have thousands of copies of the formula.
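To show how the pieces fit together, here is a Python sketch that performs the same steps in order. It counts the slashes, marks the last one with CHAR(1), finds the marker, and keeps the text to its right. It is an illustration only, and it assumes the separator is a single character. In Python itself, str.rpartition does the whole job in one call.

```python
def text_after_last(text: str, char: str = "/") -> str:
    """Step-by-step Python version of the megaformula (char must be a single character)."""
    # LEN(A1)-LEN(SUBSTITUTE(A1,"/","")): how many times char occurs
    count = len(text) - len(text.replace(char, ""))
    if count == 0:
        raise ValueError("character not found")  # the megaformula returns an error here too
    # SUBSTITUTE(A1,"/",CHAR(1),count): mark only the last occurrence
    pos = -1
    for _ in range(count):
        pos = text.index(char, pos + 1)
    marked = text[:pos] + "\x01" + text[pos + 1:]
    # FIND(CHAR(1),...) gives a 1-based position; RIGHT keeps everything after it
    start = marked.index("\x01")
    return text[start + 1:]


url = "http://spreadsheetpage.com/index.php/tips"
print(text_after_last(url))    # tips
print(url.rpartition("/")[2])  # tips (the idiomatic Python shortcut)
```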
The formula below, for example, returns 1 if cell A1 contains "A". If cell A1 does not contain "A", the formula returns an empty string.
=IF(A1="A",1,"")
For more decision-making power, you can "nest" IF functions within a formula. In other words, you can use an IF function as the third argument (the value_if_false argument) of an IF function. Here's an example:
=IF(A1="A",1,IF(A1="B",2,IF(A1="C",3,"")))
This formula checks cell A1. If it contains "A", the formula returns 1. If it doesn't contain "A", then the third argument is evaluated. The third argument contains another IF function that determines if A1 contains a "B". If so, the formula returns 2; if not, the formula evaluates the IF function contained in that function's third argument and checks to see if A1 contains "C". If so, it returns 3; otherwise, it returns an empty string.
Excel allows up to seven levels of nested IF functions (this limit applies to Excel 2003 and earlier; Excel 2007 and later allow 64). The formula below works correctly, but Excel will not allow you to nest the IF functions any deeper than this.
=IF(A1="A",1,IF(A1="B",2,IF(A1="C",3,IF(A1="D",4,IF(A1="E",5,IF(A1="F",6,IF(A1="G",7,IF(A1="H",8,""))))))))
The sections that follow present various ways to get around the limit of seven nested IF functions. Be aware that these techniques may not be appropriate for all situations.
Note:
In many cases, you can avoid using IF functions and use a VLOOKUP function. This will require a separate table in your worksheet. In the figure below, the lookup table is in B1:C10. The formula in A2 is: =VLOOKUP(A1,B1:C10,2)
Another alternative, suggested by Daniel Filer, is to use Boolean multiplication. This technique takes advantage of the fact that, when multiplying, TRUE is treated as 1 and FALSE is treated as 0. Here's an example:
=(A1="A")*1+(A1="B")*2+(A1="C")*3+(A1="D")*4+(A1="E")*5+(A1="F")*6+(A1="G")*7+(A1="H")*8+(A1="I")*9+(A1="J")*10
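Both alternatives can be mirrored in Python to show why they avoid nesting. A lookup table replaces the chain of tests, and Boolean multiplication adds up terms of which at most one is non-zero. This sketch assumes the ten letter codes A through J from the Boolean example. The name CODES is hypothetical. Note that the lookup returns "" when nothing matches, like the nested IF version, whereas the Boolean version returns 0.

```python
CODES = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "H": 8, "I": 9, "J": 10}


def code_by_lookup(value: str):
    """Exact-match lookup in a table; returns "" when not found, like the nested IF formula."""
    return CODES.get(value.upper(), "")


def code_by_boolean_multiplication(value: str) -> int:
    """(A1="A")*1+(A1="B")*2+...: each comparison is True (1) or False (0)."""
    v = value.upper()  # Excel's = comparison ignores case
    return sum((v == letter) * number for letter, number in CODES.items())


print(code_by_lookup("c"), code_by_boolean_multiplication("c"))        # 3 3
print(repr(code_by_lookup("z")), code_by_boolean_multiplication("z"))  # '' 0
```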
The formula automatically reduces the "fraction" to the simplest form, and it allows up to four characters on either side of the colon. Jerry Meng pointed out a much simpler formula that produces the same result, but does not have the four-character limit:
=A1/GCD(A1,B1)&":"&B1/GCD(A1,B1)
Jerry's formula uses the GCD function, which is available only when the Analysis Toolpak Add-In is installed. Note: Be aware that the result of these formulas is a text string, not a fractional value. For example, the ratio of 1:8 is not the same as 1/8.
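Python's standard library has a gcd function, so Jerry's approach can be sketched directly. Like the Excel formula, it returns text rather than a number. Positive integer inputs are assumed, and the function name is illustrative.

```python
from math import gcd


def ratio_text(a: int, b: int) -> str:
    """Same idea as =A1/GCD(A1,B1)&":"&B1/GCD(A1,B1); positive integers assumed."""
    g = gcd(a, b)
    return f"{a // g}:{b // g}"


print(ratio_text(6, 8))     # 3:4
print(ratio_text(12, 100))  # 3:25
```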
has enabled it to evolve into a far more useful function, and explains some of the techniques being deployed. This article comes in two parts. This first part discusses SUMPRODUCT, how it has evolved and how it works, whilst Part 2 provides a number of real-world problems and their solutions.
returns 140, or (1*10)+(2*20)+(3*30)=10+40+90=140. This is a useful function, but nothing more than that. A further, more 'creative' use of SUMPRODUCT has evolved, and is still evolving, driven as far as I can see mainly by the regular contributors of the Microsoft Excel newsgroups. This has been a creative and productive process that has significantly increased the usability of SUMPRODUCT, but in a way that you will not find documented in Excel's Help.
Column C (Price): 7,500; 8,300; 6,873; 11,200; 13,200; 14,999; 17,500; 23,500; 18,000
which returns 4.
But suppose that we want a count of how many Fords were sold in June, or their value? The number can be calculated with
=SUM(IF(A1:A10="Ford",IF(B1:B10="June",1,0),0))
which is an array formula, so it is committed with Ctrl-Shift-Enter, not just Enter. Similarly, the value is obtained with
=SUM(IF(A1:A10="Ford",IF(B1:B10="June",C1:C10,0),0))
which is also an array formula. But as this page is about SUMPRODUCT, you would expect that we could use that function in this case, and we can. The solution for the number of Fords sold in June using this function is
=SUMPRODUCT((A1:A10="Ford")*(B1:B10="June"))
=SUMPRODUCT((A1:A10="Ford")+(A1:A10="Renault"))
which returns the result 6 as expected[1]. So far, so good, in that we have a versatile function that can do any number of conditional tests, and has an inbuilt flexibility that provides extensibility. Its power is augmented when combined with other functions, such as can be found in the examples page[2].
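The logic behind these formulas can be modelled in a few lines of Python. Comparisons produce TRUE/FALSE arrays, multiplication acts as AND, and addition acts as OR; addition is safe here because a single cell cannot be both "Ford" and "Renault". The products are then summed. The sketch below uses small hypothetical arrays rather than the article's table.

```python
def sumproduct(*arrays):
    """Multiply corresponding items of equal-length arrays and add up the products."""
    total = 0
    for items in zip(*arrays):
        product = 1
        for item in items:
            product *= item  # True/False behave as 1/0 in Python arithmetic
        total += product
    return total


# Hypothetical sample data (not the article's table)
makes = ["Ford", "Renault", "Ford", "Vauxhall", "Ford"]
months = ["June", "June", "May", "June", "June"]
prices = [7500, 8300, 6873, 11200, 13200]

is_ford = [m == "Ford" for m in makes]
is_june = [m == "June" for m in months]
is_ford_or_renault = [(m == "Ford") + (m == "Renault") for m in makes]

print(sumproduct([1, 2, 3], [10, 20, 30]))   # 140
print(sumproduct(is_ford, is_june))          # 2 (AND via multiplication)
print(sumproduct(is_ford, is_june, prices))  # 20700 (value of those rows)
print(sumproduct(is_ford_or_renault))        # 4 (OR via addition)
```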
Advantages of SUMPRODUCT
Multiple conditional tests are a major advantage of the SUMPRODUCT function as described above, but it has two other considerable advantages. The first is that it can work with closed workbooks, and the second is that the handling of text values can be tailored to the requirement. When the data is in another workbook, the SUMIF function can be used to calculate a value, such as in
=SUMIF('[Nowfal Rates.xls]RATES'!$K$11:$K$13,">1")
This is fine in itself, and the value remains if the other workbook is closed, but as soon as the sheet is recalculated, the formula returns #VALUE!. Similarly, if the formula is entered with the other workbook already closed, a #VALUE! error is immediately returned.
SUMPRODUCT, however, overcomes this problem. The formula
=SUMPRODUCT(--('[Nowfal Rates.xls]RATES'!$K$11:$K$13>1),--('[Nowfal Rates.xls]RATES'!$K$11:$K$13))
returns the same result, but it will still work when the other workbook is closed and the sheet is recalculated, and it can be initially entered referencing the closed workbook, without a #VALUE! error.
The second major advantage is being able to handle text in numeric columns differently. Consider the following dataset, as shown in Table 2.
     A       B
1    Item    Number
2    x       1
3    y       2
4    x       3
Table 2.
If we are looking at rows 1:4, we can see that we have a text value in B1. In this case it is simply a heading row, but the principle applies to a text value in any row. Using SUMPRODUCT, we can either return an error, or ignore the text. This can be useful if we want to ignore errors, or if we want to trap the error (and presumably correct it later). Errors will be returned if we use this version
=SUMPRODUCT((A1:A4="x")*(B1:B4))
To ignore the text, use this amended version, which uses the double unary operator (see SUMPRODUCT Explained below for details)
=SUMPRODUCT(--(A1:A4="x"),(B1:B4))
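Here is a rough Python model of the two behaviours, using the four rows of Table 2. It assumes text cells arrive as strings and numeric cells as int or float. It only imitates the results described above and does not show how Excel evaluates the formulas internally. The function names are hypothetical.

```python
def sumproduct_multiplied(conditions, values):
    """=SUMPRODUCT((cond)*(values)): any text in values turns the result into #VALUE!."""
    total = 0
    for cond, value in zip(conditions, values):
        if isinstance(value, str):
            return "#VALUE!"  # TRUE/FALSE * "text" is an error, and the error propagates
        total += int(cond) * value
    return total


def sumproduct_separate(conditions, values):
    """=SUMPRODUCT(--(cond),values): text in values is treated as zero."""
    return sum(int(cond) * (value if isinstance(value, (int, float)) else 0)
               for cond, value in zip(conditions, values))


# The four rows of Table 2
column_a = ["Item", "x", "y", "x"]
column_b = ["Number", 1, 2, 3]
is_x = [a == "x" for a in column_a]
print(sumproduct_multiplied(is_x, column_b))  # #VALUE!
print(sumproduct_separate(is_x, column_b))    # 4
```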
And a third, most significant advantage, is that the conditional test range or the condition can be constructed in a huge number of ways to facilitate the requirement, such as
LEFT(A1:A10), ISNUMBER(MATCH(A1:A10,{"apples","pears"},0)), or ISNUMBER(MATCH(K2:K30,ROW(INDIRECT(TODAY()&":"&TODAY()+10)),0))
SUMPRODUCT Explained
Understanding how SUMPRODUCT works helps to determine where to use it, how to construct the formula, and how it can be extended. Table 3 below shows an example data set that we will use.
      A           B    C
9     Ford        B    3
10    Vauxhall    C    4
11    Ford        A    2
12    Ford        A    1
13    Ford        D    4
14    Ford        A    3
15    Ford        A    2
16    (not Ford)  A    8
17    Ford        A    6
18    Ford        A    8
19    Ford        A    7
20    Ford        A    6
Table 3. In this example, the problem is to find how many Fords with a category of "A" were sold. A9:A20 holds the make, B9:B20 has the category, and C9:C20 has the number sold. The formula to get this result is =SUMPRODUCT((A9:A20="Ford")*(B9:B20="A")*(C9:C20)). The first part of the formula (A9:A20="Ford") checks the array of makes for a value of Ford. This returns an array of TRUE/FALSE, in this case it is {TRUE,FALSE,TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,TRUE,TRUE,TRUE} Similarly, the categories are checked for the vale A with (B9:B20="A"). Again, this returns an array of TRUE/FALSE, or {FALSE,FALSE,TRUE,TRUE,FALSE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE} And finally, the numbers are not checked but taken as is, that is (C9:C20), which returns an array of numbers {3,4,2,1,4,3,2,8,6,8,7,6} So now we have three arrays, two of TRUE/FALSE values, one of numbers. This is showm in Table 4. A 9 11 12 13 14 15 17 18 19 20 TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE B * FALSE * * * * * * * * * TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE * * * * * * * * * C 3 4 2 1 4 3 2 8 6 8 7 6
10 FALSE * FALSE *
* FALSE *
16 FALSE *
By using the '*' (multiply) operator, we can get numeric values that can be summed. '*' has the effect of coercing the two TRUE/FALSE arrays into a single array of 1/0 values. Multiplying TRUE by TRUE returns 1 (try it: enter =TRUE*TRUE in a cell and see the result); any other combination returns 0. Therefore, when both conditions are satisfied we get a 1, whereas if either or both conditions are not satisfied we get a 0. Multiplying the first array of TRUE/FALSE values by the second returns a composite array of 1/0 values, {0,0,1,1,0,1,1,0,1,1,1,1}. This array of 1/0 values is then multiplied by the array of numbers sold to give a further array: the numbers sold that satisfy both test conditions. SUMPRODUCT then sums the members of this array to give the total. Table 4 shows the values that the conditional tests break down to before being acted upon by the '*' operator. Table 5 shows a virtual representation of those TRUE/FALSE values as their numerical equivalents of 1/0, together with the individual multiplication results. From this, you should be able to see how SUMPRODUCT arrives at its result, namely 35.

Row   A       B       C       Result
9     1   *   0   *   3   =   0
10    0   *   0   *   4   =   0
11    1   *   1   *   2   =   2
12    1   *   1   *   1   =   1
13    1   *   0   *   4   =   0
14    1   *   1   *   3   =   3
15    1   *   1   *   2   =   2
16    0   *   1   *   8   =   0
17    1   *   1   *   6   =   6
18    1   *   1   *   8   =   8
19    1   *   1   *   7   =   7
20    1   *   1   *   6   =   6
Total                         35

Table 5.

Table 6 shows the same virtual representation of 1/0 values without the numbers sold column, that is, using SUMPRODUCT to count the number of rows satisfying the two conditions, or
=SUMPRODUCT((A9:A20="Ford")*(B9:B20="A"))

Row   A       B       Result
9     1   *   0   =   0
10    0   *   0   =   0
11    1   *   1   =   1
12    1   *   1   =   1
13    1   *   0   =   0
14    1   *   1   =   1
15    1   *   1   =   1
16    0   *   1   =   0
17    1   *   1   =   1
18    1   *   1   =   1
19    1   *   1   =   1
20    1   *   1   =   1
Total                 8

Table 6.
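The arithmetic in Tables 5 and 6 can be checked with a few lines of Python. This only illustrates the element-wise multiply-then-sum idea, not how Excel evaluates the formula internally; the 1/0 lists are copied from Table 5, and the variable names are hypothetical.

```python
# 1/0 equivalents of the two tests for rows 9-20, copied from Table 5
is_ford = [1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1]    # (A9:A20="Ford")
is_cat_a = [0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1]   # (B9:B20="A")
sold = [3, 4, 2, 1, 4, 3, 2, 8, 6, 8, 7, 6]       # C9:C20

# Table 5: (A9:A20="Ford")*(B9:B20="A")*(C9:C20), row by row
per_row = [f * c * n for f, c, n in zip(is_ford, is_cat_a, sold)]
print(per_row)        # [0, 0, 2, 1, 0, 3, 2, 0, 6, 8, 7, 6]
print(sum(per_row))   # 35

# Table 6: without the numbers sold, the sum counts the matching rows
matches = [f * c for f, c in zip(is_ford, is_cat_a)]
print(sum(matches))   # 8
```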
If you have been able to follow this explanation all the way through, it may have occurred to you that although we are using the SUMPRODUCT function, the '*' operators have resolved the multiple arrays into a single composite array, leaving SUMPRODUCT simply to sum the members of that composite array; that is, there is no product. This is perfectly correct and perfectly valid: SUMPRODUCT can work on a single array (put 1, 2, 3 in cells A1, A2, A3 and enter =SUMPRODUCT(A1:A3) in a cell; it correctly returns 6). In reality, we only need the '*' to coerce the arrays that are being tested for a particular condition; we do not need it for the array that is not subject to a conditional test. So we could also use =SUMPRODUCT((A9:A20="Ford")*(B9:B20="A"),(C9:C20)), which does use the product aspect (see more on this in the next section). When using the SUMPRODUCT function, all arrays must be the same size, as corresponding members of each array are multiplied by each other. No array can be a whole column (A:A); the array must be a range within a column (although most of a column could be defined with A1:A65535 if so desired). Whole rows (1:1) are acceptable[3]. The arrays being evaluated cannot be a mix of column and row ranges; they must all be columns, or all rows. However, row data can be transposed to present it to SUMPRODUCT as columnar data; see the Using TRANSPOSE to test against values in a column not row example.
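The points above (the '*' form and the comma form give the same total, a single array is fine, and arrays of different sizes are rejected) can be sketched in Python. The sumproduct helper is a hypothetical model of the worksheet function that assumes the arrays already hold numbers, so it does not model text handling; the data is the Table 5 data.

```python
def sumproduct(*arrays):
    """Multiply corresponding members of each array and add up the products."""
    if len({len(a) for a in arrays}) != 1:
        # SUMPRODUCT returns #VALUE! when the arrays differ in size
        raise ValueError("#VALUE!")
    total = 0
    for members in zip(*arrays):
        product = 1
        for m in members:
            product *= m
        total += product
    return total


is_ford = [1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1]    # (A9:A20="Ford") as 1/0
is_cat_a = [0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1]   # (B9:B20="A") as 1/0
sold = [3, 4, 2, 1, 4, 3, 2, 8, 6, 8, 7, 6]       # C9:C20

# '*' form: everything is multiplied first, SUMPRODUCT just sums one array
composite = [f * c * n for f, c, n in zip(is_ford, is_cat_a, sold)]
print(sumproduct(composite))          # 35

# Comma form: '*' only for the conditions, SUMPRODUCT does the final product
conditions = [f * c for f, c in zip(is_ford, is_cat_a)]
print(sumproduct(conditions, sold))   # 35

# =SUMPRODUCT(A1:A3) with 1, 2, 3 in A1:A3
print(sumproduct([1, 2, 3]))          # 6

try:
    sumproduct([1, 2, 3], [4, 5])
except ValueError as err:
    print(err)                        # #VALUE!
```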
Format of SUMPRODUCT
In the examples presented so far, the format has been
=SUMPRODUCT((array1=condition1)*(array2=condition2)*(array3))
or
=SUMPRODUCT((array1=condition1)*(array2=condition2),(array3))
which works because the '*' operator is only required to coerce the conditional arrays that resolve to TRUE/FALSE into numeric values. As it is the use of an arithmetic operator that coerces the TRUE/FALSE values to 1/0, we could use many different operators and achieve the same result. Thus, it is also possible to coerce each of the conditional arrays individually by multiplying them by 1,
=SUMPRODUCT((array1=condition1)*1,(array2=condition2)*1,(array3))
or
=SUMPRODUCT(1*(array1=condition1),1*(array2=condition2),(array3))
or by raising them to the power of 1,
=SUMPRODUCT((array1=condition1)^1,(array2=condition2)^1,(array3))
or by adding 0,
=SUMPRODUCT((array1=condition1)+0,(array2=condition2)+0,(array3))
or
=SUMPRODUCT(0+(array1=condition1),0+(array2=condition2),(array3))
or by using the N function,
=SUMPRODUCT(N(array1=condition1),N(array2=condition2),(array3))
These methods differ from the '*' operator in that they are applied to individual arrays, whereas '*' operates on two arrays. All of these methods work when there is more than one conditional array, so it is really a matter of preference which to use. If there is a single conditional array, then the '*' operator cannot be used (there are not two arrays to multiply), so one of the other methods above has to be used. Yet another method is to use the double unary operator, --, in this way
=SUMPRODUCT(--(array1=condition1),--(array2=condition2),(array3))
The double unary operator also coerces the individual array(s), which then acts more akin to classic SUMPRODUCT. There has been much discussion that one way is faster than another, or is more of a 'standard' than another, but in reality there will be few instances where one method gains a noticeable performance advantage over another. As for standards, this is all new territory, and these formulas will mainly be used by people who have never been involved with such standards, and who care even less. For me, it is a matter of preference. Personally, I am being swayed to the double unary notation, because it avoids a function call, it works in all situations (the '*' operator won't work on a single array), and I don't like the '1*', '*1', '^1', or '+0' variations. So my preference is for
=SUMPRODUCT(--(array1=condition1),--(array2=condition2),(array3))
which also has more similarity to classic SUMPRODUCT.
There is one other variation which has been promoted recently, the single unary operator, '-', such as
=SUMPRODUCT(-(array1=condition1),-(array2=condition2),(array3))
but I would not encourage this, as it has no real merit that I can see, and the minus signs have to be paired off, otherwise it returns a negative result. So, to sum up: tests like A=10 normally resolve to TRUE or FALSE, and an operator is only needed if you want to coerce an array of TRUE/FALSE values to 1/0 integers, such as
=SUMPRODUCT(--(B5:B1953=101))
SUMPRODUCT arrays are normally separated by the comma. So, to preserve this format, if you have multiple conditions, you can use the -- on both conditions like so
=SUMPRODUCT(--(B5:B1953=101),--(C5:C1953=7))
But if you simply multiply two arrays of TRUE/FALSE values, that implicitly resolves them to 1/0 values that are then summed, so you don't need a comma and could use
=SUMPRODUCT((B5:B1953=101)*(C5:C1953=7))
Any further, final array of values can use the same operator, or could revert to a comma. So your formula can be written as
=SUMPRODUCT(--(B5:B1953=101),--(C5:C1953=7),(D5:D1953))
or
=SUMPRODUCT(--(B5:B1953=101),--(C5:C1953=7)*(D5:D1953))
If the result is the product of two conditions being multiplied, it is fine to multiply them together, as this coerces the TRUE/FALSE values to 1/0 values to allow the summing:
=SUMPRODUCT((condition1)*(condition2))
However, if there is only one condition, you can coerce to 1/0 with the double unary operator, --:
=SUMPRODUCT(--(condition1))
There is no situation that I know of whereby a solution using -- could not be achieved somehow with a '*'. Conversely, if using the TRANSPOSE function within SUMPRODUCT, then the '*' has to be used. So, as you can see there are a number of possibilities, and you make your own choice. I leave the final word to Harlan Grove, who once wrote this paragraph on why he prefers the double unary operator ... ....as I've written before, it's not the speed of double unary minuses I like, it's the fact that due to Excel's operator precedence it's harder to screw up double unary minuses with typos than it is to screw up the alternatives ^1, *1, +0. Also, since I read left to right, I prefer my number type coercions on the left rather than the right of my Boolean expressions, and -- looks nicer than 1* or 0+. Wrapping Boolean expressions inside N() is another alternative, possibly clearer, but it eats a nested function call level, so I don't use it.
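A minimal Python sketch of the coercion variants discussed in this section. It uses made-up condition results and values (cond1, cond2 and values are hypothetical) and relies on the fact that Python's True/False, like Excel's TRUE/FALSE in arithmetic, behave as 1/0. It does not model N(), nor the way SUMPRODUCT itself treats uncoerced TRUE/FALSE entries as zero.

```python
import math


def sumproduct(*arrays):
    """Sum of element-wise products (size checks omitted for brevity)."""
    return sum(math.prod(members) for members in zip(*arrays))


cond1 = [True, False, True, True]   # hypothetical (array1=condition1)
cond2 = [True, True, False, True]   # hypothetical (array2=condition2)
values = [5, 6, 7, 8]               # hypothetical array3

# Each variant turns TRUE/FALSE into 1/0 before SUMPRODUCT multiplies
coercions = {
    "--x": lambda t: -(-t),
    "x*1": lambda t: t * 1,
    "1*x": lambda t: 1 * t,
    "x^1": lambda t: t ** 1,
    "x+0": lambda t: t + 0,
    "0+x": lambda t: 0 + t,
}

for name, coerce in coercions.items():
    a = [coerce(t) for t in cond1]
    b = [coerce(t) for t in cond2]
    print(name, sumproduct(a, b, values))   # 13 for every variant

# Single unary minus: correct only when the signs are paired off
paired = sumproduct([-t for t in cond1], [-t for t in cond2], values)
unpaired = sumproduct([-t for t in cond1], [-(-t) for t in cond2], values)
print(paired, unpaired)                     # 13 -13
```

Only rows 1 and 4 satisfy both conditions, so every correctly paired variant gives 5 + 8 = 13, while a lone minus sign flips the sign of the result.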
This will load the mCount variable with the number of Fords, 4 in this instance. Similarly, we can use SUMIF to calculate the value

Dim mModel As String
Dim mValue As Double
mModel = "Ford"
mValue = Application.WorksheetFunction.SumIf( _
    Range("A1:A10"), mModel, Range("C1:C10"))

This will load the mValue variable with the value of the Fords, 33873 in this instance. The natural next step is to assume that we can extend this technique to the multiple condition test formulae discussed above. If we are using COUNTIFS and SUMIFS in Excel 2007 (see SUMPRODUCT and Excel 2007), then this is correct. For example, we can count how many Fords were sold in June using

Dim mModel As String
Dim mMonth As String
Dim mCount As Long
mModel = "Ford"
mMonth = "June"
mCount = Application.WorksheetFunction.CountIfs( _
    Range("A1:A10"), mModel, _
    Range("B1:B10"), mMonth)

We get a result of 3 here in our mCount variable. Unfortunately, this technique cannot be extended to array formulae, or to conditional testing SUMPRODUCT formulae. For example, a simple formula to count how many Fords were sold in Feb might be
=SUMPRODUCT((A2:A10="Ford")*(B2:B10="Feb"))
(none, as it happens), and you might think that we could use the following VBA to get the same result

Dim mModel As String
Dim mMonth As String
Dim mCount As Long
mModel = "Ford"
mMonth = "Feb"
mCount = Application.WorksheetFunction.SumProduct( _
    Range("A1:A10") = mModel, Range("B1:B10") = mMonth)

This fails, never mind getting the correct result. In this case, VBA is trying to make a simple call to the worksheet function, but when array formulae and this type of SUMPRODUCT formula are resolved in Excel, each item within the array is resolved and then passed to the main function for SUMming, AVERAGEing, or whatever is being actioned. As VBA doesn't evaluate the ranges, it does not pass correct information to the worksheet function, so we get the error[4]. There is a solution to this problem, and that is to evaluate the function call within VBA, using the VBA Evaluate method, which converts a Microsoft Excel name to an object or a value. The code here is

Dim mModel As String
Dim mMonth As String
Dim mFormula As String
Dim mCount As Long
mModel = "Ford"
mMonth = "Feb"
mFormula = "SUMPRODUCT((A1:A10=""" & mModel & _
    """)*(B1:B10=""" & mMonth & """))"
mCount = Application.Evaluate(mFormula)
Although there is more effort required to ensure that the syntax of the function call is properly constructed, and that strings tested against are properly formed with quotes around them[5], it is still a useful technique to have, and provides the capability to use SUMPRODUCT (and by association, array formulae) within VBA.
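Because the quoting rule in note [5] is easy to get wrong, it can help to generate the formula text and check it before handing it to Evaluate. A minimal Python sketch follows; excel_string_literal and build_sumproduct are hypothetical helper names, and the rule applied (wrap the text in quotes and double every embedded quote) is the standard one for string literals in both VBA and Excel formulas.

```python
def excel_string_literal(text):
    """Quote text for a formula, doubling any embedded double quotes."""
    return '"' + text.replace('"', '""') + '"'


def build_sumproduct(model, month):
    """Builds the same formula text as the VBA mFormula string above."""
    return ("SUMPRODUCT((A1:A10=" + excel_string_literal(model) +
            ")*(B1:B10=" + excel_string_literal(month) + "))")


print(build_sumproduct("Ford", "Feb"))
# SUMPRODUCT((A1:A10="Ford")*(B1:B10="Feb"))
print(excel_string_literal('12" ruler'))
# "12"" ruler"
```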
where we count those items where A1:A10 = Ford AND B1:B10 = June, and sum C1:C10 for the items where A1:A10 = Ford AND B1:B10 = June. In Excel 2007, COUNTIFS and SUMIFS can be used in place of SUMPRODUCT. The Excel 2007 formulae would be
=COUNTIFS(A1:A10,"Ford",B1:B10,"June")
=SUMIFS(C1:C10,A1:A10,"Ford",B1:B10,"June")
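As a rough check that COUNTIFS/SUMIFS and the conditional SUMPRODUCT agree, here is a Python sketch. The countifs and sumifs helpers are hypothetical, equality-only simplifications (the real functions also accept criteria such as ">5" and wildcards), and the makes, months and values lists are made-up sample data, not the data behind the VBA results above.

```python
import math


def _pairs(args):
    """Split (range1, criteria1, range2, criteria2, ...) into pairs."""
    return list(zip(args[0::2], args[1::2]))


def countifs(*args):
    """Equality-only COUNTIFS: rows where every range equals its criterion."""
    pairs = _pairs(args)
    n = len(pairs[0][0])
    return sum(1 for i in range(n) if all(rng[i] == crit for rng, crit in pairs))


def sumifs(sum_range, *args):
    """Equality-only SUMIFS over the same kind of range/criteria pairs."""
    pairs = _pairs(args)
    return sum(sum_range[i] for i in range(len(sum_range))
               if all(rng[i] == crit for rng, crit in pairs))


def sumproduct(*arrays):
    return sum(math.prod(m) for m in zip(*arrays))


makes = ["Ford", "Ford", "Vauxhall", "Ford", "Ford"]    # hypothetical A1:A5
months = ["June", "Feb", "June", "June", "March"]      # hypothetical B1:B5
values = [100, 200, 300, 400, 500]                     # hypothetical C1:C5

is_ford = [int(m == "Ford") for m in makes]            # --(A1:A5="Ford")
is_june = [int(m == "June") for m in months]           # --(B1:B5="June")

print(countifs(makes, "Ford", months, "June"))         # 2
print(sumproduct(is_ford, is_june))                    # 2
print(sumifs(values, makes, "Ford", months, "June"))   # 500
print(sumproduct(is_ford, is_june, values))            # 500
```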
A further improvement is that in Excel 2007, SUMPRODUCT can address a whole column, which is a helpful change. So, with Excel 2007 supporting multiple conditional tests, does this mean that the special use of SUMPRODUCT is now redundant, and that it is relegated to its original, simple array multiplication role? Whilst this may seem to be the case at first sight, a little thought shows that SUMPRODUCT retains its unique position in the Excel developer's toolkit. Why? Because COUNTIFS and SUMIFS, like their predecessors COUNTIF and SUMIF, still cannot calculate values in closed workbooks, and they still cannot accommodate the complex extra functions that can be applied to the conditional ranges in SUMPRODUCT.
Performance Considerations
Double Unary v * Operator
In most circumstances, either the '*' or -- versions of SUMPRODUCT can be used, and both will function correctly. There are some exceptions to this. Consider a table of names and amounts in A1:B10, where row 1 is a text heading of 'Name' and 'Amount'. The formula
=SUMPRODUCT(--(A1:A10="Bob"),--(B1:B10>0),B1:B10)
will correctly sum the positive values in column B where the value in column A is 'Bob'. However, this formula
=SUMPRODUCT((A1:A10="Bob")*(B1:B10>0)*(B1:B10))
returns a #VALUE! error. The error is due to the text in B1: multiplying a text value creates an error. To overcome it with the latter form, the ranges need to start beyond the heading, in A2 and B2[6]. Similarly, if one or more of the ranges within the formula is multi-column, then the '*' operator has to be used. For example, this formula fails, because its arrays are not all the same size (A1:A10 is one column wide, B1:C10 is two):
=SUMPRODUCT(--(A1:A10="Bob"),--(B1:C10>0),--(B1:C10))
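The difference between the two forms for multi-column ranges can be sketched in Python. The model assumes that the comma form needs every array to have identical dimensions, while '*' expands a one-column array across the columns of the other; the helper names and the three Bob/Ann rows (placed below the headings, as if in A2:C4) are hypothetical.

```python
import math


def shape(a):
    return (len(a), len(a[0]))


def sumproduct(*arrays):
    """Comma form: every array must have exactly the same rows x columns."""
    if len({shape(a) for a in arrays}) != 1:
        raise ValueError("#VALUE!")
    return sum(math.prod(cells)
               for rows in zip(*arrays)       # corresponding rows
               for cells in zip(*rows))       # corresponding cells


def multiply(a, b):
    """'*' operator: a single column is repeated across the other's columns.

    Simplification: any other size mismatch is rejected here, whereas Excel
    would fill the unmatched positions with #N/A.
    """
    (ra, ca), (rb, cb) = shape(a), shape(b)
    if ra != rb or (ca != cb and 1 not in (ca, cb)):
        raise ValueError("size mismatch not modelled")
    width = max(ca, cb)
    return [[a[r][0 if ca == 1 else c] * b[r][0 if cb == 1 else c]
             for c in range(width)]
            for r in range(ra)]


names = [["Bob"], ["Ann"], ["Bob"]]           # hypothetical A2:A4
amounts = [[10, -5], [20, 30], [-4, 6]]       # hypothetical B2:C4

is_bob = [[int(n == "Bob") for n in row] for row in names]
is_positive = [[int(v > 0) for v in row] for row in amounts]

# Comma form with a 3x1 array next to 3x2 arrays
try:
    sumproduct(is_bob, is_positive, amounts)
except ValueError as err:
    print(err)                                # #VALUE!

# '*' form: the Bob column is expanded across both amount columns
star = multiply(multiply(is_bob, is_positive), amounts)
print(sumproduct(star))                       # 16
```

Bob's positive amounts are 10 (row 1) and 6 (row 3), hence 16.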
Using Transpose
If using the TRANSPOSE function within SUMPRODUCT, then the '*' operator has to be used.
Formula Efficiency
Most people will be familiar with the fact that array formulas can be very expensive, and if overused can significantly slow down the recalculation of a worksheet or workbook. Whilst SUMPRODUCT is not an array formula per se, it suffers from the same problem. Although SUMPRODUCT is often faster than an equivalent array formula, the difference is marginal. And like array formulas, SUMPRODUCT is much slower than COUNTIF/SUMIF, so it is better to use these where appropriate. So, never use SUMPRODUCT in this situation
=SUMPRODUCT((A1:A10="Ford")*(C1:C10))
when =SUMIF(A1:A10,"Ford",C1:C10) returns the same result.
Even two COUNTIF/SUMIF functions are quicker than one SUMPRODUCT, so this formula, which counts the values from 10 to 20 inclusive, is preferable to an equivalent SUMPRODUCT:
=COUNTIF(A1:A10,">=10")-COUNTIF(A1:A10,">20")
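The two COUNTIFs count the values from 10 to 20 inclusive by subtracting the count above 20 from the count of 10 and over. A small Python check of that identity, using a made-up list for A1:A10 and a hypothetical countif helper that takes a predicate instead of a criteria string:

```python
def countif(values, predicate):
    """Counts the values for which predicate(value) is true."""
    return sum(1 for v in values if predicate(v))


a1_a10 = [5, 10, 12, 20, 21, 15, 3, 30, 19, 9]   # hypothetical A1:A10

# =COUNTIF(A1:A10,">=10")-COUNTIF(A1:A10,">20")
two_countifs = countif(a1_a10, lambda v: v >= 10) - countif(a1_a10, lambda v: v > 20)

# What the equivalent SUMPRODUCT would count: 10 <= value <= 20
direct = sum(1 for v in a1_a10 if 10 <= v <= 20)

print(two_countifs, direct)   # 5 5
```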
Notes
[1] We can also use =SUMPRODUCT(--(A1:A10={"Ford","Renault"})) in this instance, as we have a single range being tested for two (or more) values; the -- coerces the Booleans to numbers that can be counted (see later).
[2] Although array formulae are mentioned here, they are not explained. For a detailed discussion, see Chip Pearson's Array Formulas web page.
[3] Excel 2007 has removed this constraint: SUMPRODUCT can now use whole columns, as can any array formula (see SUMPRODUCT and Excel 2007).
[4] Note that the simple form of SUMPRODUCT, =SUMPRODUCT(rng1,rng2), works perfectly well in VBA as Application.WorksheetFunction.SumProduct(rng1,rng2), as VBA is then conforming to the function's calling requirements.
[5] When embedding quotes within a string, the quotes have to be doubled up; otherwise the quote is taken as the start or end of the string. This gets more complex if the embedded quote comes just after an opening quote or just before a closing quote, as we then have three quotes in a row: two for the doubled-up embedded quote, and one to open or close the string.
[6] The error is not caused because the text field is being summed (SUM happily ignores text), but because the value in column B is multiplied by the result of the conditional tests; it is multiplying text by a number that causes the #VALUE!.
[7] As can be seen, this restriction applies to SUMPRODUCT formulae with multiple columns, whether the multiple columns are within a conditional range or a value range.
Introduction
Array formulas are a powerful tool in Excel. An array formula is a formula that works with an array, or series, of data values rather than a single data value. There are two flavors of array formulas: first, there are those formulas that work with an array or series of data and aggregate it, typically using SUM, AVERAGE, or COUNT, to return a single value to a single cell. In this type of array formula, the result, while calculated from arrays, is a single value. We will examine this type of array formula first. The second flavor of array formulas is a formula that returns a result into two or more cells. These types of array formulas return an array of values as their result.
=AVERAGE(IF(A1:A5>0,A1:A5,FALSE))
This formula works by testing each cell in A1:A5 to see whether it is greater than 0. This returns an array of Boolean values such as {TRUE, TRUE, FALSE, FALSE, TRUE}. A BOOLEAN VALUE is a data type that contains either the value TRUE or the value FALSE. When converted to numbers in an arithmetic operation, TRUE is equivalent to 1 and FALSE is equivalent to 0. Most arithmetic functions like SUM and AVERAGE ignore Boolean values, so those values must be converted to numeric values before passing them to SUM or AVERAGE. The IF function tests each of these results individually, and returns the corresponding value from A1:A5 if TRUE, or the value FALSE if FALSE. Fully expanded, the formula would look like the following:
=AVERAGE(IF({TRUE,TRUE,FALSE,FALSE,TRUE},{A1,A2,A3,A4,A5},{FALSE,FALSE,FALSE,FALSE,FALSE}))
Note that the single FALSE value at the end of the original formula is expanded to an array of the appropriate size to match the array from the A1:A5 range in the formula. In array formulas, all arrays must be the same size. Excel will expand single elements to arrays as necessary, but will not resize arrays with more than one element to another size. If the arrays are not of the same size, you will get a #VALUE! or in some cases a #N/A error. When the IF function evaluates, the following intermediate array is formed: {A1, A2, FALSE, FALSE, A5}. This substitutes the values from A1:A5 for the TRUE elements and FALSE for the FALSE elements. Since the AVERAGE function is designed to ignore Boolean (TRUE or FALSE) values, it will average only elements A1, A2, and A5, ignoring the FALSE values. Note that the FALSE value is not converted to a zero; it is ignored completely by the AVERAGE function. Array formulas are ideal for counting or summing cells based on multiple criteria.
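The key behaviour, AVERAGE skipping FALSE rather than counting it as zero, can be illustrated in Python. excel_average is a hypothetical helper that assumes AVERAGE ignores Boolean entries in an array, and the five numbers standing in for A1:A5 are made up to give the {TRUE, TRUE, FALSE, FALSE, TRUE} pattern above.

```python
def excel_average(values):
    """AVERAGE over an array: Booleans are skipped, not counted as 0."""
    numbers = [v for v in values
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if not numbers:
        raise ZeroDivisionError("#DIV/0!")
    return sum(numbers) / len(numbers)


a1_a5 = [4, 8, -3, 0, 6]   # hypothetical A1:A5

# IF(A1:A5>0,A1:A5,FALSE) gives the intermediate array
intermediate = [v if v > 0 else False for v in a1_a5]
print(intermediate)                  # [4, 8, False, False, 6]
print(excel_average(intermediate))   # 6.0

# For contrast: if FALSE were counted as 0 the result would be 18/5
print(sum(v if v > 0 else 0 for v in a1_a5) / len(a1_a5))   # 3.6
```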
Consider the table of data shown to the right. It lists the number of products (column C) in different categories (column A) sold by various salesmen (column B). To calculate the number of Fax machines sold by Brown, we can use the following array formula:
=SUM((A2:A10="Fax")*(B2:B10="Brown")*(C2:C10))
This formula builds three arrays. The first array is a series of TRUE or FALSE values which are the results of comparing A2:A10 to the word "Fax". (Remember, Excel will expand the single "Fax" element to an array of items all of which are "Fax".) The second array is also a series of TRUE or FALSE values, the result of comparing B2:B10 to "Brown". (The single "Brown" element in the formula is expanded to an array of the appropriate size.) The third array consists of the number of units sold from the range C2:C10. These three arrays are multiplied together. When you multiply two arrays, the result is itself an array, each element of which is the product of the corresponding elements of the two arrays being multiplied. For example, {1, 2, 3} times {4, 5, 6} is {1*4, 2*5, 3*6} = {4, 10, 18}. When TRUE and FALSE values are used in any arithmetic operation, they are given the values 1 and 0, respectively. Thus in the formula above, Excel expands the formula into the three arrays: (A2:A10="Fax") {TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE} (B2:B10="Brown") {TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE} (C2:C10) {1, 10, 20, 30, 40, 50, 60, 70, 80} When these arrays are multiplied, treating TRUE as 1 and FALSE as 0, we get the array {1, 0, 0, 0, 0, 0, 60, 0, 0}, which holds the quantities of Brown's two Fax sales. The SUM function simply adds up the elements of the array and returns a result of 61, the number of Fax machines sold by Brown. You may have noticed that the logic of the formula tests Product equals "Fax" AND Salesman equals "Brown", but nowhere do we use the AND function. Here, we use multiplication to act as a logical AND function. Multiplication follows the same rules as the AND operator. It will return TRUE (or 1) only when both of the parameters are TRUE (or <> 0). If either or both parameters are FALSE (or 0), the result is FALSE (or 0).
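The three arrays above can be multiplied out in Python to confirm the result of 61. The lists are copied from the expansion above; converting TRUE/FALSE to 1/0 with int() is this sketch's stand-in for Excel's arithmetic coercion.

```python
is_fax = [True, False, True, True, False, False, True, False, False]      # A2:A10="Fax"
is_brown = [True, False, False, False, False, False, True, False, False]  # B2:B10="Brown"
units = [1, 10, 20, 30, 40, 50, 60, 70, 80]                               # C2:C10

# Element-wise product of the three arrays
products = [int(f) * int(b) * u for f, b, u in zip(is_fax, is_brown, units)]
print(products)        # [1, 0, 0, 0, 0, 0, 60, 0, 0]
print(sum(products))   # 61

# Multiplying the two tests gives the same 1/0 result as AND
print(all(int(f) * int(b) == int(f and b) for f, b in zip(is_fax, is_brown)))   # True
```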
A "negative and" or NAND operation is a comparison that returns TRUE when neither or exactly one of the elements is TRUE, but returns FALSE if both elements are TRUE. For example, we can count the number of sales except those in which Jones sold a Fax with the formula: =SUM(IF((A2:A10="Fax")+(B2:B10="Jones")<>2,1,0))
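The addition trick can be checked against a truth table in Python. nand_by_addition is a hypothetical helper, and the three product/salesman rows are made-up data, since the table the formula refers to is not reproduced here.

```python
def nand_by_addition(test1, test2):
    """Mirrors (test1)+(test2)<>2: TRUE unless both tests are TRUE."""
    return int(test1) + int(test2) != 2


for a in (False, True):
    for b in (False, True):
        print(a, b, nand_by_addition(a, b))
# False False True
# False True True
# True False True
# True True False

products = ["Fax", "Fax", "Copier"]     # hypothetical A2:A4
salesmen = ["Jones", "Brown", "Jones"]  # hypothetical B2:B4

# =SUM(IF((A2:A4="Fax")+(B2:B4="Jones")<>2,1,0))
count = sum(1 if nand_by_addition(p == "Fax", s == "Jones") else 0
            for p, s in zip(products, salesmen))
print(count)   # 2: only the row where Jones sold a Fax is excluded
```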
the formula, and array-enter it again with CTRL SHIFT ENTER, but you cannot change a single element of the array. Some of the built-in Excel functions return an array of values. These formulas must be entered into an array of cells. For example, the MINVERSE function returns the inverse of a matrix with an equal number of rows and columns. Since the inverse of a matrix is itself a matrix, the MINVERSE function must be entered into a range of cells with the same number of rows and columns as the matrix to be inverted. Therefore, if your matrix is in cells A1:B2 (two rows and two columns), you must select a range the same size, type the formula =MINVERSE(A1:B2) and press CTRL SHIFT ENTER rather than just ENTER. This enters the formula as an array formula into all the selected cells. If you were to use the MINVERSE function in a single cell, only the upper left corner value of the inverted matrix would be returned. For information about writing your own VBA functions that return arrays, see Writing Your Own Functions In VBA.
D-Functions are typically faster than array formulas, all else being equal.
The selection criteria in a D-Function must reside in cells. Array formulas can include the selection criteria directly in the formula.
D-Functions can return only a single value to a single cell, while array formulas can return arrays to many cells.
Introduction
Almost every worksheet contains at least one table of data, typically a set of rows and columns. Very frequently, you will need to return a row or column of values from the table based on the row or column position in the table, or you may need to return a value from the table based upon a match of values in the row headers and column headers. For example, you may need to return the 5th row of a table, or you may need to return the row where the ID number is 1234. The simplest types of lookups are performed with the VLOOKUP or HLOOKUP functions. These functions are well documented in the Help file and are not discussed in detail on this page. It is assumed that you are familiar with VLOOKUP and HLOOKUP. For more complicated lookups in tables, we will use formulas based on the OFFSET, MATCH, and INDEX functions. While the Help file describes these functions individually, it does not describe how they can be combined to create more powerful and flexible lookup formulas. That is the goal of this page. At the core of most of the formulas on this page is the OFFSET function. You should be familiar with this function before proceeding with this page. Most of the formulas on this page are array formulas. Array formulas are described in detail on the Array Formulas page on this web site. You should be at ease with array formulas in order to modify the lookup formulas presented on this page. With few exceptions, the formulas on this page use only a single range reference, a Defined Name that refers to the data table against which the lookup is performed.
Using a single reference may make the formulas longer, but it also makes them considerably more flexible. To use the formulas on your own worksheets, you need only modify a single name. This convenience makes up for, in my opinion, the longer formula length. Of course, if you are not using a defined name, simply replace the name in the formula with the appropriate range reference. If the formulas on this page do not return the expected result when you use them on your own worksheets, the first thing to check is that the formula is entered as an array formula. If you are unsure whether a formula needs to be array entered, go ahead and enter it as an array formula; that is completely safe. ENTERING AN ARRAY FORMULA: When you enter a formula as an array formula, you must press CTRL SHIFT ENTER rather than just ENTER when you first enter the formula and whenever you edit it later. If you do this properly, Excel will display the formula in the formula bar enclosed in curly braces, { }. You do not type in the curly braces; Excel will display them automatically. In the interest of brevity and clarity, the formulas on this page do not have any error checking and handling. For example, there is nothing to prevent you from attempting to return the 6th row of a table that has only 4 rows. If a parameter in a function call is invalid, you will most likely get a #N/A error. You may want to add some error checks when you use these formulas in your own worksheets. As is the case with many types of formulas in Excel, there are several different ways to accomplish the same thing. Many of the formulas on this page could be written with a combination of the INDEX and MATCH functions instead of the OFFSET function. OFFSET is neither better nor worse than INDEX/MATCH. For consistency, I have chosen to use OFFSET for nearly all the tasks at hand. Other sources may use other methods. I encourage you to learn a variety of ways to accomplish a task.
Example Data
The example formulas in the first section of this page, those formulas for returning rows and columns of a table, use the following data table.
This table contains two named ranges that are used in the formulas. The name Table refers to the entire table, cells B2:G7, which includes the row labels and column labels. The name InnerTable refers only to the actual data, cells C3:G7, which does not include the row labels and the column labels. The values of the row labels (abby, beth, etc.) and the column labels (apples, oranges, etc.) are in alphabetical order for illustration only. The formulas do not require that the values be in any particular order.
The following formula returns a row from the InnerTable range. It returns only the data values, not the row header. =OFFSET(InnerTable,E18-1,0,1,COLUMNS(InnerTable)) In this formula, cell E18 contains the 1-based row of InnerTable to return. Thus, if cell E18 contains 5, the formula returns the following values.
By changing the values that are passed to the OFFSET function, we can return a column from either the Table or InnerTable range, either by using a column offset or the value of a column label. The following formula will return a column from the Table range. =OFFSET(Table,0,E22-1,ROWS(Table),1) If cell E22 contains the value 3, the third column of Table is returned, as shown below:
Since this formula returns a column of data from Table, it should be array entered into a range that is one column wide and has the same number of rows as the Table range. You can also return a column from Table that corresponds to a matching column label. The following formula will return the column from Table whose column label is equal to the value in cell E39. =OFFSET(Table,0,MATCH(E39,OFFSET(Table,0,0,1,COLUMNS(Table)),0)-1,ROWS(Table),1) If cell E39 contains the value plums, the following values are returned.
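To make the OFFSET arguments concrete, here is a Python sketch that models a range as a list of rows. The offset helper is a simplified, hypothetical model (unlike Excel's OFFSET, it cannot reach outside the range it is given), and apart from abby, beth, dora, apples, oranges and plums, the labels and all the numbers are made up.

```python
# Hypothetical stand-in for Table (B2:G7); most labels and all numbers are made up.
TABLE = [
    ["",     "apples", "grapes", "oranges", "pears", "plums"],
    ["abby", 1,  2,  3,  4,  5],
    ["beth", 6,  7,  8,  9,  10],
    ["cara", 11, 12, 13, 14, 15],
    ["dora", 16, 17, 18, 19, 20],
    ["emma", 21, 22, 23, 24, 25],
]
# Stand-in for InnerTable (C3:G7): the data without row and column labels
INNER_TABLE = [row[1:] for row in TABLE[1:]]


def n_rows(ref):
    return len(ref)        # ROWS(ref)


def n_cols(ref):
    return len(ref[0])     # COLUMNS(ref)


def offset(ref, rows, cols, height, width):
    """Simplified OFFSET: a height x width block counted from ref's top-left."""
    if rows < 0 or cols < 0 or rows + height > n_rows(ref) or cols + width > n_cols(ref):
        raise IndexError("outside the modelled range")
    return [r[cols:cols + width] for r in ref[rows:rows + height]]


def match_exact(value, cells):
    """MATCH(value, cells, 0) for a one-row or one-column block (1-based)."""
    flat = [c for r in cells for c in r]
    if value not in flat:
        raise LookupError("#N/A")
    return flat.index(value) + 1


# =OFFSET(InnerTable,E18-1,0,1,COLUMNS(InnerTable)) with E18 = 5
e18 = 5
print(offset(INNER_TABLE, e18 - 1, 0, 1, n_cols(INNER_TABLE)))
# [[21, 22, 23, 24, 25]]

# =OFFSET(Table,0,E22-1,ROWS(Table),1) with E22 = 3
e22 = 3
print(offset(TABLE, 0, e22 - 1, n_rows(TABLE), 1))
# [['grapes'], [2], [7], [12], [17], [22]]

# =OFFSET(Table,0,MATCH(E39,OFFSET(Table,0,0,1,COLUMNS(Table)),0)-1,ROWS(Table),1)
e39 = "plums"
header = offset(TABLE, 0, 0, 1, n_cols(TABLE))
print(offset(TABLE, 0, match_exact(e39, header) - 1, n_rows(TABLE), 1))
# [['plums'], [5], [10], [15], [20], [25]]
```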
Double Lookups
A double lookup is a formula that returns a value from a table based on a match of values in both the rows and columns. Referring to the example data shown above, you may want to return the value corresponding to the dora row and the plums column. If cell E74 contains the value to match on the rows (e.g., dora) and cell E75 contains the value to match on the columns (e.g., plums), the following formula will return the appropriate value from the Table range: =OFFSET(Table,MATCH(E74,OFFSET(Table,0,0,ROWS(Table),1),0)-1,MATCH(E75,OFFSET(Table,0,0,1,COLUMNS(Table)),0)-1)
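The double lookup chains two MATCH calls into one OFFSET: one to find the row offset in the first column and one to find the column offset in the first row. A self-contained Python sketch, using the same kind of hypothetical, made-up table and simplified offset/match_exact helpers (it reads a single cell, whereas the worksheet OFFSET without height and width returns a Table-sized range anchored at that cell):

```python
# Hypothetical stand-in for Table; most labels and all numbers are made up.
TABLE = [
    ["",     "apples", "grapes", "oranges", "pears", "plums"],
    ["abby", 1,  2,  3,  4,  5],
    ["beth", 6,  7,  8,  9,  10],
    ["cara", 11, 12, 13, 14, 15],
    ["dora", 16, 17, 18, 19, 20],
    ["emma", 21, 22, 23, 24, 25],
]


def offset(ref, rows, cols, height, width):
    """Simplified OFFSET that stays inside ref."""
    return [r[cols:cols + width] for r in ref[rows:rows + height]]


def match_exact(value, cells):
    """MATCH(value, cells, 0) for a one-row or one-column block (1-based)."""
    flat = [c for r in cells for c in r]
    if value not in flat:
        raise LookupError("#N/A")
    return flat.index(value) + 1


def double_lookup(table, row_label, col_label):
    first_col = offset(table, 0, 0, len(table), 1)      # OFFSET(Table,0,0,ROWS(Table),1)
    first_row = offset(table, 0, 0, 1, len(table[0]))   # OFFSET(Table,0,0,1,COLUMNS(Table))
    r = match_exact(row_label, first_col) - 1
    c = match_exact(col_label, first_row) - 1
    return offset(table, r, c, 1, 1)[0][0]


print(double_lookup(TABLE, "dora", "plums"))   # 20
```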
Left Lookups
While the VLOOKUP function is very useful, it has a significant limitation: you can only return a value to the right of the lookup column. For example, you can look in column B for a value and then return the corresponding value from column D. However, the reverse is not true. You cannot look up a value in column D and return the corresponding value from column B. This is where a Left Lookup formula is useful. For example, suppose you have the following table, and a defined name of LLTable that refers to the actual data (colored in red).
The following formula will look for a value in the Value column and return the corresponding value in the Type column. =OFFSET(LLTable,MATCH(F67,OFFSET(LLTable,0,1,ROWS(LLTable),1),0)-1,0,1,1) In this formula, cell F67 contains the value to be searched for in the Value column. Thus, if F67 contains 44, the formula will return dd.
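A Python sketch of the left lookup, under stated assumptions: LL_TABLE is a made-up stand-in for LLTable with Type in the first column and Value in the second, chosen so that 44 maps to dd as in the example above, and match_exact is a hypothetical model of MATCH with match type 0.

```python
# Hypothetical stand-in for LLTable: (Type, Value) rows, made up.
LL_TABLE = [["aa", 11], ["bb", 22], ["cc", 33], ["dd", 44], ["ee", 55]]


def match_exact(value, values):
    """MATCH(value, values, 0): 1-based position of the first exact match."""
    if value not in values:
        raise LookupError("#N/A")
    return values.index(value) + 1


def left_lookup(table, value):
    # OFFSET(LLTable,0,1,ROWS(LLTable),1): the Value column, one to the right
    value_column = [row[1] for row in table]
    row_offset = match_exact(value, value_column) - 1
    # OFFSET(LLTable,row_offset,0,1,1): step back left to the Type column
    return table[row_offset][0]


print(left_lookup(LL_TABLE, 44))   # dd
```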
Upper Lookups
The HLOOKUP function is the "transpose" of the VLOOKUP function. As VLOOKUP scans down a column for a match and then moves to the right to return a value, HLOOKUP scans across a row for a match and then moves down to return a value. HLOOKUP cannot move upwards to return a value. For example, you can search row 5 to find a match and then return the corresponding value from row 8, but the reverse is not possible. You cannot scan row 8 and return a value from row 5. Just as the Left Lookup formula overcame the limitation of VLOOKUP, an Upper Lookup formula can overcome the limitation of HLOOKUP. Consider the following table:
In this table, the range displayed in red has the name ULTable. The following formula will allow you to look in the Value row for a value equal to cell J82 and return the corresponding value from the Type row. =OFFSET(ULTable,0,MATCH(J82,OFFSET(ULTable,ROWS(ULTable)-1,0,1,COLUMNS(ULTable)),0)-1,1,1) For example, if J82 contains 33, the formula will return cc.
Arbitrary Lookups
Another limitation of the VLOOKUP function is that if there are duplicate matches in the lookup column, the first occurrence of the matching value is used. For example, consider the following table of data:
With a simple VLOOKUP function for the value Beth, the value 22 will be returned, since 22 corresponds to the first occurrence of the value Beth. It may be necessary, however, to return the value corresponding to the second or third occurrence of Beth. If the table of values (colored in red, excluding the Name and Score column labels) is named ALTable, the following formula will return the value from the Score column corresponding to the Nth occurrence of the value in cell F90, where the number N is in cell F91. For example, if F90 contains the value Beth and cell F91 contains the value 3 (indicating to find the 3rd occurrence of Beth), the formula will return the value 88. =INDEX(ALTable,SMALL(IF(OFFSET(ALTable,0,0,ROWS(ALTable),1)=F90,ROW(OFFSET(ALTable,0,0,ROWS(ALTable),1))-ROW(OFFSET(ALTable,0,0,1,1))+1,ROW(OFFSET(ALTable,ROWS(ALTable)-1,0,1,1))+1),F91),2) A special case of the arbitrary lookup formula above is to return the value corresponding to the last occurrence in the list. For example, if cell F94 contains the value Beth, the following formula will return the value 88, which corresponds to the last occurrence of the value Beth. =INDEX(ALTable,SMALL(IF(OFFSET(ALTable,0,0,ROWS(ALTable),1)=F94,ROW(OFFSET(ALTable,0,0,ROWS(ALTable),1))-ROW(OFFSET(ALTable,0,0,1,1))+1,ROW(OFFSET(ALTable,ROWS(ALTable)-1,0,1,1))+1),COUNTIF(OFFSET(ALTable,0,0,ROWS(ALTable),1),F94)),2)
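The Nth-occurrence logic (positions of the matching rows, a "past the end" value for non-matches, then SMALL) can be sketched in Python. AL_TABLE is made-up data arranged so that Beth's first and third scores are 22 and 88, as in the examples above; small, nth_occurrence and last_occurrence are hypothetical helpers.

```python
# Hypothetical stand-in for ALTable (Name, Score); rows are made up.
AL_TABLE = [
    ["Abby", 10],
    ["Beth", 22],
    ["Cara", 35],
    ["Beth", 47],
    ["Dora", 51],
    ["Beth", 88],
]


def small(values, k):
    """SMALL(values, k): the k-th smallest value."""
    return sorted(values)[k - 1]


def nth_occurrence(table, name, n):
    # Non-matching rows get a position one past the end, like the IF's
    # FALSE branch, so SMALL picks matching positions first.
    past_end = len(table) + 1
    positions = [i + 1 if row[0] == name else past_end
                 for i, row in enumerate(table)]
    pos = small(positions, n)
    if pos == past_end:
        raise LookupError("fewer than n matches")
    return table[pos - 1][1]   # INDEX(ALTable, pos, 2)


def last_occurrence(table, name):
    count = sum(1 for row in table if row[0] == name)   # COUNTIF(...)
    if count == 0:
        raise LookupError("no matches")
    return nth_occurrence(table, name, count)


print(nth_occurrence(AL_TABLE, "Beth", 1))   # 22
print(nth_occurrence(AL_TABLE, "Beth", 3))   # 88
print(last_occurrence(AL_TABLE, "Beth"))     # 88
```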
The following array formula will return the smallest number in the list CMTable that is greater than or equal to the value in cell E105.
=INDEX(CMTable,MATCH(MIN(IF(CMTable-E105>=0,CMTable,FALSE)),IF(CMTable-E105>=0,CMTable,FALSE),0))
Thus, if E105 has the value 5, the formula will return 5.1, which is the smallest number in the list that is greater than or equal to 5.
The second Closest Match formula will return the largest number in a list that is less than or equal to a specified number. In the following formula, cell E108 contains the test value.
=INDEX(CMTable,MATCH(MAX(IF(CMTable-E108<=0,CMTable,FALSE)),IF(CMTable-E108<=0,CMTable,FALSE),0))
Thus, if cell E108 has the value 8, the formula will return 7.4, which is the largest number in the range that is less than or equal to 8.
The third and final Closest Match formula will return the value in a list that is closest to a specified value, here in cell E111. The returned value might be less than or greater than the test value.
=INDEX(CMTable,MATCH(MIN(ABS(CMTable-E111)),ABS(CMTable-E111),0),1)
Thus, if cell E111 contains the value 5, the formula will return 5.1, since 5.1 is closer to 5 than any other value in the list.
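The three Closest Match formulas reduce to a filtered MIN, a filtered MAX, and a minimum absolute difference. The Python sketch below models them. The list is a hypothetical stand-in for CMTable, picked so that the results agree with the examples in the text. When several values tie for closest, it returns the first one, as MATCH does.

```python
def smallest_at_least(values, target):
    """MIN(IF(CMTable-target>=0,CMTable,FALSE)): smallest value >= target."""
    # min() raises ValueError if nothing qualifies; the Excel formula also fails then.
    return min(v for v in values if v - target >= 0)


def largest_at_most(values, target):
    """MAX(IF(CMTable-target<=0,CMTable,FALSE)): largest value <= target."""
    return max(v for v in values if v - target <= 0)


def closest(values, target):
    """MATCH(MIN(ABS(CMTable-target)),ABS(CMTable-target),0): first nearest value."""
    diffs = [abs(v - target) for v in values]
    return values[diffs.index(min(diffs))]


CM_TABLE = [1.2, 3.3, 5.1, 7.4, 9.6]   # hypothetical CMTable contents
print(smallest_at_least(CM_TABLE, 5), largest_at_most(CM_TABLE, 8), closest(CM_TABLE, 5))
# 5.1 7.4 5.1
```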
Introduction
Very often, Excel is used to manage lists of data, such as employee names or phone lists. In such circumstances, duplicates may exist in the list and need to be identified. This page contains a number of formulas that can be used to work with duplicate items in a list of data. All the formulas on this page are array formulas. DEFINITION: Array Formula. An array formula is a formula that works with arrays or series of data rather than single data values. When you enter an array formula, type the formula in the cell and then press CTRL+SHIFT+ENTER rather than just ENTER, both when you first enter the formula and when you edit it later. If you do this properly, Excel will display the formula enclosed in curly braces { }. Array formulas are discussed in detail on the Array Formulas page. For a VBA function that returns an array of the distinct items in a range or array, see the Distinct Values page. This function can be called either from a range of worksheet cells or from other VBA code.
This first example will highlight duplicate rows in the range B2:B11. Select the cells that you wish to test and format, B2:B11 in this example. Then open the Conditional Formatting dialog from the Format menu, change Cell Value Is to Formula Is, enter the formula below, and choose a font or background format to apply to cells that are duplicates.
=COUNTIF($B$2:$B$11,B2)>1
The formula above, when used in Conditional Formatting, will highlight all duplicates. That is, if the value 'abc' occurs twice in the list, both instances of 'abc' will be highlighted. This is shown in the image to the left, in which all occurrences of 'a' and 'g' are highlighted.
You can use the following formula in Conditional Formatting to highlight only the first occurrence of an entry in the list. For example, the first occurrence of 'abc' will be highlighted, but the second and subsequent occurrences of 'abc' will not be highlighted.
=IF(COUNTIF($B$2:$B$11,B2)=1,FALSE,COUNTIF($B$2:B2,B2)=1)
The mixed reference $B$2:B2 expands as the rule is applied down the range, so COUNTIF($B$2:B2,B2) counts how many times the value has appeared up to and including the current cell. This is shown at the left, where only the first occurrences of the duplicate items 'a', 'e', and 'g' are highlighted. The second and subsequent occurrences of these values are not highlighted.
You can also do the reverse of this with Conditional Formatting. Using the formula below in Conditional Formatting will highlight only the second and subsequent occurrences of a value. The first occurrence of the value will not be highlighted.
=IF(COUNTIF($B$2:$B$11,B2)=1,FALSE,NOT(COUNTIF($B$2:B2,B2)=1))
This is shown at the left, where only the second occurrences of 'a', 'b', 'c' and 'f' are highlighted. The first occurrences of these items are not highlighted.
Another formula for Conditional Formatting will highlight only the last occurrence of a duplicate element in a list (or the element itself if it occurs only once).
=IF(COUNTIF($B$2:$B$11,B2)=1,TRUE,COUNTIF($B$2:B2,B2)=COUNTIF($B$2:$B$11,B2))
As you can see, only the last occurrences of elements 'a', 'b', 'c', and 'f' are highlighted. Element 'd' is highlighted because it occurs only once. The occurrences of 'a', 'b', 'c' and 'f' that occur before the last occurrence are not highlighted.
We can round out our discussion of highlighting duplicate rows with two additional formulas related to distinct items in a list.
The following can be used in Conditional Formatting to highlight elements that occur only once in the range B2:B11.
=COUNTIF($B$2:$B$11,B2)=1
This image illustrates the formula. Elements 'b', 'c', and 'e' are highlighted because they occur only once in the list. Items 'a', 'd' and 'f' are not highlighted because they occur more than once in the list.
Finally, the following formula can be used in Conditional Formatting to highlight the distinct values in B2:B11. If an element occurs once, it is highlighted. If it occurs more than once, only the first occurrence is highlighted.
=COUNTIF($B$2:B2,B2)=1
As you can see, only the first or only occurrences of the elements are highlighted. If an element is duplicated, as 'b' is, the duplicate elements are not highlighted.
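All six Conditional Formatting rules above combine two counts. One is the total count over $B$2:$B$11. The other is the running count over the expanding range $B$2:B2. The Python sketch below models each rule as a flag per cell. The flag names and the sample column are hypothetical. Python's list.count is case-sensitive, while Excel's COUNTIF is not.

```python
def cf_rules(cells):
    """Evaluate the six Conditional Formatting rules for every cell in cells."""
    flags = []
    for i, value in enumerate(cells):
        total = cells.count(value)             # COUNTIF($B$2:$B$11,B2)
        so_far = cells[: i + 1].count(value)   # COUNTIF($B$2:B2,B2), expanding range
        flags.append({
            "all_duplicates": total > 1,
            "first_of_duplicates": total != 1 and so_far == 1,
            "later_duplicates": total != 1 and so_far != 1,
            "last_or_single": total == 1 or so_far == total,
            "unique_only": total == 1,
            "distinct": so_far == 1,
        })
    return flags


cells = ["a", "b", "a", "c", "a"]   # hypothetical contents of column B
for value, f in zip(cells, cf_rules(cells)):
    print(value, f["first_of_duplicates"], f["last_or_single"])
```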
Array Formulas
Many of the formulas described here are Array Formulas, which are a special type of formula in Excel. If you are not familiar with Array Formulas, see the Array Formulas page.
Array To Column
Sometimes it is useful to convert an MxN array into a single column of data, for example for charting (a data series must be a single row or column).
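The original method for this conversion is not reproduced here. The Python sketch below shows one common index mapping, reading the array row by row. For the K-th output cell, the source row is INT((K-1)/N)+1 and the source column is MOD(K-1,N)+1. On a worksheet that could be written as =INDEX(MyArray,INT((K-1)/COLUMNS(MyArray))+1,MOD(K-1,COLUMNS(MyArray))+1). MyArray and K are hypothetical names.

```python
def array_to_column(array):
    """Read an M x N list of rows into one column, row by row."""
    n_cols = len(array[0])
    n_cells = len(array) * n_cols
    # Output position k (0-based) comes from row k // n_cols, column k % n_cols.
    return [array[k // n_cols][k % n_cols] for k in range(n_cells)]


print(array_to_column([[1, 2, 3], [4, 5, 6]]))  # [1, 2, 3, 4, 5, 6]
```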
The following array formula sums the values in A1:A10 that are between 5 and 10, inclusive:
=SUM((A1:A10>=5)*(A1:A10<=10)*A1:A10)
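In =SUM((A1:A10>=5)*(A1:A10<=10)*A1:A10), each comparison yields TRUE or FALSE, which act as 1 or 0 when multiplied. A value therefore contributes to the sum only when both tests pass. Here is a minimal Python model of that logic, with a hypothetical list standing in for A1:A10.

```python
def sum_between(values, low=5, high=10):
    # (v >= low) and (v <= high) are booleans (1/0), so the product keeps v
    # only when low <= v <= high, as the multiplied comparisons do in Excel.
    return sum((v >= low) * (v <= high) * v for v in values)


print(sum_between([1, 5, 7, 10, 12]))  # 22
```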
Dynamic Ranges
You can define a name that refers to a range whose size varies depending on its contents. For example, you may want a range name that refers only to the non-blank portion of a list of numbers, such as the first N non-blank cells in A2:A20. Define a name called MyRange, and set its Refers To property to:
=OFFSET(Sheet1!$A$2,0,0,COUNTA($A$2:$A$20),1)
Be sure to use absolute cell references in the formula. Because the height comes from COUNTA, the range is correct only when there are no blank cells between entries. Also see the Named Ranges page for more information about dynamic ranges.
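A Python model of which cells MyRange covers makes the "no gaps" assumption visible. The list stands in for A2:A20, with None as a hypothetical marker for a blank cell. The range always starts at A2 and is as tall as the number of non-blank cells.

```python
def my_range(a2_to_a20):
    """Cells covered by =OFFSET(Sheet1!$A$2,0,0,COUNTA($A$2:$A$20),1).
    None stands for a blank cell; index 0 is A2."""
    height = sum(1 for v in a2_to_a20 if v is not None)   # COUNTA($A$2:$A$20)
    return a2_to_a20[:height]                              # from A2, `height` rows tall


print(my_range([3, 8, 5, None, None]))   # [3, 8, 5]
print(my_range([3, None, 5, None]))      # [3, None]: a gap shifts the range
```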
To find the portion of the named range DataRange2 that contains data, use the following array formula:
=ADDRESS(ROW(DataRange2),COLUMN(DataRange2),4)&":"&ADDRESS(MAX((DataRange2<>"")*ROW(DataRange2)),COLUMN(DataRange2)+COLUMNS(DataRange2)-1,4)
This will return the range H7:I17. If you need the worksheet name in the returned range, use the following array formula:
=ADDRESS(ROW(DataRange2),COLUMN(DataRange2),4,,"MySheet")&":"&ADDRESS(MAX((DataRange2<>"")*ROW(DataRange2)),COLUMN(DataRange2)+COLUMNS(DataRange2)-1,4)
This will return MySheet!H7:I17. To find the number of rows that contain data, use the following array formula:
=(MAX((DataRange2<>"")*ROW(DataRange2)))-ROW(DataRange2)+1
This will return the number 11, indicating that data extends through the 11th row of DataRange2. To find the last entry in the first column of DataRange2, use the following array formula:
=INDIRECT(ADDRESS(MAX((DataRange2<>"")*ROW(DataRange2)),COLUMN(DataRange2),4))
To find the last entry in the second column of DataRange2, use the following array formula:
=INDIRECT(ADDRESS(MAX((DataRange2<>"")*ROW(DataRange2)),COLUMN(DataRange2)+1,4))
Both of these use the last row of DataRange2 that contains data in any column.
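The key expression in these formulas is MAX((DataRange2<>"")*ROW(DataRange2)). It multiplies each cell's worksheet row by 1 if the cell is non-blank and by 0 if it is blank, then takes the largest result. The Python sketch below models it, along with the A1-style address built by ADDRESS(...,4). The block is hypothetical: 15 rows by 2 columns starting at H7, with data in the first 11 rows. Blank cells are "".

```python
def col_letter(col):
    """Column number to letters, as in ADDRESS(...,4): 1 -> A, 8 -> H, 27 -> AA."""
    letters = ""
    while col:
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def last_data_row(block, first_row):
    """MAX((DataRange2<>"")*ROW(DataRange2)): worksheet row of the last non-blank cell."""
    return max((first_row + i) * (cell != "")
               for i, row in enumerate(block) for cell in row)


def data_address(block, first_row, first_col):
    """ADDRESS(top-left,4)&":"&ADDRESS(last data row, last column,4)."""
    last_col = first_col + len(block[0]) - 1   # COLUMN(...)+COLUMNS(...)-1
    return (f"{col_letter(first_col)}{first_row}:"
            f"{col_letter(last_col)}{last_data_row(block, first_row)}")


def used_row_count(block, first_row):
    return last_data_row(block, first_row) - first_row + 1


def last_entry(block, first_row, col_offset):
    """Cell in column col_offset (0-based) of the last row containing data."""
    return block[last_data_row(block, first_row) - first_row][col_offset]


DATA = [[f"a{i}", f"b{i}"] for i in range(11)] + [["", ""] for _ in range(4)]
print(data_address(DATA, 7, 8), used_row_count(DATA, 7))  # H7:I17 11
```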
Where A10 is the cell containing the text, and B10 is the number of the word you want to get. This formula can be extended to get any set of words in the string. To get N words starting at word M (e.g., 3 words starting at the 5th word: the 5th, 6th, and 7th words), use the following array formula:
=MID(A10,SMALL(IF(MID(" "&A10,ROW(INDIRECT("1:"&LEN(A10)+1)),1)=" ",ROW(INDIRECT("1:"&LEN(A10)+1))),B10),SUM(SMALL(IF(MID(" "&A10&" ",ROW(INDIRECT("1:"&LEN(A10)+2)),1)=" ",ROW(INDIRECT("1:"&LEN(A10)+2))),B10+C10*{0,1})*{-1,1})-1)
Where A10 is the cell containing the text, B10 is the number of the first word to get, and C10 is the number of words, starting at B10, to get.
Note that in the above array formulas, the {0,1} and {-1,1} are enclosed in array braces (curly brackets {} ) not parentheses.
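The word-extraction formula finds the positions of the spaces in " "&A10&" ". The B10-th space marks where word B10 starts in A10. The (B10+C10)-th space marks the end of the last word wanted, so SUM(...*{-1,1})-1 is the length to pass to MID. The Python model below follows the same steps with 1-based positions. Like the formula, it assumes words are separated by single spaces. The sample sentence is hypothetical.

```python
def words_from(text, start, count):
    """`count` words of text, beginning with word number `start` (1-based)."""
    padded = " " + text + " "
    # Positions (1-based) where MID(padded, k, 1) = " ", like the IF/ROW(INDIRECT()) array
    spaces = [k + 1 for k, ch in enumerate(padded) if ch == " "]
    begin = spaces[start - 1]            # SMALL(..., B10)
    end = spaces[start + count - 1]      # SMALL(..., B10+C10)
    length = end - begin - 1             # SUM(SMALL(...)*{-1,1})-1
    # MID(A10, begin, length): 1-based begin becomes 0-based begin-1
    return text[begin - 1 : begin - 1 + length]


print(words_from("the quick brown fox jumps over", 2, 3))  # quick brown fox
```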
Grades
A frequent question is how to assign a letter grade to a numeric value. This is simple. First, create a defined name called "Grades" that refers to the array:
={0,"F";60,"D";70,"C";80,"B";90,"A"}
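Then use VLOOKUP to convert the number to the grade:
=VLOOKUP(A1,Grades,2)
where A1 is the cell containing the numeric value. Because the fourth argument is omitted, VLOOKUP performs an approximate match and returns the grade for the largest threshold that is less than or equal to A1. You can add entries to the Grades array for other grades like C- and C+. Just make sure the numeric values in the array are in increasing order.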
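Approximate-match VLOOKUP behaves like a binary search for the largest threshold that does not exceed the score. That is why the thresholds must be in increasing order. The Python sketch below models it with the standard bisect module, using the same thresholds as the Grades array above.

```python
from bisect import bisect_right

GRADES = [(0, "F"), (60, "D"), (70, "C"), (80, "B"), (90, "A")]


def grade(score):
    """Model of =VLOOKUP(score,Grades,2): largest threshold <= score."""
    thresholds = [t for t, _ in GRADES]      # must be ascending, as VLOOKUP requires
    i = bisect_right(thresholds, score) - 1
    if i < 0:
        raise ValueError("score below the first threshold (VLOOKUP returns #N/A)")
    return GRADES[i][1]


print(grade(85))  # B
```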
Left Lookups
The easiest way to do table lookups is with the =VLOOKUP function. However, =VLOOKUP requires that the value returned be to the right of the value you're looking up. For example, if you're looking up a value in column B, you cannot retrieve values in column A. If you need to retrieve a value from a column to the left of the column containing the lookup value, use either of the following formulas:
=INDIRECT(ADDRESS(ROW(Rng)+MATCH(C1,Rng,0)-1,COLUMN(Rng)-ColsToLeft))
Or
=INDIRECT(ADDRESS(ROW(Rng)+MATCH(C1,Rng,0)-1,COLUMN(A:A)))
Where Rng is the range containing the lookup values, and ColsToLeft is the number of columns between Rng and the column holding the values to retrieve. In the second syntax, replace "A:A" with the column containing the retrieval data. In both examples, C1 is the value you want to look up.
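Both Left Lookup formulas take the row from MATCH and the column by counting leftwards from Rng. The Python sketch below models the first form, with the sheet as a list of rows. The parameter names and sample rows are hypothetical. Unlike Python's negative indexing, a column left of the sheet edge is treated as an error.

```python
def left_lookup(rows, key_col, key, cols_to_left):
    """Value cols_to_left columns left of key's first exact match in key_col."""
    if not 0 <= cols_to_left <= key_col:
        raise ValueError("cols_to_left points outside the sheet")
    keys = [row[key_col] for row in rows]
    match_index = keys.index(key)                        # MATCH(C1,Rng,0)-1, zero-based
    return rows[match_index][key_col - cols_to_left]     # COLUMN(Rng)-ColsToLeft


ROWS = [["Ann", "A-101"], ["Bob", "B-202"], ["Cy", "C-303"]]   # hypothetical sheet
print(left_lookup(ROWS, 1, "B-202", 1))  # Bob
```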
Ranking Numbers
Often, it is useful to be able to return the N highest or lowest values from a range of data. Suppose we have a range of numeric data called RankRng. Create a range next to RankRng (starting in the same row, with the same number of rows) called TopRng. Also, create a named cell called TopN, and enter into it the number of values you want to return (e.g., 5 for the top 5 values in RankRng). Enter the following formula in the first cell in TopRng, and use Fill Down to fill out the range:
=IF(ROW()-ROW(TopRng)+1>TopN,"",LARGE(RankRng,ROW()-ROW(TopRng)+1))
To return the TopN smallest values of RankRng, use
=IF(ROW()-ROW(TopRng)+1>TopN,"",SMALL(RankRng,ROW()-ROW(TopRng)+1))
The list of numbers returned by these formulas will automatically change as you change the contents of RankRng or TopN.
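In the TopRng formulas, ROW()-ROW(TopRng)+1 is simply the position k of the cell within TopRng. Each cell therefore shows the k-th largest (or smallest) value, or an empty string once k exceeds TopN. The Python model below assumes RankRng holds numbers only. The sample values are hypothetical.

```python
def top_n(rank_rng, top_n_count, smallest=False):
    """Values shown in TopRng, which has the same height as RankRng."""
    ordered = sorted(rank_rng, reverse=not smallest)   # LARGE(RankRng,k) / SMALL(RankRng,k)
    # k = ROW()-ROW(TopRng)+1 runs 1, 2, ... down TopRng; cells past TopN show "".
    return [ordered[k - 1] if k <= top_n_count else ""
            for k in range(1, len(rank_rng) + 1)]


print(top_n([4, 9, 1, 7, 3], 3))                 # [9, 7, 4, '', '']
print(top_n([4, 9, 1, 7, 3], 3, smallest=True))  # [1, 3, 4, '', '']
```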
Miscellaneous
Sheet Name
Suppose our active sheet is named "MySheet" in the file C:\Files\MyBook.Xls. To return the full sheet name (including the file path) to a cell, use
=CELL("filename",A1)
Note that the argument to the =CELL function is the word "filename" in quotes, not your actual filename. This will return "C:\Files\[MyBook.xls]MySheet". The workbook must have been saved; for a new, unsaved workbook, CELL("filename",A1) returns empty text. To return the sheet name without the path, use
=MID(CELL("filename",A1),FIND("]",CELL("filename",A1))+1,LEN(CELL("filename",A1))-FIND("]",CELL("filename",A1)))
This will return "MySheet".
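The sheet-name formula takes everything after the "]" in the CELL("filename") result. The Python sketch below mirrors the FIND/MID arithmetic on the example string from the text. It is a string-handling model only and does not call Excel.

```python
def sheet_name(full):
    """Model of =MID(full,FIND("]",full)+1,LEN(full)-FIND("]",full))."""
    # FIND is 1-based; str.index raises ValueError where FIND returns #VALUE!
    close = full.index("]") + 1
    # MID(full, close+1, LEN(full)-close) is everything after "]"
    return full[close:]


print(sheet_name(r"C:\Files\[MyBook.xls]MySheet"))  # MySheet
```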
File Name
Suppose our active sheet is named "MySheet" in the file C:\Files\MyBook.Xls. To return the file name without the path, use
=MID(CELL("filename",A1),FIND("[",CELL("filename",A1))+1,FIND("]",CELL("filename",A1))-FIND("[",CELL("filename",A1))-1)
This will return "MyBook.xls". To return the file name with the path, use either
=LEFT(CELL("filename",A1),FIND("]",CELL("filename",A1)))
Or
=SUBSTITUTE(SUBSTITUTE(LEFT(CELL("filename",A1),FIND("]",CELL("filename",A1))),"[",""),"]","")
The first syntax will return "C:\Files\[MyBook.xls]". The second syntax will return "C:\Files\MyBook.xls". In all of the examples above, the A1 argument to the =CELL function forces Excel to get the name from the sheet containing the formula. Without it, if Excel calculates the =CELL function while another sheet is active, the cell will contain the name of the active sheet, not the sheet actually containing the formula.
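All three file-name formulas slice the CELL("filename") string around the "[" and "]" delimiters. The Python sketch below reproduces the three results from the example string in the text. The function name is hypothetical, and 1-based FIND positions are converted to 0-based slices in the comments.

```python
def parse_cell_filename(full):
    """(file name, path with brackets, plain path) from a CELL("filename") string."""
    open_pos = full.index("[") + 1    # FIND("[",...), 1-based
    close_pos = full.index("]") + 1   # FIND("]",...), 1-based
    # MID(full, open_pos+1, close_pos-open_pos-1): the text between the brackets
    file_name = full[open_pos : close_pos - 1]
    # LEFT(full, close_pos): everything up to and including "]"
    with_brackets = full[:close_pos]
    # SUBSTITUTE(SUBSTITUTE(..., "[", ""), "]", "")
    plain_path = with_brackets.replace("[", "").replace("]", "")
    return file_name, with_brackets, plain_path


for part in parse_cell_filename(r"C:\Files\[MyBook.xls]MySheet"):
    print(part)
```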