Microsoft Excel Functions

When analysing data in a Microsoft Excel spreadsheet, there are two things that I find I typically need to do:

  1. Identify duplicate rows
  2. Find matches against a lookup table

This article explains how to do these two things in excel using formulas and functions.

You might be thinking, "bah!, I'm a developer why do I need to do anything in Excel?". However I recommend you get your Excel skills up to scratch — there are lot of powerful things you can do with it. Plus all the users out there will be using it, so you might as well know a bit about it.

Identifying Duplicate Rows

Say you have the following list of operators:

A
1 OPRID
2 TIMMSR
3 JOHNSP
4 NIXONL
5 JOHNSP
6 BROWNK
7 SMITHJ

How do you identify if a column (e.g. operator IDs) has duplicate values? To start with, you can sort the data in that column. This can help you see patterns of repeating data. However to be accurate, you should use COUNTIF function:

=COUNTIF(range, criteria)

The range is your data column and the criteria is the cell. the COUNTIF function returns a count of the number of matches, so if you have more than one in a column (e.g. A:A) then you have a duplicate.

So in the above example, you would create a second column with formulas as shown (duplicated column). Note that I'm showing the formula and the result, but all you would see in Excel is the result in the duplicated column.

A B C
1 OPRID FORMULA DUPLICATED
2 TIMMSR =COUNTIF(A:A, A1) > 1 FALSE
3 JOHNSP =COUNTIF(A:A, A2) > 1 TRUE
4 NIXONL =COUNTIF(A:A, A3) > 1 FALSE
5 JOHNSP =COUNTIF(A:A, A4) > 1 TRUE
6 BROWNK =COUNTIF(A:A, A5) > 1 FALSE
7 SMITHJ =COUNTIF(A:A, A6) > 1 FALSE

Notice that the two JOHNSP rows return true, indicating that they are duplicated. Obviously, in this small example you could just spot the duplicated rows but try doing that with 5000 rows. A filter on the duplicated column for true values would show you just the duplicated rows.

Note that in Microsoft Excel 2007 there is a Remove Duplicates button on the Data ribbon. This is handy if you just want to clear out duplicates, but not so useful if you want to identify the duplicated rows.

Finding Matches against a Lookup Table

Say you have the following list of operator IDs and roles along with a lookup table of roles you want to find in amongst the list:

A B
1 OPRID ROLE
2 TIMMSR PeopleSoft Administrator
3 TIMMSR PeopleSoft User
4 TIMMSR PeopleTools
5 TIMMSR Standard Non-Page Permissions
6 JOHNSP Employee
7 JOHNSP PeopleSoft User
8 NIXONL Employee
9 NIXONL Standard Non-Page Permissions
10 NIXONL PeopleSoft User
11 BROWNK PeopleSoft User
12 BROWNK PeopleTools
13 BROWNK Standard Non-Page Permissions
14 SMITHJ PeopleSoft Administrator
15 SMITHJ PeopleSoft User
16
17 LOOKUP
18 Employee
19 PeopleTools

How would you find which of the operator IDs have the roles listed in the lookup table at the bottom (either Employee or PeopleTools)?

In this case, use the MATCH function. Note you can also use the VLOOKUP function, but I find the MATCH function easier.

The syntax of the MATCH function is:

=MATCH(lookup_value, lookup_array, [match_type])

The parameters you need to specify are the lookup value (the cell), your lookup array (lookup table) and the match type. Match types can be 0 (exact match), -1 (less than), and 1 (greater than).

MATCH returns the row number that matched in the lookup table or #N/A if there is no match. Typically you want to return a boolean result like true or false or yes or no, so you need to encapsulate MATCH with the ISNA function which returns true when the result is #N/A and false when it isn't and then wrap that in an IF statement to return your desired result (e.g. NO_MATCH or MATCH):

=IF(ISNA(MATCH(lookup_value, lookup_array, [match_type])), "NO_MATCH", "MATCH")

So for the above example, you would create a MATCH column and use the formulas as shown:

A B C D
1 OPRID ROLE FORMULA MATCH
2 TIMMSR PeopleSoft Administrator =IF(ISNA(MATCH(B2, $A$18:$A$19, 0)), "NO", "YES") NO
3 TIMMSR PeopleSoft User =IF(ISNA(MATCH(B3, $A$18:$A$19, 0)), "NO", "YES") NO
4 TIMMSR PeopleTools =IF(ISNA(MATCH(B4, $A$18:$A$19, 0)), "NO", "YES") YES
5 TIMMSR Standard Non-Page Permissions =IF(ISNA(MATCH(B5, $A18:$A$19, 0)), "NO", "YES") NO
6 JOHNSP Employee =IF(ISNA(MATCH(B6, $A$18:$A$19, 0)), "NO", "YES") YES
7 JOHNSP PeopleSoft User =IF(ISNA(MATCH(B7, $A$18:$A$19, 0)), "NO", "YES") NO
8 NIXONL Employee =IF(ISNA(MATCH(B8, $A$18:$A$19, 0)), "NO", "YES") YES
9 NIXONL Standard Non-Page Permissions =IF(ISNA(MATCH(B9, $A$18:$A$19, 0)), "NO", "YES") NO
10 NIXONL PeopleSoft User =IF(ISNA(MATCH(B10, $A$18:$A$19, 0)), "NO", "YES") NO
11 BROWNK PeopleSoft User =IF(ISNA(MATCH(B11, $A$18:$A$19, 0)), "NO", "YES") NO
12 BROWNK PeopleTools =IF(ISNA(MATCH(B12, $A$18:A$19, 0)), "NO", "YES") YES
13 BROWNK Standard Non-Page Permissions =IF(ISNA(MATCH(B13, $A$18:$A$19, 0)), "NO", "YES") NO
14 SMITHJ PeopleSoft Administrator =IF(ISNA(MATCH(B14, $A$18:$A$19, 0)), "NO", "YES") NO
15 SMITHJ PeopleSoft User =IF(ISNA(MATCH(B15, $A$18:$A$19, 0)), "NO", "YES") NO
16
17 LOOKUP
18 Employee
19 PeopleTools

Note the dollar signs ($) around the lookup array ($A$18:$A$19) in the MATCH function. This is important as it prevents the values from auto-adjusting (incrementing) as you perform an auto-fill for all rows in the column. That way the location of the lookup table stays static (cells A18 and A19) as your formulas are adjusted by the auto-fill.

Comments

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License