How to Clean Data in Excel

Many HR professionals rely on Excel spreadsheets to perform HR processes or track employee data. It can take a great deal of effort to make sure that your spreadsheet is up to date and free from errors or “dirty data”. For this reason, we’ve put together a guide on how to clean up your HR data in an Excel sheet and drop any unnecessary data.

Data is the foundation of any Excel spreadsheet. Numerous things can go wrong when you acquire a dataset from someone else or download it from another system and import it into Excel. Before doing any analysis, make sure there are no blank rows, inconsistent cases, spelling problems, or duplicates. If you analyze data that has not been cleaned, you will get inconsistent and erroneous findings. Excel includes numerous tools, functions, and utilities that can help you clean your data. In this guide, we’ll look at eight of them so that you can use your Microsoft Excel spreadsheets to their full potential.

First… why is clean HR data important?

Aggregating data from many disparate sources and making it compatible can take weeks, if not months. This is particularly true for multinational corporations or organizations that grow through mergers and acquisitions. These organizations usually record the same data using various systems in different countries or across various entities.

The issue with data is that it is easily contaminated. Data becomes unreliable as soon as data gathering processes change even slightly without proper standardization and cleansing. Cleaning your data saves the time and effort you would otherwise spend trying to make sense of reports that are polluted with inconsistent or inaccurate data.

Cleaning data is beneficial for people analytics. It also aids in standard HR reporting by ensuring that accurate data is being provided each time.

#1 Remove Duplicate Records

There are two ways to deal with duplicate data in Excel: highlight it or delete it.

Highlight Duplicate Data

Step 1: Select the data and go to Home > Conditional Formatting > Highlight Cells Rules > Duplicate Values to highlight duplicate values.

Step 2: When you apply the formatting, all duplicate values will be marked for you to review. Be careful to delete only records that are true duplicates, not records that merely share some values, such as two employees who have the same last name.

Delete Duplicate Data

Step 1: Select the data and go to Data > Remove Duplicates.

Step 2: If your data includes headers, make sure the box ‘My list has headers’ is selected.

Step 3: Select all the columns where no duplicate data should be possible, then click OK. For example, if you have a spreadsheet for timekeeping purposes, an employee’s first and last name may appear on multiple rows, but not on the same date. In that case, select the columns where ‘First Name’, ‘Last Name’ and ‘Date’ appear.

This removes the duplicate rows, keeping the first occurrence of each and leaving the remaining rows in their original order. If you wish to keep the original list, copy and paste the data somewhere else first.
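For readers who want to double-check the result outside Excel, the same logic can be sketched in a few lines of Python. This is only an illustration, not how Excel implements the feature: the function name `remove_duplicates` and the sample timesheet rows are made up for the example.

```python
def remove_duplicates(rows, key_columns):
    """Keep the first row for each combination of values in key_columns."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Hypothetical timekeeping data: the third row repeats the first one.
timesheet = [
    {"First Name": "Ana", "Last Name": "Silva", "Date": "2024-03-01", "Hours": 8},
    {"First Name": "Ana", "Last Name": "Silva", "Date": "2024-03-02", "Hours": 7},
    {"First Name": "Ana", "Last Name": "Silva", "Date": "2024-03-01", "Hours": 8},
    {"First Name": "Ben", "Last Name": "Silva", "Date": "2024-03-01", "Hours": 6},
]

cleaned = remove_duplicates(timesheet, ["First Name", "Last Name", "Date"])
```

Because the key combines first name, last name and date, Ana's two different working days both survive, and Ben is not mistaken for a duplicate just because he shares a last name.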

#2 Update Missing Values

Using the ‘Go to’ option in Excel is a simple way to fill in missing values.

Step 1: Press CTRL+G to open the ‘Go To’ dialog box and click ‘Special’. Alternatively, navigate to Edit > Find > Go to… and select the ‘Go to Special’ option.

Step 2: Select ‘Blanks’ and then click OK.

Step 3: On the keyboard, press F2, or click the formula bar.

Now you can fill in the blanks with whatever value you need. Type the value into the active cell (one of the selected blank cells), then press CTRL + Enter to fill all the selected blank cells with the same value. If the blanks need different values, highlight them with a fill color instead so that the missing data is easy to see.
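As a rough illustration of what the CTRL + Enter step does, here is a minimal Python sketch. It assumes a blank cell is represented by `None` or an empty string; the `fill_blanks` helper and the sample department list are hypothetical.

```python
def fill_blanks(values, fill_value):
    """Replace empty cells (None or "") with fill_value; keep everything else."""
    return [fill_value if v is None or v == "" else v for v in values]


# Hypothetical 'Department' column with two missing entries.
departments = ["Finance", "", "HR", None]
filled = fill_blanks(departments, "Unknown")
```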

#3 Change Text to Proper Case

When inheriting a worksheet or importing data from text files, sometimes names and titles are inconsistent. It’s possible that all the text is in lower/upper case, or that it’s a mix of both. Using the three functions below, you can easily make everything consistent.

First insert a new column, then use the following functions:

  • =LOWER(text) – to make all text in the cell lower case
  • =UPPER(text) – to make all text in the cell UPPER CASE
  • =PROPER(text) – to make all text in the cell Proper Case

Copy (CTRL + C) the new column and select Paste Special > Values in the same column before removing the column containing the old values.
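To preview what the three functions do, the sketch below uses Python's built-in string methods on some made-up names. Note that `str.title()` only approximates PROPER: both capitalize a letter that follows a non-letter character, but edge cases may differ.

```python
# Hypothetical names with inconsistent casing.
names = ["jOHN smith", "MARIA DE LA CRUZ", "anne-marie o'neil"]

lower = [n.lower() for n in names]   # comparable to =LOWER(text)
upper = [n.upper() for n in names]   # comparable to =UPPER(text)
proper = [n.title() for n in names]  # comparable to =PROPER(text)
```

As with PROPER, the letter after a hyphen or apostrophe is capitalized, so review names such as "De La Cruz" by hand if your organization writes them differently.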

#4 Remove Extra Spaces

It’s tough to notice extra spaces in a cell. While extra spaces between words or numbers are sometimes visible, trailing spaces are not. Using the TRIM function is a great way to get rid of these excess spaces.

Create a new column and use the formula:
=TRIM(text)

The TRIM function takes a cell reference (or text) as its input. It removes the leading and trailing spaces, and reduces any run of spaces between words to a single space.

Again, copy (CTRL + C) the new column and select Paste Special > Values in the same column before removing the column containing the old values.
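The behavior described above can be sketched in Python as follows. The helper name `excel_trim` is made up, and the sketch assumes that only the ordinary space character is affected, so other whitespace such as non-breaking spaces is left in place.

```python
def excel_trim(text):
    """Drop leading/trailing spaces and collapse inner runs of spaces to one."""
    # Splitting on a single space yields empty strings for every extra space;
    # discarding those and re-joining leaves exactly one space between words.
    return " ".join(part for part in text.split(" ") if part)


# Hypothetical cell values with extra spaces.
raw_names = ["  Jane   Doe ", "Ben Okafor", "Ana  Silva   "]
trimmed = [excel_trim(name) for name in raw_names]
```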

#5 Parse Data Using Text to Columns

It’s possible that when you obtain data from a database or import it from a text file, all of the text will be crammed into one cell.

Using Excel’s Text to Columns feature, you can split this text into multiple cells. Select the cells you want to split, go to Data > Text to Columns, and then follow these steps:

Step 1: Select the type of data in your sheet (select delimited if your data is separated by characters such as comma, hyphen, dot, etc.). Click Next.

Step 2: Select the character that separates your data (i.e. your delimiter). Using the Other option, you can type in another character that is not already listed.

Step 3: Choose your data format and select the destination cell. If you do not select a destination cell, the original data is overwritten.
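Conceptually, a delimited split works like the short Python sketch below. The `text_to_columns` helper and the sample values are hypothetical, and the sketch ignores quoted values that contain the delimiter themselves.

```python
def text_to_columns(cells, delimiter=","):
    """Split each cell's text on the delimiter, giving one list per row."""
    return [cell.split(delimiter) for cell in cells]


# Hypothetical import where each row arrived crammed into a single cell.
raw = ["Silva,Ana,Finance", "Okafor,Ben,HR"]
rows = text_to_columns(raw)
```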

#6 Define and Fix Invalid Data

Step 1: Select one or more cells to validate, then click the Data Validation button on the Data tab.

Step 2: Make a validation rule in Excel. Define the validation criteria on the Settings tab according to your requirements. You can include any of the following in the criteria:

  • Values – in the criteria box, choose the type of data that should appear in the cell. Make a rule based on a value or formula in another cell using cell references. For example, an ‘Age’ field should only be a ‘Whole Number’ between 15 and 99.
  • Formulas – Use the Custom option for more advanced validation rules.

After configuring the validation rule, click OK to close the Data Validation window, or switch to another tab in the dialog to add an input message and/or an error alert.

This data cleansing method can be used to check for outliers, such as an age that falls outside of a normal age range or a hire date that is prior to the foundation of the company.
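The two example rules can be written out as a small Python sketch. Everything here is illustrative: the founding date, the helper names and the employee records are assumptions made for the example.

```python
from datetime import date

COMPANY_FOUNDED = date(2005, 1, 1)  # hypothetical founding date


def is_valid_age(value):
    """Whole number between 15 and 99, as in the 'Age' rule above."""
    return isinstance(value, int) and not isinstance(value, bool) and 15 <= value <= 99


def is_valid_hire_date(value):
    """A hire date cannot be earlier than the company's founding date."""
    return isinstance(value, date) and value >= COMPANY_FOUNDED


# Hypothetical employee records; three of them break a rule.
records = [
    {"Employee": "Ana", "Age": 34, "Hire Date": date(2015, 6, 1)},
    {"Employee": "Ben", "Age": 150, "Hire Date": date(2018, 2, 12)},
    {"Employee": "Chloe", "Age": 41, "Hire Date": date(1999, 9, 30)},
    {"Employee": "Dev", "Age": 27.5, "Hire Date": date(2020, 1, 6)},
]

invalid = [
    r["Employee"]
    for r in records
    if not (is_valid_age(r["Age"]) and is_valid_hire_date(r["Hire Date"]))
]
```

Ben fails the age range, Chloe's hire date predates the assumed founding date, and Dev's age is not a whole number.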

#7 Fix Date and Time Formats

Dates and times must be converted and standardized because there are so many distinct date formats, which can cause confusion when the same date is reported in different ways. Use these functions to standardize date formats:

  • =DATE() – Returns the serial number that corresponds to a specific date. The result is formatted as a date if the cell format was General before the function was entered.
  • =DATEVALUE() – Converts a text-based date into a serial number.
  • =TIME() – Returns the decimal number for a specific time. The result is formatted as a date if the cell format was General before the function was entered.
  • =TIMEVALUE() – Converts a time stored as a text string into a decimal number. The decimal number represents the times from 0:00:00 (12:00:00 AM) to 23:59:59 (11:59:59 P.M.).
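To make "serial number" and "decimal number" concrete, here is a minimal Python sketch of the arithmetic. It assumes Excel's default 1900 date system and is only valid for dates from March 1, 1900 onward. The function names and the fixed text formats are assumptions for the example; Excel itself interprets text dates according to your regional settings.

```python
from datetime import date, datetime

# Day zero that reproduces Excel's serial numbers in the 1900 date system
# (valid for dates from 1900-03-01 onward).
EXCEL_EPOCH = date(1899, 12, 30)


def date_serial(year, month, day):
    """Comparable to =DATE(year, month, day): days since the epoch."""
    return (date(year, month, day) - EXCEL_EPOCH).days


def datevalue(text, fmt="%Y-%m-%d"):
    """Comparable to =DATEVALUE(text), for text in the assumed format."""
    return (datetime.strptime(text, fmt).date() - EXCEL_EPOCH).days


def time_fraction(hour, minute, second):
    """Comparable to =TIME(hour, minute, second): fraction of a 24-hour day."""
    return (hour * 3600 + minute * 60 + second) / 86400


def timevalue(text, fmt="%H:%M:%S"):
    """Comparable to =TIMEVALUE(text), for text in the assumed format."""
    t = datetime.strptime(text, fmt).time()
    return time_fraction(t.hour, t.minute, t.second)


hire_serial = datevalue("2024-01-01")  # 45292
noon = time_fraction(12, 0, 0)         # 0.5
```

Once every date is a serial number and every time is a fraction of a day, the values sort and compare consistently regardless of how they are displayed.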

#8 Add Data Using VLOOKUP

If you have an export or external report from another data source that you would like to add to your existing spreadsheet, you can use the VLOOKUP function. For example, if you get this year’s performance scores for a list of employees and would like to add a new column of data to your spreadsheet, you can use this function to do it in an efficient manner.

In order to construct the VLOOKUP syntax, you’ll need four pieces of information:

  • The value you’re looking for – also known as the lookup value.
  • The range in which the lookup value can be found. For VLOOKUP to perform properly, the lookup value must always be in the first column of the range. If your lookup value is in cell C2, for example, your range should begin with C.
  • The number of the column in the range containing the return value. For example, if the range is B2:D11, count B as the first column, C as the second, and so on.
  • Optionally, TRUE for an approximate match or FALSE for an exact match of the lookup value. If you don’t specify anything, the default is TRUE (approximate match).

Putting all of the above together, we get: =VLOOKUP(lookup value, range containing the lookup value, the column number in the range containing the return value, Approximate match (TRUE) or Exact match (FALSE)).
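The exact-match case (FALSE) can be sketched in Python as shown below. The `vlookup_exact` helper, the employee IDs and the scores are all made up for illustration, and the sketch returns `None` where Excel would show #N/A.

```python
def vlookup_exact(lookup_value, table, col_index):
    """Find lookup_value in the first column of table and return the value
    from the col_index-th column (counting from 1) of the first matching row."""
    for row in table:
        if row[0] == lookup_value:
            return row[col_index - 1]
    return None  # no match; Excel would show #N/A here


# Hypothetical export of this year's performance scores: ID, name, score.
scores = [
    ("E001", "Ana Silva", 4.5),
    ("E002", "Ben Okafor", 3.8),
]

# Existing spreadsheet rows that need a new 'Score' column.
employees = [{"ID": "E002"}, {"ID": "E001"}, {"ID": "E003"}]
for employee in employees:
    employee["Score"] = vlookup_exact(employee["ID"], scores, 3)
```

The lookup value (the employee ID) sits in the first column of the table, and the score is returned from the third column, mirroring the four pieces of information listed above.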

Conclusion

In this article, we covered eight ways to clean your data in Excel and discussed how to eliminate several forms of data inaccuracies. Numerous things can go wrong with data, including structure, placement, formatting, unnecessary spaces, and so on.

But the best way to ensure that your HR data is accurate and consistent is with the help of technology. It may seem like an obvious (and expensive) answer. But it doesn’t have to be that way. The good news is that there are alternatives.