Monday, November 25, 2019

Steps to Preparing Data: Keep It Clean!

Collecting pounds of data is useless unless we can do something with it that leads to new knowledge and information. You may start with a mound of seemingly useless numbers, samples, and information, and it can feel a little overwhelming. You can reduce data anxiety by thinking about your study beforehand and following a few steps to prepare the data for analysis and use.

Pre-planning is important. Developing your coding process, organization methods, and statistical measurements beforehand will lead to a better study. That doesn't mean the processes are set in stone, only that a better data plan improves the end results.

Scrub out useless data that isn't going to help your study. Be aware that what you find useless at first may still contain useful information. For example, if a lot of people abandon your survey, it may be the language, the design, or even the type of questions that pushes them to leave.
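One way to look at abandoned surveys before throwing them away is to count where people stopped. The sketch below is only an illustration: the list of question numbers is made-up data, and the variable names are my own assumptions, not part of any survey tool.

```python
from collections import Counter

# Hypothetical data: the last question each abandoning respondent answered.
last_answered = [3, 3, 7, 3, 1, 7, 3]

# Tally how many people stopped after each question.
drop_off = Counter(last_answered)

# The question with the most drop-offs is the first one worth rereading.
worst_question, count = drop_off.most_common(1)[0]
print(f"Most people stopped after question {worst_question} ({count} respondents)")
```

If one question stands out, that is a hint to check its wording or format before deciding the abandoned responses are useless.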

Take out the data that is truly not helpful to your study because of inaccuracy or human error. Review each removed data point and keep a record of what you did. I save multiple versions before and after the scrubbing.
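Here is a minimal sketch of scrubbing while keeping a record of what was removed and why. The age rule, the field names, and the sample rows are all hypothetical; your own validity checks will depend on your study.

```python
from typing import Callable, Optional

def check_age(row: dict) -> Optional[str]:
    """Return a reason to remove the row, or None to keep it (hypothetical rule)."""
    age = row.get("age")
    if age is None:
        return "missing age"
    if not 0 <= age <= 120:
        return "age out of range"
    return None

def scrub(rows: list, check: Callable[[dict], Optional[str]]):
    """Split rows into kept rows and a log of removed rows with reasons."""
    kept, removed = [], []
    for index, row in enumerate(rows):
        reason = check(row)
        if reason is None:
            kept.append(row)
        else:
            # Keep the original row and its position so the removal can be reviewed.
            removed.append({"index": index, "reason": reason, "row": row})
    return kept, removed

# Made-up raw data; the original list is left untouched as the "before" version.
raw = [
    {"id": 1, "age": 34},
    {"id": 2, "age": 340},
    {"id": 3, "age": None},
]
clean, removal_log = scrub(raw, check_age)
```

Because `raw` is never modified, you still have the before version, `clean` is the after version, and `removal_log` is the record you can review later.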

Then I begin to categorize the information about the variables. Sometimes I need to code the data so it is more useful. That occurs when you need a specific number or letter to designate where the data came from. Depending on how I want to categorize, I will use any number of coding methods.
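As a simple illustration of coding, the sketch below assigns a number to each data source and keeps a reverse lookup so the codes can always be turned back into labels. The source labels and the function name are assumptions for the example, not a standard scheme.

```python
def build_codebook(values):
    """Assign integer codes to the distinct labels, in alphabetical order."""
    return {label: code for code, label in enumerate(sorted(set(values)), start=1)}

# Hypothetical variable: where each response came from.
sources = ["phone", "web", "mail", "web"]

codebook = build_codebook(sources)          # label -> code
coded = [codebook[s] for s in sources]      # the coded column
decode = {code: label for label, code in codebook.items()}  # code -> label

print(codebook)
print(coded)
```

Saving the codebook next to the coded data is what lets you retrieve the original meaning later.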

There is a lot of information out there on the classification of data. I suggest you read this article from the Digital Guardian Blog: https://digitalguardian.com/blog/what-data-classification-data-classification-definition It provides some great resources.

Data classification makes data useful. Whatever encoding process you use, make sure you can still find the things you need. I have seen people put together great coding systems and then find they can't retrieve the information from their databases properly. Keep it simple in science!

Raw data is relatively useless. You have a responsibility to make it more useful by preparing it for data analytics. The better job you do at the root level, the better off you will be when you want to analyze that data for connections and meaning. One of the best things you can do is have this all written down and figured out before you get started.

