How to Analyze a Statistical Survey: Standard Deviation, Outliers, and More

Updated on May 6, 2012

It's Analyzing Time!

Now that you have your data, it's time to put it to use. There are quite literally hundreds of ways to analyze data in order to interpret it, which is why statistics can sometimes be fickle. For instance, I could say that the average weight of a baby is 12 pounds. Based on this number, anyone having a baby would expect it to weigh approximately this much. However, the mean alone says nothing about how spread out the weights are. If the standard deviation (roughly, the typical distance of the values from the mean) is large, it's possible that no baby actually weighs anywhere close to 12 pounds. After all, the average of 1 and 23 is also 12. So here's how you can figure it all out!

X values: 12, 23, 12, 14, 21, 23, 1, 1, 5, 100
Total of all x values = 212

Finding the Arithmetic Mean

The mean is the average value. You probably learned this in grade school, but I'll give a short refresher just in case you've forgotten. To find the mean, add together all the values and then divide by the total number of values. Here's an example using the x values above:

If you count the values, you'll find there are ten. Divide the sum of all x values, 212, by 10 and you'll have your mean!

212/10=21.2

21.2 is the mean of this number set.
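If you'd like to check this arithmetic with a computer, here is a minimal Python sketch of the same add-then-divide recipe. The function name `arithmetic_mean` is just an illustrative choice; the standard library's `statistics.mean` should agree with it.

```python
import statistics

# The ten x values from the example.
values = [12, 23, 12, 14, 21, 23, 1, 1, 5, 100]

def arithmetic_mean(xs):
    """Add all the values together, then divide by how many there are."""
    return sum(xs) / len(xs)

print(arithmetic_mean(values))  # 21.2
print(statistics.mean(values))  # 21.2 (standard library, for comparison)
```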

The mean is sometimes a good representation of the data. As the baby-weight example showed, however, it can also be a very poor one. To judge whether it's a good representation or not, standard deviation can be used.

Standard Deviation

Standard deviation measures how far, on average, the numbers lie from the mean. (Strictly speaking, it is the square root of the average squared distance, as the steps below show.) In other words, if the standard deviation is large, the mean might not represent the data very well. Whether a standard deviation counts as large is in the eyes of the beholder: a standard deviation of one could be considered large, while one in the millions could still be considered small. It all depends on what's being measured. For instance, when judging the reliability of radiometric dating of very old rocks, the standard deviation might be millions of years, but the ages themselves may be on a scale of billions of years, so being off by a few million isn't such a big deal. On the other hand, if I'm measuring the size of the average television screen and the standard deviation is 32 inches, the mean clearly doesn't represent the data well, because screen sizes don't span a very wide range.

x | x - 21.2 | (x - 21.2)^2
12 | -9.2 | 84.64
23 | 1.8 | 3.24
12 | -9.2 | 84.64
14 | -7.2 | 51.84
21 | -0.2 | 0.04
23 | 1.8 | 3.24
1 | -20.2 | 408.04
1 | -20.2 | 408.04
5 | -16.2 | 262.44
100 | 78.8 | 6209.44
Sum of (x - 21.2)^2 = 7515.6

Finding Standard Deviation and Variance

The first step to finding the standard deviation is to find the difference between the mean and each value of x. This is shown in the second column of the table above. It does not matter whether you subtract the value from the mean or the mean from the value.

That's because the next step is to square all of these differences. To square a number simply means to multiply it by itself. Squaring makes every negative term positive, because a negative times a negative results in a positive. The squared terms are shown in the third column. At the end of this step, add all the squared terms together.
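The two steps above (subtract the mean, then square) can be sketched in Python like this. It rebuilds the three columns of the table and adds up the squared terms; the variable names are my own choice, not anything standard.

```python
values = [12, 23, 12, 14, 21, 23, 1, 1, 5, 100]
mean = sum(values) / len(values)  # 21.2

rows = []
for x in values:
    deviation = x - mean  # second column: x - 21.2 (the sign won't matter after squaring)
    rows.append((x, deviation, deviation ** 2))  # third column: the squared deviation

print(f"{'x':>4} {'x - mean':>9} {'squared':>9}")
for x, dev, sq in rows:
    print(f"{x:>4} {dev:>9.1f} {sq:>9.2f}")

sum_of_squares = sum(sq for _, _, sq in rows)
print(f"Sum of squared deviations: {sum_of_squares:.1f}")  # 7515.6
```

Computers store decimals like 21.2 approximately, so the raw sum may differ from 7515.6 in the last few decimal places; rounding for display hides that.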

Divide the sum of the squared terms by the total number of values (in this case, ten). The result is called the variance. The variance is sometimes used in higher-level statistical analyses that are far beyond what this lesson covers, so for now you only need it as a step toward the standard deviation, unless you plan to explore higher levels of statistics.

Variance = 7515.6/10 = 751.56

The standard deviation is the square root of the variance. The square root of a number is simply the value that, when multiplied by itself, gives that number.

Standard deviation = √751.56 ≈ 27.4146
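Here is a short Python sketch of the variance and standard deviation steps, assuming (as this article does) that you divide by the number of values, n. Python's `statistics.pvariance` and `statistics.pstdev` use that same divide-by-n rule. Be aware that many textbooks, and Python's `statistics.variance` and `statistics.stdev`, divide by n - 1 instead when the data are a sample from a larger population, which gives a slightly larger result.

```python
import math
import statistics

values = [12, 23, 12, 14, 21, 23, 1, 1, 5, 100]
mean = sum(values) / len(values)

# Variance as described above: the sum of squared deviations divided by n.
variance = sum((x - mean) ** 2 for x in values) / len(values)
std_dev = math.sqrt(variance)

print(round(variance, 2))  # 751.56
print(round(std_dev, 4))   # 27.4146

# The standard library's divide-by-n versions, for comparison.
print(statistics.pvariance(values), statistics.pstdev(values))
```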

Outliers

An outlier is a number that is an oddball compared to the rest of the number set: its value is nowhere near any of the other numbers. Outliers often pose big problems in statistics. In the sample problem, for instance, the value 100 causes a significant issue. It makes the standard deviation much larger than it would be without that value, which suggests that it may also be pulling the mean away from most of the data.
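To see how much one oddball can matter, this sketch compares the standard deviation of the sample data with and without the value 100. It uses `statistics.pstdev`, which divides by n like the calculation above.

```python
import statistics

values = [12, 23, 12, 14, 21, 23, 1, 1, 5, 100]
without_outlier = [x for x in values if x != 100]  # drop the single value 100

print(round(statistics.pstdev(values), 2))           # 27.41
print(round(statistics.pstdev(without_outlier), 2))  # 8.27
```

Removing a single value cuts the standard deviation to less than a third of its original size.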

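Sorted x values (n is each value's position in the ordered list)
n | x
1 | 1
2 | 1
3 | 5
4 | 12
5 | 12
6 | 14
7 | 21
8 | 23
9 | 23
10 | 100

Lower and upper halves of the sorted data
n | Lower half (its median is the 1st quartile) | Upper half (its median is the 3rd quartile)
1 | 1 | 14
2 | 1 | 21
3 | 5 | 23
4 | 12 | 23
5 | 12 | 100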

How To Identify Outliers

So how do we know whether a number is technically an outlier? The first step is to put all x values in order, as in the sorted table above.

Next, find the median, or middle number. Count the x values and divide by 2, then count that many values in from both ends of the ordered data set. If there is an odd number of values, round up; both counts will then land on the same number, which is the median. If there is an even number of values, as in this example, you'll land on a different value from each end, and the mean of these two values is the median. Here the two middle values are 12 and 14; the n column simply numbers the ordered values. In this example:

10/2 = 5

The 5th value from the top is 12.

The 5th value from the bottom is 14.

(12 + 14)/2 = 26/2 = 13, so the median is 13.
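The counting method for the median can be sketched in Python as follows. The helper name `median` is hypothetical; for comparison, the standard library's `statistics.median` should give the same answer.

```python
import statistics

def median(xs):
    """Median by counting in from both ends of the sorted data."""
    ordered = sorted(xs)
    n = len(ordered)
    steps = (n + 1) // 2               # n/2, rounded up when n is odd
    from_top = ordered[steps - 1]      # the value `steps` places from the top
    from_bottom = ordered[n - steps]   # the value `steps` places from the bottom
    # Odd n: both counts land on the same value. Even n: this averages the two.
    return (from_top + from_bottom) / 2

values = [12, 23, 12, 14, 21, 23, 1, 1, 5, 100]
print(median(values))             # 13.0
print(statistics.median(values))  # 13.0
```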

Now that the median has been found, the 1st and 3rd quartiles can be found. Cut the data set in half at the median, then find the median of each half. The median of the lower half (1, 1, 5, 12, 12) is the 1st quartile, 5, and the median of the upper half (14, 21, 23, 23, 100) is the 3rd quartile, 23.
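Here's one way to sketch the "median of each half" approach in Python. The function name `quartiles_by_halves` is hypothetical, and how to split an odd-length data set is an assumption on my part (this sketch leaves the middle value out of both halves). Library routines such as `statistics.quantiles` use interpolation rules instead, so they can report somewhat different quartiles for the same data.

```python
def median(xs):
    """Middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles_by_halves(xs):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: for an odd count, the middle value goes in neither half.
    """
    ordered = sorted(xs)
    n = len(ordered)
    lower = ordered[: n // 2]
    upper = ordered[(n + 1) // 2 :]
    return median(lower), median(upper)

values = [12, 23, 12, 14, 21, 23, 1, 1, 5, 100]
q1, q3 = quartiles_by_halves(values)
print(q1, q3)  # 5 23
```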

Now it's time to determine whether there are outliers. First, subtract the 1st quartile from the 3rd. The result is called the interquartile range (IQR), and it measures the spread of the middle fifty percent of the data.

23 - 5 = 18

Now this number must be multiplied by 1.5. Why 1.5, you might ask? It's simply the multiplier that statisticians have agreed on by convention (these cutoffs are often called Tukey's fences, after the statistician John Tukey). The resulting number is used to find mild outliers. To find extreme outliers, multiply 18 by 3 instead. Either way, the values are listed below.

18 x 1.5 = 27

18 x 3 = 54

Subtract each of these numbers from the 1st quartile and add it to the 3rd quartile. The two resulting numbers give the range of acceptable values, that is, values that are not outliers.

5 - 27 = -22

23 + 27 = 50

Acceptable range = -22 to 50

Since 100 is larger than 50, it is at least a mild outlier.

5 - 54 = -49

23 + 54 = 77

Acceptable range = -49 to 77

Since 100 is larger than 77, it is considered to be an extreme outlier.
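The whole outlier check (the IQR, the 1.5 and 3 multipliers, and the resulting ranges) fits in a few lines of Python. This is a sketch: the names `fences` and `classify` are hypothetical, and the quartiles 5 and 23 are simply the values found above.

```python
def fences(q1, q3, k):
    """Return the (low, high) cutoffs k * IQR below Q1 and above Q3."""
    iqr = q3 - q1  # interquartile range
    return q1 - k * iqr, q3 + k * iqr

def classify(x, q1, q3):
    """Label x using the 1.5 (mild) and 3 (extreme) multipliers."""
    mild_low, mild_high = fences(q1, q3, 1.5)
    extreme_low, extreme_high = fences(q1, q3, 3)
    if x < extreme_low or x > extreme_high:
        return "extreme outlier"
    if x < mild_low or x > mild_high:
        return "mild outlier"
    return "not an outlier"

q1, q3 = 5, 23  # the 1st and 3rd quartiles found above
print(fences(q1, q3, 1.5))  # (-22.0, 50.0)
print(fences(q1, q3, 3))    # (-49, 77)
for x in [1, 1, 5, 12, 12, 14, 21, 23, 23, 100]:
    print(x, classify(x, q1, q3))  # only 100 is flagged, as an extreme outlier
```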


Trimmed data set (10% cut from each end)
x: 1, 5, 12, 12, 14, 21, 23, 23
The sum is 111.

What Can Be Done About Outliers?

One way to deal with outliers is to not use the mean at all. Instead, the median can be used to represent a data set. Another option is to use what's known as a trimmed mean.

A trimmed mean is the mean found after cutting an equal portion of values off both ends of a data set. A 10% trimmed mean, for example, cuts 10% of the values off each end. I'll use a 10% trimmed mean for the sample data set; with ten values, that means removing one value from each end (one of the 1s and the 100). The new mean is:

111/8 = trimmed mean = 13.875
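A trimmed mean is easy to sketch in Python. One assumption to flag: when the percentage doesn't divide the data evenly, this sketch rounds the number of values cut from each end down. With ten values and a 10% trim, that removes exactly one value from each end, as in the example. The names `trim` and `trimmed_mean` are hypothetical.

```python
import math

def trim(xs, proportion):
    """Sort, then cut the same number of values off each end.

    Assumption: the count cut from each end is proportion * n, rounded down.
    """
    ordered = sorted(xs)
    k = math.floor(proportion * len(ordered))
    return ordered[k:len(ordered) - k]

def trimmed_mean(xs, proportion):
    """Ordinary mean of the values left after trimming."""
    kept = trim(xs, proportion)
    return sum(kept) / len(kept)

values = [12, 23, 12, 14, 21, 23, 1, 1, 5, 100]
print(trim(values, 0.10))          # [1, 5, 12, 12, 14, 21, 23, 23]
print(trimmed_mean(values, 0.10))  # 13.875
```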

The standard deviation of the trimmed data set, found the same way as before (using the trimmed mean of 13.875 and dividing by 8), is:

468.875/8 = variance = 58.609375

√58.609375 = standard deviation ≈ 7.6557
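The standard deviation of the trimmed data can be checked with the same recipe as before: subtract the trimmed mean, square, add up, divide by n, and take the square root. A minimal Python sketch, with the trimmed values typed in by hand:

```python
import math

trimmed = [1, 5, 12, 12, 14, 21, 23, 23]  # data left after the 10% trim
t_mean = sum(trimmed) / len(trimmed)       # 13.875

sum_of_squares = sum((x - t_mean) ** 2 for x in trimmed)
variance = sum_of_squares / len(trimmed)   # divide by n, as before
std_dev = math.sqrt(variance)

print(sum_of_squares)      # 468.875
print(round(variance, 2))  # 58.61
print(round(std_dev, 4))   # 7.6557
```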

The standard deviation of the trimmed data set is far smaller than the 27.4146 found for the full data set. Anyone working with this number set might want to consider using the trimmed mean or the median instead of the ordinary mean.

Conclusion

Now you have some basic tools to evaluate data. If you want to know more about statistics, you might as well take a class. Notice how much the ordinary mean (21.2) differs from the median (13) and the trimmed mean (13.875). This is how statistics can be fickle: if you want to get a point across, choosing the ordinary mean could be your ticket to bending statistics to your will. As I always do when speaking of statistics, I'll borrow a line from Spider-Man: "With great power comes great responsibility."
