Wednesday, October 28, 2015

Lab 3: Significance Testing

Introduction
For years the question of what is “Up-North” has rattled around, with everyone giving a different answer. The Wisconsin Tourism Board wants to put an end to this debate and create an accurate map showing what “Up-North” actually means. Before starting this assignment, I did a quick Google search to see what different people think “Up-North” is.

Some results defined “Up-North” as...
-Above the 45th Parallel
-Most of Wisconsin

There were also several images portraying what "Up North" is:









Above are three separate images found with a quick Google image search: three images, three different interpretations of what "Up North" is.

Other online searches came up with these results:
Webster: Any section of land above the Mason-Dixon Line, or north of the Ohio River
Urban Dictionary: The upstate portion of New York, where most of the state's prisons are (not really relevant to this project, but still quite interesting)

Objectives
This assignment has three main objectives:
1.) Understand how to calculate Chi-Square and relate it to Hypothesis Testing
2.) Connect spatial output to your calculated Chi-Square statistic
3.) Utilize real-world data connecting stats and geography

“Where is Up-North...?”
Clearly it is impossible to Google search something this broad and hope to find one definite answer. So instead of using Google to solve the problem the Wisconsin Tourism Board has brought to us, we are going to use our geospatial and statistical backgrounds to determine where Up-North truly is.

We will be using four data sets, along with our state and county data: the number of deer hunting licenses sold per county, the number of campgrounds per county, miles of snowmobile trails per county and forest acreage per county. With these four data sets, and the State of Wisconsin split by State Highway 29 (which runs east to west across the state), we are ready to try to find where Up-North actually is.

Methods
Starting with the State of Wisconsin and its 72 counties, I brought in a shapefile of State Highway 29, found on an ArcGIS Online page, and used it to decide which counties were north of the highway and which were south of it (figure 1).

Figure 1: Wisconsin divided by Highway 29. Several counties were split by the highway; whichever side of the highway held more of a county's land determined which category it belonged to.

I found there were 28 counties north of the highway and 44 south of it. On the map, counties in blue are considered north by this divide and counties in red are considered south. Later in this lab we will use these data to calculate chi-square statistics. Chi-square will be explained further in that section; for now, the basic reason for running the test is to compare an observed outcome with the distribution we would expect in theory. Each variable will be ranked on a numerical basis, and every variable except the north/south divide will be split into four classes. The ranking goes from 1 to 4, where 1 is the most of the tested attribute and 4 is the least. The class width is calculated by taking the range (maximum value – minimum value) and dividing it by 4.
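The ranking rule above (range divided by 4, with 1 for the highest values) can be sketched as a small equal-interval classifier. This is a minimal sketch, not the ArcMap classification itself; the function name `equal_interval_rank` is hypothetical, and values that land exactly on a class boundary (for example 37 campgrounds) may fall in a different class than the hand-rounded breaks listed in the Results.

```python
def equal_interval_rank(value, min_value, max_value, n_classes=4):
    """Rank a value from 1 (highest class) to n_classes (lowest class).

    Hypothetical helper: splits the range (max - min) into equal-width classes.
    """
    if max_value == min_value:
        return n_classes
    width = (max_value - min_value) / n_classes  # the "categorical gap"
    # Class index counted upward from the minimum: 0 .. n_classes - 1.
    idx = int((value - min_value) // width)
    idx = min(idx, n_classes - 1)  # the maximum itself belongs to the top class
    return n_classes - idx  # flip so the largest values receive rank 1


# Campground figures from map 1: minimum 0, maximum 49, gap 49 / 4 = 12.25.
for count in (0, 12, 13, 30, 49):
    print(count, equal_interval_rank(count, 0, 49))
```

Flipping the index at the end keeps the lab's convention that rank 1 means "most" and rank 4 means "least".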

Using the SCORP data provided, we have to join it to our newly created North/South Wisconsin shapefile; without the join, we cannot analyze the data by location. The four features tested to determine where Up-North is are listed below, each with summary statistics and a map of its 1–4 ranking. The ranking uses a blue-to-red color scheme where dark blue = 1, light blue = 2, light red = 3 and dark red = 4. I chose this scheme because my north/south map uses dark blue for the north and dark red for the south.

Results
Number of Campgrounds per County in Wisconsin (map 1)

Max Value = 49 Campgrounds
Minimum Value = 0 Campgrounds
Range = 49
Categorical Gap = 12
                4 = 0 – 12
                3 = 13 – 24
                2 = 25 – 37
                1 = 38 – 49 

Map 1: Campgrounds in Wisconsin Counties. Dark blue received a score of 1 while dark red received a score of 4

This map shows several northern counties in Wisconsin in the blue, meaning they fall into class 1 or 2 (above). This is the first of four variables we will test to determine what is "Up North". Looking only at this map and this single variable, we might conclude that the northwestern corner of Wisconsin is the most "Up North" and that the rest of the state is not.

Geography Reasoning: Wisconsin has a heavily wooded northern region, so we can expect to see more campgrounds there than, for example, in the middle of the City of Milwaukee.

Number of Deer Hunting (Gun) Licenses Issued Per County in Wisconsin (map 2)

Max Value = 21575 Licenses
Minimum Value = 0 Licenses
Range = 21575
Categorical Gap = 5394
                4 = 0 – 5393
                3 = 5394 – 10787
                2 = 10788 – 16180
                1 = 16181 – 21575
Map 2: Deer hunting licenses purchased per county, where blue is the most and red is the least
This map can throw off our sense of what truly is up north. Many Wisconsin residents own land in the northern part of the state but live in other regions. These numbers are total licenses purchased, not licenses per capita, which gives the Milwaukee, Madison and Green Bay areas (all three appear blue on the map) an advantage because of their large populations. Since a Wisconsin license is valid for the entire state, a more accurate map would show where deer are actually shot rather than where licenses are purchased.

Geography Reasoning: As mentioned above, three of the largest cities in Wisconsin (Milwaukee, Madison and Green Bay) are all blue, because many people live in these cities and may travel elsewhere to hunt. The northern regions of Wisconsin are not home to many people, which is why most counties there are dark red and a few are light red.

Miles of Snowmobile Trails Per County in Wisconsin (map 3)

Max Value = 641 Miles
Minimum Value = 0 Miles
Range = 641
Categorical Gap = 160
                4 = 0 – 159
                3 = 160 – 319
                2 = 320 – 480
                1 = 481 – 641


Map 3: Northeastern Wisconsin seems to be the hot spot for snowmobile tracks measured in miles per county
Map 3 shows which Wisconsin counties have the most and the fewest miles of snowmobile trails. Snowmobiling is often thought of as an "Up-North" activity done through the woods or across frozen lakes. The map shows a strong concentration of trails in the northeastern region and very few in the southern region of the state.


Geography Reasoning: The northeastern portion of Wisconsin, where we see much of the dark and light blue, is made up mostly of forests and lakes. Very few cities are located there, which makes the region prime territory for these trails. That portion of Wisconsin is also home to many cottages and cabins, and snowmobiling is a popular activity at these homes away from home.
              
Acres of Forest Per County in Wisconsin (map 4)

Max Value = 963,865 Acres
Minimum Value = 8,100 Acres
Range = 955,765
Categorical Gap = 238,942
                4 = 8,100 – 247,041
                3 = 247,042 – 485,982
                2 = 485,983 – 724,923
                1 = 724,924 – 963,865

Map 4: Where are the forests in Wisconsin? Looks like they are all in the North.

This map best matches the popular idea of where the north is. Most of Northern Wisconsin is composed of dense forests, while Southern Wisconsin is mostly farmland. The map makes it look as though there are no forests in the southern portion of the state, but that is not true; the map is skewed by the large Northern Wisconsin counties that are made up mostly of forest. Nearly 1 million acres of forest are found in Bayfield County, which is about 75% of the entire county and 400 square miles more forest than the size of all of Milwaukee County.


Geography Reasoning: As mentioned above the northern part of this state is home to dense forests and multiple state and nationally recognized forested parks such as the Chequamegon National Forest and the Nicolet National Forest. The southern portion of this state is composed of large cities, urban buildup and vast farm lands. 

Where is up north really located then?

Taking the four variables plus the north/south divide, I used the Field Calculator in ArcMap to add up the scores each county received and mapped the totals. Here is one more breakdown of how points were awarded to each county:
North of Highway 29 = 1
South of Highway 29 = 2
For each of the 4 variables:
Top quarter of the range (closest to the max) = 1
Second quarter of the range = 2
Third quarter of the range = 3
Bottom quarter of the range (closest to the min) = 4
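The Field Calculator step can be sketched as a plain sum of the five scores, followed by the 7–12 versus 13–16 cutoff described under Map 5. This is a minimal sketch assuming the scores are already computed; the county names and ranks below are hypothetical, and the function names are illustrative, not ArcMap expressions.

```python
def composite_score(north_of_hwy29, ranks):
    """Sum the north/south score (1 or 2) and four variable ranks (1-4 each)."""
    if len(ranks) != 4 or any(r not in (1, 2, 3, 4) for r in ranks):
        raise ValueError("expected four ranks between 1 and 4")
    return (1 if north_of_hwy29 else 2) + sum(ranks)


def is_up_north(score, cutoff=12):
    # Totals of 12 or less count as Up North; 13 and above do not.
    return score <= cutoff


# Hypothetical counties:
# (north of Hwy 29?, [campgrounds, deer licenses, snowmobile trails, forest])
examples = {
    "County A (hypothetical)": (True, [1, 4, 1, 1]),
    "County B (hypothetical)": (False, [4, 2, 4, 4]),
}
for name, (north, ranks) in examples.items():
    score = composite_score(north, ranks)
    print(name, score, "Up North" if is_up_north(score) else "not Up North")
```

County A shows how a northern county can still score a 4 on deer licenses and remain Up North overall.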

Using this data I created a map which shows where up north actually is, based on the criteria listed above.

Map 5: Blue is considered the north, while red is considered not up north

Looking at this map, we see many blue counties in the northern portion of the state, along with a few red ones. Overall, the up north area can be estimated as the dark and light blue counties plus some of the red ones. Totals of 7–12 points were considered up north, and totals of 13–16 were considered not up north. The fact that several counties commonly believed to be up north show up light or dark red can most likely be traced back to the deer license data: since not many people live in those counties, not many deer hunting licenses are bought there.

Conclusion 

The features I picked for this lab were picked for a specific reason. When I think of "Up North" I think of all my trips to my grandmother's house in the U.P. (Upper Peninsula of Michigan). Whenever we drove there we would pass through northern Wisconsin, where I constantly saw snowmobile trails along the roadways and forest after forest after forest, and once I learned how to drive, I began seeing a lot of deer. So naturally, when asked what features I would pick for this assignment, I instantly thought of those trips up to Michigan.


Just looking at a map cannot give a definitive answer as to which part of the state is "Up North" and which part is not. To figure this out more rigorously we will use a statistical test called the chi-square test. The test compares the observed counts with the counts we would expect if a feature had no relationship to location, and then produces a test statistic that we compare with a critical value (along with a corresponding probability) to decide whether the feature is dependent on or independent of location. Dependent, or related, would suggest that the feature is a trait of up north (for example, forests being dependent on a northern location in the state). Otherwise the feature and location are independent, or unrelated (just as car sales do not depend on how much food you can eat).
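The expected-count calculation described above can be sketched for a 2 × 4 table (north/south by rank 1–4). This is a minimal sketch of Pearson's chi-square test of independence; the counts below are HYPOTHETICAL (chosen only to sum to the 28 northern and 44 southern counties) and are not the lab's SPSS output.

```python
def chi_square_independence(observed):
    """Pearson chi-square test of independence for a table of counts.

    Returns (statistic, degrees_of_freedom, expected_counts).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected count if rank and location are independent:
    # (row total x column total) / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(observed) - 1) * (len(col_totals) - 1)  # (rows - 1)(columns - 1)
    return statistic, dof, expected


# HYPOTHETICAL counts of counties in ranks 1-4 (not the lab's data).
observed = [
    [6, 8, 7, 7],    # north of Highway 29 (28 counties)
    [2, 4, 10, 28],  # south of Highway 29 (44 counties)
]
stat, dof, expected = chi_square_independence(observed)
print(round(stat, 3), dof)
```

A common rule of thumb is that every expected count should be at least 5; with only 72 counties some cells can fall below that, which makes the chi-square approximation less reliable.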

Discussion 
Study Question = Does variable X indicate that a place is considered up north?

          Null Hypothesis = Variable X is the same in the north and the south
                  meaning: the variable is not related to the north, suggesting it can be found anywhere

          Alternative Hypothesis = Variable X is different in the north and the south
                  meaning: the variable is related to location, suggesting that if it is concentrated in the north it is a trait of up north

These tests will be done at a significance level of 0.05 (95% confidence level).

Number of Campgrounds per County in Wisconsin

Above we see the table produced using SPSS software. We calculated the expected counts from the row and column totals of the observed counts, and from those we could calculate the chi-square value. For this variable our hypotheses are:


Null: There is no difference in campgrounds in the north vs campgrounds in the south
Alternative: There is a difference in campgrounds in the north vs campgrounds in the south

To find the degrees of freedom (df) we use this formula:
df = (number of columns - 1)(number of rows - 1) = (4 - 1)(2 - 1)
df = 3
The critical value at df = 3 at 0.05 level of significance is 7.815

The Chi-Square Value for Campgrounds is equal to 15.812

0______________________7.815___________^15.812^__________Infinity


Because this number falls past the critical value we have to reject the null hypothesis. 

There is a difference in campgrounds in the north vs campgrounds in the south, and if we scroll back up to map 1 we can see there are many more blue counties in the north than in the south.


Number of Deer Hunting (Gun) Licenses per County in Wisconsin



Above we see the table produced using SPSS software. We calculated the expected counts from the row and column totals of the observed counts, and from those we could calculate the chi-square value. For this variable our hypotheses are:


Null: There is no difference in deer hunting licenses in the north vs deer hunting licenses in the south
Alternative: There is a difference in deer hunting licenses in the north vs deer hunting licenses in the south

To find the degrees of freedom (df) we use this formula:
df = (number of columns - 1)(number of rows - 1) = (4 - 1)(2 - 1)
df = 3
The critical value at df = 3 at 0.05 level of significance is 7.815

The Chi-Square Value for Deer Hunting Licenses = 2.968

0________^2.968^______________7.815_____________________Infinity


Because this number falls below the critical value, we fail to reject the null hypothesis.

There is not a significant difference in deer hunting licenses in the north vs deer hunting licenses in the south. If we scroll back up to map 2, we can see there are more blue counties in the south than in the north, but light red counties are spread fairly evenly throughout the state.


Miles of Snowmobile Trails per County in Wisconsin



Above we see the table produced using SPSS software. We calculated the expected counts from the row and column totals of the observed counts, and from those we could calculate the chi-square value. For this variable our hypotheses are:


Null: There is no difference in miles of snowmobile trails in the north vs miles of snowmobile trails in the south
Alternative: There is a difference in miles of snowmobile trails in the north vs miles of snowmobile trails in the south

To find the degrees of freedom (df) we use this formula:
df = (number of columns - 1)(number of rows - 1) = (4 - 1)(2 - 1)
df = 3
The critical value at df = 3 at 0.05 level of significance is 7.815

The Chi-Square Value for Snowmobile Trails is equal to 18.742

0______________________7.815___________^18.742^__________Infinity


Because this number falls past the critical value we have to reject the null hypothesis.

There is a difference in miles of snowmobile trails in the north vs miles of snowmobile trails in the south, and if we scroll back up to map 3 we can see there are many more blue counties in the north than in the south.


Number of Acres of Forest per County in Wisconsin



Above we see the table produced using SPSS software. We calculated the expected counts from the row and column totals of the observed counts, and from those we could calculate the chi-square value. For this variable our hypotheses are:


Null: There is no difference in acres of forest in the north vs acres of forest in the south
Alternative: There is a difference in acres of forest in the north vs acres of forest in the south

To find the degrees of freedom (df) we use this formula:
df = (number of columns - 1)(number of rows - 1) = (4 - 1)(2 - 1)
df = 3
The critical value at df = 3 at 0.05 level of significance is 7.815

The Chi-Square Value for Acres of Forest is equal to 33.962

0______________________7.815___________^33.962^__________Infinity


Because this number falls past the critical value, we reject the null hypothesis.
There is a difference in acres of forest in the north vs acres of forest in the south, and if we scroll back up to map 4 we can see there are many more blue counties in the north than in the south.

So where is the North?

Using both maps and statistics, I was able to determine which of my tested variables give an accurate description of what up north really means. Three of the four tested variables showed a significant difference between the north and south. Every variable except the number of deer hunting licenses issued had a chi-square value above the critical value, meaning there was a difference between the north and south, and the maps helped confirm that result.
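The critical value of 7.815 and the probability behind each of the four decisions can be checked without a printed table. This is a minimal sketch that relies on the fact that, for exactly 3 degrees of freedom, the chi-square CDF has the closed form erf(sqrt(x/2)) - sqrt(2x/pi) * exp(-x/2); the function names are hypothetical, and the statistics are the four values reported above.

```python
import math


def chi2_cdf_df3(x):
    """CDF of the chi-square distribution with 3 degrees of freedom (closed form)."""
    if x <= 0:
        return 0.0
    return math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def chi2_critical_df3(alpha=0.05, lo=0.0, hi=100.0, tol=1e-9):
    """Find the value whose upper-tail area equals alpha, by bisection on the CDF."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if 1 - chi2_cdf_df3(mid) > alpha:
            lo = mid  # too much area to the right: move the cut point up
        else:
            hi = mid
    return (lo + hi) / 2


critical = chi2_critical_df3()
reported = {"Campgrounds": 15.812, "Deer licenses": 2.968,
            "Snowmobile trails": 18.742, "Forest acres": 33.962}
print(f"critical value (df = 3, alpha = 0.05): {critical:.3f}")
for name, stat in reported.items():
    p = 1 - chi2_cdf_df3(stat)
    print(f"{name}: chi-square = {stat}, p = {p:.4f}, reject H0: {p < 0.05}")
```

Comparing p with 0.05 gives the same decisions as comparing each statistic with 7.815; the p-value adds how far past (or short of) the cutoff each variable falls. For other degrees of freedom a general routine such as SciPy's `chi2` distribution would be needed instead of this closed form.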

Study Questions

Fill in the missing portions of the table


      Interval Type   Confidence Level   n     Sig Level          z or t   z or t value
A     Two-tailed      90%                45    .10 (.05/side)     z        ±1.65
B     Two-tailed      95%                12    .05 (.025/side)    t        ±2.201
C     One-tailed      95%                36    .05                z        1.65
D     Two-tailed      99%                180   .01 (.005/side)    z        ±2.58
E     One-tailed      80%                60    .20                z        0.84
F     One-tailed      99%                23    .01                t        2.508
G     Two-tailed      99%                15    .01 (.005/side)    t        ±2.977
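The z rows of the table can be checked with Python's standard-library normal distribution. This is a minimal sketch; `z_critical` is a hypothetical helper, and it covers only the z rows, since the standard library has no t distribution (the t values come from a t table with df = n - 1). For row E it gives about 0.84.

```python
from statistics import NormalDist


def z_critical(confidence, tails):
    """Critical z for a confidence level (e.g. 0.95) and 1 or 2 tails."""
    alpha = 1 - confidence
    # Two-tailed tests split alpha between both tails.
    return NormalDist().inv_cdf(1 - alpha / tails)


z_rows = {"A": (0.90, 2), "C": (0.95, 1), "D": (0.99, 2), "E": (0.80, 1)}
for label, (conf, tails) in z_rows.items():
    print(label, round(z_critical(conf, tails), 3))
```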

1.       A Department of Agriculture and Livestock Development organization in Kenya estimates that yields in a certain district should approach the following amounts in metric tons per hectare (averages based on data from the whole country): groundnuts, 0.50; cassava, 3.70; and beans, 0.30. A survey of 100 farmers had the following results:

                      Mean      SD
Ground Nuts           0.40      1.07
Cassava               3.40      1.42
Beans                 0.33      0.14
               
a.       Test the hypothesis for each of these products. Assume each test is two-tailed with a confidence level of 95%. *Use the appropriate test
b.      Be sure to present the null and alternative hypotheses for each, as well as conclusions
c.       What are the probability values for each crop?
d.      What are the similarities and differences in the results?

A.) 
Z-score = (sample mean – country mean) / (SD / sqrt(n))

Ground Nuts = (0.4 – 0.5)/(1.07/sqrt[100]) = (-0.1)/(0.107) =  -0.9346
Cassava = (3.4 – 3.7)/(1.42/sqrt[100]) = (-0.3)/(0.142) = -2.1127
Beans = (0.33 – 0.3)/(0.14/sqrt[100]) = (.03)/(0.014) = 2.1429

2-tailed test with 95% CI, percentile regions: (0 – 2.5) (2.5 – 97.5) (97.5 – 100)
Z-score regions: (-infinity – -1.96) (-1.96 – 1.96) (1.96 – infinity)

BOLD = REJECT
NOT BOLD = FAIL TO REJECT

Z-Score(Ground Nuts)
-infinity__________-1.96____^-0.9346^______0__________1.96__________infinity

Z-Score (Cassava)
-infinity_____^-2.1127^_____-1.96__________0__________1.96__________infinity

Z-Score (Beans)
-infinity__________-1.96__________0__________1.96_____^2.1429^_____infinity


B.) 
Null Hypothesis (for each crop: Ground Nuts, Cassava, Beans): at the 95% confidence level, there is no difference between Kenya's national average yield and the mean yield of the sample of 100 farmers.

Alternative Hypothesis (for each crop): at the 95% confidence level, there is a difference between Kenya's national average yield and the mean yield of the sample of 100 farmers.


C.)
(The z-scores are calculated in part A.)
Ground Nuts = -0.9346 FAIL TO REJECT THE NULL HYPOTHESIS (NO DIFFERENCE)

Cassava = -2.1127 REJECT THE NULL HYPOTHESIS (DIFFERENCE)

Beans = 2.1429 REJECT THE NULL HYPOTHESIS (DIFFERENCE)
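Part c asks for probability values, which the decisions above imply but do not state. This is a minimal sketch that recomputes each z-score with the part A formula and converts it to a two-tailed p-value using the standard normal distribution; the helper names are hypothetical. It gives roughly p ≈ 0.35 for ground nuts, 0.035 for cassava and 0.032 for beans, consistent with the decisions above.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean, pop_mean, sd, n):
    """z = (sample mean - population mean) / (sd / sqrt(n))."""
    return (sample_mean - pop_mean) / (sd / sqrt(n))


def two_tailed_p(z):
    """Probability of a z at least this far from 0 in either direction."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


crops = {  # (sample mean, national mean, sample SD), metric tons per hectare
    "Ground Nuts": (0.40, 0.50, 1.07),
    "Cassava": (3.40, 3.70, 1.42),
    "Beans": (0.33, 0.30, 0.14),
}
for crop, (xbar, mu, sd) in crops.items():
    z = one_sample_z(xbar, mu, sd, n=100)
    p = two_tailed_p(z)
    print(f"{crop}: z = {z:.4f}, p = {p:.4f}, reject H0: {p < 0.05}")
```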


D.) 
The main similarity is that two of the three crops (cassava and beans) fall in the rejection region, meaning there is a significant difference between the 100-farmer sample and the country average. The z-scores themselves varied widely. Cassava's z-score of -2.1127 means its sample mean was more than two standard errors below the country average. Ground nuts' z-score of -0.9346 shows no significant difference at the 95% confidence level, although the sample mean was still almost one standard error below the country average. Beans' z-score of 2.1429 means its sample mean was more than two standard errors above the country average. So the three crop types produced very different results.

2.       An exhaustive survey of all users of a wilderness park taken in 1960 revealed that the average number of persons per party was 2.8. In a random sample of 25 parties in 1985, the average was 3.7 persons with a standard deviation of 1.45 (one-tailed test, 95% confidence level) (5 pts)

a.       Test the hypothesis that the number of people per party has changed in the intervening years.  (State null and alternative hypotheses)
b.      What is the corresponding probability value?

A.)
Null Hypothesis: at the 95% confidence level, the average number of people per party in 1985 is no greater than the 1960 average of 2.8.

Alternative Hypothesis: at the 95% confidence level, the average number of people per party in 1985 is greater than the 1960 average of 2.8.

B.)
1960 = 2.8
Sample from 1985 = 3.7 with a standard deviation of 1.45 (sample size = 25)

t = (sample mean – 1960 mean) / (standard deviation / sqrt[sample size])
t = (3.7 – 2.8) / (1.45 / sqrt[25]) = (0.9) / (0.29) = 3.1034
t = 3.1034
df = 25 – 1 = 24
Critical t (one-tailed, 0.05 significance level, df = 24) = 1.711

0________________________1.711_________^3.1034^_______infinity

Because the t-score of 3.1034 is greater than the critical value of 1.711, we reject the null hypothesis. There is a significant difference between the average party size in the 1985 Wilderness Park sample and the 1960 average.
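Part b asks for the probability value, which needs the t distribution rather than the normal. This is a minimal sketch using only the standard library: it writes out Student's t density and integrates it numerically with Simpson's rule, a swapped-in numerical method rather than a table lookup; the function names are hypothetical. The same routine can confirm the 1.711 critical value, since the upper-tail area at 1.711 with df = 24 should be about 0.05.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the integral of the density from 0 to t.

    Uses Simpson's rule, so steps must be even.
    """
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


mean_1985, mean_1960, sd, n = 3.7, 2.8, 1.45, 25
t_stat = (mean_1985 - mean_1960) / (sd / math.sqrt(n))
p_one_tailed = t_upper_tail(t_stat, n - 1)
print(f"t = {t_stat:.4f}, df = {n - 1}, one-tailed p = {p_one_tailed:.4f}")
```

The one-tailed p-value comes out well below 0.05 (a little over 0.002), matching the decision to reject the null hypothesis.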