Introduction
For years the question of what counts as “Up-North” has rattled around,
with everyone giving a different answer. The Wisconsin Tourism Board
wants to put an end to this debate and create an accurate map showing what
“Up-North” actually means. Before starting this assignment, I did a quick
Google search to see what different people think “Up-North”
is.
Some results found “Up-North” to be...
-Above the 45th Parallel
-Most of Wisconsin
-There were also several images found portraying what "Up North" was.
A quick Google image search turned up three separate images with three
different interpretations of what North is.
Other online searches came up with results such as...
Webster: Any section of land above the Mason-Dixon Line,
or north of the Ohio River
Urban Dictionary: The upstate portion of
New York where most of the state's prisons are (not really relevant to this
project, but still quite interesting)
Objectives
This assignment has three main objectives:
1.) Understand how to calculate Chi-Square and relate it to Hypothesis Testing
2.) Connect spatial output to your calculated Chi-Square statistic
3.) Utilize real-world data connecting stats and geography
“Where is Up-North...?”
Clearly it is impossible to Google something this broad and hope to
find one definite answer. So instead of using Google to solve the
problem that the tourism board of Wisconsin has come to us with, we are going
to use our Geospatial and Statistical backgrounds to work out
where Up-North truly is.
We will be using 4 different data sets, along with our state and county data.
Those four data sets are the number of deer hunting licenses sold per county,
the number of campgrounds per county, the miles of snowmobile trails per county and
the forest acreage per county. With these four data sets, and the State of
Wisconsin split by State Highway 29, which runs across the state (east/west), we are
ready to try and find where Up-North actually is.
Methods
Using the State of Wisconsin with its 72 counties mapped on it, I brought in a
shapefile containing State Highway 29, found on an ArcGIS Online page, and
used that to decide which counties were north of the highway and which
counties were south of it (figure 1).
I found there were 28 counties north of the highway and 44 south of the highway.
On the map the counties in blue are considered north by this divide, and the
counties in red are considered south. Later in this lab we are going to use this
data to calculate our Chi-Squared statistics. Chi-Squared will be explained
further in that section, but the basic reason for running the
test is to compare the observed outcome with the theoretical expected
distribution. Every variable will be ranked on a numerical basis;
every variable except the North/South divide will be split into four
classes. The ranking goes from 1-4, where 1 is the highest class of the
tested attribute and 4 is the lowest. The class width is calculated by finding
the range (maximum value – minimum value) and dividing by 4.
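The ranking rule described above can be sketched in a few lines of Python. This is only an illustration, not the ArcMap workflow used in the lab: the function name rank_by_range is made up for this example, and it uses the unrounded class width (range / 4), so a value sitting right on a break may land one class away from the rounded breaks listed in the Results.

```python
def rank_by_range(value, vmin, vmax, classes=4):
    """Return a rank from 1 (highest class) to `classes` (lowest class).

    Classes are equal-width slices of the range (vmax - vmin).
    Assumption: exact boundary handling is not specified in the report.
    """
    width = (vmax - vmin) / classes
    if width == 0:
        return classes  # every county has the same value
    index = int((value - vmin) // width)      # 0 = lowest slice
    index = min(max(index, 0), classes - 1)   # clamp so vmax lands in the top slice
    return classes - index


# Campground example: minimum 0 and maximum 49, as reported in the Results
for campgrounds in (5, 20, 30, 45):
    print(campgrounds, "->", rank_by_range(campgrounds, 0, 49))
```

Clamping the index keeps the county holding the maximum value in class 1 instead of spilling into a fifth class.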
Using the SCORP data provided, we have to join it with our newly created North/South
Wisconsin shapefile. If we do not join the two, we will not be able to perform
analysis on the data. The four features which will be tested to try and
determine where Up-North is are listed below once more, along with some statistics and a map depicting the 1-4 ranking. This ranking system uses a blue to red color spectrum where dark blue = 1, light blue = 2, light red = 3, dark red = 4. I chose this color spectrum because my north/south map uses dark blue to show the north and dark red to show the south.
Results
Number of Campgrounds per County in Wisconsin (map 1)
Max Value = 49 Campgrounds
Minimum Value = 0 Campgrounds
Range = 49
Categorical Gap = 12
4 = 0 – 12
3 = 13 – 24
2 = 25 – 37
1 = 38 – 49
Map 1: Campgrounds in Wisconsin Counties. Dark blue received a score of 1 while dark red received a score of 4
This map shows several
northern counties in Wisconsin to be in the blue, indicating they fall into
categories 1 or 2 (above). This is the first of four categories we will be
testing to determine what is "Up North". Just looking at this map and
this one category of campgrounds, we can assume the northwestern corner of
Wisconsin is the most "Up North" and the rest of the state is not
considered part of the "Up North" category.
Geography Reasoning: In Wisconsin we have a
very wooded northern region, and because of this we can expect to see more
campgrounds in those locations as opposed to the middle of the
City of Milwaukee.
Number of Deer Hunting (Gun) Licenses Issued Per County in Wisconsin (map 2)
Max Value = 21575 Licenses
Minimum Value = 0 Licenses
Range = 21575
Categorical Gap = 5394
4 = 0 – 5393
3 = 5394 – 10787
2 = 10788 – 16180
1 = 16181 – 21575
Map 2: This map is showing which counties have purchased the most Deer Hunting Licenses where blue is most and red is least
This map is one that can throw off the idea of what truly is up
north. Many residents of Wisconsin own land in the northern area of the state,
but live in other regions of the state. These numbers are based on how many
licenses were purchased in total, not per capita, which gives the
Milwaukee, Madison and Green Bay areas the advantage (all three locations
appear blue on the map) due to their large populations. A more accurate map
would show where deer are being shot instead of where the
licenses are purchased, since in Wisconsin a license is good for the entire
state.
Geography Reasoning: As mentioned above, three
of the largest cities in Wisconsin (Milwaukee, Madison and Green Bay) are all
found in the blue, and this is because a large number of people live in these
cities who may travel elsewhere to go hunting. The northern regions of
Wisconsin are not home to that many people, which is why most counties there are dark
red and a few are light red.
Miles of Snowmobile Trails Per County in Wisconsin (map 3)
Max Value = 641 Miles
Minimum Value = 0 Miles
Range = 641
Categorical Gap = 161
4 = 0 – 159
3 = 160 – 319
2 = 320 – 480
1 = 481 – 641
Map 3: Northeastern Wisconsin seems to be the hot spot for snowmobile trails, measured in miles per county
In map 3 we see
which counties have the most or least miles of snowmobile trails
in the State of Wisconsin. Snowmobiling is often thought of as an
"Up-North" activity done through the woods or on frozen lake
surfaces. This map shows a strong presence of those trails in
the northeastern region, and very few in the southern region of the state.
Geography Reasoning: The northeastern
portion of Wisconsin, where we see much of the dark and light blue, is made
mostly of forests and lakes. Very few cities are located in that
region, making it prime real estate for these trails. That portion of
Wisconsin is also home to many cottages and cabins, and one popular activity at
those homes away from home is snowmobiling.
Acres of Forest Per County in Wisconsin (map 4)
Max Value = 963,865 Acres
Minimum Value = 8,100 Acres
Range = 955,765
Categorical Gap = 238,942
4 = 8,100 – 247,041
3 = 247,042 – 485,982
2 = 485,983 – 724,923
1 = 724,924 – 963,865
Map 4: Where are the forests in Wisconsin? Looks like they are all in the North.
This map best
shows where the north is located based on popular sayings. Most of Northern
Wisconsin is composed of dense forests, while Southern Wisconsin is mostly farm
land. This map makes it look like there are not any forests in the southern
portion of the state, but that is not true. The map is skewed because of the
large Northern Wisconsin counties that are made up almost entirely of forests. Nearly
1 million acres of forest are found in Bayfield County. That is about 75% of
the entire county, and 400 square miles more forest than the size of all of
Milwaukee County.
Geography Reasoning: As mentioned
above, the northern part of this state is home to dense forests and multiple
state and nationally recognized forested parks such as the Chequamegon National
Forest and the Nicolet National Forest. The southern portion of this state is
composed of large cities, urban buildup and vast farm lands.
Where is up north really located then?
Taking the four factors we used plus the north/south divide, I used the field calculator in ArcMap to create a map showing the score each county received. One more breakdown of how the points were awarded to different counties:
north of highway 29 = 1
south of highway 29 = 2
For each of the 4 variables used:
Within 25% of the max = 1
25-50% below the max = 2
50-75% below the max = 3
More than 75% below the max = 4
Using this data I created a map which shows where up north actually is, based on the criteria listed above.
Map 5: Blue is considered the north, while red is considered not up north
Taking a look at this map we see there are several blue counties
located in the northern portion of the state, along with a few red ones up
there. The general estimate of the up north area can be classified as
the dark and light blue counties and some of the red ones. For the breakdown of
points, 7-12 was considered up north and 13-16 was not considered up north.
The reason several counties that are believed to be up north show up light
or dark red can most likely be traced back to the Deer License set of data:
since not many people live in those counties, not many people are going to buy
Deer Hunting Licenses there.
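The scoring described above was done with the field calculator in ArcMap, but the same arithmetic can be sketched in Python. The function names and the two example counties below are hypothetical (they are not rows from the SCORP data), and the cutoff of 12 is taken from the 7-12 / 13-16 breakdown above.

```python
def composite_score(north_of_hwy29, ranks):
    """Add the north/south point (1 = north, 2 = south) to the four 1-4 ranks.

    `ranks` is assumed to hold the campground, deer license, snowmobile trail
    and forest acreage ranks, in any order.
    """
    if len(ranks) != 4:
        raise ValueError("expected one rank per variable (4 total)")
    return (1 if north_of_hwy29 else 2) + sum(ranks)


def is_up_north(score, cutoff=12):
    """Scores at or below the cutoff are treated as up north."""
    return score <= cutoff


# Hypothetical counties for illustration only
examples = {
    "County A": (True, [1, 2, 1, 3]),
    "County B": (False, [4, 4, 3, 4]),
}
for name, (north, ranks) in examples.items():
    score = composite_score(north, ranks)
    label = "up north" if is_up_north(score) else "not up north"
    print(name, score, label)
```

Because a low rank means a high value, a lower total score means a county looks more like "Up North" under these criteria.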
Conclusion
The features I
picked for this lab were picked for a specific reason. When I think of "Up
North" I think of all my trips up to my Grandmother's house in the U.P.
(Upper Peninsula of Michigan). Whenever we would drive there we would pass through
the northern section of Wisconsin, and I would constantly be seeing snowmobile
trails along the roadways, forest after forest after forest, and once I learned
how to drive, I began seeing a lot of deer. So naturally, when asked what
features I would pick when starting this assignment, I instantly thought of all
the trips up to Michigan.
Just looking at
a map cannot give you a definitive answer as to which part of the state is
"Up North" and which part is not. In order to figure
this out more accurately we are going to use a statistical test called
Chi-Squared testing. This test takes the observed counts we received,
works out what the expected counts should be, and then gives us a corresponding
probability and critical value to see if the feature is dependent on or
independent of location. Dependent or related would suggest that the feature is
a trait of up north (such as forests being dependent on a
northern location in the state). Otherwise the two are independent, or not
related to one another (such as car sales not being dependent upon how much food
you can eat).
Discussion
Study Question = Does X variable mean that something is considered up north?
Null Hypothesis = X variable is the same in the north and south
meaning: That variable does not relate to the north, suggesting it can be anywhere
Alternative Hypothesis = X variable is different in the north and south
meaning: That variable does relate to the north, suggesting if it exists it is up north
These tests will be done at a significance level of 0.05 (95% confidence level).
Number of Campgrounds per County in Wisconsin
Above we see the table which was produced using SPSS software. We calculated the Expected
Outcome based on the observed outcome. Using that data we could then calculate the Chi-Squared
value. For this variable our hypotheses are:
Null: There is no difference in campgrounds in the north vs campgrounds in the south
Alternative: There is a difference in campgrounds in the north vs campgrounds in the south
In order to find the degrees of freedom (df) we have to use this formula
(column #s - 1)(row #s - 1) = df
With four rank columns and two rows (north/south): df = (4 - 1)(2 - 1) = 3
The critical value at df = 3 at 0.05 level of significance is 7.815
The Chi-Square Value for Campgrounds is equal to 15.812
0______________________7.815___________^15.812^__________Infinity
Because this number falls past the critical value we have to reject the null hypothesis.
There is a difference in campgrounds in the north vs campgrounds in the south, and if we scroll back
up to map 1 we can see there are many more blue counties in the north than there are blue counties in
the south.
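The SPSS steps above (expected counts, Chi-Squared value, degrees of freedom, comparison to the critical value) can be sketched in plain Python. The observed counts below are made up for illustration, since the SPSS table itself is not reproduced in the text; only the row totals (28 north, 44 south) and the 7.815 critical value come from this report. The function name chi_square is also just for this example.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a contingency table.

    Expected count for a cell = row total * column total / grand total.
    Assumes every column contains at least one county (no zero column totals).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


# HYPOTHETICAL counts of counties per rank (columns = ranks 1 to 4)
observed = [
    [3, 6, 9, 10],   # north of Highway 29 (28 counties)
    [1, 2, 8, 33],   # south of Highway 29 (44 counties)
]
CRITICAL_05_DF3 = 7.815  # critical value at df = 3, 0.05 significance

stat, df = chi_square(observed)
reject_null = stat > CRITICAL_05_DF3
print(f"chi-square = {stat:.3f}, df = {df}, reject null: {reject_null}")
```

With real counts from the joined county table, the statistic should match the SPSS output for that variable.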
Number of Deer Hunting (Gun) Licenses per County in Wisconsin
Above we see the table which was produced using SPSS software. We calculated the Expected
Outcome based on the observed outcome. Using that data we could then calculate the Chi-Squared
value. For this variable our hypotheses are:
Null: There is no difference in deer hunting licenses in the north vs deer hunting licenses in the south
Alternative: There is a difference in deer hunting licenses in the north vs deer hunting licenses in the south
In order to find the degrees of freedom (df) we have to use this formula
(column #s - 1)(row #s - 1) = df
df = (4 - 1)(2 - 1) = 3
The critical value at df = 3 at 0.05 level of significance is 7.815
The Chi-Square Value for Deer Hunting Licenses = 2.968
0________^2.968^______________7.815_____________________Infinity
Because this number falls before the critical value we fail to reject the null hypothesis.
There is not a significant difference in deer hunting licenses in the north vs deer hunting licenses in
the south, and if we scroll back up to map 2 we can see there are more blue counties in the south than
there are blue counties in the north, but there is a fairly even spread of light red counties throughout
the state.
Miles of Snowmobile Trails per County in Wisconsin
Above we see the table which was produced using SPSS software. We calculated the Expected
Outcome based on the observed outcome. Using that data we could then calculate the Chi-Squared
value. For this variable our hypotheses are:
Null: There is no difference in miles of snowmobile trails in the north vs miles of snowmobile trails in the south
Alternative: There is a difference in miles of snowmobile trails in the north vs miles of snowmobile trails in the south
In order to find the degrees of freedom (df) we have to use this formula
(column #s - 1)(row #s - 1) = df
df = (4 - 1)(2 - 1) = 3
The critical value at df = 3 at 0.05 level of significance is 7.815
The Chi-Square Value for Miles of Snowmobile Trails is equal to 18.742
0______________________7.815___________^18.742^__________Infinity
Because this number falls past the critical value we have to reject the null hypothesis.
There is a difference in miles of snowmobile trails in the north vs miles of snowmobile trails in the
south, and if we scroll back up to map 3 we can see there are many more blue counties in the north
than there are blue counties in the south.
Number of Acres of Forest per County in Wisconsin
Above we see the table which was produced using SPSS software. We calculated the Expected
Outcome based on the observed outcome. Using that data we could then calculate the Chi-Squared
value. For this variable our hypotheses are:
Null: There is no difference in acres of forest in the north vs acres of forest in the south
Alternative: There is a difference in acres of forest in the north vs acres of forest in the south
In order to find the degrees of freedom (df) we have to use this formula
(column #s - 1)(row #s - 1) = df
df = (4 - 1)(2 - 1) = 3
The critical value at df = 3 at 0.05 level of significance is 7.815
The Chi-Square Value for Acres of Forest is equal to 33.962
0______________________7.815___________^33.962^__________Infinity
Because this number falls past the critical value we have to reject the null hypothesis.
There is a difference in acres of forest in the north vs acres of forest in the south, and if we scroll
back up to map 4 we can see there are many more blue counties in the north than
there are blue counties in the south.
So where is the North?
Using both maps and
stats I was able to find which of my tested variables give an accurate
description of what up north really means. Three of the four tested variables
showed that there is a difference between the north and south for the given
variable. Everything tested except the number of deer hunting licenses issued
came out above the critical value, meaning there was a difference between
the north and south, and the maps helped confirm that finding.
Study Questions
Fill in the missing portions of the table
  | Interval Type | Confidence Level | n | Sig Level | z or t | z or t value
A | Two Tailed | 90 | 45 | .1 (.05/side) | Z | ±1.65
B | Two Tailed | 95 | 12 | .05 | T | 2.201
C | One Tailed | 95 | 36 | .05 | Z | 1.65
D | Two Tailed | 99 | 180 | .01 (.005/side) | Z | ±2.58
E | One Tailed | 80 | 60 | .2 | Z | 0.84
F | One Tailed | 99 | 23 | .01 | T | 2.5
G | Two Tailed | 99 | 15 | .01 | T | 2.977
1. A Department of Agriculture and Live Stock Development organization in Kenya
estimates that yields in a certain district should approach the following
amounts in metric tons (averages based on data from the whole country) per
hectare: groundnuts, 0.50; cassava, 3.70; and beans, 0.30. A survey of 100 farmers had the following
results:
              μ      σ
Ground Nuts   0.40   1.07
Cassava       3.40   1.42
Beans         0.33   0.14
a. Test the hypothesis for each of these products. Assume that each is 2 tailed
with a Confidence Level of 95% *Use the appropriate test
b. Be sure to present the null and alternative hypotheses for each as well as
conclusions
c. What are the probability values for each crop?
d. What are the similarities and differences in the results?
A.)
Z-score = (sample mean – country mean) / (SD / sqrt(n))
Ground Nuts = (0.40 – 0.50)/(1.07/sqrt[100]) = (-0.10)/(0.107) = -0.9346
Cassava = (3.40 – 3.70)/(1.42/sqrt[100]) = (-0.30)/(0.142) = -2.1127
Beans = (0.33 – 0.30)/(0.14/sqrt[100]) = (0.03)/(0.014) = 2.1429
2 Tailed Test with 95% CI: (0 – 2.5) (2.5 – 97.5) (97.5 – 100)
Z-Score Breakdown: (-infinity – -1.96) (-1.96 – 1.96) (1.96 – infinity)
Outer ranges (below -1.96 or above 1.96) = REJECT
Middle range (-1.96 to 1.96) = FAIL TO REJECT
Z-Score (Ground Nuts)
-infinity__________-1.96____^-0.9346^______0__________1.96__________infinity
Z-Score (Cassava)
-infinity_____^-2.1127^_____-1.96__________0__________1.96__________infinity
Z-Score (Beans)
-infinity__________-1.96__________0__________1.96_____^2.1429^_____infinity
B.)
Null Hypothesis: At the 95% confidence level there is no
difference between Kenya’s average production of each crop (Ground Nuts, Cassava,
Beans) and the average from the sample of 100 farmers.
Alternative Hypothesis: At the 95% confidence level
there is a difference between Kenya’s average production of each crop (Ground Nuts,
Cassava, Beans) and the average from the sample of 100 farmers.
C.)
The test statistics are worked out in A; compared against the critical values of -1.96 and 1.96:
Ground Nuts = -0.9346 FAIL TO REJECT THE NULL HYPOTHESIS (NO DIFFERENCE)
Cassava = -2.1127 REJECT THE NULL HYPOTHESIS (DIFFERENCE)
Beans = 2.1429 REJECT THE NULL HYPOTHESIS (DIFFERENCE)
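The Z-scores above, along with the two-tailed probability values the question asks about, can be reproduced with a short Python sketch. This assumes the standard normal distribution is appropriate (n = 100), and the function name one_sample_z is just for this example. The p-value uses the identity p = erfc(|z| / sqrt(2)) for a two-tailed test.

```python
import math


def one_sample_z(sample_mean, country_mean, sd, n):
    """Return (z, two-tailed p-value) for a one-sample Z test."""
    standard_error = sd / math.sqrt(n)
    z = (sample_mean - country_mean) / standard_error
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_tailed


# (sample mean, country mean, sample SD) from the question; n = 100 farmers
crops = {
    "Ground Nuts": (0.40, 0.50, 1.07),
    "Cassava": (3.40, 3.70, 1.42),
    "Beans": (0.33, 0.30, 0.14),
}

results = {name: one_sample_z(m, mu, sd, 100) for name, (m, mu, sd) in crops.items()}
for name, (z, p) in results.items():
    decision = "reject" if p < 0.05 else "fail to reject"
    print(f"{name}: z = {z:.4f}, p = {p:.4f} -> {decision} the null")
```

Comparing p to 0.05 gives the same decisions as comparing each Z-score to the critical values of -1.96 and 1.96.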
D.)
The similarity amongst the data is that two of the
three crops (cassava and beans) fall in the rejection range, showing a significant difference
between the 100 farmer sample and the Country average. The numbers were all across the board in
terms of Z-Scores. Cassava was -2.1127, meaning the sample mean was over 2 standard errors
below the Country Average. Ground nuts was -0.9346, not a significant
difference at the 95% confidence level, but still almost an entire standard
error below the mean. Beans was 2.1429 standard errors over the
mean. So very different numbers amongst the 3 crop types.
2. An exhaustive survey of all users of a wilderness park taken in 1960 revealed that
the average number of persons per party was 2.8. In a random sample of 25 parties in 1985, the
average was 3.7 persons with a standard deviation of 1.45 (one tailed test, 95%
Con. Level) (5 pts)
a. Test the hypothesis that the number of people per party has changed in the
intervening years. (State null and alternative hypotheses)
b. What is the corresponding probability value?
A.)
Null Hypothesis: At the 95% confidence level there is no
difference between the average number of people per party in 1960 and
that of the 1985 sample.
Alternative Hypothesis: At the 95% confidence level there is a difference between the average
number of people per party in 1960 and that of the 1985 sample.
B.)
1960 = 2.8
Sample from 1985 = 3.7 with a standard deviation of 1.45 (sample size = 25)
t – Test = (sample mean – 1960s mean) / (standard deviation
/ sqrt [sample size])
t – Test = (3.7 – 2.8) / (1.45/sqrt [25]) = (0.9)/(0.29) =
3.1034
t = 3.1034
df = 25 – 1 = 24
Critical value (one tailed, 0.05 level of significance, df = 24) = 1.711
0________________________1.711_________^3.1034^_______infinity
Because the sample produced a t – score of 3.1034, which falls past the
critical value of 1.711, we have to reject the null hypothesis. There is a significant difference
between the 1985 sample at the Wilderness Park and the average
party size in 1960.
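The t – score above, and an estimate of the corresponding one-tailed probability value, can be checked with a small Python sketch. Python's standard library has no t distribution, so this example integrates the t density numerically with Simpson's rule. That is a choice made for this illustration, not the method used in the lab, and the function names are made up here.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail_p(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


sample_mean, mean_1960, sd, n = 3.7, 2.8, 1.45, 25
t_score = (sample_mean - mean_1960) / (sd / math.sqrt(n))
df = n - 1
p_one_tailed = upper_tail_p(t_score, df)
print(f"t = {t_score:.4f}, df = {df}, one-tailed p = {p_one_tailed:.4f}")
```

A probability value below 0.05 agrees with the decision to reject the null hypothesis at the 95% confidence level.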