#install.packages("Statamarkdown")
library(Statamarkdown)
stataexe <- "C:/Program Files/Stata18/StataSE-64.exe"
knitr::opts_chunk$set(engine.path=list(stata=stataexe))
Getting started with Stata Maps
Introduction
Suppose we want to map population level statistics by district or by province , we can obviously show these on a map .
Important to have the following:
- A shapefile that contains information on the boundaries of South Africa and its counties
- Geo-coordinates of major South African cities
- Geographically disaggregated data we want to map (ie country population)
Setup the Stata Engine in R
- this involves loading the package
Statamarkdown
- defining the Stata installation path
Importing stata Map packages
ssc install spmap
ssc install shp2dta
ssc install mif2dta
checking spmap consistency and verifying not already installed...
all files already exist and are up to date.
checking shp2dta consistency and verifying not already installed...
all files already exist and are up to date.
checking mif2dta consistency and verifying not already installed...
all files already exist and are up to date.
Importing a shapefile in stata
the image below is a screenshot with the relevant data needed for this presentation
Step 1
- the first thing you need is a good
base map
to work with. Depending on what you are working on , a simple map of the boundaries of SA is sufficient. - we need a shapefile to work with together with (and related files e.g dbf file that come with it as highlighted in the image above)
- The relevant command is called
shp2dta
and it creates two linked Stata files which are:- A list of all geographical units i.e for our case
SA_counties
- A list of coordinates
coord_SA
that describe the related boundaries.
- A list of all geographical units i.e for our case
*****loading the shapefiles*****using "Datasets\zaf_admbnda_adm2_sadb_ocha_20201109", database(SA_counties) coordinates(coord_SA) genid(county_id) replace shp2dta
type: 5
Glimpse of the generated Datasets
use coord_SA ,clear
list in 1/5
| _ID _X _Y |
|----------------------------|
1. | 1 . . |
2. | 1 29.02187 -30.00443 |
3. | 1 29.02401 -30.00522 |
4. | 1 29.02775 -30.00661 |
5. | 1 29.02909 -30.0071 |
+----------------------------+
use SA_counties ,clear
list Shape_Leng Shape_Area ADM2_ID county_id in 1/5
| Shape_~g Shape_~a ADM2_ID county~d |
|------------------------------------------|
1. | 10.01106 1.010062 DC44 1 |
2. | 5.053988 .6500897 DC25 2 |
3. | 16.2535 2.027643 DC12 3 |
4. | 8.405694 1.645116 DC37 4 |
5. | 4.47912 .2652066 BUF 5 |
+------------------------------------------+
Step 2
- we now import the Geographically disaggregated data we want to map (ie country population)
stats SA population by districts for 2018*****
****Load "Datasets\statssa_population_districts_2018.csv_", clear
import delimited rename adm2_id ADM2_ID
save "pop_districts_final.dta", replace
(encoding automatically selected: UTF-8)
(30 vars, 52 obs)
file pop_districts_final.dta saved
Glimpse of the population dataset
use "pop_districts_final.dta" ,clear
list ADM2_ID p_total in 1/5
| ADM2_ID p_total |
|-------------------|
1. | BUF 834997 |
2. | CPT 4005016 |
3. | DC1 436403 |
4. | DC10 479923 |
5. | DC12 880790 |
+-------------------+
Step 3: Merge the datasets
- merge Geo-coordinates of major South African cities and Geographically disaggregated data we want to map (ie county population) > to achieve the above we need a variable that is common to both datasets and this variable is
ADM2_ID
*****Merge the datasets*********use "SA_counties.dta", clear
merge 1:1 ADM2_ID using "pop_districts_final"
drop _merge
cap
save "master_data.dta"
Result Number of obs
-----------------------------------------
Not matched 0
Matched 52 (_merge==3)
-----------------------------------------
file master_data.dta already exists
r(602);
r(602);
Glimpse merged dataset (masterdata)
use "master_data.dta" ,clear
list ADM2_ID Shape_Leng Shape_Area county_id p_total in 1/5
| ADM2_ID Shape_~g Shape_~a county~d p_total |
|----------------------------------------------------|
1. | BUF 4.47912 .2652066 5 834997 |
2. | CPT 5.210914 .2383657 11 4005016 |
3. | DC1 14.65113 2.973226 48 436403 |
4. | DC10 17.86512 5.624098 6 479923 |
5. | DC12 16.2535 2.027643 3 880790 |
+----------------------------------------------------+
Step 4: Draw the map
- Having merged the county level information with master file we use the
spmap
command to draw the map. - we specify
p_total
, the variable capturing the total population in the country per Province , as the variable of interest we want to base our coloring scheme on. - we need to remind stata that the file containing the base map is called
coord_SA.dta
and the id that identifies each geographical unit is calledcounty_id
- further you can add additional aesthetics to your map.
***Draw the map****
use "master_data.dta"
using "coord_SA.dta", id(county_id) legend(on) fcolor(Greens2) legend(label(2 "74247-502821") label(3 "502821-811400") label(4 "811400,1164473.5" ) label(5 "31164473.5,4949347" ) ) title("Population Density by District")
spmap p_total /*specify base map (coord_SA) and variable identifying relevant geographic units (county_id)*/
/*change default labels (just cosmetics really) */
quietly graph export graph_map1.svg, replace
use "master_data.dta"
graph bar (asis) p_total ,title("Population distribution by District,2018") over(ADM1_EN) asyvars over(ADM2_EN , sort(p_total) lab(angle(90))) ysize(5) nofill legend(pos(3) col(1)) xsize(14)
quietly graph export graph_1hbar.png, replace