sysuse auto, clear
(1978 automobile data)
hypertension
This report details reproducible research using Stata , it was compiled using Quarto and Rstudio.
The big central section is the Results window, where you’ll see the results of the commands you run. Under it is the Command window, where you’ll type commands when you’re working interactively.
On the left is the History window, which contains a history of the commands you’ve run. Click once on a command to paste it back into the Command window for editing. Double-click on a command to run it again. You can also press Page Up when you’re in the Command window to recall past commands. Right-click on a command or block of commands to copy it into the clipboard or send it to the Do File editor. This allows you to take something you’ve done interactively and turn it into part of a do file.
Beneath the History window Stata displays the working directory. This is where Stata will save files if you don’t specify another location.
On the right is the Variables windows, which contains a list of the variables in the current data set. Click once on a variable name to select it, and information about the variable will be shown in the Properties window on the bottom right. Click twice, and the variable name will be pasted into the Command window. You can also start typing a variable name in the Command window and press Tab, and Stata will either complete the variable name or give you a list of variables that match what you’ve typed so far.
The Properties window also has a section for properties of the data set. One to keep an eye on is the size, or how much memory it requires. Stata must load your entire data set into memory. Modern computers have so much memory that most Stata users never have to worry about it, but big data users must make sure they don’t run out of memory. If you try to use more memory than your computer has, the operating system will use disk space as memory and Stata will become so slow that it’s practically unusable.
If a command is running, the button on the far right of the top toolbar will turn stop-sign red. Clicking it will tell Stata to stop what it’s doing—though it may take some time to notice. Pressing q will do the same thing.
The menus above the toolbar give you access to most of Stata’s commands and a graphical user interface for running them, but you’ll work much faster if you type them instead. There are some situations where the menus are useful. Importing data from non-Stata formats is one, because the graphical user interface will give you a preview of the data you can use to figure out the correct options for importing it. Making graphs is another, because there are so many options for graphs that setting them using the graphical user interface is a good alternative to memorizing them all. When you do something using the menus, Stata will craft a command based on what you chose and add it to the history just as if you’d typed it. You can then paste the command into a do file to make it reproducible
Let’s begin by opening a Stata system dataset named auto and examining its structure:
make | price | mpg | rep78 | headroom | trunk | weight | length | turn | displacement | gear_ratio | foreign |
---|---|---|---|---|---|---|---|---|---|---|---|
AMC Concord | 4099 | 22 | 3 | 2.5 | 11 | 2930 | 186 | 40 | 121 | 3.58 | 0 |
AMC Pacer | 4749 | 17 | 3 | 3.0 | 11 | 3350 | 173 | 40 | 258 | 2.53 | 0 |
AMC Spirit | 3799 | 22 | NA | 3.0 | 12 | 2640 | 168 | 35 | 121 | 3.08 | 0 |
Buick Century | 4816 | 20 | 3 | 4.5 | 16 | 3250 | 196 | 40 | 196 | 2.93 | 0 |
Buick Electra | 7827 | 15 | 4 | 4.0 | 20 | 4080 | 222 | 43 | 350 | 2.41 | 0 |
Buick LeSabre | 5788 | 18 | 3 | 4.0 | 21 | 3670 | 218 | 43 | 231 | 2.73 | 0 |
Buick Opel | 4453 | 26 | NA | 3.0 | 10 | 2230 | 170 | 34 | 304 | 2.87 | 0 |
Buick Regal | 5189 | 20 | 3 | 2.0 | 16 | 3280 | 200 | 42 | 196 | 2.93 | 0 |
Buick Riviera | 10372 | 16 | 3 | 3.5 | 17 | 3880 | 207 | 43 | 231 | 2.93 | 0 |
Buick Skylark | 4082 | 19 | 3 | 3.5 | 13 | 3400 | 200 | 42 | 231 | 3.08 | 0 |
This opens Stata’s Data Editor, which shows you your data set in a spreadsheet-like form, in browse mode. You can also invoke the Data Editor in edit mode by typing edit or clicking the button that looks like a pencil writing in a spreadsheet. Then it will allow you to make changes. You might use edit mode for data entry, but since you should never change your data interactively get in the habit of using browse mode so you don’t make changes by accident.
A Stata data set is a matrix, with one row for each observation and one column for each variable. This raises the question “What is an observation in this data set?” The values of the make variable suggests they are cars, but are they individual cars or kinds of cars? The fact that there is just one row for each value of make suggests kinds of cars.
The first line tells Stata that we are going to input data for a string variable called name. The number 12 tells input that we want the string variable to allow up to 12 characters for each observation. The next four lines are the raw data, which include the names Ringo, John, Paul, and George. The word end in the last line tells Stata that we are finished adding data.
Almost all Stata commands use a standard syntax. This syntax allows you to control what part of the data set the command acts on, modify what the command does, and more.
We’ll discuss five syntax elements:
Stata is a command-based language. Most Stata commands are verbs. They tell Stata to do something: summarize, tabulate, regress, etc. Normally the command itself comes first and then you tell Stata the details of what you want it to do after.
Many commands can be abbreviated: sum instead of summarize, tab instead of tabulate, reg instead of regress. Commands that can destroy data, like replace, cannot be abbreviated.
We’ll explore the elements of Stata syntax using a command that makes it easy to see what they do and works well in a web book, list. It lists your data set in the Results window. (I apologize for all the scrolling you’ll need to do in this chapter! .
1. | make | price | mpg | rep78 | headroom | trunk | weight |
| AMC Concord | 4,099 | 22 | 3 | 2.5 | 11 | 2,930 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 186 | 40 | 121 | 3.58 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
2. | make | price | mpg | rep78 | headroom | trunk | weight |
| AMC Pacer | 4,749 | 17 | 3 | 3.0 | 11 | 3,350 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 173 | 40 | 258 | 2.53 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
3. | make | price | mpg | rep78 | headroom | trunk | weight |
| AMC Spirit | 3,799 | 22 | . | 3.0 | 12 | 2,640 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 168 | 35 | 121 | 3.08 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
4. | make | price | mpg | rep78 | headroom | trunk | weight |
| Buick Century | 4,816 | 20 | 3 | 4.5 | 16 | 3,250 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 196 | 40 | 196 | 2.93 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
5. | make | price | mpg | rep78 | headroom | trunk | weight |
| Buick Electra | 7,827 | 15 | 4 | 4.0 | 20 | 4,080 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 222 | 43 | 350 | 2.41 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
6. | make | price | mpg | rep78 | headroom | trunk | weight |
| Buick LeSabre | 5,788 | 18 | 3 | 4.0 | 21 | 3,670 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 218 | 43 | 231 | 2.73 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
7. | make | price | mpg | rep78 | headroom | trunk | weight |
| Buick Opel | 4,453 | 26 | . | 3.0 | 10 | 2,230 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 170 | 34 | 304 | 2.87 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
8. | make | price | mpg | rep78 | headroom | trunk | weight |
| Buick Regal | 5,189 | 20 | 3 | 2.0 | 16 | 3,280 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 200 | 42 | 196 | 2.93 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
9. | make | price | mpg | rep78 | headroom | trunk | weight |
| Buick Riviera | 10,372 | 16 | 3 | 3.5 | 17 | 3,880 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 207 | 43 | 231 | 2.93 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
10. | make | price | mpg | rep78 | headroom | trunk | weight |
| Buick Skylark | 4,082 | 19 | 3 | 3.5 | 13 | 3,400 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 200 | 42 | 231 | 3.08 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
11. | make | price | mpg | rep78 | headroom | trunk | weight |
| Cad. Deville | 11,385 | 14 | 3 | 4.0 | 20 | 4,330 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 221 | 44 | 425 | 2.28 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
12. | make | price | mpg | rep78 | headroom | trunk | weight |
| Cad. Eldorado | 14,500 | 14 | 2 | 3.5 | 16 | 3,900 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 204 | 43 | 350 | 2.19 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
13. | make | price | mpg | rep78 | headroom | trunk | weight |
| Cad. Seville | 15,906 | 21 | 3 | 3.0 | 13 | 4,290 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 204 | 45 | 350 | 2.24 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
14. | make | price | mpg | rep78 | headroom | trunk | weight |
| Chev. Chevette | 3,299 | 29 | 3 | 2.5 | 9 | 2,110 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 163 | 34 | 231 | 2.93 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
15. | make | price | mpg | rep78 | headroom | trunk | weight |
| Chev. Impala | 5,705 | 16 | 4 | 4.0 | 20 | 3,690 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 212 | 43 | 250 | 2.56 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
16. | make | price | mpg | rep78 | headroom | trunk | weight |
| Chev. Malibu | 4,504 | 22 | 3 | 3.5 | 17 | 3,180 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 193 | 31 | 200 | 2.73 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
17. | make | price | mpg | rep78 | headroom | trunk | weight |
| Chev. Monte Carlo | 5,104 | 22 | 2 | 2.0 | 16 | 3,220 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 200 | 41 | 200 | 2.73 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
18. | make | price | mpg | rep78 | headroom | trunk | weight |
| Chev. Monza | 3,667 | 24 | 2 | 2.0 | 7 | 2,750 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 179 | 40 | 151 | 2.73 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
19. | make | price | mpg | rep78 | headroom | trunk | weight |
| Chev. Nova | 3,955 | 19 | 3 | 3.5 | 13 | 3,430 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 197 | 43 | 250 | 2.56 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
20. | make | price | mpg | rep78 | headroom | trunk | weight |
| Dodge Colt | 3,984 | 30 | 5 | 2.0 | 8 | 2,120 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 163 | 35 | 98 | 3.54 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
21. | make | price | mpg | rep78 | headroom | trunk | weight |
| Dodge Diplomat | 4,010 | 18 | 2 | 4.0 | 17 | 3,600 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 206 | 46 | 318 | 2.47 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
22. | make | price | mpg | rep78 | headroom | trunk | weight |
| Dodge Magnum | 5,886 | 16 | 2 | 4.0 | 17 | 3,600 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 206 | 46 | 318 | 2.47 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
23. | make | price | mpg | rep78 | headroom | trunk | weight |
| Dodge St. Regis | 6,342 | 17 | 2 | 4.5 | 21 | 3,740 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 220 | 46 | 225 | 2.94 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
24. | make | price | mpg | rep78 | headroom | trunk | weight |
| Ford Fiesta | 4,389 | 28 | 4 | 1.5 | 9 | 1,800 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 147 | 33 | 98 | 3.15 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
25. | make | price | mpg | rep78 | headroom | trunk | weight |
| Ford Mustang | 4,187 | 21 | 3 | 2.0 | 10 | 2,650 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 179 | 43 | 140 | 3.08 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
26. | make | price | mpg | rep78 | headroom | trunk | weight |
| Linc. Continental | 11,497 | 12 | 3 | 3.5 | 22 | 4,840 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 233 | 51 | 400 | 2.47 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
27. | make | price | mpg | rep78 | headroom | trunk | weight |
| Linc. Mark V | 13,594 | 12 | 3 | 2.5 | 18 | 4,720 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 230 | 48 | 400 | 2.47 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
28. | make | price | mpg | rep78 | headroom | trunk | weight |
| Linc. Versailles | 13,466 | 14 | 3 | 3.5 | 15 | 3,830 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 201 | 41 | 302 | 2.47 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
29. | make | price | mpg | rep78 | headroom | trunk | weight |
| Merc. Bobcat | 3,829 | 22 | 4 | 3.0 | 9 | 2,580 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 169 | 39 | 140 | 2.73 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
30. | make | price | mpg | rep78 | headroom | trunk | weight |
| Merc. Cougar | 5,379 | 14 | 4 | 3.5 | 16 | 4,060 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 221 | 48 | 302 | 2.75 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
31. | make | price | mpg | rep78 | headroom | trunk | weight |
| Merc. Marquis | 6,165 | 15 | 3 | 3.5 | 23 | 3,720 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 212 | 44 | 302 | 2.26 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
32. | make | price | mpg | rep78 | headroom | trunk | weight |
| Merc. Monarch | 4,516 | 18 | 3 | 3.0 | 15 | 3,370 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 198 | 41 | 250 | 2.43 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
33. | make | price | mpg | rep78 | headroom | trunk | weight |
| Merc. XR-7 | 6,303 | 14 | 4 | 3.0 | 16 | 4,130 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 217 | 45 | 302 | 2.75 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
34. | make | price | mpg | rep78 | headroom | trunk | weight |
| Merc. Zephyr | 3,291 | 20 | 3 | 3.5 | 17 | 2,830 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 195 | 43 | 140 | 3.08 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
35. | make | price | mpg | rep78 | headroom | trunk | weight |
| Olds 98 | 8,814 | 21 | 4 | 4.0 | 20 | 4,060 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 220 | 43 | 350 | 2.41 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
36. | make | price | mpg | rep78 | headroom | trunk | weight |
| Olds Cutl Supr | 5,172 | 19 | 3 | 2.0 | 16 | 3,310 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 198 | 42 | 231 | 2.93 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
37. | make | price | mpg | rep78 | headroom | trunk | weight |
| Olds Cutlass | 4,733 | 19 | 3 | 4.5 | 16 | 3,300 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 198 | 42 | 231 | 2.93 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
38. | make | price | mpg | rep78 | headroom | trunk | weight |
| Olds Delta 88 | 4,890 | 18 | 4 | 4.0 | 20 | 3,690 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 218 | 42 | 231 | 2.73 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
39. | make | price | mpg | rep78 | headroom | trunk | weight |
| Olds Omega | 4,181 | 19 | 3 | 4.5 | 14 | 3,370 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 200 | 43 | 231 | 3.08 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
40. | make | price | mpg | rep78 | headroom | trunk | weight |
| Olds Starfire | 4,195 | 24 | 1 | 2.0 | 10 | 2,730 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 180 | 40 | 151 | 2.73 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
41. | make | price | mpg | rep78 | headroom | trunk | weight |
| Olds Toronado | 10,371 | 16 | 3 | 3.5 | 17 | 4,030 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 206 | 43 | 350 | 2.41 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
42. | make | price | mpg | rep78 | headroom | trunk | weight |
| Plym. Arrow | 4,647 | 28 | 3 | 2.0 | 11 | 3,260 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 170 | 37 | 156 | 3.05 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
43. | make | price | mpg | rep78 | headroom | trunk | weight |
| Plym. Champ | 4,425 | 34 | 5 | 2.5 | 11 | 1,800 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 157 | 37 | 86 | 2.97 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
44. | make | price | mpg | rep78 | headroom | trunk | weight |
| Plym. Horizon | 4,482 | 25 | 3 | 4.0 | 17 | 2,200 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 165 | 36 | 105 | 3.37 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
45. | make | price | mpg | rep78 | headroom | trunk | weight |
| Plym. Sapporo | 6,486 | 26 | . | 1.5 | 8 | 2,520 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 182 | 38 | 119 | 3.54 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
46. | make | price | mpg | rep78 | headroom | trunk | weight |
| Plym. Volare | 4,060 | 18 | 2 | 5.0 | 16 | 3,330 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 201 | 44 | 225 | 3.23 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
47. | make | price | mpg | rep78 | headroom | trunk | weight |
| Pont. Catalina | 5,798 | 18 | 4 | 4.0 | 20 | 3,700 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 214 | 42 | 231 | 2.73 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
48. | make | price | mpg | rep78 | headroom | trunk | weight |
| Pont. Firebird | 4,934 | 18 | 1 | 1.5 | 7 | 3,470 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 198 | 42 | 231 | 3.08 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
49. | make | price | mpg | rep78 | headroom | trunk | weight |
| Pont. Grand Prix | 5,222 | 19 | 3 | 2.0 | 16 | 3,210 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 201 | 45 | 231 | 2.93 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
50. | make | price | mpg | rep78 | headroom | trunk | weight |
| Pont. Le Mans | 4,723 | 19 | 3 | 3.5 | 17 | 3,200 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 199 | 40 | 231 | 2.93 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
51. | make | price | mpg | rep78 | headroom | trunk | weight |
| Pont. Phoenix | 4,424 | 19 | . | 3.5 | 13 | 3,420 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 203 | 43 | 231 | 3.08 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
52. | make | price | mpg | rep78 | headroom | trunk | weight |
| Pont. Sunbird | 4,172 | 24 | 2 | 2.0 | 7 | 2,690 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 179 | 41 | 151 | 2.73 | Domestic |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
53. | make | price | mpg | rep78 | headroom | trunk | weight |
| Audi 5000 | 9,690 | 17 | 5 | 3.0 | 15 | 2,830 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 189 | 37 | 131 | 3.20 | Foreign |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
54. | make | price | mpg | rep78 | headroom | trunk | weight |
| Audi Fox | 6,295 | 23 | 3 | 2.5 | 11 | 2,070 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 174 | 36 | 97 | 3.70 | Foreign |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
55. | make | price | mpg | rep78 | headroom | trunk | weight |
| BMW 320i | 9,735 | 25 | 4 | 2.5 | 12 | 2,650 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 177 | 34 | 121 | 3.64 | Foreign |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
56. | make | price | mpg | rep78 | headroom | trunk | weight |
| Datsun 200 | 6,229 | 23 | 4 | 1.5 | 6 | 2,370 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 170 | 35 | 119 | 3.89 | Foreign |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
57. | make | price | mpg | rep78 | headroom | trunk | weight |
| Datsun 210 | 4,589 | 35 | 5 | 2.0 | 8 | 2,020 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 165 | 32 | 85 | 3.70 | Foreign |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
58. | make | price | mpg | rep78 | headroom | trunk | weight |
| Datsun 510 | 5,079 | 24 | 4 | 2.5 | 8 | 2,280 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 170 | 34 | 119 | 3.54 | Foreign |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
59. | make | price | mpg | rep78 | headroom | trunk | weight |
| Datsun 810 | 8,129 | 21 | 4 | 2.5 | 8 | 2,750 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 184 | 38 | 146 | 3.55 | Foreign |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
60. | make | price | mpg | rep78 | headroom | trunk | weight |
| Fiat Strada | 4,296 | 21 | 3 | 2.5 | 16 | 2,130 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 161 | 36 | 105 | 3.37 | Foreign |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
61. | make | price | mpg | rep78 | headroom | trunk | weight |
| Honda Accord | 5,799 | 25 | 5 | 3.0 | 10 | 2,240 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 172 | 36 | 107 | 3.05 | Foreign |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
62. | make | price | mpg | rep78 | headroom | trunk | weight |
| Honda Civic | 4,499 | 28 | 4 | 2.5 | 5 | 1,760 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 149 | 34 | 91 | 3.30 | Foreign |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
63. | make | price | mpg | rep78 | headroom | trunk | weight |
| Mazda GLC | 3,995 | 30 | 4 | 3.5 | 11 | 1,980 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 154 | 33 | 86 | 3.73 | Foreign |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
64. | make | price | mpg | rep78 | headroom | trunk | weight |
| Peugeot 604 | 12,990 | 14 | . | 3.5 | 14 | 3,420 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 192 | 38 | 163 | 3.58 | Foreign |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
65. | make | price | mpg | rep78 | headroom | trunk | weight |
| Renault Le Car | 3,895 | 26 | 3 | 3.0 | 10 | 1,830 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 142 | 34 | 79 | 3.72 | Foreign |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
66. | make | price | mpg | rep78 | headroom | trunk | weight |
| Subaru | 3,798 | 35 | 5 | 2.5 | 11 | 2,050 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 164 | 36 | 97 | 3.81 | Foreign |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
67. | make | price | mpg | rep78 | headroom | trunk | weight |
| Toyota Celica | 5,899 | 18 | 5 | 2.5 | 14 | 2,410 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 174 | 36 | 134 | 3.06 | Foreign |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
68. | make | price | mpg | rep78 | headroom | trunk | weight |
| Toyota Corolla | 3,748 | 31 | 5 | 3.0 | 9 | 2,200 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 165 | 35 | 97 | 3.21 | Foreign |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
69. | make | price | mpg | rep78 | headroom | trunk | weight |
| Toyota Corona | 5,719 | 18 | 5 | 2.0 | 11 | 2,670 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 175 | 36 | 134 | 3.05 | Foreign |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
70. | make | price | mpg | rep78 | headroom | trunk | weight |
| VW Dasher | 7,140 | 23 | 4 | 2.5 | 12 | 2,160 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 172 | 36 | 97 | 3.74 | Foreign |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
71. | make | price | mpg | rep78 | headroom | trunk | weight |
| VW Diesel | 5,397 | 41 | 5 | 3.0 | 15 | 2,040 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 155 | 35 | 90 | 3.78 | Foreign |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
72. | make | price | mpg | rep78 | headroom | trunk | weight |
| VW Rabbit | 4,697 | 25 | 4 | 3.0 | 15 | 1,930 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 155 | 35 | 89 | 3.78 | Foreign |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
73. | make | price | mpg | rep78 | headroom | trunk | weight |
| VW Scirocco | 6,850 | 25 | 4 | 2.0 | 16 | 1,990 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 156 | 36 | 97 | 3.78 | Foreign |
+----------------------------------------------------------------------+
+----------------------------------------------------------------------+
74. | make | price | mpg | rep78 | headroom | trunk | weight |
| Volvo 260 | 11,995 | 17 | 5 | 2.5 | 14 | 3,170 |
|----------------------------------------------------------------------|
| length | turn | displa~t | gear_r~o | foreign |
| 193 | 37 | 163 | 2.98 | Foreign |
+----------------------------------------------------------------------+
Listing one or more variables after a command tells the command it should only act on the variables listed:
| make price mpg |
|----------------------------------|
1. | AMC Concord 4,099 22 |
2. | AMC Pacer 4,749 17 |
3. | AMC Spirit 3,799 22 |
4. | Buick Century 4,816 20 |
5. | Buick Electra 7,827 15 |
|----------------------------------|
6. | Buick LeSabre 5,788 18 |
7. | Buick Opel 4,453 26 |
8. | Buick Regal 5,189 20 |
9. | Buick Riviera 10,372 16 |
10. | Buick Skylark 4,082 19 |
|----------------------------------|
11. | Cad. Deville 11,385 14 |
12. | Cad. Eldorado 14,500 14 |
13. | Cad. Seville 15,906 21 |
14. | Chev. Chevette 3,299 29 |
15. | Chev. Impala 5,705 16 |
|----------------------------------|
16. | Chev. Malibu 4,504 22 |
17. | Chev. Monte Carlo 5,104 22 |
18. | Chev. Monza 3,667 24 |
19. | Chev. Nova 3,955 19 |
20. | Dodge Colt 3,984 30 |
|----------------------------------|
21. | Dodge Diplomat 4,010 18 |
22. | Dodge Magnum 5,886 16 |
23. | Dodge St. Regis 6,342 17 |
24. | Ford Fiesta 4,389 28 |
25. | Ford Mustang 4,187 21 |
|----------------------------------|
26. | Linc. Continental 11,497 12 |
27. | Linc. Mark V 13,594 12 |
28. | Linc. Versailles 13,466 14 |
29. | Merc. Bobcat 3,829 22 |
30. | Merc. Cougar 5,379 14 |
|----------------------------------|
31. | Merc. Marquis 6,165 15 |
32. | Merc. Monarch 4,516 18 |
33. | Merc. XR-7 6,303 14 |
34. | Merc. Zephyr 3,291 20 |
35. | Olds 98 8,814 21 |
|----------------------------------|
36. | Olds Cutl Supr 5,172 19 |
37. | Olds Cutlass 4,733 19 |
38. | Olds Delta 88 4,890 18 |
39. | Olds Omega 4,181 19 |
40. | Olds Starfire 4,195 24 |
|----------------------------------|
41. | Olds Toronado 10,371 16 |
42. | Plym. Arrow 4,647 28 |
43. | Plym. Champ 4,425 34 |
44. | Plym. Horizon 4,482 25 |
45. | Plym. Sapporo 6,486 26 |
|----------------------------------|
46. | Plym. Volare 4,060 18 |
47. | Pont. Catalina 5,798 18 |
48. | Pont. Firebird 4,934 18 |
49. | Pont. Grand Prix 5,222 19 |
50. | Pont. Le Mans 4,723 19 |
|----------------------------------|
51. | Pont. Phoenix 4,424 19 |
52. | Pont. Sunbird 4,172 24 |
53. | Audi 5000 9,690 17 |
54. | Audi Fox 6,295 23 |
55. | BMW 320i 9,735 25 |
|----------------------------------|
56. | Datsun 200 6,229 23 |
57. | Datsun 210 4,589 35 |
58. | Datsun 510 5,079 24 |
59. | Datsun 810 8,129 21 |
60. | Fiat Strada 4,296 21 |
|----------------------------------|
61. | Honda Accord 5,799 25 |
62. | Honda Civic 4,499 28 |
63. | Mazda GLC 3,995 30 |
64. | Peugeot 604 12,990 14 |
65. | Renault Le Car 3,895 26 |
|----------------------------------|
66. | Subaru 3,798 35 |
67. | Toyota Celica 5,899 18 |
68. | Toyota Corolla 3,748 31 |
69. | Toyota Corona 5,719 18 |
70. | VW Dasher 7,140 23 |
|----------------------------------|
71. | VW Diesel 5,397 41 |
72. | VW Rabbit 4,697 25 |
73. | VW Scirocco 6,850 25 |
74. | Volvo 260 11,995 17 |
+----------------------------------+
auto |>
select(make,price,mpg)
make | price | mpg |
---|---|---|
AMC Concord | 4099 | 22 |
AMC Pacer | 4749 | 17 |
AMC Spirit | 3799 | 22 |
Buick Century | 4816 | 20 |
Buick Electra | 7827 | 15 |
Buick LeSabre | 5788 | 18 |
Buick Opel | 4453 | 26 |
Buick Regal | 5189 | 20 |
Buick Riviera | 10372 | 16 |
Buick Skylark | 4082 | 19 |
Cad. Deville | 11385 | 14 |
Cad. Eldorado | 14500 | 14 |
Cad. Seville | 15906 | 21 |
Chev. Chevette | 3299 | 29 |
Chev. Impala | 5705 | 16 |
Chev. Malibu | 4504 | 22 |
Chev. Monte Carlo | 5104 | 22 |
Chev. Monza | 3667 | 24 |
Chev. Nova | 3955 | 19 |
Dodge Colt | 3984 | 30 |
Dodge Diplomat | 4010 | 18 |
Dodge Magnum | 5886 | 16 |
Dodge St. Regis | 6342 | 17 |
Ford Fiesta | 4389 | 28 |
Ford Mustang | 4187 | 21 |
Linc. Continental | 11497 | 12 |
Linc. Mark V | 13594 | 12 |
Linc. Versailles | 13466 | 14 |
Merc. Bobcat | 3829 | 22 |
Merc. Cougar | 5379 | 14 |
Merc. Marquis | 6165 | 15 |
Merc. Monarch | 4516 | 18 |
Merc. XR-7 | 6303 | 14 |
Merc. Zephyr | 3291 | 20 |
Olds 98 | 8814 | 21 |
Olds Cutl Supr | 5172 | 19 |
Olds Cutlass | 4733 | 19 |
Olds Delta 88 | 4890 | 18 |
Olds Omega | 4181 | 19 |
Olds Starfire | 4195 | 24 |
Olds Toronado | 10371 | 16 |
Plym. Arrow | 4647 | 28 |
Plym. Champ | 4425 | 34 |
Plym. Horizon | 4482 | 25 |
Plym. Sapporo | 6486 | 26 |
Plym. Volare | 4060 | 18 |
Pont. Catalina | 5798 | 18 |
Pont. Firebird | 4934 | 18 |
Pont. Grand Prix | 5222 | 19 |
Pont. Le Mans | 4723 | 19 |
Pont. Phoenix | 4424 | 19 |
Pont. Sunbird | 4172 | 24 |
Audi 5000 | 9690 | 17 |
Audi Fox | 6295 | 23 |
BMW 320i | 9735 | 25 |
Datsun 200 | 6229 | 23 |
Datsun 210 | 4589 | 35 |
Datsun 510 | 5079 | 24 |
Datsun 810 | 8129 | 21 |
Fiat Strada | 4296 | 21 |
Honda Accord | 5799 | 25 |
Honda Civic | 4499 | 28 |
Mazda GLC | 3995 | 30 |
Peugeot 604 | 12990 | 14 |
Renault Le Car | 3895 | 26 |
Subaru | 3798 | 35 |
Toyota Celica | 5899 | 18 |
Toyota Corolla | 3748 | 31 |
Toyota Corona | 5719 | 18 |
VW Dasher | 7140 | 23 |
VW Diesel | 5397 | 41 |
VW Rabbit | 4697 | 25 |
VW Scirocco | 6850 | 25 |
Volvo 260 | 11995 | 17 |
An if condition tells a command which observations it should act on. It will only act on those observations where the condition is true. This allows you to do things with subsets of the data. An if condition comes after a variable list:
| make foreign |
|--------------------------|
53. | Audi 5000 Foreign |
54. | Audi Fox Foreign |
55. | BMW 320i Foreign |
56. | Datsun 200 Foreign |
57. | Datsun 210 Foreign |
|--------------------------|
58. | Datsun 510 Foreign |
59. | Datsun 810 Foreign |
60. | Fiat Strada Foreign |
61. | Honda Accord Foreign |
62. | Honda Civic Foreign |
|--------------------------|
63. | Mazda GLC Foreign |
64. | Peugeot 604 Foreign |
65. | Renault Le Car Foreign |
66. | Subaru Foreign |
67. | Toyota Celica Foreign |
|--------------------------|
68. | Toyota Corolla Foreign |
69. | Toyota Corona Foreign |
70. | VW Dasher Foreign |
71. | VW Diesel Foreign |
72. | VW Rabbit Foreign |
|--------------------------|
73. | VW Scirocco Foreign |
74. | Volvo 260 Foreign |
+--------------------------+
auto |>
select(make,foreign) |>
filter(foreign ==1)
make | foreign |
---|---|
Audi 5000 | 1 |
Audi Fox | 1 |
BMW 320i | 1 |
Datsun 200 | 1 |
Datsun 210 | 1 |
Datsun 510 | 1 |
Datsun 810 | 1 |
Fiat Strada | 1 |
Honda Accord | 1 |
Honda Civic | 1 |
Mazda GLC | 1 |
Peugeot 604 | 1 |
Renault Le Car | 1 |
Subaru | 1 |
Toyota Celica | 1 |
Toyota Corolla | 1 |
Toyota Corona | 1 |
VW Dasher | 1 |
VW Diesel | 1 |
VW Rabbit | 1 |
VW Scirocco | 1 |
Volvo 260 | 1 |
Note the two equals signs! In Stata you use one equals sign when you’re setting something equal to something else (see Creating and Changing Variables) and two equals signs when you’re asking if two things are equal. Other operators you can use are:
!
all by itself means “not”
and reverses whatever condition follows it.
Internally, Stata equates true and false with one and zero. That means you can write:
| make |
|----------------|
53. | Audi 5000 |
54. | Audi Fox |
55. | BMW 320i |
56. | Datsun 200 |
57. | Datsun 210 |
|----------------|
58. | Datsun 510 |
59. | Datsun 810 |
60. | Fiat Strada |
61. | Honda Accord |
62. | Honda Civic |
|----------------|
63. | Mazda GLC |
64. | Peugeot 604 |
65. | Renault Le Car |
66. | Subaru |
67. | Toyota Celica |
|----------------|
68. | Toyota Corolla |
69. | Toyota Corona |
70. | VW Dasher |
71. | VW Diesel |
72. | VW Rabbit |
|----------------|
73. | VW Scirocco |
74. | Volvo 260 |
+----------------+
| make |
|-------------------|
1. | AMC Concord |
2. | AMC Pacer |
3. | AMC Spirit |
4. | Buick Century |
5. | Buick Electra |
|-------------------|
6. | Buick LeSabre |
7. | Buick Opel |
8. | Buick Regal |
9. | Buick Riviera |
10. | Buick Skylark |
|-------------------|
11. | Cad. Deville |
12. | Cad. Eldorado |
13. | Cad. Seville |
14. | Chev. Chevette |
15. | Chev. Impala |
|-------------------|
16. | Chev. Malibu |
17. | Chev. Monte Carlo |
18. | Chev. Monza |
19. | Chev. Nova |
20. | Dodge Colt |
|-------------------|
21. | Dodge Diplomat |
22. | Dodge Magnum |
23. | Dodge St. Regis |
24. | Ford Fiesta |
25. | Ford Mustang |
|-------------------|
26. | Linc. Continental |
27. | Linc. Mark V |
28. | Linc. Versailles |
29. | Merc. Bobcat |
30. | Merc. Cougar |
|-------------------|
31. | Merc. Marquis |
32. | Merc. Monarch |
33. | Merc. XR-7 |
34. | Merc. Zephyr |
35. | Olds 98 |
|-------------------|
36. | Olds Cutl Supr |
37. | Olds Cutlass |
38. | Olds Delta 88 |
39. | Olds Omega |
40. | Olds Starfire |
|-------------------|
41. | Olds Toronado |
42. | Plym. Arrow |
43. | Plym. Champ |
44. | Plym. Horizon |
45. | Plym. Sapporo |
|-------------------|
46. | Plym. Volare |
47. | Pont. Catalina |
48. | Pont. Firebird |
49. | Pont. Grand Prix |
50. | Pont. Le Mans |
|-------------------|
51. | Pont. Phoenix |
52. | Pont. Sunbird |
+-------------------+
You can combine conditions with & (logical and) or | (logical or)
. The character used for logical or is called the “pipe” character and you type it by pressing Shift-Backslash, the key right above Enter. Try:
| make price mpg |
|------------------------------|
7. | Buick Opel 4,453 26 |
14. | Chev. Chevette 3,299 29 |
20. | Dodge Colt 3,984 30 |
24. | Ford Fiesta 4,389 28 |
42. | Plym. Arrow 4,647 28 |
|------------------------------|
43. | Plym. Champ 4,425 34 |
57. | Datsun 210 4,589 35 |
62. | Honda Civic 4,499 28 |
63. | Mazda GLC 3,995 30 |
65. | Renault Le Car 3,895 26 |
|------------------------------|
66. | Subaru 3,798 35 |
68. | Toyota Corolla 3,748 31 |
+------------------------------+
auto |>
filter(mpg>25 & price<5000) |>
select(make,price,mpg)
make | price | mpg |
---|---|---|
Buick Opel | 4453 | 26 |
Chev. Chevette | 3299 | 29 |
Dodge Colt | 3984 | 30 |
Ford Fiesta | 4389 | 28 |
Plym. Arrow | 4647 | 28 |
Plym. Champ | 4425 | 34 |
Datsun 210 | 4589 | 35 |
Honda Civic | 4499 | 28 |
Mazda GLC | 3995 | 30 |
Renault Le Car | 3895 | 26 |
Subaru | 3798 | 35 |
Toyota Corolla | 3748 | 31 |
| make price mpg |
|------------------------------|
1. | AMC Concord 4,099 22 |
2. | AMC Pacer 4,749 17 |
3. | AMC Spirit 3,799 22 |
4. | Buick Century 4,816 20 |
7. | Buick Opel 4,453 26 |
|------------------------------|
10. | Buick Skylark 4,082 19 |
14. | Chev. Chevette 3,299 29 |
16. | Chev. Malibu 4,504 22 |
18. | Chev. Monza 3,667 24 |
19. | Chev. Nova 3,955 19 |
|------------------------------|
20. | Dodge Colt 3,984 30 |
21. | Dodge Diplomat 4,010 18 |
24. | Ford Fiesta 4,389 28 |
25. | Ford Mustang 4,187 21 |
29. | Merc. Bobcat 3,829 22 |
|------------------------------|
32. | Merc. Monarch 4,516 18 |
34. | Merc. Zephyr 3,291 20 |
37. | Olds Cutlass 4,733 19 |
38. | Olds Delta 88 4,890 18 |
39. | Olds Omega 4,181 19 |
|------------------------------|
40. | Olds Starfire 4,195 24 |
42. | Plym. Arrow 4,647 28 |
43. | Plym. Champ 4,425 34 |
44. | Plym. Horizon 4,482 25 |
45. | Plym. Sapporo 6,486 26 |
|------------------------------|
46. | Plym. Volare 4,060 18 |
48. | Pont. Firebird 4,934 18 |
50. | Pont. Le Mans 4,723 19 |
51. | Pont. Phoenix 4,424 19 |
52. | Pont. Sunbird 4,172 24 |
|------------------------------|
57. | Datsun 210 4,589 35 |
60. | Fiat Strada 4,296 21 |
62. | Honda Civic 4,499 28 |
63. | Mazda GLC 3,995 30 |
65. | Renault Le Car 3,895 26 |
|------------------------------|
66. | Subaru 3,798 35 |
68. | Toyota Corolla 3,748 31 |
71. | VW Diesel 5,397 41 |
72. | VW Rabbit 4,697 25 |
+------------------------------+
auto |>
filter(mpg>25 | price<5000) |>
select(make,price,mpg)
make | price | mpg |
---|---|---|
AMC Concord | 4099 | 22 |
AMC Pacer | 4749 | 17 |
AMC Spirit | 3799 | 22 |
Buick Century | 4816 | 20 |
Buick Opel | 4453 | 26 |
Buick Skylark | 4082 | 19 |
Chev. Chevette | 3299 | 29 |
Chev. Malibu | 4504 | 22 |
Chev. Monza | 3667 | 24 |
Chev. Nova | 3955 | 19 |
Dodge Colt | 3984 | 30 |
Dodge Diplomat | 4010 | 18 |
Ford Fiesta | 4389 | 28 |
Ford Mustang | 4187 | 21 |
Merc. Bobcat | 3829 | 22 |
Merc. Monarch | 4516 | 18 |
Merc. Zephyr | 3291 | 20 |
Olds Cutlass | 4733 | 19 |
Olds Delta 88 | 4890 | 18 |
Olds Omega | 4181 | 19 |
Olds Starfire | 4195 | 24 |
Plym. Arrow | 4647 | 28 |
Plym. Champ | 4425 | 34 |
Plym. Horizon | 4482 | 25 |
Plym. Sapporo | 6486 | 26 |
Plym. Volare | 4060 | 18 |
Pont. Firebird | 4934 | 18 |
Pont. Le Mans | 4723 | 19 |
Pont. Phoenix | 4424 | 19 |
Pont. Sunbird | 4172 | 24 |
Datsun 210 | 4589 | 35 |
Fiat Strada | 4296 | 21 |
Honda Civic | 4499 | 28 |
Mazda GLC | 3995 | 30 |
Renault Le Car | 3895 | 26 |
Subaru | 3798 | 35 |
Toyota Corolla | 3748 | 31 |
VW Diesel | 5397 | 41 |
VW Rabbit | 4697 | 25 |
This shows you cars that get more than 25 miles per gallon or cost less than $5000. A car only needs to meet one of the two conditions to be shown (meeting both is fine too). In set theory terms it is the union of the two sets.
All the conditions to be combined must be complete. If you wanted to list the cars that have a 1 or a 2 for rep78 you should not use:
list make rep78 if rep78==1 | 2 What this does and why is left to the reader, but it’s not what you want. Instead, you should use:
| make rep78 |
|---------------------------|
12. | Cad. Eldorado 2 |
17. | Chev. Monte Carlo 2 |
18. | Chev. Monza 2 |
21. | Dodge Diplomat 2 |
22. | Dodge Magnum 2 |
|---------------------------|
23. | Dodge St. Regis 2 |
40. | Olds Starfire 1 |
46. | Plym. Volare 2 |
48. | Pont. Firebird 1 |
52. | Pont. Sunbird 2 |
+---------------------------+
Options change how a command works. They go after any variable list or if condition, following a comma. The comma means “everything after this is options” so you only type one comma no matter how many options you’re using.
| make foreign |
|-----------------------------|
1. | AMC Concord 0 |
2. | AMC Pacer 0 |
3. | AMC Spirit 0 |
4. | Buick Century 0 |
5. | Buick Electra 0 |
|-----------------------------|
6. | Buick LeSabre 0 |
7. | Buick Opel 0 |
8. | Buick Regal 0 |
9. | Buick Riviera 0 |
10. | Buick Skylark 0 |
|-----------------------------|
11. | Cad. Deville 0 |
12. | Cad. Eldorado 0 |
13. | Cad. Seville 0 |
14. | Chev. Chevette 0 |
15. | Chev. Impala 0 |
|-----------------------------|
16. | Chev. Malibu 0 |
17. | Chev. Monte Carlo 0 |
18. | Chev. Monza 0 |
19. | Chev. Nova 0 |
20. | Dodge Colt 0 |
|-----------------------------|
21. | Dodge Diplomat 0 |
22. | Dodge Magnum 0 |
23. | Dodge St. Regis 0 |
24. | Ford Fiesta 0 |
25. | Ford Mustang 0 |
|-----------------------------|
26. | Linc. Continental 0 |
27. | Linc. Mark V 0 |
28. | Linc. Versailles 0 |
29. | Merc. Bobcat 0 |
30. | Merc. Cougar 0 |
|-----------------------------|
31. | Merc. Marquis 0 |
32. | Merc. Monarch 0 |
33. | Merc. XR-7 0 |
34. | Merc. Zephyr 0 |
35. | Olds 98 0 |
|-----------------------------|
36. | Olds Cutl Supr 0 |
37. | Olds Cutlass 0 |
38. | Olds Delta 88 0 |
39. | Olds Omega 0 |
40. | Olds Starfire 0 |
|-----------------------------|
41. | Olds Toronado 0 |
42. | Plym. Arrow 0 |
43. | Plym. Champ 0 |
44. | Plym. Horizon 0 |
45. | Plym. Sapporo 0 |
|-----------------------------|
46. | Plym. Volare 0 |
47. | Pont. Catalina 0 |
48. | Pont. Firebird 0 |
49. | Pont. Grand Prix 0 |
50. | Pont. Le Mans 0 |
|-----------------------------|
51. | Pont. Phoenix 0 |
52. | Pont. Sunbird 0 |
53. | Audi 5000 1 |
54. | Audi Fox 1 |
55. | BMW 320i 1 |
|-----------------------------|
56. | Datsun 200 1 |
57. | Datsun 210 1 |
58. | Datsun 510 1 |
59. | Datsun 810 1 |
60. | Fiat Strada 1 |
|-----------------------------|
61. | Honda Accord 1 |
62. | Honda Civic 1 |
63. | Mazda GLC 1 |
64. | Peugeot 604 1 |
65. | Renault Le Car 1 |
|-----------------------------|
66. | Subaru 1 |
67. | Toyota Celica 1 |
68. | Toyota Corolla 1 |
69. | Toyota Corona 1 |
70. | VW Dasher 1 |
|-----------------------------|
71. | VW Diesel 1 |
72. | VW Rabbit 1 |
73. | VW Scirocco 1 |
74. | Volvo 260 1 |
+-----------------------------+
Options must always be one word. Here the words “no” and “label” are combined because otherwise Stata would think they were two different options.
Note that browse has very few options (nolabel is one of them). If you’ve been replacing list with browse in your code, stick with list for the rest of the chapter.
Many options require additional information, such as a number or a variable they apply to. This additional information goes in parentheses directly after the option name. The string() option tells the list command to truncate string variables after a given number of characters, with the number going in the parentheses:
| make |
|---------|
1. | AMC C.. |
2. | AMC P.. |
3. | AMC S.. |
4. | Buick.. |
5. | Buick.. |
|---------|
6. | Buick.. |
7. | Buick.. |
8. | Buick.. |
9. | Buick.. |
10. | Buick.. |
|---------|
11. | Cad. .. |
12. | Cad. .. |
13. | Cad. .. |
14. | Chev... |
15. | Chev... |
|---------|
16. | Chev... |
17. | Chev... |
18. | Chev... |
19. | Chev... |
20. | Dodge.. |
|---------|
21. | Dodge.. |
22. | Dodge.. |
23. | Dodge.. |
24. | Ford .. |
25. | Ford .. |
|---------|
26. | Linc... |
27. | Linc... |
28. | Linc... |
29. | Merc... |
30. | Merc... |
|---------|
31. | Merc... |
32. | Merc... |
33. | Merc... |
34. | Merc... |
35. | Olds 98 |
|---------|
36. | Olds .. |
37. | Olds .. |
38. | Olds .. |
39. | Olds .. |
40. | Olds .. |
|---------|
41. | Olds .. |
42. | Plym... |
43. | Plym... |
44. | Plym... |
45. | Plym... |
|---------|
46. | Plym... |
47. | Pont... |
48. | Pont... |
49. | Pont... |
50. | Pont... |
|---------|
51. | Pont... |
52. | Pont... |
53. | Audi .. |
54. | Audi .. |
55. | BMW 3.. |
|---------|
56. | Datsu.. |
57. | Datsu.. |
58. | Datsu.. |
59. | Datsu.. |
60. | Fiat .. |
|---------|
61. | Honda.. |
62. | Honda.. |
63. | Mazda.. |
64. | Peuge.. |
65. | Renau.. |
|---------|
66. | Subaru |
67. | Toyot.. |
68. | Toyot.. |
69. | Toyot.. |
70. | VW Da.. |
|---------|
71. | VW Di.. |
72. | VW Ra.. |
73. | VW Sc.. |
74. | Volvo.. |
+---------+
By groups allows you to execute a command separately for subgroups within your data. Try:
-> foreign = Domestic
+-------------------+
| make |
|-------------------|
1. | AMC Concord |
2. | AMC Pacer |
3. | AMC Spirit |
4. | Buick Century |
5. | Buick Electra |
|-------------------|
6. | Buick LeSabre |
7. | Buick Opel |
8. | Buick Regal |
9. | Buick Riviera |
10. | Buick Skylark |
|-------------------|
11. | Cad. Deville |
12. | Cad. Eldorado |
13. | Cad. Seville |
14. | Chev. Chevette |
15. | Chev. Impala |
|-------------------|
16. | Chev. Malibu |
17. | Chev. Monte Carlo |
18. | Chev. Monza |
19. | Chev. Nova |
20. | Dodge Colt |
|-------------------|
21. | Dodge Diplomat |
22. | Dodge Magnum |
23. | Dodge St. Regis |
24. | Ford Fiesta |
25. | Ford Mustang |
|-------------------|
26. | Linc. Continental |
27. | Linc. Mark V |
28. | Linc. Versailles |
29. | Merc. Bobcat |
30. | Merc. Cougar |
|-------------------|
31. | Merc. Marquis |
32. | Merc. Monarch |
33. | Merc. XR-7 |
34. | Merc. Zephyr |
35. | Olds 98 |
|-------------------|
36. | Olds Cutl Supr |
37. | Olds Cutlass |
38. | Olds Delta 88 |
39. | Olds Omega |
40. | Olds Starfire |
|-------------------|
41. | Olds Toronado |
42. | Plym. Arrow |
43. | Plym. Champ |
44. | Plym. Horizon |
45. | Plym. Sapporo |
|-------------------|
46. | Plym. Volare |
47. | Pont. Catalina |
48. | Pont. Firebird |
49. | Pont. Grand Prix |
50. | Pont. Le Mans |
|-------------------|
51. | Pont. Phoenix |
52. | Pont. Sunbird |
+-------------------+
-------------------------------------------------------------------------------
-> foreign = Foreign
+----------------+
| make |
|----------------|
1. | Audi 5000 |
2. | Audi Fox |
3. | BMW 320i |
4. | Datsun 200 |
5. | Datsun 210 |
|----------------|
6. | Datsun 510 |
7. | Datsun 810 |
8. | Fiat Strada |
9. | Honda Accord |
10. | Honda Civic |
|----------------|
11. | Mazda GLC |
12. | Peugeot 604 |
13. | Renault Le Car |
14. | Subaru |
15. | Toyota Celica |
|----------------|
16. | Toyota Corolla |
17. | Toyota Corona |
18. | VW Dasher |
19. | VW Diesel |
20. | VW Rabbit |
|----------------|
21. | VW Scirocco |
22. | Volvo 260 |
+----------------+
Identify the unique values of foreign (in this case, 0 and 1 or “Domestic” and “Foreign”) Temporarily split the data set into groups based on their value of foreign Run the subsequent command (list make) separately for each group You’ll see how powerful by is later.
In order for by to work, the data must be sorted by the same variable. You can do that with the sort command:
-> rep78 = 1
+----------------+
| make |
|----------------|
1. | Pont. Firebird |
2. | Olds Starfire |
+----------------+
-------------------------------------------------------------------------------
-> rep78 = 2
+-------------------+
| make |
|-------------------|
1. | Plym. Volare |
2. | Pont. Sunbird |
3. | Dodge Diplomat |
4. | Cad. Eldorado |
5. | Dodge St. Regis |
|-------------------|
6. | Chev. Monza |
7. | Dodge Magnum |
8. | Chev. Monte Carlo |
+-------------------+
-------------------------------------------------------------------------------
-> rep78 = 3
+-------------------+
| make |
|-------------------|
1. | Pont. Grand Prix |
2. | Olds Toronado |
3. | Olds Cutl Supr |
4. | Ford Mustang |
5. | Buick Regal |
|-------------------|
6. | AMC Pacer |
7. | AMC Concord |
8. | Buick Century |
9. | Olds Cutlass |
10. | Linc. Continental |
|-------------------|
11. | Buick LeSabre |
12. | Buick Riviera |
13. | Pont. Le Mans |
14. | Linc. Versailles |
15. | Merc. Zephyr |
|-------------------|
16. | Merc. Monarch |
17. | Cad. Deville |
18. | Merc. Marquis |
19. | Renault Le Car |
20. | Linc. Mark V |
|-------------------|
21. | Chev. Malibu |
22. | Cad. Seville |
23. | Plym. Arrow |
24. | Fiat Strada |
25. | Buick Skylark |
|-------------------|
26. | Chev. Chevette |
27. | Chev. Nova |
28. | Audi Fox |
29. | Olds Omega |
30. | Plym. Horizon |
+-------------------+
-------------------------------------------------------------------------------
-> rep78 = 4
+----------------+
| make |
|----------------|
1. | Olds Delta 88 |
2. | Datsun 200 |
3. | VW Dasher |
4. | Honda Civic |
5. | Ford Fiesta |
|----------------|
6. | Datsun 810 |
7. | Buick Electra |
8. | Merc. Cougar |
9. | Merc. XR-7 |
10. | VW Scirocco |
|----------------|
11. | Olds 98 |
12. | Pont. Catalina |
13. | BMW 320i |
14. | VW Rabbit |
15. | Chev. Impala |
|----------------|
16. | Mazda GLC |
17. | Datsun 510 |
18. | Merc. Bobcat |
+----------------+
-------------------------------------------------------------------------------
-> rep78 = 5
+----------------+
| make |
|----------------|
1. | Audi 5000 |
2. | Subaru |
3. | Volvo 260 |
4. | Dodge Colt |
5. | Toyota Corona |
|----------------|
6. | Honda Accord |
7. | VW Diesel |
8. | Datsun 210 |
9. | Plym. Champ |
10. | Toyota Corolla |
|----------------|
11. | Toyota Celica |
+----------------+
-------------------------------------------------------------------------------
-> rep78 = .
+---------------+
| make |
|---------------|
1. | Buick Opel |
2. | Pont. Phoenix |
3. | Plym. Sapporo |
4. | AMC Spirit |
5. | Peugeot 604 |
+---------------+
summarize (or just sum) gives you summary statistics which will help you understand the distribution of continuous (quantitative) variables. Start by adding sum all by itself to your do file and runing it by pressing Ctrl-d or clicking the “play” button in the top right of your Stata window, then take a look at the output:
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
make | 0
price | 74 6165.257 2949.496 3291 15906
mpg | 74 21.2973 5.785503 12 41
rep78 | 69 3.405797 .9899323 1 5
headroom | 74 2.993243 .8459948 1.5 5
-------------+---------------------------------------------------------
trunk | 74 13.75676 4.277404 5 23
weight | 74 3019.459 777.1936 1760 4840
length | 74 187.9324 22.26634 142 233
turn | 74 39.64865 4.399354 31 51
displacement | 74 197.2973 91.83722 79 425
-------------+---------------------------------------------------------
gear_ratio | 74 3.014865 .4562871 2.19 3.89
foreign | 74 .2972973 .4601885 0 1
skimr::skim(auto)
Name | auto |
Number of rows | 74 |
Number of columns | 12 |
_______________________ | |
Column type frequency: | |
character | 1 |
numeric | 11 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
make | 0 | 1 | 6 | 17 | 0 | 74 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
price | 0 | 1.00 | 6165.26 | 2949.50 | 3291.00 | 4220.25 | 5006.50 | 6332.25 | 15906.00 | ▇▂▁▁▁ |
mpg | 0 | 1.00 | 21.30 | 5.79 | 12.00 | 18.00 | 20.00 | 24.75 | 41.00 | ▅▇▃▂▁ |
rep78 | 5 | 0.93 | 3.41 | 0.99 | 1.00 | 3.00 | 3.00 | 4.00 | 5.00 | ▁▂▇▅▃ |
headroom | 0 | 1.00 | 2.99 | 0.85 | 1.50 | 2.50 | 3.00 | 3.50 | 5.00 | ▅▃▇▃▂ |
trunk | 0 | 1.00 | 13.76 | 4.28 | 5.00 | 10.25 | 14.00 | 16.75 | 23.00 | ▃▇▅▇▃ |
weight | 0 | 1.00 | 3019.46 | 777.19 | 1760.00 | 2250.00 | 3190.00 | 3600.00 | 4840.00 | ▇▅▇▅▂ |
length | 0 | 1.00 | 187.93 | 22.27 | 142.00 | 170.00 | 192.50 | 203.75 | 233.00 | ▃▇▅▇▃ |
turn | 0 | 1.00 | 39.65 | 4.40 | 31.00 | 36.00 | 40.00 | 43.00 | 51.00 | ▅▅▇▂▁ |
displacement | 0 | 1.00 | 197.30 | 91.84 | 79.00 | 119.00 | 196.00 | 245.25 | 425.00 | ▇▂▅▃▁ |
gear_ratio | 0 | 1.00 | 3.01 | 0.46 | 2.19 | 2.73 | 2.96 | 3.35 | 3.89 | ▅▅▇▂▅ |
foreign | 0 | 1.00 | 0.30 | 0.46 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▃ |
This gives basic summary statistics for all the variables in your data set. Note that there is nothing for make: it is a string variable so summary statistics don’t make sense. Also note that for rep78 the number of observations is 69 rather than 74. That’s because five missing values were ignored and the summary statistics calculated over the remaining 69 values of rep78. Most statistical commands take a similar approach to missing values and that’s usually what you want, so you rarely have to include special handing for missing values in statistical commands.
All the syntax elements you learned earlier also work with statistical commands. To get summary statistics for just mpg, give sum a variable list:
skimr::skim(auto |> select(mpg))
Name | select(auto, mpg) |
Number of rows | 74 |
Number of columns | 1 |
_______________________ | |
Column type frequency: | |
numeric | 1 |
________________________ | |
Group variables | None |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
mpg | 0 | 1 | 21.3 | 5.79 | 12 | 18 | 20 | 24.75 | 41 | ▅▇▃▂▁ |
If you want summary statistics for just the foreign cars, add an if condition:
Name | select(filter(auto, forei… |
Number of rows | 22 |
Number of columns | 1 |
_______________________ | |
Column type frequency: | |
numeric | 1 |
________________________ | |
Group variables | None |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
mpg | 0 | 1 | 24.77 | 6.61 | 14 | 21 | 24.5 | 27.5 | 41 | ▆▇▇▃▁ |
If you want summary statistics of mpg for both foreign and domestic cars, calculated separately, use by:
-> foreign = Domestic
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
mpg | 52 19.82692 4.743297 12 34
-------------------------------------------------------------------------------
-> foreign = Foreign
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
mpg | 22 24.77273 6.611187 14 41
The detail (d) option will give more information. Try:
Mileage (mpg)
-------------------------------------------------------------
Percentiles Smallest
1% 12 12
5% 14 12
10% 14 14 Obs 74
25% 18 14 Sum of wgt. 74
50% 20 Mean 21.2973
Largest Std. dev. 5.785503
75% 25 34
90% 29 35 Variance 33.47205
95% 34 35 Skewness .9487176
99% 41 41 Kurtosis 3.975005
tabulate (tab) will create tables of frequencies, which will help you understand the distribution of categorical variables. It can also be useful for string variables that describe categories or groups.
If you give tab a variable list with one variable it will give you a one-way table, while if you give it two variables it will give you a two-way table (i.e. crosstabs). To get an idea of what tab does, add the following to your do file and run it:
janitor::tabyl(auto$rep78)
auto$rep78 | n | percent | valid_percent |
---|---|---|---|
1 | 2 | 0.0270270 | 0.0289855 |
2 | 8 | 0.1081081 | 0.1159420 |
3 | 30 | 0.4054054 | 0.4347826 |
4 | 18 | 0.2432432 | 0.2608696 |
5 | 11 | 0.1486486 | 0.1594203 |
NA | 5 | 0.0675676 | NA |
Repair |
record | Car origin
1978 | Domestic Foreign | Total
-----------+----------------------+----------
1 | 2 0 | 2
2 | 8 0 | 8
3 | 27 3 | 30
4 | 9 9 | 18
5 | 2 9 | 11
-----------+----------------------+----------
Total | 48 21 | 69
The tab command has a rich set of useful options. The missing values of rep78 were not included in the table, which makes it easy to forget they’re there. Add them with the missing option:
Repair |
record 1978 | Freq. Percent Cum.
------------+-----------------------------------
1 | 2 2.70 2.70
2 | 8 10.81 13.51
3 | 30 40.54 54.05
4 | 18 24.32 78.38
5 | 11 14.86 93.24
. | 5 6.76 100.00
------------+-----------------------------------
Total | 74 100.00
To get percentages in a two-way table add the row, column, or cell options:
| Key |
|-------------------|
| frequency |
| row percentage |
| column percentage |
| cell percentage |
+-------------------+
Repair |
record | Car origin
1978 | Domestic Foreign | Total
-----------+----------------------+----------
1 | 2 0 | 2
| 100.00 0.00 | 100.00
| 4.17 0.00 | 2.90
| 2.90 0.00 | 2.90
-----------+----------------------+----------
2 | 8 0 | 8
| 100.00 0.00 | 100.00
| 16.67 0.00 | 11.59
| 11.59 0.00 | 11.59
-----------+----------------------+----------
3 | 27 3 | 30
| 90.00 10.00 | 100.00
| 56.25 14.29 | 43.48
| 39.13 4.35 | 43.48
-----------+----------------------+----------
4 | 9 9 | 18
| 50.00 50.00 | 100.00
| 18.75 42.86 | 26.09
| 13.04 13.04 | 26.09
-----------+----------------------+----------
5 | 2 9 | 11
| 18.18 81.82 | 100.00
| 4.17 42.86 | 15.94
| 2.90 13.04 | 15.94
-----------+----------------------+----------
Total | 48 21 | 69
| 69.57 30.43 | 100.00
| 100.00 100.00 | 100.00
| 69.57 30.43 | 100.00
tab has an option called sum which gives summary statistics for a given variable, calculated over the observations in each cell of the table. Try:
There’s also a chi2 option that runs a chi-squared test on a two-way table:
chisq.test(table(auto$rep78,auto$foreign))
Pearson's Chi-squared test
data: table(auto$rep78, auto$foreign)
X-squared = 27.264, df = 4, p-value = 0.00001758
The primary commands for creating and changing variables are generate (usually abbreviated gen) and replace (which, like other commands that can destroy information, has no abbreviation). gen creates new variables; replace changes the values of existing variables. Their core syntax is identical:
gen variable = expression or
replace variable = expression
where variable is the name of the variable you want to create or change, and expression is the mathematical expression whose result you want to put in it. Expressions can be as simple as a single number or involve all sorts of complicated functions. You can explore what functions are available by typing help functions. If the expression depends on a missing value at any point, the result is missing. Usually this is exactly what you’d expect and want.
The prices in the auto data set are in 1978 dollars, so it might be useful to convert them to January 2024 dollars. To do so you need to multiply the prices by a conversion factor which is the Consumer Price Index in January 2024 divided by the Consumer Price Index in 1978, or about 5. The code will be:
Add this line to your do file, run it, and examine the results with:
If a gen command has an if condition, the resulting variable will (and must) still exist for all observations. However it will be assigned a missing value for observations where the if condition is not true. If a replace command has an if condition, observations where the if condition is not true will be left unchanged. This allows you to set variables to different values for different groups of observations.
Suppose you wanted to collapse the five-point scale of the rep78 variable into a three-point scale. The first step is to lay out exactly how you want to do that in your native language, because if it’s not clear to you you’ll never be able to explain it to Stata. We’ll declare that cars with a rep78 of one or two will get a one for the new variable rep3, cars with a three for rep78 will get a two, and cars with a four or five will get a three.
You can implement that with:
The recode command gives you an alternative way of creating rep3. It is designed solely for recoding tasks and is much less flexible than gen and replace. But it’s very easy to use. The syntax is:
recode var (rule 1) (rule 2) (more rules as needed...), gen(newvar)
The gen option at the end is not required—if it’s not there then the original variable will be changed rather than Stata creating a new variable containing the new values. You can also have recode work on a list of variables, recoding them all in the same way.
The core of the recode command is a list of rules, each in parentheses, that tell it how a variable is to be recoded. They take the form (input_value = output_value). The input_value can be a single number, a list of numbers separated by spaces, or a range of numbers specified with start/end. The output_value will always be a single number. Anything not covered by a rule is left unchanged, so you can use recode to change just a few values of a variable or completely redefine it as we do here.
In creating indicator variables you can take advantage of the fact that Stata treats true as one and false as zero by setting the new variable equal to a condition. Consider:
(The parentheses are optional, but make the command easier to read.) This creates an indicator variable called low_mpg which is one (true) for cars where mpg is less than twenty and zero (false) where mpg is greater than or equal to twenty. To see the results run:
-> low_mpg = 0
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
mpg | 39 25.4359 4.80567 20 41
-------------------------------------------------------------------------------
-> low_mpg = 1
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
mpg | 35 16.68571 2.12508 12 19
You can create and change string variables with gen and replace just like numeric variables. One difference is that string values go in quotes; another is that for a string variable missing is ““, i.e. a string that contains nothing. For example:
(64 missing values generated)
(64 real changes made)
| Summary of Price
cost | Mean Std. dev. Freq.
------------+------------------------------------
High | 12,607.6 1,808.02 10
Low | 5,158.641 1,412.852 64
------------+------------------------------------
Total | 6,165.257 2,949.496 74
You can’t do math with strings, but there are a variety of useful functions for working with them. One of them is strpos(). Given two strings (string values or string variables), it will return the position of the second string within the first string, or a zero if the first string does not contain the second string. This makes it very useful in if conditions. For example, you can select all Volkswagen cars with:
auto |>
filter(str_detect(make ,"VW")) |>
select(make)
make |
---|
VW Dasher |
VW Diesel |
VW Rabbit |
VW Scirocco |
Remember, in an if condition zero is false and anything else is true, so this is equivalent to:
| make |
|-------------|
70. | VW Dasher |
71. | VW Diesel |
72. | VW Rabbit |
73. | VW Scirocco |
+-------------+
Another useful function is word(). Given a string and a number n, word will return the nth word in the string. For example, you can make a variable containing the manufacturer of each car with:
Good labels make your data much easier to understand and work with. While Stata has many kinds of labels, we’ll focus on the most common and most useful: variable labels and value labels.
Variable labels convey information about a variable, and can be a substitute for long variable names. This data set already has a good set of variable labels, as you can see in the Variables window, but let’s make the label on price more specific. The syntax to set a variable label is:
label variable variable_name “label” So type:
You can use the describe command to get information about a variable, including its variable label:
Value labels are used with categorical variables to tell you what the categories mean. We’ve seen one in action with the foreign variable: it was the value labels that told us that a zero means “Domestic” and a one means “Foreign.”
Let’s explore value labels by labeling the values of rep3, the new variable we recoded to collapse rep78 from a five point scale to a three point scale. Value labels are a mapping from a set of integers to a set of text descriptions, so the first step is to define the map. To do so, use the label define command:
label define map_name value1 "label1" value2 "label2"...
Thus:
This creates a mapping called rep_label but does not apply it to anything. Before it does anything useful you have to tell Stata to label the values of the rep3 variable using the rep_label mapping you just defined. The syntax is:
label values variable map
And thus:
To see the results, run:
rep3 | Freq. Percent Cum.
------------+-----------------------------------
Bad | 10 14.49 14.49
Average | 30 43.48 57.97
Good | 29 42.03 100.00
------------+-----------------------------------
Total | 69 100.00
Once a map is defined you can apply it to any number of variables: just replace the single variable in the label values command above with a list of variables. Suppose you’re working with survey data and your variables include the gender of the respondent, the gender of the respondent’s spouse, and the genders of all the respondent’s children. You could define just one map called gender and then use it to label the values of all the gender variables.
Three commands for managing value labels: label dir gives you a list of all the defined labels, and label list tells you what they mean. The describe command we used earlier to see the variable label also tells you the name of the value label associated with the variable (and other useful things).
When you use recode to create a new variable, Stata will automatically create a variable label for it (“RECODE of …”). You can also define value labels for it by putting the desired label for each value at the end of the rule that defines it. Create yet another version of rep3, this time with labels right from its creation, with:
(67 differences between rep78 and rep3c)
we can use import delimited to import the data from CS1policies.csv
(encoding automatically selected: ISO-8859-1)
(4 vars, 1,000 obs)
v1
-------------------------------------------------------------
Percentiles Smallest
1% 10.5 1
5% 50.5 2
10% 100.5 3 Obs 1,000
25% 250.5 4 Sum of wgt. 1,000
50% 500.5 Mean 500.5
Largest Std. dev. 288.8194
75% 750.5 997
90% 900.5 998 Variance 83416.67
95% 950.5 999 Skewness 0
99% 990.5 1000 Kurtosis 1.799998
age
-------------------------------------------------------------
Percentiles Smallest
1% 30 30
5% 31 30
10% 33 30 Obs 1,000
25% 37 30 Sum of wgt. 1,000
50% 44.5 Mean 44.602
Largest Std. dev. 8.459501
75% 52 60
90% 56 60 Variance 71.56316
95% 58 60 Skewness .0442793
99% 60 60 Kurtosis 1.844854
duration
-------------------------------------------------------------
Percentiles Smallest
1% 12.8 12
5% 16.95 12.1
10% 23.4 12.1 Obs 1,000
25% 39.9 12.1 Sum of wgt. 1,000
50% 67.1 Mean 66.7856
Largest Std. dev. 31.45082
75% 93.95 119.8
90% 111.5 119.9 Variance 989.154
95% 115.45 119.9 Skewness -.0036626
99% 118.95 119.9 Kurtosis 1.813458
claimed
-------------------------------------------------------------
Percentiles Smallest
1% 0 0
5% 0 0
10% 0 0 Obs 1,000
25% 0 0 Sum of wgt. 1,000
50% 0 Mean .222
Largest Std. dev. .4157991
75% 0 1
90% 1 1 Variance .1728889
95% 1 1 Skewness 1.337853
99% 1 1 Kurtosis 2.789852
We can type describe to view the contents of the data in memory.
Contains data
Observations: 1,000
Variables: 4
-------------------------------------------------------------------------------
Variable Storage Display Value
name type format label Variable label
-------------------------------------------------------------------------------
v1 int %8.0g
age byte %8.0g
duration float %9.0g
claimed byte %8.0g
-------------------------------------------------------------------------------
Sorted by:
Note: Dataset has changed since last saved.
Codebook gives detailed information for certain variables.
Variable Obs Unique Mean Min Max Label
-------------------------------------------------------------------------------
v1 1000 1000 500.5 1 1000
age 1000 31 44.602 30 60
duration 1000 641 66.7856 12 119.9
claimed 1000 2 .222 0 1
-------------------------------------------------------------------------------
Next we can tabulate the variable ``
claimed | Freq. Percent Cum.
------------+-----------------------------------
0 | 778 77.80 77.80
1 | 222 22.20 100.00
------------+-----------------------------------
Total | 1,000 100.00
Or
| Frequency
--------+-----------
claimed |
0 | 778
1 | 222
Total | 1,000
--------------------
Or
tab1
is used when we have multiple variables to be tabulisedto create a table containing mean values for various numerical variables with respect to a categorical variable we use
table
| duration age
--------+---------------------
claimed |
0 | 72.20386 44.22237
1 | 47.7973 45.93243
Total | 66.7856 44.602
------------------------------
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
age | 1,000 44.602 8.459501 30 60
duration | 1,000 66.7856 31.45082 12 119.9
| v1 age duration claimed |
|-------------------------------|
1. | 1 55 24.3 1 |
2. | 2 45 79.2 0 |
3. | 3 33 77.8 0 |
4. | 4 41 79.3 0 |
5. | 5 53 105 0 |
|-------------------------------|
6. | 6 42 81.2 0 |
7. | 7 44 13 1 |
8. | 8 33 37.1 0 |
9. | 9 59 83.9 0 |
10. | 10 60 67.5 0 |
+-------------------------------+
| v1 age duration claimed |
|-------------------------------|
1. | 1 55 24.3 1 |
2. | 2 45 79.2 0 |
3. | 3 33 77.8 0 |
4. | 4 41 79.3 0 |
5. | 5 53 105 0 |
|-------------------------------|
6. | 6 42 81.2 0 |
7. | 7 44 13 1 |
8. | 8 33 37.1 0 |
9. | 9 59 83.9 0 |
10. | 10 60 67.5 0 |
+-------------------------------+
/* Show first 10 observations of the */
/* first three variables of our data */
list age-duration in 1/10
| age duration |
|----------------|
1. | 55 24.3 |
2. | 45 79.2 |
3. | 33 77.8 |
4. | 41 79.3 |
5. | 53 105 |
|----------------|
6. | 42 81.2 |
7. | 44 13 |
8. | 33 37.1 |
9. | 59 83.9 |
10. | 60 67.5 |
+----------------+
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
v1 | 1,000 500.5 288.8194 1 1000
age | 1,000 44.602 8.459501 30 60
duration | 1,000 66.7856 31.45082 12 119.9
claimed | 1,000 .222 .4157991 0 1
logAge | 1,000 3.779292 .1939759 3.401197 4.094345
We’ll start with data visualizations that show the distribution of a variable.
A histogram will tell you more about the distribution of a continuous variable than summary statistics, and they’re easy to make. Just run:
Stata likes to think of a histogram as an empirical approximation to a probabilithy distribution function, but to get the kind of histogram you learned about in elementary school where the height of the bar is proportional to the number of observation in the bin, add the freq option:
*title("")*
- allows you to give meaningful titles to your barplot*subtitle("")*
- allows for a subtitle below the main title*ytitle("")*
- meaningful title on y-axis*note("")*
- footnotes on graphhistogram age , frequency normal title("Histogram of Age") note("author : Bongani Ncube") ytitle("Age of ....")
(bin=29, start=30, width=1.0344828)
With categorical variables you’re interested in the frequencies. A bar graph won’t show you any more information than you’ll get by using tab to make a frequency table, but you can get a basic understanding of it in a single glance.
With the graph bar command you use the over() option to tell it the variable that defines the bars:
If you have labels for the bars (and you should) there’s a good chance the labels will overlap. This problem is goes away immediately if you use horizontal bars, made with graph hbar:
By default, graph hbar (or graph bar) calculates the percentage of observations in each category. You can change that to frequencies by telling it you want to graph the (count):
You can label the bars with those counts using the blabel(bar) option:
Now it really contains all the same information as a frequency table.
Bar graphs can get tricky. For more on how to create them and make them look presentable, see Bar Graphs in Stata.
Data visualizations are also a great tool for understanding the relationships between two or more variables.
If you have a categorical variable and a continuous variable, one measure of the relationship between them is how the mean of the continous variable varies across categories. graph hbar can do that with (mean) and then the name of the continuous variable:
This can tell you there’s a difference between the categories, but doesn’t tell you much about how the distribution of the continuous variable varies between them. A box plot will tell you more:
Working from the center out: the line in the middle of the box is the median, and the top and bottom of the box is the 75th and 25th percentile respectively. The “whiskers” outside the box go to the upper adjacent and lower adjacent values. (To find the upper/lower adjacent value, take the 75th/25th percentile, add/subtract 1.5 times the difference between the 75th and 25th percentile, and find the largest/smallest value below/above that number.) Observations outside of the whiskers get their own dot.
host not found
http://fmwww.bc.edu/repec/bocode/v/ either
1) is not a valid URL, or
2) could not be contacted, or
3) is not a Stata download site (has no stata.toc file).
r(631);
r(631);
A simple alternative is to create a histogram for each category. You can do this by adding the by() option to the hist command. This is conceptually similar to the by prefix:
Plotting relationships between two categorical variables can be fun, but it gets complicated. See Bar Graphs in Stata.
The classic plot for exploring the relationship between two continous variables is a scatterplot, easily created with scatter:
You can add a third variable to the mix by having it determine the color of the points in a scatterplot. As of Stata 18, this can be done with the colorvar option. By default, Stata will treat the color variable as continuous:
displacement is a measure of the size of a car’s engine. This plot shows us that there is a strong relationship between a car’s gas mileage, its weight, and the size of its engine.
It’s more common to use color to represent a categorical variable. You can tell Stata that the color variable is categorical with the colordiscrete option, but unfortunately you also have to tell it to use a sensible legend for categorical variables with coloruseplegend (don’t ask why it has that name) and to use the value labels in the legend with zlabel(, val).
This shows that foreign cars were generally smaller than domestic cars in 1978, but they frequently have lower gas mileage than domestic cars of comparable size.
A scatterplot is an example of what Stata calls a twoway plot: a plot with a y and x axis. You can combine twoway plots by putting || (two pipe characters) between them. Each additional plot will go on top of the previous plots, like a coat of paint.
An lfit plot plots a linear fit (i.e. a univariate regression) of two variables. (There’s also qfit for quadratic fit, i.e. with a squared term.) Layer it over the scatterplot with:
lfit doesn’t do colorvar(), but you can layer two different lfit plots, one for each subset of the data. This is a good time to use /// to continue a command on the next line:
You’d obviously need to change the labels in the legend before you’d show this to anyone else, and that’s not something we’ll go into. But it’s perfectly adequate to help you understand the relationships between mpg, weight, and foreign.
job (unlabeled)
-------------------------------------------------------------------------------
Type: Numeric (byte)
Range: [1,2] Units: 1
Unique values: 2 Missing .: 0/125
Tabulation: Freq. Value
59 1
66 2
-------------------------------------------------------------------------------
age (unlabeled)
-------------------------------------------------------------------------------
Type: Numeric (byte)
Range: [21,59] Units: 1
Unique values: 36 Missing .: 0/125
Mean: 38.12
Std. dev.: 10.1247
Percentiles: 10% 25% 50% 75% 90%
27 30 37 47 53
-------------------------------------------------------------------------------
ht (unlabeled)
-------------------------------------------------------------------------------
Type: Numeric (float)
Range: [1.52,1.91] Units: .01
Unique values: 14 Missing .: 0/125
Mean: 1.64216
Std. dev.: .06203
Percentiles: 10% 25% 50% 75% 90%
1.57 1.6 1.63 1.68 1.7
-------------------------------------------------------------------------------
wt (unlabeled)
-------------------------------------------------------------------------------
Type: Numeric (float)
Range: [45.9,97.3] Units: .1
Unique values: 66 Missing .: 0/125
Mean: 65.58
Std. dev.: 10.009
Percentiles: 10% 25% 50% 75% 90%
52.3 57.3 65.9 72.7 78.2
-------------------------------------------------------------------------------
triglyc (unlabeled)
-------------------------------------------------------------------------------
Type: Numeric (int)
Range: [63,484] Units: 1
Unique values: 99 Missing .: 0/125
Mean: 197.648
Std. dev.: 84.7462
Percentiles: 10% 25% 50% 75% 90%
98 123 195 255 316
-------------------------------------------------------------------------------
sbp (unlabeled)
-------------------------------------------------------------------------------
Type: Numeric (int)
Range: [100,200] Units: 1
Unique values: 15 Missing .: 0/125
Mean: 128.424
Std. dev.: 18.4929
Percentiles: 10% 25% 50% 75% 90%
110 120 120 136 150
Contains data from bus.dta
Observations: 125
Variables: 6 26 Jan 2019 12:06
-------------------------------------------------------------------------------
Variable Storage Display Value
name type format label Variable label
-------------------------------------------------------------------------------
job byte %9.0g
age byte %9.0g
ht float %9.0g
wt float %9.0g
triglyc int %9.0g
sbp int %9.0g
-------------------------------------------------------------------------------
Sorted by:
job | Freq. Percent Cum.
------------+-----------------------------------
Driver | 59 47.20 47.20
Conductor | 66 52.80 100.00
------------+-----------------------------------
Total | 125 100.00
checking table1 consistency and verifying not already installed...
installing into C:\Users\Admin\ado\plus\...
installation complete.
checking summtab consistency and verifying not already installed...
installing into C:\Users\Admin\ado\plus\...
installation complete.
(125 missing values generated)
(31 real changes made)
(94 real changes made)
What is the code doing?
Variable Storage Display Value
name type format label Variable label
-------------------------------------------------------------------------------
hypertension float %9.0g
hypertension
-> job = Driver
age
-------------------------------------------------------------
Percentiles Smallest
1% 25 25
5% 27 27
10% 29 27 Obs 59
25% 33 28 Sum of wgt. 59
50% 41 Mean 41.40678
Largest Std. dev. 9.399411
75% 48 58
90% 57 58 Variance 88.34892
95% 58 59 Skewness .1813159
99% 59 59 Kurtosis 2.056059
-------------------------------------------------------------------------------
-> job = Conductor
age
-------------------------------------------------------------
Percentiles Smallest
1% 21 21
5% 22 22
10% 25 22 Obs 66
25% 29 22 Sum of wgt. 66
50% 31 Mean 35.18182
Largest Std. dev. 9.907121
75% 41 54
90% 51 54 Variance 98.15105
95% 54 57 Skewness .7353005
99% 59 59 Kurtosis 2.477688
| Factor Level Value |
|------------------------------------------|
| N 125 |
|------------------------------------------|
| age, mean (SD) 38.1 (10.1) |
|------------------------------------------|
| job Driver 59 (47.2%) |
| Conductor 66 (52.8%) |
|------------------------------------------|
| ht, mean (SD) 1.6 (0.1) |
+------------------------------------------+
file /BusTable1.xls saved
summtab , by(hypertension) catvars(job)
contvars(age ht wt) word wordname(
table1_bus) median medfmt(0) total title(
"Table 1: Summary statistics by hypertension status") replace
Must specify either Word or Excel output (or both)
r(198);
r(198);
Example: Stepping Stones
Stepping Stones is a participatory HIV prevention programme that aims to improve sexual health through building more gender-equitable relationships Cluster Randomized Trial was conducted among young rural men and women in the Eastern Cape Province in South Africa to assess impact of Stepping Stones on HIV and HSV 2 incidence and sexual practices .The 70 study clusters comprised 64 villages and six townships Clusters grouped into seven strata, one stratum comprised the townships and six were villages grouped according to proximity to particular roads
Within each stratum, equal numbers of clusters were allocated to the two study arms Intervention arm in which participants were given the 13 Stepping Stones sessions over a period of three months Control arm in which participants were given a single 3 hour session on HIV prevention
In each cluster recruited about 20 men and 20 women .Those eligible were aged 16 – 24, resident in village where they were at school, and mature enough to understand the study and the consent process - most were recruited from schools In this study unit of randomisation was a cluster of 20 men or a cluster of 20 women Primary outcomes were HIV-incidence and HSV-2 incidence over the study period of approximately two years.
We want to join the two data sets to see which women have become HIV-infected (incident cases or sero-conversions), and to see whether there is any consistent pattern in the experience of IPV Women in the study each have a unique study identification number (idnum) .We can join the data sets together using the merge command as shown below.
idnum
must uniquely identify observations in each datasetidnum | visitnox | intdatex | hivx | ipvnewx |
---|---|---|---|---|
1001 | 1 | 2003-03-19 | 0 | 0 |
1002 | 1 | 2003-03-19 | 0 | 0 |
1006 | 1 | 2003-03-19 | 0 | 0 |
1008 | 1 | 2003-03-19 | 0 | 0 |
1011 | 1 | 2003-03-19 | 1 | 0 |
1012 | 1 | 2003-03-19 | 1 | 1 |
1015 | 1 | 2003-03-19 | 0 | 1 |
1016 | 1 | 2003-03-19 | 0 | 0 |
1017 | 1 | 2003-03-19 | 1 | 0 |
1018 | 1 | 2003-03-19 | 0 | 1 |
idnum | intdate | hiv | ipvnew |
---|---|---|---|
1001 | 2004-03-24 | 0 | 0 |
1002 | 2004-03-24 | 0 | 0 |
1006 | 2004-03-25 | 0 | 0 |
1008 | 2004-03-24 | 0 | 1 |
1011 | 2004-05-12 | 1 | 1 |
1012 | 2004-04-15 | NA | 0 |
1015 | 2004-03-24 | 0 | 0 |
1016 | 2004-04-15 | 0 | 1 |
1017 | 2004-04-15 | NA | 0 |
1018 | 2004-04-15 | 0 | 1 |
the unique column is
idnum
(Stepping Stones women baseline)
(Stepping Stones women 12 months)
use
using
use "Datasets/stonwombas.dta", clear
merge 1:1 idnum using "Datasets/stonwomfol.dta" , gen(stonmerge)
(Stepping Stones women baseline)
Result Number of obs
-----------------------------------------
Not matched 306
from master 306 (stonmerge==1)
from using 0 (stonmerge==2)
Matched 1,109 (stonmerge==3)
-----------------------------------------
(df_merged <- dat_bas |>
inner_join(dat_fol, by ="idnum")) |> head(10)
idnum | visitnox | intdatex | hivx | ipvnewx | intdate | hiv | ipvnew |
---|---|---|---|---|---|---|---|
1001 | 1 | 2003-03-19 | 0 | 0 | 2004-03-24 | 0 | 0 |
1002 | 1 | 2003-03-19 | 0 | 0 | 2004-03-24 | 0 | 0 |
1006 | 1 | 2003-03-19 | 0 | 0 | 2004-03-25 | 0 | 0 |
1008 | 1 | 2003-03-19 | 0 | 0 | 2004-03-24 | 0 | 1 |
1011 | 1 | 2003-03-19 | 1 | 0 | 2004-05-12 | 1 | 1 |
1012 | 1 | 2003-03-19 | 1 | 1 | 2004-04-15 | NA | 0 |
1015 | 1 | 2003-03-19 | 0 | 1 | 2004-03-24 | 0 | 0 |
1016 | 1 | 2003-03-19 | 0 | 0 | 2004-04-15 | 0 | 1 |
1017 | 1 | 2003-03-19 | 1 | 0 | 2004-04-15 | NA | 0 |
1018 | 1 | 2003-03-19 | 0 | 1 | 2004-04-15 | 0 | 1 |
table(df_merged$hivx,df_merged$hiv)
0 1
0 900 65
1 1 104
The summary above shows that 1,109 individuals had their data merged, whereas 306 were not merged because they did not match. 306 were not merged from the master file while 0 were not merged from the using file.
Matching result from |
merge | Freq. Percent Cum.
------------------------+-----------------------------------
Master only (1) | 306 21.63 21.63
Matched (3) | 1,109 78.37 100.00
------------------------+-----------------------------------
Total | 1,415 100.00
We can now use data from both of the datasets e.g. we can see how many women HIV sero-converted during the twelve month follow-up period
HIV | hiv
serostatus | 0 1 . | Total
-----------+---------------------------------+----------
0 | 900 65 291 | 1,256
1 | 1 104 54 | 159
-----------+---------------------------------+----------
Total | 901 169 345 | 1,415
Of 1256 women who tested HIV-negative at baseline, 65 sero-converted (i.e. became HIV-infected) while 900 remained HIV-negative (and remainder did not have a follow-up result) .Note one woman tested HIV-positive at baseline but HIV-negative at follow-up We can identify this woman as participant number 1870 Would need to go to original forms and fieldworkers to understand what happened with this participant (possible for example that a friend “replaced” her at the follow-up visit)
| Key |
|----------------|
| frequency |
| row percentage |
+----------------+
| ipvnew
ipvnewx | 0 1 | Total
-----------+----------------------+----------
0 | 617 137 | 754
| 81.83 18.17 | 100.00
-----------+----------------------+----------
1 | 201 154 | 355
| 56.62 43.38 | 100.00
-----------+----------------------+----------
Total | 818 291 | 1,109
| 73.76 26.24 | 100.00
Amongst women who had not experienced IPV at baseline, 18.2% experienced IPV at follow-up, while amongst women who had experienced IPV at baseline, 43.4% experienced IPV at follow-up
Finally using the visit dates, we can look at the distribution of follow-up days between the two visits - which should be roughly 365 days since follow-up was at 12 months
followdays
-------------------------------------------------------------
Percentiles Smallest
1% 287 -169
5% 344 4
10% 357 103 Obs 1,107
25% 370 208 Sum of wgt. 1,107
50% 386 Mean 409.2882
Largest Std. dev. 70.97736
75% 433 717
90% 510 736 Variance 5037.786
95% 547 738 Skewness .5813372
99% 618 745 Kurtosis 10.23827
Median follow-up time was 386 days (slightly larger than the expected 365 days) and mean was 409 days - due to some participants only being traced after about 2 years Two strange values – one with negative follow-up days (meaning that follow-up visit was recorded as having taken place before baseline visit) and one with only 4 days of follow-up between visits – we can identify the participants (idnum 1423 and idnum 1666) but would need to look at original fieldwork records in order to resolve query
| idnum | visitnox | intdatex | hivx | ipvnewx | intdate | hiv | ipvnew |
| 1423 | 1 | 30apr2004 | 0 | 0 | 04may2004 | 1 | 0 |
|--------------------------------------------------------------------------|
| stonmerge | follow~s |
| Matched (3) | 4 |
+--------------------------------------------------------------------------+
+--------------------------------------------------------------------------+
| idnum | visitnox | intdatex | hivx | ipvnewx | intdate | hiv | ipvnew |
| 1666 | 1 | 26may2005 | 0 | 0 | 08dec2004 | 0 | 1 |
|--------------------------------------------------------------------------|
| stonmerge | follow~s |
| Matched (3) | -169 |
+--------------------------------------------------------------------------+
new_merged |>
filter(followup_days<100)
idnum | visitnox | intdatex | hivx | ipvnewx | intdate | hiv | ipvnew | followup_days |
---|---|---|---|---|---|---|---|---|
1423 | 1 | 2004-04-30 | 0 | 0 | 2004-05-04 | 1 | 0 | 4 days |
1666 | 1 | 2005-05-26 | 0 | 0 | 2004-12-08 | 0 | 1 | -169 days |
In above example joined two data sets using a one-to-one merge, since each data set had only one observation per participant
In some cases there will be many observations per participant in one data set and only one observation per participant in the other data set
Ex: Longitudinal studies where all of follow-up observations are put into the same data set (which thus has many observations per participant) while the other data set contains baseline and design information (and thus one observation per participant)
Example:
COSTOP randomized controlled trial carried out to investigate whether it is safe for HIV-infected patients stabilized on ART (on ART for at least six months, on CTX prophylaxis and with a CD4 count above 250 cells/µl) to stop taking CTX prophylaxis Total of 2180 patients individually randomized to either continue taking CTX or to take an equivalent placebo (i.e. to stop CTX prophylaxis).
One secondary objective was to compare neutrophil counts over time between the two treatment arms, since CTX has some haematological toxicity Data on neutrophil counts given in costop_neutrophil, while baseline data given in costop_base
sex | ageyrs | site | whostbas | cdstrat | idnum |
---|---|---|---|---|---|
1 | 46 | 1 | 2 | 1 | 1001 |
2 | 41 | 1 | 2 | 1 | 1002 |
2 | 47 | 1 | 2 | 1 | 1003 |
1 | 42 | 1 | 3 | 1 | 1004 |
2 | 38 | 1 | 3 | 1 | 1005 |
2 | 51 | 1 | 2 | 1 | 1006 |
2 | 48 | 1 | 4 | 1 | 1007 |
1 | 48 | 1 | 2 | 1 | 1008 |
2 | 35 | 1 | 2 | 1 | 1009 |
2 | 23 | 1 | 2 | 1 | 1010 |
Contains data from Datasets/costop_base.dta
Observations: 2,180
Variables: 6 23 Jan 2023 12:34
-------------------------------------------------------------------------------
Variable Storage Display Value
name type format label Variable label
-------------------------------------------------------------------------------
sex byte %8.0g sexlab Gender
ageyrs byte %9.0g
site byte %8.0g sitelab study Site
whostbas byte %8.0g Baseline WHO stage
cdstrat byte %8.0g cdlab CD4 stratum at baseline
idnum float %9.0g
-------------------------------------------------------------------------------
Sorted by: idnum
-> tabulation of sex
Gender | Freq. Percent Cum.
------------+-----------------------------------
Male | 569 26.10 26.10
Female | 1,611 73.90 100.00
------------+-----------------------------------
Total | 2,180 100.00
-> tabulation of site
study Site | Freq. Percent Cum.
------------+-----------------------------------
Entebbe | 1,002 45.96 45.96
Masaka | 1,178 54.04 100.00
------------+-----------------------------------
Total | 2,180 100.00
-> tabulation of cdstrat
CD4 stratum |
at baseline | Freq. Percent Cum.
------------+-----------------------------------
251-499 | 1,142 52.39 52.39
500+ | 1,038 47.61 100.00
------------+-----------------------------------
Total | 2,180 100.00
ne_abs | idnum | months |
---|---|---|
1.55 | 1001 | 2.661191 |
1.26 | 1001 | 5.519507 |
0.94 | 1001 | 8.279261 |
1.22 | 1001 | 10.940452 |
1.83 | 1001 | 13.798768 |
1.21 | 1001 | 16.558521 |
1.20 | 1001 | 19.318275 |
1.34 | 1001 | 22.078030 |
1.36 | 1001 | 24.837782 |
1.37 | 1001 | 27.597536 |
skimr::skim(costop)
Name | costop |
Number of rows | 23181 |
Number of columns | 3 |
_______________________ | |
Column type frequency: | |
numeric | 3 |
________________________ | |
Group variables | None |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
ne_abs | 88 | 1 | 1.81 | 1.21 | 0.00 | 1.15 | 1.59 | 2.17 | 50.00 | ▇▁▁▁▁ |
idnum | 0 | 1 | 2789.91 | 1149.98 | 1001.00 | 1553.00 | 3058.00 | 4028.00 | 4601.00 | ▇▆▃▅▇ |
months | 0 | 1 | 16.00 | 9.55 | -6.51 | 8.28 | 14.72 | 22.74 | 39.75 | ▁▇▆▆▂ |
Data set costop_neutrophil has 23,093 observations, since participants could have a number of hematology tests during the trial (including neutrophil count)
Can now merge neutrophil data to baseline data to get associated characteristics (e.g. sex, age, study site, CD4 stratum, WHO stage) corresponding to the neutrophil counts
Many observations in neutrophil data (from a single participant) will be merged to a single observation in the baseline data – so known as many-to-one or m:1 merging
Also sometimes called “table lookup” since data for a given participant are “looked up” in the baseline table
use "Datasets/costop_neutrophil.dta" , clear
merge m:1 idnum using "Datasets/costop_base.dta" , gen(cosnm)
(variable idnum was int, now float to accommodate using data's values)
Result Number of obs
-----------------------------------------
Not matched 23
from master 0 (cosnm==1)
from using 23 (cosnm==2)
Matched 23,181 (cosnm==3)
-----------------------------------------
-> tabulation of sex
Gender | Freq. Percent Cum.
------------+-----------------------------------
Male | 5,988 25.81 25.81
Female | 17,216 74.19 100.00
------------+-----------------------------------
Total | 23,204 100.00
-> tabulation of site
study Site | Freq. Percent Cum.
------------+-----------------------------------
Entebbe | 10,782 46.47 46.47
Masaka | 12,422 53.53 100.00
------------+-----------------------------------
Total | 23,204 100.00
-> tabulation of cdstrat
CD4 stratum |
at baseline | Freq. Percent Cum.
------------+-----------------------------------
251-499 | 12,114 52.21 52.21
500+ | 11,090 47.79 100.00
------------+-----------------------------------
Total | 23,204 100.00
There were 24 observations from baseline table that did not have matches in neutrophil data – these correspond to participants who dropped out early without having any hematology tests
Also among 23,117 hematology tests carried out 10,776 were in Entebbe and 12,341 in Masaka We can look at a boxplot to compare the neutrophil counts between Entebbe and Masaka
There appear to be a number of outliers among the neutrophil counts, which could be laboratory errors (or missing decimal places in the results), so we can look at the counts with a cut-off of 10 (in practice we would investigate this together with the laboratory)
With outliers removed seems little difference in distn of neutrophil counts between sites - examine this further by looking at summary statistics by site
Summary for variables: ne_abs
Group variable: site (study Site)
site | N Mean SD p25 p50 p75
--------+------------------------------------------------------------
Entebbe | 10765 1.860236 1.191051 1.17 1.61 2.21
Masaka | 12328 1.770006 1.230289 1.13 1.56 2.14
--------+------------------------------------------------------------
Total | 23093 1.812067 1.212965 1.15 1.59 2.17
---------------------------------------------------------------------
(costop |>
right_join(costop_base , by ="idnum")->many_to_one) |> head(10)
ne_abs | idnum | months | sex | ageyrs | site | whostbas | cdstrat |
---|---|---|---|---|---|---|---|
1.55 | 1001 | 2.661191 | 1 | 46 | 1 | 2 | 1 |
1.26 | 1001 | 5.519507 | 1 | 46 | 1 | 2 | 1 |
0.94 | 1001 | 8.279261 | 1 | 46 | 1 | 2 | 1 |
1.22 | 1001 | 10.940452 | 1 | 46 | 1 | 2 | 1 |
1.83 | 1001 | 13.798768 | 1 | 46 | 1 | 2 | 1 |
1.21 | 1001 | 16.558521 | 1 | 46 | 1 | 2 | 1 |
1.20 | 1001 | 19.318275 | 1 | 46 | 1 | 2 | 1 |
1.34 | 1001 | 22.078030 | 1 | 46 | 1 | 2 | 1 |
1.36 | 1001 | 24.837782 | 1 | 46 | 1 | 2 | 1 |
1.37 | 1001 | 27.597536 | 1 | 46 | 1 | 2 | 1 |
many_to_one |>
group_by(site) |>
summarise(N = n(),
mEAN = mean(ne_abs , na.rm=T),
sdN = sd(ne_abs, na.rm=T))
site | N | mEAN | sdN |
---|---|---|---|
1 | 10782 | 1.860236 | 1.191051 |
2 | 12422 | 1.770006 | 1.230289 |
many_to_one |>
mutate(site = as.factor(site),
site = ifelse(site =="1","Entebbe","Masaka")) |>
ggplot(aes(x=site,y=ne_abs))+
geom_boxplot()
This confirms that on average neutrophil counts are slightly higher in Entebbe than in Masaka .Note that we save the merged data set as “costop_ndm.dta” to use in the next three sections
In neutrophil data set, for each participant we might want to select 1st neutrophil count (to get estimate of this at baseline or enrolment) and also select last neutrophil count (to get estimate at end of the trial) We will see how to do this below:
(many_to_one |>
arrange(idnum,months) |>
group_by(idnum) |>
filter(row_number() ==1)->first_obs) |> head(10)
ne_abs | idnum | months | sex | ageyrs | site | whostbas | cdstrat |
---|---|---|---|---|---|---|---|
1.55 | 1001 | 2.661191 | 1 | 46 | 1 | 2 | 1 |
0.86 | 1002 | 2.694045 | 2 | 41 | 1 | 2 | 1 |
0.96 | 1003 | 5.552361 | 2 | 47 | 1 | 2 | 1 |
NA | 1004 | NA | 1 | 42 | 1 | 3 | 1 |
1.74 | 1005 | 2.759754 | 2 | 38 | 1 | 3 | 1 |
1.96 | 1006 | 2.759754 | 2 | 51 | 1 | 2 | 1 |
1.50 | 1007 | 2.759754 | 2 | 48 | 1 | 4 | 1 |
1.01 | 1008 | 2.989733 | 1 | 48 | 1 | 2 | 1 |
4.60 | 1009 | 2.759754 | 2 | 35 | 1 | 2 | 1 |
3.58 | 1010 | 1.971252 | 2 | 23 | 1 | 2 | 1 |
table(first_obs$sex)
1 2
569 1611
-> tabulation of sex if ne_abs<.
Gender | Freq. Percent Cum.
------------+-----------------------------------
Male | 555 25.81 25.81
Female | 1,595 74.19 100.00
------------+-----------------------------------
Total | 2,150 100.00
-> tabulation of site if ne_abs<.
study Site | Freq. Percent Cum.
------------+-----------------------------------
Entebbe | 991 46.09 46.09
Masaka | 1,159 53.91 100.00
------------+-----------------------------------
Total | 2,150 100.00
-> tabulation of cdstrat if ne_abs<.
CD4 stratum |
at baseline | Freq. Percent Cum.
------------+-----------------------------------
251-499 | 1,128 52.47 52.47
500+ | 1,022 47.53 100.00
------------+-----------------------------------
Total | 2,150 100.00
var > 60
is true if variable is greater than 60 or missing.
summary(first_obs$ne_abs)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.030 1.170 1.610 1.811 2.180 23.670 30
summary(first_obs$months)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
-6.505 2.727 2.760 2.317 2.760 11.039 23
Neutrophils Abs
-------------------------------------------------------------
Percentiles Smallest
1% .47 .03
5% .73 .06
10% .87 .1 Obs 2,150
25% 1.17 .23 Sum of wgt. 2,150
50% 1.61 Mean 1.812577
Largest Std. dev. 1.079763
75% 2.18 7.38
90% 2.92 7.92 Variance 1.165888
95% 3.5 10.1 Skewness 5.411018
99% 5.8 23.67 Kurtosis 85.58683
months
-------------------------------------------------------------
Percentiles Smallest
1% -.6899384 -6.505134
5% -.4928131 -3.74538
10% -.4599589 -3.74538 Obs 2,157
25% 2.726899 -3.055442 Sum of wgt. 2,157
50% 2.759754 Mean 2.316625
Largest Std. dev. 1.361104
75% 2.759754 9.297741
90% 2.858316 9.626284 Variance 1.852604
95% 3.022587 11.03901 Skewness -.5982735
99% 5.552361 11.03901 Kurtosis 7.534941
Since we are looking within idnum and have sorted by month within idnum, _n measures number of observation within each participant so _n=1 denotes the first (earliest) hematology test, _n=2 the second test and so on Only 2,156 participants have at least one neutrophil result Note that for over 10% this was found before enrolment i.e. during the screening phase of the trial – as shown by negative values for month Can now see how to find the last neutrophil count
(21,024 observations deleted)
Neutrophils Abs
-------------------------------------------------------------
Percentiles Smallest
1% .54 .26
5% .77 .3
10% .91 .33 Obs 2,149
25% 1.18 .34 Sum of wgt. 2,149
50% 1.63 Mean 1.84342
Largest Std. dev. 1.715529
75% 2.19 10.13
90% 2.86 10.21 Variance 2.94304
95% 3.46 48 Skewness 19.74417
99% 4.99 50 Kurtosis 534.6346
months
-------------------------------------------------------------
Percentiles Smallest
1% 5.716632 -.4599589
5% 13.76591 1.051335
10% 16.55852 2.759754 Obs 2,157
25% 23.49076 2.759754 Sum of wgt. 2,157
50% 30.35729 Mean 28.24435
Largest Std. dev. 7.616737
75% 33.1499 39.45791
90% 35.8768 39.55647 Variance 58.01468
95% 36.00821 39.75359 Skewness -1.010526
99% 38.63655 39.75359 Kurtosis 3.356113
ne_abs | idnum | months | sex | ageyrs | site | whostbas | cdstrat | cosnm |
---|---|---|---|---|---|---|---|---|
1.55 | 1001 | 2.661191 | 1 | 46 | 1 | 2 | 1 | 3 |
1.26 | 1001 | 5.519507 | 1 | 46 | 1 | 2 | 1 | 3 |
0.94 | 1001 | 8.279261 | 1 | 46 | 1 | 2 | 1 | 3 |
1.22 | 1001 | 10.940452 | 1 | 46 | 1 | 2 | 1 | 3 |
1.83 | 1001 | 13.798768 | 1 | 46 | 1 | 2 | 1 | 3 |
1.21 | 1001 | 16.558521 | 1 | 46 | 1 | 2 | 1 | 3 |
1.20 | 1001 | 19.318275 | 1 | 46 | 1 | 2 | 1 | 3 |
1.34 | 1001 | 22.078030 | 1 | 46 | 1 | 2 | 1 | 3 |
1.36 | 1001 | 24.837782 | 1 | 46 | 1 | 2 | 1 | 3 |
1.37 | 1001 | 27.597536 | 1 | 46 | 1 | 2 | 1 | 3 |
(many_to_one |>
arrange(idnum,months) |>
group_by(idnum) |>
mutate(rn =row_number()) |>
filter(rn==max(rn))->last_obs) |> head(10)
ne_abs | idnum | months | sex | ageyrs | site | whostbas | cdstrat | rn |
---|---|---|---|---|---|---|---|---|
2.29 | 1001 | 38.669407 | 1 | 46 | 1 | 2 | 1 | 13 |
0.54 | 1002 | 38.636551 | 2 | 41 | 1 | 2 | 1 | 15 |
1.76 | 1003 | 38.209446 | 2 | 47 | 1 | 2 | 1 | 12 |
NA | 1004 | NA | 1 | 42 | 1 | 3 | 1 | 1 |
1.56 | 1005 | 38.603699 | 2 | 38 | 1 | 3 | 1 | 15 |
0.36 | 1006 | 8.279261 | 2 | 51 | 1 | 2 | 1 | 3 |
2.19 | 1007 | 38.669407 | 2 | 48 | 1 | 4 | 1 | 15 |
1.36 | 1008 | 38.636551 | 1 | 48 | 1 | 2 | 1 | 14 |
2.14 | 1009 | 35.876797 | 2 | 35 | 1 | 2 | 1 | 14 |
1.54 | 1010 | 35.975357 | 2 | 23 | 1 | 2 | 1 | 15 |
summary(last_obs$ne_abs)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.260 1.180 1.630 1.843 2.190 50.000 31
Last observation within each idnum (participant) is labelled _N Note that median and mean neutrophil count are very similar at beginning and end of the trial Now have two data sets – one containing first neutrophil count and other containing last neutrophil count Could merge these data sets (1:1) and hence find for each participant how much neutrophil count has changed over course of the trial
The neutrophil data set is an example of a “long” data set since we have a separate row of data for each visit An alternative to this would be a “wide” data set in which we have one row of data for each participant and within this row the first neutrophil count is recorded as ne_abs1, the second as ne_abs2, the third as ne_abs_3 and so on We see how to do this below:
use "costop_ndm.dta" , clear
drop if ne_abs==.
sort idnum months
by idnum: gen visitnum = _n
tab visitnum
(111 observations deleted)
visitnum | Freq. Percent Cum.
------------+-----------------------------------
1 | 2,156 9.34 9.34
2 | 2,146 9.29 18.63
3 | 2,130 9.22 27.85
4 | 2,106 9.12 36.97
5 | 2,066 8.95 45.92
6 | 1,979 8.57 54.49
7 | 1,880 8.14 62.63
8 | 1,794 7.77 70.40
9 | 1,635 7.08 77.48
10 | 1,486 6.43 83.91
11 | 1,283 5.56 89.47
12 | 947 4.10 93.57
13 | 643 2.78 96.35
14 | 382 1.65 98.01
15 | 191 0.83 98.84
16 | 106 0.46 99.29
17 | 60 0.26 99.55
18 | 38 0.16 99.72
19 | 23 0.10 99.82
20 | 16 0.07 99.89
21 | 10 0.04 99.93
22 | 8 0.03 99.97
23 | 1 0.00 99.97
24 | 1 0.00 99.97
25 | 1 0.00 99.98
26 | 1 0.00 99.98
27 | 1 0.00 99.99
28 | 1 0.00 99.99
29 | 1 0.00 100.00
30 | 1 0.00 100.00
------------+-----------------------------------
Total | 23,093 100.00
ne_abs | idnum | months | sex | ageyrs | site | whostbas | cdstrat | cosnm |
---|---|---|---|---|---|---|---|---|
1.55 | 1001 | 2.661191 | 1 | 46 | 1 | 2 | 1 | 3 |
1.26 | 1001 | 5.519507 | 1 | 46 | 1 | 2 | 1 | 3 |
0.94 | 1001 | 8.279261 | 1 | 46 | 1 | 2 | 1 | 3 |
1.22 | 1001 | 10.940452 | 1 | 46 | 1 | 2 | 1 | 3 |
1.83 | 1001 | 13.798768 | 1 | 46 | 1 | 2 | 1 | 3 |
1.21 | 1001 | 16.558521 | 1 | 46 | 1 | 2 | 1 | 3 |
1.20 | 1001 | 19.318275 | 1 | 46 | 1 | 2 | 1 | 3 |
1.34 | 1001 | 22.078030 | 1 | 46 | 1 | 2 | 1 | 3 |
1.36 | 1001 | 24.837782 | 1 | 46 | 1 | 2 | 1 | 3 |
1.37 | 1001 | 27.597536 | 1 | 46 | 1 | 2 | 1 | 3 |
(costdm |>
filter(!is.na(ne_abs)) |>
arrange(idnum,months) |>
group_by(idnum) |>
mutate(visitnum =1:n())->vist_dat) |> head(10)
ne_abs | idnum | months | sex | ageyrs | site | whostbas | cdstrat | cosnm | visitnum |
---|---|---|---|---|---|---|---|---|---|
1.55 | 1001 | 2.661191 | 1 | 46 | 1 | 2 | 1 | 3 | 1 |
1.26 | 1001 | 5.519507 | 1 | 46 | 1 | 2 | 1 | 3 | 2 |
0.94 | 1001 | 8.279261 | 1 | 46 | 1 | 2 | 1 | 3 | 3 |
1.22 | 1001 | 10.940452 | 1 | 46 | 1 | 2 | 1 | 3 | 4 |
1.83 | 1001 | 13.798768 | 1 | 46 | 1 | 2 | 1 | 3 | 5 |
1.21 | 1001 | 16.558521 | 1 | 46 | 1 | 2 | 1 | 3 | 6 |
1.20 | 1001 | 19.318275 | 1 | 46 | 1 | 2 | 1 | 3 | 7 |
1.34 | 1001 | 22.078030 | 1 | 46 | 1 | 2 | 1 | 3 | 8 |
1.36 | 1001 | 24.837782 | 1 | 46 | 1 | 2 | 1 | 3 | 9 |
1.37 | 1001 | 27.597536 | 1 | 46 | 1 | 2 | 1 | 3 | 10 |
table(vist_dat$visitnum) |>
as.data.frame()
Var1 | Freq |
---|---|
1 | 2156 |
2 | 2146 |
3 | 2130 |
4 | 2106 |
5 | 2066 |
6 | 1979 |
7 | 1880 |
8 | 1794 |
9 | 1635 |
10 | 1486 |
11 | 1283 |
12 | 947 |
13 | 643 |
14 | 382 |
15 | 191 |
16 | 106 |
17 | 60 |
18 | 38 |
19 | 23 |
20 | 16 |
21 | 10 |
22 | 8 |
23 | 1 |
24 | 1 |
25 | 1 |
26 | 1 |
27 | 1 |
28 | 1 |
29 | 1 |
30 | 1 |
Variable “visitnum” measures number of each visit within each participant Note: if a person had (say) visitnum 5, then they must have also had visits 1,2,3 and 4 That is why the frequency decreases with each visit number – so 8 participants had 22 visits and one participant had 30 visits. Now see how to make a wide data set In this wide data set there will be variables for ne_abs1 up to ne_abs30 and month1 up to month30 (although ne_abs30 and month30 will be missing for all except one participant)
(111 observations deleted)
visitnum | Freq. Percent Cum.
------------+-----------------------------------
1 | 2,156 9.34 9.34
2 | 2,146 9.29 18.63
3 | 2,130 9.22 27.85
4 | 2,106 9.12 36.97
5 | 2,066 8.95 45.92
6 | 1,979 8.57 54.49
7 | 1,880 8.14 62.63
8 | 1,794 7.77 70.40
9 | 1,635 7.08 77.48
10 | 1,486 6.43 83.91
11 | 1,283 5.56 89.47
12 | 947 4.10 93.57
13 | 643 2.78 96.35
14 | 382 1.65 98.01
15 | 191 0.83 98.84
16 | 106 0.46 99.29
17 | 60 0.26 99.55
18 | 38 0.16 99.72
19 | 23 0.10 99.82
20 | 16 0.07 99.89
21 | 10 0.04 99.93
22 | 8 0.03 99.97
23 | 1 0.00 99.97
24 | 1 0.00 99.97
25 | 1 0.00 99.98
26 | 1 0.00 99.98
27 | 1 0.00 99.99
28 | 1 0.00 99.99
29 | 1 0.00 100.00
30 | 1 0.00 100.00
------------+-----------------------------------
Total | 23,093 100.00
(j = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
> 29 30)
Data Long -> Wide
-----------------------------------------------------------------------------
Number of observations 23,093 -> 2,156
Number of variables 10 -> 67
j variable (30 values) visitnum -> (dropped)
xij variables:
ne_abs -> ne_abs1 ne_abs2 ... ne_abs30
months -> months1 months2 ... months30
-----------------------------------------------------------------------------
+------------------------------------------------------------------------+
1. | idnum | ne_abs1 | months1 | ne_abs2 | months2 | ne_abs3 | months3 |
| 1001 | 1.55 | 2.661191 | 1.26 | 5.519507 | .94 | 8.279261 |
|------------------------------------------------------------------------|
| ne_abs4 | months4 | ne_abs5 | months5 | ne_abs6 | months6 | ne_abs7 |
| 1.22 | 10.94045 | 1.83 | 13.79877 | 1.21 | 16.55852 | 1.2 |
|------------------------------------------------------------------------|
| months7 | ne_abs8 | months8 | ne_abs9 | months9 | ne_abs10 |
| 19.31828 | 1.34 | 22.07803 | 1.36 | 24.83778 | 1.37 |
|-----------+-----------------------------------------------+------------|
| months10 | ne_abs11 | months11 | ne_abs12 | months12 | ne_abs13 |
| 27.59754 | 1.6 | 30.39014 | 1.47 | 35.90965 | 2.29 |
|-----------+-----------+-----------+-----------+-----------+------------|
| months13 | ne_abs14 | months14 | ne_abs15 | months15 | ne_abs16 |
| 38.66941 | . | . | . | . | . |
|-----------+-----------+-----------+-----------+-----------+------------|
| months16 | ne_abs17 | months17 | ne_abs18 | months18 | ne_abs19 |
| . | . | . | . | . | . |
|-----------+-----------+-----------+-----------+-----------+------------|
| months19 | ne_abs20 | months20 | ne_abs21 | months21 | ne_abs22 |
| . | . | . | . | . | . |
|-----------+-----------+-----------+-----------+-----------+------------|
| months22 | ne_abs23 | months23 | ne_abs24 | months24 | ne_abs25 |
| . | . | . | . | . | . |
|-----------+-----------+-----------+-----------+-----------+------------|
| months25 | ne_abs26 | months26 | ne_abs27 | months27 | ne_abs28 |
| . | . | . | . | . | . |
|------------------------------------------------------------------------|
| months28 | ne_abs29 | months29 | ne_abs30 | months30 | sex | ageyrs |
| . | . | . | . | . | Male | 46 |
|------------------------------------------------------------------------|
| site | whostbas | cdstrat | cosnm |
| Entebbe | 2 | 251-499 | Matched (3) |
+------------------------------------------------------------------------+
+------------------------------------------------------------------------+
2. | idnum | ne_abs1 | months1 | ne_abs2 | months2 | ne_abs3 | months3 |
| 1002 | .86 | 2.694045 | .84 | 5.453799 | 1.72 | 7.753593 |
|------------------------------------------------------------------------|
| ne_abs4 | months4 | ne_abs5 | months5 | ne_abs6 | months6 | ne_abs7 |
| 1.09 | 8.279261 | .91 | 10.80903 | .99 | 13.79877 | 1.04 |
|------------------------------------------------------------------------|
| months7 | ne_abs8 | months8 | ne_abs9 | months9 | ne_abs10 |
| 16.55852 | .53 | 19.31828 | 1.43 | 22.07803 | 1.07 |
|-----------+-----------------------------------------------+------------|
| months10 | ne_abs11 | months11 | ne_abs12 | months12 | ne_abs13 |
| 24.83778 | .89 | 27.59754 | 1.07 | 30.32443 | .55 |
|-----------+-----------+-----------+-----------+-----------+------------|
| months13 | ne_abs14 | months14 | ne_abs15 | months15 | ne_abs16 |
| 33.08419 | 1.29 | 35.84394 | .54 | 38.63655 | . |
|-----------+-----------+-----------+-----------+-----------+------------|
| months16 | ne_abs17 | months17 | ne_abs18 | months18 | ne_abs19 |
| . | . | . | . | . | . |
|-----------+-----------+-----------+-----------+-----------+------------|
| months19 | ne_abs20 | months20 | ne_abs21 | months21 | ne_abs22 |
| . | . | . | . | . | . |
|-----------+-----------+-----------+-----------+-----------+------------|
| months22 | ne_abs23 | months23 | ne_abs24 | months24 | ne_abs25 |
| . | . | . | . | . | . |
|-----------+-----------+-----------+-----------+-----------+------------|
| months25 | ne_abs26 | months26 | ne_abs27 | months27 | ne_abs28 |
| . | . | . | . | . | . |
|------------------------------------------------------------------------|
| months28 | ne_abs29 | months29 | ne_abs30 | months30 | sex | ageyrs |
| . | . | . | . | . | Female | 41 |
|------------------------------------------------------------------------|
| site | whostbas | cdstrat | cosnm |
| Entebbe | 2 | 251-499 | Matched (3) |
+------------------------------------------------------------------------+
+------------------------------------------------------------------------+
3. | idnum | ne_abs1 | months1 | ne_abs2 | months2 | ne_abs3 | months3 |
| 1003 | .96 | 5.552361 | 2.3 | 8.279261 | 1.31 | 10.80903 |
|------------------------------------------------------------------------|
| ne_abs4 | months4 | ne_abs5 | months5 | ne_abs6 | months6 | ne_abs7 |
| 1.38 | 13.79877 | 1.86 | 16.55852 | 4.46 | 19.31828 | 1.45 |
|------------------------------------------------------------------------|
| months7 | ne_abs8 | months8 | ne_abs9 | months9 | ne_abs10 |
| 22.07803 | 1.24 | 24.83778 | 1.52 | 27.56468 | 1.28 |
|-----------+-----------------------------------------------+------------|
| months10 | ne_abs11 | months11 | ne_abs12 | months12 | ne_abs13 |
| 30.39014 | 1.75 | 35.90965 | 1.76 | 38.20945 | . |
|-----------+-----------+-----------+-----------+-----------+------------|
| months13 | ne_abs14 | months14 | ne_abs15 | months15 | ne_abs16 |
| . | . | . | . | . | . |
|-----------+-----------+-----------+-----------+-----------+------------|
| months16 | ne_abs17 | months17 | ne_abs18 | months18 | ne_abs19 |
| . | . | . | . | . | . |
|-----------+-----------+-----------+-----------+-----------+------------|
| months19 | ne_abs20 | months20 | ne_abs21 | months21 | ne_abs22 |
| . | . | . | . | . | . |
|-----------+-----------+-----------+-----------+-----------+------------|
| months22 | ne_abs23 | months23 | ne_abs24 | months24 | ne_abs25 |
| . | . | . | . | . | . |
|-----------+-----------+-----------+-----------+-----------+------------|
| months25 | ne_abs26 | months26 | ne_abs27 | months27 | ne_abs28 |
| . | . | . | . | . | . |
|------------------------------------------------------------------------|
| months28 | ne_abs29 | months29 | ne_abs30 | months30 | sex | ageyrs |
| . | . | . | . | . | Female | 47 |
|------------------------------------------------------------------------|
| site | whostbas | cdstrat | cosnm |
| Entebbe | 2 | 251-499 | Matched (3) |
+------------------------------------------------------------------------+
+------------------------------------------------------------------------+
4. | idnum | ne_abs1 | months1 | ne_abs2 | months2 | ne_abs3 | months3 |
| 1005 | 1.74 | 2.759754 | 1.34 | 5.519507 | 1.62 | 8.279261 |
|------------------------------------------------------------------------|
| ne_abs4 | months4 | ne_abs5 | months5 | ne_abs6 | months6 | ne_abs7 |
| .65 | 8.673512 | 1.59 | 10.77618 | 1.4 | 13.79877 | .98 |
|------------------------------------------------------------------------|
| months7 | ne_abs8 | months8 | ne_abs9 | months9 | ne_abs10 |
| 16.55852 | 1.81 | 19.31828 | 1.59 | 22.07803 | 2.03 |
|-----------+-----------------------------------------------+------------|
| months10 | ne_abs11 | months11 | ne_abs12 | months12 | ne_abs13 |
| 24.83778 | 1.48 | 27.63039 | 1.45 | 30.39014 | 1.72 |
|-----------+-----------+-----------+-----------+-----------+------------|
| months13 | ne_abs14 | months14 | ne_abs15 | months15 | ne_abs16 |
| 33.1499 | 2.28 | 35.8768 | 1.56 | 38.6037 | . |
|-----------+-----------+-----------+-----------+-----------+------------|
| months16 | ne_abs17 | months17 | ne_abs18 | months18 | ne_abs19 |
| . | . | . | . | . | . |
|-----------+-----------+-----------+-----------+-----------+------------|
| months19 | ne_abs20 | months20 | ne_abs21 | months21 | ne_abs22 |
| . | . | . | . | . | . |
|-----------+-----------+-----------+-----------+-----------+------------|
| months22 | ne_abs23 | months23 | ne_abs24 | months24 | ne_abs25 |
| . | . | . | . | . | . |
|-----------+-----------+-----------+-----------+-----------+------------|
| months25 | ne_abs26 | months26 | ne_abs27 | months27 | ne_abs28 |
| . | . | . | . | . | . |
|------------------------------------------------------------------------|
| months28 | ne_abs29 | months29 | ne_abs30 | months30 | sex | ageyrs |
| . | . | . | . | . | Female | 38 |
|------------------------------------------------------------------------|
| site | whostbas | cdstrat | cosnm |
| Entebbe | 3 | 251-499 | Matched (3) |
+------------------------------------------------------------------------+
+------------------------------------------------------------------------+
5. | idnum | ne_abs1 | months1 | ne_abs2 | months2 | ne_abs3 | months3 |
| 1006 | 1.96 | 2.759754 | 1.11 | 5.486653 | .36 | 8.279261 |
|------------------------------------------------------------------------|
| ne_abs4 | months4 | ne_abs5 | months5 | ne_abs6 | months6 | ne_abs7 |
| . | . | . | . | . | . | . |
|------------------------------------------------------------------------|
| months7 | ne_abs8 | months8 | ne_abs9 | months9 | ne_abs10 |
| . | . | . | . | . | . |
|-----------+-----------------------------------------------+------------|
| months10 | ne_abs11 | months11 | ne_abs12 | months12 | ne_abs13 |
| . | . | . | . | . | . |
|-----------+-----------+-----------+-----------+-----------+------------|
| months13 | ne_abs14 | months14 | ne_abs15 | months15 | ne_abs16 |
| . | . | . | . | . | . |
|-----------+-----------+-----------+-----------+-----------+------------|
| months16 | ne_abs17 | months17 | ne_abs18 | months18 | ne_abs19 |
| . | . | . | . | . | . |
|-----------+-----------+-----------+-----------+-----------+------------|
| months19 | ne_abs20 | months20 | ne_abs21 | months21 | ne_abs22 |
| . | . | . | . | . | . |
|-----------+-----------+-----------+-----------+-----------+------------|
| months22 | ne_abs23 | months23 | ne_abs24 | months24 | ne_abs25 |
| . | . | . | . | . | . |
|-----------+-----------+-----------+-----------+-----------+------------|
| months25 | ne_abs26 | months26 | ne_abs27 | months27 | ne_abs28 |
| . | . | . | . | . | . |
|------------------------------------------------------------------------|
| months28 | ne_abs29 | months29 | ne_abs30 | months30 | sex | ageyrs |
| . | . | . | . | . | Female | 51 |
|------------------------------------------------------------------------|
| site | whostbas | cdstrat | cosnm |
| Entebbe | 2 | 251-499 | Matched (3) |
+------------------------------------------------------------------------+
(vist_dat |>
pivot_wider(names_from ="visitnum",values_from = c("ne_abs","months"))->vist_wide) |> head(10)
idnum | sex | ageyrs | site | whostbas | cdstrat | cosnm | ne_abs_1 | ne_abs_2 | ne_abs_3 | ne_abs_4 | ne_abs_5 | ne_abs_6 | ne_abs_7 | ne_abs_8 | ne_abs_9 | ne_abs_10 | ne_abs_11 | ne_abs_12 | ne_abs_13 | ne_abs_14 | ne_abs_15 | ne_abs_16 | ne_abs_17 | ne_abs_18 | ne_abs_19 | ne_abs_20 | ne_abs_21 | ne_abs_22 | ne_abs_23 | ne_abs_24 | ne_abs_25 | ne_abs_26 | ne_abs_27 | ne_abs_28 | ne_abs_29 | ne_abs_30 | months_1 | months_2 | months_3 | months_4 | months_5 | months_6 | months_7 | months_8 | months_9 | months_10 | months_11 | months_12 | months_13 | months_14 | months_15 | months_16 | months_17 | months_18 | months_19 | months_20 | months_21 | months_22 | months_23 | months_24 | months_25 | months_26 | months_27 | months_28 | months_29 | months_30 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1001 | 1 | 46 | 1 | 2 | 1 | 3 | 1.55 | 1.26 | 0.94 | 1.22 | 1.83 | 1.21 | 1.20 | 1.34 | 1.36 | 1.37 | 1.60 | 1.47 | 2.29 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2.661191 | 5.519507 | 8.279261 | 10.940452 | 13.79877 | 16.55852 | 19.31828 | 22.07803 | 24.83778 | 27.59754 | 30.39014 | 35.90965 | 38.66941 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
1002 | 2 | 41 | 1 | 2 | 1 | 3 | 0.86 | 0.84 | 1.72 | 1.09 | 0.91 | 0.99 | 1.04 | 0.53 | 1.43 | 1.07 | 0.89 | 1.07 | 0.55 | 1.29 | 0.54 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2.694045 | 5.453799 | 7.753593 | 8.279261 | 10.80903 | 13.79877 | 16.55852 | 19.31828 | 22.07803 | 24.83778 | 27.59754 | 30.32443 | 33.08419 | 35.84394 | 38.63655 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
1003 | 2 | 47 | 1 | 2 | 1 | 3 | 0.96 | 2.30 | 1.31 | 1.38 | 1.86 | 4.46 | 1.45 | 1.24 | 1.52 | 1.28 | 1.75 | 1.76 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 5.552361 | 8.279261 | 10.809034 | 13.798768 | 16.55852 | 19.31828 | 22.07803 | 24.83778 | 27.56468 | 30.39014 | 35.90965 | 38.20945 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
1005 | 2 | 38 | 1 | 3 | 1 | 3 | 1.74 | 1.34 | 1.62 | 0.65 | 1.59 | 1.40 | 0.98 | 1.81 | 1.59 | 2.03 | 1.48 | 1.45 | 1.72 | 2.28 | 1.56 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2.759754 | 5.519507 | 8.279261 | 8.673512 | 10.77618 | 13.79877 | 16.55852 | 19.31828 | 22.07803 | 24.83778 | 27.63039 | 30.39014 | 33.14990 | 35.87680 | 38.60370 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
1006 | 2 | 51 | 1 | 2 | 1 | 3 | 1.96 | 1.11 | 0.36 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2.759754 | 5.486653 | 8.279261 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
1007 | 2 | 48 | 1 | 4 | 1 | 3 | 1.50 | 1.51 | 2.07 | 3.34 | 1.82 | 2.38 | 2.83 | 4.91 | 2.11 | 2.04 | 2.02 | 1.74 | 2.13 | 3.51 | 2.19 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2.759754 | 5.519507 | 8.279261 | 9.429158 | 11.03901 | 13.79877 | 16.55852 | 19.31828 | 22.07803 | 24.83778 | 27.72895 | 30.39014 | 33.14990 | 35.90965 | 38.66941 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
1008 | 1 | 48 | 1 | 2 | 1 | 3 | 1.01 | 1.23 | 1.56 | 1.40 | 2.48 | 1.58 | 2.23 | 1.76 | 1.42 | 1.69 | 1.56 | 1.13 | 1.11 | 1.36 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2.989733 | 5.388091 | 8.279261 | 11.039015 | 13.76591 | 16.55852 | 19.28542 | 22.07803 | 24.93634 | 27.59754 | 30.35729 | 33.11704 | 35.87680 | 38.63655 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
1009 | 2 | 35 | 1 | 2 | 1 | 3 | 4.60 | 4.63 | 2.64 | 3.14 | 1.88 | 2.14 | 2.32 | 1.02 | 4.09 | 7.69 | 3.12 | 3.50 | 2.34 | 2.14 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2.759754 | 5.519507 | 8.279261 | 11.039015 | 13.79877 | 16.55852 | 19.31828 | 22.07803 | 23.26078 | 24.83778 | 27.59754 | 30.35729 | 33.11704 | 35.87680 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
1010 | 2 | 23 | 1 | 2 | 1 | 3 | 3.58 | 2.85 | 1.38 | 2.34 | 1.40 | 1.21 | 1.60 | 2.55 | 2.48 | 2.36 | 1.72 | 1.84 | 1.84 | 1.79 | 1.54 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 1.971252 | 2.792608 | 5.552361 | 8.279261 | 11.03901 | 13.79877 | 16.55852 | 19.35113 | 21.42094 | 22.07803 | 24.83778 | 27.59754 | 30.65298 | 33.11704 | 35.97536 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
1011 | 2 | 31 | 1 | NA | 1 | 3 | 2.26 | 2.98 | 0.95 | 2.17 | 1.98 | 2.37 | 1.13 | 1.73 | 1.70 | 2.56 | 1.62 | 1.82 | 1.49 | 2.22 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2.792608 | 5.552361 | 8.312115 | 11.071869 | 13.83162 | 16.59138 | 19.31828 | 22.11088 | 23.78645 | 24.83778 | 27.59754 | 30.35729 | 33.34702 | 35.87680 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
First participant is a 46 year old male from Entebbe with 13 neutrophil results, while second participant is a 41 year old female from Entebbe with 15 neutrophil results
Note that for certain applications a wide data set is preferable, while for others a long data set is preferable We can convert a wide data set to a long data set, provided that the variables to be reshaped end in a digit denoting the serial number (so here ne_abs1, ne_abs2 etc)
(j = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
> 29 30)
Data Wide -> Long
-----------------------------------------------------------------------------
Number of observations 2,156 -> 64,680
Number of variables 67 -> 10
j variable (30 values) -> visitnum
xij variables:
ne_abs1 ne_abs2 ... ne_abs30 -> ne_abs
months1 months2 ... months30 -> months
-----------------------------------------------------------------------------
The long data set above is an example of a clustered data structure, with repeated measures of neutrophil counts clustered within participants We often want to summarize the data at the cluster level – here to get participant level summaries (number of visits and mean neutrophil count) This can be achieved using the collapse command as shown below:
(costdm |>
filter(!is.na(ne_abs)) |>
mutate(dummy = 1) |>
group_by(idnum) |>
summarise(sum_d = sum(dummy),
n=n(),
mean_abs = mean(ne_abs,na.rm=T))->colapse_data) |> head(10)
idnum | sum_d | n | mean_abs |
---|---|---|---|
1001 | 13 | 13 | 1.433846 |
1002 | 15 | 15 | 0.988000 |
1003 | 12 | 12 | 1.772500 |
1005 | 15 | 15 | 1.549333 |
1006 | 3 | 3 | 1.143333 |
1007 | 15 | 15 | 2.406667 |
1008 | 14 | 14 | 1.537143 |
1009 | 14 | 14 | 3.232143 |
1010 | 15 | 15 | 2.032000 |
1011 | 14 | 14 | 1.927143 |
table(colapse_data$sum_d) |>
as.data.frame()
Var1 | Freq |
---|---|
1 | 10 |
2 | 16 |
3 | 24 |
4 | 40 |
5 | 87 |
6 | 99 |
7 | 86 |
8 | 159 |
9 | 149 |
10 | 203 |
11 | 336 |
12 | 304 |
13 | 261 |
14 | 191 |
15 | 85 |
16 | 46 |
17 | 22 |
18 | 15 |
19 | 7 |
20 | 6 |
21 | 2 |
22 | 7 |
30 | 1 |
skimr::skim(colapse_data$mean_abs)
Name | colapse_data$mean_abs |
Number of rows | 2156 |
Number of columns | 1 |
_______________________ | |
Column type frequency: | |
numeric | 1 |
________________________ | |
Group variables | None |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
data | 0 | 1 | 1.83 | 0.69 | 0.62 | 1.36 | 1.7 | 2.14 | 10.26 | ▇▁▁▁▁ |
(sum) dummy | Freq. Percent Cum.
------------+-----------------------------------
1 | 10 0.46 0.46
2 | 16 0.74 1.21
3 | 24 1.11 2.32
4 | 40 1.86 4.17
5 | 87 4.04 8.21
6 | 99 4.59 12.80
7 | 86 3.99 16.79
8 | 159 7.37 24.17
9 | 149 6.91 31.08
10 | 203 9.42 40.49
11 | 336 15.58 56.08
12 | 304 14.10 70.18
13 | 261 12.11 82.28
14 | 191 8.86 91.14
15 | 85 3.94 95.08
16 | 46 2.13 97.22
17 | 22 1.02 98.24
18 | 15 0.70 98.93
19 | 7 0.32 99.26
20 | 6 0.28 99.54
21 | 2 0.09 99.63
22 | 7 0.32 99.95
30 | 1 0.05 100.00
------------+-----------------------------------
Total | 2,156 100.00
use "jinja_q2.dta",clear
tabu clus2 arm
gen byte dummy =1
collapse (sum) dummy , by(clus2 arm)
by arm ,sort : summa dummy , det
Cluster |
number | study_arm
regrouped | facility home | Total
-----------+----------------------+----------
11 | 50 0 | 50
12 | 0 11 | 11
21 | 59 0 | 59
22 | 0 41 | 41
31 | 0 34 | 34
32 | 0 29 | 29
41 | 36 0 | 36
42 | 0 53 | 53
52 | 0 26 | 26
54 | 40 0 | 40
61 | 28 0 | 28
62 | 0 35 | 35
72 | 27 0 | 27
81 | 0 96 | 96
82 | 9 0 | 9
91 | 22 0 | 22
101 | 65 0 | 65
102 | 0 34 | 34
111 | 0 29 | 29
112 | 35 0 | 35
121 | 0 6 | 6
122 | 32 0 | 32
132 | 0 60 | 60
141 | 45 0 | 45
142 | 0 22 | 22
151 | 0 35 | 35
152 | 12 0 | 12
161 | 0 59 | 59
171 | 0 56 | 56
172 | 0 49 | 49
181 | 17 0 | 17
182 | 10 0 | 10
191 | 0 43 | 43
192 | 17 0 | 17
201 | 56 0 | 56
202 | 0 84 | 84
211 | 0 57 | 57
212 | 34 0 | 34
-----------+----------------------+----------
Total | 594 859 | 1,453
-------------------------------------------------------------------------------
-> arm = facility
(sum) dummy
-------------------------------------------------------------
Percentiles Smallest
1% 9 9
5% 9 10
10% 10 12 Obs 18
25% 17 17 Sum of wgt. 18
50% 33 Mean 33
Largest Std. dev. 17.09145
75% 45 50
90% 59 56 Variance 292.1176
95% 65 59 Skewness .2959819
99% 65 65 Kurtosis 2.077189
-------------------------------------------------------------------------------
-> arm = home
(sum) dummy
-------------------------------------------------------------
Percentiles Smallest
1% 6 6
5% 8.5 11
10% 16.5 22 Obs 20
25% 29 26 Sum of wgt. 20
50% 38 Mean 42.95
Largest Std. dev. 22.17745
75% 56.5 59
90% 72 60 Variance 491.8395
95% 90 84 Skewness .6375892
99% 96 96 Kurtosis 3.267698