Answering CS1B Actuarial Exam Questions in R
Interactive learning
To test that you have successfully loaded the required R packages, run the following code in order within R:
- The exam requires you to load the dataset for each question using the load() function.
- Here I load all the datasets at once.
- Do not run the chunk below: it will not run and is shown for demonstration only.
- Run the chunks that follow instead; they read in all the data needed for the exercises.
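As a minimal sketch of how load() behaves, the chunk below saves a small made-up data frame to a temporary .RData file and then restores it. The object name demo_claims, its values, and the temporary file are illustrative assumptions; none of them is an exam dataset.

```r
# Hypothetical stand-in data; not one of the exam datasets.
demo_claims <- data.frame(ClaimAmount = c(120, 95, 143),
                          EducationYears = c(10, 12, 9))

# Save to a temporary .RData file, then remove the object from the workspace.
tmp_file <- tempfile(fileext = ".RData")
save(demo_claims, file = tmp_file)
rm(demo_claims)

# load() restores the saved objects under their original names and
# invisibly returns those names as a character vector.
loaded_names <- load(tmp_file)
print(loaded_names)
str(demo_claims)
```

In the exam the same call would simply point at the supplied file, which must be in the working directory or be given with its path.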
An insurance company wants to study the association between the number of years their clients spent in education and their claim amounts. Data from 25 randomly selected claims are contained in the file AmountYears.RData in the following two variables:
- ClaimAmount – this is the claim amount (in £).
- EducationYears – this is the number of years the client spent in education.
Use the plot() function. To learn more, type ?plot in the console.
If packages were allowed I would use the ggplot2 package, but do not use it here.
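A scatterplot with base plot() might look like the sketch below. Because AmountYears.RData is not available here, the two variables are simulated stand-ins (25 values, with arbitrary noise), so the picture is illustrative only.

```r
set.seed(1)

# Simulated stand-in for the 25 claims in AmountYears.RData (illustrative only).
EducationYears <- sample(8:16, 25, replace = TRUE)
ClaimAmount <- 609272 - 95999 * EducationYears + 4371 * EducationYears^2 +
  rnorm(25, sd = 5000)

# Base R scatterplot with explicit axis labels and a title.
plot(EducationYears, ClaimAmount,
     xlab = "Years in education",
     ylab = "Claim amount (pounds)",
     main = "Claim amount against years in education")
```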
\[ClaimAmount = 609272 - 95999 \times EducationYears + 4371 \times EducationYears^2\]
compared to the model in part (ii), based on the output from part (iii).
Comment
- The quadratic model looks more suitable in the plot below, as its fitted curve follows the data points more closely.
- \(R^2\) for the quadratic model is noticeably higher than for the linear model.
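The comparison in these comments can be sketched as follows. The data are simulated stand-ins for AmountYears.RData (generated from a quadratic curve plus arbitrary noise), so the \(R^2\) values printed here are not the exam values.

```r
set.seed(1)

# Simulated stand-in data (illustrative only).
EducationYears <- sample(8:16, 25, replace = TRUE)
ClaimAmount <- 609272 - 95999 * EducationYears + 4371 * EducationYears^2 +
  rnorm(25, sd = 5000)

# Linear model and quadratic model; I() protects the squared term in the formula.
linear_fit    <- lm(ClaimAmount ~ EducationYears)
quadratic_fit <- lm(ClaimAmount ~ EducationYears + I(EducationYears^2))

# Extract R-squared from each model summary.
r2_linear    <- summary(linear_fit)$r.squared
r2_quadratic <- summary(quadratic_fit)$r.squared
print(c(linear = r2_linear, quadratic = r2_quadratic))

# Overlay both fits on the scatterplot.
plot(EducationYears, ClaimAmount,
     xlab = "Years in education", ylab = "Claim amount")
abline(linear_fit, lty = 2)
grid_years <- seq(min(EducationYears), max(EducationYears), length.out = 100)
lines(grid_years,
      predict(quadratic_fit, newdata = data.frame(EducationYears = grid_years)))
```

Plain \(R^2\) can never decrease when a term is added to a linear model, so the adjusted \(R^2\) reported by summary() is a fairer basis for the comparison.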
A financial consultancy working with large firms wishes to model the relationship between a firm’s assets and the number of senior management positions in the firm. The data file firms.Rdata contains the variables:
- assets – this is the value of assets (in millions of £).
- sn_positions – this is the number of senior management positions.
Mean
Sample size
two separate graphs but on the same scale specifying appropriate axis limits and labels.
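One way to draw two separate graphs on a common scale is to compute the axis limits once and pass them to both plot() calls. The sketch below uses two hypothetical samples (sample_a and sample_b), since the quantities plotted in this part of the question are not shown here.

```r
set.seed(2)

# Two hypothetical samples, purely illustrative.
sample_a <- rpois(50, lambda = 3)
sample_b <- rpois(50, lambda = 5)

# Shared y-axis limits computed from both samples together.
y_limits <- range(sample_a, sample_b)

# Two panels side by side, each with the same limits and explicit labels.
old_par <- par(mfrow = c(1, 2))
plot(sample_a, ylim = y_limits,
     xlab = "Observation", ylab = "Value", main = "Sample A")
plot(sample_b, ylim = y_limits,
     xlab = "Observation", ylab = "Value", main = "Sample B")
par(old_par)  # restore the previous graphics settings
```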
A Multiple-Choice (MC) test with 20 questions requires a minimum of 16 correct answers for students to pass the test. A student prepares for the test using a mobile phone application that generates random practice tests with 20 questions per test.
Load the file MCtestResults.Rdata into R. This creates two variables:
- CorrectOutOf20Questions – this contains the number of correct answers the student has achieved with the mobile phone application in each of 50 generated practice MC tests.
- TrialNumber – this contains the corresponding test number from 1 to 50.
The student assumes that the test score, X, which is the number of correctly answered questions per test, has a binomial distribution,
\(X \sim Bin(n,p)\) with n=20.
How do we find p?
- There are 50 generated tests, and each test is marked out of 20.
- If the student answered every question correctly, the total would be 50 × 20 = 1000 marks.
- The total the student actually achieved is the sum of the marks in the column CorrectOutOf20Questions.
- The estimate of p is the total achieved divided by 1000.
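The arithmetic above can be sketched in R. The scores below are simulated as a stand-in for MCtestResults.Rdata, and the value 0.7 used to simulate them is an arbitrary assumption.

```r
set.seed(3)

# Simulated stand-in for the 50 practice-test scores (illustrative only).
CorrectOutOf20Questions <- rbinom(50, size = 20, prob = 0.7)

total_possible <- 50 * 20                       # maximum marks over all tests
total_achieved <- sum(CorrectOutOf20Questions)  # marks actually obtained

# Estimate of p: proportion of all questions answered correctly.
p_hat <- total_achieved / total_possible
print(p_hat)

# Equivalent form: mean score per test divided by the 20 questions per test.
print(mean(CorrectOutOf20Questions) / 20)
```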
Probability of passing the test
- Passing requires a mark of 16 or better, i.e. \(P(X \ge 16) = P(X > 15)\).
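With an estimate of p in hand, the pass probability is an upper binomial tail. In the sketch below p_hat is set to an illustrative 0.7 rather than the value estimated from the exam data.

```r
p_hat <- 0.7  # illustrative value only, not the estimate from the exam data

# P(X > 15) for X ~ Bin(20, p_hat); lower.tail = FALSE gives the upper tail.
prob_pass <- pbinom(15, size = 20, prob = p_hat, lower.tail = FALSE)
print(prob_pass)

# Cross-check by summing the probabilities of 16, 17, ..., 20 correct answers.
prob_pass_check <- sum(dbinom(16:20, size = 20, prob = p_hat))
print(prob_pass_check)
```

Note that pbinom(15, ...) with lower.tail = FALSE excludes 15 itself, which is what a minimum of 16 correct answers requires.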
Proportion of tests that he passed
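The empirical proportion can be compared with the model probability. A minimal sketch, again using simulated scores in place of MCtestResults.Rdata (the simulation probability 0.7 is an assumption):

```r
set.seed(3)

# Simulated stand-in for the 50 practice-test scores (illustrative only).
CorrectOutOf20Questions <- rbinom(50, size = 20, prob = 0.7)

# A test is passed when at least 16 of the 20 answers are correct.
passed <- CorrectOutOf20Questions >= 16

# The mean of a logical vector is the proportion of TRUE values.
prop_passed <- mean(passed)
print(prop_passed)
```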
\[F(x) = 1 - \exp(-\lambda x^2)\]
A random sample of 100 values of X is provided in randomSample.Rdata. Loading the sample data into R will generate a vector x with 100 values representing the sample.
- Calculate the value of the log likelihood function for the parameter λ at the point \(\lambda =2\) based on this random sample.
- Plot the values of the log likelihood function for the parameter λ based on the sample in randomSample.Rdata. Your plot of the log likelihood function must be for values of λ = 0.01, 0.02, … , 0.99, 1.
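A sketch of both tasks follows. It assumes the displayed function is the distribution function of X, so that the density is its derivative, \(2\lambda x \exp(-\lambda x^2)\), and the log likelihood is \(N\log(2\lambda) + \sum \log x_i - \lambda \sum x_i^2\). The sample x is simulated (with an arbitrary true value of 0.5) as a stand-in for randomSample.Rdata, and log_lik is a helper name introduced here.

```r
set.seed(4)

# Simulated stand-in sample: a Weibull with shape 2 and scale 1/sqrt(lambda)
# has distribution function 1 - exp(-lambda * x^2). True lambda is arbitrary.
true_lambda <- 0.5
x <- rweibull(100, shape = 2, scale = 1 / sqrt(true_lambda))

# Log likelihood under the assumed density 2 * lambda * x * exp(-lambda * x^2).
log_lik <- function(lambda, x) {
  length(x) * log(2 * lambda) + sum(log(x)) - lambda * sum(x^2)
}

# Value of the log likelihood at lambda = 2.
print(log_lik(2, x))

# Log likelihood on the grid 0.01, 0.02, ..., 1.
lambda_grid <- seq(0.01, 1, by = 0.01)
log_lik_values <- sapply(lambda_grid, log_lik, x = x)
plot(lambda_grid, log_lik_values, type = "l",
     xlab = expression(lambda), ylab = "Log likelihood")
```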
The maximum likelihood estimator for the parameter λ based on a random sample \(X_1 , ..., X_N\) is given by \[\hat{\lambda} = \frac{N}{\sum^N_{i=1}X_{i}^2}\]
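This estimator follows from setting the derivative of the log likelihood, \(N/\lambda - \sum x_i^2\), to zero, assuming the density \(2\lambda x \exp(-\lambda x^2)\). The sketch below computes it on a simulated stand-in sample and checks it against a numerical maximisation; log_lik is a helper name introduced here.

```r
set.seed(4)

# Simulated stand-in for the sample in randomSample.Rdata (true lambda = 0.5
# is an arbitrary assumption).
x <- rweibull(100, shape = 2, scale = 1 / sqrt(0.5))

# Closed-form maximum likelihood estimate: N / sum(x_i^2).
lambda_hat <- length(x) / sum(x^2)
print(lambda_hat)

# Numerical check: maximise the log likelihood directly.
log_lik <- function(lambda, x) {
  length(x) * log(2 * lambda) + sum(log(x)) - lambda * sum(x^2)
}
numeric_max <- optimize(log_lik, interval = c(0.01, 5), x = x,
                        maximum = TRUE)$maximum
print(numeric_max)
```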
entering the car insurance market. An underwriting manager at the company believes that the age and gender of the policyholder will be the most important factors in estimating the number of claims made under a car insurance policy.
The underwriting manager has commissioned a survey of its current home insurance customers who also have car insurance, choosing a male customer and a female customer for every age from 18 to 65, asking them how many car insurance claims they have made in the past 3 years.
This dataset is saved in the file ClaimsData.Rdata. After loading this data into R, using the command load("ClaimsData.Rdata"), the data frame ClaimsData will be available, which contains the following three variables:
- age – this is the age (in years) of the policyholder.
- gender – this is either ‘M’ for male or ‘F’ for female.
- claim_count – this is the number of car insurance claims reported by the policyholder over the past 3 years.
- Fit a Generalised Linear Model (GLM) to the data using claim_count as the response variable and age as the explanatory variable, assuming a Poisson distribution for the response variable. Your answer should include the estimated coefficients and the Akaike’s Information Criterion (AIC) of the fitted model.
- Fit, by choosing a suitable argument for family in the glm command, a GLM to the data that is equivalent to the model fitted in part (i). Your answer should include the estimated coefficients and the AIC of this fitted model.
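A sketch of parts (i) and (ii) on simulated data follows. The data frame mimics the layout of ClaimsData (one male and one female per age from 18 to 65), but the claim counts are generated from arbitrary coefficients. Reading part (ii) as "state the log link explicitly" is an assumption; it is one way to obtain an equivalent model, because log is the default link for the Poisson family.

```r
set.seed(5)

# Simulated stand-in for ClaimsData (claim counts are illustrative only).
ClaimsData <- data.frame(age = rep(18:65, times = 2),
                         gender = rep(c("M", "F"), each = 48))
ClaimsData$claim_count <- rpois(nrow(ClaimsData),
                                lambda = exp(0.5 - 0.02 * ClaimsData$age))

# Part (i): Poisson GLM with the default (log) link.
model_1 <- glm(claim_count ~ age, family = poisson, data = ClaimsData)

# Part (ii): the same model with the link written out explicitly.
model_2 <- glm(claim_count ~ age, family = poisson(link = "log"),
               data = ClaimsData)

# Estimated coefficients and AIC of each fit.
print(coef(model_1)); print(AIC(model_1))
print(coef(model_2)); print(AIC(model_2))
```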
The underwriting manager believes the Poisson GLM would be improved by adding the explanatory variable gender as well as its interaction with age.
- Fit a Poisson GLM to the data of the form age*gender. Your answer should include the estimated coefficients and the AIC of this fitted model.
- Compare, using scaled deviances, the fit of this model to that in part (ii).
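The interaction model and the deviance comparison can be sketched as below, again on simulated stand-in data with arbitrary coefficients. For a Poisson GLM the dispersion parameter is 1, so the scaled deviance equals the deviance reported by R, and the drop in deviance between nested models is compared with a chi-squared distribution.

```r
set.seed(5)

# Simulated stand-in for ClaimsData (claim counts are illustrative only).
ClaimsData <- data.frame(age = rep(18:65, times = 2),
                         gender = rep(c("M", "F"), each = 48))
ClaimsData$claim_count <- rpois(nrow(ClaimsData),
                                lambda = exp(0.5 - 0.02 * ClaimsData$age))

# Age-only model and the model with gender and the age:gender interaction.
model_age <- glm(claim_count ~ age, family = poisson, data = ClaimsData)
model_int <- glm(claim_count ~ age * gender, family = poisson,
                 data = ClaimsData)
print(coef(model_int)); print(AIC(model_int))

# Drop in deviance and the number of extra parameters.
deviance_drop <- deviance(model_age) - deviance(model_int)
df_drop <- df.residual(model_age) - df.residual(model_int)

# Upper-tail chi-squared probability for the drop in deviance.
p_value <- pchisq(deviance_drop, df = df_drop, lower.tail = FALSE)
print(c(deviance_drop = deviance_drop, df = df_drop, p_value = p_value))

# The same comparison in one call.
print(anova(model_age, model_int, test = "Chisq"))
```

A small p-value would suggest that gender and its interaction with age improve the fit; a large one would favour the simpler age-only model.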