The owner of a race horse wants to maximize the infinite-horizon discounted return of his horse. The discount factor α is 2/3. The horse can participate in a race every day, but after participating it may not be fit the next day. If the horse is fit, the expected return for that day is $200,000. If the horse is tired, the expected return is only $100,000. Participation in a race is free. If the horse is fit and participates in a race, it is fit the next day with probability 2/3 and tired the next day with probability 1/3. If the horse is fit and does not participate in a race, it is still fit the next day. Similarly, if the horse participates in a race while tired, it is tired the next day. If a tired horse rests for a day, it is fit the next day with probability 1/2 and still tired with probability 1/2.
a. Formulate this problem as a Markov decision process problem. Describe the state and action spaces and give the transition probabilities and rewards.
b. Compute the optimal policy that maximizes the infinite-horizon discounted reward using policy iteration.
c. Formulate the primal and dual LPs and provide the optimal solution of the dual LP.
d. Consider the problem of question 1 again, but now focus on long-run average reward optimality. Compute the long-run average reward under each policy.
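
As a way to check a hand computation for part (b), the following is a minimal sketch of policy iteration on a two-state, two-action model. It rests on several modeling assumptions that the problem statement does not spell out: rewards are in thousands of dollars, the expected return is collected only on days the horse races (a resting horse earns nothing), and the tired-horse return is read as $100,000. The names `P`, `R`, `evaluate`, and `policy_iteration` are illustrative, not part of the exercise.

```python
import numpy as np

FIT, TIRED = 0, 1      # states
RACE, REST = 0, 1      # actions
ALPHA = 2 / 3          # discount factor from the problem statement

# P[a][s, s'] = probability of moving from state s to s' under action a.
P = np.array([
    [[2 / 3, 1 / 3], [0.0, 1.0]],   # race
    [[1.0, 0.0], [0.5, 0.5]],       # rest
])

# R[a][s] = expected one-day reward in thousands of dollars.
# Assumption: resting earns nothing; tired racing earns 100.
R = np.array([
    [200.0, 100.0],   # race
    [0.0, 0.0],       # rest
])


def evaluate(policy):
    """Solve (I - alpha * P_pi) v = r_pi for a stationary deterministic policy."""
    states = np.arange(len(policy))
    P_pi = P[policy, states]   # row s is P[policy[s], s, :]
    r_pi = R[policy, states]
    return np.linalg.solve(np.eye(len(policy)) - ALPHA * P_pi, r_pi)


def policy_iteration(policy):
    """Alternate policy evaluation and improvement until the policy is stable."""
    policy = np.asarray(policy)
    while True:
        v = evaluate(policy)
        q = R + ALPHA * (P @ v)   # q[a, s] = one-step lookahead value
        improved = policy.copy()
        for s in range(len(policy)):
            best = int(np.argmax(q[:, s]))
            # Switch only on a strict improvement, so ties cannot cause cycling.
            if q[best, s] > q[policy[s], s] + 1e-9:
                improved[s] = best
        if np.array_equal(improved, policy):
            return policy, v
        policy = improved


policy, v = policy_iteration([REST, REST])
print("policy (0 = race, 1 = rest):", policy.tolist())
print("values in thousands of dollars:", v.tolist())
```

The starting policy `[REST, REST]` is arbitrary; any of the four stationary deterministic policies can be passed in. Calling `evaluate` directly on each of those policies gives the discounted values needed to cross-check the improvement steps by hand.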