Counter Examples for Compact Action Markov Decision Chains With Average Reward Criteria

Rommert Dekker*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

8 Citations (Scopus)

Abstract

In this note we present two examples of compact-action finite-state Markov decision chains in which a policy improvement procedure yields wrong or limited results. In the first example, which exhibits a multichain structure, the average rewards of the successive policies do not converge to the maximal value. In the second example, which has a unichain structure, the lack of uniqueness of maximizing policies in each step of the algorithm prevents convergence of both the bias vectors and the maximizing policies. Accordingly, no solution to the average optimality equations can be obtained.
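The failure mode described in the abstract concerns the standard policy improvement (Howard) procedure for average-reward Markov decision chains. For *finite* state and action spaces with unichain structure the procedure does converge; the sketch below is an illustrative implementation under those finite assumptions (the toy MDP is hypothetical and not from the paper) showing the two alternating steps, policy evaluation and policy improvement, whose interplay breaks down in the paper's compact-action counterexamples.

```python
import numpy as np

def policy_iteration_avg(P, r, max_iter=100):
    """Howard's policy iteration for a finite unichain average-reward MDP.

    P[a] is the (n x n) transition matrix under action a;
    r[a] is the length-n reward vector under action a.
    Returns (gain g, bias h with h[0] = 0, policy)."""
    n = P[0].shape[0]
    policy = np.zeros(n, dtype=int)
    g, h = 0.0, np.zeros(n)
    for _ in range(max_iter):
        # Policy evaluation: solve g + h(s) = r_pi(s) + sum_s' P_pi(s,s') h(s'),
        # normalising with h(0) = 0 so the linear system has a unique solution.
        P_pi = np.array([P[policy[s]][s] for s in range(n)])
        r_pi = np.array([r[policy[s]][s] for s in range(n)])
        A = np.zeros((n, n))
        A[:, 0] = 1.0                                  # coefficient of the gain g
        for j in range(1, n):
            A[:, j] = np.eye(n)[:, j] - P_pi[:, j]     # coefficient of h(j)
        sol = np.linalg.solve(A, r_pi)
        g, h = sol[0], np.concatenate(([0.0], sol[1:]))
        # Policy improvement: maximise r(s,a) + sum_s' P(s'|s,a) h(s') over a.
        Q = np.array([r[a] + P[a] @ h for a in range(len(P))])
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            break                                      # stable policy: optimal
        policy = new_policy
    return g, h, policy

# Toy 2-state, 2-action MDP (hypothetical): both actions mix the states
# evenly, but action 1 pays more, so the optimal gain is (1 + 2) / 2 = 1.5.
P = [np.full((2, 2), 0.5), np.full((2, 2), 0.5)]
r = [np.array([0.0, 0.0]), np.array([1.0, 2.0])]
g, h, pol = policy_iteration_avg(P, r)
```

In the paper's second counterexample the improvement step admits more than one maximizing action, and the resulting sequence of bias vectors `h` never settles; with a compact action set there need not even be a convergent subsequence yielding a solution of the optimality equations.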

Original language: English
Pages (from-to): 357-368
Number of pages: 12
Journal: Communications in Statistics. Stochastic Models
Volume: 3
Issue number: 3
DOIs
Publication status: Published - 1987

Bibliographical note

Funding Information:
Research was sponsored by the Netherlands Foundation for Mathematics (SMC). Present address: Koninklijke/Shell Laboratorium Amsterdam, P.O. Box 3003, 1003 AA Amsterdam.
