Section Homework 6

Before you begin ...

Permitted MATLAB Functions & Commands for Homework 6.

The following built-in MATLAB functions and commands are permitted for this assignment.
Vector/Matrix
  Operations: round • mod • floor • ceil
  Size/Dimensions: length • numel • size • height • width
  Creation: zeros • ones • true • false
  Stats: sum • max • min • mean • std
  Logical: all • any
Flow Control
  Conditional: if • switch-case • try-catch
  Loops: for • while • continue • break • return
Strings and Character Arrays
  Operations: join • replace
  Conversion: num2str • str2num • str2double • string • char
Other
  Printing Text: fprintf • disp • display • input • assert
  Random Generators: rand • randi • rng
  Special Variables: nargin
  Type Detection: isnan • isfield • isempty
Points will be deducted from any programs using functions outside this list.

Summary.

Subsection reroll_all

Write a function that implements a reroll all policy for dice-poker, regardless of the current hand.
Inputs:
hand (1x5) integer: current dice hand (1-6) sorted by rank.
counts (1x6) integer: die face counts.
Output:
sel (1x5) logical: true for dice to reroll, per the rule set below.
Details:
▸ Returns true for all hand locations.
▸ This policy serves as a baseline often used for comparison against smarter strategies.
Example:
sel = reroll_all([5 4 3 2 1], [1 1 1 1 1 0])
% Returns: sel =
%   1x5 logical array
%    1   1   1   1   1
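The output must be a logical array, not a numeric one. The short sketch below, using only functions from the permitted list, shows how the all-true and all-false selections can be built; the variable names are illustrative only.

```matlab
% Illustrative names; the two baseline policies ignore hand and counts entirely.
selAll     = true(1, 5);    % logical row vector: reroll every die
selNone    = false(1, 5);   % logical row vector: keep every die
notLogical = ones(1, 5);    % double, NOT logical: displays as 1s but has the wrong type
```

A double vector of ones prints the same digits, but it does not show the "logical array" header seen in the example output.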

Subsection reroll_none

Write a function that implements a conservative reroll none policy for dice-poker, regardless of the current hand.
Inputs:
hand (1x5) integer: current dice hand (1-6) sorted by rank.
counts (1x6) integer: die face counts.
Output:
sel (1x5) logical: true for dice to reroll, per the rule set below.
Details:
▸ Returns false for all hand locations.
▸ This is another baseline policy for comparison with better policies.
Example:
sel = reroll_none([6 4 3 2 1], [1 1 1 1 0 1])
% Returns: sel =
%   1x5 logical array
%    0   0   0   0   0

Subsection reroll_base

Write a function that implements the standard dealer policy, formerly get_dealer_reroll.
Inputs:
hand (1x5) integer: current dice hand (1-6) sorted by rank.
counts (1x6) integer: die face counts.
Output:
sel (1x5) logical: true for dice to reroll, per the rule set below.
Details:
▸ If the hand is a straight, keep all dice.
▸ Else if all dice are singles, reroll only the die with face value 1.
▸ Else, reroll all singles.
▸ If your code uses the rank ID, use your get_rank helper from Homework 3.
▸ Paste any helper functions used by this program underneath this function. I should be able to run this without any dependencies.
Example:
sel = reroll_base([2 2 6 5 3], [0 2 1 0 1 1])
% sel =
%   1x5 logical array
%    0   0   1   1   1
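As a rough sketch of the three-branch rule, the snippet below works directly from counts rather than from a rank ID. That is an assumption made for illustration; a version built on get_rank is equally valid. It relies on the fact that five distinct faces form a straight exactly when the one missing face is 1 or 6.

```matlab
hand   = [2 2 6 5 3];
counts = [0 2 1 0 1 1];

allSingles = max(counts) == 1;                                  % no face repeats
isStraight = allSingles && (counts(1) == 0 || counts(6) == 0);  % 2-6 or 1-5

if isStraight
    sel = false(1, 5);          % keep everything
elseif allSingles
    sel = hand == 1;            % reroll only the die showing 1
else
    sel = counts(hand) == 1;    % counts(hand) looks up each die's multiplicity
end
```

Indexing counts by hand gives [2 2 1 1 1] for this hand, so the comparison marks the last three dice for reroll, as in the example.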

Subsection reroll_greedy

Write a function that implements a greedy policy that always pushes for a five-of-a-kind.
Inputs:
hand (1x5) integer: current dice hand (1-6) sorted by rank.
counts (1x6) integer: die face counts.
Output:
sel (1x5) logical: true for dice to reroll, per the rule set below.
Details:
▸ This policy keeps only the dice with the highest multiplicity (the face that appears most often). In the case of two-pair, it keeps only the higher pair.
▸ Paste any helper functions used by this program underneath this function. I should be able to run this without any dependencies.
Example:
sel = reroll_greedy([5 5 3 3 2], [0 1 2 0 2 0])
% sel =
%   1x5 logical array
%    0   0   1   1   1
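One possible way to pick the face to keep is a single pass over counts that uses >= so that ties resolve toward the higher face. This is only a sketch; keepFace is an illustrative name. It also assumes that in an all-singles hand the highest face is kept, which the description does not state explicitly.

```matlab
hand   = [5 5 3 3 2];
counts = [0 1 2 0 2 0];

keepFace = 1;
for face = 2:6
    % >= (rather than >) lets a later, higher face win a tie in multiplicity
    if counts(face) >= counts(keepFace)
        keepFace = face;
    end
end

sel = hand ~= keepFace;   % reroll every die that is not the kept face
```

For this two-pair hand the loop settles on face 5, so the pair of 3s and the 2 are rerolled.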

Subsection reroll_singles

Write a function that implements a policy that always rerolls singletons (faces appearing exactly once), except in the special case of a straight.
Inputs:
hand (1x5) integer: current dice hand (1-6) sorted by rank.
counts (1x6) integer: die face counts.
Output:
sel (1x5) logical: true for dice to reroll, per the rule set below.
Details:
▸ Paste any helper functions used by this program underneath this function. I should be able to run this without any dependencies.
Example:
sel = reroll_singles([3 3 6 4 1], [1 0 2 1 0 1])
% sel =
%   1x5 logical array
%    0   0   1   1   1

Subsection reroll_str8_chaser

Write a function that implements a straight-chasing policy: it prefers to keep dice that progress toward a straight (1-2-3-4-5 or 2-3-4-5-6), falling back to a greedy policy only when a straight is "too far" (four or more dice away).
Inputs:
hand (1x5) integer: current dice hand (1-6) sorted by rank.
counts (1x6) integer: die face counts.
Output:
sel (1x5) logical: true for dice to reroll, per the rule set below.
Details:
▸ Determine how many dice are missing from a low straight (1-5) and a high straight (2-6), and aim for whichever straight the hand is closer to.
▸ If both are tied, prefer the high straight. If both are more than 3 dice away, fall back to reroll_greedy.
▸ For consistency, when a die face appears more than once, keep the first occurrence and reroll the copies to its right. See Example 2 below.
▸ Paste any helper functions used by this program underneath this function. I should be able to run this without any dependencies.
Example 1: High and low straights are both 2 dice away, so go for the high.
sel = reroll_str8_chaser([6 6 5 4 1], [1 0 0 1 1 2])
% sel =
%   1x5 logical array
%    0   1   0   0   1
Example 2: Stupidly splitting a four-of-a-kind to go for a straight.
sel = reroll_str8_chaser([4 4 4 4 1], [1 0 0 4 0 0])
% sel =
%   1x5 logical array
%    0   1   1   1   0
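The "dice away" count and the first-occurrence rule can be sketched as follows, using Example 2's hand. The names missLow, missHigh, lo, hi, and seen are illustrative. The fallback to reroll_greedy is deliberately left out, so this is not a complete implementation.

```matlab
hand   = [4 4 4 4 1];
counts = [1 0 0 4 0 0];

missLow  = sum(counts(1:5) == 0);   % faces 1-5 not yet in the hand
missHigh = sum(counts(2:6) == 0);   % faces 2-6 not yet in the hand

if missHigh <= missLow              % ties go to the high straight
    lo = 2; hi = 6;
else
    lo = 1; hi = 5;
end

% (When both counts exceed 3, the assignment hands off to reroll_greedy; not shown.)
sel  = true(1, 5);                  % start by rerolling everything
seen = false(1, 6);                 % faces already kept
for k = 1:5
    f = hand(k);
    if f >= lo && f <= hi && ~seen(f)
        sel(k)  = false;            % keep the first copy of a useful face
        seen(f) = true;
    end
end
```

Here the low straight is 3 dice away and the high straight is 4, so the sketch chases the low straight and keeps only the first 4 and the 1.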

Subsection apply_policy

This function applies a single reroll policy to an initial 5-dice hand. It applies the given reroll policy, rerolls the selected dice, and returns the final sorted hand along with its rank ID.
Inputs:
hand (1x5) integer: current dice values (faces 1-6).
policy (function handle): reroll selection policy.
Outputs:
hand (1x5) integer: final sorted hand after applying the policy & rerolling.
rid (1x1) integer: rank ID of the final hand as returned by get_rank.
Details:
▸ This mimics the dealer's steps after the initial roll and before you determine a winner in your play_one_round helper from Homework 4.
Helpers:
get_face_counts, sort_by_rank, get_rank, and a user-supplied policy.
Example 1: No Rerolls
[newHand, rid] = apply_policy([2 2 3 5 6], @reroll_none);
% Returns:
%   newHand  =  2   2   6   5   3    ⬅  Sorted
%   rid  =  7
Example 2: Apply greedy policy
[newHand, rid] = apply_policy([1 4 4 6 2], @reroll_greedy);
% Returns (values will vary):
%   newHand  =  4     4     4     4     1
%   rid  =  2
Example 3: Use base dealer policy
[newHand, rid] = apply_policy([1 2 3 5 6], @reroll_base);
% Returns (values will vary):
%   newHand  =  5     4     3     2     1
%   rid  =  3
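The core mechanics, calling a policy through its function handle and rerolling only the selected dice, might look like the sketch below. The anonymous policy is a hypothetical stand-in so the snippet runs on its own. A real apply_policy would first build counts with get_face_counts, then re-sort and rank the result with sort_by_rank and get_rank.

```matlab
rng(7);                                        % fixed seed so the sketch is repeatable
hand   = [4 4 6 2 1];
counts = [1 1 0 2 0 1];                        % face counts for the hand above
policy = @(hand, counts) counts(hand) == 1;    % hypothetical stand-in policy

sel = policy(hand, counts);                    % a handle is called like a function
newHand = hand;
newHand(sel) = randi(6, 1, sum(sel));          % draw new values only for selected dice
```

Logical indexing on the left-hand side leaves the kept dice untouched, so the pair of 4s survives whatever the reroll produces.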

Subsection mc_prob_dice_poker_rerolls

This function uses Monte Carlo simulation to estimate the probability of each of the eight possible 5-dice poker hand ranks after a reroll based on a given policy. It is very similar to your mc_prob_dice_poker_rolls function from Homework 5, except that it depends on the policy used to select which dice to reroll.
Inputs:
policy (function handle): selection policy with signature sel = policy(roll, counts), returning a 1x5 logical vector indicating which dice to reroll. Default: @reroll_none (if policy is omitted).
N (1x1) integer: number of simulated 5-dice rolls. Default: 1e5.
seed (1x1) integer: random number generator seed.
Outputs:
simEstimates (1x1) struct containing the following fields:
  • fiveKind (1x1) string: CI for a five-of-a-kind.
  • fourKind (1x1) string: CI for a four-of-a-kind.
  • straight (1x1) string: CI for a straight.
  • fullhouse (1x1) string: CI for a full-house.
  • threeKind (1x1) string: CI for a three-of-a-kind.
  • twoPair (1x1) string: CI for a two-pair.
  • onePair (1x1) string: CI for a one-pair.
  • singles (1x1) string: CI for singles.
where each confidence interval is formatted as "pHat% ± ME%".
Details:
▸ In each Monte Carlo iteration:
  • Use randi to generate each roll. Don't call roll_dice.
  • After the initial roll, apply the policy by calling your apply_policy with the appropriate inputs.
  • Determine and tally the outcome after the reroll.
▸ Hard-code the confidence level to 95% and use your pHat_marginErr_w_CL to find the margin of error for each estimate.
▸ Format results as percentages with one decimal place for pHat and two decimal places for ME. Use round to set the decimals.
▸ Handle the special input options as follows:
  • If no inputs are passed, set policy = @reroll_none and N = 1e5.
  • If only policy is passed, set N = 1e5.
  • If seed is provided, set the rng to this seed as usual.
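The formatting step can be sketched in isolation. The two input numbers below are hypothetical; in the assignment they would come from the tally and from pHat_marginErr_w_CL. Building the text with string and + is just one option among the permitted functions.

```matlab
pHat = 0.46412;      % hypothetical estimated proportion
ME   = 0.0030910;    % hypothetical margin of error (as a proportion)

pct   = round(100 * pHat, 1);   % percent, one decimal place
mePct = round(100 * ME, 2);     % percent, two decimal places

ciStr = string(pct) + "% ± " + string(mePct) + "%";   % "46.4% ± 0.31%"
```

Because string prints a number without trailing zeros, a value that rounds to a whole number appears as "3%" rather than "3.0%", which is consistent with the example outputs.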
Example 1: No inputs, use default policy and N = 1e5
simEstimates = mc_prob_dice_poker_rerolls();
% Returns (values will vary):
% struct with fields:
% 
%    fiveKind: "0.1% ± 0.02%"
%    fourKind: "1.9% ± 0.08%"
%    straight: "3% ± 0.11%"
%   fullhouse: "3.8% ± 0.12%"
%   threeKind: "15.5% ± 0.22%"
%     twoPair: "23% ± 0.26%"
%     onePair: "46.4% ± 0.31%"
%     singles: "6.3% ± 0.15%"
Example 2: Greedy policy with seed, and N = 2e5
simEstimates = mc_prob_dice_poker_rerolls(@reroll_greedy, 2e5, 123);
% Returns:
% struct with fields:
% 
%    fiveKind: "1.1% ± 0.05%"
%    fourKind: "10.1% ± 0.13%"
%    straight: "3% ± 0.08%"
%   fullhouse: "14.7% ± 0.16%"
%   threeKind: "23.7% ± 0.19%"
%     twoPair: "28.4% ± 0.2%"
%     onePair: "12.8% ± 0.15%"
%     singles: "6.2% ± 0.11%"
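For the tallying step, one simple pattern is to index a 1x8 counter by rank ID. The sketch below uses a short, made-up list of rank IDs in place of real simulation output. It assumes rank IDs run from 1 (five-of-a-kind) to 8 (singles), which is inferred from the apply_policy examples rather than stated outright.

```matlab
rids  = [7 7 6 8 7 5 6 7 2 7];   % hypothetical rank IDs from ten trials
tally = zeros(1, 8);             % one counter per rank ID (assumed 1..8)

for k = 1:numel(rids)
    tally(rids(k)) = tally(rids(k)) + 1;   % bump the counter for this rank
end

pHat = tally / numel(rids);      % sample proportion for each rank
```

In the full function, the increment would sit inside the Monte Carlo loop, right after apply_policy returns rid.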

Subsection mc_prob_policy_wins

This function estimates, via Monte Carlo simulation, the win probability of using one reroll policy (policyA) against another (policyB) in simulated 5-dice poker matches. Each trial generates random starting hands, applies each policy once, compares the resulting hands, and tracks the wins.
Inputs:
policyA (function handle): selection policy for player A.
policyB (function handle): selection policy for player B.
N (1x1) integer: number of simulated games. Default: 1e5.
seed (1x1) integer: optional random number seed.
Outputs:
pHat (1x1) double: estimated probability that policyA wins.
ME (1x1) double: margin of error of pHat at the 95% confidence level.
Details:
▸ In each Monte Carlo iteration:
  • Generate the initial rolls for both policies.
  • Apply both policies using your apply_policy with the appropriate inputs.
  • Determine and tally the outcome.
▸ Count total wins.
▸ Compute the sample win probability for policyA.
▸ Use a 95% confidence level for the margin of error.
▸ Handle the special input options as follows:
  • If seed is provided, set the rng to this seed as usual.
Helpers:
apply_policy, get_winner, get_face_counts, sort_by_rank, get_rank
Example 1: Compare conservative vs greedy strategies with fixed seed
[pHat, ME] = mc_prob_policy_wins(@reroll_none, @reroll_greedy, 2e5, 314)
% Returns:
%   pHat  =  0.3024
%   ME  =  0.0020
Example 2: Base vs straight-chaser policy
[pHat, ME] = mc_prob_policy_wins(@reroll_base, @reroll_str8_chaser)
% Returns (values will vary):
%   pHat  =  0.7389
%   ME  =  0.0027
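As a sanity check on the numbers in Example 1, the sketch below goes from a win tally to pHat and ME. It assumes the usual normal-approximation margin of error, ME = z*sqrt(pHat*(1-pHat)/N) with z = 1.96 for 95%. The win count is hypothetical, chosen to reproduce pHat = 0.3024. sqrt is not on the permitted list, so a submission would presumably obtain ME through the earlier pHat_marginErr_w_CL helper instead.

```matlab
N    = 2e5;      % number of simulated games
wins = 60480;    % hypothetical count of games won by policyA

pHat = wins / N;                          % sample win probability
z    = 1.96;                              % assumed z-score for a 95% level
ME   = z * sqrt(pHat * (1 - pHat) / N);   % normal-approximation margin of error
```

Under these assumptions ME comes out near 0.0020, in line with the value shown for Example 1.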

Subsection head2head_results

This function performs a head-to-head Monte Carlo comparison between all defined reroll policies in 5-dice poker. It estimates, for each policy pair, the win probability of the row policy over the column policy and returns a complete win-probability matrix (in percent form).
Inputs:
None.
Outputs:
probs (M x M) double matrix: head-to-head win probabilities in percent, where probs(A,B) gives the estimated chance that policy A wins against policy B.
Each value is rounded to one decimal place (e.g., 64.7 represents a 64.7% win rate).
Details:
▸ Define a structure of six standard policies, mapping names to their function handles:
"all" → @reroll_all
"none" → @reroll_none
"base" → @reroll_base
"singles" → @reroll_singles
"greedy" → @reroll_greedy
"str8" → @reroll_str8_chaser
▸ To create the probability matrix, use a nested for loop.
▸ Both loops should scan through the string array of the fields in the policy structure above. You can hard code this string array or extract them from the structure with policyNames = string(fields(policies)).
▸ Initialize an M x M matrix of zeros to store the estimated probabilities.
▸ Convert the probabilities to a percent and round to one decimal place using the round(*,1) command.
▸ The diagonal entries represent self-comparison (policy vs. itself), which should be approximately 50%.
Helpers:
mc_prob_policy_wins, apply_policy, get_winner, get_face_counts, sort_by_rank, get_rank, and all reroll policy functions.
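The loop structure can be sketched with a reduced, two-policy structure. Everything here is a stand-in chosen so the snippet runs by itself: the anonymous policies replace the real reroll functions, and winProb is a hypothetical placeholder for mc_prob_policy_wins that ignores its inputs and always returns the same value.

```matlab
% Stand-in policies (the assignment uses six named reroll_* functions instead).
policies.none = @(hand, counts) false(1, 5);
policies.all  = @(hand, counts) true(1, 5);
policyNames   = ["none" "all"];            % hard-coded list of field names

% Hypothetical placeholder for mc_prob_policy_wins; always returns 0.64712.
winProb = @(policyA, policyB) 0.64712;

M     = numel(policyNames);
probs = zeros(M, M);
for r = 1:M
    for c = 1:M
        rowPolicy = policies.(policyNames(r));   % dynamic field access by name
        colPolicy = policies.(policyNames(c));
        probs(r, c) = round(100 * winProb(rowPolicy, colPolicy), 1);
    end
end
```

Because the placeholder is constant, every cell shows 64.7 here, including the diagonal. With the real simulation the diagonal should hover near 50.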
Example 1: Compute head-to-head win probability matrix
probs = head2head_results()
Display as a formatted table
policyNames = ["all" "none" "base" "singles" "greedy" "str8"];
T = array2table( ...
	probs, 'VariableNames', ...
	policyNames, 'RowNames', ...
	policyNames);
disp(T)
Identify strongest average performer
avgWinRate = mean(probs, 2);
[~, idxBest] = max(avgWinRate);
bestPolicy = policyNames(idxBest)
% Returns the policy with the highest mean win rate across all opponents.