# HTH vs. HTT - The attempt of an explanation

### Max Lang / 2021-04-25

# Question

This question was asked during the very interesting TED Talk about statistics by Peter Donnelly.

*Consider two Patterns HTH vs HTT. Which of the following is true?*

*a) The average number of tosses until HTH is larger than the average number of tosses until HTT*

*b) The average number of tosses until HTH is the same as the average number of tosses until HTT*

*c) The average number of tosses until HTH is smaller than the average number of tosses until HTT*

At first one might think *b)* is the correct answer.
On second thought, one notices that the sequence *HTH* overlaps itself: in HTHTH there are two occurrences of *HTH* in only five coin tosses (**HTH**TH and HT**HTH**).

Looking at the second sequence, *HTT*, one finds no such overlap.
One might therefore think that *HTH* shows up sooner than *HTT*, but that is not the case, as the simulation in the code chunk below shows.

One should note that we are looking either for *HTT* or (exclusively) for *HTH*. We only flip until the pattern we are looking for is hit.

```
set.seed(10)
fair_coin <- c('H','T')
seek <- function(seq) {
  # Flipping the coin three times
  result <- sample(x = fair_coin, size = 3, replace = TRUE)
  # Defining the minimum number of flips (three)
  flips <- 3
  # Setting missed to TRUE so the while loop iterates at least once
  missed <- TRUE
  while (missed) {
    if (all(seq == result)) {
      # The last three tosses match the sequence we seek: stop the loop
      missed <- FALSE
    } else {
      # Recycle the last two elements and flip once more (important)
      result <- c(result[2], result[3], sample(x = fair_coin, size = 1, replace = TRUE))
      flips <- flips + 1
    }
  }
  return(flips)
}
# Setting the number of repetitions of the experiment
n <- 10000
experiment <- data.frame("HTH" = rep(NA, n), "HTT" = rep(NA, n))
HTH <- c('H','T','H')
HTT <- c('H','T','T')
for (i in 1:n) {
  experiment[i,] <- c(seek(HTH), seek(HTT))
}
# Summary statistics (including the mean) of the number of flips
summary(experiment)
```

```
##       HTH             HTT
##  Min.   : 3.00   Min.   : 3.000
##  1st Qu.: 4.00   1st Qu.: 5.000
##  Median : 8.00   Median : 7.000
##  Mean   :10.04   Mean   : 8.054
##  3rd Qu.:13.00   3rd Qu.:10.000
##  Max.   :87.00   Max.   :47.000
```

## The attempt of an explanation:

Let’s suppose the following scenarios.

#### Looking for the sequence *HTH*

*HTH*

The simplest scenario in which we win right away.

*HTT*

Because of the last *T* in the sequence, one has to keep tossing until one gets another *H* to “start” the *HTH* sequence.

*Comment 1*:

Let’s suppose one already got *HT*. If the next toss is *H* one found the sequence and is done. If it’s a *T*, one has to start all over again: since the last two tosses were *TT* one now needs the full *HTH*.

#### Looking for *HTT*

*HTT*

Again the simplest scenario in which we win right away.

*HTH*

Because we are looking for *HTT*, we did not win in this case. However, and this is the important part, at this stage one does not need to toss the coin again in order to find the starting token *H*.

*Comment 2*:

Let’s suppose again that one already got *HT*. If the next toss is a *T*, one has found the sequence. If it is an *H*, this is just a minor setback: one now has *HTH* (while seeking *HTT*), but the final *H* already serves as the start of a new attempt, so one only needs two *T* in a row (*TT*). Moreover, if the toss after that is another *H*, this is again not a big setback, because *H* is still the start of the sequence we are looking for, just like before. A *T* would make our “situation” even better, and so on…

Therefore answer *a)* is correct: *The average number of tosses until HTH is larger than the average number of tosses until HTT*.
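
The reasoning in both comments can also be checked numerically. The following is a minimal sketch, not part of the simulation above: it treats the progress towards each pattern as three states (no progress, *H*, *HT*) and solves the resulting linear equations for the expected number of remaining tosses. A fair coin is assumed, and the names `expected_tosses`, `Q_HTH` and `Q_HTT` are introduced here purely for illustration.

```r
# Rows/columns: 1 = no progress, 2 = last toss was H, 3 = last two tosses were HT.
# Q holds the probabilities of moving between these unfinished states;
# the probability missing from row 3 is the chance of completing the pattern.
Q_HTH <- matrix(c(0.5, 0.5, 0.0,   # T: still no progress, H: move on to "H"
                  0.0, 0.5, 0.5,   # H: keep the "H",      T: move on to "HT"
                  0.5, 0.0, 0.0),  # T: back to no progress, H: finished
                nrow = 3, byrow = TRUE)
Q_HTT <- matrix(c(0.5, 0.5, 0.0,
                  0.0, 0.5, 0.5,
                  0.0, 0.5, 0.0),  # H: only back to "H",  T: finished
                nrow = 3, byrow = TRUE)

# The expected tosses e satisfy e = 1 + Q e, i.e. (I - Q) e = 1
expected_tosses <- function(Q) {
  solve(diag(nrow(Q)) - Q, rep(1, nrow(Q)))[1]
}

expected_tosses(Q_HTH)  # 10
expected_tosses(Q_HTT)  # 8
```

The only difference between the two matrices is the last row: a miss after *HT* sends the *HTH* seeker back to the very beginning, while the *HTT* seeker keeps an *H*. The resulting values of 10 and 8 are in line with the simulated means of 10.04 and 8.054.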

## Experiment 2

After we put in some thought, we wanted to find out how the frequencies of those two sequences behave if we adjust the experiment.

Now we will toss a coin 100 times and note down each outcome. The result of each run of the experiment is therefore a string such as “HTHTTTHHHTHTHTHTHT…”, containing 100 characters in total. We then go through this 100-character string and count the occurrences of the following patterns:

**HTH** and **HTT**

Note that HTH can overlap itself. For example *HTHTH* is counted as **HTH**TH and HT**HTH** - two sequences for *HTH*. **HTT** does not have this characteristic.
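
To make the counting rule concrete, here is a small sketch that slides a window over a short toss sequence and counts every match, including overlapping ones. The helper `count_overlapping` is a hypothetical name used only for this illustration.

```r
# Count how often `pattern` occurs in `tosses`, allowing overlaps,
# by checking every window of length(pattern) consecutive tosses
count_overlapping <- function(tosses, pattern) {
  k <- length(pattern)
  if (length(tosses) < k) return(0L)
  starts <- seq_len(length(tosses) - k + 1)
  sum(vapply(starts,
             function(i) all(tosses[i:(i + k - 1)] == pattern),
             logical(1)))
}

tosses <- strsplit("HTHTH", "")[[1]]
count_overlapping(tosses, c("H", "T", "H"))  # 2
count_overlapping(tosses, c("H", "T", "T"))  # 0
```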

The **Question** is which of the following statements is true:

- The number of times **HTH** is in our string is more than the number of times **HTT** occurs
- The number of times **HTH** is in our string is less than the number of times **HTT** occurs
- The number of times **HTH** is in our string is equal to the number of times **HTT** occurs

Let’s make a simple simulation in R to find out the correct answer. We first define this function to simulate our experiment.

```
library(ggplot2)
count.pattern <- function(pattern,
                          n_flips = 100,
                          coins = rbinom(n_flips, 1, 0.5)) {
  hits = 0
  for (i in 1:(n_flips - 2)) {
    # TRUEs are counted as 1 in R, which allows the following step
    if (sum(coins[i:(i + 2)] == pattern) == 3) {
      hits = 1 + hits
    }
  }
  hits
}
```

### Run the experiment

```
# Encoded 0 = H and 1 = T
HTH = c(0, 1, 0)
HTT = c(0, 1, 1)
TTT = c(1, 1, 1)
n_flips = 100
# Running the experiment 10000 times seeking HTH, HTT and TTT
results = data.frame(
  HTH = replicate(10000, count.pattern(HTH, n_flips)),
  HTT = replicate(10000, count.pattern(HTT, n_flips)),
  TTT = replicate(10000, count.pattern(TTT, n_flips))
)
summary(results)[4,]
```

```
## HTH HTT TTT
## "Mean :12.27 " "Mean :12.25 " "Mean :12.2 "
```

The **means** over 10000 runs are close enough to suggest that
the average number of occurrences **is the same**, no matter which pattern.
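
That the means agree is no coincidence. By linearity of expectation, each of the 98 windows of three consecutive tosses matches a given pattern with probability 1/8, regardless of which pattern it is. A small sketch of that calculation, assuming a fair coin (the helper `expected_count` is a name introduced here for illustration):

```r
# Expected number of (possibly overlapping) occurrences of a pattern of
# length k in n_flips fair tosses: one indicator per window, each with mean 0.5^k
expected_count <- function(n_flips, k = 3) {
  (n_flips - k + 1) * 0.5^k
}

expected_count(100)  # 12.25
```

This matches the simulated means of roughly 12.2 to 12.3 for all three patterns.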

However, their **distribution is not**:

### Plotting the histograms of all three patterns

```
ggplot(tidyr::pivot_longer(results,
                           c("HTH", "HTT", "TTT"),
                           names_to = "pattern",
                           values_to = "value"),
       aes(value)) +
  geom_histogram(binwidth = 0.5) +
  facet_wrap(vars(pattern)) +
  labs(x = "Number of pattern occurrences") +
  ggtitle(paste("Distribution of different pattern occurrences in", n_flips, "coin tosses."))
```
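
The difference in spread can also be quantified without plotting. The sketch below computes the exact variance of the overlapping count under the assumption of a fair coin and independent tosses: windows that share tosses are correlated, and the sign of that correlation depends on whether the pattern can overlap itself at the given shift. The function `count_variance` is a hypothetical helper written for this illustration, not part of the simulation code.

```r
# Exact variance of the number of overlapping occurrences of `pattern`
# in n_flips fair tosses (assumes n_flips >= 2 * length(pattern) - 1)
count_variance <- function(pattern, n_flips) {
  k <- length(pattern)
  m <- n_flips - k + 1           # number of windows
  p <- 0.5^k                     # chance that a single window matches
  v <- m * p * (1 - p)           # variance if all windows were independent
  for (d in seq_len(k - 1)) {
    # Two windows d tosses apart can both match only if the pattern
    # overlaps itself at shift d
    overlaps <- all(pattern[(d + 1):k] == pattern[1:(k - d)])
    joint <- if (overlaps) 0.5^(k + d) else 0
    v <- v + 2 * (m - d) * (joint - p^2)   # add covariance of all such pairs
  }
  v
}

count_variance(c(0, 1, 0), 100)  # HTH: 10.6875
count_variance(c(0, 1, 1), 100)  # HTT: 4.6875
count_variance(c(1, 1, 1), 100)  # TTT: 22.8125
```

Under these assumptions, *HTT* has the narrowest distribution because it cannot overlap itself, while *TTT* has the widest because its occurrences tend to come in clusters.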

### Also interesting: 5 tosses

The means are again the same for all patterns, but with this few tosses *HTT* can appear at most once and *HTH* at most twice (*HTHTH*), while *TTT* can appear up to three times (*TTTTT*) when overlapping is allowed.

```
n_flips = 5
# Running the experiment 50000 times seeking HTH, HTT and TTT
results = data.frame(
  HTH = replicate(50000, count.pattern(HTH, n_flips)),
  HTT = replicate(50000, count.pattern(HTT, n_flips)),
  TTT = replicate(50000, count.pattern(TTT, n_flips))
)
print(summary(results)[4,])
```

```
## HTH HTT TTT
## "Mean :0.3701 " "Mean :0.3733 " "Mean :0.3724 "
```
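
With only five tosses there are just 32 possible outcomes, so the simulated means can be checked by brute force. The following sketch enumerates every outcome and counts the patterns exactly, using the same 0 = H, 1 = T encoding. The names `all_seqs`, `count_in`, `patterns` and `exact` are introduced here for illustration only.

```r
# All 2^5 = 32 equally likely toss sequences, one per row (0 = H, 1 = T)
all_seqs <- as.matrix(expand.grid(rep(list(0:1), 5)))

# Overlapping count of a length-3 pattern in one sequence of 5 tosses
count_in <- function(s, pattern) {
  sum(vapply(1:3, function(i) all(s[i:(i + 2)] == pattern), logical(1)))
}

patterns <- list(HTH = c(0, 1, 0), HTT = c(0, 1, 1), TTT = c(1, 1, 1))
exact <- sapply(patterns, function(p) apply(all_seqs, 1, count_in, pattern = p))

colMeans(exact)        # 0.375 for each pattern
apply(exact, 2, max)   # HTH: 2, HTT: 1, TTT: 3
```

The exact mean of 0.375 agrees with the simulated values of about 0.37, and the maxima show how differently the same average is spread out.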

```
ggplot(tidyr::pivot_longer(results,
                           c("HTH", "HTT", "TTT"),
                           names_to = "pattern",
                           values_to = "value"),
       aes(value)) +
  geom_histogram(binwidth = 0.5) +
  facet_wrap(vars(pattern)) +
  labs(x = "Number of pattern occurrences") +
  ggtitle(paste("Distribution of different pattern occurrences in", n_flips, "coin tosses."))
```