Winning at Motus


Context

What is Motus

Motus is a quite famous game (at least in France) where one has to guess a seven-letter word, given its first letter and one other letter shown somewhere in the word. The player has eight tries to guess the word. Each proposed word must start with the same first letter as the word to guess, and after each try the player is told which letters are in the right place, and which letters belong to the word but are not in the right place.

Being quite popular, this game now has various mobile applications!

Fig. 1: Capture of "Motus, le jeu officiel France 2"

As you can see, a letter highlighted in red is at the right position, while a yellow circle indicates that the letter belongs to the word but is currently at the wrong position.
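To make the rule concrete, here is a minimal sketch of the feedback computation described above (the function name and the 'R'/'Y'/'_' markers are mine, not the app's):

from collections import Counter


def feedback(target, guess):
    # 'R': right letter, right place (red); 'Y': letter present
    # elsewhere in the word (yellow circle); '_': absent.
    marks = ['_'] * len(guess)
    # Letters of the target that are not already matched in place.
    remaining = Counter(t for t, g in zip(target, guess) if t != g)
    for i, (t, g) in enumerate(zip(target, guess)):
        if t == g:
            marks[i] = 'R'
        elif remaining[g] > 0:
            marks[i] = 'Y'
            remaining[g] -= 1
    return ''.join(marks)


assert feedback("monture", "moutons") == "RRYR_Y_"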

Why win

Long story, but I would like a winning strategy for this game, one which, ideally, I can apply without the help of a computer.

A solution

Beginner approach

Having watched the show many times on TV, and watched others play the app, the strategy used by most players seems to be to incrementally build on the letters known to be well positioned, proposing words which comply with the validated letters.

The idea

Though the naive approach works, it feels like "sub-optimal words" (i.e. words which seem far from the final word, given the information available when we propose them) can bring a lot of information, since we can craft them to be as informative as possible!

"Crafting" (the process of finding a seven-letter word which shares as few letters as possible with the words already proposed) is quite hard. However, learning a list of words which are as informative as possible is easy!

Therefore, the idea used here is that, in 4 guesses, one can learn almost all the letters present in the word to guess: the four guesses share the first letter, so together they can reveal up to 4 * (7 - 1) = 24 other letters, i.e. 25 of the 26 letters of the alphabet. So, for each starting letter, one only needs to know four words such that the number of different letters in these four words is maximal.

The second hypothesis is that, once we know the letters of the word to guess, it is easy to find it. This turned out to be harder than I thought; still, my statistics improved vastly!

The implementation

Details

I took the list of French words from this blog. The first part consists in removing the words which are not seven characters long and getting rid of their accents.

Then, for a list of words, a coverage_score is defined: it is simply the number of distinct characters appearing across these words.

It is implemented and tested below:

from collections import Counter


def coverage_score(words):
    # Number of distinct characters appearing across all the words.
    cnt = Counter()
    for word in words:
        for character in word:
            cnt[character] += 1
    return len(cnt)


assert(coverage_score(['aa', 'bb']) == 2)
assert(coverage_score(['ac', 'cb']) == 3)
assert(coverage_score(['abc', 'def']) == 6)
assert(coverage_score(['abc', 'def', 'cad']) == 6)
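Since only distinct characters matter, an equivalent set-based one-liner would be:

def coverage_score(words):
    # Number of distinct characters across all the words.
    return len(set("".join(words)))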

Then, for each letter, words are sampled four at a time, and the coverage score of each sample is evaluated. This process is repeated many times (100,000 in the code below), and the 4-tuple with the highest coverage score is returned. And this is it.

def random_optimization(generator, score, n_attempts):
    # Draw n_attempts random candidates and keep the best-scoring one.
    best_element = generator()
    best_score = score(best_element)
    for _ in range(n_attempts):
        e = generator()
        score_e = score(e)
        if score_e > best_score:
            best_score = score_e
            best_element = e
    return (best_element, best_score)
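To illustrate the interface on a toy problem (an assumed example, not part of the actual run):

import random

# Find a pair of digits with the largest sum.
pair = lambda: [random.randint(0, 9) for _ in range(2)]
best, best_score = random_optimization(pair, sum, 1000)
print(best, best_score)  # almost surely [9, 9] and 18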

The code

from collections import Counter
import random
import string
import unidecode


def load_words_and_clean(input_filepath):
    # Keep the seven-letter words, strip accents and dashes, and drop
    # entries which are unlikely to be accepted by the game.

    def is_seven_characters_long(s):
        return len(s) == 7

    def remove_line_breaks(s):
        return s.replace("\n", "")

    with open(input_filepath, 'r') as fp:
        words = fp.readlines()
    words = map(remove_line_breaks, words)

    seven_letter_words = [
        word for word in words if is_seven_characters_long(word)]

    removed_accents = map(unidecode.unidecode, seven_letter_words)
    removed_dashes = filter(lambda x: "-" not in x, removed_accents)

    # Conjugated verbs ending in "ez" are often rejected by the game
    # (see the "Improvements" section).
    remove_ez = filter(lambda x: not x.endswith("ez"), removed_dashes)

    return list(remove_ez)


def coverage_score(words):
    # Number of distinct characters appearing across all the words.
    cnt = Counter()
    for word in words:
        for character in word:
            cnt[character] += 1
    return len(cnt)


assert(coverage_score(['aa', 'bb']) == 2)
assert(coverage_score(['ac', 'cb']) == 3)
assert(coverage_score(['abc', 'def']) == 6)
assert(coverage_score(['abc', 'def', 'cad']) == 6)


def random_optimization(generator, score, n_attempts):
    # Draw n_attempts random candidates and keep the best-scoring one.
    best_element = generator()
    best_score = score(best_element)
    for _ in range(n_attempts):
        e = generator()
        score_e = score(e)
        if score_e > best_score:
            best_score = score_e
            best_element = e
    return (best_element, best_score)


candidate_words = load_words_and_clean("./liste.de.mots.francais.frgut.txt")
print("File loaded!")

for x in string.ascii_lowercase:
    print(x)
    words_starting_with_x = list(
        filter(lambda w: w.startswith(x), candidate_words))

    def generator_x():
        # Sample four words (with replacement) among those starting with x.
        res = [random.choice(words_starting_with_x) for i in range(4)]
        res.sort()
        return res

    res = random_optimization(generator_x, coverage_score, 100000)
    print("(coverage : " + str(res[1]) + ")")

    print("\n".join(res[0]))
    print(10 * '-')

Results

Now, I simply have to learn the 104 words listed below:

File loaded!
a
(coverage : 19)
affuble
agrions
amochai
apteryx
----------
b
(coverage : 18)
bavocha
bowling
brusque
brutaux
----------
c
(coverage : 18)
candide
capeyat
choques
combler
----------
d
(coverage : 18)
deflore
dejuche
demodat
dopings
----------
e
(coverage : 19)
endurci
engavat
enzymes
explora
----------
f
(coverage : 18)
fachait
faxames
flingot
fuyards
----------
g
(coverage : 18)
giclant
glyphes
gommeux
grisbis
----------
h
(coverage : 19)
hagarde
hickory
honteux
humbles
----------
i
(coverage : 18)
ichtyol
impulse
inegaux
ivoires
----------
j
(coverage : 19)
jockeys
jouxter
jubilat
jumping
----------
k
(coverage : 20)
kamichi
kobolds
krypton
kufique
----------
l
(coverage : 18)
lambris
legende
lexique
lycopes
----------
m
(coverage : 18)
malfont
margeas
midship
muqueux
----------
n
(coverage : 18)
nasarde
neigeux
notable
nymphal
----------
o
(coverage : 19)
oblatif
ocreras
oppidum
oxygene
----------
p
(coverage : 18)
palombe
penchas
poquait
profond
----------
q
(coverage : 15)
quartat
quiches
quidams
quillon
----------
r
(coverage : 18)
rebonds
rejugee
rempile
ruchait
----------
s
(coverage : 19)
sandjak
sculpte
sombras
swingua
----------
t
(coverage : 18)
thymols
topique
trafics
trepang
----------
u
(coverage : 16)
ultimes
urbaine
urgence
uropode
----------
v
(coverage : 19)
vachard
vampent
velique
voyages
----------
w
(coverage : 18)
wergeld
whiskys
wombats
wurmien
----------
x
(coverage : 11)
xanthie
ximenie
xylenes
xylenes
----------
y
(coverage : 16)
yankees
yiddish
yogourt
ypreaux
----------
z
(coverage : 17)
zeolite
ziberas
zincage
zythums
----------

Improvements

Word selection

The thing is that the accepted words mostly depend on the implementation of the game, and I could not find an official dictionary for it. Conjugated verbs, for example, would be rejected (hence the line remove_ez = filter(lambda x: not x.endswith("ez"), removed_dashes), which does not account for the full complexity of French conjugations, to say the least).
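If one wanted to push the filtering further, the same trick could be extended to other common verb endings; the list below is purely illustrative (and too aggressive: it would also discard valid nouns such as maisons):

# Hypothetical extension of the "ez" filter; the right endings to
# exclude depend entirely on the game's actual dictionary.
SUSPICIOUS_ENDINGS = ("ez", "ons", "ais", "ait", "iez")


def looks_conjugated(word):
    return word.endswith(SUSPICIOUS_ENDINGS)


candidate_words = [w for w in candidate_words if not looks_conjugated(w)]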

Algorithm

The "coverage" seems quite low (around 18 or 19, while the theoretical maximum is 25). I believe this can be improved using smarter ideas than a random, brute-force optimization.

Perhaps better optimizations could be performed: for example, start with a word, then repeatedly pick among the words most different from those already selected, and so on and so forth.
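As a sketch of what such a greedy alternative could look like (this is my interpretation, not something tested against the brute-force results):

import random


def greedy_four(words, n_candidates=200):
    # Start from a random word, then repeatedly add the word bringing
    # the most new letters, among a random sample of candidates.
    chosen = [random.choice(words)]
    covered = set(chosen[0])
    for _ in range(3):
        sample = random.sample(words, min(n_candidates, len(words)))
        best = max(sample, key=lambda w: len(set(w) - covered))
        chosen.append(best)
        covered |= set(best)
    return chosen, len(covered)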

The score itself

The definition of the coverage score does not take into account the scarcity of some letters: z is quite rare, so it is often a waste of a try to check whether it belongs to the word to guess.
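One way to account for this would be to weight each distinct letter by how informative it is, for instance by the fraction of candidate words containing it (the weighting scheme below is an assumption, not something I have validated):

from collections import Counter

# Weight of a letter: fraction of candidate words containing it, so that
# frequent letters (which eliminate more words) count more than rare ones.
letter_counts = Counter(c for w in candidate_words for c in set(w))
letter_weights = {c: n / len(candidate_words) for c, n in letter_counts.items()}


def weighted_coverage_score(words):
    return sum(letter_weights[c] for c in set("".join(words)))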
