Tips for Playing Katla: Here Is the Best First Guess

After the word-guessing game Wordle went viral and was played by millions of people, Indonesia now has its own version! Katla is the Indonesian version of the game. Created by Fatih Katlifa, it is now livening up Indonesian Twitter, TikTok, and Instagram.

How to Play Katla

Playing Katla is quite simple, with the same rules as Wordle. According to its website, every guess must be a valid 5-letter word according to the KBBI (the official Indonesian dictionary), and players have 6 chances to guess.

Every guess is of course very valuable, especially if it gives us many clues. For that reason, statistics can be used to maximize the chance of making the best guess. Let's discuss!

Statistics

For data processing, the programming language used is Python. The words were obtained from the site's source code and then saved in .txt format. Those interested in reading excerpts of the Python script can find them in this article.

Next, we can count the number of occurrences of each letter in the word list. Using a Python script, we obtain the result shown in the figure below.

Statistics 1: Probability distribution of letter occurrence

From the chart, we can see that ‘a’ is the most dominant letter, with an occurrence probability of more than 15%. The next letter is ‘e’, with an occurrence probability of almost 8%. From this result, we can conclude that these two letters should be included in our opening guess.

One important aspect of how we guess is letter position. The letter ‘a’ at the start of a word naturally has a different probability from ‘a’ in another position. We therefore have to examine the probability of each letter appearing in each position.

Probability of Appearing in a Given Position

From the figure above, we can conclude that the letter ‘a’ appears most often in the fourth position of a word. Although ‘a’ is also the most frequent letter in the second position, we certainly do not want to use it twice in our opening guess. The next letter we can choose is ‘e’ in the second position.

Best Guess

Based on these two probability criteria, as a rough approach we can simply add up their probability values after normalization. Using a Python script, the best word for the first guess turns out to be ‘serak’! Then, for the second guess, we can use the word ‘pulih’.
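
The two criteria above can be combined roughly as in the sketch below. It is not the author's script: the tiny word list is made up for illustration, and the function names build_stats and score are hypothetical. Each word is scored by adding, for every letter, its overall frequency and its frequency at that position.

```python
from collections import Counter

def build_stats(words):
    """Return (overall letter frequency, per-position letter frequency)."""
    overall = Counter(letter for word in words for letter in word)
    total = sum(overall.values())
    overall_freq = {letter: n / total for letter, n in overall.items()}

    position_freq = []
    for i in range(5):
        counts = Counter(word[i] for word in words)
        position_freq.append({letter: n / len(words) for letter, n in counts.items()})
    return overall_freq, position_freq

def score(word, overall_freq, position_freq):
    """Sum both normalized criteria over the letters of the word."""
    return sum(overall_freq.get(letter, 0) + position_freq[i].get(letter, 0)
               for i, letter in enumerate(word))

# Hypothetical sample; the real list comes from the Katla site.
words = ["serak", "pulih", "kabar", "besar", "lemah"]
overall_freq, position_freq = build_stats(words)

# Skip words with repeated letters, then take the highest score.
candidates = [w for w in words if len(set(w)) == 5]
best = max(candidates, key=lambda w: score(w, overall_freq, position_freq))
print(best)
```

With a sample this small the ranking says nothing about Katla itself; the point is only how the two normalized statistics are summed per word.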

With the combination of these two words, hopefully you can guess the word behind the puzzle with ease. Happy playing!

If you are interested in similar statistics, you can watch the video from 3Blue1Brown. In that video, Grant discusses a method for choosing the first Wordle guess based on information theory.


What’s the Best Initial Guess for Wordle?

Recently, a daily word game called Wordle has gone viral on Twitter. The rule of the game is simple: guess a five-letter word in six tries. Some of the hints shown after guessing a word appear in the figure below.

I found the game interesting and challenging, since there are hundreds of thousands of words in English and it is not my native language. Luckily, I know Python! I calculated the statistics of letter occurrence in English to make an informed initial guess.

Statistics!

First things first, I load all the words from a .txt file provided on GitHub. Since the quiz only considers five-letter words, we eliminate all words of any other length. Some words might not be in the quiz dictionary, but I just let them be.
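
The loading and filtering step is not shown in the post, so here is a minimal sketch of one way to build words_5_letters. The file name words.txt and the helper keep_five_letter_words are assumptions for illustration. The sketch also drops entries containing anything other than the letters a to z, so that every word can be looked up letter by letter later on.

```python
import string

def keep_five_letter_words(lines):
    """Lowercase each entry and keep only five-letter a-z words."""
    allowed = set(string.ascii_lowercase)
    words = []
    for line in lines:
        word = line.strip().lower()
        if len(word) == 5 and set(word) <= allowed:
            words.append(word)
    return words

# Small in-memory sample; with a real file (hypothetical name) this would be:
#   with open("words.txt") as f:
#       words_5_letters = keep_five_letter_words(f)
sample = ["Tares\n", "colin\n", "hi\n", "can't\n", "bumpy\n", "letters\n"]
words_5_letters = keep_five_letter_words(sample)
print(words_5_letters)
```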

Next, we calculate the occurrence of each letter of the alphabet in the five-letter words. This can be done easily with a Python dictionary: iterate through all the words and letters, count them, and normalize the data. The code can be seen below.

import string

# Start every lowercase letter at zero so absent letters still appear in the result.
letter_count = dict.fromkeys(string.ascii_lowercase, 0)

# Count every letter occurrence across all five-letter words.
for word in words_5_letters:
  for letter in word:
    letter_count[letter] += 1

total_count = sum(letter_count.values())

# Convert raw counts to fractions of all letters.
letter_count_normalized = {key: value/total_count for key, value in letter_count.items()}

# Sort letters from most to least frequent.
sorted_letter_prob = {k: v for k, v in sorted(letter_count_normalized.items(), key=lambda item: item[1], reverse=True)}

Statistics 1: English Letter Probability in 5 Letters Words

The result shows that more than 10% of the letters are ‘a’, with ‘e’ trailing behind at almost 10% and ‘s’ at slightly more than 8%. This is slightly different from the distribution for words of any length, shown in the image below. With these statistics, we are confident that we should include those letters in our initial guess.

English Letter Frequency (source)

For another statistic, let’s count the probability of each letter occurring at each position in a word. With the ‘Pandas’ library, the code is shown below.

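import pandas as pd

# One row per letter, one column per position (1 to 5), all counts starting at zero.
data = {letter: [0, 0, 0, 0, 0] for letter in string.ascii_lowercase}
df = pd.DataFrame.from_dict(data, orient='index')
df.columns = [1, 2, 3, 4, 5]

# Count how often each letter appears at each position.
# A single .loc[row, column] lookup avoids chained assignment.
for word in words_5_letters:
  for count, letter in enumerate(word):
    df.loc[letter, count+1] += 1

# Rows become positions; divide each row by its total so every position sums to 1.
df_transposed = df.transpose()
df_normalized = df_transposed.div(df_transposed.sum(axis=1), axis=0)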

The result is shown in the figure below. One can see that the most common letters for the first to fifth positions in a word are ‘s’, ‘a’, ‘r’, ‘e’, and ‘s’. This hints at what makes a good first word. However, since using ‘s’ twice is not efficient, we can substitute the first letter with the next most frequent letter, ‘c’.

Statistics 2: Letter Occurrence Probability in the Words

Hold up! What about Statistics 1? Yes! We should also consider it, so let us do the calculation! Suppose that we consider Statistics 1 and 2 equally important, so we can measure the most probable word by averaging those criteria. The function to calculate the score is written below.

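def count_score(word):
  count_crit_1 = 0  # criterion 1: overall letter frequency (Statistics 1)
  count_crit_2 = 0  # criterion 2: letter frequency at this position (Statistics 2)

  for count,letter in enumerate(word):
    # Note: sorted_letter_prob already holds fractions, so the extra /100
    # makes criterion 1 weigh far less than criterion 2 in the average.
    count_crit_1 = count_crit_1 + sorted_letter_prob[letter]/100
    count_crit_2 = count_crit_2 + df_normalized.iloc[count][letter]

  # Average the two criteria and express the result as a percentage.
  return (count_crit_1 + count_crit_2)/2*100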

Making an Informed Guess

To make an initial guess, let us iterate through all the five-letter words and see which one has the highest score. Note that we should not include words with repeated letters, since they are not efficient guesses. The code is shown below.

def letter_is_not_doubled(check_string):
  # True only when every letter in the word is distinct.
  return len(set(check_string)) == len(check_string)

words_score = {}

for word in words_5_letters:
  if letter_is_not_doubled(word):
    words_score[word] = count_score(word)
        
sorted_words_score = {k: v for k, v in sorted(words_score.items(), key=lambda item: item[1], reverse=True)}

From this calculation, we found that the highest-scoring word is ‘tares’. Thus, we can use this word as our first guess, an informed guess!

Once you play the quiz, you will notice that one guess is not enough, so we need another one. For our next guess, we do not want to include the letters that already appear in the first guess. Let us define a function to filter out the letters we want to exclude and calculate the score again.

def not_contain_this_letter(word, not_contain):
    # True only when the word shares no letter with not_contain.
    return all(letter not in word for letter in not_contain)

not_contain = 'tares'

word_guess = [word for word in words_5_letters if not_contain_this_letter(word, not_contain)]

words_score_2 = {}

for word in word_guess:
    if letter_is_not_doubled(word):
        words_score_2[word] = count_score(word)
        
sorted_words_score_2 = {k: v for k, v in sorted(words_score_2.items(), key=lambda item: item[1], reverse=True)}
sorted_words_score_2

From this filtering, we found that the best word for the second guess is ‘colin’! Using the same technique and excluding the letters of ‘tares’ and ‘colin’, we found that the best third guess, in case two are not enough, is ‘bumpy’.

There you go! Make ‘tares’ your initial guess, follow it with ‘colin’ for the second one, and use ‘bumpy’ in case you think you need a third.

Now you can play Wordle with the statistically best initial guess. Good luck!

Magnetometer Calibration Using Levenberg-Marquardt Algorithm

Recently I worked on a magnetometer calibration method. This method is based on the Levenberg-Marquardt Algorithm (LMA), a non-linear least-squares optimization algorithm. The method is implemented in ArduPilot and PX4, both open-source flight controller firmware.

I have to admit, deriving the mathematical formulation from code is not straightforward. I spent several days learning the basics of LMA before finally understanding the sphere fit and ellipsoid fit algorithms.

In case you are wondering about the mathematical part, I wrote up the formulation of the algorithm as a PDF, since it can’t be displayed on WordPress (unless I pay more for the plugin).

Click here for the document.

Python vs Julia: Speed Test on Fibonacci Sequence

Recently MIT released a course on Computational Thinking, course code 18.S191, and it is available on YouTube. I can code in C++ and Python, so the founder’s claim that Julia is as fast as C and as easy as Python caught my interest.

Introduction to Julia

Julia was created in 2009 and first introduced to the public in 2012. Its developers aimed it at scientific computing, machine learning, data mining, and large-scale linear algebra. We might have heard of these applications in Python, but Julia offers programmers some advantages over Python.


Humanitarian Robotics: Autonomous Landmine Detection Rover

Even where war is no longer happening, its dangerous impact is still tangible today. Landmines are one of the threats left by past wars, killing 15,000–20,000 people every year according to the UN Mine Action Service. Demining efforts cost US$ 300–1000 per mine and put people in danger, with one person killed and two injured for every mines cleared.

HRATC 2017

Robots can be really helpful in solving this problem, as they are designed to do “dull, dirty, dangerous, and difficult” tasks. In 2017, the IEEE Robotics and Automation Society’s Special Interest Group on Humanitarian Technology (RAS–SIGHT) held a competition, the Humanitarian Robotics and Automation Technology Challenge (HRATC), at the 2017 International Conference on Robotics and Automation (ICRA’17).

Autonomous Landmine Detection Rover

Text Extraction from a Table Image, using PyTesseract and OpenCV

Extracting text from an image can be exhausting, especially when you have a lot to extract. One commonly known text extraction library is PyTesseract, an optical character recognition (OCR) tool. Given an image, this library returns the text it contains.

PyTesseract is really helpful. The first time I came across it, I used it directly to detect some short text, and the result was satisfying. Then I used it to detect text in a table, but the algorithm failed to perform.

Figure 1. Direct use of PyTesseract to Detect Text in a Table

Figure 1 depicts the text detection result, with green boxes enclosing the detected words. You may notice that most of the text can’t be detected by the algorithm, especially the numbers. In my case, these numbers are the essential part of the data, giving me the number of daily COVID-19 cases reported by a local government in my hometown. So, how do we extract this information?


Getting Started

When writing an algorithm, I always try to think as if I’m teaching the algorithm to work the way humans do. This way, I can easily turn the idea into more detailed algorithms.

When you’re reading a table, the first thing you might notice is the cells. A cell might be separated from another cell by borders (lines), which can be vertical or horizontal. After you identify a cell, you proceed to read the information within it. Converting this into an algorithm, you may divide the process into three steps: cell detection, region of interest (ROI) selection, and text extraction.
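
The first of these steps, cell detection, can be illustrated without any imaging library. The sketch below is a toy stand-in, not the OpenCV approach the title refers to: it treats the table as a small grid of 0/1 pixels, assumes borders are rows or columns made entirely of 1s, and returns the boxes between them. The names full_lines, gaps, and detect_cells are hypothetical.

```python
def full_lines(grid, axis):
    """Indices of rows (axis=0) or columns (axis=1) that are entirely ink (1)."""
    if axis == 0:
        return [r for r, row in enumerate(grid) if all(row)]
    return [c for c in range(len(grid[0])) if all(row[c] for row in grid)]

def gaps(lines):
    """Turn sorted border indices into (start, end) spans strictly between them."""
    return [(a + 1, b) for a, b in zip(lines, lines[1:]) if b - a > 1]

def detect_cells(grid):
    """Return each cell as (top, bottom, left, right), end-exclusive."""
    rows = gaps(full_lines(grid, 0))
    cols = gaps(full_lines(grid, 1))
    return [(top, bottom, left, right) for top, bottom in rows for left, right in cols]

# A 5x7 "image": 1 = border pixel, 0 = cell interior.
grid = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
]
cells = detect_cells(grid)
print(cells)
```

Each returned box plays the role of an ROI: in a real pipeline, that crop of the image would be what gets handed to the OCR step.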


Programming for Robotics – ROS: Exercise 3

This post is a continuation of the previous project on learning ROS. The lectures and exercises are given by the Robotic Systems Lab at ETH Zurich and can be accessed through this website.

This time, the goal of the exercise is to make the robot hit the pillar in the simulation environment. The pillar position is found by taking the closest distance in the LIDAR measurement. The speed is set to be constant, and a simple P controller steers the Husky towards the pillar. Both the speed and the P-gain are written in a param file, making them easy to tune without rebuilding the code. A marker in RViz is created to visualize the location of the pillar.
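
The steering logic described above can be sketched in plain Python, without ROS. This is an illustration under assumptions, not the exercise solution: the scan is a list of ranges with a known start angle and angular increment, and the function names, the gain value, and the sign convention (a positive bearing means the pillar is to the left, and a positive angular velocity turns left) are all hypothetical.

```python
import math

def closest_point(ranges, angle_min, angle_increment):
    """Return (distance, bearing) of the nearest valid LIDAR return."""
    # Ignore infinite or non-positive readings; assumes at least one valid beam.
    valid = [(r, i) for i, r in enumerate(ranges) if math.isfinite(r) and r > 0.0]
    distance, index = min(valid)
    return distance, angle_min + index * angle_increment

def p_controller(bearing, p_gain, linear_speed):
    """Constant forward speed; turn rate proportional to the bearing error."""
    return linear_speed, p_gain * bearing

# Five beams from -0.2 rad to +0.2 rad; the pillar is the 1.5 m return.
ranges = [float("inf"), 4.0, 3.2, 1.5, 2.8]
distance, bearing = closest_point(ranges, angle_min=-0.2, angle_increment=0.1)
linear, angular = p_controller(bearing, p_gain=2.0, linear_speed=0.5)
print(distance, bearing, linear, angular)
```

In the exercise itself the two tunable values would come from the param file rather than being hard-coded as they are here.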

Exercise 3

The video above shows my result for the exercise. You can find the exercise sheet at this link.


Thanks to Robotic Systems Lab – ETH Zurich for sharing this helpful course!

List of Prime Numbers

Recently I had an interview for an internship position with Volocopter, an Urban Air Mobility (UAM) company based in Germany. As far as I know, this company is one of the most advanced in the UAM competition.

One part of the interview was online coding. The interviewer asked me to “provide a list of all prime number up to 1000”. I found this question interesting, as there’s only one characteristic of a prime number.
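
One way to answer the question is trial division, which checks the defining property directly: a prime is an integer greater than 1 with no divisors other than 1 and itself. The sketch below is not necessarily the solution given in the interview; the function names are hypothetical.

```python
def is_prime(n):
    """True when n > 1 and no integer from 2 up to sqrt(n) divides it."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True

def primes_up_to(limit):
    """All primes p with 2 <= p <= limit, in increasing order."""
    return [n for n in range(2, limit + 1) if is_prime(n)]

primes = primes_up_to(1000)
print(len(primes), primes[:5], primes[-1])
```

Stopping at the square root is enough because any divisor larger than it would pair with one smaller than it.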
