How does language affect security?

The differences between Chinese and English passwords matter a great deal for the security of popular web services.

Regardless of linguistic and cultural differences, both Chinese- and English-speaking Internet users seem to find common ground in choosing easily guessed passwords such as “123456”.


But a recent study comparing password patterns across the two languages also found distinctive features of Chinese passwords that matter for Internet security outside of China.

The password habits of Chinese users are surprisingly poorly understood, given that they account for over 20 percent of all Internet users worldwide.
More than 854 million people use the Internet in China alone, more than double the population of the United States. This is why a group of Chinese and American researchers decided to test how well the passwords of Chinese- and English-speaking users stand up to the best password-cracking algorithms.


The team analyzed 106 million real passwords from nine web services (73 million passwords from six Chinese-language services and 33 million passwords from three English-language services) that were stolen by hackers and leaked online between 2009 and 2012.
What may seem like a strong password based on assumptions about the English language can actually be quite weak and easy to guess from the perspective of the Chinese language.

Yet many popular web services around the world, including some Chinese ones, approach password protection from an English-language perspective.
The researchers pointed to the popular Chinese password “woaini1314”, which is currently rated “strong” by the password strength meters used by AOL, Google, and even the popular Chinese social networking site Sina Weibo (as well as by IEEE, the parent organization of IEEE Spectrum).
But speakers of Mandarin Chinese, the most widely spoken variety of Chinese, can easily guess “woaini1314”: “woaini” means “I love you” in pinyin (the romanization system for Chinese characters), and “1314” sounds like “forever” in Chinese.
One of the main differences between Chinese and English passwords is that many Chinese users prefer purely numeric passwords.
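
To see why a meter built on English-centric assumptions can misjudge “woaini1314”, consider a minimal sketch. The `naive_strength` rule and the small `KNOWN_CHUNKS` list are hypothetical illustrations, not the meter of any service named above. They only show how a check based on length and character variety can miss a phrase that a Chinese-aware dictionary would flag.

```python
# Hypothetical illustration: a naive meter that only looks at length and
# character classes, next to a check against a tiny list of common chunks.

# Assumed sample list built from examples quoted in the article; a real
# checker would need a far larger dictionary.
KNOWN_CHUNKS = ["woaini", "1314", "520", "123456", "password"]


def naive_strength(password: str) -> str:
    """Rate a password by length and character variety only."""
    has_letter = any(c.isalpha() for c in password)
    has_digit = any(c.isdigit() for c in password)
    if len(password) >= 10 and has_letter and has_digit:
        return "strong"
    if len(password) >= 8:
        return "medium"
    return "weak"


def is_built_from_known_chunks(password: str) -> bool:
    """Return True if the password can be split entirely into known chunks."""
    if password == "":
        return True
    return any(
        password.startswith(chunk)
        and is_built_from_known_chunks(password[len(chunk):])
        for chunk in KNOWN_CHUNKS
    )


print(naive_strength("woaini1314"))              # strong
print(is_built_from_known_chunks("woaini1314"))  # True
```

The same password earns the top rating from the naive rule yet is nothing more than two well-known chunks glued together, which is the gap the researchers describe.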

Banal passwords

Besides the infamous password “123456”, there are other popular passwords among Chinese users: “111111”, “123123” and “123321.”
On the theme of love, “5201314” is popular because it sounds similar to the phrase “I love you forever and ever” in Chinese.
Some popular passwords add a letter to the start or end of a string of digits, such as “a12345” and “12345a”.
Chinese users also often use their mobile phone numbers or specific dates (such as birthdays) in passwords – something that English-speaking users don’t do very often.

By contrast, English-speaking users often create passwords consisting of letters only and lean towards certain words or phrases, such as the easy-to-guess “password,” “letmein,” “sunshine,” and “princess.”

Other popular passwords among English-speaking users include “abcdef” and “abc123”, along with “123456”.

Numeric-only passwords are easier to crack than letter-only passwords of the same length because each position can hold only one of 10 possible digits, as opposed to one of the 26 letters in the modern English alphabet.
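
The arithmetic behind that comparison can be sketched as follows. The helper `keyspace` is a name introduced here for illustration, and the calculation assumes every character is picked uniformly at random, which real users rarely do.

```python
def keyspace(alphabet_size: int, length: int) -> int:
    """Number of distinct strings of the given length over the alphabet."""
    return alphabet_size ** length


digits_only = keyspace(10, 6)   # six-digit numeric password
letters_only = keyspace(26, 6)  # six lowercase English letters

print(digits_only)                  # 1000000
print(letters_only)                 # 308915776
print(letters_only // digits_only)  # 308
```

Under that assumption, a six-letter password has roughly 300 times as many possibilities as a six-digit one, so an exhaustive search takes correspondingly longer.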

But native Chinese speakers sometimes came up with remarkably complex and inventive passwords: some members of the China Software Developers Network (CSDN) service combined programming-language commands with traditional Chinese poetry.

The password files used by the researchers contained hashes of leaked or stolen passwords, not plain-text versions of the passwords themselves. The researchers tried to crack both the Chinese and English passwords using two modern password-cracking algorithms. They tested a Markov chain model, which assigns probabilities to password characters based on the characters that precede them, and a Probabilistic Context-Free Grammar (PCFG) model, which parses passwords into letter segments, digit segments, and symbol segments before generating guesses in order of the most probable combinations.
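
As a rough illustration of the Markov idea, the sketch below trains a first-order (one character of context) model on a toy list and scores candidate passwords. It is not the researchers' implementation: the function names, the boundary markers, the add-one smoothing, and the assumed alphabet size are all choices made here for the example.

```python
import math
from collections import Counter, defaultdict

START, END = "^", "$"  # assumed boundary markers, not used inside passwords here


def train_bigram_model(passwords):
    """Count how often each character follows the previous one."""
    counts = defaultdict(Counter)
    for pw in passwords:
        chars = [START] + list(pw) + [END]
        for prev, cur in zip(chars, chars[1:]):
            counts[prev][cur] += 1
    return counts


def log_probability(model, password, alphabet_size=96):
    """Log-probability of a password under the model, with add-one smoothing.

    alphabet_size is an assumption: 95 printable ASCII characters plus END.
    """
    chars = [START] + list(password) + [END]
    total = 0.0
    for prev, cur in zip(chars, chars[1:]):
        seen = model.get(prev, Counter())
        total += math.log((seen[cur] + 1) / (sum(seen.values()) + alphabet_size))
    return total


# Toy training set made of passwords quoted in the article.
training = ["123456", "111111", "123123", "123321", "a12345", "12345a"]
model = train_bigram_model(training)

# A password that follows familiar transitions scores higher (less negative).
print(log_probability(model, "123123") > log_probability(model, "x9q!zt"))  # True
```

A cracker built on such a model would try candidates in order of decreasing probability, so passwords that resemble the training data are guessed first.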

The team also improved on the PCFG approach by modifying it to accommodate certain password patterns that are more common among Chinese-speaking users.
For example, they added numeric segments in popular date formats and Chinese names written in the romanized pinyin system.
They also gave their PCFG-based algorithm the ability to handle interleaving patterns (sequences of alternating numbers and letters) that are found in many Chinese passwords.
Together, these efforts improved the performance of the modified PCFG-based algorithm on Chinese password datasets: it cracked 98 to 188 percent more passwords than the standard version of the algorithm.
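
The segmentation step of a PCFG-style cracker, plus one date-aware tag of the kind described above, might look like the following sketch. Everything here is an assumption made for illustration: the function names, the `Date8` tag, and the choice to recognize only eight-digit YYYYMMDD dates. Pinyin name detection is left out.

```python
from datetime import datetime
from itertools import groupby


def char_class(ch: str) -> str:
    """Map a character to L (letter), D (digit), or S (symbol)."""
    if ch.isalpha():
        return "L"
    if ch.isdigit():
        return "D"
    return "S"


def segments(password: str):
    """Split a password into maximal runs of the same character class."""
    return [(cls, "".join(run)) for cls, run in groupby(password, key=char_class)]


def is_date(digits: str) -> bool:
    """Assumed date check: eight digits forming a valid YYYYMMDD date."""
    if len(digits) != 8:
        return False
    try:
        datetime.strptime(digits, "%Y%m%d")
        return True
    except ValueError:
        return False


def base_structure(password: str) -> str:
    """Build a PCFG-style base structure such as 'L6D4', tagging dates."""
    parts = []
    for cls, text in segments(password):
        if cls == "D" and is_date(text):
            parts.append("Date8")  # hypothetical tag name for a date segment
        else:
            parts.append(f"{cls}{len(text)}")
    return "".join(parts)


print(base_structure("woaini1314"))    # L6D4
print(base_structure("wang19900215"))  # L4Date8
print(base_structure("a1b2c3"))        # L1D1L1D1L1D1
```

The last call shows how an interleaved password breaks into many one-character segments under plain run-based splitting, which hints at why such patterns would need dedicated handling.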


The results also revealed the main strengths and weaknesses of Chinese passwords versus English passwords.
Both types of algorithms cracked more Chinese passwords than English passwords when limited to 10,000 guesses or fewer.
But the remaining Chinese passwords proved more secure than their English counterparts once the number of guesses exceeded 10,000.
The number of guesses matters because many web services limit the number of online login attempts before temporarily locking a user account.

Leaked or stolen password files could allow hackers to carry out a theoretically unlimited number of offline guessing attacks because they don’t have to deal with being locked out of a web service.
But even offline guessing attacks are still limited by the time and computational cost of making so many guesses.

Of the two cracking algorithms, the Markov-based algorithm performed best when allowed 10,000 or more guesses.
By comparison, the PCFG-based algorithm performed as well as or better than the Markov-based algorithm when fewer guesses were allowed.
The PCFG-based approach also proved the more efficient of the two, requiring 31 percent less computation and 70 percent less memory than the Markov-based approach.


From a security perspective, the study is also of great importance for web service companies that have a significant number of Chinese-language users, or even companies that hope to someday attract a significant number of Chinese customers.
It is also clear that individual Chinese speakers can do themselves a favor by avoiding predictable numeric patterns like “123456” and “111111” for their passwords, not to mention predictable alphanumeric hybrid patterns based on romantic themes. (The same goes for English speakers who still use “123456” and “abcdef.”)

As part of a deeper dive, the researchers hope to continue evaluating Chinese language password patterns using surveys to better understand what Chinese Internet users think about when creating their passwords.

