Frequency Lists
The source data that the Chinese Frequency Dictionary runs off.
The frequency lists used are just ones I've stumbled upon around the internet. Are they the best? No idea. Do they work well enough for my purposes? Yes, absolutely! Here are the sources:
- HSK2 (2012) List: Official word list for version 2.0 of the HSK proficiency test. You can find many versions in many places; I got it from this CSV on GitHub.
- HSK3 (2021) List: Official word list for version 3.0 of the HSK proficiency test. I copied it from this CSV on GitHub.
- SUBTLEX-CH: Chinese word frequencies based on subtitles from movies and TV shows. Read all about it and get the data from the Ghent University SUBTLEX-CH site.
- The Beijing Language and Culture University: Multiple lists from multiple sources. I took the Global and Weibo lists because they seemed like a decent spread of sources. I got the data from this post on the Pleco forums.
In every list, rank 1 is the most common word, and higher numbers are less common.
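If a list gives raw occurrence counts instead of ranks, the rank convention above can be derived with a quick sort. This is only a rough sketch, not necessarily how the dictionary does it: the function name `ranks_from_counts` and the sample counts are made up for illustration, and it assumes the list has already been loaded into a word-to-count mapping.

```python
def ranks_from_counts(counts):
    """Map each word to its rank: 1 for the highest count, 2 for the next, and so on."""
    # Sort by count, highest first; ties fall back to the word itself
    # so the result is deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {word: rank for rank, (word, _count) in enumerate(ordered, start=1)}


# Made-up counts for illustration only (not taken from any of the lists above).
sample_counts = {"的": 1000, "是": 400, "我": 600}
sample_ranks = ranks_from_counts(sample_counts)
```

Words with the same count get consecutive ranks here rather than sharing one. That is a simplification, and a real list may treat ties differently.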