About the Chinese Frequency Dictionary
I mostly just created it to help figure out what is/isn't worth studying!
As a Chinese learner, it's often hard to know which words are important and which aren't worth your time yet. If you just follow official guides like the HSK list you can make a great start, but once you start branching out and watching a lot of content, you'll very quickly hit things that aren't in those lists. Are they important? Is it some really rare word you just happened to hit but will never see again? Or is it something that's going to come up often, so you should study it?
While there's no easy, 100% correct answer to any of those questions, frequency is a fairly decent metric. In the end you'll probably want to learn most words, but if you're just learning your first ~100 or so words, learning something that's ranked 15,000th is basically pointless right now. You're unlikely to see it again in any reasonable time frame. However, if your vocab size is already 3k and you see something new that's ranked 1,200th in the list, then you definitely should study it, as it's really common compared to your level.
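To make that rule of thumb concrete, here's a minimal sketch of the comparison in Python. This is not how the site decides anything, it's just the idea written down. The function name `worth_studying` and the `slack` multiplier are made up for illustration. The only real inputs are a word's frequency rank and a rough guess at your own vocab size.

```python
def worth_studying(rank: int, vocab_size: int, slack: float = 2.0) -> bool:
    """Rough heuristic: a word is worth studying now if its frequency rank
    is within `slack` times the learner's current vocabulary size.

    `slack` is an arbitrary, hypothetical knob: 2.0 means "anything ranked
    up to twice my vocab size is fair game".
    """
    # Treat a vocab size of 0 as 1 so a complete beginner still gets a window.
    return rank <= max(vocab_size, 1) * slack


# The two cases from the text above.
print(worth_studying(rank=15_000, vocab_size=100))   # beginner, very rare word
print(worth_studying(rank=1_200, vocab_size=3_000))  # 3k vocab, common word
```

With the default `slack`, the first call returns `False` and the second returns `True`. Tune `slack` to taste: it's a guess, not a measured threshold.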
Don't get me wrong, frequency lists are NOT everything, but they can be useful. So while I personally don't study words in frequency order in isolation, I do find it helpful to look up the frequency of words I find to see how much I care about them - it's not perfect but it's way better than just guessing!
This website is just me making that process easier by creating a quick way to reference both HSK levels and frequencies from a few different sources in one spot. Since I built it, I might as well share the link too. Hope you find it useful - I do!
You can view the full raw Frequency lists here if you just want to browse them for some reason.
Is this character frequency? Words? Both?
Words only - I don't care about character frequencies by themselves because I'm trying to learn to speak, and for that I talk in words. Any time you see the frequency for an individual character on this site, it's from when that character is used on its own as a word (as opposed to a global character frequency). E.g. 有's frequency data is just from when 有 is used on its own, and not from when it's also included in, say, 没有, which is counted as a separate word in the frequency lists. If you want pure character frequencies you'll have to find them somewhere else, sorry!
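If the difference between the two kinds of counting isn't obvious, here's a small sketch in Python. It assumes the text has already been split into words (the three sample sentences are made up and hand-segmented), and it isn't how the frequency lists on this site were actually built. It just shows why 有 as a word and 有 as a character end up with different counts.

```python
from collections import Counter


def word_frequencies(segmented_sentences):
    """Count each segmented word as one unit (the kind of count this site uses)."""
    counts = Counter()
    for sentence in segmented_sentences:
        counts.update(sentence)  # each list element is one word
    return counts


def character_frequencies(segmented_sentences):
    """Count every character, no matter which word it appears inside."""
    counts = Counter()
    for sentence in segmented_sentences:
        for word in sentence:
            counts.update(word)  # iterating a string yields its characters
    return counts


# Hypothetical, hand-segmented sample sentences.
sentences = [
    ["我", "有", "书"],
    ["他", "没有", "书"],
    ["我", "没有", "钱"],
]

words = word_frequencies(sentences)
chars = character_frequencies(sentences)

print(words["有"], words["没有"], chars["有"])  # prints: 1 2 3
```

As a word, 有 shows up once and 没有 twice. As a character, 有 shows up three times because the two 没有 occurrences are counted too. The first number is the kind you'll see on this site.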
Why only simplified? Why not support Traditional characters too?
Because all the frequency lists and HSK lists I use are simplified only. Technically you can search for traditional, but it'll just bring up the results for the simplified version. That's all the data I have. I'm sure traditional lists exist, but I don't know about them and haven't tried to include any. Sadly, since this is just a quick hobby project and setting up traditional doesn't help me personally, I'm not going to do it. Also, because I'm not learning traditional characters, I wouldn't even be able to check it was working properly. Sorry trad learners!
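For the curious, here's roughly how a traditional search can be pointed at simplified results. This is a sketch of one possible approach, not necessarily what this site does. It relies on the fact that each CC-CEDICT entry line lists the traditional form first, then the simplified form, then pinyin in square brackets and the definitions between slashes. The function names and the sample lines (including their glosses) are made up for illustration.

```python
import re

# CC-CEDICT entry shape: Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/
ENTRY = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/$")


def build_trad_to_simp(lines):
    """Build a traditional -> simplified lookup from CC-CEDICT style lines."""
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        match = ENTRY.match(line)
        if match:
            trad, simp = match.group(1), match.group(2)
            # Keep the first simplified form seen for a given traditional form.
            mapping.setdefault(trad, simp)
    return mapping


def normalise_query(query, trad_to_simp):
    """Return the simplified form if known, otherwise the query unchanged."""
    return trad_to_simp.get(query, query)


# Illustrative lines in CC-CEDICT format (glosses shortened by hand).
sample = [
    "# sample data",
    "學習 学习 [xue2 xi2] /to learn/to study/",
    "沒有 没有 [mei2 you3] /to not have/",
]

trad_to_simp = build_trad_to_simp(sample)
print(normalise_query("學習", trad_to_simp))  # traditional query
print(normalise_query("学习", trad_to_simp))  # already simplified
```

Both calls end up at 学习, so the frequency and HSK lookups only ever have to deal with simplified forms.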
Libraries Used:
- CC-CEDICT: Chinese-English dictionary definitions and pinyin.
- Multiple Frequency Lists: Character/word frequency information + rankings.
- Make Me A Hanzi: Character decompositions and radical/component information.
- Hanzi Writer: Character stroke order animations.