dïe überblönde passed on this article analyzing the great firewall of china and what it censors. the authors tried a statistical probe to see what terms got banned. i thought the results were pretty interesting:

Table 1: Search block status on 24 October 2011 of the 20 terms with the highest Twitter/Sina log likelihood ratio scores. Search blocked terms are noted with a †.

| † | term | gloss |
|---|---|---|
| † | 何德普 | He Depu |
| † | 刘晓波 | Liu Xiaobo |
|   | 北京市监狱 | Beijing Municipal Prison |
| † | 零八宪章 | Charter 08 [reformer manifesto] |
|   | 廖廷娟 | Liao Tingjuan |
|   | 廖筱君 | Liao Hsiao-chun |
| † | 共匪 | communist bandit |
| † | 李洪志 | Li Hongzhi, founder of the Falun Gong spiritual movement |
| † | 柴玲 | Chai Ling |
| † | 方滨兴 | Fang Binxing |
| † | 法轮功 | Falun Gong |
| † | 大纪元 | Epoch Times |
| † | 刘贤斌 | Liu Xianbin |
| † | 艾未未 | Ai Weiwei, Chinese artist and activist |
|   | 王炳章 | Wang Bingzhang |
|   | 非公式 | unofficial/informal (Japanese) |
| † | 魏京生 | Wei Jingsheng, Beijing-based Chinese dissident |
|   | 唐柏桥 | Tang Baiqiao |
| † | 鲍彤 | Bao Tong |
| † | 退党 | to withdraw from a political party |

Table 2: Sensitive terms with statistically significant higher rates of message deletion (p < 0.001). Source designates whether the sensitive term originates in our Twitter LLR list (T), Crandall, et al. (2007) (C), or Wikipedia (Wikipedia, 2011) (W).

| δw | deletions | total | term | gloss | source(s) |
|---|---|---|---|---|---|
| 1.000 | 5 | 5 | 方滨兴 | Fang Binxing | T |
| 1.000 | 5 | 5 | 真理部 | Ministry of Truth [official propaganda] | T |
| 0.875 | 7 | 8 | 法轮功 | Falun Gong | T |
| 0.833 | 5 | 6 | 共匪 | communist bandit | T, W |
| 0.717 | 38 | 53 | 盛雪 | Sheng Xue | C |
| 0.500 | 13 | 26 | 法轮 | Falun | T, C, W |
| 0.500 | 16 | 32 | 新语丝 | New Threads | C |
| 0.379 | 145 | 383 | 反社会 | antisociety | C |
| 0.374 | 199 | 532 | 江泽民 | Jiang Zemin | T, C, W |
| 0.373 | 22 | 59 | 艾未未 | Ai Weiwei | T |
| 0.273 | 41 | 150 | 不为人知的故事 | “The Unknown Story” | W |
| 0.257 | 119 | 463 | 流亡 | to be exiled | W |
| 0.255 | 82 | 321 | 驾崩 | death of a king or emperor | T |
| 0.239 | 120 | 503 | 浏览 | to browse | C |
| 0.227 | 112 | 493 | 花花公子 | Playboy | C, W |
| 0.226 | 167 | 740 | 封锁 | to blockade | W |
| 0.223 | 142 | 637 | 大法 | (sc. Falun) Dafa | W |
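
a side note on reading table 2: the δw column appears to be simply deletions divided by total. that definition is my assumption from the numbers, not something quoted from the paper. here's a minimal python sketch that checks it against a few rows copied from the table (the names `rows` and `deletion_rate` are mine):

```python
# rows copied from table 2: (term, deletions, total, reported δw)
rows = [
    ("方滨兴", 5, 5, 1.000),
    ("法轮功", 7, 8, 0.875),
    ("共匪", 5, 6, 0.833),
    ("盛雪", 38, 53, 0.717),
    ("江泽民", 199, 532, 0.374),
    ("艾未未", 22, 59, 0.373),
]


def deletion_rate(deletions, total):
    """assumed definition of δw: the fraction of sampled messages
    containing the term that were later deleted."""
    return deletions / total


for term, deletions, total, reported in rows:
    rate = deletion_rate(deletions, total)
    # the table reports three decimals, so allow for rounding
    assert abs(rate - reported) < 0.001
    print(f"{term}\t{deletions}/{total}\t{rate:.3f}")
```

this says nothing about how the authors computed the p < 0.001 significance threshold. it only shows that the rate column is consistent with a plain ratio. note how small the totals are for the top rows (5 of 5, 7 of 8).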

Table 3: Deletion rates of terms from Crandall, et al. (2007), previously reported to be blocked by the GFC, that appear frequently (over 100 times) in our sample. Terms that are currently blocked on Sina's search interface are noted with a †.

| † | δw | deletions | total | term | gloss |
|---|---|---|---|---|---|
| † | 0.20 | 88 | 443 | 中宣部 | Central Propaganda Section |
| † | 0.20 | 24 | 120 | 藏独 | Tibetan independence (movement) |
|   | 0.19 | 30 | 154 | 民联 | Democratic Alliance |
| † | 0.18 | 132 | 733 | 迫害 | to persecute |
|   | 0.18 | 124 | 686 | 酷刑 | cruelty/torture |
|   | 0.18 | 80 | 457 | 钓鱼岛 | Senkaku Islands |
| † | 0.18 | 28 | 153 | 太子党 | Crown Prince Party |
| † | 0.17 | 102 | 592 | 法会 | Falun Gong religious assembly |
| † | 0.17 | 88 | 526 | 纪元 | last two characters of Epoch Times |
|   | 0.17 | 56 | 333 | 民进党 | DPP (Democratic Progressive Party, Taiwan) |
|   | 0.16 | 142 | 863 | 洗脑 | brainwash |
| † | 0.16 | 42 | 256 | 我的奋斗 | Mein Kampf [why?] |
| † | 0.15 | 83 | 567 | 学联 | Student Federation |
|   | 0.15 | 32 | 208 | 高瞻 | Gao Zhan |
|   | 0.14 | 51 | 360 | 无界 | first two characters of circumventing browser [ie, one with a VPN to tunnel around the great firewall] |
|   | 0.14 | 36 | 250 | 正念 | correct mindfulness |
| † | 0.14 | 28 | 198 | 天葬 | sky burial |
|   | 0.14 | 17 | 122 | 文字狱 | censorship jail |
|   | 0.13 | 90 | 677 | 经文 | scripture |
| † | 0.12 | 91 | 732 | 八九 | 89 (the year of the Tiananmen Square Protest) |
| † | 0.12 | 67 | 564 | 看中国 | watching China, an Internet news Web site |
| † | 0.11 | 35 | 310 | 明慧 | Ming Hui (Web site of Falun Gong) |
| † | 0.10 | 56 | 582 | 民运 | democracy movement |

as the authors note, it's a mixture of inconvenient people, codewords to dodge the censors, and names of foreign media sources. i'm amused by the inclusion of the number "89"; i wonder how hard it would be to get the censors to block all the digits. (perhaps convincing the forces of good to start using digits as codewords might work.)
it's also odd that (as the authors mention) the censorship appears to be done entirely by hand; it's stochastic rather than deterministic. this suggests to me that it may be possible to overwhelm or spoof the censors to the point of rendering them useless. that is, the crude word-bombing attacks people have attempted against (hypothetical) western eavesdropping may actually work on the great firewall.