80,000-plus characters, one keyboard: China’s fight to join the digital age
Thomas Mullaney’s new book tells the story of the engineers who wouldn’t abandon China’s written heritage for the ease of the keyboard and how their efforts paved the way for auto-complete and artificial intelligence.
If you’ve ever watched anyone text with Chinese characters, you most likely overlooked the chasm dividing the 3,500-year-old writing system from the digital age. It took decades of engineering to make the more than 80,000 characters of written Chinese accessible through the compact alphabetic keyboards that are today’s passkeys to information and commerce.
That history is the focus of the new book The Chinese Computer: A Global History of the Information Age (MIT Press, 2024) by Thomas Mullaney, professor of history in the Stanford School of Humanities and Sciences and co-director of SILICON. The story gets underway in 1959, when an MIT engineer released a prototype of the first Chinese-optimized computer just as Communist leader Mao Zedong proclaimed that China needed to adopt the Latin alphabet to become fully modern. In the three decades after the release of that prototype, called the Sinotype, engineers in China, Taiwan, Japan, and the United States devised a host of ways to reconcile China’s written tradition with keyboard computing.
“This is not a story of Chinese engineers solving a Chinese linguistic puzzle,” Mullaney said. “Rather, it’s a story of engineers all over the world who became transfixed, even haunted, by a wickedly hard problem.”
Their idiosyncratic solutions evolved with computer technology and hit upon some of the now-familiar methods we use to interact with computers today.
“On the surface, the Chinese characters look the same on paper and on a screen,” Mullaney explained. “But if you look under the hood, the technology is completely different. The engineers who made that possible found a way for a fifth of humanity to participate fully in the digital computing age and utterly transformed the technology along the way.”
Using letters to build characters
Mullaney’s new book picks up where he left off in The Chinese Typewriter (MIT Press, 2018). The earlier work charted the difficult engineering problem of making a typewriter—invented in the West and optimized for the Latin alphabet—hold anything close to 80,000 characters and remain compact enough for two hands to manage. The dominant typewriter design, with a carriage that advances after each keystroke, also made it impractical for users to build up characters brushstroke by brushstroke.
Those challenges motivated engineers to adapt electric typewriters, which could pause the carriage or show users which character they were selecting through long combinations of keystrokes, and soon thereafter to carry that work into very early computers.
This is where The Chinese Computer begins. MIT engineer Samuel Caldwell planned for users of his 1959 machine, the Sinotype, to key in the brushstrokes of each character. Caldwell approached the Chinese characters that represent Mandarin, Cantonese, and several other languages with an engineer’s eye, tabulating how many unique strokes they contain and which appear most frequently. That led him to an important discovery: Instead of using a series of keystrokes to compose a character, the keystrokes could be put to more efficient use as the character’s directory address, locating it in the computer’s memory.
For millennia, humans have composed words by carving, writing, or typing them. But rendering Chinese through a keyboard required what Mullaney dubs “hypographic” writing, in which humans do not create characters but rather signal to a device which character or word to retrieve from its memory. Once computers began predicting a human’s next linguistic move, the machines were well on their way to providing auto-complete, predictive text, and language-based artificial intelligence.
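The difference between composing a character and retrieving it can be shown with a small sketch. The table, the stroke codes, and the function name below are invented for illustration and are not taken from the Sinotype or from Mullaney's book. The point is only that the keystrokes act as an address into stored characters, not as instructions for drawing one.

```python
from typing import Optional

# Hypothetical stroke codes, made up for this sketch:
# h = horizontal, s = vertical, p = left-falling, n = right-falling.
# Each keystroke sequence is treated as an address in memory.
STROKE_ADDRESS_TABLE = {
    "pn": "人",
    "hs": "十",
    "hpn": "大",
    "hspn": "木",
}


def retrieve(keystrokes: str) -> Optional[str]:
    """Return the character stored at this keystroke address, or None."""
    # The user does not draw anything; the keys only select a stored entry.
    return STROKE_ADDRESS_TABLE.get(keystrokes)


print(retrieve("hspn"))  # prints 木
print(retrieve("zz"))    # prints None: nothing is stored at this address
```

In this toy model the machine never assembles a character from its strokes. It simply looks up whatever is stored under the typed sequence, which is the retrieval step that "hypographic" writing describes.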
The keyboard was not the only hurdle Chinese speakers had to clear before they could go digital, the book attests. Even two decades after Caldwell’s Sinotype, the Apple II offered just 20% of the 256 kilobytes of memory needed to store the 8,000 most common Chinese characters. Engineers would also have to trick device memory into handling more data and re-engineer printing heads and drivers to render Chinese characters legibly before the language became fully digital.
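The memory figures can be sanity-checked with some rough arithmetic. The sketch below assumes each character is stored as a 16-by-16 monochrome bitmap, a format chosen here for illustration and not stated in the article. Under that assumption, 8,000 characters come to roughly the 256 kilobytes cited.

```python
# Assumption for this sketch: one bit per pixel in a 16x16 grid per character.
GLYPH_WIDTH = 16
GLYPH_HEIGHT = 16

# From the article: the 8,000 most common characters, and 20% of the
# required memory being available.
CHARACTER_COUNT = 8_000
AVAILABLE_PERCENT = 20

bytes_per_glyph = GLYPH_WIDTH * GLYPH_HEIGHT // 8          # 256 bits -> 32 bytes
total_bytes = bytes_per_glyph * CHARACTER_COUNT            # 256,000 bytes
available_bytes = total_bytes * AVAILABLE_PERCENT // 100   # 51,200 bytes

print(bytes_per_glyph, total_bytes, available_bytes)
```

With these assumed numbers, a machine holding a fifth of the required memory could keep only about 1,600 such bitmaps resident at once, which suggests why engineers had to find ways around the limit.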
The rise and fall and rise again of phonetic transcription
Mullaney closely follows the technical work done to align Chinese languages with an increasingly digital world. But sociopolitical factors influenced that work from the invention of the first typewriter, and these peek around the corners of his narrative. The fate of pinyin is a prime example.
When Mao called for all Chinese languages to abandon characters, he proposed to use the Latin alphabet phonetically. Vietnam had adopted Latin letters in the typewriter era, and, in the 1950s, Chinese linguists developed a system, called Hanyu pinyin, to transcribe Mandarin Chinese using Latin letters.
But pinyin never worked well as a way to compose Mandarin with a keyboard, Mullaney argues. In addition to significant technical challenges, it required consistent, sustained teaching of the Latin alphabet, which China’s Cultural Revolution made difficult. As China gained political stability in the 1980s and 1990s, incomes and computer memories grew in parallel. China became a can’t-miss market, and pinyin gained new life as a basis for hypographic input writing.
Today, most people who text or type in Chinese use some form of pinyin to call up characters. As a user begins to type, the device offers possible matching characters. Mullaney shows in meticulous technological detail how this achievement led computing to current predictive text and AI capabilities.
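A toy version of that candidate lookup might look like the following. The candidate table is a tiny hand-written sample and the function name is made up for this sketch. Real input methods draw on large dictionaries and usage statistics, none of which is modeled here.

```python
from typing import List

# Tiny illustrative sample: pinyin syllables (tones omitted) mapped to a few
# characters with that reading. Not a real input-method dictionary.
PINYIN_CANDIDATES = {
    "ma": ["吗", "妈", "马", "骂"],
    "mai": ["买", "卖"],
    "mao": ["猫", "毛", "帽"],
    "shu": ["书", "树", "数"],
}


def candidates_for(typed: str) -> List[str]:
    """Collect characters whose pinyin starts with the letters typed so far."""
    matches: List[str] = []
    for syllable in sorted(PINYIN_CANDIDATES):
        if syllable.startswith(typed):
            matches.extend(PINYIN_CANDIDATES[syllable])
    return matches


# Each additional letter narrows the list of characters on offer.
for typed in ("m", "ma", "mai"):
    print(typed, candidates_for(typed))
```

The user's Latin-letter keystrokes never appear in the finished text. They only narrow the set of stored characters from which the user picks.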
“By the 1990s, Chinese predictive text—paired with pinyin input—had in effect mastered the art of predicting a user’s very next character,” Mullaney writes, “to the point where a goal began to take shape: the anticipation and prediction of longer compounds and even passages.”
The Chinese Computer ultimately suggests that the algorithmic demands of written Chinese made all computers develop differently than they would have otherwise.
“At an aggregate level, the experiment these engineers undertook was so successful that it devoured itself,” Mullaney said. “It was so successful that it became, in a sense, invisible.”