Philosophy of Technology

Class Blog

Turing Test Revised: The Winograd Schema Challenge

Name: Joseph Young

Net ID: Jay283

Article Link:

Course Related Academic Article: Alan Turing’s “Computing Machinery and Intelligence”

Course Related Academic Article: Ned Block’s “Psychologism and Behaviorism”


Note: My first blog post was on the Turing Test and different levels of testing for artificial intelligence. After stumbling across this article, I wanted to continue my research on the Turing Test and its validity. The selected article sheds light on the problems with using the Turing Test and describes an alternative test for artificial intelligence. The question is not whether the Turing Test is a relevant test, but whether it should be the benchmark for testing artificial intelligence.


With great leaps in artificial intelligence over the past few years, many in the artificial intelligence community have used the Turing Test as a benchmark. The test’s growing popularity in movies and popular culture has added to the artificial intelligence hype, but it has not escaped criticism. The Turing Test was published in 1950, more than sixty years ago, which raises questions about its validity and its relevance to modern technology. Technology then was undoubtedly different from technology today, and many professors from leading universities have challenged the Turing Test.

Originally proposed in “Computing Machinery and Intelligence,” Alan Turing’s test is an Imitation Game in which a machine imitates the speech of a human being. A machine and a human are placed in separate rooms, a judge asks each of them a series of questions, and both respond in text. The human judge decides which respondent is the human by how well the questions are answered. The Turing Test examines a range of intellectual qualities, including natural language processing, logical conversation flow, and rudimentary signs of sentient intelligence. A machine’s ability to respond fluently and carry on a logical dialogue is part of tricking the judge into believing the machine is human. Displays of emotion such as humor or sympathy would also play an important role in tricking the judge. All of these are important parts of the Turing Test, but perhaps the most important is making a human believe the machine over another human being. When this is accomplished, the judge is essentially saying that the machine’s speech is more humanlike than the actual human’s.

Unfortunately, the Turing Test’s greatest strength is also its greatest weakness. In the MIT Technology Review article “Tougher Turing Test Exposes Chatbots’ Stupidity,” Will Knight critiques the validity of using the Turing Test as a benchmark for artificial intelligence. Knight, like many others, believes the problem with the Turing Test is that “it’s often easy for a program to fool a person using simple tricks and evasions” (Knight). In his view, the human judge is the part of the Turing Test that falls apart, because humans can be easily tricked by generalized statements or deceitful tactics. This was true of certain dating websites that used chatbots. The chatbots would tell users what they wanted to hear, and users would willingly believe they were talking to a real person. Programmers who understand how people think can exploit these shortcuts, and as a result they fooled many people into believing their chatbots were human beings.

Knight’s article then turns to the Winograd Schema Challenge, which “asks computers to make sense of sentences that are ambiguous but usually simple for humans to parse” (Knight). To resolve the ambiguity, a machine would have to show that it understands the sentence, which would be evidence of intelligence. Knight gives the example sentence “The city councilmen refused the demonstrators a permit because they feared violence,” where the referent of “they” is grammatically ambiguous (Knight). While a computer may not be able to tell whether “they” refers to the councilmen or the demonstrators, it is clear to human readers that “they” refers to the councilmen.
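To make the structure of a Winograd schema concrete, here is a minimal Python sketch of how Knight’s example sentence could be represented and scored. The names `WinogradItem`, `nearest_candidate_resolver`, and `accuracy` are my own hypothetical choices for illustration, not part of the actual challenge. The resolver uses a shallow trick (pick the candidate mentioned closest before the pronoun), which is the kind of surface heuristic the challenge is meant to defeat.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class WinogradItem:
    """One ambiguous sentence with two candidate referents (hypothetical format)."""
    sentence: str
    pronoun: str
    candidates: Tuple[str, str]
    answer: str  # the referent a human reader would choose


# Knight's example sentence, encoded in the hypothetical format above.
ITEM = WinogradItem(
    sentence=("The city councilmen refused the demonstrators a permit "
              "because they feared violence."),
    pronoun="they",
    candidates=("the city councilmen", "the demonstrators"),
    answer="the city councilmen",
)


def nearest_candidate_resolver(item: WinogradItem) -> str:
    """Shallow heuristic: choose the candidate mentioned last before the pronoun."""
    text = item.sentence.lower()
    pronoun_at = text.find(f" {item.pronoun} ")
    # rfind returns -1 when a candidate does not appear before the pronoun.
    return max(item.candidates, key=lambda c: text.rfind(c, 0, pronoun_at))


def accuracy(resolver: Callable[[WinogradItem], str],
             items: Sequence[WinogradItem]) -> float:
    """Fraction of items on which the resolver picks the human-preferred referent."""
    correct = sum(resolver(item) == item.answer for item in items)
    return correct / len(items)


if __name__ == "__main__":
    print(nearest_candidate_resolver(ITEM))
    print(accuracy(nearest_candidate_resolver, [ITEM]))
```

On this one sentence the heuristic chooses the demonstrators, because they are mentioned closest to “they,” while a human reader chooses the councilmen. Getting the answer right takes knowledge about who is likely to fear violence, not just word positions.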

During the Winograd Schema event, the best-performing contestants scored only three percentage points higher than answers chosen at random. The results from the competition show that artificial intelligence still has a long way to go. By comparison, a handful of machines have passed the Turing Test according to Turing’s standards, but they would have a much more difficult time trying to figure out Winograd Schema sentences. Another notable factor regarding the Winograd Schema test is that “giving computers common-sense knowledge is notoriously difficult. Hand-coding knowledge is impossibly time-consuming, and it isn’t simple for computers to learn about the real world by performing statistical analysis” (Knight). The Winograd Schema Challenge would therefore be able to distinguish between Blockhead and a human being. In “Psychologism and Behaviorism,” Ned Block describes Blockhead, a hypothetical machine that replicates a human being’s conversation by storing an exact output for every input. Blockhead was designed specifically to pass the Turing Test, but would undoubtedly have trouble when faced with the Winograd Schema Challenge.
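The idea of “an exact output for every input” can be pictured as a lookup table. The following is only a toy sketch under my own assumptions: the table `TABLE` and the function `blockhead_reply` are hypothetical names, and the table holds two entries, whereas Block’s thought experiment imagines a table of astronomical size.

```python
from typing import Dict, Optional, Tuple

# A conversation so far, stored as a tuple of utterances (hypothetical format).
History = Tuple[str, ...]

# Toy stand-in for Blockhead's table: each stored conversation maps to one reply.
TABLE: Dict[History, str] = {
    ("Hello.",): "Hi there.",
    ("Hello.", "Hi there.", "How are you?"): "Fine, thanks. And you?",
}


def blockhead_reply(history: History,
                    table: Dict[History, str] = TABLE) -> Optional[str]:
    """Return the stored reply for this exact conversation, or None if absent.

    Nothing is computed or understood here: the function only looks up
    what was written into the table in advance.
    """
    return table.get(history)


if __name__ == "__main__":
    print(blockhead_reply(("Hello.",)))
    print(blockhead_reply(("Who feared violence?",)))
```

The toy answers only the conversations its programmers anticipated. A question that was never stored, such as one about who feared violence, gets no reply at all.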

All in all, the Turing Test has its limits for testing artificial intelligence, but it still holds value today. That is to say, there is no design problem with the Turing Test; rather, the human aspect of the test is where it fails. In the Turing Test a computer tries to trick a human judge into believing that it is the human, but it turns out that humans may not be the best judges. For now, the Winograd Schema Challenge seems to be the best alternative for benchmarking artificial intelligence. It removes as much human error as possible and yields objective results. Technology has yet to reach the point at which it can pass the Winograd Schema Challenge, but with rapid advancements in artificial intelligence, a machine may soon be able to pass the test!


Works Cited

Block, Ned. “Psychologism and Behaviorism.” The Philosophical Review 90.1 (1981): 5. Web.

Knight, Will. “Tougher Turing Test Exposes Chatbots’ Stupidity.” MIT Technology Review. N.p., 15 July 2016. Web. 30 Nov. 2016.

Turing, Alan. “Computing Machinery and Intelligence.” Readings in Cognitive Science (1950): n. pag. Web.



Philosophy of Technology at NYU Shanghai, a course by Anna Greenspan and Brad Weslake.