Project Summary

Difficulty 7–10
Time required Long (a couple of weeks)
Prerequisites An understanding of the material covered in "Paragraph Stats: Writing a JavaScript Program to 'Measure' Text"
Material Availability Readily available
Cost Very Low (under $20)
Safety No issues

Sponsor

Sponsored by a generous grant from Symantec Corporation

Abstract

Here's a project where you can try your hand at being a detective with your computer. In this project you'll write a program to do some basic analysis of features of written text (for example, counting the length of each word in the text, or the number of words in each sentence). Then you'll see if you can use the information from your text analysis program to find measurements that can distinguish one author from another. After analyzing known samples of several authors' writings, can your method match up unidentified writing samples with their correct authors?

Objective

The goal of this project is to write a computer program to make some simple measurements on a block of text, and then to see if this information can be used to identify the author of the text.

Introduction

Your English teacher has probably told you that every author has an individual writing style—their own unique 'voice' on the page. Is it possible to find ways to identify that voice through computer analysis of written text?

A familiar case from history argues that it is indeed possible. When our forefathers, newly independent from Great Britain, were debating whether to do away with the Articles of Confederation and adopt the new Constitution written by a convention in Philadelphia, a series of essays was written to argue in favor of adopting the new government. These essays, now called The Federalist Papers, were signed "Publius," but are now attributed to Alexander Hamilton, James Madison, and John Jay. The authorship of 12 of the essays was claimed by both Hamilton and Madison. As Julie Rehmeyer writes in a recent Science News article (Rehmeyer, 2007): "Altogether, researchers have considered more than 1,000 features of writing style. Nearly all the analyses have vindicated Madison."

Relax, you won't need to analyze 1,000 different features for your science fair project. The Science Buddies project, Paragraph Stats: Writing a JavaScript Program to 'Measure' Text, shows you how to write a simple program to measure:

  1. the number of sentences contained in the text,
  2. the number of words in each sentence,
  3. the number of letters in each word,
  4. the average number of words per sentence, and
  5. the average word length.
With some simple modifications to the program, you can count the frequency of each word length and each sentence length in the text. Is this enough information to identify authorship? Try it and find out!
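
As a rough illustration of the measurements listed above, here is a minimal sketch in JavaScript. It is not the program from the Paragraph Stats project; the function names (`splitSentences`, `splitWords`, `tally`, `measureText`) are made up for this example, and the rules for what counts as a sentence or a word are simplifying assumptions that you may want to change.

```javascript
// Split text into sentences at '.', '!' or '?'.
// Assumption: every such mark ends a sentence, so "Dr. Smith" is split in two.
function splitSentences(text) {
  return text
    .split(/[.!?]+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// A "word" here is a run of letters, optionally with apostrophes inside it.
function splitWords(sentence) {
  return sentence.match(/[A-Za-z]+(?:'[A-Za-z]+)*/g) || [];
}

// Count how often each value occurs, e.g. [1, 3, 3] gives { 1: 1, 3: 2 }.
function tally(values) {
  const counts = {};
  for (const v of values) {
    counts[v] = (counts[v] || 0) + 1;
  }
  return counts;
}

function measureText(text) {
  // One list of words per sentence; fragments with no words are ignored.
  const wordLists = splitSentences(text)
    .map(splitWords)
    .filter((words) => words.length > 0);

  const wordsPerSentence = wordLists.map((words) => words.length);
  // Apostrophes are not counted as letters.
  const wordLengths = wordLists.flatMap((words) =>
    words.map((w) => w.replace(/'/g, "").length)
  );

  const totalWords = wordLengths.length;
  const totalLetters = wordLengths.reduce((a, b) => a + b, 0);

  return {
    sentenceCount: wordLists.length,
    wordsPerSentence,
    wordLengths,
    averageWordsPerSentence: wordLists.length ? totalWords / wordLists.length : 0,
    averageWordLength: totalWords ? totalLetters / totalWords : 0,
    wordLengthFrequency: tally(wordLengths),
    sentenceLengthFrequency: tally(wordsPerSentence),
  };
}

const stats = measureText("The cat sat. It was happy!");
console.log(stats);
```

The two frequency tables at the end are the "simple modifications" mentioned above: instead of keeping only the averages, the sketch records how many times each word length and each sentence length appears.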

Terms, Concepts and Questions to Start Background Research

To do this project, you should do research that enables you to understand the following terms and concepts:

Questions

Bibliography

Materials and Equipment

To do this experiment you will need the following materials and equipment:

Experimental Procedure

  1. Write the program to analyze text.
    1. For help on writing the JavaScript program to analyze blocks of text, see the Science Buddies project Paragraph Stats: Writing a JavaScript Program to 'Measure' Text.
    2. You may decide that you want to improve the program so that you can make additional measurements. The Variations section has some suggestions for additional measurements, and you will probably come up with others on your own.
  2. Choose three or more authors and select representative samples of text by each (it's best to use at least 1000 words).
  3. Analyze each text sample with your program.
  4. Experiment with methods of graphing the results to create your own 'writeprint' (Rehmeyer, 2007) for each author.
    1. So that you can make fair comparisons between samples, all of your graphs should use the same scales (i.e., the same x-axis range and the same y-axis range on every graph). Think carefully when you design your 'writeprint', and make sure that your x- and y-axes can accommodate the full range of possible measurements.
    2. The key is to identify measurements that consistently reveal a difference between authors.
    3. For starters, you may want to try plotting the word length vs. frequency for each author (Mendenhall, 1887).
  5. Have your helper select additional paragraphs from each author. Your helper should also run the analysis on each additional sample, and give you the results, without identifying the authors. Can you determine the author of each unknown sample?
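
One possible way to put steps 4 and 5 into practice is sketched below in JavaScript. It is only an illustration, not a method prescribed by this project. It assumes you already have word-length counts for each sample, converts them to fractions of the total so that samples of different sizes share the same y-axis scale, and uses a fixed range of word lengths so they share the same x-axis. The comparison rule (sum of absolute differences), the function names, the author labels, and all of the numbers are made up for the example.

```javascript
// Convert raw counts ({ wordLength: count }) into fractions of the total,
// over the fixed range 1..maxLength so every profile has the same x-axis.
// Words longer than maxLength still count toward the total but are not listed.
function toProfile(counts, maxLength) {
  const total = Object.values(counts).reduce((a, b) => a + b, 0);
  const profile = [];
  for (let len = 1; len <= maxLength; len++) {
    profile.push(total ? (counts[len] || 0) / total : 0);
  }
  return profile;
}

// Sum of absolute differences between two profiles of equal length.
// Assumption: this is just one simple way to compare two "writeprints".
function profileDistance(p, q) {
  return p.reduce((sum, value, i) => sum + Math.abs(value - q[i]), 0);
}

// Find the known author whose profile is nearest to the unknown sample.
function closestAuthor(unknownCounts, knownCounts, maxLength) {
  const unknown = toProfile(unknownCounts, maxLength);
  let best = null;
  let bestDistance = Infinity;
  for (const [author, counts] of Object.entries(knownCounts)) {
    const d = profileDistance(unknown, toProfile(counts, maxLength));
    if (d < bestDistance) {
      best = author;
      bestDistance = d;
    }
  }
  return { author: best, distance: bestDistance };
}

// Made-up word-length counts, for illustration only.
const known = {
  "Author A": { 1: 5, 2: 20, 3: 30, 4: 25, 5: 20 },
  "Author B": { 1: 2, 2: 10, 3: 15, 4: 20, 5: 25, 6: 28 },
};
const unknownSample = { 1: 1, 2: 4, 3: 7, 4: 5, 5: 3 };

const guess = closestAuthor(unknownSample, known, 15);
console.log(guess.author, guess.distance);
```

A small distance only means that the two word-length profiles look alike. Whether that is enough to identify an author is exactly what your experiment is meant to test, so compare the program's guesses with your helper's answer key.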

Variations

Credits

Andrew Olson, Ph.D., Science Buddies


Last edit date: 2007-03-23 12:00:00


Career Focus

If you like this project, you might enjoy exploring careers in Computer Science.

Computer Programmer
Computers are essential tools in the modern world, handling everything from traffic control, car welding, movie animation, shipping, aircraft design, and social networking to book publishing, business management, music mixing, health care, agriculture, and online shopping. Computer programmers are the people who write the instructions that tell computers what to do.

Computer Software Engineer
Are you interested in developing cool video game software for computers? Would you like to learn how to make software run faster and more reliably on different kinds of computers and operating systems? Do you like to apply your computer science skills to solve problems? If so, then you might be interested in the career of a computer software engineer.

Network Systems and Data Communications Analyst
Computers are an important part of our lives. We use computers to hold and process data, to control manufacturing factories, and to surf the Internet. We are all part of many different kinds of computer networks that are continually sharing information. The role of the network systems and data communications analyst is to design, model, and evaluate computer networks so that they can share information seamlessly. This is an exciting career for those people who enjoy working with rapidly changing technology.

Computer Hardware Engineer
Whether you are playing video games, surfing the Internet, or writing a term paper, computers are an integral part of our daily lives. Computer hardware engineers work to make computers faster, more robust, and more cost-effective. They design the microprocessor chips that make your computer function, along with the equipment that makes computing easy and fun to do.





Copyright © 2002-2010 Kenneth Lafferty Hess Family Charitable Foundation. All rights reserved.
Reproduction of material from this website without written permission is strictly prohibited.
Use of this site constitutes acceptance of our Terms and Conditions of Fair Use.