Computer Sleuth: Questions about Project

Posted: Wed Sep 02, 2015 7:35 pm
by [Div]
https://www.sciencebuddies.org/science- ... shtml#help

Hi, I'm planning to work on the project above, but I'm wondering how I should do my research and what I need to account for when writing my code so that I can collect data and later compare it against other works. I was thinking of finding the average word length, the number of 4-6 letter words, and the number of words longer than 7 or 8 letters, but I don't know whether I should also use a table of common words, the number of sentences, and sentence length. What else could I measure, and is there a reference showing what other people have used to make identification easier?

Thanks, it would mean a lot if I could get some help on this so I can start right away.

Re: Computer Sleuth: Questions about Project

Posted: Mon Sep 07, 2015 8:18 pm
by deleted-249560
You might want to do some web searching on stylometry and read some of the research papers on how the specific algorithms work. What you're suggesting is a great start, though. Keeping counts of the number of words of each length, how often words are repeated, and how many appear in a common word list are all really good ideas.
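As a rough illustration of those counts, here is a minimal Python sketch. The function name `style_features`, the tiny `COMMON_WORDS` set, and the `long_threshold` parameter are placeholders made up for this example, not something taken from the project page or from a particular stylometry paper, and the tokenizer is deliberately crude.

```python
import re
from collections import Counter

# Hypothetical, very short common-word list; a real project would load a longer one.
COMMON_WORDS = {"the", "and", "of", "to", "a", "in", "that", "is", "it", "was"}

def style_features(text, long_threshold=7):
    """Return simple style measurements for one text as a dict."""
    # Crude tokenizer: runs of letters, optionally joined by apostrophes ("don't").
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    # Crude sentence splitter: break on runs of ., ! or ? and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")

    total = len(words)
    length_counts = Counter(len(w) for w in words)  # word length -> how many words
    word_counts = Counter(words)                    # word -> how many times it occurs

    return {
        "avg_word_length": sum(len(w) for w in words) / total,
        "length_counts": dict(length_counts),
        # Share of words with 4 to 6 letters.
        "share_4_to_6": sum(n for l, n in length_counts.items() if 4 <= l <= 6) / total,
        # Share of words strictly longer than long_threshold letters.
        "share_long": sum(n for l, n in length_counts.items() if l > long_threshold) / total,
        # Share of word occurrences that belong to a word used more than once.
        "share_repeated": sum(n for n in word_counts.values() if n > 1) / total,
        # Share of word occurrences found in the common-word list.
        "share_common": sum(word_counts[w] for w in COMMON_WORDS) / total,
        "avg_sentence_length": total / len(sentences),
    }

if __name__ == "__main__":
    print(style_features("The cat sat. The dog ran away quickly!"))
```

Using shares (fractions of the total) rather than raw counts is a design choice so that a short story and a long novel can be compared on the same scale.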

You can also take advantage of the many classic books available through Project Gutenberg (https://www.gutenberg.org/). If your methods work, you should be able to compare a couple of Shakespeare works and see similarities between them, but see differences in style when you compare them to Jules Verne.
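One simple way to put a number on "similar" versus "different" is to turn each text into a word-length profile and add up the differences between two profiles. The sketch below does that with a plain sum of absolute differences (Manhattan distance); this is just one easy measure chosen for illustration, not the method the research papers necessarily use, and the names `length_profile` and `profile_distance` are made up for this example.

```python
import re
from collections import Counter

def length_profile(text, max_len=12):
    """Fraction of words at each length 1..max_len; longer words share the last bin."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        raise ValueError("text contains no words")
    counts = Counter(min(len(w), max_len) for w in words)
    return [counts[n] / len(words) for n in range(1, max_len + 1)]

def profile_distance(text_a, text_b, max_len=12):
    """Sum of absolute differences between two profiles.

    0.0 means identical word-length profiles; 2.0 means no overlap at all.
    """
    a = length_profile(text_a, max_len)
    b = length_profile(text_b, max_len)
    return sum(abs(x - y) for x, y in zip(a, b))

if __name__ == "__main__":
    # Short made-up strings stand in for full books here.
    print(profile_distance("a bb ccc", "a bb ccc"))
    print(profile_distance("a a a", "bb bb bb"))
```

To try it on real books, you would read each downloaded text file into a string and pass the strings in. Project Gutenberg files include license text at the top and bottom, so it is worth trimming that off first so it does not skew the counts. If the approach works, two works by the same author should tend to give a smaller distance than works by different authors.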

This is a really fun area of study. I worked on a commercial product some years back that was intended to help people in a company learn the company's specific style when they write documents. I think you'll enjoy the project. Please write back if you have questions or just to tell us about your progress.

Howard