Crack the Code: Breaking a Caesar Cipher
AbstractWhen you hear the word "encryption," you might think about modern computers and things like email and online bank accounts. But did you know that encryption has been around for thousands of years? In this project you will learn about the Caesar cipher, a simple type of encryption that replaces each letter of the alphabet with another letter, and demonstrate how a modern computer can crack this ancient code in just a few seconds.
ObjectiveWrite a program to decrypt text that has been encrypted with a Caesar cipher.
Have you ever wanted to send a secret message to a friend? What if someone, like a parent or a teacher, intercepts the message and reads it? In order to make sure that only your friend could read the message, even if it was intercepted, first you would need to encrypt it. Encryption is the process of encoding a message so only the intended recipient can read it. Encryption is used to protect many of our daily online activities, like emails and credit card transactions, from unauthorized access.
Modern encryption algorithms are very complicated and (ideally) difficult to break. However, encryption has been around for thousands of years—long before computers existed. Leaders throughout history have used various types of encryption to send messages to allied countries and military leaders during wartime. One famous example is the Caesar cipher, used by Julius Caesar in ancient Rome. The Caesar cipher is an example of a substitution cipher, where each letter of the alphabet (in English, 26 letters) is replaced by another letter of the alphabet. This is done by "shifting" the entire alphabet by a certain number of spaces. This number is called the key. For example, here is a shift of 3 (note how the alphabet "wraps around" from the end):
To encode a message, each letter in the original message (called the plaintext) is replaced with the letter directly below it in the shifted alphabet (A becomes D, B becomes E, and so on). The result is called the ciphertext. Here is a plaintext message encrypted using a shift of 3:
|Plaintext:||THIS IS A SECRET MESSAGE|
|Ciphertext:||WKLV LV D VHFUHW PHVVDJH|
In order to share secret messages, you and your friend need to agree on a key in advance. Then, you can use the key to encrypt messages, and your friend can use the same key (shifting the alphabet in the opposite direction) to decrypt them. Anyone who intercepts the messages will be unable to read them if they do not know the key.
But, what if a very determined person wants to crack your code? How could they do it? One major weakness of the Caesar cipher is that it is vulnerable to a brute-force attack, an attack that tries all possible keys to decrypt a message. Since there are only 25 possible keys in English (using a key of 26 gets you back to the original alphabet), for very short encrypted messages it would not take you long to manually try all the keys. For example, here is a short encrypted message (note that this simple version of the Caesar cipher only changes letters; punctuation remains unchanged).
What happens if we try to decrypt this message using a shift of 1? That would mean that during encryption, A became B, B became C, and so on. To decrypt the message, we work backwards (B becomes, A, C becomes B, and so on). If we try this on the entire message, we get this result:
The message is still gibberish, so we know that 1 is not the key (assuming the original message was actually in English!). Can you try to decrypt the message using the other 24 possible keys? Keep trying different keys until you get a sentence that makes sense in English. How long does it take you to do it by hand?
Another method that can be used to crack a Caesar cipher (or any other type of substitution cipher) is frequency analysis. Frequency analysis is based on the fact that certain letters appear with different frequencies in English writing—for example, E usually occurs the most often, followed by T and A; whereas Q and Z appear the least often (Figure 1).
A frequency analysis graph shows how often each letter in the English alphabet is used. The letter j, q, x and z have the lowest usage rates. The letters e, t and a have the highest usage rates.
Figure 1. Letter frequencies in the English language.
For example, look at this encrypted text:
|L PZ AOL TVZA MYLXBLUA SLAALY PU AOPZ ZLUALUJL|
If you count the letters, you will notice that L appears more often than any other letter (9 times). It is therefore a safe guess that L stands for E if this is a substitution cipher and the original message was in English. L is 7 spaces away from E in the alphabet. What happens if you work backwards to decrypt this message using a key of 7 (L becomes E, M becomes F, and so on)?
Doing a brute-force attack or frequency analysis by hand can be easy for very short messages, but can become time-consuming for entire paragraphs or pages of text. This is where writing a computer program to do the work for you comes in handy. In the procedure of this project, you will write your own programs that can first encrypt plaintext using a Caesar cipher, and then attempt to decrypt the text using both a brute-force attack and frequency analysis.
Terms and Concepts
- Caesar cipher
- Substitution cipher
- Brute-force attack
- Frequency analysis
- What is a substitution cipher?
- How does a Caesar cipher work?
- How is a Caesar cipher vulnerable to attacks? Why does this make it a poor choice for modern encryption?
Materials and Equipment
- Computer with programming language of your choice. Python 3 can be downloaded for free from https://www.python.org/downloads/
- Lab notebook
Cybersecurity projects can be fun, but they can also get you in trouble if you are not careful. Make sure you follow these rules when doing a cybersecurity project:
- Do not attack any individual, computer, system, or network without consent from the individual (or person who owns the computer). For example, do not try to guess someone's email password and log into their account unless you get their permission first, or try to hack into a website without permission from the owner of the website.
- Even if you have consent to perform an attack, the attack should be for learning purposes only, and you should help the individual or organization fix any problems you find (this is known as "white hat" hacking). For example, if you are able to guess someone's password, you should tell them they need to pick a stronger password (and help them learn how). Do not read their emails, change any of their account settings, look at private information or files like pictures, or tell anyone else their password.
- If your project involves human subjects, even if you have their consent, you may still need approval from your science fair or an Institutional Review Board (similar to the rules for psychology or medical experiments). See this page for more information.
- Do not pretend to be a different person, company, or other organization online. This includes pretending to be someone else on a social media site, setting up fake websites designed to look like real websites from reputable companies, or sending "phishing" or other emails designed to look like they were sent by someone else. (A controlled experiment where only study participants have access to examples of such websites or emails would be OK.)
- Do not use data that was illegally obtained (for example, contact information stolen from a company's employee database), even if it was stolen by someone else and already posted online.
- Do not publicly post sensitive personal information, even if it was obtained with consent. For example, if your project involves accessing people's contact information (legally), do not post someone's name and address in the "Results" section of your science fair display board. You should destroy any such information (by shredding paper or deleting files) when you are done with your project.
- Do not install or run any malicious software (viruses, malware, spyware, trojans, etc.) on a computer that is connected to the internet. The software could easily spread to other computers and get out of your control.
If you have any doubts or questions about your project, check with your teacher or science fair administrator before you start.
If you are doing this project in Python, you might want to make sure you know how to use the following features of the language before you start (or equivalent features in a program of your choice).
- IF statements
- FOR loops
- The modulo operator (%)
If you get stuck when writing your program, an online search for something general (like "python if statement") or specific (like "how to read a string from a text file in python") will typically give helpful results.
- On your computer, write a sentence or short paragraph (or copy one from this page) and save it as a text file.
- Write a program that:
- Reads a plaintext string from a text file.
- Encrypts the string using a Caesar cipher with a randomly generated key. You can make your program only change the letters A-Z and leave other characters (numbers, punctuation, spaces) unchanged.
- Saves the ciphertext to a new text file.
- Write a program to perform a brute-force attack on the ciphertext. Refer to the background section if you need a reminder about how a brute-force attack works. The program should:
- Load the encrypted string from the text file.
- Try all 25 possible keys to decrypt the ciphertext, saving each result in a new string.
- Look at all 25 resulting strings. Most of them should be gibberish. Do any of them make sense? Can you figure out which one was the correct key?
- Write a program to perform frequency analysis on the ciphertext. Refer to the background section if you need a reminder about how frequency analysis works. The program should:
- Load the encrypted string from the text file.
- Count how many times each letter occurs in the ciphertext, and find the letter that occurs most often.
- Use this information to calculate the key (assuming the most common letter corresponds to the letter E in plaintext).
- Decrypt the text using the key you calculated. Does the resulting plaintext make sense? If not, what do you think went wrong? (hint: be careful with frequency analysis, E might not be the most common letter in individual sentences or short paragraphs)
- Test your program. Search online for text that has already been encrypted with a Caesar cipher (so you cannot "cheat" by already knowing the answer) and try using your program to decrypt it. You can also test your program on the following blocks of text. Which approach works better for each message, brute force or frequency analysis?
Example 1: BNMFQZSTKZSHNMR! XNT GZUD BQZBJDC SGD BNCD! Example 2: UZU PFL KYZEB KYZJ GIFAVTK NRJ WLE? TYVTB FLK KYV
CVRIE DFIV JVTKZFE KF CVRIE RSFLK TRIVVIJ ZE TPSVIJVTLIZKP.
Ask an Expert
This project explores topics key to Industry, Innovation and Infrastructure: Build resilient infrastructure, promote sustainable industrialization and foster innovation.
- Can you expand your Caesar cipher program so it also encrypts other characters (letters, punctuation, spaces)?
- The Caesar cipher is just one type of substitution cipher. There are many other types of substitution ciphers, including more complicated types that are designed to defeat frequency analysis. Can you write a program to encrypt and decrypt messages using a different type of cipher?
- This project requires that you check the results of your decryption program manually to see if the decryption worked. This can still be time consuming if you need to decrypt many separate messages. Can you automate this process? (hint: do a web search for "python check if a word is English")
- Frequency analysis is less reliable for short blocks of text where E might not be the most common letter. Examine various chunks of text (for example, taken from your favorite website or book) of different lengths. On average, how many characters long does a string written in English need to be before E becomes the most common letter?
- This project makes things "easy" for you by telling you that the original messages are written in English and encrypted using a Caesar cipher. In the real world, when you intercept a message, you might have no idea how it was encrypted or even what language it was written in. Can you write a program that attempts to decrypt messages using multiple types of substitution cipher? What about a program that works on messages written in Spanish or another language?
- Find a friend to work with. Write your own encryption algorithm and challenge your friend to crack it, and vice versa.
- Share your program with someone else and use it to encrypt and decrypt messages that you send each other (for example, do the encryption on your computer, send the encrypted text via email, and the recipient can decrypt on their computer). Do you think it is secure to keep using the same key forever? Can you come up with a system to change the key, for example based on the date?
If you like this project, you might enjoy exploring these related careers:
- Science Fair Project Guide
- Other Ideas Like This
- Cybersecurity Project Ideas
- Computer Science Project Ideas
- My Favorites