Time due: 11:00 PM Monday, May 19
Go through the following sections of the class zyBook, doing the Participation Activities and Challenge Activities. We will be looking at whether you have ever successfully completed them; it does not matter how many attempts you make before a successful completion (or how many attempts you make after a successful completion if you want to experiment).
You should complete this part of the assignment as soon as possible, long before the stated due date, to give you ample time for Part 2.
Top-level officials in the Heard Island government, who are supposed to use secure government channels to communicate about classified information, instead carelessly used an unauthorized messaging app for a group chat about a sensitive operation. They picked a bad app; although it encrypts the messages, it uses a very weak form of encryption. Agents of their rival, the government of Île des Pingouins, have intercepted some of the encrypted messages and have hired you to assist with decrypting those messages.
Each of the encrypted messages you have been given has been encrypted using one of the oldest known encryption schemes: a simple substitution cipher. In this scheme, each letter in an original plaintext message is consistently replaced by a letter to produce a ciphertext message (e.g., every A is replaced by N). To be reversible, different plaintext letters are not replaced by the same ciphertext letter (e.g., if every A is replaced by N, no other letter will also be replaced by N). It is allowable for a letter to be replaced by itself (e.g., every M is replaced by M). If the sender and receiver have agreed on the substitution scheme (the key), the receiver can easily decrypt the encrypted message. As an example, suppose the key is this:
ABCDEFGHIJKLMNOPQRSTUVWXYZ   plaintext letters
NRWZKXCHFBOIMTGVJLYADEPQSU   corresponding ciphertext letters
Then the plaintext message KRILL AND SQUID would be encrypted as OLFII NTZ YJDFZ.
Simple substitution ciphers are very insecure; their cryptanalysis (recovering the plaintext message from the ciphertext message without knowing the key) is not difficult. It's even easier if the cryptanalyst can use a known plaintext attack, one in which there is a word or phrase that is known (or strongly suspected) to occur in the message that was encrypted. This known word or phrase is a crib.
For this project, you will write a function that will take a string containing ciphertext messages all encrypted with the same key, and a crib that may occur in one of the messages. The output will be the ciphertext messages with plaintext letters substituted for ciphertext letters to the extent that they can be determined from the crib. As an example, suppose there are four ciphertext messages:
Is udbmrx cgkqxs Crs xsu Ncgebmcgu qu ubbg qu nbuucvrs. Dsqyx Curqgx icrr icg wby umys -- is zqg'j rbus! Cj ibmrx zbuj usksg tcrrcbg rqgjsyg wcud ZU 31 cu zdqrrsgecge!
and the crib is Ile des Pingouins. The only ciphertext fragment that could possibly be an encryption of the crib (because it's the only phrase that has words of the right length with the right pattern of repeated letters) is Crs xsu Ncgebmcgu. That implies that ciphertext c corresponds to plaintext i, r decrypts to l, etc. Your program would output
iE SdOULD INkqDE ILE DES PINGOUINS qS SOON qS POSSIvLE. dEqyD ISLqND iILL iIN wOy SUyE -- iE zqN'j LOSE! Ij iOULD zOSj SEkEN tILLION LqNjEyN wISd zS 31 IS zdqLLENGING!
The case of letters in ciphertext and the crib is irrelevant. (For example, ciphertext nCGebMcGU matches crib PiNgOUIns; the fact that the ciphertext's n may be lower case and the crib's corresponding P may be upper case is irrelevant.) However, when you output the (partially) decrypted messages, all plaintext letters must be written in upper case, while remaining ciphertext letters that could not be determined from the crib must be written in lower case. All non-letter characters (punctuation, digits, blanks, newlines, etc.) must be written unchanged.
The function you implement to do this must have the following prototype:
bool decrypt(const char ciphertext[], const char crib[]);
The parameter ciphertext
is a single C string with all the
encrypted messages, separated by newline characters. (There might or might
not be a newline character after the last message.) For example, a caller
who wanted to decrypt the two messages
Nmgcud dct wby tcudqgxrcge zrquucwcsx cgwbytqjcbg?
could pass as the first argument "Nmgcud dct wby
tcudqgxrcge\nzrquucwcsx cgwbytqjcbg?"
or "Nmgcud dct wby
tcudqgxrcge\nzrquucwcsx cgwbytqjcbg?\n"
. You may assume (and thus
don't have to check) that the ciphertext will contain no more than 50 newline
characters, and that no message within the ciphertext will be longer than 120
characters (not counting a newline at the end of the message). In other
words, there will never be more than 120 characters between two newlines in
the ciphertext or before the first newline or after the last newline. It is
possible that a message has no words (e.g., is empty, or has, say, digits and
spaces but no letters).
The parameter crib is a C string that denotes the crib, the sequence of one or more words that appear consecutively in order in at least one of the ciphertext messages. (For this spec, we define a word to be a sequence of one or more letters.) One or more blanks (i.e., ' ' characters) separate words in the crib; non-letter characters in crib are to be treated as if they were blanks. Thus, the crib "hush-hush until May 29, 2025" should be treated the same as "hush hush until may" would be, as indicating the sequence consisting of those four words. You must not assume any particular limit to the possible length of the crib string argument that is passed to the function.
If the crib string has no words, or if no ciphertext fragment in any message could possibly be an encryption of the crib, the decrypt function returns false without writing anything to cout. Otherwise, it writes to cout the (partially) decrypted messages as described above and returns true. The decrypt function must not cause any other output to be written to cout. If more than one ciphertext fragment is a possible encryption of the crib, then choose any one of those matching fragments as the match for the crib. For example, if the ciphertext string were "Rzy pka mjr" and the crib were "the dog", then the output would be exactly one of THE DOG mjT or Gzy THE DOG, your choice.
A crib word must match an entire ciphertext word. The crib word "aba" matches "cdc" in "cdc ef", but not in "cdcef" or "efcdc". A match for the crib does not span multiple messages. For example, if the ciphertext string were "bwra wmwt\nqeirtk spst\n", and the crib were "alan turing", the "wmwt" from the first message and the "qeirtk" from the second are not considered a match for the crib.
A word is a sequence of letters only, so the crib "dog" would not match anything in the ciphertext "ew'q p-aj", but the crib "he" could match either ew or aj in that ciphertext. As another example, the crib "s cloak and" matches something in the ciphertext "Kpio't dmpbl-boe-ebhhfs opwfm"; the partially decrypted plaintext would be written as "kOiN'S CLOAK-AND-DAhhfs NOwfL".
All the preceding rules imply that all of these crib strings should be treated the same way:
"hush-hush until May 29, 2025"
" hush:-)hUSh--- --- until mAY !! "
"hush hush until may"
and would match something in the ciphertext string
"Eafc fc cfggh! zyxZYXzyx--Abca abCa bdefg## $$kqh6437 wvuWVUwvu\n\n8 9\n"
causing the partially decrypted plaintext of that string to be written the same way that
"THIS IS SILLY! zyxzyxzyx--HUSH HUSH UNTIL## $$MAY6437 wvuwvuwvu\n\n8 9\n"
would be written.
Your decrypt function and any functions you write that it calls must not use any std::string objects (C++ strings); you must use C strings. Your program must not use any standard library containers (such as vectors, maps, etc.).
Note: Some algorithms that you might consider for your decrypt function may appear at first to require that you assume a limit on the length of the crib string. We prohibited that. But we gave you permission to assume that the maximum length of any message within the ciphertext string is 120, so you know that a crib string that could possibly match only messages longer than 120 characters could not possibly match any of the ciphertext messages; for crib strings like that, you could return false without any further analysis. Thus, if you think about it a little, you can determine maximum limits known at compile time for any auxiliary arrays and C strings you might want your decrypt function to declare.
Standard C++ requires that the number of elements in an array you declare be known at compile time. Since the g31 command on cs31.seas.ucla.edu enforces that requirement, and your program must build under that compiler, you must meet that requirement. Thus, you must not do something like this:
bool decrypt(const char ciphertext[], const char crib[])
{
    char a[strlen(crib)];  // Error!  strlen(crib) not known at compile time
The decrypt function is the only function you are required to write. You may write additional functions as part of your solution if you wish. While we won't test those additional functions separately, their use may help you structure your program more readably. Of course, to test your decrypt function, you'll want to write a main routine that calls it. During the course of developing your solution, you might change that main routine many times. As long as your main routine compiles correctly when you turn in your solution, it doesn't matter what it does, since we will rename it to something harmless and never call it (because we will supply our own main routine to thoroughly test your decrypt function).
Your decrypt function and any functions that it calls must not cause anything to be read from cin. They must not cause anything to be written to cout other than the (partially) decrypted messages required by this spec. If you want these functions to write things out for debugging purposes, write to cerr instead of cout. When we test your program, we will cause everything written to cerr to be discarded instead; we will never see that output, so you may leave those debugging output statements in your program if you wish.
Your implementation must not use any global variables whose values may be changed during execution.
Your program must build successfully under both g31 and either Visual C++ or clang++.
The correctness of your program must not depend on undefined program behavior.
Your program could not, for example, assume anything about t's value, or even whether or not the program crashes:
int main()
{
    char t[6];
    strcpy(t, "Enigma");  // too long: 7 chars including '\0'
    …
Here's an example of a main routine that performs some simple tests of the decrypt function:
#include <iostream>
using namespace std;

bool decrypt(const char ciphertext[], const char crib[]);  // your implementation

void runtest(const char ciphertext[], const char crib[])
{
    cout << "====== " << crib << endl;
    bool result = decrypt(ciphertext, crib);
    cout << "Return value: " << result << endl;
}

int main()
{
    cout.setf(ios::boolalpha);  // output bools as "true"/"false"
    runtest("Hirdd ejsy zu drvtry od.\nO'z fodvtrry.\n", "my secret");
    runtest("Hirdd ejsy zu drvtry od.\nO'z fodvtrry.\n", "shadow");
}
The output of running the program with this main routine would be the following. (Only two of the lines below are written by decrypt, of course; the others are written by runtest.)
====== my secret
hiESS ejsT MY SECRET oS.
o'M foSCREET.
Return value: true
====== shadow
Return value: false
You won't turn anything in through the CS 31 web site for Part 1; the zyBook system notes your successful completion of the PAs and CAs. For Part 2, you will turn in a zip file containing these two files and nothing more:
A text file named decrypt.cpp that contains the source code for your C++ program. Your source code should have helpful comments that tell the purpose of your data structures and program segments, and explain any tricky code.
By May 18, there will be links on the class webpage that will enable you to turn in your zip file electronically. Turn in the file by the due time above.
Although the program you turn in must use C strings and is forbidden from using C++ strings, you can experiment with ideas for doing this project without that restriction. For example, you could create an experimental project (that you will not turn in) and pretend the required function is
bool decrypt(const string ciphertext, const string crib)
(This experimental version declares the parameters as const strings so that your experimental implementation doesn't try to modify them in any way even though they are copies of the caller's arguments. This is because the real function has const char[] parameters, which won't allow the caller's arguments to be modified.)
You could work out a lot of what you need to do for this project using C++ strings without the distraction of having to wrestle with C strings. Use what you learn from the experimental project when writing the real project that uses only C strings. Warning: It may not be wise to try to completely finish the experimental C++ string version before even starting the real C string version; it might take you more time than you thought to figure out how to work with C strings, so you might not have anything working in the C string version that you must turn in. Instead, when you have just a few things working in a C++ string version, try implementing them and getting them to work in the C string version, so you'll know how much time it takes you to translate from using C++ strings to C strings. Get more things working in the C++ string version, then in the C string version. Once you're comfortable with C strings, you might abandon the experimental version and continue on with just the real C string version. (Or not; maybe you prefer to continue working out each new bit with C++ strings first before implementing it with C strings.)
Microsoft made a controversial decision to issue by default a warning in some cases when your code uses certain functions from the standard C and C++ libraries (e.g., strcpy). These warnings call those functions unsafe and recommend using different functions in their place; those other functions, though, are not Standard C++ functions, so they will cause a compilation failure when you try to build your program under g31 or clang++. Therefore, for this class, we want to use functions like strcpy without getting that warning from Visual C++; to eliminate the warning messages, put the following line in your program before any of your #includes:
#define _CRT_SECURE_NO_WARNINGS
It is OK and harmless to leave that line in when you build your program using g31 or clang++ and when you turn it in.
If you declare a large array in a function, Visual C++ issues a harmless warning C6262: Function uses 'NNNNN' bytes of stack: exceeds /analyze:stacksize '16384'. Consider moving some data to heap., where NNNNN is some number. You can eliminate that warning by adding this line at the top of your program:
#pragma warning(disable : 6262)
It is OK and harmless to leave that line in when you build your program using g31 or clang++ and when you turn it in, even if you get a warning about the pragma being ignored.
Alternatively, in Visual Studio, select Project / yourProjectName properties, then select Configuration Properties / Code Analysis / General, and then in Code Analysis's stacksize, modify 16384 to, say, 100000.
If your program dies under Visual C++ with a dialog box appearing saying "Debug Assertion Failed! ... File: ...\src\isctype.c ... expression: (unsigned)(c+1)<=256", then you called one of the functions defined by <cctype>, such as isalpha or tolower, with a character whose encoding is outside the range of 0 through 127. Since all the normal characters you would use (space, letters, punctuation, '\0', etc.) fall inside that range, you're probably passing an uninitialized character to the function. Perhaps you're examining a character past the '\0' marking the end of a C string, or perhaps you built what you thought was a C string but forgot to end it with a '\0'.