As usual, do this assignment on the SEASnet GNU/Linux servers lnxsrv11, lnxsrv13, and lnxsrv15, with /usr/local/cs/bin prepended to your PATH.
If you need a hint, ask a TA or an LA. This assignment is not intended to be done without any hints.
The basic idea here is to get a mental model of how Emacs works by looking at a bit of its keybindings and source code.
Start up a fresh Emacs with a *scratch* buffer.
To warm up, compute 2^(607 − 1) × (2^607 − 1) (i.e., 2**(607 - 1) * (2**607 - 1)) in the *scratch* buffer, by using the expt and other functions. This is the 14th perfect number, discovered in 1952.
Use Emacs to determine how many bits it would take to represent this number in base-2 notation (not counting any sign bit), by writing a Lisp expression that yields the number of bits as an integer.
Type M-: and use it to compute 2^(607 − 1) × (2^607 − 1).
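As a cross-check outside Emacs, the same arithmetic can be sketched in Python, whose integers are also arbitrary-precision. This is only a sanity check of the expected values, not a substitute for the Lisp expressions the steps above ask for; the variable names are illustrative.

```python
# Cross-check of the warm-up arithmetic using Python's arbitrary-precision integers.
p = 607
perfect = 2 ** (p - 1) * (2 ** p - 1)

# int.bit_length() gives the number of bits needed in base 2, excluding any sign bit.
bits = perfect.bit_length()
print(perfect)
print(bits)
```

If the values printed here disagree with what Emacs reports, the Lisp expression is the first place to look.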
Get a list of keybindings by typing C-h b.
Look for two keybindings: C-h k and M-SPC. C-h k stands for “Type Control-h, then ‘k’.” M-SPC is “Meta Space”; on good keyboards you can get this by holding down Alt while hitting the space bar, but you may need to type “Esc” and then hit the space bar. We will examine these two keybindings in more detail.
Type C-h k C-h k and describe what happens and why. (This should relate to the C-h b output mentioned previously.)
Type C-h k M-SPC and describe what happens and why. (This should also relate to the C-h b output.)
Try out M-SPC on some sample text with a lot of white space, to see how it works.
Visit the source code for the function that implements M-SPC, by going to its help and clicking (or typing RET) on its source file name.
Notice how M-SPC is implemented in terms of a more-general function, which does not have a keybinding. Use M-: to execute this more-general function on a buffer, such that the function changes the buffer’s contents.
Similarly, use M-x to execute the more-general function on a buffer.
Use the Emacs command M-x what-line and see what it does.
M-x what-line simply tells you what line you are on, not how many lines are in the buffer. Design and implement a command M-x gps-line that acts like M-x what-line except that it says “Line 27/106” in contexts where M-x what-line would merely say “Line 27”; here, it’s assumed the buffer has 106 lines. Do this by using C-h f to get help about what-line, navigating through that help to find its source code, putting a copy of the source code into a new file gps-line.el, editing that file, loading it into Emacs, and then executing your new command.
When counting all the lines in a buffer, simply count the number of newline characters that it contains. This means that if a buffer ends in a non-newline, you should not count the characters after the last newline to be part of another line. Also, an empty buffer has zero lines.
Test your function on buffers that do not end in newline. Your function should be able to say things like “Line 1/0” and “Line 3/2”.
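The counting rule above can be modeled outside Emacs to check expected outputs. The following Python sketch treats the buffer as a plain string; the function name `gps_line_message` and the 0-based `point` offset are assumptions made for illustration (Emacs positions are 1-based, and this model ignores narrowing). It is not the Emacs Lisp command the assignment asks for.

```python
def gps_line_message(text, point):
    """Return a "Line N/TOTAL" string for a buffer modeled as a Python string.

    `point` is a 0-based offset into `text`; narrowing is ignored in this
    simplified model.
    """
    current = text.count("\n", 0, point) + 1  # newlines before point, plus one
    total = text.count("\n")                  # newline characters in the whole buffer
    return f"Line {current}/{total}"


print(gps_line_message("", 0))         # empty buffer → Line 1/0
print(gps_line_message("a\nb\nc", 5))  # end of an unterminated third line → Line 3/2
```

The second call shows why the current line can exceed the total: the text after the last newline is a line you can be on, but it is not counted.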
Consider the Python 3 script randline.py. Read it and understand what it does.
What happens when this script is invoked on a non-empty file?
What happens when this script is invoked on an empty file like /dev/null, and why?
Note that this script was adapted from an older Python 2 version: randline_old.py. Examine how this version differs from the newer one. (Hint: you can try using the diff command for this!)
SEASnet no longer has Python 2 installed, so try running randline_old.py with Python 3 instead. What happens, and why?
Use Emacs to write a new script shuf.py in the style of randline.py but using more modern Python 3 instead.
Your script should implement the GNU shuf command that is part of GNU Coreutils. GNU shuf is written in C, whereas you want a Python implementation so that you can more easily add new features to it. Your program should run on /usr/local/cs/bin/python3 as installed on SEASnet.
Your program should support the following shuf options, with the same behavior as GNU shuf: --echo (-e), --input-range (-i), --head-count (-n), --repeat (-r), and --help.
As with GNU shuf, if --repeat (-r) is used without --head-count (-n), your program should run forever.
Your program should also support zero non-option arguments or a single non-option argument “-” (either of which means read from standard input), or a single non-option argument other than “-” (which specifies the input file name).
Your program need not support the other options of GNU shuf. As with GNU shuf, your program should report an error if given invalid arguments.
Your solution should use the argparse module instead of the obsolescent optparse. It should not import any modules other than argparse, string and the non-optparse modules that randline.py already imports. Don’t forget to change its usage message to accurately describe the modified behavior.
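One possible starting point for the option handling described above is sketched below. It only declares the listed options with argparse; the function name `build_parser`, the help strings, and the decision to collect all non-option arguments into one list named `args` are assumptions for illustration, and the sketch does not validate arguments or shuffle anything. Note that argparse also adds -h as a synonym for --help by default, which may or may not match the behavior you want.

```python
import argparse


def build_parser():
    """Return a parser that declares the options named above (no validation yet)."""
    parser = argparse.ArgumentParser(
        prog="shuf.py",
        description="Write a random permutation of the input lines to standard output.",
    )
    parser.add_argument("-e", "--echo", action="store_true",
                        help="treat each ARG as an input line")
    parser.add_argument("-i", "--input-range", metavar="LO-HI",
                        help="treat each number LO through HI as an input line")
    parser.add_argument("-n", "--head-count", metavar="COUNT", type=int,
                        help="output at most COUNT lines")
    parser.add_argument("-r", "--repeat", action="store_true",
                        help="output lines can be repeated")
    # Zero or more non-option arguments: a file name, "-", or lines for --echo.
    parser.add_argument("args", nargs="*", metavar="ARG",
                        help="input file, or input lines when --echo is given")
    return parser


# Parse an explicit argument list so the demonstration does not depend on sys.argv.
options = build_parser().parse_args(["-n", "3", "-e", "a", "b", "c"])
print(options.echo, options.head_count, options.args)
```

Checks such as rejecting a negative count, a malformed range, or more than one file name would still need to be added, for example by calling `parser.error` with a message.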
The Python 3.11 release notes say that Python 3.11 is significantly faster than older releases. Can you measure the performance difference? Use Bash’s time command to compare the performance of your implementation when run via SEASnet’s /usr/bin/python3 (which should predate Python 3.11), versus running it via /usr/local/cs/bin/python3 (which should be Python 3.11 or later), versus running Coreutils /usr/local/cs/bin/shuf.
Use Bash commands like the following to time these three benchmarks (this example is for Coreutils, and assumes /usr/local/cs/bin is at the start of your PATH):
time shuf < /usr/share/unicode/ucd/BidiTest.txt > /dev/null
For each of these three benchmarks, run the benchmark at least three times on the text file shown above, and report the median of the sum of the user and system times. Do your benchmarks on the same SEASnet host, and document the CPU and operating system version of the SEASnet host you used by consulting the lscpu command and the /etc/os-release file.
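The statistic requested above, the median of user plus system time, can be computed by hand or with a small helper like the sketch below. The function name `median_cpu_seconds` is hypothetical, and the numbers in `example_runs` are placeholders for illustration only, not measurements from any SEASnet host.

```python
import statistics


def median_cpu_seconds(runs):
    """Return the median of user + system time over (user, system) pairs in seconds."""
    return statistics.median(user + system for user, system in runs)


# Placeholder values only; substitute the user and sys figures that `time` reports.
example_runs = [(0.30, 0.05), (0.28, 0.04), (0.33, 0.05)]
print(median_cpu_seconds(example_runs))
```

Summing before taking the median matters: the median of the sums is not in general the sum of the separate user and system medians.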
If your Python implementation runs on /usr/local/cs/bin/python3 but not /usr/bin/python3, do not benchmark it on the latter; instead, briefly explain which features of the newer Python your program relies on, and why.
Submit the following files within a compressed tarball named assign2.tgz.
All files other than the .drib files should use GNU/Linux style, i.e., UTF-8 encoding with LF-terminated lines.
The shell command:
tar -tvf assign2.tgz
should output a list of file names that contains gps-line.el etc., with sizes and other metainformation about the files.