Course: CIS451/651 Data
Compression in Multimedia
Professor: Paul D. Amer
Semester: Spring 2009
Title: Project 2 - Simple Run Length Encoding/Decoding
Due Date:
Description
(40 points) Company "UD-Banners-and-Art" finds it necessary to
store billions of ASCII art files and banner-like posters (e.g., see unix command "banner" to see how to make a banner
using only a single character ). To save disk
space, a promising CISC451/651 student proposes the following simple run length
coding scheme. For a given ASCII input file, each "run" of identical
characters in an original file is compressed into a 'RunLength
- Character' pair. In the following example, an original file contains 33
bytes including 2 bytes for blanks and 1 byte for a newline character indicated
by "<nl>". Count the 33 bytes
below:
original file named “example.txt” : AAAbbbbbbb YYYYYYYYYYYY6XXXY!!!<nl>
Using the following self-describing compressed file
organization specification:
typedef struct
{
ASCII-OCTET Signature[7];
/* header signature (always ASCII "amerRLE")
*/
ASCII-OCTET
Version;
/* amerRLE format version (initially ASCII
"1")*/
ASCII-OCTET AlphabetLength[3]; /* size
alphabet (ASCII "001" to "128")
*/
ASCII-OCTET Alphabet[AlphabetLength];
/* input alphabet
*/
struct
{
ASCII-OCTET RunLength; /*
# of times Character repeats ("1" to "9")*/
ASCII-OCTET Character; /* character repeated RunLength times */
} DataPair[]; /*
variable number of pairs; different for */
/*
each file being
compressed
*/
} AMERRLE;
the full compressed file “example.txt.ARLE” would contain 39 bytes and look
like (remember <nl> represents 1 byte):
amerRLE1008Ab
Y6X!<nl>3A7b2 9Y3Y163X1Y3!1<nl>
If you were to print the file “example.txt.ARLE”,
you would see 3 lines of output (note: the third line is blank):
amerRLE1008Ab Y6X!
3A7b2 9Y3Y163X1Y3!1
For this example, the original 33 byte file gets into 39 bytes (ie, negative compression!)
Tasks
- Write well-commented,
perfectly-indented Java programs: amerRLE_encode
and amerRLE_decode. After
discussing the general concepts of this compression method, students
working in a group of 2 should have one person program amerRLE_encode, and the
other person program amerRLE_decode.
Try not to interact until each program has been initially
debugged. Then interact to get both
programs working properly. I expect
that variable names, indentation style, etc., will differ significantly
between the coder and decoder since each is written by a different
programmer.
- amerRLE_encode takes a single argument -
a <filename> to be compressed.
As output, amerRLE_encode
produces <filename>.ARLE If the output file
already exists, prompt the user to see if the output file should be
overwritten, or simply abort compression and output a warning. If
the input file is an invalid file for being compressed, output a warning
"Error: file cannot be compressed because <reason>" and
abort. An input file would be invalid if, for example, it contains
any byte value in the range 128-255.
- amerRLE_decode takes a single argument -
a <filename>.ARLE to be decompressed. As output, amerRLE_decode
produces <filename>.RECONSTRUCTED If the output
file already exists, prompt the user to see if the output file should be
overwritten, or simply abort decompression and output a warning. The input file
to be uncompressed MUST have the extension ".ARLE" and
its format MUST be valid. If the input is not a valid amerRLE compressed file, output a meaningful error
message, and abort decompression. For example, a file to be
uncompressed would be invalid if the signature or version is incorrect, or
if any DataPair.Character is not defined in the
Alphabet, or if the DataPair.RunLength is not
within "1"-"9". Other examples of invalid files
exist. Think of others, and carefully document which errors your
code tests.
- Strong Recommendation: before
beginning any programming, make up examples by hand of what you think are
valid and invalid .ARLE files. Then use these files to help
debug your programs.
What to Turn In
- (submit) Use the unix
command “script” to show your programs work correctly:
- start script
- cat amerRLE_encode.java
- cat amerRLE_decode.java
- compile both programs
- for 3 different testfiles, do the following: /* the first test file must be
apple */
- cat <testfile>
- amerRLE_encode <testfile>
- cat <testfile>.ARLE
- amerRLE_decode <testfile>.ARLE
- cat <testfile>.RECONSTRUCTED
- diff <testfile> <testfile>.RECONSTRUCTED
- ls -l
<testfile>*
- (submit) Written performance
evaluation report (maximum 2 pages - strictly enforced!).
Evaluate the performance of amerRLE
compression as a compression scheme for ASCII art and banner files.
(You are not evaluating amerRLE as
a compression scheme for all files, only as a scheme for ASCII art
and banner files.) Your evaluation should be written at a technical
level. Do NOT restate the assignment description in your
report. You may use some of my
test
files
containing ASCII art. IMPORTANT: You must find or create at least 5 additional ASCII art and banner
files to use in your evaluation.
(Check out http://ascii.mastervb.net/ and http://www.glassgiant.com/ascii/)
In your report, provide a summary table of at least 20 uncompressed vs. compressed files.
- (submit) In your report, discuss the
theoretical best (i.e., maximum or most positive) and worst (i.e., minimum or most negative)
compression that can possibly result from amerRLE encoding a file with N
characters. (Ignore the fixed overhead for the signature, version,
alphabet length and alphabet.)
- (submit) For this assignment, after
a full effort at debugging and testing, each group should test their code
with one other group as follows. Group A encodes a file which is
then decoded by Group B. Is the decoded .RECONSTRUCTED file the same as the original? Then reverse roles
using Group B's encoder and Group A's decoder. Use the unix command ‘diff’
to verify that the original and .RECONSTRUCTED files are
identical. In your report, document
with which group you tested your code, and what happened (i.e., did all
code work the first time? if not, what was
the problem? Be honest in documenting what happens; there is no
grade penalty for programs that did not initially work correctly.)
- WARNING: Be careful copying test
files from another group. Most file transfer programs have the annoying habit of introducing/deleting newline characters when transferring files between Windows and Unix systems using ASCII mode (often the default). Using ftp in "binary" mode avoids this problem. Verify that after your transfer that the numbers of bytes in the original and copied files are equal.
Notes
- If a character run is of
length 10 or more, compress the run into multiple data pairs.
- An uncompressed file may contain
any of the valid 128 ASCII characters. Remember that <nl> is a valid ASCII
character. The compressed file's alphabet should consist of
one instance of each character in the uncompressed file.
- Programs must be written in
Java. If you do not know Java, now is the time to learn.
- The AMERRLE struct specified above is an abstract pseudo-code
specification to help each student understand the assignment; it is not a
required item for your implementation. (The variable names are
strongly recommended.)
- Students are expected to
work in groups of 2 unless given the instructor's permission to work
alone. Working together means just that; during the
coding/debugging/testing, both students are physically working on
the assignment in the same room. Each group should turn in one
listing and evaluation. Each
student must be able to explain all code if asked to do so by the
professor or TA. If a student
cannot explain both the coder and decoder programs, the student will receive no
credit for the assignment.
- Grading will be based on:
- Code correctness and
testing (60%)
- Code documentation
and readability (see Amer's Coding
Standard for hints) (25%)
- program uses meaningful variable
names (using meaningless generic names such as {data, count, flag, stop, input, output, infile, outfile, value, returnValue, calculatedValue, answer, number1, number2} will result in loss of credit
- programs's indentation matches the code's control flow
- program code is explain with meaningful comments
- Note: source listings submitted with lines
that wraparound will be returned without grading
- Performance
evaluation (15%)