class HuffmanIO

TO DO: document

Constants

FLUSH_WAIT

Iterate a tight loop at most 2**FLUSH_WAIT times while waiting for a file flush.

Attributes

decoded[R]

IO to which the decoder writes.

encoded[R]

IO to which the encoder writes and from which the decoder reads.

symbols[R]

IO from which the encoder reads symbols.

Public Class Methods

new(argument)

Open IO objects :symbols, :encoded, :decoded according to `argument'.

If `argument' matches '--string=<string>' then the IO objects are of class StringIO, with :symbols initialized to <string>, and with :encoded and :decoded empty.

Otherwise, the IO objects are of class File, with `argument' the path of the :symbols File, and with `argument' extended with '.encoded' and '.decoded' the paths of :encoded and :decoded, respectively.

The :symbols IO object is read-only. The other IO objects are opened to write (“w+”), but are also readable.

# File huffman.rb, line 105
def initialize(argument)
    if /^--string=/.match(argument) then
        string = argument.sub(/^--string=/, '')
        @symbols = StringIO.new(string, 'r')
        @decoded = StringIO.new('', 'w+')
        @encoded = StringIO.new('')
    else
        begin
            @symbols = File.new(argument, 'r')
            @encoded = File.new(argument + '.encoded', 'w+')
            @decoded = File.new(argument + '.decoded', 'w+')
        rescue Exception => e
            close
            raise UserError, e.message
        end
    end
    if @symbols.size == 0 then
        close
        raise UserError, "Empty source of symbols `#{argument}'"
    end
end
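
As a rough illustration of the rule described for new, the following sketch maps an argument to the IO targets that would be chosen. The helper name io_targets and the hash it returns are hypothetical and not part of huffman.rb; only the '--string=' prefix and the '.encoded' and '.decoded' suffixes come from the description.

```ruby
require 'stringio'

# Hypothetical helper (not in huffman.rb): report which IO class and which
# source/paths HuffmanIO.new(argument) would use, per the description.
def io_targets(argument)
  if argument.start_with?('--string=')
    # In-memory case: everything after the prefix is the symbol source.
    { io_class: StringIO, symbols: argument.sub(/\A--string=/, '') }
  else
    # File case: derive the two output paths from the source path.
    { io_class: File,
      symbols: argument,
      encoded: argument + '.encoded',
      decoded: argument + '.decoded' }
  end
end

p io_targets('--string=abracadabra')
p io_targets('notes.txt')
```

The file name notes.txt is only an example; no file is opened by this sketch.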

Public Instance Methods

check_decoded()

Return two Boolean values, each answering the question: is :decoded identical to :symbols?

(1) Use a well tested internal method to perform the check.

(2) Execute OS utility to compare source and decoded files on disk, if the OS is recognized as Windows or POSIX. There is no reason for this but to play around with Ruby.
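
The second check depends on the return value of Kernel#system: true for a zero exit status, false for a non-zero one, and nil when the command could not be executed at all. A minimal sketch of the three cases follows. It uses the running Ruby interpreter as a stand-in for FC or diff, and the command name in the last call is made up and assumed not to exist on the system.

```ruby
require 'rbconfig'

ruby = RbConfig.ruby # path of the running interpreter, used as a portable command

ok      = system(ruby, '-e', 'exit 0') # command ran and exited with status 0
differ  = system(ruby, '-e', 'exit 1') # command ran but exited non-zero
missing = system('no-such-command-for-this-example') # assumed not to exist

p ok      # true
p differ  # false
p missing # nil
```

Passing the command and its arguments as separate strings, as done here, bypasses the shell, so paths containing spaces need no quoting.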

# File huffman.rb, line 160
def check_decoded
    match = decoded_matches_symbols
    alt_match = match
    if @symbols.is_a? StringIO then
        # Nothing is on disk
    elsif OS.windows? then
        command = "FC /B #{@symbols.path} #{@decoded.path}"
        alt_match = system(command)
        warn "Failed to execute Windows FC" if alt_match == nil
    elsif OS.posix? then
        command = "diff #{@symbols.path} #{@decoded.path} -q"
        alt_match = system(command)
        warn "Failed to execute POSIX diff" if alt_match == nil
    end
    return match, alt_match
end

close()

Close IO objects, ignoring exceptions.

# File huffman.rb, line 130
def close
    for io in [@symbols, @decoded, @encoded]
        begin
            io.close
        rescue
        end
    end
end

decoded_matches_symbols(wait=0)

Are :symbols and :decoded IO objects identical in contents?
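
One way to compare two IO objects byte by byte is sketched below with StringIO, so nothing touches the disk. The helper same_bytes? is hypothetical and is not the method used in huffman.rb. It also checks that the second stream has no bytes left over once the first is exhausted.

```ruby
require 'stringio'

# Hypothetical helper (not in huffman.rb): true only if both streams hold
# exactly the same bytes, including the same length.
def same_bytes?(first, second)
  first.rewind
  second.rewind
  # getbyte returns nil at end of stream, so a short second stream fails here.
  all_equal = first.each_byte.all? { |byte| byte == second.getbyte }
  # A longer second stream still has a byte to give, so it fails here.
  all_equal && second.getbyte.nil?
end

p same_bytes?(StringIO.new('abc'), StringIO.new('abc'))
p same_bytes?(StringIO.new('abc'), StringIO.new('abcd'))
```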

# File huffman.rb, line 142
def decoded_matches_symbols(wait=0)
    @decoded.flush
    (0..2**wait).each { |busy_waiting| busy_waiting + 1 }
    @symbols.rewind
    @decoded.rewind
    match = @symbols.each_byte.all? { |b| b == decoded.getbyte }
    match or ((wait < FLUSH_WAIT) and decoded_matches_symbols(wait + 1))
end

report(n_symbols, entropy, cross_entropy)

TO DO: document EXTENSIVELY

# File huffman.rb, line 180
def report(n_symbols, entropy, cross_entropy)
    match, alt_match = check_decoded
    bit_rate = @encoded.size / @symbols.size.to_f
    divergence = bit_rate - entropy
    puts
    puts "Source file size               : #{@symbols.size} bytes"
    puts "Encoded file size              : #{@encoded.size} bits"
    puts "Decoded file size              : #{@decoded.size} bytes"
    puts "Source and decoded files match : #{match}"
    puts "Confirm match (experimental)   : #{alt_match}"
    puts "Number of distinct symbols     : #{n_symbols}"
    puts "Entropy H(p)                   : #{entropy.round(5)} bits"
    puts "Cross entropy H(p, q)          : #{cross_entropy.round(5)} bits"
    puts "Actual bits per encoded symbol : #{bit_rate.round(5)}"
    puts "Relative entropy D(p||q)       : #{divergence.round(5)} bits"
    puts
    puts "Here p is the distribution of symbols in the source"
    puts "and q is the ideal distribution of symbols for the code,"
    puts "with H(p, q) = H(p) + D(p||q)."
    puts
end
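
The identity H(p, q) = H(p) + D(p||q) printed by report can be checked on a small worked example. The sketch below assumes a source string 'aaab' and a prefix code with one-bit code words for both symbols. These inputs and all variable names are illustrative and are not taken from huffman.rb. Here q assigns each symbol the probability 2**(-length) of its code word.

```ruby
source = 'aaab'
code_lengths = { 'a' => 1, 'b' => 1 } # assumed code word lengths in bits

# p: empirical distribution of symbols in the source.
counts = source.each_char.each_with_object(Hash.new(0)) { |c, h| h[c] += 1 }
prob_p = counts.transform_values { |n| n / source.size.to_f }

# q: ideal distribution for the code, 2**(-length) per symbol.
prob_q = code_lengths.transform_values { |len| 2.0**(-len) }

entropy       = -prob_p.sum { |_, pi| pi * Math.log2(pi) }
cross_entropy = -prob_p.sum { |s, pi| pi * Math.log2(prob_q[s]) }
divergence    =  prob_p.sum { |s, pi| pi * Math.log2(pi / prob_q[s]) }

puts "H(p)    = #{entropy.round(5)} bits"
puts "H(p, q) = #{cross_entropy.round(5)} bits"
puts "D(p||q) = #{divergence.round(5)} bits"
```

With these inputs the cross entropy is exactly 1 bit per symbol, which is also the average code word length, and the entropy and divergence add up to it.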