TO DO: document
Iterate a tight loop of roughly 2**wait iterations per attempt while waiting on a file flush, with wait running from 0 up to at most FLUSH_WAIT.
IO to which the decoder writes.
IO to which the encoder writes and from which the decoder reads.
IO from which the encoder reads symbols.
Open IO objects :symbols, :encoded, :decoded according to `argument'.
If `argument' matches '--string=<string>', then the IO objects are of class StringIO, with :symbols initialized to <string> and with :encoded and :decoded empty.
Otherwise, the IO objects are of class File, with `argument' the path of the :symbols File, and with `argument' extended with '.encoded' and '.decoded' the paths of :encoded and :decoded, respectively.
The :symbols IO object is read-only. The other IO objects are opened to write (“w+”), but are also readable.
# File huffman.rb, line 105
def initialize(argument)
  if /^--string=/.match(argument) then
    string = argument.sub(/^--string=/, '')
    @symbols = StringIO.new(string, 'r')
    @decoded = StringIO.new('', 'w+')
    @encoded = StringIO.new('', 'w+')
  else
    begin
      @symbols = File.new(argument, 'r')
      # :decoded lives at argument.decoded, :encoded at argument.encoded
      @decoded = File.new(argument + '.decoded', 'w+')
      @encoded = File.new(argument + '.encoded', 'w+')
    rescue StandardError => e
      # Close whatever was opened before the failure, then report it
      close
      raise UserError, e.message
    end
  end
  if @symbols.size == 0 then
    close
    raise UserError, "Empty source of symbols `#{argument}'"
  end
end
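The two kinds of IO described above can be tried out with the standard library alone. This is a sketch, not the author's constructor: the helper open_symbols is hypothetical, and it only illustrates the '--string=' dispatch, that an 'r' StringIO is read-only, and that a "w+" StringIO can be written, rewound, and read back.

```ruby
require 'stringio'

# Hypothetical helper (not part of huffman.rb): pick StringIO or File
# according to the '--string=' convention documented above.
def open_symbols(argument)
  if argument.start_with?('--string=')
    StringIO.new(argument.sub(/\A--string=/, ''), 'r')
  else
    File.new(argument, 'r')
  end
end

symbols = open_symbols('--string=abracadabra')
puts symbols.read        # prints abracadabra

# A "w+" StringIO starts empty, accepts writes, and can be read after rewind.
# The unary + gives an unfrozen string, so this also works under
# frozen_string_literal.
decoded = StringIO.new(+'', 'w+')
decoded.write('abra')
decoded.rewind
puts decoded.read        # prints abra
```

Writing to the StringIO returned for '--string=' raises IOError, matching the read-only :symbols described above.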
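Return two Boolean values, each answering the question: is :decoded identical to :symbols?
(1) The first is computed by a well-tested internal method, decoded_matches_symbols.
(2) The second comes from executing an OS utility that compares the source and decoded files on disk, if the OS is recognized as Windows or POSIX. There is no reason for this but to play around with Ruby. When the IO objects are StringIOs, or the OS is not recognized, the second value simply repeats the first; it is nil if the utility could not be executed.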
# File huffman.rb, line 160
def check_decoded
  match = decoded_matches_symbols
  alt_match = match
  if @symbols.is_a? StringIO then
    # Nothing is on disk
  elsif OS.windows? then
    # Argument-list form of system runs without a shell,
    # so paths containing spaces are passed intact
    alt_match = system('FC', '/B', @symbols.path, @decoded.path)
    warn "Failed to execute Windows FC" if alt_match == nil
  elsif OS.posix? then
    # POSIX utilities expect options before operands
    alt_match = system('diff', '-q', @symbols.path, @decoded.path)
    warn "Failed to execute POSIX diff" if alt_match == nil
  end
  return match, alt_match
end
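For comparison, both kinds of check can be sketched in plain Ruby. FileUtils.compare_file is part of the standard library; the external check assumes a POSIX `cmp` is on the PATH, and Kernel#system returns nil when the command cannot be run. The helper names files_match? and files_match_external? are hypothetical.

```ruby
require 'fileutils'
require 'tempfile'

# In-process comparison of two files by contents.
def files_match?(path_a, path_b)
  FileUtils.compare_file(path_a, path_b)
end

# External comparison with `cmp -s` (assumed to exist on the host).
# Returns true (identical), false (different), or nil (could not run).
# Output is discarded so it does not clutter a report.
def files_match_external?(path_a, path_b)
  system('cmp', '-s', path_a, path_b, out: File::NULL, err: File::NULL)
end

Tempfile.create('a') do |a|
  Tempfile.create('b') do |b|
    a.write('hello'); a.flush   # flush so the bytes are visible by path
    b.write('hello'); b.flush
    puts files_match?(a.path, b.path)        # prints true
    p files_match_external?(a.path, b.path)
  end
end
```

Passing the command and its arguments separately to system, rather than as one string, avoids shell quoting problems with unusual file names.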
Close IO objects, ignoring exceptions.
# File huffman.rb, line 130
def close
  [@symbols, @decoded, @encoded].each do |io|
    begin
      io.close
    rescue StandardError
      # Ignore: io may be nil if opening failed part way
    end
  end
end
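Are the :symbols and :decoded IO objects identical in contents? After flushing :decoded, spin a tight loop of roughly 2**wait iterations, then compare byte by byte; on a mismatch, retry with wait + 1 until wait reaches FLUSH_WAIT.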
# File huffman.rb, line 142
def decoded_matches_symbols(wait=0)
  @decoded.flush
  (0..2**wait).each { |busy_waiting| busy_waiting + 1 }
  @symbols.rewind
  @decoded.rewind
  # Every source byte must match, and :decoded must have no bytes left over
  match = @symbols.each_byte.all? { |b| b == @decoded.getbyte } && @decoded.eof?
  match or ((wait < FLUSH_WAIT) and decoded_matches_symbols(wait + 1))
end
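Comparing only the source bytes is not enough: a bare `each_byte.all?` also accepts a :decoded stream that has extra trailing bytes. The pitfall is easy to reproduce with StringIO. In this sketch same_contents? is a hypothetical stand-alone helper, without the flush and busy-wait logic.

```ruby
require 'stringio'

# Length-aware byte comparison of two IO objects (hypothetical helper).
def same_contents?(a, b)
  a.rewind
  b.rewind
  # getbyte returns nil once b is exhausted, so a shorter b fails the block;
  # eof? rejects a longer b.
  a.each_byte.all? { |byte| byte == b.getbyte } && b.eof?
end

src = StringIO.new('abc')
dec = StringIO.new('abcd')
p src.each_byte.all? { |byte| byte == dec.getbyte }  # prints true, despite the extra 'd'
p same_contents?(src, dec)                          # prints false
```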
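Print a summary of the run to standard output: the sizes of the source, encoded, and decoded files; whether the source and decoded files match (the two values returned by check_decoded); the number of distinct symbols `n_symbols'; the entropy H(p) and cross entropy H(p, q) passed in by the caller; the actual bit rate, i.e. the encoded size divided by the source size; and the relative entropy D(p||q), computed as that bit rate minus H(p).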
# File huffman.rb, line 180
def report(n_symbols, entropy, cross_entropy)
  match, alt_match = check_decoded
  bit_rate = @encoded.size / @symbols.size.to_f
  divergence = bit_rate - entropy
  puts
  puts "Source file size : #{@symbols.size} bytes"
  puts "Encoded file size : #{@encoded.size} bits"
  puts "Decoded file size : #{@decoded.size} bytes"
  puts "Source and decoded files match : #{match}"
  puts "Confirm match (experimental) : #{alt_match}"
  puts "Number of distinct symbols : #{n_symbols}"
  puts "Entropy H(p) : #{entropy.round(5)} bits"
  puts "Cross entropy H(p, q) : #{cross_entropy.round(5)} bits"
  puts "Actual bits per encoded symbol : #{bit_rate.round(5)}"
  puts "Relative entropy D(p||q) : #{divergence.round(5)} bits"
  puts
  puts "Here p is the distribution of symbols in the source"
  puts "and q is the ideal distribution of symbols for the code,"
  puts "with H(p, q) = H(p) + D(p||q)."
  puts
end
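The entropy quantities in the report can be reproduced from symbol counts and code lengths. This sketch assumes q(s) = 2**-l(s), where l(s) is the code length of symbol s, so that -log2 q(s) = l(s) and H(p, q) is the expected code length in bits per symbol. The counts (from 'abracadabra') and the lengths (one valid prefix code for them) are hypothetical inputs, not output of huffman.rb.

```ruby
# H(p) = -sum p(s) log2 p(s), with p estimated from symbol counts.
def entropy(counts)
  total = counts.values.sum.to_f
  counts.values.sum do |c|
    prob = c / total
    -prob * Math.log2(prob)
  end
end

# H(p, q) = -sum p(s) log2 q(s); with q(s) = 2**-l(s) this is sum p(s) l(s).
def cross_entropy(counts, lengths)
  total = counts.values.sum.to_f
  counts.sum { |sym, c| (c / total) * lengths.fetch(sym) }
end

counts  = { 'a' => 5, 'b' => 2, 'r' => 2, 'c' => 1, 'd' => 1 }
lengths = { 'a' => 1, 'b' => 3, 'r' => 3, 'c' => 3, 'd' => 3 }  # e.g. 0, 100, 101, 110, 111
h  = entropy(counts)
hq = cross_entropy(counts, lengths)
puts "H(p)    = #{h.round(5)}"
puts "H(p, q) = #{hq.round(5)}"
puts "D(p||q) = #{(hq - h).round(5)}"   # never negative (Gibbs' inequality)
```

Because report derives D(p||q) as the measured bit rate minus H(p), its figure agrees with H(p, q) - H(p) to the extent that the measured bit rate equals the expected code length.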