Compression
 
Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
 
User Name:
Password:
Remember me
Go Back   Web Development Archives FAQs Compression

Reply
Add This Thread To:
  Del.icio.us   Digg   Google   Spurl   Blink   Furl   Simpy   Y! MyWeb 
Thread Tools Search this Thread Display Modes
 
Unread Web Development Archives Sponsor:
  #1  
Old June 19th, 2008, 08:20 AM
jules Gilbert
Guest
Dev Archives Newbie (0 - 499 posts)
 
Posts: n/a  
Time spent in forums:
Reputation Power:
answers please...

, as I have said I have a method that does produce a non-uniform
number of output states, given a uniform number of input states.
(Here, non-uniform means a bell curve.)

I have been testing a 1M 'file' of RAD, where my 'file' is a segment
of right-most bits from a well known/respected PSRNG and am producing
bell-curve output. (These runs have been made many millions of
times, and I have on occasion substituted other RNG's -- I don't think
I am 'seeing' an attribute of the RNG.) I offered to provide this
information to one of the mathematicians who post here, but the
private conversation was not helpful -- so once again, I am coming to
the group for assistance.

Also, I have also done this with actual files, including MN's 1M file
(well, 415k,) and produce non-uniform output. And, when I assume 1-
bit for the most frequent output state times the number of times that
symbol occurs, plus 2-bits for each of the next two most commonly
occurring symbols, again times the number of times each of those
symbols occur, etc I get a size result that is smaller than the
input file. (Yes, that's right.)

So I need to build an emitter. It doesn't have to be the very best
emitter, after all I can just run the data around again, but I need to
produce (ie., to translate,) my input states to a file.

In the past I have experimented by coupling my repeating compression
process's to publicly available systems, such as Zlib, and whether the
program was bzip2, the zlib system, or even gzip, I regarded that
section pretty much as a black box.

Now I am changing course.

And I have a couple of dumb questions

1) Why is Shannon-Fano a bad choice? (I understand that arithmetic
coding can be better,) but why is this not as good as Huffman?

2) Wrt to my data, I have a number of different implementation
models, (the system supports three parameters, not all of the
cartesian product of these models actually work wrt compression,) but
among those that do, many produce fewer than 256 symbols. Has anyone
written a standard emitter library that can be used for
experimentation? (I've only been looking for such a library for about
ten years.)

I have decided that my emitter will be built around S-F, it's simple
and is not protected by someone else's patents.







Reply With Quote
  #2  
Old June 19th, 2008, 09:40 AM
erpy
Guest
Dev Archives Newbie (0 - 499 posts)
 
Posts: n/a  
Time spent in forums:
Reputation Power:
answers please...

jules Gilbert:

snip
So I need to build an emitter. It doesn't have to be the very best
>

snip

what's an "emitter" ?!

Reply With Quote
  #3  
Old June 19th, 2008, 05:20 PM
Thomas Richter
Guest
Dev Archives Newbie (0 - 499 posts)
 
Posts: n/a  
Time spent in forums:
Reputation Power:
answers please...

jules Gilbert schrieb:
, as I have said I have a method that does produce a non-uniform
number of output states, given a uniform number of input states.
(Here, non-uniform means a bell curve.)

, if that is your problem or claim, this is easily done. If you have two
independent random sources, A and B, then A+B will always be a random
source with a non-uniform distribution. Actually, if you perform this step
a couple of times, you will arrive at a random source with a Gaussian
distribution. This is not a miracle, this is just the central limit
theorem. Proving this requires a bit of Fourier analysis and knowledge about
stable distributions, but once that is understood, it's about a page
(for the currently understood situation of having symmetric sources
with bounded variance, otherwise the proof is more technical.)

I have been testing a 1M 'file' of RAD, where my 'file' is a segment
of right-most bits from a well known/respected PSRNG and am producing
bell-curve output. (These runs have been made many millions of
times, and I have on occasion substituted other RNG's -- I don't think
I am 'seeing' an attribute of the RNG.) I offered to provide this
information to one of the mathematicians who post here, but the
private conversation was not helpful -- so once again, I am coming to
the group for assistance.

Well, here's my assistance: Google for "central limit theorem".

Also, I have also done this with actual files, including MN's 1M file
(well, 415k,) and produce non-uniform output. And, when I assume 1-
bit for the most frequent output state times the number of times that
symbol occurs, plus 2-bits for each of the next two most commonly
occurring symbols, again times the number of times each of those
symbols occur, etc I get a size result that is smaller than the
input file. (Yes, that's right.)

If you think about this closer, you'll soon find that, for example for
the above simple system, A+B will have a range that is larger than
the input range, which will eat up your gain you gained by the non-uniform
distribution. Similar pitfalls will exist for related algorithms.

And I have a couple of dumb questions
>

1) Why is Shannon-Fano a bad choice? (I understand that arithmetic
coding can be better,) but why is this not as good as Huffman?

It generates codes "top down" rather than "bottom up" - in some configurations,
it can be seen that the codes generated by it are non-optimal in the sense
that codes are possible that generate shorter average bit-length. For Huffman,
however, one can show that reaches the theoretically possible limit. Again,
a proof would be too long for this news group, but if you want to see an
example where Shannon-Fano does not create optimal codes, the Wikipedia carries
one: {0.35, 0.17, 0.17, 0.16, 0.15} - it should be sufficient to build the
corresponding Shannon-Fano code for this set, compare it with the code created
by the Huffman algorithm, and compute the average code-length for both. You'll
then see that the size of the Huffman code is smaller.

2) Wrt to my data, I have a number of different implementation
models, (the system supports three parameters, not all of the
cartesian product of these models actually work wrt compression,) but
among those that do, many produce fewer than 256 symbols. Has anyone
written a standard emitter library that can be used for
experimentation? (I've only been looking for such a library for about
ten years.)

Encoding an alphabet with a number of symbols not exactly a power of two is a
nice candidate for arithmetic coding (even if the probabilities are all
equal).

I have decided that my emitter will be built around S-F, it's simple
and is not protected by someone else's patents.

And, if I may ask, what's wrong with Huffman?

Besides, I think the relevant patents for arithmetic coding should be run
out by now.

So long,
Thomas

Reply With Quote
  #4  
Old July 4th, 2008, 06:10 PM
Einstein
Guest
Dev Archives Newbie (0 - 499 posts)
 
Posts: n/a  
Time spent in forums:
Reputation Power:
answers please...

Jules once I volunteered to look at some amazing math of yours. I
emailed you, talked with you about my theories, and was willing to
sign a ND so long as I could verify.

Instead I got months worth of spam about 'buy this stock, though I am
not a stock market official or nothing, and I am not licensed my
algorithm shows this for sure.'

I also got vague information about how you were going to investors on
that and on your compression theory. The theory has been around for a
year now I take it?

I will give you the challenge then.

I will sign a nondisclosure with a clause in it. If you prove
compression which can be remade into the original product, and at
claimed levels, on RAD then I will certify it for the community as
working.

If it does not work I get to publish the entirety of the system and
proofs of how come it does not work (I will not summarily dismiss it
out of hand, I will do full proofs or if unable to prove/disprove I
will say so and NT publish the system).

This challenge means you can either discredit yourself, or give
yourself a means to validate to the world you are actually onto
something.

Reply With Quote
  #5  
Old July 6th, 2008, 09:10 AM
Einstein
Guest
Dev Archives Newbie (0 - 499 posts)
 
Posts: n/a  
Time spent in forums:
Reputation Power:
answers please...

Resultant data is incomprehensible without a means to undo the work.

I thought I had a formula that worked, and thanks to kind efforts of
another I found that it would produce serious loss, though a really
good compression level :P

Reply With Quote
Reply

Viewing: Web Development Archives FAQs Compression > answers please...


Thread Tools  Search this Thread 
Search this Thread:

Advanced Search
Display Modes  Rate This Thread 
Rate This Thread:


Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are Off
[IMG] code is On
HTML code is Off
View Your Warnings | New Posts | Latest Threads | Shoutbox
Forum Jump


Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
  
 





© 2003-2008 by Developer Shed. All rights reserved. DS Cluster 6 hosted by Hostway
Stay green...Green IT