|
|
|||||||||
|
|||||||||
| |||||||||
|
|
|
| |||||||||
![]() |
|
|
«
Previous Thread
|
Next Thread
»
|
Thread Tools | Search this Thread | Display Modes |
|
#1
|
|||
|
|||
|
answers please...
, as I have said I have a method that does produce a non-uniform
number of output states, given a uniform number of input states. (Here, non-uniform means a bell curve.) I have been testing a 1M 'file' of RAD, where my 'file' is a segment of right-most bits from a well known/respected PSRNG and am producing bell-curve output. (These runs have been made many millions of times, and I have on occasion substituted other RNG's -- I don't think I am 'seeing' an attribute of the RNG.) I offered to provide this information to one of the mathematicians who post here, but the private conversation was not helpful -- so once again, I am coming to the group for assistance. Also, I have also done this with actual files, including MN's 1M file (well, 415k,) and produce non-uniform output. And, when I assume 1- bit for the most frequent output state times the number of times that symbol occurs, plus 2-bits for each of the next two most commonly occurring symbols, again times the number of times each of those symbols occur, etc I get a size result that is smaller than the input file. (Yes, that's right.) So I need to build an emitter. It doesn't have to be the very best emitter, after all I can just run the data around again, but I need to produce (ie., to translate,) my input states to a file. In the past I have experimented by coupling my repeating compression process's to publicly available systems, such as Zlib, and whether the program was bzip2, the zlib system, or even gzip, I regarded that section pretty much as a black box. Now I am changing course. And I have a couple of dumb questions 1) Why is Shannon-Fano a bad choice? (I understand that arithmetic coding can be better,) but why is this not as good as Huffman? 2) Wrt to my data, I have a number of different implementation models, (the system supports three parameters, not all of the cartesian product of these models actually work wrt compression,) but among those that do, many produce fewer than 256 symbols. Has anyone written a standard emitter library that can be used for experimentation? (I've only been looking for such a library for about ten years.) I have decided that my emitter will be built around S-F, it's simple and is not protected by someone else's patents. |
|
#2
|
|||
|
|||
|
answers please...
jules Gilbert:
snip So I need to build an emitter. It doesn't have to be the very best > snip what's an "emitter" ?! |
|
#3
|
|||
|
|||
|
answers please...
jules Gilbert schrieb:
, as I have said I have a method that does produce a non-uniform number of output states, given a uniform number of input states. (Here, non-uniform means a bell curve.) , if that is your problem or claim, this is easily done. If you have two independent random sources, A and B, then A+B will always be a random source with a non-uniform distribution. Actually, if you perform this step a couple of times, you will arrive at a random source with a Gaussian distribution. This is not a miracle, this is just the central limit theorem. Proving this requires a bit of Fourier analysis and knowledge about stable distributions, but once that is understood, it's about a page (for the currently understood situation of having symmetric sources with bounded variance, otherwise the proof is more technical.) I have been testing a 1M 'file' of RAD, where my 'file' is a segment of right-most bits from a well known/respected PSRNG and am producing bell-curve output. (These runs have been made many millions of times, and I have on occasion substituted other RNG's -- I don't think I am 'seeing' an attribute of the RNG.) I offered to provide this information to one of the mathematicians who post here, but the private conversation was not helpful -- so once again, I am coming to the group for assistance. Well, here's my assistance: Google for "central limit theorem". Also, I have also done this with actual files, including MN's 1M file (well, 415k,) and produce non-uniform output. And, when I assume 1- bit for the most frequent output state times the number of times that symbol occurs, plus 2-bits for each of the next two most commonly occurring symbols, again times the number of times each of those symbols occur, etc I get a size result that is smaller than the input file. (Yes, that's right.) If you think about this closer, you'll soon find that, for example for the above simple system, A+B will have a range that is larger than the input range, which will eat up your gain you gained by the non-uniform distribution. Similar pitfalls will exist for related algorithms. And I have a couple of dumb questions > 1) Why is Shannon-Fano a bad choice? (I understand that arithmetic coding can be better,) but why is this not as good as Huffman? It generates codes "top down" rather than "bottom up" - in some configurations, it can be seen that the codes generated by it are non-optimal in the sense that codes are possible that generate shorter average bit-length. For Huffman, however, one can show that reaches the theoretically possible limit. Again, a proof would be too long for this news group, but if you want to see an example where Shannon-Fano does not create optimal codes, the Wikipedia carries one: {0.35, 0.17, 0.17, 0.16, 0.15} - it should be sufficient to build the corresponding Shannon-Fano code for this set, compare it with the code created by the Huffman algorithm, and compute the average code-length for both. You'll then see that the size of the Huffman code is smaller. 2) Wrt to my data, I have a number of different implementation models, (the system supports three parameters, not all of the cartesian product of these models actually work wrt compression,) but among those that do, many produce fewer than 256 symbols. Has anyone written a standard emitter library that can be used for experimentation? (I've only been looking for such a library for about ten years.) Encoding an alphabet with a number of symbols not exactly a power of two is a nice candidate for arithmetic coding (even if the probabilities are all equal). I have decided that my emitter will be built around S-F, it's simple and is not protected by someone else's patents. And, if I may ask, what's wrong with Huffman? Besides, I think the relevant patents for arithmetic coding should be run out by now. So long, Thomas |
|
#4
|
|||
|
|||
|
answers please...
Jules once I volunteered to look at some amazing math of yours. I
emailed you, talked with you about my theories, and was willing to sign a ND so long as I could verify. Instead I got months worth of spam about 'buy this stock, though I am not a stock market official or nothing, and I am not licensed my algorithm shows this for sure.' I also got vague information about how you were going to investors on that and on your compression theory. The theory has been around for a year now I take it? I will give you the challenge then. I will sign a nondisclosure with a clause in it. If you prove compression which can be remade into the original product, and at claimed levels, on RAD then I will certify it for the community as working. If it does not work I get to publish the entirety of the system and proofs of how come it does not work (I will not summarily dismiss it out of hand, I will do full proofs or if unable to prove/disprove I will say so and NT publish the system). This challenge means you can either discredit yourself, or give yourself a means to validate to the world you are actually onto something. |
|
#5
|
|||
|
|||
|
answers please...
Resultant data is incomprehensible without a means to undo the work.
I thought I had a formula that worked, and thanks to kind efforts of another I found that it would produce serious loss, though a really good compression level :P |
![]() |
| Viewing: Web Development Archives > FAQs > Compression > answers please... |
| Thread Tools | Search this Thread |
| Display Modes | Rate This Thread |
|
|
|
|