|
|
|||||||||
|
|||||||||
| |||||||||
|
|
|
| |||||||||
![]() |
|
|
«
Previous Thread
|
Next Thread
»
|
Thread Tools | Search this Thread | Display Modes |
|
#1
|
|||
|
|||
|
An analysis of the counting argument.
This is just to stir debate and talk, not to say 'eureka I solved it'
and post nonsense :P The counting argument is used to justify the fact that it is impossible to obtain recursive lossless compression. The counting argument is simple in nature. Every possible outcome of 4 bits cannot become every possible outcome of 3 bits, nor any other standard can be applied. It is also known as the pigeonhole principle. An example of this at work. If you have 12 programs and this is the 'actual code' of those twelve programs 000 001 010 100 011 101 110 111 00 01 10 11 Then you have issues if you are trying to compress downward these programs. Under some systems some of our programs can compress, but do to mathematical or statistical luck. Most however will usually be uncompressible, or you need to incorporate loss into the system to be able to continue. There has been many efforts to bypass the counting argument. These all require systems which add bits to something one could call 'a command section' or 'identifying bits'. Ultimately each has failed. However if there were a way, then these following to me would seem to be the only ways possible to 'defeat' the counting system with one 'still out there' requirement. 1) Segments You take a long section of code. You subdivide it into subsections. Each section can be compressed or not compressed separately. Your issue is effectively identifying the sections and what is happening inside of them. Using the pigeonhole example, we have 16 pigeon holes. If we make it 4 sets of four we can then possibly close several holes in some sets and have say 12 holes in a 2,3,4,3 pattern but still have a maximum possible 16 holes which may or may not have a pigeon inside them. Now to those who study math effectively they may realize 16 holes is 65,536 outcomes. But if we could do this somehow properly counting, this is 810,000 (19.627~ bits) outcomes possible provided we leave at least 1 hole active per subgroup. The problem is of course how do we keep track? In the above example I added 3 bits (and change) to the size, but to count this I would need to either know how many holes were closed, how many holes are present per a section, or some other accounting of the whole. The lowest I can devise costs 8 bits. There are two ways to accomplish the 8 bits that I can find, and many ways that cost more. 2) Changing the 'make up' of the code If you somehow could 'sort' the code via an algorithm and get a certain percentage of 1's on one side, and 0's on another, you could in theory compress. This was the main method of my earlier research, and I though I had a magic bullet in it. I was wrong I fully admit it. But it is an interesting line of thought. You can effectively change the statistics of a section of code. It can happen as my own research indicates. With hard-coded methods you can get a deviation from norm as high as a 20%-22%-28%-30% (say this example 11, 01, 10, 00) ratio of something 2 bits or higher instead of a near typical 25%-25%-25%-25% (in approximation) ratio. However as Huffman and Shannon correctly show, you need something a bit more to be able to recursively compress. Not only that but code that is subject to compression via standard methods can get compression from those methods, but may result in a change that is not compressible via our selected method. Therefore suddenly we need a small command section, albeit a few bits. Then you need to add in what the changes resulted in, which is more likely to compress (11 or 00?) and then you again are past the amount which would lead to compression except in a very low number of the outcomes (and on a scale mattering on the total bits in the string being compressed) possible. Now it seems certain lossless repeatable compression is not highly probable, or even probably remotely possible to happen. However if it were possible, then one of these two means (am I missing one or more possible ways?) would seem to be the means to achieve this. However I would note, even if it is achievable, there would be entropy eventually, we cannot reach 1 bit from every possible formula in the universe after all which is the final argument of the counting argument :P So thoughts? |
|
#2
|
|||
|
|||
|
An analysis of the counting argument.
well you could utilize the fact that truly random data just doesn't
exist. if you encoded the laws of physics, the designs of computer systems, and human genomes you could definitely put a lower and upper bounds on any data produced by a computer, and compress that way. in a more diffuse way, what i aimed at was looking at the fact that information quantizes at all. take the string 100000000 in binary, it represents a high and eight lows. in base ten, it's a hundred million. in base twenty trillion and three, it quantizes the exact same, but each of the digits has much more information inside it. so, take a non-quantized unary ideal, ("one" in base 499534 for example), cycle through all the different quantized representations (499533 different examples) and you'll see that base 2 is actually the worst case scenario. you may find, for instance that in base 3240 the result of that unary ideal is 57577757757775775. take out the 'entropy' which is actually super, super easy to perceive in this method. it'd be 02022202202220220 with another information structure containing '5'. each digit is now base 3 instead of base 3240, which is pretty vast. extending this concept, *if* we could create custom molecules of the exact quantization dimensions that relates to the found best scope, we could avoid entropy entirely taking advantage of the lowest possible information encoding energy state in the quantum domain. basically, massaging the system so that entropy has no basis as a concept, as there's just nothing left to optimize in the quantization of information. otherwise, you can simply count the probability of each possible scope range occuring, and add them together as these are all the probabilities for any given unary ideal. seriously do it. you'll find that because binary is the worst-case scenario, the likelyhood that the final representation will end up as binary is less likely than any other result. then, it's just the overflow proportion (say like 3^3 = 9 compared to 2^4 = 16) from the compacted range + unary ideal + etc. that relates to whether or not the full string is actually better to write as binary or as the compacted range + unary ideal + etc. as this problem is pretty much totally NP-Complete (save the highest value digit comparisons, so you can have early outs from the scoping algorithm), which means that in effect you have to examine each and every scenario as the digital representation is so varied between even individual scoped representation, the amount of sheer computing power, even with GPU acceleration through CUDA and such, is inadequate. i suspect that in about 10-20 years there will be enough compute resources to apply this methodology. chris. |
|
#3
|
|||
|
|||
|
An analysis of the counting argument.
Einstein wrote:
There has been many efforts to bypass the counting argument. These all require systems which add bits to something one could call 'a command section' or 'identifying bits'. No, they all require a fundamental lack of cohesive thinking. Unless you believe basic math is internally inconsistent the only way to disbelieve the counting argument is to be stupid or naive. Seeing as that you have been here a while we can forget you are naive so either you are stupid or you don't really disbelieve the counting argument. My guess is the second. So thoughts? Welcome to my kill file. Marco |
![]() |
| Viewing: Web Development Archives > FAQs > Compression > An analysis of the counting argument. |
| Thread Tools | Search this Thread |
| Display Modes | Rate This Thread |
|
|
|
|