#1
combining probabilities from different models
Hi,
Most of my forays into compression have been based on using one particular model to generate a probability for a given symbol. I now have multiple models that each give a probability for a given symbol. My question is: how does one combine these probabilities to form one probability? Is it as simple as using the mean of the probabilities?

#2
Ah, so it is non-trivial, then, to combine models to form a super model (forgive the pun). I have thought about going down the path of a weighted average; however, with a "monitor", a separate information channel would be needed to record the weights, and that might have a negative impact on the compression rate. I will have to ponder it some more.
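
On the side-channel worry: one common way around it is to adapt the weights only from symbols that have already been coded, so the decoder can recompute exactly the same weights and nothing extra needs to be transmitted. Below is a minimal sketch of that idea for binary symbols, assuming each model's weight is proportional to the probability it has assigned to the past data (a Bayesian-style mixture). The function names and the toy data are made up for illustration and do not come from any particular compressor.

```python
import math

def clamp(p, eps=1e-6):
    """Keep probabilities away from 0 and 1 so the logarithms stay finite."""
    return min(max(p, eps), 1.0 - eps)

def code_with_adaptive_weights(predictions, bits):
    """predictions[t][i] is the probability model i assigns to bit 1 at step t.
    Returns (total code length in bits, final normalized weights)."""
    n = len(predictions[0])
    log_w = [0.0] * n                      # equal weights to start with
    total = 0.0
    for probs, bit in zip(predictions, bits):
        top = max(log_w)
        w = [math.exp(lw - top) for lw in log_w]
        norm = sum(w)
        # Weighted average of the model predictions for bit 1.
        p1 = sum(wi * clamp(pi) for wi, pi in zip(w, probs)) / norm
        total += -math.log2(p1 if bit else 1.0 - p1)
        # Update AFTER coding the bit: the decoder knows the bit by now,
        # so it can perform the identical update without side information.
        for i, pi in enumerate(probs):
            pi = clamp(pi)
            log_w[i] += math.log(pi if bit else 1.0 - pi)
    top = max(log_w)
    w = [math.exp(lw - top) for lw in log_w]
    norm = sum(w)
    return total, [wi / norm for wi in w]

def code_with_plain_mean(predictions, bits):
    """Code length in bits when the models are simply averaged."""
    total = 0.0
    for probs, bit in zip(predictions, bits):
        p1 = sum(clamp(pi) for pi in probs) / len(probs)
        total += -math.log2(p1 if bit else 1.0 - p1)
    return total

# Toy data: model 0 is confident and right, model 1 is undecided.
bits = [1] * 20
predictions = [[0.9, 0.5]] * 20
adaptive_bits, weights = code_with_adaptive_weights(predictions, bits)
mean_bits = code_with_plain_mean(predictions, bits)
print(adaptive_bits, mean_bits, weights)
```

With equal starting weights, this kind of mixture costs at most log2(N) bits more than the best single model over the whole sequence, which is why the plain mean tends to lose once one model is clearly better than the others.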

#3
Hi,

>> I now have multiple models that each give a probability for a given symbol.
>> My question is how does one combine these probabilities to form one probability?
> This is a fascinating question; probably *very* important in optimal statistical coding, yet little studied.

Eh, this is exaggerated; it has been studied to death (just to exaggerate a bit on my own :-). You have to look for context blending, specifically in the context of PPM. The investigations and experiments in that area led to the development of PPMd, which is now in WinRAR. PPMd-style context inheritance isn't a general implementation, and it is probably not straightforward to recognize it as just generic context blending (but it is). I suggest the very important paper "Susan Bunton: A Generalization and Improvement of PPM's Blending".

You may understand that in PPM basically (if you boil it all down) you have N models, and you blend N probabilities to get 1 probability. The first practical and very influential implementation came from Charles Bloom in PPMZ, where he combined the probabilities with an ad hoc logarithmic weighting. That means that instead of weighting all models equally (as the OP suggested), models gradually gain weight the more extreme their predictions are. In human words: the more sure a model is of being right, the more attention it gets. I suppose you can draw an analogy to the neural-net approach of PAQ, where the logarithmic weighting is replaced by sigma functions. But except that PAQ consults an order of magnitude more models than PPMZ, I don't see any conceptual differences (in the approach to weighting and combining the distinct models).

In image compression it is a very big topic: once you reach the level of adaptive predictors, the best are adaptive blending predictors. The /best/ image compressors like TMW, Glicbawls and MRP all try different approaches to blending 'fixed' predictors to predict the pixel. In the context of images it's especially straightforward, because an image is actually a digitized analogue signal which preserves at least some of its properties (smoothness; there is no such thing as _absolute_ abruptness in nature). You can read "Meyer and Tischer: TMW - A New Method for Lossless Image Compression", which does model blending with a sort of self-optimizing weighting.

Mathematically more interesting is probably just calculating optimal weights by finding a least-squares matrix solution, which I have the feeling is the conceptual superclass of both PPMZ-log and PAQ-sigma, because the matrix solution can reproduce both but in addition also irregular higher-order weighting polynomial functions.

Well, if that's not enough you may find "Goodman: Reduction of Maximum Entropy Models to Hidden Markov Models" exciting but complicated. Model blending is merely the practical route; a single self-organizing model would still be more effective, if impractical. :-)

So then, good luck
Niels
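
To make the "sigma function" style of weighting concrete, here is a minimal sketch of logistic mixing for binary predictions, loosely in the spirit of what later PAQ versions are documented to do: each probability is mapped to the logit domain, a weighted sum is taken, the result is squashed back to a probability, and the weights are nudged by the coding error. The class name, learning rate, and initial weights are arbitrary assumptions for illustration, not values taken from PAQ or PPMZ.

```python
import math

def stretch(p):
    """Logit: ln(p / (1 - p)). Confident predictions get large magnitudes."""
    return math.log(p / (1.0 - p))

def squash(x):
    """Logistic sigmoid, the inverse of stretch."""
    x = max(-30.0, min(30.0, x))           # avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-x))

class LogisticMixer:
    def __init__(self, n_models, lr=0.02):
        self.w = [1.0 / n_models] * n_models   # arbitrary starting weights
        self.lr = lr
        self.x = [0.0] * n_models
        self.p = 0.5

    def predict(self, probs):
        """Combine the models' probabilities for bit 1 into one probability."""
        self.x = [stretch(min(max(p, 1e-6), 1.0 - 1e-6)) for p in probs]
        self.p = squash(sum(w * x for w, x in zip(self.w, self.x)))
        return self.p

    def update(self, bit):
        """Gradient step on coding cost, done after the bit is known."""
        err = bit - self.p
        self.w = [w + self.lr * err * x for w, x in zip(self.w, self.x)]

# Toy run: model 0 predicts the (always 1) bit well, model 1 predicts badly.
mixer = LogisticMixer(2)
first = mixer.predict([0.9, 0.3])
mixer.update(1)
for _ in range(200):
    mixer.predict([0.9, 0.3])
    mixer.update(1)
last = mixer.predict([0.9, 0.3])
print(first, last, mixer.w)
```

Because the inputs are logits, a model that says 0.99 pulls much harder on the result than one that says 0.6, which is the "more sure, more attention" behaviour described above. The weight update only uses bits that have already been coded, so a decoder can mirror it.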

#4
moogie wrote:
> Is it as simple as using the mean of the probabilities?

For image classification tasks the product is also often used, although I haven't seen it used in compression.

Marco
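
For comparison, a small sketch of the two rules over a symbol alphabet. The product has to be renormalized so that it sums to one again. The function names and the example distributions are made up for illustration.

```python
def mean_rule(dists):
    """Average the models' distributions symbol by symbol."""
    n = len(dists)
    return [sum(col) / n for col in zip(*dists)]

def product_rule(dists):
    """Multiply the models' probabilities per symbol, then renormalize."""
    raw = []
    for col in zip(*dists):
        prod = 1.0
        for p in col:
            prod *= p
        raw.append(prod)
    total = sum(raw)
    if total == 0.0:
        raise ValueError("every symbol was ruled out by at least one model")
    return [r / total for r in raw]

# Two models over a three-symbol alphabet that roughly agree.
dists = [[0.7, 0.2, 0.1],
         [0.6, 0.3, 0.1]]
mean_p = mean_rule(dists)
prod_p = product_rule(dists)
print(mean_p, prod_p)
```

The product sharpens the distribution when the models agree, but a single model assigning zero to a symbol vetoes it completely, which is dangerous for a coder: a symbol with probability zero cannot be coded at all.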

#5
> With linear blending this is simply a constrained search for a least squares linear predictor.
> Least squares linear predictors will not give you the minimum entropy predictor (i.e. not the best compression).

That depends on the function you want to have minimized under LS. In my compressor I optimize for congruency with a smooth Laplacian function, because I can't calculate the log2 entropy at all, just as you couldn't calculate the LZ entropy for PNG, for example. Because my compressor has a DPCM-feedback 'resonator' (it's multi-resolution), optimizing for distribution shape is especially effective and reasonable, and possibly already a small step towards PDF prediction.

> Could someone tell me why there are no lossless image compressors (to my limited knowledge) which model complete PDFs (like text coders) rather than having a simple predictor + shape parameter? I'd expect at least support for bi-modal distributions for efficient coding near edges and for highly impulsive noise/texture.

For predictors I haven't seen it. What comes closest is "[...] and Vetterli: Adaptive Scalar Quantization without Side Information", a very underestimated paper. It does PDF approximation for on-line scalar quantizer construction/adjustment.

> Marco

Ciao
Niels
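
A toy illustration of why the choice of the minimized function matters when blending two pixel predictors with weights that sum to one. It compares the least-squares blend weight with the weight that minimizes the sum of absolute residuals. The latter is used here only as a stand-in for code length, under the assumption that residuals are coded with a fixed-scale Laplacian (where cost grows linearly with the absolute residual). The data and function names are invented for the example.

```python
def ls_blend_weight(x, p1, p2):
    """Weight a minimizing sum((x - (a*p1 + (1-a)*p2))**2), closed form."""
    d = [u - t for u, t in zip(p1, p2)]    # difference between predictors
    r = [v - t for v, t in zip(x, p2)]     # residual of predictor 2
    den = sum(di * di for di in d)
    if den == 0:
        raise ValueError("predictors are identical; weight is undefined")
    return sum(ri * di for ri, di in zip(r, d)) / den

def abs_error(a, x, p1, p2):
    """Sum of absolute residuals for blend weight a."""
    return sum(abs(v - (a * u + (1.0 - a) * t)) for v, u, t in zip(x, p1, p2))

def l1_blend_weight(x, p1, p2, steps=100):
    """Grid search over a in [0, 1] for the smallest sum of absolute residuals."""
    return min((i / steps for i in range(steps + 1)),
               key=lambda a: abs_error(a, x, p1, p2))

# Predictor 1 is exact on four pixels and off by 5 on the last one;
# predictor 2 is off by 2 everywhere.
x  = [10, 12, 11, 13, 50]
p1 = [10, 12, 11, 13, 45]
p2 = [ 8, 14,  9, 15, 52]
a_ls = ls_blend_weight(x, p1, p2)
a_l1 = l1_blend_weight(x, p1, p2)
print(a_ls, a_l1)
```

On this data the least-squares weight is pulled towards predictor 2 by the single large residual, while the absolute-error criterion sticks with predictor 1, so the two objectives really do select different blends.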

#6
Thomas Richter wrote:
> I would believe that for prediction the context of a pixel is a much better source for context information than any advanced predictor could probably be - but I haven't tried that.

In most coders the prediction of the center of the distribution seems to take context into account already (for instance the bias term in JPEG-LS). That's not really relevant to the applicability of, say, the generalized Gaussian distribution, though. It's exactly in contexts (being near edges or impulsive features) where I expect uncertainty to occur that cannot be modeled by, say, the generalized Gaussian.

Anyway, I found a paper on a predictive coder by Popat which takes more complex PDFs into account; it seems to work pretty well for a lossy predictive coder, even if it is just on text: "Lossy Compression of Grayscale Document Images by A Quantization"

Marco

#7
Marco Al wrote:
>> I would believe that for prediction the context of a pixel is a much
>> better source for context information than any advanced predictor
>> could probably be - but I haven't tried that.
> In most coders the prediction of the center of the distribution seems to take context into account already (for instance the bias term in JPEG-LS). That's not really relevant to the applicability of say the generalized Gaussian distribution though. It's exactly in context (of being near edges or impulsive features) in which I expect uncertainty which can not be modeled by say the generalized Gaussian will occur.

I'm unclear how a bimodal probability model would differ from a context model in the first place. Say, in the language of "bimodal", you would have to decide, based on some indicator coming from the neighborhood of a pixel, which probability model and which predictor to choose. In the language of context models, one would rather argue that instead of a probability we have a conditional probability model that depends, as a context, on the neighborhood of the pixel. It seems to me that this is just a different description of the same idea.

> Anyway, found a paper on a predictive coder by Popat which takes more complex PDFs into account seems to work pretty well for a lossy predictive coder, even if it is just on text. "Lossy Compression of Grayscale Document Images by A Quantization"

Thanks, I'll look into this one.

So long,
Thomas

#8
Thomas Richter wrote:
> I'm unclear how a bimodal probability model would differ from a context model in first place.

It's just a PDF, a bimodal one; you can try to create it in different ways. You can assign each to-be-coded pixel a context and then adapt the probability model parameters after the pixel has been coded, just like existing lossless coders adapt (you might even still want to use prediction, curse of dimensionality and all). Or you can try to determine the probability model parameters, bimodal or not, purely from the unique causal neighborhood of a pixel, as you suggest. The problem is the sheer amount of (redundant) work you are going to be doing per pixel.

Marco
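
As a rough illustration of what a bimodal PDF buys near an edge, here is a sketch that builds a discretized Laplacian over 8-bit pixel values and a two-component mixture of such Laplacians, then compares the cost of coding a pixel that sits on one side of the edge. The centers, scales, mixing proportion, and function names are all assumptions chosen for the example; how a real coder would estimate them from the causal neighborhood is exactly the open part discussed above.

```python
import math

def laplace_pmf(center, scale, levels=256):
    """Discretized, truncated Laplacian over pixel values 0..levels-1."""
    raw = [math.exp(-abs(v - center) / scale) for v in range(levels)]
    total = sum(raw)
    return [r / total for r in raw]

def bimodal_pmf(c1, c2, scale, mix=0.5, levels=256):
    """Mixture of two Laplacians, one per side of a suspected edge."""
    a = laplace_pmf(c1, scale, levels)
    b = laplace_pmf(c2, scale, levels)
    return [mix * pa + (1.0 - mix) * pb for pa, pb in zip(a, b)]

def cost_bits(pmf, value):
    """Ideal code length in bits for one pixel value under the given PMF."""
    return -math.log2(pmf[value])

# Edge between a dark region (about 40) and a bright region (about 200).
# The unimodal model is centered on the averaged prediction and needs a
# wide scale to cover both sides; the bimodal model keeps two narrow peaks.
unimodal = laplace_pmf(center=120, scale=40.0)
bimodal = bimodal_pmf(c1=40, c2=200, scale=4.0)
uni_bits = cost_bits(unimodal, 200)
bi_bits = cost_bits(bimodal, 200)
print(uni_bits, bi_bits)
```

The bimodal model pays roughly one extra bit for not knowing which side of the edge the pixel falls on, while the unimodal model pays for spreading its probability mass over the whole gap between the two sides.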