|
-
Jan 31st, 2014, 01:47 PM
#1
Thread Starter
Hyperactive Member
Finding Patterns In A Byte Array
Assume the following byte array: {0, 0, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 3, 4, 3, 4, 3, 4}.
I am trying to find the patterns in the array. I have been able to find the following patterns easily:
Pattern 1: {0, 0}
Pattern 2: {1, 2}
Pattern 3: {3, 4}
I then store the indices and counts of each pattern in another array, like so: {0, 0, 1, 6, 2, 2}. Basically, they are stored in two bytes each. The first byte is the pattern index, while the second byte is the count - 1.
e.g. Pattern 1 (index 0) is found 1 time (stored as count - 1, or 0). Pattern 2 (index 1) is found 7 times (stored as count - 1, or 6), et cetera.
What I want is: Instead of storing the pattern index and the count, I would just store the pattern index. The patterns would look like this:
Pattern 1: {0, 0}
Pattern 2: {1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2}
Pattern 3: {3, 4, 3, 4, 3, 4}
I could then store them just by their index, i.e. {0, 1, 2}.
Here is another example: {0, 0, 1, 2, 1, 2, 1, 2, 3, 4, 3, 4, 3, 4, 1, 2, 1, 2, 1, 2, 0, 0}.
The patterns would be: {0, 0}, {1, 2, 1, 2, 1, 2}, {3, 4, 3, 4, 3, 4}. When stored, the resulting array would be: {0, 1, 2, 1, 0}.
Thanks.
Prefix has no suffix, but suffix has a prefix.
-
Jan 31st, 2014, 03:04 PM
#2
Re: Finding Patterns In A Byte Array
What you are doing sounds a bit like a form of run length encoding compression, except that you seem to be wanting to go too far. What I'm puzzled by is that the first example ends up with an array that has [pattern, length] pairs. Whereas in the second example you seem to want to get rid of the length part and just have the pattern. That alone seems both easy and rather less useful than the first example. However, between the two examples you muddy the water with this part:
Pattern 1: {0, 0}
Pattern 2: {1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2}
Pattern 3: {3, 4, 3, 4, 3, 4}
is:
{1, 2, 1, 2}
the same pattern as:
{1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2}
or is the second one pattern 2 and the first one some other kind of pattern? From your second example, it looks like any run of [1, 2] would be seen as pattern 2 regardless of the length. But if that's the case, haven't you already solved the problem, since you have already build an array with the pattern and the length? The only difference is that you wouldn't write the length. On the other hand, if each run could be a different pattern, then the number of patterns gets pretty large. In this case, I'd suggest a Dictionary(of Byte, Byte()), where the key is the pattern number and the value is an array of bytes that shows the pattern. The drawback to such a design is that only 256 patterns would be permissible.
My usual boring signature: Nothing
 
-
Jan 31st, 2014, 03:21 PM
#3
Thread Starter
Hyperactive Member
Re: Finding Patterns In A Byte Array
I am trying to do a type of compression. I tried to make the OP simple to understand but I think I failed. Let me go into more depth.
I have an image. My current method of compression simply stores sequences of bytes in pairs. The first byte is the index of the pixel color (from my color table created before compression), the second byte is the count of sequential pixels with that color. For example, if the color at index 0 in my color table is blue and the image contains 200 sequential blue pixels, then I can store it as {0, 200}. I do this for the entire image.
Say I have an image with 200 blue pixels, then 100 red pixels, then 200 blue pixels, then 100 red pixels. I could store it using {0, 200, 1, 100, 0, 200, 1, 100}. The example in the OP was simply taking that pattern and using a single byte (or short) to represent it.
I hope this cleared things up.
Prefix has no suffix, but suffix has a prefix.
-
Jan 31st, 2014, 04:33 PM
#4
Re: Finding Patterns In A Byte Array
I would first do some reading on runlength compression and maybe delta encoding.
There are some really cool website explaining all kind of compression methods.
-
Jan 31st, 2014, 09:11 PM
#5
Re: Finding Patterns In A Byte Array
That cleared up a little bit. It's certainly a form of run length encoding, as it appeared to be, but that being the case, what is the question? You say that instead of storing the pattern index and the count, you want to just store the pattern index. That's encoding without the run length. There simply isn't enough information contained in that for it to be useful. You can't very well do anything with just the code without the length, so you must be thinking of something else.
My usual boring signature: Nothing
 
-
Jan 31st, 2014, 10:15 PM
#6
Re: Finding Patterns In A Byte Array
No, from his first post (2nd case), the code "points" to the pattern and the pattern is of a certain length, so length is a property of the pattern so doesn't need to be stored separately.
I understood what he was saying, I just couldn't see how it would end up being practical, unless there were fewer than 256 unique patterns (including the length of the pattern) in a given file. Even for a 8-bit bitmap file with color look-up table, it didn't seem very likely.
To possibly extend that concept a little, I was thinking that if the files did really consist of groups of pairs of alternating values that were being searched for then when you found one of those runs the length and the two bytes could be stored as the indexed pattern, thus reducing the size of the "pattern dictionary", and if you had a similar but shorter of longer pattern, then those two value could be written as well, or perhaps the "dictionary" could be structured in such a way that the longer pattern could be the joining of two shorter patterns.
But, at that point, I figured, why reinvent the wheel. There are already compression algorithms out there, we're not likely to improve on them, especially since we don't really know what is trying to be compress. We just have snippets of example data, which probably doesn't give a good sense of the real data to be compressed.
Last edited by passel; Feb 1st, 2014 at 06:02 PM.
-
Feb 1st, 2014, 12:32 AM
#7
Thread Starter
Hyperactive Member
Re: Finding Patterns In A Byte Array
I realize now, after multiple iterations, that what I am trying to do is impractical and gives poor results. As for reinventing the wheel, I have already come up with algorithms that compress better than programs like PNGGauntlet. I am just trying to come up with as many different algorithms as I can, then see if they get better results. By the way, passel was correct. I was storing the patterns separate from the indices. The indices were referencing the patterns.
Thanks for all the help.
Prefix has no suffix, but suffix has a prefix.
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|