|
Post by VDC 8x2 on Jul 26, 2014 7:19:10 GMT
Okay, keeping the 64-long copy. So it will be 0 becomes 63, then 1 is added, to make it either 2 to 8 or 64 bytes. Dialed the other ones back to 2 to 63, with 0 for 256 and 1 for 64. 128 can = 2 of the 64s. Will call this finalized until I make an encoder.
Edit: 2 to 64 is 1 to 63 with 1 added; 0 is 256. It made more sense flow-wise for copy and fill.
|
|
|
Post by hydrophilic on Jul 27, 2014 2:53:33 GMT
Great. Once you get a working encoder, then you can customize. Lots of things you might *think* would help often have little effect or make things worse (in general), while some things that seem unneeded (like RLE of 2) can actually help in many cases, and some things that seem totally stupid (like RLE of 1) can occasionally be helpful. [Note an RLE of 1 uses the same #bytes as Raw (skip) of 1, but the way my decoder was written, RLE was faster.] Have you decided how you will write an encoder? Like I was saying, a high-level language would be easiest to design and test. If for some reason you insist on doing it in ML, there was an article(s) in some issue(s) of C=Hacking, as I recall. Edit
There are a few articles in various issues, some more relevant than others. For general compression theory (and rather easy to understand), see Compression Basics in issue #16. For a specific instance described in detail (PuCrunch), see the article in issue #17, but it deals mostly with the decompressor (decoder), rather than the compressor (encoder)... however, a web page(s) about PuCrunch can be found here. It has links to the C source code for the compressor.
PuCrunch uses both RLE and Copy coding (often referred to as LZ77, which is a pretty archaic term, IMHO). The main difference from what is discussed here is that it uses variable-length codes, and it also maintains a small table of most-used RLE values (31 of them). The decompressor is about the same size as your published version; due to variable-length codes, it runs slower but compresses better. Anyway, the source code of the compressor may be a good place to start; I imagine changing it to use fixed-length codes would be very simple!
Oh yeah, another difference of PuCrunch is that it does not use a separate bit to select between raw (uncompressible) data or codes; instead a special bit-pattern (the least common upper two bits) is remembered as an 'escape code'. Maintaining (testing and updating) the escape code slows things down some more. I imagine it would be easy to strip this from the encoder as well.
Well, that's all the relevant stuff I spy in the C=Hacking issues. There was an article somewhere about crunching on a C64 using an REU to make it practical... I'm sure it could easily be ported to the C128, if only I could remember where I saw it...
|
|
|
Post by VDC 8x2 on Jul 27, 2014 17:08:28 GMT
Looking at the C code of PuCrunch for the first time, and my eyes are glazing over.
I think I need to come up with rules for the encoder first before adapting it.
[Rules]
If a match is right next to current, use the near copy code, since it is the smallest code byte. Check for a 256 match first; if not, then 2 to 64.
Do RLE for 2 to 64 bytes, checking for 256 first.
Search for far copy matches within the 2K range: 256 first, then 2 to 8.
[/Rules]
Will I need a back buffer of 2048 bytes?
Work from back to front, or front to back?
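A rough sketch of that rule priority in Python (the function names, window sizes, and tie-breaking order here are my own guesses at a starting point, not the finalized code format):

```python
def best_back_ref(data, pos, max_back, max_len=256):
    """Longest match starting at pos against earlier data within
    max_back bytes of history; returns (length, distance)."""
    best_len, best_back = 0, 0
    for back in range(1, min(max_back, pos) + 1):
        n = 0
        while (pos + n < len(data) and n < max_len
               and data[pos + n] == data[pos - back + n]):
            n += 1
        if n > best_len:
            best_len, best_back = n, back
    return best_len, best_back

def choose_code(data, pos):
    """Rule priority from the post: RLE if it wins, then near copy
    (256-byte window), then far copy (2 KB window), else raw.
    A real encoder would compare actual output costs instead."""
    # count a run of the same byte (up to 256)
    run = 1
    while pos + run < len(data) and data[pos + run] == data[pos] and run < 256:
        run += 1
    near_len, near_back = best_back_ref(data, pos, 256)
    far_len, far_back = best_back_ref(data, pos, 2048)
    if run >= 2 and run >= near_len and run >= far_len:
        return ("rle", run)
    if near_len >= 2:
        return ("near", near_len, near_back)
    if far_len >= 2:
        return ("far", far_len, far_back)
    return ("raw", 1)
```

This is only a greedy single-step chooser; it answers "which code applies here?" and leaves the cost comparisons for later.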
|
|
|
Post by VDC 8x2 on Jul 29, 2014 2:27:28 GMT
I downloaded Visual Studio for Windows. Reading up on Visual Basic and C#.
For the compressor, I am thinking of a multipass one.
The encoder is being harder to think up than the decoder, so far.
|
|
|
Post by hydrophilic on Jul 29, 2014 6:14:55 GMT
I guess some people might like it, but C# just isn't for me... it is easier than C(++), but if you want easy, use VisualBASIC. VisualBASIC is present in all Microsoft Office apps, so learning that allows you to add very powerful customizations to your Office "documents". I've made more than one "application" for a client that was nothing more than an Access or Excel "document" running custom code for them. Heck, I've even had people request this type of application because it is easy for them to change (they don't want to mess with a separate compiler or learn something more abstract like C/Java).
Your rules sound like a good start. Yes, you will (eventually) need a 2K buffer. In VB you could use either DIM M(2048) AS BYTE, if you want to manage it like an array, or DIM M$ AS STRING if you want to manage it like a string (in VB, using $ for strings is optional). As a string, it is very easy to shift in new values and shift out old ones:
IF LEN(M$) >= 2048 THEN M$ = MID$(M$,2)+CHR$(new_byte) ELSE M$ = M$ + CHR$(new_byte)
Using a numeric array, however, would require a bit more work; you would either need to maintain a wrap-around pointer (MOD is your friend here), or else manually shift all the bytes. The string method is probably what I would use (initially).
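For comparison, the wrap-around-pointer alternative (the MOD trick) might be sketched like this; Python here purely for illustration, and the class and method names are made up:

```python
class RingBuffer:
    """Fixed-size history buffer using a wrap-around write pointer
    (index MOD size) instead of shifting bytes on every write."""

    def __init__(self, size=2048):
        self.size = size
        self.buf = bytearray(size)
        self.count = 0  # total bytes ever written

    def push(self, byte):
        # the MOD wraps the write position back to 0 when full
        self.buf[self.count % self.size] = byte
        self.count += 1

    def back(self, n):
        """Byte written n positions ago (1 = most recent)."""
        if n < 1 or n > min(self.count, self.size):
            raise IndexError("outside history window")
        return self.buf[(self.count - n) % self.size]
```

Pushing is O(1) with no shifting, at the cost of slightly fiddlier index math than the string method.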
Just remember, you don't need (and shouldn't try) to do everything at once. Start off with something stupid simple, like read in a file and write out a "compressed" version using nothing but "raw" (skip) codes. Note this will actually INFLATE the file, but that's okay; you just want to be sure you can run it through the decompressor and get the correct result... When that works, add the RLE. You don't need a back-buffer yet, although you will need to peek into the future for RLE. You can use a buffer for that, or possibly file-seek (umm, I can't remember ever using file-seek in VB).
When that works, add the local (sequential) copy ... this only needs 256 back-buffer. Finally after that is working, add the long (random) copy (here you finally need 2K buffer). And don't try to implement a 2-pass scheme yet. Just get the fundamentals working, and verified.
Afterwards, you can optimize things. You don't need to do a full two-pass system... well, not in the traditional sense of running through the whole input file twice. You just need to memorize a choice, and determine how much that choice costs, but not actually use it yet... and then keep going as if the code was uncompressible (raw data) until you get another option to compress the data... at that time, compare whether the old/saved compression technique costs less than sending raw data + the new scheme, and choose the best one. I can elaborate on that, but no point yet...
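That deferred-choice comparison might be sketched like this in Python; the cost numbers are placeholders (one header byte per raw run of up to 63 literals, per the raw code discussed in this thread), not the actual finalized code sizes:

```python
def raw_cost(n):
    """Output bytes for sending n literals raw: one header byte per
    run of up to 63 literals, plus the literals themselves."""
    if n == 0:
        return 0
    headers = -(-n // 63)  # ceiling division
    return headers + n

def cheaper_to_keep(saved_cost, saved_len, gap):
    """A new compressible option just appeared, `gap` bytes after the
    memorized one ended. Compare two plans: emit the memorized code
    (saved_cost output bytes covering saved_len input bytes) plus a raw
    run over the gap, versus sending everything raw up to the new code."""
    return saved_cost + raw_cost(gap) <= raw_cost(saved_len + gap)
```

For example, a 2-byte code covering 10 input bytes is an easy keep, while a 2-byte code covering only 1 input byte can lose once the raw gap around it is folded in.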
Once you're sure it is outputting optimal compression code (for the chosen decoder rules), then, along with testing of many images, you can experiment with different code lengths to find out which is optimal for VDC images... this is when you will learn many fascinating things about the data... what kind of compression works well and which does not (both of these tend to surprise me). This part is both easy and hard... it usually involves only trivial changes to the code for an experiment, but you need to update both the compressor and decompressor to run the experiment, and you need to run it past many images to be sure you aren't getting a false positive (or false negative).
Edit Just thought of this: VB limits strings to either 32K or 64K in length. Assuming it is 64K (or your image is not interlaced), you could read the entire source file into a string (M$ in the next example). Then just 'peek' anywhere at random with
WANTED$ = MID$(M$, file_position, 1)
'or for a numeric value
BYTE_VAL = ASC(MID$(M$, file_position, 1))
Just remember it is a 1-based position, not 0-based like you would use in ML or C.
|
|
|
Post by VDC 8x2 on Jul 30, 2014 4:26:16 GMT
For the file output, should I output it to a string or a number, then save to file?
|
|
|
Post by hydrophilic on Jul 30, 2014 6:21:44 GMT
Personally, with VB, I use one of these:
PRINT#n, CHR$(byte_val); 'normal file OPEN
'or using FileSystem object
OUTFILE.WRITE(CHR$(byte_val))
Just remember with PRINT# to include the semi-colon! Of course, you don't have to write out a byte at a time. You could, for example, write a full "code sequence" stored in a string. For example, to send 63 un-compressible bytes you might have something like
CS$ = CHR$(63)
FOR I=0 TO 62 : CS$ = CS$ + MID$(M$,file_position+I,1) : NEXT
file_position = file_position + 63
PRINT#n, CS$;
Well, you can make that more efficient by using some math to avoid a loop, but hopefully that gives a clear example.
Edit Silly me! The first byte should not be CHR$(63), but CHR$(63*4) to put the '00' code bits at the low end.
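In other words, the header byte packs the count into the upper six bits, leaving the low two bits as the '00' raw code. A sketch in Python (`raw_code` is a hypothetical helper, not part of the thread's VB code):

```python
def raw_code(data, pos, count):
    """Build one raw (skip) code sequence: a header byte with the
    count in bits 2-7 and the '00' raw code in bits 0-1, followed
    by the literal bytes themselves."""
    assert 1 <= count <= 63, "raw count field is six bits"
    header = count << 2  # same as count * 4; low two bits stay 00
    return bytes([header]) + data[pos:pos + count]
```

So a 63-byte raw run gets the header CHR$(252), matching the correction above.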
|
|
|
Post by VDC 8x2 on Jul 30, 2014 18:43:26 GMT
Augh, it's doing everything within a scope of 16 bits. Trying to find a way to force 8-bit Chr and Mid.
It turned a 9kb file into a 32kb file.
EDIT: down to 18.6kb
|
|
|
Post by VDC 8x2 on Jul 30, 2014 19:50:09 GMT
Module Module1
    Sub Main()
        Dim M As String
        Dim CS As String
        Dim CountValue As UShort
        Dim x As UShort
        Dim FilePosition As UShort
        Dim L As UShort

        M = My.Computer.FileSystem.ReadAllText("D:/Emulation/drive10/testdata.prg")
        FilePosition = 1
        L = Len(M)

        Do
            CountValue = 63 * 4
            CS = Chr(CountValue)
            For I = 0 To 62
                CS = CS + Mid(M, FilePosition + I, 1)
            Next
            FilePosition = FilePosition + 63
            My.Computer.FileSystem.WriteAllText("D:/Emulation/drive10/output.prg", CS, True)
        Loop While FilePosition < L

        x = FilePosition - L
        FilePosition = L - x
        If x <> 0 Then
            CS = Chr(x * 4)
            For I = 0 To x - 1
                CS = CS + Mid(M, FilePosition + I, 1)
            Next
            My.Computer.FileSystem.WriteAllText("D:/Emulation/drive10/output.prg", CS, True)
        End If
    End Sub
End Module
That's the code I have so far for VB.
|
|
|
Post by hydrophilic on Jul 31, 2014 6:37:35 GMT
I'm not familiar with ReadAllText. From your description, it sounds like that function (err, Method would be the VB term... maybe) is importing the file data as unicode text (16-bit chars). I *think* there are versions of MID$ and CHR$ that force byte-size values, MIDB and CHRB (and ASCB? and LENB?), but I've never needed them because using OPEN or FileSystem.Read / FileSystem.Write uses bytes by default.
Anyway, a simple test to verify VB is not doing something stupid would be
M = My.Computer.FileSystem.ReadAllText("D:/Emulation/drive10/testdata.prg")
FilePosition = 1
L = Len(M)
MsgBox "Length is " & L & " bytes"
Then if it reports the wrong file size (like 2x longer, or maybe some other wrong value), you know it is doing some kind of unicode conversion. However, if it gives the correct value, it may still be doing unicode conversion (i.e., Len may be returning the #characters, not #bytes).
An alternate method that you could try is
Open fileName For Input As 21
L = LOF(fileName)
For i = 1 To L : Get #21, c$ : m$ = m$ + c$ : Next
Close 21
MsgBox "Verify " & Len(m$) & "=" & L
...
Open outName For Binary Access Write As 33
...
Put #33, , CByte(byteValue)
'or
Print #33, cs$; 'if cs$ is a string built of chr$ values
...
Close 33
You might need to include Binary in the first Open... (I write binary files a lot, but don't see any code right now that reads binary data). Hopefully it is obvious that fileName and outName should be strings with two different filenames (full path). You might want to use KILL to delete an existing file... just don't delete your source file
|
|