SegaXtreme

Home	Forums	What's new	Resources

Hex to Bin

paul_met - Dec 25, 2018

paul_met

Dec 25, 2018

Greetings. Maybe someone has an algorithm for converting from HEX to BIN (for SH2 asm)?
For example:
Code:

    AD (hex)= 10101101 (bin)

Binary value must be written in HEX format.
-------------------
--------------------
------
This is my version of the code. Perhaps someone will have a smaller code.
Code:

    00 MOV #0xXX, R6 (number of convertible bytes) 02 MOV #0x00, R5 (jump offset #1) 04 MOV.B @R0 +, R1 (reading curent byte) 06 SHAR R1 08 MOVT R2 (get low bit) 0A SHAR R1 0C MOVT R3 (get high bit) 0E SHLL2 R3 10 SHLL2 R3 12 BRAF R5 + 0x16 (jump to next step) 14 OR R3, R2 16 MOV R2, R4 (get fourth byte) 18 BRA 0x06 (repeat conversion) 1A MOV #0x06, R5 (jump offset #2) 1C SHLL8 R2 1E OR R2, R4 (get third byte) 20 BRA 0x06 (repeat conversion) 22 MOV #0x0E, R5 (jump offset #3) 24 SHLL16 R2 26 OR R2, R4 (get second byte) 28 BRA 0x06 (repeat conversion) 2A MOV #0x16, R5 (jump offset #4) 2C SHLL8 R2 2E SHLL16 R2 30 OR R2, R4 (get first byte) 32 DT R6 (cycle identificator) 34 MOV.L R4, @R7 (writing 4 bytes) 36 BF/S + 0x02 (return at beginning) 38 ADD #0x04, R7 (writing step)

		antime	Dec 25, 2018
		Use a look-up table? 4- or 8-bit, depending on what you want to optimize for.

		paul_met	Dec 25, 2018
		To convert 1 bpp tiles to 4 bpp.

antime

Dec 25, 2018

I'm still not sure I understand what you're asking, but this converts 1-4 ASCII hex digits to a number (uppercase assumed). Assembly output compiled with -Os.

Code:

    uint32_t cvt(const char *ascii, unsigned int len) { static const int8_t table[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 0, 0, 0, 0, 10, 11, 12, 13, 14, 15 }; uint32_t result = 0; for (unsigned int i = 0; i < len; i++) { int8_t nybble = table[ascii[i] - '0']; result <<= 4; result |= nybble; } return result; }

Code:

    00000000 <_cvt>: 0: d2 07 mov.l 20 <_cvt+0x20>,r2 ! 0 <_cvt> 2: e0 00 mov #0,r0 4: 75 01 add #1,r5 6: 45 10 dt r5 8: 8b 01 bf e <_cvt+0xe> a: 00 0b rts c: 00 09 nop e: 61 44 mov.b @r4+,r1 10: 40 08 shll2 r0 12: 31 2c add r2,r1 14: 71 d0 add #-48,r1 16: 61 10 mov.b @r1,r1 18: 40 08 shll2 r0 1a: af f4 bra 6 <_cvt+0x6> 1c: 20 1b or r1,r0 1e: 00 09 nop 20: 00 00 .word 0x0000

		paul_met	Dec 26, 2018
		For example, there is a tile font in the format of 1 bit per pixel (monochrome). I need to convert it to 4 bits per pixel (16 colors).

antime

Dec 26, 2018

So not at all what you originally asked about... Assuming the foregroung and background colors are constant, a pure C implementation is slightly longer. It might be possible to squeeze a couple of instructions out of the assembly by manual register allocation, but this is trivially unit-testable on the host.
Code:

    #define FG 0x0F #define BG 0x00 void cvt(const char *input, int len, char *output) { const char table[4] = { (BG << 4) | BG, (BG << 4) | FG, (FG << 4) | BG, (FG << 4) | FG }; for (int i = 0; i < len; i++) { unsigned int bits = *input++; for (int j = 3; j >= 0; j--) { output[j] = table[bits & 3]; bits >>= 2; } output += 4; } }

Code:

    00000000 <_cvt>: 0: d1 0f mov.l 40 <_cvt+0x40>,r1 ! ff0ff 2: 7f fc add #-4,r15 4: 45 11 cmp/pz r5 6: 2f 12 mov.l r1,@r15 8: 61 53 mov r5,r1 a: 8d 01 bt.s 10 <_cvt+0x10> c: 71 01 add #1,r1 e: e1 01 mov #1,r1 10: 41 10 dt r1 12: 8f 02 bf.s 1a <_cvt+0x1a> 14: 63 63 mov r6,r3 16: 00 0b rts 18: 7f 04 add #4,r15 1a: 74 01 add #1,r4 1c: 60 43 mov r4,r0 1e: 70 f0 add #-16,r0 20: 84 0f mov.b @(15,r0),r0 22: 73 04 add #4,r3 24: 65 03 mov r0,r5 26: 66 33 mov r3,r6 28: e2 04 mov #4,r2 2a: e7 03 mov #3,r7 2c: 27 59 and r5,r7 2e: 60 73 mov r7,r0 30: 07 fc mov.b @(r0,r15),r7 32: 42 10 dt r2 34: 23 74 mov.b r7,@-r3 36: 8f f8 bf.s 2a <_cvt+0x2a> 38: 45 09 shlr2 r5 3a: af ea bra 12 <_cvt+0x12> 3c: 41 10 dt r1 3e: 00 09 nop 40: 00 0f mac.l @r0+,@r0+ 42: f0 ff .word 0xf0ff

paul_met

Dec 27, 2018

Some strange syntax I do not understand. What does this mean?
Code:

    cvt .word 0xf0ff

		antime	Dec 27, 2018
		0x40-0x44 is the lookup table.

paul_met

Jan 2, 2019

Significant optimization, if you convert 1 bpp reverse-order to 4 bpp (-12 instructions and -2 registers).
Code:

    00 MOV #0xXX, R4 (number of convertible bytes) 02 MOV #0x00, R3 (constant) 04 MOV.B @R0 +, R1 (reading curent byte) 06 SHAR R1 08 MOVT R2 (get high bit) 0A SHLL2 R2 0C SHLL2 R2 0E SHAR R1 10 ADDC R2, R3 (get bytes) 12 CMP/PL R1 (checking end of bit conversion) 14 BF + 0x1A (jump to writing 4 bytes) 16 BT/S + 0x06 (continue bit conversion) 18 SHLL8 R3 1A MOV.L R3, @R5 (writing 4 bytes) 1C DT R4 (decrement of cycles) 1E BF/S + 0x02 (return at beginning) 20 ADD #0x04, R5 (writing step)

antime

Jan 2, 2019

Your "end of bit conversion" condition doesn't work. Depending on the MSB of the byte loaded from @r0, the cmp/pl is either always false, or always true (loads are sign-extending, and shar is sign-preserving, and the comparison is signed). Also, if any bit pattern is valid you can't early-out.

Ignoring that, you could remove one register by using rotates instead of adds:
Code:

    00000000 <cnv>: 0: e4 63 mov #99,r4 2: e2 00 mov #0,r2 4: 61 04 mov.b @r0+,r1 6: 41 21 shar r1 8: 42 24 rotcl r2 a: 42 00 shll r2 c: 42 08 shll2 r2 e: 41 21 shar r1 10: 42 24 rotcl r2 12: 41 15 cmp/pl r1 14: 8b 01 bf 1a <cnv+0x1a> 16: 8d f6 bt.s 6 <cnv+0x6> 18: 42 18 shll8 r2 1a: 25 22 mov.l r2,@r5 1c: 44 10 dt r4 1e: 8f f0 bf.s 2 <cnv+0x2> 20: 75 04 add #4,r5

Getting the data in the right order would just require shifting the byte left by 24, and then shifting out the MSB.

paul_met

Jan 2, 2019

Yes, I forgot to add EXTU.B R1, R1. The code should look like this:
Code:

    00 MOV #0xXX, R4 (number of convertible bytes) 02 MOV #0x00, R3 (constant) 04 MOV.B @R0 +, R1 (reading curent byte) 06 EXTU.B R1, R1 08 SHAR R1 0A MOVT R2 (get high bit) 0C SHLL2 R2 0E SHLL2 R2 10 SHAR R1 12 ADDC R2, R3 (get bytes) 14 CMP/PL R1 (checking end of bit conversion) 16 BF + 0x1C (jump to writing 4 bytes) 18 BT/S + 0x08 (continue bit conversion) 1A SHLL8 R3 1C MOV.L R3, @R5 (writing 4 bytes) 1E DT R4 decrement of cycles 20 BF/S + 0x02 (return at beginning) 22 ADD #0x04, R5 (writing step)

But your code gives the wrong result after second cycle.

antime

Jan 4, 2019

Because I was bored, here's another solution. With the table it's longer, but should run quicker. Obviously when optimizing for speed you can't beat a full 1K table, but I think this is a reasonable compromise.
Code:

    ! Parameters: ! r4 = source ! r5 = length (>= 1) ! r6 = dest ! Local registers: ! r0 = table pointer ! r1 = constant 0xf ! r2 = low nybble ! r3 = high nybble cnv: mov #0xf, r1 mova table, r0 loop: mov.b @r4+, r2 mov r2, r3 shlr2 r3 shlr2 r3 and r1, r2 and r1, r3 mov.w @(r0, r2), r2 mov.w @(r0, r3), r3 shll16 r2 xtrct r3, r2 mov.l r2, @r6 dt r5 bf/s loop add #4, r6 rts nop .align 2 table: .word 0x0000 .word 0x0001 .word 0x0010 .word 0x0011 .word 0x0100 .word 0x0101 .word 0x0110 .word 0x0111 .word 0x1000 .word 0x1001 .word 0x1010 .word 0x1011 .word 0x1100 .word 0x1101 .word 0x1110 .word 0x1111