From Fedora Project Wiki
Revision as of 09:18, 28 December 2016

X BitMaps: extract image data and display in ascii terminal

Intro

While being distracted from my previous distractions from an earlier distraction (fonts), I was intrigued by: [Great 202 Jailbreak - Computerphile]

The report [a Summer Vacation: Digital Restoration and Typesetter Forensics] included a link to [archive made available of Martin W. Guy's backup to tape from the 80s], where the authors found some data they used either directly or to confirm their earlier guesses about construction of the document. This appears to have taken about 6-8 weeks of work to rebuild one printed report from various information they were able to find or still had in hand. But I digress.

Within the archive index were images described as: Mike Hawley's collection of tiny X bitmaps (Dec 1988), including: [Brian Kernighan].

Unknown image type

After clicking the extension-less file I saw:

#define bwk_width 48
#define bwk_height 48
static char bwk_bits[] = {
0x00, 0x00, 0xc0, 0x3f, 0x00, 0x00, 
0x00, 0x00, 0xf8, 0xea, 0x01, 0x00, ...

Hoping to find an application that could display this source code, I saved it to disk and ran file on it, which reported: bwk.image.c_source: ASCII text. Since this is C source code, I assumed it was meant to be compiled directly into a larger C application. What I could have done instead was try to identify the file with:

test                  result
ffprobe               bwk.image.c_source: Invalid data found when processing input
gimp                  bwk.image.c_source' failed: Unknown file type
imageinfo             XBM X Windows system bitmap (black and white) 1850 8 48x48
                      using: imageinfo --format --fmtdscr --size --depth --geom bwk.image.c_source
imagemagick identify  XBM 48x48 48x48+0+0 8-bit sRGB 2c 1.85KB 0.000u 0:00.000

Python workout

Given the things I tried hadn't made me any the wiser, I considered starting a C app that would include the file and some code to view it somehow. Expanding my python skills was more important, so I began looking at the structure of the file to plan how to proceed:
- read the file
- get the width
- get the height
- get the image data
- transform / feed into an image creation library to create a png/bmp, whatever was easiest
- not knowing the file format, I decided to also grab the image name (bwk) from the defines, assuming that a file could define more than one image and that you would need to pick the right defines and data for a single image

Python development environment

Half the problem is finding and setting up a dev env to speed up development. I started with: python3, gedit, gnome-terminal, firefox (google, python manual, stackoverflow). Hacking involved trying stuff in the python3 interpreter, and then copy-pasting it into my code.py in gedit.

Later I started using the bluefish editor, with a custom command for python:

gnome-terminal --geometry=100x50+1200+0 --working-directory='%c' -e "bash -c \"python3 '%f'; read -n1 junk\""

Clicking Python starts the terminal in the correct directory, runs python3 on the file in the editor, and pauses the terminal output until a key is pressed, which is necessary to see interpreter messages and my hacking output.

file read

Getting the text of the file into a string in memory was easy:

# the with block closes the file once the text has been read
with open('bwk.image.c_source.txt') as fhand:
  sDataRaw = fhand.read()

regular expressions

I learnt a lot about regexes by using the re module, and then the extended regex module, to detect conforming file content. The [regex builder/tester] was useful. At first I tried to match the two #define lines and extract the match group data, leaving the pixel data for a second regex.

import re
...
# the stdlib re module has no POSIX classes such as [[:alpha:]], so explicit ranges are used here
matchobject = re.search(r'.*#define ([A-Za-z]{1,3})_([A-Za-z]{4,6}) ([0-9]{1,2}).*', sDataRaw)
if matchobject:
  print(matchobject)

However, this would only show the first match. I extended this to match the overall file structure, extracting: imagename1, metric1, value1, imagename2, metric2, value2, imagename3, followed by data that looked like a C array of 0xab hex values. Since I needed multiple matches, I switched to the regex library instead:

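import regex
...
# raw string so that \[ \] and \} reach the regex engine unchanged
pattern = regex.compile(r'#define ([[:alpha:]]{1,8})_([[:alpha:]]{4,6}) ([[:digit:]]{1,2})\n.*#define ([[:alpha:]]{1,8})_([[:alpha:]]{4,6}) ([[:digit:]]{1,2}).*static char ([[:alpha:]]{1,8})_bits\[\] *= *.*[, \n0x[:xdigit:]]+\};', regex.DOTALL)
# assumed step (not shown in the original): findall returns a list of tuples,
# one per match, which is what the matchobject[0][n] indexing further down expects
matchobject = pattern.findall(sDataRaw)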
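For comparison, here is a minimal sketch that picks up the #define lines one at a time using only the standard library re module. It is not the approach used on this page; the helper name read_defines, the SAMPLE text and the pattern are assumptions made for illustration, and the sample pixel data is cut short.

```python
import re

# Sample text in the shape shown earlier; the pixel data is cut short here.
SAMPLE = """#define bwk_width 48
#define bwk_height 48
static char bwk_bits[] = {
0x00, 0x00, 0xc0, 0x3f, 0x00, 0x00,
};
"""

# One match per #define line: <name>_<metric> <value>.
# re has no POSIX classes such as [[:alpha:]], so \w and \d are used instead.
DEFINE_RE = re.compile(r"^#define\s+(\w+)_(width|height)\s+(\d+)\s*$", re.MULTILINE)


def read_defines(text):
    """Return {image_name: {metric: value}} (hypothetical helper)."""
    images = {}
    for name, metric, value in DEFINE_RE.findall(text):
        images.setdefault(name, {})[metric] = int(value)
    return images


print(read_defines(SAMPLE))
```

Grouping by image name would also cope with the guess that one file might define more than one image, since each name gets its own width and height entry.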

pixel data

For the image pixeldata, I created the pattern, and used regex.findall to return a list of strings containing two-character hex codes.

hexbytespattern = regex.compile('0x([[:xdigit:]]{2})', regex.DOTALL)
matchhexobject = hexbytespattern.findall(sDataRaw)
print(matchhexobject)

I stopped trying regexes once I could see the example data had been correctly extracted:

['00', '00', 'c0', '3f', '00', '00', '00', '00', 'f8', 'ea', '01', '00', '00', '00', ...]

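The regex.DOTALL flag makes . match newline characters as well. That matters for the file-structure pattern, whose .* has to span several lines. The hex-byte pattern contains no ., so findall carries on past newlines with or without the flag.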
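As a cross-check, the same extraction can be sketched with only the standard library re module. The sample string below is an assumption built from the first two data lines shown earlier, and the character range [0-9a-fA-F] stands in for [[:xdigit:]], which re does not support.

```python
import re

# First two data lines of the example file, joined by a newline.
sample = (
    "static char bwk_bits[] = {\n"
    "0x00, 0x00, 0xc0, 0x3f, 0x00, 0x00,\n"
    "0x00, 0x00, 0xf8, 0xea, 0x01, 0x00};"
)

# No flags needed: the pattern has no '.', so newlines do not get in the way.
hex_pairs = re.findall(r"0x([0-9a-fA-F]{2})", sample)
print(hex_pairs)
```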

identifying the image name and size

I added some logic to grab the first imagename1 and confirm it is the same as the other two imagenames. I also needed the width of the image to work out how to arrange the data when displaying the image. The height was also available, so I extracted that to an integer as well.

  # groups per match: name1, metric1, value1, name2, metric2, value2, name3
  sImageName = matchobject[0][0]
  if(sImageName != matchobject[0][3]
      or sImageName != matchobject[0][6]):
    print('  ImageName={0} does not match'.format(sImageName))
  else:
    print('  ImageName={0}'.format(sImageName))
    # width may be given by either the first or the second #define
    if matchobject[0][1] == 'width':
      nImageWidth = int(matchobject[0][2])
    elif matchobject[0][4] == 'width':
      nImageWidth = int(matchobject[0][5])
    else:
      nImageWidth = 0
      print('  ImageWidth not defined')

    # same again for height
    if matchobject[0][1] == 'height':
      nImageHeight = int(matchobject[0][2])
    elif matchobject[0][4] == 'height':
      nImageHeight = int(matchobject[0][5])
    else:
      nImageHeight = 0
      print('  ImageHeight not defined')
    nPixels = nImageWidth * nImageHeight
    print('  ImageSize={}x{} = {} pixels'.format(nImageWidth, nImageHeight, nPixels))
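Once the width and height are known, they can be used to sanity-check how many data bytes to expect. The sketch below assumes the usual X11 bitmap layout, where each row is padded up to a whole number of bytes; the function name expected_xbm_bytes is made up for this example.

```python
def expected_xbm_bytes(width, height):
    """Bytes needed for a width x height bitmap, assuming rows padded to whole bytes."""
    bytes_per_row = (width + 7) // 8  # round up to the next full byte
    return bytes_per_row * height


# For the 48x48 example: 6 bytes per row, 48 rows.
print(expected_xbm_bytes(48, 48))
```

Comparing this number with the length of the extracted hex list is a cheap way to notice a regex that matched too little or too much.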

convert list of strings to a binary array

Given python's core types don't include a C-style array, I was interested to see the bytes and bytearray types in python3. I also found a bitarray library to do some of the work for me.

# these two imports belong at the top of the script
import binascii
from bitarray import bitarray

    # join the two-character hex strings into one long hex string
    pixelbytestring = ''.join(matchhexobject)
    print('  pixelbytestring:')
    print(pixelbytestring)

    # hex string -> bytes object
    pixelbytes = binascii.unhexlify(pixelbytestring)
    print('  pixelbytes:')
    print(pixelbytes)

    # bytes -> individual bits
    pixelarray = bitarray()
    pixelarray.frombytes(pixelbytes)
    print('  pixelarray:')
    print(pixelarray.endian())

    BitmapDraw(nImageWidth, nImageHeight, pixelarray)

It took some time to find a way to convert the list of two-character hex strings into something bitarray could consume. First I joined the list into a single string with str.join. Next, binascii.unhexlify converted that into a bytes object. Then bitarray.frombytes converted this into the bitarray format, e.g. '1001011110100101'. Now this can be passed to a function which takes the width, the height and the raw binary pixelarray.
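The same chain of conversions can be sketched with the standard library alone, which also makes the bit order visible. The helper name byte_to_bits_lsb_first is an assumption for this example. It relies on X bitmaps storing the leftmost pixel in the least significant bit of each byte, so each byte's bits are reversed before joining.

```python
# First row of the example data (six bytes = 48 pixels).
hex_pairs = ['00', '00', 'c0', '3f', '00', '00']

# list of hex strings -> one hex string -> bytes
pixel_bytes = bytes.fromhex(''.join(hex_pairs))


def byte_to_bits_lsb_first(value):
    """Return eight '0'/'1' characters, least significant bit first."""
    return format(value, '08b')[::-1]


bit_string = ''.join(byte_to_bits_lsb_first(b) for b in pixel_bytes)
print(bit_string)
```

Read this way, the first row comes out as one unbroken run of eight set pixels near the middle of the row; reading each byte most significant bit first would split that run in two. With bitarray, constructing the array as bitarray(endian='little') should give the same ordering.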

drawing the image
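The drawing function itself is not shown on this page, so the following is only a rough sketch of how a string of '0'/'1' pixels could be printed in a terminal. The name draw_rows and the choice of '#' and space are assumptions, and the sketch assumes the width is a multiple of 8 so that rows carry no padding bits.

```python
def draw_rows(bit_string, width, on='#', off=' '):
    """Split a string of '0'/'1' pixels into rows of `width` characters."""
    rows = []
    for start in range(0, len(bit_string), width):
        row = bit_string[start:start + width]
        rows.append(''.join(on if bit == '1' else off for bit in row))
    return rows


# Tiny 4x4 demo pattern instead of the real 48x48 data.
demo_bits = '0110' '1001' '1001' '0110'
for line in draw_rows(demo_bits, 4):
    print(line)
```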