Thursday, November 14, 2013

The Questions that Keep Me Awake at Night: Copy Machines

How many times can you make a photocopy of a photocopy before losing legibility?

Not having an actual photocopier that I can abuse, I'm going to have to emulate this, which means I am going to actually need some understanding of copy machine resolution and what sort of generation loss I can expect. That's the hard part. The easy part is building an emulator, which will be a Python script that takes in some jpg or whatever, applies an entropy function, and outputs the resulting jpg or whatever. Turing bless the PIL [Python Imaging Library], which is going to do all the heavy lifting in the code without making me bother my pretty little head about the details.

I selected a sample photocopier to emulate on the extremely rigorous basis of the first thing that came up at Amazon when I searched for a copy machine. The important thing to note here is that the photocopy resolution is 600x600 dpi. Because I don't understand the complexities of dpi vs. ppi, I am going to claim that in this case there is a 1-to-1 conversion. That's true in some cases, at least. Feel free to tell me in the comments how wrong I am for making this assumption. The image below is 72x72 ppi (thank you, GIMP print resolution feature). It is well within the resolution limits of my imaginary copier.



Now, what can go wrong in the process of copying? The digital process of storing and copying an image should not have any inherent generation loss, provided the resolution of the image is not above the resolution of the copier. There is some evidence to suggest that characters will sometimes change into other characters, depending on what sort of compression algorithm is being used. Xerox claims that this can all be avoided by changing settings. Since this isn't a Xerox machine I'm emulating, I think I'm going to ignore this.
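The paragraph above makes two claims that are easy to check in a few lines: the image has to fit within the copier's resolution, and once it does, purely digital copying loses nothing. A minimal sketch, assuming the same 1-to-1 dpi/ppi conversion as above; `within_copier_limits` and `digital_generations` are made-up names, and a synthetic gradient stands in for the scanned page. Swapping `fmt` to a lossy format such as JPEG removes the bit-for-bit guarantee.

```python
# Sketch of the two conditions in the paragraph above:
#  1. the image must not exceed the copier's resolution (dpi == ppi assumed)
#  2. given that, purely digital copying is lossless: saving and reloading
#     the pixels in a lossless format (PNG) reproduces them exactly
import io

from PIL import Image

COPIER_DPI = 600  # resolution of the sample copier
IMAGE_PPI = 72    # print resolution set in GIMP


def within_copier_limits(ppi=IMAGE_PPI, dpi=COPIER_DPI):
    """True if every image pixel gets at least one copier dot per axis."""
    return ppi <= dpi


def digital_generations(img, generations, fmt="PNG"):
    """Copy `img` `generations` times through an in-memory save/load cycle."""
    current = img
    for _ in range(generations):
        buf = io.BytesIO()
        current.save(buf, format=fmt)
        buf.seek(0)
        current = Image.open(buf)
        current.load()  # read the pixels in now rather than lazily
    return current


if __name__ == "__main__":
    print(within_copier_limits())   # True
    print(COPIER_DPI / IMAGE_PPI)   # about 8.33 copier dots per image pixel
    # a small synthetic RGB gradient stands in for the scanned page
    page = Image.new("RGB", (64, 64))
    page.putdata([(4 * x, 4 * y, 2 * (x + y)) for y in range(64) for x in range(64)])
    copy = digital_generations(page, 100)
    print(copy.tobytes() == page.tobytes())  # True
```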

Based on that pinnacle of internet research, howstuffworks.com, the process of copying is complicated and involves a lot of different things, like lights and toner, all working properly. I don't know how the failure of any one piece translates into image quality when copying. So I got the user manual for this copy machine, looked at the troubleshooting section to see what commonly goes wrong, came up with the following list, and hacked together a function to produce each effect:

  • uneven printing
  • white specks
  • vertical streaks
  • smudges or spatters appearing
  • printing is dark
  • bottom edge has smudge marks
  • portion of the page not printed

None of the individual functions will produce much of a noticeable effect. If something went horribly wrong, you would redo the copy, right? The question, I think, is how quickly small accumulating errors make the copy illegible. The only assumption I need to make here is how often such a small error will occur. I am going to claim an error happens 4 times out of 10. If this seems arbitrary, that's because it is.

Now for my emulator.  At each copy pass, there is a 40% probability that an error will occur.  An error will be chosen at random from the above list.  Each error has an equal probability of occurring in my emulation.
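The error model described above is simple enough to work out on paper. With a 40% chance per pass and 7 equally likely errors, n passes give 0.4n errors on average, and any one particular error (the cut-off bottom, say) has shown up at least once with probability 1 - (1 - 0.4/7)^n, roughly 0.83 after 30 copies. Below is a minimal sketch of that arithmetic plus a seeded simulation to sanity-check it; the function names are made up for illustration and are not part of the script further down.

```python
# Sketch: expected error counts under the emulator's error model
# (40% chance of an error per pass, 7 error types, equally likely).
import random

P_ERROR = 0.4   # chance that a copy pass introduces an error
N_ERRORS = 7    # number of error types


def expected_errors(copies, p=P_ERROR):
    """Average number of errors after `copies` passes."""
    return copies * p


def chance_of_specific_error(copies, p=P_ERROR, kinds=N_ERRORS):
    """Probability that one particular error type occurs at least once."""
    return 1 - (1 - p / kinds) ** copies


def simulate(copies, p=P_ERROR, kinds=N_ERRORS, seed=0):
    """Count how often each error type is picked in one simulated run."""
    rng = random.Random(seed)
    counts = [0] * kinds
    for _ in range(copies):
        if rng.random() < p:               # an error happens on this pass
            counts[rng.randrange(kinds)] += 1
    return counts


if __name__ == "__main__":
    for n in (30, 50, 100, 500):
        print(n, expected_errors(n), round(chance_of_specific_error(n), 3))
    print(sum(simulate(100_000)) / 100_000)  # close to 0.4
```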


Below are my results.

Copied 30 times.

Copied 50 times.

Copied 100 times.

Copied 500 times.


If I adjust the numbers such that visible (to the casual eye) errors are made by the error function, illegibility happens a lot faster.


Copied 10 times. 

Copied 20 times. 

Copied 30 times. 
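One way to make "adjust the numbers" concrete is to give each error function an explicit strength knob instead of a hard-coded constant. A sketch for the darkening error; `darken_by` and its `shift` parameter are hypothetical (the script below draws a random shift under 10 and loops over pixels), and Pillow's `Image.point` applies one lookup table to every channel instead of touching pixels one at a time.

```python
# Sketch: a darkening error with an explicit strength knob.
# darken_by and shift are hypothetical names, not from the script below.
from PIL import Image


def darken_by(img, shift):
    """Return a darkened copy of `img`: every channel value drops by
    `shift`, floored at 0. Larger shifts give more visible errors."""
    return img.point(lambda v: max(v - shift, 0))


if __name__ == "__main__":
    page = Image.new("RGB", (4, 4), (100, 150, 200))
    print(darken_by(page, 5).getpixel((0, 0)))    # (95, 145, 195)
    print(darken_by(page, 120).getpixel((0, 0)))  # (0, 30, 80)
```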


To make this smell like science I'd have to fiddle with a lot more of the variables, but I think I'm bored at this point.  So I'm going to leave this as is.
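A first step toward making this smell like science would be a number for how far a copy has drifted from the original, rather than eyeballing pictures. A minimal sketch, assuming mean absolute pixel difference is an acceptable stand-in (it is only a proxy and says nothing directly about whether text is still readable). It uses Pillow's `ImageChops.difference` and `ImageStat.Stat`; the name `drift` is made up.

```python
# Sketch: a crude degradation score, the mean absolute per-channel
# difference (0 to 255) between the original and a copy of the same size.
from PIL import Image, ImageChops, ImageStat


def drift(original, copy):
    """Mean absolute per-channel difference between two same-size images."""
    diff = ImageChops.difference(original.convert("RGB"), copy.convert("RGB"))
    means = ImageStat.Stat(diff).mean  # one mean per band
    return sum(means) / len(means)


if __name__ == "__main__":
    white = Image.new("RGB", (10, 10), (255, 255, 255))
    speckled = white.copy()
    speckled.putpixel((0, 0), (0, 0, 0))  # one black speck out of 100 pixels
    print(drift(white, white))               # 0.0
    print(round(drift(white, speckled), 2))  # 2.55
```

Recording `drift(original, copy)` after every pass would turn the handful of snapshots into a curve of degradation against copy count.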

Copied below the fold is my code, for anyone interested.




from PIL import Image
import random



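# bottom edge has smudge marks: in the bottom band of the page, copy a
# pixel up and to the left in a diagonal stroke 5 pixels wide
def bottomsmudge(img):
    px = img.load()
    width, height = img.size
    x = 15
    # start at x and stop 4 short of the right edge so that every
    # px[i + d - k, j - k] below stays inside the image
    for i in range(x, width - 4):
        for j in range(height - x, height):
            if j % 30 == 0:
                model = px[i, j]
                for k in range(x):
                    for d in range(5):
                        px[i + d - k, j - k] = model
    return img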



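# smudges or spatters: on a sparse random grid, copy each grid pixel up
# and to the left in a small diagonal stroke 5 pixels wide
def smudge(img):
    px = img.load()
    width, height = img.size
    h = 5
    rand1 = random.randrange(1, 501)   # spacing between smudged rows
    rand2 = random.randrange(1, 501)   # spacing between smudged columns
    # stepping by rand2/rand1 visits exactly the pixels where
    # i % rand2 == 0 and j % rand1 == 0
    for i in range(0, width, rand2):
        for j in range(0, height, rand1):
            if h < i < width - h and h < j < height - h:
                model = px[i, j]
                for k in range(h):
                    for d in range(5):
                        px[i + d - k, j - k] = model
    return img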


# portion of the page not printed: blank out the bottom 5 rows
def cutoff(img):
    px = img.load()
    width, height = img.size
    x = 5
    for i in range(width):
        for j in range(height - x, height):
            px[i, j] = (255, 255, 255)   # white (8 bits per channel)
    return img


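# uneven printing: lighten the bottom-right quarter of the image
def unevenprinting(img):
    px = img.load()
    width, height = img.size
    x = 5
    # // keeps the range bounds integers (plain / gives a float in Python 3)
    for i in range(width // 2, width):
        for j in range(height // 2, height):
            r, g, b = px[i, j]
            px[i, j] = (min(r + x, 255), min(g + x, 255), min(b + x, 255))
    return img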


# error functions: each takes an RGB image, modifies it in place, and returns it

# white specks: single white pixels on a sparse random grid
def whitespecks(img):
    px = img.load()
    width, height = img.size
    rand1 = random.randrange(1, 501)   # spacing between rows with specks
    rand2 = random.randrange(1, 501)   # spacing between columns with specks
    h = 1                              # margin kept clear of the edges
    for i in range(0, width, rand2):
        for j in range(0, height, rand1):
            if h < i < width - h and h < j < height - h:
                px[i, j] = (255, 255, 255)
    return img

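# printing is dark: subtract the same small random amount from every channel
def darken(img):
    px = img.load()
    width, height = img.size
    x = random.randrange(10)           # 0 to 9
    for i in range(width):
        for j in range(height):
            r, g, b = px[i, j]
            px[i, j] = (max(r - x, 0), max(g - x, 0), max(b - x, 0))
    return img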

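# vertical streaks: short black marks, 2 pixels wide and 5 tall, on a
# sparse random grid
def verticalstreaks(img):
    px = img.load()
    width, height = img.size
    rand1 = random.randrange(1, 931)   # spacing between streaked rows
    rand2 = random.randrange(1, 731)   # spacing between streaked columns
    h = 5                              # streak length in pixels
    for i in range(0, width, rand2):
        for j in range(0, height, rand1):
            if i + h < width and j + h < height:
                for k in range(h):
                    px[i, j + k] = (0, 0, 0)
                    px[i + 1, j + k] = (0, 0, 0)
    return img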

# copy the image a given number of times; on each pass there is a 40%
# chance of an error, chosen with equal probability from ERRORS
ERRORS = [whitespecks, darken, verticalstreaks, unevenprinting,
          cutoff, smudge, bottomsmudge]

def copycopies(img, copies=500):
    newimg = img.copy()
    for _ in range(copies):
        if random.random() < 0.4:
            error = random.choice(ERRORS)
            newimg = error(newimg)
        else:
            newimg = newimg.copy()   # a clean copy
    return newimg




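def main(filename):
    # convert to RGB so every error function can treat pixels as (r, g, b)
    img = Image.open(filename).convert("RGB")
    newimg = copycopies(img)
    newimg.save("copiedrun2.jpg")

if __name__ == "__main__":
    main("twigabw.jpg")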



2 comments:

  1. One thing that will make this experiment more informative is if you use a picture of a gorilla rather than a giraffe. It is a well known fact to those of us involved in Information "Theory" that giraffe pictures create intrinsic interference in information transfer.

    Otherwise, my only suggestion is to revisit your "verticalstreaks" routine and use prime numbers for the upper bounds on the randrange for rand1 and rand2. One of the fundamental "Theorems" in Information "Theory" shows the deep connection between prime numbers and information transmission.

  2. Now do you want me to use randrange(primenum) or randrange(primenum+1) since randrange(n) returns a number in the range 0 to n not inclusive?
