Comments


COMPUTER1313 OP t1_j8qbb1f wrote

> D. Kriesel, a German Ph.D. student studying computational geometry, encountered a strange problem when scanning a blueprint on a common Xerox office scanner. The numbers denoting the square footage of rooms were totally wrong, and what's more, they changed when he scanned the blueprint again.

>Intrigued, Kriesel tried scanning a table of costs and figures. Numbers changed again—but not wildly, just by a little bit: 54.60 became 54.80, for instance.

Another article on that same news: https://www.theregister.com/2013/08/06/xerox_copier_flaw_means_dodgy_numbers_and_dangerous_designs/

Yeah, that could potentially result in lawsuits or other legal messes with Xerox caught in the crossfire, for example if the numbers in a purchase contract or construction plan were silently changed.

326

EmperorJake t1_j8qc7qw wrote

I know it was 1216! One after Magna Carta!

515

SketchyApothecary t1_j8qjemm wrote

This isn't limited to Xerox. Lots of scanning recognition devices/programs have trouble differentiating 6s and 8s in some fonts when trying to convert images to text fields, and occasionally other numbers get mixed up as well.

33

Hattix t1_j8qkmql wrote

No it wouldn't.

The use of the aggressive JBIG2 setting was not default. You had to change it yourself on the MFD's interface, where there was a warning saying that this setting could cause character substitution errors.

23

PMzyox t1_j8qnivz wrote

Yeah I saw that episode of Better Call Saul, so

9

n1gr3d0 t1_j8qt5x3 wrote

The fun part is that this was not about recognition. Scanning shouldn't do any OCR, so in that context any meaningful character manipulation (like replacing one character with another) looks shady as hell. Thankfully it turned out to be just an overly zealous compression algorithm.

63

avipars t1_j8qyr6t wrote

Does this affect Scantrons for tests?

0

Gathorall t1_j8r7dsg wrote

That tracks. It's orders of magnitude easier (though that shouldn't matter nowadays) to tell the head to put "8 in black" in a certain spot than to specify the precise location and color of every constituent dot.

−3

surelythisisfree t1_j8rdhnl wrote

Most copier manufacturers have their own PDF compression that generally puts a scanned page between 50 and 150 kB, down from about a meg if they don't do anything fancy. I only realised after years of working with them that it isolated things that looked like letters and basically averaged each letter's representation on the page to allow better compression. The only reason I realised was a bug in a released firmware that only affected compact PDF (and was pulled within 24 hours). The bug basically made the letters not line up in a row on each page, so they'd move up and down the line a bit.

10

brazzy42 t1_j8rfkb9 wrote

Note: the bug had existed for 8 years, and there's a good chance that there are still scanners out there being used which have the bug and were never updated.

135

girhen t1_j8soe26 wrote

MCCXXXIVDLXVII - >!1,234,567!<

MMMCDLVMDCCLXXXIX - >!3,456,789!<

I know you're joking, but it's always nice to add perspective.

Also, some of my coworkers frequently have to read legal documents in paragraphs, sections, etc and convert lines to outlines. Nothing like deciphering section i from section i in an outline - meaning a section after h and before j at one level vs the first Roman numeral at another. It's a PITA.

9

marmorset t1_j8ssfjv wrote

I was using a scanner with OCR software in 1991. It was a somewhat new technology, but not cutting edge. I worked for a small publishing company and while the OCR wasn't perfect, it was still much better than having to retype an entire book.

11

PC-hris t1_j8sxql1 wrote

How does that even happen? Doesn't a photocopier just directly copy what it sees? I wouldn't think such an old one would be content-aware enough to make changes like that.

3

paulsteinway t1_j8t1znj wrote

The actual text in the article is clipped on the right side so a lot of sentences make no sense. I had to highlight the text and paste it into a text editor to make it readable.

3

syncboy t1_j8t4xto wrote

OCR technology wasn’t very good until very recently. It also used to change text, letters, capitalization, etc.

1

bothunter t1_j8t952g wrote

This had nothing to do with OCR -- it was due to compression artifacts. Specifically, it looked for common patterns and created a lookup table to save space. It's just that some numbers and letters look similar enough to that algorithm that it didn't notice that a 6 and an 8 were different "patterns".

And that was the issue -- nobody was using the Xerox to scan the text -- they just wanted it to make identical copies like all other Xerox machines did in the past.
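
For anyone who wants to see the failure mode, here is a toy sketch of the lookup-table idea described above. It is not the real JBIG2 codec or Xerox's firmware: the 5x5 glyphs, the `compress`/`decompress` names, and the `max_diff` threshold are all made up for illustration. Each glyph is compared against the symbols already in the table, and if it differs by no more than `max_diff` pixels it is stored as a reference to the existing symbol instead of as its own bitmap.

```python
from typing import List, Tuple

Glyph = Tuple[str, ...]  # rows of '0'/'1' pixels (hypothetical toy format)

SIX: Glyph = ("01110", "10000", "11110", "10001", "01110")
EIGHT: Glyph = ("01110", "10001", "01110", "10001", "01110")
FOUR: Glyph = ("10010", "10010", "11111", "00010", "00010")


def hamming(a: Glyph, b: Glyph) -> int:
    """Count the pixels that differ between two same-sized glyphs."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def compress(glyphs: List[Glyph], max_diff: int) -> Tuple[List[Glyph], List[int]]:
    """Build a symbol table plus one table index per glyph on the page.

    A glyph within max_diff pixels of an existing symbol reuses that
    symbol; this is where a lossy threshold can swap characters.
    """
    table: List[Glyph] = []
    refs: List[int] = []
    for glyph in glyphs:
        for idx, symbol in enumerate(table):
            if hamming(glyph, symbol) <= max_diff:
                refs.append(idx)  # "close enough": reuse the stored symbol
                break
        else:
            table.append(glyph)  # new pattern: store it once
            refs.append(len(table) - 1)
    return table, refs


def decompress(table: List[Glyph], refs: List[int]) -> List[Glyph]:
    """Rebuild the page by pasting the stored symbol at each position."""
    return [table[i] for i in refs]


page = [SIX, EIGHT, FOUR, SIX]

# Exact matching only: every distinct glyph survives.
exact_table, exact_refs = compress(page, max_diff=0)
exact_page = decompress(exact_table, exact_refs)

# Overly lax matching: the 8 is within 2 pixels of the 6, so it is replaced.
lossy_table, lossy_refs = compress(page, max_diff=2)
lossy_page = decompress(lossy_table, lossy_refs)
```

With `max_diff=0` the page round-trips unchanged. With `max_diff=2` the table holds only two symbols and the second glyph comes back as a clean, sharp 6, which is why the output looks like a perfectly good scan rather than a blurry one.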

12

badassbridge t1_j8un5i3 wrote

This happened at a small, ethical company most people haven’t heard of. It was called Enron!

1

sadi89 t1_j8unvhi wrote

Damn, who knew scanners could also have dyslexia.

3

planchetflaw t1_j8w5bc7 wrote

I believe it was digitizing the scans as a file. If you selected the Normal quality mode, it would use a lookup table to save file size by compressing in the stated standard. At least that was my understanding.

1

disneyvacafacts t1_j8wrjnb wrote

As an embedded engineer, I'm proud that I guessed the problem after only reading the first paragraph, having never done this exact type of engineering.
All I did was think of how I would have personally handled these scans and how I could have screwed it up.

1