The Volume Problem
The story we all know...
Most of us know how well Jack
fared after he cut the beanstalk.
After all, he walked away with
the goose that lays the golden
egg. Every morning, another
golden egg would be waiting
for him. Those eggs saved him
and his mother from poverty.
Before long, they were contented
suburban homeowners.
Until that fateful day when
Jack took up rollerblading.
He was having so much fun that
he left the golden egg under
the goose all day. That evening,
the egg hatched! Jack was dejected
about his lost revenue until
the next day, when he discovered
that both geese had laid golden
eggs. He could hardly believe
his good fortune. If he harvested
the eggs every other day instead
of every day, he would double
the number of gold-laying geese
every two days.
Forty days later, he had 1,048,576
geese to take care of and gold
was so common that nobody wanted
it.
The lesson is simple: volume
always complicates matters.
Most recipes will work if you
double the ingredients. But
try multiplying by 50 or 100
and all you'll have is a mess
in the kitchen and a big room
full of hungry people.
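The arithmetic behind Jack's flock is easy to check. The short Python sketch below assumes the count starts from a single goose and doubles once every two days; the function name and its parameters are illustrative only.

```python
def geese_after(days, start=1, doubling_period=2):
    """Geese on hand after `days`, doubling once per completed period.

    Assumes no goose is ever lost and every egg hatches.
    """
    doublings = days // doubling_period
    return start * 2 ** doublings


# 40 days is 20 doublings, which gives the figure in the story.
print(geese_after(40))
```

Twenty doublings turn one goose into 2^20 geese, which is where 1,048,576 comes from.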
The SGML Expert
High technology is no exception
to the problem of volume. Consider
Gus, for example. He is Acme
Corporation's resident SGML
expert, hired as part of Acme's
initiative to have all of its
product documentation stored
as SGML. Gus is a technical
wizard. He designed a DTD for
Acme in two weeks, and proudly
shows off chapter 1 of the Acme
Dustscraper Repair Manual, which
he tagged himself in just one
day.
A commendable effort, but there
are 10 chapters in the Acme
Dustscraper Repair Manual and
Acme has 100 manuals. It would
take Gus over four years to get
all that documentation into
SGML. Even if Acme could wait
four years, it needs Gus for other
things. After all, he's crucial
to ramping up the rest of the
company to the new SGML system.
Gus Days
So far we've determined that
having Gus convert all the data
is unacceptable. But what are
the other options? Well, the
work can be divided up among
Acme's staff, or temporary employees
can be hired specifically for
this project. Before we make
any such decisions, however,
it's important to determine
just how much effort is involved.
About 1,000 chapters need to
be converted. It takes Gus one
day to tag a chapter. We can
therefore assume an effort of
1,000 Gus-days (the four years
mentioned above). So, hire 100
Guses and you'll be done in
two weeks. Easy!
Except for the volume problem.
Where are you going to find
100 SGML experts who are willing
to work for only two weeks?
And even if you could, can you
afford to pay 100 people what
you're paying Gus? And when
you do hire them, how are you
going to get all 100 to tag
the data the same way? Everyone
will have a different interpretation.
The only way to get usable
SGML from these experts is to
have Gus train them in his DTD.
Aha! If you're going to
need training anyway, hire unskilled
or semi-skilled workers at one
third the cost of Gus. That's
fine, but it will take them
three times as long.
The point is, what works for
low volume doesn't work for
high volume. New solutions are
required.
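The staffing arithmetic in this section can be laid out as a rough Python sketch. It uses the article's figures (10 chapters per manual, 100 manuals, one Gus-day per chapter). The 250 working days per year, the function name, and the choice to measure cost in units of Gus's daily wage are assumptions made for illustration, and training and review time are left out.

```python
from fractions import Fraction

CHAPTERS_PER_MANUAL = 10
MANUALS = 100
GUS_DAYS_PER_CHAPTER = 1
WORKING_DAYS_PER_YEAR = 250  # assumption, not stated in the article


def schedule(workers, slowdown=1, wage=Fraction(1)):
    """Return (working days elapsed, total cost in Gus daily wages).

    slowdown: how many times longer than Gus each worker takes per chapter.
    wage:     each worker's daily pay as a fraction of Gus's daily pay.
    """
    effort = CHAPTERS_PER_MANUAL * MANUALS * GUS_DAYS_PER_CHAPTER
    person_days = effort * slowdown
    days = Fraction(person_days, workers)
    cost = person_days * wage
    return days, cost


gus_alone = schedule(1)
hundred_experts = schedule(100)
hundred_trainees = schedule(100, slowdown=3, wage=Fraction(1, 3))

print(gus_alone[0] / WORKING_DAYS_PER_YEAR, "years for Gus alone")
print(hundred_experts[0], "working days for 100 experts")
print(hundred_trainees[0], "working days for 100 trainees")
```

Under these assumptions the wage bill is the same 1,000 Gus-day wages in all three cases. Cheaper workers change only the calendar time, and that is before counting the cost of training them and checking their work.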
Software
An automated solution is ideally
suited for high volumes of data.
The computer is about 1,000
times faster than Gus. You've
finally solved the volume problem.
All you have to do is find or
develop software that will completely
and accurately convert your
data to SGML.
Guess what? You'd have an easier
time cloning Gus than getting
such a program. Why? Because
this isn't just a conversion.
You are adding structure to
your documents, which requires
inference and subjective decision-making.
The Best of Both Worlds
Ah, but surely the computer
can do most of the grunt work
and then Gus can fix it up afterwards.
Yes, combining automation with
expert review seems to be the
best approach. But only if it's
done right.
If you do enough damage to
your car, the insurance company
will give you money to buy another
one rather than fix the one
you have. Similarly, fixing
the output of a cookie-cutter
conversion can actually take
longer than tagging the document
by hand. One key to a successful
conversion, then, is to automate
as much as you can, as cleanly
as you can.
Here is where Acme makes a
frightening discovery: an SGML
expert is not a conversion expert.
Gus doesn't know how best to
develop or configure a conversion
program. Why should he? That's
like asking a race car driver
to fix your car: it's simply
a different field of expertise.
What Does a Conversion Expert Do?
Conversion is not a standard
field of knowledge. As far as
I know, there are no degrees
available: the most reliable
indicator of expertise is a
track record. So, even though
there is no universally accepted
methodology, I can cover some
guiding principles used at BCA
for managing a large conversion.
Standardization
Large volumes require standardization
to prevent chaos. Otherwise,
different interpretations will
generate inconsistent results.
BCA implements "conversion
specifications," which
detail every element in a document
and how it should be coded in
the new format. These specifications
are used as a standards document
throughout the project. Also,
BCA uses a project team approach,
with one data analyst per project.
This analyst is solely responsible
for interpreting how data should
be coded. All exceptions to
the written rules are brought
to the analyst. Even details such as
file naming conventions are
standardized, because the smallest
discrepancy can snowball at
large volumes.
Customized Software
One key to successfully using
conversion software is to customize
it. BCA has developed its own
suite of conversion filters
that it configures to the specifications
of each project. It has even
created its own generic intermediate
formats. These robust "hub"
formats divide the conversion
in half so that changes in specs
require only partial rework
of data that's already been
converted.
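To see why a hub format limits rework, consider the following minimal Python sketch. It is not BCA's software: the `.H ` heading prefix, the `HubBlock` structure, and the tag names are all hypothetical, and the output side skips the escaping of markup characters that real SGML output would need.

```python
from dataclasses import dataclass


@dataclass
class HubBlock:
    """One unit of the hypothetical intermediate ("hub") format."""
    role: str  # structural role, e.g. "heading" or "para"
    text: str


def source_to_hub(lines):
    """First half: read the (hypothetical) source markup into hub blocks."""
    blocks = []
    for line in lines:
        if line.startswith(".H "):
            blocks.append(HubBlock("heading", line[3:]))
        else:
            blocks.append(HubBlock("para", line))
    return blocks


def hub_to_sgml(blocks, tag_map):
    """Second half: write hub blocks out using the tags the spec calls for."""
    return "\n".join(
        f"<{tag_map[b.role]}>{b.text}</{tag_map[b.role]}>" for b in blocks
    )


source = [".H Replacing the Filter", "Unplug the unit first."]
hub = source_to_hub(source)  # done once

first_spec = hub_to_sgml(hub, {"heading": "title", "para": "p"})
# The spec changes: only the second half is rerun, on the same hub data.
revised_spec = hub_to_sgml(hub, {"heading": "head", "para": "para"})
print(revised_spec)
```

Because the structural decisions are stored in the hub data, a change to the output specification reruns only the second half. The work of interpreting the source is not repeated.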
Quality Control
As discussed earlier, it is
crucial to minimize the amount
of cleanup necessary after the
conversion is finished. While
it is true that BCA's editors
know nothing about Acme Dustscrapers,
they know plenty about SGML
(and all the other standard
electronic formats). These editors
parse the new SGML and then
do a "format review."
This second review is necessary
because SGML that parses without
errors is not necessarily
correctly tagged SGML.
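A small example shows the gap between parsing and correctness. Python's standard library has no SGML parser, so the sketch below uses a well-formed XML fragment as a stand-in. The element names and the sentence-detecting heuristic are hypothetical, and a real format review compares the output against the original hard copy, which no such heuristic replaces.

```python
import xml.etree.ElementTree as ET

# The title and the paragraph have been swapped by a careless conversion.
fragment = (
    "<chapter>"
    "<title>Unplug the unit first.</title>"
    "<para>Replacing the Filter</para>"
    "</chapter>"
)

# Parsing succeeds: the markup is structurally fine.
root = ET.fromstring(fragment)


def suspicious_titles(root):
    """Flag titles that read like sentences (a crude, hypothetical check)."""
    return [
        t.text
        for t in root.iter("title")
        if t.text and t.text.rstrip().endswith(".")
    ]


print(suspicious_titles(root))
```

The parser raises no error, yet the content is plainly in the wrong elements. Only a review of what the tags mean catches this.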
The SGML is filtered into a
viewing package. Tags, which
require slow, tedious checking,
are converted to visual cues.
It then becomes immediately
apparent to an editor whether something
is tagged correctly, simply
by comparing it to the original
hard copy.
Customer Feedback
The most critical element of
quality control is customer
feedback. BCA keeps the entire
conversion process open to Acme,
so that a misunderstanding doesn't
result in thousands of mistagged
pages. Normally, two samples
are provided to the customer
before the volume work begins.
These samples, along with the
conversion specifications, must
be approved by the client at
the start.
Once the conversion is underway,
partial deliveries are sent
to the client as they are completed.
This does more than check
BCA's work. "Live"
data gives Acme a better understanding
of how best to put the converted
data to use on its new system.
Experience
For most companies, conversion
is a rare occurrence. Therefore,
no past experience exists to
provide guideposts and warning
signs. BCA has converted millions
of pages to and from every major
format. Which brings us to our
conclusion.
No Surprises
Perhaps the most pernicious
problem of large volumes is
that the work involved is impossible
to predict. In other words,
even if you do budget for all
the Gus days you think you need,
you might very well need more.
This could lead to disgruntled
workers and even more disgruntled
executives.
BCA has learned, through experience,
to make its process flexible
enough to stay on schedule.
Problems are either avoided
or prepared for in advance.
Potential concerns are brought
to the customer before they
multiply. To put it simply,
you can get away with a little
sloppiness when you have one
goose, but a million geese demand
serious attention.
Your company is not set up
to be a conversion house. I
recommend you hire someone who
is. Otherwise, you just might
lay an egg.