| Figure
II: The lifecycle of
a conversion project can be divided
into four distinct phases.
A key benefit of this phased
approach is that there are a
number of specific checkpoints
at which you can reconsider
the project in terms of new
information and redirect it
to best fit where you're going.
Phase 1:
Concept and Planning - The
purpose of this phase is to
get everyone to agree to a
common definition of what
the project is. You'll want
to lay out the project objectives
and expectations, define the
success criteria, lay out
a preliminary approach, identify
the risk areas, estimate approximate
cost ranges and define a preliminary
budget.
Phase 2:
Proof of Concept - The purpose
of this vital step is to test
your approach on a limited
scale, paying particular attention
to the areas identified as
potential risk areas. The
results of this phase will
help you arrive at a more
detailed plan, while further
fleshing out functional requirements.
Based on the results of the
test, preliminary software
is prepared, and cost projections
are fine-tuned.
Phase 3:
Analysis, Design, and Engineering
- This is the critical step
where all the details get
worked out and the project
gets prepared for volume production.
Specifically, keying and conversion
specifications are finalized,
cleanup and review guidelines
are defined, and final production
costs are confirmed. More
generally, the entire conversion
process is finalized and tested,
and production rampup begins.
Phase 4:
Production - This is the step
we've all been waiting for
- data flowing smoothly at
500 or 50,000 pages a week.
Provided that the preceding
3 steps were done well, there's
not a whole lot to say about
this phase. What you will
need, however, are some tools
in place to closely monitor
quality and productivity.
In addition to the four phases,
there are two other important
aspects to this methodology.
These are the underlying disciplines
shown as two stripes at the
bottom of figure II. As in any
large project, management and
quality control are critical
and apply to every phase. Ideally,
a single person will oversee
both disciplines in order to
guarantee continuity.
OK. This sounds like
a plan, but who does the work?
It all depends on the staffing
and the experience that you
have available.
While the nuts and bolts of
doing the conversion require
some specialized skills and
facilities, the actual planning
and management process requires
much the same skills required
to manage any large, complex
project. If you have people
with experience available to
dedicate to this effort, then
you can probably do it internally.
If not, you may want to consider
outsourcing. The key issue is
not to overlook the fact that
this effort will need dedicated
project management talent.
Phase 1: Concept and
Planning
Although it is an important
step, this can be a pretty short
one if you've carefully thought
through exactly what you want
to happen. It may just be a day
or two, though it's more likely
to be several weeks. The major
elements of this phase are described
below:
Project Concept -
Everybody needs to be on the
same page. The first step is
to clearly define the project,
and to get an agreement that
people's various expectations
are the same. You simply cannot
meet a goal that you don't know
about in advance. At this point,
the project concept is discussed
at a high level, without getting
bogged down in detail. The following
are the critical questions that
need to be answered honestly.
- What do you need to do,
and how quickly do you need
to do it?
- Do you have a technical
approach in mind?
- What are the goals and what
are the success criteria?
- What's critical and what's
nice to have?
- What's the expected budget?
And what are the estimated costs?
- Where are the tradeoffs
in time, budget, and functionality?
The end result of this analysis
is a Project Concept document.
Materials Evaluation
- While a detailed inventory
of materials does not usually
get done until the Proof of
Concept phase, it is critical
to get an early understanding
of the project's scope. Design
and implementation decisions
on where best to focus resources
will be based on this information.
This is illustrated by the chart
in Figure III, and while the
specific questions will vary
from project to project, typical
questions are:
- How big is the project?
You need to quantify it in the
terms you're used to thinking in:
pages, books, journal issues,
products, etc.
- How much source variation
is there? Materials may
have been produced in a multitude
of electronic formats, on
different computer operating
systems, or by different typesetters.
Some of it may even live as
paper, under dust, in huge
warehouses.
- How much format variation
is there? How often has the
layout format changed over
the years? Invariably, different
authors choose to lay out
in different ways; while it
would be nice to have a strictly
enforced template, if you're
dealing with legacy data,
you're bound to find a lot
of formatting inconsistency.
- What are the special issues?
Tables, formulas, cross-referencing
and graphics are all areas
that need special attention
in the planning process.
All of these critical issues
will differ slightly from project
to project; it's a good idea
to lay them out explicitly in
a format like Figure III.

Figure III:
The materials evaluation sheet
will be invaluable in helping
you understand the scope of
the conversion task.
Rough-Cut Pricing Estimate
- Usually, there is not enough
information available this early
in the process to allow accurately
predicting the project's overall
production costs. There are
simply too many variables that
will not be finalized until
well into Phase II. However,
it is possible (and useful)
to start assembling rough-cut
costing parameters.

Figure IV:
It's important, early on, to
get a feel for what the rough-cut
numbers will be.
It's generally a good idea
to use a chart like the one
shown in Figure IV to lay out
what the major tasks in the
production process are. Alongside
this, cost ranges are laid out
for what those tasks have historically
taken. What this provides you
with is a set of very broad ranges.
These are useful, both for feasibility
analysis ("I didn't know
we were talking about a $2,000,000
project!"), and for sensitivity
analysis ("If we didn't
have to do that step we could
save $2.00 per page").
If budgeting has not yet been
done for your project, these
ranges will also prove to be
useful guides for setting budgets.
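
To make the arithmetic behind such a chart concrete, here is a minimal sketch. The task names, the per-page figures (in cents), and the 400,000-page count are all placeholder assumptions for illustration, not historical data from any real project. It totals the low and high ends of the ranges for a feasibility check, then drops one task to show the kind of sensitivity analysis described above.

```python
# Hypothetical per-page cost ranges (low, high) in cents for each major task.
# All figures are placeholders, not historical data.
TASK_COST_RANGES_CENTS = {
    "scanning": (10, 25),
    "keying": (100, 250),
    "tagging": (75, 200),
    "quality review": (50, 125),
}


def rough_cut_range(page_count, task_ranges_cents, skip=()):
    """Return (low, high) total cost in cents, omitting any tasks named in skip."""
    low = sum(lo for task, (lo, hi) in task_ranges_cents.items() if task not in skip)
    high = sum(hi for task, (lo, hi) in task_ranges_cents.items() if task not in skip)
    return page_count * low, page_count * high


pages = 400_000  # assumed project size

# Feasibility analysis: what is the overall range?
full_low, full_high = rough_cut_range(pages, TASK_COST_RANGES_CENTS)

# Sensitivity analysis: what if we didn't have to do the tagging step?
no_tag_low, no_tag_high = rough_cut_range(pages, TASK_COST_RANGES_CENTS, skip=("tagging",))
saving_per_page_cents = (full_high - no_tag_high) // pages

print(f"Full project: ${full_low / 100:,.0f} to ${full_high / 100:,.0f}")
print(f"Without tagging: ${no_tag_low / 100:,.0f} to ${no_tag_high / 100:,.0f}")
print(f"High-end saving per page: ${saving_per_page_cents / 100:.2f}")
```

Working in whole cents keeps the totals exact; your own chart would supply the real tasks and ranges.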
Project Feasibility Analysis
- While the information collected
so far is fairly sketchy, this
is an early opportunity to assess,
based on those broad parameters,
whether the project is still
feasible. You need to answer
some of the following questions.
If this is a $1-$2 million project,
does it still make good business
sense to proceed? If it's way
over budget can we redefine
the project's scope? Is there
another way to do this? Can
we do without certain elements
so as to bring down the cost?
If so, does the project make
sense at a reduced level? And
most importantly, does it make
sense to go on with Phase 2?
Phase II: Proof of
Concept
So Phase I has told you that
this may actually be worth pursuing.
You've got the rough cut estimate,
and even your CFO admits that
it sounds like a pretty sound
business model. Most importantly,
everyone is agreed on what the
broad strokes of the project
are. The next step is the Proof
of Concept.
The purpose of the Proof of
Concept phase is to test your
planned approach on a limited
scale. This will be your opportunity
to test out the areas that were
identified as being particularly
risky, and to test on a small
scale, the hypothesis developed
in Phase I. The results of this
phase will provide a more detailed
plan including the following:
fleshed out functional requirements,
preliminary software development,
a converted sample set, and
more finely-tuned cost projections.
Returning to the building analogy
again, this is the step where
the preliminary design is laid
out, and a model built so that
everyone can get an idea of
what the building will look
like. Additionally, a test boring
is done to ensure that the soil
will be able to support the
building.

Figure V: Establishing
a project timeline will help
ensure that you accomplish what
you need on schedule.
Figure V shows a typical project
timeline for this phase. For
a significantly larger project,
this phase might take 6-10 weeks.
The key stages and deliverables
are described below:
Project Initiation - You always
need a project kickoff meeting.
One of the main purposes of
this meeting is to make sure
that everyone on the expanded
team has the same understanding
of the project concept. The
team will probably include a
project manager, a domain expert,
a data analyst, a programmer
and a senior editor.
The project initiation is also
where the detailed task plan
is created and reviewed. The
task plan will help ensure that
everyone understands their roles
and responsibilities as a member
of the team.
Defining the Sample Set - Important
questions need to be answered
in order to define the Proof
of Concept. Be patient here;
you probably won't be ready
after the kickoff meeting.
Ask yourself the following
questions: What's intended to
be proven? How big should the
sample be? Which project elements
are known technology and therefore
don't need to be part of this
exercise? Which elements are
particularly risky or unknown
and need special focus?
Beware the common mistakes.
While there may be a tendency
to try to do everything at once,
or to do the easy parts first,
remember that the real purpose
is to focus on a small data
set, and on the risky and unknown
areas. Fail to identify where
your project's critical challenges
are now, and the hypothesis
of your whole project might
be off.
With this in mind, it may be
better to focus on 10 pages
of difficult bibliographic references
or complex tables, rather than
hundreds of pages of straightforward
or repetitive text. And if there
are 20 major variations of material,
don't try to analyze them all;
instead, pick the 2 or 3 that
are most representative of the
issues.
Inventory Materials - This
task invariably evokes groans,
but someone has got to do it.
You need to have a good idea
of how big the pile is, and
a clear understanding of the
variation contained within the
pile. The exact methodology
you use to collect this information
will vary depending on the project.
While it would be ideal to
get a detailed list of everything
that needs to be done, that's
not usually the case. What you
are trying to do at this stage
is get an understanding of how
much of each type of material
there might be. The reason for
this is that each type of
material will probably require
its own programming and conversion
process. And while building
conversion software to help
automate much of the conversion
makes sense, you don't want
to invest lots of programming
time automating for a particularly
difficult type of material
that you only have 10 pages
of.
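
The tradeoff between programming time and page count can be sketched as a simple break-even comparison. Everything in this example is an assumption made for illustration: the function name, the material types, the page counts, and the dollar figures.

```python
def cheaper_to_automate(page_count, manual_cost_per_page, programming_cost,
                        automated_cost_per_page=0.0):
    """Return True when automating this material type costs less than manual work.

    programming_cost is the one-time cost of building the conversion software;
    automated_cost_per_page covers the cleanup that is still needed afterwards.
    """
    manual_total = page_count * manual_cost_per_page
    automated_total = programming_cost + page_count * automated_cost_per_page
    return automated_total < manual_total


# Hypothetical inventory: material type -> estimated number of pages.
inventory = {"straightforward text": 40_000, "complex tables": 10}

for material, pages in inventory.items():
    automate = cheaper_to_automate(
        pages,
        manual_cost_per_page=3.00,      # assumed
        programming_cost=5_000.00,      # assumed
        automated_cost_per_page=0.50,   # assumed
    )
    print(f"{material}: {pages} pages -> {'automate' if automate else 'convert manually'}")
```

With these assumed numbers the 10 pages of complex tables never repay the programming investment, which is exactly why the inventory needs to tell you how much of each type there is.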
Developing Decision-Making
Guidelines - This is usually
the heart of the Proof of Concept
phase. The extent to which you
can develop rules and guidelines
for transforming your source
materials into "properly
tagged data" will be the
most important determinant of
the final cost of this project.
In other words, it needs to
be done with care.
The domain expert and the data
analyst will work closely together
here, to try to generalize the
rules and condense them into
as small a set as possible.
What you are trying to do at
this stage is to build a functional
set of rules. Don't make the
mistake of turning this into
a programming exercise; that
will just bog you down at this
stage. Equally importantly,
don't give up too early. While
the usual tendency is to think
there are no rules - "it's
just common sense and you either
know it or you don't" -
that's probably not the case.
The Conversion Specification
Document - It is useful
at this stage to formalize the
guidelines derived to this point
into a single document. The
Conversion Specification Document
will become the primary repository
of project information; it will
be continually consulted and
reviewed by the end user, the
domain expert, the analyst,
and the programmer. This document
expands the previously established
guidelines into a set of rules
that can be programmed for.
It also identifies areas that
are ambiguous or difficult to
define; these areas will then
need to be reviewed by the domain
expert. The Conversion Specification
document typically circulates
among the various parties involved,
and becomes the central discussion
document until issues are resolved.
It is also the document that
defines the programming efforts.
POC Software and Sample Set
Conversion - OK, so you've written
the conversion specification
document; hopefully it addresses
all the major issues of the
conversion. Now it's time to
see if you can really use those
guidelines and specifications
to convert anything.
As in the Project Initiation
task, you need to be cautious
here. While most successful
conversion projects combine
automation with manual effort,
programming should be done sparingly
at this point. There simply
isn't time during the Proof
of Concept to program for everything
you'd like to. In addition,
there will be a tendency to
program for the easy things
first. The best approach is
to select a few complex areas
which people doubt can be converted
in an automated manner. For
these areas, invest time testing
out programmatic approaches
to their unique problems; this
learning process will be invaluable
and will help tremendously when
you move on to Phase III. For
the rest of the set, however,
it probably makes sense for
people to follow the conversion
specification manually, rather
than investing heavily in writing
and testing programs.
The end result of this phase
should give you a good feel
for what can or should be automated,
and what will need to be done
manually. It will also yield
some valuable timings on what
the labor elements of this project
are likely to be.
Future Phase Planning and Pricing
- If you've done your job right
thus far, you'll now be able
to more closely estimate the
project's costs, and lay out
a realistic timeframe in which
it can be done. As more materials
are tested and converted in
the next phase, these estimates
will be further refined. Keep
in mind that programming costs
will rise in the next phase
as you start to expand your
efforts toward automation. However,
if the materials you initially
selected for the sample are
truly representative, and you've
taken into account people's
learning curves as they started
working with your sample data,
what you have now is pretty
close.
This phase is also the checkpoint
at which to determine
whether the project still makes
sense. This checkpoint lets
you make a go/no-go decision
based on the outcome of the
POC. Ask yourself the following
questions: Are the costs still
in line with our budget? Were
we able to prove the concept
that we came up with in the
first phase? Are we getting
the quality we expected? Was
our original time estimate (or
the promises I made to our backers)
doable?
In deciding whether this project
is feasible or not, figure out
what the Proof of Concept has
yielded.
What did this exercise
buy you?
- Time to Market - you'll
have a realistic estimate
of how long this project will
take as well as your options
for speeding it up.
- Quality - you'll be able
to demonstrate expected results
while there's still time to
make modifications.
- Cost - you'll have an understanding
of the project costs and what
the tradeoffs are.
- Scalability - you'll have
the tools in place to create
a process that scales as big
as you want.
Phase III: Analysis,
Design and Engineering
By now, we should have a clear
understanding of what we want
to achieve from this conversion.
The Proof of Concept will have
yielded valuable clues as to
where and how to refine our
conversion guidelines. Mistakes
will be identified and concepts
will be proven. You'll understand
the true complexities of the project
and the steps needed to overcome
them. The Proof of Concept,
more than anything else, should
land the entire conversion team
on the same page. And, it will
become the foundation upon which
our entire process will be built.
The following are the primary
results of the Proof of Concept:
- Improved conversion guidelines.
- Refined conversion specification.
- Refined conversion software.
- More finely tuned cost
projections.
Phase III is primarily a matter
of refining the various deliverables
done on the conversion sample,
for the fuller set of materials.
Here we build upon what was
done during the Proof of Concept
and expand the analysis, design,
and engineering components to
handle the full set of materials.
Additionally, we go back and
program for all the things we
did not have time for (or did
not need to prove) during the
Proof of Concept. Planning for
gradual ramp-up and full volume
production processing will also
be done at this point.
A typical project timeline
for this phase is shown in Figure
VI:

Figure VI: It's
important to lay out a timeline
for the analysis, design and
engineering phase of the project;
for the typical larger project,
this phase will likely take
6-10 weeks.
Phase III is made up of the
following key tasks:
Production Process Planning
- Integrating the various elements
of the conversion process is
too often an afterthought. That
can be an expensive mistake.
The most mundane things, such
as agreeing upon filename conventions
and basic data trafficking procedures,
are too often not properly planned
in advance. Typically, a large
conversion effort consists of
30 to 50 independent steps
requiring multiple skills, and
often multiple vendors. There
are also time dependencies that
need to be integrated in order
to ensure a smooth production
flow.
In planning the production
process, there are also a number
of important logistical considerations:
- How many pages a week can
each step in the process handle?
- What's the weak link in
the chain?
- Can you keep up with reviewing
and inspecting converted materials
as they're delivered?
- Technical questions will
arise; will there be a dedicated
point of contact on both sides?
- How will materials be transported
back and forth?
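
The first two questions lend themselves to a quick calculation. The sketch below is a simplification under stated assumptions: the step names and weekly capacities are invented, and it treats the process as a steady pipeline whose throughput is set by its slowest step, ignoring ramp-up and the time it takes to fill the pipeline.

```python
import math

# Hypothetical weekly capacity (pages per week) of each step in the process.
STEP_CAPACITY = {
    "scanning": 50_000,
    "keying": 12_000,
    "tagging": 20_000,
    "client review": 8_000,
}


def weakest_link(capacities):
    """Return (step, pages_per_week) for the step with the lowest capacity."""
    step = min(capacities, key=capacities.get)
    return step, capacities[step]


def weeks_to_finish(total_pages, capacities):
    """Estimate project duration, assuming the slowest step sets the pace."""
    _, rate = weakest_link(capacities)
    return math.ceil(total_pages / rate)


step, rate = weakest_link(STEP_CAPACITY)
print(f"Weak link: {step} at {rate:,} pages a week")
print(f"Weeks for 400,000 pages: {weeks_to_finish(400_000, STEP_CAPACITY)}")
```

In this made-up case the weak link is the review step, which is why the question about keeping up with inspection deserves an honest answer before volumes are set.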
Another important question
to ask is - how much time will
it take? With an ongoing production
facility that handles thousands
of pages a day, you still need
to allow 4-6 weeks for the integration
to take place. You'll therefore
need to start early in Phase
III. And if you're going to
be building a process from scratch,
you should allow at least 6
months.
Production Quality Planning
- While many of the standard
quality control processes apply
to conversion projects, there
is a significant difference.
Chips Ahoy may make the best
cookies, but they'll tell you
that they only use the choicest
chocolate chips and the finest
flour. Unlike a cookie factory,
you can't really control the
quality of ingredients coming
into your machine. No matter
how well you select your samples
and trial materials, you are
unlikely to find every significant
variation; therefore, it's probably
expecting too much to hope to
account for all the possibilities
in advance. The documents will
typically have been written
by many individuals, at several
different locations, in many
different editing packages,
over a long period of time,
and on a variety of systems.
So, like the people who made
them, the documents will have
personalities. And, like people,
their behavior may not always
be exemplary.
Ensuring quality control in
this environment means building
feedback loops at each step
of the process. These checkpoints
are designed to report when
things are not meeting expectations,
and provide guidelines, rather
than rules, to the people inspecting
the results. Information needs
to flow back and forth easily
in order to allow refinement
of this process. You'll also
need to collect statistics in
order to tell how much sampling
will be needed as the process
improves.
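
One way to picture how collected statistics can drive the amount of sampling is the sketch below. The target error rate and the three sampling tiers are illustrative assumptions, a rule of thumb rather than an established quality standard; your own thresholds would come out of the Proof of Concept.

```python
def error_rate(errors_found, pages_inspected):
    """Observed errors per inspected page at one checkpoint."""
    if pages_inspected == 0:
        raise ValueError("no pages inspected yet")
    return errors_found / pages_inspected


def next_sampling_fraction(rate, target_rate=0.005):
    """Pick how much of the next batch to inspect (assumed tiers, for illustration).

    Inspect everything while quality is well off target, and taper off
    as the process improves.
    """
    if rate > 2 * target_rate:
        return 1.0   # inspect every page
    if rate > target_rate:
        return 0.5   # inspect half
    return 0.1       # spot-check only


# Hypothetical checkpoint results: (errors found, pages inspected).
for errors, inspected in [(3, 200), (4, 500), (1, 500)]:
    rate = error_rate(errors, inspected)
    fraction = next_sampling_fraction(rate)
    print(f"{errors} errors in {inspected} pages -> inspect {fraction:.0%} of next batch")
```

The numbers are guidelines for the people inspecting the results, in keeping with the point above that checkpoints should inform judgment rather than replace it.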
Production Ramp Up - Just as
we advised in Phase 1, caution
is critical at this stage. We
usually find that the best approach
is to plan for a few weeks of
low volume production through
the initial production process.
This will help to identify any
weaknesses in our process.
The entire production team
needs to be aware that the purpose
of the first weeks of production
is to provide feedback in order
to help engineer a smoother
process. This is not yet the
time to put dozens of people
to work, but rather a time to
assign a select few individuals
who are capable of figuring
out where improvements can be
made.
Phase IV: Production
You're almost there, but you
do need to continually monitor
results and make sure that quality
and productivity stay where
you expect them to be.
Full-Volume Production - Even
after the production ramp-up
stage, it is not necessarily
prudent to plan for full production
volumes immediately. We've found
it best to gradually increase
volumes, thereby allowing ample
time for people to be trained
and to come fully up to speed.
Production Process Control
- You need a method to track
production through the various
phases. For smaller projects,
Excel spreadsheets may be sufficient.
But for larger projects, you
probably need something more
sophisticated. At a minimum
you need to know where in the
process each batch of materials
is, how long a batch is taking
to go through, and how much
material is awaiting each phase.
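
Whatever tool you choose, the minimum information described here is small. The following sketch shows one possible shape for it; the Batch record, its field names, and the sample batches and dates are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Batch:
    batch_id: str
    pages: int
    current_step: str   # where in the process the batch is right now
    started: date       # when the batch entered the process


def backlog_by_step(batches):
    """Total pages sitting at each step of the process."""
    backlog = Counter()
    for batch in batches:
        backlog[batch.current_step] += batch.pages
    return dict(backlog)


def days_in_process(batch, today):
    """How long a batch has been going through the process so far."""
    return (today - batch.started).days


# Hypothetical batches for illustration.
batches = [
    Batch("B001", 1200, "keying", date(2024, 3, 4)),
    Batch("B002", 800, "tagging", date(2024, 3, 1)),
    Batch("B003", 500, "keying", date(2024, 3, 6)),
]
today = date(2024, 3, 11)

print(backlog_by_step(batches))
for batch in batches:
    print(batch.batch_id, batch.current_step, days_in_process(batch, today), "days")
```

A spreadsheet with the same four columns would do the same job on a smaller project.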
Materials Trafficking - It
is very rare that you have everything
that needs conversion ready
in a pile at the beginning of
a project. More likely, materials
will be readied gradually as
the project progresses. In order
to avoid slowdowns later in
the process, someone needs to
be in charge of trafficking
the materials, making sure that
materials are ready and complete,
and forwarding them appropriately.
Process Improvement Feedback
- It certainly won't be perfect
when you first go into production.
You will need a method to formally
collect information on exceptions
and on what's not working properly.
This method will need to be
quite flexible as different
parts of the process will report
exceptions at different times.
Packaging and Delivery - This
doesn't seem like a big deal,
but you need to get the finished
materials to the right person.
The right materials! Otherwise
frustration can set in. This
is also a convenient point at
which to do some final quality
checking, and to document any
specific procedures the person
you're delivering to needs to
follow.
Exception Handling Mechanisms
- You'll also have to allow
for exception reporting. Exception
reports are delivered to the
end user along with the completed
data. Because of the wide variance
and inconsistencies of the materials
being converted, there will
inevitably be materials that
need special handling by the
recipient. And it would seem
wise to have a mechanism more
sophisticated than yellow stickies
to deal with this.
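
As a step up from yellow stickies, exceptions can be logged as structured records and turned into a report that travels with each delivery. This is only a sketch: the record fields, the batch IDs, and the sample entries are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConversionException:
    batch_id: str
    location: str        # e.g. a file name and page, as reported by any step
    issue: str
    action_needed: str   # what the recipient has to do about it


def exception_report(batch_id, exceptions):
    """Build a plain-text report of the exceptions logged for one batch."""
    lines = [f"Exception report for batch {batch_id}"]
    matching = [exc for exc in exceptions if exc.batch_id == batch_id]
    if not matching:
        lines.append("No exceptions recorded.")
    for number, exc in enumerate(matching, start=1):
        lines.append(f"{number}. {exc.location}: {exc.issue} (action: {exc.action_needed})")
    return "\n".join(lines)


# Hypothetical log entries collected from different parts of the process.
log = [
    ConversionException("B001", "vol3.txt p. 12", "table split across pages", "verify merged table"),
    ConversionException("B002", "vol4.txt p. 3", "illegible formula", "supply original artwork"),
    ConversionException("B001", "vol3.txt p. 40", "missing cross-reference target", "confirm intended target"),
]

print(exception_report("B001", log))
```

Because each entry carries its own batch ID, different parts of the process can add to the log at different times and the report is still assembled in one place at delivery.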
Where do I go from
here?
There is a lot to absorb in this
article. If you only take six
things from what you've read,
take the following: |