Design of Experiments
How to Bridge Statistics and Chemical Engineering
Competition and increased demand for product innovation are placing unprecedented pressures on chemical manufacturing. In addition to a seemingly unquenchable appetite for new products and product variants, the industry as a whole is burdened with the high cost of research and development, leading to a near-constant search for lean and efficient ways of working. Though statistical analysis has not always gone hand in hand with chemical development, it can be a vital tool for accelerating the discovery and creation of viable new products, and for engineering the processes through which they can be delivered at scale. This marriage is the way to get things “right first time”, reducing development risk and relieving the pressures mentioned above.
Experimentation has always been a key part of product development, allowing the kinks in chemical and formulation processes to be ironed out. Valuable as such experimentation is, it is becoming increasingly apparent that the traditional one-factor-at-a-time approach is partly responsible for inefficient product and process development: As well as consuming a lot of resources, it is likely to miss some of the practically important effects that lead directly to later manufacturing inefficiencies and failed product launches.
Fortunately, we now have the tools to supercharge our approach to experimentation. This approach is well-established in other industries, but so far has not been widely adopted in chemical manufacturing, particularly in the case of specialty chemicals. By deploying this new approach throughout the development phase, it is now possible to design quality into the process for making new products at the outset, rather than suffering the impact of failed product launches, protracted time to market, and low manufacturing yields. Even though many revolutionary products have been, and will continue to be, discovered serendipitously, figuring out how to produce them consistently and with commercially viable yields is a natural problem for statistics to help solve.
The Need to Innovate
Design of experiments, DOE for short, is a systematic method for determining the relationship between the factors affecting a process and the output of that process. In the industrial setting there are usually many factors that might have an effect, and it is crucial that they be manipulated together, not one at a time. DOE has been used to find cause-and-effect relationships since Ronald Fisher first introduced it in 1935, and it has continued to evolve over more than eighty years. This has led to a series of widely applied design families adapted to specific situations and experimental objectives, while more modern, custom approaches mean that a design can be constructed to fit almost any situation. Software tools like JMP do the computational work, making it relatively easy for chemists, researchers and engineers to adopt this approach to experimentation.
If this adoption becomes widespread, it can cut the time required for research and development, helping R&D to support twice as many products, and bring them to market twice as fast. And because knowledge accumulates, researchers can innovate more predictably over time.
This is no small feat. At the moment, the R&D process in most labs is unpredictable. Missed project milestones and incomplete understanding mean that the processes for producing new products are likely to be transferred into manufacturing with issues still unresolved, in the expectation that the additional work needed will be done there. Even if this strategy succeeds, it is time-consuming and adds cost and waste. In the worst case, the product might be returned to R&D. Such rework takes away time that could be used to develop more new products. In industries like chemical manufacturing that need to innovate to remain competitive, this status quo is costly. The way to break this vicious circle is to build quality into products in R&D at the earliest possible opportunity using DOE.
Overcoming the Curse of Dimensionality
Effective information gathering is already in place in many R&D labs. However, whenever there is more than one input or factor affecting an outcome, testing one factor at a time is inefficient and risks missing the joint effect of two (or more) factors; such interactions are commonplace. To properly uncover how all the factors affect the response, DOE is required. Because of its versatility and ease of use, software like JMP lets chemists reveal and model the relationships between many factors and one or more outputs, or responses. Often the best approach is to change the factors according to a plan that maximizes the chances of determining a robust and cost-effective process that delivers the required product characteristics. Actively manipulating factors in this way is the best way to gain useful, new understanding.
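To make this concrete, the sketch below simulates a small two-level full factorial experiment in three coded factors and fits a model containing the two-factor interactions. It is a minimal illustration in Python rather than JMP output, and the factor names, effect sizes and noise level are hypothetical assumptions.

```python
import itertools
import numpy as np

# Two-level full factorial design in three coded factors (-1 / +1),
# e.g. temperature, catalyst loading and stir rate (hypothetical names).
design = np.array(list(itertools.product([-1, 1], repeat=3)), dtype=float)
x1, x2, x3 = design.T

# Hypothetical "true" process: yield depends on x1, x3 and an x1*x2 interaction.
rng = np.random.default_rng(1)
yield_pct = 70 + 5 * x1 + 3 * x3 + 4 * x1 * x2 + rng.normal(0, 0.5, size=len(design))

# Model matrix with an intercept, main effects and two-factor interactions.
X = np.column_stack([np.ones(len(design)), x1, x2, x3, x1 * x2, x1 * x3, x2 * x3])
coef, *_ = np.linalg.lstsq(X, yield_pct, rcond=None)

for name, b in zip(["intercept", "x1", "x2", "x3", "x1*x2", "x1*x3", "x2*x3"], coef):
    print(f"{name:9s} {b:6.2f}")
# The x1*x2 interaction is estimable because x1 and x2 were varied together;
# a one-factor-at-a-time plan could not separate it from the main effects.
```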
In statistics and machine learning, dimension reduction is the process of reducing the number of variables under consideration by obtaining a set of principal variables that still contains most of the information carried by the originals. While this technique has not traditionally been part of the product development and testing process, it can be especially useful in conjunction with DOE. For example, in drug development, many of the new products in R&D use starting materials that come from human beings or other animal subjects, and there is often only a short list of qualified donors. Thanks to advances in genomics and the characterization of the microbiome, it is easy to generate a very long list of measured properties of any given sample from each of those subjects. The result is a very large set of measurements on a very small number of subjects. Testing all of these in the lab would be costly and time-consuming, but dimension reduction can streamline the analysis.
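As a rough illustration of the idea, the following Python sketch reduces a simulated "wide" table of measurements (many properties, few subjects) to a handful of principal components via a singular value decomposition. The subject and property counts, and the 90% variance threshold, are arbitrary assumptions, not values from the example above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_properties = 8, 500          # far more columns than rows
X = rng.normal(size=(n_subjects, n_properties))

# Center each measured property, then take the SVD of the data matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Keep enough principal components to explain ~90% of the variance.
explained = (s ** 2) / np.sum(s ** 2)
k = int(np.searchsorted(np.cumsum(explained), 0.90)) + 1
scores = U[:, :k] * s[:k]                   # subjects in the reduced space

print(f"{n_properties} properties reduced to {k} components")
print("variance explained:", np.round(np.cumsum(explained)[k - 1], 3))
```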
Increasing Efficiency and Effectiveness
In a real example from 10 years ago, a team manufacturing vaccines was faced with the challenge of evaluating around 1,500 parameters affecting nine key quality attributes, measured on a smaller number of manufactured lots. Their task was to identify what was causing some unanticipated results in manufacturing. The team included a chemometrician, a mathematician and a number of statisticians who, after several weeks of analysis, solved the problem and were able to recommend the changes needed to safeguard the supply of the vaccine in question. The computational challenges were tackled by a large team working tirelessly over several weeks and at great cost: Today, using tools like JMP, the same analysis problem can be solved by a single researcher in 30 minutes.
This is not to say that the work of this team of highly skilled professionals wasn’t valuable, but rather that modern approaches and supporting software now free their time and expertise to work on increasing yield and developing more new products. The overall goal is to increase the efficiency and effectiveness of chemical R&D, bringing more new products to market more quickly and at lower cost. To reach that goal, R&D will have to deliver its desired outcomes with fewer resources and in more sustainable ways. Such sustainability and reproducibility are a challenge for the chemistry-using industries, and the widespread, consistent adoption of DOE can be a huge enabler.
Powerful Data Management
DOE also allows chemists to confront longstanding issues and finally determine the relationships that bring them closer to understanding the true cause and effect at work. The resulting model is an important tool for consensus building, because what shines through is the result: A ranking of all factors ordered by priority, challenging the preconceptions and biases of all those involved. This kind of screening of many predictors lets chemists choose how to move forward in a controlled way, and that is where the real benefit of such an analysis lies. With the addition of contextual knowledge, these analyses are usually very helpful in re-engineering the process so that it performs better.
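A small illustrative snippet of the ranking step described above: once effects have been estimated (the values here are made up), sorting them by absolute size gives the priority ordering that challenges preconceptions.

```python
# Hypothetical estimated effects from a screening experiment.
effects = {"temperature": 5.1, "catalyst*temperature": 4.2,
           "stir rate": 3.0, "catalyst": 0.4, "pH": -0.3}

# Rank terms by the absolute size of their estimated effect on the response.
for name, value in sorted(effects.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name:24s} {value:6.2f}")
```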
By its very nature, chemical R&D produces a lot of data. Yet, even today, much of it is still recorded on paper. It can be time-consuming to digitize swathes of past records into a data management system, but the investment can be very worthwhile. Statistical analysis software can sort, clean and organize this data quickly and efficiently, and prepare it for analysis. Data management can be daunting for the uninitiated, but if the data is collected and prepared appropriately it will save time further down the line. There are also growing regulatory guidelines about appropriate data processing that need to be taken into account.
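As a sketch of what this preparation might look like in practice, the following Python/pandas example tidies a hypothetical file of digitized lab records; the file name and column names are placeholders, not a prescribed schema.

```python
import pandas as pd

# Hypothetical export of digitized lab records.
records = pd.read_csv("digitized_lab_records.csv")

# Tidy column names and obvious data-entry issues before analysis.
records.columns = records.columns.str.strip().str.lower().str.replace(" ", "_")
records = records.drop_duplicates()
records["batch_date"] = pd.to_datetime(records["batch_date"], errors="coerce")
records["yield_pct"] = pd.to_numeric(records["yield_pct"], errors="coerce")

# Keep only rows with a usable response, sorted by date for later modelling.
clean = records.dropna(subset=["yield_pct"]).sort_values("batch_date")
clean.to_csv("lab_records_clean.csv", index=False)
```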
A Question of Mindset
In the USA, the Food and Drug Administration (FDA) has launched a number of initiatives that aim to support the modernization of the pharmaceutical and biopharmaceutical industry sectors. At the heart of these initiatives is a shift away from a conformance-based approach to quality towards one that uses process understanding to manage risk and deliver better predictability.
Appropriate training and know-how remain a big stumbling block for industry as a whole: Many chemists aren’t necessarily equipped to work with statistics, have not been exposed to DOE, and may not have had the opportunity to work with software like JMP that can give them the support they need.
More promisingly, it is not uncommon to hear from scientists who were once reluctant to use DOE but who become firm advocates once they see how they can quickly find solutions for problems they had been working on.
DOE Is Becoming Mainstream
Things are moving in a positive direction. Over the past 20 years, and taking industry as a whole, the use of DOE in product development has moved steadily from the fringes to the mainstream. Today it is evident that many R&D professionals are routinely turning to DOE to help answer their questions. Such is the power of DOE to gain the required process understanding that in regulated industries, the regulators may look more favorably on submissions and assessments that explicitly incorporate DOE.
Given this demonstrable success, the wide adoption of DOE in R&D by the chemical-based industries should be seen as both necessary and strategic. Companies in these sectors should actively invest in developing their DOE capability and work towards instituting more comprehensive data collection schemes that allow them to better understand their processes across the whole product lifecycle. This requires effort in terms of training and investment, but it will allow them to survive and prosper.
Contact
JMP Statistical Discovery / SAS Institute GmbH
In der Neckarhelle 162
69118 Heidelberg
Germany
+49 (0)6221 415 3367