How to test AI/ML-based systems in DesignWise
Learn which phases of the AI development lifecycle benefit the most from DesignWise optimization.
Artificial Intelligence is transforming the technology landscape of the digital age. Adoption of AI-powered smart systems is expected to grow exponentially over the next few years, and alongside these advancements comes a key challenge: testing Artificial Intelligence/Machine Learning (AI/ML)-based systems.
There are 3 major challenges in testing AI systems:
DesignWise cannot do much about the first one, so we will talk primarily about the benefits related to the availability and quality of data. After all, a commonly cited estimate is that 80% of a data scientist's time is spent preparing the training dataset.
We will use the phase classification from Forbes:
DesignWise applicability to QA in different phases of AI development:

| Phase of AI development | DesignWise applicability |
| --- | --- |
| AI algorithm itself | Low |
| Hyperparameter configuration | Low |
| Training, validation, test data | Medium |
| Integration of the AI system with other workflow elements | High |
The rest of the article covers phases 2-4 in more detail. In phase 1, significant customization of the algorithm code is uncommon; to borrow a quote from Ron Schmelzer, "There's just one way to do the math!" The core DesignWise value proposition of exploring possible combinations is therefore less relevant there (i.e., low applicability due to the "linear" nature of the operations).
Phase 2: Hyperparameter configuration
The specific ranges and value expansions on the screenshot are for illustration purposes, but they should sufficiently communicate the essence of the approach. Beyond that, constraints and risk-based algorithm settings can be used to control the desired interactions:
Strength: Systematic approach to identifying relevant hyperparameter configuration profiles.
Weakness: May explore the profiles with too many changes at a time or require numerous constraints to limit the scope.
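DesignWise generates these combinations internally, but the underlying pairwise idea can be sketched in a few lines of Python. Below is a minimal greedy 2-way generator over a hypothetical hyperparameter model; all names and ranges are invented for illustration, and this is not DesignWise's actual algorithm:

```python
from itertools import combinations, product

# Hypothetical hyperparameter model (parameters and values, as they
# would be defined in the DesignWise UI).
params = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [32, 64, 128],
    "optimizer": ["sgd", "adam"],
    "dropout": [0.0, 0.2, 0.5],
}
names = list(params)

def covered_pairs(config):
    """All (parameter, value) pairs exercised by one configuration."""
    return set(combinations(zip(names, config), 2))

# Every 2-way interaction that needs to be covered.
uncovered = set()
for a, b in combinations(names, 2):
    for va, vb in product(params[a], params[b]):
        uncovered.add(((a, va), (b, vb)))

# Greedy selection: repeatedly pick the exhaustive candidate covering
# the most still-uncovered pairs until every interaction is covered.
candidates = list(product(*params.values()))
suite = []
while uncovered:
    best = max(candidates, key=lambda c: len(covered_pairs(c) & uncovered))
    suite.append(dict(zip(names, best)))
    uncovered -= covered_pairs(best)

print(len(candidates), "exhaustive configs reduced to", len(suite), "pairwise configs")
```

Constraints of the kind mentioned above could be modeled here by filtering invalid tuples out of `candidates` before the greedy loop, which mirrors how constraint rules limit the generated scope in DesignWise.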
Phase 3: Training, validation, and test data
To build the corresponding model in DesignWise, you will need to temporarily set aside some of the usual lessons about parameter and value definitions. Instead of minimizing the scenario count, the goal of this data set is to be a representative sample of the real world and to eliminate as much human bias as possible. That means not just data quality, but also completeness.
When it comes to the DesignWise algorithm strength selection, the highest available option is typically the most desired one (see the caveat in the “Weakness” below):
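The cost of raising the strength can be seen with a quick back-of-the-envelope count. This sketch (the parameter domain sizes are hypothetical) shows how many distinct interaction tuples each strength level must cover; the top strength is equivalent to all permutations:

```python
from itertools import combinations
from math import prod

# Hypothetical model: domain sizes of six parameters.
domain_sizes = [4, 3, 3, 3, 2, 5]

def interactions(sizes, strength):
    """Distinct value tuples across every parameter subset of the given strength."""
    return sum(prod(c) for c in combinations(sizes, strength))

for t in range(1, len(domain_sizes) + 1):
    print(f"{t}-way interactions to cover: {interactions(domain_sizes, t)}")
```

The count grows quickly with strength, which is why the highest setting is thorough but expensive, as the Weakness below elaborates.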
Weakness:
- The current scope limit is 4,000 scenarios per DesignWise model, which may not be sufficient for training (or even validation) purposes of some AI systems. As a side note, while "all possible permutations" is a nice goal, it is often not the optimal one: even for representative purposes, training on 289,700,167,680,000 scenarios (the possible total for the model above) is not realistic. So the "right" answer still requires balance and prioritization.
- Despite certain workarounds, programmatic handling of complex expected results would likely require complementary manual effort.
- The approach depends on the overall ability to leverage synthetic data instead of production copies, which may or may not be feasible in your environment.
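As one illustration of that balance, here is a small Python sketch (field names and values are invented, not taken from the article's model) that generates a bounded synthetic slice while still guaranteeing that every individual value appears at least once:

```python
import random

# Hypothetical data model for a synthetic training slice.
model = {
    "age_band": ["18-25", "26-40", "41-65", "65+"],
    "region": ["NA", "EU", "APAC"],
    "account_type": ["free", "pro", "enterprise"],
    "channel": ["web", "mobile", "api"],
}

random.seed(7)
rows = 12  # budget far below the 108 full permutations (4*3*3*3)

# Cycle each parameter's values so every value appears at least once,
# then shuffle each column independently to avoid correlated fields.
columns = {}
for name, values in model.items():
    col = [values[i % len(values)] for i in range(rows)]
    random.shuffle(col)
    columns[name] = col

sample = [{name: columns[name][i] for name in model} for i in range(rows)]
```

This only guarantees 1-way (single-value) completeness; covering pairs or triples of values, as DesignWise's strength settings do, requires correspondingly more rows.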
Phase 4: Integration of the AI system with other workflow elements
Scenario volume is still largely driven by the "standard" integration priorities (i.e., key parameters affecting multiple systems), but the number of values and/or the average mixed-strength dropdown selection will be higher than typical.
Strength:
- DesignWise at its best, with thoroughness, speed, and efficiency benefits.
- Ability to quickly reuse model elements from Phase 3 and from models related to other systems (e.g., the old version of the non-AI advisor for systems B and C).
- Higher control over the variety of data at the integration points and over the workflow as a whole.
Weakness: Similar to Phase 3, but usually more manageable given the difference in goals (data volume in Phase 3 vs. integration coverage in Phase 4).
Conclusion
To summarize, the applicability level by phase is repeated below:
| Phase of AI development | DesignWise applicability |
| --- | --- |
| AI algorithm itself | Low |
| Hyperparameter configuration | Low |
| Training, validation, test data | Medium |
| Integration of the AI system with other workflow elements | High |
For another perspective, using this stage classification from Infosys, DesignWise can deliver the most significant benefits in the highlighted testing areas:
Fully testing these kinds of applications is therefore not feasible. To overcome this challenge, we need to think more critically about a systematic, risk-based test design approach, such as the one that DesignWise facilitates.