The platform starts by analyzing the quality and representativeness of the data through a combination of machine-learning algorithms and statistical analysis, producing a comprehensive overview of data quality. For representativeness, we use unsupervised learning algorithms to automatically detect the structure of the data, allowing the platform to build a dictionary of representative data.
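As an illustration of this first step, the sketch below clusters a dataset and keeps the point nearest each cluster centre as its representative. K-means and the synthetic data are assumptions for the example; the text does not specify which unsupervised algorithms the platform uses.

```python
# Sketch: detect the structure of the data with unsupervised clustering,
# then keep one representative point per cluster as a "dictionary" of
# representative data. KMeans is an illustrative choice, not the
# platform's actual algorithm.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic data with three modes, standing in for a real dataset.
data = np.vstack([
    rng.normal(0.0, 0.5, size=(100, 2)),
    rng.normal(5.0, 0.5, size=(100, 2)),
    rng.normal(10.0, 0.5, size=(100, 2)),
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(data)

# Dictionary of representative data: the observation closest to each centroid.
representatives = {}
for k, center in enumerate(kmeans.cluster_centers_):
    idx = np.argmin(np.linalg.norm(data - center, axis=1))
    representatives[k] = data[idx]

print(len(representatives))
```

The cluster labels produced here are exactly what the second step needs: performance can then be broken down per cluster rather than over the dataset as a whole.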
In a second step, we analyze how well the algorithm generalizes by automatically computing the performance of its forecasts and classifications over each of the data clusters identified in the first step.
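The idea can be sketched as follows: score the model separately on each cluster, so that a large spread in per-cluster performance flags poor generalization. The cluster labels, outcomes, and predictions below are synthetic placeholders.

```python
# Sketch of the second step: per-cluster performance of a classifier.
# All inputs are synthetic stand-ins for real model output.
import numpy as np

rng = np.random.default_rng(1)
n = 300
clusters = rng.integers(0, 3, size=n)            # cluster id from step one
y_true = rng.integers(0, 2, size=n)              # observed outcomes
y_pred = np.where(rng.random(n) < 0.8, y_true, 1 - y_true)  # model predictions

# Accuracy per cluster: a wide spread across clusters signals that the
# model generalizes poorly to some regions of the data.
per_cluster = {
    int(k): float(np.mean(y_pred[clusters == k] == y_true[clusters == k]))
    for k in np.unique(clusters)
}
print(per_cluster)
```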
In a third step, we build a set of challenger models, using approaches such as random forests and convolutional neural networks, so that the performance of competing modelling choices can be compared in a systematic fashion.
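A minimal version of this challenger comparison, assuming a tabular classification task: fit several candidate models on the same training split and score them on the same held-out data. A random forest matches the text; a logistic regression stands in as a simple baseline (the convolutional neural networks the text also mentions are omitted to keep the sketch self-contained).

```python
# Sketch of the third step: a systematic comparison of challenger models
# on a shared train/test split. The dataset is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

challengers = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}
# Same split for every challenger, so the scores are directly comparable.
scores = {name: model.fit(X_tr, y_tr).score(X_te, y_te)
          for name, model in challengers.items()}
print(scores)
```

Because every challenger sees identical data, any performance gap reflects the modelling approach rather than the sample.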
The three steps described above can also be applied to PPNR (pre-provision net revenue) model monitoring and back-testing, where the Yields.io platform analyzes all available data on a continuous basis, thereby minimizing the integration effort.
For a more complete validation, the model also has to be analyzed on alternative datasets. To support this, Yields.io contains a diverse set of scenario generators capable of handling both discrete and continuous data. Our approach allows us to sample the space of possible datasets as completely as possible, thereby mapping out the performance of the model under a wide range of conditions.
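One simple way to generate such alternative datasets, sketched below under assumptions of our own (the platform's actual generators are not described in the text): bootstrap-resample the rows, which handles discrete columns naturally, and add a small multiplicative perturbation to the continuous columns.

```python
# Sketch of scenario generation: draw alternative datasets by bootstrap
# resampling plus noise on continuous fields. Column names, sizes, and the
# noise model are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
continuous = rng.normal(100.0, 15.0, size=500)              # e.g. an exposure
discrete = rng.choice(["AAA", "AA", "A"], size=500, p=[0.2, 0.3, 0.5])

def generate_scenario(cont, disc, rng, noise_scale=0.05):
    """Draw one alternative dataset of the same size as the original."""
    idx = rng.integers(0, len(cont), size=len(cont))  # bootstrap row indices
    # Resampling rows jointly preserves the dependence between columns;
    # the continuous column gets an extra multiplicative perturbation.
    cont_new = cont[idx] * (1.0 + rng.normal(0.0, noise_scale, len(cont)))
    return cont_new, disc[idx]

scenarios = [generate_scenario(continuous, discrete, rng) for _ in range(10)]
print(len(scenarios))
```

Re-running the per-cluster evaluation on each generated dataset then maps out how the model's performance varies across conditions.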
All the generated data is available via dynamic dashboards and can also be used to produce static reports that serve as a starting point for model validation documents.