Types of Performance Models & Data Requirements For Performance Models
To model the performance of a system, you need the relevant data. The data you require will vary depending on the kind of Performance Modelling you intend to perform. Let’s briefly look at two types of questions one might want to answer and the data requirements of the associated Performance Models.
How many orders can we process per hour before we run out of capacity on our boxes? This is an application-related question: how many orders can we currently process before we run out of capacity on our existing boxes? As an architect or performance engineer, answering it requires you to understand the relevant business processes (at a sufficiently high level) supported by the application on the given system, and the data those processes generate on the system, including how to obtain that data, i.e. orders processed per unit of time.
Let’s assume, for the purposes of this example, that the Orders Placed business process, along with its supporting processes (shopping cart, add to trolley, view items, etc.), is responsible for consuming the compute resources on the system. You should then find some relationship between the utilization of the system and the number of transactions processed on it, including a strong relationship between the utilization of the system and the number of orders processed per unit time. At minimum, your data requirements would be:
- Orders / Unit Time
- Utilization / Unit Time
- Transactions / Unit Time
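A minimal sketch of how the Orders / Unit Time and Utilization / Unit Time series above could be related, assuming utilization grows roughly linearly with orders per hour and that each (orders, utilization) pair comes from the same hourly interval. It uses a plain ordinary least squares fit, checks the strength of the relationship with a Pearson correlation, and solves for the order rate at which the fitted line reaches a utilization ceiling. The function names, the 80% CPU ceiling and the sample numbers are all hypothetical and purely illustrative; real measurements will be noisier, and Transactions / Unit Time could be added as a second predictor in a multiple regression (not shown here).

```python
from statistics import fmean


def fit_linear(xs, ys):
    """Ordinary least squares fit of ys = intercept + slope * xs."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


def correlation(xs, ys):
    """Pearson correlation; values close to 1.0 indicate a strong linear relationship."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


def max_orders_per_hour(intercept, slope, utilization_ceiling=80.0):
    """Orders/hour at which the fitted line reaches the (hypothetical) utilization ceiling in %."""
    return (utilization_ceiling - intercept) / slope


# Hypothetical hourly observations (illustrative values, not real measurements).
orders_per_hour = [1000, 1500, 2000, 2500]
cpu_utilization_pct = [30.0, 40.0, 50.0, 60.0]

a, b = fit_linear(orders_per_hour, cpu_utilization_pct)
print(f"r = {correlation(orders_per_hour, cpu_utilization_pct):.3f}")
print(f"utilization ~= {a:.1f} + {b:.4f} * orders/hour")
print(f"capacity at 80% CPU ~= {max_orders_per_hour(a, b):.0f} orders/hour")
```

The ceiling is deliberately below 100%, since response times usually degrade well before a box is fully saturated; pick a threshold that suits your own service levels.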
Advantages – Using Time Series Regression to build Performance Models is a sensible and reliable way to understand how system performance will respond to future growth in workload. These statistical models help identify the relationship between two system variables, e.g. Orders / Unit Time and Utilization / Unit Time, and use that relationship to understand system behavior as the workload grows.
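Once a relationship like this has been fitted, it can be used to ask "what if" questions about workload growth. The sketch below is a minimal illustration, assuming a hypothetical fitted model (utilization % = 10 + 0.02 × orders/hour), a hypothetical starting point of 2,000 orders/hour, compound growth of 10% per quarter and an 80% CPU planning ceiling. None of these figures come from a real system; `project_utilization` is a hypothetical helper.

```python
def project_utilization(intercept, slope, current_orders, growth_rate, periods):
    """Apply compound workload growth and map each period's orders to utilization (%)."""
    projections = []
    orders = current_orders
    for period in range(1, periods + 1):
        orders *= 1 + growth_rate
        projections.append((period, orders, intercept + slope * orders))
    return projections


# Hypothetical fitted model and growth assumptions (illustrative only).
INTERCEPT, SLOPE = 10.0, 0.02  # utilization % = 10 + 0.02 * orders/hour
CEILING = 80.0                 # assumed planning threshold, % CPU

for period, orders, util in project_utilization(INTERCEPT, SLOPE, 2000, 0.10, 8):
    flag = "  <-- exceeds ceiling" if util > CEILING else ""
    print(f"quarter {period}: {orders:7.0f} orders/hour -> {util:5.1f}% CPU{flag}")
```

The projection is only as good as the fitted relationship: it assumes the system keeps scaling linearly at higher loads, which real systems often stop doing as they approach saturation.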
Disadvantages – The main disadvantage of such a statistical modelling technique is the amount of data required to build the models. Collecting data can be a painful and very time-consuming task. Also keep in mind that any change to the system configuration (hardware, software, network, etc.) will render your model obsolete.
How long before we run out of space on our storage subsystem? In this case, you are looking to obtain a view of the growth in data stored on the disk subsystem, so that you can plan to provision additional storage. There are a couple of different ways to approach this question. One is to look at it from an application standpoint: assess the data generated by the relevant business processes and then build a model of the data generated vs. the current storage consumption.
In reality, such a model is very difficult to build, and most performance engineers and capacity planners will default to performing a Time Series Forecast of the growth in data storage requirements. Time Series Forecasts have their own limitations (which we will not delve into here); however, at minimum, the data required for such a performance model would be:
- Data Consumption / Unit Time
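A minimal sketch of the simplest kind of Time Series Forecast for this question: fit a straight-line trend to equally spaced storage readings and extrapolate it to the point where the subsystem is full. This is plain linear trend extrapolation, chosen here for illustration; it is not the only or necessarily the best forecasting method (seasonal or exponential-smoothing models are common alternatives). The weekly readings in TB and the 60 TB capacity are hypothetical, and the function names are illustrative. At least two readings are needed.

```python
from statistics import fmean


def linear_trend(values):
    """Least-squares trend over equally spaced observations at t = 0..n-1."""
    ts = range(len(values))
    mt, mv = fmean(ts), fmean(values)
    sxx = sum((t - mt) ** 2 for t in ts)
    slope = sum((t - mt) * (v - mv) for t, v in zip(ts, values)) / sxx
    return mv - slope * mt, slope  # (intercept at t = 0, growth per period)


def periods_until_full(values, capacity):
    """Periods after the last reading until the trend reaches capacity; None if not growing."""
    intercept, slope = linear_trend(values)
    if slope <= 0:
        return None
    t_full = (capacity - intercept) / slope
    return max(0.0, t_full - (len(values) - 1))


# Hypothetical weekly used-storage readings in TB (illustrative only).
used_tb = [40.0, 41.5, 43.0, 44.5, 46.0, 47.5]
weeks_left = periods_until_full(used_tb, capacity=60.0)
print(f"~{weeks_left:.1f} weeks until the 60 TB subsystem is full")
```

In practice you would also leave headroom, e.g. by forecasting the date at which usage crosses a threshold such as 85% of capacity rather than 100%, to allow time for procurement.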
Advantages – The advantage of using Time Series Forecasting is that you can obtain a view of future system behavior based on its past performance. These models are relatively easy to get up and running, assuming you have the relevant historical data at hand. The more granular the data and the larger the historical data set, the better the chance of a strong forecast.
Disadvantages – The main disadvantage of such a statistical modelling technique is the amount of historical data required to build the models. Collecting data can be a painful and very time-consuming task. You should also keep in mind that any change to the system configuration (hardware, software, network, etc.) will render your model obsolete. One of the biggest concerns that forecasters, performance engineers, architects, developers, etc. have with Time Series modelling is that it forecasts the future based on knowledge of the past, which can sometimes be very dangerous.
Conclusion – There are many possible permutations and combinations; what we’ve attempted to do here is pick two of the most common ones to illustrate the data requirements you might have. Please note that your data requirements will be strongly driven by the nature of the performance models you intend to build, including the type of insight you are looking to gain with regard to the behavior (Performance, Scalability, etc.) of the system at hand.
Modelling Solution: VisualizeIT offers access to a range of Analytical, Statistical and Simulation Models for Visualization, Modelling & Forecasting. Access to all the Analytical (Mathematical) models is free; we recommend you try them out at VisualizeIT and drop us a note with your suggestions, input and comments. The VisualizeIT website and the VisualizeIT modelling solution are both available at VisualizeIT.