
Layers of Scenarios and a Reply on Randomness


Gojko Adzic asked for opinions in his blog post “How to Specify Something Should Be Random”. He presented alternative scenarios and asked for your choice.

The situation he described is having a robot’s chat response appear as if it comes from a human being. There could be multiple levels of tests associated with this requirement. Let’s take a look at these levels, and along the way I’ll give my choice(s) for an answer.

Let’s start with a high level scenario for the functionality itself:

Scenario: Robot types reply that seems like it was typed by a human
Given a robot receives a message
When the robot types a reply
Then the reply has characteristics that mimic a human being

There may be some exploration associated with this scenario to determine what those characteristics are. They could include pauses between characters, backspacing, quick word completion (simulating auto-completion), and so forth. This scenario might be tested manually: the bot would type a response, and then a human would state whether it looked “robotish” or “humanish”. Suppose it was determined that pauses were a characteristic to be implemented:

Scenario: Robot types reply that has pauses
Given a robot receives a message
When the robot types a reply
Then the reply has pauses that mimic a human being

For some teams, this might be enough. The timing of the pauses would be handled in the implementation. For others more detail could be included that describes what particularly mimics a human being. This would specify the results of the exploration.

Scenario: Robot types reply that has random pauses
Given a robot receives a message
When the robot types a reply
Then the reply has pauses between characters that randomly vary between 0.2 and 0.5 seconds

Testing this scenario would probably require automation to measure the delays between characters appearing on the screen. The test would just check that the pauses fell within that range and that they were not all the same.
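A minimal sketch of such a check in Python (the pause values here are hypothetical; a real test would compute them from timestamps captured as characters appear on the screen):

```python
def check_pauses(pauses, low=0.2, high=0.5):
    """Check that measured pauses fall within [low, high] and are not all identical."""
    assert all(low <= p <= high for p in pauses), "pause outside allowed range"
    assert len(set(pauses)) > 1, "pauses are all identical -- not varying"

# Hypothetical pauses measured between the characters of one reply
check_pauses([0.21, 0.47, 0.33, 0.38, 0.29])
```

Note that “not all the same” is a very weak randomness check; stronger checks come next.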

One tricky issue is the meaning of random. Checking randomness involves technical tests. Wikipedia has some details on randomness tests.  These tests check that a sequence is random to some level of confidence.

At the high level, the randomness could be checked by using a large number of replies and testing the set of pauses that were inserted in the replies.

Scenario: Robot types random pauses over a large set of replies
Given a large set of replies
When the robot types them
Then pauses between characters are random with a confidence level of 99%

Now this scenario and the previous one cover the external behavior for the requirement. In answer to Gojko’s question, I would have both of these scenarios to cover the requirement. They represent behavior that is externally visible.


As an internal test, one could run the randomness tests against the method that produced a random sequence, if the creator had not already done so.

Scenario: Pseudo-random number generator produces a random sequence
Given a pseudo-random number generator that produces values from 0.0 to 1.0
When a long sequence is produced
Then it is random with a confidence level of 99%
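One widely used check of this kind is a chi-squared uniformity test: bucket the generated values and compare the bucket counts against the uniform expectation. A sketch using only the Python standard library, with `random.random()` standing in for the generator under test (the 99% critical value for 9 degrees of freedom comes from a standard chi-squared table):

```python
import random

def chi_squared_uniform(values, bins=10):
    """Chi-squared statistic for uniformity of values in [0.0, 1.0)."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    expected = len(values) / bins
    return sum((c - expected) ** 2 / expected for c in counts)

# 99% critical value for bins - 1 = 9 degrees of freedom
CRITICAL_99 = 21.666

# In a real test this would sample the generator under test;
# seeded here so the example is repeatable
random.seed(2020)
stat = chi_squared_uniform([random.random() for _ in range(10_000)])
print(f"statistic {stat:.2f}; passes if below {CRITICAL_99}")
```

Keep in mind uniformity is necessary but not sufficient for randomness; the test suites on the Wikipedia page also check independence, runs, and so on.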

As a general rule, values that are random should have a test double (mock) that produces a known output for testing. These values for the test double can be set up in a Given. For example, here’s a test that checks the conversion to a range is correct:

Scenario: Convert random sequence to pause sequence
Given random sequence <value>
When converted to pause from 0.2 to 0.5
Then pause length is <length>
| value | length | Notes   |
| 1.0   | .5 s   | maximum |
| 0.9   | .47 s  |         |
| 0.4   | .32 s  |         |
| 0.6   | .38 s  |         |
| 0.0   | .2 s   | minimum |
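The conversion itself is a linear mapping of the generator’s [0.0, 1.0] output onto the pause range. A sketch that reproduces the table values (the function name is mine):

```python
def to_pause(value, low=0.2, high=0.5):
    """Map a random value in [0.0, 1.0] linearly onto the pause range [low, high]."""
    return low + value * (high - low)

# The rows of the table above
for value, expected in [(1.0, 0.5), (0.9, 0.47), (0.4, 0.32), (0.6, 0.38), (0.0, 0.2)]:
    assert abs(to_pause(value) - expected) < 1e-9
```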

Another internal test could check the plumbing between the pause sequence method and the reply functionality.

Scenario: Check that random sequence produces appropriate pauses
Given random sequence is:
| value |
| 1.0   |
| 0.9   |
| 0.4   |
| 0.6   |
| 0.0   |
When bot replies “hello”
Then the pauses between characters are:
| char | pause |
| h    | .5 s  |
| e    | .47 s |
| l    | .32 s |
| l    | .38 s |
| o    | .2 s  |
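Automating this plumbing test means injecting the stubbed sequence into the bot. A sketch assuming a hypothetical `Bot` class that accepts its number source as a constructor argument:

```python
class Bot:
    """Hypothetical bot that pauses between characters using an injected number source."""
    def __init__(self, numbers):
        self._numbers = iter(numbers)

    def reply(self, text):
        # Pair each character with a pause converted from the next injected value,
        # using the linear mapping onto [0.2, 0.5]
        return [(ch, round(0.2 + next(self._numbers) * 0.3, 2)) for ch in text]

bot = Bot([1.0, 0.9, 0.4, 0.6, 0.0])
assert bot.reply("hello") == [
    ("h", 0.5), ("e", 0.47), ("l", 0.32), ("l", 0.38), ("o", 0.2)
]
```

Injecting the source through the constructor is what makes the same code testable with a double and runnable with a real generator in production.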


I suggest you have at least the two external scenarios that show a single reply and also check randomness on a series of replies. Depending on the complexity of the implementation, you might have the other scenarios as internal tests for methods or combinations of methods.

Decompose Scenarios into Simpler Scenarios

A blog question on relative dates by Gojko Adzic triggered a blog post by Seb Rose. The two blog posts showed there are many shades of Gherkin. I’d like to use the example in those two posts to demonstrate a couple of facets of scenario decomposition. This uses a slightly different shade than Seb’s.


Let’s start with a scenario that might have been created during the discovery phase for this story. The example revolves around credit card transactions that reserve an amount of money.   If those transactions are not finalized by an actual charge within a certain amount of time, they are cancelled.   The scenarios might look like:

Scenario: Transaction aged by one month or less remains pending
Given a transaction received one month ago
When batch is processed
Then transaction remains pending

Scenario: Transaction aged over one month is cancelled
Given a transaction received over one month ago
When batch is processed
Then transaction is cancelled

These scenarios assume that the customer knows about the daily batch job or else the When might be reworded to a more customer-understood term.     


When the scenario from discovery is explored in the formulation phase, then additional scenarios may be created that capture more detail of behavior.  I differentiate between flow scenarios (like the one above) that have state (e.g. a transaction that must be created) and calculation scenarios where an algorithm or business rule yields a result.   Calculation scenarios tend to have more (or much more) detail.  

There could be some separation of behavior in the cancellation rule. It could be split into a calculation of the difference between two dates and a determination of the action based on that difference. This separation allows for more re-use in different scenarios, just like separation in code. If the date difference calculation was previously used in an application, then there should already be a scenario/test for it. If not, you collaborate on creating one:

Scenario:  Calculation - Difference in Two Dates
* Difference in Months and Days between Date and Another Date
| Date        | Another Date | Difference    | Notes                                |
| 20-Feb-2020 | 20-Mar-2020  | 1 month 0 day | common                               |
| 19-Feb-2020 | 20-Mar-2020  | 1 month 1 day | "                                    |
| 29-Feb-2020 | 31-Mar-2020  | 1 month 0 day | previous month does not have the day |
| 29-Feb-2020 | 30-Mar-2020  | 1 month 0 day | "                                    |
| 29-Feb-2020 | 29-Mar-2020  | 1 month 0 day | "                                    |
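One possible implementation consistent with the table counts whole months backward from the later date, clamping to the last day of months that are too short. A sketch (this is my interpretation of the rule, not code from either blog post):

```python
import calendar
from datetime import date

def minus_one_month(d):
    """Step back one calendar month, clamping to the last day when needed."""
    year, month = (d.year, d.month - 1) if d.month > 1 else (d.year - 1, 12)
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def month_day_difference(earlier, later):
    """Difference as (months, days), counting whole months back from the later date."""
    months, cursor = 0, later
    while minus_one_month(cursor) >= earlier:
        cursor = minus_one_month(cursor)
        months += 1
    return months, (cursor - earlier).days

# The rows of the table above
assert month_day_difference(date(2020, 2, 20), date(2020, 3, 20)) == (1, 0)
assert month_day_difference(date(2020, 2, 19), date(2020, 3, 20)) == (1, 1)
assert month_day_difference(date(2020, 2, 29), date(2020, 3, 31)) == (1, 0)
```

Stepping backward from the later date is what produces the asymmetry discussed below: 31-Mar-2020 minus one month clamps to 29-Feb-2020, even though one month after 29-Feb-2020 is 29-Mar-2020.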

I typically have a Notes column on every table.  It can explain why a particular set of values is being used.  That’s handy when you re-visit a scenario after enough time has passed that the reason has faded. 

I also suggest the description of the calculation be put in comments with the scenario.   Part of it might read:

#If the previous month does not contain the day, then use the highest day available

This calculation can get rather complicated and messy. Take a look at Seb’s blog for even more examples.   Notice that the calculation is asymmetric.    One month after 29-Feb-2020 is 29-Mar-2020.   Once the triad starts creating this scenario, the customer might alter the requirement to something simpler, like cancel “over 30 days”, unless there was a legal requirement for the complexity.   In that case, the calculation scenario should be reviewed by the subject matter expert to ensure that it meets that requirement.     

With the difference in dates separate, the cancellation rule can be stated as:

Rule:  Cancel transactions aged over one month
| Difference    | Category       | Action         |        
| 1 month 0 day | one_month      | Remain pending |
| 1 month 1 day | over_one_month | Cancel         |

Examples with data for a scenario could be:

| Trans Date  | Batch Date  | Category       | Action         |
| 20-Feb-2020 | 20-Mar-2020 | one_month      | Remain pending |
| 19-Feb-2020 | 20-Mar-2020 | over_one_month | Cancel         |

Note that this business rule uses the result of the difference calculation. Each rule/calculation can be simpler, as each deals with fewer of the details. The examples of this rule utilize one of the examples in the difference calculation rule. Note that the names of the categories should reflect the customer’s terminology.
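As a sketch, the rule layer on top of the difference calculation could be as thin as this (Python, with names following the category column above):

```python
def categorize(months, days):
    """Classify a (months, days) difference against the one-month boundary."""
    return "one_month" if (months, days) <= (1, 0) else "over_one_month"

ACTIONS = {"one_month": "Remain pending", "over_one_month": "Cancel"}

# The rows of the rule table above
assert ACTIONS[categorize(1, 0)] == "Remain pending"
assert ACTIONS[categorize(1, 1)] == "Cancel"
```

Because the rule only sees the already-computed difference, its tests need no dates at all.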

Let’s incorporate the example data of the cancellation rule into the flow scenario from discovery.

Scenario: Transaction aged by one month or less remains pending
Given a transaction received 20-Feb-2020  
When batch processed on 20-Mar-2020
Then transaction remains pending

Scenario: Transaction aged over one month is cancelled
Given a transaction received 19-Feb-2020  
When batch processed on 20-Mar-2020
Then transaction is cancelled


Note that when actual data is incorporated, the meaning behind that data selection is hidden. To keep both the meaning and the data together, I created a Gherkin preprocessor as an experiment. It can be found at https://github.com/atdd-bdd/GherkinPreprocessor

With the preprocessor, you create names for values.   You use these names in the scenario.   When the feature file is processed, the names are replaced by the values.   For example:

#define BatchDate 20-Mar-2020
#define OneMonthAgo 20-Feb-2020
#define OverOneMonthAgo 19-Feb-2020
Scenario: Transaction aged by one month or less remains pending
Given a transaction received OneMonthAgo
When batch processed on BatchDate
Then transaction remains pending

Scenario: Transaction aged over one month is cancelled
Given a transaction received OverOneMonthAgo
When batch processed on BatchDate
Then transaction is cancelled

Notice how the scenarios appear close to the ones from discovery.  They are more abstract, but the test that is run contains specific data.  You could use whatever style you like for the #defines.  For example:  

#define batch_date 20-Mar-2020

would make the step into:

When batch processed on batch_date

The #defines might be changed if a different approach was needed for automation.  For example, if the date of the batch processing could not easily be set to a test date, then you might use:     

#define BatchDate Today()
#define OneMonthAgo TodayLess(1 month 0 day)
#define OverOneMonthAgo TodayLess(1 month 1 day)

Now the dates used in the automated test will be relative to today’s date.   But the scenario itself has not changed.   You could convert these symbols into the relevant values in the step definition, rather than use the preprocessor.  That would make the actual data used less transparent, which is a discussion for another article.  
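For illustration, the substitution idea behind such a preprocessor can be sketched in a few lines (this is not the actual GherkinPreprocessor code, just the core idea):

```python
import re

def preprocess(feature_text):
    """Replace #define'd names with their values and drop the #define lines."""
    defines, output = {}, []
    for line in feature_text.splitlines():
        match = re.match(r"#define\s+(\S+)\s+(.+)", line.strip())
        if match:
            defines[match.group(1)] = match.group(2)
        else:
            # Substitute each defined name, matching on word boundaries
            for name, value in defines.items():
                line = re.sub(rf"\b{re.escape(name)}\b", value, line)
            output.append(line)
    return "\n".join(output)

feature = "#define BatchDate 20-Mar-2020\nWhen batch processed on BatchDate"
assert preprocess(feature) == "When batch processed on 20-Mar-2020"
```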


Splitting a story into flow scenarios and calculation scenarios can simplify each of the scenarios.  The calculation scenarios may be re-usable in other scenarios.  The advantages of smaller scenarios can parallel the advantages of smaller methods in coding.   We’ll explore that in a subsequent post.