
The Gilded Rose Kata from a Gherkin Perspective

In reviewing a book, I was reminded of the Gilded Rose kata. It revolves around some legacy code that has no tests, and you need to make changes to the code to support a new requirement. The kata was created by Bobby Johnson (http://iamnotmyself.com/) and updated by Emily Bache (http://coding-is-like-cooking.info/).

A PDF that outlines a Gherkin approach to this kata is at https://www.acceptancetestdrivendevelopment.com/Gilded-Rose-Kata-from-a-Gherkin-Perspective.pdf. (The formatting of WordPress doesn’t allow sufficient flexibility to present it here.)


Layers of Scenarios and a Reply on Randomness


Gojko Adzic asked for opinions in his blog post “How to Specify Something Should Be Random.” He presented alternative scenarios and asked readers for their choice.

The situation he described is making a robot’s chat response appear as if it comes from a human being. There could be multiple levels of tests associated with this requirement. Let’s take a look at these levels, and along the way I’ll give my choice(s) for an answer.

Let’s start with a high level scenario for the functionality itself:

Scenario: Robot types reply that seems like it was typed by a human
Given a robot receives a message
When the robot types a reply
Then the reply has characteristics that mimic a human being

There may be some exploration associated with this scenario to determine what those characteristics are. They could include pauses between characters, backspacing, quick word completion (simulating auto-completion), and so forth. This scenario might be tested manually: the bot would type a response, and then a human would state whether it looked “robotish” or “humanish”. Suppose it was determined that pauses were a characteristic to be implemented:

Scenario: Robot types reply that has pauses
Given a robot receives a message
When the robot types a reply
Then the reply has pauses that mimic a human being

For some teams, this might be enough; the timing of the pauses would be handled in the implementation. For other teams, more detail could be included describing what, in particular, mimics a human being. This would specify the results of the exploration.

Scenario: Robot types reply that has random pauses
Given a robot receives a message
When the robot types a reply
Then the reply has pauses between characters that randomly vary between 0.2 and 0.5 seconds

Testing this scenario would probably require automation to measure the delays between characters appearing on the screen. The test would just check that the pauses fell within that range and that they were not all the same.

One tricky issue is the meaning of “random.” Checking randomness involves technical tests; Wikipedia has some details on randomness tests. These tests check that a sequence is random to some level of confidence.

At the high level, the randomness could be checked by using a large number of replies and testing the set of pauses that were inserted in the replies.

Scenario: Robot types random pauses over a large set of replies
Given a large set of replies
When the robot types them
Then pauses between characters are random with a confidence level of 99%

Now this scenario and the previous one cover the external behavior for the requirement. In answer to Gojko’s question, I would have both of these scenarios to cover the requirement. They represent behavior that is externally visible.
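As a rough sketch of how the automation behind these two scenarios might look, here is a helper that checks a collected set of pauses. The function name and the simple chi-squared uniformity check are my own assumptions; a real test suite would measure actual inter-character delays and use a proper statistics library.

```python
def pauses_look_random(pauses, low=0.2, high=0.5, bins=10, critical=21.666):
    """Hypothetical check on a collected set of pause lengths (seconds).

    critical is the chi-squared cutoff for bins-1 = 9 degrees of freedom
    at 99% confidence; this sketch hardcodes it rather than pulling in a
    statistics library.
    """
    if not all(low <= p <= high for p in pauses):
        return False          # a pause fell outside the specified range
    if len(set(pauses)) == 1:
        return False          # identical pauses are clearly not random
    # Bucket the pauses and compare against a uniform distribution.
    counts = [0] * bins
    width = (high - low) / bins
    for p in pauses:
        i = min(int((p - low) / width), bins - 1)
        counts[i] += 1
    expected = len(pauses) / bins
    chi2 = sum((c - expected) ** 2 / expected for c in counts)
    return chi2 < critical

# Evenly spread pauses across [0.2, 0.5] pass the uniformity check:
samples = [0.2 + 0.3 * i / 999 for i in range(1000)]
print(pauses_look_random(samples))   # True
```

The same helper covers both external scenarios: a single reply’s pauses must fall in range, and a large set of replies must also pass the distribution check.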


As an internal test, one could run the randomness tests against the method that produces the random sequence, if its creator had not already done so.

Scenario: Pseudo-random number generator produces a random sequence
Given a pseudo-random number generator that produces values from 0.0 to 1.0
When a long sequence is produced
Then it is random with a confidence level of 99%

As a general rule, values that are random should have a test double (mock) that produces a known output for testing. The values for the test double can be set up in a Given. For example, here’s a test that checks the conversion to a range is correct:

Scenario Outline: Convert random sequence to pause sequence
Given random value <value>
When converted to a pause from 0.2 to 0.5 seconds
Then the pause length is <length>
Examples:
| value | length | Notes   |
| 1.0   | .5 s   | maximum |
| 0.9   | .47 s  |         |
| 0.4   | .32 s  |         |
| 0.6   | .38 s  |         |
| 0.0   | .2 s   | minimum |
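The conversion underneath that scenario is just a linear mapping from [0.0, 1.0] onto [0.2, 0.5]. A minimal sketch (the function name is my own; the scenario only specifies the mapping, not its implementation):

```python
def to_pause(value, low=0.2, high=0.5):
    """Map a pseudo-random value in [0.0, 1.0] onto a pause length
    in [low, high] seconds: pause = low + value * (high - low)."""
    return low + value * (high - low)

# The rows from the scenario's table:
for value, expected in [(1.0, 0.5), (0.9, 0.47), (0.4, 0.32),
                        (0.6, 0.38), (0.0, 0.2)]:
    assert abs(to_pause(value) - expected) < 1e-9
print("all table rows check out")
```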

Another internal test could check the plumbing between the pause sequence method and the reply functionality.

Scenario: Check that random sequence produces appropriate pauses
Given random sequence is:
| value |
| 1.0   |
| 0.9   |
| 0.4   |
| 0.6   |
| 0.0   |
When the bot replies “Hello”
Then the pauses between characters are:
| char | pause |
| H    | .5 s  |
| e    | .47 s |
| l    | .32 s |
| l    | .38 s |
| o    | .2 s  |
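A sketch of how that plumbing test could be automated, using a test double that replays the Given sequence instead of a real random source (all names here are hypothetical; the real bot would sleep between keystrokes rather than return a list):

```python
class StubRandom:
    """Test double that replays a fixed sequence instead of real randomness."""
    def __init__(self, values):
        self._values = iter(values)

    def next_value(self):
        return next(self._values)

def reply_with_pauses(text, rng, low=0.2, high=0.5):
    """Pair each character of the reply with a pause derived from rng,
    using the same linear mapping the conversion scenario specifies."""
    return [(ch, round(low + rng.next_value() * (high - low), 2))
            for ch in text]

rng = StubRandom([1.0, 0.9, 0.4, 0.6, 0.0])
print(reply_with_pauses("Hello", rng))
# [('H', 0.5), ('e', 0.47), ('l', 0.32), ('l', 0.38), ('o', 0.2)]
```

Because the test double is deterministic, the Then table can assert exact pause values, which is the point of mocking out the random source.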


I suggest you have at least the two external scenarios that show a single reply and also check randomness on a series of replies. Depending on the complexity of the implementation, you might have the other scenarios as internal tests for methods or combinations of methods.

Organizing Your Feature Files

“How to organize feature files?” was a question asked by Gojko Adzic in a recent blog post. He presented several options and then asked for responses. The options included grouping them by user story, by capability, and by level of detail. This question has often come up over the past years in the workshops that I teach.

For a small set of files, you can keep them in a single directory and use descriptive names for the files. For a larger set, a directory hierarchy is a common organization. A network of files (using keywords to represent groupings) could also be created. Let’s look at the hierarchy and the network in detail.

Work Hierarchy

The overall hierarchy structure could represent either the work items or the functionality. With work items, the files are placed in a structure that parallels the sequence in which they were implemented. The iterations or development cycles are used for the folder names. This makes it easy to find the files associated with a story / work item.

Iteration 7
   Story 171
   Story 895

Functional Hierarchy

With functionality, the hierarchy represents the user experience or the operational workflow – the behavior at a higher level. The folder names represent steps or sub-steps in the flow. There could be separate folders for operations that are common to multiple steps.

Place an order
   Compute total order amount
      Compute tax
          No tax state 
          Tax exempt organizations

At the lowest steps in the structure, there could be either a single feature file or multiple files in a folder that represents a behavior. This would be a small behavior that might have been created by a single story or by multiple stories.

An alternative is to use a network form, with each file having metadata that can be used to group the files. For instance, tags could represent the groups. This would be useful if there were no operational hierarchy. For example, feature files might contain:

@Order @Tax @Exempt
Feature: Determine organizations that are exempt from taxes

@Order @Tax @NoTaxState
Feature:  Determine states for which no tax should be applied

The files would be displayed in groups based on the tags – either a single tag or multiple tags. You could create multiple views of the same sets of files. Those views could also represent a functional hierarchy.
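Grouping by tags is easy to script. Here is a sketch that scans a directory tree of `.feature` files and indexes them by their feature-level tags; it assumes tags sit on lines starting with `@` immediately above the `Feature:` line, as in the examples above, and it ignores scenario-level tags for simplicity.

```python
from collections import defaultdict
from pathlib import Path

def group_features_by_tag(root):
    """Map each @tag to the .feature files carrying it (hypothetical helper)."""
    groups = defaultdict(list)
    for path in Path(root).rglob("*.feature"):
        pending = []
        for line in path.read_text().splitlines():
            stripped = line.strip()
            if stripped.startswith("@"):
                pending.extend(t for t in stripped.split() if t.startswith("@"))
            elif stripped.startswith("Feature:"):
                for tag in pending:
                    groups[tag].append(path.name)
                break              # only feature-level tags matter here
            elif stripped:
                pending = []       # tags belong to the next keyword line only
    return dict(groups)
```

Running this over the two example files above would put both under @Order and @Tax, with @Exempt and @NoTaxState each holding one file – one possible “view” of the same set of files.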


It doesn’t take too much effort to keep files in a functional hierarchy. The advantage is that scenarios are then in the context of the flow: scenarios related to the same flow step are together. A story that changes the behavior of an existing step may not create a new feature file, but just alter an existing one. The living documentation represents how a system is used, not how it was created, so it’s in the domain of the customer, not the implementer.

If a connection back to the stories is desired, you could use tags that identify the stories (e.g., @Story1234). Each feature file would have one or more tags for the stories that were the reason for its creation or change.