“Safety Critical Development best practices
can help lower QA costs
by proactively addressing complexity
in the Design and Development Phases.”
Should I read this paper?
This paper is relevant if you manufacture systems with a software component, are not using formal modeling such as MATLAB, and have recently noticed the following trends:
- The percentage of software in your offering is increasing,
- New trends in your industry, such as IoT (Internet of Things), mobile or OTA (Over the Air) access, are increasing complexity and the number of supported configurations,
- Your testing and validation costs are getting out of control.
This paper will address those three concerns with innovative approaches and tooling.
You know Emenda from our work in the security critical and safety critical software industry, so let me ask you a question:
Have you ever wondered why Boeing can build commercial airplanes with a failure rate of less than one in a billion per flight hour for a Catastrophic Failure Condition, while we can’t build something (relatively simple compared to a 787) that operates “without a glitch”?
A safety critical test engineer has helped me understand how some of the concepts used can help control the increasing cost of testing due to complexity. Don’t worry, I won’t conclude you must bring the complete burden of a formal certification to your organization, nor that you need to give up agile, but we will explain how to best use some of the disruptive technologies out there, derived from what their authors have learned over the years.
By training, I am an engineer: I hold both a Master of Engineering with a focus on Manufacturing Software and a “diplôme d’études approfondies” (DEA) in Automation Systems. I have spent much of my career writing code for drones, plant manufacturing, shipbuilding, 3D vision and various embedded systems. More recently, I have been involved at various levels with Rogue Wave, which makes test and validation software, and with Emenda, which focuses on a consultative approach to software quality and software certification. I have talked to, researched with, and helped hundreds of customers with their applications. Wearing so many hats gives me the opportunity to look beyond the traditional organizational silos of design, development and validation, and to promote better cooperation. So, expect me to “think outside the box”.
A little background on Safety Critical Software
If you are familiar with safety-critical software development, you can probably jump ahead to the next section. If you are not, a great introduction is the book “Embedded Software Development for Safety-Critical Systems” by Chris Hobbs.
I recently had the pleasure of interviewing some engineers who were responsible for the validation of embedded software in commercial airplanes. It ended up being an eye-opening experience, and it felt like finally solving a complex puzzle.
Without getting into too many details of DO-178 (for planes) or ISO 26262 (for cars), I steered the conversation toward what I consider the 3 pillars of a safety critical software engineer’s role:
- Correct, Complete and Verifiable requirements: My first big surprise was to discover that a test and validation team would refuse to create a test plan from what they consider bad requirements. This is something I had never seen before in my development career!
- Full traceability from requirements to design, to code, to testing, to production: The second thing I realized is obvious, but I had never looked at it that way. With my “developer hat” on, when I find a bug or defect, I fix it and check the code back in, or I file a development ticket that is then prioritized by the product manager. Phew! Trouble avoided! In the safety critical world, things are not that simple. You find a defect and you meet to discuss its impact. Is it really safety related? Can we work with it? What is the impact of changes? Safety decides first, but then the cost of revisiting every single step from requirements to coding to production determines what comes next. Few people know it, but some commercial airplanes are flying today with validation procedures added to their safety manual just because the cost of fixing the bug was too high. In those cases, “the time you could fix a bug was gone!”: a supplier decided it would rather pay a penalty every time a plane is grounded for that defect, for the total lifespan of that model, than bear the cost of a delay in production. Unlike in common development practice, every defect is researched, and if a change is made, it is traced all the way back to the requirements, with a complete cycle of validation.
- Complete Documentation within Process: Lastly, if you have ever taken a class on any of the safety critical standards, you have realized that every process requires and creates a lot of documentation. This documentation never falls behind and remains up to date. There are few or no discrepancies between what is in your code and what is in the various software or test specifications. The best anecdote about this comes from a developer who had just moved from California to an ISO 26262 development team in Michigan and said: “My job is boring now, I know exactly what I am supposed to write before I do”. Do you believe your testers or your developers feel the same way?
Does this apply? And how?
Formal requirements are usually “frowned upon” in mainstream development and often associated with old-fashioned “waterfall” models, far from the preferred (modern) concepts of Agile development. So how important is it to have a correct, complete and verifiable definition, in “Plain English”, of what you are trying to achieve? Should we act like a safety critical validation engineer and refuse to work on a test plan if we don’t accurately and fully know, or understand, what we need to test? Should we even refuse to develop the functionality? If so, when do we say “Yes”? When do we say “No”? Is there a standard level of acceptance for “English Requirements” that we can all agree on?
Not unrelated, I recently asked a QA manager “would you do anything differently to test your product if your life depended on it?”. I got the surprising answer of “I am not sure I would trust whoever wrote those specs”.
My findings have been that most testers create tests and validations that do not match the intended behavior of the product. This can even be beneficial, as errors often happen outside the scope of what was specified or designed. Agile is a world where little to no work is done up front on a full design, so why would the tester do any differently?
And that brings us to the second point, traceability. Unless you create something completely new, your increase in complexity originates in an incremental development process. We add features every day and we constantly have new technology to interact with (how soon will you be able to order your elevator from your phone?). Each sprint adds its new functions and its new challenges. Each version adds new technology to interact with. Each CVE (Common Vulnerabilities and Exposures), hack or defect adds new defensive measures, new workarounds or new software fixes and patches. These never-ending increments make it impossible to completely revisit and revalidate what has been done or created in the past.
While looping back to requirements is compulsory in Safety Critical Systems (because there should be no shortcut when life is in danger), the associated costs are prohibitive for most of us. But could we take simple actions to increase the robustness of software to which we are constantly adding, without reverting to a “waterfall” process and without implementing formal traceability systems?
Additionally, while we are constantly adding to our software, who keeps track of updating the documentation? People have stopped reading manuals because they are often inadequate. We have all jumped on the Doxygen bandwagon, hoping it will be the silver bullet that saves us after we reduced the documentation team for economic reasons.
Ask yourself, how far behind are your documentation, your design documents, your requirements? Worse, once you stop keeping up the pace, are you ever able to get back up to speed? Do you rely on ‘word of mouth’ and individual knowledge more than anything else? It is difficult for anyone to test or validate behaviors or functionalities when those aren’t explained anywhere. So, is there a simpler way to create non-existing documentation, stay up to date and distribute information across our team? Can it remain current with the most recent stage of development? Can it be used to facilitate the conversations between team members across all silos (specification, design, development and testing)?
Three places to improve
Our goal is not to make the development organization adopt a safety critical standard. Our goal is to see what we can learn from such standards, and envisage how it could be applied to ‘regular development’. We are also looking for solutions that are simple to implement and easily accepted by developers.
So far, we have three ideas:
- A simpler version of “Correct, Complete and Verifiable requirements” can be a standard level of quality in the way we describe our goals in “Plain English”. It must be an objective, cross-domain and, why not, quantitative (you can grade it!) evaluation of what you are being asked to do.
- Without enforcing full traceability from requirements to production, and because we want to stay outside of a “waterfall” process, we are looking for simple actions we can take to increase the robustness of software to which we constantly add, by quickly and efficiently looping back from implementation to design to implementation.
- And lastly, while complete and accurate documentation is a little too cumbersome, how do we keep everyone informed of the latest changes in the software, and make sure that information remains “up to date” during all the development stages, preferably with some help from automation?
3 problems, 3 solutions, 3 experts.
I have had the pleasure of working with three domain experts who have recently introduced me to viable solutions for those three ideas: Kevin, Robert and Christelle.
Christelle lives in France and is working for the innovative French company, Prometil, in Toulouse, the home of Airbus.
I heard about their solution at a seminar in Toulouse hosted by Emenda, and presented by Christelle in front of an audience of French technology leaders. Prometil combined years of experience in aeronautical consulting with PhD-level expertise in linguistics to create Semios, a tool that can review your specifications. It loads a Microsoft Word document (alternatives are Microsoft Excel and Rational DOORS) containing requirements, and can “grade” it for correctness, ambiguity, completeness, verifiability and more. It will even point out the “defects” in your requirements, not unlike the way a static code analyzer (Klocwork and SciTools Understand being the ones we know) detects defects in your code.
It will also work with user stories or “less formal” requirements! And this gives us what we are looking for, i.e. a way to grade the language used to write requirements.
It is an easy first step to implementing a “Plain English” standard that can be used to make our organization write better, and establish clear, concise, and most importantly, verifiable goals.
Engineers will learn from using it, because they can now evaluate their work against an objective reference. Again, much like static analysis for your code. Developers and testers will quickly and automatically verify that their daily work has met an accepted standard of quality.
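To make the idea of “grading” a requirement more concrete, here is a minimal sketch in Python. It is not how Semios works (its internals are not described here), and a real linguistic analysis is far richer. The word lists, the 20-points-per-finding scoring rule and the function name `grade_requirement` are all assumptions chosen purely for illustration.

```python
import re

# Hypothetical word lists, for illustration only.
VAGUE_TERMS = {"fast", "quickly", "easy", "user-friendly", "appropriate",
               "adequate", "etc", "some", "several"}
WEAK_MODALS = {"should", "may", "might", "could"}


def grade_requirement(text):
    """Return (score, findings) for one requirement sentence.

    The score starts at 100 and loses 20 points per finding, floored at 0.
    """
    words = re.findall(r"[a-z][a-z\-]*", text.lower())
    findings = []
    for word in words:
        if word in VAGUE_TERMS:
            findings.append(f"vague term: '{word}'")
        elif word in WEAK_MODALS:
            findings.append(f"weak modal: '{word}'")
    # A verifiable requirement normally binds the system to a behavior...
    if "shall" not in words and "must" not in words:
        findings.append("no binding verb ('shall' or 'must')")
    # ...and states at least one quantity a tester can measure.
    if not re.search(r"\d", text):
        findings.append("no measurable quantity")
    score = max(0, 100 - 20 * len(findings))
    return score, findings
```

With these assumed rules, “The system should respond quickly.” collects four findings and scores 20, while “The door controller shall open the garage door within 10 seconds of receiving the open command.” collects none and scores 100. The point is not the particular rules but that the grade is objective and repeatable, so a team can agree on a threshold.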
I met Robert in the Netherlands earlier that year. Robert is the CEO of Verum, a company that focuses on the validation of manufacturing software but is slowly expanding to other verticals for good reasons.
Because we met through Chris Hobbs, a safety critical specialist and the author mentioned earlier, the first time I looked at Robert’s technology I assumed it was a formal verification method to be used exclusively for safety critical code. I was wrong.
When you look at Verum Dezyne in detail, you realize it is the complete opposite: Robert has invented a new version of the smart, interactive white board, where anyone can specify, write down, explain, discuss, extend and verify their greatest concepts and ideas.
I use a white board analogy, but this is a shortcut, as Dezyne is much more than a tool for writing down ideas. Like our minds, white boards get crowded. When you and your colleagues start discussing new features or a new user story, you start with a neat and clean diagram on the white board; as the ideas progress, you eventually run out of space and add smaller and smaller boxes, arrows and text until the complexity and graphic quality reach a breaking point and, sadly, the conversation falters. Think of this in a test-driven development (TDD) organization: you would write your solution as Dezyne specification and component models, and you would then be able to verify and validate those specification and component designs as part of your TDD approach.
When our heads get crowded (did we say anything about recent software being incremental?), we usually end up doing something wrong, forgetting something, or missing a trivial scenario that will cost us dearly later, in testing or in operation. When I explain what programming is to a non-programmer, I often ask them to explain, step by step, how they would back a car out of the garage. Most of the time, people forget to open the garage door and debate instead about the gas pedal, the brake and the gear shifter. Robert and Verum Dezyne are the little voices that stop you before you get behind the wheel and ask: “Didn’t you forget something?”. They will even warn you about the garage door.
Today, we sadly “hit the garage door” a lot at the validation stage, failing on obvious scenarios we missed. We go back to specification or design (sometimes), fix the code (always), add everything we need to open the door, and end up driving over the bicycle that was left in the driveway… Like our work, the complexity is always incremental, so nothing is easy anymore: tomorrow someone will want the car to be manual, or electric (hint: did you think about unplugging it?!), or to use your cell phone as the key.
But, unlike the safety critical world, we do not have full traceability from requirements to production to check that we dealt with all scenarios, nor do we want to introduce costly formal methods to “cover every scenario”.
We are just looking for a small addition to our development process that can help solidify the code, by easily measuring, simulating and validating the impact of changes and additions to our software.
And Dezyne does exactly this:
- It is not a dusty and scary formal method (in all truth, it can’t be used to build airplanes). Implementing Dezyne in a development process such as AMDD (Agile Model Driven Development) is straightforward, because you describe your system much as you would write Java interfaces, and it provides an “interactive way of thinking”.
- Dezyne will help you evolve your system (or at least a white board view of it) and add features, running sanity checks on your latest additions and showing you where your design can fail before you even start coding.
- Dezyne is also not a testing tool. It is a next-generation smart white board where you can design your ideas and validate them. You can simulate behavior and inject “test vectors”. But that is just the beginning: the goal is for you to realize that you are missing something in your design, or to notice paths where unforeseen operations of the system may lead to failed states.
And this matches what we are looking for…
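The garage example can be made concrete with a small sketch in Python. This is not Dezyne’s modeling language or its verification engine. It is a hand-written exhaustive search over a toy model, and the state fields, the action names and the `find_failure` helper are all assumptions made for illustration. The idea it illustrates is the same, though: explore every sequence of actions a design allows, and report one that reaches a failed state before any production code is written.

```python
from collections import deque

# Toy model: state = (door_open, engine_on, car_outside, crashed).
INITIAL = (False, False, False, False)


def naive_design(state):
    """Yield (action, next_state) pairs allowed by a design that forgot the door."""
    door_open, engine_on, car_outside, crashed = state
    if crashed or car_outside:
        return  # terminal states: nothing more to do
    if not engine_on:
        yield "start_engine", (door_open, True, car_outside, crashed)
    if not door_open:
        yield "open_door", (True, engine_on, car_outside, crashed)
    if engine_on:
        # The design lets the driver reverse whether or not the door is open.
        if door_open:
            yield "reverse", (door_open, engine_on, True, False)
        else:
            yield "reverse", (door_open, engine_on, False, True)


def guarded_design(state):
    """Same model, but reversing is only allowed once the door is open."""
    for action, next_state in naive_design(state):
        if action == "reverse" and not state[0]:
            continue
        yield action, next_state


def find_failure(design, initial=INITIAL):
    """Breadth-first search; return the shortest action trace to a crash, or None."""
    queue = deque([(initial, [])])
    seen = {initial}
    while queue:
        state, trace = queue.popleft()
        if state[3]:  # crashed
            return trace
        for action, next_state in design(state):
            if next_state not in seen:
                seen.add(next_state)
                queue.append((next_state, trace + [action]))
    return None
```

Here `find_failure(naive_design)` returns `['start_engine', 'reverse']`, the trace in which the car is reversed into a closed door, while `find_failure(guarded_design)` returns `None`. Adding the bicycle in the driveway, or the charging cable of an electric car, means adding one field to the state and re-running the same search, which is the kind of incremental sanity check described above.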
Christelle and Semios have an attractive solution to our English concerns, and Robert sounds like he can help us strengthen our software while we constantly add to it, but we still need someone to help us with our documentation.
This is where Kevin comes into play. Kevin works for SciTools, and I first met him in Boston, on our way to visit a medical device supplier interested in using Understand to help reverse engineer some of their existing software.
In the safety critical world, in theory (and hopefully in practice), you start by writing requirements, then you design your application architecture and tests, and only then do you code them. But, not unlike people whose dreams are shattered after they build a totally capable aircraft in their garage, with no hope of ever getting it through a DO-178C certification and seeing it fly, a lot of development groups start their certification by asking “what if I already have the code?”. There is no good answer because, at that point, you only have what must be considered a prototype, and you must go through the certification process as if it were a new development.
So, almost daily, people follow this path and ask Kevin and me how they could use Understand (Understand-C) to reverse engineer their code and create the design and specification documents from the existing code. Is this an acceptable practice? Perhaps?
Luckily, in the context of this paper, we don’t have any restriction, and this seems to give us what we wanted: The ability to create documentation, architectural diagrams and a design, where none existed before. But does it really provide us what we need?
As discussed, we needed:
- To see the information that is relevant to our team, not only to the developers but also to the testers
- The ability to share this information
- The ability to keep that information current and relevant
First (1), when testers and developers interact with each other, the conversations focus on 3 things:
- An internal proprietary language to describe components: SciTools Understand lets you inject “proprietary knowledge of the application architecture” into your analysis, so the information, diagrams and metrics can be documented using your own terminology, easing conversation for people that are not familiar with the specifics of the code.
- Interaction between those components: SciTools Understand lets you create dependency diagrams to show dependencies and the impacts of change within the system.
- Control flow within the system: easy-to-create control flow diagrams document the execution of each function.
Second (2), a by-product of using SciTools Understand is a single UDB file that can easily be shared and copied across desktops, or included in a source-code distribution or a version control system such as Git. With it, developers can also share notes and hints for specific tasks, directly in the code. This can be used to share critical information about a problem, hints about limitations or correct usage, or traceability to a bug report or a requirement.
Third (3), unlike a PDF or an HTML document, SciTools Understand is a live tool, meaning its output changes as the source code changes. Simple modifications are reflected immediately in the diagrams, allowing developers or testers to test, prove and demonstrate their theories directly on their desktop, without waiting for a centralized regeneration of the documentation.
To this day, Understand remains the best place for technical people to discuss existing code. Download and try Understand by entering your email at http://www.scitools.com/usa-partner
Safety critical development often uses the “V-Model” to deliver risk free (reduced?) software, requiring every part of your system and its validation to be defined before implementation. Such strategies are not well suited to an increasingly Agile development process, where the application matures and evolves with each development iteration.
While Safety Critical models are not entirely applicable because they are too cumbersome to manage, three of their best practices can be mapped to a more mainstream development process quite efficiently:
- The first is the ability to automatically validate and evaluate the “Plain English” language used in technical documents, with the hope of leaving little or no room for interpretation for the critical functions of the system. Semios is one tool that can greatly help with this.
- The second is to use a lightweight variant of formal modeling, with little ramp-up time, to let you quickly loop back from code to white board to code, and let you validate that simple scenarios have not been missed in an incremental development. Dezyne is a perfect solution here.
- The third is the ability to communicate and document better, with always up-to-date documentation of your application and its architecture. We have seen that technology used for reverse engineering can deliver an elegant solution to automatically generate and update what we need from the source code. Understand is a wonderful tool for increasing code comprehension in your team.
By introducing these three practices in your development environment, you will be able to take back control over some of the complexity of your products. You can lower the cost of testing and validation by proactively addressing complexity in the design and development phases instead. In some regards, this is no different from what safety critical engineers do.