Peer Review and Static Analysis Find More Defects When Used Together
A Post-Webinar Q&A with Capers Jones and Tom McCabe
SmartBear was honored to have Capers Jones and Tom McCabe as guest presenters for our “Why Static Analysis Isn’t Enough” webinar (now available on demand). Below, Capers and Tom share their perspectives by answering some of the more interesting questions from the live event that we didn't have time to cover. We hope you find them helpful.
Q: What about security vulnerabilities? Do you believe there is a way to identify code that is more likely to have security flaws?
Capers: So far as I can tell from talking to the security community, you need defect prevention during the architecture and design phase. And then you need formal inspections that are aimed specifically toward security problems. You need to use static analysis to look for known vulnerabilities that are mechanical. But I think the human mind has to participate too.
Tom: Yes, there are specific things to look for. For example, one of the things often talked about is what's called deep parsing. Because some of the tools are superficial, they don't grab all the include files or all the code that's going to be used. And there could be Trojan horses and various security things hidden within that. There are very specific things that come up in different languages. For example, in C, there are vulnerabilities or openings where the code can get infected. So there are a bunch of very specific things to scan and inspect for that are security sensitive. The overall architecture is one of those items.
Q: Regarding the relationship between static analysis and manual inspection, can defect density reported by static analysis be used to identify areas that are more suited for manual inspection?
Capers: I believe some of the static analysis tools calculate various flavors of complexity, such as cyclomatic and essential. And that would seem to be a relatively useful starting point. Look at the more complex things later for inspection, and then later for testing.
Tom: This is code that's not been executed. So for example, when you look at the architecture, one of the quality attributes to superimpose on that is the difference between cyclomatic complexity and what we call actual complexity. And actual complexity is the number of cyclomatic paths that have been untested. Where you find modules with a high number of untested paths, you definitely want to delve in and inspect.
There are several typical outcomes. One is that the testing has not been very thorough. And when it's not thorough, you're going to have errors. A second outcome is that it could be a Trojan horse. It could be lines of code that you don't execute in normal operation or testing, that do something rather perverse to the software. So one of the key areas where you should inspect is in looking at the system over time and finding either whole modules or paths that have never been executed—you either have a reliability issue or you have a security issue. And that's where you want to direct your inspection.
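Cyclomatic complexity, which both answers lean on, can be approximated directly from source as "decision points + 1". The sketch below is a simplified illustration in Python, not a production tool (real analyzers handle many more constructs and count untested paths against actual test coverage, which this sketch does not):

```python
import ast

# Node types treated as decision points (a deliberately simplified rule set).
DECISIONS = (ast.If, ast.For, ast.While, ast.And, ast.Or,
             ast.ExceptHandler, ast.IfExp)

def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe cyclomatic complexity: decision points + 1."""
    tree = ast.parse(source)
    decisions = sum(isinstance(node, DECISIONS) for node in ast.walk(tree))
    return decisions + 1

snippet = """
def classify(x):
    if x < 0:
        return "negative"
    for ch in str(x):
        if ch == "7":
            return "lucky"
    return "ordinary"
"""
# Two ifs and one for loop: 3 decision points, complexity 4.
print(cyclomatic_complexity(snippet))  # 4
```

Modules that score high on a measure like this, and whose extra paths have never been exercised by a test, are the inspection candidates Tom describes.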
Q: Is it true that areas of the code with lots of statically detected defects are not high-priority candidates for inspection? In other words, when you talk about manual inspection, should you focus on areas with lots of statically detected defects, or on the higher-risk areas you said we should be focusing on?
Tom: I would direct the inspections to where the metrics are telling you there's a problem.
Capers: I'll share one small anecdote about this. I happened to be part of the quality assurance team of an application that had 425 modules in it. And it was very buggy when it was released. We discovered that 300 of those modules never received a single bug report from a customer. They were zero-defect modules in the field. 57% of the entire error load against that application came from 31 modules that coincidentally were in one department, under one manager who didn't like inspections. So the conclusion was, given that kind of distribution on a very buggy project, if you have a high number of defects in a particular piece, you should inspect it. And for that matter, you should make sure that it is inspected before it gets released, which was something that we had not done at the time.
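Capers' anecdote reflects the Pareto-like skew typical of defect data: a handful of modules carry most of the error load. A minimal sketch of turning per-module defect counts into an inspection priority list (the module names and counts here are made up for illustration):

```python
from collections import Counter

# Hypothetical customer bug reports, each tagged with the module it hit.
bug_reports = (["parser"] * 40 + ["auth"] * 25 + ["report_gen"] * 20 +
               ["config"] * 10 + ["logging"] * 5)

defects_per_module = Counter(bug_reports)

# Rank modules by defect count: inspect the worst offenders first.
ranked = defects_per_module.most_common()
total = sum(defects_per_module.values())

top_two = ranked[:2]
share = sum(count for _, count in top_two) / total
print(f"Top 2 of {len(ranked)} modules hold {share:.0%} of all defects")
# Top 2 of 5 modules hold 65% of all defects
```

With real data, the same ranking, ideally combined with complexity and coverage metrics, tells you where inspections will pay off most.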
Q: How can you apply some of these methods to the source of components that you use, like frameworks and libraries? Do you have any best practices or recommendations about inspections against code that you're getting from frameworks and other libraries that you're using?
Capers: If you're getting your material from an internal source and you're in a big company like IBM, they ought to have a certification process. Anything that you get from your internal library of reusable materials will be certified to have no security flaws and to approximate zero defects. If you're getting your frameworks or your reusable material from a random source, like an open-source provider or a third party, and you don't know if it's been certified or not, then you're taking a fairly serious risk. Because there could be viruses, Trojans, and other kinds of problems lurking in it that you have no way of knowing about. So I think that what the industry needs is, eventually, a better process for certifying materials that are going to be reused, so that the consumer who gets access to them will have some kind of written warranty or guarantee that they have been checked to almost zero-defect level and are free from known viruses and security flaws.
Tom: That's a huge issue—a very open issue in our industry. And it has all kinds of side effects. Now, a lot of companies use open-source software. What often happens is, as the open-source software is updated, the company uses that new software version without even knowing it. So in fact, there are some companies (not in metrics, but in a different domain) that have tools just for that, just to manage the configuration baselines of the open-source software. But it's a huge issue, because people use and reuse the code, with no idea about what version it is.
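The version drift Tom describes can be caught mechanically by recording a digest of each dependency at the time it was reviewed, and flagging any later mismatch. A minimal sketch, with the dependency's contents inlined as bytes for illustration (in practice you would hash the files on disk and store the approved digests in version control):

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Content digest used to pin a reviewed baseline."""
    return hashlib.sha256(data).hexdigest()

# Digest recorded when the dependency was last reviewed and approved.
reviewed_copy = b"def helper():\n    return 42\n"
approved_digest = sha256_hex(reviewed_copy)

# Later: the copy actually present in the build tree, silently updated.
current_copy = b"def helper():\n    return 43\n"

if sha256_hex(current_copy) != approved_digest:
    print("dependency drifted from the reviewed baseline; re-inspect it")
```

Modern lockfiles and package managers do a version of this automatically, but the principle is the same: no code changes under you without the change being visible.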
A second thing is that there should be some kind of certification around the testing or reusability or quality. That's seldom the case, in my experience, so that's needed.
A third thing is the wacky results that can come out of reused code. For example (and there's no metric for this, but there ought to be), in C++ and other object-oriented languages, you can have pathological inheritance. Think of it as the father inheriting traits from the son; it inverts the whole inheritance tree and leads to all kinds of pathological results. Before you ever use some open-source code, that's the kind of thing you'd want to scan for and check.
As Capers mentioned, although this whole area has been a boost to productivity and has driven down costs, the danger within it is pretty incredible. And I don't think as an industry we've come even close to facing up to that. A lot of organizations are thinking about quality-level service agreements when they use code. And I think we've got to get to something like that before this area has any kind of rationality to it.
Q: What impact has malware and software viruses had on this issue, particularly with getting code from libraries, open-source code and how you test?
Capers: Problems of malware seem to be growing at a fairly rapid clip. And in theory, that should raise the use of inspections on incoming materials before you utilize them. Unfortunately, that seldom seems to be the case. So what's happening is that reusability and open source are raising the volume of low-cost stuff, but the number of flaws and security vulnerabilities in that low-cost stuff seems to be increasing. So I think that we need to stop and take a look, and introduce some kind of business firewall. Before you allow materials to come into your building and be used in your software, they have to pass through either an internal certification program, or you have to know that the vendor has certified them.
Tom: That's a pretty big issue. For one thing, a lot of the viruses and malware arrive as object code rather than source code. And it's obviously costing billions of dollars in damages. So we have to broaden the way we think of quality to include object code, to deal with it. One more point, and I'm publishing graph theory work about this: exactly where the essential complexity is very high and you can't understand an algorithm is precisely where malware is inserted, with the thought that, who would ever find the stuff? So the enemy of security, in that sense, is complexity. When things are very complex and not understandable, there's an open door to put things in that are not desirable.
Just one more issue-- this is really a national issue-- many foreign governments actually require big vendors like Microsoft and Google to give them the source code. When they have the source code they do all kinds of things with it, including looking for malware, open doors, trap doors, and so on. So it's a huge issue. And I think it just accelerates the importance of what Capers and I have been talking about. Such code needs to be thoroughly inspected with code review and even more sophisticated static and dynamic tools.
Download a free trial of CodeCollaborator and see for yourself why the experts agree it's one of the best ways to ensure software quality.