== Dev Week -- Writing good test-cases -- jam -- Wed, Jan 27 ==

(all times in UTC)

{{{#!IRC
[16:00] First up is the fantastic John Arbash Meinel - he'll talk about "Writing good test-cases"!
[16:00] jam: the floor is yours!
[16:01] greetings all!
[16:01] I'm happy to see so many names on the channel.... hopefully you all aren't just lurking
[16:01] waiting for the next speaker :)
[16:01] I'm going to be pretty loose with the discussion
[16:01] so feel free to ask questions
[16:02] cjohnston has graciously offered to help me keep track
[16:02] o/
[16:02] So what is a 'good' test case?
[16:03] Generally speaking, a test case is meant to verify that something is happening correctly.
[16:04] Or looking at it differently, when something *isn't* happening correctly, the test case should fail.
[16:04] However, there are still a lot of subtle aspects.
[16:04] I would say that the desirable features of a test case break down into
[16:04] 1) Sensitivity (the likelihood that the test case will break when something goes wrong)
[16:05] 2) Specificity (the likelihood that the test case will *not* break when what you are testing has not gone wrong)
[16:05] 3) Performance
[16:05] It is easy to focus on, say, 1, but to have a *good* test, all 3 are important
[16:06] It is great to have a huge test suite that covers every possible aspect of your code, and all permutations
[16:06] but if it takes 3 weeks to run, it is often not very useful for development
[16:06] Likewise, a test with low specificity will be breaking all the time
[16:06] and won't really help you isolate the problem
[16:07] You can argue that 4) coverage is a property, but I would argue that it is a desirable property of the whole test suite, and not a great property of a single test
[16:10] Personally, I think there is a range of 'good' tests, generally dependent on what aspect you are focusing on
[16:10] I personally think that having lots of focused tests is better than having a single 'test' that is testing lots of aspects
[16:11] but integration tests are still useful and needed
[16:12] So how about we put together an example
[16:13] I'll use python, since that is my preferred language
[16:13] and it has a decent unittest suite
}}}
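The example never actually makes it into the channel log, so here is a minimal sketch of the kind of small, focused `unittest` test jam describes. The `normalize` function and the test names are hypothetical, invented purely for illustration.

{{{#!python
import unittest

def normalize(path):
    """Collapse doubled '/' separators in a path (hypothetical code under test)."""
    while '//' in path:
        path = path.replace('//', '/')
    return path

class TestNormalize(unittest.TestCase):
    # Each test checks exactly one behaviour: a failure points straight
    # at that behaviour (high specificity), the test is still likely to
    # break if normalize() regresses (high sensitivity), and it runs in
    # microseconds (good performance).
    def test_collapses_doubled_separators(self):
        self.assertEqual(normalize('a//b'), 'a/b')

    def test_leaves_clean_paths_alone(self):
        self.assertEqual(normalize('a/b'), 'a/b')

if __name__ == '__main__':
    unittest.main()
}}}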
{{{#!IRC
[16:13] QUESTION: Even after weeks of testing by the most experienced testers, very robust apps break down all of a sudden. What might be the reasons behind it?
[16:14] ANSWER: I think that complexity is generally a major factor causing 'bugs' in software.
[16:15] It is often considered inherent in any sufficiently developed program.
[16:15] Generally, this means there will be some sort of permutation of objects which has not been tested directly
[16:16] A goal of development is to manage complexity (generally by defining APIs, separation of concerns, etc)
[16:16] good software can then have good tests that test a subset, without *having* to manage the permutation problem
[16:16] (but generally, abstractions can 'leak', and ... boom)
[16:15] < Omar871> QUESTION: According to what I learned in college, a 'good' test is one that makes the system break/crash, to show where the problem is. How true could that be?
[16:17] ANSWER: I think I understand you to mean the "inverse"
[16:17] which is that a good test is one that provokes a problem in the system
[16:18] I think that for a regression-style test, it is certainly important that the test would trigger the bug that you are trying to fix
[16:18] However, once a bug is fixed, you would certainly expect it to not provoke the problem anymore
[16:19] So it is certainly important that a test exposes a weakness
[16:19] I guess put another way...
[16:19] If I had a bug, and wrote a test case, and it *didn't* trigger the bug, that test doesn't have a lot of worth (for fixing that bug)
[16:20] (Which often accidentally happens if you fix the bug before writing the test)
}}}
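The exchange above is the core of regression testing: the test must fail before the fix goes in and pass afterwards. Here is a minimal sketch of that workflow; the `word_count` function and its bug are hypothetical.

{{{#!python
import unittest

def word_count(text):
    # Fixed version. The hypothetical buggy version used text.split(' '),
    # which returns [''] for an empty string and so counted one word;
    # split() with no argument returns [] and is the fix.
    return len(text.split())

class TestWordCountRegression(unittest.TestCase):
    def test_empty_string_has_no_words(self):
        # Written *before* the fix, this test failed -- proving that it
        # really does trigger the bug it is meant to guard against.
        self.assertEqual(word_count(''), 0)

if __name__ == '__main__':
    unittest.main()
}}}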
{{{#!IRC
[16:22] n3rd: We were just discussing in our group: "coding creates bugs... are we reducing the number of bugs faster than we are creating them?"
[16:24] n3rd also brings up a decent point
[16:24] users often find bugs that developers don't think of
[16:24] often because they use software in ways that weren't anticipated
[16:25] often this goes back to the permutation issue
[16:25] it isn't possible to test every possible permutation
[16:25] (well, rarely possible)
[16:26] < n3rd> jam, so the users are passive testers?
[16:26] A file with just 20 bytes has 256^20 = ~10^48 permutations
[16:27] Well, I would say that users are often pretty *active* testers
[16:27] however, they don't make good automated test suites
[16:27] I suppose that would be my:
[16:27] 4) Reproducibility
[16:28] (the chance that running the same thing now will give you the same thing it gave you before)
[16:28] It is somewhat related to specificity
[16:28] as an unreproducible test has low specificity
[16:28] (it breaks for reasons that you aren't trying to test)
[16:29] I guess I should also mention... if a user used every intermediate version of my program, they'd run into a lot more bugs
[16:29] As a developer I fix a huge number of things before a release
[16:29] it is just that often the set of *remaining* bugs
[16:30] are ones that I had not anticipated very well
[16:34] Anyway, I think it is useful to remember what the strengths of a given style of testing are.
[16:34] You can have automated tests (unit and integration tests), manual (interactive) testing, foisting the software off on your users
[16:34] etc
[16:34] I do think that testing at each level is useful
[16:35] and trying to test things at the wrong level introduces more pain than benefit
[16:35] Having an absolutely bulletproof piece of software that doesn't do what your users want isn't particularly beneficial
[16:36] So having user testing in your feedback loop is certainly a good thing
[16:36] However, giving them a buggy PoS is also not going to make them very happy
[16:36] I'm certainly a fan of multi-tier testing, including automated testing
[16:36] having a small, fast test suite that is likely to expose bugs is nice as a "must pass before making it to trunk" gate
[16:37] having a slower but more thorough "must pass before releasing to users" suite
[16:37] and for some systems a "must be tested by human interaction" stage can be inserted in there as well
[16:38] If the first one takes much more than 5 minutes, it often causes grief when trying to get development done
[16:38] but the second can take a day, and still not slow down your release cycle
[16:44] < Omar871> QUESTION: Could the efficiency and effectiveness of the testing process depend on the licensing type of the software we are making?
[16:45] ANSWER: I don't think it would change dramatically
[16:46] If the license is open, it does allow users to do introspective testing which would just not be possible otherwise
[16:46] however, few users can really develop your software anyway, so it certainly shouldn't be relied upon as a source of improving correctness
[16:46] even if your users are very intelligent, they almost always
[16:46] 1) don't know your code
[16:46] 2) don't have time to be doing your work for you :)
[16:47] I think Linus gets a bit of a boost, because there are lots of developers on the project, not just users
[16:48] Certainly a "lot of eyeballs" requires eyeballs that can see the code
[16:49] and with enough of them, you can have a person specialize in any given subset, which also helps in defect localization (hence 'shallow')
[16:50] < hggdh> QUESTION: are there any considerations for *usability* testing (this is one thing that users would certainly be able to perform)?
[16:51] ANSWER: I think that usability testing is certainly important
[16:51] (10:35:23 AM) jam: Having an absolutely bulletproof piece of software that doesn't do what your users want isn't particularly beneficial
[16:51] There is an argument over whether this specifically falls under the standard category of 'testing'
[16:52] (market research is certainly important to developing software, but it isn't "testing" :)
[16:56] < strycore89> QUESTION: how is testing of graphical applications done? (For example, PyGTK apps)
[16:56] IME, you can test PyGTK (and PyQt) without actually showing dialogs
[16:57] both of them support updating widgets by programmatically setting values
[16:57] and then triggering events
[16:57] In that case, they can be tested in the same fashion as any other unit testing
[16:57] however, it doesn't test the visual representation
[16:57] which is a valid level to test
[16:58] There are also GUI testing suites that can be used
[16:59] I forget the exact name of one (Sikuli?)
[16:59] which uses screenshots
[16:59] and some work for marking the regions you care about
[17:01] http://groups.csail.mit.edu/uid/sikuli/
[17:02] alrightie!
[17:02] thanks jam for giving this great talk!
[17:02] * dholbach hugs jam
}}}
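Jam's PyGTK answer is easier to see with code, so here is a minimal sketch of the technique he describes: set a widget's value programmatically, trigger the event in code, and assert on the result, without ever showing a window. The widgets and handler are hypothetical, and the sketch assumes PyGTK (the `gtk` module) is importable, which generally requires a running display.

{{{#!python
import unittest
import gtk  # PyGTK; importing it assumes a display is available

class TestEntryAndButton(unittest.TestCase):
    def test_clicking_ok_reads_the_entry(self):
        # Build the widgets but never call show()/show_all(),
        # so no dialog ever appears on screen.
        entry = gtk.Entry()
        entry.set_text('alice')      # programmatically set a value
        seen = []
        button = gtk.Button('OK')
        button.connect('clicked', lambda w: seen.append(entry.get_text()))
        button.clicked()             # trigger the 'clicked' event in code
        self.assertEqual(seen, ['alice'])

if __name__ == '__main__':
    unittest.main()
}}}

As jam notes, this exercises the logic behind the widgets in the same fashion as any other unit test; it deliberately does not test the visual representation, which is what screenshot-based tools such as Sikuli cover.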