== Dev Week -- Writing good test-cases -- jam -- Wed, Jan 27 ==

(all times in UTC)

{{{#!IRC
[16:00] First up is the fantastic John Arbash Meinel - he'll talk about "Writing good test-cases"!
[16:00] jam: the floor is yours!
[16:01] greetings all!
[16:01] I'm happy to see so many names on the channel.... hopefully you all aren't just lurking
[16:01] waiting for the next speaker :)
[16:01] I'm going to be pretty loose with the discussion
[16:01] so feel free to ask questions
[16:02] cjohnston has graciously offered to help me keep track
[16:02] o/
[16:02] So what is a 'good' test case?
[16:03] Generally speaking, a test case is meant to verify that something is happening correctly.
[16:04] Or looking at it differently, when something *isn't* happening correctly, the test case should fail.
[16:04] However, there are still a lot of subtle aspects.
[16:04] I would say that the desirable features of a test case break down into
[16:04] 1) Sensitivity (the likelihood that the test case will break when something goes wrong)
[16:05] 2) Specificity (the likelihood that the test case will *not* break when what you are testing has not gone wrong)
[16:05] 3) Performance
[16:05] It is easy to focus on, say, 1, but to have a *good* test, all 3 are important
[16:06] It is great to have a huge test suite that covers every possible aspect of your code, and all permutations
[16:06] but if it takes 3 weeks to run, it is often not very useful for development
[16:06] Likewise, a test with low specificity will be breaking all the time
[16:06] and won't really help you isolate the problem
[16:07] You can argue that 4) coverage is a property, but I would argue that it is a desirable property of the whole test suite, and not a great property of a single test
[16:10] Personally, I think there is a range of 'good' tests, generally dependent on what aspect you are focusing on
[16:10] I personally think that having lots of focused tests is better than having a single 'test' that is testing lots of aspects
[16:11] but integration tests are still useful and needed
[16:12] So how about we put together an example
[16:13] I'll use python, since that is my preferred language
[16:13] and it has a decent unittest suite
}}}
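The example never actually makes it into the channel log, so here is a minimal sketch of the kind of small, focused `unittest` test jam describes. The `normalize` function and the test names are hypothetical, invented purely for illustration.

{{{#!python
import unittest

def normalize(path):
    """Collapse doubled '/' separators in a path (hypothetical code under test)."""
    while '//' in path:
        path = path.replace('//', '/')
    return path

class TestNormalize(unittest.TestCase):
    # Each test checks exactly one behaviour: a failure points straight
    # at that behaviour (high specificity), the test is still likely to
    # break if normalize() regresses (high sensitivity), and it runs in
    # microseconds (good performance).
    def test_collapses_doubled_separators(self):
        self.assertEqual(normalize('a//b'), 'a/b')

    def test_leaves_clean_paths_alone(self):
        self.assertEqual(normalize('a/b'), 'a/b')

if __name__ == '__main__':
    unittest.main()
}}}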
{{{#!IRC
[16:13] QUESTION: Even after weeks of testing by the most experienced testers, very robust apps break down all of a sudden. What might be the reasons behind it?
[16:14] ANSWER: I think that complexity is generally a major factor causing 'bugs' in software.
[16:15] It is often considered inherent in any sufficiently developed program.
[16:15] Generally, this means there will be some sort of permutation of objects which has not been tested directly
[16:16] A goal of development is to manage complexity (generally by defining APIs, separation of concerns, etc)
[16:16] good software can then have good tests that test a subset, without *having* to manage the permutation problem
[16:16] (but generally, abstractions can 'leak', and ... boom)
[16:15] < Omar871> QUESTION: According to what I learned in college, a 'good' test is one that makes the system break/crash, to show where the problem is. How true could that be?
[16:17] ANSWER: I think I understand you to mean the "inverse"
[16:17] which is that a good test is one that provokes a problem in the system
[16:18] I think that for a regression-style test, it is certainly important that the test would trigger the bug that you are trying to fix
[16:18] However, once a bug is fixed, you would certainly expect it to not provoke the problem anymore
[16:19] So it is certainly important that a test exposes a weakness
[16:19] I guess put another way...
[16:19] If I had a bug, and wrote a test case, and it *didn't* trigger the bug, that test doesn't have a lot of worth (for fixing that bug)
[16:20] (Which often accidentally happens if you fix the bug before writing the test)
}}}
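The exchange above is the core of regression testing: the test must fail before the fix goes in and pass afterwards. Here is a minimal sketch of that workflow; the `word_count` function and its bug are hypothetical.

{{{#!python
import unittest

def word_count(text):
    # Fixed version. The hypothetical buggy version used text.split(' '),
    # which returns [''] for an empty string and so counted one word;
    # split() with no argument returns [] and is the fix.
    return len(text.split())

class TestWordCountRegression(unittest.TestCase):
    def test_empty_string_has_no_words(self):
        # Written *before* the fix, this test failed -- proving that it
        # really does trigger the bug it is meant to guard against.
        self.assertEqual(word_count(''), 0)

if __name__ == '__main__':
    unittest.main()
}}}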
{{{#!IRC
[16:22] n3rd: We were just discussing in our group: "coding creates bugs... are we reducing the number of bugs faster than we are creating them?"
[16:24] n3rd also brings up a decent point
[16:24] users often find bugs that developers don't think of
[16:24] often because they use software in ways that weren't anticipated
[16:25] often this goes back to the permutation issue
[16:25] it isn't possible to test every possible permutation
[16:25] (well, rarely possible)
[16:26] < n3rd> jam, so the users are passive testers?
[16:26] A file with just 20 bytes has 256^20 = ~10^48 permutations
[16:27] Well, I would say that users are often pretty *active* testers
[16:27] however, they don't make good automated test suites
[16:27] I suppose that would be my:
[16:27] 4) Reproducibility
[16:28] (the chance that running the same thing now will give you the same thing it gave you before)
[16:28] It is somewhat related to specificity
[16:28] as an unreproducible test has low specificity
[16:28] (it breaks for reasons that you aren't trying to test)
[16:29] I guess I should also mention... if a user used every intermediate version of my program, they'd run into a lot more bugs
[16:29] As a developer I fix a huge number of things before a release
[16:29] it is just that often the set of *remaining* bugs
[16:30] are ones that I had not anticipated very well
[16:34] Anyway, I think it is useful to remember what the strengths of a given style of testing are.
[16:34] You can have automated tests (unit and integration tests), manual (interactive) testing, foisting the software off on your users
[16:34] etc
[16:34] I do think that testing at each level is useful
[16:35] and trying to test things at the wrong level introduces more pain than benefit
[16:35] Having an absolutely bulletproof piece of software that doesn't do what your users want isn't particularly beneficial
[16:36] So having user testing in your feedback loop is certainly a good thing
[16:36] However, giving them a buggy PoS is also not going to make them very happy
[16:36] I'm certainly a fan of multi-tier testing, including automated testing
[16:36] having a small, fast test suite that is likely to expose bugs is nice as a "must pass before making it to trunk" gate
[16:37] having a slower but more thorough "must pass before releasing to users" suite
[16:37] and for some systems a "must be tested by human interaction" stage can be inserted in there as well
[16:38] If the first one takes much more than 5 minutes, it often causes grief when trying to get development done
[16:38] but the second can take a day, and still not slow down your release cycle
[16:44] < Omar871> QUESTION: Could the efficiency and effectiveness of the testing process depend on the licensing type of the software we are making?
[16:45] ANSWER: I don't think it would change dramatically
[16:46] If the license is open, it does allow users to do introspective testing which would just not be possible otherwise
[16:46] however, few users can really develop your software anyway, so it certainly shouldn't be relied upon as a source of improving correctness
[16:46] even if your users are very intelligent, they almost always
[16:46] 1) don't know your code
[16:46] 2) don't have time to be doing your work for you :)
[16:47] I think Linus gets a bit of a boost, because there are lots of developers on the project, not just users
[16:48] Certainly a "lot of eyeballs" requires eyeballs that can see the code
[16:49] and with enough of them, you can have a person specialize in any given subset, which also helps in defect localization (hence 'shallow')
[16:50] < hggdh> QUESTION: are there any considerations for *usability* testing (this is one thing that users would certainly be able to perform)?
[16:51] ANSWER: I think that usability testing is certainly important
[16:51] (10:35:23 AM) jam: Having an absolutely bulletproof piece of software that doesn't do what your users want isn't particularly beneficial
[16:51] There is an argument over whether this specifically falls under the standard category of 'testing'
[16:52] (market research is certainly important to developing software, but it isn't "testing" :)
[16:56] < strycore89> QUESTION: how is testing of graphical applications done? (For example, PyGTK apps)
[16:56] IME, you can test PyGTK (and PyQt) without actually showing dialogs
[16:57] both of them support updating widgets by programmatically setting values
[16:57] and then triggering events
[16:57] In that case, they can be tested in the same fashion as any other unit testing
[16:57] however, it doesn't test the visual representation
[16:57] which is a valid level to test
[16:58] There are also GUI testing suites that can be used
[16:59] I forget the exact name of one (Sikuli?)
[16:59] which uses screenshots
[16:59] and some work for marking the regions you care about
[17:01] http://groups.csail.mit.edu/uid/sikuli/
[17:02] alrightie!
[17:02] thanks jam for giving this great talk!
[17:02] * dholbach hugs jam
}}}
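Jam's PyGTK answer is easier to see with code, so here is a minimal sketch of the technique he describes: set a widget's value programmatically, trigger the event in code, and assert on the result, without ever showing a window. The widgets and handler are hypothetical, and the sketch assumes PyGTK (the `gtk` module) is importable, which generally requires a running display.

{{{#!python
import unittest
import gtk  # PyGTK; importing it assumes a display is available

class TestEntryAndButton(unittest.TestCase):
    def test_clicking_ok_reads_the_entry(self):
        # Build the widgets but never call show()/show_all(),
        # so no dialog ever appears on screen.
        entry = gtk.Entry()
        entry.set_text('alice')      # programmatically set a value
        seen = []
        button = gtk.Button('OK')
        button.connect('clicked', lambda w: seen.append(entry.get_text()))
        button.clicked()             # trigger the 'clicked' event in code
        self.assertEqual(seen, ['alice'])

if __name__ == '__main__':
    unittest.main()
}}}

As jam notes, this exercises the logic behind the widgets in the same fashion as any other unit test; it deliberately does not test the visual representation, which is what screenshot-based tools such as Sikuli cover.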