PyUnitTests

Dev Week -- Unit testing Python code, with code coverage measurement - Lars Wirzenius -- Fri, Sep 5

(01:02:06 PM) liw: jcastro, how do you want to work this, shall I wait a bit or just start now?
(01:02:21 PM) jcastro: up to you, it's your hour. :)
(01:02:35 PM) jcastro: Though a few minutes so everyone can go to the bathroom or something is always appreciated. :D
(01:02:44 PM) liw: I'll wait for 180 seconds, then
(01:03:25 PM) liw: in the meanwhile: jcastro, will you or someone be around to relay questions from -chat?
(01:06:50 PM) liw: ok, let's start
(01:06:59 PM) liw: Welcome, everyone. The goal of this session is to introduce the Python unittest library and the coverage.py code coverage measurement tool.
(01:07:04 PM) liw: I will do this by walking through the development of a simple command line program to compute MD5 checksums for files.
(01:07:09 PM) liw: I assume everyone in the audience has a basic understanding of Python.
(01:07:14 PM) liw: If you have questions, please ask them in #ubuntu-classroom-chat, prefixed with "QUESTION".
(01:07:18 PM) liw: I would also appreciate if someone volunteered to feed the questions to me one by one.
(01:07:27 PM) liw: (now breathe a bit and read that :)
(01:08:28 PM) liw: The example program I will develop will be similar to the md5sum program.
(01:08:32 PM) liw: It gets some filenames on the command line and writes out their MD5 checksum.
(01:08:41 PM) liw: For example: checksum foo.txt bar.txt
(01:08:46 PM) liw: This might output something like this:
(01:08:51 PM) liw: d3b07384d113edec49eaa6238ad5ff00  foo.txt
(01:08:51 PM) liw: c157a79031e1c40f85931829bc5fc552  bar.txt
(01:09:04 PM) liw: is anyone following this or am I going too fast?
(01:09:08 PM) Myrtti: I volunteer for relaying
(01:09:13 PM) liw: Myrtti, thank you
(01:09:51 PM) liw: I will develop this program using "test driven development", which means that you write the tests first.
(01:10:16 PM) liw: http://en.wikipedia.org/wiki/Test_Driven_Development gives an overview of TDD for those who want to learn more.
(01:10:23 PM) liw: For this tutorial, we will merely assume that writing tests first is good because it makes it easier to write tests for all parts of your code.
(01:10:35 PM) liw: For the checksumming application, we will need to compute the checksum for some file, so let's start with that.
(01:10:42 PM) liw: http://paste.ubuntu.com/43675/
(01:10:56 PM) liw: That has the unit test module.
(01:11:02 PM) liw: In the real program, we will have a class called FileChecksummer, which will be given an open file when it is created.
(01:11:05 PM) liw: It will have a method "compute", which computes the checksum.
(01:11:08 PM) liw: The checksum will be stored in the "checksum" attribute.
(01:11:12 PM) liw: To start with, the "checksum" attribute will be None, since we have not yet computed the checksum.
(01:11:17 PM) liw: The "compute" method will set the "checksum" attribute when it has computed the checksum.
(01:11:44 PM) liw: (This is not necessarily a great design, for which I apologize, but this is an example of writing tests, not of writing great code)
(01:11:55 PM) liw: In the unit test, we check that this is true: that "checksum" is None at the start.
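[The paste links in this log have since expired. A minimal reconstruction of the test module, based on the description above: the FileChecksummer name, the compute method, the checksum attribute, and the FileChecksummerTests class name all appear in the session; the initial test method name and the StringIO fixture are guesses.]

    import StringIO
    import unittest

    import checksum


    class FileChecksummerTests(unittest.TestCase):

        def setUp(self):
            # Give the object an open (in-memory) file to checksum.
            self.file = StringIO.StringIO("hello, world")
            self.fc = checksum.FileChecksummer(self.file)

        def testInitiallyHasNoChecksum(self):
            # No checksum has been computed yet.
            self.assertEqual(self.fc.checksum, None)


    if __name__ == "__main__":
        unittest.main()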
(01:12:02 PM) Myrtti: < geser> QUESTION: there are several unittest frameworks for Python out there. What are the most important differences between them?
(01:12:16 PM) liw: I'll answer the question in a minute
(01:12:22 PM) liw: The Python unittest module is inspired by the Java JUnit framework.
(01:12:26 PM) liw: JUnit has inspired implementations in many languages, and these frameworks are collectively known as xUnit.
(01:12:30 PM) liw: See http://en.wikipedia.org/wiki/XUnit for more information.
(01:13:26 PM) liw: there are at least two other modules for automated testing in the Python standard library: doctest and test.
(01:13:59 PM) liw: unittest is the only one I have any real experience with. Back when I started writing unit tests with Python, doctest scared me, and I don't know if test even existed then
(01:14:22 PM) liw: as far as I understand, the choice between doctest and unittest is mostly a matter of taste: it depends on how you want to write the tests
(01:15:10 PM) liw: I like unittest's object oriented approach; doctest has an approach where you paste a Python command prompt session into a docstring and doctest runs the code and checks that the output is identical
(01:15:33 PM) liw: so it's good to look at both and pick the one that you prefer; sorry I can't give a more definite answer
(01:16:01 PM) liw: The example above (see the paste.ubuntu.com URL I pasted) shows all the important parts of unittest.
(01:16:08 PM) liw: The tests are collected into classes that are derived from the unittest.TestCase class.
(01:16:11 PM) liw: Each test is a method whose name starts with "test".
(01:16:16 PM) liw: There can be some setup work done before each test, and this is put into the "setUp" method.
(01:16:23 PM) liw: In this example, we create a FileChecksummer object.
(01:16:26 PM) Myrtti: < Salze> QUESTION: is that a convention that the testclass is the original classname plus "tests"?
(01:16:51 PM) liw: Salze, yes, that is one convention; that naming is not enforced, but lots of people seem to use it
(01:17:02 PM) liw: continuing
(01:17:03 PM) liw: Similarly, there can be work done after each test, and this is put into the "tearDown" method, but we don't need that in this example.
(01:17:08 PM) liw: "setUp" is called before each test method, and "tearDown" after each test method.
(01:17:11 PM) liw: There can be any number of test methods in a TestCase class.
(01:17:15 PM) liw: The final bit in the example calls unittest.main() to run all tests.
(01:17:18 PM) liw: unittest.main() automatically finds all tests.
(01:17:51 PM) liw: that's all about the test module. any questions on that? take a minute (and tell me if you need more time), it's good to understand it before we continue
(01:19:50 PM) liw: no questions? let's continue then
(01:19:55 PM) liw: http://paste.ubuntu.com/43676/
(01:19:59 PM) liw: That's the actual code.
(01:20:02 PM) liw: As you can see, it is very short.
(01:20:06 PM) liw: That is how test driven development works: first you write a test, or a small number of tests, and then you write the shortest possible code to make those tests pass.
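[A reconstruction of that first version of checksum.py: just enough code to make the one test above pass, nothing more.]

    class FileChecksummer(object):

        def __init__(self, file):
            self.file = file
            self.checksum = None

        def compute(self):
            # Not implemented yet; the only test so far just checks
            # that checksum starts out as None.
            pass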
(01:20:28 PM) liw: Let's see if they do.
(01:20:32 PM) liw: To run the tests, do this: python checksum_tests.py
(01:20:36 PM) liw: You should get the following output:
(01:20:41 PM) liw:  liw@dorfl$ python checksum_tests.py
(01:20:41 PM) liw:  .
(01:20:41 PM) liw:  ----------------------------------------------------------------------
(01:20:41 PM) liw:  Ran 1 test in 0.000s
(01:20:43 PM) liw:  
(01:20:46 PM) liw:  OK
(01:20:48 PM) liw: Everyone please try that, while I continue slowly.
(01:21:04 PM) liw: The next step is to make FileChecksummer actually compute a checksum.
(01:21:11 PM) liw: First we write the test.
(01:21:17 PM) liw: http://paste.ubuntu.com/43677/
(01:21:30 PM) liw: that's the new version of the test module
(01:21:45 PM) liw: it adds the testComputesAChecksum method
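[The added test method, reconstructed from the traceback below; the assertion line is taken verbatim from it.]

        def testComputesAChecksum(self):
            self.fc.compute()
            self.assertNotEqual(self.fc.checksum, None)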
(01:21:51 PM) liw: Then we run the test.
(01:22:00 PM) liw:  liw@dorfl$ python checksum_tests.py
(01:22:00 PM) liw:  F.
(01:22:00 PM) liw:  ======================================================================
(01:22:00 PM) liw:  FAIL: testComputesAChecksum (__main__.FileChecksummerTests)
(01:22:00 PM) liw:  ----------------------------------------------------------------------
(01:22:00 PM) liw:  Traceback (most recent call last):
(01:22:02 PM) liw:    File "checksum_tests.py", line 18, in testComputesAChecksum
(01:22:04 PM) liw:      self.assertNotEqual(self.fc.checksum, None)
(01:22:06 PM) liw:  AssertionError: None == None
(01:22:08 PM) liw:  
(01:22:10 PM) liw:  ----------------------------------------------------------------------
(01:22:16 PM) liw: That's not so good.
(01:22:18 PM) liw: The test does not pass.
(01:22:20 PM) liw: That's because we only wrote the test, not the code.
(01:22:22 PM) liw: This, too, is how test driven development works.
(01:22:24 PM) liw: We write the test, and then we run the test.
(01:22:28 PM) liw: And then we check that the test fails in the right way.
(01:22:33 PM) liw: And it does: it fails because the checksum attribute is None.
(01:22:41 PM) liw: The test might have failed because we did not have a compute method, or because we misspelt the checksum attribute.
(01:22:48 PM) liw: Since neither of those happened, the test is OK, and we write the code next.
(01:22:53 PM) liw: http://paste.ubuntu.com/43679/
(01:23:13 PM) liw: that's the new code, it modifies the compute() method
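[A sketch of the change; as explained below, assigning any non-None value is enough to make the new test pass at this point.]

        def compute(self):
            # Cheat: assign any non-None value to satisfy the test.
            self.checksum = ""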
(01:23:22 PM) liw: Please run the test and see that it works.
(01:23:31 PM) Myrtti: < davfigue> QUESTION: what is the package for the checksum module?
(01:24:10 PM) liw: davfigue, the checksum module comes from http://paste.ubuntu.com/43679/ -- save that to a file called checksum.py
(01:24:21 PM) liw: and update the file with newer versions as I get to them
(01:24:34 PM) liw: did anyone run the modified code successfully through the tests?
(01:24:55 PM) Myrtti: < thekorn> QUESTION: what's your experience, where should I put the test code, in the module itself or in a seperate tests/ sub-directory?
(01:25:59 PM) liw: thekorn, in my experience, because of the way I run my tests, it is best to keep a module foo.py and its tests in foo_tests.py in the same directory; while I haven't tried nose (python-nose), I use another similar tool and it benefits from keeping them together
(01:26:15 PM) liw: thekorn, I also find that as a programmer it's easier to have things together
(01:26:45 PM) liw: I'm going to hope the code passes through tests for others, and continue
(01:26:54 PM) liw: If you look at the code, you see how I cheated: I only wrote as much code as was necessary to pass the test.
(01:26:58 PM) liw: In this case, it was enough to assign any non-None value to checksum.
(01:27:03 PM) liw: That's OK, that's part of how test driven development works.
(01:27:06 PM) liw: You write a test and then a little code and then you start again.
(01:27:10 PM) liw: This way, you do very, very small iterations, and it turns out that for many people, including me, that means the total development speed is higher than if you skip writing the tests, or write a lot of code at a time.
(01:27:18 PM) liw: That's because if you write a lot of code before you test it, it's harder to figure out where the problem is.
(01:27:23 PM) liw: If you only write one line at a time, and it breaks, you know where to look.
(01:27:28 PM) liw: So the next step is to write a new test, something to verify that compute() computes the right checksum.
(01:27:32 PM) liw: Since we know the input, we can pre-compute the correct answer with the md5sum utility.
(01:27:49 PM) liw: liw@dorfl$ echo -n hello, world | md5sum -
(01:27:49 PM) liw: e4d7f1b4ed2e42d15898f4b27b019da4  -
(01:27:56 PM) liw: Changing the test gives this:
(01:28:07 PM) liw: http://paste.ubuntu.com/43680/
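[The test method presumably became something like this, using the checksum precomputed with md5sum above.]

        def testComputesAChecksum(self):
            self.fc.compute()
            self.assertEqual(self.fc.checksum,
                             "e4d7f1b4ed2e42d15898f4b27b019da4")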
(01:28:18 PM) liw: Again, tests fail.
(01:28:28 PM) liw: It's time to fix the code.
(01:28:32 PM) liw: http://paste.ubuntu.com/43681/
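[A sketch of the fixed code, using the Python 2.5 md5 module (the coverage report later in the session shows md5 being imported); reading the whole file at once is a simplification.]

    import md5


    class FileChecksummer(object):

        def __init__(self, file):
            self.file = file
            self.checksum = None

        def compute(self):
            # Read the whole file and store its MD5 digest in hex form.
            data = self.file.read()
            self.checksum = md5.new(data).hexdigest()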
(01:28:44 PM) Myrtti: < Salze> QUESTIONS: writing all the tests (one can think of) at once would be a "valid" approach to TDD, too? Or not?
(01:29:12 PM) liw: Salze, it's a valid approach, if it works for you :) I find that writing a large number of tests at once results in me writing a lot of code at once, and a lot of bugs
(01:29:33 PM) liw: but sometimes it's ok to write a lot of tests, to test all the aspects of a small amount of tricky code
(01:30:07 PM) liw: for example, if the function checks that a URL is well-formed, it's ok to write all the tests at once, and then write the one-line regular expression
(01:30:24 PM) liw: Next we will write a main program to let us compute checksums for any files we may want.
(01:30:28 PM) liw: Sometimes it feels like a lot of work to write tests all the time, so I'm going to pretend I'm lazy and skip writing the tests now.
(01:30:35 PM) liw: (note: _pretend_ :)
(01:30:38 PM) liw: After all, the checksumming is the crucial part of the program, and we've already written tests for that.
(01:30:42 PM) liw: The rest is boilerplate code that is very easy to get right.
(01:30:46 PM) liw: http://paste.ubuntu.com/43682/
(01:30:56 PM) liw: That's the finished application.
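[A reconstruction of the main program; the ChecksumApplication class and its run method are named later in the session, the rest is guesswork. Note that it contains the bug discussed next.]

    import sys


    class ChecksumApplication(object):

        def run(self):
            for filename in sys.argv[1:]:
                f = file(filename)
                fc = FileChecksummer(f)
                # Bug: fc.compute() is never called, so fc.checksum
                # is still None when it gets printed.
                print fc.checksum, filename
                f.close()


    if __name__ == "__main__":
        ChecksumApplication().run()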
(01:30:59 PM) liw: All tests pass, and everything is good.
(01:31:17 PM) liw: Oops, no it isn't.
(01:31:22 PM) liw: If you try to actually run the application, you get the wrong output:
(01:31:26 PM) liw: liw@dorfl$ python checksum.py foo.txt bar.txt
(01:31:26 PM) liw: None foo.txt
(01:31:26 PM) liw: None bar.txt
(01:31:30 PM) liw: I forgot to call compute!
(01:31:37 PM) liw: See, this is what happens when I am lazy.
(01:31:40 PM) liw: I make bugs.
(01:31:45 PM) liw: Fixing...
(01:31:48 PM) liw: Still too lazy to write a test.
(01:31:53 PM) liw: http://paste.ubuntu.com/43683/
(01:32:07 PM) liw: that's really the final checksum.py, I hope
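[The fix is a one-line addition to the run method sketched above.]

        def run(self):
            for filename in sys.argv[1:]:
                f = file(filename)
                fc = FileChecksummer(f)
                fc.compute()  # the call that was missing
                print fc.checksum, filename
                f.close()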
(01:32:14 PM) liw: To test it, I compare its output with md5sum's.
(01:32:21 PM) liw: liw@dorfl$ python checksum.py foo.txt bar.txt
(01:32:21 PM) liw: d3b07384d113edec49eaa6238ad5ff00 foo.txt
(01:32:21 PM) liw: c157a79031e1c40f85931829bc5fc552 bar.txt
(01:32:21 PM) liw: liw@dorfl$ md5sum foo.txt bar.txt
(01:32:21 PM) liw: d3b07384d113edec49eaa6238ad5ff00  foo.txt
(01:32:22 PM) liw: c157a79031e1c40f85931829bc5fc552  bar.txt
(01:32:25 PM) liw: Both programs give the same output, so everything is OK.
(01:32:37 PM) ***liw makes a significant pause, because this is an important moment
(01:32:47 PM) liw: See what happened there?
(01:32:50 PM) liw: I stopped writing automated tests, so now I have to test things by hand.
(01:32:54 PM) liw: In a big project, how often can I be bothered to test things by hand?
(01:32:57 PM) liw: Not very often, because I'm lazy.
(01:33:00 PM) liw: By writing automated tests, I can be more lazy.
(01:33:09 PM) liw: This is why it's good for programmers to be lazy: they will work their asses off to only do something once.
(01:33:23 PM) liw: everyone with me so far?
(01:34:06 PM) liw: Suppose we come back to this checksumming program later.
(01:34:09 PM) liw: We see that there is some automated testing, but we can't remember how complete it is.
(01:34:47 PM) liw: (side note: the md5 module is going to be deprecated in future python versions, the hashlib module is the real module to use)
(01:34:53 PM) liw: In this example, it is obvious that it isn't very complete, but for a big program, it is not so obvious.
(01:34:59 PM) liw: coverage.py is a tool for measuring that.
(01:35:03 PM) liw: It is packaged in the python-coverage package.
(01:35:07 PM) liw: To use it, you run the test with it, like this:
(01:35:13 PM) liw:  liw@dorfl$ python -m coverage -x checksum_tests.py
(01:35:13 PM) liw:  ..
(01:35:13 PM) liw:  ----------------------------------------------------------------------
(01:35:15 PM) liw:  Ran 2 tests in 0.001s
(01:35:15 PM) liw:  
(01:35:17 PM) liw:  OK
(01:35:22 PM) liw: See, there is no change in the output.
(01:35:30 PM) liw: However, there is a new file, .coverage, which contains the coverage data.
(01:35:33 PM) liw: To get a report, run this:
(01:35:43 PM) liw:  liw@dorfl$ python -m coverage -r
(01:35:43 PM) liw:  Name                                         Stmts   Exec  Cover
(01:35:44 PM) liw:  ----------------------------------------------------------------
(01:35:44 PM) liw:  /usr/lib/python2.5/StringIO                    175     37    21%
(01:35:46 PM) liw:  /usr/lib/python2.5/atexit                       33      5    15%
(01:35:48 PM) liw:  /usr/lib/python2.5/getopt                      103      5     4%
(01:35:50 PM) liw:  /usr/lib/python2.5/hashlib                      55     15    27%
(01:35:52 PM) liw:  /usr/lib/python2.5/md5                           4      4   100%
(01:35:54 PM) liw:  /usr/lib/python2.5/posixpath                   219      6     2%
(01:35:56 PM) liw:  /usr/lib/python2.5/threading                   562      1     0%
(01:36:00 PM) liw:  /usr/lib/python2.5/unittest                    430    238    55%
(01:36:02 PM) liw:  /var/lib/python-support/python2.5/coverage     522      3     0%
(01:36:04 PM) liw:  <string>                                    <class '__main__.CoverageException'>: File '/home/liw/Canonical/udw-python-unittest-coverage-tutorial/<string>' not Python source.
(01:36:07 PM) liw:  checksum                                        20     13    65%
(01:36:09 PM) liw:  checksum_tests                                  14     14   100%
(01:36:11 PM) liw:  ----------------------------------------------------------------
(01:36:17 PM) liw:  TOTAL                                         2137    341    15%
(01:36:19 PM) liw: oops, that was long
(01:36:27 PM) liw: Stmts is the total number of statements in each module, Exec is how many we have executed, and Cover is the percentage of statements we have covered
(01:36:34 PM) liw: This contains all the Python standard library stuff as well.
(01:36:42 PM) liw: We can exclude that:
(01:36:46 PM) liw: liw@dorfl$ python -m coverage -r -o /usr,/var
(01:36:58 PM) liw: (skipping long output)
(01:36:58 PM) liw: TOTAL               34     27    79%
(01:37:05 PM) liw: This shows that only 27 statements of a total of 34 are covered by the testing.
(01:37:12 PM) liw: The line with "class '__main__.CoverageException'>" is a bug in the hardy version of coverage.py, please ignore it.
(01:37:16 PM) liw: To get a list of the lines that are missing, add the -m option:
(01:37:26 PM) liw:  liw@dorfl$ python -m coverage -rm -o /usr,/var
(01:37:30 PM) liw:  Name             Stmts   Exec  Cover   Missing
(01:37:30 PM) liw:  ----------------------------------------------
(01:37:30 PM) liw:  <string>        <class '__main__.CoverageException'>: File '/home/liw/Canonical/udw-python-unittest-coverage-tutorial/<string>' not Python source.
(01:37:30 PM) liw:  checksum            20     13    65%   22-27, 31
(01:37:31 PM) liw:  checksum_tests      14     14   100%
(01:37:32 PM) liw:  ----------------------------------------------
(01:37:34 PM) liw:  TOTAL               34     27    79%
(01:37:37 PM) liw: We're missing lines 22-27 and 31 from checksum.py.
(01:37:46 PM) liw: That's the ChecksumApplication class (its run method) and the main program.
(01:38:02 PM) liw: Now, if we wanted to, we could add more tests, and get 100% coverage.
(01:38:07 PM) liw: And that would be good.
(01:38:10 PM) liw: However, sometimes it is not worth it to write the tests.
(01:38:13 PM) liw: In that case, you can mark the code as being outside coverage testing.
(01:38:17 PM) liw: http://paste.ubuntu.com/43684/
(01:38:30 PM) liw: See the "#pragma: no cover" comments? That's the magic marker.
(01:38:34 PM) liw: We now have 100% statement coverage.
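[Assuming the pragmas were put on the lines reported missing above (the run method and the main program), the marked-up code might look like this; coverage.py excludes a marked def or if line together with the block it introduces.]

    class ChecksumApplication(object):

        def run(self): # pragma: no cover
            for filename in sys.argv[1:]:
                f = file(filename)
                fc = FileChecksummer(f)
                fc.compute()
                print fc.checksum, filename
                f.close()


    if __name__ == "__main__": # pragma: no cover
        ChecksumApplication().run()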
(01:38:46 PM) liw: Experience will tell you what things it's worthwhile to write tests for.
(01:38:50 PM) liw: A test that never fails for anyone is a waste of time.
(01:38:54 PM) liw: For the past year or so, I have tried to get to 100% statement coverage for all my new projects.
(01:38:59 PM) liw: It is sometimes a lot of work, but it gives me confidence when I'm making big changes: if tests pass, I am pretty sure the code still works as intended.
(01:39:03 PM) liw: However, that is by no means guaranteed: it's easy enough to write tests at 100% coverage without actually testing every aspect of the code, so that even though all tests pass, the code fails when used for real.
(01:39:14 PM) liw: That is unavoidable, but as you write more tests, you learn what things to test for.
(01:39:18 PM) liw: As an example, since coverage.py only tests _statement_ coverage, it does not check that all parts of a conditional or expression get tested:
(01:39:37 PM) liw: "if a or b or c" might get 100% statement coverage because a is true, but nothing is known about b and c.
(01:39:43 PM) liw: They might even be undefined variables.
(01:39:52 PM) liw: Then, when the code is run for real, you get an ugly exception.
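[A tiny illustration of the problem; the helper function and the deliberately misspelt name are hypothetical.]

    def is_enabled(options):
        # A single test where options.always is true reaches 100%
        # statement coverage here, but "optoins" (misspelt on purpose)
        # is never evaluated, so the NameError only shows up when
        # options.always is false in real use.
        if options.always or optoins.enabled:
            return True
        return False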
(01:39:58 PM) liw: In this tutorial I've shown what it is like to write tests before the code.
(01:40:02 PM) liw: One of the results from this is that code written like this tends to be easier to test.
(01:40:07 PM) liw: Adding tests for code that has already been written often requires jumping through more hoops to get decent test coverage.
(01:40:35 PM) liw: <rick_h_> also check out figleaf http://darcs.idyll.org/~t/projects/figleaf/doc/
(01:40:35 PM) liw: <rick_h_> skips some coverage stuff in stdlib and such
(01:40:43 PM) liw: I didn't know about figleaf, cool. thanks rick_h_
(01:41:02 PM) liw: I've also only touched the very basics of both unittest and automated testing in general.
(01:41:05 PM) liw: For example, there are tools to make using coverage.py less work, and approaches to writing tests that make it easier to write good tests.
(01:41:24 PM) liw: For this session, those topics are too big, so I advise those interested to read up on xUnit, test driven development, and more.
(01:41:27 PM) liw: There's lots of material about this on the net.
(01:41:33 PM) liw: This finishes my monologue.
(01:41:36 PM) liw: Questions, anyone?
(01:42:05 PM) Myrtti: do you want them here or -chat?
(01:42:22 PM) liw: here is fine, unless it becomes chaos, in which case I'll say so
(01:45:19 PM) liw: while I continue to be astonished at having pre-answered every possible question, I'll note that I have heard good things about python-nose, but I haven't had time to look at it myself
(01:46:01 PM) liw: I wrote a test runner (the program to find and run tests) myself, since that was easy, but I hope to replace that with nose one of these days
(01:46:26 PM) Myrtti: < davfigue> QUESTION: do you have any advice or approach to simplify regression testing on python?
(01:47:26 PM) liw: davfigue, sorry, no; I try to write some kind of automated test for each bug fix (be it a unittest.TestCase method or something else), and then use that for regression testing
(01:47:27 PM) Myrtti: < tacone> QUESTION: which lib do you suggest for mockups ?
(01:48:00 PM) liw: I haven't studied libraries for mockups; mostly I have written small custom mockup classes
(01:48:17 PM) liw: (I am not the world's greatest expert on unit testing, as should now be clear :)
(01:49:44 PM) liw: I have wanted to find a mockup class for filesystem operations (much of the os module), both to more easily write tests and to speed things up
(01:49:49 PM) liw: but I haven't found anything yet
(01:50:31 PM) liw: <davfigue> QUESTION: do you know any other tool for gathering statistics on python tests ?
(01:50:58 PM) liw: nope, coverage.py is the only working one I've found; there was another one that I couldn't get to work, but I forgot its name
(01:53:03 PM) liw: <davfigue> QUESTION: would you point us to more resources on tdd for python ?
(01:53:23 PM) liw: I don't have a handy list of Python specific TDD stuff, I'm afraid
(01:54:16 PM) liw: apart from the Wikipedia page I pasted earlier, http://c2.com/cgi/wiki?TestDrivenDevelopment might be a good place to start reading
(01:54:22 PM) liw: most stuff about TDD is language agnostic
(01:55:01 PM) liw: the c2 wiki (the _original_ wiki, unless I'm mistaken) is a pretty good resource for overview material on lots of software development stuff, actually
(01:55:43 PM) liw: <rick_h_> http://www.amazon.com/Test-Driven-Development-Addison-Wesley-Signature/dp/0321146530/ref=pd_bbs_sr_1?ie=UTF8&s=books&qid=1220637200&sr=8-1
(01:55:43 PM) liw: <rick_h_> that book is half java and half python if I recall
(01:55:46 PM) liw: (for the record)
(01:56:25 PM) liw: <rick_h_> jason gave a talk at pycon using nose: http://us.pycon.org/2008/conference/schedule/event/79/
(01:56:34 PM) ***liw is learning more than his audience, at this rate :)
(01:57:27 PM) liw: ok, our hour is ending in a couple of minutes
(01:57:45 PM) Myrtti: thank you liw
(01:57:49 PM) liw: thank you for listening and participating
(01:58:11 PM) liw: if anyone wants to discuss these things further, I'll be around during the weekend and next week on irc, though not necessarily on these two channels
