Continuous integration for scientific software


I’m not a software engineer; I’m a PhD student in geoscience.

Almost two years ago I started writing a piece of scientific software. I never used continuous integration (CI), mainly because at first I didn’t know it existed, and I was the only person working on this software.

Now that the core of the software is running, other people are getting interested in it and want to contribute. The plan is for people at other universities to implement additions to the core software (I’m worried they could introduce bugs). The software has also become quite complex and harder and harder to test, and I plan to keep working on it.

For these two reasons I’m now seriously thinking about using CI.
Since I never had a software engineering education and nobody around me has ever heard of CI (we are scientists, not programmers), I find it hard to get started on my project.

I have a couple of questions where I would like to get some advice:

First, a short explanation of how the software works:

  • The software is controlled by a single .xml file containing all required settings. You start the software by passing the path to the .xml file as an input argument; it then runs and creates a couple of files with the results. A single run takes ~30 seconds.

  • It is scientific software. Almost all of the functions take multiple input parameters, whose types are mostly quite complex classes. I have multiple .txt files with big catalogs that are used to create instances of these classes.

Now let’s come to my questions:

  1. Unit tests, integration tests, end-to-end tests?
    My software is now around 30,000 lines of code with hundreds of functions and ~80 classes.
    It feels strange to me to start writing unit tests for hundreds of functions that are already implemented.
    So I thought about simply creating some test cases: prepare 10-20 different .xml files and let the software run. I guess this is what is called end-to-end testing? I often read that you should not do this, but maybe it is OK as a start when you already have working software? Or is it simply a dumb idea to try to add CI to already-working software?

  2. How do you write unit tests if the function parameters are difficult to create?
    Assume I have a function double fun(vector<Class_A> a, vector<Class_B>), and usually I would first need to read in multiple text files to create objects of type Class_A and Class_B. I thought about creating some dummy factory functions like Class_A create_dummy_object() that don’t read the text files. I also thought about implementing some kind of serialization. (I do not plan to test the creation of the class objects, since they depend only on the text files.)

  3. How do you write tests if the results are highly variable?
    My software makes heavy use of big Monte Carlo simulations and works iteratively. Usually you have ~1,000 iterations, and at every iteration you create ~500-20,000 object instances based on Monte Carlo draws. If even one result of one iteration is slightly different, all subsequent iterations are completely different. How do you deal with this situation? I guess this is a big point against end-to-end tests, since the end result is highly variable?

Any other advice on CI is highly appreciated.


  • How to handle a question that asks many things
    – gnat
    1 hour ago

  • How do you know that your software is working correctly? Can you find a way to automate that check so you can run it on every change? That should be your first step when introducing CI to an existing project.
    – Bart van Ingen Schenau
    45 mins ago

object-oriented c++ testing continuous-integration


edited 1 hour ago by amon
asked 1 hour ago by user7431005


3 Answers

Testing scientific software is difficult, both because of the complex subject matter and because of typical scientific development processes (a.k.a. “hack it until it works”, which doesn’t usually result in a testable design). This is a bit ironic, considering that science should be reproducible. What changes compared to “normal” software is not whether tests are useful (they are!), but which kinds of test are appropriate.

Handling randomness: all runs of your software MUST be reproducible. If you use Monte Carlo techniques, you must make it possible to provide a specific seed for the random number generator.

  • It is easy to forget this, e.g. when using C’s rand() function, which depends on global state.
  • Ideally, a random number generator is passed as an explicit object through your functions. C++11’s <random> standard library header makes this a lot easier.
  • Instead of sharing random state across modules of the software, I’ve found it useful to create a second RNG which is seeded by a random number from the first RNG. Then, if the number of requests to the RNG by the other module changes, the sequence generated by the first RNG stays the same.

Integration tests are perfectly fine. They are good at verifying that different parts of your software play together correctly, and for running concrete scenarios.

  • As a minimum quality level “it doesn’t crash” can already be a good test result.
  • For stronger results, you will also have to check the results against some baseline. However, these checks will have to be somewhat tolerant, e.g. account for rounding errors. It can also be helpful to compare summary statistics instead of full data rows.
  • If checking against a baseline would be too fragile, check that the outputs are valid and satisfy some general properties. These can be general (“selected locations must be at least 2km apart”) or scenario-specific, e.g. “a selected location must be within this area”.

When running integration tests, it is a good idea to write a test runner as a separate program or script. This test runner performs necessary setup, runs the executable to be tested, checks any results, and cleans up afterwards.

Unit-test-style checks can be quite difficult to introduce into scientific software because the software has not been designed for that. In particular, unit tests get difficult when the system under test has many external dependencies/interactions. If the software is not purely object-oriented, it is not generally possible to mock/stub those dependencies. I’ve found it best to largely avoid unit tests for such software, except for pure math functions and other utility functions.

Even a few tests are better than no tests. Combined with the check “it has to compile” that’s already a good start into continuous integration. You can always come back and add more tests later. You can then prioritize areas of the code that are more likely to break, e.g. because they get more development activity. To see which parts of your code are not covered by unit tests, you can use code coverage tools.

Manual testing:
Especially for complex problem domains, you will not be able to test everything automatically. E.g. I’m currently working on a stochastic search problem. If I test that my software always produces the same result, I can’t improve it without breaking the tests. Instead, I’ve made it easier to do manual tests: I run the software with a fixed seed and get a visualization of the result (depending on your preferences, R, Python/Pyplot, and Matlab all make it easy to get high-quality visualizations of your data sets). I can use this visualization to verify that things did not go terribly wrong. Similarly, tracing the progress of your software via logging output can be a viable manual testing technique, at least if you can select the types of events to be logged.


    1. It is never a dumb idea to add CI. From experience I know this is the way to go for an open-source project where people are free to contribute. CI lets you stop people from adding or changing code when that code breaks your program, so it is almost invaluable for keeping a working codebase.

      When considering tests, you can certainly provide some end-to-end tests (I think they are a subcategory of integration tests) to be sure that your overall code flow works the way it should. You should also provide at least some basic unit tests to make sure individual functions output the right values, because within an integration test, errors in one part can compensate for errors in another and go unnoticed.

    2. Creating test objects is indeed quite difficult and laborious. You are right to want to make dummy objects. These objects should have default but edge-case values for which you know exactly what the output should be.

    3. The problem with books on this subject is that the landscape of CI (and other parts of DevOps) evolves so quickly that anything in a book is probably going to be out of date a few months later. I do not know of any books that could help you, but Google should, as always, be your saviour.

    4. You should run your tests yourself multiple times and do a statistical analysis. That way you can implement test cases that take the median/average of multiple runs and compare it against your analysis, so you know which values are correct.

    Some tips:

    • Use the CI integration of your Git platform to stop broken code from entering your codebase.
    • Require peer review by other developers before code is merged. This surfaces errors earlier and, again, stops broken code from entering your codebase.
    answered by AWhitePelican, a new contributor


      1. Types of test

        • It feels kind of strange to me to start writing unit tests for hundreds of functions which are already implemented

          Think of it the other way round: if a patch touching several functions breaks one of your end-to-end tests, how are you going to figure out which one is the problem?

          It’s much easier to write unit tests for individual functions than for the whole program. It’s much easier to be sure you have good coverage of an individual function. It’s much easier to refactor a function when you’re sure the unit tests will catch any corner cases you broke.

          Writing unit tests for already-existing functions is perfectly normal for anyone who has worked on a legacy codebase. They’re a good way to confirm your understanding of the functions in the first place and, once written, they’re a good way to find unexpected changes of behaviour.

        • End-to-end tests are also worthwhile. If they’re easier to write, by all means do those first and add unit tests ad hoc to cover the functions you’re most concerned about others breaking. You don’t have to do it all at once.

        • Yes, adding CI to existing software is sensible, and normal.

      2. How to write unit tests

        If your objects are really expensive and/or complex, write mocks. You can just link the tests using mocks separately from the tests using real objects, instead of using polymorphism.

        You should anyway have some easy way of creating instances – a function to create dummy instances is common – but having tests for the real creation process is also sensible.

      3. Variable results

        You must have some invariants for the result. Test those, rather than a single numerical value.

        You could provide a mock pseudorandom number generator, if your Monte Carlo code accepts one as a parameter; that makes the results predictable, at least for a well-known algorithm, but it is brittle unless the mock literally returns the same number every time.

      share|improve this answer

        Your Answer

        StackExchange.ready(function() {
        var channelOptions = {
        tags: “”.split(” “),
        id: “131”
        };
        initTagRenderer(“”.split(” “), “”.split(” “), channelOptions);

        StackExchange.using(“externalEditor”, function() {
        // Have to fire editor after snippets, if snippets enabled
        if (StackExchange.settings.snippets.snippetsEnabled) {
        StackExchange.using(“snippets”, function() {
        createEditor();
        });
        }
        else {
        createEditor();
        }
        });

        function createEditor() {
        StackExchange.prepareEditor({
        heartbeatType: ‘answer’,
        convertImagesToLinks: false,
        noModals: false,
        showLowRepImageUploadWarning: true,
        reputationToPostImages: null,
        bindNavPrevention: true,
        postfix: “”,
        onDemand: true,
        discardSelector: “.discard-answer”
        ,immediatelyShowMarkdownHelp:true
        });

        }
        });

        user7431005 is a new contributor. Be nice, and check out our Code of Conduct.

         
        draft saved
        draft discarded

        StackExchange.ready(
        function () {
        StackExchange.openid.initPostLogin(‘.new-post-login’, ‘https%3a%2f%2fsoftwareengineering.stackexchange.com%2fquestions%2f379996%2fcontinuous-integration-for-scientific-software%23new-answer’, ‘question_page’);
        }
        );

        Post as a guest

        3 Answers
        3

        active

        oldest

        votes

        3 Answers
        3

        active

        oldest

        votes

        active

        oldest

        votes

        active

        oldest

        votes

        up vote
        2
        down vote

        Testing scientific software is difficult, both because of the complex subject matter and because of typical scientific development processes (aka. hack it until it works, which doesn’t usually result in a testable design). This is a bit ironic considering that science should be reproducible. What changes compared to “normal” software is not whether tests are useful (yes!), but which kinds of test are appropriate.

        Handling randomness: all runs of your software MUST be reproducible. If you use Monte Carlo techniques, you must make it possible to provide a specific seed for the random number generator.

        • It is easy to forget this e.g. when using C’s rand() function which depends on global state.
        • Ideally, a random number generator is passed as an explicit object through your functions. C++11’s random standard library header makes this a lot easier.
        • Instead of sharing random state across modules of the software, I’ve found it useful to create a second RNG which is seeded by a random number from the first RNG. Then, if the number of requests to the RNG by the other module changes, the sequence generated by the first RNG stays the same.

        Integration tests are perfectly fine. They are good at verifying that different parts of your software play together correctly, and for running concrete scenarios.

        • As a minimum quality level “it doesn’t crash” can already be a good test result.
        • For stronger results, you will also have to check the results against some baseline. However, these checks will have to be somewhat tolerant, e.g. account for rounding errors. It can also be helpful to compare summary statistics instead of full data rows.
        • If checking against a baseline would be too fragile, check that the outputs are valid and satisfy some general properties. These can be general (“selected locations must be at least 2km apart”) or scenario-specific, e.g. “a selected location must be within this area”.

        When running integration tests, it is a good idea to write a test runner as a separate program or script. This test runner performs necessary setup, runs the executable to be tested, checks any results, and cleans up afterwards.

        Unit test style checks can be quite difficult to insert into scientific software because the software has not been designed for that. In particular, unit tests get difficult when the system under test has many external dependencies/interactions. If the software is not purely object-oriented, it is not generally possible to mock/stub those dependencies. I’ve found it best to largely avoid unit tests for such software, except for pure math functions and utility functions.

        Even a few tests are better than no tests. Combined with the check “it has to compile” that’s already a good start into continuous integration. You can always come back and add more tests later. You can then prioritize areas of the code that are more likely to break, e.g. because they get more development activity. To see which parts of your code are not covered by unit tests, you can use code coverage tools.

        Manual testing:
        Especially for complex problem domains, you will not be able to test everything automatically. E.g. I’m currently working on a stochastic search problem. If I test that my software always produces the same result, I can’t improve it without breaking the tests. Instead, I’ve made it easier to do manual tests: I run the software with a fixed seed and get a visualization of the result (depending on your preferences, R, Python/Pyplot, and Matlab all make it easy to get high-quality visualizations of your data sets). I can use this visualization to verify that things did not go terribly wrong. Similarly, tracing the progress of your software via logging output can be a viable manual testing technique, at least if I can select the type of events to be logged.

            answered 32 mins ago

            amon

            78.3k

                up vote
                1
                down vote

                1. It is never a dumb idea to add CI. From experience, it is the way to go for an open-source project where people are free to contribute: CI lets you stop people from adding or changing code when that code breaks your program, which is almost invaluable for keeping a working codebase.

                  When considering tests, you can certainly provide some end-to-end tests (I think of these as a subcategory of integration tests) to be sure that your overall code flow works the way it should. You should also provide at least some basic unit tests to make sure individual functions return the right values, since in an integration test separate errors can compensate for each other and go unnoticed.

                2. Creating test objects is indeed difficult and laborious. You are right to want dummy objects. These objects should have default, but edge-case, values for which you know exactly what the output should be.

                3. The problem with books on this subject is that the CI landscape (and the rest of DevOps) evolves so quickly that anything in a book is probably out of date a few months later. I do not know of any books that could help you, but Google should, as always, be your saviour.

                4. You should run your tests multiple times yourself and do statistical analysis. That way you can implement test cases that compare the median/average of multiple runs against your analysis, so you know which values are correct.

                Some tips:

                • Use your Git platform’s CI integration to stop broken code from entering your codebase.
                • Require peer review by other developers before code is merged. This surfaces errors earlier and, again, stops broken code from entering your codebase.
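For illustration, assuming the project is hosted on GitHub and builds with CMake (both assumptions; GitLab CI, Jenkins, etc. work similarly), a minimal workflow enforcing “it compiles and the tests pass” on every push and pull request might look like:

```yaml
# .github/workflows/ci.yml -- a sketch; adapt paths and build commands
name: CI
on: [push, pull_request]
jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Configure
        run: cmake -S . -B build
      - name: Build        # "it has to compile" is the first gate
        run: cmake --build build
      - name: Test
        run: ctest --test-dir build --output-on-failure
```

Combined with a branch-protection rule that requires this check to pass, broken code cannot be merged.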

                New contributor
                AWhitePelican is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                Check out our Code of Conduct.


                    answered 1 hour ago

                    AWhitePelican

                    191


                        up vote
                        1
                        down vote

                        1. Types of test

                          • It feels kind of strange to me to start writing unit tests for hundreds of functions which are already implemented

                            Think of it the other way round: if a patch touching several functions breaks one of your end-to-end tests, how are you going to figure out which one is the problem?

                            It’s much easier to write unit tests for individual functions than for the whole program. It’s much easier to be sure you have good coverage of an individual function. It’s much easier to refactor a function when you’re sure the unit tests will catch any corner cases you broke.

                            Writing unit tests for already-existing functions is perfectly normal for anyone who has worked on a legacy codebase. They’re a good way to confirm your understanding of the functions in the first place and, once written, they’re a good way to find unexpected changes of behaviour.

                          • End-to-end tests are also worthwhile. If they’re easier to write, by all means do those first and add unit tests ad-hoc to cover the functions you’re most concerned about others breaking. You don’t have to do it all at once.

                          • Yes, adding CI to existing software is sensible, and normal.

                        2. How to write unit tests

                          If your objects are really expensive and/or complex, write mocks. You can just link the tests using mocks separately from the tests using real objects, instead of using polymorphism.

                          You should in any case have some easy way of creating instances – a function that creates dummy instances is common – but having tests for the real creation process is also sensible.

                        3. Variable results

                          You must have some invariants for the result. Test those, rather than a single numerical value.

                          You could provide a mock pseudorandom number generator if your Monte Carlo code accepts one as a parameter, which would make the results predictable at least for a well-known algorithm, but it’s brittle unless it literally returns the same number every time.



                            answered 1 hour ago

                            Useless

                            8,339

                                user7431005 is a new contributor. Be nice, and check out our Code of Conduct.
