Testing in Software Engineering
In larger enterprise projects, a full suite of testing methodologies is necessary to ensure that the software:
- Works as expected through all permutations of inputs and use cases.
- Continues to work as expected despite changes to the codebase, especially when developers work on areas of the codebase they’re not familiar with.
This common suite of testing methodologies includes:
- Functional Testing
- Unit Testing
- Integration Testing
- End-to-End (aka System) Testing
- Manual Testing
- Performance Testing
- Security Testing
Many of these testing methodologies branch into sub-categories. We’ll touch on the most common, but it’s important to note that not all are required at the beginning of a project or start-up. Most testing methodologies only become necessary at the highest scales of a project (depending on the component). Furthermore, terminology is often used in different ways by different people. Some people refer to End-to-End Testing as Functional Testing. Thinking about the term from first principles, however, unit, integration, end-to-end, and manual testing are all responsible for testing the functioning of an application. They just test it at different levels of scope (in other words: different layers of the application or system).
Furthermore, it’s difficult to encapsulate all forms of testing in a single blog post, because a tailored test could be created for every edge case in the behavior of any software application. There’s also a tremendous difference of opinion about the exact categorization of any test, as many test methodologies overlap one another.
Functional Testing
Functional tests encompass a suite of tests at various scopes of an application. At the lowest level, unit tests cover individual components within an application. Integration tests are typically used in distributed systems to test a single service as a whole (as a black box). At the highest level, End-to-End and Manual tests exercise a system as a whole (including all the services that comprise it).
Unit Testing
Unit testing occurs within a single application, testing its public functions as separate modular components. When a project is first starting, unit testing is typically the first form of automated testing that’s introduced to an application (after any prototyping). A decent overview of unit testing can be found in this Medium post: Unit Testing Best Practices: 9 to Ensure You Do It Right. For a deeper look at unit testing, I would suggest reading tutorials of your unit testing framework of choice for your language of choice.
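As a minimal sketch, here’s what a unit test might look like using Python’s built-in unittest framework. The `slugify` function is hypothetical, standing in for whatever public function your application exposes:

```python
import unittest

def slugify(title: str) -> str:
    """Hypothetical function under test: turn a title into a URL slug."""
    return "-".join(title.lower().split())

class SlugifyTest(unittest.TestCase):
    """Each test exercises one behavior of the public function in isolation."""

    def test_lowercases_and_joins_words(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_single_word_is_unchanged(self):
        self.assertEqual(slugify("testing"), "testing")
```

Running `python -m unittest` in the containing directory discovers and runs these tests.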
Integration Testing
Integration testing tests an application as a whole (as a black box), interacting with its network APIs or other methods of consumption/production. Integration tests can be implemented in two ways:
Pre-merge integration tests run alongside unit tests before a code change is committed to its code repository. This means that the entire test suite runs in-memory and the application likely runs as a standalone binary (executable/runnable file), rather than being deployed onto a virtual machine (although the latter isn’t impossible). If there are network calls to be made, those network calls are made through a machine’s loopback mechanism via localhost.
Post-merge integration tests would provide more comprehensive testing, especially if the application is deployed onto a machine or VM that is specially-configured. However, because they would run after an application is merged and deployed into pre-prod environments, the feedback loop for a software developer to iterate in the case of a test failure would be longer, slowing down overall development for all software developers in the team/company.
It’s important to note that there’s usually nothing stopping you from using a Unit Testing framework to run Integration Tests. Unit Testing frameworks are just test runners to run and record test outcomes. They might also include assertion and/or mocking libraries.
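As an illustration of the pre-merge style, the sketch below starts a stand-in HTTP application in-process and exercises it over the loopback interface, all within a single test run. The handler and the `/health` endpoint are hypothetical, not from any particular framework:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    """Stand-in for the application's real network API."""

    def do_GET(self):
        body = json.dumps({"status": "ok"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep test output quiet

def run_integration_check() -> dict:
    # Bind to localhost on an ephemeral port; no external deployment needed.
    server = HTTPServer(("127.0.0.1", 0), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/health"
        with urllib.request.urlopen(url) as resp:
            return json.loads(resp.read())
    finally:
        server.shutdown()
```

In a real pre-merge suite, the server would be the application binary itself, and a unit testing framework would wrap calls like `run_integration_check` in ordinary test cases.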
End-to-End (E2E) Testing
End-to-End (E2E) testing depends on the system being tested:
- Does the system have a User Interface?
- Does the system have network APIs that are accessed by clients other than the User Interface?
If the system does have a User Interface, then a testing framework that simulates a browser or mobile phone is necessary once the project becomes too big for manual testers to test every change. These User Interface End-to-End tests can double as compatibility tests for different browsers, mobile phones, etc.
If a system has exposed APIs that are consumed by clients other than the User Interface, then those endpoints can be tested the same way as post-merge integration tests. It’s conceivable that even a large distributed system could be tested pre-merge just like the Pre-merge Integration Test mentioned above, although for larger enterprise projects, building and starting such a distributed application could take too long, slowing down the development process.
Manual Testing
Manual testing occurs throughout the development process: informally by developers as they iterate on their changes, and formally by QA Testers once an application is in pre-production environments. Manual testing is also performed by product managers and other stakeholders.
Unofficially, manual testing can be broken down into the following categories:
- Acceptance testing: Mostly targeted testing of changes before accepting them for release.
- Exploratory testing: An unscripted testing method that allows the tester to explore the application and find defects. These can sometimes be organized into “Bug Bash” sessions where multiple people explore different parts of an application and record any defects in a common location.
- Usability testing: Similar to Acceptance testing. More focus on incremental improvements towards ease of use.
- Accessibility testing: Assuring support for users with disabilities.
- Localization testing: Assuring support for different languages and regional settings.
- Beta testing: Limited public release before releasing the product to the entire public.
Performance Testing
Before understanding Performance Testing, it’s important to understand Throughput vs Latency. One could argue that testing how an application performs in a specific use case, such as a failover scenario, constitutes a performance test. However, performance tests are usually broken down into the two categories of Throughput and Latency.
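To make the distinction concrete, here’s a toy calculation with made-up numbers: latency is the per-request delay, while throughput is how many requests complete per unit of time. The two can move independently when requests are handled concurrently:

```python
# Hypothetical timings for five requests handled by a service.
request_durations_s = [0.120, 0.080, 0.200, 0.150, 0.100]
wall_clock_window_s = 0.250  # the requests overlapped, all finishing in this window

# Latency: how long one request takes (here, the average per-request delay).
avg_latency_s = sum(request_durations_s) / len(request_durations_s)

# Throughput: how many requests complete per unit of wall-clock time.
throughput_rps = len(request_durations_s) / wall_clock_window_s

print(f"average latency: {avg_latency_s * 1000:.0f} ms")   # 130 ms
print(f"throughput: {throughput_rps:.0f} requests/sec")    # 20 requests/sec
```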
Several performance tests exist with differing shapes of throughput for different purposes:
Load Testing
Load testing is a broad term that could arguably encompass all throughput-oriented tests. The throughput could have any shape and still be considered a load test.
Stress Testing
A stress test exercises a system under extreme or abnormal conditions. An abnormal condition could mean maximum load before breaking, but it could also be a sudden spike (see Spike Test) during hours when little load is expected. It could also mean a specific type of traffic that happens to place an especially high burden on a specific part of the system.
Spike Testing
A spike test simulates a sudden, unexpected spike in the number of users accessing a system. This type of test is unique because distributed systems typically have two mechanisms for handling higher user traffic:
- Scheduled scaling of machines up and down to match historical traffic patterns.
- Reactive resource-based auto-scaling (resource could be CPU, memory, message queue, or similar).
Reactive auto-scaling is typically too slow to handle large spikes in traffic during off-hours if the request path is entirely synchronous. An asynchronous system would have a better chance of handling such spikes, given that additional load can be queued in message queues.
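The difference can be sketched with a toy simulation (made-up numbers, fixed capacity, no auto-scaling): the synchronous path drops whatever exceeds per-tick capacity, while the queued path buffers the excess and drains it in later ticks:

```python
def simulate_spike(arrivals, capacity):
    """arrivals: requests arriving per tick; capacity: requests served per tick."""
    dropped_sync = 0   # synchronous system: excess requests fail immediately
    queued = 0         # asynchronous system: excess requests wait in a queue
    served_async = 0
    for arriving in arrivals:
        # Synchronous path: anything beyond this tick's capacity is lost.
        dropped_sync += max(0, arriving - capacity)
        # Asynchronous path: everything is enqueued, then drained at capacity.
        queued += arriving
        served_now = min(queued, capacity)
        served_async += served_now
        queued -= served_now
    return dropped_sync, served_async, queued

# A sudden spike at the third tick, well above the fixed capacity of 10/tick.
dropped, served, backlog = simulate_spike([5, 5, 40, 5, 5, 0, 0, 0], capacity=10)
print(dropped, served, backlog)  # 30 60 0
```

The synchronous system loses 30 of the 60 requests at the moment of the spike, while the queued system eventually serves all 60, at the cost of added waiting time.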
Smoke Testing
The intent of a smoke test is to assure basic functionality of a subset of a system. It typically doesn’t test an entire system, so its scope is typically smaller than an End-to-End test.
Latency Testing
A Latency test measures the delay caused by the application’s code and its interactions with the underlying infrastructure. It could test a subset of a system or the entire system as a whole from input to output. A smoke test, canary test, or End-to-End test could be used to measure latency.
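As a minimal sketch of a latency measurement, the snippet below times repeated calls to a stand-in operation and reports percentiles (p50 and p99 are common targets). In a real test, `operation` would be a request through the system under test rather than the hypothetical sleep used here:

```python
import random
import statistics
import time

def operation():
    """Hypothetical operation under test; stands in for a real request."""
    time.sleep(random.uniform(0.001, 0.005))

def measure_latency(fn, samples=50):
    durations = []
    for _ in range(samples):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    # quantiles(n=100) yields 99 cut points; index 49 ~ p50, index 98 ~ p99.
    cuts = statistics.quantiles(durations, n=100)
    return cuts[49], cuts[98]

p50, p99 = measure_latency(operation)
print(f"p50={p50 * 1000:.1f} ms, p99={p99 * 1000:.1f} ms")
```

Percentiles matter more than averages here, because a small fraction of slow requests can dominate the user experience while leaving the mean nearly unchanged.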
Security Testing
Security Testing is itself a superset of testing methodologies for the purpose of assuring the software’s security controls and identifying vulnerabilities. Below is a non-exhaustive list. Often, each one of the following can further branch into even more sub-categories.
Penetration Testing
This type of testing involves simulating an attack on a system to identify vulnerabilities that an attacker could exploit. Pen-testing itself branches into finer-grained sub-categories. To learn more, visit this brief overview: What is pen testing?.
Vulnerability Assessment
This type of testing involves identifying and assessing vulnerabilities in a system without attempting to exploit them. An example might be identifying vulnerabilities in a system’s third-party software using a database of known vulnerabilities. Not all vulnerability assessments focus on third-party tools, but their mechanisms of reporting are similar.
Static Code Analysis
This type of testing involves analyzing the source code of a system without executing it, in order to identify potential security issues. One example of how static code analysis can be useful is detecting unsanitized inputs before they are used in an SQL query, which would otherwise allow for possible SQL injection attacks.
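To illustrate the kind of defect such a tool looks for, here is a self-contained sketch using Python’s sqlite3 module: the interpolated query is vulnerable, while the parameterized version treats the same malicious input as a plain value:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

user_input = "alice' OR '1'='1"  # malicious input a static analyzer should flag

# Vulnerable: unsanitized input interpolated directly into the SQL string.
vulnerable = conn.execute(
    f"SELECT COUNT(*) FROM users WHERE name = '{user_input}'"
).fetchone()[0]

# Safe: a parameterized query treats the input as a literal value.
safe = conn.execute(
    "SELECT COUNT(*) FROM users WHERE name = ?", (user_input,)
).fetchone()[0]

print(vulnerable, safe)  # 2 0 — the injected OR clause matched every row
```

A static analyzer flags the f-string interpolation at build time, before the code ever runs, which is exactly the appeal of this class of testing.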
Configuration Review
This type of testing involves reviewing the configuration of a system to ensure that it is secure and in compliance with relevant security standards.
As a project matures, more forms of testing need to be introduced to assure stability, compliance, and security. The vernacular of testing in software extends beyond the methodologies mentioned here; however, there’s tremendous overlap between those terms and the ones covered above. For example, regression testing encompasses every form of testing mentioned here, because any change in software can introduce a regression (bug) at any scope of an application. Finding the right time to introduce the right form of testing is very application-specific, and it’s difficult to strike a perfect balance.
Sam Malayek works in Vancouver, using this space to fill in a few gaps. Opinions are his own.