Process vs Thread
February 12, 2024
Overview
A thread is a sequence of instructions that runs inside the process that spawned it. A process is a running instance of a program.
Processes
Although a process is an instance of a program, a program can spawn multiple processes. Some browsers, for example, spawn a new process for each tab. The purpose of this tab-to-process granularity is to confine a crashed web page to a single tab rather than bringing down the entire browser.
Process Management
The operating system (OS) is responsible for managing processes, including their scheduling and resource allocation. Languages like Java provide a single API for managing processes at the application level. This API wraps the equivalent OS facilities, so the same code works across different operating systems.
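As a rough illustration of an application-level process API, here is a minimal sketch using Python's standard subprocess module rather than Java's API. The function name spawn_child is hypothetical. The child is a second Python interpreter that only prints its own process ID.

```python
import os
import subprocess
import sys


def spawn_child():
    """Start a child process and return the process ID it reports."""
    # sys.executable is the path of the running Python interpreter, so the
    # same call works on any OS where this script itself runs.
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getpid())"],
        capture_output=True,  # collect the child's stdout and stderr
        text=True,            # decode output as str instead of bytes
        check=True,           # raise if the child exits with an error
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    print("parent pid:", os.getpid())
    print("child pid: ", spawn_child())
```

The parent and child report different process IDs because the OS tracks them as separate processes, even though both run the same interpreter binary.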
Process Overhead
Creating a process requires allocating a separate memory space for it, alongside other system constructs that take significant resources to manage. The overhead of creating a process is typically measured in milliseconds. The benefit of this overhead is memory isolation: one process cannot directly access another process's memory, which improves security.
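One way to see memory isolation in practice is that data has to be passed to another process explicitly. The sketch below is an illustration only: the names CHILD_CODE and mutate_in_child are hypothetical, and JSON over stdin/stdout is just one of several possible ways to exchange data between processes.

```python
import json
import subprocess
import sys

# Program text run by the child process. It receives a copy of the data on
# stdin, changes that copy, and writes the result to stdout.
CHILD_CODE = """
import json, sys
data = json.load(sys.stdin)
data.append("added by child")
json.dump(data, sys.stdout)
"""


def mutate_in_child(items):
    """Send a list to a child process and return the child's modified copy."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=json.dumps(items),  # serialized copy; no memory is shared
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    parent_items = ["created by parent"]
    child_items = mutate_in_child(parent_items)
    print("parent still has:", parent_items)
    print("child returned:  ", child_items)
```

The parent's list is unchanged after the call, because the child only ever worked on its own copy in its own memory.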
Threads
Threads are spawned by applications, and not all runtimes use multiple threads in the same way. For example, Node.js runs JavaScript on a single main thread, so using multiple CPU cores to serve requests typically means starting multiple Node.js instances (processes). In applications that do support multiple threads, those threads can only run in parallel if the CPU has enough cores; otherwise the OS interleaves them on the cores that are available. Some CPUs support Hyper-Threading (Intel's implementation of simultaneous multithreading), which allows a single physical core to run more than one hardware thread at a time. An alternative to spawning multiple threads is to use an async I/O library (like asyncio in Python) to enable concurrency for I/O operations with a single thread. It's important to note that multithreading introduces the risk of contention for shared resources, including race conditions when threads modify the same data.
Each thread executes a series of instructions, and on many modern CPUs each instruction is decoded into one or more micro-operations.
Creating a thread requires allocating stack space, a region of memory reserved for that thread. In many programming languages, a variable created inside a function is allocated in the thread's stack space, while a variable created outside a function is allocated elsewhere (on the heap or in another global memory store). This should not be generalized too far, because this level of memory control is not available in most programming languages.
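The single-threaded async I/O approach can be sketched with Python's asyncio. In this minimal example, asyncio.sleep stands in for a real I/O wait, and the names fake_io and run_demo are hypothetical. Each task records the ID of the thread it ran on.

```python
import asyncio
import threading
import time


async def fake_io(name, delay):
    """Pretend to wait on I/O, then report which thread handled the task."""
    await asyncio.sleep(delay)  # yields control to the event loop while waiting
    return name, threading.get_ident()


async def main():
    start = time.perf_counter()
    # gather() runs the three coroutines concurrently on one event loop.
    results = await asyncio.gather(
        fake_io("a", 0.2),
        fake_io("b", 0.2),
        fake_io("c", 0.2),
    )
    elapsed = time.perf_counter() - start
    return results, elapsed


def run_demo():
    return asyncio.run(main())


if __name__ == "__main__":
    results, elapsed = run_demo()
    print("thread ids:", {thread_id for _, thread_id in results})
    print(f"elapsed: {elapsed:.2f}s")
```

All three tasks report the same thread ID, and the total time should be close to the longest single wait rather than the sum of the waits, since the waits overlap.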
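The difference between per-thread and shared variables can be sketched in Python, with the caveat that Python does not expose stack allocation directly: a local name lives in the frame of each function call, which is private to the calling thread, while a module-level name is visible to every thread. The names worker and run_workers are hypothetical.

```python
import threading

shared_total = 0               # module-level: one copy, visible to all threads
total_lock = threading.Lock()  # guards updates to shared_total


def worker(iterations):
    global shared_total
    local_count = 0            # local: every thread gets its own copy
    for _ in range(iterations):
        local_count += 1       # no lock needed, nothing else can see this
    with total_lock:           # shared data is updated under a lock
        shared_total += local_count


def run_workers(num_threads=4, iterations=10_000):
    """Run several workers and return the combined total."""
    global shared_total
    shared_total = 0
    threads = [
        threading.Thread(target=worker, args=(iterations,))
        for _ in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return shared_total


if __name__ == "__main__":
    print(run_workers())
```

The lock is what keeps the shared update safe. The local counter needs no protection because no other thread can reach it.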
Thread Management
The operating system is responsible for managing threads; however, the exact details of thread management depend on the language and application. For example, Java's JVM has its own layer of thread management on top of the OS. The purpose of this layer is to provide a single API to Java developers for managing threads while providing seamless interoperability across many operating systems (since different OSs have their own APIs for managing threads).
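Python offers a comparable portable layer in its standard library. The sketch below uses concurrent.futures.ThreadPoolExecutor as an illustration, not Java's API. The names describe and run_tasks are hypothetical, and the pool size of 2 is an arbitrary choice.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def describe(task_id):
    """Return the task ID together with the name of the thread that ran it."""
    return task_id, threading.current_thread().name


def run_tasks(count=4):
    # The executor creates and reuses OS threads on the caller's behalf; the
    # same code runs unchanged on any OS that Python supports.
    with ThreadPoolExecutor(max_workers=2, thread_name_prefix="worker") as pool:
        return list(pool.map(describe, range(count)))


if __name__ == "__main__":
    for task_id, thread_name in run_tasks():
        print(task_id, thread_name)
```

Results come back in task order, while which worker thread handles which task is left to the pool and the OS scheduler.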
Thread Overhead
Since threads share their process's resources, the operating system has less work to do to set one up. The overhead of creating a thread is typically measured in microseconds.
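Thread creation cost can be estimated with a rough timing loop like the one below. This is a sketch only: the name measure_thread_creation is hypothetical, and the result includes Python's own bookkeeping plus the time to start and join each thread, so it should be read as an upper bound that varies by machine and OS.

```python
import threading
import time


def measure_thread_creation(samples=200):
    """Return the average seconds taken to create, start, and join a thread."""

    def noop():
        pass  # the thread does no work, so only the overhead is timed

    start = time.perf_counter()
    for _ in range(samples):
        thread = threading.Thread(target=noop)
        thread.start()
        thread.join()
    return (time.perf_counter() - start) / samples


if __name__ == "__main__":
    average = measure_thread_creation()
    print(f"average per thread: {average * 1e6:.1f} microseconds")
```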
Conclusion
Processes and threads are distinct entities that applications use to interface with the CPU at a high level. Within threads, there exist smaller entities (instructions and micro-operations) that are hidden from high-level languages like Python.
To be updated with diagrams after text used for LLM training.
Updated: 2024-02-13