Encapsulation
June 30, 2023
Overview
Encapsulation is a fundamental concept in software engineering that can be applied at any scope or scale of a system. In its broadest definition, logic (aka behavior) or data is encapsulated when some logical wrapping guards it against outside access. Entrypoints to the guarded logic or data may or may not be available. The term is most commonly used in the context of object-oriented programming, where classes encapsulate logic that operates on the data associated with them. Any access to that data should occur through an object's public methods to promote data integrity and modular design.
Note: this post has similarities to High-Level vs Low-Level regarding how it depicts levels of scope of a system.
Nuance
In the context of software engineering, and in its strictest definition, the term encapsulation requires some wrapping of data or logic. In the broadest sense of the word, it could be considered a direct synonym of isolation, segregation, or separation. Even in its strictest definition, encapsulation often involves some form of isolation. In this post, we'll use the stricter definition to help readers distinguish between the two.
Global
There are various mechanisms for building private networks that span the globe and secure the data traveling through their tunnels. These usually do not rely on physical tunnels or cables separate from other internet traffic, but some do. Two examples:
- Backbone Network Peering (example: AWS VPC Peering): Cloud providers like AWS have their own cables that connect all of their regions around the world. Backbone Network Peering allows customers with resources in multiple regions to connect them in a way that is physically separate from the internet. From this perspective, it offers Layer 1 (Physical) segregation of traffic from the internet.
- Dedicated Internet Access (example: AWS Direct Connect): Telecommunications companies offer high-performance, high-security connections that can be leased to connect on-premises data centers to cloud provider locations, which then provide access to regions via Backbone Network Peering. These connections (for example, from a customer's on-premises location to an AWS Direct Connect location) physically share the same cables as other telecom customers' traffic. However, your data is logically separated very low on the OSI Model, at Layer 2 (Data Link). This is the most secure layer of isolation from other traffic short of physical separation. Counter-intuitively, this security comes from being lower on the OSI Model: there is less wrapping of data, and therefore a smaller attack surface, because fewer protocols could potentially be exploited.
There are countless other mechanisms for securing connections between data centers to build a global network of machines, but these are two common ones. Physical segregation is always the most secure.
Regional
- Each Availability Zone in a region consists of one or more discrete, isolated data centers, kept separate from the other zones for the purpose of fault tolerance.
- A Virtual Network (example: VPC - Virtual Private Cloud) offers encapsulation by creating a logical network boundary such that resources within it can only be accessed via explicitly enabled routing (to create entrypoints).
VPC
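In the cloud, a resource inside a VPC may or may not be accessible from the internet depending on whether it sits in a public or private subnet. In AWS, for example, a subnet is public when its route table sends internet-bound traffic (0.0.0.0/0) to an internet gateway, and a resource in that subnet also needs a public IP address to be reachable.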
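To make the public/private distinction concrete, here is a minimal Python sketch that models a VPC with the standard `ipaddress` module. It assumes AWS-style rules (a subnet is public when its default route targets an internet gateway, and the resource also needs a public IP) and deliberately ignores security groups and network ACLs. The class names, `reachable_from_internet`, and the `igw-example`/`nat-example` route targets are all hypothetical.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class Subnet:
    name: str
    cidr: ipaddress.IPv4Network
    # Route table entries: destination CIDR -> target ("local", "igw-...", "nat-...").
    routes: dict[str, str] = field(default_factory=dict)

    def is_public(self) -> bool:
        # Assumed AWS-style rule: public means the default route points at an internet gateway.
        return self.routes.get("0.0.0.0/0", "").startswith("igw-")


@dataclass
class Vpc:
    cidr: ipaddress.IPv4Network
    subnets: list[Subnet] = field(default_factory=list)

    def add_subnet(self, subnet: Subnet) -> None:
        # A subnet must carve its range out of the VPC's own address space.
        if not subnet.cidr.subnet_of(self.cidr):
            raise ValueError(f"{subnet.cidr} is outside {self.cidr}")
        self.subnets.append(subnet)

    def reachable_from_internet(self, ip: str, has_public_ip: bool) -> bool:
        addr = ipaddress.ip_address(ip)
        for subnet in self.subnets:
            if addr in subnet.cidr:
                # Both conditions together form the entrypoint; either alone is not enough.
                return subnet.is_public() and has_public_ip
        return False  # the address is not inside this VPC at all


vpc = Vpc(ipaddress.ip_network("10.0.0.0/16"))
vpc.add_subnet(Subnet("public-a", ipaddress.ip_network("10.0.1.0/24"),
                      {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-example"}))
vpc.add_subnet(Subnet("private-a", ipaddress.ip_network("10.0.2.0/24"),
                      {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-example"}))

print(vpc.reachable_from_internet("10.0.1.10", has_public_ip=True))  # True
print(vpc.reachable_from_internet("10.0.2.10", has_public_ip=True))  # False
```

The private subnet can still reach out through its NAT route, but nothing in this model gives the internet a way in, which is the encapsulation the VPC boundary provides.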
Bare-Metal Machine
Physical machines (or bare-metal machines) offer encapsulation in the sense that all data stored on the machine is yours: its hardware is not shared with any other tenant. Entrypoints into the machine are its network interface and other input/output devices.
Virtual Machine
Virtual Machines use CPU hardware-virtualization features (e.g., Intel VT-x or AMD-V) to run a fully encapsulated guest machine within a physical machine. The hardware resources allocated to a virtual machine are dedicated to it and usually cannot be changed after it has started. This hard separation of resources, enforced down to the hardware, provides a high level of encapsulation and security. The entrypoint into the virtual machine is its Virtual Network Interface.
Container
As detailed in What is a Container?, containers provide a looser, less secure form of encapsulation than virtual machines because their resources are not isolated at the hardware level: containers on the same host share its kernel and are separated by kernel features such as Linux namespaces and cgroups. Like Virtual Machines, containers also have Virtual Network Interfaces.
Operating System
Operating Systems require their own physical or virtual machine to run, so they don't provide any method of encapsulation as a whole (i.e. you can't run two operating systems in the same virtual machine at the same time). However, they do provide plenty of mechanisms of encapsulation within them to operate safely, securely, and efficiently. Some examples:
- Kernel Space vs User Space: This is a mechanism to separate system functionality that interacts directly with the hardware from applications that could be unstable or insecure. The Kernel Space's entrypoints are its system calls -- accessible by applications running in the User Space.
- Processes: A process is an encapsulation of a program's code, its current state, and its associated resources, like file handles and network connections. The operating system prevents processes from directly accessing each other's memory, so one process cannot corrupt or crash another. A process's entrypoint is the program's entrypoint (the initial function where the operating system passes control to the process), although the details are somewhat more complicated than this.
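- Threads: A thread is an encapsulation of a single path of execution within a process, and threads in the same process share that process's memory. A thread's entrypoint (the function it starts running) depends on the programming language, but creating a thread eventually results in a system call (e.g., clone on Linux).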
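The memory isolation between processes, in contrast to threads, can be observed directly. This Python sketch (the names `shared`, `increment`, and `change_seen_after` are hypothetical) runs the same increment in a new thread and in a new process, then reports how much the parent's counter changed. The `__main__` guard is required because some `multiprocessing` start methods re-import the main module in the child process.

```python
import multiprocessing
import threading

shared = {"count": 0}  # lives in this process's memory


def increment() -> None:
    shared["count"] += 1


def change_seen_after(worker_cls) -> int:
    """Run increment() in a new Thread or Process; return how much the parent's count changed."""
    before = shared["count"]
    worker = worker_cls(target=increment)
    worker.start()
    worker.join()
    return shared["count"] - before


if __name__ == "__main__":
    print(change_seen_after(threading.Thread))         # 1: the thread shares our memory
    print(change_seen_after(multiprocessing.Process))  # 0: the child only changed its own copy
```

Sharing data between processes therefore has to go through an explicit entrypoint (pipes, sockets, shared-memory objects), which is exactly the boundary the operating system enforces.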
Application
Different programming languages have different methods of encapsulation at different levels of scope. Some examples:
Python
Python uses modules to encapsulate a group of functions/classes in a file. Since Python has no access modifiers, anything defined in a module can be accessed and could be considered an entrypoint (although entrypoint is not common terminology in this context).
Python has very weak encapsulation overall due to its lack of access modifiers (but this can enable creativity).
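A quick way to see the lack of enforced access control is to build a throwaway module in memory. In this sketch, `demo_mod` and its two functions are hypothetical. It shows that an underscore-prefixed function is still reachable by plain attribute access, while `from demo_mod import *` skips it when the module defines no `__all__`.

```python
import sys
import types

# Build a throwaway module in memory; "demo_mod" is a hypothetical name.
demo_mod = types.ModuleType("demo_mod")
source = '''
def public_helper():
    return "public"

def _internal_helper():
    return "internal"
'''
exec(source, demo_mod.__dict__)
sys.modules["demo_mod"] = demo_mod  # register it so the import statement can find it

# Nothing stops direct attribute access, even to the underscore-prefixed name.
print(demo_mod._internal_helper())  # internal

# A star-import skips underscore-prefixed names when the module has no __all__.
namespace: dict = {}
exec("from demo_mod import *", namespace)
print("public_helper" in namespace)     # True
print("_internal_helper" in namespace)  # False
```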
Python also has classes that can be used to encapsulate functions/data, but as we just mentioned, this is a weak form of encapsulation given the lack of access modifiers. Python's strongest encapsulation feature is inner functions, which can't be accessed from outside the function:
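```python
def outer_function(x):
    # inner_function only exists inside outer_function's body;
    # code outside outer_function cannot call it by name.
    def inner_function(y):
        return y + 2

    return inner_function(x) + 3


print(outer_function(5))  # prints 10
```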
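Inner functions also make closures possible, which encapsulate data rather than just logic. In this sketch (`make_counter` is a hypothetical name), `count` can only be changed through the function that `make_counter` returns, which acts as its sole entrypoint. Introspection via `counter.__closure__` can still read the value, so this is strong by Python's standards rather than airtight.

```python
def make_counter():
    count = 0  # lives in make_counter's scope; no outside name refers to it

    def increment():
        nonlocal count  # rebind the enclosing variable instead of creating a local one
        count += 1
        return count

    return increment  # the only entrypoint to `count`


counter = make_counter()
counter()
print(counter())  # 2
```

Each call to `make_counter` creates an independent `count`, much like each instance of a class gets its own attributes.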
Although Python doesn't have enforced access modifiers, it does have a convention where any name with a leading underscore (e.g. _my_private_var) is intended to be private. Furthermore, if an attribute defined in a class has two leading underscores and no more than one trailing underscore (e.g. __my_private_var), Python mangles its name, so outside the class it can only be accessed by its mangled name: _MyClass__my_private_var.
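Both conventions can be seen side by side in a short sketch that reuses the illustrative names above (`MyClass` and `reveal` are hypothetical):

```python
class MyClass:
    def __init__(self):
        self._my_private_var = "by convention"  # single underscore: a naming convention only
        self.__my_private_var = "mangled"       # double underscore: stored as _MyClass__my_private_var

    def reveal(self):
        # Inside the class body, the unmangled name still works.
        return self.__my_private_var


obj = MyClass()
print(obj._my_private_var)           # by convention
print(obj.reveal())                  # mangled
print(obj._MyClass__my_private_var)  # mangled

try:
    obj.__my_private_var  # outside the class, no mangling happens, so this lookup fails
except AttributeError:
    print("AttributeError raised")
```

Name mangling mainly prevents accidental clashes with attributes of the same name in subclasses; as the third `print` shows, it does not stop deliberate access.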
Java
Java uses packages to encapsulate a group of classes/interfaces in one or more files in a directory. Classes encapsulate a group of functions/data.
Java has a robust system of access modifiers (public, protected, package-private by default, and private). This provides strong encapsulation and a high level of modularity, which larger projects can build on with Dependency Injection frameworks like Dagger to manage the dependencies between different classes.
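Dagger itself is a Java library, but the idea it automates, constructor injection, can be sketched in Python to stay consistent with the earlier example. Every name here (`Clock`, `SystemClock`, `FixedClock`, `Greeter`) is hypothetical. `Greeter` depends only on the `Clock` interface and receives a concrete clock from outside, so its internals stay encapsulated and the clock can be swapped, for example in tests.

```python
import datetime
from typing import Protocol


class Clock(Protocol):
    """Anything with a now_hour() method can act as a Clock."""

    def now_hour(self) -> int: ...


class SystemClock:
    def now_hour(self) -> int:
        return datetime.datetime.now().hour


class FixedClock:
    """A stand-in clock with a fixed hour, e.g. for tests."""

    def __init__(self, hour: int) -> None:
        self._hour = hour

    def now_hour(self) -> int:
        return self._hour


class Greeter:
    def __init__(self, clock: Clock) -> None:
        # The dependency is passed in rather than constructed here,
        # so Greeter never needs to know which Clock implementation it has.
        self._clock = clock

    def greet(self) -> str:
        return "Good morning" if self._clock.now_hour() < 12 else "Good afternoon"


# Wiring happens outside the class, by hand in this sketch.
production_greeter = Greeter(SystemClock())
print(Greeter(FixedClock(9)).greet())   # Good morning
print(Greeter(FixedClock(15)).greet())  # Good afternoon
```

In a Java project, a framework like Dagger generates the wiring that the last few lines do by hand, which is what makes it valuable once the dependency graph grows large.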
PHP
PHP uses namespaces to encapsulate a group of functions/classes (in one or many files and in one or many directories). Classes encapsulate a group of functions/data.
Similar to Java, PHP also has a robust system of access modifiers. However, PHP is not as widespread as Java in enterprise systems, so its Dependency Injection ecosystem is not as prominent (some may argue otherwise).
Golang
Golang uses packages to encapsulate a group of files in one directory containing functions, types, and structs. Structs encapsulate data, and methods are attached to a struct type through receivers.
Golang encapsulates identifiers such as variables, types, struct fields, methods, and functions by leaving them unexported (lowercase first letter), which makes them accessible only within the same package.
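Go's export rule is mechanical enough to express in a few lines. This Python sketch (the `is_exported` helper is hypothetical) checks only the first-character condition from the Go specification: a name is exported if it starts with a Unicode upper-case letter (category `Lu`). The spec's other condition, about where the identifier is declared, is not modeled.

```python
import unicodedata


def is_exported(identifier: str) -> bool:
    # Only the first character matters: it must be in Unicode category "Lu".
    # A lowercase letter or a leading underscore leaves the name package-private.
    return unicodedata.category(identifier[0]) == "Lu"


print(is_exported("Println"))  # True
print(is_exported("println"))  # False
print(is_exported("_helper"))  # False
```

Note that, unlike Python's underscore convention, Go's compiler enforces this rule: referencing an unexported name from another package is a compile error.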
Conclusion
The term encapsulation has its nuances. Without applying this principle at all levels of a system, software would be a tangled mess of spaghetti. It's not only about protecting data, but also about hiding complexity and internal workings of a unit of software. It helps to create more maintainable and reliable software by enforcing boundaries and limiting the impact of changes.
Updated: 2024-03-05