3v-Hosting Blog
Imagine a situation: your website or project has become painfully slow, pages take ages to load or time out altogether, users complain, and conversion rates drop. If this sounds familiar, your first instinct is usually to dive into the code: optimize SQL queries, enable and reconfigure caching, and so on. And, oddly enough, these actions often do help. However, there is a scenario that even experienced administrators regularly overlook: the problem lies not in the application, but in the resources of the server on which the project is deployed.
And the most frustrating part is that infrastructure degradation almost never looks like “the server just crashed”. It always takes the form of a gradual deterioration in system performance: today the site runs a little slower than yesterday, tomorrow spikes appear on the load graphs in monitoring, and then user complaints about the service start coming in.
Let’s try to figure out below how to recognize that it’s time to upgrade your hardware and how not to confuse this with issues in the code or in your project’s logic.
Most technical analyses start the same way: we begin by looking for a bottleneck in the code. And this is absolutely the right strategy, since in 70-80% of cases, the problem really is there. For example, poor SQL queries, missing indexes in tables, unnecessary calculations, poor application architecture, and much more - all of these are classic issues. But there comes a point when digging deeper into the code stops yielding results and turns into running in circles.
This moment usually comes when you’ve already cleaned up the application, but the system’s behavior remains strange. One moment everything is running fast and smoothly, and the next, delays suddenly appear - with no obvious cause, regardless of the time of day or the visible load on the server. At times like these, it’s important to stop, take a deep breath, step back, and look not at the code, but at the environment in which it runs.
So, if you’ve already:

- optimized your SQL queries and added the missing indexes,
- enabled and tuned caching,
- profiled the application and cleaned up the obvious hot spots in the code,

but the site still continues to “do its own thing” - then this is a strong signal that the problem may lie below the application level.
It’s important to always keep in mind that even perfectly written code cannot run fast if it simply lacks resources.
It’s like trying to push a sports car to its limits on a bumpy road - the potential is there, but the conditions don’t allow it to be fully realized.
Unfortunately, hardware behaves very quietly when it fails; that is, it doesn’t produce loud, clear errors like stack traces or 500 responses. Instead, some indirect symptoms begin to appear, such as:

- response times that drift upward week after week,
- intermittent latency spikes with no matching code changes or traffic peaks,
- growing swap usage or rising I/O wait in monitoring.
It is precisely because of this lack of obviousness that many teams fall into the trap of continuing to optimize code that is already working fine, instead of addressing the actual bottleneck, such as a shortage of CPU, RAM, or disk space.
Now that we’ve discussed the core of the problem, it’s time to talk about each component in detail.
When it comes to performance, the CPU is usually the first thing people look at. And that makes sense, since the CPU is what executes all your code - from PHP scripts and Python applications to web server request processing and database operations. But an important nuance is that the problem almost never lies simply in an abstractly high CPU load. It’s much more important to understand exactly how it’s being loaded, what the predominant type of load is.
Many people rely on a simple metric here: if the CPU is at 80-100% utilization, that’s bad news and it’s time to rent a new server. But in practice, this is only part of the picture, and it’s much more accurate to assess the load behavior - whether it’s steady, if there are spikes, how quickly the CPU releases completed tasks, and so on.
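To assess load behavior over time rather than from a single glance, the same data `top` shows interactively can be captured non-interactively and appended to a log. A minimal sketch, assuming the standard procps `top`:

```shell
# Batch-mode snapshot of the CPU summary lines: suitable for running from
# cron every few minutes so that spikes can be traced back to a point in time.
top -bn1 | head -n 5
```

Comparing a few hours of such snapshots quickly shows whether the load is steady or dominated by short spikes.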
If the CPU load consistently stays at 80-100%, that is undoubtedly a warning sign. But even more telling is a scenario where short-lived but frequent spikes appear on the load graph during seemingly routine user actions, such as opening a directory or a product page.
Typical signs that the CPU is becoming a bottleneck most often include:

- a load average that consistently sits above the number of cores,
- sustained 80-100% CPU utilization even during routine traffic,
- short but frequent spikes on ordinary user actions, such as opening a catalog or product page.
Why does this happen? Because every user action involves a set of operations, such as code execution, database queries, template processing, or sometimes cryptography. If too many of these operations happen simultaneously, the processor simply can’t keep up.
Classic examples include sites built on CMS platforms like WordPress or OpenCart. It seems like everything is optimized: minimal plugins, caching enabled, queries are fine. But for some reason, with 20-30 concurrent users, the server starts to “choke”. The reason is that each page still requires a significant amount of CPU time, and the total demand exceeds the processor’s capacity.
It’s worth mentioning a separate scenario where negligent and/or greedy hosting providers sell you an oversold VPS: formally, you’re allocated a vCPU, but physically, that processor is shared among a large number of other clients. As a result, everything works fine during quiet times, but as soon as the load starts (whether from you or your neighbors), the available CPU time drops sharply, and at such moments you don’t get a full-fledged processor, but only its “leftovers,” which immediately begins to affect your website’s response time.
In practice, it looks like this: you haven’t changed anything in the code for a long time, traffic is within normal limits, but the site suddenly starts running slower, delays appear, and the CPU metrics behave erratically. If you don’t take virtualization into account, it’s very easy to mistake this for an application issue, although, as we’ve already established, the cause here is purely infrastructural.
The situation with RAM is much less obvious than with the CPU. When there isn’t enough CPU power, it’s usually immediately apparent. But a lack of RAM can masquerade for a long time as all sorts of strange slowdowns that are hard to explain. The server doesn’t seem to crash, errors aren’t popping up, but the site starts behaving sluggishly and erratically.
The reason is that when memory is insufficient, the system doesn’t stop. Instead, it activates a mechanism called swapping. This means that some data is moved from RAM to disk to free up memory for new tasks. Formally, everything continues to work, but in reality, performance drops significantly.
Why is this so critical? Because access to RAM is measured in nanoseconds, while access to the disk is measured in milliseconds (depending on the type of disk). The difference, as you can see, is thousands of times. And as soon as the system starts actively using swap, every request turns into a chain of slow read/write operations (I/O).
Based on the above, typical signs of insufficient RAM look like this:

- little or no available memory shown by free -m or htop,
- swap usage that appears and grows over time,
- periodic freezes while the system shuffles data between RAM and disk,
- in severe cases, processes being killed outright by the system.
In practice, this feels very unpleasant when the site seems to be up and running but periodically freezes up. The user clicks a button - and waits. Then everything works quickly again, and a minute later the situation repeats itself.
A good analogy is your desk. If you have enough space on your desk, everything is within reach, but if the desk is cluttered, you start putting things in the next room. And now every action you take requires extra time, because you have to get up, walk over, grab an item, and come back. That’s exactly what swap does.
Components that actively work with data suffer particularly badly from a lack of RAM:

- databases,
- in-memory caches,
- the application’s own worker processes.
If there isn’t enough memory, the database starts accessing the disk more frequently, the cache loses its efficiency, and application processes begin competing for resources. As a result, everything slows down at once.
Here’s an important interim conclusion: even a perfectly optimized application won’t be able to run fast if it simply has nowhere to store data. RAM isn’t just another server resource - it’s, without exaggeration, the foundation of all performance.
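When the shortage becomes acute, the Linux kernel starts killing processes outright (the OOM killer), and often the only trace is in the kernel log. A quick check, assuming the ring buffer is readable from your shell:

```shell
# Look for OOM-killer activity; on systemd machines `journalctl -k` is an
# alternative source. Prints a reassuring line when nothing matches.
dmesg 2>/dev/null | grep -i -E 'out of memory|oom-killer' \
  || echo "no OOM events found in the kernel ring buffer"
```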
The disk is often the last thing people think about. It usually seems that the CPU and RAM are the most important, while the disk is just where files are stored. But in practice, disk issues are one of the most common causes of hidden bottlenecks, especially in projects with large databases.
It’s important to understand that a disk isn’t just a place to store files: every drive has its own access speed, and the difference in read/write performance between a standard HDD, a SATA SSD, and an NVMe drive can span orders of magnitude. Most importantly, if the disk can’t keep up with the stream of operations, the entire system starts to wait. There are exceptions, such as applications designed to run entirely in RAM, but in most projects, sooner or later, data has to be written to disk.
When an application makes a database query or writes a log, it cannot continue working until the disk responds. And if there are many such operations, a queue begins to build up. At some point, it turns out that the CPU may be free, RAM is in good shape, but processes are simply standing by and waiting for I/O (Input/Output).
Typical signs that you’ve hit a disk bottleneck include:

- high I/O wait while the CPU itself is mostly idle,
- high await values in iostat,
- requests that stall on database writes or logging,
- sudden, seemingly random delays under an unchanged load.
This is easy to confuse with issues in the code or database, but the key detail here is absolute unpredictability. Sometimes everything runs quickly, and sometimes sudden delays occur, even though the load seems unchanged.
The following are particularly sensitive to disk speed:

- databases, especially under write-heavy workloads,
- applications with intensive logging,
- backup and other bulk-data jobs.
Unfortunately, cheap SATA SSDs can often hit I/O bottlenecks just like old HDDs, especially under heavy load. That’s why a real performance boost usually comes from switching to NVMe drives, where latency and throughput are orders of magnitude better. That’s exactly why we at 3v-Hosting switched to NVMe drives.
So, remember: if you have low CPU load and plenty of free RAM, but the site is still slow, there’s a very high chance that the bottleneck is the drive.
But there are cases where the situation looks extremely strange. The code hasn’t changed in a long time, the server load is the same, CPU and RAM metrics are perfectly fine, the disk is barely used, but the site is either flying or suddenly starts to slow down terribly. At times like this, the problem might not be with your project at all, but with where it’s hosted.
If you use a VPS or cloud virtualization, it’s important to understand that you’re sharing the resources of a single physical server with other clients: the CPU, disk, and network are all shared. And if one of your “neighbors” starts heavily loading the system, it can affect you too.
This effect is called the “noisy neighbor” effect. It is particularly noticeable on budget VPS plans, where resources are not strictly guaranteed but are allocated dynamically. Hosting providers using virtualization types with weak isolation also suffer greatly from this. You can read more about virtualization types in this article.
In practice, this can manifest as follows:

- performance fluctuates by time of day with no change in your own load,
- disk or network latency jumps for minutes at a time and then recovers on its own,
- your own metrics (CPU, RAM, disk) look normal while responses slow down.
Yes, as you can see, the symptoms are similar to those we described for other components. But this time, the main challenge is that you may not see any obvious causes within your own system. All your processes look normal, but the actual delays occur at the hypervisor level - that is, where you no longer have control over the situation.
This is particularly common when other clients place a load on the disk or CPU. For example, a neighbor runs intensive data processing or creates a backup, and at that moment you suddenly experience increased latency, even though nothing has changed on your end.
This is where the trick lies, because such issues are easily mistaken for a so-called “floating bug” in the code or unstable database performance. But if the behavior is erratic and doesn’t reproduce consistently, it’s almost always worth looking at the infrastructure.
That is precisely why, for projects with consistently high traffic or general stability requirements, it is important to consider not only the VPS specifications but also the provider’s policies - specifically, whether guaranteed resources are allocated, what type of virtualization is used, what the overselling level is, and what disk subsystem is used.
At 3v-Hosting, we long ago abandoned overselling in favor of guaranteeing the stability of the infrastructure provided to our clients, as we understand that the main value of hosting lies not in price or specific server parameters, but in the stability of its operation over the long term. This is our value; this is our philosophy.
After analyzing the CPU, RAM, disk, and infrastructure, diagnostics can be boiled down to one simple principle: in most cases, the difference between a “code issue” and a “resource issue” is evident from the system’s behavior over time.
Code, as a rule, behaves predictably, and if there is a bottleneck in it, it will manifest identically every time under the same conditions (the principle of reproducibility). Hardware, on the other hand, exhibits fluctuating, unstable symptoms that intensify under load and often depend on external factors.
So, let’s put together a short summary that can be relied upon in real-world work:
If the problem is in the code:

- the slowdown is reproducible: the same conditions produce the same delay,
- it is tied to specific pages, queries, or features,
- it typically appears after a deployment or code change.
If the problem is with resources (CPU, RAM, disk, VPS):

- symptoms float: the same action is sometimes fast and sometimes slow,
- delays intensify under load or at certain times of day,
- nothing in the code has changed, yet the behavior has.
The main practical rule of thumb is simple: if you haven’t changed anything but performance is all over the place, then it’s almost certainly not the code (thank you, Captain Obvious).
Theory is all well and good, but in real-world work, it all comes down to logging into the server and figuring out exactly what’s happening right now. And here it’s important not just to run a couple of commands, but to be able to correlate their metrics.
Never look at just one metric. You need to get the big picture of what’s happening in the system, since symptoms almost always manifest in several places at once.
Below we’ve provided a concise set of standard checks that lets you understand what’s happening and pinpoint the problem in just a couple of minutes.
Start with the basics and see what the CPU is doing and if there’s any obvious bottleneck.
top
Or, more conveniently:
htop
What to look for:
- %CPU per process - which ones are actually taxing the system;
- load average - the overall task queue;
- %us / %sy - user versus system (kernel) load.
Interpreting the results:

- high %us means the load comes from your own applications;
- high %sy points at the kernel, often I/O or networking;
- a load average persistently above the number of cores means tasks are queueing for the CPU.
Now let’s check memory. Here, it’s not just the amount of free RAM that matters, but also the presence of swap.
free -m
Additionally, you can check in htop (it’s more intuitive there).
What to look for:
- available - how much memory is actually available;
- swap - whether swap is being used.
Interpreting the results:

- a shrinking available value means memory pressure is building;
- any sustained swap usage means RAM has already run out, even if the system still looks alive.
If the CPU isn’t busy but the site is still slow, it’s almost always worth checking the disk.
iostat -x 1
What to look for:
- %util - disk load;
- await - average operation wait time;
- %iowait (in top) - how much CPU time is spent waiting for the disk.
Interpreting the results:

- high await means the disk is responding slowly;
- %util close to 100% means the disk has hit its limit.
To get a complete view of the system, it’s helpful to use this tool:
vmstat 1
What to look for:
- r - the run queue (processes waiting for CPU);
- si / so - swap-in / swap-out activity;
- wa - time spent waiting for the disk.
Interpreting the results:

- high wa means a disk problem;
- non-zero si / so means the system is swapping;
- a consistently high r means the CPU is overwhelmed.
Sometimes a single command is enough to understand what’s happening:
uptime
It will display a parameter such as load average.
Interpreting the results:

- compare the load average with the number of CPU cores (nproc);
- values consistently above the core count mean the system cannot keep up;
- compare the 1-, 5-, and 15-minute values to see whether the load is rising or falling.
Work with these commands, practice drawing conclusions based on their results, so that at a critical moment you can quickly and confidently identify the root of the problem and resolve it promptly.
There are moments in life that many people try to put off - for example, a trip to the dentist or switching from summer to winter tires on a car. The same thing happens when it comes to upgrading a server - when it becomes clear that the problem is no longer in the code, but in the resources. Usually, before this happens, everyone goes through the classic process of optimizing queries, cache configuration, and rewriting parts of the logic. And that’s the right approach! But at some point, it becomes clear that each subsequent improvement yields less and less effect.
This is the main signal: it means you’ve hit the ceiling of your current infrastructure.
At this point, it’s crucial to set priorities correctly, because you can keep looking for and fixing minor issues within the application itself, but if the baseline level of resources doesn’t match the load, you’ll be fighting the symptoms rather than the cause.
In this sense, hardware is the foundation of the entire system. And if this foundation is weak, no amount of optimization on top of it can make the system truly fast and stable. Moreover, solving this issue is becoming much easier, given the constantly falling prices for servers.
Why does the site slow down only at certain times of day?

Most often, this is due to increased load - either from your end (more users) or at the host level (if it’s a VPS with other tenants). The code obviously doesn’t change depending on the time of day, but available resources certainly can.
Will enabling caching solve the problem?

No. Caching reduces load but does not increase server resources. If the bottleneck is in the CPU, RAM, or disk, caching will only temporarily alleviate the problem.
How can I tell whether the server has enough RAM?

Just run free -m. If swap is being used, there’s already not enough memory, even if the system still seems to be functioning normally.
What is the more common bottleneck - the CPU or the disk?

It depends on the project. But in practice, for database-driven sites, the disk is more often the bottleneck than the CPU.
Why is the site slow even when the CPU is idle?

Most likely, processes are waiting for the disk (I/O) or hitting the limits of RAM/swap. The CPU may be idle while the system is waiting.
Can the hosting provider itself be the cause?

Yes. On cheap VPS servers, the “noisy neighbors” effect often occurs. In this case, performance fluctuates without any changes to the code.
When a website starts to slow down, the first instinct is to dive into the code. This is a good habit, but it’s important to pause and ask yourself a broader question: is the problem really in the application, or in the resources it’s running on?
In practice, it very often turns out that the bottleneck isn’t a specific algorithm or query, but a simple lack of CPU, RAM, or disk speed. Moreover, such issues manifest in subtle ways, making them difficult to pinpoint.
The most common mistake is to keep optimizing something that’s already working fine. You can spend days or weeks squeezing out a few percentage points of performance, even though the real solution lies in increasing resources or changing the infrastructure.
Good infrastructure doesn’t require attention. It doesn’t create unnecessary noise, doesn’t cause random performance drops, and doesn’t limit the project’s growth. It simply allows the system to work as intended.