Kill Switches: Interrupting Long-Running Processes

Feb 24, 2023

A kill switch—sometimes known as a “Dead Man’s Switch”—is a type of safety mechanism common in machinery that interrupts a process in the event of an emergency.

For example, if a person operating a drill stops pressing the trigger for any reason, the drill stops. In this case, the drill trigger serves as a kill switch; if the operator becomes incapacitated, the drill will instantly stop (or be “killed”).

These types of safeguards aren’t limited to physical applications. We can use the concept of kill switches in software to keep our applications running smoothly even when something goes wrong.

Let’s take a look at ways we can use this kill switch concept in our Laravel apps.

Monitoring Long-Running Jobs

When an app has a long-running process, we need a way to set a timeout in case the process runs too long. The timeout lets us clean up resources and lets users try again.

I adapted this example from the Laravel Cloud codebase. We’re essentially creating a “deployment”; imagine this as your Laravel Forge server running your deployment scripts to push the latest version of your application.

Here’s a breakdown of the process:

Every time a new deployment is started, it kicks off a build process that will run scripts on the server in the background.
The server also dispatches a delayed job that will run 40 minutes in the future and terminate this process if it’s still running:

class Deployment extends Model
{
    // ...
 
    public function build()
    {
        BuildDeployment::dispatch($this);
 
        // This is a Kill Switch job!
        TimeoutDeploymentIfStillRunning::dispatch($this)->delay(
            now()->addMinutes(40),
        ); 
    }
}

The TimeoutDeploymentIfStillRunning job is our kill switch. Be aware that some queue providers limit how far in the future we can dispatch a delayed job. For example, Amazon SQS only allows a delay of up to 15 minutes, so the 40-minute delay above would need a different queue driver or a workaround there.
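One possible workaround for a provider limit like the 15-minute SQS cap is to split the total delay into shorter hops and have the kill switch job re-dispatch itself until the deadline arrives. The plain-PHP sketch below only shows the arithmetic; the `splitDelayIntoHops()` helper and its 900-second default are illustrative assumptions, not part of Laravel or of the original codebase.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: split a long delay into hops that each fit
 * under a queue provider's maximum delay (900 seconds for SQS).
 *
 * @return int[] Delay, in seconds, for each successive hop.
 */
function splitDelayIntoHops(int $totalSeconds, int $maxHopSeconds = 900): array
{
    if ($totalSeconds < 0 || $maxHopSeconds <= 0) {
        throw new InvalidArgumentException('The delay must be non-negative and the cap positive.');
    }

    $hops = [];

    while ($totalSeconds > 0) {
        $hop = min($totalSeconds, $maxHopSeconds);
        $hops[] = $hop;
        $totalSeconds -= $hop;
    }

    return $hops;
}

// A 40-minute timeout becomes three hops: 15 + 15 + 10 minutes.
echo implode(', ', splitDelayIntoHops(40 * 60)), PHP_EOL; // 900, 900, 600
```

In a real job, each hop would re-dispatch the job with the next delay, and only the final hop would perform the timeout check.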

Here’s an example of what the TimeoutDeploymentIfStillRunning job might look like:

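use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class TimeoutDeploymentIfStillRunning implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function __construct(public Deployment $deployment)
    {
        //
    }

    public function handle()
    {
        // The deployment already ended on its own, so there is nothing to kill.
        if ($this->deployment->hasEnded()) {
            return;
        }

        $this->deployment->markAsTimedOut();
    }
}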
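The hasEnded() and markAsTimedOut() methods aren't shown in this post. As a rough idea of what they could look like, here is a minimal plain-PHP sketch that stands in for the Eloquent model. The status names and the `timeoutIfStillRunning()` function are assumptions made for illustration.

```php
<?php

declare(strict_types=1);

// Hypothetical statuses; the original post does not list them.
enum DeploymentStatus: string
{
    case Running = 'running';
    case Finished = 'finished';
    case Failed = 'failed';
    case TimedOut = 'timed_out';
}

// Plain-PHP stand-in for the Eloquent model, for illustration only.
final class Deployment
{
    public function __construct(public DeploymentStatus $status = DeploymentStatus::Running)
    {
    }

    public function hasEnded(): bool
    {
        return $this->status !== DeploymentStatus::Running;
    }

    public function markAsTimedOut(): void
    {
        $this->status = DeploymentStatus::TimedOut;
    }
}

// Mirrors the job's handle() method: do nothing if the deployment already ended.
function timeoutIfStillRunning(Deployment $deployment): void
{
    if ($deployment->hasEnded()) {
        return;
    }

    $deployment->markAsTimedOut();
}

$finished = new Deployment(DeploymentStatus::Finished);
$stuck = new Deployment();

timeoutIfStillRunning($finished);
timeoutIfStillRunning($stuck);

echo $finished->status->value, PHP_EOL; // finished
echo $stuck->status->value, PHP_EOL;    // timed_out
```

Because of the guard, the kill switch is harmless when it fires late or more than once: it only acts on deployments that are still running.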

This example shows how a kill switch can watch over a long-running deployment job.

Monitoring Scheduled Tasks

Next, let’s see how we can use external kill switch services to help us monitor scheduled tasks.

Web apps often rely on a Scheduler to run time-based tasks. Most servers use Cron, and Laravel has a built-in Task Scheduling component that simplifies this setup.

However, setting up the Scheduler in a server’s crontab is only part of the story. How can we ensure our scheduled tasks actually run on their predefined schedule? We need something that will sound an alarm whenever it doesn’t hear back from our Scheduler. We need a kill switch mechanism!

Laravel Envoyer has this feature as a service called Heartbeats. With Envoyer, we can set up a Heartbeat for the entire Scheduler at the crontab using curl. If our crontab dies for any reason or the server goes down, Envoyer won’t receive the Heartbeat and will notify us that something’s wrong:

* * * * * forge php artisan schedule:run && curl http://beats.envoyer.io/heartbeat-id

Note that Envoyer’s minimum interval is 10 minutes, so if the server is down, we’ll get notified after 10 minutes. If, for instance, the Scheduler is supposed to run every minute, we would be notified after the Scheduler should have already run ten times.
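To make the mechanism concrete, here is a minimal plain-PHP sketch of the receiving side of a heartbeat. It is not Envoyer's implementation; the `HeartbeatMonitor` class and its methods are hypothetical. The monitor only raises the alarm once the configured interval has passed without a ping.

```php
<?php

declare(strict_types=1);

// Hypothetical monitor: flags a heartbeat as overdue when no ping
// arrived within the allowed interval.
final class HeartbeatMonitor
{
    private int $lastPingAt;

    public function __construct(private int $intervalSeconds, int $startedAt)
    {
        $this->lastPingAt = $startedAt;
    }

    public function ping(int $now): void
    {
        $this->lastPingAt = $now;
    }

    public function isOverdue(int $now): bool
    {
        return ($now - $this->lastPingAt) > $this->intervalSeconds;
    }
}

// Ten-minute interval; timestamps are plain seconds for readability.
$monitor = new HeartbeatMonitor(intervalSeconds: 600, startedAt: 0);

$monitor->ping(60);  // The Scheduler ran at minute 1.
$monitor->ping(120); // The Scheduler ran at minute 2, then the server died.

var_dump($monitor->isOverdue(600)); // bool(false): only 8 minutes of silence
var_dump($monitor->isOverdue(721)); // bool(true): more than 10 minutes of silence
```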

We can also have one Heartbeat for each scheduled task. The Laravel Scheduler has a built-in thenPing() method we can use to ping our Heartbeat whenever it fires that specific task:

$schedule->command('checks:trigger')
    ->everyMinute()
    ->thenPing('http://beats.envoyer.io/heartbeat-id');

With Envoyer’s Heartbeats, we have alarms at the infrastructure level to notify us when something goes wrong with our Scheduler.

Monitoring Queue Health

Background Jobs are another common piece of infrastructure in modern web apps. Laravel has a Queue component out of the box to handle time-intensive processes that are too long for standard web requests.

In my colleague Jamison Valenta’s great post “Are Your Queue Workers ... Working?”, Jamison walks us through a queued job called QueueHeartbeat. The Scheduler dispatches QueueHeartbeat, and inside the job, an Http::get() call pings the Heartbeat URL.

If our queue workers are not running, the app won’t process that queued job, so Envoyer won’t hear from it and will notify us. In this case, QueueHeartbeat is a kill switch mechanism. I recommend checking out Jamison’s post to learn more about this approach to queued jobs.
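As a rough sketch of the idea (not Jamison's actual code), a QueueHeartbeat job could look something like this inside a Laravel app. The config key `services.envoyer.queue_heartbeat_url` is an assumption; use whatever location holds your Heartbeat URL.

```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Support\Facades\Http;

class QueueHeartbeat implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable;

    public function handle(): void
    {
        // Assumed config key holding the Envoyer Heartbeat URL.
        // This line only runs if a queue worker picks the job up.
        Http::get(config('services.envoyer.queue_heartbeat_url'));
    }
}

// In the schedule definition, dispatch the job on an interval that
// matches the Heartbeat's configured frequency, for example:
// $schedule->job(new QueueHeartbeat)->everyTenMinutes();
```

The ping lives inside the job rather than in the Scheduler so that it proves the workers are processing jobs, not just that the Scheduler fired.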

Increasing Delivery Request Lookup Area

So far, the examples we’ve discussed have involved deactivating a process or sending a notification when something goes wrong. However, kill switch mechanisms can also activate processes.

The following example comes from a project I worked on a while ago—a delivery app that connects customers with delivery motorcyclists.

The app first sends a delivery request notification to all delivery bikers near the customer’s location.
If no nearby biker claims the delivery request within the first 30 seconds, another broadcast goes out for all delivery bikers in the broader region.
The app continues trying to find a biker until the delivery request times out or is claimed.

Here’s how we could implement this:

class DeliveryRequest extends Model
{
    public function biker(): BelongsTo
    {
        return $this->belongsTo(Biker::class);
    }
 
    public function startBikerMatchFinder()
    {
        $this->markAsFindingNearbyBikers();
 
        NotifyBikers::dispatch($this);
 
        // This is a Kill Switch.
        IncreaseAreaOfBikerMatchIfNoMatch::dispatch($this)->delay(
            now()->addSeconds(30),
        );
 
        // This is also a Kill Switch.
        TimeoutDeliveryRequestBikerFinderIfNoMatch::dispatch($this)->delay(
            now()->addMinutes(3),
        );
    }
}

Note that the TimeoutDeliveryRequestBikerFinderIfNoMatch job is similar to the TimeoutDeploymentIfStillRunning job in our previous deployment example.

Before we dispatch the NotifyBikers job, we mark the DeliveryRequest status as finding_nearby_bikers.
The job uses that status when it’s querying bikers to notify.
It then releases itself back onto the queue to run again in 10-second intervals until the DeliveryRequest has either been claimed or timed out.

IncreaseAreaOfBikerMatchIfNoMatch updates the status of the DeliveryRequest to finding_all_bikers only if matching is still in progress, that is, no biker has claimed the request and it hasn’t timed out:

class IncreaseAreaOfBikerMatchIfNoMatch implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function __construct(public DeliveryRequest $deliveryRequest)
    {
        //
    }
 
    public function handle()
    {
        if ($this->deliveryRequest->hasEndedMatching()) {
            return;
        }
 
        $this->deliveryRequest->markAsFindingAllBikers();
    }
}

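class NotifyBikers implements ShouldQueue
{
    // InteractsWithQueue provides release(); SerializesModels reloads the
    // DeliveryRequest on every run, so the job sees the latest status.
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;
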
    public function __construct(public DeliveryRequest $deliveryRequest)
    {
        //
    }
 
    public function handle()
    {
        if ($this->deliveryRequest->hasEndedMatching()) {
            return;
        }
 
        Biker::query()
            ->available()
            ->withinRegion($this->deliveryRequest->region())
            ->chunkById(100, function ($bikers) {
                Notification::send($bikers, new NewDeliveryRequest($this->deliveryRequest));
            });
 
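        // Run again in 10 seconds. Each release counts as an attempt, so the
        // job's allowed tries must cover the whole matching window.
        $this->release(10);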
    }
}

class DeliveryRequest extends Model
{
    protected $casts = [
        'status' => DeliveryRequestStatus::class,
    ];
 
    public function region()
    {
        return $this->status->regionFor($this);
    }
}

enum DeliveryRequestStatus: string
{
    case FINDING_NEARBY_BIKERS = 'finding_nearby_bikers';
    case FINDING_ALL_BIKERS = 'finding_all_bikers';
 
    public function regionFor(DeliveryRequest $deliveryRequest)
    {
        return match ($this) {
            self::FINDING_NEARBY_BIKERS => $deliveryRequest->coordinatesForNearbyBikers(),
            default => $deliveryRequest->coordinatesForAllBikers(),
        };
    }
}

The DeliveryRequest::region() method returns region coordinates based on the current status of the DeliveryRequest, either calling coordinatesForNearbyBikers() or coordinatesForAllBikers().
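To see how the three jobs interact over time, here is a small plain-PHP simulation that uses a fake clock instead of a queue. It is only a sketch: the `simulateMatching()` function is hypothetical, and it assumes that at the 30-second mark the status change lands before that second's notification (on a real queue the order could go either way).

```php
<?php

declare(strict_types=1);

/**
 * Simulates the matching timeline with a fake clock (no queue involved).
 *
 * @param int|null $claimedAt Second at which a biker claims the request, or null if nobody does.
 * @return string[] One log line per event, in order.
 */
function simulateMatching(?int $claimedAt): array
{
    $status = 'finding_nearby_bikers';
    $log = [];

    for ($second = 0; $second <= 180; $second++) {
        if ($claimedAt !== null && $second === $claimedAt) {
            $log[] = "{$second}s: claimed";
            return $log;
        }

        // Kill switch: give up after three minutes.
        if ($second === 180) {
            $log[] = "{$second}s: timed_out";
            return $log;
        }

        // Kill switch: widen the search after 30 seconds.
        if ($second === 30) {
            $status = 'finding_all_bikers';
        }

        // NotifyBikers runs, then releases itself for 10 seconds.
        if ($second % 10 === 0) {
            $log[] = "{$second}s: notify ({$status})";
        }
    }

    return $log;
}

// A biker claims the request 45 seconds in.
print_r(simulateMatching(claimedAt: 45));
```

With a claim at 45 seconds, the log shows three nearby notifications, two wider ones, and then the claim. With no claim at all, the notifications continue until the three-minute timeout ends the search.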

This delivery app example shows us how we can apply our kill switch concept to activate a wider search for bikers.

Conclusion

As we’ve seen in these examples, the kill switch mechanism may take many forms and shapes, but the idea is simple: Have a process that will activate or deactivate a routine whenever it doesn’t hear back from the application to keep things running smoothly.

Have you used these or any other forms of kill switches in your apps? Let us know on Twitter at @tightenco!