The Virtyx Blog

Osquery Snapshots Arrive in Virtyx

By Ben Burwell • October 22, 2018

We love Osquery here at Virtyx. It’s an open-source project created by Facebook that exposes thousands of machine parameters as a virtual SQL database that can be queried. For example, you might be interested in knowing what users can log into a system. With Osquery, you can quickly find out:

osquery> select uid, username, shell
    ...> from users
    ...> where shell != '/usr/bin/false';
| uid | username     | shell            |
| 0   | root         | /bin/sh          |
| 4   | _uucp        | /usr/sbin/uucico |
| 501 | ben          | /bin/zsh         |

Using a standard SQL interface brings several benefits. First, it’s a query language that many developers, sysadmins, and IT personnel are already familiar with. Second, it provides a way to present lots of different types of data in a consistent way. Finally, it’s an extremely powerful abstraction enabling people to ask detailed, ad-hoc questions about the infrastructure they’re responsible for keeping available, performant, and secure.

Osquery has a built-in mechanism to run queries periodically and report their results to a central endpoint for analysis. This is extremely helpful for keeping an eye on system parameters over time and sending alerts when things change unexpectedly, but it also exposes one of the challenges of monitoring more broadly: it can be hard to predict ahead of time what information will be useful in a diagnostic context. When an incident arises, even with great monitoring tools in place, we often find that there are metrics that would be helpful in our diagnosis that we didn’t anticipate and thus aren’t available.

To fill this need, we’ve developed a powerful feature on top of Osquery that enables something like time travel debugging for custom queries. Every hour, the Virtyx agent takes a “snapshot” of Osquery’s virtual database and uploads it to the Virtyx cloud. This means that when you write a custom query, you can understand how its results have changed over time without having to plan ahead or wait for data to be collected: you can retrospectively run custom queries on any snapshot.
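To make the idea concrete, here is a minimal sketch of running one query against several stored snapshots. It assumes, purely for illustration, that each snapshot is a SQLite database containing a `users` table. The `make_snapshot` and `query_over_time` helpers and the sample rows are hypothetical and say nothing about how Virtyx actually stores snapshots.

```python
import sqlite3

def make_snapshot(rows):
    """Build an in-memory stand-in for one hourly snapshot (hypothetical layout)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (uid INTEGER, username TEXT, shell TEXT)")
    db.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
    return db

def query_over_time(snapshots, sql):
    """Run the same SQL against every snapshot, keyed by its timestamp label."""
    return {label: db.execute(sql).fetchall() for label, db in snapshots.items()}

# Two made-up snapshots taken an hour apart; a "guest" user appears in the second.
snapshots = {
    "2018-10-22T09:00": make_snapshot([(0, "root", "/bin/sh"), (501, "ben", "/bin/zsh")]),
    "2018-10-22T10:00": make_snapshot([(0, "root", "/bin/sh"), (501, "ben", "/bin/zsh"),
                                       (502, "guest", "/bin/bash")]),
}

history = query_over_time(
    snapshots,
    "SELECT username FROM users WHERE shell != '/usr/bin/false' ORDER BY uid",
)
for label, rows in history.items():
    print(label, [name for (name,) in rows])
```

The point of the sketch is that the query is written after the data was captured, which is what makes the retrospective view possible.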

Starting today, when you use Osquery in Virtyx, you’ll automatically get the power of snapshots and with them, the peace of mind that you will be able to quickly get the answers you need to diagnose problems when they arise.

Tech Focus: Bi-Directional Agent

By Ben Burwell • October 19, 2018

To anyone who has spent time working with monitoring tools, an agent is not a new concept. The tried and tested monitoring pipeline typically involves a small piece of software that runs on all the devices you’re interested in monitoring and collects relevant system information. These metrics need to eventually end up in a central location to be useful, whether that’s through a push or a pull method (for example, StatsD pushes measurements to a server over UDP while Prometheus makes its latest metrics available for collection by exposing an HTTP endpoint).
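As a rough illustration of the push and pull styles mentioned above, the sketch below formats the same measurement both ways. The StatsD line format (`name:value|type`) and the one-sample-per-line text a Prometheus scrape reads are real conventions; the metric names, the helper functions, and the default host and port arguments are assumptions made for this example.

```python
import socket

def statsd_packet(name, value, kind="g"):
    """Format one StatsD datagram, e.g. a gauge ("g")."""
    return f"{name}:{value}|{kind}".encode("ascii")

def push_statsd(packet, host="127.0.0.1", port=8125):
    """Push: fire the datagram at the server over UDP (no reply is expected)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))

def prometheus_exposition(metrics):
    """Pull: render the latest values as text for an HTTP endpoint to serve."""
    return "".join(f"{name} {value}\n" for name, value in sorted(metrics.items()))

print(statsd_packet("disk.used_percent", 91.5))
print(prometheus_exposition({"disk_used_percent": 91.5}), end="")
```

In the push case the agent decides when data leaves the host; in the pull case the collector does.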

Legacy monitoring products use agents that only communicate metrics, relying on out-of-band mechanisms for updating the monitoring configuration. While there are tools that can simplify this process, the inability to dynamically reconfigure monitoring systems can result in lost time when it’s most critical.

When we developed the Virtyx agent, we knew it was important for the monitoring system to be able to communicate its needs back to the agent. With the growing complexity of modern software stacks, it’s not always possible to predict which specific metrics may be most helpful in diagnosing issues, and even if it were possible to make a list of every possibly relevant metric, collecting all of them at a high frequency puts undesirable strain on your compute and network resources. By enabling the monitoring system to dynamically indicate to the sensor network (agents) which metrics need to be collected, you can get the best of both worlds: detailed, relevant metrics at a moment’s notice without the overhead of continuous high-frequency collection.

With bi-directional communication, the monitoring system can move a step even further into actually resolving issues that arise. In addition to sending agents metrics to collect, the monitoring system can instruct agents to perform a wide range of other functions from restarting services to checking network status or reachability.

Within Virtyx, all of the tasks that can be performed are defined as agent plugins. Our agent is simply a means for the monitoring system to collect data and carry out tasks on remote systems while being agnostic about how those functions are carried out. Our fundamental abstraction is the Task, which defines which plugin to run, how often to run it (possibly just once), and what parameters the agent should run it with.
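A Task as described above could be modeled along these lines. This is only a sketch: the field names, the use of `None` to mean "run once", and the `due_tasks` helper are assumptions for illustration, not the agent's actual data model.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Task:
    plugin: str                          # which plugin to run
    interval_seconds: Optional[float]    # how often to run it; None means just once
    params: dict = field(default_factory=dict)  # parameters handed to the plugin
    last_run: Optional[float] = None     # timestamp of the previous run, if any

def due_tasks(tasks, now):
    """Return the tasks that should run at time `now`."""
    due = []
    for task in tasks:
        if task.last_run is None:
            due.append(task)             # never run yet, whether one-shot or periodic
        elif (task.interval_seconds is not None
              and now - task.last_run >= task.interval_seconds):
            due.append(task)             # periodic task whose interval has elapsed
    return due

tasks = [
    Task("disk", 60, {"path": "/"}, last_run=0),
    Task("restart-service", None, {"service": "nginx"}, last_run=0),
    Task("ping", 30, {"host": "example.com"}),
]
print([t.plugin for t in due_tasks(tasks, now=45)])
```

Keeping the plugin name and its parameters as plain data is what lets the agent stay agnostic about how each function is carried out.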

Plugins are simply executable binaries that are launched by the agent and thus can be written in many languages. They can receive configuration parameters through standard input, and any results printed to standard output are reported by the agent back to the monitoring system.
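Here is a minimal sketch of what such a plugin might look like in Python. The text above only says that parameters arrive on standard input and results go to standard output; the use of JSON for both, the `path` parameter, and the directory-counting behavior are assumptions chosen to keep the example small.

```python
import io
import json
import os

def run(stdin, stdout):
    """Read parameters from stdin, do the work, and print the result to stdout."""
    raw = stdin.read()
    params = json.loads(raw) if raw.strip() else {}
    path = params.get("path", ".")
    result = {"path": path, "entries": len(os.listdir(path))}
    json.dump(result, stdout)
    stdout.write("\n")

# Demonstration with in-memory streams. A real plugin executable would
# instead call run(sys.stdin, sys.stdout) when launched by the agent.
out = io.StringIO()
run(io.StringIO('{"path": "."}'), out)
print(out.getvalue(), end="")
```

Because the contract is just stdin and stdout, the same plugin could equally be written in Go, Node, or a shell script.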

The agent checks for new tasks when it sends its heartbeat every 15 seconds. If new tasks have been assigned to the agent in the monitoring system, they are immediately started. The agent also establishes a websocket connection to the monitoring system which enables single-run tasks to be sent in response to actions taken in the monitoring system’s user interface.

By implementing two-way communication between the agent and the monitoring system, we provide our customers not just with monitoring tools, but with the ability to truly diagnose and resolve issues when they arise more quickly than would otherwise be possible.

You can start reaping the benefits of using Virtyx by signing up now — your first five agents are always free. If what we’re doing sounds interesting, come join our team! We’re hiring in Boston. You can also take a look through some more practical information about the Virtyx agent on our documentation site.

Dimensional Monitoring: Diagnose Faster and with Confidence

By Jim Maniscalco • September 27, 2018

When we started building Virtyx we knew that monitoring was ready for a change. Machine Learning and Artificial Intelligence would drive some of the change, but even beyond those disruptive technologies, monitoring has reached a sort of plateau.

I know this may invite some debate but the fact is you can look at a plethora of commercial and open source monitoring products and quickly conclude that each of them can:

  1. Collect data from almost any source quickly and easily.
  2. Graph metrics, with multiple series over granular time periods.
  3. Send alerts when triggered by those metrics.

Virtyx, and other good monitoring systems, give you these essential features. But moving forward, to push monitoring even further, why stop at alerting of issues? If the monitoring system is collecting data to alert about an issue, why doesn’t it go further and collect the information on how it can be diagnosed and fixed?

Enriched Monitoring with Dimensional Data

The metrics collected can create the beautiful graphs and visualizations to give good visibility into the infrastructure or application. However, at a certain point more time series data doesn’t provide the depth and context to better understand the root cause of a problem. Some things cannot be graphed.

Virtyx builds on years of experience from IT Ops and DevOps to go further, ingest more data, to provide the dimensional context on top of 2D graphs. These added dimensions help diagnose and fix issues even faster.

We call it state — the extra dimension of data that is hard to graph, but just as important when understanding issues and problems. This extra dimension allows us to diagnose faster and with greater accuracy. State is separate from metrics; state can be tracked over time, but is not really graph-able. You can use metrics to show how many applications are installed, how many users are currently logged in, and how many requests per second your web server is getting. State defines what applications are installed, what version they are, who is logged in, and what the configuration is of your web server.

Virtyx automatically collects state through our Agent upon installation, with zero configuration. Behind the scenes, we’re building on top of Facebook’s powerful Osquery open-source tool. By building it into our Agent, we’re removing a lot of the complexity of installing, updating, and maintaining it over your hosts. By building upon Osquery, we’re leveraging its great querying ability to allow you to query every device in your infrastructure and ask it questions that simply can’t be captured by collecting metrics.

The metrics now collected by Virtyx for your two-dimensional graphs are enriched with our state information. For example, your disk usage has increased dramatically, but why? A quick look at the state change shows that Adobe Photoshop was installed on that desktop, explaining what has occurred. On top of being able to query the host for the current state, Virtyx captures the state over time, allowing you to see how it’s changed historically. This provides another dimension of data.
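The Photoshop example can be sketched as a comparison between two captures of state. The `diff_state` helper, the dictionary layout (application name mapped to version), and the version numbers are all made up for illustration.

```python
def diff_state(before, after):
    """Compare two {application: version} captures of installed software."""
    installed = {name: ver for name, ver in after.items() if name not in before}
    removed = {name: ver for name, ver in before.items() if name not in after}
    changed = {
        name: (before[name], after[name])
        for name in before.keys() & after.keys()
        if before[name] != after[name]
    }
    return {"installed": installed, "removed": removed, "changed": changed}

# Illustrative state captured before and after the disk usage spike.
before = {"Firefox": "62.0", "Slack": "3.3.2"}
after = {"Firefox": "62.0.3", "Slack": "3.3.2", "Adobe Photoshop": "19.1"}

change = diff_state(before, after)
print(change["installed"])
```

A metric can say that disk usage jumped; a diff like this one says what changed on the host around the same time.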

Virtyx empowers your IT Ops and DevOps to ask these questions and get instant answers, so they can diagnose with confidence.

Monitoring is ready for a paradigm shift. More metrics cannot handle the complexity and needs of fast moving businesses. The innovation of bundling state data with metrics allows for deeper analysis of problems so they can be diagnosed better over more complex and critical infrastructures.

Virtyx Feature Focus: Shell

By Ethan Mick • September 20, 2018

Virtyx offers unparalleled monitoring and data correlation to find root issues. Monitoring software usually only surfaces a problem, often not even the root problem, and it’s up to the ops team to find the root cause and then fix it.

Today, I want to talk about the Virtyx Shell, one of our most powerful tools to actually fix issues.

Virtyx Shell

The Virtyx Shell is a fully functional, powerful, cloud based terminal that gives command line access to your hosts with the same permissions the Agent runs as. We’ve worked hard to make the terminal feel as natural as a native client would. And we think we’ve succeeded:


(ZSH is the best shell!)

The shell captures all input from your keyboard and directly sends it to the agent. The Agent pipes the response back to your browser, allowing perfect communication. This allows you to do some amazing things in the browser, again, just like you would in your own shell!

Lots Going On

(tmux with Angular, top, and Vim)

And of course, like everything at Virtyx, it works great on Windows!


(Powershell, but without the blue)


Another benefit of using the Virtyx Shell is that your actions are recorded in a session. This session is saved on the console so teammates can see it, and scripts can be created from it. We’ll cover sessions in another blog post.

Use Cases

We’ve seen our customers do some amazing things with the Virtyx Shell and want to highlight a few of them.

Working with Firewalls

Whether you’re at work or in a cafe, a firewall can stop you from getting your work done or accessing your servers. Port 22 is often blocked on networks, which can make it challenging to SSH into your server. The Virtyx Shell allows you to easily bypass these restrictions. Because you access the shell through your web browser, the secure connection goes outbound on port 443, which is almost always open. This allows you to check in on your agents even if conventional ports are blocked.

Getting through NATs

Often a server or computer you want to access is behind network address translation (NAT), such as:

  • Accessing your home desktop from a coffee shop
  • Accessing a server that’s in a VPC

In these cases, getting access to it can be tricky. You might need to set up port forwarding on a router, or SSH through a bastion server. The Virtyx Shell simplifies this entire process by letting you connect directly through the cloud to the Agent.

Easy Access on the Go

You might need to access a host when you are away from your normal workstation. You could be on a public computer, or borrowing a friend’s laptop and need to get access. Without proper SSH keys this might be impossible in those situations. The Virtyx Shell lets you have a backup plan in case you need to get access in those cases. Log in to Virtyx and in one click you have the access you otherwise couldn’t get.

Some of our customers use Virtyx for this feature more than any other - the easy ability to access their hosts!


The Virtyx Shell is only possible because of our bi-directional agent communication. Instead of only sending information up to our platform, the agent listens for commands from our platform as well. One of the commands it accepts is to open up a TTY on the host and pipe the information up to the cloud.

Our Agent is written in Go, but there currently isn’t a good TTY/PTY package for Go. We looked at several, but didn’t find one that supported both Unix systems and Windows.

Virtyx ❤️ Windows. We work hard to have all features work cross platform.

However, Node happens to have a fantastic package written by Microsoft called node-pty, which works cross platform perfectly.

Our Agent’s plugin architecture allows us to run any binary plugins, not only Go libraries. Because of this, we were able to build the Node package into a binary, bundling the Node.js runtime and all. The resulting plugin can be executed by our Agent, which creates the pseudo terminal. The plugin opens up a secure TCP connection to our realtime servers, which connects the browser to the Agent.

We also could not have done this without the fantastic Xterm library.
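For readers who have not worked with pseudo terminals, the sketch below shows the basic move of running a command attached to a PTY and reading back what it prints. It uses Python's standard `pty` module, which is Unix-only, as a stand-in; it is not the node-pty based plugin described above, and the `run_in_pty` helper is hypothetical.

```python
import os
import pty
import subprocess

def run_in_pty(argv):
    """Run argv attached to a pseudo terminal and return everything it printed."""
    master, slave = pty.openpty()
    proc = subprocess.Popen(argv, stdin=slave, stdout=slave, stderr=slave,
                            close_fds=True)
    os.close(slave)  # the child keeps its own copy of the slave end
    chunks = []
    while True:
        try:
            data = os.read(master, 1024)
        except OSError:
            break    # Linux reports EIO once the child side is closed
        if not data:
            break    # other platforms signal end-of-file with an empty read
        chunks.append(data)
    proc.wait()
    os.close(master)
    return b"".join(chunks)

print(run_in_pty(["echo", "hello from a pty"]))
```

In a browser-based shell, the bytes read from the master side would be forwarded over the network to the terminal emulator instead of being collected into a list.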

Security and Future

The power of the shell is not something we take lightly. We take security seriously and want our customers to know they are secure when they connect through the Virtyx Cloud. Virtyx allows you to configure roles to control who can access the shell on various hosts.

Moving forward, we want to help our users do even more in the shell. We’re looking forward to adding contextual information for your current tasks, and suggestions to fix ongoing problems, all within Virtyx.

Virtyx - One of a million downloads of TimescaleDB

By Jim Maniscalco • September 14, 2018

Collecting large amounts of data to cast a monitoring net across servers, desktops and laptops requires a robust and scalable database architecture. The September 12th 1.0 release candidate announcement from TimescaleDB is heartily welcomed by Virtyx!

We have been using Timescale in the Virtyx Cloud for the last year and the performance, reliability, and scale have met every expectation of our engineers and the demanding analytic requirements of Virtyx customers.

For Virtyx, collecting atomic data from our cloud agents on fleets of machines requires a backend that can ingest, analyze and graph data on a massive scale. The ability to power Virtyx with a time series architecture that runs on top of PostgreSQL is an advantage that makes TimescaleDB the choice for next generation monitoring solutions.

Role-Based Security Arrives in Virtyx

By Ben Burwell • September 13, 2018

Our customers rely on Virtyx to provide them access to critical systems whenever (and from wherever) they need it. With the power that Virtyx provides, many of our customers have requested a way to limit access to sensitive parts of their infrastructure to a subset of their Virtyx users.

To fulfill this need, we’re excited to have just deployed a flexible mechanism for granting access to specific functionality through role-based access control. When you log into Virtyx and go to the Team page, you’ll notice that there is a new “Roles” tab. To get you started, we’ve created an Owner role that has access to the entire Virtyx system and added it to all of your existing users. If you want to keep using Virtyx the same way you have been with all users having access to your entire organization, you don’t need to do anything at all. If you’d like to learn about configuring more granular access, read on!

Roles in Virtyx can do two things. They can grant access to system-wide features, like the ability to create new scripts, and they can grant access to features on specific groups of agents, such as the ability to access a shell session. When you create a new role, you can configure which system-wide features and which agent-specific features you want the role to grant access to. Next, you’ll need to grant your role to users.

Permissions in Virtyx are strictly additive; if you give a user two roles and one of them grants access to your API servers and another role grants access to your fleet of desktops, then that user will be able to access both your API servers and your desktops.

There are two permissions that deserve a special mention: “manage all agents” and “manage users.” The default Owner role grants both of these permissions. When you add the “manage all agents” permission to a role, users who have the role will be granted full access to all agents in your organization, regardless of whether they’ve also been granted access to any specific groups. The “manage users” permission does pretty much what it sounds like; users who have been granted roles with this permission will be able to invite new users and manage access for existing users. It’s important to note that anyone who has the “manage users” permission is also able to grant themselves any other permission.
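The additive model and the “manage all agents” override can be sketched in a few lines. The `Role` class, the feature and group names, and the two check functions are assumptions made for this example rather than the Virtyx implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Role:
    name: str
    system_features: set = field(default_factory=set)   # e.g. {"create-scripts"}
    agent_features: dict = field(default_factory=dict)  # group name -> set of features
    manage_all_agents: bool = False

def can_use_system_feature(roles, feature):
    """Additive: any one role granting the feature is enough."""
    return any(feature in role.system_features for role in roles)

def can_use_agent_feature(roles, group, feature):
    """Granted by a matching group permission or by "manage all agents"."""
    return any(
        role.manage_all_agents or feature in role.agent_features.get(group, set())
        for role in roles
    )

api_ops = Role("api-ops", agent_features={"api-servers": {"shell"}})
helpdesk = Role("helpdesk", agent_features={"desktops": {"shell"}})
owner = Role("Owner", system_features={"create-scripts"}, manage_all_agents=True)

# A user holding both roles can reach both groups; no role takes access away.
print(can_use_agent_feature([api_ops, helpdesk], "desktops", "shell"))
```

Because nothing in the model subtracts access, reviewing what a user can do means taking the union of their roles.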

We’re excited to hear what you think about this new feature! You can send us an email, or find us on Twitter @virtyx_inc.

MSP: Opportunity or Risk?

By Jim Maniscalco • March 8, 2018

For our Managed Service Provider customers in particular, we are discovering that the market shift to the cloud is a pretty serious change to their business and support model. Rapid client adoption of SaaS, cost effectiveness and management of public cloud servers are all major changes from the traditional desktop and server support cost model that helped MSPs build their businesses. These new support requirements and the ability to successfully implement them are now becoming primary drivers that new and existing customers are using to decide who gets the next support contract in a new cloud-first world. MSPs need to be prepared with the skills and platforms to manage in a world of SaaS and cloud where they may not own, or even manage, a large portion of the cloud hardware and software. Yet one thing has not changed for MSPs – they are most definitely on the hook for the same SLA customer support levels. It is a time of opportunity, but also great risk for MSPs.

At Virtyx we dialed into that world, delivering an entirely new way for MSPs and enterprise IT to support the changing landscape. A simple, highly aware, cloud monitoring platform that actually integrates monitoring with a powerful query language to easily investigate and troubleshoot issues (think Google to troubleshoot desktops, servers, apps) and a full command line shell to fix what is broken. The continuous nature of the cloud requires monitoring to go to another level where the alarms and alerts become the AI context engine to understand, isolate, and automatically fix problems; yes, fix problems. MSPs are starting to think about how they go beyond a tools strategy as they enter a cloud-first market that is very different and comes with an entirely new set of operational and support challenges.

Two-Factor Authentication Arrives at Virtyx

By Ben Burwell • February 28, 2018

When it comes to critical infrastructure, security is essential. That’s why Virtyx now allows customers to set up two-factor authentication (2FA) for their accounts.

Two-factor authentication makes your account more secure by requiring you to present two independent means of verifying your identity when attempting to log in. In addition to a password, an authentication code is used which proves that you have access to a preconfigured device. Even if a malicious person were to learn the authentication code you used to log in, they would not be able to use the code to log in again in the future. They would need access to your token generating device to get a new code to complete the login.
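The post does not say which scheme generates the codes. Assuming a standard time-based one-time password (TOTP, RFC 6238, built on HOTP from RFC 4226), which is what most authenticator apps use, code generation can be sketched as follows. The shared secret shown is the RFC's published test key, not anything specific to Virtyx.

```python
import hashlib
import hmac
import struct
import time

def hotp(secret, counter, digits=6):
    """HMAC-based one-time password (RFC 4226) using SHA-1."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                       # dynamic truncation offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret, at=None, step=30, digits=6):
    """Time-based variant (RFC 6238): the counter is the current 30-second window."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)

secret = b"12345678901234567890"   # test key from the RFCs (assumption: SHA-1 mode)
print(totp(secret, at=59))         # code for the window containing t = 59 seconds
```

Because the code depends on the time window as well as the secret, a code observed during one login stops being valid shortly afterwards, and a server can additionally refuse to accept the same code twice.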

You can enable 2FA for your account now by logging in, going to Security, and following the instructions. More information about 2FA and our security measures is available in our help documentation.

If you don’t yet have a Virtyx account, sign up now for free and start improving the health of your infrastructure. Don’t forget to enable 2FA!

Detecting and Fixing High Disk Usage with Virtyx

By Ethan Mick • February 23, 2018

It’s a well-known fact that computers do strange things when they are very low on disk space, and even stranger things when they run completely out. For mission-critical infrastructure, it’s important to quickly detect and be able to fix issues which arise due to disk space starvation.

Even with increasing prevalence of self-healing infrastructure, there is the frequent need to monitor specific metrics (such as used disk space) on certain hosts and take action when problems arise.

In our own infrastructure, we have a Jenkins server that builds Maven artifacts from our CI jobs. After building many artifacts over some period of time, the Maven repository on the Jenkins server becomes very large, and our Jenkins server starts running low on disk space.

With a Virtyx agent deployed to the Jenkins server, we can easily keep an eye on the available disk space and get alerts when the free space becomes dangerously low. Furthermore, with the Virtyx shell, we can easily gain access to a shell session right from our dashboard and get in and remove old Maven artifacts to free up space.
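The check itself is simple. Below is a minimal sketch of a disk usage alert using Python's standard library; the `disk_alert` helper and the 90% default threshold are assumptions for illustration, not the agent's actual plugin.

```python
import shutil

def disk_alert(path="/", threshold_percent=90.0):
    """Report used space on the filesystem holding `path` and whether to alert."""
    usage = shutil.disk_usage(path)               # total, used, free in bytes
    used_percent = usage.used / usage.total * 100
    return {
        "path": path,
        "used_percent": round(used_percent, 1),
        "free_bytes": usage.free,
        "alert": used_percent >= threshold_percent,
    }

print(disk_alert("."))
```

Run periodically, the result gives both a value to graph and a flag to alert on.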

There are a few other culprits of full disks. Log files are not always correctly rotated, eventually resulting in slow or non-functional servers or desktops. Perhaps you’ve downloaded many npm packages and have an enormous node_modules directory. Or your disk is full of old Docker images that you need to get rid of.

Deploying the Virtyx agent to your critical infrastructure can help you detect, investigate, and resolve issues as they arise. Get started by signing up for a free Virtyx account now.

Cross Compiling Go in Docker

By Ethan Mick • January 11, 2018

At Virtyx, our Agent is written in Golang and runs on Linux, Windows, and macOS. We’re really happy with Go’s small footprint, cross compatibility, and standard library. While there are a few things we’d love to see (better error handling!), we’re excited to keep writing our client agent in Go.

From day one, we’ve been cross compiling the Agent to run on all three major OS’s, 32 and 64-bit. To start, we used gox, which gave us easy cross compilation with simple pass through options. For example, to compile our Agent:

gox -os="linux windows darwin" -arch="amd64 386" -output="agent_{{.OS}}_{{.Arch}}" .

The {{.OS}} and {{.Arch}} are automatically replaced with the compiled values. A nice benefit of gox is also that the binaries are compiled in parallel!

However, recently we’ve expanded our cross compilation to include ARM (Hello, Raspberry Pi!), and this complicated the process. gox doesn’t support the ARM options, so we’d be falling back to go build. Rather than just tack on additional build commands (to the agent and all our plugins), we took this opportunity to refactor our Makefile.

We have been using Docker extensively for our tests and local builds. Some of our more complicated integration tests require external files to be present. Running the tests in a Docker container ensures we have complete control over the environment and those files are present. Docker also helps with stability and reliability as we can ensure the exact same environment between engineers. Since we’ve already been using Docker as such a major part of our development environment, we decided to cross compile our code inside a Docker container as well!

That’s when we ran into an issue.

The cross compiling appeared to work perfectly; all binaries built successfully. But upon actually trying to run them, the Linux-amd64 binary would not run. The 32-bit version worked fine, and all other builds worked. Only the Linux-amd64 binary was broken. The error was:

-su: agent: No such file or directory

(Note: I was running the command as root, and the name of the binary was agent).

Okay – that doesn’t make any sense. I’m executing the binary, it’s clearly present. The error message isn’t clearly explaining what is going on. And that makes searching for a solution tricky. What’s going on?

After analyzing the new binary, we realized that it was actually missing a dependency:

brian@local:~/agent$ ldd previous-version/agent
     => (0x00007ffe8a7ce000)
     => /lib/x86_64-linux-gnu/ (0x00007f9065445000)
     => /lib/x86_64-linux-gnu/ (0x00007f906507b000)
    /lib64/ (0x00007f9065662000)

brian@local:~/agent$ ldd new-version/agent
     => (0x00007fff14bc5000)
     => not found

Wait a second… I thought Go built a static binary! You know, the kind where all dependencies are bundled in! That’s why deploying Go is so easy, just toss a binary on a host and start it up. Well, not quite. Go still builds binaries that depend on system libraries that are present. You can build perfect static binaries, but some of Go’s functionality requires cgo, and that requires the system libraries. However, these libraries are (almost always) present on modern systems. And in many cases, the only way to use the libraries is by dynamically linking against the system library. So why is our binary freaking out? It must have something to do with the changes we made in our compilation step, since we only just started having this issue. And the build command barely changed, the only other thing we changed was…. Docker.

Ah yes. We moved our compilation inside a Docker container. And not just any container: Alpine Linux. By default, we use Alpine Linux as our base image for all Docker containers. It’s small, efficient, lightweight, and for most applications, does a great job. In this case though, Alpine Linux has an important difference from other distributions. It uses a lightweight version of libc called musl-libc. This turned out to be the crux of the issue: the resulting binary was linked against musl-libc and expected musl’s dynamic loader, but Ubuntu and Debian ship the standard glibc instead, so the binary failed to start with that cryptic error message.

We fixed this by changing our build Docker container to use Debian instead of Alpine Linux as its root image. Since only our build server needed to hold the image, we decided this was the fastest and best solution. It also better matches the environment in which the Agent most frequently runs.
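One way to catch this class of problem before shipping is to look at which dynamic loader a binary asks for. A dynamically linked ELF file names its loader in a `PT_INTERP` program header, and when that path does not exist on the host, the shell reports "No such file or directory" for the binary itself. The sketch below reads that header from a 64-bit little-endian ELF image; the helper names are hypothetical, and the musl and glibc loader paths in the comments are the usual x86_64 ones rather than something taken from our build.

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader

def elf_interpreter(data):
    """Return the PT_INTERP path from a 64-bit little-endian ELF image, or None."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    (e_phoff,) = struct.unpack_from("<Q", data, 32)           # program header table offset
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    for index in range(e_phnum):
        base = e_phoff + index * e_phentsize
        p_type, _p_flags, p_offset = struct.unpack_from("<IIQ", data, base)
        (p_filesz,) = struct.unpack_from("<Q", data, base + 32)
        if p_type == PT_INTERP:
            return data[p_offset:p_offset + p_filesz].rstrip(b"\x00").decode()
    return None

def libc_flavor(interpreter):
    """Guess the libc from the loader path (a heuristic, not a guarantee)."""
    if interpreter is None:
        return "no interpreter (likely static)"
    if "ld-musl" in interpreter:
        return "musl"    # e.g. /lib/ld-musl-x86_64.so.1 on Alpine
    if "ld-linux" in interpreter:
        return "glibc"   # e.g. /lib64/ld-linux-x86-64.so.2 on Debian and Ubuntu
    return "unknown"

# Usage on a real build artifact (path is illustrative):
# with open("agent", "rb") as f:
#     print(libc_flavor(elf_interpreter(f.read())))
```

A check like this in the build pipeline would flag a musl-linked binary before it reaches a glibc host.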

In the end, we are still very happy with our usage of Docker and Go. It’s frustrating when seemingly innocuous changes cause bugs, but it’s a good reminder to test everything. We’re always looking to improve our development practices and help out others in the community!

If you’re interested, sign up for Virtyx today, or follow us on Twitter for more great posts!

Setting Up the Virtyx Agent On a Headless Raspberry Pi

By Ben Burwell • December 19, 2017

We work hard to make sure the Virtyx agent can be easily deployed on whatever hardware our customers are using. While the Virtyx agent is most commonly run on endpoints (end-user desktops and laptops) and servers, the Raspberry Pi provides a convenient platform for some specialized monitoring tasks. In this blog post, I will describe how to set up a Raspberry Pi to run the Virtyx agent.


  • A Raspberry Pi (I used a Raspberry Pi Zero W, but similar steps should work for other models)
  • A Micro SD card
  • A computer with the capability to mount the SD card
  • A power source for the Raspberry Pi
  • An Ethernet cable and available network port (if not using a wireless Pi)

Installing Raspbian

First, head over to the Raspberry Pi website and download the Raspbian Stretch Lite image. While that’s downloading, grab a copy of Etcher. Etcher is a cross-platform program for burning SD cards for use in IoT devices.

Once you’ve downloaded Raspbian and installed Etcher, connect the SD card to your computer and fire up Etcher. Follow the instructions to choose your SD card and the Raspbian ZIP archive, and burn the image. This may take a few minutes.

Pre-Boot Configuration

After Etcher finishes burning the image to the SD card, mount it on your computer. We need to enable the SSH server so you can use a truly headless setup, and if you are using a Raspberry Pi with WiFi, you’ll also need to configure the network settings.

For the following steps, I’ll assume you have a Unix-like environment at your disposal.

First, cd into the mount point of the SD card. The volume will likely be named boot. Create an empty file named ssh:

$ touch ssh

Next, if you’re using a Raspberry Pi with a wireless card, you’ll want to configure the WPA Supplicant.

$ cat << EOF > wpa_supplicant.conf
ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
country=US

network={
    ssid="your_SSID"
    psk="your_PSK"
}
EOF

Replace country=US with the two-letter ISO-3166 country code corresponding to your country, and replace your_SSID with the SSID of the WiFi network to join, and your_PSK with your WPA pre-shared key.

Booting the Raspberry Pi

Unmount the SD card from your computer and plug it into the Pi. Connect the Pi to your network with an Ethernet cable if applicable, and connect the power cable. The Pi will take a moment to boot and acquire an IP address.

Using whatever method will work on your network, identify the IP address for your Pi. You should be able to examine the DHCP leases on your router. Alternatively, you can use the arp(8) command to search for all the devices on your network with the b8:27:eb MAC address prefix, which is assigned to the Raspberry Pi Foundation:

$ arp -a | grep b8:27:eb

Congratulations, this is a list of all the Raspberry Pis on your network! Once you’ve identified the IP address of your Pi, you’ll need to SSH in:

$ ssh pi@<PI_IP_ADDRESS>

When prompted for the password, enter raspberry. When you initially log in, you should change the password by running the passwd program. You can also modify your Raspberry Pi’s configuration by running the raspi-config program.
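If you would rather script the MAC-prefix search from earlier in this section, the following sketch filters `arp -a` style output for the Raspberry Pi prefix. The output format of `arp` varies between operating systems, so the regular expression and the sample text are assumptions modeled on the common `host (ip) at mac` layout; the `find_pis` helper is hypothetical.

```python
import re

PI_PREFIX = "b8:27:eb"
ARP_ENTRY = re.compile(r"\((?P<ip>\d+\.\d+\.\d+\.\d+)\) at (?P<mac>[0-9a-fA-F:]+)")

def find_pis(arp_output):
    """Return (ip, mac) pairs whose MAC address starts with the Pi prefix."""
    hits = []
    for match in ARP_ENTRY.finditer(arp_output):
        # Some systems drop leading zeros (e.g. "a" instead of "0a"), so pad each octet.
        octets = match.group("mac").lower().split(":")
        mac = ":".join(octet.zfill(2) for octet in octets)
        if mac.startswith(PI_PREFIX):
            hits.append((match.group("ip"), mac))
    return hits

sample = """\
? (192.168.1.1) at 0:11:22:33:44:55 on en0 ifscope [ethernet]
raspberrypi (192.168.1.42) at b8:27:eb:a:bc:1 on en0 ifscope [ethernet]
? (192.168.1.50) at (incomplete) on en0 ifscope [ethernet]
"""
print(find_pis(sample))

# Against the live table (requires the arp command to be installed):
# import subprocess
# print(find_pis(subprocess.run(["arp", "-a"], capture_output=True, text=True).stdout))
```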

Installing the Virtyx Agent

First, log in to Virtyx to retrieve your API key from your Settings page. While you’re SSH’d into your Raspberry Pi, use curl to download the latest Virtyx agent and install it. You’ll want to run these commands as root:

pi@pi $ sudo su
root@pi# curl >
root@pi# unzip
root@pi# mkdir -p /.virtyx
root@pi# mv agent /usr/local/bin/virtyx-agent
root@pi# echo "{\"apiKey\":\"<YOUR_API_KEY>\"}" > /.virtyx/config.json
root@pi# virtyx-agent -install
root@pi# virtyx-agent -start

You should now have the Virtyx agent running on your Raspberry Pi. You can verify this by heading over to your Virtyx dashboard and looking for your new agent.

Now, you can add tasks as usual to start monitoring your internal network. Happy monitoring!

Network Continuous Deployment

By Ethan Mick • June 22, 2017

As a founding engineer of Virtyx, I have seen us pursue, from day one, new software integration and development paradigms such as continuous integration (CI) and Extreme Programming (XP), where unit and system testing are automated and a primary part of the development pipeline. This test-driven development has spawned a plethora of products and tools that integrate and automate this strategy, including tools such as Jenkins, which we rely on at Virtyx for an automated build, test, and deployment cycle.

The real power of these continuous integration cycles comes from the automation of unit and integration testing, which is dependent on the availability of software environments that are substantially similar to the actual deployment environments. Server virtualization, software orchestration and the availability of deployment tools such as Salt, Puppet, Chef, Ansible, Docker and Vagrant make this possible for anyone moving to a cloud development model. Deploying these tools in the same environment for test and production allows an enormous amount of the automation applied in test to be deployed and used for production.

However, as an environment scales and becomes more geographically diverse, it becomes significantly more complex and expensive to provide a development and test environment which is substantially similar to the production environment.

Most of the time, the biggest divergence in the environment is in fact geographic, typically in the bandwidth and delay to remote branch offices.

There is also a significant effort to manage the network infrastructure that supports this distributed environment using the same tools used to create and manage the software-defined datacenter environment. This means the same test-driven approach that is taken for software changes should be taken for network changes.

Herein lies the problem – how do you create network unit and system tests which measure the effects of network changes as seen from the end user perspective?

Testing from a single centralized location such as an SNMP manager only really tests the network in a single direction, from the manager to the SNMP devices, using a single protocol, SNMP. The same testing is often done with ICMP ping as well. But neither of these testing methods captures the complexity of modern web application delivery environments, which may have different distributions of mid-boxes such as NAT devices, firewalls, load balancers, and proxies that are not well tested with ICMP and SNMP.

They need to be tested with real HTTP and HTTPS traffic coming from all over the network.

So how would the links between network change, deployment, and network unit testing work? Each web application would need a specifically crafted URL for testing the availability of the application. Luckily, this is already done both for application unit testing and for application health testing by things like elastic load balancing. The same health check URLs can be used, and reusing them assures that developers and operators (DevOps) have a common communication mechanism to say exactly what is not working correctly.

The next step in network deployment testing is to distribute the validation and checking of these health URLs across the network. With network mid-boxes and security segmentation in place, the behavior of a URL will change based on where it is being accessed from, and when the network is changing this effect can be even larger. So the URLs must be tested from locations across the network. With network security segmentation, two different ports on the same LAN switch may even have different access to a specific URL!

The last part of the continuous network testing puzzle is a way to measure a successful test versus a failed test. The challenge is that most networks are constantly changing due to current conditions such as failed carrier links, congestion due to time of day, or other factors which are not due to configuration changes. So a testing baseline needs to be a little more subtle – that is, everything the same before and after the network change, though even this test is often more than many environments can automate today.

So how do we implement network testing tied to software deployment testing?

  1. The same URLs for software unit tests, ELB health checks, and network testing.
  2. Distributed URL health checking across the internal and external networks.
  3. Fuzzy comparison of URL health check responses against a baseline: delay and jitter may vary within a tolerance, while the response content must stay the same for a specific location, though it might vary by location.
  4. Continuous testing and communication of test status to all stakeholders (helpdesk, DevOps) as a common language for what is working and what is not.
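
The fuzzy comparison in item 3 can be sketched in Python as follows. This is only an illustration under stated assumptions, not how Virtyx implements it: the HealthSample record, the compare_to_baseline function, and the 50% latency tolerance are hypothetical names and values chosen for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthSample:
    """One health-check result from one location (hypothetical record layout)."""
    location: str
    status: int          # HTTP status code returned by the health URL
    body_sha256: str     # hash of the response body
    latency_ms: float    # time taken to receive the response


def body_hash(body: bytes) -> str:
    """Hash a response body so content can be compared without storing it."""
    return hashlib.sha256(body).hexdigest()


def compare_to_baseline(baseline, current, latency_tolerance=0.5):
    """Return a list of problems; an empty list means the check passed.

    Status and content must match the baseline exactly. Latency may exceed
    the baseline by up to latency_tolerance (a fraction of the baseline).
    """
    if baseline.location != current.location:
        raise ValueError("compare samples taken from the same location only")

    problems = []
    if current.status != baseline.status:
        problems.append(f"status changed: {baseline.status} -> {current.status}")
    if current.body_sha256 != baseline.body_sha256:
        problems.append("response content changed")

    allowed_ms = baseline.latency_ms * (1 + latency_tolerance)
    if current.latency_ms > allowed_ms:
        problems.append(
            f"latency {current.latency_ms:.0f} ms exceeds allowed {allowed_ms:.0f} ms"
        )
    return problems
```

A sample taken at each location before a network change serves as the baseline, and a sample taken after the change is compared against it. Because each location has its own baseline, content that legitimately differs between locations does not raise a false alarm.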

By extending the DevOps practice of unit and integration testing to network scale deployment we get the benefits of flexible change and updates in a continuous cycle all within a manageable risk model. The legacy practice of fixed time change windows and acceptable outages during these windows does not match or, more accurately, keep up with the requirements for continuous availability and continuous deployment. By adopting continuous testing into the network there is a better match between the application development lifecycle and the lifecycle of the underlying network to support applications. This match in process assures the network can provide the same “speed to market” of project and availability that is necessary in today’s business environment.

TCP Optimization

By Jim Maniscalco • May 10, 2017

Packet loss in IP based networks is a fact that one just has to accept – physical networks always have errors of some form or another. Noise on a communications channel (link) is what limits the bandwidth over the channel. (See Shannon Channel Capacity).
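
For reference, the Shannon-Hartley theorem puts the limit at C = B × log2(1 + S/N), where B is the channel bandwidth in hertz and S/N is the linear signal-to-noise ratio. A small Python sketch of the arithmetic follows; the function name and the 3 kHz / 30 dB figures are illustrative assumptions, not measurements of any particular link.

```python
import math


def shannon_capacity_bps(bandwidth_hz, snr_db):
    """Upper bound on error-free bit rate for a noisy channel (Shannon-Hartley)."""
    snr_linear = 10 ** (snr_db / 10)   # convert decibels to a plain ratio
    return bandwidth_hz * math.log2(1 + snr_linear)


# Illustrative figures: a 3 kHz channel with a 30 dB signal-to-noise ratio
# tops out a little under 30 kbit/s, no matter how clever the encoding is.
capacity = shannon_capacity_bps(3000, 30)
print(f"{capacity / 1000:.1f} kbit/s")
```

More noise lowers the signal-to-noise ratio and therefore the capacity, which is why noise is the limiting factor.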

There are many techniques to limit and control the loss of packets over a specific link, such as forward error correction (FEC) or local link retransmission of packets/frames (for example, the Link Access Procedure for the D channel, LAPD).

The problem with these local link techniques is that they really require that every single link have them and work correctly, which, in reality, can add significantly to latency and cost of the link. Moreover, they also still can’t correct for problems which occur inside a router or switch and not on the links themselves. There still needs to be some mechanism to detect and correct packet loss from end to end. (Saltzer, J. H., D. P. Reed, and D. D. Clark (1981) “End-to-End Arguments in System Design”)

So how does the Internet deal with these issues?

Different links in the path will have different local link error correction and retransmission methods depending on the physical properties of the link. Most physical links today do not implement local retransmission protocols and these are left to the high-level protocols which are carried on the link.

In the Internet Protocol (IP) suite there are two major end-to-end protocols. The first is the User Datagram Protocol (UDP), which does not guarantee packet order or packet delivery; rather, it exposes the underlying IP datagram service to higher-level applications. It does, however, provide a port-based packet multiplexing mechanism.

The second major end-to-end protocol in the IP suite is the Transmission Control Protocol (TCP). TCP provides applications with an end-to-end guarantee of ordered delivery of a stream of bytes. It does not guarantee any type of latency or delay, or a constant rate of delivery of bytes. This is one of the reasons that applications which need lower or controlled latency will use UDP or direct IP and not TCP.

The details of TCP are well known. For this blog, it is enough to know that as a TCP end station receives a set of bytes from the other end, it sends an acknowledgement that they have been received. When the sending end does not see an acknowledgement for packets, it will resend a smaller set of unacknowledged bytes and again await a reply. The fact that the number of bytes sent without acknowledgement is smaller is an important TCP optimization for the network.
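
Port-based multiplexing can be seen with a few lines of Python using the standard socket module. This is a minimal loopback sketch: the function name and the "service_a" / "service_b" labels are made up for the example, and the ports are whatever free ports the operating system hands out.

```python
import socket


def demo_udp_port_multiplexing():
    """Send one datagram to each of two UDP ports on the same host."""
    receivers = {}
    for name in ("service_a", "service_b"):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("127.0.0.1", 0))   # port 0 asks the OS for any free port
        sock.settimeout(2.0)          # UDP gives no delivery guarantee
        receivers[name] = sock

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Same destination address every time; only the port differs.
        for name, sock in receivers.items():
            sender.sendto(f"hello {name}".encode(), sock.getsockname())
        # Each socket receives only the datagram addressed to its own port.
        return {
            name: sock.recvfrom(1024)[0].decode()
            for name, sock in receivers.items()
        }
    finally:
        sender.close()
        for sock in receivers.values():
            sock.close()


if __name__ == "__main__":
    print(demo_udp_port_multiplexing())
```

Both receivers share one IP address, and the destination port alone decides which of them gets each datagram. Nothing here retries or reorders; that is left to the application.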
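
The behavior described above can be sketched as a toy simulation in Python. It is a deliberately simplified model and not real TCP, which adds slow start, congestion avoidance, fast retransmit, and timers; the function name, the halving rule, and the initial window of 8 segments are assumptions made for illustration.

```python
def send_with_backoff(total_segments, lost_attempts, initial_window=8):
    """Toy sender that shrinks its window whenever a round goes unacknowledged.

    lost_attempts is the set of attempt numbers for which no acknowledgement
    arrives. Returns a history of
    (attempt, first_segment, segments_sent, acknowledged) tuples.
    """
    acked = 0                 # segments acknowledged so far
    window = initial_window   # segments sent before waiting for an ACK
    attempt = 0
    history = []

    while acked < total_segments:
        count = min(window, total_segments - acked)
        delivered = attempt not in lost_attempts
        history.append((attempt, acked, count, delivered))
        if delivered:
            acked += count                 # move past the acknowledged data
            window += 1                    # probe gently for more capacity
        else:
            window = max(1, window // 2)   # resend a smaller set next time
        attempt += 1
    return history


if __name__ == "__main__":
    for row in send_with_backoff(20, lost_attempts={1}):
        print(row)
```

In the run above, attempt 1 sends 9 segments and hears nothing back, so attempt 2 resends from the same starting segment but with only 4. Sending less after a loss keeps a sender from pushing more traffic into a path that is already dropping packets.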

The End-to-End Argument in Network Monitoring

By Jim Maniscalco • March 16, 2017

The End-to-End Argument is important for analyzing and understanding the proper way to deploy specific network functionality.

The basic idea is that a proposed function should only be considered and implemented into the network itself when that function can be implemented in the network with such a completeness that all users of the network would benefit from its implementation. If there is any type and level of cost to implementing said function inside the network, even when most of the clients or traffic in the network does in fact use it, this cost of implementing may not make sense.

An alternative to implementing a function inside the network is for the end clients to implement the function. Examples of this are encryption, flow control and guaranteed delivery of data; the End to End argument had a significant impact on the design of the TCP protocol to guarantee reliable transmission over unreliable networks. The End to End argument has been a reliable form of governance to keep the network simple and to deliver solid performance for all clients.

One area in particular where the End to End argument has not entirely been upheld relates to network monitoring and management. Most network management and monitoring systems focus on individual components of the network such as routers and switches. It is entirely possible that each one of these components works correctly while the clients on the network are losing packets or experiencing performance issues. The only way to see these problems is to monitor the end-to-end function of packet delivery and performance from the endpoints of the network. Monitoring from the end clients provides the ability to see the performance of the network as the End User is actually experiencing it. This allows the monitoring system to find the effects of problems anywhere in the network path, from the user to the application.

Areas where these problems exist and which are often difficult to monitor include:

  • End station operating system, TCP stack and network hardware.
  • LAN Network, performance, packet loss, latency and other uses.
  • First Hop Router issues and router access control
  • Outbound controls such as Web URL Filtering, or Proxies.

In a recent problem we were involved with at a Virtyx client, a small set of End Users reported they were having problems accessing the Internet. This report came in via phone and quickly receded into background noise. The report was escalated to the network team, each silo (LAN, WAN, Firewall and Internet) went into their tool of choice, and to no great surprise not one silo team reported a problem.

The report from the End User did not have a definitive timestamp of when the problem commenced or when it subsequently cleared, so it was difficult to correlate the problem with any other monitoring tools. If the End User workstations had had endpoint monitoring installed on them, we would have been able to see the when, where, what and why of the problem(s) that the End User experienced. It would also have allowed the collection of additional information which the End User could not collect or convey as part of the problem report. At this point we don’t know exactly what happened or what caused the issue, and it will hang over the network team until we determine the root of the problem.
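
To make the "when" concrete, here is a small Python sketch of how timestamped probe results collected on an endpoint could be turned into outage windows. The function name and the (timestamp, ok) sample layout are hypothetical and not the Virtyx agent's actual data model.

```python
from datetime import datetime, timedelta


def outage_windows(samples):
    """Turn time-ordered (timestamp, ok) samples into outage windows.

    Each window is (first_failed_sample, first_ok_sample_after_it). The end
    is None when the outage was still ongoing at the last sample.
    """
    windows = []
    start = None
    for timestamp, ok in samples:
        if not ok and start is None:
            start = timestamp                  # outage begins
        elif ok and start is not None:
            windows.append((start, timestamp)) # outage has cleared
            start = None
    if start is not None:
        windows.append((start, None))
    return windows


if __name__ == "__main__":
    t0 = datetime(2017, 3, 16, 9, 0)
    minute = timedelta(minutes=1)
    results = [True, False, False, True, False]   # one probe per minute
    samples = [(t0 + i * minute, ok) for i, ok in enumerate(results)]
    for begin, end in outage_windows(samples):
        print(begin, "->", end)
```

With windows like these recorded at the endpoint, the start and end of a user's problem can be lined up against logs from the LAN, WAN, firewall, and Internet teams instead of relying on the user's recollection.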

Can you see what I see? Well... not really.

By Jim Maniscalco • March 16, 2017

This is an all too common exchange between IT/network support and end users accessing applications and services over the Internet. It just so happens to define the very essence of Virtyx!

The Internet is now a vital utility, and as more and more people and things (IoT) are connected, areas such as predictable performance and IT security will become both more important and more difficult to understand and manage because the principles and architectures of network reachability have changed.

In the early days of the Internet the End-to-End argument ruled, and mid-boxes such as proxies, NAT devices, load balancers, and firewalls were viewed as creating a broken network. Today these mid-boxes are required elements in most networks: firewalls at the perimeters of enterprise networks, and load balancers, proxies, and CDNs in front of Internet cloud-scale applications.

So what do these mid-boxes mean for the management and troubleshooting of End User Experience problems?

It used to be that the network had transitivity, meaning that if Computer A could communicate with B and Computer B could communicate with C, then A could communicate with C. This is no longer the case. The mid-boxes are starting to block and control different communication patterns from different source addresses in the network.
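
The loss of transitivity is easy to state as a check over a reachability table. The Python sketch below is illustrative only: the function name and the set of allowed (source, destination) pairs are made up, and a real table would come from probes run at each host.

```python
from itertools import permutations


def transitivity_violations(hosts, allowed):
    """Find triples (a, b, c) where a reaches b and b reaches c, but a cannot reach c.

    allowed is a set of (source, destination) pairs that can communicate.
    """
    return [
        (a, b, c)
        for a, b, c in permutations(hosts, 3)
        if (a, b) in allowed and (b, c) in allowed and (a, c) not in allowed
    ]


if __name__ == "__main__":
    hosts = ["A", "B", "C"]
    # A firewall lets A talk to B and B talk to C, but blocks A from C.
    allowed = {("A", "B"), ("B", "C")}
    print(transitivity_violations(hosts, allowed))
```

Any triple the function returns is a case where a test run from one host says nothing about what another host can reach.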

Just because the network management system can communicate with an application does not mean that an End-User can communicate with the same application. On the Internet, with Content Distribution Networks (CDNs) and Global Load Balancing via DNS changes, the End-User might not even be communicating to the same data center. Full stop.

What does this mean for network monitoring and management?

It means that you need to monitor the network and applications from exactly the same network view as the End Users. If you want to provide the best performance for your End Users, it is no longer sufficient to monitor the network from specific centralized management systems.

The monitoring of the network must be from the end user’s perspective. If it is not working for the user, it is not working.

We live in this world, so we hope you follow us to figure out what End User Network Monitoring can do to improve operations and performance.
