29 May 2015, 17:23

Integrating New Relic with Python, Flask, and uWSGI Emperor

I recently needed to integrate one of my Python Flask apps with New Relic to get some monitoring insight. The app is being served via uWSGI running in Emperor mode. The docs available from New Relic weren’t particularly clear to me so it took me a little while longer than it should have, but I got it ironed out.

I’m going to assume that if you’re trying to do this, you’ve already got uWSGI running in Emperor mode and that your vassals are configured.

First, ensure that the New Relic module is installed and that your newrelic.ini is created correctly (New Relic’s docs are fine for this), and that it resides somewhere your app can read it, with permissions to match. I like to leave mine in the root of my application.
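
If you haven’t done that part yet, it usually amounts to something like the following (the license key is a placeholder, and New Relic’s docs remain the authority on these commands):

# Install the Python agent into the same environment that runs your app
pip install newrelic

# Generate a newrelic.ini; YOUR_LICENSE_KEY is a placeholder for your own key
newrelic-admin generate-config YOUR_LICENSE_KEY /path/to/myapplication/newrelic.ini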

Second, ensure that your vassal is configured correctly for New Relic to work. There are two required settings: enable-threads = true and single-interpreter = true. I didn’t have any luck using the eval or env configuration options presented by New Relic, so those aren’t present here. My dev vassal looks like this:

[uwsgi]
plugins = python
chdir = /path/to/myapplication
socket = /path/to/myapplication/tmp/uwsgi.sock
uid = www-data
gid = www-data
enable-threads = true
single-interpreter = true
module = myapplication:app
chmod-socket = 666
logto = /var/log/uwsgi/dev.myapplication.log
catch-exceptions = true
py-reload = 2

At this point, you should be able to restart uWSGI and effectively see no change in behavior.
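
How you restart is going to depend on how your Emperor is managed. As a rough sketch, assuming the Emperor runs under a systemd unit named uwsgi and your vassal configs live somewhere like /etc/uwsgi/vassals (both of those are assumptions; adjust to your layout):

# Restart the whole Emperor and every vassal under it...
systemctl restart uwsgi

# ...or just touch this vassal's config; the Emperor notices the mtime change
# and reloads only that app
touch /etc/uwsgi/vassals/myapplication.ini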

Because of the previously mentioned issues using env and eval to get this all running, my New Relic setup is actually part of the app itself, which I prefer anyway. The magic bits to get this to work inside of your app are:

import newrelic.agent
newrelic.agent.initialize('./newrelic.ini')

I have these immediately after the rest of the imports in my app. Note that this obviously assumes that you have the newrelic.ini in the same place that I do – right next to the application.

Now if you restart uWSGI, make a few requests to your app, and wait a few minutes, you should start getting metrics reported to New Relic.

06 Apr 2015, 16:13

Synology SHR array wrong size after expanding

I recently replaced a 1TB drive (Seagate Barracuda) in my Synology DS1813+’s SHR-2 array with a 4TB drive (Western Digital Red). During that process, I had another drive which was on its last legs (another Seagate Barracuda, 4TB this time) die. I replaced the 4TB Seagate with a 6TB Western Digital Red. After everything was finished rebuilding and expanding, I was left with a very small change in the capacity of the volume. For having added 5TB to the array, I was seeing about a 1TB change in capacity. That didn’t seem right to me.

So I asked Reddit. The answer there was “Well, SHR hides the complexity of RAID, bla bla bla.” So I asked Synology support. The answer from them was “Well, the calculator on the site is only for new arrays, what you’ll actually see when expanding is bla bla bla.” Neither of those answers was reasonable to me, so I started digging.

As it turned out, when looking at cat /proc/mdstat, I saw something similar to this (recreated from memory):

~ # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md5 : active raid6 sdf8[0] sda8[5] sdd8[4] sde8[3] sdh8[2] sdg8[1]
      3906585344 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md4 : active raid6 sde7[5] sdd7[6](S) sda7[7](S) sdg7[3] sdf7[2] sdh7[4]
      1953485568 blocks super 1.2 level 6, 64k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md3 : active raid6 sdh6[6] sdd6[9](S) sda6[10](S) sdg6[5] sdf6[4] sdb6[7] sde6[8] sdc6[1]
      5860456704 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md2 : active raid6 sde5[9] sda5[11](S) sdg5[6] sdf5[5] sdb5[8] sdd5[10] sdc5[2] sdh5[7]
      5832165888 blocks super 1.2 level 6, 64k chunk, algorithm 2 [7/7] [UUUUUUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md1 : active raid1 sda2[7] sdb2[0] sdc2[1] sdd2[2] sde2[6] sdf2[3] sdg2[4] sdh2[5]
      2097088 blocks [8/8] [UUUUUUUU]

md0 : active raid1 sda1[7] sdb1[1] sdc1[3] sdd1[2] sde1[6] sdf1[4] sdg1[5] sdh1[0]
      2490176 blocks [8/8] [UUUUUUUU]

unused devices: <none>

At first glance, everything looked fine. I ran an lsblk, and everything seemed fine there too. I checked mdadm --examine /dev/md[0,1,2,3,4,5] and all of that seemed reasonable. Except, not quite.

The results from mdadm --examine /dev/md[2,3,4] showed that several of the partitions had been added to the array as spares, and if you look closely at the cat /proc/mdstat above, that’s confirmed by the devices in the arrays – some of them have an (S) after them, indicating a spare. You’ll also notice from that output that bitmaps were enabled, which I had done during a previous rebuild operation.

I believe what happened was that, because I had left bitmaps on, the Synology (actually, mdadm) wasn’t able to successfully execute the mdadm --grow /dev/md[2,3,4] --raid-devices=N (where N is the new number of devices) after it had successfully performed the (for example) mdadm --add /dev/md2 /dev/sda5. Because of that, the devices were only added as spares and not integrated into the array, and the subsequent resize2fs command had no additional capacity to resize to.

What I ended up doing was mdadm --grow /dev/md[2,3,4] --bitmap=none, and then for each of the md devices, mdadm --grow /dev/mdX --raid-devices=N, X being the md device, and N being the number of devices currently in the array plus the number marked as spare.
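
Concretely, for a single array it looked something like this (the device name and count come from the mdstat output above and are recreated from memory; check mdadm --detail on your own arrays before growing anything):

# See how many devices are active and how many are sitting as spares
mdadm --detail /dev/md2

# Drop the write-intent bitmap that was getting in the way of the reshape
mdadm --grow /dev/md2 --bitmap=none

# Grow into the spares; for md2 above that's 7 active + 1 spare = 8
mdadm --grow /dev/md2 --raid-devices=8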

After each of those commands completed, DSM happily reported that I could expand the space. I wanted all of the devices in the array before expanding anything, so I finished the grow operations on every md device first, and then expanded the space through DSM. Doing this, I was able to recover nearly 5TB of “lost” capacity on the volume.

21 Aug 2014, 20:14

Synology NAS (DS1813+) degraded array for md0 and md1 after rebuild

I recently purchased a Synology DS1813+ to replace my troubled Drobo-FS. The migration process was long and arduous, consisting of a handful of rebuilds on both the DS side as well as the Drobo side as I shuffled data and moved disks.

During my final rebuild on the DS side (which is an SHR-2 array), I experienced a drive failure in Bay 3 (a Seagate 3TB Barracuda) which resulted in a hard-lock of the device requiring a reboot. When the DS came back up, the drive was available to DiskStation Manager (DSM), however it wasn’t part of the array, and no amount of mdadm fiddling would re-add it, so through DSM I requested a rebuild of the array to that disk.

Unfortunately, part way through the rebuild, that disk failed again and dropped out of the array and out of the OS as well; it wasn’t found anywhere. Moments after that, the disk in Bay 2 (another Seagate Barracuda, 2TB) dropped from the OS.

At this point I initiated a rebuild with a 4TB Western Digital Red drive that I had configured as a hot spare.

26 hours later, the rebuild finished, though my array was still degraded due to the lack of two drives at this point. I rebooted the DS, it picked up Bay 2 again, and everything was happy. Almost.

DSM reported that the DS was in good condition, however cat /proc/mdstat had something else to say:

~ # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md4 : active raid6 sdh7[4] sdd7[0] sdg7[3] sdf7[2]
      1953485568 blocks super 1.2 level 6, 64k chunk, algorithm 2 [4/4] [UUUU]

md3 : active raid6 sdb6[7] sdh6[6] sdc6[1] sdg6[5] sdf6[4] sdd6[2]
      3906971136 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]

md2 : active raid6 sdb5[8] sdh5[7] sda5[0] sdg5[6] sdf5[5] sdd5[3] sdc5[2]
      4860138240 blocks super 1.2 level 6, 64k chunk, algorithm 2 [7/7] [UUUUUUU]

md1 : active raid1 sdh2[4] sdg2[6] sdf2[5] sdd2[3] sdc2[2] sdb2[1] sda2[0]
      2097088 blocks [8/7] [UUUUUUU_]

md0 : active raid1 sdb1[2] sdh1[1] sda1[0] sdc1[4] sdd1[3] sdf1[5] sdg1[6]
      2490176 blocks [8/7] [UUUUUUU_]

unused devices: <none>
~ #

Yes, it would seem as though my rebuild missed md0 and md1. I found that very curious, because they were part of the rebuild process when I was nervously querying cat /proc/mdstat.

After a day and a half of nervously inspecting partitions, configurations, and mdadm’s output, I discovered that md0 and md1 don’t hold any of my data. When I queried pvdisplay, they weren’t listed in any of LVM’s volumes, and when I mounted them, they appeared to contain replicas of the OS (which I suppose makes sense).
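
If you want to check the same thing on your own unit, the inspection amounts to something like this (a minimal sketch using the device names from the output above):

# md0 and md1 don't show up as LVM physical volumes, so they hold no data volumes
pvdisplay

# The small RAID1 arrays are system partitions mirrored across every disk
mdadm --detail /dev/md0
mdadm --detail /dev/md1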

I was able to address the issue by issuing mdadm --grow -n 7 /dev/md[01], which caused those two arrays to “grow” (in this case, shrink) by one device. That happened immediately, and a subsequent cat /proc/mdstat showed full happiness across the board:

~ # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md4 : active raid6 sdd7[0] sdg7[3] sdf7[2] sdh7[4]
      1953485568 blocks super 1.2 level 6, 64k chunk, algorithm 2 [4/4] [UUUU]

md3 : active raid6 sdh6[6] sdg6[5] sdf6[4] sdb6[7] sdd6[2] sdc6[1]
      3906971136 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]

md2 : active raid6 sda5[0] sdg5[6] sdf5[5] sdb5[8] sdd5[3] sdc5[2] sdh5[7]
      4860138240 blocks super 1.2 level 6, 64k chunk, algorithm 2 [7/7] [UUUUUUU]

md1 : active raid1 sda2[0] sdb2[1] sdc2[2] sdd2[3] sdf2[5] sdg2[6] sdh2[4]
      2097088 blocks [7/7] [UUUUUUU]

md0 : active raid1 sda1[0] sdb1[2] sdc1[4] sdd1[3] sdf1[5] sdg1[6] sdh1[1]
      2490176 blocks [7/7] [UUUUUUU]

unused devices: <none>

Now, with one bay empty, I just have to wait on my last 4TB WD Red to arrive to be configured as a replacement hot-spare, and I’ll be in business!

11 Jun 2014, 17:09

Git credential caching on Windows [Updated]

I previously wrote (quite a while ago, actually) about git credential caching on Windows. Now, I know I’m a bit late to the party here, but as of msysgit 1.8.1, the wincred credential helper has been added to the distribution. That means that everything you saw in the previous post is no longer required, and all you have to do is:

git config --global credential.helper wincred

13 Apr 2014, 05:00

How To (Not) Be Agile

Agile. Now the cornerstone of any development organization, agile processes and development methodologies have taken over businesses, tech articles, and the lives of the engineers tasked with playing their song. I’m not sure you could find a software engineer job listing that didn’t ask for “agile experience.” The entire idea behind agile processes is to enable us, as engineers, to respond quickly to changing business requirements – to be agile – right? If that’s truly the case, then why are they so painful?

Take for example Scrum, in its purest sense. In a nutshell, and in contrast to “Waterfall Development,” Scrum enables us to work in (typically) 2-week periods of development (known as “sprints”), so that we don’t spend ages up-front ironing out all of the business requirements, design specs, user interactions, etc. just to watch them change in a month. To start every sprint, we typically spend a day or two selecting the stories that have been written by our product owner, ensuring that they fit within our velocity, and then breaking those stories down into actionable tasks. For even a small team of 4 contributors and a manager, that’s 2 days worth of salary for 5 people just to decide what we’re actually going to be working on. That’s really expensive. If you could eliminate that, you could get another engineer on the team for “free.” Not only is this an expensive process, it’s absolutely soul crushing for those involved. During a sprint, we spend an alleged 10-15 minutes (which usually turns into 30-45 minutes) on our “stand-up” every morning so that everybody is on the same page.

As another example, let’s look at Kanban. With this, we’ve managed to kill off the 1-2 day sprint planning meetings, but we still have the stories on our Kanban board, and we still have our overly-long and grueling stand-up. Further, because we’ve removed the sprint planning and task breakdown, the level of detail that ends up going in the stories is far too high, and the number of meetings to try and clarify that detail quickly gets out of hand. Not only that, because we’re constantly evolving a single story, the amount of work that ends up part of it grows, and that kills our velocity. What started out with good intentions has turned problematic.

That’s two different flavors of “Agile” processes, and both still have weaknesses.

Saving Time And Money

One of the more recent changes I tried to implement on one of my teams to help mitigate some of this was to reduce the amount of time spent in our daily stand-up. A stand-up generally tasks each member of the team with answering three questions:

  1. What did you do yesterday?
  2. What are you doing today?
  3. Are you blocked?

The first run at this goal got us down to:

  1. What’s remaining on the story you’re working on?
  2. Are you blocked?

The end result was:

  1. Are you blocked?

Anything outside of answering that single question has no real bearing; nearly all of the other information is available by looking at the Kanban board, and if more detail is needed about something, that’s a smaller conversation that doesn’t require the entire team and can be organized with “I need to talk to you and you after this.” In essence, the entire meeting was reduced to about 60-120 seconds of time where everybody working on the project would be in the same physical place at the same time to facilitate getting a hold of somebody if you needed to. That’s really it.

Expanding The Idea

What if we could throw out nearly all of the process? The “Processless Process”?

Most recently, two co-workers and I participated in our organization’s Hackathon. We took a slightly different approach than other teams did. While we did produce some great technical artifacts as well as some great metrics and statistics, we had a completely different focus; hack the process.

Our goal was to see just how much work we could crank out in two days using 3 people if the entire process was effectively thrown out. Bring two engineers and a creative guy together to just do their jobs without things getting in the way, and see how successful they could be.

It turns out that if you let your talented and skilled people do what they’re good at and passionate about, a lot gets accomplished in a very short amount of time. While we didn’t create production-ready code by any means, we did generate an entirely new site design, implement a fault-tolerant API abstraction and caching layer, migrate our assets to .webm and .webp (saving between 60 and 80% in file size), put together a new search results page with three different layout styles, and build a new detail page. In two days.

The greatest outcome from that endeavor came from the demo itself, garnering comments such as “Why aren’t we [working like this] today?”

Where Are We Now?

While it’s taken quite a while to grow roots, that labor is beginning to bear fruit. There are a few experimental projects running through the organization now, and while the idea has been given a marketing-esque name, the concept itself is intact: dissolve the unnecessary process overhead, enable individuals to thrive and succeed without putting up roadblocks, and watch the speed at which development progresses.

These changes have enabled us to work very loosely and iteratively with the business. Our requirements are now akin to “this is kind of what we’re thinking” and our UX comps are more of a “this is kind of what we want it to look like.” We have a closer relationship with both UX and the business to refine and iterate on the design and the concept itself. We’re no longer risking throwing out large portions of work when we come across something that doesn’t work for one reason or another.

Our sprints, meetings, stories, story planning and task breakdown, and our daily standup have all been replaced with walking over to someone or shooting them an IM or an email and asking a question. We have one weekly meeting, for about 30 minutes, to go over customer and business feedback. That’s it.

Further, our time to market is drastically reduced. Pushing our in-progress code to production behind an A/B test that’s limited to a very small fraction of our users enables us to gather real-world metrics from our actual customers without affecting the bulk of the site.

Feedback across the board has been positive; the engineers like it, the UX/UI teams like it, the PMs, POs, and dev managers like it, the directors like it. Thus far, there’s not been (to my knowledge) a single negative thing said, with one exception – the concern that we’re moving too fast for the business. I’m not quite sure what that means yet, but I think it’s a good thing.

Finally, we’re being agile instead of doing agile. And it feels good.

11 Mar 2014, 17:50

Installing Ruby with RVM on Archlinux

I’ve started doing some Ruby development. The commonly used platform for the development team(s) tends to be one of OS X, Ubuntu, or Mint. I do not fall into that category.

Getting RVM to install Ruby under Archlinux was fraught with issues. Most of them were trivial and just a matter of getting the right dependencies installed. There was, though, one issue that was particularly misleading. When trying to rvm install 2.1.1, I received this error:

Error running '__rvm_make -j2',
showing last 15 lines of /home/sbarker/.rvm/log/1394556125_ruby-2.1.1/make.log
make[2]: Leaving directory '/home/sbarker/.rvm/src/ruby-2.1.1/ext/readline'
exts.mk:199: recipe for target 'ext/readline/all' failed
make[1]: *** [ext/readline/all] Error 2
make[1]: *** Waiting for unfinished jobs....
compiling ossl_x509crl.c
compiling ossl_digest.c
compiling ossl_pkey_rsa.c
compiling ossl_engine.c
compiling ossl_ssl.c
installing default openssl libraries
linking shared-object openssl.so
make[2]: Leaving directory '/home/sbarker/.rvm/src/ruby-2.1.1/ext/openssl'
make[1]: Leaving directory '/home/sbarker/.rvm/src/ruby-2.1.1'
uncommon.mk:180: recipe for target 'build-ext' failed
make: *** [build-ext] Error 2
There has been an error while running make. Halting the installation.

A similar-ish error occurred when trying to install Ruby 2.0.0.

Initially this looked to be an error with OpenSSL dependencies. I spent longer than I would care to admit going down that road, trying several different options, including rvm autolibs enable, all to no avail. I finally gave up, and in admitting defeat, spun up an Ubuntu VM. And behold, the same error occurred. It wasn’t an OS issue.

Back to the drawing board, I dove into ~/.rvm/src/ruby-2.1.1 to ./configure and make the package by hand to see if I could discover anything new. As it turns out, seeing the full context around the error pointed me in the right direction.
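
The hand-build itself was nothing fancy (the path matches RVM’s default source layout on my machine):

cd ~/.rvm/src/ruby-2.1.1
./configure
make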

The first bit of the error reported by RVM is the key.

Error running '__rvm_make -j2',
showing last 15 lines of /home/sbarker/.rvm/log/1394556125_ruby-2.1.1/make.log
make[2]: Leaving directory '/home/sbarker/.rvm/src/ruby-2.1.1/ext/readline'
exts.mk:199: recipe for target 'ext/readline/all' failed
make[1]: *** [ext/readline/all] Error 2
make[1]: *** Waiting for unfinished jobs....

The issue lies with readline and not with openssl dependencies. RVM’s autolibs setting obviously didn’t trigger, since readline was present. I told RVM to install its own readline with rvm pkg install readline. When that finished building, I attempted to install Ruby 2.1.1 again with rvm install 2.1.1 -C --with-readline-dir=$HOME/.rvm/usr. The result: great success. RVM happily downloaded, built, and installed 2.1.1 for me. I repeated this for 2.0.0 and also had great success. 1.9.3 was never an issue, and it continued to not be an issue.
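
Putting it all together, the fix boiled down to this sequence:

# Build RVM's own readline rather than relying on the system one
rvm pkg install readline

# Point the Ruby builds at RVM's readline
rvm install 2.1.1 -C --with-readline-dir=$HOME/.rvm/usr
rvm install 2.0.0 -C --with-readline-dir=$HOME/.rvm/usr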

05 Mar 2014, 00:24

How to fix Visual Studio 2013 not seeing unit tests

I recently ran into this issue where Visual Studio 2013 didn’t see any of my MSTest unit tests, but only on one machine. The repository for the code was identically cloned on three different machines and only the one was exhibiting the problem. I spent at least five days trying to solve this problem, trolling through StackOverflow, Google, Bing, and various forums, all to no avail.

I finally found the solution.

Symptoms

  • If I were to right-click on the solution or project level and tell Resharper to run all the tests in the solution or the project, I would get an error “No tests found in file” or “No tests found in solution.”

  • If I opened a code file that contained a test class and test methods, the Resharper test runner would sometimes load the tests, but if I tried to run them they wouldn’t actually run, and Resharper would report them all as Inconclusive.

  • If I were to tell Visual Studio to run all tests, the progress bar would complete, but the list of tests in the test runner would still be empty.

Solution

As it turns out, there was one difference between this single environment and the other two: the location of the code. In this one environment, the source code was located on a mapped network share. That was preventing Visual Studio from being able to work with the code properly. You may at some point have gotten a dialog from Visual Studio when opening the project or solution telling you that the code is in an untrusted location. That’s the root of the issue. Visual Studio will still open the solution, compile the code, and run it, but the test runner won’t be able to do anything with it. The solution is simple, but stupid.

Open the Windows control panel, and open the Internet Options. From there, select the Security tab. Select the Trusted Sites zone, and click the Sites button. In the dialog that opens, add the IP address of the machine that the network share lives on. Save your way back through the various dialogs, restart Visual Studio, clean and rebuild your project, and now the Visual Studio test runner or Resharper should have no problem seeing your tests.

Special Thanks to Microsoft

You know, for silently failing when not being able to work with the files that you asked it to work with.

29 Jan 2014, 02:33

Another VPN update, Private Internet Access

So a friend of mine noticed my previous post about updating all of my VPN configuration and asked, “Why did you decide to go with HideMyAss when they keep logs?” I had a couple of reasons; the primary one was that they weren’t the provider that I was using before, and secondarily, I didn’t really care that much.

The more that I thought about it, the more I decided that I did care that much. To that end, I’ve taken his indirect advice and switched providers yet again. This time, I’ve gone with PrivateInternetAccess.com, who, according to their privacy policy, don’t collect any logs.

Due to the changes that I outlined in my previously mentioned post, switching providers was very, very trivial. I only needed to add a new peer, which I symlinked to vpn, update my chap-secrets with the relevant information, and make a small change to my ip-up.d and ip-down.d scripts (which are responsible for the iptables rules routing all of my traffic through the ppp0 interface), as well as to my fastest_ip.py script that finds the fastest route.
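
As a rough sketch of what that looked like (the privateinternetaccess file name is just a placeholder for whatever you call your peer file; the server details come from PIA’s own docs):

cd /etc/ppp/peers
# New peer file modeled on the old hidemyass one, with PIA's server and aliases
cp hidemyass privateinternetaccess
# Repoint the generic "vpn" peer at the new provider
ln -sf privateinternetaccess vpn

With the matching line added to /etc/ppp/chap-secrets, the vpn.service unit and togglevpn.sh carry on as before; only the IP list in fastest_ip.py needed updating.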

I know I can still improve on this process, but for now, done and done.

13 Jan 2014, 09:22

Follow-up: Squid, Sick-Beard, Deluge and a VPN, now with 100% more HideMyAss

So, it’s been a little bit over half a year since I published the article about how to set up an always-on seed-box/VPN using Squid, Sick-Beard, and Deluge. A little bit has changed since then.

First, I no longer use IPVanish. I had an issue with them where they double charged me for a month, and gave me a bit of a run-around trying to resolve it. Specifically, after contacting their support, they told me that only one of the transactions was successful and the other had failed. My PayPal account and my financial institution disagreed. Then they told me I’d have to take it up with PayPal. I took this as a sign that it was time to switch providers. To their (mild) credit, after pressing them for more information, they just went ahead and reversed the charge. Unfortunately for them (not that they probably care that much), I had already switched providers. I now use HideMyAss Pro VPN (disclosure: that’s an affiliate link).

In addition to having switched to HideMyAss Pro VPN, I’ve updated the infrastructure in a couple of different ways to be a bit easier to work with and a bit more flexible.

First, there’s no longer an ipvanish config file, since that’s been replaced with a hidemyass file, which has been symlinked to vpn via ln -s hidemyass vpn. That file, just like the previous one for ipvanish, contains the necessary config bits to connect to HideMyAss. The options.pptp file isn’t referenced, so I just left that alone and it’s ignored. I updated chap-secrets to contain the credentials that I use for HideMyAss. Of note, HideMyAss uses a different password for PPTP and L2TP connections than your normal password. Find it in your dashboard.
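
For reference, the hidemyass peer file is essentially the ipvanish one from the original post with the server and aliases swapped out; something along these lines (the PPTP server is a placeholder, and the pty line gets rewritten with a real IP by the scripts below anyway):

persist
maxfail 0
pty "pptp HMA_PPTP_SERVER --nolaunchpppd"
name YOUR_USER_NAME
remotename hidemyass
require-mppe-128
ipparam hidemyass
updetach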

The ipvanish.service unit for systemctl has been renamed to vpn.service so that it’ll stand up semantically to provider changes. It’s also been updated to remove any ipvanish references in favor of the more generic term vpn. It’s also not directly calling pon anymore to turn the VPN on and off. I created a couple of scripts to manage this for me.

The togglevpn.sh script is what’s called by the systemctl unit vpn.service. It takes on or off, just as if they were passed directly to pon. That script first calls update_vpn_to_fastest_ip.sh, which calls fastest_ip.py to retrieve the IP of the fastest VPN node that’s near me (this is just a local-ish subset of the IPs that HideMyAss provides), and updates the /etc/ppp/peers/vpn link (which points to /etc/ppp/peers/hidemyass) to use that IP. After that, pon is called to turn the VPN on. Finally, Squid is updated with update_squid_outgoing_ip_to_interface.sh and then restarted.

/opt/togglevpn.sh:

#!/bin/sh
case "$1" in
on)
        echo "Finding fastest IP..."
        /opt/update_vpn_to_fastest_ip.sh
        sleep 2s
        echo "Turning VPN on..."
        /usr/bin/pon vpn
        sleep 2s
        /opt/update_squid_outgoing_ip_to_interface.sh ppp0
        sleep 2s
;;

off)
        echo "Turning proxy off..."
        /usr/bin/poff vpn
        sleep 2s
        /opt/update_squid_outgoing_ip_to_interface.sh eth0
        sleep 2s
;;

restart)
        $0 off
        $0 on
;;
esac

systemctl restart squid

/opt/fastest_ip.py:

#!/usr/bin/python2.7
# Finds the fastest Seattle IP for HMA

import sys
import re
from subprocess import Popen, PIPE
from threading import Thread

ips = [
        "173.208.32.98",
        "216.6.236.34",
        "108.62.61.26",
        "216.6.228.42",
        "173.208.32.66",
        "173.208.32.74",
        "208.43.175.43",
        "70.32.34.90",
        "108.62.62.18",
        "173.208.33.66",
        "23.19.35.2"
]

fastest_ip = ""
lowest_ping = 100
for ip in ips:
        # Send a single ping; pass -c and its value as separate arguments
        p = Popen(['/usr/bin/ping', '-c', '1', ip], stdout=PIPE)
        time = str(p.stdout.read())
        m = re.search("time=([0-9.]+) ms", time)
        if m:
                ms = float(m.group(1))
                if ms < lowest_ping:
                        lowest_ping = ms
                        fastest_ip = ip
                #print("%s is alive.  round trip time: %f ms" % (ip, ms))

#print("Fastest ip is %s at %s" % (fastest_ip, lowest_ping))
print(fastest_ip)

/opt/update_vpn_to_fastest_ip.sh:

#!/bin/bash
ipaddy=`/opt/fastest_ip.py`

echo "Updating VPN to $ipaddy..."
sed -i -e "s/^pty.*/pty \"pptp $ipaddy --nolaunchpppd\"/g" /etc/ppp/peers/vpn

/opt/update_squid_outgoing_ip_to_interface.sh:

#!/bin/bash
case "$1" in

ppp0)
        ipaddy=`ip addr | grep ppp0 | grep inet | cut -d' ' -f6`
;;

eth0)
        ipaddy=`ip addr | grep eth0 | grep inet | cut -d' ' -f6 | sed 's/\/24//g'`
;;

esac

echo "Updating squid to $ipaddy..."
sed -i -e "s/^tcp_outgoing_address.*/tcp_outgoing_address $ipaddy/g" /etc/squid/squid.conf

All in all this works rather well for me. I have occasional issues with ppp0 dropping out. I’m not sure if this is my problem or theirs, but I just log in, run systemctl restart vpn, and I’m off to the races again. I’ve considered setting up a cron job to do this for me every hour or so, but it’s not been that much of a problem.
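
If I ever do set that up, it wouldn’t need to be anything more than a single root crontab entry along these lines (purely hypothetical; I’m not running this today):

# Restart the VPN unit at the top of every hour
0 * * * * /usr/bin/systemctl restart vpn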

19 Jun 2013, 10:14

Installing Squid, Sick-Beard, Deluge, and an always-on VPN (IPVanish) on Archlinux for an automated seed box

I recently signed up for VPN service through IP Vanish (well, several providers, but that’s the one that stuck). While I like their client software, I was mildly annoyed with having to start and stop the thing when I wanted to run traffic through it, and with having it run ALL my traffic when that’s not necessarily what I wanted.

My solution was to spin up an Archlinux Hyper-V virtual machine on Windows Server 2012, and configure it to be a Squid caching proxy and VPN. Then I just pointed the applications that I wanted at it and let it proxy my traffic through the VPN. I went one step further by abandoning uTorrent and installing Deluge and BrickyBox’s Sick-Beard clone for torrent management, saving data to my Drobo-FS.

Note: I have removed all of the comments from these configuration files, since most of them were in the default files to begin with (so you can still read them there if you want) and they aren’t relevant to the configurations themselves. I encourage you to understand what these files are actually doing, rather than just pasting them into your configs.

Configuring PPP for the IPVanish VPN

Dependencies

pacman -S pptpd

Configuration

/etc/ppp/chap-secrets:

# Secrets for authentication using CHAP
YOUR_USER_NAME	SERVER_ALIAS	PASSWORD    BIND_IPS

Obviously, replace YOUR_USER_NAME, SERVER_ALIAS, and PASSWORD with your specific information. For BIND_IPS, I used an asterisk to bind to all IP addresses. You can be more specific here if you’d like.

/etc/ppp/peers/ipvanish:

persist
maxfail 0
pty "pptp sea-a01.ipvanish.com --nolaunchpppd"
name YOUR_USER_NAME
remotename SERVER_ALIAS
require-mppe-128
file /etc/ppp/options.pptp
ipparam SERVER_ALIAS
updetach

Again, change YOUR_USER_NAME to reflect your IP Vanish username, make sure that SERVER_ALIAS matches what you put in chap-secrets, and use the server that you want to connect to for the pty parameter.

/etc/ppp/options.pptp:

lock
noauth
nobsdcomp
nodeflate

Enable traffic routing

Now that we have a functioning VPN, we want to route all of our traffic through it. Be sure to chmod +x both of these.
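
That is:

chmod +x /etc/ppp/ip-up.d/10-start-all-to-tunnel-routing.sh
chmod +x /etc/ppp/ip-down.d/80-stop-all-to-tunnel-routing.sh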

/etc/ppp/ip-up.d/10-start-all-to-tunnel-routing.sh:

#!/bin/sh
# Route everything through the VPN tunnel once it comes up
PRIMARY=eth0
SERVER=$5
GATEWAY="192.168.1.1"
CONNECTION=$6
if [ "${CONNECTION}" = "" ]; then CONNECTION=${PPP_IPPARAM}; fi
TUNNEL=$1
if [ "${TUNNEL}" = "" ]; then TUNNEL=${PPP_IFACE}; fi
if [ "${CONNECTION}" = "ipvanish" ] ; then
 # Keep a host route to the VPN server via the physical interface,
 # then point the default route at the tunnel
 ip route del ${SERVER} dev ${TUNNEL}
 if [ "${GATEWAY}" = "" ] ; then
   ip route add ${SERVER} dev ${PRIMARY}
 else
   ip route add ${SERVER} via ${GATEWAY} dev ${PRIMARY}
 fi
 ip route del default dev ${PRIMARY}
 ip route add default dev ${TUNNEL}
fi

/etc/ppp/ip-down.d/80-stop-all-to-tunnel-routing.sh:

#!/bin/sh
# Restore the original default route when the VPN goes down
PRIMARY=eth0
SERVER=$5
GATEWAY="192.168.1.1"
CONNECTION=$6
if [ "${CONNECTION}" = "" ]; then CONNECTION=${PPP_IPPARAM}; fi
TUNNEL=$1
if [ "${TUNNEL}" = "" ]; then TUNNEL=${PPP_IFACE}; fi
if [ "${CONNECTION}" = "ipvanish" ] ; then
 # direct packets back to the original interface
 ip route del default dev ${TUNNEL}
 ip route del ${SERVER} dev ${PRIMARY}
 if [ "${GATEWAY}" = "" ] ; then
   ip route add default dev ${PRIMARY}
 else
   ip route add default via ${GATEWAY} dev ${PRIMARY}
 fi
fi

Creating a custom systemctl unit

To help facilitate automation, I created a custom systemctl unit for the VPN so I wouldn’t have to manually start and stop it all the time.

/usr/lib/systemd/system/ipvanish.service:

[Unit]
Description=IPVanish Proxy
After=network.target

[Service]
Type=oneshot
RemainAfterExit=yes
PIDFile=/run/ipvanish.pid
ExecStart=/usr/bin/pon ipvanish
ExecStop=/usr/bin/poff ipvanish

[Install]
WantedBy=multi-user.target

After you create the unit, you can start and stop the proxy with systemctl start ipvanish and systemctl stop ipvanish respectively. You can also make it start at boot with systemctl enable ipvanish.

Installing Squid

Dependencies

pacman -S squid

Configuration

The Squid package will automatically create a proxy user for you, as well as the necessary systemd units. The only changes that are necessary are in the /etc/squid/squid.conf file. A lot of those changes are going to be predicated on your caching needs. I’m not going to go into too much detail here, and will just show the two lines that you need in your Squid config to make this work. The rest of the stuff for actually storing objects, ACLs, and the like, I’ll leave as an exercise to the reader.

/etc/squid/squid.conf:

tcp_outgoing_address 172.20.0.3 # This is the IP of your VPN
http_port 192.168.1.126:3128 # This is the IP of your machine

Mounting network shares

Dependencies

pacman -S smbclient autofs

I am mounting my network shares with AutoFS so they’ll come up as soon as someone (deluged) tries to use them. This will mount the one share specified in my auto.media file as /mnt/MOUNT_NAME. Be sure to change NAS_IP, NAS_PATH, and MOUNT_NAME to reflect your setup. In the credentials file, set the USERNAME and PASSWORD for the user that you’ll be connecting to the NAS as. The dir_mode and file_mode directives are the permission modes for the mount points. Mine are set to 777 so that everybody has write access to them, specifically the deluged user.

/etc/autofs/auto.master:

/mnt /etc/autofs/auto.media

/etc/autofs/auto.media:

MOUNT_NAME -fstype=cifs,file_mode=0777,dir_mode=0777,credentials=/etc/samba/credentials,workgroup=WORKGROUP ://NAS_IP/NAS_PATH

/etc/samba/credentials:

username=USERNAME
password=PASSWORD

Installing Deluge

Dependencies

pacman -S deluge python-mako

This was incredibly trivial. Just install the packages from the Arch repository. I did some light configuration through the web-ui to point deluged at my mounted NAS shares, and I was off to the races.
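
The only other piece is making sure the daemons come up on boot. A minimal sketch, assuming the Arch package ships systemd units named deluged and deluge-web (worth confirming with pacman -Ql deluge before relying on those names):

# Run the torrent daemon and its web UI now and on every boot
systemctl enable deluged
systemctl start deluged
systemctl enable deluge-web
systemctl start deluge-web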