Azure & Chill

on-prem = legacy

Azure Site-to-Site on Unifi Security Gateway

It can be done. A little painful but not bad.

Recently decided to upgrade from my Netgear SRX5308 here at home to a shiny new Unifi Security Gateway (v3). Quieter, much less power consumption, but most importantly, higher WAN-to-LAN throughput than the Netgear, which is a requirement now that AT&T and Google have brought 1Gbps+ fiber to Charlotte.

I’ve used the Unifi APs now for years and loved them, the AC access points are particularly good. Blazing fast and great coverage throughout the house with only two of them. Dozens of 2.4GHz networks nearby, but 5GHz is pretty much completely empty. I was on the lookout for more Unifi gear and the USG caught my eye. What seems like a repackaged EdgeMAX Router has one big difference – instead of an onboard web server and config UI, all the config is done by the central Unifi Controller. I like this, since I’m already using it for my APs, and figured more integration would be neat.

I was wrong. To say the least, this was a huge PITA. The UI has no support for site-to-site VPN configuration (or a host of other features), which means all of that config happens at the command line. I’m ok with that, although it can be a bit painful. What’s really bad is that there’s a convoluted configuration-persistence process you have to go through; otherwise, making changes in the UI wipes out all of your command-line configuration. Read that again. This is an overwrite, not a merge. Pretty unbelievable.

I spent the other night getting the tunnel up and running. This wasn’t that bad, really, just what you’d expect. Phase1/Phase2 config, some NAT config, etc. Tunnel connected – all was good outbound (house –> Azure), but not inbound. Spent a couple of days fighting it and finally it dawned on me I was missing ACLs. Anyway – here’s the CLI config for the S2S tunnel:

set vpn ipsec esp-group azure-esp
set vpn ipsec esp-group azure-esp lifetime 3600
set vpn ipsec esp-group azure-esp pfs disable
set vpn ipsec esp-group azure-esp mode tunnel
set vpn ipsec esp-group azure-esp proposal 1
set vpn ipsec esp-group azure-esp proposal 1 encryption aes256
set vpn ipsec esp-group azure-esp proposal 1 hash sha1
set vpn ipsec esp-group azure-esp compression disable

set vpn ipsec ike-group azure-ike
set vpn ipsec ike-group azure-ike lifetime 28800
set vpn ipsec ike-group azure-ike proposal 1
set vpn ipsec ike-group azure-ike proposal 1 dh-group 2
set vpn ipsec ike-group azure-ike proposal 1 encryption aes256
set vpn ipsec ike-group azure-ike proposal 1 hash sha1

set vpn ipsec ipsec-interfaces interface eth0
set vpn ipsec logging log-modes all
set vpn ipsec nat-traversal enable

set vpn ipsec site-to-site peer <Azure Gateway IP>
set vpn ipsec site-to-site peer <Azure Gateway IP> local-ip <Local Public IP>
set vpn ipsec site-to-site peer <Azure Gateway IP> authentication mode pre-shared-secret
set vpn ipsec site-to-site peer <Azure Gateway IP> authentication pre-shared-secret <Azure Key>
set vpn ipsec site-to-site peer <Azure Gateway IP> connection-type initiate
set vpn ipsec site-to-site peer <Azure Gateway IP> default-esp-group azure-esp
set vpn ipsec site-to-site peer <Azure Gateway IP> ike-group azure-ike

set vpn ipsec site-to-site peer <Azure Gateway IP> tunnel 1
set vpn ipsec site-to-site peer <Azure Gateway IP> tunnel 1 esp-group azure-esp
set vpn ipsec site-to-site peer <Azure Gateway IP> tunnel 1 local subnet <Local Subnet>
set vpn ipsec site-to-site peer <Azure Gateway IP> tunnel 1 remote subnet <Azure Subnet>
set vpn ipsec site-to-site peer <Azure Gateway IP> tunnel 1 allow-nat-networks disable
set vpn ipsec site-to-site peer <Azure Gateway IP> tunnel 1 allow-public-networks disable

set service nat rule 5000 description 'NAT to Azure'
set service nat rule 5000 destination address <Azure Subnet>
set service nat rule 5000 exclude
set service nat rule 5000 log disable
set service nat rule 5000 outbound-interface eth0
set service nat rule 5000 source address <Local Subnet>
set service nat rule 5000 type masquerade

#these were my existing NAT rules. You should probably inspect your own config before just mindlessly blasting this out
set service nat rule 5001 description 'MASQ corporate_network to WAN'
set service nat rule 5001 log disable
set service nat rule 5001 outbound-interface eth0
set service nat rule 5001 protocol all
set service nat rule 5001 source group network-group corporate_network
set service nat rule 5001 type masquerade

set service nat rule 5002 description 'MASQ voip_network to WAN'
set service nat rule 5002 log disable
set service nat rule 5002 outbound-interface eth0
set service nat rule 5002 protocol all
set service nat rule 5002 source group network-group voip_network
set service nat rule 5002 type masquerade

set service nat rule 5003 description 'MASQ remote_user_vpn_network to WAN'
set service nat rule 5003 log disable
set service nat rule 5003 outbound-interface eth0
set service nat rule 5003 protocol all
set service nat rule 5003 source group network-group remote_user_vpn_network
set service nat rule 5003 type masquerade

set service nat rule 5004 description 'MASQ guest_network to WAN'
set service nat rule 5004 log disable
set service nat rule 5004 outbound-interface eth0
set service nat rule 5004 protocol all
set service nat rule 5004 source group network-group guest_network
set service nat rule 5004 type masquerade

#this is the new rule
set firewall name WAN_IN rule 2 action accept
set firewall name WAN_IN rule 2 description "azure-networks in"
set firewall name WAN_IN rule 2 log disable
set firewall name WAN_IN rule 2 protocol all
set firewall name WAN_IN rule 2 source group network-group <Azure network-group>
# OR - I'm using a network-group containing the Azure subnets; if you're not, just use the subnet directly
set firewall name WAN_IN rule 2 source address <Azure Subnet>
delete firewall name WAN_IN rule 2 state

#this was previously rule 2 - your existing ones may be different so plan accordingly
set firewall name WAN_IN rule 3 action drop
set firewall name WAN_IN rule 3 description "drop invalid state"
set firewall name WAN_IN rule 3 state established disable
set firewall name WAN_IN rule 3 state invalid enable
set firewall name WAN_IN rule 3 state new disable
set firewall name WAN_IN rule 3 state related disable

 

Adding your valuable partner as Azure Digital Partner of Record

The Partner of Record program allows Microsoft partners to get a cut of customers’ spend in various online services. Office 365 and Azure are two of the big services offering partner programs like this, but until recently, qualification for credit was handled differently on each platform. Azure required a subscription ID, the GUID that uniquely identifies the subscription. Office 365 had a richer experience directly in the Office 365 Admin Portal, which was customer-initiated; i.e., customers declared their partner rather than partners asking for subscription IDs.

This experience has been pulled over into the Azure account portal now as well. Here’s a quick walkthrough to get it set up.

***Update: If you’re on an EA, you’ll need to sign into the management portal first – click your name, then ‘View my bill.’***


If you’re on a normal, non-EA account, start at https://account.windowsazure.com – you’ll need to sign in with a Microsoft Account (formerly Live ID). Once you’ve signed in, click Subscriptions in the top menu.

 


Sign in, then choose Subscriptions.

If you don’t see any accounts, you may not be the account administrator – account administrator is a separate permission from service administrator or co-administrator. You can find the account administrator by going to Settings in the management portal – the account administrator should be listed for each subscription you have access to. This is the account that has access to the Account portal.


 

This should bring up a list of your Azure accounts – choose the one that your partner has assisted you with and check out the menu on the left. Look for Partner Information – this is where you’ll put in your partner’s ID.


Partner Information will bring up a dialog, asking for your Partner’s ID. If you don’t know it, your partner will be happy to give it to you. Verify the name matches, click the check and you’re done. Repeat for each subscription your partner has assisted you with.


 

 

Windows 10 IoT – Device Provisioning + App Deployment

Windows 10 IoT – the promise of a universal app that can run, quite literally, anywhere. This includes tiny, cheap computers like the Raspberry Pi and Minnowboard MAX.

Raspberry Pi 2


It’s quite neat – $35 gets you, effectively, a tiny PC. Perfect for running a media center, a web server, arcade machines, all sorts of fun stuff. The availability of a specialized Windows 10 build opened the doors to the massive base of .net developers who, until now, had to struggle with Mono on Linux or learn a moon language like Java. On July 29th, an ‘RTM’ build dropped for Windows 10 IoT – plus Microsoft has been spending a lot of time hyping up the IoT Suite, which is basically a collection of existing products (Stream Analytics, Event Hubs, etc) that are getting a bundled offer this fall. Also coming this fall is Cardinal’s annual Innovation Summit – a place where we can meet with customers new and old to talk about cool stuff we’re building and how it fits into our customers’ daily lives. This got me thinking – what can we do to showcase the power of Windows 10 IoT + Azure IoT?

(Shameless plug – I’ll be talking about this and showing some cool demos 10/8 in Charlotte and 10/14 in Columbus, OH – follow the links to get registered – it’s free – and come check it out)

The Plan

I managed to get my hands on about a dozen Raspberry Pis – for the Summit talk, that would be plenty. I won’t get into details about what I’m building in this post – you could always, you know, come to the Summit and see it in action – but I will say it involves bluetooth and nearly the entire Azure IoT suite…but for this post, I’ll talk mostly about my biggest pain points – provisioning and app deployment.

Pi explosion

Provisioning

What’s the first thing we need to do? Get these things installed + provisioned. Installation is relatively simple, albeit time consuming and manual. It involves imaging each SD card individually, which usually takes about 10 minutes. We had a couple laptops going and managed to get them provisioned pretty quickly. Once they’re installed, they’re all ‘minwinpc,’ which isn’t terribly helpful for discerning one from another. Not to mention the hard-coded, out-of-the-box admin password ‘pass@word1’ isn’t terribly secure. We need to change the names + update the passwords, preferably in a scriptable way. We’ll start there.

I’m not going to get into the gory details of setting up remote PowerShell access; you can read all about that here. We’re just going to extrapolate that out a bit. Take a look:
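
Something along these lines (a rough sketch – the IP, device name and password are placeholders, and it assumes remote PowerShell and TrustedHosts are already set up per that link):

# Rename one device and change the built-in Administrator password over remote PowerShell.
$deviceIp = '192.168.1.50'                           # placeholder - your device's IP
$cred = Get-Credential "$deviceIp\Administrator"     # out-of-the-box admin credentials

Invoke-Command -ComputerName $deviceIp -Credential $cred -ScriptBlock {
    param($newName, $newPassword)
    # Rename the device with the built-in IoT utility (takes effect after reboot)
    & setcomputername $newName
    # Change the built-in Administrator password
    net user Administrator $newPassword
    # Reboot so the new name sticks
    shutdown /r /t 5
} -ArgumentList 'summit-pi-01', 'S0mething-Better!'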

Pretty straightforward. Not a lot happening there – you could wrap this in a foreach loop and call it for each device you’ve got on your network.

App Deployment

We’ve written this nice Windows 10 Universal app and it’s time to deploy! Now what? …This is a long one. And irritating. Every official bit of guidance says (usually with too much excitement), ‘Deploy right from Visual Studio!’ – which is *great* for debugging…and completely horrible for deploying to more than one device. This is the internet of *things*, Microsoft. Not the internet of *thing* – but let’s look at that official guidance:

  • Open project properties
  • Choose Build
  • Change to Remote Machine
  • Type in name of remote machine, or grab the mouse (*ew*) and choose from the list
  • Right-click project, Deploy…
    • Deploy dependencies
  • Deploy appx

     

    Which is great if you paid by the click, I suppose. For the rest of us it’s miserable. That’s just *one* app deployment to *one* device. And it requires Visual Studio! Tell your deployment teams they all need VS licenses for deployment…I’ll wait… Didn’t go so well, eh?

    Web-based Deployment

    Windows 10 IoT comes with a simple web portal (http://<your-pi>:8080/) for info + tasks – task/process monitor, event logs, etc. Plus app deployment. This should be easy, right? No. Apparently there’s some wonkiness with the order of operations + existing dependencies here – I could get this to work exactly zero times. But we’ll be back…


    AppX Manager from WebB

     

    Modern Windows App Deployment, aka, WinAppDeployCmd

    WinAppDeployCmd – this sounds perfect, right? Should do *exactly* what I want, which is to Deploy a WinApp from a Command prompt. Perfect! No. This doesn’t work. You can’t get a PIN like you can from a phone to allow remote sideloading. Don’t even waste your time with this.

    Modern Windows App Test Deploy, aka Install-AppDevPackage.ps1

    If you’ve done Windows 8+ modern app dev, this should be familiar. I know, we’ll just copy the bits, including the ps1, and deploy from that using remote powershell. What could go wrong? Everything. What does this script do, anyway? It’s pretty simple, really:

    • Checks for a developer certificate (which isn’t required in Windows 10) for ‘Developer Mode’
    • Installs the cert, if necessary
    • Attempts to install the app package + signing cert

     

     

    RPis use ARM chips – so tools like certutil don’t exist. Know what uses certutil a lot? Install-AppDevPackage.ps1. I yanked every bit of reference to dev certificates out, seeing as there’s just a quick registry change for ‘Developer Mode’ in Windows 10. It all eventually died at Add-AppPackage – RPC endpoint mapper was out of endpoints (uh, endpoint mapper, isn’t this your only job?). I figure it may be related to permissions more than anything – apps are deployed + run under an account called DefaultAccount – not Administrator or whatever user you’ve connected as.
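
    For reference, that ‘Developer Mode’ registry change is just a couple of values under AppModelUnlock – roughly this, run from a remote session on the device (standard Windows 10 value names; treat it as a sketch):

    # 'Developer Mode' without the dev-license ceremony - standard Windows 10 registry values.
    $key = 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\AppModelUnlock'
    if (-not (Test-Path $key)) { New-Item -Path $key -Force | Out-Null }
    Set-ItemProperty -Path $key -Name AllowDevelopmentWithoutDevLicense -Value 1 -Type DWord
    Set-ItemProperty -Path $key -Name AllowAllTrustedApps -Value 1 -Type DWord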

    PSA: don’t change the password of DefaultAccount. This should go without saying, but there’s nothing to stop you and it’ll require a reimage. It was late when I tried this.

    Maybe someone else can make this work, but I eventually abandoned it, since I had to reimage my device.

    devenv.exe /deploy /target:ARM

    In a rare stroke of brilliance, I thought – I know, I’ll just do whatever Visual Studio is doing, just over and over again, through the command line! This, also, did not work. Why, you ask? Because the ‘deploy’ option through Visual Studio uses some bastardization of the remote debugger (msvsmon.exe), with a private, undocumented service that it pumps data through. Visual Studio’s response trying to deploy?

    Remote debugging is not available from the command line.

    Yes. Evil.

    TailoredDeploy.exe

    Tracing the logs and deployment, I had a couple more leads (note – MSBuild ‘diagnostic’ verbosity is no joke). Notably, TailoredDeploy.exe. I burned a few hours trying to figure this out, to no avail. The apparently useless endpoint mapper still couldn’t do its job, leading me to believe all of these higher-level tools really call the same methods underneath.

    Revisiting Web Deployment

    At this point, I had spent one too many late nights beating my face on the keyboard, and likely 10x the amount of time it would take to have just deployed manually. In a truly last-ditch effort, I ventured into the MSDN Forums. Here, I was pushed in the direction of a REST API that existed and was hosted by WebB (the little web server) on the Pi. After waking up from the stunning surprise of someone actually contributing a useful answer in the MSDN Forums, I got to fiddling and finally had something real to chase. It also unlocked some insight into how apps are deployed onto these little devices. It wasn’t without its own frustrations, however.

    /RestDocumentation.htm

    So innocuous. So simple. So obvious. Surely this little web server used something to perform its actions – I couldn’t believe I hadn’t thought of this before. Head over to your web browser and hit /RestDocumentation.htm off your Pi – you’ll find a list of interesting things you can do via HTTP with your Pi. Notably, App Deployment. Following these ‘docs’ to the T (there’s not much there), I was still getting 400s and 500s, although it wasn’t immediately obvious why. I decided that rather than trying this the ‘proper’ way, it was time to pull out the stops and just fiddle the hell out of it and figure out what the web interface was doing. Surely they didn’t write this logic twice…? ‘But you said it didn’t work above,’ you might be saying. And this would be true, it never did work from the web interface, although I never spent a whole lot of time trying to figure out why. Through a bit of trial and error, I figured out that the dependencies didn’t seem to be needed; in fact, including them caused it to blow up. Just uploading the package + certificate seemed to work fine, even on a freshly reimaged device. Fiddler showed me a few interesting things in the process:


    Twiddlin’ bits in Fiddler…or would it be Fiddlin’?

    • Post the form data (e.g., the packages)
    • Post dependencies
    • Post certificate
    • Commit deployment

     

     

    There were a few other things I wanted to do too – like uninstall the app before it gets reinstalled (to start with a fresh slate, mostly because I was getting a lot of ‘file in use’ errors) and set my app to start at boot, as would be typical for a production deployment of an IoT app. Take a close look at the above – you’ll see a lot of paths starting with /api/appx – but a few are /api/iot/appx – there’s a difference here. One is the seemingly ‘private’ API used by the web interface. There is no documentation I could find about these, but I definitely needed them, especially for actions like setting default boot apps. I would guess there’s some sort of privilege boundary here, but I can’t say for sure. Fiddler trace in hand, I got to replicating this in Powershell. You may laugh, as I would at someone who says that, but I figured I need to get my PS chops better…and PS is really just C# with a syntax from the moon, so what’s the big deal? How hard could it be?

    Take 1

    I got to work, plunking out some HTTP requests with PowerShell. This went well, until I started uploading the packages. For whatever reason, it would 500 immediately, with no logging to indicate the problem. When I captured the request and replayed it in Fiddler, it would fail – but when I rebuilt the same request in the Composer and issued it, suddenly it would work. I went back and forth for hours, comparing requests, finding nothing seemingly different between the two with the exception of the multi-part form boundary ID. I even got to the point of reflashing one of the devices to make sure I didn’t have any sort of previous-deployment hangover that was preventing the upload. Still nothing. At this point I was grasping at straws. Tired, frustrated. Ready to launch the Pi from the nearest potato gun. On further inspection of the two requests, I found one thing that was different between them – the one on the left returns 200, the one on the right returns 500. See if you can find it:


    Can you find it (click for full-size)?

     

    Let’s talk about RFC 2616 19.2 – multipart

    Specifically, let’s talk about multipart boundary identifiers. From the spec:

    Although RFC 2046 [40] permits the boundary string to be quoted, some existing implementations handle a quoted boundary string incorrectly.

    I think we’ve found one of those implementations. But there is a larger issue here – .net’s inconsistency with how it applies these headers. Why would the boundary be in quotes in the declaration but naked in the usage? I understand that per spec, web servers should accept either as the same, but it’s just opening the door for trouble when the standard implementation rolls the dice on every request. What else have we learned? The little WebB.exe web server in Windows 10 IoT doesn’t implement the spec properly either – it should only be looking for a CRLF and two dashes for the boundary ID, not checking explicitly. Let’s get to the code. You’ll see an explicit drop/re-add of the Content-Type header with my boundary ID so it matches; it’s really about the best workaround I could find.

    Multi-Deployment PowerShell
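
    As a rough sketch (the endpoint path, port and credentials are assumptions – use whatever your own Fiddler trace shows), the important part is building the multipart body by hand so the boundary in the Content-Type header is unquoted and matches the body exactly:

    # Sketch: upload an .appx + its signing .cer to the Pi as multipart/form-data,
    # forcing an UNQUOTED boundary in the Content-Type header (the workaround above).
    $deviceIp   = '192.168.1.50'
    $cred       = Get-Credential 'Administrator'
    $appxPath   = 'C:\deploy\MyIoTApp_1.0.0.0_ARM.appx'
    $certPath   = 'C:\deploy\MyIoTApp_1.0.0.0_ARM.cer'
    $boundary   = [guid]::NewGuid().ToString()
    $installUri = "http://$($deviceIp):8080/api/appx/packagemanager/package"   # assumption - take this from your own Fiddler trace

    $crlf = "`r`n"
    $enc  = [System.Text.Encoding]::GetEncoding('ISO-8859-1')
    $sb   = New-Object System.Text.StringBuilder
    foreach ($path in @($appxPath, $certPath)) {
        $name = [IO.Path]::GetFileName($path)
        [void]$sb.Append("--$boundary$crlf")
        [void]$sb.Append("Content-Disposition: form-data; name=`"$name`"; filename=`"$name`"$crlf")
        [void]$sb.Append("Content-Type: application/octet-stream$crlf$crlf")
        [void]$sb.Append($enc.GetString([IO.File]::ReadAllBytes($path)))
        [void]$sb.Append($crlf)
    }
    [void]$sb.Append("--$boundary--$crlf")

    # Note: the boundary is NOT quoted here - that's the whole workaround.
    Invoke-WebRequest -Uri $installUri -Method Post -Credential $cred `
        -ContentType "multipart/form-data; boundary=$boundary" `
        -Body $enc.GetBytes($sb.ToString())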

    What have we learned?


    This is not some mundane detail, Michael!

    Updating ADFS 3 for WIA on Windows 10

    **Updated 7/30/15**

    Here’s the latest that’s working with IE 11 on Windows 10 RTM/10240:

    Set-AdfsProperties -WIASupportedUserAgents @("MSIE 6.0", "MSIE 7.0; Windows NT", "MSIE 8.0", "MSIE 9.0", "MSIE 10.0; Windows NT 6", "Windows NT 6.4; Trident/7.0", "Windows NT 6.4; Win64; x64; Trident/7.0", "Windows NT 6.4; WOW64; Trident/7.0", "Windows NT 6.3; Trident/7.0", "Windows NT 6.3; Win64; x64; Trident/7.0", "Windows NT 6.3; WOW64; Trident/7.0", "Windows NT 6.2; Trident/7.0", "Windows NT 6.2; Win64; x64; Trident/7.0", "Windows NT 6.2; WOW64; Trident/7.0", "Windows NT 6.1; Trident/7.0", "Windows NT 6.1; Win64; x64; Trident/7.0", "Windows NT 6.1; WOW64; Trident/7.0", "MSIPC", "Windows Rights Management Client", "Windows NT 10.0; WOW64; Trident/7.0; rv:11.0")

    If you’re using the Windows Technical Preview, you may notice that ADFS presents you with a Forms login instead of using WIA from IE on a domain machine. This little chunk of powershell includes most of the major browsers that support WIA – you can plunk this into your ADFS server and get it going:

    Set-AdfsProperties -WIASupportedUserAgents @("MSIE 6.0", "MSIE 7.0; Windows NT", "MSIE 8.0", "MSIE 9.0", "MSIE 10.0; Windows NT 6", "Windows NT 6.4; Trident/7.0", "Windows NT 6.4; Win64; x64; Trident/7.0", "Windows NT 6.4; WOW64; Trident/7.0", "Windows NT 6.3; Trident/7.0", "Windows NT 6.3; Win64; x64; Trident/7.0", "Windows NT 6.3; WOW64; Trident/7.0", "Windows NT 6.2; Trident/7.0", "Windows NT 6.2; Win64; x64; Trident/7.0", "Windows NT 6.2; WOW64; Trident/7.0", "Windows NT 6.1; Trident/7.0", "Windows NT 6.1; Win64; x64; Trident/7.0", "Windows NT 6.1; WOW64; Trident/7.0", "MSIPC", "Windows Rights Management Client")

    Why?

    In version 3, ADFS tries to intelligently present a user experience that’s appropriate for the device. Browsers that support WIA (like IE) provide silent sign on, while others (like Chrome, Firefox, mobile browsers, etc) are presented with a much more attractive and user friendly forms-based login. This is all automatically handled now, unlike before where users with non-WIA devices were prompted with an ugly and potentially dangerous basic 401 authentication box (if they were prompted at all).

    This means you can now design a login page for non-WIA devices that might include your logo, some disclaimers or legal text.


    iOS 8x and ADFS 3

    Economics of VSO Build vs. Agents

    I’m looking into build agents for VSO for a client this week. If you haven’t noticed in your VSO tenant, the build.vNext system is now available in most of them. In fact, it’s not even called build.preview anymore, even though I’m pretty sure it’s still preview. It’s much better than before, now with tasks that are more straightforward and easier to use. No more VS requirement for designing builds, you can do it all in the browser. There’s a lot more, but that’s not really the point of this post.

    I found this really by accident – our client needs to use on-premises services that the VSO hosted build controllers won’t have access to, since they’re internal only. This example is a simple ChromeWebDriver for Selenium – it requires Chrome to be installed on the host where Selenium is running, or at a bare minimum, the ChromeWebDriver. They’ve got this on shared hosting internally, but rather than burn time trying to find a standalone chrome executable (let me help you – doesn’t look good), I decided we should do some build agents on an Azure VNet, VPNed back into the private network.

    Thinking this was less-than-desirable, since we now have another VM to manage, I started to look at other fringe benefits – namely, cost. What I found will shock you (thanks, Buzzfeed) – much to my surprise, the economics are significantly different when you build your own agents. But first, let’s look at the differences:

    VSO Hosted Build

    1800COLLECT

    The easiest, fastest way to get going with CI or CD builds, or even just builds in general with VSO is to use the hosted build controller. You get 60 minutes a month for ‘free’ – I say that because anyone who actually has to buy VSO recognizes that $20/head isn’t particularly cheap, especially when it’s a requirement for backlog access. MSDN subscribers get access for free, so if you’re not a full-blown project management shop and all your people have MSDN, this might not be a big deal. Beyond that, you can buy some more, at a rate of a nickel per minute up until 21 hours, and a penny a minute after that (why does this feel like an impromptu 1-800-COLLECT ad?). What do you get for your hard-earned nickel per minute?

    • 99.9% SLA for hosted build
    • Virtually zero setup or ongoing maintenance
    • Support for some third-party unit test frameworks
    • Visual Studio 2015 – 2010 preloaded on the environment

    However, there are some restrictions – being multi-tenant, this is no surprise:

    • no builds taking over 10 hours to run
    • no builds over 10GB
    • no admin privileges (this is obvious, and a bad practice anyway, but everyone’s got some legacy skeletons in their closets)
    • no interactive or user-logon dependencies

    So for most things, the hosted build controller is fine. Pre-configured, one-click CD build definitions for integrating with your Azure resources too.

    1-800-4DEVOPS

    This is all great, until you start thinking about modifying your deployment process to include some rapid releases. Let’s say you’re on a small team of four developers, checking in code at least once a day. Each of those check-ins triggers a CD build, and you’ve got one overnight for good measure. Let’s also say it’s a relatively simple project that takes around 5 minutes to build. You can get through about a day and a half with your included build minutes.

    “No problem,” you say, “I’ll just buy the minutes I need from VSO.” Let’s extrapolate that a bit – five builds per day @ 5 minutes/build. 25 minutes a day, five days a week = 125 minutes/week. Typical 22 day work-month, that’s around 550 build minutes/month. We’ll round to 600 for giggles.

    600 minutes, minus your 60 free minutes leaves you holding the bag on 540 build minutes. At $0.05/min, that comes to $27. What else does $27 buy you?

    • 182 hours of a Basic A2 VM (2 core, 3.5GB)
    • 91 hours of a Basic A3 VM (4 core, 7GB)
    • A pack of smokes in NYC
    • 77% of a Raspberry Pi 2

    182 hours is a quarter of the Azure-standardized 744 hour month. So you could host your own build agent, leave it running most of each week and unleash quite a bit more power to get builds done faster. Or, in our case, you could host it on a VPNed VNet and connect to on-premises resources. Or join it to a domain and run the build agent service as a local administrator. Or execute powershell that requires admin rights. Or…

    What would be even better is a way to trigger that machine to start-up via Azure SM/PowerShell only when a build is queued, but I haven’t thought that all the way through yet. Nailing that would add even more efficiency.
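
    The start/stop part itself is trivial with the Service Management cmdlets (the service and VM names below are made up) – the real work is watching the build queue and deciding when to call them:

    # Spin the agent up when builds are queued, and deallocate it when the queue drains.
    Start-AzureVM -ServiceName 'cd-build-agents' -Name 'buildagent01'
    # ...queued builds run...
    Stop-AzureVM -ServiceName 'cd-build-agents' -Name 'buildagent01' -Force   # stopped/deallocated = no compute charges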

    Now I know this doesn’t take into account things like the additional maintenance of yet-another-server, or the (minor) addition of storage + egress cost, or even the fact that a single VM in Azure doesn’t have the same 99.9% SLA. But even then, it’s a path worth exploring, especially if your builds are scheduled so you can automate that agent coming up/down, or if you fall outside of the prescribed requirements for using the hosted controller. With MSDN images that already have Visual Studio or TFS installed, and the fact that the build agent ‘installer’ is just a powershell script, those machines could be created from scratch with minimal effort. Resource Manager + PowerShell DSC + Startup Script might just be the ticket to a super-fast and easy build agent with only slightly more overhead than using the hosted controller.

    Availability vs. Consistency, through the eyes of a toddler.

    Fred.


    More cloud patterns in real life – this one, while a little silly, illustrates availability, geographic redundancy, rolling upgrades and consistency. Enjoy.

    Meet Fred.

    This is Fred. Fred is my son’s number one – confidant, partner, companion and sleeping buddy. The Hobbes to his Calvin, if you will.

    Fred was there when Patrick was born, goes with him nearly everywhere and is as much a part of Patrick’s daily life as his parents. He travels with Patrick, be it to the park or 800 miles away to see his cousins: where Patrick goes, Fred follows. He’s a requirement for sleep – in fact, if presented with Fred, Patrick’s thumb immediately meets his face, regardless of the time of day or current activity. Fred can’t be in sight during bath time or dinner or else Fred too will get a bath or a face full of beans, usually against Patrick’s mother’s wishes.

    But we’ve had some availability challenges with Fred – notably, forgetting him when we travel. At least twice we’ve had to order a Fred for a trip, after having realized at the airport that Fred was still in the car or left at the back door. Ordering is a snap – 90 seconds on Amazon and a fresh Fred is in a box and on the way, usually arriving before we do. This has left us with multiple Freds (I think we’re up to 3 now).

    99.95% SLA for Fred availability.

    Ever had to console an upset baby at 3am? It’s not much fun. The proximity of any available Fred is important, especially in the middle of the night.

    Three Freds floating around means I can have a much higher SLA on finding Fred for bed than when we only had one. Fred’s currently in scheduled maintenance in the washing machine? No problem, we’ve got another. Dropped him in the park? Covered in mud? In Dad’s car? All of these issues melt away.

    You could say that we’re at high availability with Fred now – with three different Freds, the chances of one being within reach go up dramatically, especially when there’s a primary with known replicas. Since the beginning of 2015, I think we’re at around 99.95% availability of Fred – just about 5 minutes per week on average of unavailability. With the addition of two extra Freds, that number will probably close out even higher by 2016.

    Your data is similar. Replicas of your data offer you significantly better availability than a single copy of the data. Imagine if you had to reboot the machine for scheduled maintenance or had a disk failure. In these cases, your data becomes unavailable. Ok if you’re a 20 month old, not ok if it’s business critical data in the middle of the day. Multiple copies means you can get service regardless of the underlying state – if a disk has failed in your RAID array, or a node is down for patching, the remaining copies keep your data available.

    Fred as a Service

    Fred has transcended being just another stuffed toy. He is the embodiment of security to our little one, just like Linus’ security blanket or whatever your toy of choice was when you were a baby. Because of this, it’s less about a single, distinct Fred and more about the idea of Fred. The physical Fred is really just a host of the Fred idea.

    Because we’ve got multiple Freds at our disposal, this makes things immediately easier. We can roll in Fred B while Fred A is getting some stitches upgraded or some emergency maintenance because of a too-available ketchup bottle. Once Fred has been repaired and the pieces of hot dog pulled out of his stuffing, he can return to regular service rotation, all with no loss of service. Do this over and over again, and eventually they’re all up-to-date while being available the whole time.

    This is all well and good locally, but doesn’t always work – especially when traveling. Fred (or, more accurately, Patrick’s parents) has had some challenges during travel. He’s been left behind – in the car, at home, etc at least twice during our last few trips. Fortunately, we can order another Fred and have him shipped in minutes. Because of this, there are Freds scattered about NC and eastern Michigan. This gives us a little geographic redundancy – if our house is unavailable, because we’re traveling, we’ve got another Fred available at our destination. This sort of geographic availability works as an additional layer of protection on top of our existing local availability. In fact, if we wanted to get really fancy, we could order multiple Freds for each distinct location. This isn’t always necessary, however – and your application may encounter the same question.

    At some point, ultra-redundancy and availability leave cost efficiency behind. Is it really a requirement that geographic locations where we may spend 2-4 weeks per year have a full-blown Fred deployment? Triple redundancy at all locations? Perhaps we may experience some downtime during those windows, but overall, our SLA isn’t damaged too badly. Not to mention we can get new instances of Fred on-demand within about 12-18 hours. When you’re designing your application’s redundancy and availability requirements, this is something to keep in mind. Cloud brings us a lot of cost efficiency, but like anything else, it can be abused and cost a fortune. Balancing availability vs. cost is a delicate feat that really only you, as the application owner or data owner, can decide.

    Consistency

    FaaS (Fred as a Service) has worked well for us over the last 12 months or so; however, as children grow older, *one* always becomes the ‘primary.’ The rest are just imposters. Similar, yes, but not *the* original Fred. OG Fred has some unique traits – he smells a little different, his fur is a little fuzzier, his neck a little more limp from all the stuffing being pushed elsewhere, his colors a little less vibrant – but he’s familiar. He’s the original.

    Because of this, we now have a new problem – OG Fred != Fred B or Fred C. In fact, Fred B and Fred C have been used so infrequently compared to OG Fred that they are in much better (and thus, much different) shape. This presents two problems:

    • Our Freds are now wildly inconsistent – operations to OG Fred haven’t been effectively replicated to Freds B and C.
    • The longer we ignore the problem, the more inconsistent they become.

    How do we combat this? We need some way to keep our Freds consistent so that *any* Fred can be OG Fred. There are a couple of ways to combat this. At a minimum, all Freds need to be in regular rotation. It might hurt a bit at first, but it’s about the only way. Or perhaps we speed up the process with the effort of a one-time migration and wear them in a bit – ‘routine maintenance’ with an extra few spin cycles, some loss of stuffing, etc. followed by a reintroduction back into a more stringent rotation.

    But a lot of this depends on the client; in this case, a near two year old who doesn’t understand that ‘replication takes time.’ Upset two-year-olds have just recently found their inner divas and as such, will be as picky as humanly possible and find even the most imperceptible of inconsistencies.

    But in some cases, you need this kind of high consistency. Data consistency comes second to data classification. What kind of data needs high consistency? Credit card/payment data comes to mind immediately – if I make an operation to take some money I had better damn well make sure that succeeds before anything else happens – customers with duplicate charges don’t stay customers for long.

    But not all data needs that – if I don’t see your Facebook post the absolute instant that you’ve written it, that’s OK. I’ll see it within a few seconds and there isn’t really any sort of side effect, so it’s totally ok to just keep that data available and not highly consistent. Performance and transaction rate are part of the consideration as well – waiting on multiple geographies to replicate data could take too long when a user is waiting, especially if the data’s consistency requirements don’t dictate that kind of wait. Perhaps wait for 1+ write and async the rest to durable storage.

    Other patterns produce similar requirements, with different outcomes – for example, Patrick’s newest trick is to take all three Freds to bed. This presents a problem: FaaS availability is now dependent upon all three being available, not the individual availability of each Fred instance. Now our SLA is almost certainly destined for the tank. Redundancy patterns should consider the level of required consistency when building out both replication and failover strategies. For example, perhaps synchronous writes to 3+ storage providers, while the remainder of writes happen asynchronously for failover only.

    A boy and his Fred


    Build, Ignite, New Stuff, Come to TriAUG 5/26

    What a wild few weeks – Build saw the official announcement of Service Fabric and some other hot Azure news and Ignite saw Azure for Enterprise/Azure Stack and loads of other announcements. Busily digesting everything new, but I’ll be at the Triangle Azure User Group May 26th talking about *something* new. Just not sure what yet. I’m between Service Fabric and Event Hubs –> Stream Analytics –> Power BI. Or it could be something completely different, who knows. Regardless, come out for some food, networking and my marginally coherent ramblings about some new thing.

    LOL – Late nights with Azure Search and Attributes for Index Metadata

    I’m working with a client right now on modernizing and simplifying their search to use the new Azure Search service. Sure, the examples online are fine, but I wanted to decorate my data classes with attributes dictating the indexing settings for each field. Needless to say, I ended up with something just shy of mad:

    ???

    This actually works. Its efficiency is up for debate, but it does work. I suspect I would have had just as much luck manipulating the JSON myself, but where’s the fun in that? Here’s what I started with – a simple Azure table entity POCO. Nothing too exciting, just a few fields of relevance:

    This is fine – my attributes are there so I can configure indexing on the object itself. This is all well and good, but I need to actually create an index to put documents into:

    This also worked well – my index schema gets created from the properties that are decorated with my attribute. The problem starts when I need to actually add the documents to the search indexer. Since I’m inheriting from TableEntity (in this case), additional properties are included (like PartitionKey, RowKey, etc). I need to only get the properties which have indexing metadata, since those match the schema of the index. Apparently, including additional properties in the document you’re submitting to the index causes the call to blow up – not just ignore the extra properties. I have to copy the relevant, decorated properties to a new object on the fly…and voila, you end up with the silliness that is above.

    Cloud Patterns are all around us

    I’ve been spending quite a bit of time in Tampa recently – most recently Cardinal’s first annual Innovation Summit for our Tampa service. I like flying to Tampa because the airport has been done just right – going from curbside to gate can routinely take only around 10-15 minutes, especially if you have precheck. Even this morning, when pretty busy, it was only about 10 minutes from curb to gate. Charlotte is always crazy and busy, but it’s generally efficient as well, especially with the volume of people going through there – but I think Tampa’s maximized their efficiency even more by putting more services closer to where and when they are needed or consumed (I think you see where this is going).

    This got me thinking as I plodded through the airport (top tip: airports aren’t a good place to break in new sandals): the application architectures that drive cloud efficiency are replicated in real life all around us.

    Let’s look at what typical major airports have to deal with:

    • Large numbers of inbound and outbound traffic, for a variety of different (but known) tasks
    • Many gates, capable of moving large numbers of people and planes in and out, spread out across…
    • …multiple smaller, distributed buildings (airsides/concourses), with a small number of gates per building

    Here’s a map of Tampa, which shows this


    Tampa International Airport – aviationexplorer.com

     

    Distributed Services + Pipes/Filters

    And the services that are offered in each area (note, my numbers have been pulled completely and totally out of thin air, I have no idea what volume TPA does):

    Main Terminal – ticketing, checkin, greeting spots, baggage claim, cars, cabs, etc. – 10,000 people/hour

    People Carrier – only ticketed passengers – 2,000 people/hour

    Airside – only ticketed passengers leaving from one of the serviced gates – 2,000 people/hour

    Concourse – only ticketed passengers, who passed security, who are leaving from a specific gate at some point today – 1,500 people/hour

    Flight – only ticketed passengers, who passed security, who are leaving from gate F84 at a specific time today – 126 people right now

    As you can imagine, by the time you get to your gate, you can say with relative confidence that everyone there is only leaving within a small time window from that gate. This is very similar to a pipes and filters pattern, where the same task is repeated over and over again, but the entire process is broken down into discrete steps, which, when executed, process data appropriately and route to the final destination.
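
    You can see the same shape in a PowerShell pipeline, where each stage is a filter that narrows the stream the way each checkpoint narrows the crowd (the sample data here is made up):

    # Each filter only sees what the previous stage let through.
    $people = @(
        @{ Name = 'Alice'; Ticketed = $true;  Airside = 'F'; Gate = 'F84' },
        @{ Name = 'Bob';   Ticketed = $true;  Airside = 'C'; Gate = 'C30' },
        @{ Name = 'Carol'; Ticketed = $false; Airside = $null; Gate = $null }
    )
    $people |
        Where-Object { $_.Ticketed } |             # people carrier: ticketed passengers only
        Where-Object { $_.Airside -eq 'F' } |      # airside F: passengers using its gates
        Where-Object { $_.Gate -eq 'F84' }         # gate F84: just this flight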

    Think TSA is slow today? Imagine if it processed every person through the airport, not just flying passengers. It would be a nightmare and horribly inefficient. By using multiple copies of the same service, closer to the consumer, you can distribute load across all of them, leading to significantly shorter wait times while still offering the same (or better) level of service.

    Not only have we now distributed a high-transaction process, we’ve pushed it closer to the consumers and put it in a more efficient place in the workflow.

    ID + Boarding Pass, Plz – Federated Identity

    Anyone who’s ever heard one of my identity sessions knows I generally use an ID + bouncer at a club scenario to describe a federated identity system at work in the real world. This is no different.

    In our airport case, upon arriving at the friendly TSA agent, I’m asked to produce some form of ID, plus a boarding pass. This combination of valid identity and time-sensitive token authenticates me to the agent, who then grants access to the terminal. The key here, however, is that the TSA agent has no idea who I am (well, at least I hope not). He (the relying party) relies (see how that works) on

    • an external, trusted third party, like the Secretary of State, in cases of a passport, or NCDMV for a driver’s license to vouch for my identity (e.g., an identity provider),
    • a known set of anti-forgery tools, like UV-sensitive watermarks, specific microprinting, encoded data, RFID, etc. to ensure the identity document is valid (e.g., a signature), and
    • some data points, like my picture, name, hair color, height, departing flight time, etc to validate I have access to what I’m requesting (e.g., attributes or claims; depending on the consumer, the requested access is generally to a resource)
    • In addition, my driver’s license is only valid in some situations, like ID validation within the US. This roughly translates to an audience, or which targets the ID document has been issued for. This is an important security consideration – if someone steals my driver’s license, they can use it for whatever to validate their (my) identity. This doesn’t work outside of the states, however, so it’s only useful to spoof identity within a specific region (e.g., the US), which is the only place that document is valid.

    Our modern, federated identity patterns follow a similar pattern – a trusted third party acts as the identity provider and validator, which is what prevents me from having to have a TSA-specific ID. Since TSA trusts NCDMV and SoS, that trust is extended to verifiable documents I possess.

    Your cloud app does the same thing – an exchange between the identity provider and the relying party establishes trust and a public key; incoming tokens are signed with the IdP’s private key, and the relying party verifies that signature with the public key.

     

    PreCheck? Preferred? SkyPriority? Priority Queuing

    Last one for now. Notice how there are multiple security lines at the airport? TSA PreCheck, First/Biz Class priority screening, Clear card, Crewmember? Here we have people who, one way or another, are of a higher priority to process faster. They could have gone through more rigorous background checks in the case of TSA Pre, or paid more for a flight – either way, each of these people get access to a shorter line for processing (ignore for now that the processing logic of each of these people is also different – imagine we’re all going to the same generic black hole of security screening). These are priority queues – people in those queues are processed first, while standard people either wait in a longer queue, or in some cases, the same queue, but priority people get preference as soon as they arrive. You see this pattern all over airports – check-in, security, gate.

    In some cases, priority people have entirely separate queues with a shared TSA agent. In this case, as soon as the priority passenger arrives, the shared TSA agent pauses processing the standard queue to process the priority passenger. Once the priority queue is empty, that agent starts working the standard queue again.

    In other cases, priority queues have dedicated processors reading those messages. This is closer to the check-in process, where dedicated check-in lines lead to dedicated check-in agents processing first/biz class passengers.

    Lastly, we have the gate – first class first, then biz, then the unwashed masses. This is a single queue that calls on specific classes of messages to enter a queue at a specific time. This pattern is the least common of the priority queuing patterns seen in software and is generally used for managing messaging to a legacy system or something which requires specific processing order. You’ll usually see some other processing logic elsewhere that places messages in the queue at a specific time to ensure priority.

    Anyway, your code can follow a lot of these patterns as well – for critical messages, perhaps you use a dedicated queue and dedicated processor. These messages are guaranteed to be processed nearly immediately. As soon as the dedicated processor is done with each priority message, it dips back to the queue for the next priority message.

    Similarly, for priority-but-not-critical messages, or for priority messages that are rare, shared processors can check the priority queue before doing any standard queue processing. This allows for priority message processing, but not critical, real-time message processing. For example, if your worker is working on a long-running standard message, your priority queue item may wait before being processed, but would be guaranteed to be next in line.
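
    In rough PowerShell terms, the shared-processor version boils down to something like this (Process-Message is a stand-in for your own handler):

    # Shared worker: always drain the priority queue before touching the standard queue.
    $priorityQueue = New-Object System.Collections.Queue
    $standardQueue = New-Object System.Collections.Queue

    while ($true) {
        if ($priorityQueue.Count -gt 0)     { $msg = $priorityQueue.Dequeue() }   # priority always wins
        elseif ($standardQueue.Count -gt 0) { $msg = $standardQueue.Dequeue() }
        else                                { Start-Sleep -Seconds 1; continue }  # nothing to do - poll again
        Process-Message $msg   # stand-in for your real handler
    }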


    Simple priority queue processing @ MSDN https://msdn.microsoft.com/en-us/library/dn589794.aspx

    Start Looking Around…

    …and you’ll see a lot more patterns. There are more we could dig into just in an airport and all kinds of other scenarios. These cloud patterns make sense because they’ve been proven across the world in all sorts of physical incarnations – why reinvent the wheel if you don’t need to?

    I hope this helps connect a few of the dots – as always, feel free to drop me a line or ask a question.

    Is the sky falling?

    Today was a neat day in the Azure space – Azure Websites has grown up and found itself. We’ve got new units of functionality that can build fully functional apps and workflows, interacting with different systems and moving data around (e.g., BizTalk), through a designer on a web page! Amazing. I came here tonight to dig in and share my thoughts on the new services, but I got sidetracked.

    After the announcement, I kept up with social networks, internal and external, and generally there’s a healthy level of excitement. I think once people get their hands dirty, we’ll see a lot more excitement – but what I also saw was sadly typical when these kinds of announcements are made:

    What makes me valuable is now available for anyone who can drag-and-drop on a webpage.

    And this assertion would be correct, except for one crucial detail – what makes us valuable as software developers, engineers, architects, code monkeys, etc is everything except physically typing out the code. If your only value is in the physical delivery of the code, then it may be too late for you anyway. Let’s back up though.

    Engineering + Heavy Clouds

    Look at systems engineering over the past 10 years or so. These poor souls have had all kinds of the-sky-is-falling moments. First it was virtualization, then the cloud. Then SaaS – Office 365, SharePoint Online, Exchange. If your job involved managing and monitoring servers and services for your company, your job has been under attack for a decade.

    But has it? How many people lost their jobs because their company elected to deploy Office 365? Many people adapted existing skills and learned new ones. I’ve yet to see “WILL ADMINISTER SHAREPOINT FOR FOOD” signs littering once-vibrant office parks. I once read that if change isn’t something you’re interested in, technology is not the industry for you. That statement pretty much summarizes the majority of this post, so feel free to leave now if you’ve gotten what you came for.

    In all reality, jobs in the space have stayed relatively stable in relation to other IT jobs. For example, if you look at the trends over the past 10 years, systems administration and software engineering jobs have followed a similar course:


    Software Engineer – source: indeed.com/jobtrends


    Systems Administrator – source: indeed.com/jobtrends

    See how similar they are? People aren’t being replaced – in fact, these graphs are a little disingenuous as the ‘overall percentage of matching job postings’ includes most job posts on the internet, which are, of course, exploding. The point, however, is that we’re seeing the same general trends in both systems and software engineering. What did people do? They adapted, they translated existing skills into new platforms, they learned new chunks of knowledge to handle what was coming their way.

    Why? Think about it – what hardware Exchange is installed on is irrelevant. The hardware is commodity now. Administering Exchange requires a certain set of skills; before Office 365 and after, those skills aren’t dramatically different. Sure, it’s fewer servers to manage, but how many Exchange admins were really managing that enormous Jet database manually anyway? That knowledge and skillset transfers readily.

    Software Development Is Next.

    There have been pivotal changes in software in the past ten years – virtualization to an extent and (obviously) cloud. Maximizing efficiency of resources and time-to-market agility has made the cloud what it is. We’re in the ‘coarse’ efficiency phase now – the next five years or so will bring a whole new era of abstraction and efficiency.

    Anyway – let’s get back to my original issue. Software engineering is already going through some significant changes, but one of the biggest ones speaks directly to my original issue above. At some point, skills become commodity. Is there anyone working in dev today that can’t readily find sample code for connecting to Salesforce.com/Dropbox/Office 365 and make that work for their application? It’s become so commonplace that that’s no longer a ‘special’ skill – in fact, it has been made so repeatable that we can drag a block with an Office logo on it and connect to SharePoint Online data without writing any code.

    Who out there is impressed that I wrote a web app that had a nifty sliding progress bar? Anyone? Bueller? Bueller? That’s not impressive anymore. Years ago, when XMLHttpRequest was new, making a web call without a postback was amazing. Mind = blown. Now there are dozens of frameworks that make many, many lines of code boil down to a single line:

    $("thing").progressBar();

    Are you going to put ‘implemented progressBar’ on your resume? It can sit right next to ‘managed to get both feet into shoes.’ I think not.

    Platform Dev vs. Implementers

    It’s silly to think that what we’re going through now is somehow different from what we’ve gone through forever. But there is, among all the change, one constant that seems to be creating a larger gap daily. Platform development and implementation.

    Take a look at what was announced in the Azure space today – ‘Logic Apps,’ ‘API apps,’ – each one a higher-level abstraction of a few existing pieces that let you compose services from existing building blocks. The guys building those blocks have no idea what you’re going to do with them and in what combinations you may elect to build. But it doesn’t matter. The software is written in a way that supports, nay, encourages that kind of development. If I can get away with dragging seven existing blocks onto a designer and solve whatever problem I was attempting to solve, how is that not stuffed to the gills with win?

    Better yet, say there’s not a block that does what I need. Let’s build the block and write it properly so other people can use the block. Sounds pretty neat, huh?

    Which are you?

    When you start a project, do you write a bunch of problem-specific code? Are you one-offing everything you do? How much code reuse is in that block that you just wrote? Your time becomes less valuable for busy work when someone else can implement someone else’s blocks + 10% new code in half the time. If you’re solving a problem, solve it once and use it as many times as possible. Microservices and distributed components are how you gain maximum leverage for the time you’ve already spent.

    Platform Dev is the Future

    This should be obvious if you’ve made it this far, but I think it’s fairly clear that platform development is where the world of software development is heading. That doesn’t mean custom software won’t exist, but it won’t be ‘built from the ground up.’ It’ll be built from existing blocks of increasingly complex, reusable code. Conceptually this isn’t different from the frameworks we use today. When was the last time you managed memory? Opened raw sockets and sent HTTP requests manually? All of these things are offered by most of the major players today, to abstract complexity and menial, repeatable tasks. As we’re seeing today, the API/reusable block market is exploding. If that means your job is in danger, then perhaps it’s time to start thinking platforms and stop writing code merely for the finger exercise.

    Always think about a platform, always think about how you can make your code as generic and reusable as possible, and think about what kinds of other uses it may have. Build for platforms, not for implementations.


    © 2016 Azure & Chill
