On Strongly Typed Logging

Logging is a crucial element of monitoring highly available systems. It lets us not only find out about errors but also quickly identify their cause. Logs are often used to generate metrics that help business and engineering make informed decisions about future development.

At OpenTable we have a central logging infrastructure, which means all logs are stored in the same shared database (ElasticSearch in our case), and anybody can access any logs they want without very specialized knowledge (thanks Kibana!).

ElasticSearch, though living in the NoSQL world, is not actually a schema-free database. Sure, you do not need to provide a schema, but ES will instead infer one from the documents you send to it. This is very similar to the type inference found in many programming languages: you do not need to specify a field's type, but if you later try to assign an inappropriate value to it you will get an exception.
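To make the analogy concrete, here is a toy model in JavaScript of how the first document to use a field can fix that field's type for every document that follows. It sketches the idea only, not how ElasticSearch's dynamic mapping is actually implemented; `ToyIndex` and `inferType` are hypothetical names.

```javascript
// Toy model of ElasticSearch-style dynamic mapping: the first document
// that uses a field fixes its type, and later documents must agree.
// Illustration only; this is not how ES is implemented.
function inferType(value) {
  if (typeof value === "number") return "number";
  if (typeof value === "boolean") return "boolean";
  if (typeof value === "string") return "string";
  return "object";
}

class ToyIndex {
  constructor() {
    this.mapping = new Map(); // field name -> inferred type
    this.docs = [];
  }

  index(doc) {
    // Check every field first, so a rejected document leaves the
    // mapping and the stored documents untouched.
    for (const [field, value] of Object.entries(doc)) {
      const known = this.mapping.get(field);
      const actual = inferType(value);
      if (known !== undefined && known !== actual) {
        throw new TypeError(
          `field "${field}" is mapped as ${known}, got ${actual}`);
      }
    }
    // Record types for fields seen for the first time.
    for (const [field, value] of Object.entries(doc)) {
      if (!this.mapping.has(field)) this.mapping.set(field, inferType(value));
    }
    this.docs.push(doc);
  }
}

const idx = new ToyIndex();
idx.index({ statusCode: 200, method: "GET" }); // statusCode is now a number
try {
  idx.index({ statusCode: "OK" }); // a string no longer fits
} catch (e) {
  console.log(e.message); // field "statusCode" is mapped as number, got string
}
```

One difference worth knowing: real ElasticSearch coerces numeric strings such as `"200"` into a numeric field by default, so this toy is stricter than ES; a value like `"OK"` would still be rejected.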

This trait of our database goes all the way to the root of our logging system design. Let me explain why I say that we have ‘strongly typed logs’.

In The Beginning There Was String

Before centralization we just logged a single message along with its importance. In code it looked something like:

logger.ERROR("Kaboom!");

which resulted in a log line on disk containing a timestamp, severity and message.

{2014-10-10T07:33:04Z [ERROR] Kaboom!}

That worked pretty well. As time passed, we started turning log messages into templates that carried the relevant data:

logger.INFO(string.Format("Received {0} from {1}. Status: {2}. Took {3}", httpMethod, sourceIp, statusCode, durationms));

When we decided to centralize logs, we moved the same log lines from local disk to a central database. Suddenly things that used to live on a single server in a file called ‘application.log’ became part of one huge lump of data. Instead of becoming easier to access, the logs were really hard to filter, let alone aggregate or analyse in any simple way to find the source of a problem. ElasticSearch is really good at free-text searching, but frankly, free-text search (FTS) is never as precise as a good filter.
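The filtering problem is easier to see side by side. The sketch below is in JavaScript rather than the post's C#, and the sample data is hypothetical: it pulls server errors out of free-text messages shaped like the `string.Format` template, and out of the same data kept as named fields.

```javascript
// The same two requests, logged as formatted text and as named fields.
// Sample values are made up for illustration.
const textLogs = [
  "Received GET from 10.0.0.1. Status: 200. Took 35",
  "Received POST from 10.0.0.2. Status: 500. Took 1200",
];

const structuredLogs = [
  { httpMethod: "GET", sourceIp: "10.0.0.1", statusCode: 200, durationms: 35 },
  { httpMethod: "POST", sourceIp: "10.0.0.2", statusCode: 500, durationms: 1200 },
];

// Free text: every consumer has to re-parse the message, and the regular
// expression silently stops matching if someone rewords the template.
function serverErrorsFromText(lines) {
  return lines.filter((line) => {
    const m = /Status: (\d+)\./.exec(line);
    return m !== null && Number(m[1]) >= 500;
  });
}

// Named fields: the filter is a direct comparison on a numeric value.
function serverErrorsFromFields(entries) {
  return entries.filter((e) => e.statusCode >= 500);
}

console.log(serverErrorsFromFields(structuredLogs).map((e) => e.sourceIp)); // [ '10.0.0.2' ]
```

Both filters give the same answer today, but only the field-based one survives a change to the message wording.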

Then There Was Dictionary Of Strings

[Read More]

Building a living styleguide at OpenTable

If you’re reading this you’ve probably built yourself a website. A site - large or small - that’s thrown together or crafted over many months. And if you have, you’ve probably kept all your CSS class names in your head, or at least been able to go straight to the relevant stylesheets to retrieve them.

Well, OpenTable is, unsurprisingly, built by many engineering teams across multiple continents, and was completely redesigned last year. As soon as you have more than a handful of people working on your front-end, you will quickly find a well-intentioned developer causing one or both of these problems:

  • Well-intentioned developer adds a new submission form but, like the design Philistine he is, his buttons are 18px Verdana #E40000, not the correct 16px Arial #DA3743
  • Your good old developer knows which font size and colour it should be, but bungs a duplicate class into a random stylesheet (or worse still, inline)

Despite these risks, a single front-end dev (or a team of them) cannot check every new piece of code or they will quickly become a bottleneck.

You need some guidelines

Offline designers regularly create ‘brand guidelines’ or ‘design standards’ to document the precise way their brand or product should be recreated when outside of their control. Online, such guidelines are similarly invaluable for maintaining brand and code consistency with multiple engineers and designers, but it is blindingly obvious that a printed or ‘static’ set of guidelines is completely unsuitable for a constantly changing website.

Step forward a ‘living’ styleguide.

A living styleguide gives a visual representation of a site’s UI elements using the exact same code as on the website, in most cases via the live CSS. A living styleguide may also provide reusable CSS and HTML code examples, and these are not just for engineers new to the code; I frequently use ours at OpenTable and I wrote the stylesheets in the first place (I can’t be expected to remember everything).

Providing reusable code improves collaboration, consistency and standards, and reduces design and development time - but like most documentation it is essential your guide is always up-to-date and trustworthy. So if a living styleguide is (theoretically) always up-to-date, how did we build ours?

How we built our styleguide

[Read More]

Explaining Flux architecture with macgyver.js

What is Flux?

Flux is an application architectural pattern developed by Facebook. It was developed to solve some of the complexities of the MVC pattern when used at scale by favouring a uni-directional approach. It is a pattern and not a technology or framework.

MVC scale issue

When applications use the model-view-controller (MVC) pattern at any scale, it becomes difficult to maintain consistent data across multiple views. This is particularly true when data does not flow in one direction between models and views, so that keeping the views in step with each other requires ever more logic as model data is updated. Facebook hit this issue several times, in particular with their unseen count (a count of unseen messages that is updated by several UI chat components). It wasn’t until they realised that the MVC pattern itself was accommodating this complexity that they stepped back from the problem and addressed the architecture.

Flux is intentionally unidirectional.


Key to this architecture is the dispatcher, the gatekeeper that all actions must go through. When one or more views want to do something, they fire an action, which the dispatcher routes to the callbacks that the stores have registered with it.

Stores are responsible for the data and respond to callbacks from the dispatcher. When their data changes, they emit change events, which views listen for. A view can then respond accordingly (for example, by updating or re-binding).
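Before getting to macgyver.js, a minimal hand-rolled sketch in JavaScript may help show the flow: views fire actions, the dispatcher hands every action to the callbacks that stores have registered, and stores emit change events that views listen for. This is neither macgyver.js code nor Facebook’s `flux` package; the action types (`MESSAGE_RECEIVED`, `THREAD_READ`) and the unseen-count store are hypothetical, modelled on the Facebook example above.

```javascript
const { EventEmitter } = require("events");

// Minimal dispatcher: stores register callbacks, and every dispatched
// action is delivered to every registered callback.
class Dispatcher {
  constructor() {
    this.callbacks = [];
  }
  register(callback) {
    this.callbacks.push(callback);
    return this.callbacks.length - 1; // registration token (unused here)
  }
  dispatch(action) {
    for (const callback of this.callbacks) callback(action);
  }
}

const dispatcher = new Dispatcher();

// The store owns the data, updates it in response to actions and
// emits "change" so that views can re-render.
class UnseenCountStore extends EventEmitter {
  constructor(appDispatcher) {
    super();
    this.count = 0;
    appDispatcher.register((action) => {
      if (action.type === "MESSAGE_RECEIVED") {
        this.count += 1;
        this.emit("change");
      } else if (action.type === "THREAD_READ") {
        this.count = 0;
        this.emit("change");
      }
    });
  }
}

const store = new UnseenCountStore(dispatcher);

// Two "views" read from the same store, so they cannot disagree.
const rendered = { header: null, sidebar: null };
store.on("change", () => { rendered.header = `Unseen: ${store.count}`; });
store.on("change", () => { rendered.sidebar = store.count; });

// Views never mutate the store directly; they fire actions instead.
dispatcher.dispatch({ type: "MESSAGE_RECEIVED" });
dispatcher.dispatch({ type: "MESSAGE_RECEIVED" });
console.log(rendered); // { header: 'Unseen: 2', sidebar: 2 }
```

Because data only ever flows action → dispatcher → store → view, there is one place where the unseen count changes, however many components display it.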

This will become more obvious when we go through the macgyver.js example.

What is macgyver.js?

[Read More]

Supporting IE8 in the OpenTable redesign

We’re really proud to have released our redesigned OpenTable site last week, the culmination of months of hard work from many talented people here in London and in San Francisco.

However, despite killing off our old site and its 2004 design, 2.8% of our visitors could have been crying into their keyboards as a far worse opentable.co.uk filled their screens.

That version of OpenTable was our new responsive site viewed in Internet Explorer 8.

Our redesign before we optimised for IE8

The fundamental issue is that IE8 doesn’t support media queries, so the age-old browser would stretch our mobile-first responsive design as wide as it could go: not great across a 27” Thunderbolt.

To solve the problem we first tried the Respond.js polyfill, but it didn’t work as we’d hoped. The main issue appeared to be that, because we serve our CSS and JS from a separate sub-domain, we fell foul of the browser’s cross-domain security. We followed the Respond.js instructions for this case but, having no luck, looked for alternatives.

Legacssy

Further Googling led us to Legacssy. With this Grunt task we could create an IE8-only stylesheet and not have to serve extra JS and cross-domain proxy files to all visitors.

Our existing process is to create our core CSS with an app.scss file and grunt-sass. Our additional step was to create an app_ie8.scss file, parse it with grunt-sass like before, but then also run it through Legacssy.
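The two-step build might look something like the Gruntfile below. It is a sketch, not our actual configuration: the file paths and the `css` task name are placeholders, and it assumes the `grunt-sass` and `grunt-legacssy` plugins (plus the `sass` package) are installed.

```javascript
// Gruntfile.js: sketch of compiling app.scss and app_ie8.scss with
// grunt-sass, then post-processing only the IE8 stylesheet with Legacssy.
// All paths are placeholders.
module.exports = function (grunt) {
  grunt.initConfig({
    sass: {
      // grunt-sass 3.x and later require an explicit Sass implementation
      // (older releases bundled node-sass).
      options: { implementation: require("sass") },
      dist: {
        files: {
          "dist/app.css": "scss/app.scss",
          "dist/app_ie8.css": "scss/app_ie8.scss"
        }
      }
    },
    legacssy: {
      // Rewrite the compiled IE8 stylesheet so that a browser without
      // media-query support still gets a usable desktop layout.
      ie8: {
        files: {
          "dist/app_ie8.legacy.css": "dist/app_ie8.css"
        }
      }
    }
  });

  grunt.loadNpmTasks("grunt-sass");
  grunt.loadNpmTasks("grunt-legacssy");

  // Compile both stylesheets, then run Legacssy over the IE8 output only.
  grunt.registerTask("css", ["sass", "legacssy"]);
};
```

Writing Legacssy’s output to a separate file keeps the plain compiled IE8 stylesheet around, which can make it easier to compare the before and after.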

Our app.scss file

[Read More]

Proxying Services With Hapi.js

I’ve raved in the past about how awesome hapi.js is, but today I’m going to talk about one specific case.

We started off with just a couple of hapi.js apis. This was at a time when standing up new infrastructure was still a bit painful, so inevitably those apis ended up with more functionality in them than they should have had. Now that it’s easy for us to get infrastructure, we want to make more use of it.

Our goal is to have lots of small(er) apis that just look after one specific piece (skillfully avoiding using the buzzword ‘microservices’).

When you want to split out functionality from one api to another, it can be a pain, especially if you have a lot of consumers who aren’t particularly fast-moving or communicative. Or maybe you don’t know all your consumers up front.

You’ve got a couple of options here:

  • Maintain the functionality in two places and slowly migrate consumers across

  • Use a proxy or routing layer in-front of the api to rewrite or redirect requests

  • Write code in your api to proxy requests to a different server

The first two options are pretty icky, and frankly the third isn’t all that great either. It all depends on you having the right framework. Do you see where I’m going here?

Enter Hapi.js

Hapi.js has the concept of a ‘proxy’ handler, which can transparently proxy requests to a different server.

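// Since hapi v8 the proxy handler is provided by the h2o2 plugin, which must be
// registered first, e.g. await server.register(require('@hapi/h2o2'));
server.route([
  {
    method: 'GET',
    path: '/foo',
    handler: {
      proxy: {
        host: 'my-other-service.mydomain.com',
        port: 80,
        protocol: 'http',
        passThrough: true, // forward headers to the upstream service and back
        xforward: true     // add X-Forwarded-* headers to the upstream request
      }
    }
  }
]);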
[Read More]

Hobknob v2.0: A new dimension

Sometimes you need more granularity when toggling a feature switch.
Version 2.0 of Hobknob aims to address this with feature categories.

TL;DR.

Hobknob now allows you to define categories of features that have multiple toggles per feature.

For example, you can define the ‘Domain Features’ category which allows you to toggle a feature OFF in your-website.com, but ON in your-website.co.uk.


Categories

Feature categories are configured with a few pieces of information. For example:

{
  "categories": [
    {
      "id": 0,
      "name": "Simple Features",
      "description": "Use when you want your feature to be either on or off"
    },
    {
      "id": 1,
      "name": "Domain Features",
      "description": "Use when you want your features to be toggled separately for different domains",
      "values": ["com", "couk", "de", "commx", "jp", "ca"]
    },
    {
      "id": 2,
      "name": "Locale Features",
      "description": "Use when you want your features to be toggled separately for different locales",
      "values": ["en-GB", "en-US", "fr-CA", "de-DE", "ja-JP", "es-MX"]
    }
  ]
}

Notice that each category (except the simple feature category) provides an array of accepted toggle values.

All non-simple feature toggles will have the key application-name/feature-name/toggle-name.
For example, main-website/show-user-section/com.
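As an illustration of the key format, the sketch below builds a toggle key and checks the toggle name against the accepted values of its category from the configuration above. `toggleKey` is a hypothetical helper written for this example, not part of Hobknob’s API.

```javascript
// Non-simple categories from the configuration above, keyed by id.
const categories = {
  1: { name: "Domain Features", values: ["com", "couk", "de", "commx", "jp", "ca"] },
  2: { name: "Locale Features", values: ["en-GB", "en-US", "fr-CA", "de-DE", "ja-JP", "es-MX"] },
};

// Hypothetical helper: build the application-name/feature-name/toggle-name
// key, rejecting toggle names the category does not accept.
function toggleKey(applicationName, featureName, categoryId, toggleName) {
  const category = categories[categoryId];
  if (category === undefined) {
    throw new Error(`category ${categoryId} has no per-value toggles`);
  }
  if (!category.values.includes(toggleName)) {
    throw new Error(`"${toggleName}" is not a valid ${category.name} toggle`);
  }
  return `${applicationName}/${featureName}/${toggleName}`;
}

console.log(toggleKey("main-website", "show-user-section", 1, "com"));
// main-website/show-user-section/com
```

Validating against the category’s `values` list means a typo such as `"co.uk"` fails loudly instead of creating a toggle nobody reads.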

[Read More]

Interacting with ElasticSearch using Hubot

At OpenTable, we use a few ElasticSearch clusters. Our aim was to be able to interact with our ElasticSearch clusters via HipChat so that we could troubleshoot easily and without having to log into our VPN. We already use Hubot as part of our systems workflow, so it made sense to be able to interact with ElasticSearch with it.

Setting a cluster alias

When a pager wakes me at 3am, I really do not want to have to type the cluster URL into my mobile HipChat client. So the first thing we added to the script was the ability to give a cluster an alias.

elasticsearch add alias my-test-alias http://my-cluster.com:9200


This allows us to use that alias for all commands going forward. Please note that you can remove and query aliases as well:

elasticsearch show aliases


elasticsearch clear alias my-test-alias

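A Hubot script handling these three commands could look roughly like the sketch below. It is an assumption about how such a script might be written, not the script’s actual source: it uses Hubot’s `robot.respond` and `robot.brain` APIs, while the brain key `elasticsearch_aliases` and the function name `registerAliasCommands` are hypothetical.

```javascript
// Hypothetical Hubot script fragment: alias commands stored in the brain.
// In a real script file you would export it:
//   module.exports = registerAliasCommands;
function registerAliasCommands(robot) {
  const load = () => robot.brain.get("elasticsearch_aliases") || {};
  const save = (aliases) => robot.brain.set("elasticsearch_aliases", aliases);

  // elasticsearch add alias <alias> <cluster url>
  robot.respond(/elasticsearch add alias (\S+) (\S+)/i, (msg) => {
    const [, alias, url] = msg.match;
    const aliases = load();
    aliases[alias] = url;
    save(aliases);
    msg.send(`Added alias ${alias} for ${url}`);
  });

  // elasticsearch show aliases
  robot.respond(/elasticsearch show aliases/i, (msg) => {
    const entries = Object.entries(load());
    msg.send(entries.length === 0
      ? "No aliases defined"
      : entries.map(([alias, url]) => `${alias}: ${url}`).join("\n"));
  });

  // elasticsearch clear alias <alias>
  robot.respond(/elasticsearch clear alias (\S+)/i, (msg) => {
    const aliases = load();
    delete aliases[msg.match[1]];
    save(aliases);
    msg.send(`Cleared alias ${msg.match[1]}`);
  });
}
```

Keeping aliases in the brain means they survive restarts, provided Hubot is configured with a persistent brain store.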

[Read More]

Coach don't rescue

I recently attended a fascinating and emotionally-charged talk by Samantha Soma at DareConf 2014, ‘How to stop rescuing people’. It strongly mirrored my experience of moving into a leadership role, and I’d recommend it to anyone with a spare 30 minutes.

Samantha’s talk made me reflect on how I struggle to coach talented individuals; how I can identify when it’s going wrong and what steps I can take to remedy the situation.

Gold star syndrome

A new concept for me and a recurring theme throughout the sessions at DareConf, ‘Gold Star Syndrome’ is a fixation on finding validation for your work. I feel this results from values learned in early childhood skewing our perception of working life away from reality. As children, especially during our school years, we discovered that when we do good things, good things happen to us. Remember how it felt to get that gold star in your spelling test or that A+ on an English essay?

That was great in school and even through university, but working life is a much tougher environment and positive reinforcement is much less common. As professionals, the majority of our work goes unnoticed until there is a problem or issue to solve. Then we feel exposed to stinging criticism, and resentful that months of good work went unnoticed.

When we continue to search for a gold star or continually strive for perfection, we as individuals become much more insular and isolated. We tend to avoid showing our work until it is 100% ready, and put up a shield to protect us from any feedback for fear of being made to look stupid or being called a fraud.

Empowerment

There is a considerable amount of research to suggest that children who have bad experiences and manage to overcome them tend to grow up to become more rounded adults. By encouraging grit and allowing kids to solve their own problems, we teach children that they are empowered. They become more creative, more respectful, less dependent on others and display less problem behaviour.

This is also relatively easy to implement and can be as simple as involving children in a decision-making process: ‘What do you want to eat with dinner: carrots or broccoli?’ This might progress to ‘What colour socks do you want to wear?’ or ‘Which swing do you want to play on?’

The concept of preventing yourself from controlling a situation is really key to successful coaching. Rescuing people by dictating an outcome requires one weak person and one strong person. This propagates itself so people drift towards being a victim or a rescuer. A much better outcome would be a group of confident, empowered individuals who are able to work together.

[Read More]

Hobknob v1.0: Now with authorization

We are pleased to announce the version 1.0 release of Hobknob, our open-source feature toggle management system. With it come a few additions and several improvements.

This post will expand on some of the changes, in particular, authorisation via access control lists.
For an introduction to Hobknob, see our previous post: Introducing Hobknob: Feature toggling with etcd.

Authorisation with ACLs

A much-requested feature was the ability to control who can add, update and delete toggles on an application-by-application basis. We achieve this via an Access Control List (ACL) for each application. Users on an application’s ACL are known as application owners.


Application owners can (for an owned application):

  • Add toggles
  • Set the value of a toggle
  • Delete toggles
  • Add additional owners
  • Remove owners

Everyone can:

  • Add an application
  • See toggles
  • See application owners
  • See the audit trail for a toggle

When a user creates a new application, they are automatically added as an owner of that application.
The user can then add other application owners by clicking the ‘Add user’ button in the Owners panel and entering the user’s email address.
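The permission rules above can be summarised as a small check. The sketch below is a hypothetical model written for illustration, not Hobknob’s implementation; the class, method and action names are made up.

```javascript
// Actions only an application's owners may perform on it.
const OWNER_ACTIONS = new Set([
  "addToggle", "setToggle", "deleteToggle", "addOwner", "removeOwner",
]);
// Actions open to every user.
const PUBLIC_ACTIONS = new Set([
  "addApplication", "viewToggles", "viewOwners", "viewAuditTrail",
]);

class Acl {
  constructor() {
    this.owners = new Map(); // application name -> Set of owner emails
  }

  // The creator of an application automatically becomes its first owner.
  createApplication(appName, creatorEmail) {
    this.owners.set(appName, new Set([creatorEmail]));
  }

  isOwner(appName, email) {
    const owners = this.owners.get(appName);
    return owners !== undefined && owners.has(email);
  }

  can(email, action, appName) {
    if (PUBLIC_ACTIONS.has(action)) return true;
    if (OWNER_ACTIONS.has(action)) return this.isOwner(appName, email);
    return false; // unknown actions are denied
  }

  addOwner(actorEmail, appName, newOwnerEmail) {
    if (!this.can(actorEmail, "addOwner", appName)) {
      throw new Error(`${actorEmail} is not an owner of ${appName}`);
    }
    this.owners.get(appName).add(newOwnerEmail);
  }
}

const acl = new Acl();
acl.createApplication("main-website", "alice@example.com");
console.log(acl.can("bob@example.com", "setToggle", "main-website"));   // false
console.log(acl.can("bob@example.com", "viewToggles", "main-website")); // true
acl.addOwner("alice@example.com", "main-website", "bob@example.com");
console.log(acl.can("bob@example.com", "setToggle", "main-website"));   // true
```

Denying unknown actions by default means a new action stays owner-only (or blocked) until someone deliberately decides who may perform it.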

[Read More]

PuppetConf 2014 - Part 3

Day 2

This is our summary of PuppetConf 2014. In our previous post we gave an overview of the first day of the conference. This post will provide an overview of the final day.

There were even more inspiring keynotes and lots more talks which have given us plenty of ideas to go home and think about.

Keynotes

Animating the Puppet: Creating a Culture of Puppet Adoption - Dan Spurling (@spurling), Getty Images - Slides



Dan Spurling, VP of Tech Services at Getty, came out of the gate with a strong message, and his GSD t-shirt gave you a clear understanding of who he is. His talk about creating a culture of Puppet adoption at his company was a great story of how challenging it can be to move various business units, with projects of various ages, to a configuration-management (with Puppet) ethos.

I think it is good to hear that they are rolling configuration management out into that huge backlog of legacy infrastructure that we all try to pretend isn’t there. How do you make it integrate into existing processes? How do you sell the DevOps message at the same time as introducing a tool like Puppet into the mix as part of that message? Dan gave some thoughts on this, and it was good to hear them from someone who appears to be on the other side of that challenge.

One of the analogies he used that I found quite useful was that undertaking a project like this is like moving a boulder. It requires an executive sponsor to get the thing moving at all, and then it requires everyone pulling in the same direction if it’s ever going to get anywhere.

The big take-away was that you need to puppetize right away: you can’t wait for the right environment or conditions to start doing it, you just need to start now and demonstrate it. This echoes the Continuous Delivery ideal of “if it hurts, then do it more often”.

[Read More]