Thursday, August 26, 2010

Netflix for iPhone in the cloud and HTML5

I posted on this subject a while ago, and it drew a lot of confused comments from people who read things into my post that weren't there. Today I hope it's a bit clearer what I was talking about, because we released the Netflix iPhone app, which is based on HTML5 for its user interface elements, back ends into the Amazon cloud, and uses conventional, non-HTML5 video playback and DRM. The playback mechanism is the same as on the iPad, but to support the scale of expected usage (it's the top free app in the App Store as I write this) we needed more capacity, and we got that by rebuilding our API tier and personalized movie choosing backend to run on the Amazon cloud. The user interface runs in a WebKit-based browser window in the app, just like on the iPad, but the entire UI is built using JavaScript with advanced CSS and HTML5 animations to make it feel like a native iPhone app.

Returning to the theme of my last post, Netflix is hiring engineers to work on cloud tools, platforms and performance, and advanced user interfaces. I think we are breaking new ground in both areas and it's an exciting place to be. We have very high standards and are looking for the best people in the industry to come and help us...

Wednesday, August 18, 2010

Eventual Consistency of Cloud?

Lori MacVittie wrote that eventually cloud standards will converge around the private cloud standards (RT @swardley). I disagree, and there are several examples I can think of that point to the opposite conclusion, that public cloud will be the standard, and AWS will continue to dominate. The original article is here.

The first point of disagreement is the claim that public clouds aren't getting the input they need from customers to mature their services, because real enterprise customers aren't running in the public cloud. Well, here at Netflix, we are giving Amazon exactly that kind of input. We use every feature they have and we are driving them hard, and Amazon is taking the input and improving their product rapidly in ways that benefit all the users of AWS.

My main disagreement is with the claim that lots of individual IT departments will converge on a single standard that will win out over public cloud standards. I find this highly implausible. There are a host of vendors feeding technology to enterprise clouds, and they will all do the usual vendor thing of looking for ways to lock in the customer. Even if they build on the same standards, their implementations will be different. Here's an example: the "private" Enterprise Unix variants Solaris, AIX, HP-UX, IRIX, OSF/1 etc. are all based on the same Unix standards, but the "public" alternatives are Linux and BSD, and Linux has won the mind share in this space. I see Linux as an analogy for the public cloud in the sense that there is a very low barrier to adoption. For Linux, you can download it to run on any old computer for nothing, learn to use it, then build very low cost solutions out of it. For AWS, for a few dollars on the existing Amazon account you use to buy books etc., you can explore all the features and learn to build very powerful systems in a few hours. This has produced a large population of very productive engineers who know how to use AWS, Linux and other open source tools to solve problems rapidly at low cost using the same tools. In contrast, if every enterprise cloud moves ahead by solving its problems independently, the result will be a variety of architectures, each optimized for its own problem and with its own tooling, and a very small number of people who know how to run each variant. You will also find that every company uses the public cloud as well, to get stuff done more quickly and cheaply than the IT department can, so that will become the common standard.

Part of the thinking behind Netflix's move to the cloud is that large public cloud providers like Amazon will have far more engineers working on making their infrastructure robust, scalable, well automated and secure than any individual enterprise could afford. By using AWS we are leveraging a huge investment made by Amazon, and paying a small amount for it on a month by month basis. We also get to allocate resources efficiently. For example, how much does it cost to provision a large cage of equipment in a new datacenter, and how long does it take from deciding to do it to having it running reliably in production? Let's say $10M and many months. Instead we could spend $10M on licensing more movies and TV shows to stream, and grow incrementally in the cloud. In a few months' time we have more customers than if we had spent the money up front on buying compute capacity, and we just keep re-provisioning new instances in the cloud, so we never end up with a datacenter full of inappropriately sized or obsolete equipment. At present, Netflix's growth is accelerating, so it is difficult to guess in advance how much capacity to invest in, but we have already flat-lined our datacenter capacity, and all incremental capacity is being added on AWS. For example, we can just fire up a few thousand instances for a week to encode all the new movies we just bought the rights to, then stop paying for them until another big deal closes. Likewise on Oscars night there is a big spike in web traffic, and we can grow on the day and shrink afterwards as needed without planning it and buying hardware a long time in advance.
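To make the arithmetic behind that trade-off concrete, here is a rough sketch in Python. Only the $10M figure comes from the text above. The instance count, the one-week duration expressed as 168 hours, and the hourly rate are hypothetical placeholders rather than real AWS prices, and the comparison ignores power, staff, depreciation and the months of lead time, so treat it as a way to frame the question and not as a cost model.

```python
def on_demand_cost(instances: int, hours: float, hourly_rate: float) -> float:
    """Cost of renting `instances` machines for `hours` at a flat hourly rate."""
    return instances * hours * hourly_rate


def bursts_until_break_even(upfront: float, instances: int,
                            hours: float, hourly_rate: float) -> float:
    """How many identical rented bursts add up to the upfront spend."""
    return upfront / on_demand_cost(instances, hours, hourly_rate)


if __name__ == "__main__":
    # Hypothetical inputs: 2000 instances for one week at $0.50 per hour.
    INSTANCES = 2000
    HOURS_IN_WEEK = 7 * 24
    HOURLY_RATE = 0.50          # placeholder, not a quoted AWS price
    UPFRONT_DATACENTER = 10_000_000  # the "$10M" from the post

    burst = on_demand_cost(INSTANCES, HOURS_IN_WEEK, HOURLY_RATE)
    count = bursts_until_break_even(UPFRONT_DATACENTER, INSTANCES,
                                    HOURS_IN_WEEK, HOURLY_RATE)
    print(f"one encoding burst: ${burst:,.0f}")
    print(f"bursts before rental matches the upfront spend: {count:.1f}")
```

The point of the second function is that rented capacity only costs money while a burst is running, so the less often the peak is needed, the longer it takes for renting to add up to the price of owning.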

While the other public and private cloud vendors are competing to come up with standards, we are finding that resumes from the kind of engineers we want to hire already reference their experience with AWS as a de-facto cloud standard. It's also easier to attract the best people if they will learn transferable skills and work on the very latest technologies.

That might sound like a lock-in, but a well designed architecture is layered, and the actual AWS dependencies are very localized in our code base. The bet on the end game is that in coming years, other cloud vendors produce large scale AWS compatible offerings (full featured, not just EC2 and S3), and a very large scale multi-vendor low cost public cloud market is created. Then even a large and fast growing enterprise like Netflix will be an insignificant and ever smaller proportion of the cloud. By definition, you can't be an insignificant proportion of your own private cloud....

Monday, August 16, 2010

Reducing TCP retransmit timeout?

Cloud networks are lossy and low latency, so reducing TCP_RTO_MIN and TCP_DELACK_MIN looks like a good idea, but it looks as if this needs a Linux kernel recompile. Has anyone else looked at this?
Here is a relevant paper: “Safe and Effective Fine-grained TCP Retransmissions for Datacenter Communication”
http://www.cs.cmu.edu/~vrv/papers/sigcomm147-vasudevan.pdf
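
To show why the floor matters, here is a small Python sketch of the standard retransmit timeout estimator (the smoothing rules from RFC 6298) with a configurable minimum. It is an illustration and not the kernel's code: it ignores clock granularity, and the class and function names are made up for this example. The 200 ms value is the stock Linux TCP_RTO_MIN (HZ/5), while the 1 ms alternative is just an assumed value to compare against.

```python
from dataclasses import dataclass
from typing import List, Optional

LINUX_TCP_RTO_MIN_MS = 200.0  # stock kernel value, HZ/5


@dataclass
class RtoEstimator:
    """RFC 6298 style RTO estimator, clamped to a minimum (all times in ms)."""
    rto_min_ms: float
    srtt_ms: Optional[float] = None  # smoothed round trip time
    rttvar_ms: float = 0.0           # round trip time variation

    def update(self, sample_ms: float) -> float:
        """Feed one RTT measurement and return the resulting RTO."""
        if self.srtt_ms is None:
            # First measurement initialises both estimators.
            self.srtt_ms = sample_ms
            self.rttvar_ms = sample_ms / 2
        else:
            # beta = 1/4 for the variation, alpha = 1/8 for the smoothed RTT.
            self.rttvar_ms = 0.75 * self.rttvar_ms + 0.25 * abs(self.srtt_ms - sample_ms)
            self.srtt_ms = 0.875 * self.srtt_ms + 0.125 * sample_ms
        return max(self.rto_min_ms, self.srtt_ms + 4 * self.rttvar_ms)


def rto_after(samples_ms: List[float], rto_min_ms: float) -> float:
    """RTO after feeding every sample in order to a fresh estimator."""
    estimator = RtoEstimator(rto_min_ms=rto_min_ms)
    rto = rto_min_ms
    for sample in samples_ms:
        rto = estimator.update(sample)
    return rto


if __name__ == "__main__":
    # Assumed datacenter-like round trips of 0.1 ms.
    samples = [0.1] * 20
    print("RTO with the 200 ms floor:", rto_after(samples, LINUX_TCP_RTO_MIN_MS))
    print("RTO with a 1 ms floor:   ", rto_after(samples, 1.0))
```

With round trips this short the computed timeout sits far below either floor, so the floor is what decides how long a sender stalls after a lost packet.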

Friday, August 06, 2010

Open letter to my Sun friends at Oracle

I recently heard about Illumos via a tweet from Alec Muffett, and responded with my own tweet "I predict that #illumos will be just as irrelevant as Solaris has been for the last few years. Legacy." - personally I haven't logged into a Solaris or SPARC machine for about four years now. There are none at Netflix.

I have also been talking to a few friends who stayed at Sun and are now at Oracle, and there is a common thread that I decided to put out there in this blog post.

This week I presented at a local Computer Measurement Group meeting, talking about how easy it is to use the Amazon cloud to run Hadoop jobs to process terabytes of data for a few bucks [slideshare]. I followed a talk on optimizing your Mainframe software licensing costs by tweaking workload manager limits. There are still a lot of people working away on IBM Mainframes, but it's not where interesting new business models go to take over the world.

The way I see the Oracle/Sun merger is that Oracle wanted to compete more directly with IBM, and they will invest in the bits of Sun that help them do that. Oracle has a very strong focus on high margin sales, so they will most likely succeed in making good money with help from Solaris and SPARC to compete with AIX, z/OS and P-series, selling to late-adopter industries like Banking, Insurance etc. Just look where the Mainframes are still being used. Sun could never focus on just the profitable business on its own, because it had a long history of leading edge innovation that is disruptive and low margin. However, what was innovative once is now a legacy technology base of Solaris and SPARC, and it's not even a topic of discussion in the leading edge of disruptive innovators, who are running on x64 in the cloud on Linux and a free open source stack. There is no prospect of revenue for Oracle in this space, so they are right to ignore it.

That is what I meant when I tweeted that Illumos is as irrelevant as Solaris, and it is legacy computing. I don't mean Solaris will go away, I'm sure it will be the basis of a profitable business for a long time, but the interesting things are happening elsewhere, specifically in public cloud and "infrastructure as code".

You might point to Joyent, who use Solaris, and now have Bryan Cantrill on board, but they are a tiny bit-player in cloud computing, and Amazon are running away with the cloud market and creating a set of de-facto standard APIs that make it hard to differentiate and compete. You might point to enterprise or private clouds, but as @scottsanchez tweeted: "Define: Private Cloud ... 1/2 the features of a public cloud, for 4x the cost". That's not where the interesting things are happening.

So to my Sun friends at Oracle, if you want to work for a profitable company and build up your retirement fund, Oracle is an excellent place to be. However, there are a lot of people who joined Sun when it was re-defining the computer industry, changing the rules, disrupting the competition. If you want some of that, you need to re-tool your skill set a bit and look for stepping stones that can take you there.

When Sun shut down our HPC team in 2004 I deliberately left the Enterprise Computing market. I didn't want to work for a company that sold technology to other companies, I wanted to sell web services to end consumers, and I had contacts at eBay who took me on. In 2007 I joined Netflix, and it's the best place I've ever worked, but I needed that time at eBay to orient myself to a consumer driven business model and re-tool my skill set. I couldn't have joined Netflix directly.

There are two slideshare presentations on the Netflix web site, one on the company culture, the other on the business model. It is expected that anyone who is looking for a job has read and inwardly digested them both (it's basically an interview fail if you haven't). These aren't aspirational puff pieces written by HR. Along with everyone else in Netflix management (literally, at a series of large offsites), I was part of the discussion that helped our CEO Reed Hastings write and edit them both.

What can you do to "escape"? The tools are right there. You don't need to invest significant money, you just need to carve out some spare time to use them. Everything is either free open source, or available for a few cents or dollars on the Amazon cloud. The best two things you can have on your resume are hands-on experience with the Amazon Web Services tool set, and links to open source projects that you have contributed to. There isn't much demand for C or C++ programmers, but Objective-C is an obvious next step. It's quite fun to code in, and in a few lines of code you can develop user interfaces for iPhone/iPad that back-end into cloud services. Java (for app servers like Tomcat and on Android phones), Ruby on Rails, and Python are the core languages that are being used to build innovative new businesses nowadays. If you are into data or algorithms, then you need to figure out how to use Hadoop, which as I describe in one of my slideshare decks is trivially available from Amazon. You can even get an HPC cluster on a 10Gbit Ethernet interconnect from Amazon now. There is a Hadoop-based open source algorithm project called Mahout that is always looking for contributors.

To find the jobs themselves, spend time on LinkedIn. I use it to link to anyone I think might be interesting to hire or work with. Your connections have value since it is always good to hire people that know other good people. Keep your own listing current and join groups that you find interesting, like Java Architecture or Cloud Computing, and Sun Alumni. At this point LinkedIn is the main tool used by recruiters and managers to find people.

Good luck, and keep in touch (you can find me on LinkedIn or twitter @adrianco :-)

Thursday, August 05, 2010