All things database: Q&A with DataStax's Jonathan Ellis

The Cassandra project doesn’t have a Benevolent Dictator For Life, but if it did, that title would surely go to Jonathan Ellis.
The co-founder of DataStax, Ellis has been involved with Cassandra since the time it was open-sourced by Facebook.
Once the project graduated from the incubator at the Apache Software Foundation (ASF), he served as its first Project Chair for the next six
years. It was difficult to tear Jonathan from his fans at DataStax’s Accelerate conference in Washington, but Mayank Sharma lured him away by
disguising himself as one.

Linux Format: You’ve always been a database guy… How did you find your way to Apache Cassandra?

Jonathan Ellis: It’s true that I’ve always been interested in database technology, but I initially thought that the database
space was just making incremental improvements on well-understood solutions, until I joined a cloud backup company called Mozy in 2005.
I built an object store there that scaled to petabytes of data and gigabits per second of throughput, and one of its features was
single-instance storage.
That is, no matter how many users uploaded the same video or the same binary, we’d only store one copy in the backup storage.

This in turn meant that we needed a way to track which users had copies of which files – scaling to millions of users and billions of files.
That’s when I realised that we needed new database architectures to deal with the challenges of web and mobile applications.
Existing databases were optimised for applications that dealt with a single company’s worth of users, but now we needed to scale to an
entire country.
It was a very different problem that required different trade-offs. Just a few months after Facebook open-sourced the Cassandra project,
Rackspace hired me to work on the challenge of scalable databases.
I got to dig deep into Cassandra and the alternatives that were starting to grow in this space, and I was really attracted to its marriage
of a rich, tabular data model with a fully distributed, masterless approach to scalability and fault tolerance.
As a result, I started working on the code and on building the community, and when Cassandra graduated from the ASF incubator, I became the
first project chair.
Getting involved with Cassandra is one of the best decisions I have ever made.

LXF: I heard you once broke Facebook… is that true?

JE: Actually, I’m probably one of the only people who doesn’t work for Facebook that has broken Facebook.
It happened after I was made a committer on the Apache project; I think the feature [which was changed] was adding support for deleting rows,
which didn’t exist when Facebook open-sourced it – that’s how raw it was.

So I committed that on, I think, a Friday afternoon, and over the weekend I’m getting frantic instant messages from Avinash [Lakshman,
co-author of Cassandra] at Facebook: “What happened? We deployed the latest version and everything broke.” That’s how we learned that you
don’t deploy from trunk, and also that CI/CD is an important part of your development chain.

LXF: And you probably stopped being so cavalier about pushing changes.

JE: Well, almost literally they were the only people using Cassandra at the time, and I didn’t realise that they were running from trunk.
It was probably about another six months or so before we even had non-Facebook people using it.

LXF: Twitter?

JE: Twitter would have been, and the timeline’s fuzzy, but Twitter was one of the early ones.
Digg also.
Actually Digg was before Twitter, and then Twitter hired a bunch of the Digg engineers and that’s how Twitter started using it.
So they were early users.

LXF: When did you realise you could spin a profitable company around the database itself?

JE: A venture capitalist named John Vrionis tracked me down in the spring of 2009, just a couple of months after Cassandra joined the ASF.
He was looking for early-stage projects in the big data space, and we had a good talk about NoSQL databases in general and Cassandra in
particular. In retrospect, I wish I had been less timid, but I told him that I thought it was too early to start a company around Cassandra.
What gave me the extra push was when an early Cassandra user went with a different, worse technology because the alternative had commercial
support available.
So about a year after first talking to John, I started DataStax, and John and Lightspeed Venture Partners decided to lead our series A
investment round.

LXF: I talked to Robin Schumacher about the kind of symbiotic relationship between DataStax and the open source Cassandra
community and the positives of such a relationship.
But this relationship has also created some issues, right? With Twitter, for instance, back in 2010?

JE: There are going to be times when the needs of the community and a business are the same, and there will also be times when what the
community sees as its goals are not aligned with what the business wants to achieve.
One of those times was when I stepped back from leading the Apache Cassandra Project Management Committee (PMC) in 2016 to give members of
the community more scope to lead on what they wanted to create.

However, we are keen to stress how committed we are to the community.
At DataStax, we’re increasing our support for Cassandra, continuing to invest in improved drivers and contributing support to get
Cassandra 4.0 production-ready through testing and bug fixes.
Alongside the code and drivers we contribute, we want to make it easier for more developers to get started with Cassandra, so that means
maintaining our documentation and hosting more events to expose people to what Cassandra can achieve and what’s new in the latest
version.

We believe this approach lets the community set the goals for where they want Cassandra to go in the future, while we help the
community achieve those goals for more people.
I think of this not as being the hand on the steering wheel but the fuel in the tank: we are not directing or leading the community where we
think things should go, but helping the community get to its desired destination.

LXF: In that respect, one of the themes of the conference is DataStax re-engaging with the Cassandra community.
But what does that actually mean in quantifiable terms?

JE: Well, I would say the most quantifiable piece is that we have a distribution of Apache Cassandra now.
We’ve always contributed bug fixes from DataStax Enterprise back to Apache Cassandra, but it’s been on a “We’ll do that when it’s
convenient” kind of basis. Now that we actually have a product that is supported open source, our incentives are a lot more aligned
to contribute those back in a more immediate fashion.
So internally we have been setting up the processes to make that happen and make sure we’re not lagging behind in either direction.

LXF: In your keynote you mentioned that eradicating complexities when rolling out a Cassandra cluster was one of the main motivations
behind the announcements at the conference.
But there’s more to operating a Cassandra cluster.
What’s the next problem that you want to solve?

JE: We could talk about Kubernetes, Kafka integration, or DataStax’s new Graph release,
but the theme that ties these together is making DataStax and Cassandra easier to operate and easier to develop against.
When I talk to our customers, almost none of them complain that we’re not powerful enough or fast enough or anything along those lines.
Where we sometimes struggle is making that power available, understandable and consumable.
The next frontier is really about making things easier, and all of those things fall into that category.

LXF: You’ve been kind of firefighting since forever.
People used to say that you were in the right place at the right time, that you had clients from the get-go.
But you’ve also had detractors.
I remember in early discussions on Slashdot people were quick to point out that Twitter doesn’t use Cassandra to save tweets.
And then Facebook tried its best to say it doesn’t use Cassandra.

JE: There’s a kind of nerd politics, right? Like when you have an architect at a company that wants to do things one way, there’s
political capital there.
It’s not “What’s the right technical decision?”; there are egos on the line.
But I think the bigger factor with Facebook or Twitter is that both of these companies have some seriously extensive tools that they’ve
built, and they have to manage tens of thousands of MySQL servers.

So it’s a solid engineering principle.
Facebook likes to say “Move fast and break things”.
But if you go down to the data lair, there are some crusty greybeards there saying “No, if it ain’t broke, don’t fix it.
You stay the f*** away from this.” The engineer in me respects that; I am not looking to go to someone and say, take what you have and
throw it away and build on Cassandra because it’s new and it’s shinier or it’s cooler.
If I’m not solving a problem for you then I’m fine with that: use whatever you want, use what works for you.

I want to go to people for whom MySQL replication is a real pain point and solve their problems, because I can do that more easily than a
MySQL consulting shop.
If you look at some of Facebook’s subsidiaries, Instagram is a big Cassandra user.
I believe it’s their main datastore.
Netflix is another example: they were on Oracle on premises, and they said “We’re going to move to the cloud and we’re going to
adopt a better database technology as we are doing that”, so they went with Cassandra.
The database tent is big enough for lots of families to live in.

LXF: Some Scylla marketing people stood at the shoe shine and were giving out their fluffy soft toys after asking if any of the people
passing were here for the database conference.
I have never seen that – it’s a big enough tent, like you said.
But they’ve been doing that since 2015, right?

JE: Yeah, they actually launched their product at our Cassandra Summit in 2015.
They submitted a talk, “How we accelerated Cassandra”, and we said “Oh, that sounds interesting, come and speak”.
And they said, “Oh, we actually rewrote Cassandra”.

LXF: You’ve been doing conferences based around Cassandra for almost a decade now, since the Cassandra Summit in 2010.
Who is your audience this year?

JE: To quote former Microsoft CEO Steve Ballmer, for us it’s about “developers, developers,
developers.” The community that exists around Cassandra is great, and we want to bring that awareness to a much bigger audience.

We are excited about supporting ApacheCon this year, about the new features we’re contributing to the Cassandra drivers, and about what
will happen when Cassandra 4.0 launches.

We launched our inaugural DataStax Accelerate conference this year to bring the community together and
evangelise about what Cassandra can do, and we’ll be expanding our range of developer events so that more people can share their
experiences and learn more.

You mentioned our Database as a Service in passing earlier.
We’re very excited at how dramatically easy this makes it to run Cassandra.
Cassandra is unquestionably a great database, but it is fair to say that you needed experience in order to make the most out of it. In
response, we’ve automated the hard parts as part of our Constellation cloud platform that’s launching this year, and we’re confident a
lot of people will find this resource valuable.

The growth of the cloud over the past few years has been massive, but people don’t want to be stuck with one single provider.
They want to take advantage of their investments in their own infrastructure, as well as the strengths of the cloud, and get the best of
both worlds.
Cassandra is unique in being able to run across multiple cloud services, or across internal IT and external services, as one seamless
database.

As more people think about the cloud and how to make it work for them, Cassandra will play a big role in running those services at
scale.

LXF: Do you think it’ll be every year?

JE: I think so, yeah.
I think the community needs the event, and DataStax needs the event, so there’s a good alignment there to make that happen on a yearly
basis.

LXF: You’ve resolved the issues you had with the community?

JE: Ah, I mean the issues were never really between the PMC (Project Management Committee) and DataStax; it was more the Apache Board
of Directors.
The short version is that we are on good terms with the PMC, and we’ll leave it at that.

LXF: You’re like the Linus Torvalds of databases.

JE: I’ll take that as a compliment…

This interview was first published in Linux Format issue 256.