Update for the last month

Sorry for the lack of updates. I was at the end of a project march and then off on paternity leave, and I am hoping to resume regular posting soon. The project was very successful: we had a big push and brought another 120 tables online in our new Cassandra cluster, migrating their data from SQL Server. Along the way it gave us a few fun design challenges.

Initially we were working around some limitations of keys in Cassandra. In SQL Server you will often query on a column that may be null. In Cassandra, no column in the primary key can be null, so a nullable column can't be part of the key. And because Cassandra doesn't allow ad hoc queries, you can't query on a column that isn't part of the key. The obvious workaround, and the one we started with, is a secondary index. However, DataStax will tell you that, in general, you shouldn't use them. In production we have had repeated problems with them: they often seem to get corrupted or fall out of sync with their tables, so we end up having to run a repair on the index to get correct data. As a result, we are moving away from secondary indexes entirely. This created some interesting data design problems, but I think we will end up with a much more resilient system.
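One common alternative to a secondary index is a second, query-specific table whose partition key is the column you want to query on, with a sentinel value stored in place of null. The sketch below is a minimal, hypothetical illustration of that idea and not necessarily what we ended up building: plain Java maps stand in for the two Cassandra tables, and `Order`, `region`, `NO_REGION`, and the method names are all made up for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical example: plain Java maps stand in for two Cassandra tables.
public class QueryTableSketch {
    // Assumed sentinel stored in place of null, because primary key
    // columns in Cassandra cannot be null.
    static final String NO_REGION = "__none__";

    record Order(String orderId, String region) {}

    // "Base table": partition key = orderId.
    private final Map<String, Order> ordersById = new HashMap<>();

    // "Query table": partition key = region (never null), clustering key = orderId.
    // A TreeSet keeps order IDs sorted, like rows ordered by a clustering column.
    private final Map<String, SortedSet<String>> ordersByRegion = new HashMap<>();

    static String regionKey(String region) {
        return region == null ? NO_REGION : region;
    }

    // Writes go to both tables; the application, not an index, keeps them in sync.
    void save(Order order) {
        Order previous = ordersById.put(order.orderId(), order);
        if (previous != null) {
            // Remove the stale entry if the order moved to a different region.
            SortedSet<String> stale = ordersByRegion.get(regionKey(previous.region()));
            if (stale != null) {
                stale.remove(order.orderId());
            }
        }
        ordersByRegion
            .computeIfAbsent(regionKey(order.region()), k -> new TreeSet<>())
            .add(order.orderId());
    }

    // Reads hit a single "partition" of the query table; a null region is allowed.
    SortedSet<String> findByRegion(String region) {
        return ordersByRegion.getOrDefault(regionKey(region), new TreeSet<>());
    }

    public static void main(String[] args) {
        QueryTableSketch tables = new QueryTableSketch();
        tables.save(new Order("o1", null));
        tables.save(new Order("o2", "west"));
        System.out.println(tables.findByRegion(null));   // [o1]
        System.out.println(tables.findByRegion("west")); // [o2]
    }
}
```

The trade-off is that every write touches two tables, so the application owns consistency instead of an index; against a real cluster the two writes would typically go in a logged batch. The sentinel also has to be a value that can never appear in real data.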

Recap for 2015

At the start of the year I posted my Themes for 2015. Now seems like a good time to look back at what I was thinking then and see how my year turned out. Setting out things you want to accomplish is fairly pointless if you never stop and assess what you actually did, so this post is a way of holding myself accountable for how the year played out.

Cassandra Days in Dallas 2015

I may have mentioned this before, but I love going to software conferences. When I got the email announcing that Cassandra Days was coming to Dallas as a free one-day conference on all things Cassandra, I signed up immediately. The event was sponsored by DataStax, which sells a commercial version of Cassandra called DataStax Enterprise. There were two tracks: an introductory track for people who are exploring Cassandra but haven't yet taken it to production, and a deeper-dive track for people with Cassandra experience.

Spring Boot for prototyping

I am on a new project at work that looks very interesting: I am redesigning our Cassandra layer. Our current layer was beautifully designed and implemented by our former architect. It makes Cassandra tables look just like JPA entities, and our Cassandra repositories look just like Spring Data JPA repositories. After it was in place, we discovered the Spring Data Cassandra project. We went to a talk on Spring Data Cassandra, and it turns out they had implemented pretty much the same system our architect built.

Cassandra Data Modeling

I had to miss the JHipster webinar last week because my company invited me to attend the DataStax DS220: Data Modeling with DataStax Enterprise class on Monday and Tuesday. The class was taught onsite, and the instructor, Andrew Lenards, did a great job.

I have been using Cassandra for a little while, but I hadn't done anything serious with it. The CQL query language is both a blessing and a curse. On the upside, it is immediately familiar, so anyone who has done SQL work can get comfortable creating tables and executing queries quickly. On the downside, it abstracts away some details of the underlying data store, and at a certain point you need to understand what is going on under the hood to get good performance. This class gave us that understanding.

It starts out by presenting a data model like one you might see in a relational database, then works through the ways you might model that data in Cassandra and the trade-offs of each model (which questions you can ask, which fields are required to ask them, and so on). The biggest thing I was missing before the class was the distinction between partitions and rows, and between the partition key and the clustering columns. I had been using Cassandra like a SQL database, so my partitions always had at most one row. We spent a lot of time looking at what happens if you instead model the data so that partitions hold many rows, and at the advantages and disadvantages of doing so.

On day two we went deep into the technical details of what happens under the hood: how data is stored on disk and how to do things like estimate partition sizes. We were also able to ask a lot of questions specific to how we use Cassandra in our organization and what the limitations will be as we expand its usage to more areas of our product.
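To make "estimating partition sizes" a little more concrete, here is a small sketch of the cell-count formula given in the Apache Cassandra data modeling documentation: Nv = Nr * (Nc - Npk - Ns) + Ns, where Nr is the number of rows in the partition, Nc the total number of columns, Npk the primary key columns (partition key plus clustering columns), and Ns the static columns. I can't promise this is exactly how the class presented it, and the table shape in `main` is hypothetical.

```java
// Sketch of the partition "number of values" estimate:
//   Nv = Nr * (Nc - Npk - Ns) + Ns
public class PartitionSizeEstimate {

    /**
     * @param rows              Nr: expected rows in one partition
     * @param totalColumns      Nc: all columns in the table
     * @param primaryKeyColumns Npk: partition key + clustering columns
     * @param staticColumns     Ns: static columns, stored once per partition
     * @return Nv: estimated number of values (cells) in the partition
     */
    static long valuesPerPartition(long rows, int totalColumns,
                                   int primaryKeyColumns, int staticColumns) {
        if (rows < 0 || primaryKeyColumns < 1 || staticColumns < 0
                || primaryKeyColumns + staticColumns > totalColumns) {
            throw new IllegalArgumentException("inconsistent table shape");
        }
        // Primary key columns are not counted as values, and static columns
        // are counted once per partition rather than once per row.
        long perRowValues = totalColumns - primaryKeyColumns - staticColumns;
        return rows * perRowValues + staticColumns;
    }

    public static void main(String[] args) {
        // Hypothetical table: 1 partition key, 1 clustering column,
        // 1 static column and 2 regular columns (5 columns total).
        System.out.println(valuesPerPartition(100, 5, 2, 1)); // 201
        // Same shape with one row per partition, SQL-style.
        System.out.println(valuesPerPartition(1, 5, 2, 1));   // 3
    }
}
```

The comparison shows why partition design matters: moving from one row per partition to many rows multiplies the values a single partition holds. The same documentation also gives a fuller formula for partition size in bytes; this sketch only counts values.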