The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra’s support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages. Cassandra’s data model offers the convenience of column indexes with the performance of log-structured updates, strong support for denormalization and materialized views, and powerful built-in caching.
Apache Cassandra is available here.
Discord continues to grow faster than we expected and so does our user-generated content. With more users comes more chat messages. In July, we announced 40 million messages a day, in December we announced 100 million, and as of this blog post we are well past 120 million. We decided early on to store all chat history forever so users can come back at any time and have their data available on any device. This is a lot of data that is ever increasing in velocity and size, and it must remain available. How do we do it? Cassandra!
If you are Uber and you need to store the location data that is sent out every 30 seconds by both driver and rider apps, what do you do? That’s a lot of real-time data that needs to be used in real-time. Uber’s solution is comprehensive. They built their own system that runs Cassandra on top of Mesos. It’s all explained in a good talk by Abhishek Verma, Software Engineer at Uber: Cassandra on Mesos Across Multiple Datacenters at Uber.
Apache Cassandra, hands down. Why do I say this? Whenever we're looking at a choice in distributed databases, we need to consider the type of distributed problems they solve. Apache Cassandra prioritizes Availability (continuous availability) and Partition tolerance to achieve zero downtime. It also allows for tunable consistency as part of a trade-off, which means that we can get stronger consistency on our data if we pay a performance price. If we choose eventual consistency, we retain the high performance Cassandra is known for, and while writes or reads may lag on the freshest update, this is only by a matter of milliseconds. This is okay for our purposes because we can plan for this in our application, whereas sluggish operations would be crippling.
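The tunable-consistency trade-off described above can be made concrete with a little arithmetic. The sketch below is not Cassandra driver code; it is a minimal illustration, assuming a single datacenter and a fixed replication factor, of the commonly cited rule that reads see the latest acknowledged write when the number of replicas read plus the number written exceeds the replication factor. The function names are hypothetical; the level names (ONE, TWO, THREE, QUORUM, ALL) match Cassandra's consistency levels.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must acknowledge an operation at a given level.

    Simplified to a single datacenter; multi-datacenter levels are not modeled.
    """
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return levels[level]


def reads_see_latest_write(write_level: str, read_level: str,
                           replication_factor: int) -> bool:
    """True when read and write replica sets must overlap (R + W > RF)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    rf = 3
    for write, read in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
        print(write, read, reads_see_latest_write(write, read, rf))
```

With a replication factor of 3, writing and reading at ONE touches too few replicas to guarantee overlap (the eventual-consistency case), while QUORUM on both sides guarantees it at the cost of waiting on more replicas per operation.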
For now, though, Cassandra stands at the front of the NoSQL pack when it comes to supporting real-time, interactive (non-analytics) Big Data applications.
Companies running their applications on Apache Cassandra have realized benefits that have directly improved their business. Cassandra is capable of handling the big data challenges that might arise: massive scalability, an always-on architecture, high performance, strong security, and ease of management, to name a few. Learn how businesses have successfully deployed Apache Cassandra in their environments across various types of applications and use cases.
1. Open Source
2. Peer to Peer Architecture
3. Elastic Scalability
4. High Availability and Fault Tolerance
5. High Performance
6. Column Oriented
7. Tunable Consistency
8. Schema-Free
“Real-time insights is number one,” the executive says. “Doing what you have traditionally done in a transactional relational data or perhaps on a Hadoop system and actually moving that into a real-time framework where we're talking about data coming in and being available…in seconds rather than in minutes or in hours.”
Scalability, scalability, scalability … I like it. Does Cassandra allow me to store my data on different servers (without a SAN)? I am not talking here about replication; I mean a single NoSQL server spread across multiple physical servers.
It's no secret that big data is driving the world to turn to databases well-equipped to handle the velocity, variety, and volume of modern applications. When we look at the world's most popular databases, and how that popularity has shifted over time, this is apparent .... In sum, we're likely to see Oracle atop the database rankings for several years, but we're also going to see MongoDB and Cassandra rapidly gain on the top-three database leaders as enterprises turn to NoSQL to tame big data and rally around winners.
When I ran infrastructure, I wanted the no-brainer. I say this all the time: the database should be the most boring thing in your datacenter. Is it scaling? Yep. Is it online? Yep. Boring.