Create an account

Very important

  • To access the important data of the forums, you must be active in each forum and especially in the leaks and database leaks section, send data and after sending the data and activity, data and important content will be opened and visible for you.
  • You will only see chat messages from people who are at or below your level.
  • More than 500,000 database leaks and millions of account leaks are waiting for you, so access and view with more activity.
  • Many important data are inactive and inaccessible for you, so open them with activity. (This will be done automatically)


Thread Rating:
  • 207 Vote(s) - 3.5 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Howto avoid cassandra tombstones when inserting NULL values

#1
My problem is that cassandra creates tombstones when inserting NULL values.

From what I understand, cassandra doesn't support NULLs and when NULL is inserted it just deletes the respective column. On one hand this is very space effective, however on the other hand it creates tombstones which degrades read performance.

This goes agains NoSql phillosophy because cassandra is saving space but degrading read performance. In NoSql world the space is cheap, however performance matters. I beleive this is the phillosophy behind saving tables in denormalized form.

I would like cassandra to use the same technique for inserting NULL as for any other value - use timestamping and during compaction preserve the latest entry - even if the entry is NULL (or we can call it "unset").
Is there any tweak in cassandra config or any approach how I would be able to achieve upserts with nulls without having tombstones ?

I came across [this issue][1] however it only allows to ignore NULL values

**My use case**:
I have stream of events, every event identified by causeID. I'm receiving many events with same causeId and I want to store only the latest event for the same causeID (using upsert). The properties of the event may change from NULL to specific value, but also from specific value to NULL. Unfortunatelly the later case generates tombstones and degrades read performance.

**Update**

It seems there is no way how I could avoid tombstones. Could you advice me on techniques how to minimize them (set gc_grace_seconds to very low value). What are the risks, what to do when a node goes down for a longer period than gc_grace_seconds ?

[1]:

[To see links please register here]

Reply

#2
You cannot avoid tombstones if you particularly mention NULL in your INSERT. C* does not do a lookup before insert or writing a data which makes the writes very faster. For this purpose, C* just inserts a tombstone to avoid that value later (taking the latest update comparing the timestamp). If you want to avoid tombstone (which is recommended), you've to prepare different combinations of queries to check each one for NULL before adding it to the INSERT. If you have very few fields to check then it'll be easy to just add some IF-ELSE statements. But if there are lots of them, the code will be bigger and less readable. Shortly, you cannot insert NULL which will impact read performance later.

[To see links please register here]

Reply

#3
You can't insert NULL into Cassandra - it has special meaning there, and lead to creation of tombstones that you observe. If you want to treat NULL as special value, why not to solve this problem on application side - when you get `null` status, just insert any special value that couldn't be used in your table, and when you read data back, check for that special value and output `null` to requester...
Reply

#4
When we want to just insert or update rows using null for values that are not specified, and even though our intention is to leave the value empty, Cassandra represents it as a tombstone causing unnecessary overhead which degrades performance.

**To avoid such tombstones for save operations, cassandra has the concept of unset for a parameter value.**

So you can do the following to unset a field value while saving to avoid tombstone overhead for example related to different cases:

**1). If you are using express-cassandra then :**

const user = new models.instance.User({
user_id: 1235,
user_name: models.datatypes.unset // this will not create tombstone when we want empty user_name or null
});
user.save(function(err){
// user_name value is not set and does not create any unnecessary tombstone overhead
});


**2). If you are writing cassandra raw query then for empty or null field when you know say colC will be null, then don't use it in your query.**

`insert into my_table(id,colA,colB) values(idVal,valA,valB) // Avoid colC`

**3). If you are using Node.Js Driver, you can even pass `undefined` on insert or update which will avoid tombstone overhead. For example**

const query = 'INSERT INTO my_table (id, colC) VALUES (?, ?)';
client.execute(query, [ id, undefined ]);

**4). If you are using c# driver then**

// Prepare once in your application lifetime
var ps = session.Prepare("INSERT INTO my_table (id, colC) VALUES (?, ?)");

// Bind the unset value in a prepared statement
session.Execute(ps.Bind(id, Unset.Value));

For more detail on **express-cassandra** read the sub topic **Null and unset values** of

[To see links please register here]


For more detail on **Node.js driver unset feature** refer datastax

[To see links please register here]


For more detail on **Csharp driver unset feature** refer datastax

[To see links please register here]


**NOTE: I tested this on Node.js cassandra 4.0 But unset feature is introduced after cassandra 2.2**

Hope this will help you or somebody else.

Thanks!
Reply

#5
I don't think the other answers address the original question, which is how to overwrite a non-null value in Cassandra with null *without* creating a tombstone. The nearest is Alex Ott's [suggestion to use some special value other than null][1].

However, with a little bit of trickery you *can* insert an explicit null into Cassandra by exploiting a FROZEN tuple or user-defined type. The FROZEN keyword effectively serialises the user defined type and stores the serialised representation in the column. Crucially, the serialised representation of a UDT containing null values is not itself null.

> CREATE TYPE test_type(value INT);
> CREATE TABLE test(pk INT, cl INT, data FROZEN<test_type>, PRIMARY KEY (pk, cl));
> INSERT INTO test (pk, cl, data) VALUES (0, 0, {value: 15});
> INSERT INTO test (pk, cl, data) VALUES (0, 0, {value: null});
> INSERT INTO test (pk, cl) VALUES (0, 1);
> SELECT * FROM test;

pk | cl | data
----+----+---------------
0 | 0 | {value: null}
0 | 1 | null

(2 rows)

Here we wrote 15, then overwrote it with null, and finally added a second row to demonstrate that there is a difference between an unset cell and a cell containing a frozen UDT that itself contains null.

Of course the downside of this approach is that in your application you have to delve into the UDT for the actual value.

On the other hand, if you combine several columns into the UDT you do save a little overhead in Cassandra. (But you can't then read or write them individually. You also can't remove fields, though you can add new ones.)

[1]:

[To see links please register here]

Reply



Forum Jump:


Users browsing this thread:
1 Guest(s)

©0Day  2016 - 2023 | All Rights Reserved.  Made with    for the community. Connected through