Create an account

Very important

  • To access the important data of the forums, you must be active in each forum and especially in the leaks and database leaks section, send data and after sending the data and activity, data and important content will be opened and visible for you.
  • You will only see chat messages from people who are at or below your level.
  • More than 500,000 database leaks and millions of account leaks are waiting for you, so access and view with more activity.
  • Many important data are inactive and inaccessible for you, so open them with activity. (This will be done automatically)


Thread Rating:
  • 409 Vote(s) - 3.58 Average
  • 1
  • 2
  • 3
  • 4
  • 5
CQL with a wide row - how to get most recent set?

#1
How would I write the CQL to get the most recent set of data from each row?

I'm investigating transitioning from MSSQL to Cassandra and am starting to grasp the concepts. Lots of research has help tremendously, but I haven't found answer to this (I know there must be a way):

CREATE TABLE WideData {
ID text,
Updated timestamp,
Title text,
ReportData text,
PRIMARY KEY (ID, Updated)
} WITH CLUSTERING ORDER (Updated DESC)

INSERT INTO WideData (ID, Updated, Title, ReportData) VALUES ('aaa', NOW, 'Title', 'Blah blah blah blah')
INSERT INTO WideData (ID, Updated, Title, ReportData) VALUES ('bbb', NOW, 'Title', 'Blah blah blah blah')

wait 1 minute:

INSERT INTO WideData (ID, Updated, Title, ReportData) VALUES ('bbb', NOW, 'Title 2', 'Blah blah blah blah')

wait 3 minutes:

INSERT INTO WideData (ID, Updated, Title, ReportData) VALUES ('aaa', NOW, 'Title 2', 'Blah blah blah blah')


wait 5 minutes:

INSERT INTO WideData (ID, Updated, Title, ReportData) VALUES ('aaa', NOW, 'Title 3', 'Blah blah blah blah')

How would I write the CQL to get the most recent set of data from each row?

SELECT ID, Title FROM WideRow - gives me 5 rows, as it pivots the data for me.

Essentially I want the results for (SELECT ID, Title FROM WideRow WHERE .....) to be:

ID Title
aaa, Title3
bbb, Title2

Also, is there a way to get a count of the number of data sets in a wide row?

Essentially the equivalent of TSQL: SELECT ID, Count(*) FROM Table GROUP BY ID

ID Count
aaa 3
bbb 2

Thanks

Also, any references to learn more about these types of queries would also be appreciated.
Reply

#2
With your current data model, you can only query the most-recent row by partition key. In your case, that is `ID`.

SELECT ID, Title FROM WideData WHERE ID='aaa' LIMIT 1

Since you have indicated your clustering order on `Updated` in DESCending order, the row with the most-recent `Updated` timestamp will be returned first.

Given your desired results, I'll go ahead and assume that you do not want to query each partition key individually. Cassandra only maintains CQL result set order by partition key. Also Cassandra does not support aggregation. So there really is no way to get the "most recent" for all of your `ID`s together at once, nor is there a way to get a report of how many updates each `ID` has.

With Cassandra data modeling, you need to build your tables to suit your queries. Query "planning" is not really a strong point of Cassandra (as you are finding out). To get the most-recent updates by `ID`, you would need to build an additional query table designed to store only the most-recent update for each ID. Likewise, to get the count of updates for each `ID` you could create an additonal query table using [counter coulmns][1] to suit that query.

**tl;dr**

In Cassandra, denormalization and redundant data storage is the key. For some applications, you might have one table for each query you need to support...and that's ok.

[1]:

[To see links please register here]

Reply



Forum Jump:


Users browsing this thread:
1 Guest(s)

©0Day  2016 - 2023 | All Rights Reserved.  Made with    for the community. Connected through