How can I calculate the RCUs and WCUs from Cassandra for an AWS Keyspaces cost estimation?

#1
In order to consider Amazon Keyspaces as an alternative to an on-prem Cassandra cluster, I'd like to do a cost estimation. However, Keyspaces pricing is based on read request units (RRUs) and write request units (WRUs) in on-demand mode, and read/write capacity units (RCUs/WCUs) in provisioned mode.

From the Amazon Keyspaces pricing page:

> Each RRU provides enough capacity to read up to 4 KB of data with LOCAL_QUORUM consistency.
> Each WRU provides enough capacity to write up to 1 KB of data per row with LOCAL_QUORUM consistency.

What metrics in Cassandra can be used for calculating the RCUs and WCUs for an existing cluster?

#2
Currently we are storing iostat information (sampled every second). Based on that information we were able to come up with approximate read and write capacity figures (±10% error margin at a 95% confidence level).

We are going to cross check our numbers with the AWS folks soon.

Example:


```
Device:  rrqm/s  wrqm/s   r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
abc        0.00    0.00  1.00  0.00   0.03   0.00     64.00      0.00   0.00     0.00     0.00   0.00   0.00
```
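
Roughly, the aggregation over those per-second samples looks like this (a minimal Python sketch; the log file name and the device name `abc` are placeholders):

```
# Sketch: average the r/s and w/s columns from stored `iostat -x` samples
# for a single device. The log path and device name "abc" are placeholders.
def average_iops(path, device):
    reads = writes = samples = 0.0
    with open(path) as f:
        for line in f:
            cols = line.split()
            if len(cols) == 14 and cols[0] == device:
                reads += float(cols[3])   # r/s column
                writes += float(cols[4])  # w/s column
                samples += 1
    if not samples:
        raise ValueError(f"no samples found for device {device}")
    return reads / samples, writes / samples

avg_r, avg_w = average_iops("iostat.log", "abc")
print(f"avg reads/s: {avg_r:.2f}, avg writes/s: {avg_w:.2f}")
```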

We use the following calculation: 10,000 writes per second, of up to 1 KB each, in the us-east-1 region, will cost:

Write cost:
On-demand capacity mode ($1.45 per million WRUs)
= $1.45 * 0.01 * 60 * 60 * 24 * 365 = $457,272 per year

Provisioned capacity mode ($0.00075 per WCU-hour, with 10,000 WCUs provisioned around the clock)
= $0.00075 * 10,000 * 24 * 365 = $65,700 per year
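
The same back-of-the-envelope numbers in Python (a sketch using the us-east-1 prices quoted above; verify current prices on the pricing page):

```
SECONDS_PER_YEAR = 60 * 60 * 24 * 365
writes_per_sec = 10_000

# On-demand: $1.45 per million write request units (WRUs),
# one WRU per row write of up to 1 KB.
on_demand = 1.45 * (writes_per_sec / 1_000_000) * SECONDS_PER_YEAR
print(f"on-demand:   ${on_demand:,.0f} per year")    # ~$457,272

# Provisioned: $0.00075 per WCU-hour; sustaining 10,000 writes/s
# of up to 1 KB needs 10,000 WCUs provisioned around the clock.
provisioned = 0.00075 * writes_per_sec * 24 * 365
print(f"provisioned: ${provisioned:,.0f} per year")  # ~$65,700
```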





Update: the AWS folks are calculating based on table partition size, which is wrong IMO.

#3
Some accuracy can be lost by using IOPS. Cassandra has a lot of IOPS overhead: on reads it may touch multiple SSTables, and it also performs background compaction and repair, both of which consume IOPS. None of that is a factor in Amazon Keyspaces. Additionally, Keyspaces scales up and down based on utilization, so taking the average at a single point in time only gives you one dimension of cost. You need an average that represents a long period of time to cover the peaks and valleys of your workload; workloads tend to look like sine or cosine waves rather than a flat line.

Gathering the following metrics will help provide more accurate cost estimates.

* Results of the average row size report (below)
* Table live space in GBs divided by replication factor
* Average writes per second over extended period
* Average reads per second over extended period



# Storage size

Table live space in GBs

This method uses Apache Cassandra sizing statistics to determine the data size in Amazon Keyspaces. Apache Cassandra exposes storage metrics via Java Management Extensions (JMX). You can capture these metrics by using third-party monitoring tools such as DataStax OpsCenter, Datadog, or Grafana.
Capture the table live space from the cassandra.live_disk_space_used metric and divide it by the replication factor of your data (most likely 3) to get an estimate of the Keyspaces storage size. Keyspaces automatically replicates data three times across multiple AWS Availability Zones, but pricing is based on the size of a single replica.

For example, if the table live space is 5 TB and the replication factor is 3, for us-east-1 you would use the following formula:

```
(table live space in GB / replication factor) * region storage price per GB-month

5000 GB / 3 * $0.30 = $500 per month
```
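
The same formula as a small Python helper (a sketch; the $0.30 per GB-month price is the us-east-1 figure used above):

```
def keyspaces_storage_cost(live_space_gb, replication_factor=3,
                           price_per_gb_month=0.30):
    """Monthly storage estimate: Keyspaces bills a single replica,
    so divide Cassandra live space by the cluster replication factor."""
    return live_space_gb / replication_factor * price_per_gb_month

print(keyspaces_storage_cost(5000))  # 5 TB live space, RF 3 -> 500.0
```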

# Collect the Row Size

Results of the Average row size report

Use the following script to collect row size metrics for your tables. The script exports table data from Apache Cassandra by using cqlsh and then uses awk to calculate the min, max, average, and standard deviation of row size over a configurable sample set of table data. Update the username, password, keyspace name, and table name placeholders with your cluster and table information. You can use dev and test environments if they contain similar data.



```
./row-size-sampler.sh YOURHOST 9042 -u "sampleuser" -p "samplepass"
```
The output will be used in the request-unit calculations below. If your model uses large blobs, divide the average size by 2, because cqlsh returns blobs as a hexadecimal character representation (two characters per byte), which roughly doubles the apparent row size.
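
If you post-process the sampler output yourself, that blob adjustment can be expressed as follows (a sketch; the 1800-byte input is a made-up example):

```
def adjusted_row_size(avg_sampled_bytes, has_large_blobs=False):
    # cqlsh prints blob columns as hex text, two characters per byte,
    # so the sampled size is roughly double the real row size.
    return avg_sampled_bytes / 2 if has_large_blobs else avg_sampled_bytes

print(adjusted_row_size(1800, has_large_blobs=True))  # -> 900.0
```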

# Read/write request metrics

Average writes per second/Total writes per month
Average reads per second/Total reads per month

Capturing the read and write request rate of your tables will help determine capacity and scaling requirements for your Amazon Keyspaces tables.
Keyspaces is serverless, and you pay for only what you use. The price of Keyspaces read/write throughput is based on the number and size of requests.

To gather the most accurate utilization metrics from your existing Cassandra cluster, you will capture the average requests per second (RPS) for coordinator-level read and write operations. Take an average over an extended period of time for a table to capture peaks and valleys of workload.

average read requests per second over two weeks = 200 reads per second
average write requests per second over two weeks = 100 writes per second
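
One way to derive those averages is from the cumulative coordinator read/write counts Cassandra exposes (for example via JMX or nodetool tablestats): sample the counters at the start and end of the window and divide the delta by the elapsed seconds. A sketch with hypothetical counter values:

```
# Hypothetical counter samples taken two weeks apart from a table's
# coordinator-level read/write counts.
ELAPSED_SECONDS = 14 * 24 * 60 * 60        # two weeks

reads_start, reads_end = 2_000_000, 2_000_000 + 200 * ELAPSED_SECONDS
writes_start, writes_end = 1_000_000, 1_000_000 + 100 * ELAPSED_SECONDS

avg_reads_per_sec = (reads_end - reads_start) / ELAPSED_SECONDS
avg_writes_per_sec = (writes_end - writes_start) / ELAPSED_SECONDS
print(avg_reads_per_sec, avg_writes_per_sec)  # 200.0 100.0
```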


#### LOCAL_QUORUM READS
```
=READ REQUESTS PER SEC * ROUNDUP(ROW SIZE bytes / 4096) * RCU per hour price * HOURS PER DAY * DAYS PER MONTH

200 * ROUNDUP(900 / 4096) * 0.00015 * 24 * 30.41 ≈ $22 per month
```

#### LOCAL_ONE READS
Using eventual consistency (LOCAL_ONE) reads can save you half the cost of your read workload, because each read of up to 4 KB consumes only half a capacity unit.
```
=READ REQUESTS PER SEC * ROUNDUP(ROW SIZE bytes / 4096) * 0.5 * RCU per hour price * HOURS PER DAY * DAYS PER MONTH

200 * ROUNDUP(900 / 4096) * 0.5 * 0.00015 * 24 * 30.41 ≈ $11 per month
```
#### LOCAL_QUORUM WRITES

```
=WRITE REQUESTS PER SEC * ROUNDUP(ROW SIZE bytes / 1024) * WCU per hour price * HOURS PER DAY * DAYS PER MONTH

100 * ROUNDUP(900 / 1024) * 0.00075 * 24 * 30.41 ≈ $55 per month
```
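
Putting the three request-unit formulas together (a Python sketch using the provisioned us-east-1 unit prices from the examples above; LOCAL_ONE is modeled as half a capacity unit per read, per the halving described earlier):

```
import math

HOURS_PER_MONTH = 24 * 30.41
RCU_PRICE, WCU_PRICE = 0.00015, 0.00075  # us-east-1 provisioned, per unit-hour

def monthly_read_cost(reads_per_sec, row_bytes, consistency="LOCAL_QUORUM"):
    units = math.ceil(row_bytes / 4096)        # 4 KB per RCU at LOCAL_QUORUM
    if consistency == "LOCAL_ONE":
        units *= 0.5                           # eventual consistency: half the units
    return reads_per_sec * units * RCU_PRICE * HOURS_PER_MONTH

def monthly_write_cost(writes_per_sec, row_bytes):
    return writes_per_sec * math.ceil(row_bytes / 1024) * WCU_PRICE * HOURS_PER_MONTH

print(monthly_read_cost(200, 900))               # ~ $21.90
print(monthly_read_cost(200, 900, "LOCAL_ONE"))  # ~ $10.95
print(monthly_write_cost(100, 900))              # ~ $54.74
```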


Storage: $500 per month
Eventually consistent reads: $11 per month
Writes: $55 per month

Total: about $566 per month

To further reduce cost, I may look at using client-side compression on writes for large blob data, or, if I have many small rows, use collections to fit more data into a single row.
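
As an illustration of the client-side compression idea, here is a minimal sketch using Python's standard zlib module (the payload is a made-up repetitive blob; the client must remember to decompress on the read path):

```
import zlib

payload = b'{"sensor": 42, "readings": [1, 1, 1, 1]}' * 50  # repetitive blob
compressed = zlib.compress(payload, 6)

# Store `compressed` in the blob column instead of the raw payload;
# fewer KB per row means fewer WCUs/RCUs per request.
print(len(payload), "->", len(compressed), "bytes")

assert zlib.decompress(compressed) == payload  # client-side read path
```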

Check out the Amazon Keyspaces pricing page for the most up-to-date information.


