Create an account

Very important

  • To access the important data of the forums, you must be active in each forum and especially in the leaks and database leaks section, send data and after sending the data and activity, data and important content will be opened and visible for you.
  • You will only see chat messages from people who are at or below your level.
  • More than 500,000 database leaks and millions of account leaks are waiting for you, so access and view with more activity.
  • Many important data are inactive and inaccessible for you, so open them with activity. (This will be done automatically)


Thread Rating:
  • 252 Vote(s) - 3.58 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Can PTEST be used to test if two registers are both zero or some other condition?

#1
What can you do with [SSE4.1 `ptest`][1] other than testing if a single register is all-zero?

Can you use a combination of SF and CF to test anything useful about two unknown input registers?

What is PTEST good for? You'd think it would be good for checking the result of a packed-compare (like PCMPEQD or CMPPS), but at least on Intel CPUs, [it costs more uops to compare-and-branch using PTEST + JCC than with PMOVMSK(B/PS/PD) + macro-fused CMP+JCC.][2]

See also

[To see links please register here]



[1]:

[To see links please register here]

[2]:

[To see links please register here]

Reply

#2
No, unless I'm missing something clever, `ptest` with two unknown registers is generally not useful for checking some property about both of them. (Other than obvious stuff you'd already want a bitwise-AND for, like intersection between two bitmaps).

To test two registers for both being all-zero, OR them together and PTEST that against itself.

----

`ptest xmm0, xmm1` produces two results:

* ZF = is `xmm0 & xmm1` all-zero?
* CF = is `(~xmm0) & xmm1` all-zero?

**If the second vector is all-zero, the flags don't depend at all on the bits in the first vector.**

It may be useful to think of the "is-all-zero" checks as a `NOT(bitwise horizontal-OR())` of the AND and ANDNOT results. But probably not, because that's too many steps for my brain to think through easily. That sequence of vertical-AND and then horizontal-OR does maybe make it easier to understand why PTEST doesn't tell you much about a combination of two unknown registers, just like the integer TEST instruction.

Here's a truth table for a 2-bit `ptest a,mask`. Hopefully this helps in thinking about mixes of zeros and ones with 128b inputs.

Note that `CF(a,mask) == ZF(~a,mask)`.

a mask ZF CF
00 00 1 1
01 00 1 1
10 00 1 1
11 00 1 1

00 01 1 0
01 01 0 1
10 01 1 0
11 01 0 1

00 10 1 0
01 10 1 0
10 10 0 1
11 10 0 1

00 11 1 0
01 11 0 0
10 11 0 0
11 11 0 1

---

[Intel's intrinsics guide lists 2 interesting intrinsics for it][1]. Note the naming of the args: `a` and `mask` are a clue that they tell you about the parts of `a` selected by a known AND-mask.

* `_mm_test_mix_ones_zeros (__m128i a, __m128i mask)`: returns `(ZF == 0 && CF == 0)`
* `_mm_test_all_zeros (__m128i a, __m128i mask)`: returns `ZF`

There's also the more simply-named versions:

* `int _mm_testc_si128 (__m128i a, __m128i b)`: returns `CF`
* `int _mm_testnzc_si128 (__m128i a, __m128i b)`: returns `(ZF == 0 && CF == 0)`
* `int _mm_testz_si128 (__m128i a, __m128i b)`: returns `ZF`

There are AVX2 `__m256i` versions of those intrinsics, but the guide only lists the all_zeros and mix_ones_zeros alternate-name versions for `__m128i` operands.

If you want to test some other condition from C or C++, you should use `testc` and `testz` with the same operands, and hope that your compiler realizes that it only needs to do one PTEST, and hopefully even use a single JCC, SETCC, or CMOVCC to implement your logic. (I'd recommend checking the asm, at least for the compiler you care about most.)

----

Note that `_mm_testz_si128(v, set1(0xff))` is always the same as `_mm_testz_si128(v,v)`, because that's how AND works. But that's not true for the CF result.

**You can check for a vector being all-ones** using

bool is_all_ones = _mm_testc_si128(v, _mm_set1_epi8(0xff));

This is probably no faster, but smaller code-size, than a PCMPEQB against a vector of all-ones, then the usual movemask + cmp. It doesn't avoid the need for a vector constant.

PTEST does have the advantage that it doesn't destroy either input operand, even without AVX.

[1]:

[To see links please register here]

Reply



Forum Jump:


Users browsing this thread:
2 Guest(s)

©0Day  2016 - 2023 | All Rights Reserved.  Made with    for the community. Connected through