Allow to enable half and bfloat16 at the same time #1827

yhmtsai · 2025-04-15T11:17:02Z

This PR allows enabling bfloat16 and half precision at the same time.
An operation between bfloat16 and half precision returns float.
It reverts the gko::float16 alias change from #1825, so float16 will be half precision again.
Moreover, it adds next_precision_move<type, move> to represent next_precision<next_precision<...>>, and the same for previous_precision_move.
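The float result for mixed half/bfloat16 operations can be motivated by the formats: IEEE half has 5 exponent and 10 fraction bits, bfloat16 has 8 exponent and 7 fraction bits, so neither can represent all values of the other, while float (8 exponent, 23 fraction bits) represents both exactly. A minimal sketch of such a promotion rule, assuming hypothetical stand-ins `half_t` and `bfloat16_t` that simply store a float (these and `mixed_result` are invented for illustration and are not Ginkgo's types):

```cpp
#include <type_traits>

// Hypothetical stand-ins for gko::half and gko::bfloat16; they only store
// the value as float so that the example stays self-contained.
struct half_t {
    float value;
};
struct bfloat16_t {
    float value;
};

// Hypothetical result-type trait: mixing the two 16-bit formats yields float,
// the smallest standard type that represents every value of both exactly.
template <typename T, typename U>
struct mixed_result;

template <>
struct mixed_result<half_t, bfloat16_t> {
    using type = float;
};
template <>
struct mixed_result<bfloat16_t, half_t> {
    using type = float;
};

template <typename T, typename U>
using mixed_result_t = typename mixed_result<T, U>::type;

// Mixed addition computes in, and returns, the promoted type.
constexpr mixed_result_t<half_t, bfloat16_t> operator+(half_t a, bfloat16_t b)
{
    return a.value + b.value;
}
constexpr mixed_result_t<bfloat16_t, half_t> operator+(bfloat16_t a, half_t b)
{
    return a.value + b.value;
}
```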

Likely in another PR:

convert_to just copies the same implementation for different interfaces. Create a templated convert_to in the private section and have every convert_to call that templated function.
The instantiation macros, to which we might apply the same technique I used in batch, i.e., a nested structure.
Allow run<> to accept a type_list and create a list for that.
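The first item, a single private template behind the public convert_to overloads, could look roughly like this. `Vector`, `convert_impl`, and the two overloads are hypothetical simplifications; Ginkgo's real convert_to functions belong to the ConvertibleTo interface and do considerably more work.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical simplified class; not Ginkgo's actual matrix types.
template <typename ValueType>
class Vector {
public:
    explicit Vector(std::vector<ValueType> data) : data_(std::move(data)) {}

    // One public overload per supported target, as the interface demands.
    // Each is a one-line forward, so the conversion logic exists only once.
    void convert_to(Vector<float>* result) const { convert_impl(result); }
    void convert_to(Vector<double>* result) const { convert_impl(result); }

    const std::vector<ValueType>& data() const { return data_; }

private:
    // Every Vector<T> may access the private members of every other Vector<U>.
    template <typename OtherType>
    friend class Vector;

    // Shared templated implementation: element-wise conversion via assign().
    template <typename OtherValueType>
    void convert_impl(Vector<OtherValueType>* result) const
    {
        result->data_.assign(data_.begin(), data_.end());
    }

    std::vector<ValueType> data_;
};

// Usage: convert a double vector to float through the public interface.
inline void convert_example()
{
    Vector<double> source({1.5, 2.25});
    Vector<float> target(std::vector<float>{});
    source.convert_to(&target);
    assert(target.data().size() == 2);
    assert(target.data()[0] == 1.5f && target.data()[1] == 2.25f);
}
```

Adding a further target precision then only requires one more forwarding overload.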

MarcelKoch

Some first notes; I still have to go through the rest of the PR. However, I think these are already the most relevant comments.

MarcelKoch · 2025-04-15T14:03:06Z

include/ginkgo/core/base/math.hpp


 template <typename T>
-struct next_precision_impl {};
+struct next_precision_impl {


Maybe combine this with the move version as

template<typename T, size_t step = 1> struct next_precision_impl{...};

Then we don't need both next_precision and next_precision_move. For an example, see: https://godbolt.org/z/dPK7ccxoM

Good to know about tuple_element_t. I will change it to your version.
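A minimal sketch of the suggested single trait with a defaulted step, using std::tuple_element_t as discussed. The linked godbolt example is not reproduced here; `half_t`, `bfloat16_t`, `index_of`, and the cycle order in `precision_list` are assumptions made for this illustration, not Ginkgo's definitions.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>

// Hypothetical stand-ins for gko::half and gko::bfloat16.
struct half_t {};
struct bfloat16_t {};

// Assumed cycle order, for illustration only.
using precision_list = std::tuple<double, float, half_t, bfloat16_t>;

// Position of T in a std::tuple of types; an absent T fails to compile.
template <typename T, typename List>
struct index_of;

template <typename T, typename... Rest>
struct index_of<T, std::tuple<T, Rest...>>
    : std::integral_constant<std::size_t, 0> {};

template <typename T, typename First, typename... Rest>
struct index_of<T, std::tuple<First, Rest...>>
    : std::integral_constant<std::size_t,
                             1 + index_of<T, std::tuple<Rest...>>::value> {};

// One trait for any number of steps; step = 1 is the plain next precision.
template <typename T, std::size_t step = 1>
struct next_precision_impl {
    static constexpr std::size_t size = std::tuple_size<precision_list>::value;
    using type = std::tuple_element_t<
        (index_of<T, precision_list>::value + step) % size, precision_list>;
};

template <typename T, std::size_t step = 1>
using next_precision = typename next_precision_impl<T, step>::type;
```

Because `step` defaults to 1, `next_precision<T>` keeps its spelling while `next_precision<T, 2>` covers what `next_precision_move<T, 2>` does, which is the simplification the comment points out.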

MarcelKoch · 2025-04-15T14:26:21Z

include/ginkgo/core/base/precision_dispatch.hpp

+    using NextNextDense = matrix::Dense<next_precision_move<ValueType, 2>>;
+    using NextNextNextDense = matrix::Dense<next_precision_move<ValueType, 3>>;


nit:

Suggested change

-    using NextNextDense = matrix::Dense<next_precision_move<ValueType, 2>>;
-    using NextNextNextDense = matrix::Dense<next_precision_move<ValueType, 3>>;
+    using Next2Dense = matrix::Dense<next_precision_move<ValueType, 2>>;
+    using Next3Dense = matrix::Dense<next_precision_move<ValueType, 3>>;

MarcelKoch · 2025-04-16T12:15:20Z

include/ginkgo/core/base/batch_multi_vector.hpp

 #if GINKGO_ENABLE_HALF || GINKGO_ENABLE_BFLOAT16
-      public ConvertibleTo<
-          MultiVector<next_precision<next_precision<ValueType>>>>,
+      public ConvertibleTo<MultiVector<next_precision_move<ValueType, 2>>>,
+#endif
+#if GINKGO_ENABLE_HALF && GINKGO_ENABLE_BFLOAT16
+      public ConvertibleTo<MultiVector<next_precision_move<ValueType, 3>>>,
 #endif


Should we perhaps always have these ConvertibleTo base classes? Right now, we always provide the gko::half and gko::bfloat16 classes, so the ConvertibleTo variants are valid. It only requires making the gko::array fp16 conversion functions available, which is a very manageable cost compared to the full fp16 build.
From the user's perspective, using (any) fp16 only breaks when they try to instantiate one of our classes, so they would not be able to use the conversion function with fp16 anyway.
I should note that the error will be a linker error, which is most likely very painful for users to decipher.

Inheriting from ConvertibleTo<X> requires instantiating X, and then its member functions require all the corresponding conversion functions to be available.
We do not handle this the same way as single precision in DPC++ (throwing a not-implemented error), because adapting the current macro design to three conditions (single/half/bfloat16) would be a disaster. Also, I guess it would exceed the symbol limit on Windows.

I might not have communicated exactly what I meant, so I will create a PR to illustrate it. In short, I think the #if guards for the ConvertibleTo bases are unnecessary.
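A small model of the precision cycle shows what the `move = 3` base in the hunk above would resolve to in each build configuration. It assumes next_precision cycles through a list of the enabled types, in the spirit of the `next_precision_impl<T, step>` suggestion earlier in this review; the stand-in types, `index_of`, `next_in`, and both list orders are hypothetical. Under this model, with only one 16-bit type, three moves wrap back to ValueType itself, and only with both enabled do they reach a fourth, distinct precision.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>

// Hypothetical stand-ins for gko::half and gko::bfloat16.
struct half_t {};
struct bfloat16_t {};

// Position of T in a std::tuple of types.
template <typename T, typename List>
struct index_of;

template <typename T, typename... Rest>
struct index_of<T, std::tuple<T, Rest...>>
    : std::integral_constant<std::size_t, 0> {};

template <typename T, typename First, typename... Rest>
struct index_of<T, std::tuple<First, Rest...>>
    : std::integral_constant<std::size_t,
                             1 + index_of<T, std::tuple<Rest...>>::value> {};

// Type reached after `step` moves through a cyclic precision list.
template <typename T, std::size_t step, typename List>
using next_in = std::tuple_element_t<
    (index_of<T, List>::value + step) % std::tuple_size<List>::value, List>;

// Assumed precision lists for two build configurations.
using half_only = std::tuple<double, float, half_t>;
using half_and_bfloat16 = std::tuple<double, float, half_t, bfloat16_t>;

// One 16-bit type: three moves wrap around to the starting type.
static_assert(std::is_same<next_in<double, 3, half_only>, double>::value,
              "move 3 wraps around with three precisions");
// Both 16-bit types: three moves reach a fourth, distinct precision.
static_assert(
    std::is_same<next_in<double, 3, half_and_bfloat16>, bfloat16_t>::value,
    "move 3 is a new precision with four precisions");
```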

MarcelKoch

Looks mostly good. My comment on the ConvertibleTo stuff can be ignored for this PR.

MarcelKoch · 2025-04-17T09:20:32Z

core/preconditioner/jacobi.cpp

+            temporary_conversion<matrix::Diagonal<ValueType>>::template create<
+                matrix::Diagonal<previous_precision<ValueType>>,
+                matrix::Diagonal<previous_precision_move<ValueType, 2>>,
+                matrix::Diagonal<previous_precision_move<ValueType, 3>>>(


I think this needs to be guarded.

Without a guard, it only adds some redundant checks, thanks to the fixed *_move templates.
Reaching the redundant part means nothing matched before it, so it will throw the error in the end anyway.
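Why a redundant candidate is harmless can be illustrated with a simplified model of trying target types in order. This is a hypothetical sketch: `LinOpBase`, `Diagonal`, `try_candidates`, and `convert_or_throw` are invented stand-ins for `temporary_conversion<...>::template create<...>`, whose real behavior may differ. If the input matches none of the distinct candidates, a repeated candidate cannot match either, so the call still ends in the error.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical minimal polymorphic base and matrix type.
struct LinOpBase {
    virtual ~LinOpBase() = default;
};

template <typename ValueType>
struct Diagonal : LinOpBase {
    ValueType value{};
};

// Base case: no candidate type left, so nothing matched.
template <typename Target>
bool try_candidates(const LinOpBase*, Target&)
{
    return false;
}

// Try candidates in order; convert from the first dynamic type that matches.
template <typename Target, typename First, typename... Rest>
bool try_candidates(const LinOpBase* input, Target& output)
{
    if (auto match = dynamic_cast<const First*>(input)) {
        output.value = static_cast<decltype(output.value)>(match->value);
        return true;
    }
    return try_candidates<Target, Rest...>(input, output);
}

// Exact type first, then the given candidates; throw if none matches.
// A candidate listed twice is only a redundant check: it fails again.
template <typename ValueType, typename... Candidates>
Diagonal<ValueType> convert_or_throw(const LinOpBase* input)
{
    Diagonal<ValueType> output;
    if (!try_candidates<Diagonal<ValueType>, Diagonal<ValueType>,
                        Candidates...>(input, output)) {
        throw std::runtime_error("no matching precision");
    }
    return output;
}

// Usage: the duplicate Diagonal<double> mimics a wrapped-around *_move type.
inline void dispatch_example()
{
    Diagonal<float> single;
    single.value = 2.5f;
    auto converted =
        convert_or_throw<double, Diagonal<float>, Diagonal<double>>(&single);
    assert(converted.value == 2.5);
}
```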

MarcelKoch · 2025-04-17T09:43:57Z

include/ginkgo/core/base/math.hpp

+ * Move U until next_precision_move_impl<T, move> == U
+ */
+template <typename T, unsigned move,
+          typename U = typename next_precision_impl<T>::type>


Shouldn't it be:

Suggested change

-typename U = typename next_precision_impl<T>::type>
+typename U = typename next_precision_impl<T, move>::type>

No, it is just there to start the search from the next type.
The recursion stops when the U being searched is the same type as T.
Using next_precision_impl<T, move> might jump too far. For example, for <half, 2> in <half, float, double>, the search would only check double and then half (stop), without considering float.
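The recursion pattern described in this reply (a default parameter U that starts at the next type, advanced one step at a time, with a stop criterion when U equals T) can be sketched as a compile-time walk around the precision cycle. This is a hypothetical reconstruction of the pattern, not the PR's code: `next_one`, `cycle_length`, `move_impl`, and this `next_precision_move` alias are invented names, and the cycle order double, float, half, bfloat16 is assumed.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-ins for gko::half and gko::bfloat16.
struct half_t {};
struct bfloat16_t {};

// Single-step successor; the cycle order here is an assumption.
template <typename T>
struct next_one;
template <>
struct next_one<double> { using type = float; };
template <>
struct next_one<float> { using type = half_t; };
template <>
struct next_one<half_t> { using type = bfloat16_t; };
template <>
struct next_one<bfloat16_t> { using type = double; };

// Walk U = next(T), next(next(T)), ... one step at a time until U == T.
// U starts at next(T): starting at T would hit the stop criterion at once.
template <typename T, typename U = typename next_one<T>::type,
          bool done = std::is_same<T, U>::value>
struct cycle_length
    : std::integral_constant<
          std::size_t, 1 + cycle_length<T, typename next_one<U>::type>::value> {
};

// Stop criterion: the search has come back around to T.
template <typename T, typename U>
struct cycle_length<T, U, true> : std::integral_constant<std::size_t, 1> {};

// Apply `move` single steps.
template <typename T, std::size_t move>
struct move_impl {
    using type = typename move_impl<typename next_one<T>::type, move - 1>::type;
};
template <typename T>
struct move_impl<T, 0> {
    using type = T;
};

// Reduce move modulo the cycle length so that large moves wrap around.
template <typename T, std::size_t move>
using next_precision_move =
    typename move_impl<T, move % cycle_length<T>::value>::type;
```

Stepping one type at a time is what guarantees that no member of the cycle is skipped, which is the concern raised in the reply.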

Co-authored-by: Marcel Koch <[email protected]>

yhmtsai added this to the Ginkgo 1.10.0 milestone Apr 15, 2025

yhmtsai requested a review from a team April 15, 2025 11:17

yhmtsai self-assigned this Apr 15, 2025

yhmtsai mentioned this pull request Apr 16, 2025 in Add Bfloat16: alternative 16 bit floating point precision to half #1825 (open)

yhmtsai force-pushed the enable_half_bfloat16 branch from 2bfb6ac to 307e9e4 April 16, 2025 11:51

MarcelKoch reviewed Apr 16, 2025

MarcelKoch requested changes Apr 17, 2025

yhmtsai force-pushed the enable_half_bfloat16 branch from 307e9e4 to 9318cec April 22, 2025 09:21

yhmtsai force-pushed the add_bfloat16 branch from 4d1bfdf to 7846b40 April 22, 2025 09:21

yhmtsai and others added 9 commits April 22, 2025 17:53

add the precision_move and impl based on the list

21fad83

use move precision not repeated call

09ae5bb

add convert_to/move_to for third precision

ee4a071

fix/add to allow half and bfloat16 at the same time

49601ad

add instantiation and testing type and allow half/bfloat16 from CMake

ead1913

split the dispatch for half and bfloat16

5e6a657

disable the test when bfloat16 can not pass

4c973d9

SKIP the test for bfloat16 when needing quite relaxed condition

66ad642

improve the precision list type traits

4f0036b

Co-authored-by: Marcel Koch <[email protected]>

yhmtsai force-pushed the enable_half_bfloat16 branch from 9318cec to 4f0036b April 22, 2025 15:54

yhmtsai force-pushed the add_bfloat16 branch from 7846b40 to 9d0f785 April 22, 2025 15:54
