Skip to content

Allow to enable half and bfloat16 at the same time #1827

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: add_bfloat16
Choose a base branch
from

Conversation

yhmtsai
Copy link
Member

@yhmtsai yhmtsai commented Apr 15, 2025

This PR allow to enable bfloat16 and half precision at the same time.
Doing the operation between bfloat16 and half precision returns float type.
It will revert the gko::float16 alias changes in #1825 . float16 will be half precision again.
Moreover, it adds the next_precision_move<type, move> to represent the next_precision<next_precision<.... same for previous_precision_move

likely in another PR:

  • convert_to are just copy the same implementation for different interface. create a templated convert_to in private: and convert_to always call the templated function
  • instantiation macro, which we might apply the same technique as what I did in batch to use nested structure
  • allow run<> to accepts type_list and create a list for that

@yhmtsai yhmtsai added this to the Ginkgo 1.10.0 milestone Apr 15, 2025
@yhmtsai yhmtsai requested a review from a team April 15, 2025 11:17
@yhmtsai yhmtsai self-assigned this Apr 15, 2025
@ginkgo-bot ginkgo-bot added reg:build This is related to the build system. reg:testing This is related to testing. type:solver This is related to the solvers type:preconditioner This is related to the preconditioners type:matrix-format This is related to the Matrix formats type:reordering This is related to the matrix(LinOp) reordering mod:all This touches all Ginkgo modules. labels Apr 15, 2025
@yhmtsai yhmtsai force-pushed the enable_half_bfloat16 branch from 2bfb6ac to 307e9e4 Compare April 16, 2025 11:51
Copy link
Member

@MarcelKoch MarcelKoch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some first notes, I still have to finish the rest of the PR. However, I think these are already the most relevant comments.


template <typename T>
struct next_precision_impl {};
struct next_precision_impl {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe combine this with the move version as

template<typename T, size_t step = 1>
struct next_precision_impl{...};

Then we don't need to use both next_precision and next_precision_move. As an example see: https://godbolt.org/z/dPK7ccxoM

Copy link
Member Author

@yhmtsai yhmtsai Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good to know tuple_element_t. I will change it to yours.

Comment on lines 52 to 53
using NextNextDense = matrix::Dense<next_precision_move<ValueType, 2>>;
using NextNextNextDense = matrix::Dense<next_precision_move<ValueType, 3>>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
using NextNextDense = matrix::Dense<next_precision_move<ValueType, 2>>;
using NextNextNextDense = matrix::Dense<next_precision_move<ValueType, 3>>;
using Next2Dense = matrix::Dense<next_precision_move<ValueType, 2>>;
using Next3Dense = matrix::Dense<next_precision_move<ValueType, 3>>;

Comment on lines 55 to 60
#if GINKGO_ENABLE_HALF || GINKGO_ENABLE_BFLOAT16
public ConvertibleTo<
MultiVector<next_precision<next_precision<ValueType>>>>,
public ConvertibleTo<MultiVector<next_precision_move<ValueType, 2>>>,
#endif
#if GINKGO_ENABLE_HALF && GINKGO_ENABLE_BFLOAT16
public ConvertibleTo<MultiVector<next_precision_move<ValueType, 3>>>,
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we perhaps always have these ConvertibleTo base classes? Right now, we always provide the gko::half and gko::bfloat16 classes, so the ConvertibleTo variants are valid. It only requires making the gko::array fp16 conversion functions available, which is a very manageable cost compared to the full fp16 build.
From the user perspective, using (any) fp16 only breaks when they try to instantiate one of our classes. So they will not be able to use the conversion function with fp16 anyway.
I should note that the error will be a linker error, which is most likely very painful for the users to decipher.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConvertibleTo that class needs to instantiate the class then the member function will require the function all available.
We do not go the same way to deal with the dpcpp single (throw not_implment) because the current macro design to change that with the three conditions (single/half/bfloat) will be a disaster. also, I guess it will exceed the symbol limit in windows

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might have not communicated exactly what I meant, so I will create a PR to illustrate. In short, I think the #if guards for the ConvertibleTo are unnecessary.

Copy link
Member

@MarcelKoch MarcelKoch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly good. My comment on the ConvertibleTo stuff can be ignored for this PR.

temporary_conversion<matrix::Diagonal<ValueType>>::template create<
matrix::Diagonal<previous_precision<ValueType>>,
matrix::Diagonal<previous_precision_move<ValueType, 2>>,
matrix::Diagonal<previous_precision_move<ValueType, 3>>>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be guarded.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It only gives some redundant check with the fixed *_move template.
The code using the redundant part means nothing matched here, so it will throw the error in the end.

* Move U until next_precision_move_impl<T, move> == U
*/
template <typename T, unsigned move,
typename U = typename next_precision_impl<T>::type>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't it be

Suggested change
typename U = typename next_precision_impl<T>::type>
typename U = typename next_precision_impl<T, move>::type>

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, it is just to start the search from the next one.
The stop criterion of recursion is when the searching U is the same type as T.
It might jump to much for example <half, 2> in <half, float, double>, the search will only check double, half (stop) without considering float.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
mod:all This touches all Ginkgo modules. reg:build This is related to the build system. reg:testing This is related to testing. type:matrix-format This is related to the Matrix formats type:preconditioner This is related to the preconditioners type:reordering This is related to the matrix(LinOp) reordering type:solver This is related to the solvers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants