Define project scope #60

Closed
gselzer opened this issue Dec 4, 2024 · 20 comments
@gselzer
Collaborator

gselzer commented Dec 4, 2024

As we've been working on the v2-mvc branch, I've been searching for guide rails on what this project should and should not be doing. I think it would be wise to add these things to the README so both devs and users are well aware of the strengths and limitations.

I'm hoping this issue serves as a place of discussion for some of these items, and that a resolution to this issue would see updates to the README.

Looking for opinions & edits, especially from @tlambert03, @marktsuchida, @fdrgsp. It's likely you all have more experience here than I. If people like the general direction of this, I'm happy to create a PR where we can collaboratively hammer this out.

What pyapp-kit/ndv (should) do

View data. This includes:

  • Multi-dimensional arrays. This dataset need not be final - ndv should efficiently handle "updating" datasets, which could imply overwritten values in existing indices or the addition of new indices.
  • Dataset statistics. Values computed from (regions of) a dataset (e.g. min, max, histogram, average, ...) are themselves "views" of the data.
  • Metadata. This feels dangerous, but e.g. Enable scale bar #46 pertains to metadata and I think it belongs within our scope. The existing README text also talks about "taking advantage of named dimensions and categorical coordinate values", which also falls here. The question is, do we draw a line between supported and unsupported metadata?

What pyapp-kit/ndv should/does not do

Anything else. The following tasks would resemble scope creep, detracting from our goal above:

  • I/O. Other libraries do this better and are easily integrated outside of the viewer.
  • Plugins, Data processing, annotations. Intensive routines would detract from loading/rendering datasets.

I'd like opinions here, as a lot of my reasoning is "gut feeling" - I'd like these "should nots" to be solidified with valid reasoning in the eventual README edits. I'd also like answers to the following questions to be reflected in our final version:

  • Where do we draw the line between statistics and data processing? My personal needs for this project require a histogram so I feel like it belongs here, but why should that be allowed when e.g. a threshold would not be?
  • Where do we draw the line between offering ROIs and e.g. performing segmentation?

Similar projects

Inevitably users will want some of the things that ndv will not provide. We should direct users towards the projects that focus on those things, as they likely do them better than ndv could (if we wanted it to):

  • napari provides plugins and I/O, but does not offer the flexible data model.
  • pyimagej: similar.
  • fastplotlib focuses on plotting over pure visualization, has a lesser focus on n-d data, and is tied to pygfx.
@gselzer gselzer added the documentation Improvements or additions to documentation label Dec 4, 2024
@gselzer gselzer self-assigned this Dec 4, 2024
@tlambert03
Member

Thanks for opening this @gselzer, I think it's a great idea to get this written down somewhere!

I'll give some thoughts on how I've been thinking about the project. Some (but not all) of this is in the README, though often not explicitly. In many ways, ndv is a more narrowly scoped spiritual successor to microvis, and I put many motivating thoughts on that in this document. Many of them apply here. That project got stuck (largely on what seems to be a trivial topic) and this one quickly became more useful, if for no other reason than limiting the initial scope to just the problem of looking at arrays quickly and conveniently. So that was the top priority.

  • ndv should make it possible to view any multi-dimensional array (including arrays not immediately loadable in napari, like pyopencl, cupy) with sliders to control all dimensions beyond the 2/3 being viewed. Very little should be assumed about what the data can provide, but data types that provide more (e.g. xarray) should be taken advantage of. Much of this contract is encoded in the DataWrapper abstract class (which is becoming more well defined in v2-mvc).
  • ndv should have minimal dependencies and open quickly. Honestly, this is probably the main reason I don't just use napari. By minimal I mean <5 or so. We need a GUI frontend (e.g. Qt/jupyter), a canvas backend (e.g. vispy/pygfx), and a couple of small event/GUI-related components in pyapp-kit (like superqt, cmap, and psygnal). This is one thing I'll definitely be relatively aggressive about. I am in favor of supporting lots of things: I want to work with both qt and jupyter, pygfx and vispy, and all possible array libraries... but not depend on any of them. pydantic will likely join the mix in v2-mvc. But beyond that, new dependencies will need to meet a high bar.
  • metadata is important to me. A large frustration for me with napari was that, in the attempt to be domain agnostic, even years into the project we had no suggestion for how someone could label a dimension as "time", or provide voxel scales in metadata (i.e. outside of manually entering them). This was all relegated to plugins... which are not standardized well enough and just leaves general confusion. I haven't written this down anywhere, but I think we should support the same dims/coords model that xarray supports. Ultimately, this is the job of DataWrapper, but I do think that we should generally try to "just do the right thing" for known metadata standards; and I'm also happy to take PRs to that effect. I can elaborate on this if need be. To your specific question, I would say I couldn't tell you exactly where a line gets drawn for "unsupported" metadata. So that one is open for discussion.
  • I/O: this one I actually don't think is so easily dismissed as out of scope. Absolutely, we shouldn't have any "novel" code or logic in ndv that knows how to open or deal with any specific format; however, I am not opposed to adding an extra (pip install ndv[io]) that generally makes it possible to point to a file path and load it. (see Feature: command line interface #4 and feat: add io and cli #21). None of those dependencies should be a part of ndv, and none of those dependencies should import until needed. But I'm not opposed to in-repo support for loading many data types into ndv. This comes from seeing the proliferation of so many reader plugins in napari, many of which kinda/sorta had overlapping scopes, and few of which knew exactly how to get the most out of napari. So I'm not opposed to having an optional, opinionated way to get a ton of data "nicely" loaded into ndv; provided that doesn't come with the default installation.
  • Plugins, Data processing, annotations. Yeah: no plugins or data processing. Annotations are slightly tricky. I do think that "mouse clicks on a canvas" need to be supported and hookable in an abstract way. I don't think ndv itself should tackle painting into an array like napari does, for example; but it should have a general mechanism for executing some function on a mouse click/drag, and some downstream library could provide a bunch of helpers to translate those actions into meaningful actions on a dataset.
  • stats: This one is still murky for me. Basically, we do need a histogram, we know that the logic for calculating a histogram in a performant way needs careful consideration, and we also know that those statistics partially overlap things we'll need to calculate for things like contrast limits... But I don't see any of that as a public API. In other words, we neither provide data processing nor statistics beyond what can be considered an implementation detail for our visualization needs... for the time being at least. If we eventually expose a couple of those things (like: grab the histogram data so you don't have to recalculate it) that's fine. but no one should be using ndv as a part of their headless processing pipeline.
  • rois and segmentation: Here I think one needs to differentiate between the generation of an roi/segmentation, and the display of one. We need ROIs (including creation/generation of an ROI) as a way for a user to interact with the viewer, so as to indicate a part of the dataset that they are interested in. But we'll never generate a segmentation, though we may someday support something like a Labels layer, which is a rather minor extension of an image array, which can display the results of a segmentation that someone did elsewhere.
  • similarly, I'm not opposed to the general concepts of additional layers; most importantly points and lines. But exactly how/when they would be integrated is unclear, and not a top priority.
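
As an illustration of the "assume very little, use more when it is offered" contract described in the first bullet, here is a minimal sketch. It is not ndv's actual DataWrapper: the class names, `dims`, `sizes`, `isel`, and `slider_dims` are all hypothetical, and plain nested lists stand in for a real array type so the example needs no dependencies.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any, Hashable, Mapping, Sequence


class MinimalWrapper(ABC):
    """Hypothetical contract: the least a viewer needs to know about data."""

    def __init__(self, data: Any) -> None:
        self.data = data

    @property
    @abstractmethod
    def dims(self) -> tuple[Hashable, ...]:
        """Name (or integer position) of every dimension."""

    @property
    @abstractmethod
    def sizes(self) -> Mapping[Hashable, int]:
        """Length of every dimension, keyed by its name."""

    @abstractmethod
    def isel(self, index: Mapping[Hashable, int | slice]) -> Any:
        """Return the sub-array selected by integer position along each dim."""

    def slider_dims(self, visible: Sequence[Hashable]) -> tuple[Hashable, ...]:
        """Dimensions that need a slider: everything that is not displayed."""
        return tuple(d for d in self.dims if d not in visible)


class NestedListWrapper(MinimalWrapper):
    """Wraps nested lists; uses dimension names only if the caller has them."""

    def __init__(self, data: list, dim_names: Sequence[Hashable] | None = None):
        super().__init__(data)
        shape = []
        level = data
        while isinstance(level, list):
            shape.append(len(level))
            if not level:
                break
            level = level[0]
        self._shape = tuple(shape)
        # Fall back to integer positions when no names are provided.
        self._dims = tuple(dim_names) if dim_names else tuple(range(len(shape)))

    @property
    def dims(self) -> tuple[Hashable, ...]:
        return self._dims

    @property
    def sizes(self) -> Mapping[Hashable, int]:
        return dict(zip(self._dims, self._shape))

    def isel(self, index: Mapping[Hashable, int | slice]) -> Any:
        def take(level: Any, dims: tuple[Hashable, ...]) -> Any:
            if not dims:
                return level
            key = index.get(dims[0], slice(None))
            if isinstance(key, slice):
                return [take(sub, dims[1:]) for sub in level[key]]
            return take(level[key], dims[1:])

        return take(self.data, self._dims)


# A 2 x 3 x 4 "t, y, x" stack: viewing (y, x) leaves one slider, for "t".
stack = [[[t * 100 + y * 10 + x for x in range(4)] for y in range(3)] for t in range(2)]
wrapper = NestedListWrapper(stack, dim_names=("t", "y", "x"))
frame = wrapper.isel({"t": 1})
```

A wrapper for a richer type (xarray, for example) would implement the same three members but could read the dimension names from the data itself instead of taking them as an argument.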

@jacopoabramo
Contributor

Hey, I wanted to interject in this issue to provide my point of view on what I would like ndv to be as a possible user.

I am the maintainer of ImSwitch and I'm currently working on another application as well. I'll try to cover the points I'm interested in, although @tlambert03 already covered almost everything I had in mind pretty well, so this is just a reinforcement of some of his statements.

  1. Image layer control: ImSwitch creates and locks a set of image layers which are used as live view layers for each detector that is currently in the configuration. With napari I can then dynamically either superimpose them or put them side by side in order to have a view of what's going on in different channels. For my new application I would like to have something similar; I suspect that in fact this is already possible, I just probably need some hints on how to do this. I would love to have the currently visualized layers on the side just like napari does, and the possibility to toggle their visibility. Same goes for the color mapping, which I believe is already covered. Furthermore, ImSwitch has a snap button that creates a new, non-locked layer which can be stored. I would just like to have a snap created in a separate section of the viewer; the storage stuff I can take care of. I just need to be able to select the currently active layers and have the possibility to retrieve their data.
  2. ROI: in the way that ImSwitch currently works, it has a widget that is able to select the ROI by adding a VisPy layer that basically gets the corners of what the new ROI should be; it emits this information to the selected detector and the detector performs the crop operation. Right now in ImSwitch this is a bit funky because there are no boundaries set for the layer I'm selecting (I could end up in an empty space and for napari this would be acceptable, but when I click "set roi" it causes an exception because, of course, I'm sending wrong coordinate information to the detector that does the crop). If possible I would like a system where each channel gets its own ROI layer, where the boundaries are limited by the maximum detector shape; so if for example I go outside the boundaries of the currently displayed layer (whether it's already been cropped or not), the ROI widget will just not move and stay within the boundaries. Alternatively, I can move this work to my side and create a VisPy layer that gets the job done, but I believe that having this function built into ndv could be pretty useful outside my use as well.
  3. metadata: in general my new application will be agnostic towards it; it'll consider metadata as a JSON dictionary; I just need a mechanism to display it on the lateral side. I believe that I can wrap ndv within a widget and add my own table view, but from what Talley said you're already concocting something so it's all good.
  4. stats (histogram, line profile): ImSwitch currently has a couple of widgets (that I honestly rarely used) for providing support for line profiling, but I don't think for histograms. I believe that line profiling would be the job of a custom VisPy layer that just creates a line on top of a specific detector image layer and, while there, continuously emits the content of said line, which is displayed on a different widget in my main application. So again, I believe that this is something that I can do on my end, I just need ndv to support this kind of thing (if it doesn't already). For the histogram, I believe that I can redirect the data which is going to be displayed on ndv to a separate lateral widget which I can build on my own, probably using pyqtgraph.
  5. additional layers: this is something I would also like, especially since the user would probably like to generate those layers (either as one-time or continuously) on top of the image layers.
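
The bounded ROI behavior described in point 2 can be sketched without any GUI code. This is only an illustration under assumptions: `RectROI` and `clamp_roi` are hypothetical names, not part of ndv or ImSwitch, and the ROI is taken to be an axis-aligned rectangle in detector pixel coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RectROI:
    """Axis-aligned rectangle in detector pixels (hypothetical type)."""

    top: int
    left: int
    height: int
    width: int


def clamp_roi(roi: RectROI, detector_shape: tuple[int, int]) -> RectROI:
    """Return the ROI closest to `roi` that lies fully inside the detector.

    The size is shrunk only if it exceeds the detector; otherwise the ROI
    is shifted back inside, so a dragged rectangle stops at the edges.
    """
    rows, cols = detector_shape
    height = max(1, min(roi.height, rows))
    width = max(1, min(roi.width, cols))
    top = min(max(roi.top, 0), rows - height)
    left = min(max(roi.left, 0), cols - width)
    return RectROI(top, left, height, width)


# A drag that would leave a 512 x 512 detector is pulled back to its edge.
safe = clamp_roi(RectROI(top=-5, left=500, height=100, width=100), (512, 512))
```

Running the clamp before emitting coordinates to the detector would avoid the exception described above, whichever side (viewer or application) ends up owning that step.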

The take-home message from my side is: napari is great for me because of the concept of layers, which are basically wrapped around VisPy itself. Rather than having support built in, an approach could be to provide APIs similar to the DataWrapper to do so. Or... DataWrapper itself could be a way to add support for different types of layers. But the way to control these layers in order to be able to choose what I want to see or not is key.

In the end I don't want to give up on napari because there are tons of plugins out there which in some capacity I would like to be able to support (either natively - which I believe will be hard - or by giving instructions to make plugins compatible), but if someone just wants image viewing capabilities without napari, I would love to support ndv as an alternative choice, provided that the points above can be met somehow.

@jacopoabramo
Contributor

jacopoabramo commented Dec 4, 2024

EDIT: I just realized that the possibility to enable/disable a channel is already there hidden within the LUT flag... I feel so dumb. Can this be set as default behavior to show the currently active channels? And maybe set a display name?

@tlambert03
Member

This is super useful feedback. Thanks so much for taking the time @jacopoabramo

@gselzer
Collaborator Author

gselzer commented Dec 4, 2024

Thanks so much @tlambert03 for a thorough response! I largely agree with everything you wrote, and I'll add minimal dependencies/quick startup as a goal.

  • I/O: this one I actually don't think is so easily dismissed as out of scope. Absolutely, we shouldn't have any "novel" code or logic in ndv that knows how to open or deal with any specific format; however, I am not opposed to adding an extra (pip install ndv[io]) that generally makes it possible to point to a file path and load it.

Hmm...what would be the benefit of this over e.g. a bioio.BioImage DataWrapper implementation?

@tlambert03
Member

Hmm...what would be the benefit of this over e.g. a bioio.BioImage DataWrapper implementation?

my comment wasn't so much about how it would be implemented, just more about the fact that I don't think it's out of scope to provide general I/O utilities as an opt-in feature (rather than forcing all of that to be third party plugin behavior as napari does)

An implementation that uses a DataWrapper would absolutely be one possibility. (Basically, it would introduce a new DataWrapper that knows how to parse any string input, e.g. DataWrapper.create('some_file.any_ext'); it would then dispatch to one of a million lower-level reading libraries, all of which should be missing unless the [io] extra was installed.) But that's all details for a later discussion.
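
To make the "dispatch to a reading library that is only imported when needed" idea concrete, here is a rough sketch. Nothing in it is ndv API: `register_reader`, `open_path`, and the registry are hypothetical, and the stdlib `json` module stands in for a real image-reading library so the example runs anywhere. Only the `ndv[io]` extra name comes from the discussion above.

```python
from __future__ import annotations

import importlib
from pathlib import Path
from typing import Any, Callable

# Hypothetical registry: extension -> (module imported lazily, reader function).
# Registering a reader does NOT import its module.
_READERS: dict[str, tuple[str, Callable[[Any, Path], Any]]] = {}


def register_reader(ext: str, module_name: str, read: Callable[[Any, Path], Any]) -> None:
    """Associate a file extension with a reader that needs `module_name`."""
    _READERS[ext.lower()] = (module_name, read)


def open_path(path: str | Path) -> Any:
    """Load `path` with whichever registered reader matches its extension."""
    path = Path(path)
    ext = path.suffix.lower()
    try:
        module_name, read = _READERS[ext]
    except KeyError:
        raise ValueError(f"no reader registered for {ext!r} files") from None
    try:
        # The optional dependency is imported here, on first use only.
        module = importlib.import_module(module_name)
    except ImportError as e:
        raise ImportError(
            f"reading {ext!r} files requires {module_name!r}; "
            "try `pip install ndv[io]`"
        ) from e
    return read(module, path)


# Stand-in reader: the stdlib json module plays the role of an I/O library.
register_reader(".json", "json", lambda mod, p: mod.loads(p.read_text()))
```

With this shape, a default installation never pays the import cost of any reader, and a missing optional dependency surfaces as an actionable error rather than a failure at startup.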

@gselzer
Collaborator Author

gselzer commented Dec 4, 2024

Additionally, thank you @jacopoabramo!

  1. Image layer control:

I think that the superposition with channels is already present, yeah? So your main (unfulfilled) desire is to view datasets adjacently?

Viewing multiple datasets is also something I tried out on this branch, but it's really tricky. My hope is that our work on v2-mvc makes it a lot easier.

  2. ROI:

I think that what you're suggesting all falls within the bounds of what I/we want as well (although the ROI bounding issue you describe might be complicated if we introduce some form of tiling...but that doesn't need to be discussed here). The point I'm trying to differentiate is ROI Tooling vs. Segmentation/Painting. The former is tools that help us view a specific region, while the latter is helping us characterize/describe a specific region. I'm still thinking about the difference/description here...

  4. stats:

I'd like to note that our plan here is probably for separate widgets for the histogram/line profile - they wouldn't be in the shared canvas. For your purposes, if you don't want the histogram/line profile then you (can/should be able to) leave them unconnected/unused.

  5. layers:

I think my main "do not" here is that I don't think ndv should enable the internal creation of new datasets. If you have data (e.g. labels) that you just want to view, that makes total sense for ndv. You said " Or... DataWrapper itself could be a way to add support for different types of layers." and I think that is what I'd want to see, so long as it's possible.

@jacopoabramo
Contributor

I think that the superposition with channels is already present, yeah?

Yeah, I added a "whoopsie" comment immediately after my original answer because I found out about that like... 5 minutes afterwards and it was kind of embarrassing.

So your main (unfulfilled) desire is to view datasets adjacently?

Yes precisely.

The point I'm trying to differentiate is ROI Tooling vs. Segmentation/Painting.

Wouldn't that fall under adding new layers then?

I'd like to note that our plan here is probably for separate widgets for the histogram/line profile - they wouldn't be in the shared canvas. For your purposes, if you don't want the histogram/line profile then you (can/should be able to) leave them unconnected/unused.

That seems reasonable. Although it would be useful to have - for at least line profiling - the possibility to draw in the canvas a line which acts as a selector and what is selected will be reflected

@gselzer
Collaborator Author

gselzer commented Dec 4, 2024

The point I'm trying to differentiate is ROI Tooling vs. Segmentation/Painting.

Wouldn't that fall under adding new layers then?

Yup! I don't think that there's an issue for showing multiple datasets at once, but I agree that it should be possible, if not a priority.

That seems reasonable. Although it would be useful to have - for at least line profiling - the possibility to draw in the canvas a line which acts as a selector and what is selected will be reflected

I agree. One thing I'd like to do is turn the rectangular selection tool that already exists into a multi-shape tool, that could have support for (poly)lines - ideally it'd be evented too. I have a PR (#54) for integration into the v2-mvc branch, however it still needs lots of work, and anyways these concepts are beyond my goals with this issue.

The words I'm starting to settle on here are "inspective" vs. "generative". ndv should be the former, and not the latter, in my opinion. With respect to ROIs, our tooling should be "inspective" - for ROIs, this means the intent to better understand a specific region of the data. A "generative" ROI would be the result of a segmentation, suggesting the contained region shares some characteristic, and is what I don't think ndv should be involved in.

@tlambert03
Member

A "generative" ROI would be the result of a segmentation, suggesting the contained region shares some characteristic, and is what I don't think ndv should be involved in.

I'm not sure I'm following. For me, an ROI is an ROI; it's a region of interest: whether it be a vector-based ROI, or some sort of pixel-based mask. I think ndv should be able to (eventually) show pretty much any way to select a subregion of space, regardless of whether it is the result of some segmentation (we just don't provide built-in segmentation algorithms). In other words, I don't imagine making a distinction between the kinds of ROIs that are in scope and out of scope. are we saying the same thing?

@gselzer
Collaborator Author

gselzer commented Dec 4, 2024

I've started a branch with a README updated to reflect these discussions. I'm expecting heavy updates to get this to a point where we're all happy with it (we can also file a PR when it's somewhat close to that point).

However it has already been useful to me in thinking about what the API for the v2 rewrite should look like. Thanks again for your inputs @tlambert03 @jacopoabramo!

@gselzer
Collaborator Author

gselzer commented Dec 4, 2024

I'm not sure I'm following. For me, an ROI is an ROI; it's a region of interest: whether it be a vector-based ROI, or some sort of pixel-based mask. I think ndv should be able to (eventually) show pretty much any way to select a subregion of space, regardless of whether it is the result of some segmentation (we just don't provide built-in segmentation algorithms). In other words, I don't imagine making a distinction between the kinds of ROIs that are in scope and out of scope. are we saying the same thing?

I think we are saying the same thing? On one hand, ndv should be able to display any data given to it, whether that's an "image", "labels", etc. But on the other hand, there's the tooling that we provide for interaction with that data, and the potential for generating new data (that could then be used elsewhere).

"Inspective" vs. "Generative" is my proposition for how we delineate between "what we want" and "what we don't want", given what you suggested here:

Plugins, Data processing, annotations. Yeah: no plugins or data processing. Annotations are slightly tricky. I do think that "mouse clicks on a canvas" need to be supported and hookable in an abstract way. I don't think ndv itself should tackle painting into an array like napari does, for example; but it should have a general mechanism for executing some function on a mouse click/drag, and some downstream library could provide a bunch of helpers to translate those actions into meaningful actions on a dataset.
stats: This one is still murky for me. Basically, we do need a histogram, we know that the logic for calculating a histogram in a performant way needs careful consideration, and we also know that those statistics partially overlap things we'll need to calculate for things like contrast limits... But I don't see any of that as a public API. In other words, we neither provide data processing nor statistics beyond what can be considered an implementation detail for our visualization needs... for the time being at least. If we eventually expose a couple of those things (like: grab the histogram data so you don't have to recalculate it) that's fine. but no one should be using ndv as a part of their headless processing pipeline.

I'm suggesting that any tooling we provide should be oriented towards "inspection" of existing data (which is what histograms, line profiles, statistics, and our ROI tooling would do) and not "generation" of new data. I don't think the latter was ever intended, but framing it this way helps me better understand what e.g. the ROIModel that I started in #54 needs in its API.

Does that make sense?

@tlambert03
Member

tlambert03 commented Dec 4, 2024

Yeah, I think it makes sense... I think one main thing I'm thinking at this point is that this is a very early project. I do think it's helpful for us all to discuss these things, and definitely useful for active developers to have these sorts of conversations, but I'm not sure I'm ready (or that we need) to define strong language about what we will and won't do. I think inspective/generative also falls under the general category of "we're a viewer" (which is a pretty generally understood concept). I'd hesitate to lay down much stronger terminology than that at this point, and take it as we go.

@gselzer
Collaborator Author

gselzer commented Dec 4, 2024

I think inspective/generative also falls under the general category of "we're a viewer" (which is a pretty generally understood concept).

That's true, but napari is also marketed as "we're a viewer", so I want to be clear about the differences. For example, it has many generative aspects that we want to avoid.

I'd hesitate to lay down much stronger terminology than that at this point, and take it as we go.

Fair enough - like I've mentioned, this discussion has already been very helpful for me!

@tlambert03
Member

tlambert03 commented Dec 4, 2024

For example, it has many generative aspects that we want to avoid.

what specifically are you thinking about here?

The only one I can really think of is painting onto a labels layer, or creating some points or shapes... and these are very close to what we're doing already with ROIs. So I'm not sure it's that useful of a distinction. For me, the primary difference with napari is that we demand to keep loading fast and dependencies minimal. I'm not ready to say "we'll never let you draw on an image" ... simply because that specific action really isn't that "heavy" of a part of what napari does (it just boils down to mapping a point on the canvas to a point in the data source, which we already have and need), and it can be very useful.
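
For readers wondering what "mapping a point on the canvas to a point in the data source" amounts to, here is a small sketch for a 2D view with pan and zoom only. `View2D` and `canvas_to_data` are hypothetical names rather than ndv API, and the real canvas backends (vispy, pygfx) expose their own transform objects for this.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class View2D:
    """Pan/zoom state of a 2D canvas (hypothetical, for illustration)."""

    zoom: float      # canvas pixels per data pixel
    origin_x: float  # data x-coordinate shown at the canvas's left edge
    origin_y: float  # data y-coordinate shown at the canvas's top edge


def canvas_to_data(
    view: View2D, canvas_xy: tuple[float, float], shape_yx: tuple[int, int]
) -> tuple[int, int] | None:
    """Map a mouse position to a (row, col) data index.

    Returns None when the pointer is outside the data extent.
    """
    cx, cy = canvas_xy
    col = math.floor(view.origin_x + cx / view.zoom)
    row = math.floor(view.origin_y + cy / view.zoom)
    rows, cols = shape_yx
    if 0 <= row < rows and 0 <= col < cols:
        return row, col
    return None


# With 2x zoom and the view panned to data point (10, 20), a click at
# canvas position (5, 9) lands on data row 24, column 12.
view = View2D(zoom=2.0, origin_x=10.0, origin_y=20.0)
hit = canvas_to_data(view, (5, 9), shape_yx=(100, 100))
```

Any click or drag hook, whether used for ROIs or for drawing, would sit on top of a mapping like this one, combined with the current slider positions for the non-displayed dimensions.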

@gselzer
Collaborator Author

gselzer commented Dec 4, 2024

For example, it has many generative aspects that we want to avoid.

what specifically are you thinking about here?

The main one that I was thinking of is plugins, actually - the entirety of my napari experience is generating new datasets from existing napari layers 😅

For me, the primary difference with napari is that we demand to keep loading fast and dependencies minimal.

This is a great point in our favor, but I think the biggest draw to me is instead the data model. Thinking about datasets as growing and changing can help us do inspective things like visualization and statistics quickly.
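
One way to picture statistics over a growing and changing dataset is a histogram that is updated incrementally instead of recomputed. The sketch below is an assumption-laden illustration, not how ndv computes histograms: `RunningHistogram` is a hypothetical class, the bin range is fixed up front, and out-of-range values are counted in the edge bins.

```python
from typing import Iterable, List


class RunningHistogram:
    """Fixed-range histogram updated as data arrives (hypothetical sketch)."""

    def __init__(self, lo: float, hi: float, bins: int) -> None:
        if hi <= lo or bins < 1:
            raise ValueError("need hi > lo and at least one bin")
        self.lo = lo
        self.hi = hi
        self.counts: List[int] = [0] * bins

    def _bin(self, value: float) -> int:
        # Values outside [lo, hi] are clamped into the first or last bin.
        frac = (value - self.lo) / (self.hi - self.lo)
        return min(max(int(frac * len(self.counts)), 0), len(self.counts) - 1)

    def add(self, values: Iterable[float]) -> None:
        """Account for newly appended data (e.g. a new frame)."""
        for v in values:
            self.counts[self._bin(v)] += 1

    def remove(self, values: Iterable[float]) -> None:
        """Undo the contribution of data that is about to be overwritten."""
        for v in values:
            self.counts[self._bin(v)] -= 1

    def replace(self, old: Iterable[float], new: Iterable[float]) -> None:
        """Handle an in-place overwrite of existing indices."""
        self.remove(old)
        self.add(new)


hist = RunningHistogram(lo=0, hi=8, bins=4)
hist.add([0, 1, 2, 3])         # first frame arrives
hist.add([7, 8])               # dataset grows
hist.replace([0, 1], [4, 5])   # existing values are overwritten
```

The cost of each update is proportional to the data that changed, not to the size of the whole dataset, which is the property that matters when frames keep arriving during acquisition.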

I think at this point I'm pretty satisfied as the conversation has helped me realize how to progress my PRs. Let's maybe just leave the README edits as a branch until it seems like we need them?

@tlambert03
Member

The main one that I was thinking of is plugins, actually - the entirety of my napari experience is generating new datasets from existing napari layers 😅

ah... yeah I don't count plugins as a part of napari :) they're all in different repos, by different authors, etc...

@tlambert03
Member

but I think the biggest draw to me is instead the data model.

napari also has a data model, which is also serializable and restorable. (it's how things like napari-animation work). you might just be more familiar with the ndv data model :) but both the viewer and layers have always been "model based"... the viewermodel itself is a pydantic model, etc. So there too, we're not really that different

@gselzer
Collaborator Author

gselzer commented Jan 16, 2025

I think my concerns here have been resolved by the documentation added in #93

@gselzer gselzer closed this as completed Jan 16, 2025
@tlambert03
Member

tlambert03 commented Jan 16, 2025

That was a rough sketch. Just something to get up there... I know you had additional thoughts on the structure of this, so I definitely welcome additions and structural changes on that page! (there were multiple things discussed here that didn't make it on to that page)

3 participants