Thoughts

the problem of scratch artifacts overlapping w/ spheroids

a tiny correction: day 0 was actually yesterday and day 1 is today, since the seeding was delayed by a day.

i was pretty scared yesterday seeing how many satellite clusters overlap with the primary spheroid, causing the segmentation pipeline to produce either bulged/budding or dented masks.

that worry is almost nonexistent today: 2 prompts (mostly 1 prompt), with slight adjustments to instance-level thresholding/clipping and CCL, resulted in the successful segmentation of 98.9% of the wells (95 out of 96, essentially). the persistent line that looks like a scratch on well C12 is today's problem (i say today's because tomorrow the spheroids in other wells might reposition themselves slightly and become the problematic ones): the overlapped region between the spheroid and the "slash" of the scratch gets excluded, leaving a visible trace of zeros amongst the 1 pixels.
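for my own notes, the instance-level thresholding + CCL step can be sketched roughly like this (a minimal pure-numpy/stdlib sketch, not the actual pipeline; `keep_large_components` and the `min_area` knob for dropping satellite clusters are my own made-up names):

```python
from collections import deque
import numpy as np

def label_components(mask):
    """4-connected CCL on a binary mask via BFS; returns (label image, count)."""
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    for sy, sx in zip(*np.nonzero(mask)):
        if labels[sy, sx]:
            continue  # already assigned to a component
        current += 1
        labels[sy, sx] = current
        q = deque([(sy, sx)])
        while q:
            y, x = q.popleft()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                        and mask[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = current
                    q.append((ny, nx))
    return labels, current

def keep_large_components(mask, min_area):
    """Instance-level thresholding: drop satellite blobs below min_area."""
    labels, n = label_components(mask)
    out = np.zeros_like(mask)
    for i in range(1, n + 1):
        comp = labels == i
        if comp.sum() >= min_area:
            out |= comp  # keep only components big enough to be the primary
    return out
```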

the best solution to counter this, for now, is to first make sure the prompt and the thresholding produce the cutout rather than including the whole line, so that the further, optional post-processing steps with "convex hull" and "hole filling" can essentially eliminate the entire issue.
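the hole-filling half of that post-processing can be sketched like so (pure numpy/stdlib; the convex-hull half would come from something like skimage.morphology.convex_hull_image): flood-fill the background from the image border and treat any background pixel that was never reached as an interior hole.

```python
from collections import deque
import numpy as np

def fill_holes(mask):
    """Fill interior zeros (e.g. the excluded scratch overlap) by flood-filling
    the background from the border; anything not reached is a hole."""
    h, w = mask.shape
    outside = np.zeros((h, w), dtype=bool)
    q = deque()
    # seed the flood fill with every background pixel on the border
    for y in range(h):
        for x in (0, w - 1):
            if not mask[y, x] and not outside[y, x]:
                outside[y, x] = True
                q.append((y, x))
    for x in range(w):
        for y in (0, h - 1):
            if not mask[y, x] and not outside[y, x]:
                outside[y, x] = True
                q.append((y, x))
    while q:
        y, x = q.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx] and not outside[ny, nx]:
                outside[ny, nx] = True
                q.append((ny, nx))
    # everything that is neither foreground nor reachable background is a hole
    return mask | ~outside
```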

the problem, however, is if this persists at a larger, industry-scale level, where manual review and approval of binary masks is no longer an option. now i know im thinking ahead of myself again, but it's worth noting potential solutions to implement as best practices in the far future.

to counter this, there are several things to try:

  • very near-term and very dumb: keep convex hull on while at the very early stage of scaling up the number of masks for the longer-term fine-tuning strategy.

  • [IMPORTANT: THIS IS ASSUMING THAT BINARY SEGMENTATION IS ALL IT TAKES IN THE NEAR AND LONG RUN!] detect and inpaint the line before segmentation; this effectively removes any scratches on the imaging regions of these plastic plates. im trying this out as im typing this and will see the result.

  • my intuition tells me, however, that fine-tuning is the only way to go if we are doing this scaled and long-term. with enough masks and raw images as training material, they can be fed systematically into a general model, or, with no pretrained model at all, into a standard U-Net architecture trained from scratch; either a fine-tuned or a freshly trained, highly specialized segmentation model would perform far better than what im currently using, which is already very very good. the mlx-sam3 model is a less capable version of the official model released by Meta, since i wanted it to be compatible with and run locally on my black truffle. a fine-tuned or freshly built segmentation model, doing either binary segmentation or segmentation that preserves the original pixels for further analysis, that automatically includes the overlapped region but ignores the object-of-no-interest outside of its overlap with the primary object of interest, would be prime.

i believe automation is possible. it might be more complicated than what im imagining, both from a technical standpoint (the specific algorithms or mathematical calculations that let the pipeline make rule-based decisions) and in terms of the quantity of necessary data (im talking about the optimal tuning procedure to solve a more universal problem, as well as the raw amount of data needed to fine-tune or custom-build a segmentation model for the specific purpose of automated segmentation).

the resources, in terms of quantity, are not an issue. what probably needs work is the thinking behind the first challenge; the technical thinking is a knowledge problem. solving that takes either a very good data scientist or ML engineer, or, if LLMs get good enough in the future, the LLM itself.

update: option 2 (the inpainting method) is very very bad and artificial; it looks like some AI image edit from the dalle 2 era.
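for the record, the naive version of that idea looks something like the sketch below — this is NOT what i actually ran, just an illustration assuming the scratch mask is already detected, with a made-up helper `inpaint_scratch` that fills each scratch pixel from its nearest valid neighbors in the same column (a real attempt would use something like cv2.inpaint or a learned inpainter, and as noted, even those can look artificial):

```python
import numpy as np

def inpaint_scratch(img, scratch_mask):
    """Naive inpainting sketch: replace each scratch pixel with the mean of
    the nearest valid pixels above and below it in the same column."""
    out = img.astype(float).copy()
    h, w = img.shape
    for x in range(w):
        col_mask = scratch_mask[:, x]
        for y in np.nonzero(col_mask)[0]:
            up = y - 1
            while up >= 0 and col_mask[up]:
                up -= 1          # walk up past the scratch run
            down = y + 1
            while down < h and col_mask[down]:
                down += 1        # walk down past the scratch run
            vals = []
            if up >= 0:
                vals.append(out[up, x])
            if down < h:
                vals.append(out[down, x])
            if vals:
                out[y, x] = sum(vals) / len(vals)
    return out
```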

other solutions are definitely possible, but fine-tuning is the only approach i currently know, and fine-tuning vs. freshly building a custom model is essentially the same process. the universal process MUST include very bad image samples with cut-throughs (or slashes) caused by curved or straight scratches on the hardware.
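one way to guarantee those bad samples exist in quantity is to synthesize them: burn artificial slashes into clean images as an augmentation step before training. a sketch (the function name, endpoints, and intensity are all made-up parameters; a real pipeline would randomize them and also thicken/curve the line):

```python
import numpy as np

def add_synthetic_scratch(img, p0, p1, intensity=255):
    """Augmentation sketch: burn a straight 'slash' from p0 to p1 (row, col)
    into a copy of the image, so training pairs include scratch-corrupted
    samples alongside clean ones."""
    out = img.copy()
    # sample enough points along the segment to hit every pixel it crosses
    n = int(max(abs(p1[0] - p0[0]), abs(p1[1] - p0[1]))) + 1
    ys = np.linspace(p0[0], p1[0], n).round().astype(int)
    xs = np.linspace(p0[1], p1[1], n).round().astype(int)
    out[ys, xs] = intensity
    return out
```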

this will be a problem raised much later, after the potential generation of my personal technical leverage, and definitely after acceptance into a large accelerator. no worries for now; it's all doable. at scale, a dedicated (either fine-tuned or custom-built) segmentation model for the custom device would be absolutely necessary, and perhaps, on top of that, some other techniques employed/suggested by competent engineers or LLMs.

working on the feature extraction to populate an ENR-legible (tabular/matrix) data scheme later today. time for a relaxing jog at the gym.