    Mukunth-005
    @Mukunth-005
    Hello dicom-streams team,
    Thank you for creating such a wonderful library. I connected with Karl, who helped me a lot in setting up the dependencies. I'm using Scala with dicom-streams to parse metadata (extract metadata from .dcm files); the processing will be in-memory and loaded as a single batch. Can you tell me which functions to use? I want to convert the metadata into key-value format (BSON/JSON would be better) and store it in MongoDB. Kindly help me with this question; I will use your tool extensively in my company's production environment.
    Mukunth-005
    @Mukunth-005
    To add more details to my question: I have a file directory that consists of many studies conducted on patients, and inside each study there are many .dcm images (DICOM medical imaging data). I need to extract all the metadata and create a metadata table. Kindly suggest suitable functions that I can use.
    Karl Sjöstrand
    @karl-exini

    Here is a starting point. This script parses a file and adds some common processing steps: it gets rid of information that does not have a meaningful string representation, guards against very large attributes (which may kill your server), transcodes the data to UTF-8, and transcodes the data to indeterminate-length sequences and items (good for processing nested structures). It then aggregates DicomParts (which can be any size and may contain partial data) into Elements (complete attributes) and prints these as strings.

    package com.exini.dicom.streams.example
    
    import java.nio.file.Paths
    
    import akka.NotUsed
    import akka.actor.ActorSystem
    import akka.stream.Materializer
    import akka.stream.scaladsl.{FileIO, Flow, Sink}
    import com.exini.dicom.data.DicomParts.HeaderPart
    import com.exini.dicom.data.Elements.ValueElement
    import com.exini.dicom.data.{DicomParts, _}
    import com.exini.dicom.streams.DicomFlows._
    import com.exini.dicom.streams.ElementFlows.elementFlow
    import com.exini.dicom.streams.ParseFlow.parseFlow
    
    import scala.concurrent.duration.DurationInt
    import scala.concurrent.{Await, ExecutionContext, Future}
    import scala.io.StdIn
    
    object DicomToJson extends App {
    
      implicit val system: ActorSystem = ActorSystem("example-system")
      implicit val materializer: Materializer = Materializer(system)
      implicit val ec: ExecutionContext = system.dispatcher
    
      val keepVRs = Set(VR.AT, VR.AS, VR.CS, VR.DS, VR.FL, VR.FD, VR.IS, VR.SL, VR.SQ, VR.SV, VR.SS, VR.UI, VR.UL, VR.UV, VR.US)
    
      val isFMI = (header: HeaderPart) => (header.tag & 0xFFFF0000) == 0x00020000
      val notTooLarge = (header: HeaderPart) => header.length < 1024 * 100 // 100 KB
      val interestingVR = (header: HeaderPart) => keepVRs.contains(header.vr)
      val utf8Charsets = new CharacterSets(Seq("ISO_IR 192"))
    
      val flow: Flow[DicomParts.DicomPart, (String, String), NotUsed] =
        groupLengthDiscardFilter // get rid of deprecated group length tags
          .via(toIndeterminateLengthSequences) // transcode if necessary to indeterminate length sequences and items
          .via(toUtf8Flow) // transcode to UTF-8 (no need to worry about character sets anymore)
          .via(headerFilter(header => isFMI(header) || notTooLarge(header) && interestingVR(header))) // discard binary stuff that we don't want to search on
          .via(elementFlow)
          .mapConcat {
            case e: ValueElement =>
              e.value.toSingleString(e.vr, e.bigEndian, utf8Charsets)
                .map(tagToString(e.tag) -> _)
                .map(_ :: Nil)
                .getOrElse(Nil)
            case _ => Nil
          }
    
      val fileName = StdIn.readLine("File to process: ")
    
      val process: Future[_] = FileIO
        .fromPath(Paths.get(fileName))
        .via(parseFlow)
        .via(flow)
        .map { case (key, value) => s"$key\t\t$value" }
        .runWith(Sink.foreach(println))
    
      Await.ready(process, 10.seconds)
    
      system.terminate()
    }

    You probably want to handle nested data with more care here, among lots of other things, but hopefully this is a starting point.

    Mukunth-005
    @Mukunth-005
    That's brilliant! Thank you so much for sharing. I really appreciate it. I will use it and give you my feedback.
    Mukunth-005
    @Mukunth-005
    "File to process: " should be the .dcm file or the directory ?
    Mukunth-005
    @Mukunth-005

    I got the below error:

    [ERROR] [05/11/2020 12:24:56.311] [example-system-akka.actor.default-blocking-io-dispatcher-9] [akka://example-system/system/Materializers/StreamSupervisor-1/flow-0-1-fileSource] Error during preStart in [FileSource(, 8192)]: requirement failed: Path '' is a directory
    java.lang.IllegalArgumentException: requirement failed: Path '' is a directory
    at scala.Predef$.require(Predef.scala:281)
    at akka.stream.impl.io.FileSource$$anon$2.preStart(IOSources.scala:70)
    at akka.stream.impl.fusing.GraphInterpreter.init(GraphInterpreter.scala:306)
    at akka.stream.impl.fusing.GraphInterpreterShell.init(ActorGraphInterpreter.scala:593)
    at akka.stream.impl.fusing.ActorGraphInterpreter.tryInit(ActorGraphInterpreter.scala:701)
    at akka.stream.impl.fusing.ActorGraphInterpreter.preStart(ActorGraphInterpreter.scala:750)
    at akka.actor.Actor.aroundPreStart(Actor.scala:545)
    at akka.actor.Actor.aroundPreStart$(Actor.scala:545)
    at akka.stream.impl.fusing.ActorGraphInterpreter.aroundPreStart(ActorGraphInterpreter.scala:690)
    at akka.actor.ActorCell.create(ActorCell.scala:637)
    at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:509)
    at akka.actor.ActorCell.systemInvoke(ActorCell.scala:531)
    at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:294)
    at akka.dispatch.Mailbox.run(Mailbox.scala:229)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

    Karl Sjöstrand
    @karl-exini
    The code above is for a single file. You would have to make some updates to make it run on a batch of files.
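    For example, here is a minimal sketch of one way to do that, reusing flow and parseFlow from the example above. The root directory path is hypothetical, and scala.jdk.CollectionConverters is the Scala 2.13 import (on 2.12 use scala.collection.JavaConverters):

    import java.nio.file.{Files, Path, Paths}
    import akka.stream.scaladsl.Source
    import scala.jdk.CollectionConverters._

    // collect all .dcm files under the studies directory (studies may sit in subfolders)
    val rootDir: Path = Paths.get("/path/to/studies")
    val dcmFiles: List[Path] = Files.walk(rootDir).iterator().asScala
      .filter(p => Files.isRegularFile(p) && p.toString.toLowerCase.endsWith(".dcm"))
      .toList

    // run the single-file pipeline once per file, one file at a time,
    // keeping the extracted (tag, value) pairs together with the file they came from
    val batch: Future[_] = Source(dcmFiles)
      .mapAsync(parallelism = 1) { path =>
        FileIO.fromPath(path)
          .via(parseFlow)
          .via(flow)
          .runWith(Sink.seq)
          .map(pairs => path -> pairs)
      }
      .runWith(Sink.foreach { case (path, pairs) =>
        println(s"$path: ${pairs.size} attributes")
      })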
    Mukunth-005
    @Mukunth-005
    Ok, got it! I gave it a .dcm file; it looks like it takes a long time to load, and if I press Enter it throws an error.
    Karl Sjöstrand
    @karl-exini
    At the File to process: prompt, are you writing the full path to a dicom file?
    Mukunth-005
    @Mukunth-005
    Yes!
    I'm doing that
    Karl Sjöstrand
    @karl-exini
    Can you experiment with removing the stdin prompt and just hard code a file path?
    Mukunth-005
    @Mukunth-005
    Should I include the .dcm extension at the end of the file path?
    Ok doing it now
    Karl Sjöstrand
    @karl-exini
    Yes, the full path.
    Mukunth-005
    @Mukunth-005
    I tried all possible approaches, but it doesn't read or load the .dcm file.
    Karl Sjöstrand
    @karl-exini
    Are you getting the same error - that the file is a directory?
    Mukunth-005
    @Mukunth-005
    Yes, I'm getting the same error. I don't think reading the file should be a problem.
    Karl Sjöstrand
    @karl-exini
    I would try to isolate the problem as much as possible, e.g. extract the Paths.get(fileName) to its own variable and check that Files.isRegularFile(...) and Files.isDirectory(...) returns true and false respectively as expected. Once that works, try just copying the file in a streaming fashion using FileIO.fromPath to read the file (as in the example) and FileIO.toPath to write a copy. Once that also works, begin adding dicom processing piece by piece.
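    A minimal sketch of that isolation step, assuming hard-coded source and copy paths (both hypothetical):

    import java.nio.file.{Files, Paths}
    import akka.stream.scaladsl.FileIO

    // hypothetical hard-coded paths for the experiment
    val file = Paths.get("/path/to/study/image.dcm")
    val copy = Paths.get("/tmp/image-copy.dcm")

    // step 1: sanity-check the path itself
    println(Files.isRegularFile(file)) // expected: true
    println(Files.isDirectory(file))   // expected: false

    // step 2: stream the bytes through with no DICOM processing at all
    val copied = FileIO.fromPath(file).runWith(FileIO.toPath(copy))
    copied.foreach(result => println(s"Copied ${result.count} bytes"))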
    Mukunth-005
    @Mukunth-005
    That's a good idea to go with, I will try your advice.
    Mukunth-005
    @Mukunth-005
    println(Files.isRegularFile(file)) => True
    This query gives me true
    Mukunth-005
    @Mukunth-005
    println(Files.exists(file)) - TRUE
    println(Files.isReadable(file)) - TRUE
    println(Files.isDirectory(file)) - TRUE
    Mukunth-005
    @Mukunth-005
    Hello Karl! I'm able to load the metadata! I'm able to fetch the index and the value.
    Thank you for your great support
    Karl Sjöstrand
    @karl-exini
    Awesome! Good to hear.
    Mukunth-005
    @Mukunth-005
    Hello Karl! Is there any approach that you followed to handle the nested structure?
    I'm losing the nested structure information.
    Karl Sjöstrand
    @karl-exini
    There are many options here, largely depending on how you want to represent the info in the database. A way to get started is to attach a tagPathFlow (see https://github.com/exini/dicom-streams/blob/develop/src/main/scala/com/exini/dicom/streams/ElementFlows.scala#L87) after the elementFlow. This will add information on the current tag path for each element added.
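    A minimal sketch of attaching it, assuming tagPathFlow emits (tag path, element) pairs as described above, and reusing utf8Charsets from the first example:

    import com.exini.dicom.streams.ElementFlows.tagPathFlow

    val nestedAwareFlow =
      elementFlow
        .via(tagPathFlow) // pairs each element with its full tag path
        .mapConcat {
          case (tagPath, e: ValueElement) =>
            e.value.toSingleString(e.vr, e.bigEndian, utf8Charsets)
              // tagPath.toString includes the sequence tags and item indices leading to this attribute
              .map(value => (tagPath.toString, value) :: Nil)
              .getOrElse(Nil)
          case _ => Nil
        }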
    Karl Sjöstrand
    @karl-exini
    In the end, though, I think you will want to create a custom sink that builds the database info. Something along the lines of:
      // MyDatabaseSinkData is a placeholder for whatever accumulator your schema needs
      def documentSink()(implicit ec: ExecutionContext): Sink[DicomParts.DicomPart, Future[MyDatabaseSinkData]] =
        elementFlow
          .toMat(Sink.fold[MyDatabaseSinkData, Element](MyDatabaseSinkData()) { case (sinkData, element) =>
            element match {
              case e: ValueElement => // TODO: add a plain attribute to sinkData
                sinkData
              case e: SequenceElement => // TODO: push a new nesting level
                sinkData
              case _: ItemElement => // TODO: start a new item in the current sequence
                sinkData
              case _: ItemDelimitationElement => // TODO: close the current item
                sinkData
              case _: SequenceDelimitationElement => // TODO: pop the nesting level
                sinkData
              case _ =>
                sinkData
            }
          })(Keep.right) // use mapMaterializedValue to turn the final fold result into a db Document
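    A hypothetical usage of such a sink, with the preprocessing steps from the first example (the file path is illustrative only):

    val futureData: Future[MyDatabaseSinkData] = FileIO
      .fromPath(Paths.get("/path/to/image.dcm"))
      .via(parseFlow)
      .via(groupLengthDiscardFilter)
      .via(toIndeterminateLengthSequences)
      .via(toUtf8Flow)
      .runWith(documentSink())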
    Mukunth-005
    @Mukunth-005
    Thank you so much for sharing, I will use this and let you know my update.
    Mukunth-005
    @Mukunth-005
    [screenshot: Screen Shot 2020-05-14 at 10.39.36 AM.png]
    Moreover, if I want to customize it, i.e. I want the output to be in this format:
    Map(tag -> (0028,3002), attributeName -> ABC, value -> 1234, valueRepresentation -> US)
    instead of just a key-value pair, I'm not able to make changes to the package; it tells me it is read-only. Is there any way to add all this information?
    Karl Sjöstrand
    @karl-exini
    The Lookup class allows you to get the keyword of an attribute from its tag number.
    On a separate note, we're happy to collaborate on updates! That would follow the standard procedure where you fork the repo, make some changes and ask these to be merged via a pull request.
    Mukunth-005
    @Mukunth-005
    That's great; where can I add the Lookup class in the main code?
    While I'm parsing the current image I want to fetch all of this information.
    I was able to find the Lookup class, but do you have an example of using it?
    Sure, that's a great idea. I'm sure I will create a generic DICOM parser using your packages, and I can contribute to your source code as well. Thanks for letting me know.
    Karl Sjöstrand
    @karl-exini
    Up to you where you use Lookup, not sure what your code currently looks like. Referring to my first example above, maybe it could be used in .mapConcat instead of tagToString?
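    For instance, here is a sketch of that .mapConcat stage with Lookup added alongside tagToString — assuming Lookup.keywordOf(tag) returns an Option (hence the .getOrElse), and using the map keys from your desired output format:

    val mapFlow =
      elementFlow
        .mapConcat {
          case e: ValueElement =>
            e.value.toSingleString(e.vr, e.bigEndian, utf8Charsets)
              .map { value =>
                Map(
                  "Tag" -> tagToString(e.tag),
                  "AttributeName" -> Lookup.keywordOf(e.tag).getOrElse("Unknown"),
                  "ValueRepresentation" -> e.vr.toString,
                  "Value" -> value
                ) :: Nil
              }
              .getOrElse(Nil)
          case _ => Nil
        }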
    Mukunth-005
    @Mukunth-005
    [screenshot: Screen Shot 2020-05-14 at 12.28.38 PM.png]
    Do you think I'm using it right?
    Karl Sjöstrand
    @karl-exini
    Looks good. Just missing a .getOrElse to decide what to do if there is no known keyword for the current tag.
    Mukunth-005
    @Mukunth-005

    Hello Karl!
    With your package and your great support, I have completed all the steps for parsing the metadata in a stream.
    Can you help me with aggregating all these maps and saving them as a single document?
    I have already decided on the schema for this metadata, but I'm not able to join all the metadata information together to enforce the schema on the collective metadata for a single .dcm file.

    Output for the map:

    Map(ValueRepresentation -> US, Value -> 0, SubAttributeName -> ABC, Tag -> (6000,0102), AttributeName -> ABC)
    Map(ValueRepresentation -> US, Value -> 0, SubAttributeName -> XYZ, Tag -> (6000,0102), AttributeName -> XYZ)
    Map(ValueRepresentation -> US, Value -> 0, SubAttributeName -> PWD, Tag -> (6000,0102), AttributeName -> PWD)
    .
    .

    I want to aggregate all this information into a single row of a Spark DataFrame. Kindly help me with it.

    Looking forward to your response. Thank you!

    [screenshot: Screen Shot 2020-05-18 at 11.39.52 PM.png]
    Karl Sjöstrand
    @karl-exini
    If it's aggregation you need, I think changing the Sink.foreach to Sink.fold (with appropriate arguments) could be the way to go.
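    A minimal sketch of that change, assuming your flow now emits one Map per attribute as in the output you pasted (the file path and the mapFlow name are illustrative):

    // fold every per-attribute map for one file into a single sequence,
    // which can then become one document or one DataFrame row
    val aggregated: Future[Seq[Map[String, String]]] = FileIO
      .fromPath(Paths.get("/path/to/image.dcm"))
      .via(parseFlow)
      .via(mapFlow) // your element-to-Map flow (with any preprocessing attached)
      .runWith(Sink.fold[Seq[Map[String, String]], Map[String, String]](Seq.empty)(_ :+ _))

    aggregated.foreach(maps => println(s"One row with ${maps.size} attributes"))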