These are chat archives for nrinaudo/kantan.csv

12th
Jan 2017
Nguyen Dinh Thuc
@thucnd
Jan 12 2017 01:53
@nrinaudo I tried to write Japanese characters to a CSV file, but I get an error
"-" and "~" change to "?" in the CSV file
Nicolas Rinaudo
@nrinaudo
Jan 12 2017 06:43
@thucnd did you put the correct encoding in implicit scope? For Japanese characters, I'm guessing you should be using utf-8, utf-16 or shift_jis
Nguyen Dinh Thuc
@thucnd
Jan 12 2017 08:28
We are using shift_jis. Almost all characters work well.
shift_jis cannot display "-" and "~"
Nicolas Rinaudo
@nrinaudo
Jan 12 2017 08:35
When you say display, do you mean that the software you use to view the csv file doesn't display it properly, maybe because the font it uses doesn't support all Japanese characters?
Because these two characters look like double width punctuation, and I'm almost sure shift_jis encodes these properly
So, have you tried reading the csv file back with kantan.csv and comparing the result? I would guess your issue is more with decoding, or displaying, than it is with encoding
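One way to check the encoding side with no CSV viewer in the loop, as a minimal sketch that relies only on the JDK's `java.nio.charset` API: ask the charset's encoder which characters it cannot represent. The helper name `unencodable` is made up for this example and is not part of kantan.csv.

```scala
import java.nio.charset.Charset

// Hypothetical helper: returns the distinct characters of `text` that the
// charset called `charsetName` cannot encode. Characters are checked one at
// a time, so this is only meaningful for text without surrogate pairs.
def unencodable(text: String, charsetName: String): List[Char] = {
  val encoder = Charset.forName(charsetName).newEncoder()
  text.toList.distinct.filterNot(c => encoder.canEncode(c))
}
```

Running it on the same text with "Shift_JIS" and with "windows-31j" may give different answers, since the JVM treats these as two distinct charsets. Any character it returns is one that would be replaced (typically by "?") when written with that charset.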
Nguyen Dinh Thuc
@thucnd
Jan 12 2017 08:37
I want to set the default charset to UTF-8 when the CSV file is opened by an application (e.g. Excel). I found that we can do it by adding \uFEFF to the CSV file. How do I add it with kantan.csv?
Nicolas Rinaudo
@nrinaudo
Jan 12 2017 08:39
No, what you want to do is put an implicit codec in scope. What you're talking about is adding a BOM, which doesn't actually change the encoding if you're using the wrong one
So, give me a few minutes. I'm on the phone now, less than ideal to type out the code example you need
Nicolas Rinaudo
@nrinaudo
Jan 12 2017 08:47
alright, so I'm behind a proper computer now.
what you want is to have the following line in the same scope as the creation of the CsvWriter:
implicit val codec = scala.io.Codec.UTF8
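To illustrate how an implicit codec in scope ends up selecting the encoding, here is a minimal sketch using only `scala.io.Codec`. The `encode` function is a hypothetical stand-in for any method that takes its codec implicitly; it is not kantan.csv's API.

```scala
import scala.io.Codec

// Hypothetical stand-in for a library method that takes its codec implicitly.
def encode(text: String)(implicit c: Codec): Array[Byte] =
  text.getBytes(c.charSet)

// Declaring this in the enclosing scope is what selects the encoding:
// calls to `encode` below pick it up without naming it.
implicit val codec: Codec = Codec.UTF8

// Same text, encoded with the implicit codec and with an explicit one.
def utf8Bytes(text: String): Array[Byte]   = encode(text)
def latin1Bytes(text: String): Array[Byte] = encode(text)(Codec.ISO8859)
```

The same text yields different bytes depending on which codec is resolved, which is why the declaration has to sit in the scope where the writer is created.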
now, if you're using Excel to open CSV files, you have bigger problems than that though
Excel has weird defaults, and they change from one computer to the next. Mine, for example, defaults to using MacRoman when writing CSV, but windows-1252 when reading them. My colleague has iso-8859-1 both ways
Nguyen Dinh Thuc
@thucnd
Jan 12 2017 09:04
Yeah, that's the same as my problem. I write the CSV as UTF-8, but when my customer opens it in Excel, it switches to another encoding (shift_jis). shift_jis cannot show "-" and "~"
Nicolas Rinaudo
@nrinaudo
Jan 12 2017 09:05
right, so you don't control the application used by the customer to open the CSV file.
have you confirmed that if the CSV file starts with a BOM (and is utf-8 encoded), Excel opens it correctly?
Nguyen Dinh Thuc
@thucnd
Jan 12 2017 09:07
My friend used opencsv. He added \uFEFF to the CSV file with the opencsv library, and it works well
import au.com.bytecode.opencsv.CSVWriter
import java.io.{ByteArrayOutputStream, OutputStreamWriter}
val byteArrayOutput: ByteArrayOutputStream = new ByteArrayOutputStream
val osWriter = new OutputStreamWriter(byteArrayOutput, "UTF-8")
// Write the BOM before any CSV content.
osWriter.write('\uFEFF')
val writer: CSVWriter = new CSVWriter(osWriter)
rows.foreach(row => {
  writer.writeNext(row.toArray)
})
writer.flush()
writer.close()
byteArrayOutput
Nicolas Rinaudo
@nrinaudo
Jan 12 2017 09:11
ok, so it should probably be easy to do the same thing with kantan.csv
how do you actually write your CSV? Do you use an instance of CsvWriter, or do you use one of the helper functions?
Nguyen Dinh Thuc
@thucnd
Jan 12 2017 09:14
yeah, I am going to try it now
Nicolas Rinaudo
@nrinaudo
Jan 12 2017 09:14
can you paste the code that does the actual CSV writing? If you're using a CsvWriter, it should be very easy. If you're using helpers, such as File.writeCsv, it might be slightly more complicated but manageable.
Nguyen Dinh Thuc
@thucnd
Jan 12 2017 09:17
def write(data: Seq[R], name: String): Try[File] = {
    Try {
      val out = java.io.File.createTempFile(name, ".csv")
      val writer = out.asCsvWriter[R](',', header: _*)
      writer.write(data).close()
      out
    }.recover {
      case e: Exception => throw new FailCsvWriting("error.csv.failWriting")
    }
  }
My problem is that I cannot write '\uFEFF' before writing the data to the CSV file
Nicolas Rinaudo
@nrinaudo
Jan 12 2017 09:21
you know that Writer also has an asCsvWriter method, right?
so, it'd be a bit ugly, but you could open a new OutputStreamWriter(new FileOutputStream(out), "utf-8").asCsvWriter[R](',', header:_*)
(also, this is a bit out of scope, but you could simplify your current code quite a bit - java.io.File.createTempFile(name, ".csv").writeCsv(data, ',', header:_*))
so, something like that:
def write(data: Seq[R], name: String): Try[File] = {
  Try {
    val out = java.io.File.createTempFile(name, ".csv")
    val writer = new OutputStreamWriter(new FileOutputStream(out), "utf-8")
    // Write the BOM before handing the writer over to kantan.csv.
    writer.write('\uFEFF')
    // Closing the CsvWriter also closes the underlying writer.
    writer.asCsvWriter[R](',', header: _*).write(data).close()
    out
  }.recover {
    case e: Exception => throw new FailCsvWriting("error.csv.failWriting")
  }
}
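To make it concrete what writing the BOM actually puts in the file, here is a minimal sketch using only `java.io`, with no CSV library involved. The helper names `withUtf8Bom` and `startsWithUtf8Bom` are made up for this example.

```scala
import java.io.{ByteArrayOutputStream, OutputStreamWriter}
import java.nio.charset.StandardCharsets

// Hypothetical helper: encodes `content` as UTF-8, preceded by a BOM.
def withUtf8Bom(content: String): Array[Byte] = {
  val bytes  = new ByteArrayOutputStream()
  val writer = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)
  try {
    writer.write('\uFEFF') // encoded in UTF-8 as the three bytes EF BB BF
    writer.write(content)
  } finally writer.close()
  bytes.toByteArray
}

// Hypothetical helper: true if `bytes` starts with the UTF-8 BOM.
def startsWithUtf8Bom(bytes: Array[Byte]): Boolean =
  bytes.take(3).sameElements(Array(0xEF, 0xBB, 0xBF).map(_.toByte))
```

The BOM is only a three-byte prefix that some applications use as a hint. The rest of the content is encoded exactly as it would be without it, and decoding the bytes back as UTF-8 on the JVM leaves '\uFEFF' as the first character.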
Nguyen Dinh Thuc
@thucnd
Jan 12 2017 09:27
yeah, thanks
Nicolas Rinaudo
@nrinaudo
Jan 12 2017 09:35
quite welcome, let me know how that works out for you
Nguyen Dinh Thuc
@thucnd
Jan 12 2017 09:46
It works well in Excel. Thank you so much
Nicolas Rinaudo
@nrinaudo
Jan 12 2017 09:47
you're quite welcome. If you wouldn't mind, could you open an issue regarding this? Something about optionally supporting BOM when writing CSV?
your use-case is probably fairly common and it would be good to have a generic solution in the library
Nguyen Dinh Thuc
@thucnd
Jan 12 2017 09:48
On your GitHub, right? OK, I will do it
Nicolas Rinaudo
@nrinaudo
Jan 12 2017 09:49
right, on kantan.csv's github. Thanks!
Nicolas Rinaudo
@nrinaudo
Jan 12 2017 10:55
(also, without meaning to, you've put your finger on one of the major remaining warts of kantan.csv. Encoding throws. Sorry about that, I absolutely intend to fix it)