These are chat archives for linkedin/pinot

22nd
Jul 2015
Johnny Deng
@goverdata
Jul 22 2015 06:28
Hi, I want to append data to an existing table. How do I do that?
Xiang Fu
@fx19880617
Jul 22 2015 06:29
push the segment to the controller with a new segment name
Johnny Deng
@goverdata
Jul 22 2015 06:43
thanks @fx19880617 How do I verify the segment name? Do I just need to change the file's name under "-segmentDir"?
Johnny
@jzmq
Jul 22 2015 06:44
Hi @fx19880617, I have the same question as @code4love. The segment name pattern is {TableName}_{Index}, and segments with the same name override each other. Here's the question: how can I find out the max index of this table (via the Helix UI?), and should I increase the index manually?
Kishore Gopalakrishna
@kishoreg
Jul 22 2015 06:44
yep, generally we use a convention
tableName_startTime_endTime (e.g. something like myTable_20150701_20150722)
Johnny
@jzmq
Jul 22 2015 06:58

Hi @kishoreg I didn't set the time column, so it calls the method

public static String buildBasic(String tableName, String prefix) {
  return StringUtil.join("_", tableName, prefix);
}

the prefix is set from the segment count (the seqId).
Did I miss something?

Xiang Fu
@fx19880617
Jul 22 2015 07:02
which file is this?
@code4love you may need to change the tar file name and repush it to the controller
@jzmq which code are you using to build the segments?
@jzmq if you specify the time column and time type, the segment creation code will pick up the start/end times and append them to the segment name.
If you specify your own SegmentGeneratorConfig, you can always use segment.name.postfix to set a postfix on your segment name
Kishore Gopalakrishna
@kishoreg
Jul 22 2015 07:08
I think they are using pinot-admin.sh to generate segments
Johnny Deng
@goverdata
Jul 22 2015 07:10
@kishoreg I am using pinot-hadoop to generate segments
Kishore Gopalakrishna
@kishoreg
Jul 22 2015 07:11
oh ok
Xiang Fu
@fx19880617
Jul 22 2015 07:12
no wonder
I presumed the file seqId would be used as the postfix there
Johnny Deng
@goverdata
Jul 22 2015 07:23

Hi, @kishoreg @fx19880617
At line 120 of com.linkedin.pinot.hadoop.job.SegmentCreationJob :

for (int seqId = 0; seqId < inputDataFiles.size(); ++seqId) {
      FileStatus file = inputDataFiles.get(seqId);
      String completeFilePath = " " + file.getPath().toString() + " " + seqId;
      Path newOutPutFile = new Path((_stagingDir + "/input/" + file.getPath().toString().replace('.', '_').replace('/', '_').replace(':', '_') + ".txt"));
      FSDataOutputStream stream = fs.create(newOutPutFile);
      stream.writeUTF(completeFilePath);
      stream.flush();
      stream.close();
    }

I think I need to change completeFilePath like this: " " + file.getPath().toString() + " " + startTime + endTime + seqId

Xiang Fu
@fx19880617
Jul 22 2015 07:25
this is where the temp files are generated for the mapper to read through the input data
I already made the patch
the patch is in HadoopSegmentCreationMapReduceJob.java

this is the diff:
diff --git a/pinot-hadoop/src/main/java/com/linkedin/pinot/hadoop/job/mapper/HadoopSegmentCreationMapReduceJob.java b/pinot-hadoop/src/main/java/com/linkedin/pinot/hadoop/job/mapper/HadoopSegmentCreationMapReduceJob.java
index 401c201..6740e1e 100644
--- a/pinot-hadoop/src/main/java/com/linkedin/pinot/hadoop/job/mapper/HadoopSegmentCreationMapReduceJob.java
+++ b/pinot-hadoop/src/main/java/com/linkedin/pinot/hadoop/job/mapper/HadoopSegmentCreationMapReduceJob.java
@@ -29,7 +29,6 @@ import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
 import org.codehaus.jackson.map.ObjectMapper;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
-import org.stringtemplate.v4.compiler.STParser.list_return;

 import com.linkedin.pinot.common.data.Schema;
 import com.linkedin.pinot.common.utils.TarGzCompressionUtils;
@@ -49,6 +48,7 @@ public class HadoopSegmentCreationMapReduceJob {
     private String _inputFilePath;
     private String _outputPath;
     private String _tableName;
+       private String _postfix;

     private Path _currentHdfsWorkDir;
     private String _currentDiskWorkDir;
@@ -83,6 +83,7 @@ public class HadoopSegmentCreationMapReduceJob {

       _outputPath = _properties.get("path.to.output");
       _tableName = _properties.get("segment.table.name");
+      _postfix = _properties.get("segment.name.postfix");
       if (_outputPath == null || _tableName == null) {
         throw new RuntimeException(
             "Missing configs: " +
@@ -160,7 +161,11 @@ public class HadoopSegmentCreationMapReduceJob {

       FileFormat fileFormat = getFileFormat(dataFilePath);
       segmentGeneratorConfig.setInputFileFormat(fileFormat);
-      segmentGeneratorConfig.setSegmentNamePostfix(seqId);
+      if (null != _postfix) {
+        segmentGeneratorConfig.setSegmentNamePostfix(String.format("%s-%s", _postfix, seqId));
+      } else {
+       segmentGeneratorConfig.setSegmentNamePostfix(seqId);
+      }
       segmentGeneratorConfig.setRecordeReaderConfig(getReaderConfig(fileFormat));

       segmentGeneratorConfig.setIndexOutputDir(_localDiskSegmentDirectory);
then you can specify segment.name.postfix in your job conf file
the mapper will pick up that config
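for example, a job conf excerpt might look something like this (the property names are the ones read in the mapper above; the paths and values are only illustrative):

# illustrative job conf excerpt; values are examples
path.to.output=/user/pinot/output/myTable
segment.table.name=myTable
segment.name.postfix=20150722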
Johnny Deng
@goverdata
Jul 22 2015 07:30
@fx19880617 Thanks!
Johnny
@jzmq
Jul 22 2015 07:36
@fx19880617 why not pass the MapReduce properties to SegmentGeneratorConfig?
Xiang Fu
@fx19880617
Jul 22 2015 07:37
SegmentGeneratorConfig has its own setters and getters
It’s a good point, we may add a new constructor for SegmentGeneratorConfig.
kunal-kulkarni
@kunal-kulkarni
Jul 22 2015 20:24
Hi, what is the default location of the Controller, Broker, and Server logs?
I'm running Pinot but am not able to find these logs
Kishore Gopalakrishna
@kishoreg
Jul 22 2015 20:34
they should be in /tmp
you can configure them by changing the log4j.properties under the conf directory
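as a minimal sketch, a log4j.properties override could look something like this (the appender name and file path are just examples, not the shipped defaults):

# illustrative log4j.properties: send logs to a file instead of /tmp
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/var/log/pinot/pinot-server.log
log4j.appender.file.MaxFileSize=100MB
log4j.appender.file.MaxBackupIndex=5
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p [%c] %m%n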
kunal-kulkarni
@kunal-kulkarni
Jul 22 2015 20:38
I uploaded 100 segments, each with a different name but the same table name. When I query, I get results from only 8 segments
I confirmed the segments are there in Controller and Server datadir
*all the segments are there
Kishore Gopalakrishna
@kishoreg
Jul 22 2015 20:40
what's the query?
kunal-kulkarni
@kunal-kulkarni
Jul 22 2015 20:41
select count(*) from tablename
Kishore Gopalakrishna
@kishoreg
Jul 22 2015 20:42
have you verified all segments are loaded in the server?
controllerHost:port/tables/{tableName}/segments
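for example (assuming the default controller port 9000; the table name is a placeholder):

curl http://controllerHost:9000/tables/myTable/segments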
kunal-kulkarni
@kunal-kulkarni
Jul 22 2015 20:44
My segments are generated via the offline Hadoop indexing job. Then I did SegmentTarPush to the Controller
I see all the segments are present in PinotController/tablename and also in serverdata8098/index/tablename_OFFLINE
All the system memory is used up (the system has only 6 GB of RAM). Is it because of this?
Kishore Gopalakrishna
@kishoreg
Jul 22 2015 20:49
yes, if you are using heap mode
kunal-kulkarni
@kunal-kulkarni
Jul 22 2015 20:50
How do I use non-heap mode?
Kishore Gopalakrishna
@kishoreg
Jul 22 2015 20:50
do you know the size of segments?
kunal-kulkarni
@kunal-kulkarni
Jul 22 2015 20:51
each is 120 MB
Kishore Gopalakrishna
@kishoreg
Jul 22 2015 20:51
so you need roughly 12 GB of memory (100 segments × ~120 MB each)
if you want to use non-heap
restart the server in mmap mode
kunal-kulkarni
@kunal-kulkarni
Jul 22 2015 21:04
Please tell me how to start the server in mmap mode. Before, I started it using: bin/pinot-admin.sh StartServer &
kunal-kulkarni
@kunal-kulkarni
Jul 22 2015 22:23
OK, one thing I'll change is to set "loadMode":"MMAP" in tableIndexConfig.
Please let me know how to start the server in mmap mode?
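as a rough sketch, the relevant fragment of the table config JSON would look something like this (all other fields omitted):

{
  "tableIndexConfig": {
    "loadMode": "MMAP"
  }
}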
Kishore Gopalakrishna
@kishoreg
Jul 22 2015 22:46
one sec