Problem with timestamps when reading a Parquet file

kxg916361108 2017-07-06 09:55:24

The code I use to read a Parquet file on HDFS is as follows:
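import java.io.IOException;
import java.util.List;
// Package names assume Apache Parquet 1.8+ (org.apache.parquet.*).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.format.converter.ParquetMetadataConverter;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.hadoop.example.GroupReadSupport;
import org.apache.parquet.hadoop.metadata.ParquetMetadata;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.Type;

public class ReaderParquet {

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
        String hdfsPath = "hdfs://10.2.5.202:9000/user/hive/biz1/extend/wp04/wp04_2008_data/year=2017/month=4/day=10/1492049956818";
        Path path = new Path(hdfsPath);
        try {
            // Read the footer to get the file schema and its column definitions.
            ParquetMetadata readFooter = ParquetFileReader.readFooter(conf, path, ParquetMetadataConverter.NO_FILTER);
            MessageType schema = readFooter.getFileMetaData().getSchema();
            List<Type> columnInfos = schema.getFields();
            ParquetReader<Group> reader = ParquetReader.builder(new GroupReadSupport(), path).withConf(conf).build();
            int count = 0;
            Group recordData;
            // Print at most the first 20 records; read() returns null at end of file.
            // Reading inside the loop condition avoids skipping the first record
            // and printing a trailing null.
            while (count < 20 && (recordData = reader.read()) != null) {
                System.out.println(recordData);
                count++;
            }
            reader.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}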

However, in the data read from the file, the time field recotime is displayed as follows:
rowkey: 51480100000000020022842033
recotime: Int96Value{Binary{12 bytes, [0, -108, -65, -126, 99, 52, 0, 0, -3, -128, 37, 0]}}
projid: 102140
devid: 108415
devaddr: 2
frameno: 5
receivetime: Int96Value{Binary{12 bytes, [0, 102, -53, -39, 111, 52, 0, 0, -3, -128, 37, 0]}}
modifyflag: P3ooooooooooOooooooooooooooo
tablename: 0022
commprotocolver_db: 0022
onoff: 0
runmode_db: 2
cool_ewt_set: 12.0
heat_ewt_set: 40.0
hrectempset: 50.0
Could someone please show me how to convert recotime into a date-time format such as 2017-07-06 20:20:20?

lgq2016 2020-10-15

Why is reading a CSV file line by line faster than reading Parquet?

kxg916361108 2017-07-06

http://blog.csdn.net/yingkongshi99/article/details/51463085

kxg916361108 2017-07-06

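Yes, I looked into it again. From what I found online, Parquet handles the timestamp type specially when storing it in order to preserve precision, so it needs special handling when reading as well. When stored, the value becomes a 12-byte array. I'll try that approach first.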
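
For readers landing on this thread, here is a minimal sketch of that special handling. It assumes the file was written with the INT96 timestamp convention used by Hive and Impala: the first 8 bytes hold the nanoseconds within the day and the last 4 bytes hold the Julian day number, both little-endian. The class name `Int96Timestamps` and the `Asia/Shanghai` display zone are illustrative assumptions, not something stated in the thread.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; the name is an assumption made for this sketch.
final class Int96Timestamps {
    // Julian day number of 1970-01-01, the Unix epoch.
    private static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;
    private static final long SECONDS_PER_DAY = 86_400L;

    /** Decodes a 12-byte INT96 value (nanos of day + Julian day, little-endian). */
    static Instant toInstant(byte[] int96) {
        if (int96 == null || int96.length != 12) {
            throw new IllegalArgumentException("INT96 timestamp must be exactly 12 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong(); // bytes 0..7
        long julianDay = buf.getInt();   // bytes 8..11
        long epochSeconds = (julianDay - JULIAN_DAY_OF_EPOCH) * SECONDS_PER_DAY;
        return Instant.ofEpochSecond(epochSeconds, nanosOfDay);
    }

    public static void main(String[] args) {
        // The recotime bytes printed in the question.
        byte[] recotime = {0, -108, -65, -126, 99, 52, 0, 0, -3, -128, 37, 0};
        Instant ts = toInstant(recotime);
        System.out.println(ts); // 2017-04-09T16:00:02Z

        // Assumption: the data should be displayed in China Standard Time.
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
                .withZone(ZoneId.of("Asia/Shanghai"));
        System.out.println(fmt.format(ts)); // 2017-04-10 00:00:02
    }
}
```

With the reader from the question, the 12 bytes can probably be obtained with something like `recordData.getInt96("recotime", 0).getBytes()`; check this against the Parquet version in use. The decoded value falls on 2017-04-10 in UTC+8, which matches the `day=10` partition in the path, but the display zone is still a guess.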

LinkSe7en 2017-07-06

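Are you converting the time like this?

        long time = toLong(xxxx);

        Date date = new Date(time);

        // HH is the 24-hour clock; hh would print 12-hour values without an AM/PM marker.
        System.out.println(new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS").format(date));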

kxg916361108 2017-07-06

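I converted the byte data to a long and then to a time, but the resulting time is too large by many digits. @link0007: when the same records are read through Hive, the time is shown in the form 2017-04-01 00:00:05.008.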
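
One likely reason the value comes out far too large is byte order. The sketch below reads the first 8 bytes of the recotime value from the question both ways. It assumes those bytes are a little-endian nanoseconds-of-day field, as in the Hive/Impala INT96 convention; the class name `ByteOrderDemo` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class; the name is an assumption made for this sketch.
final class ByteOrderDemo {
    /** Reads the first 8 bytes of data as a long in the given byte order. */
    static long firstEightBytes(byte[] data, ByteOrder order) {
        return ByteBuffer.wrap(data, 0, 8).order(order).getLong();
    }

    public static void main(String[] args) {
        // The recotime bytes printed in the question.
        byte[] recotime = {0, -108, -65, -126, 99, 52, 0, 0, -3, -128, 37, 0};

        // ByteBuffer defaults to big-endian, which is what a plain toLong helper reads.
        long bigEndian = firstEightBytes(recotime, ByteOrder.BIG_ENDIAN);
        // Little-endian gives the nanoseconds elapsed since midnight.
        long littleEndian = firstEightBytes(recotime, ByteOrder.LITTLE_ENDIAN);

        System.out.println("big-endian:    " + bigEndian);
        System.out.println("little-endian: " + littleEndian); // 57602000000000
    }
}
```

The little-endian value is 57,602 seconds' worth of nanoseconds, that is 16:00:02 into the day. It is a time of day rather than an epoch timestamp, so passing it (or the big-endian value) to `new Date(...)` cannot give the right date; the date comes from the last 4 bytes.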

kxg916361108 2017-07-06

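Quoting reply #1 by link0007:

This looks like the byte[] of a long. Here is some code:


    /**
     * Converts a byte array to a long (requires java.nio.ByteBuffer).
     * @param data byte array; only the first 8 bytes are used
     * @return long value
     */
    public static long toLong(byte[] data) {
        ByteBuffer buffer = ByteBuffer.allocate(8);
        buffer.put(data, 0, Math.min(data.length, 8));
        buffer.flip();
        return buffer.getLong();
    }

Or use HBase's Bytes.toLong

OK, I'll look into it first. Thanks!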

kxg916361108 2017-07-06

OK, I'll look into it first. Thanks!

LinkSe7en 2017-07-06

This looks like the byte[] of a long. Here is some code:


    /**
     * Converts a byte array to a long (requires java.nio.ByteBuffer).
     * @param data byte array; only the first 8 bytes are used
     * @return long value
     */
    public static long toLong(byte[] data) {
        ByteBuffer buffer = ByteBuffer.allocate(8);
        buffer.put(data, 0, Math.min(data.length, 8));
        buffer.flip();
        return buffer.getLong();
    }

Or use HBase's Bytes.toLong