Impala “insert” Performance

  1. Inserted 100001 rows in 7.28s

 

Table format:

(key_ID, int)

 

For example:

insert into a values (0,126),(1,173),(2,568),(3,164),(4,593),(5,788),(6,924),(7,206),(8,359),(9,690),(10,987),(11,231),(12,817),(13,122),(14,373),(15,177),(16,156),(17,256),(18,203),(19,38);

 

When the number of rows in a single statement reaches 1,000,000, parsing becomes noticeably slow.
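A statement of this shape is tedious to write by hand, so here is a minimal sketch of how one might be generated. It assumes the two-column table `a` from the example above; the helper name `build_insert`, the value range 0 to 999, and the fixed seed are illustrative choices, not taken from the original test.

```python
import random


def build_insert(table: str, n_rows: int, seed: int = 0) -> str:
    """Render one INSERT ... VALUES statement holding n_rows (key, value) pairs."""
    if n_rows < 1:
        raise ValueError("n_rows must be at least 1")
    rng = random.Random(seed)  # fixed seed so the same statement can be regenerated
    # Keys count up from 0; values are arbitrary integers (an assumption).
    rows = ",".join(f"({key},{rng.randint(0, 999)})" for key in range(n_rows))
    return f"insert into {table} values {rows};"


statement = build_insert("a", 20)
print(statement[:60] + "...")
print(f"{len(statement)} characters for 20 rows")
```

The statement text grows linearly with the row count, so a 1,000,000-row statement is several megabytes of SQL that has to be parsed before any row is written.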

 

 

 

  2. Inserted 10000 rows in 14.11s

 

Table format:  (11 columns)

(string, boolean, double, float, bigint, int, smallint, tinyint, string, string, timestamp)

 

For example:

insert into hbasestringids values ('0',true,0.8,1.5,123456789101112,12345678,12345,12,'aaa','abc','1985-09-25 17:45:30.005'),('1',false,6.77777,1.1111,7654321121212,87654321,21345,123,'bbb','dcba','1986-10-25 17:45:30.005');

 

When a single statement reaches 100,000 rows, the program crashes.

Splitting the statement into several smaller insert queries works.
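The splitting can be sketched as follows. This is only an illustration: `batched_inserts` is a hypothetical helper, the rows are assumed to be already rendered as SQL tuples, and the batch size is a value to tune (the 10,000-row statement above is the largest one reported to work here).

```python
from typing import Iterator, Sequence


def batched_inserts(table: str, rows: Sequence[str], batch_size: int) -> Iterator[str]:
    """Yield one INSERT ... VALUES statement per batch of pre-rendered row tuples."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]  # the last chunk may be shorter
        yield f"insert into {table} values {','.join(chunk)};"


# 25 sample rows in the 11-column layout shown above; only the id varies.
rows = [
    f"('{i}',true,0.8,1.5,123456789101112,12345678,12345,12,"
    f"'aaa','abc','1985-09-25 17:45:30.005')"
    for i in range(25)
]
statements = list(batched_inserts("hbasestringids", rows, batch_size=10))
print(f"{len(statements)} statements")
```

Each generated statement would then be submitted on its own, so no single query has to carry all of the rows.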

 

 

 

  3. Insert with subselect:

 

Inserted 10000 rows in 1.47s

Inserted 20000 rows in 2.26s

 

This method handles 100,000 rows without any problem.

 

Query:

insert into default.b (id, string_col, int_col, double_col, bool_col, timestamp_col) select id, string_col, int_col, double_col, bool_col, timestamp_col from default.hbasestringids limit 20000;

 

Tables default.b and default.hbasestringids have the same schema:

 

+-----------------+-----------+---------+
| name            | type      | comment |
+-----------------+-----------+---------+
| id              | string    |         |
| bool_col        | boolean   |         |
| double_col      | double    |         |
| float_col       | float     |         |
| bigint_col      | bigint    |         |
| int_col         | int       |         |
| smallint_col    | smallint  |         |
| tinyint_col     | tinyint   |         |
| date_string_col | string    |         |
| string_col      | string    |         |
| timestamp_col   | timestamp |         |
+-----------------+-----------+---------+
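To compare the three methods directly, the timings reported above can be converted to rows per second. The sketch below uses only the numbers from this post; the labels and the helper `rows_per_second` are illustrative names.

```python
# (label, rows inserted, elapsed seconds) as reported in the measurements above
measurements = [
    ("values, 2 columns", 100001, 7.28),
    ("values, 11 columns", 10000, 14.11),
    ("insert ... select", 10000, 1.47),
    ("insert ... select", 20000, 2.26),
]


def rows_per_second(rows: int, seconds: float) -> float:
    """Average insert throughput for one measurement."""
    return rows / seconds


for label, rows, seconds in measurements:
    rate = rows_per_second(rows, seconds)
    print(f"{label:<20} {rows:>7} rows  {rate:>8.0f} rows/s")
```

This works out to roughly 13,700 rows/s for the two-column values insert, about 700 rows/s for the 11-column values insert, and between about 6,800 and 8,850 rows/s for insert with subselect. The tables and column counts differ between the tests, so these figures are only a rough comparison.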

 

 

 

 

 

 

 

 

 

System Configuration:

 

Architecture:          x86_64

CPU op-mode(s):        32-bit, 64-bit

Byte Order:            Little Endian

CPU(s):                8

On-line CPU(s) list:   0-7

Thread(s) per core:    2

Core(s) per socket:    4

Socket(s):             1

NUMA node(s):          1

Vendor ID:             GenuineIntel

CPU family:            6

Model:                 42

Stepping:              7

CPU MHz:               1600.000

BogoMIPS:              6785.08

Virtualization:        VT-x

L1d cache:             32K

L1i cache:             32K

L2 cache:              256K

L3 cache:              8192K

NUMA node0 CPU(s):     0-7

 

Physical Memory:

16G

 

Disk:

 

ATA device, with non-removable media

Model Number:       ST31500341AS

Serial Number:      9VS551EJ

Firmware Revision:  CC4G

Transport:          Serial

Standards:

Used: unknown (minor revision code 0x0029)

Supported: 8 7 6 5

Likely used: 8

Configuration:

Logical         max     current

cylinders       16383   16383

heads           16      16

sectors/track   63      63

CHS current addressable sectors:   16514064

LBA    user addressable sectors:  268435455

LBA48  user addressable sectors: 2930277168

Logical/Physical Sector size:           512 bytes

device size with M = 1024*1024:     1430799 MBytes

device size with M = 1000*1000:     1500301 MBytes (1500 GB)

cache/buffer size  = unknown

Nominal Media Rotation Rate: 7200

 

OS Version:

Linux version 2.6.32-431.el6.x86_64 (mockbuild@c6b8.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Fri Nov 22 03:15:09 UTC 2013
