(DEHEMS) Benchmark framework
Contents
Configuration
the following files are found in the folder sources:
generator.config.h:
generator_type_c -- determines the way in which the records are generated:
generator_clustered_type_c -- the client and sensor identifiers are generated by choosing a random (uniformly distributed) number in the ranges controlled by the following parameters, and the timestamp is incremented for each generated record:
generator_maximum_clients_c -- the client identifiers are generated in the range 0 up to this number;
generator_maximum_sensors_c -- the same as above;
generator_sequential_type_c -- the client and sensor identifiers are always 0, and the timestamp is incremented for each generated record;
generator_random_type_c -- all client and sensor identifiers, together with the timestamp, are generated randomly from the entire 32-bit range;
harness.config.h:
harness_execute_inserts_c -- determines if the insert benchmark should be executed;
harness_execute_scans_c -- determines if the scan benchmark should be executed;
harness_maximum_clients_c and harness_maximum_sensors_c -- used for the scan benchmark to select the records (should have the same values as the parameters for the generator);
harness_maximum_steps_c -- how many records to insert;
harness_report_count_c -- determines the interval at which trace data points are generated (meaning a data point is printed after this many insert / scan operations);
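as an illustration, below is a minimal sketch of what the two headers might contain; the parameter names come from the lists above, but the concrete values -- and the assumption that generator_type_c is set by naming one of the generator_..._type_c constants -- are only an example:

/* generator.config.h -- example values (the encoding is assumed) */
#define generator_type_c generator_clustered_type_c
#define generator_maximum_clients_c 100
#define generator_maximum_sensors_c 10

/* harness.config.h -- example values; the client / sensor maxima
   mirror the generator settings, as required above */
#define harness_execute_inserts_c 1
#define harness_execute_scans_c 1
#define harness_maximum_clients_c 100
#define harness_maximum_sensors_c 10
#define harness_maximum_steps_c 1000000
#define harness_report_count_c 10000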
Building
- building for a specific storage engine:
- issue the command below;
the STORE argument selects the storage engine that you would like to use;
you should obtain a folder named testing inside the outputs folder;
inside that folder (outputs/testing) you should obtain a file named ${STORE}.elf;
make build STORE=sqlite3
- executing a test for a specific benchmark:
- issue the command below;
the STORE argument is like in the previous case;
the TRACE argument represents the benchmarking session (this way you could run multiple tests with different inputs);
the SUB argument represents the benchmarking sub-session (where you could fine-tune a particular engine, and thus have multiple runs for the same storage engine with the same inputs, but different tuning settings);
you should obtain the same results as with the previous command, but in a folder named ${TRACE} instead of testing;
you should also obtain a folder named ${TRACE} inside the traces folder that contains the ${STORE}${SUB}.trace and ${STORE}${SUB}.config files;
as an observation, before the test is actually executed the makefile checks whether the file ./bin/${STORE}-control exists, and if so calls it with the argument create; the same happens after the test has run, but with the destroy argument; the purpose of these operations is to create, and later destroy, the datastore;
make trace STORE=sqlite3 TRACE=small SUB=
- viewing the result of a specific benchmark:
just like in the case of execution, but using view instead of trace;
make view STORE=sqlite3 TRACE=small SUB=
Project layout
Generic
sources -- all source code files go into this folder:
*.cm -- represents a custom-made C module -- meaning from one such file we obtain both the header (.h) and the code (.c) files;
*.cppm -- the same as above, but for C++;
*.plot -- files used by GNU Plot to create the plots;
*.sql or *.hql -- SQL (or HQL) files used to create the database schema or to destroy it;
traces -- contains the benchmark results:
... -- for each benchmark environment (number of records, way of generating the records) a subfolder exists that contains:
*.trace -- text files containing the actual traces (used as input for GNU Plot);
*.png -- images of the charts obtained by GNU Plot from the respective *.trace files;
*.config -- a concatenation of the generator.config.h, harness.config.h and ${STORE}.config.h files that were used to create the executable;
*.seal -- a marker denoting that the test should not be re-executed and that the existing *.trace file should be reused;
*.makefile;
bin -- various executables that are used in the build process or for setting up (or tearing down) the databases; usually the files here are symbolic links to the scripts folder;
scripts -- various scripts, see above;
outputs -- all the build outputs are put here, together with any traces or intermediary files; this folder should be treated as temporary and disposable:
- for each test there should be one subfolder;
tmp -- used to store temporary files for benchmarks (like the databases in the case of BerkeleyDB and Sqlite3);
Sources
the following files are found in the folder sources:
- core:
common.h -- macros and definitions used throughout the entire project;
data.h -- the structure and constants related to a record that should be stored;
generator.cm -- module that creates records based on different policies;
generator.config.h -- parameters for the generator;
store.h -- public API that should be implemented by all storage engines;
crash.cm -- code used in debugging (i.e. printing trace messages) or aborting the process; (the name is somewhat misleading;)
harness.cm -- application main function; executes both the insert and scan tests;
harness.config.h -- configuration for the benchmark execution;
progress.cm -- contains the code that displays the progress bar and outputs the trace information;
- store engines:
bdb.cm and bdb.config.h -- BerkeleyDB;
hypertable.cm and hypertable.config.h -- Hypertable;
monetdb.cm and monetdb.config.h -- MonetDB;
postgres.cm and postgres.config.h -- Postgres;
sqlite3.cm and sqlite3.config.h -- Sqlite3;
file.cm and file.config.h -- the records are written to a file in an append-only fashion; provides only insert operations;
sink.cm -- null storage engine; provides only insert operations;
Coding
Generic
- type identifiers (by examples):
uint32_t stands for unsigned int;
uint32_tp stands for unsigned int *;
char_tcppc stands for char const * * const;
data_tpc stands for struct data * const;
- the rules:
..._t should be read as the type ...;
...c... should be read as ... const ...;
...p... should be read as ... * ...;
- type definition macros (by examples):
struct_def(identifier) { ... } should be read as struct identifier { ... }, but also creates the type identifiers seen before;
the same for union_def and enum_def;
type_def(identifier, type) should be read as typedef type identifier, but also creates the type identifiers;
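to make the conventions concrete, below is a minimal sketch of how struct_def could be written; the real macro lives in common.h and may differ in detail:

/* sketch: forward-declare the struct, emit the derived type
   identifiers, then leave the struct header open so the caller
   supplies the body */
#define struct_def(_name) \
    struct _name; \
    typedef struct _name _name##_t; \
    typedef struct _name const _name##_tc; \
    typedef struct _name * _name##_tp; \
    typedef struct _name * const _name##_tpc; \
    typedef struct _name const * _name##_tcp; \
    typedef struct _name const * const _name##_tcpc; \
    struct _name

with this expansion, struct_def (data) { ... }; both defines struct data and makes data_tpc read as struct data * const, matching the example above;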
Storage API
- sensor reading record:
struct_def (data) {
    uint32_t client;
    uint32_t sensor;
    uint32_t timestamp;
    uint32_t value;
};
- some constants that have special meaning:
#define undefined_client_c 0xffffffff
#define undefined_sensor_c 0xffffffff
#define undefined_timestamp_c 0xffffffff
#define undefined_value_c 0xffffffff
- obtaining information about the storage engine:
the identifier argument is an output argument: a pointer through which a string identifying the storage engine is returned;
can_insert is an output argument that tells if the storage engine can execute inserts;
can_scan is the same as above, but for scans;
extern error_t store_identify (char_tcppc identifier, bool_tpc can_insert, bool_tpc can_scan);
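as a sketch, a trivial engine (modelled loosely on sink) could implement it as below; it is assumed here that 0 denotes success for error_t and that bool_t is compatible with the C99 bool:

extern error_t store_identify (char_tcppc identifier, bool_tpc can_insert, bool_tpc can_scan)
{
    *identifier = "sink";   /* name reported for this engine */
    *can_insert = true;     /* inserts are supported */
    *can_scan = false;      /* scans are not */
    return (0);             /* assumption: 0 denotes success */
}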
- initializing the storage engine:
- it is called before any other operation, and before any operation-specific initializations;
extern error_t store_initialize (void);
- destroying the storage engine:
- it is called after all operations, and after all operation-specific deinitializations;
extern error_t store_finalize (void);
- preparing the storage engine for inserts:
it is called after store_initialize and before any insert operations;
extern error_t store_initialize_insert (void);
- finishing the inserts:
extern error_t store_finalize_insert (void);
- executing the insert operation:
extern error_t store_execute_insert (data_tcpc data);
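putting the insert-side calls together, the harness drives an engine roughly as sketched below (error handling elided; generator_generate is a hypothetical name for the generator entry point):

struct data record;
store_initialize ();
store_initialize_insert ();
for (uint32_t step = 0; step < harness_maximum_steps_c; step++) {
    generator_generate (&record);    /* hypothetical: obtain the next record */
    store_execute_insert (&record);
}
store_finalize_insert ();
/* ... the scan phase, if enabled, runs here ... */
store_finalize ();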
- preparing for scan operations:
- it is called before any scan session;
extern error_t store_initialize_scan (void);
- finishing scans:
extern error_t store_finalize_scan (void);
- starting a scan session:
it is called after store_initialize_scan;
the input parameters client and sensor denote the client and sensor for which the scan should be executed;
- the output records should be given in ascending order of the timestamp;
extern error_t store_begin_scan (uint32_tc client, uint32_t sensor);
- ending a scan session:
extern error_t store_end_scan (void);
- retrieving the next scanned record:
called after store_begin_scan;
- should return the next record that matches the session filter (client and sensor), in ascending order of the timestamp;
if there are no more records, the client, sensor, timestamp, and value members of the data structure parameter should be set to the undefined_..._c constants described in the previous section;
extern error_t store_execute_scan (data_tpc data);
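the scan-side call sequence, as a sketch with error handling elided:

struct data record;
store_initialize_scan ();
store_begin_scan (client, sensor);
for (;;) {
    store_execute_scan (&record);
    if (record.client == undefined_client_c)
        break;    /* no more matching records */
    /* consume the record; timestamps arrive in ascending order */
}
store_end_scan ();
store_finalize_scan ();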
Storage makefile
by using the ...-build target, you can adjust the CPPFLAGS and LDFLAGS for this particular engine;
you could also update the PATH (where other executables are found) and LD_LIBRARY_PATH (where other libraries are found);
BDB := /packages/bdb--4.7.25
BDB_BIN := $(BDB)/bin
BDB_INCLUDE := $(BDB)/include
BDB_LIB := $(BDB)/lib
BDB_LIBS := -ldb

bdb-build : CPPFLAGS := $(CPPFLAGS) -I$(BDB_INCLUDE)
bdb-build : LDFLAGS := $(LDFLAGS) $(BDB_LIBS) -L$(BDB_LIB)

PATH := $(BDB_BIN):$(PATH)
LD_LIBRARY_PATH := $(BDB_LIB):$(LD_LIBRARY_PATH)