(DEHEMS) Benchmark framework
Contents
Configuration
the following files are found in the folder sources:
generator.config.h:
generator_type_c -- determines the way in which the records are generated:
generator_clustered_type_c -- the client and sensor identifiers are generated by choosing a random (uniformly distributed) number in the ranges controlled by the following parameters, and the timestamp is incremented for each generated record:
generator_maximum_clients_c -- the client identifiers are generated in the range 0 up to this number;
generator_maximum_sensors_c -- the same as above;
generator_sequential_type_c -- the client and sensor identifiers are always 0, and the timestamp is incremented for each generated record;
generator_random_type_c -- all client and sensor identifiers, together with the timestamp, are generated randomly from the entire 32-bit range;
harness.config.h:
harness_execute_inserts_c -- determines if the insert benchmark should be executed;
harness_execute_scans_c -- determines if the scan benchmark should be executed;
harness_maximum_clients_c and harness_maximum_sensors_c -- used for the scan benchmark to select the records (should have the same values as the parameters for the generator);
harness_maximum_steps_c -- how many records to insert;
harness_report_count_c -- determines the interval at which trace data points are generated (meaning a data point is printed after this many insert / scan operations);
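as an illustration, below is a minimal sketch of what the two headers might contain; the parameter names come from the lists above, but the concrete values -- and the assumption that generator_type_c is set by naming one of the generator_..._type_c constants -- are only an example:

/* generator.config.h -- example values (the encoding is assumed) */
#define generator_type_c generator_clustered_type_c
#define generator_maximum_clients_c 100
#define generator_maximum_sensors_c 10

/* harness.config.h -- example values; the client / sensor maxima
   mirror the generator settings, as required above */
#define harness_execute_inserts_c 1
#define harness_execute_scans_c 1
#define harness_maximum_clients_c 100
#define harness_maximum_sensors_c 10
#define harness_maximum_steps_c 1000000
#define harness_report_count_c 10000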
Building
- building for a specific storage engine:
- issue the command below;
the STORE argument selects the storage engine that you would like to use;
you should obtain a folder named testing inside the outputs folder;
inside that folder (outputs/testing) you should obtain a file named ${STORE}.elf;
make build STORE=sqlite3
- executing a test for a specific benchmark:
- issue the command below;
the STORE argument is like in the previous case;
the TRACE argument represents the benchmarking session (this way you could run multiple tests with different inputs);
the SUB argument represents the benchmarking sub-session (where you could fine-tune a particular engine, and thus have multiple runs for the same storage engine with the same inputs, but different tuning settings);
you should obtain the same results as with the previous command, but in a folder named ${TRACE} instead of testing;
you should also obtain a folder named ${TRACE} inside the traces folder that contains the ${STORE}${SUB}.trace and ${STORE}${SUB}.config files;
as an observation, before the test is actually executed the makefile checks whether the file ./bin/${STORE}-control exists, and if so calls it with the argument create; the same happens after the test has run, but with the destroy argument; the purpose of these operations is to create, and later destroy, the datastore;
make trace STORE=sqlite3 TRACE=small SUB=
- viewing the result of a specific benchmark:
just like in the case of execution, but using view instead of trace;
make view STORE=sqlite3 TRACE=small SUB=
Project layout
Generic
sources -- all source code files go into this folder:
*.cm -- represents a custom-made C module -- meaning from one such file we obtain both the header (.h) and the code (.c) files;
*.cppm -- the same as above, but for C++;
*.plot -- files used by GNU Plot to create the plots;
*.sql or *.hql -- SQL (or HQL) files used to create the database schema or to destroy it;
traces -- contains the benchmark results:
... -- for each benchmark environment (number of records, way of generating the records) a subfolder exists that contains:
*.trace -- text files containing the actual traces (used as input for GNU Plot);
*.png -- images of the charts obtained by GNU Plot from the respective *.trace files;
*.config -- a concatenation of the generator.config.h, harness.config.h and ${STORE}.config.h files that were used to create the executable;
*.seal -- a marker denoting that the test should not be re-executed and that the existing *.trace file should be reused;
*.makefile;
bin -- various executables that are used in the build process or for setting up (or tearing down) the databases; usually the files here are symbolic links to the scripts folder;
scripts -- various scripts, see above;
outputs -- all the build outputs are put here, together with any traces or intermediary files; this folder should be treated as temporary and disposable:
- for each test there should be one subfolder;
tmp -- used to store temporary files for benchmarks (like the databases in the case of BerkeleyDB and Sqlite3);
Sources
the following files are found in the folder sources:
- core:
common.h -- macros and definitions used throughout the entire project;
data.h -- the structure and constants related to a record that should be stored;
generator.cm -- module that creates records based on different policies;
generator.config.h -- parameters for the generator;
store.h -- public API that should be implemented by all storage engines;
crash.cm -- code used in debugging (i.e. printing trace messages) or aborting the process; (the name is somewhat misleading;)
harness.cm -- application main function; executes both the insert and scan tests;
harness.config.h -- configuration for the benchmark execution;
progress.cm -- contains the code that displays the progress bar and outputs the trace information;
- store engines:
bdb.cm and bdb.config.h -- BerkeleyDB;
hypertable.cm and hypertable.config.h -- Hypertable;
monetdb.cm and monetdb.config.h -- MonetDB;
postgres.cm and postgres.config.h -- Postgres;
sqlite3.cm and sqlite3.config.h -- Sqlite3;
file.cm and file.config.h -- the records are written to a file in an append-only fashion; provides only insert operations;
sink.cm -- null storage engine; provides only insert operations;
Coding
Generic
- type identifiers (by examples):
uint32_t stands for unsigned int;
uint32_tp stands for unsigned int *;
char_tcppc stands for char const * * const;
data_tpc stands for struct data * const;
- the rules:
..._t should be read as the type ...;
...c... should be read as ... const ...;
...p... should be read as ... * ...;
- type definition macros (by examples):
struct_def(identifier) { ... } should be read as struct identifier { ... }, but also creates the type identifiers seen before;
the same for union_def and enum_def;
type_def(identifier, type) should be read as typedef type identifier, but also creates the type identifiers;
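to make the conventions concrete, below is a minimal sketch of how struct_def could be written; the real macro lives in common.h and may differ in detail:

/* sketch: forward-declare the struct, emit the derived type
   identifiers, then leave the struct header open so the caller
   supplies the body */
#define struct_def(_name) \
    struct _name; \
    typedef struct _name _name##_t; \
    typedef struct _name const _name##_tc; \
    typedef struct _name * _name##_tp; \
    typedef struct _name * const _name##_tpc; \
    typedef struct _name const * _name##_tcp; \
    typedef struct _name const * const _name##_tcpc; \
    struct _name

with this expansion, struct_def (data) { ... }; both defines struct data and makes data_tpc read as struct data * const, matching the example above;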
Storage API
- sensor reading record:
struct_def (data) {
    uint32_t client;
    uint32_t sensor;
    uint32_t timestamp;
    uint32_t value;
};
- some constants that have special meaning:
#define undefined_client_c 0xffffffff
#define undefined_sensor_c 0xffffffff
#define undefined_timestamp_c 0xffffffff
#define undefined_value_c 0xffffffff
- obtaining information about the storage engine:
the identifier argument is an output argument: a pointer through which a string identifying the storage engine is returned;
can_insert is an output argument that tells if the storage engine can execute inserts;
can_scan is the same as above, but for scans;
extern error_t store_identify (char_tcppc identifier, bool_tpc can_insert, bool_tpc can_scan);
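as a sketch, a trivial engine (modelled loosely on sink) could implement it as below; it is assumed here that 0 denotes success for error_t and that bool_t is compatible with the C99 bool:

extern error_t store_identify (char_tcppc identifier, bool_tpc can_insert, bool_tpc can_scan)
{
    *identifier = "sink";   /* name reported for this engine */
    *can_insert = true;     /* inserts are supported */
    *can_scan = false;      /* scans are not */
    return (0);             /* assumption: 0 denotes success */
}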
- initializing the storage engine:
- it is called before any other operation, and before any operation-specific initializations;
extern error_t store_initialize (void);
- destroying the storage engine:
- it is called after all operations, and after all operation-specific deinitializations;
extern error_t store_finalize (void);
- preparing the storage engine for inserts:
it is called after store_initialize and before any insert operations;
extern error_t store_initialize_insert (void);
- finishing the inserts:
extern error_t store_finalize_insert (void);
- executing the insert operation:
extern error_t store_execute_insert (data_tcpc data);
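putting the insert-side calls together, the harness drives an engine roughly as sketched below (error handling elided; generator_generate is a hypothetical name for the generator entry point):

struct data record;
store_initialize ();
store_initialize_insert ();
for (uint32_t step = 0; step < harness_maximum_steps_c; step++) {
    generator_generate (&record);    /* hypothetical: obtain the next record */
    store_execute_insert (&record);
}
store_finalize_insert ();
/* ... the scan phase, if enabled, runs here ... */
store_finalize ();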
- preparing for scan operations:
- it is called before any scan session;
extern error_t store_initialize_scan (void);
- finishing scans:
extern error_t store_finalize_scan (void);
- starting a scan session:
it is called after store_initialize_scan;
the input parameters client and sensor denote the client and sensor for which the scan should be executed;
- the output records should be given in ascending order of the timestamp;
extern error_t store_begin_scan (uint32_tc client, uint32_t sensor);
- ending a scan session:
extern error_t store_end_scan (void);
- retrieving the next scanned record:
called after store_begin_scan;
- should return the next record that matches the session filter (client and sensor), in ascending order of the timestamp;
if there are no more records, the client, sensor, timestamp, and value members of the data structure parameter should be set to the undefined_..._c constants described in the previous section;
extern error_t store_execute_scan (data_tpc data);
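the scan-side call sequence, as a sketch with error handling elided:

struct data record;
store_initialize_scan ();
store_begin_scan (client, sensor);
for (;;) {
    store_execute_scan (&record);
    if (record.client == undefined_client_c)
        break;    /* no more matching records */
    /* consume the record; timestamps arrive in ascending order */
}
store_end_scan ();
store_finalize_scan ();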
Storage makefile
by using the ...-build target, you can adjust the CPPFLAGS and LDFLAGS for this particular engine;
you could also update the PATH (where other executables are found) and LD_LIBRARY_PATH (where other libraries are found);
BDB := /packages/bdb--4.7.25
BDB_BIN := $(BDB)/bin
BDB_INCLUDE := $(BDB)/include
BDB_LIB := $(BDB)/lib
BDB_LIBS := -ldb

bdb-build : CPPFLAGS := $(CPPFLAGS) -I$(BDB_INCLUDE)
bdb-build : LDFLAGS := $(LDFLAGS) $(BDB_LIBS) -L$(BDB_LIB)

PATH := $(BDB_BIN):$(PATH)
LD_LIBRARY_PATH := $(BDB_LIB):$(LD_LIBRARY_PATH)