Serialization
Task assignment
In many cases, you need to transfer data between processes running on the same machine or send data over the network. Text formats such as XML or JSON can be used, but they are verbose and comparatively slow to parse. Binary serialization frameworks such as Protobuf and Avro are more efficient alternatives.
The serialization frameworks define data structures with schemas in their own languages that can be compiled into classes in a variety of programming languages (Java, C++, Python, etc.).
Your goal is to implement a Java application that generates data, stores it in classes generated by the Protobuf and Avro serialization frameworks, and sends the serialized data via TCP to a C/C++ application. The C/C++ application processes the data (calculates averages) and sends the results back.
You will be given a reference implementation using the JSON data format. Use its data classes as a template for how the Protobuf and Avro schemas should look. You will probably not be able to create exact counterparts due to some limitations of the serialization protocols; therefore, you will need to find a workaround.
Steps:
- Create a team of 2 students.
- Download a template from the git repository:
  git clone https://gitlab.fel.cvut.cz/esw/serialization.git
- There is a subfolder in src/main/ for each language/schema (java, cpp, proto, avro, ...).
- To compile and run both counterparts, follow the instructions in the project README.
- First run the C/C++ server component and then the Java client component.
- Define protobuf and/or avro schemas as similar as possible to the provided JSON format (package cz.esw.serialization.json.*). Write the schemas into the prepared measurements.proto and measurements.avsc files. Hint: Use class names with prefix P for protobuf classes (e.g. PDataset) and A for avro classes (e.g. ADataset).
- Implement the applications (both Java and C/C++) into the provided template following the specification described in the next section.
- Observe performance differences between the data formats.
- Upload the application into the upload system. Upload only files in the template repository and any newly added sources. Don't upload compiled binaries or generated sources. You can use the following command to generate the archive:
  git archive --format=zip -o serialization.zip HEAD
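For the step on observing performance differences, a simple timing helper may be enough to compare the formats. The following is a minimal sketch, not part of the template: the class name FormatTimer and the method averageMillis are hypothetical, and the measured action is a placeholder for your own serialize-and-send code.

```java
class FormatTimer {
    /**
     * Runs the action the given number of times and returns the mean
     * wall-clock duration of one run in milliseconds.
     */
    static double averageMillis(Runnable action, int repetitions) {
        if (repetitions <= 0) {
            throw new IllegalArgumentException("repetitions must be positive");
        }
        long start = System.nanoTime();
        for (int i = 0; i < repetitions; i++) {
            action.run();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / repetitions;
    }

    public static void main(String[] args) {
        // Placeholder workload; replace with one transmission in a given format.
        Runnable placeholder = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 1000; i++) {
                sb.append(i);
            }
        };
        System.out.println("mean ms per run: " + averageMillis(placeholder, 100));
    }
}
```

Running the same measurement once per format with an identical dataset keeps the comparison fair; the first few runs are usually slower because of JVM warm-up.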
Application Specification
Java
The configuration of the application is handled by Maven (pom.xml), which takes care of all required libraries and of compiling the serialization schemas (you have to run mvn compile to generate the source code of the data classes every time you change the serialization schemas). To compile and run the application, follow the instructions in the README.
The Java app has to accept the following three arguments and an optional fourth one:
app <host> <port> <format> [<numberOfTransmissions>]
The application has to take the generated data, convert it to the transfer format, and send it.
The arguments <host> and <port> are the address and port of the receiver, and <format> is one of {json, proto, avro}, defining the format for the data transfer over TCP.
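The argument handling described above could be sketched as follows. This is only an illustration: the class name ClientArgs is hypothetical, the provided template may already parse the arguments differently, and the default of one transmission when the fourth argument is omitted is an assumption, not part of the specification.

```java
import java.util.Locale;

class ClientArgs {
    /** Transfer formats accepted by the Java client. */
    enum Format { JSON, PROTO, AVRO }

    final String host;
    final int port;
    final Format format;
    final int numberOfTransmissions;

    ClientArgs(String host, int port, Format format, int numberOfTransmissions) {
        this.host = host;
        this.port = port;
        this.format = format;
        this.numberOfTransmissions = numberOfTransmissions;
    }

    /** Parses: <host> <port> <format> [<numberOfTransmissions>] */
    static ClientArgs parse(String[] args) {
        if (args.length < 3 || args.length > 4) {
            throw new IllegalArgumentException(
                "usage: app <host> <port> <format> [<numberOfTransmissions>]");
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        // valueOf throws IllegalArgumentException for an unknown format name.
        Format format = Format.valueOf(args[2].toUpperCase(Locale.ROOT));
        // Assumption: default to a single transmission if not specified.
        int transmissions = args.length == 4 ? Integer.parseInt(args[3]) : 1;
        return new ClientArgs(host, port, format, transmissions);
    }

    public static void main(String[] args) {
        ClientArgs parsed = parse(args);
        System.out.println(parsed.host + ":" + parsed.port + " " + parsed.format
            + " x" + parsed.numberOfTransmissions);
    }
}
```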
C/C++
The C/C++ application has to be compiled by Meson, including generation of the protobuf/avro source files. To do that, update src/main/cpp/meson.build. For example, to call protoc to generate source code, add something like this:
protoc = find_program('protoc', required : true)
gen = generator(protoc,
output : ['@BASENAME@.pb.cc', '@BASENAME@.pb.h'],
arguments : ['--proto_path=@CURRENT_SOURCE_DIR@', '--cpp_out=@BUILD_DIR@', '@INPUT@'])
generated = gen.process('measurements.proto')
srcs = ['dataset.cpp', 'main.cpp', 'measurementinfo.cpp', 'result.cpp', generated]
Links describing how to install the protocol compilers and how to use them are provided below in the corresponding sections; alternatively, use nix-shell.
The C/C++ application has to listen on the defined port and receive data in the defined format, process the data (just calculate averages) and send the results back.
The app has to accept the following two arguments:
server <port> <format>
The argument <port> is the port on which the receiver listens and <format> is one of {json, protobuf, avro}, defining the format of the data transferred over TCP.
Data format
- json - sends/receives the data as JSON text
- proto - sends/receives the data as bytes of the protobuf generated classes
- avro - sends/receives the data as bytes of the avro generated classes
Message Size
The C++ implementations of the Protobuf and Avro frameworks will probably not be able to recognize the ends of messages; therefore, the Java application has to send the message size before the message itself.
The receiving part should look similar to:
int messageSize = readAndDecodeMessageSize(stream); // your implementation
char *buffer = new char[messageSize];
stream.read(buffer, messageSize);
...
The size of a Protobuf message is easy to get:
int messageSize = objectToBeSerialized.getSerializedSize();
sendMessageSize(messageSize, outputStream); // your implementation
...
The size of an Avro message is not that straightforward to retrieve:
// serialize into an in-memory buffer first, so the size is known before sending
DatumWriter<ADataset> datumWriter = new SpecificDatumWriter<ADataset>(ADataset.class);
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(byteArrayOutputStream, null);
datumWriter.write(objectToBeSerialized, encoder);
encoder.flush(); // make sure all encoded bytes reach the buffer
int messageSize = byteArrayOutputStream.size();
sendMessageSize(messageSize, outputStream); // your implementation
...
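The sendMessageSize helper used in the snippets above is left to you. One possible sketch is shown below. It assumes a fixed 4-byte big-endian (network byte order) size header, which is only one of several workable conventions; the class name LengthPrefixFraming and the methods sendMessage and readMessage are hypothetical. With this convention, the C/C++ receiver would have to decode the same four bytes, for example with ntohl.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class LengthPrefixFraming {
    /** Writes the size as a 4-byte big-endian integer (assumed convention). */
    static void sendMessageSize(int messageSize, OutputStream out) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(messageSize); // writeInt always emits big-endian bytes
        data.flush();
    }

    /** Sends the size header followed by the serialized payload. */
    static void sendMessage(byte[] payload, OutputStream out) throws IOException {
        sendMessageSize(payload.length, out);
        out.write(payload);
        out.flush();
    }

    /** Reads one framed message: the size header, then exactly that many bytes. */
    static byte[] readMessage(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int size = data.readInt();
        byte[] payload = new byte[size];
        data.readFully(payload); // blocks until the whole payload has arrived
        return payload;
    }

    public static void main(String[] args) throws IOException {
        // Simulate the wire with an in-memory buffer instead of a socket.
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        sendMessage("hello".getBytes(StandardCharsets.UTF_8), wire);
        byte[] framed = wire.toByteArray();
        System.out.println(framed.length); // prints 9: 4 header bytes + 5 payload bytes
        byte[] back = readMessage(new ByteArrayInputStream(framed));
        System.out.println(new String(back, StandardCharsets.UTF_8)); // prints hello
    }
}
```

A fixed-width header is easy to decode on the C/C++ side; both sides only need to agree on the width and byte order.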
Protobuf
Avro
- Avro CPP Getting Started and Installation Guide
- Avro CPP Download
- Avro CPP documentation
- Avro Java Getting Started
- Avro Schema Specification
- Avro IDL Specification
You can use either the JSON-based Avro Schema or the much less verbose Avro IDL to define the messages. However, be aware that the avrogencpp tool accepts only Avro Schema. Therefore, you need to convert the IDL to a Schema with avro-tools.jar (you can download it from the same site as the other Avro parts); alternatively, the IntelliJ IDEA Avro plugin can also do the conversion.
Maven
Apache Maven is a project management tool that handles library dependencies and building. Some IDEs have Maven integrated, but for easy use on the command line you will have to download it and add the bin folder to the PATH.
Points
If you submit a solution with both protocols implemented, you will get 5 points. If you submit only one protocol, you will get 3 points.