Characterization of video streaming service traffic based on lexical analyzers

Besides offering interactivity through user actions such as pauses, rewinds, and fast-forwards, the video streaming service is the largest generator of traffic on data networks. Hence, it is necessary to characterize this behavior for network dimensioning. This article presents the conceptual models of the service and a lexical analyzer whose function is the automatic extraction of the different frames of traffic streams and the identification of interactivity processes. Through the lexical analyzer, the information from each of the components of the service, such as the I, P, and B frames, is extracted, as is the audio component. The lexical analyzer thus provides granularity to the characterization process, and the tool is expected to allow studying traffic behavior based on distribution functions obtained in situ, without relying on assumptions or studies from other researchers.


INTRODUCTION
Modern communication networks must offer voice, data, image, video, audio, text, control, IPTV (Internet Protocol Television), and streaming services, each with different Quality of Service (QoS) criteria and, consequently, different requirements for the network. Among them, the video on demand service supported by video streaming is the one that demands the most bandwidth [1].
The characteristics of data networks and their services invalidate the results of traditional teletraffic theory, which were based on non-correlated models; in addition, packet traffic is too complex to be modeled analytically through techniques developed for the telephone network [2], [3]. There are works related to traffic characterization, such as those described in [4], [5], and [6]; however, they show neither the process nor the method for filtering and exporting the components of the traffic stream within the characterization process.
Characterization involves generating the conceptual model of the services, capturing their traffic, identifying the audio and video frames that comprise them, and identifying the Probability Density Functions (PDF) that describe them. Video streaming standards encode video streams as Groups of Pictures (GOP) built from three types of data: intra frames, forward-predicted frames, and bidirectionally predicted frames, referred to as type-I, P, and B frames, respectively. In addition to these frames, audio information appears as well. This is why the characterization process consumes long time periods in research on traffic models [7]. Specifically, the capture, identification, filtering, and exportation of each of the frames that make up video streaming services imply a thorough analysis, all the more so considering that tools automating these processes have not been found in the literature [7].
The contribution of this research is the use of lexical analyzers as a tool for automating the identification of the GOP and the audio of videos encoded under IPTV standards and under a proprietary standard based on the Real-Time Messaging Protocol (RTMP). This contribution allows researchers to perform detailed traffic analysis without relying on paid tools, assumed PDFs for the services, or traffic traces of test videos uploaded to the internet by other research centers. To achieve this objective, it was necessary to construct the conceptual model of the services and to identify patterns in the traffic stream. This research used a methodology based on Deterministic Finite Automata (DFA) concepts, given that DFAs are one of several methods for performing lexical analysis. Thus, the initial state of the DFA corresponds to the selected services; the following state is the conceptual model that logically describes the functioning of the services, and the final state contains the mathematical model. This article is structured as follows: section 2 presents the video on demand services with their conceptual models; section 3 analyzes the traffic of the services and describes the lexical analyzers; section 4 presents the results. Lastly, section 5 presents the conclusions and future work.

VIDEO ON DEMAND SERVICES
For the present study, the services are organized into two groups according to the transmission protocol that supports the on demand videos. For the deployment of the group one videos and the subsequent capture of their traffic under video streaming technology, the Flash Media Server (FMS) was used. It offers video on demand services with access for authorized users through a persistent connection using the RTMP protocol [8]. The FMS is a licensed server that includes the coding functionality, an internal process that is transparent to the provision of the service, where the videos are supplied in FLV format. Figure 1 presents the conceptual model showing the different interactions that the service offers to group one users.
Each circle represents a state offered by the service and each arrow represents a transition between states. There is a validation state that permits entering the course; within it, different subjects can be shown, made up of units, which in turn comprise videos or chapters. Once the client is validated, one may exit from any state except the pause and playback states. Also, excluding the playback state, one can return to a previous state from any other state. For the deployment of the group two videos and the subsequent capture of their traffic, the Live555 server was used because it reliably supports the standard protocols for voice and audio transmission over the internet, namely RTP/RTCP/RTSP (Real-time Transport Protocol / RTP Control Protocol / Real Time Streaming Protocol). Live555 supports the IPTV hardware client and videos encoded under IPTV standards (MPEG-2 and MPEG-4). The server and the client communicate through the RTSP protocol.
Figure 2 presents the conceptual model of the group two services under the Live555 server. This server cannot perform encoding, so that process must be carried out externally. No validation state exists, allowing any user with the service URL to access the group two services. To consume the service provided by Live555, an Amino A140 set-top box (STB) connected to a TV set was used as the client; it supports the IPTV MPEG-2 standard. The files to be consumed must be in MPEG-TS format because this is the audio and video container used in the IPTV service.
The encoding process used the ffmpeg tool whose script is shown in Figure 3.
In Figure 3, -i indicates the original video; Video1.mp4 is the name of the original video, in an mp4 container file format; -vcodec forces the use of the mpeg2video codec; -r 30 sets the frame rate, so 30 images per second will be captured; -s 720x576 sets the desired resolution as width x height; -aspect specifies that the output will be in panoramic format; -b:v indicates a constant bit rate of 5.7 Mbit/s; the options -maxrate and -minrate control the tolerance of the maximum and minimum bit rate in bit/s; -bf sets the maximum number of contiguous B-type frames; -bufsize sets the buffer size, which is necessary when using the -maxrate and -minrate parameters; -acodec sets the audio codec to libfaac, an advanced audio coding (AAC) encoder; -ac sets the number of audio channels; -ab sets the audio bit rate in kbit/s; -ar sets the sampling rate in Hz; and Video_1.ts is the name of the output file and its container. The switch is configured in SPAN mode so that all the traffic (from groups one and two) is sent to the port where the Wireshark network protocol analyzer is installed. To play the group one videos, the client computer must have a browser and a Flash client installed, while the group two services are viewed on a TV set. The experiments for each group are conducted separately.
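As a minimal sketch, the flags described above can be assembled into a single invocation along the following lines. The 16:9 value is inferred from "panoramic", and every value marked "assumed" (buffer size, B-frame count, audio channels, audio bit rate, sampling rate) is not stated in the text and is illustrative only; the command is built but not executed here.

```python
# Sketch: the ffmpeg invocation of Figure 3, assembled flag by flag from the
# description in the text. Values marked "assumed" are illustrative only.
ffmpeg_cmd = [
    "ffmpeg",
    "-i", "Video1.mp4",        # original video in an mp4 container
    "-vcodec", "mpeg2video",   # force the MPEG-2 video codec
    "-r", "30",                # 30 frames per second
    "-s", "720x576",           # resolution, width x height
    "-aspect", "16:9",         # panoramic output format (16:9 inferred)
    "-b:v", "5700k",           # constant video bit rate of 5.7 Mbit/s
    "-maxrate", "5700k",       # assumed: maximum bit-rate tolerance
    "-minrate", "5700k",       # assumed: minimum bit-rate tolerance
    "-bf", "2",                # assumed: max contiguous B-type frames
    "-bufsize", "1835k",       # assumed: buffer size (required with maxrate/minrate)
    "-acodec", "libfaac",      # AAC (advanced audio coding) encoder
    "-ac", "2",                # assumed: number of audio channels
    "-ab", "128k",             # assumed: audio bit rate in kbit/s
    "-ar", "44100",            # assumed: sampling rate in Hz
    "Video_1.ts",              # output file in an MPEG-TS container
]
print(" ".join(ffmpeg_cmd))
```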

TRAFFIC ANALYSIS OF SELECTED SERVICES
To obtain the mathematical model of the traffic, its audio and video components must be analyzed with respect to the arrival times, frame sizes, and frame types, besides identifying the PDF that describes the behavior of each component and the different states of the services. The PDFs that characterize the services constitute the mathematical model and contain the input parameters necessary to simulate it. The traffic captured by Wireshark is represented as time series, shown in Figures 6 and 7. Tabulated in this format, the information is not useful for obtaining a behavior based on variability described through the probabilistic behavior of the data. To achieve this, the information must be exported from Wireshark as a .txt file, which will be the input to the lexical analyzers.

Lexical analysis
The traffic generated by the different video on demand services is on the order of megabytes, which requires a statistical analysis to identify the functions that model its behavior. This analysis requires the identification and subsequent filtering of the different streaming components.
Given that tools that automate this process were not found in the literature, it was decided to resort to pattern-recognition techniques based on lexical analysis as the contribution of this research.
The different frames that make up a video are sent to the network interleaved, which greatly hinders data extraction. For the FMS server that supports the group one videos through its RTMP protocol, a "chunk stream" [9] appears in the protocol analyzer with the tag "aggregate", which means that each packet may contain both audio and video information. The RTMP protocol does not generate frames with the GOP denomination, and Wireshark does not assign any denomination either, owing to the proprietary nature of the protocol. However, when analyzing the captured information, two types of frames clearly differentiated by their size are observed, which agrees with what is specified in [10]. Thus, the larger frames will be called type-I frames and the smaller ones type-P, in accordance with the MPEG standards. An FMS server stream is therefore comprised of three types of frames: I, P, and audio.
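The size-based separation described above can be sketched as a simple threshold classifier. The frame sizes and the 2000-byte threshold below are hypothetical illustration values, not measurements from the article:

```python
# Sketch: separate RTMP video frames into type-I and type-P by size alone,
# as described in the text (larger frames -> I, smaller -> P).
# The sizes and the 2000-byte threshold are hypothetical illustration values.
def classify_rtmp_frames(sizes, threshold=2000):
    """Label each frame size as 'I' or 'P' using a size threshold."""
    return ["I" if size >= threshold else "P" for size in sizes]

sizes = [5120, 310, 290, 305, 4980, 320]  # hypothetical frame sizes in bytes
print(classify_rtmp_frames(sizes))  # ['I', 'P', 'P', 'P', 'I', 'P']
```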
For the Live555 server that supports the group two videos under the MPEG-2 standard, the audio and video binary streams are identified through their packet identifiers (PID = 0x100 for video and PID = 0x101 for audio [11]). Each video is compressed independently, forming an "elementary stream" (ES), which in turn is structured into packets called Packetized Elementary Streams (PES). Each PES contains audio and video information; the video stream is comprised of type-I, P, or B frames [11]. Hence, a stream from the Live555 server is comprised of four types of frames, I, P, B, and audio, encapsulated in interleaved form, making a detailed analysis necessary to find the GOP of each video.
Given that this process will always be necessary when modeling traffic from its components, we resorted to concepts from the first phase of compilers. This phase corresponds to the lexical analyzer, whose purpose is the recognition of tokens or patterns, which for this research correspond to the frames of a stream (I, P, B, and audio frames). The analyzers were developed with the AWK programming language, designed to process text-based data whether in files or in data streams, and with Flex, a generator of lexical analyzers that takes as input a pattern-recognition specification based on regular expressions (RE) and returns as output the source code implementing the lexical analyzer in the C language [12].
The lexical analyzer takes as input a file exported from the protocol analyzer and processes it using REs, checking for matches with the different frame types and the necessary parameters; that is to say, the different tokens it must identify, which generically are: "Frame", "video Tag", "b-frame", "p-frame", "i-frame", "Data size", "Timestamp", and "audio". Finally, the lexical analyzer delivers the results as a .txt file.
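As a rough sketch of this token-recognition step, the following Python fragment (a stand-in for the Flex/AWK analyzers, not the authors' code) scans a text export for the frame tokens together with their data size and timestamp. The input layout is a hypothetical illustration, since the real format depends on the Wireshark export columns used:

```python
import re

# Sketch: pull (frame type, data size, timestamp) tuples out of a text
# export using regular expressions, mimicking the token recognition that
# the Flex/AWK analyzers perform. The line layout below is hypothetical.
TOKEN_RE = re.compile(
    r"(?P<type>i-frame|p-frame|b-frame|audio)\s+"
    r"Data size:\s*(?P<size>\d+)\s+"
    r"Timestamp:\s*(?P<ts>[\d.]+)"
)

def tokenize(text):
    """Return a list of (frame_type, size_in_bytes, timestamp) tuples."""
    return [(m["type"], int(m["size"]), float(m["ts"]))
            for m in TOKEN_RE.finditer(text)]

sample = """i-frame Data size: 14200 Timestamp: 0.000
p-frame Data size: 310 Timestamp: 0.033
audio Data size: 128 Timestamp: 0.026"""
print(tokenize(sample))
```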

Lexical analysis of group one videos
The logical process for the lexical analysis of the group one services is shown in Figure 8. Operationally, Figure 10 shows the series of commands for the lexical analysis. In the first line, the file "salida.c" corresponds to the lex.yy.c file, renamed, containing the code of the lexical analyzer generated by Flex from the logic in Alexico.lex. The second line shows the compilation of the "salida.c" file, generating an executable file named "a.out" by default. The third line executes the lexical analyzer "a.out", whose input is video 6 and whose output is stored in a file called "result.txt".
Figure 10. Execution of the lexical analyzer with Flex.
The filter used in Wireshark is rtmpt.tag.type==9, as seen in Figure 11.a, where the "aggregate" type information and two video frames interleaved with audio frames can also be observed. Figure 11.b shows the content of the "result.txt" file, with the size (data size) and arrival time (timestamp) of each frame obtained through the lexical analyzer. Automating the recognition of the frame types, their sizes, and their arrival times accelerates the characterization process.

Lexical analysis for group two
For the Live555 service, the lexical analyzer built with AWK is executed through the script $ awk -f inicio.txt entrada.txt > salida.txt, where the file "inicio.txt" contains the processing logic, based on REs, to treat the file "entrada.txt", which holds the data exported from the Wireshark protocol analyzer, generating an output file called "salida.txt". Figure 12.a shows the exported input file with two type-B frames (b-frames). The first b-frame is interleaved with a type-I frame, which is why Wireshark does not recognize it in the information field [13], while the second b-frame is recognized directly in the information field of the protocol analyzer. Figure 12.b shows the "salida.txt" file, which contains the result of the processing: the type-B frames with their timestamps, the differences between consecutive timestamps, and their sizes. This example shows how lexical analyzers that identify patterns make it possible to extract information from each component regardless of whether or not it is recognized directly by the protocol analyzer and whether or not it is interleaved.
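The per-frame output described for "salida.txt" (each frame's timestamp, the difference from the previous timestamp, and its size) can be sketched as follows; this is a Python stand-in for the AWK logic, with hypothetical input values:

```python
# Sketch of what the AWK analyzer writes for each extracted b-frame:
# its timestamp, the difference from the previous timestamp (inter-arrival
# time), and its size. The input values below are hypothetical.
def inter_arrivals(frames):
    """frames: list of (timestamp, size). Return (ts, delta, size) rows."""
    rows = []
    prev_ts = None
    for ts, size in frames:
        delta = 0.0 if prev_ts is None else round(ts - prev_ts, 6)
        rows.append((ts, delta, size))
        prev_ts = ts
    return rows

b_frames = [(0.100, 2048), (0.133, 1990), (0.167, 2101)]
for row in inter_arrivals(b_frames):
    print(row)
```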
Once the data of the traffic curves were obtained through the lexical analyzers (see Figures 6 and 7), the Kolmogorov-Smirnov (K-S) goodness-of-fit test, which is non-parametric, was used [13]; it is defined through equation 1:

Dn = sup x∈ℝ |Fn(x) − F0(x)| (1)

where Fn(x) and F0(x) are the empirical and theoretical distributions of the random variable X, respectively. The Dn or K-S estimator, besides determining whether the hypothesis is accepted or rejected, is also used as a criterion for selecting one PDF over another. The PDF parameters of the analyzed data are calculated using maximum likelihood estimators (MLE) [14].
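The K-S statistic can be computed directly from sorted samples. The sketch below evaluates Dn against an exponential theoretical distribution whose rate is fitted by MLE (for the exponential, the MLE of the rate is the reciprocal of the sample mean); the data values and the choice of an exponential F0 are hypothetical illustrations, not the article's fits:

```python
import math

# Sketch: Dn = sup_x |Fn(x) - F0(x)|, evaluated against an exponential F0
# whose rate is fitted by maximum likelihood. Data values are hypothetical.
def ks_statistic(data, cdf):
    """Return Dn for samples `data` against a theoretical CDF `cdf`."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f0 = cdf(x)
        # the empirical CDF jumps at each sample: check both sides of the step
        d = max(d, abs((i + 1) / n - f0), abs(i / n - f0))
    return d

data = [0.2, 0.5, 0.1, 0.9, 0.4, 0.3, 0.7, 0.6]
rate = len(data) / sum(data)  # MLE of the exponential rate parameter
dn = ks_statistic(data, lambda x: 1.0 - math.exp(-rate * x))
print(f"Dn = {dn:.4f}")
```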
In summary, the lexical analyzers allow the automation of the identification of the GOP and the audio from the traffic information exported from Wireshark, for videos encoded under IPTV standards and under the RTMP protocol. In the literature reviewed, in contrast, and specifically in the works that consider the components of a video streaming service, no information about how to carry out this process is supplied.

RESULTS
This section presents the PDF of each of the videos and the states that offer interactivity to the users. There are no limitations regarding the interactivity processes associated with the users. The lexical analyzer can recognize the patterns generated by the exchanges between the different protocols; for example, a pause action is characterized by a single exchange of signaling messages between the client and the server, which is recognized as a pattern or token, and the fast-forward and backward actions behave similarly. Thus, traffic analyzers based on lexical analyzers are capable of extracting information from traffic traces without imposing restrictions on user behavior.

Characterization of the traffic of services from group one
Each state in Figure 1, from the group one service, generates a different traffic stream. However, according to the traffic traces gathered through the protocol analyzer, the Start, Client, Validation, Course, Subject, Unit, Videos/Episodes, and End of playback states generate, on average, traffic corresponding to three packets from the client to the server with a size of 608 bytes and three packets from the server to the client with a size of 574 bytes. The exchange of the six packets takes around 0.2 seconds. Meanwhile, the time between packets in the Test state depends on how quickly the user answers each of the questions; the traffic generated by each response action is on the order of 500 bytes. Applying the principle of parsimony, it is determined that these states do not require a statistical analysis [7].

Playback state
The highest traffic stream generated by the service occurs during the video playback process, that is, when the FMS server sends the information to the Flash client (Figure 1). During playback, traces of interleaved audio and video traffic are generated. To characterize the behavior of each frame through a PDF, it is necessary to determine the statistical behavior of each of its components: the time between frames and the frame size (the time between frames is obtained by comparing the timestamp of each frame with that of the previous one). Therefore, we must extract from the header of each RTMP frame the information corresponding to these two components; this task was carried out through the lexical analyzer built with the Flex tool. The videos served by the FMS server are standard quality, with encoding characteristics common to all of them: On2 VP6 codec, FLV (Flash Video) container format, 720 x 480 resolution, 29.97 frames per second (FPS), 4:3 aspect ratio, and a 96 kbps bit rate.
Table 1 presents the PDF and the K-S estimator that describe the behavior of each frame type in the eight videos analyzed in group one. The mean is represented by the symbol µ, σ is the standard deviation, and K-S is the Kolmogorov-Smirnov estimator. The time between frames turned out to be constant and equal for all the videos: 1.969 s between type-I frames, 0.033 s between type-P frames, and 0.026 s between audio frames.
Figure 13 displays the histograms from video 6 corresponding to the probability density of the type-I and type-P frame sizes, which, unlike the times between frames, exhibit variable behavior (see Table 1). In addition, the PDF that validates the hypothesis of the K-S goodness-of-fit test in each case is drawn as a continuous line.
Thus, the group one videos are completely characterized. The highest volume of information occurs in the playback state, in the streaming transmission stage, where hundreds of data points are generated, so it is possible to characterize them through a PDF.

Characterization of the traffic of services from group two
The videos served by the Live555 server are high quality. They are encoded under IPTV standards and their characteristics are: mpeg2 codec, transport stream (TS) container format, 1920 x 1080 resolution, 29.97 frames per second (FPS), 16:9 aspect ratio, and bit rates of 5.2, 4.3, and 2.8 Mbps for videos 9, 10, and 11, respectively.
The process for these videos is similar to the one described for group one; in this case, the MPEG-2 frames must be taken into account. Four frame types (I, P, B, and audio) are identified, and the lexical analysis is performed through the lexical analyzer written in the AWK programming language. Table 2 presents the PDF that models the behavior of each component of the three videos, where T is the time between frames and S is the frame size. For the third video, there are only two type-I frames, which is attributed to the scarce motion exhibited by that video; consequently, fitting a PDF does not apply (N/A).
Characterization of the traffic of pause, fast-forward, backward, and requests

Interactivity functionalities such as pause, fast-forward, backward, and requests, which correspond to user behavior, are characterized jointly for both groups of video on demand services. These functionalities imply an exchange of information between the client and the video server. This information generates behavior patterns programmed into the lexical analyzer, so that it is possible to recognize when one of these actions takes place. Figure 14 illustrates the behavior of pauses under the RTMP protocol.
To model such behavior, the experiments carried out in [15], which consider 30-minute intervals in which a certain number of videos are played, are taken into account. The analyses show that pauses take place in only 4.3% of the playbacks. The duration of the pauses is described by a Weibull PDF with shape (α) = 0.28039 and scale (β) = 0.49418 parameters. The PDF that describes the position of the first pause from the start of the session is a Weibull with parameters α = 0.11959 and β = 0.60361 (see equation 2):

f(x) = (α/β)(x/β)^(α−1) e^(−(x/β)^α), x ≥ 0 (2)
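A sketch of how these reported parameters could drive a simulation, using inverse-transform sampling of a Weibull with shape α and scale β; the random seed and this particular sampling scheme are illustrative assumptions, not part of the article:

```python
import math
import random

# Sketch: draw pause durations from the reported Weibull PDF
# (shape alpha = 0.28039, scale beta = 0.49418) via inverse-transform
# sampling, and decide whether a playback contains a pause (4.3%).
# The seed and the sampling scheme are illustrative assumptions.
def weibull_sample(alpha, beta, rng):
    """Inverse Weibull CDF: x = beta * (-ln(1 - u)) ** (1 / alpha)."""
    u = rng.random()
    return beta * (-math.log(1.0 - u)) ** (1.0 / alpha)

rng = random.Random(42)
if rng.random() < 0.043:  # pauses occur in 4.3% of playbacks
    print(f"pause of {weibull_sample(0.28039, 0.49418, rng):.3f} s")
else:
    print("no pause in this playback")
```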
For the forward and backward cases during a playback, Zipf's law describes the users' behavior; the probability of a forward taking place is 1.5%, while that of a backward is 4.8%.
Based on the experiments performed in [15], the PDFs that describe the length of a forward, the length of a backward, and the time between playbacks are Weibull with parameters α = 0.16827 and β = 0.45321, α = 0.09058 and β = 0.47459, and α = 0.16687 and β = 0.51107, respectively. By observing the number of videos, it was estimated that the number of playbacks in 30 minutes can be modeled with a Zipf distribution with parameter θ = 1.77.
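The reported Zipf model for the number of playbacks can be sketched as probabilities proportional to k^(−θ) with θ = 1.77; the truncation at a maximum of 20 playbacks is a hypothetical choice made here only so the probabilities can be normalized:

```python
# Sketch: model the number of playbacks in a 30-minute interval with a
# truncated Zipf distribution using the reported theta = 1.77. The cap
# of 20 playbacks is a hypothetical truncation chosen for normalization.
def zipf_pmf(theta, k_max):
    """Probabilities proportional to k**(-theta) for k = 1 .. k_max."""
    weights = [k ** (-theta) for k in range(1, k_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]

pmf = zipf_pmf(1.77, 20)
print(f"P(1 playback)  = {pmf[0]:.3f}")
print(f"P(2 playbacks) = {pmf[1]:.3f}")
```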
Thus, the behavior of the different services is characterized by identifying their statistical distribution functions. This characterization corresponds to the mathematical model, which provides the traffic input parameters necessary to carry out simulation models.

CONCLUSIONS AND FUTURE WORK
The conceptual models of a service are important because they allow identifying each of its states, the interactivity processes, and the traffic streams at a granular level, in favor of a better analysis, allowing the study to focus on those states that require the most network resources.
The implementation of lexical analyzers is an important contribution that permits the automation of one of the most time-consuming processes within the characterization of video traffic, all the more so considering that such streams travel in an interleaved or "aggregate" form. This contribution may be extended to the traffic of different networks under different encodings. Furthermore, as future work, these analyzers may be outfitted with a graphical interface to make the process more usable.
A complete characterization of the videos supplied through the standard RTSP protocol shows that their different frames do not follow a single PDF and, in addition, travel interleaved, which hinders the manual filtering of those components.
The results show that the different frames of the videos transmitted through the RTMP protocol behave uniformly regardless of the video. Thus, the sizes of the type-I and type-P frames were each characterized with a single PDF (with specific parameters for each frame type), while the times between the different frames and the size of the audio packets are constant.
According to the literature, there are only two types of frames under the RTMP protocol, with size determining the frame type. The data reported in Table 1 corroborate this, as two clearly differentiated basic frame sizes are observed for each of the videos analyzed. The larger values correspond to the I frames, since they must carry the header information of the higher-level protocols, and the smaller ones must be the P frames, since two frame types are needed for a B frame to be formed.
The article presents a script capable of encoding videos under the MPEG-2 standard through the free ffmpeg tool, which offers researchers a useful means of encoding videos under IPTV standards without resorting to specialized hardware devices.
The PDF and values obtained from this analysis will be employed in future works on the traffic model in the simulated network.

Figure 1. Conceptual model of group one services.

Figure 4 shows the laboratory model for the capture of traffic generated by the interactive services of groups one and two.

Figure 2. Conceptual model of group two services.

Figure 4. Laboratory infrastructure for group one and two services.

Figure 5 presents the complete functional diagram of this research. The whole process is observed, from the client initiating the request of a video to the obtainment of the PDF, showing the input and output of the lexical analyzer.

Figure 6. Traffic of group one services.

Figure 7. Traffic of group two services.

Figure 8. Logical process for the lexical analysis of the services from group one.

Figure 11. Lexical analyzer for extraction of the timestamp and data size.

Figure 12. Extraction of b-frames from the Live555 service.

Table 1. PDF for the video components from group one under the FMS server.
Figure 13. Histograms and PDF estimates.

Table 2. PDF for video components from group two under the Live555 server.