Research
[Data Stream mining] [Recommender Systems] [Cloud computing] [Peer To Peer]
Keywords : Load Balancing, cloud, web services, etc.
Abstract :
CLOUDIZER is a software load-balancing framework for web services. It aims to balance efficiently and transparently the load of REST-based web services. The goal is to allow transparent deployment of legacy applications on scalable infrastructures such as Infrastructure as a service clouds.
As shown in the figure, the CLOUDIZER controller deployed on the master node, which filters the requests, achieves load distribution. When the master receives a request, it forwards this request to the target application according to a load balancing policy configured by the user. Since the goal of this framework is to achieve transparent load balancing, application code and data files are replicated among the nodes through a distributed file system.
Slave Nodes register the deployed applications to the master. These nodes are monitored by sending a heartbeat to the master node. Using CLOUDIZER framework, any load-balancing policy can be applied for the target application, depending on the user's needs to get advantage from it.
Different load-balancing policies can be applied depending on the users needs, and each application can be configured to use a different load balancing policy.
A Java Application Programming Interface is used for implementing new policies and the user just needs to provide the policy name in a configuration file to use the policy.
https://forge.isep.fr/projects/isep-cloud/;
Keywords : Load Balancing, cloud, web services, etc.
Abstract :
The goal of SIMIZER is to provide a way to evaluate and measure the behavior of cache memory optimization, different load balancing, and/or consistency policies regarding different measure such as response time, latency, throughput, locking, and complexity. The goal is to provide a simple way to implement and test algorithms, without the caveats of real world experimentation on large platforms.
The motivation behind this work is the need of a simple and reliable testing framework for large (> 100 machines) distributed systems load balancing and consistency / data synchronization protocols. Existing frameworks such as SimGrid, GridSim and CloudSim do not provide ways for simulating data consistency issues in large-scale systems. Regarding load balancing, these simulators mainly aimed to provide information regarding the platform-side behavior (load on machines, power consumption, CPU utilization, throughput). However we consider important to obtain results from a user / client perspective regarding metrics such as perceived latency and algorithms practical performance on realistic workloads.
SIMIZER is a trace-based discrete event simulator. To generate the traces, the user can specify different random laws to define the parameters distributions, bandwidth allocation, requests frequencies, and number of available machines. This document describes in a first part how to use the SIMIZER program and interpret its output.
https://forge.isep.fr/projects/simizer/;
Keywords : Distributed Data Streams, adaptative sampling, sensors, interpolation, load curves
Abstract :
Remote sensing enables data collection from multiple sources in a unique
central server for a large wide of applications which are typically monitoring and
system supervision, transmitting alarms in real time and more generally producing
synthesis to help in business decision. The volume of data collected is generally too
big to be stored entirely. Data Stream Management Systems are generic tools for
processing data streams : data elements arrive on-line and stay only for a limited
time period in memory. For many stream processing applications, one may not wish
to completely lose the entire stream data for analyzing 'past' and 'present'data. Thus
a stream processing system must also provide a stored historical data.
In this thesis, we consider a distributed computing environment, describing a
collection of multiple remote sensors that feed a unique central server with numeric
and uni-dimensional data streams (also called curves). The central server has a limited
memory but should be able to compute aggregated value of any subset of the
stream sources from a large time horizon including old and new data streams. Two
approaches are studied to reduce the size of data : (1) spatial sampling only consider
a random sample of the sources observed at every instant ; (2) temporal sampling
consider all sources but samples the instants to be stored. In this thesis, we propose
a new approach for summarizing temporally a set of distributed data streams : From
the observation of what is happening during a period t - 1, we determine a data
collection model to apply to the sensors for period t.
The computation of aggregates involves statistical inference in the case of spatial
sampling and interpolation in the case of temporal sampling. To the best of our
knowledge, there is no method for estimating interpolation errors at each timestamp
that would take into account some curve features such as the knowledge of the
integral of the curve during the period. We propose two approaches : one uses the
past of the data curve (naive approach) and the other uses a stochastic process for
interpolation (stochastic approach).
Keywords : Open source recommender systems, collaborative filtering, Mahout, Web usage mining
Abstract :
The project SOLIMOBILE is funded by ProximaMobile and aims to design, build and implement
a package of innovative services focused on the person in a precarious situation.
The group of partners includes Les Cités du Secours Catholiques who are in charge of identifying
the requirements, testing and using the developed services, ISEP which provides expertise and
Research on personalization and adaptation to user profile and participates in developing services,
iEurop and Streamezzo who bring their technologies, their capacity for hosting and development,
and Atos Origin Integration coordinating and supporting all the development effort.
ISEP, through its expertise in information retrieval and data mining proposes to explore and evaluate on real data (SOLIMOBILE databases) algorithms for:
Keywords : semantic P2P, Bloom Filter
This work was completed under the project RARE.
Abstract : Peer-to-peer systems (P2P) have become popular with sharing files on Internet. Much research has emerged concerning the optimization of data localization, and constitutes a very active research area. Taking into account semantic aspects improves considerably data localization. We based our study on the PlanetP approach, which uses the notion of Bloom Filter consisting on propagating peers' semantic signatures (Bloom Filters) through the network. We extend this approach with some improvements such as the creation of dynamic Bloom Filters, their size depends on the load of peers (number of shared documents).