ICDCIT 2016: List of Accepted Papers

Full Papers:

Short Papers

Poster Papers

Abstracts of Accepted Papers

77

Language Identification and Disambiguation in Indian Mixed-Scripts

The algorithm proposed in this paper segregates words from various languages (namely Hindi, English, Bengali, and Gujarati) and provides relevant replacements for misspelled or unknown words in a given query, thus generating a relevant query in which the original language of each word is known. First, the words are matched directly against dictionaries of each language transliterated into English. Then, for words that do not match, we shortlist a set of probable words from all the dictionaries, taking those closest to the given spelling as measured by the Levenshtein algorithm. After this, to achieve a higher level of generalization, we use probabilities of word doublets and triplets occurring together, computed from a training database. These probabilities determine the relevance of candidate words in the given text, allowing us to pick the most relevant match. The technique was trained on a dataset of 5000 multilingual queries, and the results obtained after testing it on 300 random queries show better accuracy than most conventional algorithms.
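The Levenshtein-based shortlisting step described above can be sketched as follows. The dictionary layout (language name mapped to a list of transliterated words) and the cutoff `k` are illustrative assumptions, not the paper's implementation:

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance between strings a and b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def shortlist(word, dictionaries, k=3):
    # Score every candidate across all transliterated dictionaries and
    # keep the k closest spellings, remembering their source language.
    scored = [(levenshtein(word, w), w, lang)
              for lang, words in dictionaries.items() for w in words]
    scored.sort()
    return scored[:k]
```

A later step would then rescore these candidates with the doublet/triplet probabilities to pick the contextually best match.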

134

A type system for counting logs of multi-threaded nested transactional programs

We present a type system to estimate an upper bound on the resource consumption of nested and multi-threaded transactional programs. The resource is abstracted as transaction logs. In comparison to our previous work on type and effect systems for Transactional Featherweight Java, this work exploits the natural composition of thread creation to give types to sub-terms. As a result, our new type system is much simpler and more effective than our previous one. More importantly, it is also more precise. We also give a proof sketch for the correctness of the estimation and a type inference algorithm that we have implemented in a prototype tool.

162

Minimal Start Time Heuristics for Scheduling Workflows in Heterogeneous Processing Systems

Heterogeneous processing systems comprising networked processors with varied capabilities require efficient task-to-processor assignment to attain high performance. Scheduling workflow tasks in heterogeneous environments is known to be an NP-complete problem. Several heuristics have been developed to attain minimum schedule lengths. However, these algorithms employ a level-wise approach to scheduling tasks, which indirectly assigns higher priority to tasks at lower levels than to those at higher levels. Further, the start time of tasks at a higher level is constrained by the completion times of tasks at lower levels. In the present work, a new strategy, Minimal Start Time (MST), for globally scheduling workflow tasks is proposed. The approach focuses on minimizing the start times of tasks that depend on tasks at lower levels, so as to generate shorter-span schedules. Its primary merit is the elimination of level constraints wherever there are no dependency constraints. The proposed algorithm is compared with related algorithms in terms of performance measures, viz., normalized makespan, speedup, and efficiency. The MST algorithm improved performance by 5-20% in almost 80% of the cases in comparison to earlier work.

38

A Wait-Free Stack

In this paper, we describe a novel algorithm to create a concurrent wait-free stack. To the best of our knowledge, this is the first wait-free algorithm for a general-purpose stack. In the past, researchers have proposed restricted wait-free implementations of stacks, lock-free implementations, and efficient universal constructions that can support wait-free stacks. We achieve wait-freedom in our algorithm by minimizing the contention between push and pop operations. The pop operation is fast: it logically marks a node as deleted but does not actually delete it. To keep the size of the stack in check, we periodically invoke a cleanup operation, ensuring the stack never grows beyond a factor of W of its actual size. This operation removes a series of W contiguous entries from the stack.
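The logical-deletion idea can be illustrated with a sequential sketch: pop only marks a node, and cleanup physically unlinks marked nodes once the stack exceeds W times its logical size. The paper's actual algorithm makes these steps wait-free with atomic primitives, which this single-threaded sketch does not attempt; the cleanup trigger shown is also an assumption:

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.deleted = False   # logical-deletion mark
        self.next = None

class MarkedStack:
    def __init__(self, w=4):
        self.top = None
        self.size = 0          # physical size, including marked nodes
        self.live = 0          # logical size
        self.w = w

    def push(self, value):
        node = Node(value)
        node.next = self.top
        self.top = node
        self.size += 1
        self.live += 1
        # Keep physical size within a factor of w of the logical size.
        if self.size > self.w * max(self.live, 1):
            self._cleanup()

    def pop(self):
        node = self.top
        while node is not None and node.deleted:
            node = node.next   # skip logically deleted nodes
        if node is None:
            return None
        node.deleted = True    # fast path: mark, do not unlink
        self.live -= 1
        return node.value

    def _cleanup(self):
        # Physically remove all marked nodes in one pass.
        dummy = Node(None)
        dummy.next = self.top
        prev, cur, count = dummy, self.top, 0
        while cur is not None:
            if cur.deleted:
                prev.next = cur.next
            else:
                prev = cur
                count += 1
            cur = cur.next
        self.top = dummy.next
        self.size = count
```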

44

An Efficient Task Consolidation Algorithm for Cloud Computing Systems

With the increasing demand for cloud computing, energy consumption has drawn enormous attention in the business and research communities, owing to the carbon footprint generated by information and communication technology resources such as servers, networks, and storage. Therefore, the first and foremost goal is to minimize energy consumption without compromising customer demands or tasks. Task consolidation is a process that minimizes the number of resources used by improving the utilization of the active resources. Recent studies reported that tasks are assigned to virtual machines (VMs) based on their utilization value on the VMs, without much concern for the processing time of the tasks. However, task processing time is an equally important criterion. In this paper, we propose a multi-criteria task consolidation algorithm that assigns tasks to VMs by considering both the processing time of the tasks and the utilization of the VMs. We perform rigorous simulations of the proposed algorithm on randomly generated datasets and compare the results with two recent energy-conscious task consolidation algorithms, namely random and MaxUtil. The proposed algorithm reduces energy consumption by about 10% compared to the random algorithm and by about 5% compared to MaxUtil.
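A multi-criteria assignment of this kind can be sketched as a greedy loop that scores each VM on both completion time and utilization. The weight `alpha` and the scoring formula are illustrative assumptions; the abstract does not specify how the paper combines the two criteria:

```python
def consolidate(tasks, vms, alpha=0.5):
    """Greedy sketch: each task goes to the VM minimizing a weighted
    combination of its completion time and the VM's remaining headroom.
    tasks maps task name -> (processing_time, utilization in [0, 1])."""
    assignment = {}
    finish = {vm: 0.0 for vm in vms}   # accumulated processing time per VM
    util = {vm: 0.0 for vm in vms}     # accumulated utilization per VM
    for task, (ptime, u) in tasks.items():
        best = min(vms, key=lambda vm: alpha * (finish[vm] + ptime)
                                       + (1 - alpha) * (1.0 - min(util[vm] + u, 1.0)))
        assignment[task] = best
        finish[best] += ptime
        util[best] += u
    return assignment
```

Favoring nearly full VMs (small headroom) lets idle VMs be powered down, which is where the energy saving comes from.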

50

Storage Load Control Through Meta-Scheduler Using Predictive Analytics

The gap between the computing capability of servers and storage systems is ever increasing. The genesis of I/O-intensive applications capable of generating gigabytes to exabytes of data has led to saturation of I/O performance on the storage system. This paper provides insight into the load-controlling capability of the storage system through learning algorithms in a grid computing environment. Storage load control driven by meta-schedulers, and the effects of load control on the popular scheduling schemes of a meta-scheduler, are presented here. Random Forest regression is used to predict the current response state of the storage system, and autoregression is used to forecast future response behavior. Based on the forecast, time-sharing of I/O-intensive jobs is used to take proactive decisions and prevent overloading of individual volumes on the storage system. Time-sharing between multiple synthetic and industry-specific I/O-intensive jobs is shown to achieve superior total completion time and total flow time compared to traditional approaches such as FCFS and backfilling. The proposed scheme prevented any downtime when implemented with a live NetApp storage system.
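The autoregressive forecasting step can be illustrated with a minimal AR(1) model fitted by least squares. This is a stand-in sketch for the abstract's autoregression over storage response times; the paper's model order and fitting method are not specified here:

```python
def ar1_forecast(series, steps=3):
    """Fit y[t] = c + phi * y[t-1] by least squares on consecutive pairs,
    then iterate the fitted model to forecast the next `steps` values."""
    x, y = series[:-1], series[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    phi = num / den if den else 0.0   # autoregressive coefficient
    c = my - phi * mx                 # intercept
    out, last = [], series[-1]
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out
```

A scheduler could compare such a forecast against a volume's response-time threshold and defer I/O-intensive jobs before the volume overloads.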

117

Collaborative Access Control Mechanism for Online Social Networks

Over time, Online Social Networks (OSNs) have facilitated the creation and maintenance of interpersonal relationships. OSNs offer attractive means for digital social interaction and information sharing, but they also raise a number of security and privacy issues. While OSNs allow users to restrict access to personal data, they currently do not provide any mechanism to enforce privacy concerns over data associated with multiple users, and the exposure and availability of personal data expose users to numerous privacy risks. The approach proposed in this paper provides a mechanism that allows users to control access to their shared resources in a collaborative manner. A tool, "msecure", has been developed as an application on Facebook, a popular social network, to protect users' shared images. We present a survey-based user study of "msecure" with a user base of n = 50. The survey results show that although users enjoy the facilities provided by the social network, they remain concerned about the privacy of their shared content and believe that a tool like "msecure" could be useful for managing their shared images and other shared content.

73

K-means and WordNet based Feature Selection Combined with Extreme Learning Machines for Text Classification

The incredible increase of online documents in digital form on the Web has renewed interest in text classification. The aim of text classification is to classify text documents into a set of pre-defined categories, but poor feature selection, extremely high-dimensional feature spaces, and the complexity of natural languages are roadblocks for this classification process. To address these issues, we propose a k-means clustering based feature selection for text classification. Bi-Normal Separation (BNS), combined with WordNet and cosine similarity, helps form a high-quality, reduced feature vector to train the Extreme Learning Machine (ELM) and Multi-Layer Extreme Learning Machine (ML-ELM) classifiers. For experimental purposes, the DMOZ and 20-Newsgroups datasets have been used. The empirical results on these two benchmark datasets demonstrate the applicability, efficiency, and effectiveness of our approach using ELM and ML-ELM as the classifiers over state-of-the-art classifiers.
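The BNS scoring used for feature selection has a standard closed form: the absolute difference of inverse normal CDFs of a feature's true-positive and false-positive rates. A minimal sketch follows; the clipping constant `eps` is an assumption to keep the inverse CDF finite, and this omits the paper's k-means and WordNet stages:

```python
from statistics import NormalDist

def bns(tp, fp, pos, neg, eps=0.0005):
    """Bi-Normal Separation for one feature: |F^-1(tpr) - F^-1(fpr)|,
    where F is the standard normal CDF, tp/fp are the counts of positive
    and negative documents containing the feature, and pos/neg are the
    class sizes. Rates are clipped away from 0 and 1."""
    nd = NormalDist()
    tpr = min(max(tp / pos, eps), 1 - eps)
    fpr = min(max(fp / neg, eps), 1 - eps)
    return abs(nd.inv_cdf(tpr) - nd.inv_cdf(fpr))
```

Features with the highest BNS scores discriminate best between classes and would be kept in the reduced vector fed to the ELM classifiers.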

115

HGASA: AN EFFICIENT HYBRID TECHNIQUE FOR OPTIMIZING DATA ACCESS IN DYNAMIC DATA GRID

Grid computing uses computers distributed across various geographical locations to provide enormous computing power and massive storage. Scientific applications produce large quantities of sharable data that require efficient handling and management. Replica selection is one of the data management techniques in grid computing, used for selecting data from large volumes of distributed data, and is an interesting data access problem in data grids. Genetic Algorithms (GA) and Simulated Annealing (SA) are two popular evolutionary algorithms that differ in nature. In this paper, a hybrid approach combining a Genetic Algorithm with Simulated Annealing, namely HGASA, is proposed to solve the replica selection problem in data grids. HGASA considers security, file availability, load balance, and response time to improve the performance of the grid. The GridSim simulator is used to evaluate the performance of the proposed algorithm. The results show that HGASA outperforms Genetic Algorithms (GA) by 9%, Simulated Annealing (SA) by 21%, and Ant Colony Optimization (ACO) by 50%.

158

Dynamic Data Replication across Geo-Distributed Cloud Data Centres

With the increase in network speed and bandwidth, cloud data centers are set up in multiple locations to reduce access latency for clients. Cloud computing is being used for data-intensive computing in enterprise and scientific applications that process large data sets originating from globally distributed data centers. In this work, we propose a system model in which multiple data centers cooperate to serve a client's request for data and to identify the data centers that can provide the fastest response time to a client. Further, a dynamic data replication strategy across geo-distributed data centers, based on popularity, is detailed. Simulation results are presented, and the performance evaluation shows that our method consistently maintains the replica count at an optimal value.

139

PROVISIONING MODEL FOR CLOUD FEDERATION

As cloud computing attracts more and more customers by reducing capital and operating costs, demand also increases. However, demand for cloud resources is subject to great uncertainty and fluctuation, resulting in scalability issues. Cloud federation is an emerging technique and the next evolutionary step in cloud computing, targeted by many industries and researchers to tackle workload uncertainty and the scalability problem. Many resource provisioning mechanisms have been proposed for federation in the literature. We present a survey of them and propose a proactive resource provisioning model for federation based on sliding-window workload prediction. We compare the results of the proposed prediction mechanism with the commonly used time series prediction algorithm ARIMA. We developed a simulation environment for cloud federation to investigate the impact of prediction-based resource provisioning, and finally compare it with resource provisioning without prediction in a federated environment, evaluating the profit and resource utilization in both cases.
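A sliding-window predictor in its simplest form forecasts the next interval's demand from the most recent observations. The mean-of-window estimator below is an illustrative stand-in; the abstract does not state the paper's exact estimator:

```python
def sliding_window_forecast(history, window=5):
    """Predict the next-interval workload as the mean of the last
    `window` observations; older history is ignored, so the forecast
    adapts as the window slides forward."""
    recent = history[-window:]
    return sum(recent) / len(recent)
```

A proactive provisioner would reserve capacity for this forecast (plus a safety margin) before the demand arrives, instead of reacting after the fact.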

147

Multiclass SVM Classification for Intrusion Detection

As the number of threats to computer networks and network-based applications increases, there is a need for a powerful intrusion detection system that can actually fulfil the requirement of security against threats. To detect and counter a specific attack, the pattern of the attack must be known; such security is achieved by identifying the particular type of attack, and classifying attack activities ensures an efficient countermeasure. This paper focuses on the classification of attacks using a multiclass support vector machine (MSVM) approach. The SVM approach is extended to support multiclass classification of attacks with improved detection accuracy. The KDD corrected, NSL-KDD, and Gure KDD datasets are used for training and validation of the MSVM model. The proposed method is compared with other existing works and gives better detection accuracy, a lower false positive rate, and less generalization error compared to existing approaches.

133

i-TSS: An Image Encryption Algorithm Based on Transposition – Shuffling and Substitution using Randomly Generated Bitmap image

In the digital era, an enormous number of digital images are shared over different networks and stored on different media. Internet users enjoy this convenient way of sharing images, but at the same time they face threats such as chosen-plaintext, statistical, and differential attacks, and noise such as Gaussian, Poisson, Speckle, and Salt & Pepper noise. These attacks and noises create the need to enhance image information security, and an image encryption algorithm needs to be robust. An image encryption algorithm (i-TSS) based on transposition, shuffling, and substitution is presented in this paper, providing better security for the image. The algorithm is implemented in Java, and its key purpose is to reduce encryption time. Assessment of image quality metrics shows that this algorithm is secure and robust against almost all kinds of external attacks.

161

An Internet of Things based Software Framework to handle Medical Emergencies

A software framework is a reusable design that requires various software components to function almost out of the box. To specify a framework, the creator must specify the different components that form the framework, how to instantiate them, and the communication interfaces between these components. In this paper, we propose such a framework, based on the Internet of Things (IoT), for developing applications that handle emergencies. We demonstrate the usage of our framework by explaining an application developed with it: a system for tracking the status of autistic students in a school and alerting their parents/caretakers in case of an emergency.

164

FC-LID: File classifier based Linear Indexing for Deduplication in Cloud Backup Services

Data deduplication techniques are optimal solutions for reducing both the bandwidth and storage space requirements of cloud backup services in data centers. During the deduplication process, maintaining an index in RAM is a fundamental operation. A very large index needs more storage space; it is hard to keep such an index entirely in RAM, and accessing a large disk decreases throughput. To overcome this problem, an index system called File classifier based Linear Indexing Deduplication (FC-LID) is developed, which utilizes Linear Hashing with Representative Group (LHRG). The proposed linear index structure reduces deduplication computational overhead and increases deduplication efficiency.

59

Longest Wire Length of Midimew-connected Mesh Network

The Minimal DIstance MEsh with Wrap-around links (Midimew) connected Mesh Network (MMN) is a hierarchical interconnection network consisting of several Basic Modules (BMs), where a BM is a 2D-mesh network and the higher-level network is a midimew network. In this paper, we present the architecture of the MMN and evaluate the number of long wires, the length of a long wire, and the total long-wire length for the MMN, TESH, and torus networks. It is shown that the proposed MMN possesses a simple structure and moderate wire length. The long-wire length of the MMN is slightly higher than that of the TESH network and far lower than that of the 2D torus network. The overall performance suggests that the MMN is a good choice for future-generation massively parallel computers.

116

Energy Efficient SNR Based Clustering in Underwater Sensor Network with Data Encryption

In this decade, Underwater Sensor Networks (UWSNs) have become important for exploring the underwater environment. The characteristics of UWSNs, such as limited energy, high propagation delay, low bandwidth, and high error rate, make the design of a clustering protocol challenging, given the energy-constrained sensor nodes. Owing to the harsh underwater environment, battery replacement is neither simple nor cheap; therefore, energy saving is an important issue. In this paper, a new clustering protocol is proposed, named Energy Efficient SNR based Clustering in UWSN with Data Encryption (EESCDE). The residual energy of the nodes is considered to improve the lifetime of the sensor network. Using the proposed scheme, the improvement in residual energy is achieved by reducing the number of transmissions by the cluster head as well as the sensor nodes. The sensor nodes are partitioned into clusters, and the cluster heads (CHs) are chosen depending on SNR values. Symmetric encipherment is implemented using the Hill cipher to secure the sensed data. The scheme has been implemented in NS3, and it is observed that the residual energy of the sensor nodes improves by 10 percent compared to the ESRPSDC algorithm.
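The Hill cipher encryption step works by multiplying blocks of plaintext, viewed as vectors over the 26-letter alphabet, by a key matrix modulo 26. A minimal sketch with a 2x2 key follows; the key size, padding letter, and alphabet are illustrative choices, not the paper's exact setup (and the key must be invertible mod 26 for decryption):

```python
def hill_encrypt(plaintext, key):
    """Encrypt with a 2x2 Hill cipher over a-z: each pair of letters is a
    column vector multiplied by `key` modulo 26. Non-letters are dropped
    and an odd-length message is padded with 'x'."""
    nums = [ord(c) - ord('a') for c in plaintext.lower() if c.isalpha()]
    if len(nums) % 2:
        nums.append(ord('x') - ord('a'))  # pad the final block
    out = []
    for i in range(0, len(nums), 2):
        a, b = nums[i], nums[i + 1]
        out.append((key[0][0] * a + key[0][1] * b) % 26)
        out.append((key[1][0] * a + key[1][1] * b) % 26)
    return ''.join(chr(n + ord('a')) for n in out)
```

Being a few modular multiply-adds per block, this keeps the per-node encryption cost low, which matters for the energy budget discussed above.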

160

Trust Based Target Coverage Protocol for Wireless Sensor Networks Using Fuzzy Logic

Wireless sensor networks constitute a class of real-time embedded systems with limited resources. The target coverage problem is concerned with the continuous monitoring of a set of targets such that the network lifetime is maximized. In this paper we address the target coverage problem while considering the energy constraints as well as the confidence level of target points. To improve the reliability of the network, we consider the trust values of the nodes, also using the reliability of the recommender nodes in the trust calculation. We propose an energy-efficient node scheduling protocol in which the base station determines the status of each node, either active or sleep, using fuzzy logic. The results show that the proposed scheme improves the network lifetime in terms of energy consumption and the reliability of the data communicated.

32

HiRE - A Heuristic Approach for User Generated Record Extraction

User Generated Content extraction is the extraction of user posts, viz., reviews and comments. Extracting such content requires identifying its record structure so that, after the content is extracted, proper filtering mechanisms can be applied to eliminate noise. Hence, record structure identification is an important prerequisite for text analytics. Most existing record structure identification techniques search for repeating patterns to find the records. In this paper, a heuristic-based approach is proposed that uses the implicit logical organization present in the records and outputs the record structure.

42

Influential Degree Heuristic for Influence Maximization in Social Networks

Charu Aggarwal et al. proposed a novel algorithm, the RankReplace algorithm, to find influential nodes in a social network under the flow authority model. Though the influence spread achieved is satisfactory, the algorithm is slow due to its initialization step, which needs to be more efficient. Mustafa et al. proposed a greedy selection for initialization that speeds up the algorithm. We propose to improve this algorithm further with a novel heuristic, called influential degree, for selecting the initial set.

We implement the RankReplace algorithm with different initializations, RRMD (MaxDegree), RRDD (DegreeDiscount), RRID (InfluentialDegree), and RRIDD (InfluentialDegreeDiscount), on data sets of different sizes. The results show that the proposed RRID and its variations perform well on small, intermediate, and large data sets, efficiently reducing the running time while retaining, and in a few cases improving upon, the influence spread of the original RankReplace algorithm.

101

A Dynamic Priority Based Scheduling Scheme for Multimedia Streaming Over MANETs

Multimedia data transmission over Mobile Ad hoc Networks (MANETs) is a challenging task owing to characteristics such as node mobility and the absence of central coordination of nodes. Recently there has been increased use of handheld devices for viewing multimedia applications. Delay and packet loss need to be addressed to provide good-quality video streaming over MANETs to mobile users. Existing works, standards, and protocols (e.g., 802.11 and 802.11e) improve the Quality of Service (QoS) for multimedia transmission. However, these approaches acknowledge every successful transmission, which increases delay. Also, the priorities of frames, i.e., Intra coded, Predictive coded, and Bidirectional Predictive coded, are predefined, resulting in reduced QoS as well as increased packet loss. In this paper, we present a priority based mapping method (PBMM), which not only prioritizes I (intra coded), P (predictive coded), and B (bidirectional predictive coded) frame packets in that order, but also handles the expiry time of the packets as well as damaged acknowledgements of the packets/frames in order to reduce packet loss and delay. We validate our approach through simulations.

109

Improved Bug Localization Technique using Hybrid Information Retrieval Model

The need for bug localization tools and the popularity of text-based information retrieval (IR) models for locating the source code files containing bugs are growing continuously. The time and cost required to fix bugs can be considerably reduced by improving techniques that narrow the search space from a few thousand source code files to very few. The main contribution of this paper is a hybrid technique based on two existing IR models (VSM and n-gram) for bug localization.

In the proposed technique, the performance of text-based IR, using similarity between bug reports, source code files, and similar bug reports, is further improved by using word-based bigrams from the n-gram model. We also introduce a new factor (beta) to calculate the weighted sum of the unigram and bigram scores, and analyze its accuracy for values ranging from 0 to 1. Using the MRR (Mean Reciprocal Rank) measure, we conduct experiments which show that the proposed technique outperforms some existing state-of-the-art bug localization techniques.
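The beta-weighted combination of unigram and bigram evidence can be sketched as below. Set overlap is used here as a simple stand-in similarity; the paper's actual VSM scoring would replace it, so the formula details are illustrative:

```python
def hybrid_score(query_terms, doc_terms, beta=0.5):
    """Weighted sum of unigram and bigram overlap between a bug-report
    query and a source file's terms: (1 - beta) * unigram + beta * bigram.
    beta = 0 ignores bigrams; beta = 1 uses only bigrams."""
    def overlap(q, d):
        q, d = set(q), set(d)
        return len(q & d) / len(q) if q else 0.0
    bigrams = lambda ts: list(zip(ts, ts[1:]))
    uni = overlap(query_terms, doc_terms)
    bi = overlap(bigrams(query_terms), bigrams(doc_terms))
    return (1 - beta) * uni + beta * bi
```

Files would then be ranked by this score, and MRR computed from the rank of the first truly buggy file.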

56

A Distributed Approach based on Maximal Far-Flung Scalar Premier Selection for Camera Actuation

This article proposes a distributed approach inspired by maximally far-flung scalar premier selection for the actuation of cameras. The entire monitored region is divided into a number of sub-compartments, each of which contains sets of scalars and cameras. Initially, a scalar premier is selected in each sub-compartment, chosen such that its mean distance to the other scalars in its sub-compartment is minimal. Next, the scalar farthest from the former premier is chosen as the next scalar premier for that sub-compartment. This manner of premier selection avoids possible overlap among the fields of view of the cameras. The scalar premiers communicate occurring event information to their corresponding cameras, and the cameras collaboratively decide which among them are to be actuated. Experimental results validate the significance of the proposed algorithm compared to three other methods proposed in the literature.

58

AN EXTENSION TO UDDI FOR THE DISCOVERY OF USER DRIVEN WEB SERVICES

Service registries are used by web service providers to publish services and by requestors to find them in an SOA (Service Oriented Architecture). The main existing service registry specification, UDDI (Universal Description, Discovery and Integration), has the following drawbacks. First, web service publications are defined only abstractly, unscalably, and inefficiently across all UBR (Universal Business Registry) nodes. Second, it matches only the business name and service name given in the WSDL document when collecting service information. To overcome these difficulties, we propose an efficient and effective UDDI architecture called E-UDDI, which extends the UDDI design by incorporating an additional bag in the business entity data structure. Moreover, to enable service customers to find more appropriate service information easily, an effective service matching mechanism is adopted in E-UDDI so that the user can take the final decision. Service discovery and publishing are improved considerably in the proposed system by means of an effective UDDI registry with a flexible and more suitable service searching facility.