Article  in  International Journal of Computer Applications · March 2017


DOI: 10.5120/ijca2017913357



International Journal of Computer Applications (0975 – 8887)
Volume 162 – No 9, March 2017

Model Query, Tokenization and Character Matching: A Combined Approach to Prevent SQLIA

Sudhakar Choudhary, Student, SISTech-E, Bhopal, MP, India
Arvind Kumar Jain, Assistant Professor, SISTech-E, Bhopal, MP, India
Anil Kumar, M.Tech, IIIT, Allahabad, UP, India

ABSTRACT
With the rise of the Internet, web applications such as online banking and web-based email, and web services as an instant means of information dissemination and other transactions, have become a key component of today's Internet infrastructure. Web-based systems consist of both infrastructure components and application-specific code. However, there are many reports of intrusions by external attackers that compromised the back-end database system. SQL injection attacks are a class of attacks to which many of these systems are highly vulnerable.

Keywords
SQL Injection Attack, SQLIA Prevention, Tokenization, Character List.

1. INTRODUCTION
Information is the most important business asset today, and achieving an appropriate level of information security can be viewed as an essential requirement. SQL Injection Attacks (SQLIAs) are among the topmost threats to web application security, and SQL injections are one of the most serious vulnerability types. Moreover, these vulnerabilities are easy to detect and exploit, which is why SQLIAs are frequently employed by malicious users for different reasons, e.g. financial fraud, theft of confidential data, website defacement, sabotage, espionage, cyber terrorism, or simply for fun. Furthermore, SQL injection attack techniques have become more common, more ambitious, and increasingly sophisticated, so there is a strong need for an effective and feasible solution to this problem. To that end, automatic tools and security systems have been implemented, but none of these is complete or accurate enough to guarantee an absolute level of security for web applications. One important reason for this shortcoming is the lack of a common and complete methodology for evaluation, either in terms of performance or in terms of the source code modification needed, which in turn is an overhead for an existing system. The authors therefore feel that there should be a mechanism that is easily deployable, needs no source code modification and provides good performance. To achieve this, the proposed research work aims at developing a new, modified SQL injection detection technique.

The proposed research work focuses on the analysis and resolution of the problem of SQL injection attacks, in order to protect web applications and make them reliable. The authors try to provide a technique that prevents SQLIAs without any source code modification and without a large performance degradation. To achieve this goal the authors propose a general and complete evaluation methodology that can easily be deployed on an existing system to preserve its security against SQLIAs. In the proposed research work, the authors combine the techniques described by AMNESIA [6] and the ADAPTIVE METHOD [1], with some modification, and try to remove the need for source code modification as well as to minimize the runtime response time.

2. PROPOSED METHODOLOGY
In this combined method, the authors maintain a database that stores valid query structures, called model queries. Runtime validation checks each dynamically generated query against the previously stored model queries and a character list to detect a possible SQLIA. The database of valid query structures and the character list are built in the static phase. The authors store all valid query structures in a linked list representation in which each individual singly linked list represents one valid query structure. To store the starting addresses of all these singly linked lists, the authors use a doubly linked list, called the main linked list, in which each node stores the starting address of one singly linked list. In the second stage of the static phase the authors create a list of characters such as single quote, double quote, semicolon, double dash, slash and SQL keywords. The authors store the character list in an array. When a new query arrives at the database server, it is first tokenized; the authors then search for the structure of the query in the linked representation and match the characters of the input values against the characters stored in the character list. If the search of the first stage succeeds and no match is found in the second stage, the query is valid; otherwise it is interpreted as an SQLIA.

In this scheme, the authors store the structure of a query by preserving the order of the sequence of tokens generated by the query; that is, the authors check whether the sequence of tokens generated by the arriving query is in the same order as the sequence stored in the model query. If the sequence of the arriving query is in the same order as a query stored in the database, the arriving query is interpreted as valid. Otherwise, if the authors do not find any matching ordered sequence in the entire database, the query is a possible SQLIA, because all possible structures of valid SQL queries were stored in the database in the static phase. During the search in a singly linked list the authors consider the structure valid if (1) the number of tokens generated by the dynamic query is the same as that of the model query and (2) all tokens except the user-input values are the same in both queries. The authors then extract each token value entered by the user and parse it character by character. If any of these characters matches a character stored in the character list, the query is rejected because it may cause an SQLIA.

In the proposed technique, when a query is issued by the application program for validation, the action point of the application program that generated the query is known, so the


authors only have to match the incoming query against the model queries that belong to that action point.

Figure 2.1: Finding of Action Point

In Figure 2.1, the red part of the code indicates the action point. An action point is defined as a point in the application code that issues SQL queries to the underlying database. There is one query model for each action point. For each action point the authors generate a query model that represents all the possible queries generated by that action point, and store the lengths of all possible model queries in an array. There is one array for each action point.

Figure 2.2: Array to store the length of queries

To store the starting address of each doubly linked list, that is, the starting address of each group, the authors use this array, in which each cell stores the starting address of one group. The index of an array cell represents the group number, which is the number of tokens that each singly linked list in that group possesses. For example, in Figure 2.2, if a group holds all the singly linked lists with 'n' tokens, then the starting address of the group is assigned to the nth cell of the array. In this representation many cells of the array may remain unused, but at the cost of some extra storage the authors gain a great advantage: there is no need to search for the starting address of a specific group, because the number of tokens in the incoming query is itself the index of the array cell that holds the starting address of the group to which the incoming query may belong. Suppose the authors get the following query from the action point discussed above.

select ID from Employee where EmpName = "vats" and EmpPwd = "vatsChy"

To find an ordered sequence of tokens in a singly linked list for an incoming query, the authors first separate the tokens of the query using the SQL parser of the specific DBMS. After the query has been split into tokens on the basis of spaces, each string token is converted into the sum of the ASCII code values of its characters. For example, for the keyword 'select' the ASCII decimal values of the literals are s = 115, e = 101, l = 108, e = 101, c = 99, t = 116, so adding the ASCII values of the literals gives the integer value 640 for 'select'.

select = 115 + 101 + 108 + 101 + 99 + 116 = 640

After computing the ASCII value of every string in a query, the authors store the values in the form of a linked list. A close analysis of any web application shows that in most cases the same type of query is used with different user input. To store an individual valid query structure, the authors preserve the sequence of tokens generated by the query in a singly linked list in which each node stores a single token of the query. After token separation and integer conversion the authors obtain the ordered sequence of the tokens of the query and then start searching the singly linked lists.

Figure 2.3: Singly linked list to store integer tokens

Figure 2.3 shows the linked list representation of the query after token translation and integer conversion. For a valid incoming query the number of tokens is the same as the number of tokens in its corresponding query structure in the database. To reduce the search space and time, the authors therefore group together all the query structures that have the same number of tokens. For instance, a query with 4 tokens belongs to a different group than a query with 5 tokens. Before searching for a similar structure for an incoming query, the authors first calculate the number of tokens it has and then search only the group that holds the valid query structures with the same number of tokens. To group all the singly linked lists that have the same number of tokens, the authors use a doubly linked list, referred to as the main linked list, in which each node holds the starting address of one of those singly linked lists. That means that if there are 'n' groups of singly linked lists, there are 'n' doubly linked lists, and for an incoming query the authors search only a single doubly linked list among them. Figure 2.4 shows a group of singly linked lists having 4 tokens connected to the main linked list.
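The tokenization, integer conversion and grouping described above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: a Python dict keyed by token count stands in for the array of group addresses, plain lists stand in for the linked lists, and the `$` placeholder for user-input positions and all function names are hypothetical.

```python
from collections import defaultdict

PLACEHOLDER = "$"  # assumed marker for a user-input position in a model query


def tokens_of_sql(sql_statement):
    """Split a query into tokens on whitespace."""
    return sql_statement.split()


def token_to_int(token):
    """Map a token to the sum of the character codes of its characters."""
    return sum(ord(ch) for ch in token)


def build_groups(model_queries):
    """Group integer-encoded model queries by their number of tokens.

    The dict key plays the role of the array index in the paper:
    it is the token count shared by every model query in the group.
    User-input positions are stored as None.
    """
    groups = defaultdict(list)
    for query in model_queries:
        tokens = tokens_of_sql(query)
        encoded = [None if t == PLACEHOLDER else token_to_int(t) for t in tokens]
        groups[len(tokens)].append(encoded)
    return groups


model = "select ID from Employee where EmpName = $ and EmpPwd = $"
groups = build_groups([model])

print(token_to_int("select"))  # 640
print(sorted(groups))          # [12]
# The sum ignores character order, so distinct tokens can share a value.
print(token_to_int("from") == token_to_int("form"))  # True
```

Because the integer value is a plain sum, it does not depend on the order of the characters, so two different tokens can map to the same integer; the last line of the sketch shows one such pair.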


Figure 2.4: Singly and Doubly linked list connection

While searching a singly linked list, if the token of the incoming query matches the token at the same position in the list, the authors move to the next token and to the next node of the list, until a mismatch is found or the end of the list is reached. If the end of the singly linked list is reached and no token is left in the incoming query, the search is successful and the incoming query is valid. If, however, a mismatch occurs during matching and the node in the linked list indicates a position of user input, the authors extract the value of the token entered by the user and parse it character by character. These characters are then matched against the characters stored in the character list. The character list contains prohibited characters such as:

‘ , ’ , “ , ” , space , -- , / and SQL keywords

If any character of the input token matches a character stored in the character list, the query is invalid; otherwise matching proceeds to the next token. If a match is found at a position, the matching process continues. If a mismatch is found at a position that is not a user-input position, it is clear that the linked list being searched does not hold the structure of the incoming query, so the thread searching this singly linked list stops its execution. The search for a similar structure for the incoming query is carried out in this fashion in a multithreaded way. If no similar structure is found among the model queries, the query is a possible SQLIA. If a thread finds the correct path, it notifies the other threads to terminate, because the search has already succeeded. From this description of the matching technique it is clear that, for a successful search, the number of tokens in the incoming query is the same as the length of the linked list that stores its structure. In Figure 2.5 the authors combine all the procedures in a single unit.

For runtime token matching, literal-wise comparison would be a large computational overhead. In the worst case the authors would have to check 'n' literals for each of the 'q' queries in the database that have the same length as the incoming query, so the complexity would be O (n × q). Instead of using literal-wise string matching algorithms, the authors simply map each token to an integer value. The authors also store these integer values in the database instead of storing the tokens in string format. This takes much less space: if there are 'n' literals and 'm' tokens in the query, the authors store 'm' values instead of 'n' values, where m<<n. At runtime the length of a singly linked list storing a query with 'm' tokens is therefore m. For an incoming query, after token separation the authors transform each token into its corresponding integer value and then start searching. In the worst case the authors have to check 'm' integer values for each of the 'q' queries, so the complexity becomes O (m × q) instead of O (n × q), where m<<n. This is a performance gain, as both space complexity and time complexity improve.
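As a small worked illustration of the two bounds, the snippet below counts the literals (n) and tokens (m) of the example query given earlier in this section. The group size q is a hypothetical value chosen only for the arithmetic.

```python
query = 'select ID from Employee where EmpName = "vats" and EmpPwd = "vatsChy"'

tokens = query.split()
n = sum(len(token) for token in tokens)  # literals (characters) in all tokens
m = len(tokens)                          # tokens, i.e. integers after conversion
q = 5                                    # hypothetical number of model queries in the group

print(n, m)          # 58 12
print(n * q, m * q)  # 290 60  (worst-case comparisons: literal-wise vs integer-wise)
```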

Figure 2.5: The complete structure of proposed methodology
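The figure itself is not reproduced here. As a rough sketch of how the pieces described in this section might fit together, the following code searches every model query of one group in its own thread and lets the first successful thread signal the others to stop. It is a sketch under stated assumptions, not the authors' implementation: the `$` placeholder, the dict of groups, the function names and the reduced character list (SQL keywords are omitted) are all hypothetical. In CPython, threads do not speed up this kind of CPU-bound work, so the sketch only illustrates the control flow.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Assumed subset of the character list; SQL keywords are left out of this sketch.
FORBIDDEN = ("'", '"', " ", "--", "/", ";")


def token_to_int(token):
    """Sum of the character codes of a token."""
    return sum(ord(ch) for ch in token)


def encode(model_query, placeholder="$"):
    """Integer-encode a model query; None marks a user-input position."""
    return [None if t == placeholder else token_to_int(t) for t in model_query.split()]


def search_list(model, tokens, found):
    """Walk one model query; give up early if another thread already succeeded."""
    for expected, token in zip(model, tokens):
        if found.is_set():
            return False
        if expected is None:
            # User-input position: check the value against the character list.
            if any(mark in token for mark in FORBIDDEN):
                return False
        elif token_to_int(token) != expected:
            return False
    found.set()  # tell the other threads to stop
    return True


def is_valid(groups, query):
    """Return True if some model query with the same token count matches."""
    tokens = query.split()
    group = groups.get(len(tokens), [])
    if not group:
        return False
    found = threading.Event()
    with ThreadPoolExecutor(max_workers=len(group)) as pool:
        results = list(pool.map(lambda model: search_list(model, tokens, found), group))
    return any(results)


models = [
    "select ID from Employee where EmpName = $ and EmpPwd = $",
    "select Name from Employee where EmpName = $ and EmpPwd = $",
]
groups = {}
for model_query in models:
    encoded = encode(model_query)
    groups.setdefault(len(encoded), []).append(encoded)

print(is_valid(groups, "select ID from Employee where EmpName = xxx and EmpPwd = zzz"))
print(is_valid(groups, "select ID from Employee where EmpName = 'OR1=1-- and EmpPwd = zzz"))
```

The first call prints True because the first model query matches token by token; the second prints False because the value at the EmpName position contains a single quote and a double dash.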


3. RESULT ANALYSIS

Figure 3.1: Design of query model for LogIn Action Method


Suppose there is a query model for the LogIn action point on the home page, through which a user can visit the website in one of two modes of authentication: either as a registered user or as a user who visits the website just to take a look. Figure 3.1 shows the two modes of authentication. A user who is not registered is treated as 'guest', whereas a registered user has to fill in the 'EmpName' and 'EmpPwd' fields with correct information. These input values take the places of the two '$' placeholders. The LogIn action query model has two model queries, with query lengths (number of tokens) 10 and 16. The separation point of the two structures is the '=' after EmpName. The structure above this '=' is for the 'guest' user and the structure below this '=' is for the registered user. The user inputs a blank space for token number 15 in the lower part of Figure 3.1 in case 1 and a $ sign in case 2. For better readability, the authors represent tokens as strings instead of integers in Figure 3.1 and in the figures of all three cases in this section.

3.1 Case 1
The user inputs ' " OR 1 = 1 --' for the EmpName variable and ' ' for the EmpPwd variable. The complete query formed after tokenization is given in the picture below:

In this case the value entered for the EmpName field (token number 9 in the lower part of Figure 3.1) contains blank spaces. According to the proposed method, tokens are separated wherever a blank space is found, so the number of tokens generated by this query is 22. Control now goes to the array and searches for the cell that corresponds to the value 22, but the array has no cell with the value 22. No further steps take place and the process stops, because the query structure does not match any of the structures (guest or registered) described in the model queries. So this is an invalid query.

3.2 Case 2
The user inputs ' "OR1=1--' for the EmpName variable and ' ' for the EmpPwd variable. Since the user enters no blank space for token number 9 in the lower part of Figure 3.1, the complete query formed after tokenization is given in the picture below:

The number of tokens generated by this query is 16, so control goes to the array, searches for the cell that corresponds to the value 16, and finds it. This cell redirects the query to the main linked list, which contains the address of the singly linked list that holds the valid query model. Matching starts at the very first node of the singly linked list. After successful matching up to the 8th token, the authors reach the 9th token of the query; since it is the input field for 'EmpName', the authors take the value of that token and parse it character by character. Each character is then matched against the characters stored in the character list. Here a match occurs, because the input value contains a quote character and a double dash. So this is an invalid query.

3.3 Case 3
The user inputs 'xxx' for the EmpName variable and 'zzz' for the EmpPwd variable, so the complete query formed after tokenization is given in the picture below:

The number of tokens generated by this query is 16, so control goes to the array, searches for the cell that corresponds to the value 16, and finds it. This case proceeds in the same way as case 2 until control reaches the 8th token in the above figure. After successful matching up to the 8th token, the authors reach the 9th token of the query, take the value of that token and parse it character by character. Each character is matched against the characters stored in the character list, but no match is found, so the process continues with the next token and matches successfully up to the 14th token. At the 15th token the same character matching process is executed, and again no match is found, so the process goes on to the next and last token. After the 16th token has been matched, the authors can say that all tokens match the stored query structure and the character list, so this is a valid query.
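The outcome of the three cases can be reproduced in miniature. Figure 3.1 and its 16-token model query are not available here, so the sketch below uses the 12-token example query from Section 2 instead; the token positions therefore differ from those in the text. The quoting of input values, the `unquote` helper, the reduced character list (SQL keywords are omitted) and all other names are assumptions made for illustration.

```python
# Assumed subset of the character list; SQL keywords are left out of this sketch.
FORBIDDEN = ("'", '"', " ", "--", "/", ";")


def token_to_int(token):
    """Sum of the character codes of a token."""
    return sum(ord(ch) for ch in token)


def unquote(token):
    """Drop one pair of enclosing double quotes added by the application."""
    if len(token) >= 2 and token[0] == token[-1] == '"':
        return token[1:-1]
    return token


# Hypothetical model query: "$" inside quotes marks a user-input position.
MODEL = 'select ID from Employee where EmpName = "$" and EmpPwd = "$"'.split()
ENCODED = [None if t == '"$"' else token_to_int(t) for t in MODEL]
GROUPS = {len(ENCODED): [ENCODED]}  # token count -> model queries of that length


def build_query(name, pwd):
    """Build the dynamic query the way a vulnerable application might."""
    return f'select ID from Employee where EmpName = "{name}" and EmpPwd = "{pwd}"'


def classify(query):
    tokens = query.split()
    for model in GROUPS.get(len(tokens), []):
        for expected, token in zip(model, tokens):
            if expected is None:
                # User-input position: character matching stage.
                if any(mark in unquote(token) for mark in FORBIDDEN):
                    return "invalid: prohibited character in input"
            elif token_to_int(token) != expected:
                break  # structure differs, try the next model query
        else:
            return "valid"
    return "invalid: no matching model query"


print(classify(build_query('" OR 1 = 1 --', "")))  # case 1
print(classify(build_query('"OR1=1--', "")))       # case 2
print(classify(build_query("xxx", "zzz")))         # case 3
```

With these assumptions, case 1 is rejected because the spaces in the input change the token count (17 instead of 12, so no group exists), case 2 is rejected at the character matching stage, and case 3 is accepted.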


4. ALGORITHM

Step 1: Separate each token from the sqlStatement, store the tokens in tokenSequence and return the total number of tokens as numberOfToken.

numberOfToken = tokensOfSql (tokenSequence, sqlStatement);

Step 2: Convert each token into its corresponding integer value and store it in a linked list named integerSequence.

tokenToInt (numberOfToken, tokenSequence, integerSequence)

Step 3: Initialize the starting node of the doubly linked list to search.

while (searchNode -> rlink != NULL) do
{
    startThreadSearch (tokenSequence, integerSequence, searchNode);
    if (token is an input field)
    {
        if (searchValidCharacter ( ))
        {
            return false;
            // it's a SQL injection
        }
        else
        {
            return true;
            // it's a valid query
        }
    }
    searchNode = searchNode -> rlink;
}

Step 4: After a successful search, the thread that found the right sequence informs all other threads to stop execution, as it has found a valid query structure.

Step 5: Exit.

Declaration of variables:

sqlStatement is a query string generated by the Action Point.

tokenSequence is a 2D array in which each row represents a token and each cell holds a literal of that token; the total number of rows is the total number of tokens stored.

numberOfToken is the total number of tokens present in tokenSequence.

integerSequence is an integer linked list in which each cell stores the integer value of a token.

searchNode = sqlDbArray [numberOfToken]

sqlDbArray is an array holding the starting addresses of all groups; each Action Point has its own array.

Declarations of methods

tokensOfSql ( ) separates each token from the sqlStatement on the basis of spaces (the characters between two spaces are treated as one token), stores the tokens in tokenSequence and returns the total number of tokens in the sqlStatement.

tokenToInt ( ) converts a token into its corresponding integer value. For example, select is converted to 640.

startThreadSearch ( ) searches for the ordered sequence of integers provided by integerSequence in the singly linked list whose starting address is stored in searchNode. This method calls itself as long as the tokens of the dynamically generated query match the stored ones. If the tokens do not match, the method stops processing, control goes to the next singly linked list of the same group, and the same process is executed.

searchValidCharacter ( ) matches the characters of an input value against the characters stored in the character list and returns true if any of them matches. It is executed when, during the processing of startThreadSearch, a token is found that is actually an input data value. The method extracts each literal present in the input data value and compares it with the characters stored in the character list; if a match is found, the algorithm returns false and the query is treated as an SQL injection.

5. CONCLUSION AND FUTURE WORK
Because the implementation is multithreaded, it fully utilizes the newly available multi-core processors and performs the search quickly. In addition, thanks to the array indexing technique, frequently generated SQL query structures are processed quickly, which is a performance gain over existing solutions. Because the proposed scheme validates each dynamically generated SQL query at runtime, the approach increases the runtime overhead of the system but reduces the possibility of SQLIA. Through character matching on the input values of a query, the proposed technique can also detect attacks that change only the data values of a query without changing its structure. In the proposed method all test cases work correctly up to the token matching stage, but in the character matching stage, where each character of an input value is matched against the characters stored in the character list, the technique may not give optimal results. An input value may contain SQL keywords that are legitimate from an authenticated user's point of view, but because the value matches the content of the character list, the query is treated as invalid. In future the authors will extend the character matching algorithm to handle such cases and develop a tool to enhance the efficiency of the proposed method. Currently, the proposed system prevents most SQLIA attacks.

6. AUTHORS CONTRIBUTION
All authors have contributed equally to this work and declare no conflict of interest.


7. REFERENCES
[1] Noor Ashitah Abu Othman, Fakariah Hani Mohd Ali and Mashyum Binti Mohd Noh. Secured Web Application Using Combination of Query Tokenization and Adaptive Method in Preventing SQL Injection Attacks. 2014 International Conference on Computer, Communication, and Control Technology (I4CT 2014), IEEE, September 2-4, 2014, Langkawi, Kedah, Malaysia.

[2] Anamika Joshi and Geetha V. SQL Injection Detection using Machine Learning. 2014 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT), IEEE, 2014.

[3] Jaskanwal Minhas, Raman Kumar. Blocking of SQL Injection attack by Comparing Static and Dynamic queries. International Journal of Computer Network and Information Security, 2013.

[4] A. Dasgupta, V. Narasayya, M. Syamala. A Static Analysis Framework for Database Applications. IEEE 25th International Conference on Data Engineering, pages 1403–1414, March 2009.

[5] W. Halfond, J. Viegas and A. Orso. A Classification of SQL Injection Attacks and Countermeasures. In Proceedings of the IEEE International Symposium on Secure Software Engineering (ISSSE), 2006.

[6] W. G. Halfond and A. Orso. AMNESIA: Analysis and Monitoring for NEutralizing SQL-Injection Attacks. In Proceedings of the IEEE and ACM International Conference on Automated Software Engineering (ASE 2005), Long Beach, CA, USA, Nov 2005.

[7] Wikipedia, "SQL injection". http://en.wikipedia.org/wiki/SQL_injection

[8] William G. J. Halfond, Alessandro Orso. Combining Static Analysis & Runtime Monitoring to Counter SQL-Injection Attacks. SIGSOFT Software Engineering Notes, Volume 30, Issue 4, July 2005.

[9] Kumar, Anil, Rohit Agarwal, and Rahul Kala. "Temporal Logic based Motion Planning in Unstructured Environments."

[10] F. Valeur, D. Mutz, and G. Vigna. A Learning-Based Approach to the Detection of SQL Attacks. In Proceedings of the Conference on Detection of Intrusions and Malware and Vulnerability Assessment (DIMVA), pages 123–140, 2005.

[11] Boyd and A. Keromytis. SQLrand: Preventing SQL injection attacks. In Proceedings of the Applied Cryptography and Network Security (ACNS), pages 292–304, 2004.

[12] G. Wassermann and Z. Su. An Analysis Framework for Security in Web Applications. In Proceedings of the FSE Workshop on Specification and Verification of Component-Based Systems (SAVCBS), pages 70–78, 2004.

[13] Kumar, Anil, and Rahul Kala. "Linear Temporal Logic-based Mission Planning." IJIMAI 3.7 (2016): 32-41.

IJCATM : www.ijcaonline.org