
Proceedings of National Conference On Web and Knowledge Based Systems (WKBS)


Often referred to as a network camera, an IP camera is a camera system that utilizes Internet Protocol (IP) to transmit video and images over a Fast Ethernet connection. This type of system typically serves the same purpose as a traditional closed-circuit television (CCTV) camera, which is to monitor a location. Among the many differences is that an IP camera usually streams content to a network video recorder (NVR), or a PC, rather than to a digital video recorder (DVR).

IP Camera Surveillance

An Internet Protocol camera captures and streams live digital video footage over an IP network. The footage can be viewed and managed remotely from a Web browser, and is archived in digital format. IP cameras have wireless connections, are capable of advanced video analytics, and offer much more flexibility than their analog predecessors [12]. An IP camera usually consists of various components, including a lens, an image sensor, memory, and one or more processors. Memory plays the role of a hard drive by storing video content and the firmware needed to operate the device. Processors are required to process images, compress video, and perform a wide range of network functions. These components enable the camera to perform many of the same functions as a DVR [39].

2. HISTORY

The first IP camera was released in 1996 by Axis Communications. It used an embedded Linux platform internal to the camera. Axis also released documentation for their low-level API called VAPIX which builds on the open standards of HTTP and RTSP.


This open architecture was intended to encourage third-party software manufacturers to develop compatible management and recording software. As with digital still cameras, the resolution of IP cameras has increased with time. Megapixel IP CCTV cameras are now available at resolutions of 1, 2, 3, 5 and even 11 megapixels.

3. IP CAMERA

Gone are the days of the traditional closed-circuit (CCTV) cameras introduced in the early 1940s. The time has come to shift to the newer and better IP security cameras introduced in the late 90s. The IP camera represents the next era in video surveillance technology, an increasingly popular solution that allows for efficient monitoring from any compatible client device with a web browser. It is a must for working parents who want to monitor their kids left at home: fix the IP camera at a strategic location in a room and you can see everything happening in your home, from your workplace or anywhere in the world, on the screen of your laptop or PC. The owner of a shop or workshop can likewise monitor his business, and young executives living in cities are opting for it to monitor their elderly parents back in their villages [25]. The IP camera has made life easier for many by reducing the effective distance to loved ones to zero.

An IP camera is, in essence, a CCTV camera that combines the capabilities of a computer with the camera: a unified device that captures the live scene, streams it over the IP network, and allows users to watch and record events from a distance and to control or configure the camera via IP. An IP camera has its own IP address (the default is often 192.168.0.99). An IP address works like a house number: it lets anybody know where to reach the camera on the network. An IP camera does not need a computer all the time; it can run on its own, unlike a webcam, which requires a computer. It offers additional functions, such as the following (a minimal access sketch follows the list):
1. Motion monitoring: if abnormal movement is detected, it keeps recording and raises an alarm or sends mail to alert administrators.
2. It can listen to sound and has the ability to send voice and video simultaneously.
3. It can be used as an input as well as an output device, which is a utility function.
4. Serial-port functions for users who want to control a camera attached to a pan/tilt unit.
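To illustrate viewing such a camera from anywhere, here is a minimal sketch of pulling frames from an IP camera's stream with OpenCV. The RTSP path and the reuse of the default address above are assumptions; the actual URL scheme varies by vendor.

```python
# A minimal sketch of reading an IP camera stream with OpenCV.
import cv2

# Hypothetical RTSP URL using the default address mentioned above.
cap = cv2.VideoCapture("rtsp://192.168.0.99/stream1")
while True:
    ok, frame = cap.read()
    if not ok:
        break  # stream ended or connection lost
    cv2.imshow("IP camera", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```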


4. IP CAMERA FEATURES

IP cameras may offer cost savings because you can leverage your existing Ethernet networking infrastructure [26]. Notable features include:
1. Using IP cameras as a surveillance solution allows you to easily add one camera at a time on any network, whether public or private; IP cameras simply plug into an access point on the network using CAT5 cabling.
2. Ethernet IP cameras can use Power over Ethernet (PoE) technology, allowing one cable to handle both power and data [28].
3. Many IP cameras come with the software needed to monitor, record and play back footage, making it more convenient to manage your surveillance system.
4. IP cameras can use a variety of video formats, including MPEG-4, whose image quality far exceeds that of conventional CCTV.
5. IP cameras are web-ready and can provide surveillance from anywhere there is an Internet connection [37] (a snapshot-fetch sketch follows this list).
6. Two-way audio allows users to communicate about what they are seeing (e.g. a gas station clerk assisting a customer with the prepay pumps).
7. LED lighting gives users the ability to view low-light areas, a feature known as night vision.
8. Streaming-rate viewing: some IP cameras have a resolution of 640x480 and are able to record at 30 frames per second [1].
9. IP cameras are also able to function on a wireless network. Initial configuration has to be done through a router, but after the IP camera is installed it can be used on the wireless network.
10. IP cameras help make IPTV applications richer, e.g. video calls and video chat [35].
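To illustrate the "web-ready" point, here is a minimal sketch of fetching a still image over HTTP. The snapshot path and credentials are hypothetical; the endpoint differs from vendor to vendor.

```python
# A minimal sketch of grabbing a still image from a web-ready IP camera.
import requests

resp = requests.get(
    "http://192.168.0.99/snapshot.jpg",  # hypothetical endpoint
    auth=("admin", "password"),          # hypothetical credentials
    timeout=5,
)
resp.raise_for_status()
with open("snapshot.jpg", "wb") as f:
    f.write(resp.content)
```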

5. TYPES OF IP CAMERAS

An IP camera, sometimes called a network camera, is a standalone unit serving as a surveillance device which can be controlled and monitored through standard protocols, such as HTTP, FTP and SMTP, without a computer involved.


There is usually an embedded web server running inside the camera to handle HTTP requests from the open Internet, so that users can access the camera through a web browser to monitor and control it. IP cameras can be classified on the following bases:

A. By how the camera lens and image processing unit are constructed

1. INTEGRATED IP CAMERA, in which the image capture unit, i.e. the camera lens, is integrated with the image processing unit. This type of camera has only one lens and thus supports only one channel.
2. VIDEO SERVER, which has only an image processing unit plus other units such as image transfer and networking modules; there is no video capture unit inside it. A video server usually supports multiple video input channels: 2, 4 and up to 32 channels. Lower-end video servers support 2 or 4 input channels, but only one channel can be set active at a time, so only that channel can be monitored and respond to motion detection.

B. By how the video capture unit connects to the video server

1. CORDED CAMERA, in which the camera is connected by a cord. A corded camera provides good video and image quality.

Figure: Dome corded camera

2. CORDLESS CAMERA, in which the camera is connected by radio signal or X10 adaptors. A cordless camera is easy to deploy but delivers lower video quality.


Figure: Cordless camera

C. By how the IP camera connects to the Internet [36]

1. WIRED CAMERA, which is connected with a CAT 5 standard network cable through an RJ-45 Ethernet jack. This type of camera provides the most reliable connection and the best bandwidth.

Figure: Wired camera

2. WIRELESS CAMERA, which is connected to the Internet through WiFi, Bluetooth or other connection methods [32].

Figure: Wireless camera

The variety of network camera models available allows users to install video security solutions fit for any surveillance application. Some of the more common IP camera types are:


1. FIXED IP CAMERAS: Fixed network cameras are the ideal choice for those who wish to monitor a very specific area and also intend to have the camera, and the direction it is pointing, clearly visible. Once the camera is focused on a location, it is set to view only that area. Most fixed cameras support interchangeable lenses and housings for various environments [9].

Figure: Fixed IP camera

2. FIXED DOME CAMERAS: Fixed dome cameras are often small and discreet, with a fixed camera installed inside a dome housing. The camera can be pointed in any direction and then set in place to target a specific area. Fixed domes can provide unobtrusive surveillance, and the housing helps to conceal which direction the camera is aiming.

Figure: Fixed dome camera

3. PTZ CAMERAS: Unlike fixed cameras, PTZ network cameras allow the user to control pan, tilt, and zoom functions in order to monitor wider areas and zero in on specific individuals, objects, or activity. In a retail setting, for instance, surveillance operators can control a PTZ camera to follow a suspected shoplifter. Most PTZ cameras offer both manual and automatic PTZ control.

Figure: PTZ camera


4. NETWORK DOME CAMERAS: The advantage of network dome cameras over PTZ cameras and fixed IP cameras is that they can pan up to 360 degrees and support continuous guard-tour operation. Guard-tour functionality enables a single network dome camera to automatically move between presets in order to cover large areas that would typically require multiple fixed cameras [38].

Figure: Network dome camera

6. APPLICATIONS

1. Restaurant Surveillance with IP Cameras [22].
2. Protect Your Retail Store With IP Camera Surveillance [21].
3. Using IP Camera Surveillance in Shopping Malls [24].
4. Using IP Cameras in Prisons and Correctional Facilities [23].
5. IP Camera Video Surveillance for Car Dealerships [18].
6. IP Camera Video Surveillance for ATMs [19].
7. IP Camera Video Surveillance for Airports [20].
8. IP Camera Surveillance System for Gas Stations [13].
9. IP Camera Surveillance in Convenience Stores [14].
10. IP Camera Surveillance of Parking Lots [15].
11. IP Camera Surveillance for Vacation Homes [16] [10].
12. IP Camera Surveillance for Jewellery Stores.
13. IP Camera Surveillance for Hospitals.
14. Intelligent Robot Vacuum Cleaner with Wireless IP Camera [33].
15. Remote Door Access Control Using an IP Camera System [34].
16. IP Camera Surveillance for Stations.
17. IP Camera Surveillance for Banks.
18. IP Camera Surveillance for Schools [8].
19. IP Camera Surveillance for Armed Forces.
20. IP Camera Surveillance for Power Plants.
21. IP Camera Surveillance for Power Supply Centers.
22. IP Camera Surveillance for Oil Storage Centers.
23. IP Camera Surveillance for Factories.
24. IP Camera Surveillance for Mines.



7. ADVANTAGES OF IP SECURITY CAMERAS

1. IP security cameras use less equipment.
2. IP security cameras need less wiring.
3. They are therefore very convenient and cost less to install and maintain.
4. They use less power, thanks to running over Ethernet, and are therefore less energy consuming.
5. This yields lower long-term running costs, although an IP security camera may cost more at initial purchase compared to an analog one [3].
6. In terms of specification, IP security cameras are more versatile and provide more features [30].
7. IP security cameras usually give a higher-resolution image, meaning clearer pictures as seen on a PC compared to an analog camera.
8. With an IP security camera you get more advanced features, such as:
a. day/night cameras [29];
b. motion sensors;
c. removable infrared filters for sharper colors by day and clear black-and-white footage by night;
d. encrypted signals allowing for secure communication;
e. direct control of the zoom and tilt capabilities;
f. alarm monitoring;
g. two-way communication (not only watching);
h. programs that can activate lights or locks if someone triggers the alarms;
i. software you can use to program the IP security camera to operate as you need, such as an ON/OFF timer and what information to send to you or anybody else [31].

8. DISADVANTAGES OF IP CAMERAS

The following are some of the potential weaknesses of IP cameras in comparison to analog CCTV cameras:
1. Higher initial cost per camera.
2. Less choice of manufacturers [5].
3. Lack of standards: different IP cameras may encode video differently or use different programming interfaces, which means a particular camera model should be deployed only with compatible IP video recording solutions [6].


4. High network bandwidth requirements: a typical CCTV camera with a resolution of 640x480 pixels at 10 frames per second (10 frame/s) in MJPEG mode requires about 3 Mbit/s (a rough estimate of this figure is sketched after this list).
5. Technical barrier: installing an IP camera requires a series of complicated network settings, including the IP address, DDNS, router settings and port forwarding. This is very difficult for most users to do alone, without help from an IT technician [7].
6. Lower dynamic range, i.e. reduced ability to cope with low-light and high-contrast scenes.
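A rough back-of-the-envelope check of the quoted 3 Mbit/s figure; the 25:1 JPEG compression ratio is an assumed average for illustration, not a measured value.

```python
# Rough MJPEG bandwidth estimate for the camera described above.
width, height, fps = 640, 480, 10
bits_per_pixel = 24           # raw 8-bit RGB
compression_ratio = 25        # assumed average JPEG compression

raw_bits_per_frame = width * height * bits_per_pixel
bitrate = raw_bits_per_frame / compression_ratio * fps
print(f"{bitrate / 1e6:.1f} Mbit/s")  # ~2.9 Mbit/s, close to the quoted 3 Mbit/s
```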

9. CONCLUSION

IP cameras change the business model of deploying cameras in large facilities and areas. Where deployment was once too expensive, IP is enabling new uses of cameras, and we will certainly see this continue in schools, corporate campuses, municipalities, outdoor facilities, and anywhere that long distances separate cameras from recording and monitoring stations. The price competitiveness of IP cameras is increasing as NVR solutions become simpler to set up and manage [27]. Megapixel cameras are the wild card here: if and when the total cost of ownership (camera, bandwidth, storage) of megapixel cameras gets close to that of analog cameras, the financial incentive to switch to IP could become very strong. If your facilities are large, you will want to move aggressively to IP. If your DVR suppliers are making advances such as going hybrid, supporting analytics and providing central management, you will likely be in good shape for years to come [4]. If they are not, you may be missing out on this generation's wave of operational savings and loss reduction; in that case, start investigating migration to a new IP-based system. Integrators and DVR manufacturers will then be forced to support IP cameras or be ousted by rivals that offer the clearly financially preferable IP solution. On the plus side, you should be able to maintain and extend your core business by slowly adding in IP; new features on your current DVR lines should support this, and retaining your key technicians with IT know-how should be sufficient. On the downside, it will be hard to grow double-digit revenues, because the easy growth in deploying new systems is in large facilities. If you want to grow aggressively, targeting these applications with an IP camera line and an NVR will be key. You likely already have one or two solid IT technicians and, with additional training and a few more hires, should be well poised to expand.
10. REFERENCES

[1] 2006, Nelson Publishing; 2008, Gale, Cengage Learning. URL: http://findarticles.com/p/articles/mi_m0CMN/is_1_43/ai_n26771543/
[2] URL: http://ipsecuritycamera.blogspot.com
[3] John Honovich, Sep 12th 2010, IP Video Market Info. URL: http://ipvideomarket.info/report/ip_camera_standards_battle
[4] John Honovich, Sep 14th 2008, IP Video Market Info. URL: http://ipvideomarket.info/report/should_i_use_ip_cameras__reviewing_ip_camera_advantages
[5] John Honovich, July 26th 2008, IP Video Market Info. URL: http://ipvideomarket.info/report/top_5_ip_camera_problems
[6] John Honovich, July 12th 2009, IP Video Market Info. URL: http://ipvideomarket.info/report/top_5_ip_camera_problems_2009
[7] URL: http://ipvideomarket.info/topics/IPCameras
[8] Vincent Juson, June 2nd 2009. URL: http://vljuson.com/network-cameras/acti-4-indoor-ip-camera-package-with-64-camera-recording-software/
[9] URL: http://www.4xemsecurity.com
[10] Greg Luttman. URL: http://www.about-home-security.com/home-security-ip-camera.php
[11] URL: http://www.alphacardsecurity.com/ip-video-cameras/index.shtml
[12] URL: http://www.alphacardsecurity.com/video-surveillance-glossary.shtml#_80211
[13] Was Fernley, Feb 24th 2009, Gas Stations and the IP Camera Surveillance System. URL: http://www.articlesbase.com/computers-articles/gas-stations-and-the-ip-camera-surveillance-system-786703.html
[14] Was Fernley, Feb 2nd 2009, IP Cameras Surveillance in Convenience Stores. URL: http://www.articlesbase.com/computers-articles/ip-cameras-surveillance-in-convenience-stores-751776.html
[15] Was Fernley, Jan 7th 2009, IP Camera Surveillance of Parking Lots. URL: http://www.articlesbase.com/computers-articles/ip-camera-surveillance-of-parking-lots-714261.html
[16] Was Fernley, Jan 10th 2009, IP Camera Surveillance for Vacation Homes. URL: http://www.articlesbase.com/computers-articles/ip-camera-surveillance-for-vacation-homes-717565.html
[17] Was Fernley, Aug 7th 2008, IP Cameras Vs. CCTV Cameras - Part One. URL: http://www.articlesbase.com/computers-articles/ip-cameras-vs-cctv-cameras-part-one-513372.html
[18] Was Fernley, Feb 24th 2009, IP Camera Video Surveillance for Car Dealerships. URL: http://www.articlesbase.com/computers-articles/ip-camera-video-surveillance-for-car-dealerships-786704.html
[19] Was Fernley, Apr 1st 2009, IP Camera Video Surveillance for ATMs. URL: http://www.articlesbase.com/computers-articles/ip-camera-video-surveillance-for-atms-844428.html
[20] Was Fernley, Dec 30th 2008, IP Camera Video Surveillance for Airports. URL: http://www.articlesbase.com/computers-articles/ip-camera-video-surveillance-for-airports-703786.html
[21] Was Fernley, Feb 1st 2009, Protect Your Retail Store with IP Camera Surveillance. URL: http://www.articlesbase.com/computers-articles/protect-your-retail-store-with-ip-camera-surveillance-749841.html
[22] Was Fernley, Dec 1st 2009, Restaurant Surveillance with IP Cameras. URL: http://www.articlesbase.com/computers-articles/restaurant-surveillance-with-ip-cameras-1524118.html
[23] Was Fernley, Nov 25th 2009, Using IP Cameras in Prisons and Correctional Facilities. URL: http://www.articlesbase.com/computers-articles/using-ip-cameras-in-prisons-and-correctional-facilities-1504023.html
[24] Was Fernley, Nov 26th 2009, Using IP Camera Surveillance in Shopping Malls. URL: http://www.articlesbase.com/computers-articles/using-ip-camera-surveillance-in-shopping-malls-1508385.html
[25] Was Fernley, Aug 28th 2008, What to Look for in an IP Camera? URL: http://www.articlesbase.com/computers-articles/what-to-look-for-in-an-ip-camera-enclosure-539560.html
[26] Was Fernley, May 22nd 2009, IP Security Camera against Analog Security Camera. URL: http://www.articlesbase.com/gadgets-and-gizmos-articles/ip-security-camera-advantage-of-ip-security-camera-against-analog-security-camera-931969.html
[27] URL: http://www.ezwatch-security-cameras.com/shop/EZ-IPNVR-EZWatch-Pro-NVR-Network-Video-Recorder-IP-Camera-Software-p-16416.html
[28] URL: http://www.huamaicamera.com/EnProductShow.asp?ID=224
[29] Bob. URL: http://www.imakenews.com/kin2/e_article000462723.cfm?x=b8v5FDQ,b25tl0b3,w
[30] Bob. URL: http://www.imakenews.com/kin2/e_article000654641.cfm?x=b8v5FDQ,b25tl0b3,w
[31] URL: http://www.ipcameraguru.com
[32] URL: http://www.itboyd.com/blog/2008/12/technology-ip-camera-and-wireless-camera/
[33] Consumer Electronics, Robots, March 24th 2010. URL: http://www.itechdiary.com/intelligent-robot-vacuum-cleaner-with-wireless-ip-camera.html
[34] URL: http://www.kintronics.com/IPaccesscontrolsystems.html
[35] URL: http://www.kumaar.com/Gizmos/ipcamera.html
[36] URL: http://www.networkwebcams.com
[37] URL: http://www.securitycamerasandmore.com/ipcamera.html
[38] URL: http://www.VideoSurveillance.com
[39] URL: http://www.wisegeek.com/what-is-an-ip-camera.htm

Natural Language Interface for Database: Using the Keyword Approach


Keshav Niranjan
Research Scholar, Department of Computer Science and I.T., Singhania University, Pacheri Bari, Rajasthan
Email ID: [email protected]

ABSTRACT: The present paper proposes a Natural Language Interface (NLI) to databases for English using the keyword approach. The system takes simple English statements in the domain of database communication and produces Structured Query Language (SQL) syntax for triggering a database response. The advantages of NLI-based systems over traditional menu-based approaches are well established [3]. SQL statements are English-like artificial commands for operating the database through well-defined procedures; failure to use them correctly can result in non-communication and stall business processes. Learning additional artificial languages brings cognitive load to users, and simple NLIs like the one proposed in this paper can be viable alternatives.

Keywords: SQL, POS tagger, shallow parsing, SQL generator, NLP, NLI

1. Introduction

An interface is the communicator between the database and the user: if the user wants to retrieve information from the database, he uses an interface. In the early days, users issued formal-language commands to retrieve data, so memorization of commands was necessary. To correct this drawback, menu-based interfaces were developed. The latter (menu-based) interfaces had all the commands written in the menu, so the user could select a command from the menu and retrieve the information from the database; there was no longer any requirement to memorize commands. A limitation of this method is that only a trained user can operate a menu-driven interface; it is also domain dependent, meaning the search criteria are limited to whatever tools are given in the menu. This limitation creates the requirement for another kind of interface, one that is domain independent and requires no training to operate: the user gives the command to the interface in his own format and native language and gets the desired result as output. This can be done by a natural language interface.

2. Methodology

The process of the natural language interface will be as follows:

INPUT: The user writes the query in natural language for the interface.

SHALLOW PARSING: To parse the input query and get its tokens (tokenization divides the input text into tokens), we run a POS (parts-of-speech) tagger on the sentence; tagging means the automatic assignment of a descriptor, or tag, to each input token. Sample categories and words are shown in the following table [3]:

Determiner:   The, A, An
Noun:         Flight, class, Computer
Verb:         Is, teach, Take
Adjective:    Red, Higher, Small
Preposition:  In, On, Under
Conjunction:  And, Or, But
Pronoun:      Me, I, You
Proper noun:  Alaska, United, American

We plan to use one of the available taggers for English and obtain results like the following (a sketch with one such tagger follows this example):

Input:  What did john teach in class
Parse:  s(noun(john), verb_pp(teach, noun(what), pp(in, noun(class))))
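A minimal sketch of this step using NLTK's off-the-shelf English tagger; NLTK is our illustrative choice here, not a tagger named by the paper, and the tag output shown is indicative only.

```python
# POS-tag the sample sentence with NLTK's averaged perceptron tagger.
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("What did John teach in class")
print(nltk.pos_tag(tokens))
# e.g. [('What', 'WP'), ('did', 'VBD'), ('John', 'NNP'),
#       ('teach', 'VB'), ('in', 'IN'), ('class', 'NN')]
```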

LEXICON: After tokenization we match each token against a lexicon. Since our aim is to identify the keywords in the input query which have SQL equivalents, the keyword identifier module picks out the SQL-pertinent words to be matched in a lexicon of synonyms and equivalences. The lexicon table below gives the synonyms of each SQL-pertinent word.

Q_Id  Keyword   Synonyms
1.    Select    Choose, Pick, Prefer, Elect, Opt, Specify, See Choice, Take, Decide, Make Up One's Mind, Determine, Show, Display, Get, Find, Retrieve, Obtain, Fetch, Bring, Give, Pick Out, Take On, Demand, Take In, Distinguish, Shoot, Call For, Exact, Pack, Acquire, Read, Involve, Make, Guide, Engage, Lead, Hire, Necessitate, Have, Occupy, Strike, Study, Use Up, Assume, Admit, Need, Adopt, Subscribe To, Deal, Aim, Spot, Postulate, Carry, Contract, Require, Fill, Film, Drive, Make Out, Direct, Discern, Recognize, Ask, Subscribe, Convey, Submit, Conduct, Take Away, Consider, Accept, Learn, Withdraw, Consume, Contain, Look At
2.    From      Beginning At, Coming Out Of, Deriving Out Of, Originating At, Starting With, Belonging, Endemic, From, Internal, Original, Primary, Primeval, Primitive, Relate
3.    Create    Create, Make, Construct, Produce, Generate, Build, Act, Move, Make Or Cause To Be Or To Become, Take In, Prepare, Get, Make Up, Cook, Attain, Fix, Establish, Defecate, Work, Take, Have, Get To, Gain, Do, Give, Crap, Take A Leak, Induce, Hold, Clear, Nominate, Ready, Take A Crap, Bring In, Make Believe, Constitute
4.    Insert    Move Into, Go In, Figure, Bring Out, Go Into, Premise, Participate, Acquaint, Put In, Present, Enroll, Inclose, Record, Accede, Precede, Get Into, Get In, Infix, Put Down, Enter, Bring In, Enclose, Introduce, Innovate, Come In, Cut In, Install, Slip In, Sneak In, Chime In, Tuck, Store, Set Up, Break In, Preface, Submit, Envelop, Inject, Hold In, Confine, Inaugurate, Butt In, Hive Away, Close In, Interpose, Interject
5.    Update    Update, Modify, Amend, Modernize, Refresh, Refurbish, Rejuvenate, Renew, Renovate, Restore, Revise, To Incorporate New Or More Accurate Information, Modernize Or Bring Up To Date, Supply With Recent Information
6.    Where     Location, Locus, Point, Position, Site, Situation, Spot, Station, Locality, Place, Stage, Where
7.    And       Along With, Also, As A Consequence, As Well As, Furthermore, Including, Moreover, Together With
8.    Or        As A Choice, As A Substitute, As An Alternative, Conversely, Either, In Other Words, In Preference To, In Turn, On The Other Hand, Or But, Or Else, Or Only, Preferentially, Alternative, Else, Ere, Instead, Oppositely, Otherwise, Rather, Substitute
9.    Max       Greatest, Height, Max, Maximum, Most, Peak, Point, Roof, Spire, Summit, Tip, Tops, Up There, Vertex, Zenith, Top, High Point, Apex, Climax, Culmination, Extremity, Maxi, Pinnacle, Preeminence, Record, Supremacy, The End, Utmost, Uttermost
10.   Between   Amid, Amidst, Among, At Intervals, Halfway, In, In The Middle, In The Midst Of, In The Seam, In The Thick Of, Inserted, Interpolated, Intervening, Medially, Mid, Midway, Separating, Surrounded By, Within
11.   Average   Mean, Median, Medium, Middle, Midpoint, Norm, Par, Rule, Standard, Usual
12.   Count     Calculation, Computation, Enumeration, Numbering, Outcome, Poll, Reckoning, Result, Sum, Toll, Total, Whole

KNOWLEDGE BASE: Since the expressive power of natural language is wide, the user may use any word in the query sentence, so a database of synonyms and a list of relations from one word to another is necessary, by which the actual meaning of the sentence is interpreted. Some sample relationships are the following (a sketch of one possible in-memory representation comes after the list):

isa(professor, rank)      isa(mis, major)          isa(classes, table_name)
isa(mis, c_title)         isa(art, department)     isa(computer, tool)
v_obj(take, c_title)      v_obj(taught, c_title)
actor(takes, stuname)     actor(teaches, facname)
in_table(course_no, classes)   in_table(rank, faculty)
in_table(dept, faculty)        in_table(credits, student)
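A minimal sketch of one possible in-memory representation of these relations; the names mirror the paper's relations, but the Python encoding and the helper function are our illustration, not the paper's implementation.

```python
# Knowledge base relations encoded as sets of (subject, object) tuples.
ISA = {("professor", "rank"), ("mis", "major"), ("classes", "table_name"),
       ("mis", "c_title"), ("art", "department"), ("computer", "tool")}
IN_TABLE = {("course_no", "classes"), ("rank", "faculty"),
            ("dept", "faculty"), ("credits", "student")}
V_OBJ = {("take", "c_title"), ("taught", "c_title")}
ACTOR = {("takes", "stuname"), ("teaches", "facname")}

def tables_for(column):
    """Return the tables known to contain a given column."""
    return [table for col, table in IN_TABLE if col == column]

print(tables_for("rank"))  # ['faculty']
```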


SQL GENERATOR: The SQL generator will generate the query in SQL by keyword matching and query sampling methods. A sample is given below (a runnable sketch follows).

Input query:   Show name of employee
Parsed as:     verb_pp(show, n(name), pp(of, n(employee)))
Generated SQL: SELECT name FROM employee;

After executing this query the user will get the names of the employees.
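A runnable sketch in the spirit of this keyword-matching step. The tiny lexicon and the single "verb column of table" pattern are illustrative assumptions that cover only the sample query, not the paper's full generator.

```python
# Map synonym verbs to SQL keywords, then assemble a SELECT statement.
LEXICON = {
    "select": {"show", "display", "get", "find", "retrieve", "give", "fetch"},
}

def to_sql(query: str) -> str:
    tokens = query.lower().split()
    # The leading verb must be a known synonym of SELECT.
    if tokens and any(tokens[0] in syns for syns in LEXICON.values()):
        # Pattern: "<verb> <column> of <table>"
        if "of" in tokens:
            i = tokens.index("of")
            column = " ".join(tokens[1:i])
            table = " ".join(tokens[i + 1:])
            return f"SELECT {column} FROM {table};"
    raise ValueError("query pattern not recognized")

print(to_sql("Show name of employee"))  # SELECT name FROM employee;
```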

3. Research

In researching natural language interfaces for databases we will use the scientific research method, which produces output that is universally accepted. The steps of the scientific method are the following:
1. Observe the existing interfaces for databases and determine whether these interfaces are useful for all types of users.
2. Formulate a hypothesis, here NLP for databases, as a tentative answer.
3. Deduce consequences and make predictions.
4. Test the hypothesis in a specific experimental or theoretical field. The new hypothesis must prove to fit the existing world-view; in case the hypothesis leads to contradictions and demands a radical change in the existing theoretical ground, it has to be tested particularly carefully. The new hypothesis has to prove fruitful and offer considerable advantage in order to replace the existing scientific paradigm. If a major problem is found in the hypothesis, we start from the beginning.
5. When consistency is obtained, the hypothesis becomes a theory, i.e. it is universally accepted. In the end we select the best interface, one that is easy and useful for all types of users.

Figure: flow diagram of the scientific method described above.


4. Conclusion and Future Research

Person       Number of people who will use an NLI   Percentage
Programmer   20/20                                  100
Novice       16/20                                  80
Manager      15/20                                  75
Scientist    18/20                                  90

Survey table

From this survey it is evident that only a small section of users is comfortable using the SQL interface to a database, so an interface in natural language will solve the problem for the roughly 309 million English speakers out of the world's approximately 6.6 billion people. This research alone is not enough to spread the use of the technology, so future research is required to develop the same paradigm in every language of the world; thereafter, language will not be a barrier to industrial growth and the corporate world.


5. Bibliography

[1] Jurafsky, D. and Martin, J. [2002], Speech and Language Processing, Pearson Education, ISBN 81-7808-594-1.
[2] Mitkov, Ruslan [2003], The Oxford Handbook of Computational Linguistics, Oxford University Press.
[3] Jha, Girish Nath, A Natural Language Interface for Databases, Dept. of Linguistics, University of Illinois, Urbana-Champaign.
[4] Conlon, Sumali J., Conlon, John R. and James, Tabitha L. [2002], The Economics of Natural Language Interfaces: Natural Language Processing Technology as a Scarce Resource, School of Business Administration, University of Mississippi, MS 3867.
[5] Akerkar, Rajendra and Joshi, Manish, Natural Language Interface Using Shallow Parsing, International Journal of Computer Science and Applications, Vol. 5, No. 3, pp. 70-90.
[6] El-Mouadib, Faraj A., Zubi, Zakaria Suliman, Almagrous, Ahmed A. and El-Feghi, I. [2009], Interactive Natural Language Interface, WSEAS Transactions on Computers, ISSN 1109-2750, Issue 4, Volume 8.
[7] Dodig-Crnkovic, Gordana, Scientific Methods in Computer Science, Department of Computer Science, Mälardalen University, Västerås, Sweden.
[8] http://www.synonyms.net/synonym/construct
[9] http://thesaurus.com/browse/amend
[10] http://www.answerbag.com/q_view/53199

Acknowledgements

The author thanks Dr. Girish Nath Jha, Jawaharlal Nehru University, for reviewing the paper and giving invaluable comments that helped to revise the draft.

A Method for Multiple Object Tracking in Video Sequences Using the Particle Filtering Technique
Esmita Singh
Student, M.Tech (ALCCS), IETE, New Delhi
Email ID: [email protected]

ABSTRACT: In this paper we investigate how to track multiple objects in video sequences, for example tracking pedestrians in a crowd. We review particle filtering techniques for tracking single moving objects as well as multiple moving objects in video sequences, developed upon several sources of information using different features such as color, motion and shape. Particle filters provide a robust tracking framework under ambiguous conditions; the technique increases robustness and accuracy and decreases computational complexity. We propose to track the moving objects by generating hypotheses not in the image plane but on a top-view reconstruction of the scene. Pros and cons of these algorithms are discussed, along with the difficulties that have to be overcome. Comparative results of a particle filter on real video sequences show its advantage for multiple object tracking.

Keywords: Particle Filtering, Video Sequences and Multiple Object Tracking

1. Introduction

Video object tracking has many practical applications, such as security, road traffic, the detection of suspicious moving objects, and monitoring of industrial production. Different techniques are available in the literature for solving tracking tasks in vision, and in general they can be divided into two groups:
1) Classical applications, where targets do not interact much with each other and behave independently, such as aircraft that do not cross paths.


2) Robust applications, in which targets do not behave independently (ants, bees, robots, people) and their identity is not always well distinguishable. Tracking multiple identical targets has its own challenges when the targets pass close to each other or merge. The quality of video object tracking depends upon the ability to handle ambiguous conditions, such as the occlusion of one object by another, for example a person passing behind a tree while being tracked. The standard technique for such conditions uses multiple hypotheses for state estimation and tracking: the Kalman filter is used when the noise distributions are Gaussian and the system dynamics are linear. Human movement, however, is nonlinear and non-stationary, which makes the Kalman filter suboptimal to use; this is where the particle filter comes into existence. The particle filter algorithm is very simple and general: it is a powerful tool for tracking nonlinear systems and is able to cope with missing data. The algorithm tracks objects by ranking hypotheses according to their likelihood under multiple cues. Particle filtering is a promising technique because it allows fusion of different sensor data, incorporation of constraints, and accounting for different uncertainties. The algorithm factors the likelihood as the product of the likelihoods of the different cues. We show the benefit of using multiple cues compared to color-based tracking only and texture-based tracking only.

2. Monte Carlo Framework for Object Tracking in Video Sequences

This mechanism is used to track the state of an object or region of interest in a sequence of images captured by a camera. Many different techniques are used for solving tracking problems; here we focus mainly on Monte Carlo techniques (particle filters) because of their power and versatility [2-5]. The Monte Carlo method is based on computation of the state posterior density function by samples, and is known under different names: particle filters (PFs) [3], bootstrap methods [2] or the condensation algorithm [6, 7], which was the first variant applied to video processing. The abbreviation CONDENSATION stems from Conditional Density propagation. The objective of the sequential Monte Carlo method is to evaluate the posterior probability density function (PDF) $p(x_k \mid Z_k)$ of the state vector $x_k \in \mathbb{R}^{n_x}$, with dimension $n_x$, given a set $Z_k = \{z_1, \ldots, z_k\}$ of sensor measurements up to time $k$. The Monte Carlo approach relies on a sample-based representation of the state PDF. Multiple particles (samples) of the state are generated, each one associated with a weight which tells the quality of that specific particle. An estimate of the variable of interest is obtained by the weighted sum of particles.
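Written out, the weighted-sum estimate reads as follows (the particle and weight notation is our assumption, chosen to match standard particle filter presentations):

$$\hat{x}_k = \sum_{i=1}^{N} w_k^{(i)}\, x_k^{(i)}, \qquad \sum_{i=1}^{N} w_k^{(i)} = 1,$$

where $x_k^{(i)}$ is the $i$-th particle at time $k$, $w_k^{(i)}$ its normalized weight, and $N$ the number of particles.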


The two major steps are:
1) Prediction
2) Update

Prediction: Each particle is modified according to the state model of the region of interest in the video frame, including the addition of random noise in order to simulate the noise in the state.

Update: Each particle's weight is re-evaluated using the new data. Resampling methods then remove particles with small weights and replicate the particles with larger weights. (A minimal sketch of this loop follows.)

Many of the proposed particle filters for tracking in video sequences rely on a single feature, e.g. color. However, single-feature tracking does not always provide reliable performance when there is clutter in the background. Multiple-feature tracking provides a better description of the object and improves robustness. In [1] a particle filter is developed which fuses three types of raw data: color, motion, and sound. Developing a visual tracking algorithm which is robust to a wide variety of conditions is still an open problem, and part of this problem is the choice of what to track: color trackers, for instance, are distracted by other objects having the same or similar color as the target.
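A minimal, self-contained sketch of the predict/update/resample loop just described, for a one-dimensional state with Gaussian noise. The random-walk model, the noise levels, and the Gaussian measurement likelihood are illustrative assumptions, not the tracker evaluated in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def pf_step(particles, weights, measurement, proc_std=1.0, meas_std=2.0):
    # Prediction: propagate each particle through a random-walk state model.
    particles = particles + rng.normal(0.0, proc_std, size=particles.shape)
    # Update: re-weight each particle by its likelihood under the measurement.
    weights = weights * np.exp(-0.5 * ((measurement - particles) / meas_std) ** 2)
    weights = weights / weights.sum()
    # Estimate: weighted sum of particles.
    estimate = np.sum(weights * particles)
    # Resampling: drop light particles, replicate heavy ones.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles)), estimate

# Track a target drifting along a line from noisy position measurements.
particles = rng.uniform(-10.0, 10.0, size=500)
weights = np.full(500, 1.0 / 500)
for true_pos in [0.0, 0.5, 1.1, 1.8, 2.4]:
    z = true_pos + rng.normal(0.0, 2.0)
    particles, weights, est = pf_step(particles, weights, z)
    print(f"measurement={z:+.2f}  estimate={est:+.2f}")
```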

3. Typical Motion and Observation Models

3.1 Motion Models

The models used to accomplish a given tracking task depend on the purpose, and in particular on:
(i) objects possessing certain characteristics: cars, people, faces;
(ii) objects possessing certain characteristics with a specific attribute, such as talking faces, walking people, moving cars, or the face of a given person;
(iii) objects of a priori unknown nature but of specific interest, such as moving objects.
In each case part of the input video frame is searched against a reference model describing the appearance of the object. The reference can be based on image patches, which describe the appearance of the tracked region at the pixel level, on contours, and/or on global descriptors such as color models. To track a target, first a feature space is chosen. The reference object (target) model is represented by its PDF in that feature space; for example, the reference model can be the color PDF of the target [1]. In the subsequent frame, a target candidate is defined at some location and is characterized by its PDF. Both PDFs are estimated from the data and compared by a similarity function. The local maxima of the similarity function indicate the presence of objects in the second image frame having a representation similar to the reference model defined in the first frame. Examples of similarity functions are the Bhattacharyya distance and the Kullback-Leibler distance.


In the light of tracking a specified object or region of interest in image sequences, different object models have been proposed in the literature. Many of them make only weak assumptions about the precise object configuration and are not particularly restrictive about the types of objects. The object (motion) models used in the literature vary from general random walk models [10, 1] to constant acceleration models [9] and other specific models. In order to design algorithms applicable to a fairly large group of objects, including people, faces and vehicles, in [1] a weak model for the state evolution is adopted: mutually independent Gaussian random walk models. These models are augmented with small random uniform components to capture (rare) events such as jumps in the image sequence; this also helps in recovering the track after a period of complete occlusion. Mixed-state motion models as in [20] can be used to overcome partial and full occlusions.

3.2 Observation Models

The observation models for object tracking in video sequences are usually highly nonlinear and can be of two types:
1) parametric (e.g. mixtures of Gaussians), or
2) nonparametric (e.g. histograms).
Some of the most often used observation models are based on color, shape and/or motion cues. The localisation cues impact a PF-based tracker in different ways. Usually, likelihood models of each cue are constructed [10, 1]. These cues are assumed mutually independent, having in mind that any correlation that may exist between, e.g., the color, motion and sound of an object is likely to be weak. Adaptation of the cues is essential in distinguishing different objects, making tracking robust to appearance variations due to changing illumination and pose.

3.2.1 Shape Information

When a specific class of objects is considered, a complete model of its shape can be learned offline, and contour cues can be applied to capture the visual appearance of tracked entities. Color/spline-based particle filters are developed in [6, 7]; in [7] color information is used in particle filtering for initialisation and importance sampling. These models can be contaminated by edge clutter, and they are not adaptable to scenarios without a predefined class of objects to be tracked, or where the class of objects does not exhibit very distinctive silhouettes. When shape modeling is not appropriate, color cues are a powerful alternative.


3.2.2 Color Modelling

Color represents an efficient cue for object tracking and recognition which is easy to implement and requires only modest hardware. Most color cameras provide an RGB (red, green, blue) signal. The HSI (hue, saturation, intensity) representation [11] can also be used [12]. Hue refers to the perceived color (technically, the dominant wavelength), e.g. purple or orange. Saturation measures its dilution by white light, giving rise to light purple, dark purple, etc., i.e. it corresponds to the vividness or purity of the color. HSI decouples the intensity information from the color, while hue and saturation correspond to human perception. Color-based trackers have proven to be robust and versatile at a modest computational cost [13, 1, 14]. Color localisation cues are obtained by associating a reference color model with the object or region of interest. This reference model can be obtained by hand-labeling, or from some automatic detection module. To assess whether a given candidate region contains the object of interest, a color model of the same form as the reference model is computed within the region and compared to the reference model. The smaller the discrepancy between the candidate and the reference models, the higher the likelihood that the object is located inside the candidate region. Histogram-based color models are used in [1]: the likelihood is computed from the histogram distance between the empirical color distribution in the hypothesized region and the reference color model. For color modeling, [1] uses independent normalized histograms in the three channels of the RGB color space; the color likelihood model is then defined so as to favor candidate color histograms close to the reference histogram. A distance metric appropriate for deciding the closeness of two histograms $h_1$ and $h_2$ is based on the Bhattacharyya similarity coefficient:

$$D(h_1, h_2) = \left(1 - \sum_{i=1}^{B} \sqrt{h_{i,1}\, h_{i,2}}\right)^{1/2}$$

where $B$ is the number of bins. This metric lies within the interval $[0, 1]$. Based on this distance, the color likelihood model can be defined as in [1]:

$$p(z \mid x) \propto \prod_{c \in \{R,G,B\}} \exp\left(-D^2\left(h^{x}_{c}, h^{\mathrm{ref}}_{c}\right) / 2\sigma_c^2\right)$$

Two PFs are developed in [1]: a PF based on color and sound, and a PF based on color and motion. In the PF with color and sound, the search is performed first in a one-dimensional space (the x direction), followed by a search in a two-dimensional space. This increases the PF's efficiency, allowing the same accuracy to be achieved with a smaller number of particles. The same strategy is applied when fusing color and motion. Color cues are persistent and robust to changes in pose and illumination, but are more prone to ambiguity, especially if the scene contains other objects characterized by a color distribution similar to that of the object of interest. Motion and sound cues are very discriminative and allow the object to be located with low ambiguity. (A small histogram-likelihood sketch follows.)
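A small sketch of the color likelihood above: Bhattacharyya distance between normalized histograms, multiplied across the R, G, B channels. The histogram inputs and the sigma value are illustrative.

```python
import numpy as np

def bhattacharyya_distance(h1, h2):
    """D(h1, h2) for normalized histograms (each sums to 1); lies in [0, 1]."""
    bc = np.sum(np.sqrt(h1 * h2))        # Bhattacharyya coefficient
    return np.sqrt(max(0.0, 1.0 - bc))   # clamp guards against rounding error

def color_likelihood(cand_hists, ref_hists, sigma=0.2):
    """Product over R, G, B channels of exp(-D^2 / (2 sigma^2))."""
    d2 = sum(bhattacharyya_distance(hc, hr) ** 2
             for hc, hr in zip(cand_hists, ref_hists))
    return float(np.exp(-d2 / (2.0 * sigma ** 2)))

# Identical histograms give distance 0 and the maximum likelihood of 1.
h = np.array([0.25, 0.25, 0.25, 0.25])
print(color_likelihood([h, h, h], [h, h, h]))  # 1.0
```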

3.2.3 Motion Cues

The ambiguity of color cues can be considerably reduced if the object is moving. The motivation for multiple cues comes from the inability of a single cue to fully describe the object, and therefore its inability to achieve accurate and robust results. Motion activity captures other important aspects of the sequence content, and has been studied from various perspectives [17]. In the case of a static camera, the absolute value of the luminance frame difference computed on successive image pairs is used to calculate a likelihood model [1], similar to the one developed for the color measurements. Motion cues are usually based on histogramming consecutive frame differences.

3.2.4 Texture Cues

Texture is an appealing feature to use as a basis for an observation model because of its intuitive definition. Qualitatively, texture can be described by terms such as coarse, grained and smooth. Although there is no unique definition of texture, it is generally agreed that texture describes the spatial arrangement of pixel levels in an image, which may be stochastic or periodic, or both. When a texture is viewed from a distance it may appear to be fine; however, when viewed from close up it may appear to be coarse. Texture properties are analyzed by different methods, such as statistical methods, spatial frequency methods, structural methods and fractal-based methods. The method for feature extraction we have chosen is the frequency method of wavelets, which implements a discrete wavelet transform.

3.2.5 Edge Cues

Edges are pixels where the intensity changes abruptly. An edge in an image is usually taken to mean the boundary between two regions having relatively distinct grey levels. The ideal situation is when the two regions have distinct constant grey levels and the edge is characterised by an abrupt change. However, in most practical situations edges are characterised by a smooth transition in grey level, with the two regions having slowly varying but distinct average grey levels. Edges may be:


i) Viewpoint dependent: they may change as the viewpoint changes and typically reflect the geometry of the scene, such as objects occluding one another.
ii) Viewpoint independent: they reflect properties of the viewed objects, e.g. markings and surface shape.
An image function depends on two co-ordinates in the image plane, and so operators describing edges are expressed using partial derivatives. A change of the image function can be described by a gradient that points in the direction of the largest growth of the image function. An edge [11] is a property attached to an individual pixel and is calculated from the image function behavior in a neighborhood of that pixel. An edge is a vector variable with two components: magnitude and direction. (A minimal gradient sketch follows.)
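A minimal numpy sketch of the per-pixel gradient magnitude and direction just described; the finite-difference gradient is our illustrative choice of edge operator.

```python
import numpy as np

def edge_map(image):
    """Return (magnitude, direction) of the intensity gradient."""
    gy, gx = np.gradient(image.astype(float))  # partial derivatives
    magnitude = np.hypot(gx, gy)               # edge strength
    direction = np.arctan2(gy, gx)             # edge orientation (radians)
    return magnitude, direction

# A vertical step edge: strong magnitude around the boundary columns.
img = np.zeros((5, 5))
img[:, 3:] = 255.0
mag, ang = edge_map(img)
print(mag[2])  # nonzero where the intensity jumps
```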

3.2.6 Multiple Cues

The greatest weakness of the color cue is its ambiguity, due to the presence of objects or regions with color features similar to those of the object of interest. By fusing color, motion, texture and other cues this ambiguity can be considerably reduced if the object of interest is moving, as shown in [1, 19, 20]. When the object is moving, strong localisation cues are provided by the motion measurements, whereas the color measurements can undergo substantial fluctuations due to changes in the object pose and illumination. Conversely, when the object is stationary or near stationary, the motion information disappears and the color information dominates, providing a reliable localisation cue.

4. Particle Filtering Using Multiple Cues

A PF algorithm for object tracking in video sequences using multiple cues, color and texture, was developed in [19, 20]. Due to space limitations other results could not be included here; however, those results show the algorithm's performance under different scenarios. The PF is able to: 1) track a single moving object; 2) retrieve the object after tracking loss. The latter property is obtained by a mixed-state motion model [19] composed of a constant velocity model and a re-initialisation model drawing uniform samples (needed to recover the object after it is lost).

5. Conclusions and Open Issues for Future Research

Particle filtering is a technique that is very well suited to object tracking in video sequences, including the tracking of multiple objects.


We have results for a single object using video sequences from a fixed or moving single camera; the tracking algorithm is based on color and texture cues. There are several challenges in solving tracking problems in image/video applications. The first of them is the nonlinear character of the object of interest and of the observation model. The algorithms must often run at high update rates. In many applications the prior information available about the environment is limited. From the point of view of implementation, this domain is rich and challenging because of the need to overcome occlusions of the tracked entities over one or more frames and to deal with missing sensor data. How to handle clutter in the background is of considerable importance as well, especially with multiple targets. In the case of multiple sensors, the data has to be fused appropriately, and probabilistic data association techniques are then of primary importance. We are aiming to consider: i) detection of the object, i.e. the object has to be localised first in the image and continuously tracked afterwards; one of the biggest problems in motion-based tracking is losing the object due to rapid movements, and the question is how to re-detect the object of interest and follow its movement afterwards; ii) tracking rigid and non-rigid bodies, in three dimensions, with multiple dynamically selected static or moving cameras.
6. References

[1] P. Pérez, J. Vermaak, A. Blake, Data Fusion for Tracking with Particles, Proc. IEEE, 92:3, 2004, 495-513.
[2] P. Brasnett, L. Mihaylova, N. Canagarajah, D. Bull, Particle Filtering with Multiple Cues for Object Tracking in Video Sequences, Proc. of SPIE's Annual Symp. EI ST, 5685, 2005.
[3] A. Doucet, N. de Freitas, N. Gordon, Eds., Sequential Monte Carlo Methods in Practice, New York: Springer-Verlag, 2001.
[4] M. Arulampalam, S. Maskell, N. Gordon, T. Clapp, A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking, IEEE Trans. Sign. Proc., 50:2, 2002, 174-188.
[5] J. Liu, Monte Carlo Strategies in Scientific Computing, Springer-Verlag, 2001.
[6] M. Isard, A. Blake, Contour Tracking by Stochastic Propagation of Conditional Density, European Conf. on Comp. Vis., Cambridge, UK, 1996, 343-356.
[7] M. Isard, A. Blake, Condensation: Conditional Density Propagation for Visual Tracking, Intl. Journal of Computer Vision, 28:1, 1998, 5-28.
[8] C. Shen, A. van den Hengel, A. Dick, Probabilistic Multiple Cue Integration for Particle Filter Based Tracking, Proc. of the VIIth Digital Image Comp.: Techniques and Appl., 2003.
[9] Y. Bar-Shalom, X. R. Li, Estimation and Tracking: Principles, Techniques and Software, Artech House, 1993.
[10] H. Nait-Charif, S. McKenna, Tracking Poorly Modelled Motion Using Particle Filters with Iterated Likelihood Weighting, Proc. of Asian Conf. on Comp. Vis., 2003.
[11] M. Sonka, V. Hlavac, R. Boyle, Image Processing, Analysis, and Machine Vision, 2nd Edition, Brooks/Cole Publ. Company, 1999.
[12] S. McKenna, S. Jabri, S. Gong, Tracking Color Objects Using Adaptive Mixture Models, Image and Vision Computing, 17:3-4, 1999, 225-231.
[13] K. Nummiaro, E. Koller-Meier, L. Van Gool, An Adaptive Color-Based Particle Filter, Image and Vision Computing, 21, 2003, 99-110.
[14] D. Comaniciu, V. Ramesh, P. Meer, Real-Time Tracking of Non-Rigid Objects Using Mean Shift, Proc. of Conf. on Comp. Vision and Pattern Recogn., 2000, 142-149.
[15] F. Aherne, N. Thacker, P. Rockett, The Bhattacharyya Metric as an Absolute Similarity Measure for Frequency Coded Data, Kybernetika, 3:4, 1997, 1-7.
[16] T. Kailath, The Divergence and Bhattacharyya Distance Measures in Signal Selection, IEEE Trans. on Communication Technology, COM-15:1, 1967, 52-60.
[17] J. Konrad, in Handbook of Image and Video Processing, Academic Press, 2000, 207-225.
[18] R. Porter, Texture Classification and Segmentation, PhD thesis, Univ. of Bristol, 1997.
[19] N. Gordon, D. Salmond, A. Smith, A Novel Approach to Nonlinear/Non-Gaussian Bayesian State Estimation, IEE Proc. on Radar and Signal Processing, 140, 1993, 107-113.

Ear Biometric: A New Approach for Human Identification


Suman Madan
Asst. Prof. (IT), Jagan Institute of Management Studies, Sec-5, Rohini, Delhi
Email ID: [email protected]

Kanika Behl
Asst. Prof. (IT), Jagan Institute of Management Studies, Sec-5, Rohini, Delhi
Email ID: [email protected]

ABSTRACT: Biometrics is the science and technology of measuring and analyzing biological data. In information technology, biometrics refers to technologies that identify an individual based on his or her behavioral or physiological characteristics. Behavioral characteristics, which are action related, like speech, signature and keystroke, and physiological characteristics, like the retina, iris, veins, hand and fingers, are used for authentication purposes in biometric security. The most interesting human anatomical parts for passive, physiological biometric systems are the face and the ear. Ears have played a significant role in forensic science, especially in the United States, where an ear classification system based on manual measurements has been developed and has been in use for more than 40 years. This paper examines how biometrics is emerging as a preferred security method and is no longer confined to science fiction. It examines the potential of the human ear for personal identification and highlights the advantages of using the ear as an aid to identification. Different methods of recognizing humans using ear biometrics are reviewed, which will be highly useful in forensic science for detecting criminals.

Proceedings of National Conference On Web and Knowledge Based Systems (WKBS)

80

ear for personal identification, thereby highlighting its major advantages over other biometric mechanisms. The study is based on intensive research through secondary sources such as journals, magazines and websites. The data are drawn from industry associations and renowned research organizations that have continuously been monitoring trends in the global biometric industry.

Keywords: Biometrics, Personal Identification, Ear Biometrics.

1. INTRODUCTION

A biometric ("bio" stands for life and "metric" means to measure) is a physiological or behavioral characteristic of a human being that can distinguish one person from another and that theoretically can be used for identification or verification of identity [1]. The field of biometrics relies on the fact that humans have distinctive and unique traits which can be used to distinguish them from one another, acting as a form of identification. Physiological biometrics is based on data derived from direct measurement of a part of the human body: fingerprint, iris scan, DNA fingerprinting, retina scan, hand geometry and facial recognition are the leading physiological biometrics. Behavioral biometrics, in turn, is based on data derived from an action taken by a person: voice recognition, keystroke scan and signature scan are the leading behavioral traits. Any human physiological or behavioral characteristic can serve as a biometric characteristic as long as it is universal, unique and permanent [2]. Most conventional identification methods have many disadvantages, and biometric methods are preferred over traditional methods involving passwords and PINs for two main reasons: (i) the person to be identified is required to be physically present at the point of identification; (ii) identification based on biometric techniques obviates the need to remember a password or carry a token. Biometric recognition can be used in identification mode, where the biometric system identifies a person from the entire enrolled population by searching a database for a match. A newer identification method, ear biometrics, has been recognized and advocated by researchers because the ear is relatively immune to variation due to aging.

2. HOW BIOMETRICS WORKS

Biometric systems work by recording and comparing biometric characteristics (fingerprints, face geometry, iris patterns, hand geometry, digital signature). A biometric system is essentially a pattern recognition system that operates by acquiring biometric data from an individual, extracting a feature set from the acquired data, and comparing this feature set against the template set in the database. Thus, biometric systems collect a sample of a physiological or behavioral characteristic and then, using an algorithm, translate the sample into a unique template. Depending on how the system is implemented, the sample may be stored centrally in a database in order to recreate the template at a later time; on an ongoing basis, however, only the template is used to interact with the system. Individuals are initially enrolled into a biometric system and can subsequently be matched against previously collected biometric templates.
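As a purely illustrative sketch of this enroll-then-match cycle (all feature values and the threshold below are invented, not from any real system), a MATLAB fragment might average three enrollment samples into a template and accept a live sample when its distance to the template is small:

% Hedged illustration of enrollment and matching with invented feature vectors.
samples = [0.61 0.59 0.63; 0.34 0.36 0.33; 0.88 0.90 0.87];  % 3 enrollment samples (columns)
template = mean(samples, 2);       % enrollment: average the three samples
live = [0.60; 0.35; 0.89];         % feature vector from a live presentation
d = norm(live - template);         % distance between live sample and template
threshold = 0.05;                  % invented decision threshold
if d < threshold
    disp('match: access granted');
else
    disp('no match: access denied');
end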

Figure 1: New biometric-based user authentication process

Thus, all biometric systems consist of three basic elements:
(i) Enrollment: the process of collecting biometric samples from an individual, known as the enrollee, and the subsequent generation of his template.
(ii) Templates: the data representing the enrollee's biometric.
(iii) Matching: the process of comparing a live biometric sample against one or many templates in the system's database.

2.1 Enrollment

Enrollment is the crucial first stage for biometric authentication because it generates the template that will be used for all subsequent matching. Typically, the device takes three samples of the same biometric and averages them to produce an enrollment template. Enrollment is complicated by the fact that the performance of many biometric systems depends on the user's familiarity with the biometric device, and enrollment is usually the first time the user is exposed to it. Environmental conditions also affect enrollment, so enrollment should take place under conditions similar to those expected during the routine matching process. For example, if voice verification is used in an environment with background noise, the system's ability to match voices to enrolled templates depends on capturing these templates in the same environment. In
addition to user and environmental issues, biometrics themselves change over time. Many biometric systems account for these changes by continuous averaging: templates are averaged and updated each time the user attempts authentication.

2.2 Templates

The biometric device creates templates as the data representing the enrollee's biometric. The device uses a proprietary algorithm to extract features appropriate to that biometric from the enrollee's samples. Templates are only a record of distinguishing features, sometimes called minutiae points, of a person's biometric characteristic or trait; they are not an image or record of the actual fingerprint or voice. In basic terms, templates are numerical representations of key points taken from a person's body. A template is usually small in terms of computer memory use, which allows for quick processing, a hallmark of biometric authentication. The template must be stored somewhere so that subsequent templates, created when a user tries to access the system via a sensor, can be compared against it. Some biometric experts claim it is impossible to reverse-engineer, or recreate, a person's print or image from the biometric template.

2.3 Matching

Matching is the comparison of two templates: the template produced at the time of enrollment (or at previous sessions, if there is continuous updating) and the one produced on the spot as a user tries to gain access by providing a biometric via a sensor. There are three ways a match can fail:

Failure to enroll - Failure to enroll (or acquire) is the failure of the technology to extract distinguishing features appropriate to that technology. For example, a small percentage of the population fails to enroll in fingerprint-based biometric authentication systems. Two reasons account for this failure: the individual's fingerprints are not distinctive enough to be picked up by the system, or the distinguishing characteristics of the individual's fingerprints have been altered because of the individual's age or occupation, e.g., an elderly bricklayer.

False match (FM) - A false match occurs when a sample is incorrectly matched to a template in the database (i.e., an imposter is accepted).

False non-match (FNM) - A false non-match occurs when a sample is incorrectly not matched to a truly matching template in the database (i.e., a legitimate match is denied). FM and FNM are frequently, and somewhat misleadingly, called false acceptance and false rejection respectively, but those terms are application-dependent in meaning; FM and FNM are application-neutral terms for describing the matching process between a live sample
and a biometric template. Rates for FM and FNM are calculated and used to make tradeoffs between security and convenience, as the sketch below illustrates. The primary function of a biometric device is to verify or identify registered people on a system. Access control needs the ability to authenticate a person and grant (or deny) access, possibly based on time restrictions; door alarms can also be monitored. There are several ways through which biometrics can accomplish these tasks.
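The security/convenience tradeoff can be made concrete by sweeping a decision threshold over comparison scores. The following MATLAB sketch is our illustration only; the score values are invented, not taken from any system described here:

% Hedged illustration: FM/FNM rates versus decision threshold.
% genuine  : similarity scores from truly matching pairs   (invented data)
% imposter : similarity scores from non-matching pairs     (invented data)
genuine  = [0.82 0.91 0.77 0.88 0.95 0.73 0.86];
imposter = [0.35 0.52 0.41 0.60 0.28 0.47 0.55];
for t = 0.3:0.1:0.9
    fmr  = mean(imposter >= t);   % imposters accepted (false match)
    fnmr = mean(genuine  <  t);   % legitimate users denied (false non-match)
    fprintf('threshold %.1f: FMR %.2f, FNMR %.2f\n', t, fmr, fnmr);
end

Raising the threshold lowers the false-match rate at the cost of more false non-matches, and vice versa.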
Figure 2: Primary identification functions using biometrics. Verification authenticates the identity of an individual by comparing a presented characteristic to a pre-enrolled characteristic ("Are you who you say you are?"); identification determines the identity of an individual by comparing a presented characteristic to all characteristics pre-enrolled in the database ("Who are you?").

3. BIOMETRIC TECHNOLOGIES

Biometric technologies are becoming the foundation of an extensive array of highly secure identification and personal verification solutions. As the level of security breaches and transaction fraud increases, the need for highly secure identification and personal verification technologies becomes apparent. Biometric authentication technologies are grouped into two categories:
(i) Contact biometric technologies: fingerprint, hand/finger geometry, dynamic signature verification, and keystroke dynamics.
(ii) Contactless biometric technologies: facial recognition, voice recognition, iris scan, retinal scan, and ear scan.

3.1 Fingerprint Recognition

Fingerprint recognition measures the unique pattern of lines on a person's finger: the acquisition and recognition of a person's fingerprint characteristics for identification purposes. A fingerprint is the pattern of ridges and valleys on the surface of a fingertip. Fingerprints of identical twins are different, and so are the prints on each finger of the same person. The accuracy of currently available fingerprint recognition systems is adequate for verification systems and for small- to medium-scale identification systems involving a few hundred users. Multiple fingerprints of a person provide additional
information to allow for large-scale recognition involving millions of identities. Fingerprints do not change over time, but scars, surgery and dry skin can alter or obscure the prints.

Figure 3: Fingerprint

3.2 Facial Recognition

Face recognition analyzes facial characteristics. It requires a digital camera to capture a facial image of the user for authentication. The image is transformed using a technique called elastic graph matching, and algebraic algorithms are used to find the best match. The most popular approaches to face recognition are based on either (i) the location and shape of facial attributes, such as the eyes, eyebrows, nose, lips and chin, and their spatial relationships, or (ii) an overall (global) analysis of the face image that represents a face as a weighted combination of a number of canonical faces. This technique has attracted considerable interest.

Figure 4: Face recognition points

3.3 Iris Recognition

Iris recognition measures the iris pattern in the colored part of the eye. The iris is a protected internal organ whose random texture is stable throughout life, thus providing a stable and reliable biometric source. It is the annular region of the eye bounded by the pupil and the sclera (the white of the eye) on either side. Each iris is distinctive and, like fingerprints, even the irises of identical twins are different.

Figure 5: Iris recognition

3.4 Voice Recognition

Voice is a combination of physiological and behavioral biometrics. The features of an individual's voice are based on the shape and size of the appendages (vocal tract, mouth, nasal cavities and lips) used in the synthesis of sound. These physiological characteristics of human speech are invariant for an individual, but the behavioral part of a person's speech changes over time due to age, medical conditions (such as the common cold), emotional state, etc. Voice recognition measures the vocal characteristics of a person using a specific phrase, identifying people by the differences in voice that result from physiological differences and speaking habits. A disadvantage of voice-based recognition is that speech features are sensitive to a number of factors such as background noise.

3.5 Ear Recognition

The human ear contains a large number of specific and unique features that allow for human identification. Ear recognition approaches are based on matching the distances of salient points on the pinna from a landmark location on the ear. In the force field approach, the entire ear image is transformed into a force field. Human ear prints have been used as evidence in forensic science for many years in the Netherlands and the United States.

Figure 6 : Ear Measurement points

4. EAR BIOMETRICS: A NEW HUMAN IDENTIFICATION METHOD

The ear does not have a completely random structure; it is made up of standard features, just like the face. These include the outer rim (helix), the ridge (antihelix) that runs inside and parallel to the helix, the lobe, and the distinctive u-shaped notch, known as the intertragic notch, between the ear hole (meatus) and the lobe.

Figure 7: Locations of the anatomical features of the ear

4.1 Ear Biometrics May Beat Other Biometric Methods

There are many advantages to using the ear as a source of data for human identification. Firstly, the ear has a very rich structure of characteristic parts. The location of these characteristic elements, their direction, angles, size and relations within the ear are distinctive and unique to each person, and may therefore be used as a modality for human identification. The ear is also one of the most stable human anatomical features: it does not change considerably during human life, while the face changes more significantly with age than any other part of the human body and can also change due to cosmetics, facial hair and hair styling.

Secondly, the human face changes with emotion and expresses different states of mind, whereas ear features are fixed and unchanged by emotion. The ear therefore does not suffer from changes in facial expression. It is also firmly fixed in the middle of the side of the head, so the immediate background is predictable, whereas face recognition usually requires the face to be captured against a controlled background.

Thirdly, in the acquisition process, in contrast to face identification systems, ear images are not disturbed by glasses, beards or make-up. Furthermore, in ear biometric systems there is no need to touch any device, so there are no hygiene problems. Ear images are also more secure than face images, mainly because it is very difficult to associate an ear image with a given person (in fact, most users are not able to recognize their own ear image); ear image databases therefore do not have to be secured as strongly as face databases, since the risk of attack is much lower.

Lastly, the color distribution is more uniform in the ear than in the human face, iris or retina.

4.2 Ear Biometric Methods

Many methods have been proposed for identifying humans on the basis of the ear. Some of these methods are reviewed below.


4.2.1 Neighbourhood Graphs Based on Voronoi Diagrams

Burge [3] introduced an approach to human identification based on ear biometrics in which each subject's ear is modeled as an adjacency graph built from the Voronoi diagram of its curve segments. He introduced a novel graph-matching-based algorithm for authentication which takes into account the erroneous curve segments that can occur due to changes (e.g. lighting, shadowing and occlusion) in the ear image. A loose sketch of such a neighbourhood graph follows the figure below.

Figure 8: A Sample ear image with the constructed Voronoi diagram
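As a loose illustration only (this is not Burge's algorithm): the Delaunay triangulation is the dual of the Voronoi diagram, so a neighbourhood graph over points standing in for curve-segment descriptors can be sketched in MATLAB as follows, with random points used in place of real ear-contour data:

% Hedged sketch: neighbourhood graph via Delaunay triangulation
% (the dual of the Voronoi diagram); points are random stand-ins.
x = rand(30, 1); y = rand(30, 1);     % stand-ins for curve-segment centroids
tri = delaunay(x, y);                 % triangles of the Delaunay triangulation
A = zeros(30);                        % adjacency matrix of the neighbourhood graph
for t = 1:size(tri, 1)
    e = nchoosek(tri(t, :), 2);       % the three edges of triangle t
    for k = 1:3
        A(e(k, 1), e(k, 2)) = 1;      % mark both directions of the edge
        A(e(k, 2), e(k, 1)) = 1;
    end
end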

4.2.2 Ear Biometrics Based on Compression Networks

Moreno [4] proposed a new multiple-classifier identification method that combines the results obtained by several neural classifiers operating on outer-ear feature points. The information obtained from ear shape, wrinkles and macro-features is extracted by a compression network.

4.2.3 Ear Biometrics Based on Force Field Transformation

Hurley [5] proposed a new way of defining the feature space that reduces the dimensionality of the pattern space while maintaining discriminatory power for classification and an invariant description. To meet this objective, a novel force field transformation was developed for ear biometrics, in which the image is treated as an array of Gaussian attractors that act as the source of a force field. The force field transformation is applied in order to find energy lines, wells and channels, which serve as ear features. A toy computation of such a field is sketched after Figure 9.

Figure 9 : Force field transformations [5]
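As a toy computation in the spirit of [5] (our illustration, with a stand-in image and a single probe position): each pixel of intensity P at position r_i attracts a probe at position r with a force proportional to P (r_i - r) / |r_i - r|^3.

% Hedged sketch: force at one probe point of a force-field style transform.
I = peaks(64);                                 % stand-in "image"
I = (I - min(I(:))) / (max(I(:)) - min(I(:))); % normalize intensities to [0,1]
[X, Y] = meshgrid(1:64, 1:64);
r = [32, 32];                                  % probe position
dx = X - r(1); dy = Y - r(2);
d3 = (dx.^2 + dy.^2).^1.5;                     % |r_i - r|^3 for every pixel
d3(d3 == 0) = Inf;                             % the probe pixel exerts no self-force
Fx = sum(I(:) .* dx(:) ./ d3(:));              % x-component of the force at r
Fy = sum(I(:) .* dy(:) ./ d3(:));              % y-component of the force at r
fprintf('force at (32,32): (%.4f, %.4f)\n', Fx, Fy);

Repeating this at every pixel and following the resulting force directions traces the energy lines, wells and channels used as features.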


4.2.4 Ear Biometrics Based on PCA and Eigen-Ears

Chang [6,7] found that multi-modal recognition using both ear and face yields a statistically significant improvement over either biometric alone. Principal component analysis (PCA) is by far the most widely adopted method in ear biometrics. PCA is a technique for reducing the dimension of feature vectors while preserving the variation in the dataset. A low-dimensional space called the eigenspace, defined by a set of eigenvectors of the dataset, is used for classification. Victor applied the PCA approach to images of faces and ears using the same set of subjects; his results indicate that the face provides a more reliable biometric than the ear. A minimal sketch of the eigenspace construction follows.
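The following MATLAB sketch is our illustration of the eigenspace construction underlying the eigen-ear approach, with random stand-in data in place of real ear images; it is not the authors' code:

% Hedged PCA / eigen-ear sketch with random stand-in data.
X = rand(100, 20);                 % 20 training "ear images", each 100 pixels (invented)
mu = mean(X, 2);                   % mean ear
Xc = X - mu;                       % center the data
[U, S, ~] = svd(Xc, 'econ');       % columns of U are the eigen-ears
k = 5;                             % keep the k leading components
W = U(:, 1:k)' * Xc;               % project training images into the eigenspace
probe = rand(100, 1);              % an unknown ear image (invented)
wp = U(:, 1:k)' * (probe - mu);    % its eigenspace coordinates
[~, id] = min(sum((W - wp).^2));   % nearest neighbour gives the identity
fprintf('probe matched to training image %d\n', id);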
4.2.5 Ear Biometrics Based on Geometrical Feature Extraction

Michal Choras [8,9] proposed an invariant geometrical method to extract the features needed for classification. His approach is divided into image normalization, contour extraction (edge detection), calculation of the centroid, coordinate normalization and two steps of geometrical feature extraction. He concluded that geometrical features representing the shapes of ear contours are more suitable for ear images than texture, color or global features.

Figure 10: Process to extract features needed for classification [8] (contour detection and binarization, centroid calculation, coordinate normalization, radius calculation, two-step feature extraction, classification)

5. CONCLUSION

Reliable personal recognition is critical to many business processes. Biometrics refers to the automatic recognition of an individual based on his or her behavioral and/or physiological characteristics. Biometric-based systems also have some limitations that may have adverse implications for the security of a system. While some of the limitations of biometrics can be overcome with the evolution of biometric technology and careful system design, it is important to understand that foolproof personal recognition systems simply do not exist and perhaps never will; multimodal biometric security can, however, come close to foolproof, as discussed above.


From the theoretical evidence discussed above, ear biometrics is a viable and promising new passive approach to automated human identification, especially useful for supplementing existing automated methods. Though ear biometrics appears promising, additional research needs to be conducted to answer important questions such as occlusion by hair (the case where the ear is completely hidden by hair). In conclusion, one may add that although the ear is still an infant in the ever-enlarging field of biometrics, it is already proving its grit and is on the verge of emerging as a major passive biometric tool.

6. REFERENCES

[1] D. Zhang, Biometrics - A Unique Authentication Approach.
[2] A. Basit, M. Y. Javed, M. A. Anjum, "Efficient Iris Recognition Method for Human Identification", ENFORMATIKA, vol. 1, pp. 24-26, 2005.
[3] M. Burge, W. Burger, "Ear Recognition", in A. K. Jain, R. Bolle, S. Pankanti (eds.), Biometrics: Personal Identification in Networked Society, 273-286, Kluwer Academic Publishing, 1998.
[4] B. Moreno, A. Sanchez, J. F. Velez, "On the Use of Outer Ear Images for Personal Identification in Security Applications", 469-476, IEEE, 1999.
[5] D. J. Hurley, M. S. Nixon, J. N. Carter, "Force Field Energy Functionals for Image Feature Extraction", Image and Vision Computing, vol. 20, no. 5-6, 311-318, 2002.
[6] B. Victor, K. W. Bowyer, S. Sarkar, "An Evaluation of Face and Ear Biometrics", Proceedings of the International Conference on Pattern Recognition, I:429-432, 2002.
[7] K. Chang, B. Victor, K. W. Bowyer, S. Sarkar, "Comparison and Combination of Ear and Face Images for Biometric Recognition", 2003.
[8] M. Choras, "Feature Extraction Based on Contour Processing in Ear Biometrics", IEEE Workshop on Multimedia Communications and Services, MCS'04, 15-19, Cracow, 2004.
[9] M. Choras, "Ear Biometrics Based on Geometrical Method of Feature Extraction", in F. J. Perales, B. A. Draper (eds.), Articulated Motion and Deformable Objects, 51-61, Springer-Verlag LNCS 3179, 2004.
[10] "Behavioral Biometrics: A Survey and Classification", International Journal of Biometrics, vol. 1, no. 1, pp. 81-113, 2008.
[11] A. K. Jain, P. J. Flynn, A. A. Ross (eds.), Handbook of Biometrics, Springer, 2008 (ISBN-13: 978-0-387-71040-2).
[12] A. Adler, R. Youmaran, S. Loyka, "Towards a Measure of Biometric Information", 2006, retrieved August 2, 2006 from http://www.sce.carleton.ca/faculty/adler/publications/2006/youmaran-ccece2006-biometric-entropy.pdf

Emotion based Speaker Identification for Hindi Speech


Sushma Bahuguna (Student, USIT, GGSIPU), Email ID: [email protected]
Anurag Jain (Asst. Professor, USIT, GGSIPU), Email ID: [email protected]

ABSTRACT

Speaker identification systems have potential applications in many security settings. An Emotion Based Speaker Identification System (ESIS) makes it possible to use a speaker's voice to verify identity and control access to various services. In this paper we use MFCC features and vector quantization (VQ) for emotion-based speaker identification studies. The promising result of 79% suggests that the framework is feasible for identifying a speaker irrespective of the emotion in the speech.

Keywords: Hindi Speech, Voice Identification, Speaker Identification.

1. Introduction

An emotion-based speaker identification system involves using reference emotion data, with identification by human and machine recognizers. Such an ESIS has potential applications in many security modules. It makes it possible to use the speaker's voice to verify identity and control access to services such as voice dialing, telephone banking, telephone shopping, database access services, information services, voice mail, security control for confidential information areas, and remote access to computers.


For experimental purposes we collected voice samples of male and female speakers expressing sentences frequently used in everyday communication in five basic emotions, namely Happy (H), Natural (N), Surprise (S), Sad (Sa) and Anger (A), plus an Others (O) category. Emotional speech databases of 20 sample sentences in Hindi are used for the emotion expressions. Voice samples of male and female speakers of different age groups were recorded and stored under the 5 emotions, as shown in Table 1.

Speaker   Age Range (Yrs)   Gender   No. of Emotions   Total Samples   Dir Path
S1        30-35             Female   5                 100             .\S1
S2        35-40             Male     5                 100             .\S2
S3        10-15             Male     5                 100             .\S3
S4        25-30             Female   5                 100             .\S4

Table 1: Specifications of the voice samples

Our task is to train a voice model for each speaker using the corresponding sound files in the respective folder. After this training step, the system has knowledge of the voice characteristics of each (known) speaker. In the testing phase, the system identifies the (assumed unknown) speaker of each sound file in the TEST folder; that is, it identifies which registered speaker, from amongst the set of known speakers, provides a given utterance.

2. Human Speaker Identification

Before doing computer speaker identification with our vector-quantization-based system, we first ran a human identification test to check the identification rate that humans achieve on the same voice samples. A confusion matrix is used to evaluate the performance of a classifier during supervised learning: it is a matrix plot of the predicted versus actual classes of the voice data and summarizes the types of errors that a classification model is likely to make. It is calculated by applying the model to test data in which the target values are already known and comparing these target values with the predicted ones. Tables 2(a) and 2(b) give the results of human identification on the stored sound files: each file was played, and a group of listeners distinguished the speakers' voices and reported their decisions. Table 2(a) shows


the confusion matrix achieved for human speaker-emotion identification, with a recognition rate of 75%. Table 2(b) shows the confusion matrix achieved for human speaker identification irrespective of emotion; listeners were able to identify 97.25% of speakers.

Table 2(a): Confusion matrix for the human emotion identification rate

The following can be computed from this confusion matrix:
- The model made 75 correct predictions (18+14+12+15+16).
- The model made 25 incorrect predictions (2+3+3+2+5+1+5+3+1).
- The model scored 100 cases (75+25).
- The error rate is 25/100 = 25%.
- The accuracy rate is 75/100 = 75%.

Table 2(b): Confusion matrix for the human speaker identification rate

The following can be computed from this confusion matrix:
- The model made 389 correct predictions.
- The model made 11 incorrect predictions.
- The model scored 400 cases (389+11).
- The error rate is 11/400 = 2.75%.
- The accuracy rate is 389/400 = 97.25%.
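These statistics can be computed mechanically from any confusion matrix. A minimal illustrative MATLAB sketch, with a stand-in matrix rather than the actual counts above:

% Hedged sketch: accuracy and error rate from a confusion matrix C,
% where C(i,j) counts actual class i predicted as class j.
C = magic(5);                      % stand-in matrix; replace with real counts
correct  = sum(diag(C));           % diagonal entries are correct predictions
total    = sum(C(:));              % all scored cases
accuracy = correct / total;
errRate  = 1 - accuracy;
fprintf('accuracy %.2f%%, error %.2f%%\n', 100*accuracy, 100*errRate);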

Although voice authentication appears to be an easy authentication method, both in how it is implemented and in how it is used, there are some user influences that must be addressed, such as colds, expression and volume, misspoken or misread prompted phrases, previous user activity, and background noise.

3. Text-Independent Speaker Identification Methods

One of the most successful text-independent recognition methods is based on vector quantization (VQ). In this method, VQ codebooks consisting of a small number of representative feature vectors are used as an efficient means of characterizing speaker-specific features. A speaker-specific codebook is generated by clustering the training feature vectors of each speaker. In the recognition stage, an input utterance is vector-quantized using the codebook of each reference speaker, and the VQ distortion accumulated over the entire input utterance is used to make the recognition decision; a minimal sketch is given below. Temporal variation in speech signal parameters over the long term can be represented by stochastic Markovian transitions between states; therefore, methods using an ergodic HMM, in which all possible transitions between states are allowed, have been proposed. Speech segments are classified into one of the broad phonetic categories corresponding to the HMM states, after which appropriate features are selected. In the training phase, reference templates are generated and verification thresholds are computed for each phonetic category. In the verification phase, after the phonetic categorization, a comparison with the reference template for each particular category provides a verification score for that category; the final verification score is a weighted linear combination of the scores from each category.
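As an illustration of the VQ decision rule only (not the paper's implementation): the MATLAB sketch below trains one codebook per speaker with k-means, standing in for the classical LBG algorithm, and assigns a test utterance to the codebook with the smallest average distortion. It uses random stand-in features, and kmeans and pdist2 require the Statistics Toolbox.

% Hedged VQ speaker-identification sketch with invented data.
train1 = randn(500, 12);                  % MFCC-like vectors for speaker 1
train2 = randn(500, 12) + 0.5;            % MFCC-like vectors for speaker 2
[~, cb1] = kmeans(train1, 16);            % 16-entry codebook for speaker 1
[~, cb2] = kmeans(train2, 16);            % 16-entry codebook for speaker 2
test = randn(200, 12) + 0.5;              % utterance from the unknown speaker
d1 = mean(min(pdist2(test, cb1), [], 2)); % average VQ distortion vs codebook 1
d2 = mean(min(pdist2(test, cb2), [], 2)); % average VQ distortion vs codebook 2
if d1 < d2, id = 1; else id = 2; end      % smallest accumulated distortion wins
fprintf('identified speaker %d\n', id);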
A voice analysis is done after taking input through a microphone from a user. The design of the system involves manipulation of the input audio signal. The voice algorithms consist of two distinct phases: the first is the training phase, while the second is referred to as the operation, or testing, phase, as described in Figure 1.


Figure 1: Phases of the voice recognition algorithm. In the training phase, each speaker provides samples of their voice so that a reference template model can be built; in the testing phase, the input test voice is matched against the stored reference template models and a recognition decision is made.

4. Speech Features

Since each speaker has unique physiological characteristics of speech production and a unique speaking style, speaker-specific characteristics are also reflected in prosody, and it is generally recognized that human listeners exploit this to recognize speakers well. In most ASR-free approaches, pitch contour dynamics are represented using parameters derived from linear stylized pitch segments, which has the advantage that the features are derived directly from the speech signal. We compute a multitude of features, such as the pitch contour, formant contours, energy contour, spectral features, short-time energy and MFCCs, so that an exhaustive feature set is created. Figures 2(a) and 2(b) show the pitch, energy, formants and intensity of the same sentence spoken by the same speaker in two different emotions. The variation of pitch lends recognizable melodic properties to speech; this controlled modulation of pitch is referred to as intonation. Sound units are shortened or lengthened in accordance with some underlying pattern, giving rhythmic properties to speech in different emotions. Some syllables or words may be made more prominent than others, resulting in linguistic stress. The information gleaned from melody, timing and stress increases the intelligibility of the spoken message, enabling the listener to segment continuous speech into phrases and words with ease. It is also capable of conveying much more lexical and non-lexical information, such as lexical tone, prominence, accent and emotion. The characteristics that make us perceive these effects are collectively referred to as prosody. Prosodic cues include stress, rhythm and intonation; each cue is a complex perceptual entity, expressed primarily using three acoustic parameters: pitch, energy and duration. A short-time energy sketch is given below.
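Short-time energy is the simplest of these parameters to compute. The following sketch is our illustration, using a stand-in signal and assumed frame sizes:

% Hedged sketch: short-time energy contour of a signal s.
fs = 16000;
s  = randn(fs, 1);                         % one second of stand-in "speech"
n  = 256; m = 100;                         % frame length and frame shift (assumed)
nb = floor((length(s) - n) / m) + 1;       % number of complete frames
E  = zeros(nb, 1);
for j = 1:nb
    fr   = s((j - 1) * m + (1:n));         % frame j
    E(j) = sum(fr.^2);                     % energy of frame j
end
plot(E); xlabel('frame'); ylabel('short-time energy');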


Figure 2(a): Speech features (spectrogram, pitch, intensity, formants) for S1_anger_01.wav

Figure 2(b): Speech features (spectrogram, pitch, intensity, formants) for S1_anger_01.wav

5. Mel-Frequency Cepstrum Coefficients Processor

A block diagram of the structure of an MFCC processor is given in Figure 3. Different operations are performed on the input signal at different stages: pre-emphasis, framing, windowing, mel-cepstrum analysis and recognition (matching) of the spoken word.


The speech input is typically recorded at a sampling rate above 10000 Hz. This sampling frequency is chosen to minimize the effects of aliasing in the analog-to-digital conversion; signals sampled at this rate can capture all frequencies up to 5 kHz, which covers most of the energy of sounds generated by humans. The main purpose of the MFCC processor is to mimic the behavior of the human ear.

Figure 3: Block diagram of the MFCC processor

5.1 Frame Blocking

In this step the continuous speech signal is blocked into frames of N samples, with adjacent frames separated by M samples (M < N). Typical values are N = 256 (equivalent to roughly 30 ms of signal, a size that also facilitates the fast radix-2 FFT) and M = 100. Following is a script to compute the matrix containing all frames.
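The script itself does not appear in the source; the following is a minimal reconstruction consistent with the matrix M and the frame count nbFrame used by the later scripts (the signal s is assumed to have been loaded already):

% Hedged reconstruction of the frame-blocking script (not in the source).
n = 256; m = 100;                     % frame length and shift (values from the text)
l = length(s);                        % s: the sampled speech signal (assumed loaded)
nbFrame = floor((l - n) / m) + 1;     % number of complete frames
M = zeros(n, nbFrame);
for i = 1:n
    for j = 1:nbFrame
        M(i, j) = s(((j - 1) * m) + i);   % column j holds frame j
    end
end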

5.2 Windowing

The next step in the processing is to window each individual frame so as to minimize the signal discontinuities at the beginning and end of each frame. We define the window as $w(n)$, $0 \le n \le N-1$, where $N$ is the number of samples in each frame. The result of windowing is the signal

$$y_l(n) = x_l(n)\,w(n), \quad 0 \le n \le N-1,$$

where $x_l(n)$ is the input frame and $y_l(n)$ the output frame. A Hamming window is used:

$$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1.$$

Following is the script to create the Hamming-windowed matrix M2.

h = hamming(n);     % length-n Hamming window
M2 = diag(h) * M;   % apply the window to every frame (column) of M

5.3 Fast Fourier Transform (FFT)

The next processing step is the Fast Fourier Transform, which converts each frame of N samples from the time domain into the frequency domain. The FFT is a fast algorithm for implementing the Discrete Fourier Transform (DFT), which is defined on the set of N samples $\{x_n\}$ as follows:

$$X_k = \sum_{n=0}^{N-1} x_n \, e^{-j 2\pi k n / N}, \quad k = 0, 1, 2, \ldots, N-1.$$

Following is the script to create the matrix of spectra (here called frame), whose column vectors are the FFTs of the columns of M2.
for i = 1:nbFrame
    frame(:, i) = fft(M2(:, i));   % spectrum of windowed frame i
end


The result after this step is often referred to as spectrum or periodogram (Figure 4).

Figure 4: Speech signal represented as a sequence of spectral vectors, and its spectrogram

Human perception of the frequency content of sounds in speech signals does not follow a linear scale. Thus, for each tone with an actual frequency f, measured in Hz, a subjective pitch is measured on a scale called the mel scale. The mel-frequency scale has linear frequency spacing below 1000 Hz and logarithmic spacing above 1000 Hz:

$$F_{mel} = 2595 \log_{10}\left(1 + \frac{f}{700}\right).$$
One approach to simulating the subjective spectrum is to use a filter bank spaced uniformly on the mel scale. The filter bank has triangular bandpass frequency responses, with spacing and bandwidth determined by a constant mel-frequency interval, as shown in Figures 5(a) and 5(b). The number of mel spectrum coefficients, K, is typically chosen as 20. This filter bank behaves like a succession of histograms on the spectrum: each filter has a triangular frequency response that weights one zone of the frequency spectrum. A small sketch of the mel mapping follows.
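As a quick, purely illustrative check of the mel mapping defined above:

% Hedged sketch: the mel scale is near-linear below 1 kHz, logarithmic above.
f   = 0:100:8000;                     % linear frequencies in Hz
mel = 2595 * log10(1 + f / 700);      % mel-scale equivalents
plot(f, mel); xlabel('frequency (Hz)'); ylabel('mel');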


Figure 5 (a): Mel-Frequency filters

Figure 5(b): Plot of the mel-spaced filter bank for speech file S1.wav. Command to plot the filter bank:

plot(linspace(0, (fs1/2), 129), melfb(20, 256, fs1));

5.4 Cepstrum

In this step, we convert the log mel spectrum back to the time domain. The result is called the mel-frequency cepstrum coefficients (MFCC). The cepstral representation of the speech spectrum provides a good representation of the local spectral properties of the signal for the given frame analysis. Because the mel spectrum coefficients (and so their logarithm) are real numbers, we can convert them to the time domain using the Discrete Cosine Transform (DCT). If we denote the mel power spectrum coefficients that result from the last step by $\tilde{S}_k$, $k = 1, 2, \ldots, K$, the MFCCs $\tilde{c}_n$ can be calculated as

$$\tilde{c}_n = \sum_{k=1}^{K} \left(\log \tilde{S}_k\right) \cos\left[n\left(k - \tfrac{1}{2}\right)\frac{\pi}{K}\right], \quad n = 0, 1, \ldots, K-1,$$

where the first component $\tilde{c}_0$ is typically excluded, since it represents the mean value of the input signal and carries little speaker-specific information.
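Continuing the earlier scripts, a hedged sketch of this step for a single frame might look as follows; it assumes the spectra in frame and the sampling rate fs1 from above, melfb is the filter-bank routine already used, and dct requires the Signal Processing Toolbox:

% Hedged sketch: MFCCs of one frame via the mel filter bank and DCT.
m  = melfb(20, 256, fs1);             % 20 triangular mel filters (as above)
i  = 1;                               % frame index
ms = m * abs(frame(1:129, i)).^2;     % mel power spectrum of frame i (129 unique bins)
c  = dct(log(ms));                    % DCT of log mel spectrum -> cepstral coefficients
c(1) = [];                            % drop c0 (mean value, little speaker information)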

5.5 Feature Matching

At the highest level, all speaker identification systems contain two main modules: feature extraction and feature matching. The problem of speaker recognition belongs to a much broader topic in science and engineering known as pattern recognition. The
