How to design an algorithm to compare the contents of two text files

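yimao_44 2013-09-24 02:01:39
First you are given a correct text file (everything in it is correct text).
Then you are given a recognized text file (it contains correctly recognized characters and misrecognized ones, including garbled characters).

Now I need to compare the contents of the correct file and the recognized file and compute the recognition rate. The platform is VS2008.

I don't know how to design this algorithm.
My original plan was to compare line by line, but how do I filter out the garbled characters, and how do I compare character by character? I still haven't got a clue.

Any pointers from the experts here would be appreciated.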
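Comparing character by character first requires turning each file's bytes into characters, since one Chinese character occupies several bytes. The following is a minimal sketch, assuming the files are UTF-8; the thread never states the encoding, and a GBK file would need a different decoder. The function name `decode_utf8` is hypothetical, and the decoder is deliberately simplified: it does not reject overlong or surrogate encodings. Any byte that cannot be decoded becomes U+FFFD, so garbled input still lines up as one "wrong" character instead of breaking the comparison. It uses C++11 types; on VS2008 itself `std::wstring` would have to stand in for `std::u32string`.

```cpp
#include <cstddef>
#include <string>

// Decode UTF-8 bytes into Unicode code points (simplified sketch).
// Undecodable bytes are replaced with U+FFFD, one per bad byte.
std::u32string decode_utf8(const std::string& bytes) {
    const char32_t kReplacement = 0xFFFD;
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else { out.push_back(kReplacement); ++i; continue; }

        // Every following byte must look like 10xxxxxx.
        bool ok = (i + len <= bytes.size());
        for (std::size_t k = 1; ok && k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;
            else cp = static_cast<char32_t>((cp << 6) | (c & 0x3F));
        }
        if (!ok) { out.push_back(kReplacement); ++i; continue; }

        out.push_back(cp);
        i += len;
    }
    return out;
}
```

Once both files are decoded this way, each element of the resulting strings is one character, so the comparison no longer depends on how many bytes a character takes.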
赵4老师 2013-09-24
strcmp, wcscmp, _mbscmp

Compare strings.

    int strcmp( const char *string1, const char *string2 );
    int wcscmp( const wchar_t *string1, const wchar_t *string2 );
    int _mbscmp( const unsigned char *string1, const unsigned char *string2 );

Routine   Required Header            Compatibility
strcmp    <string.h>                 ANSI, Win 95, Win NT
wcscmp    <string.h> or <wchar.h>    ANSI, Win 95, Win NT
_mbscmp   <mbstring.h>               Win 95, Win NT

For additional compatibility information, see Compatibility in the Introduction.

Libraries

LIBC.LIB     Single thread static library, retail version
LIBCMT.LIB   Multithread static library, retail version
MSVCRT.LIB   Import library for MSVCRT.DLL, retail version

Return Value

The return value for each of these functions indicates the lexicographic relation of string1 to string2.

Value   Relationship of string1 to string2
< 0     string1 less than string2
0       string1 identical to string2
> 0     string1 greater than string2

On an error, _mbscmp returns _NLSCMPERROR, which is defined in STRING.H and MBSTRING.H.

Parameters

string1, string2   Null-terminated strings to compare

Remarks

The strcmp function compares string1 and string2 lexicographically and returns a value indicating their relationship. wcscmp and _mbscmp are wide-character and multibyte-character versions of strcmp. The arguments and return value of wcscmp are wide-character strings; those of _mbscmp are multibyte-character strings. _mbscmp recognizes multibyte-character sequences according to the current multibyte code page and returns _NLSCMPERROR on an error. (For more information, see Code Pages.) These three functions behave identically otherwise.

Generic-Text Routine Mappings

TCHAR.H Routine   _UNICODE & _MBCS Not Defined   _MBCS Defined   _UNICODE Defined
_tcscmp           strcmp                         _mbscmp         wcscmp

The strcmp functions differ from the strcoll functions in that strcmp comparisons are not affected by locale, whereas the manner of strcoll comparisons is determined by the LC_COLLATE category of the current locale. For more information on the LC_COLLATE category, see setlocale.

In the "C" locale, the order of characters in the character set (ASCII character set) is the same as the lexicographic character order. However, in other locales, the order of characters in the character set may differ from the lexicographic order. For example, in certain European locales, the character 'a' (value 0x61) precedes the character 'ä' (value 0xE4) in the character set, but the character 'ä' precedes the character 'a' lexicographically.

In locales for which the character set and the lexicographic character order differ, use strcoll rather than strcmp for lexicographic comparison of strings according to the LC_COLLATE category setting of the current locale. Thus, to perform a lexicographic comparison of the locale in the above example, use strcoll rather than strcmp. Alternatively, you can use strxfrm on the original strings, then use strcmp on the resulting strings.

_stricmp, _wcsicmp, and _mbsicmp compare strings by first converting them to their lowercase forms. Two strings containing characters located between 'Z' and 'a' in the ASCII table ('[', '\', ']', '^', '_', and '`') compare differently, depending on their case. For example, the two strings "ABCDE" and "ABCD^" compare one way if the comparison is lowercase ("abcde" > "abcd^") and the other way ("ABCDE" < "ABCD^") if the comparison is uppercase.

Example

    /* STRCMP.C */

    #include <string.h>
    #include <stdio.h>

    char string1[] = "The quick brown dog jumps over the lazy fox";
    char string2[] = "The QUICK brown dog jumps over the lazy fox";

    void main( void )
    {
       char tmp[20];
       int result;

       /* Case sensitive */
       printf( "Compare strings:\n\t%s\n\t%s\n\n", string1, string2 );
       result = strcmp( string1, string2 );
       if( result > 0 )
          strcpy( tmp, "greater than" );
       else if( result < 0 )
          strcpy( tmp, "less than" );
       else
          strcpy( tmp, "equal to" );
       printf( "\tstrcmp: String 1 is %s string 2\n", tmp );

       /* Case insensitive (could use equivalent _stricmp) */
       result = _stricmp( string1, string2 );
       if( result > 0 )
          strcpy( tmp, "greater than" );
       else if( result < 0 )
          strcpy( tmp, "less than" );
       else
          strcpy( tmp, "equal to" );
       printf( "\t_stricmp: String 1 is %s string 2\n", tmp );
    }

Output

    Compare strings:
        The quick brown dog jumps over the lazy fox
        The QUICK brown dog jumps over the lazy fox

        strcmp: String 1 is greater than string 2
        _stricmp: String 1 is equal to string 2

String Manipulation Routines

See Also   memcmp, _memicmp, strcoll Functions, _stricmp, strncmp, _strnicmp, strrchr, strspn, strxfrm
yimao_44 2013-09-24
Quoting reply #8 by zhao4zhong1:
[quote=Quoting reply #5 by Dobzhansky:] The Windows Platform SDK includes a sample, sdkdiff, that compares text files and displays the differences graphically.
C:\Program Files\Microsoft Platform SDK for Windows Server 2003 R2\Samples\Begin\sdkdiff\readme.txt
THIS CODE AND INFORMATION IS PROVIDED "AS IS" WITHOUT WARRANTY OF
ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A
PARTICULAR PURPOSE.

Copyright (C) 1999  Microsoft Corporation.  All Rights Reserved.

SDKDiff Sample
-------------------------------

Abstract:

 SDKDiff allows the users to compare two files or two directories
 against each other, and display the differences found between the
 files or directories on the screen.

 The differences are displayed textually and graphically.  Graphically,
 data that exists in the first file but does not exist in the second file
 is represented by a red line, whereas data that does not exist in the first
 file but exists in the second file is represented by yellow line.
 The identical parts of the files are displayed in black line.  The 2 lines, one
 for each file, made up of different colors based on blocks of identical or
 different blocks of data in the files.

 Both files are displayed simultaneously.  They are virtually mapped to one file,
 where the lines that only appear in the first file are highlighted in red, and
 the lines that only appear in the left file are highlighted in yellow.

Supported OS:
 Windows 2000, Windows XP

Building:
 Build the sample using the latest Platform SDK via the MAKEFILE included.
 For building in Visual Studio, use the sdkdiff.vcproj for Visual Studio.NET and
 sdkdiff.dsp for Visual Studio 6.0.

Usage:
 The sample can be run directly on the command-line by typing 'sdkdiff'.
 This starts the sdkdiff program, and the files or directories to be compared can
 be chosen via the menus in the program: File->Compare Files or File->Compare Directories.

Special Note for 64-bit Build Environments:

This sample builds a binary sample and an associated help file. In some 64-bit
environments, NMAKE is unable to find the help compiler to build the help file
and will fail. As a result, help-related functions in this sample will be unable
to find the help file. To resolve this problem, build the sample in a 32-bit
environment and then copy the help file to the output directory of the 64-bit
build environment. Alternatively, add the path of the HCRTF.EXE tool (located
in the installed Visual Studio distribution) to the "PATH" environment variable.

[/quote]
This feels a lot like a tool I have seen, something called "compare", a code diff viewer. But I don't need to show on screen which parts are wrong and which are right. What I need is the statistics: my manager wants the result as numbers. sdkdiff looks really impressive, but how would I hook it up to my program?
赵4老师 2013-09-24
Quoting reply #5 by Dobzhansky:
The Windows Platform SDK includes a sample, sdkdiff, that compares text files and displays the differences graphically.
C:\Program Files\Microsoft Platform SDK for Windows Server 2003 R2\Samples\Begin\sdkdiff\readme.txt
THIS CODE AND INFORMATION IS PROVIDED "AS IS" WITHOUT WARRANTY OF
ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A
PARTICULAR PURPOSE.

Copyright (C) 1999  Microsoft Corporation.  All Rights Reserved.

SDKDiff Sample
-------------------------------

Abstract:

 SDKDiff allows the users to compare two files or two directories
 against each other, and display the differences found between the
 files or directories on the screen.

 The differences are displayed textually and graphically.  Graphically,
 data that exists in the first file but does not exist in the second file
 is represented by a red line, whereas data that does not exist in the first
 file but exists in the second file is represented by yellow line.
 The identical parts of the files are displayed in black line.  The 2 lines, one
 for each file, made up of different colors based on blocks of identical or
 different blocks of data in the files.

 Both files are displayed simultaneously.  They are virtually mapped to one file,
 where the lines that only appear in the first file are highlighted in red, and
 the lines that only appear in the left file are highlighted in yellow.

Supported OS:
 Windows 2000, Windows XP

Building:
 Build the sample using the latest Platform SDK via the MAKEFILE included.
 For building in Visual Studio, use the sdkdiff.vcproj for Visual Studio.NET and
 sdkdiff.dsp for Visual Studio 6.0.

Usage:
 The sample can be run directly on the command-line by typing 'sdkdiff'.
 This starts the sdkdiff program, and the files or directories to be compared can
 be chosen via the menus in the program: File->Compare Files or File->Compare Directories.

Special Note for 64-bit Build Environments:

This sample builds a binary sample and an associated help file. In some 64-bit
environments, NMAKE is unable to find the help compiler to build the help file
and will fail. As a result, help-related functions in this sample will be unable
to find the help file. To resolve this problem, build the sample in a 32-bit
environment and then copy the help file to the output directory of the 64-bit
build environment. Alternatively, add the path of the HCRTF.EXE tool (located
in the installed Visual Studio distribution) to the "PATH" environment variable.

yimao_44 2013-09-24
Quoting reply #5 by Dobzhansky:
The Windows Platform SDK includes a sample, sdkdiff, that compares text files and displays the differences graphically.
Isn't diff a Linux thing? I'm on Windows.
yimao_44 2013-09-24
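Quoting reply #5 by Dobzhansky:
The Windows Platform SDK includes a sample, sdkdiff, that compares text files and displays the differences graphically.
Can you be more specific? Is sdkdiff a function, or something else? If I can't find a suitable function, I plan to write it myself. But if I write it myself there will be a huge number of if/else branches and the accuracy may not be very good, so I'd like to know whether there is a system function I can call.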
Dobzhansky 2013-09-24
The Windows Platform SDK includes a sample, sdkdiff, that compares text files and displays the differences graphically.
yimao_44 2013-09-24
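Quoting reply #3 by u012162828:
[quote=Quoting reply #2 by zhao4zhong1:] system("fc file1 file2 >fc.txt"); //then read the contents of fc.txt
What do you mean? I don't follow. I need to count how many characters are right and how many are wrong. For example, a.txt is 我是大明星 我爸爸是李刚 我是林黛玉 (for simplicity, assume the file has just this one line) and b.txt is #@¥#@¥我是林黛玉, so the accuracy is 5/16. In other words, it is not a position-by-position comparison: where there are errors, we take the largest correct match as the correct result and count from that.[/quote] Also, we ignore spaces and other whitespace in the text. For example, 10 turning into 1 0 still counts as correct.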
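One way to read the measure described here (take the largest correct match, ignore whitespace) is as a longest common subsequence (LCS) count divided by the length of the correct text. That reading is an assumption, not something the poster states, and the names `strip_whitespace`, `lcs_length` and `recognition_rate` are hypothetical. A minimal sketch in C++11 (on VS2008 itself, `std::wstring` would have to replace `std::u32string`):

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Drop spaces, tabs, line breaks and the full-width space (U+3000).
std::u32string strip_whitespace(const std::u32string& text) {
    std::u32string out;
    for (char32_t c : text) {
        bool blank = (c == U' ' || c == U'\t' || c == U'\r' ||
                      c == U'\n' || c == static_cast<char32_t>(0x3000));
        if (!blank) out.push_back(c);
    }
    return out;
}

// Length of the longest common subsequence, using two rolling rows.
std::size_t lcs_length(const std::u32string& a, const std::u32string& b) {
    std::vector<std::size_t> prev(b.size() + 1, 0), cur(b.size() + 1, 0);
    for (std::size_t i = 1; i <= a.size(); ++i) {
        for (std::size_t j = 1; j <= b.size(); ++j) {
            cur[j] = (a[i - 1] == b[j - 1])
                         ? prev[j - 1] + 1
                         : std::max(prev[j], cur[j - 1]);
        }
        prev.swap(cur);
    }
    return prev[b.size()];
}

// Matched characters divided by the length of the correct text.
// Returns 0.0 for an empty reference (an arbitrary choice for this sketch).
double recognition_rate(const std::u32string& reference,
                        const std::u32string& recognized) {
    const std::u32string ref = strip_whitespace(reference);
    const std::u32string rec = strip_whitespace(recognized);
    if (ref.empty()) return 0.0;
    return static_cast<double>(lcs_length(ref, rec)) /
           static_cast<double>(ref.size());
}
```

For the a.txt and b.txt example in this reply, the correct text has 16 characters after whitespace is removed and the longest common subsequence is the 5 characters 我是林黛玉, giving 5/16. The table costs time proportional to the product of the two lengths, so very large files may need to be processed in chunks.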
yimao_44 2013-09-24
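Quoting reply #2 by zhao4zhong1:
system("fc file1 file2 >fc.txt"); //then read the contents of fc.txt
What do you mean? I don't follow. I need to count how many characters are right and how many are wrong. For example, a.txt is 我是大明星 我爸爸是李刚 我是林黛玉 (for simplicity, assume the file has just this one line) and b.txt is #@¥#@¥我是林黛玉, so the accuracy is 5/16. In other words, it is not a position-by-position comparison: where there are errors, we take the largest correct match as the correct result and count from that.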
赵4老师 2013-09-24
system("fc file1 file2 >fc.txt"); //then read the contents of fc.txt
FancyMouse 2013-09-24
String edit distance
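The reply names the technique without elaborating. As a rough illustration, here is a minimal sketch of the classic Levenshtein edit distance: the smallest number of single-character insertions, deletions and substitutions that turn one string into the other. The function name `edit_distance` is hypothetical, and the code assumes both texts have already been decoded into one element per character (C++11 `std::u32string`; on VS2008, `std::wstring` would be the nearest equivalent).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Levenshtein distance between a and b, keeping only two rows of the table.
std::size_t edit_distance(const std::u32string& a, const std::u32string& b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);

    // Turning an empty prefix of a into the first j characters of b
    // takes j insertions.
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;

    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;  // deleting the first i characters of a
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t substitution =
                prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            std::size_t deletion = prev[j] + 1;
            std::size_t insertion = cur[j - 1] + 1;
            cur[j] = std::min(substitution, std::min(deletion, insertion));
        }
        prev.swap(cur);
    }
    return prev[b.size()];
}
```

Called with the correct text as `a` and the recognized text as `b`, the result is the minimum number of character errors, which could then be divided by the length of the correct text to get an error rate. Whether the original poster's "recognition rate" should be defined this way is left open in the thread.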