[Programming/Scripting] A nuisance question.

  • Two Factor Authentication is now available on BeyondUnreal Forums. To configure it, visit your Profile and look for the "Two Step Verification" option on the left side. We can send codes via email (may be slower) or you can set up any TOTP Authenticator app on your phone (Authy, Google Authenticator, etc) to deliver codes. It is highly recommended that you configure this to keep your account safe.

Thrash123

Obey Leash Laws
Jul 19, 1999
4,777
0
36
40
Nowhere to be found.
www.classicwfl.com
Alright, here's the situation. I've got a text file exported from some DBMS that I don't have access to, and I need to import it into a MySQL DB. It's not tab-delimited; it's space separated based on available characters in fields.

However, the interesting problem is lines this..

Code:
 STU                 DISCO          M           D-XXX-6         11/15/199    6/26/1925 YES                    7003
                                                                           1

When the DBMS they use exported the date fields, it truncated it if there was a double digit in the month and day, and then slapped the remainder on a second line, spaced over to the matching date field area (and somehow the Code tag is borking it).

It's also doing this with certain names.

So, I need to figure out how to get the truncated year & names put back together before I make it CSV to import into MySQL.

Also, this file has over 28,000 lines, so manual correction is a no-go.. plus, we've got to to be able to replace the database with an updated one every 6 months to a year.

Any suggestions are welcome. I've never been much of a RegEx guru, and I just don't have time to sift through this. Was hoping one of my fellow IT people would be able to help me out on this solution. I'll give you a free copy of one of my albums or something in exchange to whomever's solution I use (even if it's just some pseudocode that leads me in the right direction).
 

Thrash123

Obey Leash Laws
Jul 19, 1999
4,777
0
36
40
Nowhere to be found.
www.classicwfl.com
Actually, I may have solved it. Talking through it seemed to get me a little more focused.

Code:
GetLine 123 character to 132 character == DateStr1
if (first 2 char of DateStr1) = 10 | 11 | 12 then
	DateStr1 = DateStr1 CONCAT (GetChar 132 NextLine)
ArrayOfDeadPeople[Date1] = DateStr1

Sorry for the inane pseudocode, but I just roughed it out to the point it would make sense to me :)
 

Zur

surrealistic mad cow
Jul 8, 2002
11,708
8
38
48
There's a limit to what regex can do (pattern recognition and no algorithmics). If I get your description and the remaining figure is at the same position in the next line what I'd do is something along these lines. Or assume the last character of the year has been clipped at a precise position (last character).

Code:
if( isDoubleDigit(day) && isDoubleDigit(month) ){
clippedYearChar = getCharNextLine( getLastPositionOfYear() + 1 )
reconstitutedYear = clippedYear + clippedYearChar // parse to int if necessary
}
outputString = string1Start + reconstitutedYear + string1End
 
Last edited:

Bi()ha2arD

Toxic!
Jun 29, 2009
2,808
0
0
Germany
phobos.qml.net
You can find the lines that only have that one number in them with something like

Code:
^\s+[0-9]\s+?$

That would match a line that only contains whitespace, then a digit that then either ends or only consists of further whitespace.

With that and the Line number you can probably put it back together.
 

Thrash123

Obey Leash Laws
Jul 19, 1999
4,777
0
36
40
Nowhere to be found.
www.classicwfl.com
Yeah, but some lines also have part of the name. I did some more refining and got the psuedocode worked out at the office. Thanks though :) Thinking I'll skip the regex all together, as it's really not totally necessary I use it, and I know the locations within the line where the spill over will be. I'll just write it into my import script and be done with it. I just don't code often enough to remember these solutions quickly. Joys of having to be a Jack of All Trades in IT. At least it's a relatively small dataset on a low-volume site (somewhere around 11,000 rows after all the excess whitespace is stripped), so I don't really have to worry too much about hurting my performance whenever I do update the data. Just gotta ensure it's parsing correctly and I'll be kosher.
 

Sir_Brizz

Administrator
Staff member
Feb 3, 2000
26,020
83
48
I wouldn't use a regex to solve this problem either, if I'm being honest :)
 

Staward

Lauda tuus animus
Jan 31, 2008
970
0
16
30
-23° 58' 3.15" -46° 20' 32.07"
hieroglyph.jpg