dnc-emails
-----BEGIN PGP PUBLIC KEY BLOCK-----
mQQBBGBjDtIBH6DJa80zDBgR+VqlYGaXu5bEJg9HEgAtJeCLuThdhXfl5Zs32RyB
I1QjIlttvngepHQozmglBDmi2FZ4S+wWhZv10bZCoyXPIPwwq6TylwPv8+buxuff
B6tYil3VAB9XKGPyPjKrlXn1fz76VMpuTOs7OGYR8xDidw9EHfBvmb+sQyrU1FOW
aPHxba5lK6hAo/KYFpTnimsmsz0Cvo1sZAV/EFIkfagiGTL2J/NhINfGPScpj8LB
bYelVN/NU4c6Ws1ivWbfcGvqU4lymoJgJo/l9HiV6X2bdVyuB24O3xeyhTnD7laf
epykwxODVfAt4qLC3J478MSSmTXS8zMumaQMNR1tUUYtHCJC0xAKbsFukzbfoRDv
m2zFCCVxeYHvByxstuzg0SurlPyuiFiy2cENek5+W8Sjt95nEiQ4suBldswpz1Kv
n71t7vd7zst49xxExB+tD+vmY7GXIds43Rb05dqksQuo2yCeuCbY5RBiMHX3d4nU
041jHBsv5wY24j0N6bpAsm/s0T0Mt7IO6UaN33I712oPlclTweYTAesW3jDpeQ7A
ioi0CMjWZnRpUxorcFmzL/Cc/fPqgAtnAL5GIUuEOqUf8AlKmzsKcnKZ7L2d8mxG
QqN16nlAiUuUpchQNMr+tAa1L5S1uK/fu6thVlSSk7KMQyJfVpwLy6068a1WmNj4
yxo9HaSeQNXh3cui+61qb9wlrkwlaiouw9+bpCmR0V8+XpWma/D/TEz9tg5vkfNo
eG4t+FUQ7QgrrvIkDNFcRyTUO9cJHB+kcp2NgCcpCwan3wnuzKka9AWFAitpoAwx
L6BX0L8kg/LzRPhkQnMOrj/tuu9hZrui4woqURhWLiYi2aZe7WCkuoqR/qMGP6qP
EQRcvndTWkQo6K9BdCH4ZjRqcGbY1wFt/qgAxhi+uSo2IWiM1fRI4eRCGifpBtYK
Dw44W9uPAu4cgVnAUzESEeW0bft5XXxAqpvyMBIdv3YqfVfOElZdKbteEu4YuOao
FLpbk4ajCxO4Fzc9AugJ8iQOAoaekJWA7TjWJ6CbJe8w3thpznP0w6jNG8ZleZ6a
jHckyGlx5wzQTRLVT5+wK6edFlxKmSd93jkLWWCbrc0Dsa39OkSTDmZPoZgKGRhp
Yc0C4jePYreTGI6p7/H3AFv84o0fjHt5fn4GpT1Xgfg+1X/wmIv7iNQtljCjAqhD
6XN+QiOAYAloAym8lOm9zOoCDv1TSDpmeyeP0rNV95OozsmFAUaKSUcUFBUfq9FL
uyr+rJZQw2DPfq2wE75PtOyJiZH7zljCh12fp5yrNx6L7HSqwwuG7vGO4f0ltYOZ
dPKzaEhCOO7o108RexdNABEBAAG0Rldpa2lMZWFrcyBFZGl0b3JpYWwgT2ZmaWNl
IEhpZ2ggU2VjdXJpdHkgQ29tbXVuaWNhdGlvbiBLZXkgKDIwMjEtMjAyNCmJBDEE
EwEKACcFAmBjDtICGwMFCQWjmoAFCwkIBwMFFQoJCAsFFgIDAQACHgECF4AACgkQ
nG3NFyg+RUzRbh+eMSKgMYOdoz70u4RKTvev4KyqCAlwji+1RomnW7qsAK+l1s6b
ugOhOs8zYv2ZSy6lv5JgWITRZogvB69JP94+Juphol6LIImC9X3P/bcBLw7VCdNA
mP0XQ4OlleLZWXUEW9EqR4QyM0RkPMoxXObfRgtGHKIkjZYXyGhUOd7MxRM8DBzN
yieFf3CjZNADQnNBk/ZWRdJrpq8J1W0dNKI7IUW2yCyfdgnPAkX/lyIqw4ht5UxF
VGrva3PoepPir0TeKP3M0BMxpsxYSVOdwcsnkMzMlQ7TOJlsEdtKQwxjV6a1vH+t
k4TpR4aG8fS7ZtGzxcxPylhndiiRVwdYitr5nKeBP69aWH9uLcpIzplXm4DcusUc
Bo8KHz+qlIjs03k8hRfqYhUGB96nK6TJ0xS7tN83WUFQXk29fWkXjQSp1Z5dNCcT
sWQBTxWxwYyEI8iGErH2xnok3HTyMItdCGEVBBhGOs1uCHX3W3yW2CooWLC/8Pia
qgss3V7m4SHSfl4pDeZJcAPiH3Fm00wlGUslVSziatXW3499f2QdSyNDw6Qc+chK
hUFflmAaavtpTqXPk+Lzvtw5SSW+iRGmEQICKzD2chpy05mW5v6QUy+G29nchGDD
rrfpId2Gy1VoyBx8FAto4+6BOWVijrOj9Boz7098huotDQgNoEnidvVdsqP+P1RR
QJekr97idAV28i7iEOLd99d6qI5xRqc3/QsV+y2ZnnyKB10uQNVPLgUkQljqN0wP
XmdVer+0X+aeTHUd1d64fcc6M0cpYefNNRCsTsgbnWD+x0rjS9RMo+Uosy41+IxJ
6qIBhNrMK6fEmQoZG3qTRPYYrDoaJdDJERN2E5yLxP2SPI0rWNjMSoPEA/gk5L91
m6bToM/0VkEJNJkpxU5fq5834s3PleW39ZdpI0HpBDGeEypo/t9oGDY3Pd7JrMOF
zOTohxTyu4w2Ql7jgs+7KbO9PH0Fx5dTDmDq66jKIkkC7DI0QtMQclnmWWtn14BS
KTSZoZekWESVYhORwmPEf32EPiC9t8zDRglXzPGmJAPISSQz+Cc9o1ipoSIkoCCh
2MWoSbn3KFA53vgsYd0vS/+Nw5aUksSleorFns2yFgp/w5Ygv0D007k6u3DqyRLB
W5y6tJLvbC1ME7jCBoLW6nFEVxgDo727pqOpMVjGGx5zcEokPIRDMkW/lXjw+fTy
c6misESDCAWbgzniG/iyt77Kz711unpOhw5aemI9LpOq17AiIbjzSZYt6b1Aq7Wr
aB+C1yws2ivIl9ZYK911A1m69yuUg0DPK+uyL7Z86XC7hI8B0IY1MM/MbmFiDo6H
dkfwUckE74sxxeJrFZKkBbkEAQRgYw7SAR+gvktRnaUrj/84Pu0oYVe49nPEcy/7
5Fs6LvAwAj+JcAQPW3uy7D7fuGFEQguasfRrhWY5R87+g5ria6qQT2/Sf19Tpngs
d0Dd9DJ1MMTaA1pc5F7PQgoOVKo68fDXfjr76n1NchfCzQbozS1HoM8ys3WnKAw+
Neae9oymp2t9FB3B+To4nsvsOM9KM06ZfBILO9NtzbWhzaAyWwSrMOFFJfpyxZAQ
8VbucNDHkPJjhxuafreC9q2f316RlwdS+XjDggRY6xD77fHtzYea04UWuZidc5zL
VpsuZR1nObXOgE+4s8LU5p6fo7jL0CRxvfFnDhSQg2Z617flsdjYAJ2JR4apg3Es
G46xWl8xf7t227/0nXaCIMJI7g09FeOOsfCmBaf/ebfiXXnQbK2zCbbDYXbrYgw6
ESkSTt940lHtynnVmQBvZqSXY93MeKjSaQk1VKyobngqaDAIIzHxNCR941McGD7F
qHHM2YMTgi6XXaDThNC6u5msI1l/24PPvrxkJxjPSGsNlCbXL2wqaDgrP6LvCP9O
uooR9dVRxaZXcKQjeVGxrcRtoTSSyZimfjEercwi9RKHt42O5akPsXaOzeVjmvD9
EB5jrKBe/aAOHgHJEIgJhUNARJ9+dXm7GofpvtN/5RE6qlx11QGvoENHIgawGjGX
Jy5oyRBS+e+KHcgVqbmV9bvIXdwiC4BDGxkXtjc75hTaGhnDpu69+Cq016cfsh+0
XaRnHRdh0SZfcYdEqqjn9CTILfNuiEpZm6hYOlrfgYQe1I13rgrnSV+EfVCOLF4L
P9ejcf3eCvNhIhEjsBNEUDOFAA6J5+YqZvFYtjk3efpM2jCg6XTLZWaI8kCuADMu
yrQxGrM8yIGvBndrlmmljUqlc8/Nq9rcLVFDsVqb9wOZjrCIJ7GEUD6bRuolmRPE
SLrpP5mDS+wetdhLn5ME1e9JeVkiSVSFIGsumZTNUaT0a90L4yNj5gBE40dvFplW
7TLeNE/ewDQk5LiIrfWuTUn3CqpjIOXxsZFLjieNgofX1nSeLjy3tnJwuTYQlVJO
3CbqH1k6cOIvE9XShnnuxmiSoav4uZIXnLZFQRT9v8UPIuedp7TO8Vjl0xRTajCL
PdTk21e7fYriax62IssYcsbbo5G5auEdPO04H/+v/hxmRsGIr3XYvSi4ZWXKASxy
a/jHFu9zEqmy0EBzFzpmSx+FrzpMKPkoU7RbxzMgZwIYEBk66Hh6gxllL0JmWjV0
iqmJMtOERE4NgYgumQT3dTxKuFtywmFxBTe80BhGlfUbjBtiSrULq59np4ztwlRT
wDEAVDoZbN57aEXhQ8jjF2RlHtqGXhFMrg9fALHaRQARAQABiQQZBBgBCgAPBQJg
Yw7SAhsMBQkFo5qAAAoJEJxtzRcoPkVMdigfoK4oBYoxVoWUBCUekCg/alVGyEHa
ekvFmd3LYSKX/WklAY7cAgL/1UlLIFXbq9jpGXJUmLZBkzXkOylF9FIXNNTFAmBM
3TRjfPv91D8EhrHJW0SlECN+riBLtfIQV9Y1BUlQthxFPtB1G1fGrv4XR9Y4TsRj
VSo78cNMQY6/89Kc00ip7tdLeFUHtKcJs+5EfDQgagf8pSfF/TWnYZOMN2mAPRRf
fh3SkFXeuM7PU/X0B6FJNXefGJbmfJBOXFbaSRnkacTOE9caftRKN1LHBAr8/RPk
pc9p6y9RBc/+6rLuLRZpn2W3m3kwzb4scDtHHFXXQBNC1ytrqdwxU7kcaJEPOFfC
XIdKfXw9AQll620qPFmVIPH5qfoZzjk4iTH06Yiq7PI4OgDis6bZKHKyyzFisOkh
DXiTuuDnzgcu0U4gzL+bkxJ2QRdiyZdKJJMswbm5JDpX6PLsrzPmN314lKIHQx3t
NNXkbfHL/PxuoUtWLKg7/I3PNnOgNnDqCgqpHJuhU1AZeIkvewHsYu+urT67tnpJ
AK1Z4CgRxpgbYA4YEV1rWVAPHX1u1okcg85rc5FHK8zh46zQY1wzUTWubAcxqp9K
1IqjXDDkMgIX2Z2fOA1plJSwugUCbFjn4sbT0t0YuiEFMPMB42ZCjcCyA1yysfAd
DYAmSer1bq47tyTFQwP+2ZnvW/9p3yJ4oYWzwMzadR3T0K4sgXRC2Us9nPL9k2K5
TRwZ07wE2CyMpUv+hZ4ja13A/1ynJZDZGKys+pmBNrO6abxTGohM8LIWjS+YBPIq
trxh8jxzgLazKvMGmaA6KaOGwS8vhfPfxZsu2TJaRPrZMa/HpZ2aEHwxXRy4nm9G
Kx1eFNJO6Ues5T7KlRtl8gflI5wZCCD/4T5rto3SfG0s0jr3iAVb3NCn9Q73kiph
PSwHuRxcm+hWNszjJg3/W+Fr8fdXAh5i0JzMNscuFAQNHgfhLigenq+BpCnZzXya
01kqX24AdoSIbH++vvgE0Bjj6mzuRrH5VJ1Qg9nQ+yMjBWZADljtp3CARUbNkiIg
tUJ8IJHCGVwXZBqY4qeJc3h/RiwWM2UIFfBZ+E06QPznmVLSkwvvop3zkr4eYNez
cIKUju8vRdW6sxaaxC/GECDlP0Wo6lH0uChpE3NJ1daoXIeymajmYxNt+drz7+pd
jMqjDtNA2rgUrjptUgJK8ZLdOQ4WCrPY5pP9ZXAO7+mK7S3u9CTywSJmQpypd8hv
8Bu8jKZdoxOJXxj8CphK951eNOLYxTOxBUNB8J2lgKbmLIyPvBvbS1l1lCM5oHlw
WXGlp70pspj3kaX4mOiFaWMKHhOLb+er8yh8jspM184=
=5a6T
-----END PGP PUBLIC KEY BLOCK-----
Sounds good then.
I'd like to give all NGP users a heads up, so I'll get an email out today and start later this week.
I should have counts around about the dups for anyone interested.
-Matt
From: Alan Reed
Sent: Tuesday, May 24, 2016 3:57 PM
To: Johnson, Matt; Greeson, Katja; Manisha Patel; Parrish, Daniel; Hoffman, Alex; Jessica TeSelle
Cc: Andrew Brown; Wilson, Jackie K; Yared Tamene; Ellis, Lizzie
Subject: RE: Looking for a lot of NGP DownTime
The downtime works for us too.
From: Johnson, Matt
Sent: Tuesday, May 24, 2016 3:48 PM
To: Alan Reed; Greeson, Katja; Manisha Patel; Parrish, Daniel; Hoffman, Alex; Jessica TeSelle
Cc: Andrew Brown; Wilson, Jackie K; Yared Tamene; Ellis, Lizzie
Subject: RE: Looking for a lot of NGP DownTime
About the downtime:
Does this work for departments?
About Nicknames:
We definitely should, but it's hard to find some of those odd differences on a large-scale fashion. Happy to take a look after this round is done.
About these duplicates:
Some common issues with these duplicates:
First Name/last name is swamped between two accounts.
Last Name " Tibbetts-Cape" in one account, "Tibbetts" in the other.
I should have better counts on them later today.
I'm happy to send around a sample of the "problem merges" to anyone who is interested in looking into it.
-Matt
From: Alan Reed
Sent: Tuesday, May 24, 2016 3:04 PM
To: Greeson, Katja; Johnson, Matt; Manisha Patel; Parrish, Daniel; Hoffman, Alex; Jessica TeSelle
Cc: Andrew Brown; Wilson, Jackie K; Yared Tamene; Ellis, Lizzie
Subject: RE: Looking for a lot of NGP DownTime
Just curious, would alternate spellings of names be considered in a second wave if they have other matching points? Just trying to figure out why we wouldn't merge "Matt" and "mat" in the example below or a Rob, Bob, Robert scenario.
From: Greeson, Katja
Sent: Tuesday, May 24, 2016 3:01 PM
To: Alan Reed; Johnson, Matt; Manisha Patel; Parrish, Daniel; Hoffman, Alex; Jessica TeSelle
Cc: Andrew Brown; Wilson, Jackie K; Yared Tamene; Ellis, Lizzie
Subject: RE: Looking for a lot of NGP DownTime
Full address and full name match.
From: Alan Reed
Sent: Tuesday, May 24, 2016 3:00 PM
To: Johnson, Matt; Greeson, Katja; Manisha Patel; Parrish, Daniel; Hoffman, Alex; Jessica TeSelle
Cc: Andrew Brown; Wilson, Jackie K; Yared Tamene; Ellis, Lizzie
Subject: RE: Looking for a lot of NGP DownTime
What is the criteria for a potential merge?
From: Johnson, Matt
Sent: Tuesday, May 24, 2016 2:50 PM
To: Greeson, Katja; Alan Reed; Manisha Patel; Parrish, Daniel; Hoffman, Alex; Jessica TeSelle
Cc: Andrew Brown; Wilson, Jackie K; Yared Tamene; Ellis, Lizzie
Subject: Looking for a lot of NGP DownTime
Hey Team,
Direct Marketing recently sent all of the NGP records through a data-hygiene process, which highlighted over 320,000 duplicate records in NGP. I would love to merge these duplicates in NGP, as they cause a lot of problems.
There's two concerns with this: making sure we should merge these duplicates, and getting time that NGP can be slow to process them.
Short version:
Most of the duplicates look like we should merge them (more of that below), which means we need 160 hours of slow NGP time to process them. This time can be broken up and separated, as we can do a few a night.
I was hoping to process them after 8pm on weekdays and over weekends for the next 2-3 weeks. During these times, NGP would be unavailable or extremely slow. If we could process everything straight through this holiday day weekend, we could get over half of them done by next Tuesday.
Before I email all NGP users, I wanted to double-check: does NGP slow time after 8pm and during weekends work for your department? Is there a change we can make that would be fine?
Longer Version
As I said above, there's two concerns with duplicates from NGP:
1) We need to double-check these duplicates ARE duplicates
2) We need to schedule time to merge them.
About the Duplicates
We are researching the full impact of these duplicates on the file right now, but 47% of them are low dollar donors who only given once. I have a few select counts below:
Returned Records : 328758
Unique Records : 157505 (ie, number of record we should have at the end)
Last Gift 2007 : 7101
Last Gift 2008 : 31109
Last Gift 2009 : 16413
Last Gift 2010 : 31915
Last Gift 2011 : 14594
Last Gift 2012 : 37788
Last Gift 2013 : 24888
Last Gift 2014 : 46178
Last Gift 2015 : 27341
Last Gift 2016 : 19524
Running counts of EXACT differences (ie, "Matt" and "Mat" would count as a different name).
Merges with different names : 52849 (25%)
Merges with different Address : 42102 (13%)
Merges with different City : 6815 (2%)
Merges with different States(!) : 275 (less than a 1%)
Dups with 3+ merges : 11,297 (3%)
Dups with 4+ merges : 1,986 (less than a percent)
Most of these donations would NOT impact FEC reports we have already made, as they are low-dollar donors well under the FEC report. I'm still getting an exact number, but I have over 75000 we should be fine with right now.
As always, I would love everyone's opinion on this about things we should look out for.
About the DownTime
Merging duplicates takes time. We can merge a lot of an hour, but we're still looking at 160 hours of processing time. In order to get this done quickly (pre-primary, pre-next FEC report, pre-next mail list, so on and so on), I want an aggressive period of downtime. I was hoping to run them overnight and weekends, thus allowing NGP to be up during business hours.
It seems most activity on NGP is done after 8pm every night, which means if we run after 8pm and over the weekends, we could process this in 2-3 weeks.
As we work to pindown the duplicates, I want to double-check: do these hours work with your teams?
I'm also happy to discuss this or anything related to this in a meeting.
Matt Johnson
Technical Financial Manager
Democratic National Committee
Office: 202-572-5478
[email protected]<mailto:[email protected]>
ℹ️ Document Details
SHA-256
79e541a71f956ae6145183c806815d6b369a07b1e0e93158f988a250195b449b
Dataset
dnc-emails
Document Type
email
Comments 0