Python, Javascript and UNIX hacker, open source advocate, IRC addict, general badass and traveler

Belief in Luck and Precognition Around the World

Although magical beliefs (such as belief in luck and precognition) are presumably universal, the extent to which such beliefs are embraced likely varies across cultures. We assessed the effect of culture on luck and precognition beliefs in two large-scale multinational studies (Study 1: k = 16, N = 17,664; Study 2: k = 25, N = 4,024). Over and above the effects of demographic factors, culture was a significant predictor of luck and precognition beliefs in both studies. Indeed, when culture was added to demographic models, the variance accounted for in luck and precognition beliefs approximately doubled. Belief in luck and precognition was highest in Latvia and Russia (Study 1) and South Asia (Study 2), and lowest in Protestant Europe (Studies 1 and 2). Thus, beyond the effects of age, gender, education, and religiosity, culture is a significant factor in explaining variance in people’s belief in luck and precognition. Follow-up analyses found a relatively consistent effect of socio-economic development, such that belief in luck and precognition were more prevalent in countries with lower scores on the Human Development Index. There was also some evidence that these beliefs were stronger in more collectivist cultures, but this effect was inconsistent. We discuss the possibility that there are culturally specific historical factors that contribute to relative openness to such beliefs in Russia, Latvia, and South Asia.

That is from a new paper by Emily A. Harris, Taciano L. Milfont, and Matthew J. Hornsey, via the excellent Kevin Lewis.

The post Belief in Luck and Precognition Around the World appeared first on Marginal REVOLUTION.


A perpetual balance strategy suitable for bear market bottoming

FMZ previously released an official perpetual grid strategy that proved popular with users, and those who ran it live on TRX have earned considerable profits over the past year with controllable risk. However, the perpetual grid strategy also has some problems:

  1. It requires parameters such as the initial price, grid spacing, grid value, and long-short mode. The settings are cumbersome and strongly affect profits, which makes them hard for novices to get right.
  2. The perpetual grid strategy carries a high risk of short liquidation, while the risk of long liquidation is relatively low. Even setting the grid value to a small number does not move the short liquidation price much.
  3. The perpetual grid can be set to long-only to avoid the risk of shorting, which seems fine so far. However, once the current price exceeds the initial price, the strategy is left with no position, and the initial price has to be reset.

I previously wrote an article on the principle of the balance strategy and how it compares with the grid strategy, which is still a useful reference: https://www.fmz.com/digest-topic/5930. The balance strategy always holds a position with a fixed value ratio or a fixed value: it sells some when the price rises and buys when it falls. It runs with simple settings, and even if the price rises a lot, there is no risk from going short. The problem with the spot balance strategy is that capital utilization is low and there is no easy way to add leverage. Perpetual contracts solve this problem: with a total capital of 1000, a fixed position of 2000 can be held, which exceeds the original capital and improves capital utilization. The other parameter is the adjustment ratio, which controls how much to scale in or reduce the position at each step. If it is set to 0.01, the position is reduced once for every 1% increase and scaled in once for every 1% decrease.
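
The rebalancing rule described above can be sketched as a small helper. This is only an illustration: the function name `rebalance_orders` and the starting price of 1 are assumptions made for the example, while the formulas follow the fixed-value rule (the position is topped up or trimmed whenever its value drifts by the adjustment ratio).

```python
def rebalance_orders(hold_value, amount, pct):
    """Next buy/sell triggers for a long position kept at a fixed value.

    hold_value: target position value in the quote currency (e.g. USDT)
    amount:     coins currently held (must be > 0)
    pct:        adjustment ratio, e.g. 0.01 for 1%
    (Hypothetical helper, not part of any FMZ API.)
    """
    # Price at which the position is worth hold_value * (1 - pct)
    buy_price = (1 - pct) * hold_value / amount
    # Price at which the position is worth hold_value * (1 + pct)
    sell_price = (1 + pct) * hold_value / amount
    # Each adjustment trades pct * hold_value worth of coins
    buy_amount = pct * hold_value / buy_price
    sell_amount = pct * hold_value / sell_price
    return {'buy': (buy_price, buy_amount), 'sell': (sell_price, sell_amount)}

# Capital of 1000, position of 2000 held at an assumed price of 1, ratio 0.01
orders = rebalance_orders(hold_value=2000, amount=2000, pct=0.01)
print(orders)
```

After the buy order fills, the position is worth exactly `hold_value` again at the fill price, which is what keeps the position value fixed.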

For beginners, the balance strategy is highly recommended. The operation is simple, just set a parameter of holding ratio or position value, and you can run it mindlessly without worrying about constant price increases. Those with certain experience can choose the grid strategy, and decide the upper and lower limits of fluctuations and the funds per grid, so as to improve the utilization of funds and obtain maximum profits.

To make it easy to backtest more trading pairs, this document shows the complete backtesting process, and users can adjust the parameters and trading pairs for comparison. (The code is Python 3, and a proxy is required to download the market data. Users can install Anaconda3 themselves or run it in Google Colab.)

import time
from datetime import datetime

import requests
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Jupyter/Colab magic; remove the next line when running as a plain script
%matplotlib inline
## Current trading pairs
Info = requests.get('https://fapi.binance.com/fapi/v1/exchangeInfo')
symbols = [s['symbol'] for s in Info.json()['symbols']]
symbols = list(set(filter(lambda x: x[-4:] == 'USDT', [s.split('_')[0] for s in symbols]))-
                 set(['1000SHIBUSDT','1000XECUSDT','BTCDOMUSDT','DEFIUSDT','BTCSTUSDT'])) + ['SHIBUSDT','XECUSDT']
print(symbols)
['FLMUSDT', 'ICPUSDT', 'CHZUSDT', 'APEUSDT', 'DARUSDT', 'TLMUSDT', 'ETHUSDT', 'STMXUSDT', 'ENJUSDT', 'LINKUSDT', 'OGNUSDT', 'RSRUSDT', 'QTUMUSDT', 'UNIUSDT', 'BNBUSDT', 'XLMUSDT', 'ATOMUSDT', 'LPTUSDT', 'UNFIUSDT', 'DASHUSDT', 'BTCUSDT', 'NEOUSDT', 'AAVEUSDT', 'DUSKUSDT', 'XRPUSDT', 'IOTXUSDT', 'CVCUSDT', 'SANDUSDT', 'XTZUSDT', 'IOTAUSDT', 'BELUSDT', 'MANAUSDT', 'IOSTUSDT', 'IMXUSDT', 'THETAUSDT', 'SCUSDT', 'DOGEUSDT', 'CELOUSDT', 'BNXUSDT', 'SNXUSDT', 'ZRXUSDT', 'HBARUSDT', 'DOTUSDT', 'ANKRUSDT', 'CELRUSDT', 'BAKEUSDT', 'GALUSDT', 'ICXUSDT', 'LRCUSDT', 'AVAXUSDT', 'C98USDT', 'MTLUSDT', 'FTTUSDT', 'MASKUSDT', 'RLCUSDT', 'MATICUSDT', 'COMPUSDT', 'BLZUSDT', 'CRVUSDT', 'ZECUSDT', 'RUNEUSDT', 'LITUSDT', 'ONEUSDT', 'ADAUSDT', 'NKNUSDT', 'LTCUSDT', 'ATAUSDT', 'GALAUSDT', 'BALUSDT', 'ROSEUSDT', 'EOSUSDT', 'YFIUSDT', 'SKLUSDT', 'BANDUSDT', 'ALGOUSDT', 'NEARUSDT', 'AXSUSDT', 'KSMUSDT', 'AUDIOUSDT', 'SRMUSDT', 'HNTUSDT', 'MKRUSDT', 'KLAYUSDT', 'FLOWUSDT', 'STORJUSDT', 'BCHUSDT', 'DYDXUSDT', 'ARUSDT', 'GMTUSDT', 'CHRUSDT', 'API3USDT', 'VETUSDT', 'KAVAUSDT', 'WAVESUSDT', 'EGLDUSDT', 'SFPUSDT', 'RENUSDT', 'SUSHIUSDT', 'SOLUSDT', 'RVNUSDT', 'ONTUSDT', 'BTSUSDT', 'ZILUSDT', 'GTCUSDT', 'ZENUSDT', 'ALICEUSDT', 'ETCUSDT', 'TRXUSDT', 'TOMOUSDT', 'FILUSDT', 'ARPAUSDT', 'CTKUSDT', 'BATUSDT', 'SXPUSDT', '1INCHUSDT', 'HOTUSDT', 'WOOUSDT', 'LINAUSDT', 'REEFUSDT', 'GRTUSDT', 'RAYUSDT', 'COTIUSDT', 'XMRUSDT', 'PEOPLEUSDT', 'OCEANUSDT', 'JASMYUSDT', 'TRBUSDT', 'ANTUSDT', 'XEMUSDT', 'DGBUSDT', 'ENSUSDT', 'OMGUSDT', 'ALPHAUSDT', 'FTMUSDT', 'DENTUSDT', 'KNCUSDT', 'CTSIUSDT', 'SHIBUSDT', 'XECUSDT']
#Fetch the K-lines (candlesticks) of any period, up to 1000 bars per request
def GetKlines(symbol='BTCUSDT',start='2020-8-10',end='2021-8-10',period='1h',base='fapi',v = 'v1'):
    Klines = []
    #Timestamps are in milliseconds; the +8h offset assumes the local timezone is UTC+8
    start_time = int(time.mktime(datetime.strptime(start, "%Y-%m-%d").timetuple()))*1000 + 8*60*60*1000
    end_time =  min(int(time.mktime(datetime.strptime(end, "%Y-%m-%d").timetuple()))*1000 + 8*60*60*1000,int(time.time()*1000))
    interval_map = {'m':60*1000,'h':60*60*1000,'d':24*60*60*1000}
    interval = int(period[:-1])*interval_map[period[-1]] #Length of one bar in milliseconds
    while start_time < end_time:
        mid_time = start_time+1000*interval
        url = 'https://'+base+'.binance.com/'+base+'/'+v+'/klines?symbol=%s&interval=%s&startTime=%s&endTime=%s&limit=1000'%(symbol,period,start_time,mid_time)
        #print(url)
        res = requests.get(url)
        res_list = res.json()
        if type(res_list) != list: #An error response is a dict; stop instead of retrying forever
            break
        if len(res_list) > 0:
            start_time = res_list[-1][0]+interval #Continue after the last bar received
            Klines += res_list
        else:
            start_time = start_time+1000*interval #No data in this window; skip ahead
        if mid_time >= end_time:
            break

    df = pd.DataFrame(Klines,columns=['time','open','high','low','close','amount','end_time','volume','count','buy_amount','buy_volume','null']).astype('float')
    df.index = pd.to_datetime(df.time,unit='ms')
    return df

By downloading the closing prices of all trading pairs from 2021 to the present, we can observe how the overall market index has changed. 2021 to 2022 was undoubtedly a bull market, and the index at one point rose 14-fold. Gains were everywhere, and many coins rose more than a hundredfold. In 2022, however, a bear market began that has now lasted half a year: the index has plunged 80%, and dozens of coins have drawn down by more than 90%. Such surges and crashes reflect the enormous risk of grid strategies.
The index is currently at around 3, which is still a 200% gain over the beginning of 2021, and given how the market has developed, it should be a relative bottom.

Currencies whose highest price is more than 10 times their price at the beginning of 2021:

'MKRUSDT': 10.294, 'CRVUSDT': 10.513, 'STORJUSDT': 10.674, 'SKLUSDT': 11.009, 'CVCUSDT': 11.026, 'SRMUSDT': 11.031, 'QTUMUSDT': 12.066, 'ALPHAUSDT': 12.103, 'ZENUSDT': 12.631, 'VETUSDT': 13.296, 'ROSEUSDT': 13.429, 'FTTUSDT': 13.705, 'IOSTUSDT': 13.786, 'COTIUSDT': 13.958, 'NEARUSDT': 14.855, 'HBARUSDT': 15.312, 'RLCUSDT': 15.432, 'SCUSDT': 15.6, 'GALAUSDT': 15.722, 'RUNEUSDT': 15.795, 'ADAUSDT': 16.94, 'MTLUSDT': 17.18, 'BNBUSDT': 17.899, 'RVNUSDT': 18.169, 'EGLDUSDT': 18.879, 'LRCUSDT': 19.499, 'ANKRUSDT': 21.398, 'ETCUSDT': 23.51, 'DUSKUSDT': 23.55, 'AUDIOUSDT': 25.306, 'OGNUSDT': 25.524, 'GMTUSDT': 28.83, 'ENJUSDT': 33.073, 'STMXUSDT': 33.18, 'IOTXUSDT': 35.866, 'AVAXUSDT': 36.946, 'CHZUSDT': 37.128, 'CELRUSDT': 37.273, 'HNTUSDT': 38.779, 'CTSIUSDT': 41.108, 'HOTUSDT': 46.466, 'CHRUSDT': 61.091, 'MANAUSDT': 62.143, 'NKNUSDT': 70.636, 'ONEUSDT': 84.132, 'DENTUSDT': 99.973, 'DOGEUSDT': 121.447, 'SOLUSDT': 140.296, 'MATICUSDT': 161.846, 'FTMUSDT': 192.507, 'SANDUSDT': 203.219, 'AXSUSDT': 270.41

Currencies currently down more than 80% from their highest point (each value is the current price divided by the highest price):

ICPUSDT': 0.022, 'FILUSDT': 0.043, 'BAKEUSDT': 0.046, 'TLMUSDT': 0.05, 'LITUSDT': 0.053, 'LINAUSDT': 0.054, 'JASMYUSDT': 0.056, 'ALPHAUSDT': 0.062, 'RAYUSDT': 0.062, 'GRTUSDT': 0.067, 'DENTUSDT': 0.068, 'RSRUSDT': 0.068, 'XEMUSDT': 0.068, 'UNFIUSDT': 0.072, 'DYDXUSDT': 0.074, 'SUSHIUSDT': 0.074, 'OGNUSDT': 0.074, 'COMPUSDT': 0.074, 'NKNUSDT': 0.078, 'SKLUSDT': 0.08, 'DGBUSDT': 0.081, 'RLCUSDT': 0.085, 'REEFUSDT': 0.086, 'BANDUSDT': 0.086, 'HOTUSDT': 0.092, 'SRMUSDT': 0.092, 'RENUSDT': 0.092, 'BTSUSDT': 0.093, 'THETAUSDT': 0.094, 'FLMUSDT': 0.094, 'EOSUSDT': 0.095, 'TRBUSDT': 0.095, 'SXPUSDT': 0.095, 'ATAUSDT': 0.096, 'NEOUSDT': 0.096, 'FLOWUSDT': 0.097, 'YFIUSDT': 0.101, 'BALUSDT': 0.106, 'MASKUSDT': 0.106, 'ONTUSDT': 0.108, 'CELRUSDT': 0.108, 'AUDIOUSDT': 0.108, 'SCUSDT': 0.11, 'GALAUSDT': 0.113, 'GTCUSDT': 0.117, 'CTSIUSDT': 0.117, 'STMXUSDT': 0.118, 'DARUSDT': 0.118, 'ALICEUSDT': 0.119, 'SNXUSDT': 0.124, 'FTMUSDT': 0.126, 'BCHUSDT': 0.127, 'SFPUSDT': 0.127, 'ROSEUSDT': 0.128, 'DOGEUSDT': 0.128, 'RVNUSDT': 0.129, 'OCEANUSDT': 0.129, 'VETUSDT': 0.13, 'KSMUSDT': 0.131, 'ICXUSDT': 0.131, 'UNIUSDT': 0.131, 'ONEUSDT': 0.131, '1INCHUSDT': 0.134, 'IOTAUSDT': 0.139, 'C98USDT': 0.139, 'WAVESUSDT': 0.14, 'DUSKUSDT': 0.141, 'LINKUSDT': 0.143, 'DASHUSDT': 0.143, 'OMGUSDT': 0.143, 'PEOPLEUSDT': 0.143, 'AXSUSDT': 0.15, 'ENJUSDT': 0.15, 'QTUMUSDT': 0.152, 'SHIBUSDT': 0.154, 'ZENUSDT': 0.154, 'BLZUSDT': 0.154, 'ANTUSDT': 0.155, 'XECUSDT': 0.155, 'CHZUSDT': 0.158, 'RUNEUSDT': 0.163, 'ENSUSDT': 0.165, 'LRCUSDT': 0.167, 'CHRUSDT': 0.168, 'IOTXUSDT': 0.174, 'TOMOUSDT': 0.176, 'ALGOUSDT': 0.177, 'EGLDUSDT': 0.177, 'ARUSDT': 0.178, 'LTCUSDT': 0.178, 'HNTUSDT': 0.18, 'LPTUSDT': 0.181, 'SOLUSDT': 0.183, 'ARPAUSDT': 0.184, 'BELUSDT': 0.184, 'ETCUSDT': 0.186, 'ZRXUSDT': 0.187, 'AAVEUSDT': 0.187, 'CVCUSDT': 0.188, 'STORJUSDT': 0.189, 'COTIUSDT': 0.19, 'CELOUSDT': 0.191, 'SANDUSDT': 0.191, 'ADAUSDT': 0.192, 'HBARUSDT': 0.194, 'DOTUSDT': 0.195, 'XLMUSDT': 0.195
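
The two lists above are subsets of the full results printed further down. A minimal sketch of how such subsets could be extracted, assuming pandas Series shaped like the `max_up` and `draw_down` computed later in this document; the helper names and the three-entry sample Series are illustrative only (the sample values are copied from the printed results).

```python
import pandas as pd

def big_winners(max_up, threshold=10):
    # Keep pairs whose peak price exceeded `threshold` times the starting price
    return max_up[max_up > threshold].sort_values().round(3).to_dict()

def deep_drawdowns(draw_down, threshold=0.2):
    # Keep pairs whose current price is below `threshold` of the peak,
    # i.e. a drawdown of more than 80% for threshold=0.2
    return draw_down[draw_down < threshold].sort_values().round(3).to_dict()

# Small sample with values taken from the printed results
max_up = pd.Series({'BTCUSDT': 2.302, 'MKRUSDT': 10.294, 'AXSUSDT': 270.41})
draw_down = pd.Series({'ICPUSDT': 0.022, 'XLMUSDT': 0.195, 'BTCUSDT': 0.47})

print(big_winners(max_up))
print(deep_drawdowns(draw_down))
```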

#Download closing prices for all trading pairs
start_date = '2021-1-1'
end_date = '2022-05-30'
period = '1d'
df_all = pd.DataFrame(index=pd.date_range(start=start_date, end=end_date, freq=period),columns=symbols)
for i in range(len(symbols)):
    #print(symbols[i])
    symbol = symbols[i]
    df_s = GetKlines(symbol=symbol,start=start_date,end=end_date,period=period,base='api',v='v3')
    df_all[symbol] = df_s[~df_s.index.duplicated(keep='first')].close
#Index changes
df_norm = df_all/df_all.bfill().iloc[0] #Normalize each pair by its first available price
df_norm.mean(axis=1).plot(figsize=(15,6),grid=True);
#The highest increase relative to the first available price
max_up = df_all.max()/df_all.bfill().iloc[0]
print(max_up.map(lambda x:round(x,3)).sort_values().to_dict())
{'JASMYUSDT': 1.0, 'ICPUSDT': 1.0, 'LINAUSDT': 1.0, 'WOOUSDT': 1.0, 'GALUSDT': 1.0, 'PEOPLEUSDT': 1.0, 'XECUSDT': 1.026, 'ENSUSDT': 1.032, 'TLMUSDT': 1.039, 'IMXUSDT': 1.099, 'FLOWUSDT': 1.155, 'ATAUSDT': 1.216, 'DARUSDT': 1.261, 'ALICEUSDT': 1.312, 'BNXUSDT': 1.522, 'API3USDT': 1.732, 'GTCUSDT': 1.833, 'KLAYUSDT': 1.891, 'BAKEUSDT': 1.892, 'DYDXUSDT': 2.062, 'SHIBUSDT': 2.281, 'BTCUSDT': 2.302, 'MASKUSDT': 2.396, 'SFPUSDT': 2.74, 'LPTUSDT': 2.75, 'APEUSDT': 2.783, 'ARUSDT': 2.928, 'CELOUSDT': 2.951, 'ZILUSDT': 2.999, 'LTCUSDT': 3.072, 'SNXUSDT': 3.266, 'XEMUSDT': 3.555, 'XMRUSDT': 3.564, 'YFIUSDT': 3.794, 'BANDUSDT': 3.812, 'RAYUSDT': 3.924, 'REEFUSDT': 4.184, 'ANTUSDT': 4.205, 'XTZUSDT': 4.339, 'CTKUSDT': 4.352, 'LITUSDT': 4.38, 'RSRUSDT': 4.407, 'LINKUSDT': 4.412, 'BCHUSDT': 4.527, 'DASHUSDT': 5.037, 'BALUSDT': 5.172, 'OCEANUSDT': 5.277, 'EOSUSDT': 5.503, 'RENUSDT': 5.538, 'XLMUSDT': 5.563, 'TOMOUSDT': 5.567, 'ZECUSDT': 5.654, 'COMPUSDT': 5.87, 'DGBUSDT': 5.948, 'ALGOUSDT': 5.981, 'ONTUSDT': 5.997, 'BELUSDT': 6.101, 'TRXUSDT': 6.116, 'ZRXUSDT': 6.135, 'GRTUSDT': 6.45, '1INCHUSDT': 6.479, 'DOTUSDT': 6.502, 'ETHUSDT': 6.596, 'KAVAUSDT': 6.687, 'ICXUSDT': 6.74, 'SUSHIUSDT': 6.848, 'AAVEUSDT': 6.931, 'BTSUSDT': 6.961, 'KNCUSDT': 6.966, 'C98USDT': 7.091, 'THETAUSDT': 7.222, 'ATOMUSDT': 7.553, 'OMGUSDT': 7.556, 'SXPUSDT': 7.681, 'UNFIUSDT': 7.696, 'XRPUSDT': 7.726, 'TRBUSDT': 8.241, 'BLZUSDT': 8.434, 'NEOUSDT': 8.491, 'FLMUSDT': 8.506, 'KSMUSDT': 8.571, 'FILUSDT': 8.591, 'IOTAUSDT': 8.616, 'BATUSDT': 8.647, 'ARPAUSDT': 9.055, 'UNIUSDT': 9.104, 'WAVESUSDT': 9.106, 'MKRUSDT': 10.294, 'CRVUSDT': 10.513, 'STORJUSDT': 10.674, 'SKLUSDT': 11.009, 'CVCUSDT': 11.026, 'SRMUSDT': 11.031, 'QTUMUSDT': 12.066, 'ALPHAUSDT': 12.103, 'ZENUSDT': 12.631, 'VETUSDT': 13.296, 'ROSEUSDT': 13.429, 'FTTUSDT': 13.705, 'IOSTUSDT': 13.786, 'COTIUSDT': 13.958, 'NEARUSDT': 14.855, 'HBARUSDT': 15.312, 'RLCUSDT': 15.432, 'SCUSDT': 15.6, 'GALAUSDT': 15.722, 'RUNEUSDT': 15.795, 'ADAUSDT': 16.94, 
'MTLUSDT': 17.18, 'BNBUSDT': 17.899, 'RVNUSDT': 18.169, 'EGLDUSDT': 18.879, 'LRCUSDT': 19.499, 'ANKRUSDT': 21.398, 'ETCUSDT': 23.51, 'DUSKUSDT': 23.55, 'AUDIOUSDT': 25.306, 'OGNUSDT': 25.524, 'GMTUSDT': 28.83, 'ENJUSDT': 33.073, 'STMXUSDT': 33.18, 'IOTXUSDT': 35.866, 'AVAXUSDT': 36.946, 'CHZUSDT': 37.128, 'CELRUSDT': 37.273, 'HNTUSDT': 38.779, 'CTSIUSDT': 41.108, 'HOTUSDT': 46.466, 'CHRUSDT': 61.091, 'MANAUSDT': 62.143, 'NKNUSDT': 70.636, 'ONEUSDT': 84.132, 'DENTUSDT': 99.973, 'DOGEUSDT': 121.447, 'SOLUSDT': 140.296, 'MATICUSDT': 161.846, 'FTMUSDT': 192.507, 'SANDUSDT': 203.219, 'AXSUSDT': 270.41}
#Current drawdown from the peak (current price / highest price)
draw_down = df_all.iloc[-1]/df_all.max()
print(draw_down.map(lambda x:round(x,3)).sort_values().to_dict())
{'ICPUSDT': 0.022, 'FILUSDT': 0.043, 'BAKEUSDT': 0.046, 'TLMUSDT': 0.05, 'LITUSDT': 0.053, 'LINAUSDT': 0.054, 'JASMYUSDT': 0.056, 'ALPHAUSDT': 0.062, 'RAYUSDT': 0.062, 'GRTUSDT': 0.067, 'DENTUSDT': 0.068, 'RSRUSDT': 0.068, 'XEMUSDT': 0.068, 'UNFIUSDT': 0.072, 'DYDXUSDT': 0.074, 'SUSHIUSDT': 0.074, 'OGNUSDT': 0.074, 'COMPUSDT': 0.074, 'NKNUSDT': 0.078, 'SKLUSDT': 0.08, 'DGBUSDT': 0.081, 'RLCUSDT': 0.085, 'REEFUSDT': 0.086, 'BANDUSDT': 0.086, 'HOTUSDT': 0.092, 'SRMUSDT': 0.092, 'RENUSDT': 0.092, 'BTSUSDT': 0.093, 'THETAUSDT': 0.094, 'FLMUSDT': 0.094, 'EOSUSDT': 0.095, 'TRBUSDT': 0.095, 'SXPUSDT': 0.095, 'ATAUSDT': 0.096, 'NEOUSDT': 0.096, 'FLOWUSDT': 0.097, 'YFIUSDT': 0.101, 'BALUSDT': 0.106, 'MASKUSDT': 0.106, 'ONTUSDT': 0.108, 'CELRUSDT': 0.108, 'AUDIOUSDT': 0.108, 'SCUSDT': 0.11, 'GALAUSDT': 0.113, 'GTCUSDT': 0.117, 'CTSIUSDT': 0.117, 'STMXUSDT': 0.118, 'DARUSDT': 0.118, 'ALICEUSDT': 0.119, 'SNXUSDT': 0.124, 'FTMUSDT': 0.126, 'BCHUSDT': 0.127, 'SFPUSDT': 0.127, 'ROSEUSDT': 0.128, 'DOGEUSDT': 0.128, 'RVNUSDT': 0.129, 'OCEANUSDT': 0.129, 'VETUSDT': 0.13, 'KSMUSDT': 0.131, 'ICXUSDT': 0.131, 'UNIUSDT': 0.131, 'ONEUSDT': 0.131, '1INCHUSDT': 0.134, 'IOTAUSDT': 0.139, 'C98USDT': 0.139, 'WAVESUSDT': 0.14, 'DUSKUSDT': 0.141, 'LINKUSDT': 0.143, 'DASHUSDT': 0.143, 'OMGUSDT': 0.143, 'PEOPLEUSDT': 0.143, 'AXSUSDT': 0.15, 'ENJUSDT': 0.15, 'QTUMUSDT': 0.152, 'SHIBUSDT': 0.154, 'ZENUSDT': 0.154, 'BLZUSDT': 0.154, 'ANTUSDT': 0.155, 'XECUSDT': 0.155, 'CHZUSDT': 0.158, 'RUNEUSDT': 0.163, 'ENSUSDT': 0.165, 'LRCUSDT': 0.167, 'CHRUSDT': 0.168, 'IOTXUSDT': 0.174, 'TOMOUSDT': 0.176, 'ALGOUSDT': 0.177, 'EGLDUSDT': 0.177, 'ARUSDT': 0.178, 'LTCUSDT': 0.178, 'HNTUSDT': 0.18, 'LPTUSDT': 0.181, 'SOLUSDT': 0.183, 'ARPAUSDT': 0.184, 'BELUSDT': 0.184, 'ETCUSDT': 0.186, 'ZRXUSDT': 0.187, 'AAVEUSDT': 0.187, 'CVCUSDT': 0.188, 'STORJUSDT': 0.189, 'COTIUSDT': 0.19, 'CELOUSDT': 0.191, 'SANDUSDT': 0.191, 'ADAUSDT': 0.192, 'HBARUSDT': 0.194, 'DOTUSDT': 0.195, 'XLMUSDT': 0.195, 'AVAXUSDT': 0.206, 
'ANKRUSDT': 0.207, 'MTLUSDT': 0.208, 'MANAUSDT': 0.209, 'CRVUSDT': 0.213, 'API3USDT': 0.221, 'IOSTUSDT': 0.227, 'XRPUSDT': 0.228, 'BATUSDT': 0.228, 'MKRUSDT': 0.229, 'MATICUSDT': 0.229, 'CTKUSDT': 0.233, 'ZILUSDT': 0.233, 'WOOUSDT': 0.234, 'ATOMUSDT': 0.237, 'KLAYUSDT': 0.239, 'XTZUSDT': 0.245, 'IMXUSDT': 0.278, 'NEARUSDT': 0.285, 'GALUSDT': 0.299, 'APEUSDT': 0.305, 'ZECUSDT': 0.309, 'KAVAUSDT': 0.31, 'GMTUSDT': 0.327, 'FTTUSDT': 0.366, 'KNCUSDT': 0.401, 'ETHUSDT': 0.416, 'XMRUSDT': 0.422, 'BTCUSDT': 0.47, 'BNBUSDT': 0.476, 'TRXUSDT': 0.507, 'BNXUSDT': 0.64}

First, we use the simplest possible code to simulate a price that falls all the way down and see the long liquidation price for different position values. Since the strategy always holds a long position, a rising price carries no risk. The initial capital is 1000, the currency price is 1, and the adjustment ratio is 0.01. The results are shown below. The risk of long liquidation is clearly not low: with 1.5x leverage (a position value of 1500), the position can withstand a decline of about 50%. Given the current relative bottom, this is an acceptable risk.

Position value    Long liquidation price
300               0.035
500               0.133
800               0.285
1000              0.362
1500              0.51
2000              0.599
3000              0.711
5000              0.81
10000             0.904

for Hold_value in [300,500,800,1000,1500,2000,3000,5000,10000]:
    amount = Hold_value/1
    hold_price = 1
    margin = 1000
    Pct = 0.01
    i = 0
    while margin > 0:
        i += 1
        if i>500:
            break
        buy_price = (1-Pct)*Hold_value/amount
        buy_amount = Hold_value*Pct/buy_price
        hold_price = (amount * hold_price + buy_amount * buy_price) / (buy_amount + amount)
        amount += buy_amount
        margin = 1000 + amount * (buy_price - hold_price)
    print(Hold_value, round(buy_price,3))
300 0.035
500 0.133
800 0.285
1000 0.362
1500 0.51
2000 0.599
3000 0.711
5000 0.81
10000 0.904
#Still using the original backtesting engine
class Exchange:

    def __init__(self, trade_symbols, fee=0.0004, initial_balance=10000):
        self.initial_balance = initial_balance #Initial assets
        self.fee = fee
        self.trade_symbols = trade_symbols
        self.account = {'USDT':{'realised_profit':0, 'unrealised_profit':0, 'total':initial_balance, 'fee':0}}
        for symbol in trade_symbols:
            self.account[symbol] = {'amount':0, 'hold_price':0, 'value':0, 'price':0, 'realised_profit':0,'unrealised_profit':0,'fee':0}

    def Trade(self, symbol, direction, price, amount):

        cover_amount = 0 if direction*self.account[symbol]['amount'] >=0 else min(abs(self.account[symbol]['amount']), amount)
        open_amount = amount - cover_amount
        self.account['USDT']['realised_profit'] -= price*amount*self.fee #Deduct the handling fees
        self.account['USDT']['fee'] += price*amount*self.fee
        self.account[symbol]['fee'] += price*amount*self.fee

        if cover_amount > 0: #Close the position first
            self.account['USDT']['realised_profit'] += -direction*(price - self.account[symbol]['hold_price'])*cover_amount  #Profits
            self.account[symbol]['realised_profit'] += -direction*(price - self.account[symbol]['hold_price'])*cover_amount

            self.account[symbol]['amount'] -= -direction*cover_amount
            self.account[symbol]['hold_price'] = 0 if self.account[symbol]['amount'] == 0 else self.account[symbol]['hold_price']

        if open_amount > 0:
            total_cost = self.account[symbol]['hold_price']*direction*self.account[symbol]['amount'] + price*open_amount
            total_amount = direction*self.account[symbol]['amount']+open_amount

            self.account[symbol]['hold_price'] = total_cost/total_amount
            self.account[symbol]['amount'] += direction*open_amount


    def Buy(self, symbol, price, amount):
        self.Trade(symbol, 1, price, amount)

    def Sell(self, symbol, price, amount):
        self.Trade(symbol, -1, price, amount)

    def Update(self, close_price): #Update of assets
        self.account['USDT']['unrealised_profit'] = 0
        for symbol in self.trade_symbols:
            self.account[symbol]['unrealised_profit'] = (close_price[symbol] - self.account[symbol]['hold_price'])*self.account[symbol]['amount']
            self.account[symbol]['price'] = close_price[symbol]
            self.account[symbol]['value'] = abs(self.account[symbol]['amount'])*close_price[symbol]
            self.account['USDT']['unrealised_profit'] += self.account[symbol]['unrealised_profit']
        self.account['USDT']['total'] = round(self.account['USDT']['realised_profit'] + self.initial_balance + self.account['USDT']['unrealised_profit'],6)

First, we backtest the TRX balance strategy. TRX's maximum drawdown in this bear market has been relatively small, so it is a somewhat special case. The data is the 5-minute K-line from 2021 to the present, with an initial capital of 1000, an adjustment ratio of 0.01, a position value of 2000, and a fee of 0.0002.

The initial price of TRX was 0.02676U, the highest price during the period reached 0.18U, and it is currently around 0.08U, so the fluctuations have been very violent. If you had run the long-short grid strategy from the beginning, it would have been hard to avoid short liquidation. The balance strategy has less of a problem here.

The final return of the backtest is 4524U, which is very close to the return of TRX at 0.18. Leverage is no higher than 2x at the start and ends below 0.4x, so the possibility of liquidation keeps getting lower, and along the way there may be opportunities to increase the position value. However, the income always comes from a position of no more than 2000U, which is one of the disadvantages of the balance strategy.

symbol = 'TRXUSDT'
df_trx = GetKlines(symbol=symbol,start='2021-1-1',end='2022-5-30',period='5m')
df_trx.close.plot(figsize=(15,6),grid=True);
#TRX balance strategy backtest
hold_value = 2000
pct = 0.01
e = Exchange([symbol], fee=0.0002, initial_balance=1000)
init_price =  df_trx.iloc[0].open
res_list = [] #For storing intermediate results
e.Buy(symbol,init_price,hold_value/init_price)
e.Update({symbol:init_price})
for row in df_trx.itertuples():
    buy_price = (1-pct)*hold_value/e.account[symbol]['amount']
    sell_price = (1+pct)*hold_value/e.account[symbol]['amount']

    while row.low < buy_price:
        e.Buy(symbol,buy_price,pct*hold_value/buy_price)
        e.Update({symbol:row.close})
        buy_price = (1-pct)*hold_value/e.account[symbol]['amount']
        sell_price = (1+pct)*hold_value/e.account[symbol]['amount']
    while row.high > sell_price:
        e.Sell(symbol,sell_price,pct*hold_value/sell_price)
        e.Update({symbol:row.close})
        buy_price = (1-pct)*hold_value/e.account[symbol]['amount']
        sell_price = (1+pct)*hold_value/e.account[symbol]['amount']
    if int(row.time)%(60*60*1000) == 0:
        e.Update({symbol:row.close})
        res_list.append([row.time, row.close, e.account[symbol]['amount'],e.account[symbol]['amount']*row.close, e.account['USDT']['total']-e.initial_balance])
res_trx = pd.DataFrame(data=res_list, columns=['time','price','amount','value','profit'])
res_trx.index = pd.to_datetime(res_trx.time,unit='ms')
print(pct,e.account['USDT']['realised_profit']+e.account['USDT']['unrealised_profit'] ,round(e.account['USDT']['fee'],0))
0.01 4524.226998288555 91.0
#Profit
res_trx.profit.plot(figsize=(15,6),grid=True);
#Actual leverage in use (position value / account equity)
(res_trx.value/(res_trx.profit+1000)).plot(figsize=(15,6),grid=True);

Next, let's backtest WAVES. This currency is quite special: it initially rose from 6U to 60U and then fell back to the current 8U. The final profit is 4945U, far more than the profit from simply holding the currency.
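
For a rough sense of scale, the buy-and-hold profit can be estimated with simple arithmetic. The sketch below assumes the same 2000U position is bought at the start and never adjusted, and uses the approximate 6U and 8U prices quoted above; the function name is hypothetical and fees are ignored.

```python
def buy_and_hold_profit(position_value, entry_price, exit_price):
    # Coins bought once at the entry price and never rebalanced
    amount = position_value / entry_price
    return amount * (exit_price - entry_price)

# Approximate WAVES prices from the text: bought near 6U, now near 8U
profit = buy_and_hold_profit(position_value=2000, entry_price=6, exit_price=8)
print(round(profit, 2))
```

Under these assumptions the static position earns roughly 667U, compared with the 4945U reported by the balance strategy backtest.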

symbol = 'WAVESUSDT'
df_waves = GetKlines(symbol=symbol,start='2021-1-1',end='2022-5-30',period='5m')
df_waves.close.plot(figsize=(15,6),grid=True);
#WAVES balance strategy backtest
hold_value = 2000
pct = 0.01
e = Exchange([symbol], fee=0.0002, initial_balance=1000)
init_price =  df_waves.iloc[0].open
res_list = [] #For storing intermediate results
e.Buy(symbol,init_price,hold_value/init_price)
e.Update({symbol:init_price})
for row in df_waves.itertuples():
    buy_price = (1-pct)*hold_value/e.account[symbol]['amount']
    sell_price = (1+pct)*hold_value/e.account[symbol]['amount']

    while row.low < buy_price:
        e.Buy(symbol,buy_price,pct*hold_value/buy_price)
        e.Update({symbol:row.close})
        buy_price = (1-pct)*hold_value/e.account[symbol]['amount']
        sell_price = (1+pct)*hold_value/e.account[symbol]['amount']
    while row.high > sell_price:
        e.Sell(symbol,sell_price,pct*hold_value/sell_price)
        e.Update({symbol:row.close})
        buy_price = (1-pct)*hold_value/e.account[symbol]['amount']
        sell_price = (1+pct)*hold_value/e.account[symbol]['amount']
    if int(row.time)%(60*60*1000) == 0:
        e.Update({symbol:row.close})
        res_list.append([row.time, row.close, e.account[symbol]['amount'],e.account[symbol]['amount']*row.close, e.account['USDT']['total']-e.initial_balance])
res_waves = pd.DataFrame(data=res_list, columns=['time','price','amount','value','profit'])
res_waves.index = pd.to_datetime(res_waves.time,unit='ms')
print(pct,e.account['USDT']['realised_profit']+e.account['USDT']['unrealised_profit'] ,round(e.account['USDT']['fee'],0))
0.01 4945.149323437233 178.0
res_waves.profit.plot(figsize=(15,6),grid=True);

For comparison, we also backtest the grid strategy, with a grid spacing of 0.01 and a grid value of 10. With prices rising nearly 10-fold, both WAVES and TRX experienced huge drawdowns: about 5000U for WAVES and more than 3000U for TRX. If the initial capital is small, the positions would most likely be liquidated.
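
The grid price formulas used in the backtest code can be checked in isolation. The sketch below copies those two formulas into a hypothetical helper, `grid_prices`, and walks down three buy levels from an assumed initial price of 1. With these formulas, each level sits one grid spacing (as a ratio) below the previous one.

```python
def grid_prices(init_price, amount, value=10, pct=0.01):
    # Same expressions as the buy_price / sell_price lines in the grid backtest
    base = value / (pct * init_price)
    buy_price = (value / pct - value) / (base + amount)
    sell_price = (value / pct + value) / (base + amount)
    return buy_price, sell_price

init_price, value, pct = 1.0, 10, 0.01  # initial price of 1 is an assumption
amount = 0.0
levels = []
for _ in range(3):
    buy_price, _sell_price = grid_prices(init_price, amount, value, pct)
    levels.append(buy_price)
    amount += value / buy_price  # each grid order buys `value` worth of coins

print(levels)
```

The levels come out as `init_price * (1 - pct) ** k` for k = 1, 2, 3, so the grid is geometric around the initial price. That is why the result is so sensitive to the chosen initial price.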

#Grid strategy (WAVES)
pct = 0.01
value = 10*pct/0.01
e = Exchange([symbol], fee=0.0002, initial_balance=1000)
init_price =  df_waves.iloc[0].open
res_list = [] #For storing intermediate results
for row in df_waves.itertuples():
    buy_price = (value / pct - value) / (value / (pct * init_price) + e.account[symbol]['amount']) 
    sell_price = (value / pct + value) / (value / (pct *init_price) + e.account[symbol]['amount'])

    while row.low < buy_price:
        e.Buy(symbol,buy_price,value/buy_price)
        e.Update({symbol:row.close})
        buy_price = (value / pct - value) / (value / (pct * init_price) + e.account[symbol]['amount']) #The buy order price; since it is a pending (limit) order, it is also the final matching price
    while row.high > sell_price:
        e.Sell(symbol,sell_price,value/sell_price)
        e.Update({symbol:row.close})
        sell_price = (value / pct + value) / (value / (pct *init_price) + e.account[symbol]['amount'])
    if int(row.time)%(60*60*1000) == 0:
        e.Update({symbol:row.close})
        res_list.append([row.time, row.close, e.account[symbol]['amount'],e.account[symbol]['amount']*row.close, e.account['USDT']['total']-e.initial_balance])
res_waves_net = pd.DataFrame(data=res_list, columns=['time','price','amount','value','profit'])
res_waves_net.index = pd.to_datetime(res_waves_net.time,unit='ms')
print(pct,e.account['USDT']['realised_profit']+e.account['USDT']['unrealised_profit'] ,round(e.account['USDT']['fee'],0))
0.01 1678.0516101975015 70.0
res_waves_net.profit.plot(figsize=(15,6),grid=True);
#Grid strategy (TRX)
symbol = 'TRXUSDT'
pct = 0.01
value = 10*pct/0.01
e = Exchange([symbol], fee=0.0002, initial_balance=1000)
init_price =  df_trx.iloc[0].open
res_list = [] #For storing intermediate results
for row in df_trx.itertuples():
    buy_price = (value / pct - value) / (value / (pct * init_price) + e.account[symbol]['amount']) 
    sell_price = (value / pct + value) / (value / (pct *init_price) + e.account[symbol]['amount'])

    while row.low < buy_price:
        e.Buy(symbol,buy_price,value/buy_price)
        e.Update({symbol:row.close})
        buy_price = (value / pct - value) / (value / (pct * init_price) + e.account[symbol]['amount']) 
    while row.high > sell_price:
        e.Sell(symbol,sell_price,value/sell_price)
        e.Update({symbol:row.close})
        sell_price = (value / pct + value) / (value / (pct *init_price) + e.account[symbol]['amount'])
    if int(row.time)%(60*60*1000) == 0:
        e.Update({symbol:row.close})
        res_list.append([row.time, row.close, e.account[symbol]['amount'],e.account[symbol]['amount']*row.close, e.account['USDT']['total']-e.initial_balance])
res_trx_net = pd.DataFrame(data=res_list, columns=['time','price','amount','value','profit'])
res_trx_net.index = pd.to_datetime(res_trx_net.time,unit='ms')
print(pct,e.account['USDT']['realised_profit']+e.account['USDT']['unrealised_profit'] ,round(e.account['USDT']['fee'],0))
0.01 -161.06952570521656 37.0
res_trx_net.profit.plot(figsize=(15,6),grid=True);
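
The drawdowns visible in these profit curves can also be measured numerically. The source does not say how its drawdown figures were obtained, so this is just one common definition: the largest drop of the profit curve below its running peak. The helper name and the toy profit series are illustrative. In the notebook it could be applied to `res_waves_net.profit` or `res_trx_net.profit`.

```python
import pandas as pd

def max_drawdown(profit):
    # Largest distance between the running peak of the curve and the curve itself
    running_peak = profit.cummax()
    return (running_peak - profit).max()

# Toy profit curve in U (hypothetical numbers, not backtest output)
profit = pd.Series([0, 500, 300, 1200, -400, 200])
print(max_drawdown(profit))
```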

Summary

This backtest used the 5-minute K-line, so fluctuations within each bar are not fully simulated, and actual profits should be slightly higher. Overall, the balance strategy bears relatively little risk, is not afraid of sharp rallies, and needs no parameter tuning, so it is easy to use and suitable for novice users. The grid strategy is very sensitive to the initial price setting and requires some judgment about the market. In the long run, its risk from going short is high. The current bear market has been stable at the bottom for some time, and many currencies are down more than 90% from their highs. If you are optimistic about some currencies, this is a good time to enter the market: you may want to run a balance strategy to buy the bottom, add a little leverage, and profit from both volatility and price increases.

The Binance Thousand League Battle will provide free access to the perpetual balance strategy, and everyone is welcome to experience it.


How did the algorithm pegging TerraUSD stablecoins to USD fail?

This month the TerraUSD stablecoin and the associated Luna reserve cryptocurrency crashed and lost most of their value. About $45 billion in market capitalization vanished within a week.

Apparently there was some sort of sell-off, but then TerraUSD was supposed to be a stablecoin, being pegged to the US dollar via a complex algorithmic relationship. That algorithm must have failed.

How has it failed? How fragile was this pegging of a cryptocurrency to the USD?

I guess that different currencies can ultimately never be pegged perfectly and a stablecoin is a bit of a misnomer, but still I wonder how easy it might have been to break this pegging.

miohtama
77 days ago
Helsinki, Finland

How Africa Can Navigate Growing Monetary Policy Challenges


By Tobias Adrian, Gaston Gelos, and David Hofman

Tools such as foreign exchange intervention can ease the effects of shocks but need to be carefully weighed against potential longer-term costs.

Sub-Saharan African countries face important monetary policy challenges. The pandemic dented economic growth, and even now the recovery is likely to leave output below the pre-crisis trend this year. Several countries in the region have also seen inflation increase, a challenge that is in some cases compounded by fiscal dominance emanating from high public debt levels.

Many of these economies may also face capital outflows as the major central banks in advanced economies withdraw policy stimulus and raise interest rates in the period ahead. The economic impact of the conflict raging in Ukraine—including the attendant sharp rise in energy and food prices—is likely to further intensify the challenges.

How should countries in sub-Saharan Africa manage this volatile environment?

Exchange rate considerations

Countries with managed or free-floating exchange rate regimes generally benefit from allowing currencies to adjust, while focusing monetary policy on domestic objectives.

That said, many countries in sub-Saharan Africa with floating exchange rate regimes have characteristics and vulnerabilities that can limit the benefits from fully flexible rates. For instance, dominant currency pricing (i.e., rigid export prices in US dollar terms) can weaken the beneficial trade adjustments associated with flexible rates.

Moreover, shallow markets (i.e., markets with limited liquidity) can amplify exchange rate movements and yield excessive volatility. Foreign exchange markets tend to be shallow in many countries in the region, as evidenced by wide spreads between bid and ask prices.

High foreign-currency denominated liabilities are also a key vulnerability in several economies. In the presence of large currency mismatches on balance sheets, exchange rate depreciations can undermine the financial health of corporates and households. And weak central bank credibility can cause exchange rate changes to have a bigger effect on inflation (high passthrough). Such currency mismatches and high passthrough can cause output and inflation to move in opposite directions following shocks, thereby worsening the tradeoffs that policymakers face.

There is also evidence that the exchange rate passthrough in low-income countries is substantially higher than it is in more advanced economies, which poses a particular problem given the often heavy dependence on food and energy imports.

How should countries that exhibit such vulnerabilities manage their policy responses?

First, it remains important to reduce the vulnerabilities over time. This includes reducing balance sheet mismatches; developing money and foreign currency markets; and reducing exchange rate passthrough by building monetary policy credibility. Many of these are areas where IMF technical assistance can help.

But in the near term, while vulnerabilities remain high, the IMF’s work toward an Integrated Policy Framework suggests that using additional tools may help ease short-term policy trade-offs when certain shocks hit. In particular, where reserves are adequate and these tools are available, foreign exchange intervention, macroprudential policy measures, and capital flow measures can help enhance monetary policy autonomy, improve financial and price stability, and reduce output volatility.

For instance, simulations with the framework’s models suggest that in response to a sharp tightening of global financial conditions or other negative external financial shock, a country exhibiting such vulnerabilities could improve immediate economic outcomes by using foreign exchange intervention to reduce exchange rate depreciation and thereby limit the inflationary impact and reduce negative balance sheet effects. This results in higher output and lower inflation than would have been feasible without the use of the additional policy instrument.

For central banks considering such policies, however, a few important qualifiers are in order. Importantly, the tools should not be used to maintain an over- or undervalued exchange rate. Moreover, while additional tools can help alleviate short-term tradeoffs, this benefit needs to be carefully weighed against potential longer-term costs. Such costs may include, for instance, reduced incentives for market development and appropriate risk management in the private sector.

Communicating about the joint use of multiple tools in a more complex framework can be very challenging, too, and expanding the set of policy options may subject central banks to political pressures. Central banks will thus need to weigh the benefits against potential negative impacts on their own transparency and credibility, especially in circumstances where policy frameworks are not yet well established.


miohtama
127 days ago
Helsinki, Finland

Rather Than Sink Main Street by Raising Interest Rates, the Fed Could Save It. Here’s How.

Inflation is plaguing consumer markets, putting pressure on the Federal Reserve to raise interest rates to tighten the money supply. But as Rex Nutting writes in a MarketWatch column titled “Why Interest Rates Aren’t Really the Right Tool to Control Inflation”: It may be heresy to those who think the Fed is all-powerful, but the honest answer […]

miohtama
178 days ago
Helsinki, Finland

Franck Pachot: 🐘🚀 Triggers & Stored Procedures for pure data integrity logic and performance


This post shows a hybrid approach between the "optimize for one use case" idea of document databases (where a single table holds all information) and the "one database for many use cases" idea of relational databases (where relational data modeling allows multiple access patterns). I'm basing this post on a question by @Manish in our Slack forum and will show a demo on YugabyteDB, but all the code is valid with PostgreSQL.

The idea is to store some post content by user id, with a list of tags and groups (friend circles). From a user's point of view, the critical use cases are inserting a new post and retrieving posts by user. Here is the table that fits this main case:

CREATE TABLE posts_by_user(
    user_id     bigint,
    post_id     bigint generated always as identity,
    group_ids   bigint[] null,
    tag_ids     bigint[] null,
    content     text     null,
    created_date timestamptz,
    PRIMARY KEY (user_id, created_date, post_id),
    UNIQUE      (user_id, post_id)
);

This follows a single-table data model, each row being a document with its content and its lists of group_ids and tag_ids. In PostgreSQL, this creates a heap table and a secondary index for the primary key. In YugabyteDB, a table is stored clustered on the primary key, to allow fast point and range access without a secondary index. I didn't mention the sharding method in the CREATE statement, to keep the code compatible with PostgreSQL. The YugabyteDB default is hash sharding on the first column and range sharding on the following ones, so this is equivalent to PRIMARY KEY (user_id HASH, created_date ASC, post_id ASC).

The business key is (user_id, post_id), and I enforce it with a UNIQUE constraint. For the primary key, however, I'm adding the date. The drawback of having created_date in the primary key is that the whole document is rewritten if created_date is updated, which is probably never the case here. The advantage is fast access to a time range when looking at one user's posts. This is something to decide once all access patterns are known. For YugabyteDB, it would be better to set the sharding options explicitly, with a descending order: PRIMARY KEY (user_id HASH, created_date DESC, post_id ASC).
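To make the benefit of this key order concrete, here is a minimal Python sketch. It assumes a plain sorted list as a stand-in for an index ordered on (user_id, created_date, post_id); the sample rows and the posts_in_range helper are hypothetical, and this is not how either database stores data.

```python
import bisect
from datetime import datetime, timedelta

# Hypothetical in-memory stand-in for an index ordered on
# (user_id, created_date, post_id); illustration only.
base = datetime(2021, 12, 20)
rows = sorted(
    (user_id, base - timedelta(days=age_days), post_id)
    for post_id, (user_id, age_days) in enumerate(
        [(1, 0), (1, 1), (1, 5), (2, 0), (2, 3), (1, 30)], start=1
    )
)

def posts_in_range(user_id, since):
    """Return (created_date, post_id) for one user, newest first."""
    # Both bounds are found by binary search, so the work depends on the
    # size of the slice returned, not on the number of rows stored.
    # Note: the lower bound here is inclusive (created_date >= since).
    lo = bisect.bisect_left(rows, (user_id, since))
    hi = bisect.bisect_left(rows, (user_id + 1,))
    return [(created, post_id) for _, created, post_id in reversed(rows[lo:hi])]

recent = posts_in_range(1, base - timedelta(days=2))
print(recent)
```

Reading the slice backward gives the newest posts first without a sort step, which is the effect a descending created_date in the key is meant to make cheap.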

GIN indexes

Other use cases involve getting posts by tag_id or by group_id to fill the feed list. With this single-table design, I can create GIN indexes on those arrays:

create index posts_by_user_group_ids on posts_by_user using gin (group_ids);
create index posts_by_user_tag_ids   on posts_by_user using gin (tag_ids);

GIN indexes are supported in YugabyteDB as of version 2.11, and the feature roadmap is tracked in #7850.

However, I cannot add created_date to them. This is a limitation inherited from PostgreSQL. Trying to add it in the INCLUDE clause raises ERROR: access method "gin" does not support included columns, and trying to add it to the indexed columns raises ERROR: data type timestamp with time zone has no default operator class for access method "gin".

Because of this limitation, I'll not create those GIN indexes here for the queries that are based on a range of created_date. However, there is still a place where a GIN index may help: text search on the post content. I described this in a previous post.

Normalization

So, what is the solution in a SQL database? This is the very reason relational databases were invented: to allow efficient queries over multiple access paths without compromising data integrity. Rather than storing the post tags and groups as arrays within the posts, I can store each in its own table:

CREATE TABLE posts_by_tag(
    tag_id     bigint not null,
    user_id    bigint not null,
    post_id    bigint not null,
    created_date timestamptz not null,
    PRIMARY KEY (tag_id, created_date, user_id, post_id),
    UNIQUE      (tag_id, user_id, post_id),
    FOREIGN key (user_id, created_date, post_id) references posts_by_user (user_id, created_date, post_id) on delete cascade
);

CREATE TABLE posts_by_group(
    group_id   bigint not null,
    user_id    bigint not null,
    post_id    bigint not null,
    created_date timestamptz not null,
    PRIMARY KEY (group_id,created_date, user_id,post_id),
    UNIQUE      (group_id, user_id, post_id),
    FOREIGN key (user_id, created_date, post_id) references posts_by_user (user_id, created_date, post_id) on delete cascade
);

I've declared the foreign key to guarantee the consistency of the data, and it contains created_date. This is on purpose: I want to filter on a time range from this table before going to read the main table to get the content. This is why I didn't create the GIN indexes, which don't allow it. I've also declared the uniqueness of the columns without created_date to enforce data integrity. My goal here is to treat the posts_by_group and posts_by_tag tables like secondary indexes, exactly like the GIN indexes I wanted to build, but with additional columns to match the selectivity of the use cases. I'll not update those additional tables directly; they will be maintained automatically when the main posts_by_user table is updated. And this is where I need a trigger.
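The read path this design aims for can be sketched in Python as follows. This is only an illustration under stated assumptions: the dictionaries, sample rows, and the latest_posts_for_tag helper are hypothetical in-memory stand-ins, not the database's own structures.

```python
from datetime import date

# Hypothetical in-memory stand-ins for the two tables (illustration only).
# Main table: primary key (user_id, created_date, post_id) -> content.
posts_by_user = {
    (1, date(2021, 12, 1), 10): "first",
    (2, date(2021, 12, 5), 11): "second",
    (1, date(2021, 6, 1), 12): "old",
}
# Secondary table kept in its primary-key order:
# (tag_id, created_date, user_id, post_id).
posts_by_tag = sorted([
    (7, date(2021, 12, 1), 1, 10),
    (7, date(2021, 12, 5), 2, 11),
    (7, date(2021, 6, 1), 1, 12),
    (8, date(2021, 12, 5), 2, 11),
])

def latest_posts_for_tag(tag_id, since, limit):
    # Step 1: filter on tag and date using only the secondary table,
    # reading it backward to get the newest entries first.
    keys = [(u, d, p) for t, d, u, p in reversed(posts_by_tag)
            if t == tag_id and d > since][:limit]
    # Step 2: one point lookup in the main table per surviving key.
    return [posts_by_user[key] for key in keys]

result = latest_posts_for_tag(7, date(2021, 11, 20), limit=100)
print(result)
```

Because created_date is carried in the secondary table, the old post is discarded in step 1 and never causes a lookup in the main table.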

Trigger and Atomic Procedure

Ideally, the application code focuses on business logic. Data integrity should be fully implemented in the database. The best is with declarative constraints like the FOREIGN KEY I defined above. Or, when declarative is not possible, with triggers and procedures. And this is what I'm showing here.

The application will simply insert (or update or delete) on the main table posts_by_user.

I'm keeping the tag_ids and group_ids arrays there to get everything in one document when accessing a post by its primary key. This also simplifies the logic that maintains the secondary tables posts_by_tag and posts_by_group, because any DML on the main table knows the old and new values. Thanks to this, there is no need for other indexes on those secondary tables. The logic is simple and can be optimized if needed: here I delete the entries for the old values and insert those for the new ones. This is exactly how a secondary index works, but here I'm coding it in a procedure:

CREATE OR REPLACE FUNCTION posts_by_user_dml()
RETURNS TRIGGER AS
$$
declare
  loop_tag_id   bigint;
  loop_group_id bigint;
begin
  -- OLD is null on INSERT and NEW is null on DELETE (PostgreSQL 11 and later),
  -- so the null checks below also skip the branches that do not apply.
  if old.tag_ids is not null then
    -- delete entries for old values
    foreach loop_tag_id in array old.tag_ids loop
      delete from posts_by_tag t
       where t.tag_id  = loop_tag_id
         and t.user_id = old.user_id
         and t.post_id = old.post_id;
    end loop;
  end if;
  if new.tag_ids is not null then
    -- insert entries for new values
    foreach loop_tag_id in array new.tag_ids loop
      insert into posts_by_tag(tag_id, user_id, post_id, created_date)
       values (loop_tag_id, new.user_id, new.post_id, new.created_date);
    end loop;
  end if;
  if old.group_ids is not null then
    -- delete entries for old values
    foreach loop_group_id in array old.group_ids loop
      delete from posts_by_group t
       where t.group_id = loop_group_id
         and t.user_id  = old.user_id
         and t.post_id  = old.post_id;
    end loop;
  end if;
  if new.group_ids is not null then
    -- insert entries for new values
    foreach loop_group_id in array new.group_ids loop
      insert into posts_by_group(group_id, user_id, post_id, created_date)
       values (loop_group_id, new.user_id, new.post_id, new.created_date);
    end loop;
  end if;
  -- the return value of an AFTER row-level trigger is ignored
  return new;
end;
$$
LANGUAGE plpgsql;

This procedure operates with the old and new records from a trigger. Here is the declaration of the trigger:

CREATE TRIGGER posts_by_user_dml
AFTER insert or update or delete ON posts_by_user
FOR EACH ROW
EXECUTE PROCEDURE posts_by_user_dml();

Handling DELETE in the trigger is not strictly necessary, because the ON DELETE CASCADE foreign key constraint already removes the secondary rows.

I usually prefer calling a procedure rather than a DML statement with some triggers behind it. This is to avoid side effects hidden behind a SQL statement. But here a trigger is fine because the INSERT statement keeps its semantics. What happens here is purely technical, to maintain the secondary tables.
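The delete-then-insert logic of the trigger can be modeled in a few lines of Python. This is a sketch under assumptions, not the trigger itself: the dictionary standing in for posts_by_tag, the maintain_posts_by_tag helper, and the sample rows are all hypothetical, and only the tag side is shown.

```python
# Hypothetical in-memory model of the trigger logic (illustration only).
# Keyed like the UNIQUE constraint: (tag_id, user_id, post_id) -> created_date.
posts_by_tag = {}

def maintain_posts_by_tag(old, new):
    """Mimic the trigger body; old/new are row dicts, or None when absent."""
    if old is not None and old["tag_ids"] is not None:
        # delete entries for old values
        for tag_id in old["tag_ids"]:
            posts_by_tag.pop((tag_id, old["user_id"], old["post_id"]), None)
    if new is not None and new["tag_ids"] is not None:
        # insert entries for new values
        for tag_id in new["tag_ids"]:
            key = (tag_id, new["user_id"], new["post_id"])
            posts_by_tag[key] = new["created_date"]

row_v1 = {"user_id": 1, "post_id": 1, "tag_ids": [1, 2, 3],
          "created_date": "2021-01-01"}
row_v2 = dict(row_v1, tag_ids=[1, 2, 3, 4])

maintain_posts_by_tag(None, row_v1)     # INSERT: no old row
maintain_posts_by_tag(row_v1, row_v2)   # UPDATE: old and new rows
tags_after_update = sorted(tag for tag, _, _ in posts_by_tag)
maintain_posts_by_tag(row_v2, None)     # DELETE: no new row
tags_after_delete = sorted(tag for tag, _, _ in posts_by_tag)
print(tags_after_update, tags_after_delete)
```

An obvious refinement, if the arrays are large and change little, would be to touch only the difference between the old and new sets; the version above mirrors the simpler logic of the procedure.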

Unit Tests

Deploying code in the database does not exempt you from unit testing it, and this is easy. There are 3 DML operations (DELETE, INSERT, UPDATE), and the values can be null, an empty array, or an array of integers.

Here is an example:

delete from posts_by_user;
insert into posts_by_user (user_id, group_ids, tag_ids, content, created_date)
values  (1,array[1,2,3],array[1,2,3],'x',date'2021-01-01');
insert into posts_by_user (user_id, group_ids, tag_ids, content, created_date)
values  (2,array[1,2,3],array[]::bigint[],'x',date'2021-01-01');
update posts_by_user set tag_ids=tag_ids||'{4}' where user_id=1; 

And the validation of it, comparing the stored arrays with the ones built from the secondary tables:

with join_secondary as (
 select *,
  array(
  SELECT tag_id
  from posts_by_tag t
  where t.user_id=p.user_id
    and t.post_id=p.post_id
    and t.created_date=p.created_date
  ) tag_ids_secondary,
  array(
  SELECT group_id
  from posts_by_group g
  where g.user_id=p.user_id
    and g.post_id=p.post_id
    and g.created_date=p.created_date
  ) group_ids_secondary
 from posts_by_user p)
 select tag_ids<@tag_ids_secondary and group_ids<@group_ids_secondary
   and  tag_ids@>tag_ids_secondary and group_ids@>group_ids_secondary
   "valid?",* from join_secondary;

This shows:

yugabyte-#    "valid?",* from join_secondary;
 valid? | user_id | post_id | group_ids |  tag_ids  | content |         created_date          | tag_ids_secondary | group_ids_secondary
--------+---------+---------+-----------+-----------+---------+-------------------------------+-------------------+---------------------
 t      |       2 |       2 | {1,2,3}   | {}        | x       | 2021-12-20 10:18:56.922046+00 | {}                | {3,2,1}
 t      |       1 |       1 | {1,2,3}   | {1,2,3,4} | x       | 2021-01-01 00:00:00+00        | {3,4,2,1}         | {3,2,1}
(2 rows)
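The validation above treats the arrays as sets, which is why {1,2,3,4} and {3,4,2,1} are reported as matching. A minimal Python sketch of that comparison follows; the arrays_match helper is hypothetical and only mimics the two containment operators, including the SQL behavior that a comparison with NULL yields NULL.

```python
def arrays_match(stored, rebuilt):
    """Mimic `stored <@ rebuilt and stored @> rebuilt` (containment both ways)."""
    if stored is None:
        # In SQL, comparing a NULL array yields NULL, not true or false.
        return None
    stored_set, rebuilt_set = set(stored), set(rebuilt)
    return stored_set <= rebuilt_set and stored_set >= rebuilt_set

checks = [
    arrays_match([1, 2, 3, 4], [3, 4, 2, 1]),  # same elements, other order
    arrays_match([], []),                      # empty array, no secondary rows
    arrays_match([1, 2, 3], [3, 2, 1]),
]
print(checks, arrays_match(None, []), arrays_match([1, 2], [1]))
```

One consequence worth covering in the tests: a post whose tag_ids is null would show a null "valid?" rather than t, so the null case needs its own check.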

INSERT performance

Of course, maintaining the secondary tables has a cost on inserts, like any additional index. It is important to understand how it scales on a distributed database. Here is an example inserting 100 posts per user, for 10 users, with 20 tags and groups (chosen at random among 1000):

truncate posts_by_user cascade;

with
users as (select generate_series(1,10) user_id),
posts as (select generate_series(1,100) post_id),
ids  as (select distinct (1000*random()*generate_series(1,20))::int id)
insert into posts_by_user (user_id,group_ids,tag_ids,content,created_date)
select user_id
 ,ARRAY(select id from ids) group_ids
 ,ARRAY(select id from ids) tag_ids
 ,'Lorem ipsum...' as content
 , now() - random() * interval '10 year'
from users,posts;
\timing on
\watch 0.1

I repeat this with \watch, and on a small server each run takes about 8 seconds for 1000 posts, which means about 8 milliseconds per post:

Time: 10604.948 ms (00:10.605)
Mon 20 Dec 2021 11:29:17 AM GMT (every 0.1s)

INSERT 0 1000

Time: 8533.703 ms (00:08.534)
Mon 20 Dec 2021 11:29:25 AM GMT (every 0.1s)

INSERT 0 1000

Time: 8320.473 ms (00:08.320)
Mon 20 Dec 2021 11:29:34 AM GMT (every 0.1s)

INSERT 0 1000
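The per-post figure can be checked from the timings above with a few lines of Python. The only assumption is that the first run is treated as a warm-up and left out of the steady-state average.

```python
# Elapsed times copied from the \timing output above, in milliseconds.
batch_ms = [10604.948, 8533.703, 8320.473]
posts_per_batch = 1000  # every run reported "INSERT 0 1000"

per_post_ms = [ms / posts_per_batch for ms in batch_ms]
# Assumption: the first run includes warm-up, so average only the later ones.
steady_state_ms = sum(per_post_ms[1:]) / len(per_post_ms[1:])

for ms in per_post_ms:
    print(f"{ms:.1f} ms per post")
print(f"steady state: {steady_state_ms:.1f} ms per post")
```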

Note that in my lab, I still have the GIN indexes created. This doesn't change the latency because everything is distributed across many nodes that are not saturated. The same goes for the indexes that enforce the unique constraints. I've run this from 6 sessions connected to the 6 nodes of an RF=3 multi-AZ YugabyteDB cluster (which means high availability with 3-way replication, resilient to the failure of one availability zone). The nodes are AWS c5.2xlarge. With these 6 sessions, I loaded 1000 new posts per second. This is displayed as 3000 Ops/Sec because each insert is actually 3 inserts, with my trigger maintaining the secondary tables:
[Figure: 1000 posts per second]

Now the question is about how it scales. Assessing the scalability means understanding the time complexity.

I've traced the calls between the YSQL layer (the PostgreSQL code) and the DocDB layer (the YugabyteDB distributed storage and transaction):

yugabyte=# select to_hex(oid::int) as "0xOID", relname,relkind, reltuples from (select oid,relname, relkind,relnamespace,reltuples from pg_class) c natural join (select oid relnamespace from pg_namespace where nspname='public') n where relname like 'posts%' order by 1;

 0xOID |                   relname                   | relkind |  reltuples
-------+---------------------------------------------+---------+-------------
 433f  | posts_by_user_post_id_seq                   | S       |           1
 4341  | posts_by_user                               | r       |   4.644e+07
 4344  | posts_by_user_pkey                          | i       |   4.644e+07
 4346  | posts_by_tag                                | r       | 4.39074e+08
 4349  | posts_by_tag_pkey                           | i       | 4.39074e+08
 4116  | posts_by_tag_tag_id_user_id_post_id_key     | i       | 4.39074e+08
 4352  | posts_by_group                              | r       | 5.40224e+08
 4355  | posts_by_group_pkey                         | i       | 5.40224e+08
 4357  | posts_by_group_group_id_user_id_post_id_key | i       | 5.40224e+08
 430a  | posts_by_user_group_ids                     | i       |   4.644e+07
 4400  | posts_by_user_tag_ids                       | i       |   4.644e+07

yugabyte=# set log_statement='all';
SET

yugabyte=# set yb_debug_log_docdb_requests=true;
SET

yugabyte=# select pg_current_logfile();
                            pg_current_logfile
--------------------------------------------------------------------------
 /home/opc/var/data/yb-data/tserver/logs/postgresql-2021-12-20_000000.log
(1 row)

yugabyte=# \! grep --color=auto -E '^|^.*(pg_session.cc|STATEMENT:)|(PGS|Y)QL[_A-Z]+|[0-9a-f]{4}"|Flushing|Applying|Buffering' /home/opc/var/data/yb-data/tserver/logs/postgresql-2021-12-20_000000.log

yugabyte=# 2021-12-20 10:21:09.424 UTC [17389] LOG:  statement: insert into posts_by_user (user_id, group_ids, tag_ids, content, created_date)
        values  (2,array[1,2,3],array[]::bigint[],'x',date'2021-01-01');
I1220 10:21:09.425124 17389 pg_session.cc:437] Applying operation: PGSQL_READ client: YQL_CLIENT_PGSQL stmt_id: 31771648 schema_version: 0 targets { column_id: 1 } targets { column_id: 2 } targets { column_id: 3 } targets { column_id: 4 } targets { column_id: 5 } targets { column_id: 6 } targets { column_id: 7 } targets { column_id: -8 } column_refs { ids: 1 ids: 2 ids: 3 ids: 4 ids: 5 ids: 6 ids: 7 } is_aggregate: false limit: 1024 return_paging_state: true table_id: "000033e1000030008000000000000a30" index_request { targets { column_id: 3 } column_refs { ids: 3 } is_forward_scan: true is_aggregate: false range_column_values { value { uint32_value: 1259 } } range_column_values { value { uint32_value: 17217 } } range_column_values { value { int32_value: 2 } } table_id: "000033e1000030008000000000000a72" }
I1220 10:21:09.426597 17389 pg_session.cc:370] Buffering operation: PGSQL_WRITE client: YQL_CLIENT_PGSQL stmt_id: 30364144 stmt_type: PGSQL_INSERT table_id: "000033e1000030008000000000004341" schema_version: 0 ybctid_column_value { value { binary_value: "G\317\252I\200\000\000\000\000\000\000\002!I\200\000\000\000\000\000\000\003!" } } column_values { column_id: 2 expr { value { binary_value: "\001\000\000\000\000\000\000\000\024\000\000\000\003\000\000\000\001\000\000\000\001\000\000\000\000\000\000\000\002\000\000\000\000\000\000\000\003\000\000\000\000\000\000\000" } } } column_values { column_id: 3 expr { value { binary_value: "\000\000\000\000\000\000\000\000\024\000\000\000" } } } column_values { column_id: 4 expr { value { string_value: "x" } } } column_values { column_id: 5 expr { value { int64_value: 662774400000000 } } } column_refs { } ysql_catalog_version: 303
I1220 10:21:09.426748 17389 pg_session.cc:370] Buffering operation: PGSQL_WRITE client: YQL_CLIENT_PGSQL stmt_id: 30530672 stmt_type: PGSQL_INSERT table_id: "000033e1000030008000000000004352" schema_version: 1 ybctid_column_value { value { binary_value: "G\355\251I\200\000\000\000\000\000\000\001!I\200\002Z\3120\255\240\000I\200\000\000\000\000\000\000\002I\200\000\000\000\000\000\000\003!" } } column_refs { } ysql_catalog_version: 303
I1220 10:21:09.426813 17389 pg_session.cc:370] Buffering operation: PGSQL_WRITE client: YQL_CLIENT_PGSQL stmt_id: 30441232 stmt_type: PGSQL_INSERT table_id: "000033e1000030008000000000004357" schema_version: 0 partition_column_values { value { int64_value: 1 } } range_column_values { value { int64_value: 2 } } range_column_values { value { int64_value: 3 } } range_column_values { value { } } column_values { column_id: 4 expr { value { binary_value: "G\355\251I\200\000\000\000\000\000\000\001!I\200\002Z\3120\255\240\000I\200\000\000\000\000\000\000\002I\200\000\000\000\000\000\000\003!" } } } column_refs { } ysql_catalog_version: 303
I1220 10:21:09.426929 17389 pg_session.cc:370] Buffering operation: PGSQL_WRITE client: YQL_CLIENT_PGSQL stmt_id: 30438432 stmt_type: PGSQL_INSERT table_id: "000033e1000030008000000000004352" schema_version: 1 ybctid_column_value { value { binary_value: "G\317\252I\200\000\000\000\000\000\000\002!I\200\002Z\3120\255\240\000I\200\000\000\000\000\000\000\002I\200\000\000\000\000\000\000\003!" } } column_refs { } ysql_catalog_version: 303
I1220 10:21:09.426992 17389 pg_session.cc:370] Buffering operation: PGSQL_WRITE client: YQL_CLIENT_PGSQL stmt_id: 30436192 stmt_type: PGSQL_INSERT table_id: "000033e1000030008000000000004357" schema_version: 0 partition_column_values { value { int64_value: 2 } } range_column_values { value { int64_value: 2 } } range_column_values { value { int64_value: 3 } } range_column_values { value { } } column_values { column_id: 4 expr { value { binary_value: "G\317\252I\200\000\000\000\000\000\000\002!I\200\002Z\3120\255\240\000I\200\000\000\000\000\000\000\002I\200\000\000\000\000\000\000\003!" } } } column_refs { } ysql_catalog_version: 303
I1220 10:21:09.427089 17389 pg_session.cc:370] Buffering operation: PGSQL_WRITE client: YQL_CLIENT_PGSQL stmt_id: 30294576 stmt_type: PGSQL_INSERT table_id: "000033e1000030008000000000004352" schema_version: 1 ybctid_column_value { value { binary_value: "G\010DI\200\000\000\000\000\000\000\003!I\200\002Z\3120\255\240\000I\200\000\000\000\000\000\000\002I\200\000\000\000\000\000\000\003!" } } column_refs { } ysql_catalog_version: 303
I1220 10:21:09.427141 17389 pg_session.cc:370] Buffering operation: PGSQL_WRITE client: YQL_CLIENT_PGSQL stmt_id: 32457600 stmt_type: PGSQL_INSERT table_id: "000033e1000030008000000000004357" schema_version: 0 partition_column_values { value { int64_value: 3 } } range_column_values { value { int64_value: 2 } } range_column_values { value { int64_value: 3 } } range_column_values { value { } } column_values { column_id: 4 expr { value { binary_value: "G\010DI\200\000\000\000\000\000\000\003!I\200\002Z\3120\255\240\000I\200\000\000\000\000\000\000\002I\200\000\000\000\000\000\000\003!" } } } column_refs { } ysql_catalog_version: 303
I1220 10:21:09.430510 17389 pg_session.cc:949] Flushing buffered operations, using transactional session (num ops: 7)

I'm not going into the details here. The important point is that all PGSQL_WRITE operations to the main and secondary tables are buffered together. They are distributed to each node (depending on the hash code from the primary key), but it is not a per-table fan-out. This is what is great about using stored procedures in distributed databases: the whole transaction can be processed without waiting for the acknowledgement of individual statements.
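Why buffering matters can be shown with a deliberately simplified latency model in Python. Everything here is an assumption made for illustration: the round_trips and elapsed_ms helpers are hypothetical, the round-trip time is an invented value rather than a measurement, and a real flush goes to several nodes in parallel, which the model collapses into a single round trip.

```python
def round_trips(num_ops, buffered):
    """Network round trips needed to send num_ops write operations."""
    # Buffered: operations are queued and flushed together (one round trip).
    # Unbuffered: each operation waits for its own acknowledgement.
    return 1 if buffered and num_ops > 0 else num_ops

def elapsed_ms(num_ops, rtt_ms, buffered):
    """Time spent waiting on the network in this toy model."""
    return round_trips(num_ops, buffered) * rtt_ms

NUM_OPS = 7    # "num ops: 7" in the flush line of the trace above
RTT_MS = 1.0   # hypothetical round-trip latency, not a measured value

buffered_ms = elapsed_ms(NUM_OPS, RTT_MS, buffered=True)
unbuffered_ms = elapsed_ms(NUM_OPS, RTT_MS, buffered=False)
print(buffered_ms, unbuffered_ms)
```

In this model the waiting time no longer grows with the number of secondary rows written per post, which is the property the trace illustrates.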

Queries

I have 43 million posts loaded (the ANALYZE ran for 4 minutes).

Getting the last two days of posts for one user has to read only the main table, because everything is stored in it:

yugabyte=# explain analyze
           select posts_by_user.*
            from posts_by_user where user_id=1
            and created_date > now() - 2*interval '1 day'
           order by created_date desc;
                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan Backward using posts_by_user_pkey on posts_by_user  (cost=0.00..17.00 rows=100 width=120) (actual time=3.589..84.544 rows=2173 loops=1)
   Index Cond: ((user_id = 1) AND (created_date > (now() - '2 days'::interval)))
 Planning Time: 0.081 ms
 Execution Time: 85.111 ms
(4 rows)

This is the fastest execution plan you can have: read a range from one table, in primary key order, requiring no additional sorting. It could even be a bit better if I had declared created_date as DESC instead of ASC (I didn't, to keep PostgreSQL compatibility for the blog post). On PostgreSQL, the same execution plan has to read rows from the heap table, probably scattered, as one user doesn't post everything at the same time.

So this returned 2173 posts in 85 milliseconds from a table that has 40 million posts. But the size of the table doesn't matter because it is a range scan.

Getting the posts for a tag requires a join from the secondary table:

yugabyte=# explain analyze
           select posts_by_user.*
            from posts_by_user
            join posts_by_tag
            using(user_id, created_date, post_id)
            where posts_by_tag.created_date 
                   > now() - interval '1 month'
                  and tag_id =1
            order by created_date desc limit 100
 ;

Even with a join, the time complexity of the Index Scan and Nested Loop depends on the size of the result rather than the size of the table. This query still runs in under 80 milliseconds:

                                                                        QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..28.40 rows=100 width=120) (actual time=2.691..70.550 rows=100 loops=1)
   ->  Nested Loop  (cost=0.00..28.40 rows=100 width=120) (actual time=2.689..70.513 rows=100 loops=1)
         ->  Index Scan Backward using posts_by_tag_pkey on posts_by_tag  (cost=0.00..17.00 rows=100 width=24) (actual time=1.830..1.971 rows=100 loops=1)
               Index Cond: ((tag_id = 1) AND (created_date > (now() - '1 mon'::interval)))
         ->  Index Scan using posts_by_user_pkey on posts_by_user  (cost=0.00..0.11 rows=1 width=120) (actual time=0.670..0.670 rows=1 loops=100)
               Index Cond: ((user_id = posts_by_tag.user_id) AND (created_date = posts_by_tag.created_date) AND (post_id = posts_by_tag.post_id))
 Planning Time: 0.269 ms
 Execution Time: 71.185 ms

This execution plan gives a response time proportional to the number of rows selected from the secondary table (here, the latest 100 rows with tag 1 in the past month), in this case 71 milliseconds. I've run the same query getting 1000 rows in 634 milliseconds. With a Nested Loop, the join stays under control as long as you bound the number of rows that drive the loop. And this is why I wanted all filtering criteria in the secondary tables.
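The proportionality can be read directly from the numbers in the plan. The following Python lines only redo that arithmetic, using figures copied from the output and the text above.

```python
# Figures copied from the EXPLAIN ANALYZE output and the text above.
outer_rows = 100           # rows returned by the scan on posts_by_tag
inner_ms_per_loop = 0.670  # actual time of one lookup in posts_by_user
total_ms_100 = 71.185      # execution time for 100 rows
total_ms_1000 = 634.0      # the second run mentioned above, for 1000 rows

# The inner index scan runs once per outer row (loops=100).
inner_total_ms = outer_rows * inner_ms_per_loop
share_of_total = inner_total_ms / total_ms_100
per_row_100 = total_ms_100 / 100
per_row_1000 = total_ms_1000 / 1000

print(f"lookups in the main table: {inner_total_ms:.1f} ms "
      f"({share_of_total:.0%} of the execution time)")
print(f"per row: {per_row_100:.2f} ms at 100 rows, "
      f"{per_row_1000:.2f} ms at 1000 rows")
```

The cost per row stays in the same range when the result grows tenfold, and almost all of it is spent in the point lookups on the main table.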

When tracing the calls to the storage I see two PGSQL_READ operations:

I1220 11:01:53.145462 17389 pg_session.cc:437] Applying operation: PGSQL_READ client: YQL_CLIENT_PGSQL stmt_id: 33622848 schema_version: 1 partition_column_values { value { int64_value: 1 } } targets { column_id: 0 } targets { column_id: 2 } targets { column_id: 3 } targets { column_id: 1 } targets { column_id: -8 } column_refs { ids: 0 ids: 1 ids: 2 ids: 3 } is_forward_scan: true is_aggregate: false limit: 1024 return_paging_state: true ysql_catalog_version: 308 table_id: "000033e1000030008000000000004382" condition_expr { condition { op: QL_OP_AND operands { condition { op: QL_OP_GREATER_THAN_EQUAL operands { column_id: 1 } operands { value { int64_value: 692708513144764 } } } } } }
I1220 11:01:53.145519 17389 pg_session.cc:437] Applying operation: PGSQL_READ client: YQL_CLIENT_PGSQL stmt_id: 33622848 schema_version: 1 partition_column_values { value { int64_value: 2 } } targets { column_id: 0 } targets { column_id: 2 } targets { column_id: 3 } targets { column_id: 1 } targets { column_id: -8 } column_refs { ids: 0 ids: 1 ids: 2 ids: 3 } is_forward_scan: true is_aggregate: false limit: 1024 return_paging_state: true ysql_catalog_version: 308 table_id: "000033e1000030008000000000004382" condition_expr { condition { op: QL_OP_AND operands { condition { op: QL_OP_GREATER_THAN_EQUAL operands { column_id: 1 } operands { value { int64_value: 692708513144764 } } } } } }

Clearly, in a distributed SQL database, this should be used to gather only a few posts (as filtered on the secondary table columns), because those calls go to all nodes to read the content from the main table and have to seek in many places. I'm using this secondary table as a secondary index, and it is important not to add another level of fan-out. This table should be scanned on its primary key and should filter as much as possible.

Now as I kept the GIN indexes, let's see what would happen:

yugabyte=# explain analyze
           select *
            from posts_by_user
            where created_date > now() - interval '1 month'
              and tag_ids @>'{1}'
order by created_date desc limit 100;
                                                                                                                                                     QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------
------
 Limit  (cost=52.11..52.36 rows=100 width=120) (actual time=289549.350..289549.380 rows=100 loops=1)
   ->  Sort  (cost=52.11..54.61 rows=1000 width=120) (actual time=289549.348..289549.363 rows=100 loops=1)
         Sort Key: created_date DESC
         Sort Method: top-N heapsort  Memory: 128kB
         ->  Index Scan using posts_by_user_tag_ids on posts_by_user  (cost=4.00..13.89 rows=1000 width=120) (actual time=8720.517..289510.325 rows=250376 loops=1)
               Index Cond: (tag_ids @> '{1}'::bigint[])
               Filter: (created_date > (now() - '1 mon'::interval))
               Rows Removed by Filter: 2845624
 Planning Time: 0.093 ms
 Execution Time: 289550.383 ms
(10 rows)

You see the problem: because the index does not contain created_date, the date condition has to be applied afterwards as a filter (Rows Removed by Filter: about 2.8 million of the roughly 3.1 million rows read through the index). Does it matter? Test it with your use case, your data, and your YugabyteDB version. I'm on 2.11 here, and maybe one day the GIN index will be able to filter on the date: because created_date is part of the primary key, it is possible to get it from the index entry. This is an advantage of YugabyteDB storing tables in the primary key LSM tree: a secondary index references the row by its primary key. The possibility of using it to filter may be related to #10169. On standard PostgreSQL you will see another join method for this one, but let's keep that for the next post.

Distributed SQL database considerations

Joining the two tables, by reading the one for tag or group first and then fanning out reads to all nodes to get the related posts, is still scalable when calls to the distributed storage are optimized by the SQL query layer. Here YugabyteDB provides fully consistent global indexes, tables stored in the primary index, triggers to guarantee data integrity beyond what declarative constraints can provide, procedures to batch many statements in one call... The same technique can be used with PostgreSQL, with additional heap fetches but no cross-server calls. But maybe you don't need it there, because a Bitmap Scan on the GIN index may provide an acceptable response time. I'll run the same data and query on PostgreSQL in the next post.
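
The shape of the join described above can be sketched in a few lines. This is only an illustration, not the schema from this post: it runs on SQLite through Python's standard library rather than on YugabyteDB, and posts_by_tag, tag_id, user_id, post_id and content are assumed names (only posts_by_user and created_date come from the post). The point it shows is that when the date is part of the secondary table's primary key, the date range is applied while scanning that table, and only the matching rows are fetched from the main table.

```python
import sqlite3

# Hypothetical schema for illustration only. posts_by_tag and the column names
# tag_id, user_id, post_id and content are assumptions, not taken from the post.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table posts_by_user (
  user_id integer, created_date integer, post_id integer, content text,
  primary key (user_id, created_date, post_id)
);
-- Secondary table used like an index: created_date is in its primary key,
-- so the date range is part of the scan instead of a later filter.
create table posts_by_tag (
  tag_id integer, created_date integer, user_id integer, post_id integer,
  primary key (tag_id, created_date, user_id, post_id)
);
""")

# 3 users x 10 dates; dates are plain integers to keep the sketch small.
rows = [(u, d, u * 1000 + d, f"post {u}-{d}")
        for u in range(1, 4) for d in range(1, 11)]
conn.executemany("insert into posts_by_user values (?,?,?,?)", rows)

# Tag 1 is attached to every post with an even date.
conn.executemany(
    "insert into posts_by_tag values (1,?,?,?)",
    [(d, u, p) for (u, d, p, _) in rows if d % 2 == 0],
)

def recent_posts_for_tag(tag_id, since, limit):
    """Read the secondary table on (tag_id, created_date), then fetch the posts."""
    return conn.execute(
        """
        select p.user_id, p.created_date, p.post_id, p.content
          from posts_by_tag t
          join posts_by_user p
            on p.user_id = t.user_id
           and p.created_date = t.created_date
           and p.post_id = t.post_id
         where t.tag_id = ? and t.created_date > ?
         order by t.created_date desc
         limit ?
        """,
        (tag_id, since, limit),
    ).fetchall()

result = recent_posts_for_tag(1, 6, 100)
for row in result:
    print(row)
```

On a distributed database, the second step is where the fan-out happens, which is why the first step should return as few rows as possible.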

I hope this use case illustrates a reason for triggers and stored procedures, even when you don't want any business logic in the database. This is only data logic, for data integrity and performance reasons, and it belongs in the SQL database.
